target-ducklake
A Singer target for DuckLake. This target is in development and may not be stable. It was built with GCS in mind, but it should also work with S3 and local storage, though these have not been tested extensively.
Currently only appending and merging data are supported: if no key properties are provided, records are appended; if key properties are provided, records are automatically merged.
- Planned: support for deleting data from the target or setting an `_sdc_deleted_at` field (example use case: syncing from Postgres using log-based change data capture)
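Whether a stream is appended or merged is driven entirely by its key properties. A sketch of the Singer messages involved (stream and column names below are illustrative):

```json
{"type": "SCHEMA", "stream": "users", "schema": {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}}}, "key_properties": ["id"]}
{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Ada"}}
```

With `"key_properties": ["id"]`, re-sending a record with the same `id` merges into the existing row; with `"key_properties": []`, every record is appended as a new row.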
| Setting | Type | Required | Default | Description |
|---|---|---|---|---|
| catalog_url | string | ✅ | - | URL connection string to your catalog database |
| data_path | string | ✅ | - | GCS, S3, or local folder path for data storage |
| storage_type | string | ✅ | "local" | Type of storage: GCS, S3, or local |
| catalog_type | string | ✅ | "postgres" | Type of catalog database: postgres, sqlite, mysql, or duckdb. See https://ducklake.select/docs/stable/duckdb/usage/choosing_a_catalog_database |
| meta_schema | string | ❌ | - | Schema name in the catalog database to use for DuckLake metadata tables. If not provided, the catalog database's default schema is used (e.g. public for Postgres) |
| public_key | string | ❌ | - | Public key for private GCS and S3 storage authentication (optional) |
| secret_key | string | ❌ | - | Secret key for private GCS and S3 storage authentication (optional) |
| region | string | ❌ | - | AWS region for S3 storage type (required when using S3 with explicit credentials) |
| default_target_schema | string | ❌ | - | Default database schema where data should be written. If not provided, the schema is inferred from the stream name (inference only works for database extractors) |
| target_schema_prefix | string | ❌ | - | Prefix to add to the target schema name. If not provided, no prefix is added |
| add_record_metadata | boolean | ❌ | false | When true, automatically adds Singer Data Capture (SDC) metadata columns to target tables |
| flatten_max_level | integer | ❌ | 0 | Maximum depth for flattening nested fields. Set to 0 to disable flattening |
| temp_file_dir | string | ❌ | "temp_files/" | Directory path for storing temporary Parquet files |
| max_batch_size | integer | ❌ | 10000 | Maximum number of records to process in a single batch |
| partition_fields | object | ❌ | - | Object mapping stream names to arrays of partition column definitions; see the partitioning example below |
| auto_cast_timestamps | boolean | ❌ | false | When true, automatically attempts to cast timestamp-like fields to timestamp types in DuckLake |
| validate_records | boolean | ❌ | false | Whether to validate the schema of incoming streams |
| overwrite_if_no_pk | boolean | ❌ | false | When true, truncates the target table before inserting records if no primary keys are defined in the stream |
```yaml
plugins:
  # ... other plugins ...
  loaders:
    - name: target-ducklake
      namespace: target_ducklake
      pip_url: git+https://github.com/definite-app/target-ducklake.git
      config:
        catalog_url: postgres://test:test@localhost:5432/test_db
        data_path: gs://test-bucket/test-path
        storage_type: GCS # Optional (default local)
        public_key: GOOG1234567890 # Optional (required for private GCS and S3 storage)
        secret_key: GOOG1234567890 # Optional (required for private GCS and S3 storage)
        default_target_schema: my_schema # Optional (default None)
        target_schema_prefix: my_prefix # Optional
        max_batch_size: 10000 # Optional (default 10000)
        add_record_metadata: true # Optional
        auto_cast_timestamps: true # Optional
        overwrite_if_no_pk: true # Optional (default false)
        partition_fields: {"my_stream": [{"column_name": "created_at", "type": "timestamp", "granularity": ["year", "month"]}]} # Optional
```
Partitioning is supported by providing a `partition_fields` configuration object. Each stream name maps to an array of column definitions. Example:

```
{
  "stream_name": [
    {
      "column_name": "created_at",      # The column to partition on
      "type": "timestamp",              # The partition type (timestamp or identifier)
      "granularity": ["year", "month"]  # The partition granularity (timestamp type only)
    }
  ],
  "stream_name_2": [
    {
      "column_name": "status",  # The column to partition on
      "type": "identifier"      # The partition type (timestamp or identifier)
    }
  ]
}
```
A full list of supported settings and capabilities for this target is available by running:

```bash
target-ducklake --about
```
Solution to duckdb/ducklake#240: the SQL to manually migrate your catalog schema (assuming Postgres) can be found in the `schema_migration.sql` file.
This Singer target will automatically import any environment variables from the working directory's `.env` file if `--config=ENV` is provided, so that config values are picked up when a matching environment variable is set either in the terminal context or in the `.env` file.
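Assuming the Meltano Singer SDK's usual naming convention (the upper-cased plugin name as a prefix, followed by the upper-cased setting name), a `.env` file might look like this (values are placeholders):

```
# .env — read when target-ducklake is invoked with --config=ENV
TARGET_DUCKLAKE_CATALOG_URL=postgres://test:test@localhost:5432/test_db
TARGET_DUCKLAKE_DATA_PATH=gs://test-bucket/test-path
TARGET_DUCKLAKE_STORAGE_TYPE=GCS
```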
You can easily run `target-ducklake` by itself or in a pipeline using Meltano.

```bash
target-ducklake --version
target-ducklake --help
# Test using the "Smoke Test" tap:
tap-smoke-test | target-ducklake --config /path/to/target-ducklake-config.json
```
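The file passed via `--config` is plain JSON. A minimal sketch using the required settings from the table above (all values are placeholders):

```json
{
  "catalog_url": "postgres://test:test@localhost:5432/test_db",
  "data_path": "gs://test-bucket/test-path",
  "storage_type": "GCS",
  "catalog_type": "postgres"
}
```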
Follow these instructions to contribute to this project.

Prerequisites:

- Python 3.9+
- uv

Install dependencies:

```bash
uv sync
```

Create tests within the `tests` subfolder and then run:

```bash
# Run all unit tests (no database required)
uv run pytest tests/ -v
```

You can also test the `target-ducklake` CLI interface directly using `uv run`:

```bash
uv run target-ducklake --help
```
Testing with Meltano

Note: This target will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.

Install Meltano (if you haven't already) and any needed plugins:

```bash
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd target-ducklake
meltano install
```

Now you can test and orchestrate using Meltano:

```bash
# Test invocation:
meltano invoke target-ducklake --version
# OR run a test ELT pipeline with the Smoke Test sample tap:
meltano run tap-smoke-test target-ducklake
```

See the dev guide for more instructions on how to use the Meltano Singer SDK to develop your own Singer taps and targets.