The write operation currently generates a COPY command like the following:
COPY "PUBLIC"."some_table" FROM 's3://some-bucket/tmp/manifest.json' CREDENTIALS 'aws_access_key_id=__;aws_secret_access_key=__' FORMAT AS CSV NULL AS '@NULL@' manifest
This relies on the DataFrame having its columns in the same order as the table, if the table already exists. However, the COPY command supports specifying column lists or JSONPath expressions to map columns (documentation). It would be nice to at least support the column list, potentially as an option on the write operation like:
df.write
.format("com.databricks.spark.redshift")
.option("url", "jdbc:redshift://redshifthost:5439/database?user=username&password=pass")
.option("dbtable", "my_table_copy")
.option("tempdir", "s3n://path/for/temp/data")
.option("include_column_list", "true")
.mode("error")
.save()
Looks like this should be fairly straightforward to add here.
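For illustration, the change could amount to optionally interpolating a quoted column list between the table name and the FROM clause of the generated COPY statement. The sketch below is hypothetical: the helper name, parameters, and the include_column_list option are assumptions, not the library's actual internals.

```scala
// Hypothetical sketch of building the COPY statement with an optional
// explicit column list, derived from the DataFrame's schema. All names
// here are illustrative, not the library's real code.
object CopyStatementSketch {
  def buildCopy(table: String,
                columns: Seq[String],
                manifestUrl: String,
                credentials: String,
                includeColumnList: Boolean): String = {
    // When enabled, emit e.g. ("col_a", "col_b") after the table name so
    // Redshift maps CSV fields by name rather than by position.
    val columnList =
      if (includeColumnList && columns.nonEmpty)
        columns.map(c => s""""$c"""").mkString(" (", ", ", ")")
      else ""
    s"COPY $table$columnList FROM '$manifestUrl' " +
      s"CREDENTIALS '$credentials' FORMAT AS CSV NULL AS '@NULL@' manifest"
  }
}
```

With include_column_list disabled the output matches the current behavior; enabling it only adds the parenthesized list, so it should be backward compatible.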