david_tout (04/23/2021, 3:31 AM)
There's some behaviour of target-postgres that I am not really understanding.
I've got a tap that is reading from a CSV file, and there are a few columns in that CSV file I want to exclude. I've done this with the select extra on that tap to exclude those columns.
The tap additionally adds a few extra columns of its own (_sdc_source_bucket, for example). (The tap is tap-s3-csv.)
When the target runs, it creates a table that includes all the excluded columns in the schema and defines them as NOT NULL.
Then it tries to insert and correctly excludes those fields (by nulling them out), and the load fails because of the NOT NULL constraint.
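The mismatch being described can be sketched in a few lines. This is an illustration of the mechanism, not target-postgres source code: the target builds the table from the SCHEMA message, while the select rules only strip fields from each RECORD, so the excluded column is still created (NOT NULL when its JSON Schema type does not allow "null"), arrives as NULL, and trips the constraint.

```python
# SCHEMA message as the target sees it: the excluded column is still listed,
# and its type list has no "null", so the target would create it NOT NULL.
schema = {
    "properties": {
        "name": {"type": ["null", "string"]},
        "_sdc_source_bucket": {"type": "string"},
    }
}

# RECORD message after the select rules have dropped the _sdc_ field.
record = {"name": "CS Bearings"}


def not_null_violations(schema, record):
    """Columns whose schema forbids null but whose value is missing/None."""
    bad = []
    for col, spec in schema["properties"].items():
        types = spec["type"] if isinstance(spec["type"], list) else [spec["type"]]
        if "null" not in types and record.get(col) is None:
            bad.append(col)
    return bad


print(not_null_violations(schema, record))  # prints ['_sdc_source_bucket']
```

The record passes for every nullable column, but the filtered-out NOT NULL column is exactly the one that fails, which matches the NotNullViolation in the logs.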
My workaround was to fully define the schema (including the extra columns) in my tap config...
taylor (04/23/2021, 12:24 PM)
david_tout (04/25/2021, 2:11 AM)
david_tout (04/25/2021, 2:12 AM)
david_tout (04/26/2021, 7:16 AM)
DETAIL: Failing row contains (72d60960-5614-4c88-bb7e-6b8d742c57cc, 2021-04-23 02:15:54.3426+00, null, 1619144154, null, null, ,
null, null, 4800000, null, , CS Bearings, 498, Sydney, Industrial, NSW, Cancelled, 2012-10-10 17:45:00+00, null, , null, null, null).
target-postgres | CONTEXT: COPY tmp_134ef136_aed3_4c3f_88f6_9a4fc5cf0c5f, line 1: "72d60960-5614-4c88-bb7e-6b8d742c57cc,2021-04-23 02:15:54.3426+00:00,NULL,1619144154,NULL,NULL,,NULL,..."
target-postgres |
target-postgres | CRITICAL ('Exception writing records', NotNullViolation('null value in column "_sdc_source_bucket" violates not-null constraint
We can see that it is creating these columns:
target-postgres | INFO Writing table batch schema for `('test',)`...
target-postgres | INFO Table Schema Change [`test`.`('__sdc_primary_key',)`:`__sdc_primary_key`] New column (took 13 millis)
target-postgres | INFO Table Schema Change [`test`.`('_sdc_batched_at',)`:`_sdc_batched_at`] New column (took 2 millis)
target-postgres | INFO Table Schema Change [`test`.`('_sdc_received_at',)`:`_sdc_received_at`] New column (took 2 millis)
target-postgres | INFO Table Schema Change [`test`.`('_sdc_sequence',)`:`_sdc_sequence`] New column (took 1 millis)
target-postgres | INFO Table Schema Change [`test`.`('_sdc_table_version',)`:`_sdc_table_version`] New column (took 3 millis)
My meltano.yml:
plugins:
  extractors:
  - name: tap-s3-csv
    namespace: tap_s3_csv
    pip_url: tap-s3-csv
    executable: tap-s3-csv
    capabilities:
    - properties
    - discover
    - state
    settings:
    - name: start_date
    - name: account_id
    - name: role_name
    - name: bucket
    - name: external_id
    - name: tables
    config:
      account_id: 908421949328
      bucket: etl-data-raw-ap-southeast-2-dev
      role_name: cevo-bastion-ssm
      external_id: none
      start_date: 2021-04-25 00:00:00+00:00
      tables: |
        [
          {
            "search_prefix": "HY/sql",
            "search_pattern": "vwProjectRepl\\/.*\\.csv",
            "table_name": "projects",
            "key_properties": "__sdc_primary_key",
            "date_overrides": "_sdc_batched_at,_sdc_received_at,updated_at",
            "delimiter": "\t"
          },
          {
            "search_prefix": "HY/sql",
            "search_pattern": "vwProjectMetricRepl\\/.*\\.csv",
            "table_name": "project_metrics",
            "key_properties": "__sdc_primary_key",
            "date_overrides": "_sdc_batched_at,_sdc_received_at,updated_at",
            "delimiter": "\t"
          }
        ]
    select:
    - projects.*
    - project_metrics.*
    - '!*._sdc_*'
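For reference, select rules of this shape are typically evaluated as glob patterns over "stream.property" names, with a leading "!" marking an exclusion and a later matching rule overriding an earlier one. A rough sketch of that matching (my own illustration, not Meltano's actual implementation):

```python
from fnmatch import fnmatch

# The select rules from the meltano.yml above.
select_rules = ["projects.*", "project_metrics.*", "!*._sdc_*"]


def is_selected(stream, prop, rules):
    """Apply glob select rules in order; the last matching rule wins."""
    selected = False
    for rule in rules:
        negate = rule.startswith("!")
        pattern = rule.lstrip("!")
        if fnmatch(f"{stream}.{prop}", pattern):
            selected = not negate
    return selected


print(is_selected("projects", "name", select_rules))                # True
print(is_selected("projects", "_sdc_source_bucket", select_rules))  # False
```

Under this reading, _sdc_source_bucket is deselected as intended, which is consistent with the field being dropped from the records while the target still sees it in the schema.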
It seems like the _sdc_ columns are being filtered out, but for some reason the columns are still created by the target, and since they were filtered they are NULL and violate the constraint.
My workaround in the meantime has been to just load all columns.
For context, the data I am trying to import was created by Stitch (the SaaS version). This use case is somewhat temporary; eventually we will go directly to source, but in the interim I need to use the CSV data produced by Stitch, which includes all the _sdc_ columns in it...
johann_du_toit (04/26/2021, 9:52 PM)
The target uses _sdc_ columns as well - it's where it stores various metadata. So maybe it's treating _sdc_ columns as special somehow and they can't be turned off?
david_tout (04/27/2021, 12:34 AM)
david_tout (04/27/2021, 12:36 AM)
david_tout (04/27/2021, 12:36 AM)
(tap-mssql in this case...)