Hi all, There's something about `target-postgres`...
# singer-targets
d
Hi all, There's something about
target-postgres
that I am not really understanding. If got a tap that is reading from a CSV file, and there are a few columns in that CSV file I want to exclude. I've done this with the select extra on that tap to exlude those columns. The tap additionally adds a few extra columns of its own (_sdc_source_bucket for example). (the tap is
tap-s3-csv
) When the target runs, it's creating a table that includes all the excluded columns in the schema and defines them as not null. Then it tries to insert and correctly excludes the fields (by nulling them out) and the load fails becuase of the not null constraint. My work around was to fully define the schema (including extra columns) in my tap config...
t
I seem to recall someone else seeing this issue too. I found https://github.com/datamill-co/target-postgres/issues/160 as well. @andrew_madonna have you seen this issue before?
d
This is a little differe-t because it is the first ingestion.
I'll try to get the logs & paste them here tomorrow
Here's my error:
Copy code
DETAIL:  Failing row contains (72d60960-5614-4c88-bb7e-6b8d742c57cc, 2021-04-23 02:15:54.3426+00, null, 1619144154, null, null, ,
 null, null, 4800000, null, , CS Bearings, 498, Sydney, Industrial, NSW, Cancelled, 2012-10-10 17:45:00+00, null, , null, null, null).
target-postgres       | CONTEXT:  COPY tmp_134ef136_aed3_4c3f_88f6_9a4fc5cf0c5f, line 1: "72d60960-5614-4c88-bb7e-6b8d742c57cc,2021-04-23 02:15:54.3426+0
0:00,NULL,1619144154,NULL,NULL,,NULL,..."
target-postgres       |
target-postgres       | CRITICAL ('Exception writing records', NotNullViolation('null value in column "_sdc_source_bucket" violates not-null constraint
We can see that it is creating this column:
Copy code
target-postgres       | INFO Writing table batch schema for `('test',)`...
target-postgres       | INFO Table Schema Change [`test`.`('__sdc_primary_key',)`:`__sdc_primary_key`] New column (took 13 millis)
target-postgres       | INFO Table Schema Change [`test`.`('_sdc_batched_at',)`:`_sdc_batched_at`] New column (took 2 millis)
target-postgres       | INFO Table Schema Change [`test`.`('_sdc_received_at',)`:`_sdc_received_at`] New column (took 2 millis)
target-postgres       | INFO Table Schema Change [`test`.`('_sdc_sequence',)`:`_sdc_sequence`] New column (took 1 millis)
target-postgres       | INFO Table Schema Change [`test`.`('_sdc_table_version',)`:`_sdc_table_version`] New column (took 3 millis)
my
meltano.yml
Copy code
plugins:
  extractors:
  - name: tap-s3-csv
    namespace: tap_s3_csv
    pip_url: tap-s3-csv
    executable: tap-s3-csv
    capabilities:
    - properties
    - discover
    - state
    settings:
    - name: start_date
    - name: account_id
    - name: role_name
    - name: bucket
    - name: external_id
    - name: tables
    config:
      account_id: 908421949328
      bucket: etl-data-raw-ap-southeast-2-dev
      role_name: cevo-bastion-ssm
      external_id: none
      start_date: 2021-04-25 00:00:00+00:00
      tables: "[\n  {\n    \"search_prefix\": \"HY/sql\",\n    \"search_pattern\"\
        : \"vwProjectRepl\\\\/.*\\\\.csv\",\n    \"table_name\": \"projects\",\n \
        \   \"key_properties\": \"__sdc_primary_key\",\n    \"date_overrides\": \"\
        _sdc_batched_at,_sdc_received_at,updated_at\",\n    \"delimiter\": \"\\t\"\
        \n  },\n  {\n    \"search_prefix\": \"HY/sql\",\n    \"search_pattern\": \"\
        vwProjectMetricRepl\\\\/.*\\\\.csv\",\n    \"table_name\": \"project_metrics\"\
        ,\n    \"key_properties\": \"__sdc_primary_key\",\n    \"date_overrides\"\
        : \"_sdc_batched_at,_sdc_received_at,updated_at\",\n    \"delimiter\": \"\\\
        t\"\n  }\n]"
    select:
    - projects.*
    - project_metrics.*
    - !*._sdc_*
It seems like the
_sdc_
columns are being filtered out, but for some reason the column is still created by the target, and since the column was filtered, it is therefore NULL and violates the constraint. My work around in the mean-time has been to just load all columns. For context, the data I am trying to import was created by Sitch (the SaaS) version. This use case is somewhat temporary, eventually we will go directly to source, but in the interim I need to use the CSV data produced by Stich which includes all the
_sdc_
columns in it...
j
I'm not sure if this is the issue, but target-postgres creates it's own
_sdc_
columns as well - it's where it stores various meta data. So maybe it's treating
_sdc_
columns as special somehow and they can't be turned off?
d
That's kind of what I suspected
I think the takeaway is, if you're re-ingesting csv data that was produced by a singer target, you have to ingest it in it's entirety
There are probably better ways of chaining this all together in a production scenario. This particular instance is part of a dev/sandbox prototype. Eventually we will ingest directly from the source (
tap-mssql
in this case...)