Hi all There s something about `target postgres` that I am n Meltano #singer-targets

Hi all, There's something about `target-postgres`...

david_tout

04/23/2021, 3:31 AM

Hi all, There's something about

target-postgres

that I am not really understanding. If got a tap that is reading from a CSV file, and there are a few columns in that CSV file I want to exclude. I've done this with the select extra on that tap to exlude those columns. The tap additionally adds a few extra columns of its own (_sdc_source_bucket for example). (the tap is

tap-s3-csv

) When the target runs, it's creating a table that includes all the excluded columns in the schema and defines them as not null. Then it tries to insert and correctly excludes the fields (by nulling them out) and the load fails becuase of the not null constraint. My work around was to fully define the schema (including extra columns) in my tap config...

taylor

04/23/2021, 12:24 PM

I seem to recall someone else seeing this issue too. I found https://github.com/datamill-co/target-postgres/issues/160 as well. @andrew_madonna have you seen this issue before?

david_tout

04/25/2021, 2:11 AM

This is a little differe-t because it is the first ingestion.

david_tout

04/25/2021, 2:12 AM

I'll try to get the logs & paste them here tomorrow

david_tout

04/26/2021, 7:16 AM

Here's my error:

Copy code

DETAIL:  Failing row contains (72d60960-5614-4c88-bb7e-6b8d742c57cc, 2021-04-23 02:15:54.3426+00, null, 1619144154, null, null, ,
 null, null, 4800000, null, , CS Bearings, 498, Sydney, Industrial, NSW, Cancelled, 2012-10-10 17:45:00+00, null, , null, null, null).
target-postgres       | CONTEXT:  COPY tmp_134ef136_aed3_4c3f_88f6_9a4fc5cf0c5f, line 1: "72d60960-5614-4c88-bb7e-6b8d742c57cc,2021-04-23 02:15:54.3426+0
0:00,NULL,1619144154,NULL,NULL,,NULL,..."
target-postgres       |
target-postgres       | CRITICAL ('Exception writing records', NotNullViolation('null value in column "_sdc_source_bucket" violates not-null constraint

We can see that it is creating this column:

Copy code

target-postgres       | INFO Writing table batch schema for `('test',)`...
target-postgres       | INFO Table Schema Change [`test`.`('__sdc_primary_key',)`:`__sdc_primary_key`] New column (took 13 millis)
target-postgres       | INFO Table Schema Change [`test`.`('_sdc_batched_at',)`:`_sdc_batched_at`] New column (took 2 millis)
target-postgres       | INFO Table Schema Change [`test`.`('_sdc_received_at',)`:`_sdc_received_at`] New column (took 2 millis)
target-postgres       | INFO Table Schema Change [`test`.`('_sdc_sequence',)`:`_sdc_sequence`] New column (took 1 millis)
target-postgres       | INFO Table Schema Change [`test`.`('_sdc_table_version',)`:`_sdc_table_version`] New column (took 3 millis)

meltano.yml

Copy code

plugins:
  extractors:
  - name: tap-s3-csv
    namespace: tap_s3_csv
    pip_url: tap-s3-csv
    executable: tap-s3-csv
    capabilities:
    - properties
    - discover
    - state
    settings:
    - name: start_date
    - name: account_id
    - name: role_name
    - name: bucket
    - name: external_id
    - name: tables
    config:
      account_id: 908421949328
      bucket: etl-data-raw-ap-southeast-2-dev
      role_name: cevo-bastion-ssm
      external_id: none
      start_date: 2021-04-25 00:00:00+00:00
      tables: "[\n  {\n    \"search_prefix\": \"HY/sql\",\n    \"search_pattern\"\
        : \"vwProjectRepl\\\\/.*\\\\.csv\",\n    \"table_name\": \"projects\",\n \
        \   \"key_properties\": \"__sdc_primary_key\",\n    \"date_overrides\": \"\
        _sdc_batched_at,_sdc_received_at,updated_at\",\n    \"delimiter\": \"\\t\"\
        \n  },\n  {\n    \"search_prefix\": \"HY/sql\",\n    \"search_pattern\": \"\
        vwProjectMetricRepl\\\\/.*\\\\.csv\",\n    \"table_name\": \"project_metrics\"\
        ,\n    \"key_properties\": \"__sdc_primary_key\",\n    \"date_overrides\"\
        : \"_sdc_batched_at,_sdc_received_at,updated_at\",\n    \"delimiter\": \"\\\
        t\"\n  }\n]"
    select:
    - projects.*
    - project_metrics.*
    - !*._sdc_*

It seems like the

_sdc_

columns are being filtered out, but for some reason the column is still created by the target, and since the column was filtered, it is therefore NULL and violates the constraint. My work around in the mean-time has been to just load all columns. For context, the data I am trying to import was created by Sitch (the SaaS) version. This use case is somewhat temporary, eventually we will go directly to source, but in the interim I need to use the CSV data produced by Stich which includes all the

_sdc_

columns in it...

johann_du_toit

04/26/2021, 9:52 PM

I'm not sure if this is the issue, but target-postgres creates it's own

_sdc_

columns as well - it's where it stores various meta data. So maybe it's treating

_sdc_

columns as special somehow and they can't be turned off?

david_tout

04/27/2021, 12:34 AM

That's kind of what I suspected

david_tout

04/27/2021, 12:36 AM

I think the takeaway is, if you're re-ingesting csv data that was produced by a singer target, you have to ingest it in it's entirety

david_tout

04/27/2021, 12:36 AM

There are probably better ways of chaining this all together in a production scenario. This particular instance is part of a dev/sandbox prototype. Eventually we will ingest directly from the source (

tap-mssql

in this case...)

Open in Slack

Previous Next