Hello all I am Kenneth a beginner in Meltano. We h...
# singer-taps
k
Hello all I am Kenneth a beginner in Meltano. We have been piloting migrating data ingestion using Meltano from a custom in-house solution and I have been having some challenges. I have shared some below and would appreciate some assistance and/or pointers to documentation. 1. While using for example the csv extractor and postgres loader when I export and load the data into an existing table it creates duplicate columns instead of using the ones already existing. Is there a way to force meltano to use the existing columns 2. What if for example the data has a unique column 'email' is it possible to update the data on only that unique column and ignore the primary key 'Id' incase the Id is different? Thank you in advance... This is an example of my yml file
Copy code
version: 1
send_anonymous_usage_stats: false
project_id: xxx-xxx
plugins:
  extractors:
  - name: tap-csv
    variant: meltano
    pip_url: git+<https://gitlab.com/meltano/tap-csv.git>
    config:
      files:
      - entity: tmp_voucher_seeder_meltano_may8
        file: ../update/payments_C.csv
        keys:
        - Id
      - entity: tmp_voucher_seeder_meltano_may8
        file: ../update/payments_Y.csv
        keys:
        - Id
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+<https://github.com/MeltanoLabs/tap-postgres.git>
  loaders:
  - name: target-postgres
    variant: transferwise
    pip_url: pipelinewise-target-postgres
    config:
      add_record_metadata: false
      batch_size_rows: 10000
      default_target_schema: raw_data
      flush_all_streams: true
      parallelism: 1
      primary_key_required: false
      dbname: postgres
      user: user_name
      host: <http://myhost.com|myhost.com>
a
Re 1, could you give some examples of the 'duplicate columns' in postgres? are they named slightly differently? I'm not sure how a postgres table could have two columns with identical names.
👍 1
k
Thanks @Andy Carter for your reply. Below is an example from the screen output
Copy code
time=2024-05-29 01:27:19 name=target_postgres level=INFO message=Table '"tmp_voucher_seeder_meltano_may8"' exists
time=2024-05-29 01:27:19 name=target_postgres level=INFO message=Versioning column: ALTER TABLE raw_data."tmp_voucher_seeder_meltano_may8" RENAME COLUMN "amount" TO "amount_20240529_0127"
a
Check the type of your existing column in postgres, if it doesn't match what your tap is producing for this column (if it's csv then probably every column will be
varchar
) then the target will 'version' the column to avoid overwriting: https://transferwise.github.io/pipelinewise/user_guide/schema_changes.html#versioning-columns To avoid the issue, you could drop your original target table
tmp_voucher_seeder_meltano_may8
and let meltano create it and manage the table schema. I prefer that method.
🙌 1