# troubleshooting
j
Hey everyone... first day, but I did a search and didn't see anything. I am trying to extract data from Postgres using the tap-postgres extractor. I have it configured and it is able to list the entities, so it is making the connection. I have the table I want to extract in the YAML:
```yaml
select:
- public-truck_stops.zipcode.*
- public-truck_stops.*.*
```
However, when I invoke the extractor, I get an error of:
```
{"type": "ACTIVATE_VERSION", "stream": "public-truck_stops", "version": 1671052198343}
time=2022-12-14 15:09:58 name=tap_postgres level=CRITICAL message='NoneType' object has no attribute 'get'
Traceback (most recent call last):
```
I can't seem to figure out what it is looking for. Also, is there a way to only download a set number of rows at a time? Some of the tables I will be downloading in the future are large, so I don't want to lock tables as the extract happens.
```
- name: dev config: plugins: extractors: - name: tap-postgres config: host: *********** port: 5432 user: ************* dbname: mudflap default_replication_method: INCREMENTAL filter_schemas: public select: - public-truck_stops.zipcode.* - public-truck_stops..
```
t
I think if you specify `default_replication_method: INCREMENTAL`, you have to define the field used to identify new/updated rows.
e
Also, the indentation might be off. Try:
```yaml
- name: dev
  config:
    plugins:
      extractors:
      - name: tap-postgres
        config:
          host: ***********
          port: 5432
          user: *************
          dbname: mudflap
          default_replication_method: INCREMENTAL
          filter_schemas: public
        select:
        - public-truck_stops.zipcode.*
        - public-truck_stops.*.*
```
t
Well, to use incremental replication in general that's true
j
How/where do you do that? I did see something about that, but there were no examples in the getting started tutorial.
I think this is what you're talking about:
```shell
meltano config tap-postgres set _metadata some_entity_id replication-key updated_at
meltano config tap-postgres set _metadata some_entity_id replication-key id
```
t
Somewhere you'll have to specify the replication-key property... something like
```yaml
metadata:
  public-truck_stops.*:
    replication-method: INCREMENTAL
    replication-key: dbTimeStamp
```
yup yup that's the one
The YAML I just posted is what it'll look like in meltano.yml, but the CLI commands you posted will produce that.
j
Any way for it not to grab the entire table in one go?
t
Not that I know of. What exactly is possible/available will depend on which tap variant you're using though.
j
I believe transferwise, since that is the default.
t
That is most likely, yeah
That does have an option for max run time (not that I've tried it), which might be a proxy for what you're getting at. Will reading the rows cause locking problems though? I thought PG was MVCC, in which case readers shouldn't block anything...
j
We have seen issues on long extracts where, if an update is made (live system), it errors out because the record it thought it was getting was now deleted/updated (the previous version was deleted).
c
The 'variants' concept for taps and targets is probably the key thing to understand first. And when it comes to the 'variants' the main differentiation will be whether a tap/target is built with the Meltano SDK or not. Taps that are built with the Meltano SDK will have some nice common "shared features" that you can expect to always work for all Meltano SDK based taps.
One of those features is the `start_date` setting, which *should* be available and working out of the box for most of those taps.
That `start_date` setting could give you an initial point of investigation for limiting the data to extract, if your source has a time-based column that you can use as the replication key.
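For an SDK-based tap, setting it would look something like this (the date value is just an example, and the setting may not exist on non-SDK variants such as transferwise):

```shell
# Hypothetical example: only applies if the tap variant actually
# exposes a start_date setting (common for Meltano SDK-based taps).
meltano config tap-postgres set start_date 2022-01-01T00:00:00Z
```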
j
I am using the transferwise one... I see it now further down in the YAML, but how do I know if that was built with the SDK?
This `select_filter` might be what I need.
OK, so we got the extract working... I think the load to Snowflake is now failing.
```yaml
  loaders:
  - name: target-snowflake
    config:
      account: uxb77134.snowflakecomputing.com
      username: jaye@mudflapinc.com
      role: PROD_EL
      database: PROD_RAW
      warehouse: PROD_EL
      schema: MUDFLAP_MELTANO
      batch_size: 5000
      timestamp_column: ETL__loaded_at
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
  loaders:
  - name: target-snowflake
    variant: meltano
    pip_url: git+https://gitlab.com/meltano/target-snowflake.git
```
I am getting a warning of:
```
WARNING: /Users/jayehowell/Documents/REPOS/meltano-demo/demo/.meltano/loaders/target-snowflake/venv/bin/python is loading libcrypto in an unsafe way
2022-12-14T22:22:30.463690Z [debug    ] Deleted configuration at /Users/jayehowell/Documents/REPOS/meltano-demo/demo/.meltano/run/target-snowflake/target.8aec8b61-ddc6-4062-adc8-1112bdf53c59.config.json
```
But I don't think that is the issue. I ran an invoke of the extractor to a file, then cat'd and piped the file to an invoke of the loader. A file is produced.
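That manual test would look something like this (the filename here is illustrative):

```shell
# Run the tap alone, capture its Singer output, then feed it to the target.
meltano invoke tap-postgres > extract.jsonl
cat extract.jsonl | meltano invoke target-snowflake
```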
t
Yeah I think that warning is innocuous
j
Does the table need to be created in the destination already, or will it do that? The user/role has permissions.
t
In the `.meltano/logs/elt/<pipeline_name>/` directory there will be some subdirectories containing `elt.log` files... I would start by looking in those for errors.
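A quick way to scan those logs, run from the project root (the directory layout is as described above):

```shell
# Recursively search every elt.log under the project's log directory
# for lines mentioning an error, case-insensitively.
grep -ri "error" .meltano/logs/elt/
```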
Generally the target will create tables for you. Been a long time since I worked with the snowflake target though, honestly, so I could be wrong.
j
Is that in the project folder or the bin folder?
t
project
If it helps: my notes from my PoC w/ Snowflake say to ensure that all the stuff described here is done, and that the easiest way to make everything work is to let the target create the destination schema, i.e. don't create it in advance. That's a shot in the dark though...
j
Yeah, I didn't use the transferwise version since it used CSV, and we have lots of things in the data which break CSV, so I went with the meltano one.
The output from tap-postgres seems to be JSON.
t
That is correct
j
@thomas_briggs @christoph thanks both for your help.
t
All taps produce JSON defined by the Singer spec; all targets consume said JSON.
j
ok
t
It's the very simple secret to being able to feed data from any source to any destination 😉
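For illustration, a Singer tap's output is just one JSON message per line on stdout; the stream and field names below are made up, not actual tap-postgres output:

```shell
# Emit a minimal, hypothetical Singer message stream: SCHEMA describes the
# stream, RECORD carries a row, STATE carries a resumable bookmark.
printf '%s\n' \
  '{"type": "SCHEMA", "stream": "truck_stops", "schema": {"properties": {"zipcode": {"type": "string"}}}, "key_properties": ["zipcode"]}' \
  '{"type": "RECORD", "stream": "truck_stops", "record": {"zipcode": "30301"}}' \
  '{"type": "STATE", "value": {"bookmarks": {"truck_stops": {"replication_key_value": "2022-12-14"}}}}'
```

A target reads lines like these from stdin and writes the RECORD rows to the destination.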
j
Previously, where I worked, we always exported to Avro.
t
Gotta run, time to cook dinner. 🙂 I'm happy to answer more questions tomorrow. Good luck!
j
Here is, I think, the relevant section:
```
2022-12-14T22:36:24.857497Z [debug    ] waiting for process completion or exception
2022-12-14T22:36:24.896565Z [info     ] time=2022-12-14 16:36:24 name=tap_postgres level=INFO message=Selected streams: ['public-truck_stops'] cmd_type=elb consumer=False name=tap-postgres producer=True stdio=stderr string_id=tap-postgres
2022-12-14T22:36:25.016251Z [info     ] WARNING: /Users/jayehowell/Documents/REPOS/meltano-demo/demo/.meltano/loaders/target-snowflake/venv/bin/python is loading libcrypto in an unsafe way cmd_type=elb consumer=True name=target-snowflake producer=False stdio=stderr string_id=target-snowflake
2022-12-14T22:36:25.016779Z [debug    ] tail consumer completed first
2022-12-14T22:36:25.018301Z [debug    ] Deleted configuration at /Users/jayehowell/Documents/REPOS/meltano-demo/demo/.meltano/run/tap-postgres/tap.f7960e9a-8c3a-41b9-b0c7-42979be39613.config.json
2022-12-14T22:36:25.018569Z [debug    ] Deleted configuration at /Users/jayehowell/Documents/REPOS/meltano-demo/demo/.meltano/run/target-snowflake/target.7ebc940c-4f6c-44f9-b95a-4090b52842fc.config.json
2022-12-14T22:36:25.154587Z [error    ] Loader failed
```
t
Interesting. I'm not sure what to make of that error. It doesn't seem like a problem with the target; it almost seems like Meltano is having problems just running the target. If you do `meltano invoke target-snowflake`, what do you get?
j
Nothing was created in the logs folder; however, I did send stdout and stderr to this file...
I am building a new project with the datamill variant, since it looks like transferwise only uses CSV... not sure what datamill uses.
t
Weird that that log file has basically nothing in it 🤔