Hi, I'm a complete newbie in Meltano, just trying to learn the basics (Meltano #getting-started)

# getting-started

csp

04/12/2023, 9:18 PM

Hi, I'm a complete newbie to Meltano, just trying to learn the basics. Is it OK to ask a tap-specific question here? 🙂

csp

04/12/2023, 9:29 PM

I'm trying to use tap-postgres and set default_replication_method to INCREMENTAL, but the documentation says that I need to configure a replication_key column within the catalog's stream definitions. I'm not sure what that means exactly; can someone give an example? I can set the replication method to FULL_TABLE and it works fine, but I'd like to sync incrementally when rows are added or updated.

aaronsteers

04/12/2023, 10:34 PM

Hi, @csp. And welcome! The default_replication_method config option that tap-postgres provides is not available on all taps. Assuming you set the value to INCREMENTAL, the tap interprets this setting as something like: "Use incremental replication for all the tables that have an incremental key defined, assuming I have some bookmark to continue from. Otherwise, use 'Full table' replication."
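The rule described in this message can be sketched in a few lines of Python. This is only a paraphrase of the explanation above, not the tap's actual code; the function name and its parameters are hypothetical.

```python
from typing import Optional


def choose_replication_method(
    default_method: str,
    replication_key: Optional[str],
    bookmark: Optional[str],
) -> str:
    """Pick a replication method for one table (illustrative only)."""
    # Any default other than INCREMENTAL is used as-is.
    if default_method != "INCREMENTAL":
        return default_method
    # Incremental needs both a key column and a bookmark to resume from.
    if replication_key is not None and bookmark is not None:
        return "INCREMENTAL"
    # Otherwise fall back to a full table sync.
    return "FULL_TABLE"


with_key = choose_replication_method("INCREMENTAL", "updated_at", "2023-04-12")
without_key = choose_replication_method("INCREMENTAL", None, None)
```

Under these assumptions, a table with no key column defined falls back to a full table sync even though the default is INCREMENTAL.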

aaronsteers

04/12/2023, 10:35 PM

It should be safe to set default_replication_method to INCREMENTAL, but you will also have to specify what the incremental key columns are (e.g. 'last_updated_as_of') in the meltano.yml file.

aaronsteers

04/12/2023, 10:38 PM

There's one additional option, which we recommend if/when you are able to configure it on the backend: LOG_BASED replication, which uses Postgres's own internal changelogs to track just the rows that have changed. While in theory this is the same as a column-based incremental sync, in practice it's much more powerful because it doesn't miss changes when your 'updated_at' column fails to record an update, and it can also track changes on tables that have no incremental column set. That said, you might need help from a DB admin to configure it, so it's perfectly normal to use INCREMENTAL or FULL_TABLE replication when you're first getting started. https://github.com/transferwise/pipelinewise-tap-postgres#log-based-replication-requirements
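To make the "missed update" case concrete, here is a small in-memory sketch of a column-based incremental sync. It is not how tap-postgres is implemented; the rows, the incremental_sync helper, and the use of a >= comparison against the bookmark are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical table contents; only updated_at matters for the sync.
rows = [
    {"id": 1, "first_name": "Ann", "updated_at": datetime(2023, 4, 10)},
    {"id": 2, "first_name": "Bob", "updated_at": datetime(2023, 4, 11)},
]


def incremental_sync(table, bookmark):
    """Return rows at or after the bookmark, plus the new bookmark."""
    changed = [r for r in table if bookmark is None or r["updated_at"] >= bookmark]
    new_bookmark = max((r["updated_at"] for r in changed), default=bookmark)
    return changed, new_bookmark


# First run: no bookmark yet, so every row is extracted.
first_batch, bookmark = incremental_sync(rows, None)
first_ids = [r["id"] for r in first_batch]

# Row 1 changes but its updated_at is NOT bumped; row 2 changes correctly.
rows[0]["first_name"] = "Anna"
rows[1]["first_name"] = "Robert"
rows[1]["updated_at"] = datetime(2023, 4, 13)

# Second run: only row 2 is picked up, so the change to row 1 is missed.
second_batch, bookmark = incremental_sync(rows, bookmark)
second_ids = [r["id"] for r in second_batch]
```

Log-based replication avoids this gap because it reads changes from the database's own log rather than relying on a column that the application has to keep up to date.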

csp

04/13/2023, 12:36 AM

Thanks @aaronsteers for the detailed explanation. I did set the updated_at column as the replication_key for INCREMENTAL, but when I run meltano config tap-postgres test, it gives me this error:

Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to
join our friendly Slack community.

Plugin configuration is invalid
AttributeError: 'NoneType' object has no attribute 'get'

That's when I got confused. The error message doesn't make clear what it means, so I checked out the code to take a look and got even more confused. That's when I came here for help. I'll try to make sense of the code.

csp

04/13/2023, 12:37 AM

"The default_replication_method config option that tap-postgres provides is not available on all taps."

What do you mean by this?

aaronsteers

04/13/2023, 3:43 PM

Hi, @csp. Do you mind providing your meltano.yml content, with sensitive info redacted?

aaronsteers

04/13/2023, 3:43 PM

AttributeError: 'NoneType' object has no attribute 'get'

Also, if you have the line number from the error message, or a fuller error message, that would be helpful for debugging.

aaronsteers

04/13/2023, 3:46 PM

My hypothesis is that some part of the config is not correct, and the tap is failing unexpectedly when it parses a specific part of the config or catalog metadata. Not to dive too deep into Python internals, but get() is generally called on a Python dictionary. If the expected value is missing, the code ends up holding None instead of a dictionary, and None (type "NoneType") has no get() method to call.
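The failure mode described here is easy to reproduce in isolation. The dictionary below is a made-up stand-in, not the tap's real config object, and the "metadata" and "replication-key" names are used only for illustration.

```python
# Hypothetical settings dictionary with no "metadata" entry.
config = {"host": "localhost", "port": 5432}

# dict.get() returns None when the key is missing...
section = config.get("metadata")

# ...and calling .get() on None raises the error seen in the thread.
try:
    section.get("replication-key")
except AttributeError as exc:
    message = str(exc)

# Supplying an empty dict as the default keeps the chained lookup safe.
safe_value = config.get("metadata", {}).get("replication-key")
```

The chained lookup with a default returns None instead of raising, which is why an unguarded chain of get() calls is a common source of this exact message.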

csp

04/13/2023, 5:55 PM

I stripped this down to the minimum: just extracting data from PostgreSQL and dumping it to a CSV file:

version: 1
default_environment: dev
project_id: 4457f4ce-a3b3-4e19-8f1c-861c13d1d809
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
    config:
      host: localhost
      port: 5432
      dbname: mydb
      filter_schemas: ''
      user: itsme
      default_replication_method: INCREMENTAL
      replication_key: updated_at
  loaders:
  - name: target-csv
    variant: hotgluexyz
    pip_url: git+https://github.com/hotgluexyz/target-csv.git
    config:
      destination_path: data/output/
      quotechar: '"'

And this is just a simplified test table:

create table customers(id int, first_name varchar(255), last_name varchar(255), age int, created_at timestamp, updated_at timestamp);

Running the test command meltano config tap-postgres test just fails with the error I gave above. The error is neither specific nor explanatory. I believe it means that I'm missing something in meltano.yml after changing the replication method, but I don't know what exactly. The tap's docs aren't clear either 🙂

csp

04/13/2023, 6:10 PM

Do I need to specify the SQL type of that replication key column? If so, is there any documentation I can refer to on how to do that? I've traced the error to this line

replication_key_sql_datatype = md_map.get(('properties', replication_key)).get('sql-datatype')

in the sync_strategies/incremental.py file. It seems like incremental replication needs more config, but that's not documented anywhere.
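One way to see why that line can fail: if no replication key reaches the stream metadata, the lookup key becomes ('properties', None), the first get() returns None, and the second get() raises. The sketch below assumes the metadata map is a plain dictionary keyed by breadcrumb tuples; the map contents, the way replication_key is read, and the lookup_sql_datatype helper are hypothetical, not code from the tap.

```python
def lookup_sql_datatype(md_map, replication_key):
    """Hypothetical guarded variant of the lookup quoted above."""
    prop_md = md_map.get(("properties", replication_key))
    if prop_md is None:
        raise ValueError(
            f"No metadata for replication key {replication_key!r}; "
            "is replication-key set in the stream metadata?"
        )
    return prop_md.get("sql-datatype")


# Illustrative metadata map with no replication-key at the stream level.
md_map = {
    (): {"replication-method": "INCREMENTAL"},
    ("properties", "id"): {"sql-datatype": "integer"},
    ("properties", "updated_at"): {"sql-datatype": "timestamp without time zone"},
}

# Assumed for illustration: the key is read from stream-level metadata.
replication_key = md_map.get((), {}).get("replication-key")  # None here

# The unguarded lookup fails with the opaque AttributeError.
try:
    md_map.get(("properties", replication_key)).get("sql-datatype")
except AttributeError as exc:
    unguarded_error = str(exc)

# The guarded lookup fails with a message that names the likely cause.
try:
    lookup_sql_datatype(md_map, replication_key)
except ValueError as exc:
    guarded_error = str(exc)
```

With the key present, for example lookup_sql_datatype(md_map, "updated_at"), the same helper simply returns the column's datatype.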

aaronsteers

04/13/2023, 6:23 PM

Can you try removing the entry for filter_schemas and let us know if you still get the same NoneType/get error?

csp

04/13/2023, 7:07 PM

I removed it, and it still throws the same error.

edgar_ramirez_mondragon

04/13/2023, 10:54 PM

The replication key is part of the tap's metadata, not its configuration, so you need to set it there. This configuration works for me with a public.state table with updated_at as the replication key:

version: 1
default_environment: dev
project_id: 4457f4ce-a3b3-4e19-8f1c-861c13d1d809
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
    config:
      host: localhost
      port: 5432
      dbname: mydb
      filter_schemas: ''
      user: itsme
      default_replication_method: INCREMENTAL
    select:
    - public-state.*
    metadata:
      public-state:
        replication-key: updated_at
  loaders:
  - name: target-csv
    variant: hotgluexyz
    pip_url: git+https://github.com/hotgluexyz/target-csv.git
    config:
      destination_path: data/output/
      quotechar: '"'

(see the docs for metadata at https://docs.meltano.com/concepts/plugins#metadata-extra)

csp

04/14/2023, 2:46 AM

@edgar_ramirez_mondragon Thanks for the tip. This one works:

version: 1
default_environment: dev
project_id: 4457f4ce-a3b3-4e19-8f1c-861c13d1d809
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
    config:
      host: localhost
      port: 5432
      dbname: mydb
      user: itsme
    metadata:
      public-customers:
        replication-method: INCREMENTAL
        replication-key: updated_at
        updated_at:
          is-replication-key: 'true'
  loaders:
  - name: target-csv
    variant: hotgluexyz
    pip_url: git+https://github.com/hotgluexyz/target-csv.git
    config:
      destination_path: data/output/
      quotechar: '"'
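As a rough mental model of the metadata block in this working config, scalar keys under a stream name can be read as stream-level metadata, and nested mappings as metadata for a single column. The sketch below flattens the block into Singer-style entries with breadcrumbs on that assumption. It is not Meltano's implementation, and metadata_to_catalog_entries is a hypothetical helper.

```python
def metadata_to_catalog_entries(metadata):
    """Flatten a metadata mapping into breadcrumb entries (illustrative)."""
    entries = []
    for stream_id, rules in metadata.items():
        # Scalar values apply to the stream itself (empty breadcrumb).
        stream_level = {k: v for k, v in rules.items() if not isinstance(v, dict)}
        if stream_level:
            entries.append(
                {"stream": stream_id, "breadcrumb": [], "metadata": stream_level}
            )
        # Nested mappings apply to one property (column) of the stream.
        for name, value in rules.items():
            if isinstance(value, dict):
                entries.append(
                    {
                        "stream": stream_id,
                        "breadcrumb": ["properties", name],
                        "metadata": value,
                    }
                )
    return entries


# The metadata block from the working config, written as a Python dict.
metadata = {
    "public-customers": {
        "replication-method": "INCREMENTAL",
        "replication-key": "updated_at",
        "updated_at": {"is-replication-key": "true"},
    }
}

entries = metadata_to_catalog_entries(metadata)
```

Read this way, replication-method and replication-key land on the stream, while is-replication-key lands on the updated_at column.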
