# getting-started
c
Hi, I'm a complete newbie to Meltano, just trying to learn the basics. Is it OK to ask a tap-specific question here? 🙂
I'm trying to use tap-postgres and set
default_replication_method
to
INCREMENTAL
, but the documentation says that I need to configure the
replication_key
column within the catalog's stream definitions. I'm not sure exactly what that means; can someone give an example? I can set the replication method to FULL_TABLE and it works fine, but I'd like to sync incrementally when rows are added or updated.
a
Hi, @csp. And welcome! The
default_replication_method
config option that
tap-postgres
provides is not available on all taps. Assuming you set the value to
INCREMENTAL
, then the tap interprets this setting roughly as: "Use incremental replication for all the tables that have an incremental key defined, assuming I have some bookmark to continue from. Otherwise, use 'Full table' replication."
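To make that rule concrete, here is a minimal Python sketch of the decision as described above. It only mirrors the paraphrase in this message; `choose_sync_method` and its parameters are hypothetical names and are not taken from the tap's code.

```python
from typing import Optional


def choose_sync_method(
    default_method: str,
    replication_key: Optional[str],
    bookmark: Optional[str],
) -> str:
    """Hypothetical helper: decide how a single table gets synced.

    Mirrors the rule described in the thread, not the tap's real logic.
    """
    # Incremental only applies when a key column is defined AND there is
    # a bookmark (a saved key value) to continue from.
    if default_method == "INCREMENTAL" and replication_key and bookmark:
        return "INCREMENTAL"
    # Everything else falls back to copying the whole table.
    return "FULL_TABLE"


with_key = choose_sync_method("INCREMENTAL", "updated_at", "2024-01-01T00:00:00")
without_key = choose_sync_method("INCREMENTAL", None, None)
```

Under these assumptions, `with_key` comes out as `"INCREMENTAL"` and `without_key` as `"FULL_TABLE"`.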
It should be safe to set
default_replication_method
to INCREMENTAL, but you will also have to specify which columns are the incremental keys (e.g. 'last_updated_as_of') in the
meltano.yml
file.
There's one additional option, which we recommend if/when you are able to configure it on the backend. That option is
LOG_BASED
replication, which uses Postgres's own internal changelogs to track just the rows that changed. In theory this is the same as a column-based incremental sync, but in practice it's much more powerful: it doesn't miss changes when an 'updated_at' column fails to get updated, and it can also track changes on tables that have no incremental column. That said, you might need help from a DB admin to configure it, so it's perfectly normal to use INCREMENTAL or FULL_TABLE replication when you're first getting started. https://github.com/transferwise/pipelinewise-tap-postgres#log-based-replication-requirements
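As a rough illustration of the "missed update" problem with key-based incremental syncs, here is a small in-memory Python simulation. The rows and the `incremental_sync` helper are made up for this example; no database or tap code is involved.

```python
from datetime import datetime

# Made-up rows standing in for a table with an updated_at column.
rows = [
    {"id": 1, "first_name": "Ann", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "first_name": "Bob", "updated_at": datetime(2024, 1, 2)},
]


def incremental_sync(table, bookmark):
    """Return the rows at or after the bookmark, plus the new bookmark."""
    changed = [r for r in table if bookmark is None or r["updated_at"] >= bookmark]
    new_bookmark = max((r["updated_at"] for r in changed), default=bookmark)
    return changed, new_bookmark


# First run: no bookmark yet, so every row is extracted.
first_batch, bookmark = incremental_sync(rows, None)

# A row is edited, but nothing bumps its updated_at value.
rows[0]["first_name"] = "Anna"

# Second run: only rows at or after the bookmark are seen.
second_batch, bookmark = incremental_sync(rows, bookmark)
missed_edit = all(r["id"] != 1 for r in second_batch)
```

Here `missed_edit` ends up `True`: the edit to row 1 is never picked up because its `updated_at` stayed behind the bookmark. A changelog-based approach does not depend on that column at all.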
c
Thanks @aaronsteers for the detailed explanation. I did set the
updated_at
column as the
replication_key
for INCREMENTAL, but when I run
meltano config tap-postgres test
, it gave me the error
Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
join our friendly Slack community.

Plugin configuration is invalid
AttributeError: 'NoneType' object has no attribute 'get'
That's when I got confused. The error message doesn't make clear what went wrong. So I checked out the code to take a look, and got more confused. That's when I came here for help. I'll try to make sense of the code.
The
default_replication_method
config option that
tap-postgres
provides is not available on all taps.
What do you mean by this?
a
Hi, @csp. Do you mind providing your
meltano.yml
content, with sensitive info redacted?
AttributeError: 'NoneType' object has no attribute 'get'
- Also, if you have the line number from the error message, or a fuller error message, that might be helpful for debugging.
My hypothesis is that some part of the config is not correct, and the tap is failing unexpectedly when it parses a specific part of the config or catalog metadata. Not to dive too deep into Python internals, but get() is generally called on a Python dictionary. If the dictionary the code expected is missing, the value is None (type "NoneType"), which has no get() method to call.
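For anyone curious, this failure mode is easy to reproduce in plain Python. The snippet below is a generic sketch; `config`, `lookup`, and `safe_lookup` are illustrative names and have nothing to do with the tap's actual code.

```python
# A nested dictionary, similar in shape to parsed config or metadata.
config = {"stream": {"key": "updated_at"}}


def lookup(mapping, outer, inner):
    """Chained lookup: breaks when the outer key is absent."""
    # mapping.get(outer) returns None for a missing key, and None has no .get()
    return mapping.get(outer).get(inner)


def safe_lookup(mapping, outer, inner):
    """Defensive variant: treat a missing outer entry as an empty dict."""
    return (mapping.get(outer) or {}).get(inner)


try:
    lookup(config, "missing", "key")
    message = ""
except AttributeError as exc:
    # Same kind of error as in the thread: None has no attribute 'get'.
    message = str(exc)

found = safe_lookup(config, "stream", "key")
absent = safe_lookup(config, "missing", "key")
```

The chained version raises `AttributeError`, while `safe_lookup` returns `None` for the missing entry instead of crashing.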
c
I stripped this down to the minimum: just extracting data from PostgreSQL and dumping it to a CSV file:
version: 1
default_environment: dev
project_id: 4457f4ce-a3b3-4e19-8f1c-861c13d1d809
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
    config:
      host: localhost
      port: 5432
      dbname: mydb
      filter_schemas: ''
      user: itsme
      default_replication_method: INCREMENTAL
      replication_key: updated_at
  loaders:
  - name: target-csv
    variant: hotgluexyz
    pip_url: git+https://github.com/hotgluexyz/target-csv.git
    config:
      destination_path: data/output/
      quotechar: '"'
And this is just a simplified test table:
create table customers(id int, first_name varchar(255), last_name varchar(255), age int, created_at timestamp, updated_at timestamp);
Running the test command
meltano config tap-postgres test
fails with the error I gave above. The error says nothing specific and explains very little. I believe it means that I'm missing something in
meltano.yml
after changing the replication method, but I don't know what exactly. The tap's docs aren't clear either 🙂
Do I need to specify the SQL type of the replication key column? If so, is there any documentation I can refer to on how to do that? I traced the error to this line
replication_key_sql_datatype = md_map.get(('properties', replication_key)).get('sql-datatype')
in the
sync_strategies/incremental.py
file. Seems like incremental needs more config, but it's not documented anywhere.
a
Can you try removing the entry for filter_schemas and let us know if you still get the same NoneType/get error?
c
I removed it, and it still throws the same error.
e
The replication key is part of the tap's stream metadata, not its configuration, so you need to set it there. This configuration works for me with a
public.state
table with
updated_at
as replication key:
version: 1
default_environment: dev
project_id: 4457f4ce-a3b3-4e19-8f1c-861c13d1d809
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
    config:
      host: localhost
      port: 5432
      dbname: mydb
      filter_schemas: ''
      user: itsme
      default_replication_method: INCREMENTAL
    select:
    - public-state.*
    metadata:
      public-state:
        replication-key: updated_at
  loaders:
  - name: target-csv
    variant: hotgluexyz
    pip_url: git+https://github.com/hotgluexyz/target-csv.git
    config:
      destination_path: data/output/
      quotechar: '"'
(see docs for
metadata
in https://docs.meltano.com/concepts/plugins#metadata-extra)
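To show roughly how stream metadata differs from plugin config, here is a hedged Python sketch of metadata rules being flattened into Singer-style catalog entries and then indexed by breadcrumb. `to_catalog_metadata` and `to_map` are hypothetical helpers written for this example, the property-level rule in `rules` is illustrative, and the final lookup only imitates the shape of the line quoted earlier from sync_strategies/incremental.py. It is a plausible reading of the error, not a confirmed diagnosis.

```python
def to_catalog_metadata(stream_rules):
    """Hypothetical: flatten one stream's metadata rules into catalog entries."""
    stream_level = {}
    entries = []
    for key, value in stream_rules.items():
        if isinstance(value, dict):
            # A nested mapping is treated as metadata for one column.
            entries.append({"breadcrumb": ["properties", key], "metadata": dict(value)})
        else:
            # A plain value is stream-level metadata (empty breadcrumb).
            stream_level[key] = value
    return [{"breadcrumb": [], "metadata": stream_level}] + entries


def to_map(entries):
    """Index the entries by breadcrumb tuple for easy lookup."""
    return {tuple(e["breadcrumb"]): e["metadata"] for e in entries}


# Illustrative rules for one stream; the nested entry is an assumption.
rules = {
    "replication-method": "INCREMENTAL",
    "replication-key": "updated_at",
    "updated_at": {"is-replication-key": True},
}
md_map = to_map(to_catalog_metadata(rules))
replication_key = md_map[()].get("replication-key")

# With no rules at all, the stream-level key is None, so the column
# lookup finds nothing, and a chained .get() on that result would fail.
empty_map = to_map(to_catalog_metadata({}))
lookup_without_key = empty_map.get(("properties", empty_map[()].get("replication-key")))
```

In this sketch `replication_key` resolves to `"updated_at"` once the rule is present, while `lookup_without_key` is `None`, which is the kind of value a chained `.get('sql-datatype')` would choke on.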
c
@edgar_ramirez_mondragon Thanks for the tip. This one works:
version: 1
default_environment: dev
project_id: 4457f4ce-a3b3-4e19-8f1c-861c13d1d809
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
    config:
      host: localhost
      port: 5432
      dbname: mydb
      user: itsme
    metadata:
      public-customers:
        replication-method: INCREMENTAL
        replication-key: updated_at
        updated_at:
          is-replication-key: 'true'
  loaders:
  - name: target-csv
    variant: hotgluexyz
    pip_url: git+https://github.com/hotgluexyz/target-csv.git
    config:
      destination_path: data/output/
      quotechar: '"'