is it possible to set default metadata for all str...
# troubleshooting
j
Is it possible to set default metadata for all streams for tap-postgres? I'm trying to transition our tap-postgres setup from FULL_REFRESH to INCREMENTAL. With Rails as our backend, all tables use the `id` column for the primary key, and `updated_at` should be the `replication-key` for almost all tables. Is it possible to set `updated_at` as the default for all tables?
Here's what my plugin config looks like for 1 table:
plugins:
    extractors:
    - name: tap-postgres
      variant: transferwise
      pip_url: pipelinewise-tap-postgres
      config:
        host: ${ARC_DB_HOST}
        user: ${ARC_DB_USER}
        port: ${ARC_DB_PORT}
        password: ${ARC_DB_PASSWORD}
        dbname: ${ARC_DB_NAME}
        default_replication_method: INCREMENTAL
        filter_schemas: public
        ssl: true
  
      select:
      - users.*

      metadata:
        users: # from what I can tell, we'll need to duplicate these lines for EVERY table? 
          replication-method: INCREMENTAL
          replication-key: updated_at
          updated_at:
            is-replication-key: true
We have ~80 tables though, and ideally I'd be able to set it up so that the metadata that's there for `users` would instead be the default for all tables. Is this possible?
p
@jacob_mulligan yep! You can use `'*'` in place of `users` under the metadata key and it should apply that metadata to all streams.
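For reference, a minimal sketch of what that wildcard form might look like, reusing the `metadata` block from the question with `'*'` swapped in for `users`. The connection `config` is left out for brevity, and the `select` list would still need to cover every table you want synced; neither detail comes from the reply above.

```yaml
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
    select:
    - users.*
    metadata:
      # Quote the key: a bare * would be parsed as a YAML alias indicator.
      '*':
        replication-method: INCREMENTAL
        replication-key: updated_at
        updated_at:
          is-replication-key: true
```

Running `meltano select tap-postgres --list --all` is one way to check which stream names the pattern would have to match.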
j
happy dance
ah perfect!
p
Check out https://docs.meltano.com/guide/integration#setting-metadata for more details! The docs are a bit hidden away.
j
Oh nice, thanks for the docs link. I did try to find documentation but clearly wasn't able to get there. Cool that we can also do something like `"*_full":`, because there is always an exception. Turns out we have a few tables which I think we need to `FULL_REFRESH` every time because there are no timestamp columns we can use for incremental loads. I thought I might be able to do this to have these tables full refresh; however, it looks like the `replication-method` config doesn't do what I expected, because Meltano still tries syncing the table incrementally and fails because the table doesn't have the (default) `updated_at` column:
'some_table':
  # this table doesn't have a created_at or updated_at column, so we're doing a full refresh every day.
  replication-method: FULL_REFRESH
what are my options for just full_refreshing these tables?
p
As far as I'm aware, you should be able to keep the `*` syntax and add explicit references to the other tables to override them as full table replication. If that doesn't work, you could always inherit the tap, exclude the full-table tables from the original select criteria, and add them to the new inherited one with full-table metadata, effectively having `tap-postgres-incremental` and `tap-postgres-full-table`. This is sometimes recommended practice anyway, because sync frequency is commonly coupled with replication method, i.e. incremental is run more often (hourly) than full table (daily?).
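A rough sketch of the two-tap layout described above, using Meltano's `inherit_from` key and a `!`-prefixed select pattern to exclude a table. Some assumptions to flag: `some_table` stands in for the tables without timestamp columns, the stream names must match whatever the tap's catalog actually reports, and the full-load method is written as `FULL_TABLE` (the usual Singer name, and the one I believe pipelinewise-tap-postgres expects) rather than `FULL_REFRESH`. Treat it as a starting point, not a verified config.

```yaml
plugins:
  extractors:
  - name: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
    # Shared connection config (host, user, ...) goes here, as in the question.

  # Inherits the base config; syncs everything except the exception tables.
  - name: tap-postgres-incremental
    inherit_from: tap-postgres
    select:
    - '*.*'
    - '!some_table.*'
    metadata:
      '*':
        replication-method: INCREMENTAL
        replication-key: updated_at
        updated_at:
          is-replication-key: true

  # Inherits the base config; syncs only the tables that need a full load.
  - name: tap-postgres-full-table
    inherit_from: tap-postgres
    select:
    - some_table.*
    metadata:
      some_table:
        # Assumed value; the question used FULL_REFRESH.
        replication-method: FULL_TABLE
```

Each tap can then go on its own schedule, for example `tap-postgres-incremental` hourly and `tap-postgres-full-table` daily.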