dean_morin
01/27/2022, 1:50 AM
payload in the job table in the meltano db
3. Run meltano
This will do a full table sync, then go back to log based replication after.
This is fine, but there’s a period where that data is missing (or only partially available). This is a small table, so this period is short, but larger tables can take hours to do the initial sync.
Is there any feature I’m unaware of that could help with this?

paul_tiplady
01/28/2022, 7:41 PM
Postgres =E=> Meltano =L=> Snowflake raw =T=> Snowflake deduped/transformed
And if you’re doing this, you can just run a full table sync to “snowflake raw”. The dedupe is something like:
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY _uploaded_at DESC) = 1
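(For anyone unfamiliar with Snowflake's QUALIFY: the effect is "keep only the newest row per id, by _uploaded_at". A minimal Python sketch of that same keep-latest semantics, using hypothetical row dicts with `id` and `_uploaded_at` keys:)

```python
# Sketch of the dedupe semantics of:
#   QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY _uploaded_at DESC) = 1
# i.e. for each id, keep only the row with the greatest _uploaded_at.
from itertools import groupby
from operator import itemgetter


def dedupe(rows):
    """Keep the newest row per id."""
    # Group rows by id (groupby needs the input sorted by the grouping key).
    rows = sorted(rows, key=itemgetter("id"))
    return [
        # Within each id group, pick the row with the max _uploaded_at.
        max(group, key=itemgetter("_uploaded_at"))
        for _, group in groupby(rows, key=itemgetter("id"))
    ]


# Example: id 1 appears twice (full sync + log-based re-upload); only
# the newer copy survives the dedupe.
raw = [
    {"id": 1, "val": "stale", "_uploaded_at": "2022-01-01"},
    {"id": 1, "val": "fresh", "_uploaded_at": "2022-01-02"},
    {"id": 2, "val": "only", "_uploaded_at": "2022-01-01"},
]
deduped = dedupe(raw)
```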
Also, I think there’s an easier way to do the full table sync without messing with the meltano DB. I’ve only recently been experimenting with this, but I think you can say:
meltano elt tap-postgres target-snowflake --transform=skip --select tap-table-selector
DBT_TARGET=snowflake meltano invoke dbt:run --models snowflake-dest-tablename
To run the EL and T steps for just the table in question. Since you omit --job_id, it’ll do a full sync without using the meltano.db state.

paul_tiplady
01/28/2022, 7:42 PM