Is there an easy way to do a full refresh on a table in a po Meltano #troubleshooting

Is there an easy way to do a full refresh on a tab...

dean_morin

01/27/2022, 1:50 AM

Is there an easy way to do a full refresh on a table in a postgres -> snowflake pipeline? I’m using log based replication, and I’ve discovered that there was a small period a couple of months ago where some rows weren’t replicated for whatever reason. I think at this point the easiest thing to do is to: 1. Drop the table in snowflake 2. Remove the state for the table from the

payload

in the

job

table in the meltano db 3. Run meltano This will do a full table sync, then go back to log based replication after. This is fine, but there’s a period where that data is missing (or partially available). This is a small table so it this period is short, but larger tables can take hours to do the initial sync. Is there any feature I’m unaware of that could help with this?

paul_tiplady

01/28/2022, 7:41 PM

@dean_morin I think it should be possible to do this without downtime. Are you de-duping in your snowflake table? If so you should be able to run a full table sync into your “loaded” table, before de-duping into your final “transformed” table. (you need to do de-duping for incremental update to work correctly either way: https://gitlab.com/meltano/meltano/-/issues/2504) I think the standard pipeline is something like:

Copy code

Postgres =E=> Meltano =L=> Snowflake raw =T=> Snowflake deduped/transformed

And if you’re doing this, you can just run a full table sync to “snowflake raw”. The dedupe is something like:

Copy code

QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY _uploaded_at DESC) = 1

Also — I think there’s an easier way to do the full table sync without messing with the meltano DB. I’m only recently experimenting with this, but I think you can say:

Copy code

meltano elt tap-postgres target-snowflake --transform=skip --select tap-table-selector
DBT_TARGET=snowflake meltano invoke dbt:run --models  snowflake-dest-tablename

To run the EL and T steps for just the table in question. Since you omit

--job_id

it’ll do a full sync without using the

meltano.db

state.

paul_tiplady

01/28/2022, 7:42 PM

Unless — does log-based replication remove the need for de-duping?

Open in Slack

Previous Next