Is there an easy way to do a full refresh on a tab...
# troubleshooting
d
Is there an easy way to do a full refresh on a table in a postgres -> snowflake pipeline? I’m using log based replication, and I’ve discovered that there was a small period a couple of months ago where some rows weren’t replicated for whatever reason. I think at this point the easiest thing to do is to: 1. Drop the table in snowflake 2. Remove the state for the table from the
payload
in the
job
table in the meltano db 3. Run meltano This will do a full table sync, then go back to log based replication after. This is fine, but there’s a period where that data is missing (or partially available). This is a small table so it this period is short, but larger tables can take hours to do the initial sync. Is there any feature I’m unaware of that could help with this?
p
@dean_morin I think it should be possible to do this without downtime. Are you de-duping in your snowflake table? If so you should be able to run a full table sync into your “loaded” table, before de-duping into your final “transformed” table. (you need to do de-duping for incremental update to work correctly either way: https://gitlab.com/meltano/meltano/-/issues/2504) I think the standard pipeline is something like:
Copy code
Postgres =E=> Meltano =L=> Snowflake raw =T=> Snowflake deduped/transformed
And if you’re doing this, you can just run a full table sync to “snowflake raw”. The dedupe is something like:
Copy code
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY _uploaded_at DESC) = 1
Also — I think there’s an easier way to do the full table sync without messing with the meltano DB. I’m only recently experimenting with this, but I think you can say:
Copy code
meltano elt tap-postgres target-snowflake --transform=skip --select tap-table-selector
DBT_TARGET=snowflake meltano invoke dbt:run --models  snowflake-dest-tablename
To run the EL and T steps for just the table in question. Since you omit
--job_id
it’ll do a full sync without using the
meltano.db
state.
Unless — does log-based replication remove the need for de-duping?