# plugins-general
l
Hello community! For those who have mastered target-snowflake: how do I drop a table (in the Meltano pipeline) before pushing a new one? I want to do this every time I run the target: drop the existing table and load a new one with the day's data. It can be any variant of target-snowflake.
p
@luis_henrique_barbosa_da_silva I'm actually not aware of any variants that will do this for you yet. I created an issue in the SDK repo to support different sync modes, one of which could be truncate-and-load like you're looking for. There's a variety of ways that people achieve this today with a bit of work. Usually it's done by using the `add_metadata_columns` setting, which adds sync timestamps to the rows, and then transforming with dbt to stage only the latest set of data.
Here's a similar thread with some helpful comments: https://meltano.slack.com/archives/C013EKWA2Q1/p1663363416288579
l
Implementing the dbt Snowflake transformer just to issue a single "drop table" command didn't seem that easy to me. Am I mistaken? Could you suggest the steps?
p
The idea isn't so much that you'd use dbt to run a drop table command, but rather to create a view off your raw target data that only shows the most recent data. So when you select from the view, it's as if you only have the latest data. Most people are already using the pattern of writing raw tap/target output to Snowflake and then staging it by deduplicating/renaming/obfuscating/etc. with dbt (or any SQL transformation tool) before exposing it to downstream consumers of that data.
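As a rough sketch of that staging view, assuming the target wrote a raw table `raw.orders` with metadata columns enabled and that `order_id` is the primary key (both names are hypothetical):

```sql
-- Hypothetical dbt-style staging view over the raw target table.
-- Keeps only the most recently extracted row per primary key, so
-- selecting from the view behaves like "only the latest data".
create or replace view analytics.stg_orders as
select *
from raw.orders
qualify row_number() over (
    partition by order_id
    order by _sdc_extracted_at desc
) = 1;
```

Downstream consumers then query `analytics.stg_orders` instead of the raw table, and nothing ever needs to be dropped.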
l
Here we run dbt after the EL step for the drops we need. The problem is that I have the same case as Nicholas (in the thread you linked above), and the solution there was to use dbt as a transformer in Meltano. That's why I commented above.
@felipe_souto_campelo Look here
a
@luis_henrique_barbosa_da_silva re: using dbt as a transform to run the drop: that's non-atomic, so if your loader fails you have a critical breakdown... I solve for this in target-bigquery via a config called `overwrite` that supports pattern matching stream names with `fnmatch`. I also have a config called `upsert`; same concept, using the `primary_keys` sent with the schema message. The default is append. If you are going to use dbt to run the drop, you should write your table to a staging table, then within a single transaction execute `create or replace table ... as select * from stage.table` or the equivalent DDL.
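A minimal sketch of that pattern in Snowflake SQL, assuming the loader writes into a hypothetical `stage.orders` table and consumers read from `prod.orders`:

```sql
-- The pipeline loads the day's data into stage.orders first.
-- create or replace table is a single atomic statement in Snowflake,
-- so readers never see a dropped or half-loaded prod.orders even if
-- the upstream load fails before this step runs.
create or replace table prod.orders as
select * from stage.orders;
```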
p
FYI, there's a new feature in SDK v0.31.0 that adds a metadata column that's static for the entire sync. This solves the problem you might have run into when deduplicating with the `_sdc_extracted_at` column.
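For example, assuming the static column is named `_sdc_sync_started_at` (check the metadata column names for your SDK version), a staging view can keep just the rows written by the latest sync instead of deduplicating on extraction timestamps:

```sql
-- Hypothetical staging view: keep only rows from the most recent sync.
-- Assumes _sdc_sync_started_at is constant for every row of a given
-- sync run, so filtering on its max value selects one complete load.
create or replace view analytics.stg_orders_latest as
select *
from raw.orders
where _sdc_sync_started_at = (
    select max(_sdc_sync_started_at) from raw.orders
);
```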
I also have a pending PR related to standardizing the load methods that targets have, i.e. insert/upsert/overwrite: https://github.com/meltano/sdk/pull/1893
Feel free to leave comments!