# announcements

proud-jackal-38797

03/23/2021, 4:27 PM
Hi, I was exploring INCREMENTAL replication in tap-mysql and found that the next job run also includes the last state. For example:
1. I implemented INCREMENTAL replication on a column ID.
2. I populated the data (IDs 1-20).
3. Next, I added some data (IDs 21-25) to the source table and ran the ELT.
4. Instead of appending only IDs 21-25, IDs 20-25 were appended, i.e. ID 20 got duplicated.
Am I missing something, or is this a bug to be resolved? Thanks in advance.

ripe-musician-59933

03/23/2021, 4:33 PM
This is expected behavior: https://www.stitchdata.com/docs/replication/replication-methods/key-based-incremental#limitation-3--replication-key-inclusivity, but it is more relevant when using a timestamp as a replication key than an ID.
Is this causing issues for you? If your target upserts based on the key properties, it should be fine, but if it only appends this could result in duplicate records in the destination (https://gitlab.com/meltano/meltano/-/issues/2504)
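The inclusivity limitation linked above can be illustrated with a rough sketch of the query a key-based incremental tap typically issues on each run (the table and column names here are illustrative, not tap-mysql's actual internals):

```sql
-- Hypothetical extraction query for an incremental run.
-- The bookmark comparison is inclusive (>=), so the row whose ID equals
-- the saved bookmark (20) is extracted again on the next run, which is
-- why ID 20 showed up twice in an append-only destination.
SELECT id, payload
FROM source_table
WHERE id >= 20   -- saved replication-key bookmark from the previous run
ORDER BY id;
```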

proud-jackal-38797

03/23/2021, 4:38 PM
Oh OK.. In target-bigquery, as far as I know, the available replication methods are append and truncate. Is there a way to upsert data?

ripe-musician-59933

03/23/2021, 4:41 PM
I don't think it supports that today, no. Is it an option to deduplicate inside the destination using dbt? Alternatively, you could look into modifying tap-mysql to use `> <replication key>` instead of `>= <replication key>` when the key is an integer rather than a timestamp.
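A minimal sketch of the change being suggested, assuming the tap builds its query from the saved bookmark (names are illustrative):

```sql
-- Inclusive comparison (current behavior): re-extracts the bookmark row.
SELECT * FROM source_table WHERE id >= 20;

-- Strict comparison (safe for a monotonically increasing integer key):
SELECT * FROM source_table WHERE id > 20;

-- A strict comparison is NOT generally safe for timestamp keys: several
-- rows can share the bookmark timestamp, and rows committed late with
-- that same value would be skipped entirely.
```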

proud-jackal-38797

03/23/2021, 4:44 PM
I'm running only EL through Meltano. Can we configure that in meltano.yml? If so, can you guide me on how?

ripe-musician-59933

03/23/2021, 4:48 PM
It's not as simple as adding "deduplicate on key X" to `meltano.yml`, unfortunately, no. You'd need to set up a dbt project and add your own dbt model to do this. Are you familiar with dbt already? @salmon-salesclerk-77709 Can you help out here? I wonder if we could make it as easy as "deduplicate on key X" if we can generate the dbt models etc...
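A deduplicating dbt model of the kind described could look roughly like this sketch. The source table `raw.source_table`, the key column `id`, and the `_sdc_batched_at` ordering column are assumptions for illustration, not details from this thread:

```sql
-- models/staging/stg_source_table_deduped.sql
-- Keep exactly one row per replication key, preferring the copy that
-- was loaded most recently.
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY _sdc_batched_at DESC  -- Singer batch metadata column, if the loader emits it
    ) AS row_num
  FROM raw.source_table
)
WHERE row_num = 1
```

`SELECT * EXCEPT (...)` is BigQuery syntax; other warehouses would list the columns explicitly.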

salmon-salesclerk-77709

03/23/2021, 4:53 PM
I was thinking about this earlier today. It’d be an interesting iteration on the target SDK to assist with creating a base layer using dbt. We’d have to pass through some info from the tap and make the SQL fairly agnostic, but there’s no reason we couldn’t solve that. To your point @proud-jackal-38797, my comment here is still my recommendation: https://meltano.slack.com/archives/CFG3C3C66/p1616509800079700?thread_ts=1616480614.078000&cid=CFG3C3C66. At GitLab, even for sources coming from paid tools, we still had a cleaning and validation step for all raw data.
🙌 1
I made https://gitlab.com/meltano/singer-sdk/-/issues/56 to document for further discussion

proud-jackal-38797

03/25/2021, 4:20 AM
Thank you for the support @salmon-salesclerk-77709 @ripe-musician-59933
One more clarification, please. Can I have a unique key constraint in the loader using `truncate` in Meltano, without `dbt`? I don't know what `truncate` does, and I can't find the syntax for it. Help me here. Thanks for the earlier support and in advance here.

salmon-salesclerk-77709

03/25/2021, 1:30 PM
Truncate is the SQL keyword used to remove all rows (data) from a table while keeping its structure in place: https://docs.snowflake.com/en/sql-reference/sql/truncate-table.html. I don’t know many taps that support that, since it’s a very destructive operation.
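For reference, the statement is a one-liner (the table name here is illustrative):

```sql
-- Removes every row but, unlike DROP TABLE, preserves the table
-- definition, permissions, and dependent objects.
TRUNCATE TABLE orders;
```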

proud-jackal-38797

03/26/2021, 8:59 AM
Hi @salmon-salesclerk-77797. Thank you for the response. I know about the TRUNCATE SQL keyword, but there is a 'truncate' option for the replication method configuration in target-bigquery: https://meltano.com/plugins/loaders/bigquery.html#replication-method. Does it remove all the data on every run (i.e. only new values are loaded, in the case of incremental replication on the tap-mysql side)? If so, is there a feature to truncate with a `unique key` constraint, so that duplicate values are removed? I'm eager to deduplicate in the EL part itself, so that the cost of the DML query on the target-bigquery side using `dbt` won't be expensive, since the deduplicating DML query has to run for every scheduled ELT job.
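The upsert behavior being asked about corresponds to BigQuery's MERGE statement. A sketch of what a loader or dbt step would issue if it did upsert on a unique key (the dataset, table, and column names are illustrative assumptions, not anything target-bigquery actually runs):

```sql
-- Upsert staged rows into the destination table on the key column.
MERGE `analytics.orders` AS target
USING `analytics.orders_staging` AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET payload = source.payload   -- overwrite the duplicate row
WHEN NOT MATCHED THEN
  INSERT (id, payload) VALUES (source.id, source.payload);
```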

salmon-salesclerk-77709

03/26/2021, 10:23 AM
Ah, my mistake, apologies. I’ve scanned through the code and documentation and I’m not quite sure what the expected behavior is. The naming of truncate makes me nervous, as I’m more used to Snowflake’s merge statement… can you try it out with a small dataset and see what happens?

proud-jackal-38797

03/26/2021, 11:26 AM
Sure. Thanks @salmon-salesclerk-77709