Hello, Is there a way to make sure Meltano only lo...
# getting-started
c
Hello, is there a way to make sure Meltano only loads data into my target if it does not already exist there? The data comes from an API that I can query once every 5 minutes, and it arrives in JSON format. I would like to store it in a Postgres database.
b
I’m not familiar with `target-postgres`, so you’d need to check whether such a mechanism is implemented there. In our use case (we have built a custom target for S3), we don’t check whether data has already been loaded. Instead, we only extract data from the tap that hasn’t been extracted yet (using replication keys, together with a setting like `start_date` for example). We then run an ETL job to de-dupe the data if needed.
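To illustrate the "only extract what hasn't been extracted yet" approach described above, here is a minimal Python sketch of filtering on a replication key. The field name `lastModifiedDate`, the helper name `filter_new_records`, and the ISO-8601 timestamp format are assumptions for illustration only, not part of any specific tap.

```python
from datetime import datetime


def filter_new_records(records, bookmark, replication_key="lastModifiedDate"):
    """Keep only records modified after the bookmark and return the new bookmark.

    Hypothetical helper: a real tap would persist the bookmark in its
    Singer state between runs instead of passing it around in memory.
    """
    new_records = [
        r for r in records
        if datetime.fromisoformat(r[replication_key]) > bookmark
    ]
    # The bookmark only moves forward, to the newest timestamp seen so far.
    new_bookmark = max(
        [bookmark]
        + [datetime.fromisoformat(r[replication_key]) for r in new_records]
    )
    return new_records, new_bookmark
```

On each run, the tap would emit only `new_records` and save `new_bookmark` for the next run, so rows that were already synced are not sent to the target again.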
d
@charley_guillaume Taps typically communicate to the target which properties are to be treated as primary keys, so that the target can upsert and update existing rows instead of just inserting new duplicate ones. Is that not the behavior you are seeing? What tap are you using?
Many taps also support incremental replication, so that records that are already synced are excluded in subsequent pipeline runs. But it's possible that your tap does not.
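As a rough illustration of the upsert behavior described above, the sketch below keys an in-memory "table" on the primary key properties, so that a record with an existing key updates the row instead of creating a duplicate. This is not how `target-postgres` is implemented; the `upsert` function and the dict-based table are assumptions made purely to show the idea (a database-backed target would more likely rely on something like Postgres's `INSERT ... ON CONFLICT`).

```python
def upsert(table, records, key_properties):
    """Insert new records and update existing ones, matched on key_properties.

    `table` is a plain dict standing in for a database table (an assumption
    for illustration); keys are tuples of the primary key values.
    """
    for record in records:
        key = tuple(record[k] for k in key_properties)
        # Merge onto the existing row if the key is already present.
        table[key] = {**table.get(key, {}), **record}
    return table
```

Running the same batch twice leaves the table unchanged, which is why primary keys let a pipeline be re-run without producing duplicates.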
c
Thank you for your answers! @benjamin_maquet Is your tap open source? I'm working on a custom tap to get data from an application we are using, but we only have access to the latest data (i.e. records where the lastModifiedDate for a particular id has changed). @douwe_maan Do you know where I can find documentation about the properties to be treated as primary keys? If needed, I can share the Git repository where my tap lives.
d
@charley_guillaume See `key_properties` under https://hub.meltano.com/singer/spec#schemas. It's part of the `SCHEMA` message your tap outputs.
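For reference, a custom tap written in Python could build such a `SCHEMA` message roughly as follows. The stream name `items` and the fields `id` and `lastModifiedDate` in the usage note are hypothetical placeholders, and `schema_message` is an illustrative helper rather than part of any Singer library.

```python
import json


def schema_message(stream, properties, key_properties):
    """Build a Singer SCHEMA message as a JSON string.

    `key_properties` lists the fields the target should treat as the
    primary key of the stream.
    """
    return json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "schema": {"type": "object", "properties": properties},
        "key_properties": key_properties,
    })
```

A tap would write this line to stdout before the stream's `RECORD` messages, for example `print(schema_message("items", {"id": {"type": "integer"}, "lastModifiedDate": {"type": "string", "format": "date-time"}}, ["id"]))`.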