# troubleshooting
g
Hi, I have a question. I built a pipeline: I'm using my own tap to extract and target-jsonl to load. I pass the same --job_id on every run. The problem is that on every run, the same data is appended to the .json file again. Is there a way to only add new records?
s
Same data as in it does a full-table sync each time, or same data as in it's duplicating entries?
g
@Stéphane Burwash It's duplicating on each run. The first run fills the file with all the records. The second run writes the same records again and appends them, so the file contains the same data twice. If I run it 6 times, for example, the file contains the same data 6 times.
s
Ok cool. What does your tap look like? And what loader are you using?
g
I made my own tap with the cookiecutter template to get data from a REST API, and I'm using target-jsonl to load the data.
t
It feels to me like the tap is not respecting the state data. After all, the only way the target would receive all the data on every run is if the tap is sending it. Depending on how the tap works, though, it may also be a case of ">=" picking up rows that have already been processed. If your tap does respect the state data, but the field you use to identify new rows has the same value for every record and you're doing a ">=" comparison, you'll pick up rows that were processed before (and maybe all of them!)
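To make the ">=" pitfall concrete, here is a minimal sketch. The records and the `updated_at` field are hypothetical; it simply compares an inclusive (">=") and an exclusive (">") filter against the bookmark a previous run would have saved, when every row shares the same replication-key value.

```python
from typing import Optional

# Hypothetical records: every row has the same replication-key value.
RECORDS = [
    {"id": 1, "updated_at": "2023-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2023-01-01T00:00:00Z"},
    {"id": 3, "updated_at": "2023-01-01T00:00:00Z"},
]


def select_new(records: list[dict], bookmark: Optional[str], inclusive: bool) -> list[dict]:
    """Return the records a tap would emit, given the bookmark saved by the last run."""
    if bookmark is None:
        # No state yet (first run), so everything counts as new.
        return list(records)
    if inclusive:
        return [r for r in records if r["updated_at"] >= bookmark]
    return [r for r in records if r["updated_at"] > bookmark]


# The previous run saved the largest value it saw as its bookmark.
# ISO-8601 strings in the same format compare correctly as plain strings.
bookmark = max(r["updated_at"] for r in RECORDS)

print(len(select_new(RECORDS, bookmark, inclusive=True)))   # 3: every row is sent again
print(len(select_new(RECORDS, bookmark, inclusive=False)))  # 0: nothing is sent again
```

The exclusive filter has its own trade-off: a row that shows up later with exactly the bookmark value would be skipped, which is one reason Singer-style incremental syncs are often treated as at-least-once, with deduplication handled downstream.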
s
I'm unsure about the duplication (indeed, as @thomas_briggs says, somewhere you don't seem to be managing state), but for only getting new entries, and to build on what was said before, here is an example of how to load only data that is newer than what is recorded in your state: https://github.com/potloc/tap-hubspot/blob/2710d0e14c708b38257302f217e919143731f7ff/tap_hubspot/client.py#L96-L103
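The general pattern is: look up the bookmark in the state the tap was started with, then skip anything at or before it. A minimal sketch, assuming a state dict shaped roughly like the one the Meltano Singer SDK writes (`bookmarks` → stream name → `replication_key_value`); the stream name `contacts` and the field `updated_at` are hypothetical, and this is not the code from the linked tap-hubspot commit.

```python
import json
from typing import Optional

# Hypothetical state, as a previous run might have left it.
STATE = json.loads("""
{
  "bookmarks": {
    "contacts": {
      "replication_key": "updated_at",
      "replication_key_value": "2023-01-02T00:00:00Z"
    }
  }
}
""")


def get_bookmark(state: dict, stream: str) -> Optional[str]:
    """Return the saved replication-key value for a stream, or None on a first run."""
    return state.get("bookmarks", {}).get(stream, {}).get("replication_key_value")


def new_records(records: list[dict], state: dict, stream: str, key: str) -> list[dict]:
    """Keep only records whose replication key is strictly newer than the bookmark."""
    bookmark = get_bookmark(state, stream)
    if bookmark is None:
        return records
    return [r for r in records if r[key] > bookmark]


rows = [
    {"id": 1, "updated_at": "2023-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2023-01-02T00:00:00Z"},
    {"id": 3, "updated_at": "2023-01-03T00:00:00Z"},
]
print([r["id"] for r in new_records(rows, STATE, "contacts", "updated_at")])  # [3]
print([r["id"] for r in new_records(rows, {}, "contacts", "updated_at")])     # [1, 2, 3]
```

With empty state, everything is emitted. That matches what happens when state is never saved or never passed back in, which would produce the duplication described above.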
v
Here's a write-up of why this happens with Singer: https://github.com/MeltanoLabs/Singer-Working-Group/issues/13 (it's a design choice, and there are some ideas there for how to add functionality around it)
c
> But the problem is that every run, the same data is being added again to the .json file. Is there a way to only add new records?
@g._ozdemir You need to implement a 'replication key' in your custom tap. After that, you can enable 'Incremental' mode in your Meltano project configuration.
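On the tap side, incremental mode only helps if the tap also emits a STATE message recording how far it got, so that Meltano can store it under the --job_id and hand it back on the next run. Below is a minimal sketch of the messages such a tap writes to stdout, using the SCHEMA, RECORD and STATE message types from the Singer spec. It does not use the Singer SDK, and the stream name, schema, `updated_at` key and state layout are all hypothetical. In an SDK-based tap like the one the cookiecutter generates, you would normally set the stream's `replication_key` attribute and let the SDK emit STATE for you.

```python
import io
import json
import sys
from typing import Optional, TextIO

STREAM = "contacts"  # hypothetical stream name
KEY = "updated_at"   # hypothetical replication key


def write_message(message: dict, out: TextIO = sys.stdout) -> None:
    """Singer messages are written one JSON object per line."""
    out.write(json.dumps(message) + "\n")


def sync(records: list[dict], bookmark: Optional[str], out: TextIO = sys.stdout) -> Optional[str]:
    """Emit SCHEMA, then only records newer than the bookmark, then a STATE message."""
    write_message({
        "type": "SCHEMA",
        "stream": STREAM,
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                KEY: {"type": "string", "format": "date-time"},
            },
        },
        "key_properties": ["id"],
        "bookmark_properties": [KEY],
    }, out)

    latest = bookmark
    for record in sorted(records, key=lambda r: r[KEY]):
        if bookmark is not None and record[KEY] <= bookmark:
            continue  # already delivered by an earlier run
        write_message({"type": "RECORD", "stream": STREAM, "record": record}, out)
        latest = record[KEY]  # records are sorted, so this is the newest value so far

    # This STATE value is what the orchestrator saves and passes back next time.
    write_message({
        "type": "STATE",
        "value": {"bookmarks": {STREAM: {"replication_key": KEY, "replication_key_value": latest}}},
    }, out)
    return latest


# Demo: run twice, feeding the first run's bookmark into the second run.
rows = [
    {"id": 1, KEY: "2023-01-01T00:00:00Z"},
    {"id": 2, KEY: "2023-01-02T00:00:00Z"},
]
run1, run2 = io.StringIO(), io.StringIO()
bookmark = sync(rows, None, run1)
sync(rows, bookmark, run2)
print(run1.getvalue().count('"type": "RECORD"'))  # 2
print(run2.getvalue().count('"type": "RECORD"'))  # 0
```

If the second run still re-emits everything, the usual suspects are that the STATE message is never written or that the saved state is not reaching the tap on the next run.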
g
Hi, first of all, thanks for the replies. I tried to do it that way but kept getting errors. Later I added the replication-method incremental setting to my meltano.yml file, but it still doesn't work. You can see the error I keep getting in the third picture. Does anyone know how to add the replication key correctly?
v
Do you see the "KeyError" there?
g
@visch Yes, I do see the error, but I don't know how to solve it.