# singer-targets
a
Hi, In my pipeline, I utilize tap-mysql to extract data and target-postgres to load it into a PostgreSQL database. I observed that tap-mysql processed 120,404 records, while target-postgres processed only 86,365 records. Tap-mysql employs a log-based replication method. What could be the reason for the lower number of records processed by target-postgres?
v
I'd question every assumption here. 120,404 records: do we know that's true? What exactly are the records? Same with target-postgres. Look at the records themselves; are you missing anything? Could it be duplicate keys? Could there be a filter on tap-mysql?
a
There is no filter at all. I verified the logs... they say tap-mysql processed 120,404 records. For target-postgres, I summed the records processed per table. I wrote a script to do that, and I also checked manually. Nothing is missing.
v
something's filtered is my point
meltano invoke tap-mysql > out
is what I'd do, and count the records myself, then compare that with the count the logs give me. Then run that out file through target-postgres and compare the counts.
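That counting step can be sketched in Python. This assumes the dump is one Singer JSON message per line and that the path `out` matches the redirect above; adjust both to your setup:

```python
import json
from collections import Counter

def count_records(path):
    """Count Singer RECORD messages per stream in a tap's stdout dump."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            msg = json.loads(line)
            # Only RECORD messages carry data rows; SCHEMA/STATE don't count.
            if msg.get("type") == "RECORD":
                counts[msg.get("stream", "<unknown>")] += 1
    return dict(counts)
```

Comparing `sum(count_records("out").values())` against the tap's logged total tells you whether the tap or the target is the side that disagrees.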
a
Thank you @visch I will try
np
a
I saw similar behavior. I extract 11579 entries with another tap and want to write them to the PostgreSQL database, but only 5988 are effectively written (see picture as proof). I can't explain this. The target-postgres is the one from "transferwise". If something is filtered, I want to know that, and it should appear in the log entries. @Anita Bajariya: Were you able to find something, or do you have a solution?
e
Is that number 5988 what you get at the end of the whole sync?
a
Yes, only 5988 of the total 11579 entries. What I have found so far: it only happens with streams that have a UUID as primary key. Could that be the problem (at least for me)?
e
Hmm that's odd
If you use a different target (e.g. target-jsonl), do you see the entire set of rows?
a
When I create a JSONL file, all entries are included, so they are correct. However, I had to change the UUIDs to strings. It would possibly work without that, but I didn't know how.
After a more detailed analysis, I realized that I have duplicate entries. I am still looking for where these originate, or why this is the case. Thank you for your help, and sorry for the trouble.
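A sketch of how one might hunt for those duplicates in the tap's output (assumptions: the dump is Singer JSONL as in the `meltano invoke` suggestion above, and `key_field` must be replaced with your stream's actual key property; `id` is only a placeholder):

```python
import json
from collections import Counter

def duplicate_keys(path, key_field="id"):
    """Return primary-key values that occur more than once among RECORD messages.

    A target that upserts on the primary key would collapse each of these
    groups into a single row, which shrinks the loaded count.
    """
    keys = Counter()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            msg = json.loads(line)
            if msg.get("type") == "RECORD":
                keys[msg["record"].get(key_field)] += 1
    return {k: n for k, n in keys.items() if n > 1}
```

If the number of extracted rows minus the size of the collapsed groups matches the loaded count, duplicate keys fully explain the gap.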
e
Oh, so some records have the same primary key or are just entirely duplicate? That'd explain it.
a
They have the same primary key. I suspect something is wrong with the extractor, or with the REST API from which the data is obtained. A paging mechanism is being used; maybe there is something wrong with the data retrieval.
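A hypothetical illustration of how that could happen (this is not tap-bexio's actual code, just a toy model): if the paging loop advances its offset by less than the page size, every page boundary re-fetches rows, so the same primary keys are emitted twice.

```python
def simulate_overlapping_pages(rows, page_size, overlap):
    """Toy paging loop whose stride is too short by `overlap` rows."""
    fetched = []
    offset = 0
    while offset < len(rows):
        fetched.extend(rows[offset:offset + page_size])
        offset += page_size - overlap  # buggy stride: re-reads `overlap` rows
    return fetched

# 10 source rows, pages of 4, stride off by 1:
fetched = simulate_overlapping_pages(list(range(10)), page_size=4, overlap=1)
# 13 rows come back for only 10 distinct keys.
```

Deduplicating `fetched` by key recovers the true row count, which is effectively what an upserting target does silently.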
e
Gotcha. It's tap-bexio, and it's not your own custom tap?