brainy-appointment-14137
01/18/2021, 6:24 AM

extractors:
- name: tap-csv
  variant: meltano
  pip_url: git+https://gitlab.com/meltano/tap-csv.git
  config:
    files:
    - entity: titanic
      file: train.csv
      keys: ["survived", "sex", "age", "n_siblings_spouses", "parch", "fare", "class", "deck", "embark_town", "alone"]
    - entity: abalone
      file: abalone_train.csv
      keys: ["Length", "Diameter", "Height", "Whole weight", "Shucked weight", "Viscera weight", "Shell weight", "Age"]
loaders:
- name: target-postgres
  variant: meltano
  pip_url: git+https://github.com/meltano/target-postgres.git
and these are the links to download them: titanic and abalone.
I set up a local Postgres server.
The abalone file synced correctly to Postgres (3320 rows), but the titanic table in Postgres only has 558 rows while the CSV file has 628 🤔
This is the command I ran:

$ meltano elt tap-csv target-postgres --transform=skip --job_id=csv-postgres

Am I doing something wrong?
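(One way to check the loaded row count on the Postgres side, assuming a local database named warehouse and that the table landed as titanic in the default schema; both names are assumptions, adjust them to your setup:)

$ psql -d warehouse -c 'SELECT COUNT(*) FROM titanic;'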
ripe-musician-59933
01/18/2021, 5:06 PM

Running meltano invoke tap-csv locally with your train.csv, I get 627 records as expected (since the first line in the CSV just has headers), so the issue isn't in tap-csv.
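(A quick way to sanity-check those counts against the raw file, assuming train.csv is in the current directory:)

$ wc -l train.csv                  # 628 lines in total
$ tail -n +2 train.csv | wc -l     # 627 data rows once the header line is skipped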
brainy-appointment-14137
01/18/2021, 8:13 PM
ripe-musician-59933
01/18/2021, 8:45 PM

That's the transferwise variant of target-postgres, while you were using the meltano variant in the snippet you posted before. Are you seeing this same incomplete sync issue with both variants?

That would tell us whether the issue is with the meltano variant specifically, or if it's instead Postgres-specific.

tap-csv uses the columns you list under keys as the compound primary key, meaning that together they are treated as uniquely identifying the record. If multiple records in your CSV have the exact same set of values for those columns, they would only be stored in Postgres once, because the second record would be interpreted as a duplicate of the first record:

$ meltano invoke tap-csv > train.jsonl
$ cat train.jsonl | grep RECORD | wc -l
627
$ cat train.jsonl | grep RECORD | uniq | wc -l
624

keys: []
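(A minimal way to test that explanation from the shell, assuming train.csv is in the current directory and that the keys list above covers every column in the file, so deduplicating whole rows is the same as deduplicating on the compound key; if the duplicate theory is right, this should print 558:)

$ tail -n +2 train.csv | sort | uniq | wc -l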
brainy-appointment-14137
01/18/2021, 9:36 PM

ripe-musician-59933
01/18/2021, 9:46 PM