#plugins-general

jacob_matson

10/17/2022, 8:36 PM
Hey everyone, I have a silly question about tap-spreadsheets-anywhere and target-parquet, though it may be more generally applicable. I'm pulling a CSV file from the web whose contents update about once per day. When I execute
meltano run tap-spreadsheets-anywhere target-parquet
against that file, the tap fails to find a "modified date" and so executes a full load, duplicating the entire CSV. This does not occur if the file is in an S3 bucket or on my local machine (anywhere Meltano can find a modified date). For the web-sourced CSV, I want to execute an upsert instead of an insert. Is there any way to do this? My current workaround is to drop the duplicates downstream by adding a dupe check to my transform step, but that is kludgy at best because the table built by Meltano has no import-date timestamp (if there were an import date, I could simply grab the latest record for a given key).
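
For illustration, the dupe-check workaround described above might look roughly like the sketch below. It is only a sketch under stated assumptions: the `id` key column, the sample rows, and the `dedupe_keep_last` helper are all hypothetical, and it assumes rows can be read back in load order (later loads after earlier ones), which the parquet output does not necessarily guarantee. Without an import timestamp, row order is the only signal available for "latest".

```python
import csv
import io


def dedupe_keep_last(rows, key_fields):
    """Keep only the last-seen row per key (hypothetical helper).

    Assumes rows arrive in load order, so a later row for the same key
    is treated as the newer version of that record.
    """
    latest = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        latest[key] = row  # a later row overwrites the earlier one
    return list(latest.values())


# Hypothetical data: two full loads of the same CSV appended back to back.
appended = io.StringIO(
    "id,value\n"
    "1,a\n"
    "2,b\n"
    "1,a2\n"
    "2,b\n"
    "3,c\n"
)
rows = list(csv.DictReader(appended))
deduped = dedupe_keep_last(rows, key_fields=["id"])
print(deduped)
```

This emulates an upsert after the fact rather than at load time, so the duplicated rows are still written on every run; it only hides them from downstream queries.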