#announcements

gray-cricket-92960

02/02/2021, 1:58 PM
I have a question regarding a sync. I used the `tap-covid-19` tap with `target-csv` and set the tap's `start_date` to 2021-01-01, thinking it would only pull records >= that timestamp. But it pulled the whole dataset. I ran `meltano elt tap-covid-19 target-csv --job_id=covid19-to-csv` last night and the same command this morning, and the resulting files were pretty much the same. I would have expected the second run this morning to have basically no data: my thought was that if the `job_id` remained constant, the line in the sand from the first full run would be the starting point for the second run. After the second run this morning, I ran `meltano elt tap-covid-19 target-csv --job_id=covid19-to-csv` again, and this time it told me there was no new data (which is what I expected from this morning's run).

ripe-musician-59933

02/02/2021, 6:35 PM
@gray-cricket-92960 It looks like `tap-covid-19` doesn't use `start_date` and the timestamp stored between runs to only select records created since that point, but rather to only select files that were changed on GitHub since then: https://github.com/singer-io/tap-covid-19/blob/5940583111b1978b0ef252d943f8fb5728bc90e7/tap_covid_19/sync.py#L134-L135 Out of those matching files, it looks like it still imports all records.
I think this matches what you're seeing: your first and second runs both selected all records, presumably because the file on GitHub changed in the meantime, while your third run didn't select any records, since the file didn't change.
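To make the difference concrete, here is a rough sketch of the two bookmark strategies. It is not the tap's actual code: the `FILES` data, the function names, and the strict `>` comparison are all made up for illustration.

```python
from datetime import datetime, timezone


def utc(year, month, day):
    """Small helper to build timezone-aware UTC datetimes."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical stand-in for files in a GitHub repo. Each file has its own
# last-modified time, and each record inside it has its own date.
FILES = [
    {
        "path": "reports/a.csv",
        "last_modified": utc(2021, 2, 2),
        "records": [
            {"date": utc(2020, 6, 1), "cases": 10},
            {"date": utc(2021, 1, 15), "cases": 20},
        ],
    },
    {
        "path": "reports/b.csv",
        "last_modified": utc(2020, 12, 1),
        "records": [{"date": utc(2020, 11, 1), "cases": 5}],
    },
]


def sync_file_level(files, bookmark):
    """Emit ALL records from every file modified after the bookmark."""
    records = []
    for f in files:
        if f["last_modified"] > bookmark:  # the filter looks at the file only
            records.extend(f["records"])
    return records


def sync_record_level(files, bookmark):
    """Emit only records whose own date is at or after the bookmark."""
    return [r for f in files for r in f["records"] if r["date"] >= bookmark]
```

With a bookmark of 2021-01-01, `sync_file_level` returns both records from `reports/a.csv` (including the one dated 2020-06-01), while `sync_record_level` returns only the record dated 2021-01-15. Once the bookmark has caught up with the file's last-modified time, the file-level version returns nothing, which would line up with the "no new data" third run.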

gray-cricket-92960

02/02/2021, 6:37 PM
Thanks for this. Are those implementation details always found in the sync file? One more question about the elt pipeline: if I haven't formally created a pipeline yet, does Meltano keep track of what I execute the job with (e.g. tap, target, job_id)?

ripe-musician-59933

02/02/2021, 6:45 PM
Not all taps use the same file structure, but since handling `start_date` and bookmarks is always the responsibility of the tap itself rather than Meltano, you can usually find it by searching the repo for `start_date`, which is what I did here: https://github.com/singer-io/tap-covid-19/search?q=start_date
> If I haven't formally created a pipeline yet, does Meltano keep track of what I execute the job with (e.g. tap, target, job_id)?
Yep, the runs end up in the `job` table in the system database with their `job_id` either way. Using `meltano schedule` and the `schedules` list in `meltano.yml` is completely optional if you're happy constructing and running the appropriate `meltano elt` commands yourself.
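If you want to peek at those runs, something like this sketch could work. It assumes the system database is SQLite and that the `job` table has `job_id`, `state`, and `started_at` columns; the real schema may differ, and the demo rows below are made-up sample data, not actual output.

```python
import sqlite3


def list_runs(conn, job_id):
    """Return (job_id, state, started_at) rows for one job ID, oldest first."""
    cur = conn.execute(
        "SELECT job_id, state, started_at FROM job "
        "WHERE job_id = ? ORDER BY started_at",
        (job_id,),
    )
    return cur.fetchall()


def demo_connection():
    """Build an in-memory database with a simplified, ASSUMED `job` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job (job_id TEXT, state TEXT, started_at TEXT)")
    conn.executemany(
        "INSERT INTO job VALUES (?, ?, ?)",
        [
            # Hypothetical sample rows, for illustration only.
            ("covid19-to-csv", "SUCCESS", "2021-02-02 09:00:00"),
            ("covid19-to-csv", "SUCCESS", "2021-02-01 22:00:00"),
            ("some-other-job", "FAIL", "2021-02-01 12:00:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in list_runs(demo_connection(), "covid19-to-csv"):
        print(row)
```

Against a real project you would replace `demo_connection()` with `sqlite3.connect(".meltano/meltano.db")`, assuming the project uses the default SQLite system database location, and check the table's actual columns first.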

gray-cricket-92960

02/07/2021, 1:49 AM
Thanks for pointing out the system database. I should have read the docs!
👍 1