david_baussart
10/07/2023, 11:31 AM
I'm looking for a way with tap-github (MeltanoLabs version) to avoid re-extracting all commits on every run and only extract the new ones since the last extraction. I've seen the start_date option, but that seems tedious, i.e. I'd need to pass a variable.
Is that possible, or is the philosophy to extract all and deal with deduplication using the replication methods in the target?
Thanks!
david_baussart
10/07/2023, 11:35 AM
edgar_ramirez_mondragon
10/07/2023, 2:37 PM
With meltano run, the state file will automatically be managed for you, so you'll get only new records.
edgar_ramirez_mondragon
10/07/2023, 2:38 PM
david_baussart
10/09/2023, 9:30 AM
I'm running meltano run locally:
meltano run tap-github target-jsonl
When running this, I read:
2023-10-09T09:16:59.112869Z [info ] Incremental state has been updated at 2023-10-09 09:16:59.112798.
2023-10-09T09:16:59.117692Z [info ] Block run completed. block_type=ExtractLoadBlocks err=None set_number=0 success=True
But it doesn't seem to be creating/writing to any file on the repo 🤔 When I re-run, it's displaying the same message, and fetching all rows once more.
----
I created extract/tap-github.state.json and added the following to meltano.yml:
[...]
plugins:
  extractors:
  - name: tap-github
    state: extract/tap-github.state.json
    variant: meltanolabs
    select:
    - events.id
[...]
It's still syncing the whole event history, including events that were created before the last_record bookmark.
// tap-github.state.json
{
  "bookmarks": {
    "events": {
      "last_record": "2023-09-17T14:48:24Z"
    }
  }
}
I must be doing something wrong: I can't find where the state file is created by default in a local project. Am I looking in the right direction here?
edgar_ramirez_mondragon
10/09/2023, 2:31 PM
By default, state is stored in the SQLite database at .meltano/meltano.db from your project root. You could run a query like
$ sqlite3 -markdown .meltano/meltano.db "select id, job_name, substr(payload, 1, 100) from runs where
to confirm that state is being saved.
If you want the state file to be created in your repo, you'll need to configure it as shown in the docs link you shared.
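For anyone who would rather check this from a script than from the sqlite3 shell, here is a minimal Python sketch of the same kind of check. The runs table and its id, job_name and payload columns come from the query above. The function name latest_state_payloads, the "payload IS NOT NULL" filter and the ordering by id are assumptions made for illustration, not a reconstruction of the truncated query, and the schema may differ between Meltano versions.

```python
import json
import sqlite3


def latest_state_payloads(db_path=".meltano/meltano.db", limit=5):
    """Return (id, job_name, payload) for the newest rows in `runs` that carry a payload.

    Hypothetical helper: the filter and ordering are illustrative assumptions.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, job_name, payload FROM runs "
            "WHERE payload IS NOT NULL ORDER BY id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()

    result = []
    for run_id, job_name, payload in rows:
        try:
            # Payloads are expected to be JSON text; decode them when possible.
            parsed = json.loads(payload)
        except (TypeError, ValueError):
            # Keep the raw value if it is not valid JSON.
            parsed = payload
        result.append((run_id, job_name, parsed))
    return result
```

Calling latest_state_payloads() from the project root after a meltano run should return rows whose payload contains bookmarks if state is being saved. An empty list would suggest that no state has been written yet.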