# plugins-general
d
hey all! I'm a beginner, and I've tried searching online and on this Slack without success. I'm curious if there's a way to set a "bookmark" on `tap-github` (MeltanoLabs version) so that it only extracts the commits added since the last extraction instead of all of them. I've seen the `start_date` option, but that seems tedious, i.e., I'd need to pass a variable. Is that possible, or is the philosophy to extract everything and handle deduplication with the replication methods in the target? Thanks!
I'm asking about the GitHub extractor, but I'm also curious about other data sources. There aren't many commits, so retrieving all of them wouldn't be a problem, but how would I deal with an event stream, for example?
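On the second option raised above, deduplicating in the target: the idea can be sketched in a few lines. This is a minimal, hypothetical illustration (the `upsert_by_key` helper and the `id` key are assumptions, not part of tap-github or any Meltano target). Re-extracted records overwrite earlier rows with the same primary key, so a full re-sync does not create duplicates. Whether a particular target behaves like this depends on the target and on how it handles the stream's key properties.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def upsert_by_key(existing: List[Record], incoming: Iterable[Record], key: str = "id") -> List[Record]:
    """Merge incoming records into existing ones, keeping one row per key.

    Hypothetical helper: a later record with the same key replaces the earlier
    one, mimicking a target that deduplicates on a primary key.
    """
    merged: Dict[Any, Record] = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row  # the newer version of a row wins
    return list(merged.values())


if __name__ == "__main__":
    already_loaded = [{"id": 1, "type": "PushEvent"}, {"id": 2, "type": "IssuesEvent"}]
    full_resync = [
        {"id": 1, "type": "PushEvent"},
        {"id": 2, "type": "IssuesEvent"},
        {"id": 3, "type": "ForkEvent"},
    ]
    print(len(upsert_by_key(already_loaded, full_resync)))  # 3: rows 1 and 2 are not duplicated
```

The trade-off is that every run still pulls the full history from the source; deduplication only keeps the destination clean, which is why a bookmark matters more for large event streams.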
e
Hi @david_baussart 👋🏼 How are you running the tap? If you're using `meltano run`, the state file will automatically be managed for you, so you'll get only new records.
d
hi Edgar! Thanks so much, I do use `meltano run` locally:
```shell
meltano run tap-github target-jsonl
```
When I run this, I see:
```text
2023-10-09T09:16:59.112869Z [info     ] Incremental state has been updated at 2023-10-09 09:16:59.112798.
2023-10-09T09:16:59.117692Z [info     ] Block run completed.           block_type=ExtractLoadBlocks err=None set_number=0 success=True
```
But it doesn't seem to be creating or writing to any file in the repo 🤔 When I re-run, it displays the same message and fetches all rows once more.

I created `extract/tap-github-state.json` and added the following to `meltano.yml`:
```yaml
[...]
plugins:
  extractors:
  - name: tap-github
    state: extract/tap-github.state.json
    variant: meltanolabs
    select:
    - events.id
[...]
```
It's still syncing the whole event history, including events that were created before the `last_record` bookmark.
```jsonc
// tap-github.state.json
{
  "bookmarks": {
    "events": {
      "last_record": "2023-09-17T14:48:24Z"
    }
  }
}
```
I must be doing something wrong; I can't find where the state file is created by default in a local project. Am I looking in the right direction here?
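For context, here is a minimal sketch of how a tap *could* use a bookmark like the one in the state file above: emit only records newer than the bookmark, then advance it. The `incremental_sync` helper is hypothetical and assumes each event carries a `created_at` timestamp used as the replication key. The `last_record` key name is copied from the state file above; whether tap-github's events stream actually reads that key is not confirmed here.

```python
import copy
from datetime import datetime
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def parse_ts(value: str) -> datetime:
    # GitHub-style timestamps end in "Z"; fromisoformat() only accepts it from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def incremental_sync(
    records: Iterable[Record],
    state: Dict[str, Any],
    stream: str = "events",
    replication_key: str = "created_at",  # assumption: field used to order events
    bookmark_key: str = "last_record",  # key name copied from the state file above
) -> Tuple[List[Record], Dict[str, Any]]:
    """Hypothetical helper: return records newer than the bookmark, plus the advanced state."""
    bookmark = state.get("bookmarks", {}).get(stream, {}).get(bookmark_key)
    cutoff = parse_ts(bookmark) if bookmark else None

    # Without a bookmark (first run) every record counts as new.
    new_records = [
        r for r in records if cutoff is None or parse_ts(r[replication_key]) > cutoff
    ]

    new_state = copy.deepcopy(state)  # leave the caller's state untouched
    if new_records:
        latest = max(new_records, key=lambda r: parse_ts(r[replication_key]))
        stream_state = new_state.setdefault("bookmarks", {}).setdefault(stream, {})
        stream_state[bookmark_key] = latest[replication_key]
    return new_records, new_state


if __name__ == "__main__":
    state = {"bookmarks": {"events": {"last_record": "2023-09-17T14:48:24Z"}}}
    events = [
        {"id": "1", "created_at": "2023-09-10T08:00:00Z"},  # older than the bookmark
        {"id": "2", "created_at": "2023-10-01T12:30:00Z"},  # newer than the bookmark
    ]
    new, state = incremental_sync(events, state)
    print([e["id"] for e in new])  # ['2']
    print(state["bookmarks"]["events"]["last_record"])  # 2023-10-01T12:30:00Z
```

Because the bookmark only moves forward, calling the function again with the returned state and the same events yields no new records, which is the behaviour expected from a second `meltano run`.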
e
Thanks for the detailed explanation, David! So, the default state backend is the system database, which is at `.meltano/meltano.db` relative to your project root. You could run a query like
```shell
$ sqlite3 -markdown .meltano/meltano.db "select id, job_name, substr(payload, 1, 100) from runs where
```
to confirm that state is being saved. If you want the state file to be created in your repo, you'll need to configure it as shown in the docs link you shared.
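The same check can be scripted with Python's built-in `sqlite3` module. This is a minimal sketch that assumes the `runs` table and the `id`, `job_name` and `payload` columns named in the query above exist in your Meltano version; the helper name `runs_summary` is hypothetical, and the `where` clause is omitted because it is cut off in the original message.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import List, Tuple


def runs_summary(db_path: str = ".meltano/meltano.db") -> List[Tuple]:
    """Return (id, job_name, first 100 chars of payload) for each row in `runs`.

    Assumption: table and column names are taken from the query in the thread
    and may not match every Meltano version.
    """
    # Open read-only via a file: URI so a wrong path raises an error
    # instead of silently creating an empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        return conn.execute(
            "SELECT id, job_name, substr(payload, 1, 100) FROM runs"
        ).fetchall()
```

Run it from the project root (for example `for row in runs_summary(): print(row)`) after a `meltano run`; if rows for the tap-github job show up with a bookmark in the payload, state is being saved even though no file appears in the repo.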