# troubleshooting
t
Hi, this is my first question to this community. Please redirect me if there is a better place for it. We’re having issues using
state.json
to incrementally load tap-github. The issues I’m encountering are: 1. The state file is stored in a timestamped folder, so every time the tap runs, no state is found, e.g.
2024-09-13T175856--tap-github--target-jsonl/state.json
2. Even if I provide
--state state.json
to override the default setting, I can see that all historical data is being fetched instead of the net-new
pull-requests
that I’m expecting. Maybe I’m understanding it wrong. Please let me know. Thanks!
v
Can you provide your meltano.yml, and what commands you're running to make your issue come up?
t
version: 1
default_environment: dev
project_id: 8d46d142-ee51-42fd-90fd-0f1b3a8bba5d
environments:
- name: dev
- name: staging
- name: prod
state_backend:
  uri: file:///${MELTANO_SYS_DIR_ROOT}/state/
plugins:
  extractors:
  - name: tap-github
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-github.git
    config:
      # organizations: 
      #   - 'ThirtyMadison'
      repositories:
      - ThirtyMadison/data-eng-admin
    select:
    - pull_requests.*
Commands I ran: 1.
meltano el tap-github target-jsonl
2.
meltano el tap-github target-jsonl --state state/03/state.json
v
That helps. 1. Why not use
meltano run
? 2. Why are you passing in a state file? Meltano handles state for you.
t
1. OK, I’ve just been using
meltano el
. Let me try the other. 2. The reason is what I said in 1): the state file is stored in a folder with a timestamp in its name.
v
I see now, you aren't passing a state ID. IIRC the logs have a message about this if you don't provide one. Could you post the logs from a run?
meltano run will do it for you 🙂
t
ok using run is the same thing:
v
First run yes, next run I'd think you'd have state?
t
oh I see. Using
meltano run
does change the folder name to something different now.
v
What folder? Can you
ls
or post a picture?
t
the first 4 are from `meltano el`:
v
Zoom out please, and tree the directory. You're looking at the
.meltano
directory, it seems?
t
no, that’s my project root folder
v
tree
your project root please
t
That’s quite a lot of things. I can’t send all of them.
image.png
v
ok how about go to your root directory and type
ls -l
t
The 1) issue is now resolved. The 2) issue is- even the second run, it still fetches data that I don’t expect to be net-new. e.g. logs:
2024-09-13T19:57:52.565226Z [info     ] 2024-09-13 15:57:52,564 | INFO     | tap-github           | Beginning incremental sync of 'pull_requests' with context: {'org': 'ThirtyMadison', 'repo': 'data-eng-admin', 'repo_id': 396928701}

2024-09-13T19:57:54.325915Z [info     ] 2024-09-13 15:57:54,324 | INFO     | singer_sdk.metrics   | METRIC: {"type": "counter", "metric": "record_count", "value": 100, "tags": {"stream": "pull_requests", "context": {"org": "ThirtyMadison", "repo": "data-eng-admin", "repo_id": 396928701}}} cmd_type=elb consumer=False job_name=dev:tap-github-to-target-jsonl name=tap-github producer=True run_id=7cc53685-c32e-49b5-838c-4af92a6faafc stdio=stderr string_id=tap-github
v
The only thing I'm left confused about is why
state
is in your main root directory.
But I can let it go 🙂
t
oh because I specified it in `meltano.yml`:
state_backend:
  uri: file:///${MELTANO_SYS_DIR_ROOT}/state/
v
There we go, I missed that! You could also omit state_backend and it would automatically use the SQLite DB in
.meltano
t
Got it. Yeah, I just want to be able to inspect it more easily.
As you can see in the logs, even though I did a first run without state just seconds ago, the second run syncs incrementally but still fetches 100 records. Is that expected?
v
Glad we have you working 🙂
I'm not sure; I'd have to understand your data more. https://sdk.meltano.com/en/v0.40.0/incremental_replication.html probably answers your question.
t
Yes, thank you for that! I didn’t know there was such a big difference between
run
and
el
v
run
just handles the state-id for you
without one you get a new state every time (not what you want!)
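A toy Python model of why the state ID matters, assuming state is simply looked up by ID. The function names and the in-memory `state_store` are hypothetical; the two ID formats are copied from the timestamped folder name and the `job_name` shown earlier in this thread, not from Meltano's source.

```python
from datetime import datetime

# Hypothetical stand-in for a state backend: maps state ID -> saved state.
state_store = {}


def timestamped_state_id(tap, target, now):
    # Same shape as the folder seen above:
    # 2024-09-13T175856--tap-github--target-jsonl
    return f"{now:%Y-%m-%dT%H%M%S}--{tap}--{target}"


def stable_state_id(env, tap, target):
    # Same shape as job_name in the logs: dev:tap-github-to-target-jsonl
    return f"{env}:{tap}-to-{target}"


def run_pipeline(state_id, new_bookmark):
    """Return the state found for this ID (or None), then save the new one."""
    previous = state_store.get(state_id)
    state_store[state_id] = {"bookmark": new_bookmark}
    return previous


if __name__ == "__main__":
    # A fresh timestamped ID per run never matches an earlier run's state.
    for hour in (17, 18):
        sid = timestamped_state_id(
            "tap-github", "target-jsonl", datetime(2024, 9, 13, hour, 0, 0)
        )
        print(sid, "->", run_pipeline(sid, "bookmark"))

    # A stable ID finds the previous run's state on the second run.
    sid = stable_state_id("dev", "tap-github", "target-jsonl")
    for _ in range(2):
        print(sid, "->", run_pipeline(sid, "bookmark"))
```

In this sketch only the stable ID ever returns a previous state, which is the effect of passing a fixed `--state-id` or letting `meltano run` pick one.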
t
is that the only difference I should bear in mind?
🤷 1
t
ok fair enough. Thank you!
e
Hey @Tony Yun 👋, welcome to the community! I'm glad you got things sorted out. I'm curious if you were following any guides or examples of
meltano el
that were missing an explanation of how to enable incremental replication (i.e.
--state-id
). All fine if you found it, for example, by just exploring the CLI.
> The 2) issue is- even the second run, it still fetches data that I don’t expect to be net-new.
That may be because of the at-least-once nature of the replication that Meltano extractors use. Happy to answer other questions.
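To illustrate the at-least-once behaviour mentioned above, here is a minimal Python sketch. It is a toy model, not tap-github's or the Singer SDK's actual code; `incremental_sync` and the sample records are hypothetical. It assumes the bookmark comparison is inclusive (`>=`), so the record sitting exactly at the bookmark is emitted again on the next run.

```python
def incremental_sync(records, bookmark):
    """Return (emitted_records, new_bookmark) for one toy incremental run.

    The comparison is inclusive (>=), so the record that produced the
    bookmark is emitted again on the following run.
    """
    emitted = [
        r for r in records
        if bookmark is None or r["updated_at"] >= bookmark
    ]
    # Keep the old bookmark if nothing was emitted.
    new_bookmark = max((r["updated_at"] for r in emitted), default=bookmark)
    return emitted, new_bookmark


# Hypothetical sample data. ISO-8601 strings in one format compare correctly as text.
PULL_REQUESTS = [
    {"id": 1, "updated_at": "2024-09-10T12:00:00Z"},
    {"id": 2, "updated_at": "2024-09-11T12:00:00Z"},
    {"id": 3, "updated_at": "2024-09-12T12:00:00Z"},
]

if __name__ == "__main__":
    first, bookmark = incremental_sync(PULL_REQUESTS, None)
    second, bookmark = incremental_sync(PULL_REQUESTS, bookmark)
    print(len(first), "records on the first run")
    print(len(second), "record(s) on the second run, with no new data upstream")
```

Whether this accounts for all 100 records in the log above depends on how the tap pages and filters its requests, which this sketch does not model. Downstream deduplication on a primary key is the usual way to absorb such repeats.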