# troubleshooting
n
Hey team, I'm encountering a peculiar problem where state isn't being observed the second time I run the tap-gitlab tap (I'm using this for testing purposes). Example log output:
### End of first run ###
target-snowflake | time=2021-11-15 22:40:09 name=target_snowflake level=INFO message=Emitting state {"project_7603319": "2021-11-15T22:26:51.086Z", "project_7603319_issues": "2021-11-15T19:28:49.465Z", "project_7603319_merge_requests": "2021-11-15T22:26:51.889Z", "project_7603319_commits": "2021-11-14T00:00:00Z", "project_7603319_pipelines": "2021-11-15T22:26:56.367Z", "project_8838074": "2021-11-15T16:23:43.704Z", "project_8838074_issues": "2021-11-15T16:28:40.937Z", "project_8838074_merge_requests": "2021-11-15T16:26:03.171Z", "project_8838074_commits": "2021-11-14T00:00:00Z", "project_8838074_pipelines": "2021-11-14T00:00:00Z"}
meltano          | Incremental state has been updated at 2021-11-15 22:40:09.739856.
meltano          | Extract & load complete!
meltano          | Transformation skipped.

### Second run ###
➜  meltano-project git:(main) ✗ meltano elt tap-gitlab target-snowflake
meltano          | Running extract & load...
meltano          | No state was found, complete import.
tap-gitlab       | INFO Starting sync
tap-gitlab       | INFO Skipping stream: merge_request_commits
tap-gitlab       | INFO Skipping stream: epics
tap-gitlab       | INFO Skipping stream: epic_issues
tap-gitlab       | INFO Skipping stream: pipelines_extended
tap-gitlab       | INFO GET <https://gitlab.com/api/v4/users>
tap-gitlab       | INFO Skipping request to <https://gitlab.com/api/v4/users>
I'm using the default SQLite database and running on my local machine. Any ideas here?
I can see the state being saved in the job table:
{
  "singer_state": {
    "project_7603319": "2021-11-15T22:26:51.086Z",
    "project_7603319_issues": "2021-11-15T19:28:49.465Z",
    "project_7603319_merge_requests": "2021-11-15T22:26:51.889Z",
    "project_7603319_commits": "2021-11-14T00:00:00Z",
    "project_7603319_pipelines": "2021-11-15T22:26:56.367Z",
    "project_8838074": "2021-11-15T16:23:43.704Z",
    "project_8838074_issues": "2021-11-15T16:28:40.937Z",
    "project_8838074_merge_requests": "2021-11-15T16:26:03.171Z",
    "project_8838074_commits": "2021-11-14T00:00:00Z",
    "project_8838074_pipelines": "2021-11-14T00:00:00Z"
  }
}
e
@niall_woodward are you passing a --job_id= with the elt command? Meltano needs it in order to retrieve the sync state (https://meltano.com/docs/command-line-interface.html#parameters-2)
n
I am not. Thank you! Is there a reason why Meltano can't work this out for itself? Or have some kind of default behaviour where it generates a default job id from the tap / target pair?
e
Meltano actually generates a job id from the tap and target names plus a timestamp: https://gitlab.com/meltano/meltano/-/blob/29cbbab6d7cb6a83f5561a05ea1d5ba0bddd0fad/src/meltano/cli/elt.py#L112. Douwe here explains a good rationale for not defaulting to using state:
To me, the expectation when running a simple command like meltano elt <tap> <target> without any additional arguments is that it would do exactly what the command/arguments suggest and nothing more: it runs a (single one-off full-table) EL(T) pipeline with the tap and target in question. Taking results from previous runs or influencing future runs seems to me like additional non-obvious behavior that should be explicitly requested, in this case by providing a Job ID.
Here's an issue to improve documentation around job_id: https://gitlab.com/meltano/meltano/-/issues/2574
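For anyone reading along, here's a rough Python sketch of why the second run found no state. This is not Meltano's actual code: default_job_id, StateStore, and run_elt are hypothetical names, the exact ID format is an assumption based on the description above (timestamp plus tap and target names), and gitlab-to-snowflake is just an example job ID.

```python
from datetime import datetime, timezone


def default_job_id(tap, target, now):
    """Hypothetical stand-in for an auto-generated ID: timestamp plus tap and target names."""
    return f"{now:%Y-%m-%dT%H%M%S}--{tap}--{target}"


class StateStore:
    """Toy in-memory stand-in for the job table, keyed by job ID."""

    def __init__(self):
        self._states = {}

    def save(self, job_id, state):
        self._states[job_id] = state

    def load(self, job_id):
        # Returns None when no earlier run used this job ID.
        return self._states.get(job_id)


def run_elt(store, tap, target, now, job_id=None):
    """Look up previous state by job ID, then save new state under the same ID."""
    job_id = job_id or default_job_id(tap, target, now)
    previous = store.load(job_id)
    store.save(job_id, {"bookmark": now.isoformat()})
    return job_id, previous


store = StateStore()
first = datetime(2021, 11, 15, 22, 26, 0, tzinfo=timezone.utc)
second = datetime(2021, 11, 15, 22, 45, 0, tzinfo=timezone.utc)

# No explicit job ID: each run gets a fresh timestamped ID, so the lookup misses.
run_elt(store, "tap-gitlab", "target-snowflake", first)
_, prev_auto = run_elt(store, "tap-gitlab", "target-snowflake", second)

# Explicit, stable job ID: the second run finds the state saved by the first.
run_elt(store, "tap-gitlab", "target-snowflake", first, job_id="gitlab-to-snowflake")
_, prev_explicit = run_elt(
    store, "tap-gitlab", "target-snowflake", second, job_id="gitlab-to-snowflake"
)

print(prev_auto)      # None
print(prev_explicit)  # state saved by the first run with the same job ID
```

The point is only that state is looked up by job ID, so an ID that changes on every run can never match an earlier one.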
n
Thanks for the follow-up here @edgar_ramirez_mondragon, I've added a comment to the linked issue.