# plugins-general
b
I was trying to get tap-gitlab working last night. When running a pipeline through the GUI, it worked the first time but wouldn’t work any time after that. I was able to get it working from the CLI by adding the --full-refresh option:
meltano elt tap-gitlab target-snowflake --transform=skip --job_id=gitlab-to-snowflake --full-refresh
Is this expected behavior for tap-gitlab? Error:
tap-gitlab | CRITICAL can't compare offset-naive and offset-aware datetimes
tap-gitlab | Traceback (most recent call last):
tap-gitlab | File "/project/.meltano/extractors/tap-gitlab/venv/bin/tap-gitlab", line 8, in <module>
tap-gitlab | sys.exit(main())
tap-gitlab | File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 714, in main
tap-gitlab | raise exc
tap-gitlab | File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 711, in main
tap-gitlab | main_impl()
tap-gitlab | File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 706, in main_impl
tap-gitlab | do_sync()
tap-gitlab | File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 666, in do_sync
tap-gitlab | sync_group(gid, pids)
tap-gitlab | File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 547, in sync_group
tap-gitlab | sync_project(project['id'])
tap-gitlab | File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 627, in sync_project
tap-gitlab | if project['last_activity_at'] >= get_start(state_key):
tap-gitlab | File "/project/.meltano/extractors/tap-gitlab/venv/lib/python3.6/site-packages/tap_gitlab/__init__.py", line 172, in get_start
tap-gitlab | if entity not in STATE or parse_datetime(STATE[entity]) < parse_datetime(CONFIG['start_date']):
tap-gitlab | TypeError: can't compare offset-naive and offset-aware datetimes
meltano | Extraction failed (1): TypeError: can't compare offset-naive and offset-aware datetimes
d
Failing with an error is definitely not expected behavior 😄 Looks like a bug triggered by incompatible timestamps in the start_date setting and the stored incremental replication state... Can you share the value of start_date (as in meltano config tap-gitlab) and the state payload (meltano schedule run gitlab-to-snowflake --dump=state)?
b
I’m a bit confused about the start date. From meltano.yml:
  - name: tap-gitlab
    variant: meltano
    pip_url: git+https://gitlab.com/meltano/tap-gitlab.git
    config:
      fetch_merge_request_commits: true
      groups: halosight halosight/analytics
      start_date: '2019-07-01'
      ultimate_license: true
But when I try to access via command line, I get the following:
root@106a7c474d38:/project# meltano config tap_gitlab start_date
Usage: meltano config [OPTIONS] PLUGIN_NAME COMMAND [ARGS]...
Try 'meltano config --help' for help.

Error: No such command 'start_date'.
d
Looks like tap-gitlab also needs start_date to be a full timestamp 😅
And a meltano config subcommand to view a single setting doesn't currently exist; you can only get the whole config from meltano config tap-gitlab
b
root@106a7c474d38:/project# meltano schedule run gitlab-to-snowflake --dump=state
[2020-12-21 21:43:12,745] [206|MainThread|meltano.core.plugin.singer.tap] [INFO] Found state from 2020-12-21 03:40:53.922900.
{
  "project_13923103": "2019-12-09T22:56:05.992000Z",
  "project_13923103_issues": "2019-12-09T22:56:05.982Z",
  "project_13923103_jobs": "2019-07-01T00:00:00.000000Z",
  "project_13923103_merge_requests": "2019-07-01T00:00:00",
  "project_13923103_commits": "2019-08-21T09:50:05.000-06:00",
  "project_13923103_pipelines": "2019-07-01T00:00:00",
  "group_5855163_epics": "2020-04-14T14:33:49.652Z",
  "project_23206398": "2020-12-20T14:52:54.108000Z",
  "project_23206398_issues": "2019-07-01T00:00:00",
  "project_23206398_jobs": "2019-07-01T00:00:00.000000Z",
  "project_23206398_merge_requests": "2019-07-01T00:00:00",
  "project_23206398_commits": "2020-12-19T10:45:41.000-07:00",
  "project_23206398_pipelines": "2019-07-01T00:00:00",
  "project_23179509": "2020-12-18T04:53:09.630000Z",
  "project_23179509_issues": "2019-07-01T00:00:00",
  "project_23179509_jobs": "2019-07-01T00:00:00.000000Z",
  "project_23179509_merge_requests": "2020-12-18T05:10:46.564Z",
  "project_23179509_commits": "2020-12-18T04:55:49.000+00:00",
  "project_23179509_pipelines": "2019-07-01T00:00:00",
  "group_10436463_epics": "2019-07-01T00:00:00"
}
root@106a7c474d38:/project# meltano config tap-gitlab
{
  "api_url": "https://gitlab.com",
  "private_token": "xxx",
  "groups": "halosight halosight/analytics",
  "projects": "",
  "ultimate_license": true,
  "fetch_merge_request_commits": true,
  "fetch_pipelines_extended": false,
  "start_date": "2019-07-01"
}
d
Interesting, the "can't compare offset-naive and offset-aware datetimes" error message suggests that it's tripping over the inconsistent timezones
@bryan_wise Can you try setting start_date to '2019-07-01T00:00:00Z', running with --full-refresh again, and then running without --full-refresh to see if state now works?
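For illustration, here is a minimal Python sketch of why that error shows up. It does not use tap-gitlab's own code; it only compares a datetime without a timezone (like one parsed from a bare start_date) against one with a timezone (like one parsed from a state value ending in Z). The variable names are made up for this example.

```python
from datetime import datetime, timezone

# A timestamp with no timezone, like one parsed from '2019-07-01'.
naive = datetime(2019, 7, 1)

# The same instant with an explicit UTC offset, like '2019-07-01T00:00:00Z'.
aware = datetime(2019, 7, 1, tzinfo=timezone.utc)

try:
    naive < aware  # ordering comparisons between naive and aware values raise
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Once both sides carry a timezone, the same comparison works, which is why adding Z to start_date is worth trying.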
b
will do
d
I suspect setting Z on start_date would fix things, but we need to run with --full-refresh once to wipe out entries like "group_10436463_epics": "2019-07-01T00:00:00" in the state file, which copied the start_date as-is and so are missing a timezone. The other timestamps in the state file already have a TZ, coming from the server.
Either way this is a bug and needs an issue
But that workaround may help you get up and running
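To spot which state entries are affected, something like the following sketch could be run against the dumped state. It is only an assumption about how one might check this by hand: find_naive_entries and TZ_SUFFIX are hypothetical names, not part of Meltano or tap-gitlab, and the sample dict is a subset of the state shown above.

```python
import re

# Matches a trailing "Z" or a numeric UTC offset such as "-06:00".
TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}:\d{2})$")


def find_naive_entries(state):
    """Return the state keys whose timestamp string carries no timezone."""
    return [key for key, value in state.items() if not TZ_SUFFIX.search(value)]


# Subset of the state payload dumped earlier in this thread.
state = {
    "project_13923103": "2019-12-09T22:56:05.992000Z",
    "project_13923103_merge_requests": "2019-07-01T00:00:00",
    "project_13923103_commits": "2019-08-21T09:50:05.000-06:00",
    "group_10436463_epics": "2019-07-01T00:00:00",
}

naive_keys = find_naive_entries(state)
print(naive_keys)
```

An empty list after the --full-refresh run would suggest every bookmark now has a timezone.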
b
That seems to be working. Second pass is currently running.
d
All right. Could you dump the state another time to verify that the TZs are now all set? You can do that in parallel in a different shell if you like
b
Found state from 2020-12-21 21:54:24.461460.
{
  "project_13923103": "2019-12-09T22:56:05.992000Z",
  "project_13923103_issues": "2019-12-09T22:56:05.982Z",
  "project_13923103_jobs": "2019-07-01T00:00:00Z",
  "project_13923103_merge_requests": "2019-07-01T00:00:00Z",
  "project_13923103_commits": "2019-08-21T09:50:05.000-06:00",
  "project_13923103_pipelines": "2019-07-01T00:00:00Z",
  "group_5855163_epics": "2020-04-14T14:33:49.652Z",
  "project_23206398": "2020-12-21T03:43:54.981000Z",
  "project_23206398_issues": "2019-07-01T00:00:00Z",
  "project_23206398_jobs": "2019-07-01T00:00:00Z",
  "project_23206398_merge_requests": "2019-07-01T00:00:00Z",
  "project_23206398_commits": "2020-12-19T10:45:41.000-07:00",
  "project_23206398_pipelines": "2019-07-01T00:00:00Z",
  "project_23179509": "2020-12-18T04:53:09.630000Z",
  "project_23179509_issues": "2019-07-01T00:00:00Z",
  "project_23179509_jobs": "2019-07-01T00:00:00Z",
  "project_23179509_merge_requests": "2020-12-18T05:10:46.564Z",
  "project_23179509_commits": "2020-12-18T04:55:49.000+00:00",
  "project_23179509_pipelines": "2019-07-01T00:00:00Z",
  "group_10436463_epics": "2019-07-01T00:00:00Z"
}
d
That looks more like it!
b
Thanks for your help getting that working. Need me to log anything on this?
d
Yes please! YYYY-MM-DD start_dates should either be supported fully, or explicitly raise an error saying they're not, rather than the odd, seemingly unrelated error we're seeing here. So I think we can handle the TZ mismatch better and continue to support YYYY-MM-DD
So if you could file an issue about YYYY-MM-DD not being supported because of a timezone mismatch, with relevant quotes/links to this thread, that'd be great 🙂
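As a rough sketch of what "supported fully, or explicitly raise an error" could look like, the helper below accepts both YYYY-MM-DD and full timestamps, treats a missing timezone as UTC, and raises a descriptive error otherwise. parse_start_date is a hypothetical name and this is not the tap's actual implementation; it also assumes Python 3.7+ for datetime.fromisoformat, and that treating a bare date as UTC is the desired behavior.

```python
from datetime import datetime, timezone


def parse_start_date(value: str) -> datetime:
    """Parse a start_date string into a timezone-aware datetime (hypothetical helper)."""
    text = value.strip()
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        raise ValueError(
            f"start_date {value!r} must look like YYYY-MM-DD or YYYY-MM-DDTHH:MM:SSZ"
        ) from None
    # Assumption: a value without a timezone is meant as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


short_form = parse_start_date("2019-07-01")
full_form = parse_start_date("2019-07-01T00:00:00Z")
print(short_form == full_form)
```

With both forms normalized to aware datetimes, comparing them against server-provided bookmarks such as 2019-08-21T09:50:05.000-06:00 no longer mixes naive and aware values.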
b
Will do. Jumping to another meeting, but I’ll get that logged later today.
d
Perfect, thanks!