# getting-started
v
Hi, I'm doing the Meltano getting started tutorial (tap-github, target-postgres) and got:
Run invocation could not be completed as block failed: Loader failed.
Running on WSL2 Ubuntu 22.04.01 LTS. Both loader and extractor were installed successfully.
s
Hello @vladimir_krivokapic, I think there is a bit of logging output just above the colored block you posted first, could you post that as well? And also share your meltano.yml project file? That always helps.
v
Hi, thanks for replying. Loading into target-jsonl works w/o any problems. I followed the tutorial step by step. https://docs.meltano.com/getting-started/part2
Some information that might help: Python: 3.10.6 Postgres: 14.5 Pipx: 1.1.0 Meltano: 2.11.0 Docker: 20.10.21
s
Hm so yes, the relevant error is in what you just posted, let me take a look.
v
"AttributeError: module 'collections' has no attribute 'MutableMapping'" might be the issue. After a quick Google search, downgrading Python to 3.9 might solve it: https://stackoverflow.com/questions/70943244/attributeerror-module-collections-has-no-attribute-mutablemapping
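For anyone hitting the same AttributeError, here is a minimal sketch of what changed. The abstract base classes have lived in collections.abc since Python 3.3, and the old aliases on the top-level collections module were removed in Python 3.10, so a dependency that still uses collections.MutableMapping breaks there. The Settings class below is hypothetical and only exists to exercise the import.

```python
import collections
import collections.abc
import sys

# True on Python 3.9 and older (with a DeprecationWarning on 3.7-3.9),
# False on Python 3.10 and newer, where the alias was removed.
has_legacy_alias = hasattr(collections, "MutableMapping")
print(sys.version_info[:2], "legacy alias present:", has_legacy_alias)

# This import works on every Python 3 version from 3.3 onwards.
from collections.abc import MutableMapping


class Settings(MutableMapping):
    """Hypothetical dict-like class, only here to exercise the import."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


settings = Settings()
settings["host"] = "localhost"
print(dict(settings))
```

Running the plugin on Python 3.9 or older (as suggested above) avoids the error without touching the dependency's code.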
s
You're too fast for me 😉 So yes, meltano/meltano:v2.4.0-python3.8 works.
v
Alright, I'll try that and post results here once I'm done! Thank you again! 🙂
s
FWIW, it looks like that's a dependency inside the target-postgres. We're already working on our own version of that target. It looks like our new target removes this dependency (https://github.com/MeltanoLabs/target-postgres, @visch?).
v
Give our postgres version a shot 😄
v
Update: I uninstalled the postgres target that I had and installed yours just by editing the meltano.yml file. When I ran the 'meltano run' command, I got this error:
'Instance <Job at 0x7fd83886f850> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3)'
v
Hmm, looks like a target-postgres bug. I'll try to replicate it. I'll give tap-github a quick shot myself and see what happens. Can you give me the settings you have set for tap-github?
v
I put my private dummy GitHub repo in the config, so you can just use yours if you have one. I don't think you need mine for this test 😄 I'm also new to Docker. I'm trying to set up a new test project with the https://hub.docker.com/r/meltano/meltano/tags?page=1&name=2.4.0-python3.8 image and a Postgres one, but I'm unsure what to put into the Dockerfile. I have the latest Meltano installed on my WSL2, but I need 2.4.0 to run the Meltano getting started demo. If I put 'COPY . usr/src/app' and then cd into it via WSL2, how do I then use Meltano 2.4.0 instead of the latest one when I continue with the tutorial? I also sent my first try at writing a Dockerfile and docker-compose.
v
Text is much more helpful than pictures. Trying to replicate your select statement right now.
    pip_url: tap-github
    config:
      repository: meltano/meltano
      start_date: '2020-01-01'
    select:
      - "!teams.*"
      - "!team_members.*"
      - "!team_memberships.*"
      - "!collaborators.*"
      - "!comments.*"
      - "*.*"
So far no errors yet
Could you post your entire error log file? For example, run:
meltano run tap-github target-postgres > out 2>&1
Then paste the out file minus any sensitive data (specifically the error info, and ideally not a screenshot). I'm still waiting on the run on my end to complete.
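If trimming the out file by hand is tedious, a small filter can do it. This is only a sketch: the two regular expressions are my own assumptions about what counts as an error line and what looks like a secret, so check the result before pasting it anywhere.

```python
import re

# Assumed patterns, not anything Meltano defines: adjust them to your own config.
SECRET_PATTERN = re.compile(r"(?i)\b(access_token|password|token)(\s*[=:]\s*)\S+")
ERROR_PATTERN = re.compile(r"Traceback|Error|\[error")


def error_lines(text):
    """Return only the lines that look like errors, with secret values masked."""
    kept = []
    for line in text.splitlines():
        if ERROR_PATTERN.search(line):
            kept.append(SECRET_PATTERN.sub(r"\1\2***", line))
    return kept


# Made-up sample input; in practice you would read the "out" file instead,
# e.g. error_lines(open("out").read()).
sample = (
    "2022-12-14 [info ] Environment 'dev' is active\n"
    "sqlalchemy.exc.DataError: integer out of range password=hunter2\n"
)
filtered = error_lines(sample)
print("\n".join(filtered))
```

The filter drops whole tracebacks except for lines that mention an error, so for a first report it may still be better to paste the full (masked) log.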
Ok, got the actual error (at least one of them):
sqlalchemy.exc.DataError: (psycopg2.errors.NumericValueOutOfRange) integer out of range
It was above where your pictures were. I think I can replicate it as well, which is good. I thought I fixed this in https://github.com/MeltanoLabs/target-postgres/issues/26 but maybe bigint isn't enough? I'll dive in.
@vladimir_krivokapic Took some time on this one. The tap you're using isn't Meltano's recommended tap from https://hub.meltano.com/extractors/tap-github , which is fine, but we hit an issue from the singer tap: the `events` stream defines the schema of the field `id` as an `integer` but then provides the data as a decimal, i.e. `1234569999999999.0`, which fails for us. I think this is a tap issue more than a target one. We could use a schema override and set id to a string to get around this issue (I believe) for GitHub. The easier solution here was to just switch to the `meltanolabs` tap-github as well.
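To make the mismatch concrete, here is a small sketch using the example value from the message above. It is not what target-postgres does internally; coerce_integral and fits are hypothetical helpers, and the ranges are the documented bounds of PostgreSQL's 4-byte integer and 8-byte bigint types.

```python
PG_INTEGER_RANGE = (-2**31, 2**31 - 1)  # PostgreSQL integer (4 bytes)
PG_BIGINT_RANGE = (-2**63, 2**63 - 1)   # PostgreSQL bigint (8 bytes)


def fits(value, bounds):
    """Check whether a value lies inside an inclusive (low, high) range."""
    low, high = bounds
    return low <= value <= high


def coerce_integral(value):
    """Turn a float such as 123.0 into the int 123; leave anything else untouched."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


raw_id = 1234569999999999.0  # schema says integer, record carries a decimal
clean_id = coerce_integral(raw_id)

print(type(raw_id).__name__, "->", type(clean_id).__name__, clean_id)
print("fits integer:", fits(clean_id, PG_INTEGER_RANGE))
print("fits bigint: ", fits(clean_id, PG_BIGINT_RANGE))
```

The value is far too large for a 4-byte integer column but does fit in a bigint once it is an actual integer, which is consistent with this being a data-shape problem in the tap rather than a size limit in the target.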
v
Hmm, I was following https://docs.meltano.com/getting-started/part1:
meltano add extractor tap-github --variant singer-io
Sorry for the screenshots, will post text from now on. Thank you for helping! I'll try to use meltanolabs tap-github and see how it goes.
Ok I ran: meltano run tap-github target-postgres > out 2>&1 and got the following error:
vladimirk@VLADIMIR:/mnt/d/meltano/first-proj$ cat out 2022-12-14T082349.318285Z [info ] Environment 'dev' is active 2022-12-14T082356.026693Z [warning ] No state was found, complete import. 2022-12-14T082359.018448Z [info ] INFO Sync stream ['commits'] cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github 2022-12-14T082359.019473Z [info ] INFO Starting sync of organization: vladimirkys cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github 2022-12-14T082359.022151Z [info ] INFO Starting sync of repository: vladimirkys/meltano-first-app cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github 2022-12-14T082359.023108Z [info ] INFO Final url is: https://api.github.com/repos/vladimirkys/meltano-first-app/commits?since=2022-01-01 cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github 2022-12-14T082400.180253Z [info ] INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 1.150611400604248, "tags": {"endpoint": "commits", "http_status_code": 200, "status": "succeeded"}} cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github 2022-12-14T082400.183618Z [info ] INFO METRIC: {"type": "counter", "metric": "record_count", "value": 9, "tags": {"endpoint": "commits"}} cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github 2022-12-14T082405.563931Z [info ] 2022-12-14 092405,563 Target 'target-postgres' is listening for input from tap. cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082405.565127Z [info ] 2022-12-14 092405,563 Initializing 'target-postgres' target sink... cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082405.565971Z [info ] 2022-12-14 092405,564 Initializing target sink for stream 'commits'... 
cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.324532Z [info ] 2022-12-14 092406,225 Target 'target-postgres' completed reading 16 lines of input (9 records, (0 batch manifests, 6 state messages). cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.440599Z [info ] 2022-12-14 092406,439 Inserting with SQL: INSERT INTO temp_a2d30526_da59_46bf_bb5e_ac4f9c1a3969 (_sdc_repository, node_id, pr_id, pr_number, id, updated_at, sha, url, parents, files, html_url, comments_url, commit, committer, author, stats) VALUES (:_sdc_repository, :node_id, :pr_id, :pr_number, :id, :updated_at, :sha, :url, :parents, :files, :html_url, :comments_url, :commit, :committer, :author, :stats) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.455218Z [info ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.455979Z [info ] File "/mnt/d/meltano/first-proj/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1880, in _execute_context cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.473579Z [info ] self.dialect.do_executemany( cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.474313Z [info ] File "/mnt/d/meltano/first-proj/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 982, in do_executemany cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.489795Z [i…
2022-12-14T082406.525107Z [info ] return ctx.invoke(self.callback, **ctx.params) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.525567Z [info ] File "/mnt/d/meltano/first-proj/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.529513Z [info ] return __callback(*args, **kwargs) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.530089Z [info ] File "/mnt/d/meltano/first-proj/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/singer_sdk/target_base.py", line 566, in cli cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.539407Z [info ] target.listen(file_input) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.540101Z [info ] File "/mnt/d/meltano/first-proj/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/singer_sdk/io_base.py", line 35, in listen cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.548995Z [info ] self._process_endofpipe() cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.550107Z [info ] File "/mnt/d/meltano/first-proj/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/singer_sdk/target_base.py", line 282, in _process_endofpipe cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.552099Z [info ] self.drain_all(is_endofpipe=True) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.552617Z 
[info ] File "/mnt/d/meltano/first-proj/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/singer_sdk/target_base.py", line 443, in drain_all cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.554186Z [info ] self._drain_all(list(self._sinks_active.values()), self.max_parallelism) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.554645Z [info ] File "/mnt/d/meltano/first-proj/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/singer_sdk/target_base.py", line 469, in _drain_all cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.556096Z [info ] self.drain_one(sink) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.556561Z [info ] File "/mnt/d/meltano/first-proj/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/singer_sdk/target_base.py", line 463, in drain_one cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.557958Z [info ] sink.process_batch(draining_status) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.558403Z [info ] File "/mnt/d/meltano/first-proj/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/target_postgres/sinks.py", line 58, in process_batch cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.567288Z [info ] self.bulk_insert_records( cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T08240…
2022-12-14T082406.625898Z [info ] HINT: You will need to rewrite or cast the expression. cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.626361Z [info ] cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.626783Z [info ] [SQL: INSERT INTO temp_a2d30526_da59_46bf_bb5e_ac4f9c1a3969 (_sdc_repository, node_id, pr_id, pr_number, id, updated_at, sha, url, parents, files, html_url, comments_url, commit, committer, author, stats) VALUES (%(_sdc_repository)s, %(node_id)s, %(pr_id)s, %(pr_number)s, %(id)s, %(updated_at)s, %(sha)s, %(url)s, %(parents)s::JSONB[], %(files)s::JSONB[], %(html_url)s, %(comments_url)s, %(commit)s, %(committer)s, %(author)s, %(stats)s)] cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-14T082406.627776Z [info ] (Background on this error at: https://sqlalche.me/e/14/f405) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres 2022-12-14T082406.885181Z [error ] Loader failed ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮ │ /home/vladimirk/.local/pipx/venvs/meltano/lib/python3.10/site-packages/meltano/core/logging/outp │ │ ut_logger.py:201 in redirect_logging │ │ │ │ 198 │ │ │ *ignore_errors, │ │ 199 │ │ ) │ │ 200 │ │ try: │ │ ❱ 201 │ │ │ yield │ │ 202 │ │ except ignored_errors: # noqa: WPS329 │ │ 203 │ │ │ raise │ │ 204 │ │ except Exception as err: │ │ │ │ ╭────────────────────────────────────────── locals ───────────────────────────────────────────╮ │ │ │ err = RunnerError('Loader failed') │ │ │ │ ignore_errors = () │ │ │ │ ignored_errors = (<class 'KeyboardInterrupt'>, <class 'asyncio.exceptions.CancelledError'>) │ │ │ │ logger = <RootLogger root (INFO)> │ │ │ │ self = <meltano.core.logging.output_logger.Out object at 0x7f48c3d87d90> │ │ │ ╰─────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /home/vladimirk/.local/pipx/venvs/meltano/lib/python3.10/site-packages/meltano/core/block/extrac │ │ t_load.py:461 in run │ │ │ │ 458 │ │ │ # TODO: legacy `meltano elt` style logging should be deprecated │ │ 459 │ │ │ legacy_log_handler = self.output_logger.out("meltano", logger) │ │ 460 │ │ │ with legacy_log_handler.redirect_logging(): │ │ ❱ 461 │ │ │ │ await self.run_with_job() │ │ 462 │ │ │ │ return │ │ 463 │ │ else: │ │ 464 │ │ │ logger.warning( …
New extractor setup:
extractors:
  - name: tap-github
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-github.git
    config:
      repositories: [ myrepo ]
      start_date: '2022-01-01'
Running meltano run tap-github target-postgres now gives an Extractor error.
The full output is long, so I'll post some lines that I think may help:
2022-12-14T091712.442948Z [info ] raise FatalAPIError(msg) cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github 2022-12-14T091712.443450Z [info ] singer_sdk.exceptions.FatalAPIError: 404 Client Error: b'Not Found' (Reason: Not Found) for path: /vladimirkys/meltano-first-app/network/dependents cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github
Running meltano run tap-github target-jsonl yields the same error, but I got the data in .jsonl files: assignees.jsonl collaborators.jsonl commits.jsonl contributors.jsonl
2022-12-14T093332.288979Z [info ] raise FatalAPIError(msg) cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github 2022-12-14T093332.289563Z [info ] singer_sdk.exceptions.FatalAPIError: 404 Client Error: b'Not Found' (Reason: Not Found) for path:
v
@vladimir_krivokapic
https://meltano.slack.com/archives/CMN8HELB0/p1671005988831139?thread_ts=1670842341.023229&cid=CMN8HELB0
Yeah, the tutorial doesn't pull in all of the streams. The specific stream we're having an issue with here is the `events` stream, which is why you're seeing an issue that we don't have locally. Yes, it is an issue, but it is isolated to just that stream (I believe), which you could deselect.
psycopg2.errors.DatatypeMismatch: column "parents" is of type jsonb but expression is of type jsonb[]
From the long log at https://meltano.slack.com/archives/CMN8HELB0/p1671006909708769?thread_ts=1670842341.023229&cid=CMN8HELB0 it looks like another error. I'm not sure if that's a tap or target issue without looking at the data. It really should work. I'll have to double check I have the stream correct. @vladimir_krivokapic if you're using the MeltanoLabs target, could you run `meltano install --clean` to get the latest version and run it again? Then post the log again. I just updated the target to show the stream name in the temp table to make life a bit easier for us while debugging.
404 not found
I think the config you have needs the repo name in quotes, so:
extractors:
  - name: tap-github
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-github.git
    config:
      repositories: ["myrepo"]
      start_date: '2022-01-01'
v
Update: Reinstalled both loader and extractor with meltano install --clean and got a better error message this time. Still an extractor error; this seems like a problem on my end. @visch
2022-12-16T07:27:32.226398Z [info     ]     raise RetriableAPIError(msg, response) cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github
2022-12-16T07:27:32.226954Z [info     ] singer_sdk.exceptions.RetriableAPIError: 401 Client Error: b'{"message":"This endpoint requires you to be authenticated.","documentation_url":"https://docs.github.com/graphql/guides/forming-calls-with-graphql#authenticating-with-graphql"}' (Reason: Unauthorized) for path: /graphql cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github
meltano.yml:
plugins:
  extractors:
  - name: tap-github
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-github.git
    config:
      access_token: mytoken
      repositories: ["myrepo"]
      start_date: '2022-01-01'
  loaders:
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
  - name: target-postgres
    variant: default
    pip_url: git+https://github.com/MeltanoLabs/target-postgres.git
    config:
      user: meltano
      password: password
      host: localhost
      database: postgres
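One way to check the token outside of Meltano is to send GitHub's GraphQL endpoint a tiny query with the token in the Authorization header. The sketch below only builds the request and does not send it; build_viewer_request is a hypothetical helper and "mytoken" is the same placeholder as in the config above. How the tap itself passes the token is not shown in this thread, so treat this purely as an independent sanity check.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"


def build_viewer_request(token):
    """Build (but do not send) a GraphQL request asking which user owns the token."""
    body = json.dumps({"query": "query { viewer { login } }"}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_viewer_request("mytoken")  # placeholder, not a real token
print(request.get_method(), request.full_url)

# To actually send it (needs network access and a real token):
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

If the real token is rejected here as well, the problem is with the token rather than with the Meltano configuration.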
v
Note that you can limit your selects to only the streams you want right now, which may or may not require the GraphQL stuff.
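As a rough illustration of how select rules narrow things down, here is a sketch that treats each rule as a glob over "stream.attribute" names and lets "!" rules win over inclusions. That precedence is my understanding of Meltano's behaviour, and is_selected is a hypothetical helper; the real implementation handles more cases (nested attributes, for example).

```python
from fnmatch import fnmatchcase


def is_selected(stream, attribute, rules):
    """Return True if stream.attribute matches an include rule and no '!' rule."""
    name = f"{stream}.{attribute}"
    included = False
    for rule in rules:
        if rule.startswith("!"):
            if fnmatchcase(name, rule[1:]):
                return False  # assumed: an exclusion always wins
        elif fnmatchcase(name, rule):
            included = True
    return included


# Narrow selection: only two attributes of one stream.
narrow = ["commits.sha", "commits.url"]
print(is_selected("commits", "sha", narrow))   # selected
print(is_selected("events", "id", narrow))     # not selected

# Broad selection with exclusions, as in the select block earlier in the thread.
broad = ["!teams.*", "!team_members.*", "!collaborators.*", "*.*"]
print(is_selected("teams", "id", broad))       # excluded
print(is_selected("commits", "sha", broad))    # selected
```

With a narrow rule set, streams that are never selected are simply not requested, which is why trimming the selection can sidestep a failing stream.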
v
I'll try with only commits.sha and commits.url selected and see how it goes. Btw, I tried the original getting started taps and targets with Python 3.9.16 and it worked without any errors.