Hello everyone :waves: I am working through the <G...
# getting-started
d
Hello everyone waves I am working through the Getting Started tutorial. Hoping someone here can help me troubleshoot! The
tap-github
is working perfectly! The
target-postgres
appears to be writing data to a temp table, but not creating the schema or writing data to a real table in my database. I have tested the postgres connection credentials with psql, so I know the database is available. Can anyone advise me where I might look next? Thank you! Logs in thread...
This is what makes me think
target-postgres
is only loading into the temp table, it creates an
INSERT
statement for a table with a UUID name...
Copy code
2024-03-27T22:21:20.169428Z [info     ] 2024-03-27 15:21:20,169 | INFO     | target-postgres      | Target 'target-postgres' completed reading 52 lines of input (1 schemas, 10 records, 0 batch manifests, 41 state messages). cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2024-03-27T22:21:20.242439Z [info     ] 2024-03-27 15:21:20,241 | INFO     | target-postgres.commits | Inserting with SQL: INSERT INTO "5b5f7557_4671_47b5_9fb4_2189abb7d079" (node_id, url, sha, commit_timestamp, _sdc_extracted_at, _sdc_received_at, _sdc_batched_at, _sdc_deleted_at, _sdc_sequence, _sdc_table_version, _sdc_sync_started_at) VALUES (:node_id, :url, :sha, :commit_timestamp, :_sdc_extracted_at, :_sdc_received_at, :_sdc_batched_at, :_sdc_deleted_at, :_sdc_sequence, :_sdc_table_version, :_sdc_sync_started_at) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
I did some further testing and found that I can load from
tap-github
into
target-sqlite
tap-smoke-test
works with
target-sqlite
, but not
target-postgres
, which makes me think the trouble is with my
target-postgres
somehow.
e
@visch have you noticed this ☝️ before? I think someone was running into a similar problem in another thread.
v
I haven't seen this and I've been using the latest version of postgres for a while. I can take a peek tomorrow and double check with GitHub. @Desiree Cox could you share your meltano.yml?
❤️ 1
🙏 1
e
Also fwiw
Copy code
2024-03-27T22:21:20.169428Z [info     ] 2024-03-27 15:21:20,169 | INFO     | target-postgres      | Target 'target-postgres' completed reading 52 lines of input (1 schemas, 10 records, 0 batch manifests, 41 state messages). cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
those are way too many state messages 😅, so it's probably worth understanding what's going on
❤️ 1
💯 2
😬 1
d
Wow, y'all are amazing! Thank you for taking a look. Here's my config:
Copy code
version: 1
default_environment: dev
project_id: 32ff6ca8-5729-44ca-99b1-c62dad5a4ad7
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-github
    variant: meltanolabs
    pip_url: git+<https://github.com/MeltanoLabs/tap-github.git>
    config:
      repositories:
      - myCompany/myRepo
      start_date: '2024-03-26T00:00:00'
    select:
    - commits.*
  - name: tap-smoke-test
    variant: meltano
    pip_url: git+<https://github.com/meltano/tap-smoke-test.git>
    config:
      streams: [{stream_name: animals, input_filename:
            <https://gitlab.com/meltano/tap-smoke-test/-/raw/main/demo-data/animals-data.jsonl}>]
  loaders:
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
  - name: target-postgres
    variant: meltanolabs
    pip_url: meltanolabs-target-postgres
    config:
      user: meltano
      database: postgres
      host: localhost

  - name: target-sqlite
    variant: meltanolabs
    pip_url: git+<https://github.com/MeltanoLabs/target-sqlite.git>
    config:
      database: targetsqlite
Versions. I did
meltano add target-postgres
twice to be sure, but I'm wondering if I should be doing
pipx
instead.
Copy code
❯ meltano --version
meltano, version 3.3.2

❯ meltano invoke target-postgres --version
target-postgres v[could not be detected], Meltano SDK v0.35.2

❯ meltano invoke tap-github --version
tap-github v1.3.0, Meltano SDK v0.30.0
v
Logs
This works for me, I also get 0 records (because there hasn't been any commits on the repo since 2024-03-26 The big difference is I don't get the temp table log, and I do get a table in postgres that's empty. Maybe run meltano install --clean just to be sure you're on the latest versions of things?
🙌 1
Few things on the tap-github repo @Edgar Ramírez (Arch.dev) State issue Is at #4 1. SDK Version is old - https://github.com/MeltanoLabs/tap-github/issues/248 - Doesn't fix the state issue (PR up) 2. Added another comment to https://github.com/MeltanoLabs/tap-github/issues/222 - Auth tokens error is hard to grok quickly 3. https://github.com/MeltanoLabs/tap-github/issues/249 - Lock file for jsonl not there (PR up) 4. Reason for all the state messages. The
repository
stream is the parent to a bunch of streams (
commits
included) , when https://github.com/meltano/sdk/blob/acb74875542af9f077e6395b17aa7237c2648975/singer_sdk/tap_base.py#L468 (
finalize_state_progres_markers
) is called each
child_stream
gets looped through for the
repository
stream. https://github.com/meltano/sdk/blob/acb74875542af9f077e6395b17aa7237c2648975/singer_sdk/streams/core.py#L1001 so all of the streams that are descendants of repository get their state written even though they aren't selected. Maybe this is a bug in the SDK? We could also check to be sure the stream is selected here https://github.com/meltano/sdk/blob/acb74875542af9f077e6395b17aa7237c2648975/singer_sdk/streams/core.py#L1000 instead of just printing them all, but maybe there's some reason for it (I don't think so?)
d
This was working perfectly, the issue was my end, I had to
set search_path to public, tap_github
when inspecting my postgresdb. Thank you so much for the repro, tips, and investigation!
v
@Desiree Cox can you be more specific here, what did you set exactly? Doesn't make sense to me that you'd have to do that
d
Sure! So I was inspecting my postgres db like this:
Copy code
psql -d postgres -U meltano -W
postgres=# \dt
Did not find any relations.
So I was like, ah, no tables created. But then I thought to look at the schemas and I saw it was indeed creating the
tap_github
schema. So I added that schema to my
search_path
and then
\dt
worked to show the
commits
table.
Copy code
postgres=# \dn
        List of schemas
    Name    |       Owner
------------+-------------------
 public     | pg_database_owner
 tap_github | meltano
(2 rows)

postgres=# set search_path to public, tap_github;
SET
postgres=# \dt
           List of relations
   Schema   |  Name   | Type  |  Owner
------------+---------+-------+---------
 tap_github | commits | table | meltano
(1 row)
I figured it out by switching to the transferwise variant for the postgres target—it was saying stuff like
table 'commits' already created
, so I was like, hm, it's there but I'm not seeing it.
v
Got it you're using
psql
and having an issue looking the the different schemas available! I hop into
psql
once in a while, but have been a fan of dbeaver as I tend to need to see things and a gui is nice 🤷
d
Oh neat! My team uses DataSpell so I was trying there as well, but having similar visibility issues. I'll check out dbeaver, thank you.