# troubleshooting
t
Hi everyone, I’m new to Meltano and made my first `tap-gitlab` to `target-jsonl` pipeline work, but I’m now having trouble with `target-postgres`. The connection is up, but I’m getting `Loader failed`. The traceback seems to hold pretty useless information, but in the log before that, I stumbled over the following lines:
2023-01-15T07:04:47.084980Z [info     ]   File "/workspaces/engineering-metrics/meltano/engineering-metrics/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/singer_sdk/connectors/sql.py", line 668, in prepare_table cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.087739Z [info     ]     self.create_empty_table(   cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.088341Z [info     ]   File "/workspaces/engineering-metrics/meltano/engineering-metrics/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/target_postgres/connector.py", line 140, in create_empty_table cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.089901Z [info     ]     self.to_sql_type(property_jsonschema), cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.090177Z [info     ]   File "/workspaces/engineering-metrics/meltano/engineering-metrics/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/target_postgres/connector.py", line 89, in to_sql_type cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.098893Z [info     ]     if "integer" in jsonschema_type["type"]: cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.099580Z [info     ] KeyError: 'type'               cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.215360Z [error    ] Loader failed
It looks to me as if the loader is trying to create a new table but doesn’t pass a proper schema, which makes the type lookup fail with `KeyError: 'type'`.
Here is my `meltano.yml`, for what it’s worth:
version: 1
default_environment: dev
project_id: 3d9554d1-42dd-42d0-b2e1-914ca4237494
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-gitlab
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-gitlab.git
    config:
      api_url: https://gitlab.company.com
      groups: group1 group2
      start_date: '2022-12-01'
    select:
    - commits.url
    - commits.created_at
    - commits.sha
    - commits.author_name
    - commits.authored_date
    - commits.message
  loaders:
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
  - name: target-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/target-postgres.git
    config:
      user: meltano
      database: postgres
      add_metadata_columns: 'True'
      host: 192.168.123.45
I got one step further by running the job with debug flags on and found this:
{
  "type": "SCHEMA",
  "stream": "commits",
  "schema": {
    "properties": {
      "id": {
        "type": [
          "null",
          "string"
        ]
      },
      "project_id": {
        "type": [
          "null",
          "integer"
        ]
      },
      "short_id": {
        "type": [
          "null",
          "string"
        ]
      },
      "title": {
        "type": [
          "null",
          "string"
        ]
      },
      "author_name": {
        "type": [
          "null",
          "string"
        ]
      },
      "author_email": {
        "type": [
          "null",
          "string"
        ]
      },
      "authored_date": {
        "anyOf": [
          {
            "type": "string",
            "format": "date-time"
          },
          {
            "type": "null"
          }
        ]
      },
      "committer_name": {
        "type": [
          "null",
          "string"
        ]
      },
      "committer_email": {
        "type": [
          "null",
          "string"
        ]
      },
      "committed_date": {
        "anyOf": [
          {
            "type": "string",
            "format": "date-time"
          },
          {
            "type": "null"
          }
        ]
      },
      "created_at": {
        "anyOf": [
          {
            "type": "string",
            "format": "date-time"
          },
          {
            "type": "null"
          }
        ]
      },
      "message": {
        "type": [
          "null",
          "string"
        ]
      },
      "allow_failure": {
        "type": [
          "null",
          "boolean"
        ]
      },
      "parent_ids": {
        "anyOf": [
          {
            "type": "array",
            "items": {
              "type": [
                "null",
                "string"
              ]
            }
          },
          {
            "type": "null"
          }
        ]
      },
      "stats": {
        "properties": {
          "additions": {
            "type": [
              "null",
              "integer"
            ]
          },
          "deletions": {
            "type": [
              "null",
              "integer"
            ]
          },
          "total": {
            "type": [
              "null",
              "integer"
            ]
          }
        },
        "type": "object"
      }
    },
    "type": "object"
  },
  "key_properties": [
    "id"
  ]
}
The `"type": ["null", …` seems off.
Slowly making progress here: the `null` is probably not the issue, since the error complains about `type` not being a key in the dict. After scanning through the above, I found that some `type`s are nested below an `anyOf`, which the implementation probably doesn’t handle.
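A minimal sketch of that diagnosis, assuming a trimmed copy of the `commits` schema from the debug output. The helper name `properties_without_type` is hypothetical and not part of target-postgres or the Singer SDK; it only lists the properties whose `type` is hidden inside an `anyOf`, then repeats the lookup from the traceback to show the same `KeyError`.

```python
from __future__ import annotations


def properties_without_type(schema: dict, path: str = "") -> list[str]:
    """Return dotted paths of properties that have no top-level "type" key."""
    missing = []
    for name, prop in schema.get("properties", {}).items():
        prop_path = f"{path}.{name}" if path else name
        if "type" not in prop:
            missing.append(prop_path)
        # Recurse into nested objects such as "stats".
        missing.extend(properties_without_type(prop, prop_path))
    return missing


# Trimmed copy of the "commits" schema from the debug output.
commits_schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["null", "string"]},
        "authored_date": {
            "anyOf": [{"type": "string", "format": "date-time"}, {"type": "null"}]
        },
        "parent_ids": {
            "anyOf": [
                {"type": "array", "items": {"type": ["null", "string"]}},
                {"type": "null"},
            ]
        },
        "stats": {
            "type": "object",
            "properties": {"total": {"type": ["null", "integer"]}},
        },
    },
}

print(properties_without_type(commits_schema))

# The same kind of lookup the traceback shows failing in to_sql_type.
try:
    _ = "integer" in commits_schema["properties"]["authored_date"]["type"]
except KeyError as exc:
    print("KeyError:", exc)
```

Run against the full schema, this would presumably also flag `committed_date` and `created_at`, since they use the same `anyOf` shape.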
I just found this on https://github.com/datamill-co/target-postgres: “JSON Schema combinations such as anyOf and oneOf are not supported.”
so now the obvious follow-up questions: Are tap-gitlab and target-postgres incompatible? Can I change that schema somehow?
@visch ok, I switched over to this thread.
Note that to work around this you can use the `schema` extra in Meltano; see https://docs.meltano.com/concepts/plugins#schema-extra
Or just not select those few fields that have the issue 🤷
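One way to work out what to put into the `schema` extra is to collapse each simple `anyOf` into a single schema with a list of types. The sketch below is an illustration only: `flatten_any_of` and `schema_overrides` are hypothetical helpers, and it assumes every `anyOf` branch carries its own `type`, as in the `commits` schema above. Whether an override fully replaces the original `anyOf` in the catalog is an assumption worth checking against the linked docs.

```python
import json


def flatten_any_of(prop: dict) -> dict:
    """Collapse a simple anyOf of typed branches into one schema with a type list."""
    if "anyOf" not in prop:
        return prop
    types = []
    merged = {}
    for branch in prop["anyOf"]:
        branch_types = branch.get("type", [])
        if isinstance(branch_types, str):
            branch_types = [branch_types]
        for type_name in branch_types:
            if type_name not in types:
                types.append(type_name)
        # Carry over other keywords such as "format" or "items".
        for key, value in branch.items():
            if key != "type":
                merged.setdefault(key, value)
    merged["type"] = types
    return merged


def schema_overrides(stream_schema: dict) -> dict:
    """Map each anyOf property of a stream schema to its flattened replacement."""
    return {
        name: flatten_any_of(prop)
        for name, prop in stream_schema.get("properties", {}).items()
        if "anyOf" in prop
    }


# Two of the affected properties from the "commits" stream.
commits_schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["null", "string"]},
        "authored_date": {
            "anyOf": [{"type": "string", "format": "date-time"}, {"type": "null"}]
        },
        "created_at": {
            "anyOf": [{"type": "string", "format": "date-time"}, {"type": "null"}]
        },
    },
}

print(json.dumps(schema_overrides(commits_schema), indent=2))
```

The printed mapping is keyed by property name, so it could then be written by hand under the extractor's `schema` extra for the `commits` stream in `meltano.yml`.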
t
The schema extra sounds pretty much like what I need!
thanks for the help!
v
Thank you for the detail, this should "just work" as well and shouldn't be an issue you have to deal with 😄
t
What are you aiming to fix in target-postgres? To make it work with `anyOf`?
v
Exactly!
t
cool
v
I thought the test suite had a test for that, but obviously not! (I'm an editing addict today)
t
It’s even described in the repo that it is not working with `anyOf`, so it sounds to me more like a new feature than a fix.
v
From your `meltano.yml` file I think you're using the `meltanolabs` variant, and the repo link you sent is for the `datamill` variant 🤷
t
oh, true
so that was supposed to be working already. Still finding my way around here 😉
v
Well, it should be working. There is a `transferwise` variant as well that you could try. I'm going to vouch for the `meltanolabs` variant, as we've put a bunch of work into it, but obviously there are still some bugs left!
t
I’ll stick with the workaround for now and will be looking out for your fix
@visch I made quite some progress (a small data set got loaded successfully), but when I’m adding new streams, I’m not always getting the JSON Schema from the debug/logging info. Is there a way to get that directly via CLI?
Never mind. I found a way by logging the output of `meltano invoke tap-gitlab`.
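A small sketch of how the schemas could be pulled out of that output, assuming the tap writes one Singer message per line as JSON. The function name `iter_schemas` and the sample lines are made up for illustration; the filter keeps only messages whose `type` is `SCHEMA` and skips anything that is not valid JSON, such as log lines.

```python
import json


def iter_schemas(lines):
    """Yield (stream, schema) for every Singer SCHEMA message in the given lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # Not a Singer message, e.g. a log line.
        if isinstance(message, dict) and message.get("type") == "SCHEMA":
            yield message["stream"], message["schema"]


# Hypothetical sample of tap output: one SCHEMA, one RECORD, one log line.
sample_output = [
    '{"type": "SCHEMA", "stream": "commits", "schema": {"type": "object"}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "commits", "record": {"id": "abc"}}',
    "some log line that is not JSON",
]

for stream, schema in iter_schemas(sample_output):
    print(f"--- {stream}")
    print(json.dumps(schema, indent=2))
```

To use it on real output, the same function could be fed `sys.stdin` instead of `sample_output`, with the output of `meltano invoke tap-gitlab` piped into the script.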