Hey everyone, how to set primary key properties in...
# troubleshooting
r
Hey everyone, how to set primary key properties in a stream (transferwise-tap-postgres to transferwise-target-postgres). I am facing the following issue
Primary key is set to mandatory but not defined in the XXXX stream. Exception: key_properties field is required
.
c
You can set it in the catalog.json file but also when defining the schema of your stream: https://hub.meltano.com/singer/spec#schemas
d
@revanth_peddi tap-postgres should detect primary keys automatically. Does the table you’re pulling from have a primary key set?
r
The primary_key isn’t
id
, it’s some custom column name
d
So the source tables all have primary keys? They don’t have to be
id
. What is the type of the primary key?
r
it something like schemaid and it’s data type is int
What if I want to use combination of two columns as my primary key?
I was trying
meltano config tap-postgres-XXXX set _metadata public-XXXX table-key-properties '["XXXX", "XXXX"]'
d
I was just looking into that as well 🙂
Can you share what the result of setting that looks like in
meltano.yml
, so that I can verify it looks correct?
r
Copy code
- name: tap-postgres-xxxxx
    inherit_from: tap-postgres
    model_name:
    config:
      port:
      user:
      dbname:
      default_replication_method: LOG_BASED
      host:
      ssl:
      streams:
      - name: stream_xxxxx
        table: public-xxxxx
        datetime_key: updatedat
    select:
    - public-xxxxx
    metadata:
      public-xxxxx:
        replication-key: updatedat
        table-key-properties:
        - schemaid
d
Ok that looks like it should work
Let me know if it does 🙂
r
It isn’t working
d
Still the same error? Do the
table-key-properties
show up in
meltano elt … --dump=catalog
as expected?
r
No
Copy code
"breadcrumb": [],
          "metadata": {
            "table-key-properties": [],
d
Weird. You’re not providing an explicit
catalog: path
in
meltano.yml
, are you?
I see that
tap-postgres-xxxxx
inherits from
tap-postgres
, is that defined separately?
r
No, I am not providing any catalog path
yes, it’s defined seperately
d
And
public-xxxxx
matches the
tap_stream_id
for the stream in question?
r
yes tap_stream_id and
public-xxxxx
matches
d
Is the
select: [public-xxxxx]
resulting in the
selected:true
metadata being set correctly on that stream, but not any others?
I don’t know why
select
would work but
metadata
wouldn’t 😕
r
Now, i see the table-key-properties in catalog file but the error is persistant
Copy code
[2021-07-15 17:02:25,034] {subprocess.py:79} INFO - target-postgres-staging | time=2021-07-15 17:02:25 name=target_postgres level=CRITICAL message=Primary key is set to mandatory but not defined in the [public-XXXXX] stream
[2021-07-15 17:02:25,036] {subprocess.py:79} INFO - target-postgres-staging | Traceback (most recent call last):
[2021-07-15 17:02:25,037] {subprocess.py:79} INFO - target-postgres-staging |   File "/project/.meltano/loaders/target-postgres-staging/venv/bin/target-postgres", line 8, in <module>
[2021-07-15 17:02:25,037] {subprocess.py:79} INFO - target-postgres-staging |     sys.exit(main())
[2021-07-15 17:02:25,038] {subprocess.py:79} INFO - target-postgres-staging |   File "/project/.meltano/loaders/target-postgres-staging/venv/lib/python3.6/site-packages/target_postgres/__init__.py", line 373, in main
[2021-07-15 17:02:25,039] {subprocess.py:79} INFO - target-postgres-staging |     persist_lines(config, singer_messages)
[2021-07-15 17:02:25,039] {subprocess.py:79} INFO - target-postgres-staging |   File "/project/.meltano/loaders/target-postgres-staging/venv/lib/python3.6/site-packages/target_postgres/__init__.py", line 209, in persist_lines
[2021-07-15 17:02:25,039] {subprocess.py:79} INFO - target-postgres-staging |     raise Exception("key_properties field is required")
[2021-07-15 17:02:25,039] {subprocess.py:79} INFO - target-postgres-staging | Exception: key_properties field is required
[2021-07-15 17:02:25,102] {subprocess.py:79} INFO - meltano                 | Loading failed (1): Exception: key_properties field is required
d
That’s very weird. Can you run with
meltano --log-level=debug elt …
and look for the line that starts with
tap-postgres-xxxxx (out)
that writes the
SCHEMA
message that triggers that error in the target? I want to see if
key_properties
is actually missing/wrong
r
tap-postgres-XXXXX (out) | {“type”: “STATE”, “value”: {“currently_syncing”: “fund-msfund”, “bookmarks”: {“public-XXXXXX”: {“last_replication_method”: “LOG_BASED”, “lsn”: 5688452019168, “version”: 1626261638498, “xmin”: null}, “public-XXXXXX”: {“last_replication_method”: “LOG_BASED”, “lsn”: 5688452019168, “version”: 1626261649757, “xmin”: null}, “public-XXXXXX”: {“last_replication_method”: “FULL_TABLE”, “version”: 1626346077584, “xmin”: null}, “public-XXXXXX”: {“last_replication_method”: “LOG_BASED”, “lsn”: 5688533226336, “version”: 1626370711148}}}}
d
Do you see a
SCHEMA
?
r
tap-postgres-XXXXX (out) | {“type”: “SCHEMA”, ...... ,“key_properties”: [], “bookmark_properties”: []}
here it’s blank
but in catalog I see it as
Copy code
{
      "table_name": "XXXXX",
      "stream": "XXXXX",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "table-key-properties": [
              "schemaid"
            ],
I think I know what’s going on, @edward_smith ran into the same thing recently: https://github.com/transferwise/pipelinewise-tap-postgres/issues/101
If you point your
pip_url
for
tap-postgres
at his PR branch, it should work as expected: https://github.com/transferwise/pipelinewise-tap-postgres/pull/102
I feel a bit dumb for not having made the connection until now, since he and I were talking about the exact same issue a few weeks ago!
r
I have pointed it to PR branch, but stream is skipping the table
Copy code
There are no columns selected for stream public-XXXXX, , skipping it
d
That’s odd. @edward_smith Any idea what may be going on here?
e
Hi @revanth_peddi, I've been experiencing this same issue, and actually plan to spend today working on it. The branch @douwe_maan pointed you two actually fixes a separate problem I was facing that deals with whether or not fields are marked as
selected
. So, two things I'd like to work through: (1) The problem of overriding the primary key of a postgres table to that the target understands how to eliminate duplicates when inserting. Is this your use case? (2) Why are you getting no selected columns when using my branch? The default behavior of pipelinewise-tap-postgres seems to be select all tables, regardless of the
selected
property in the catalog. My branch ensures that non-discoverable metadata is retained in the catalog after discovery. Can you double check your meltano.yml for me? Above you have:
Copy code
select:
    - public-xxxxx
But I think you need this to be:
Copy code
select:
    - public-xxxxx.*
s
hey @douwe_maan @edward_smith thanks for all the help! I'm working with @revanth_peddi on this issue and have a failing test that might help debug the issue. I think the problem is with
key-properties
metadata attribute containing a list, rather than a literal value. What do you think?
e
This is good work! I have thought that the combination key may also be a problem. When this fails, what is the value of
table-key-properties
?
s
It is empty
[]
, because its picking it up from the discovery metadata.
e
Are you setting an override value in
meltano.yml
?
s
I'm setting the override value in catalog.json and passing the catalog using the
--catalog
argument like this
meltano invoke tap-postgres --catalog catalog.json