# troubleshooting
t
Hi all! We have an array-type column in PostgreSQL and we're trying to sync it to BigQuery. I'm overriding the schema using:
{
  "streams": {
    "public-NAME_OF_THE_TABLE": {
      "force_fields": {
        "NAME_OF_THE_COLUMN": {
          "type": "STRING",
          "mode": "REPEATED"
        }
      }
    }
  }
}
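(For reference, what this override asks target-bigquery for is a repeated string column. In google-cloud-bigquery terms the intended field would be something like the sketch below; the column name is the placeholder from the thread.)

from google.cloud import bigquery

# The BigQuery field the force_fields override above is aiming for:
# an array of strings, i.e. a STRING column in REPEATED mode.
field = bigquery.SchemaField("NAME_OF_THE_COLUMN", "STRING", mode="REPEATED")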
and the schema is:
public-NAME_OF_THE_TABLE schema: {
    'type': 'object', 
    'properties': {
        'id': {
            'type': ['integer'], 'minimum': -9223372036854775808, 'maximum': 9223372036854775807
        },
        'NAME_OF_THE_COLUMN': {
            'type': ['null', 'array'], 
            'items': {'$ref': '#/definitions/sdc_recursive_string_array'}
        },
    }
}
Unfortunately the load is still failing with this error:
CRITICAL `$ref` path "
{
    'type': ['null', 'string', 'array'], 
    'items': {'$ref': '#/definitions/sdc_recursive_string_array'}
}" is recursive
Does anyone have an example of a successful sync of array-type columns using tap-postgres and target-bigquery?
e
Hi @TomasB! There are a few things to unpack.
• How are you overriding the schema of the tap? Are you using the schema extra, or a custom catalog file?
• What does the schema of #/definitions/sdc_recursive_string_array look like?
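(For reference, the custom catalog route means dumping the discovered catalog, editing it by hand, and pointing Meltano at the edited file, roughly like this; exact flags may vary by Meltano version:)

meltano invoke tap-postgres --discover > extract/catalog.json
# edit extract/catalog.json, then run the pipeline against it:
meltano elt tap-postgres target-bigquery --catalog extract/catalog.json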
t
Hmm, I have only overridden the loader. Should I override the schema of the tap?
e
Hmm, I have only overridden the loader
I don’t understand what that means. Can you describe how you did that?
t
The example is above in the thread
{
  "streams": {
    "public-NAME_OF_THE_TABLE": {
      "force_fields": {
        "NAME_OF_THE_COLUMN": {
          "type": "STRING",
          "mode": "REPEATED"
        }
      }
    }
  }
}
e
But that’s the tap schema, right?
To be clear: tap = extractor, target = loader.
t
No, that's the config for the target.
e
Oh, I didn’t know the target supported its own schema overrides 🙏.
Ok, so how are you setting that config for the target? In meltano.yml?
t
No, a .json file in a target-configs directory that we created in the Meltano project. The target config override works for some other columns that were coming through as RECORD and forced them to STRING, but the array one isn't working with the setup above.
e
Ok, so if force_fields is working for other fields, I'd:
• Look out for typos in the column name
• Dive into the target-bigquery venv under .meltano/loaders/target-bigquery/venv/lib/pythonX.Y/site-packages/target_bigquery/ and tweak the code to emit a log message here (rough sketch below): https://github.com/adswerve/target-bigquery/blob/74ea806e1c681bd5d731b96e2ae8cc6f04c8ad9a/target_bigquery/schema.py#L359-L365
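(A sketch of that tweak: a temporary log line in the force_fields branch. All names below are stand-ins for illustration; match them to whatever is actually at the linked lines:)

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("target_bigquery.schema")

# Hypothetical stand-ins for the variables at the linked lines:
force_fields = {"NAME_OF_THE_COLUMN": {"type": "STRING", "mode": "REPEATED"}}
name = "NAME_OF_THE_COLUMN"

# Log whether the override lookup actually matches the field being built.
if name in force_fields:
    logger.info("force_fields override hit for %s: %s", name, force_fields[name])
else:
    logger.info("no force_fields entry for %s", name)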
t
I didn't find any typo. Is it possible to override the schema for the extractor? I tried it, but it doesn't get picked up.
This is what I tried:
- name: tap-postgres
  config:
    schema:
      public-NAME_OF_THE_TABLE:
        NAME_OF_THE_COLUMN:
        - type: ['null', array]
          items:
            type: string
  metadata:
    public-NAME_OF_THE_TABLE:
      NAME_OF_THE_COLUMN:
      - type: ['null', array]
        items:
          type: string
e
I think you have an array there instead of an object. Try:
- name: tap-postgres
  config:
    schema:
      public-NAME_OF_THE_TABLE:
        NAME_OF_THE_COLUMN:
          type: ['null', array]
          items:
            type: string
  metadata:
    public-NAME_OF_THE_TABLE:
      NAME_OF_THE_COLUMN:
        type: ['null', array]
        items:
          type: string
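(Worth checking, too: per the Meltano docs, schema and metadata are plugin extras, so they belong at the plugin level as siblings of config, not nested under it. Assuming that reading, the schema override would look like this:)

- name: tap-postgres
  config:
    # tap settings only in here
  schema:  # plugin-level extra, sibling of config
    public-NAME_OF_THE_TABLE:
      NAME_OF_THE_COLUMN:
        type: ['null', 'array']
        items:
          type: string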
t
That didn't work either. I'm seeing the same behavior as described here: https://meltano.slack.com/archives/C01TCRBBJD7/p1662537428796579
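(One way to see where the override gets lost: inspect the catalog Meltano actually hands to the tap. The path below is where Meltano has typically written the generated catalog; adjust it if your version puts the file elsewhere:)

import json

# Load the catalog Meltano generated for the last run of tap-postgres.
with open(".meltano/run/tap-postgres/tap.properties.json") as f:
    catalog = json.load(f)

# Print the schema of the problem column; it should reflect the override.
for stream in catalog["streams"]:
    if stream["tap_stream_id"] == "public-NAME_OF_THE_TABLE":
        print(json.dumps(stream["schema"]["properties"]["NAME_OF_THE_COLUMN"], indent=2))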
z
Any update here? I'm getting the same issue with target-redshift. It appears to be an issue with varchar[] columns in Postgres.