# troubleshooting
t
Hi all! We have an array-type column in PostgreSQL and we're trying to sync it to BigQuery. I'm overriding the schema using:
{
  "streams": {
    "public-NAME_OF_THE_TABLE": {
      "force_fields": {
        "NAME_OF_THE_COLUMN": {
          "type": "STRING",
          "mode": "REPEATED"
        }
      }
    }
  }
}
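(For reference, what this override asks target-bigquery for is a repeated string column. In google-cloud-bigquery terms the intended field would be something like the sketch below; the column name is the placeholder from the thread.)

from google.cloud import bigquery

# The BigQuery field the force_fields override above is aiming for:
# an array of strings, i.e. a STRING column in REPEATED mode.
field = bigquery.SchemaField("NAME_OF_THE_COLUMN", "STRING", mode="REPEATED")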
and the schema is:
public-NAME_OF_THE_TABLE schema: {
    'type': 'object', 
    'properties': {
        'id': {
            'type': ['integer'], 'minimum': -9223372036854775808, 'maximum': 9223372036854775807
        },
        'NAME_OF_THE_COLUMN': {
            'type': ['null', 'array'], 
            'items': {'$ref': '#/definitions/sdc_recursive_string_array'}
        },
    }
}
Unfortunately the load is still failing with this error:
CRITICAL `$ref` path "
{
    'type': ['null', 'string', 'array'], 
    'items': {'$ref': '#/definitions/sdc_recursive_string_array'}
}" is recursive
Does anyone have an example of a successful sync of array-type columns using tap-postgres and target-bigquery?
e
Hi @TomasB! There are a few things to unpack.
• How are you overriding the schema of the tap? Are you using the schema extra, or a custom catalog file?
• What does the schema of #/definitions/sdc_recursive_string_array look like?
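(For reference, the custom catalog route means dumping the discovered catalog, editing it by hand, and pointing Meltano at the edited file, roughly like this; exact flags may vary by Meltano version:)

meltano invoke tap-postgres --discover > extract/catalog.json
# edit extract/catalog.json, then run the pipeline against it:
meltano elt tap-postgres target-bigquery --catalog extract/catalog.json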
t
Hmm, I have only overridden the loader. Should I override the schema of the tap?
e
Hmm, I have only overridden the loader
I don’t understand what that means. Can you describe how you did that?
t
The example is above in the thread
{
  "streams": {
    "public-NAME_OF_THE_TABLE": {
      "force_fields": {
        "NAME_OF_THE_COLUMN": {
          "type": "STRING",
          "mode": "REPEATED"
        }
      }
    }
  }
}
e
But that’s the tap schema, right?
To be clear: tap = extractor, target = loader.
t
No, that's the config for the target.
e
Oh, I didn’t know the target supported its own schema overrides 🙏.
Ok, so how are you setting that config for the target? In meltano.yml?
t
No, a .json file in a target-configs directory that we created in the Meltano project. The target config override works for some other columns that were coming through as RECORD and forced them to STRING, but the array one isn't working with the setup above.
e
Ok, so if force_fields is working for other fields, I'd:
• Look out for typos in the column name
• Dive into the target-bigquery venv under .meltano/loaders/target-bigquery/venv/lib/pythonX.Y/site-packages/target_bigquery/ and tweak the code to emit a log message here (rough sketch below): https://github.com/adswerve/target-bigquery/blob/74ea806e1c681bd5d731b96e2ae8cc6f04c8ad9a/target_bigquery/schema.py#L359-L365
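(A sketch of that tweak: a temporary log line in the force_fields branch. All names below are stand-ins for illustration; match them to whatever is actually at the linked lines:)

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("target_bigquery.schema")

# Hypothetical stand-ins for the variables at the linked lines:
force_fields = {"NAME_OF_THE_COLUMN": {"type": "STRING", "mode": "REPEATED"}}
name = "NAME_OF_THE_COLUMN"

# Log whether the override lookup actually matches the field being built.
if name in force_fields:
    logger.info("force_fields override hit for %s: %s", name, force_fields[name])
else:
    logger.info("no force_fields entry for %s", name)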
t
I didn't find any typo. Is it possible to override the schema for the extractor? I tried it, but it doesn't get picked up.
This is what I tried:
- name: tap-postgres
  config:
    schema:
      public-NAME_OF_THE_TABLE:
        NAME_OF_THE_COLUMN:
        - type: ['null', array]
          items:
            type: string
  metadata:
    public-NAME_OF_THE_TABLE:
      NAME_OF_THE_COLUMN:
      - type: ['null', array]
        items:
          type: string
e
I think you have an array there instead of an object. Try:
- name: tap-postgres
  config:
    schema:
      public-NAME_OF_THE_TABLE:
        NAME_OF_THE_COLUMN:
          type: ['null', array]
          items:
            type: string
  metadata:
    public-NAME_OF_THE_TABLE:
      NAME_OF_THE_COLUMN:
        type: ['null', array]
        items:
          type: string
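(Worth checking, too: per the Meltano docs, schema and metadata are plugin extras, so they belong at the plugin level as siblings of config, not nested under it. Assuming that reading, the schema override would look like this:)

- name: tap-postgres
  config:
    # tap settings only in here
  schema:  # plugin-level extra, sibling of config
    public-NAME_OF_THE_TABLE:
      NAME_OF_THE_COLUMN:
        type: ['null', 'array']
        items:
          type: string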
t
That didn't work either. I'm seeing the same behavior as described here: https://meltano.slack.com/archives/C01TCRBBJD7/p1662537428796579
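(One way to see where the override gets lost: inspect the catalog Meltano actually hands to the tap. The path below is where Meltano has typically written the generated catalog; adjust it if your version puts the file elsewhere:)

import json

# Load the catalog Meltano generated for the last run of tap-postgres.
with open(".meltano/run/tap-postgres/tap.properties.json") as f:
    catalog = json.load(f)

# Print the schema of the problem column; it should reflect the override.
for stream in catalog["streams"]:
    if stream["tap_stream_id"] == "public-NAME_OF_THE_TABLE":
        print(json.dumps(stream["schema"]["properties"]["NAME_OF_THE_COLUMN"], indent=2))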
z
Any update here? I'm getting the same issue with target-redshift. It appears to be an issue with varchar[] columns in Postgres.