Meltano #troubleshooting

jonas_kalderstam

01/13/2022, 10:32 AM

Hi, I'm just getting started with Meltano to evaluate different ELT tools. I'm running into an issue with target-bigquery. Trying to load data from Postgres into BigQuery and getting this error:

2022-01-13T10:29:57.794779Z [info     ] ERROR failed to load table t_public-e_address_9bf781de077049efa9ab87068bb1e228 from file: 400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/marketing-333110/jobs?uploadType=resumable: Field point is type RECORD but has no schema cmd_type=loader job_id=yobify-to-bigquery name=target-bigquery run_id=8125ef19-fb9f-47a6-ad34-5f59831972c0 stdio=stderr
2022-01-13T10:29:57.795270Z [info     ] CRITICAL 400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/marketing-333110/jobs?uploadType=resumable: Field point is type RECORD but has no schema cmd_type=loader job_id=yobify-to-bigquery name=target-bigquery run_id=8125ef19-fb9f-47a6-ad34-5f59831972c0 stdio=stderr

It complains that public-e_address.point has no schema. But the strange bit is that it is listed as excluded:

$ meltano select tap-postgres --list --all | grep point
[...]
	[excluded ] public-e_address.point
[...]

So I don't understand why it is even uploading this field.

jonas_kalderstam

01/13/2022, 12:03 PM

Looking at the tap.properties.json, I see that point is included:

"point": {},

but without any type defined. The database type is a PostGIS geometry so it's not surprising there is no support for it.
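
A quick way to spot such columns is to scan the catalog for properties whose schema is empty. This is a minimal sketch, assuming the usual Singer catalog layout (a top-level "streams" list with a "schema.properties" object per stream). The function name and the sample catalog are hypothetical, not taken from the actual tap.properties.json:

```python
import json


def find_untyped_properties(catalog):
    """Return (stream, property) pairs whose JSON schema is empty ({})."""
    untyped = []
    for stream in catalog.get("streams", []):
        # tap_stream_id is the Singer identifier; fall back to "stream" if absent.
        stream_id = stream.get("tap_stream_id") or stream.get("stream")
        properties = stream.get("schema", {}).get("properties", {})
        for name, schema in properties.items():
            if not schema:  # an empty dict carries no type information
                untyped.append((stream_id, name))
    return untyped


# Hypothetical catalog excerpt modelled on the "point": {} entry above.
sample_catalog = json.loads("""
{
  "streams": [
    {
      "tap_stream_id": "public-e_address",
      "schema": {
        "type": "object",
        "properties": {
          "id": {"type": ["integer"]},
          "point": {}
        }
      }
    }
  ]
}
""")

print(find_untyped_properties(sample_catalog))
```

In practice the catalog would be loaded from the file with json.load instead of the inline sample.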

jonas_kalderstam

01/13/2022, 12:51 PM

So the Postgres tap always generates a complete schema, even when you exclude some columns from the actual data. Fine. So I tried to amend the schema in tap-postgres with

schema:
  '*':
    point:
      type: 'null'
      default: null

(Also tried string and such.) But the result is the same. Looking in the run dir, the metadata for the tap matches what I've configured. But the log output when running ELT shows that an empty schema is transmitted:

2022-01-13T12:49:57.192021Z [info     ] INFO public-e_address schema: { [...] 'point': {} [...]

jonas_kalderstam

01/13/2022, 1:26 PM

right. so this appears to be https://gitlab.com/meltano/meltano/-/issues/2469

jonas_kalderstam

01/13/2022, 2:39 PM

And there is an unmerged fix (since October!) here: https://github.com/transferwise/pipelinewise-tap-postgres/pull/129. Using that fork fixes the schema, and now it works.

taylor

01/13/2022, 4:06 PM

Yeah, the issue you linked to would be the fix for that. We're soon releasing the stream maps functionality, which could intercept these messages and apply the fix you want, but ideally the tap would respect the catalog!
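
To illustrate what intercepting the messages means: a mapper sits between tap and target and rewrites the Singer messages as they pass through. The following is a rough sketch of the idea in plain Python, not the stream maps implementation itself. The helper drop_property and the sample messages are hypothetical:

```python
import json


def drop_property(line, stream, prop):
    """Remove `prop` from SCHEMA and RECORD messages of `stream`.

    All other messages (other streams, STATE, ...) pass through unchanged.
    """
    message = json.loads(line)
    if message.get("stream") == stream:
        if message.get("type") == "SCHEMA":
            message["schema"].get("properties", {}).pop(prop, None)
        elif message.get("type") == "RECORD":
            message["record"].pop(prop, None)
    return json.dumps(message)


# Made-up Singer messages resembling what a tap would write to stdout.
messages = [
    '{"type": "SCHEMA", "stream": "public-e_address", "key_properties": ["id"], '
    '"schema": {"properties": {"id": {"type": "integer"}, "point": {}}}}',
    '{"type": "RECORD", "stream": "public-e_address", '
    '"record": {"id": 1, "point": "POINT(1 2)"}}',
    '{"type": "STATE", "value": {}}',
]

cleaned = [json.loads(drop_property(m, "public-e_address", "point")) for m in messages]
for message in cleaned:
    print(message)
```

The key point is that the property has to disappear from the SCHEMA message as well as from the records, since the target builds its table definition from the schema.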

peter_huss

03/23/2022, 7:27 PM

Is there any workaround for this issue for the time being? I'm encountering this using tap-jira and target-bigquery, and I'm wondering if it's simply impossible to use these in combination since tap-jira doesn't fully define types for all the attributes in its catalog.

peter_huss

03/23/2022, 7:31 PM

I'm evaluating Meltano for my company and finding it rather frustrating that these aren't working together out of the box, even after deselecting the problematic columns

cody_hanson

03/23/2022, 7:43 PM

@peter_huss if the unmerged PR resolves your issue, you could use that fork+branch as your extractor. See: https://docs.meltano.com/guide/plugin-management#using-a-custom-fork-of-a-plugin

peter_huss

03/23/2022, 7:44 PM

Unfortunately my issue is with tap-jira and not tap-postgres, but it seems the same in nature.

cody_hanson

03/23/2022, 8:02 PM

Ah, gotcha. The only thing I could think of would be to fork tap-jira and patch it to work for your use case, and then use that fork for the time being.

aaronsteers

03/23/2022, 9:04 PM

Hi, @peter_huss. The feature @taylor mentioned has since been shipped, specifically this point:

We’re releasing soon the stream maps functionality that could intercept these messages and do the fix you’re wanting...

I'll post some general guidance in a new "workaround" section of the issue @jonas_kalderstam correctly linked regarding taps not applying selection logic to schema.

aaronsteers

03/23/2022, 9:04 PM

Will post back shortly once that is updated.

peter_huss

03/23/2022, 9:06 PM

Thanks @aaronsteers, I appreciate it!

aaronsteers

03/23/2022, 9:15 PM

@peter_huss - I've added a workaround to that issue. The code is pseudocode but should hopefully point you in the right direction: https://gitlab.com/meltano/meltano/-/issues/2469#workaround-using-mappers-updated-2022-03-23

aaronsteers

03/23/2022, 9:16 PM

Could you let us know (either way) if you have any luck with this approach? As mentioned, this is a brand new feature and we're still gathering feedback from real-world applications.

peter_huss

03/23/2022, 9:16 PM

Really appreciate it. I'll give this a try and report back with my findings

peter_huss

03/23/2022, 10:07 PM

plugins:
  extractors:
  - name: tap-jira
    variant: singer-io
    pip_url: git+https://github.com/singer-io/tap-jira.git
    config:
      base_url: *******
      start_date: '2022-03-01'
      username: *******
    select:
    - issues.*
    - '!issues.renderedFields'
    - '!issues.versionedRepresentations'
  loaders:
  - name: target-bigquery
    variant: adswerve
    pip_url: git+https://github.com/adswerve/target-bigquery.git@0.11.3
    config:
      credentials_path: *******
      dataset_id: *******
      project_id: *******
  mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
    mappings:
    - name: remove-bad-cols
      config:
        stream_maps:
          issues:
            renderedFields: null
            versionedRepresentations: null

I've tested this basic config to remove two problematic fields (issues.renderedFields and issues.versionedRepresentations) but I still encounter the same error as before when running

meltano run tap-jira remove-bad-cols target-bigquery

Perhaps I'm missing something obvious?

peter_huss

03/23/2022, 10:08 PM

The final exception when using the mapper is the same as before:

google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/***/jobs?uploadType=resumable: Field renderedFields is type RECORD but has no schema

peter_huss

03/23/2022, 10:18 PM

Oddly enough, I do see target-bigquery complaining early on about the issues schema:

2022-03-23T22:16:27.021628Z [info     ] WARNING the pipeline might fail because of undefined fields: an empty object/dictionary indicated as {} cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery

peter_huss

03/23/2022, 10:22 PM

And above that I see target-bigquery still printing out the schema with the two columns I've attempted to remove (I've cut other parts since the log message is long):

2022-03-23T22:16:27.021085Z [info     ] INFO issues schema: {[...] 'renderedFields': {'type': ['null', 'object'], 'patternProperties': {'.+': {}}},'versionedRepresentations': {'type': ['null', 'object'], 'patternProperties': {'.+': {'type': ['null', 'object'], 'patternProperties': {'.+': {}}}}},} cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
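
The empty {} entries in this schema are nested inside patternProperties, so a top-level check would miss them. Below is a small sketch of a recursive search for such untyped sub-schemas. The function name is hypothetical, and the excerpt assumes the two fields sit under a "properties" key, which the truncated log line does not confirm:

```python
def find_empty_subschemas(schema, path="$"):
    """Recursively collect the paths of sub-schemas that are empty ({})."""
    if not isinstance(schema, dict):
        return []
    if schema == {}:
        return [path]
    found = []
    # Both keywords map names (or regex patterns) to nested schemas.
    for keyword in ("properties", "patternProperties"):
        for name, sub in schema.get(keyword, {}).items():
            found += find_empty_subschemas(sub, f"{path}[{name!r}]")
    # Array item schemas can hide empty definitions as well.
    found += find_empty_subschemas(schema.get("items"), f"{path}[]")
    return found


# Excerpt rebuilt from the log line above (wrapper "properties" key assumed).
issues_excerpt = {
    "type": ["null", "object"],
    "properties": {
        "renderedFields": {
            "type": ["null", "object"],
            "patternProperties": {".+": {}},
        },
        "versionedRepresentations": {
            "type": ["null", "object"],
            "patternProperties": {
                ".+": {
                    "type": ["null", "object"],
                    "patternProperties": {".+": {}},
                }
            },
        },
    },
}

for hit in find_empty_subschemas(issues_excerpt):
    print(hit)
```

Each reported path is a place where the schema gives no type at all, which is what the target's "undefined fields" warning above refers to.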

jose_riego_valenzuela

03/25/2022, 1:19 PM

Hi @peter_huss, I've recently become the maintainer of a different target for BigQuery. Would you mind checking whether this target works? It would help me cover more use cases: https://github.com/jmriego/pipelinewise-target-bigquery

jose

11/22/2022, 10:58 AM

@peter_huss did you manage to solve this issue? I was playing with tap-jira and I'm getting a similar issue.

jose

11/22/2022, 12:00 PM

@jose_riego_valenzuela what's the best way to install your version of target-bigquery? I've changed my meltano.yml with this code

- name: target-bigquery
  variant: transferwise
  pip_url: git+https://github.com/jmriego/pipelinewise-target-bigquery.git

and then ran

meltano install loader target-bigquery

shubham

07/24/2023, 7:51 PM

@peter_huss Hi, were you able to solve this? I am facing the exact same issue. Could you please help me with it?
