# troubleshooting
j
Hi, I'm just getting started with Meltano to evaluate different ELT tools. I'm running into an issue with target-bigquery. Trying to load data from Postgres into BigQuery and getting this error:
2022-01-13T10:29:57.794779Z [info     ] ERROR failed to load table t_public-e_address_9bf781de077049efa9ab87068bb1e228 from file: 400 POST <https://bigquery.googleapis.com/upload/bigquery/v2/projects/marketing-333110/jobs?uploadType=resumable>: Field point is type RECORD but has no schema cmd_type=loader job_id=yobify-to-bigquery name=target-bigquery run_id=8125ef19-fb9f-47a6-ad34-5f59831972c0 stdio=stderr
2022-01-13T10:29:57.795270Z [info     ] CRITICAL 400 POST <https://bigquery.googleapis.com/upload/bigquery/v2/projects/marketing-333110/jobs?uploadType=resumable>: Field point is type RECORD but has no schema cmd_type=loader job_id=yobify-to-bigquery name=target-bigquery run_id=8125ef19-fb9f-47a6-ad34-5f59831972c0 stdio=stderr
It complains that public-e_address.point has no schema. But the strange bit is that it is listed as excluded:
$ meltano select tap-postgres --list --all | grep point
[...]
	[excluded ] public-e_address.point
[...]
So I don't understand why it is even uploading this field.
Looking at tap.properties.json, I see that point is included:
"point": {},
but without any type defined. The database type is a PostGIS geometry, so it's not surprising there is no support for it.
So the Postgres tap always generates a complete schema, even though you exclude some columns from the actual data. Fine. So I tried to amend the schema in tap-postgres with
schema:
          '*':
            point:
              type: 'null'
              default: null
(I also tried string and such), but the result is the same. Looking in the run dir, the metadata for the tap matches what I've configured. But the log output when running etl shows that an empty schema is transmitted:
2022-01-13T12:49:57.192021Z [info     ] INFO public-e_address schema: { [...] 'point': {} [...]
And there is an unmerged fix (since October!) here: https://github.com/transferwise/pipelinewise-tap-postgres/pull/129. Using that fork fixes the schema, and now it works.
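For anyone hitting the same error, one way to spot the offending columns up front is to scan the catalog for properties with an empty schema, like the `"point": {}` entry above. This is only a minimal sketch: it assumes the catalog follows the usual Singer layout (a top-level `streams` list whose entries carry `tap_stream_id` and `schema`), and the function names and the example catalog are made up for illustration.

```python
def find_untyped_fields(schema, path=""):
    """Return dotted paths of properties whose JSON schema is empty ({})."""
    untyped = []
    for name, subschema in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if not subschema:
            # An empty schema carries no type information at all.
            untyped.append(field_path)
        elif isinstance(subschema, dict):
            # Descend into nested objects that declare their own properties.
            untyped.extend(find_untyped_fields(subschema, field_path))
    return untyped


def untyped_fields_in_catalog(catalog):
    """Map each stream id to the list of its untyped property paths."""
    return {
        stream.get("tap_stream_id", stream.get("stream")): find_untyped_fields(
            stream.get("schema", {})
        )
        for stream in catalog.get("streams", [])
    }


# Hypothetical catalog fragment modelled on the tap.properties.json excerpt above.
example_catalog = {
    "streams": [
        {
            "tap_stream_id": "public-e_address",
            "schema": {
                "type": "object",
                "properties": {
                    "id": {"type": ["integer"]},
                    "point": {},
                },
            },
        }
    ]
}

print(untyped_fields_in_catalog(example_catalog))
```

To check a real catalog, load it with `json.load` and pass the resulting dict to `untyped_fields_in_catalog`. Any field it reports is a candidate for the "type RECORD but has no schema" failure.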
t
Yeah, the issue you linked to would be the fix for that. We’re releasing soon the stream maps functionality that could intercept these messages and do the fix you’re wanting, but ideally the tap would respect the catalog!
p
Is there any workaround for this issue for the time being? I'm encountering this using tap-jira and target-bigquery, and I'm wondering if it's simply impossible to use these in combination, since tap-jira doesn't fully define types for all the attributes in its catalog.
I'm evaluating Meltano for my company and finding it rather frustrating that these aren't working together out of the box, even after deselecting the problematic columns.
c
@peter_huss if the unmerged PR resolves your issue, you could use that fork+branch as your extractor. See: https://docs.meltano.com/guide/plugin-management#using-a-custom-fork-of-a-plugin
p
Unfortunately, my issue is with tap-jira and not tap-postgres, but it seems the same in nature.
c
Ah, gotcha. The only thing I could think of would be to fork tap-jira and patch it to work for your use case and then use that fork for the time being.
a
Hi, @peter_huss. The feature @taylor mentioned has since been shipped, specifically this point:
We’re releasing soon the stream maps functionality that could intercept these messages and do the fix you’re wanting...
I'll post some general guidance in a new "workaround" section of the issue @jonas_kalderstam correctly linked regarding taps not applying selection logic to schema.
Will post back shortly once that is updated.
p
Thanks @aaronsteers, I appreciate it!
a
@peter_huss - I've added a workaround into that issue. The code is pseudocode but should hopefully point you in the right direction: https://gitlab.com/meltano/meltano/-/issues/2469#workaround-using-mappers-updated-2022-03-23
Could you let us know (either way) if you have any luck with this approach? As mentioned, this is a brand new feature and we're still gathering feedback from real-world applications.
p
Really appreciate it. I'll give this a try and report back with my findings
plugins:
  extractors:
  - name: tap-jira
    variant: singer-io
    pip_url: git+https://github.com/singer-io/tap-jira.git
    config:
      base_url: *******
      start_date: '2022-03-01'
      username: *******
    select:
    - issues.*
    - '!issues.renderedFields'
    - '!issues.versionedRepresentations'
  loaders:
  - name: target-bigquery
    variant: adswerve
    pip_url: git+https://github.com/adswerve/target-bigquery.git@0.11.3
    config:
      credentials_path: *******
      dataset_id: *******
      project_id: *******
  mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
    mappings:
    - name: remove-bad-cols
      config:
        stream_maps:
          issues:
            renderedFields: null
            versionedRepresentations: null
I've tested this basic config to remove two problematic fields (issues.renderedFields and issues.versionedRepresentations), but I still encounter the same error as before when running
meltano run tap-jira remove-bad-cols target-bigquery
Perhaps I'm missing something obvious?
The final exception is the same as before using the mapper:
google.api_core.exceptions.BadRequest: 400 POST <https://bigquery.googleapis.com/upload/bigquery/v2/projects/***/jobs?uploadType=resumable>: Field renderedFields is type RECORD but has no schema
Oddly enough, I do see target-bigquery complaining early on about the issues schema:
2022-03-23T22:16:27.021628Z [info     ] WARNING the pipeline might fail because of undefined fields: an empty object/dictionary indicated as {} cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
And above that I see target-bigquery still printing out the schema with the two columns I've attempted to remove (I removed other parts since the log message is long):
2022-03-23T22:16:27.021085Z [info     ] INFO issues schema: {[...] 'renderedFields': {'type': ['null', 'object'], 'patternProperties': {'.+': {}}},'versionedRepresentations': {'type': ['null', 'object'], 'patternProperties': {'.+': {'type': ['null', 'object'], 'patternProperties': {'.+': {}}}}},} cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
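The log above suggests the target still receives a SCHEMA message that contains the two columns. For the removal to take effect, the fields have to disappear from both the SCHEMA and the RECORD messages of the stream. The sketch below illustrates that idea on raw Singer messages. It is not how meltano-map-transform is implemented; the function name `drop_properties` and the sample messages are hypothetical, and it only assumes the standard Singer message shape (`type`, `stream`, `schema`, `record`).

```python
import json


def drop_properties(lines, stream, fields):
    """Yield Singer messages as JSON strings with `fields` removed from the
    SCHEMA and RECORD messages of `stream`; other messages pass through."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("stream") == stream:
            if message.get("type") == "SCHEMA":
                # Remove the property definitions so the target never sees them.
                properties = message["schema"].get("properties", {})
                for field in fields:
                    properties.pop(field, None)
            elif message.get("type") == "RECORD":
                # Remove the matching values from the record itself.
                for field in fields:
                    message["record"].pop(field, None)
        yield json.dumps(message)


# Hypothetical messages modelled on the tap-jira schema excerpt in the log.
sample_lines = [
    json.dumps({
        "type": "SCHEMA",
        "stream": "issues",
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": ["null", "string"]},
                "renderedFields": {
                    "type": ["null", "object"],
                    "patternProperties": {".+": {}},
                },
            },
        },
    }),
    json.dumps({
        "type": "RECORD",
        "stream": "issues",
        "record": {"id": "1", "renderedFields": {"description": "<p>x</p>"}},
    }),
]

cleaned = [
    json.loads(line)
    for line in drop_properties(sample_lines, "issues", ["renderedFields"])
]
for message in cleaned:
    print(json.dumps(message))
```

If the schema printed by the target still lists a field after a filter like this is supposed to have run, the SCHEMA message most likely never passed through the filter.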
j
Hi @peter_huss, I've recently become the maintainer of a different target for BigQuery. Do you mind checking if this target works? It would help me cover more use cases: https://github.com/jmriego/pipelinewise-target-bigquery
j
@peter_huss did you manage to solve this issue? I was playing with tap-jira and I'm getting a similar issue.
@jose_riego_valenzuela what's the best way to install your version of target bigquery? I've changed my meltano.yml with this code
- name: target-bigquery
  variant: transferwise
  pip_url: git+https://github.com/jmriego/pipelinewise-target-bigquery.git
and then did a
meltano install loader target-bigquery
s
@peter_huss Hi, were you able to solve this? I am facing the exact same issue. Can you please help me with that?