I have a bit of trouble fighting with target-bigquery needing accurate JSON schemas (Meltano #troubleshooting)


johannes_rudolph

01/17/2022, 4:20 PM

I have a bit of trouble fighting with target-bigquery needing accurate JSON schemas, and not all taps providing them in the right quality. I'm working with the upstream maintainers to fix those, but that is slow and painful, so I tried to work around the issues with the `extractor.select` and `extractor.schema` extras. However, I found that they only seem to work with taps built using the Meltano SDK. My attempts with e.g. tap-pipedrive and tap-github (both built using the Stitch SDK, it seems) have been futile so far, and these taps seem to simply ignore the extras I specify in my `meltano.yml`. Is this the expected behavior? Seems like a pretty big caveat 😕
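For reference, a minimal sketch of the kind of `meltano.yml` configuration described above, combining property-level `select` rules with a `schema` override. The stream and property names (`issues`, `body`, `created_at`) are hypothetical placeholders, not taken from any particular tap's catalog, and taps that hardcode their schemas may silently ignore both extras.

```yaml
plugins:
  extractors:
  - name: tap-github
    select:
    # Select every property of the (hypothetical) issues stream...
    - issues.*
    # ...except one (hypothetical) property the target can't load.
    - '!issues.body'
    schema:
      issues:
        # Override the JSON schema of a (hypothetical) property.
        created_at:
          type:
          - 'null'
          - string
          format: date-time
```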

edgar_ramirez_mondragon

01/17/2022, 6:01 PM

Hi @johannes_rudolph! Even most non-Meltano-SDK-based taps should be able to work with an input catalog, and thus support the `select` extra, at least for filtering streams. Field selection has less broad support in the ecosystem, and the same is true of `schema`. Looking at `tap-pipedrive`, these points seem to apply: it supports (de)selecting streams, but the schemas and property selection are hardcoded. That said, one strategy is to migrate old taps to the SDK and develop new ones with it. Another is the soon-to-be-released stream maps transformation for all taps in Meltano: https://gitlab.com/meltano/meltano/-/issues/2299. So, do leave a 👍 or a comment in the issue if you think it'd solve your problem 😄
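Since `tap-pipedrive` honors stream-level selection but not field-level selection, a `select` block that stays within that limit lists whole streams only. A minimal sketch follows. The stream names `deals` and `persons` are assumptions based on Pipedrive's API entities, so check them with `meltano select tap-pipedrive --list --all`.

```yaml
plugins:
  extractors:
  - name: tap-pipedrive
    select:
    # Stream-level selection: include entire streams with '<stream>.*'.
    # Property-level rules (e.g. '!deals.some_field') would likely be
    # ignored here, since this tap's schemas are hardcoded.
    - deals.*
    - persons.*
```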

edgar_ramirez_mondragon

01/17/2022, 6:07 PM

Oh, and there's a tap-github built with the SDK: https://github.com/MeltanoLabs/tap-github/
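To try the SDK-based tap-github in place of the default variant, a sketch along these lines may work, assuming your Meltano version's plugin index lists a `meltanolabs` variant for `tap-github`. The `repositories` setting and the `issues` stream name are also assumptions to verify against the tap's README.

```yaml
plugins:
  extractors:
  - name: tap-github
    variant: meltanolabs  # assumed variant name
    pip_url: git+https://github.com/MeltanoLabs/tap-github.git
    config:
      # Assumed setting: a list of "owner/repo" strings to extract.
      repositories:
      - meltano/meltano
    select:
    # Property-level rules like these are what the SDK-based taps honor.
    - issues.*
    - '!issues.body'
```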

johannes_rudolph

01/17/2022, 9:43 PM

Thanks for your answer and for educating me here, @edgar_ramirez_mondragon! To be honest, the whole Singer ecosystem is kind of hard to make sense of without diving into the source code of every tap/loader plugin. Thankfully that's easy since they're written in Python, but nonetheless the discoverability of what works is really poor. I understand this is an n×m problem, but I'd really appreciate it if a tool like Meltano in the middle could help validate tap capabilities more aggressively (rather than just relying on a community-sourced `discovery.yml`). For example, by observing the Singer output of tap-pipedrive, Meltano could output a warning that says “hey, it looks like you specified ‘!select x.y’ but the tap is still emitting records with this field. The tap probably does not support …” Of course this might be an optional thing… Anyhow, stream maps sound like a cool way to fix this, thanks for the pointer!

jonas_kalderstam

01/18/2022, 8:36 AM

One important thing I've learned is that many extractors will send the full schema to BigQuery, even if you haven't selected all columns. So you need to override the schema for such columns as well as deselecting them. For example, `experiences` below is a database column stored as a JSON array, which can't be automatically converted to something BigQuery understands. So I override its schema to something small and silly, and then make sure the column is not selected:

```yaml
environments:
- name: prod
  config:
    plugins:
      extractors:
      - name: tap-postgres
        select:
        - '!*.experiences'
        schema:
          '*':
            experiences:
              type:
              - 'null'
              - boolean
```
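The `'*'` key applies the override to every stream. A narrower sketch of the same workaround targets only the stream that actually has the column. The stream ID `public-users` is hypothetical: tap-postgres variants typically derive stream IDs from the schema and table names, but check yours with `meltano select tap-postgres --list --all`.

```yaml
environments:
- name: prod
  config:
    plugins:
      extractors:
      - name: tap-postgres
        select:
        # Deselect the column only in the (hypothetical) stream that has it.
        - '!public-users.experiences'
        schema:
          # Override the schema in that one stream instead of '*'.
          public-users:
            experiences:
              type:
              - 'null'
              - boolean
```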
