# troubleshooting
j
I'm having a bit of trouble with target-bigquery, which needs accurate JSON schemas, while not all taps provide them in the right quality. I'm working with the maintainers upstream to fix those, but that is slow and painful. So I tried to work around the issues with the `extractor.select` and `extractor.schema` extras. However, I found those only work with taps built using the Meltano SDK. My attempts with e.g. tap-pipedrive and tap-github (both apparently built using the Stitch SDK) have been futile so far, and these taps seem to simply ignore the extras I specify in my `meltano.yml`. Is this the expected behavior? It seems like a pretty big caveat 😕
e
Hi @johannes_rudolph! Even most non-Meltano-SDK-based taps should be able to work with an input catalog, and thus support the `select` extra, at least for filtering streams. Field selection has less broad support in the ecosystem, and the same goes for the `schema` extra. Looking at `tap-pipedrive`, these points seem to apply: it supports (de)selecting streams, but the schemas and property selection are hardcoded. That said, one strategy is to migrate old taps to the SDK and to develop new ones with it. Another is the soon-to-be-released stream maps transformation for all taps in Meltano: https://gitlab.com/meltano/meltano/-/issues/2299. So do leave a 👍 or a comment on the issue if you think it would solve your problem 😄
Oh, and there's a tap-github built with the SDK: https://github.com/MeltanoLabs/tap-github/
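To make the stream-vs-field distinction concrete, here is a minimal Python sketch of how a tap could read selection from a Singer catalog (the JSON that Meltano builds from the `select` rules and hands to the tap). Stream selection lives on the metadata entry with an empty breadcrumb; field selection lives on entries with the breadcrumb `["properties", "<field>"]`. The helper names and the sample catalog are hypothetical, and real taps handle more cases (for example `selected-by-default`).

```python
from typing import Any

Catalog = dict[str, Any]


def _metadata_for(entry: dict[str, Any], breadcrumb: list[str]) -> dict[str, Any]:
    """Return the metadata attached to one breadcrumb of a catalog entry, or {}."""
    for item in entry.get("metadata", []):
        if item.get("breadcrumb") == breadcrumb:
            return item.get("metadata", {})
    return {}


def selected_streams(catalog: Catalog) -> list[str]:
    """Stream-level selection: metadata on the empty breadcrumb []."""
    return [
        entry["tap_stream_id"]
        for entry in catalog.get("streams", [])
        if _metadata_for(entry, []).get("selected", False)
    ]


def selected_properties(entry: dict[str, Any]) -> list[str]:
    """Field-level selection: metadata on breadcrumbs ["properties", <name>].

    Simplification: a field without an explicit "selected" flag counts as
    deselected unless its inclusion is "automatic" (e.g. primary keys).
    """
    chosen = []
    for name in entry.get("schema", {}).get("properties", {}):
        md = _metadata_for(entry, ["properties", name])
        if md.get("inclusion") == "unsupported":
            continue  # the tap cannot emit this field at all
        if md.get("inclusion") == "automatic" or md.get("selected", False):
            chosen.append(name)
    return chosen


# Hypothetical catalog: "deals" is selected, but its "title" field is not.
catalog: Catalog = {
    "streams": [
        {
            "tap_stream_id": "deals",
            "schema": {"properties": {"id": {"type": "integer"},
                                      "title": {"type": ["null", "string"]}}},
            "metadata": [
                {"breadcrumb": [], "metadata": {"selected": True}},
                {"breadcrumb": ["properties", "id"],
                 "metadata": {"inclusion": "automatic"}},
                {"breadcrumb": ["properties", "title"],
                 "metadata": {"inclusion": "available", "selected": False}},
            ],
        },
    ]
}

print(selected_streams(catalog))                   # ['deals']
print(selected_properties(catalog["streams"][0]))  # ['id']
```

A tap that only implements something like `selected_streams` and then emits every field of its hardcoded schema would behave as described for tap-pipedrive: streams can be (de)selected, but field-level `select` rules have no effect.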
j
Thanks for your answer and for educating me here, @edgar_ramirez_mondragon! To be honest, the whole Singer ecosystem is kind of hard to make sense of without diving into the source code of every tap/loader plugin. Thankfully that's easy because it's all Python, but it's still hard to discover what actually works. I understand this is an n*m problem, but I'd really appreciate it if a tool in the middle like Meltano could validate tap capabilities more aggressively (rather than just relying on a community-sourced discovery.yml). For example, by observing tap-pipedrive's Singer output, Meltano could print a warning like: "Hey, it looks like you specified `!x.y` in `select`, but the tap is still emitting records with this field in them. The tap probably does not support …" Of course, this could be optional… Anyhow, stream maps sound like a cool way to fix this, thanks for the pointer!
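The warning proposed here could be approximated after the fact. Below is a minimal sketch of a hypothetical validator (not an existing Meltano feature): it reads a tap's Singer output as newline-delimited JSON and reports any RECORD message that still contains a field the user deselected. It assumes you already know which fields were deselected per stream; the function name and sample output are illustrative only.

```python
import json
from collections.abc import Iterable


def find_unexpected_fields(
    lines: Iterable[str], deselected: dict[str, set[str]]
) -> list[str]:
    """Warn about deselected fields that still appear in Singer RECORD messages.

    `lines` is newline-delimited Singer output; `deselected` maps a stream
    name to the field names the user excluded. Each (stream, field) pair is
    reported once, in order of first appearance.
    """
    warnings: list[str] = []
    reported: set[tuple[str, str]] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("type") != "RECORD":
            continue  # SCHEMA and STATE messages carry no record fields
        stream = message.get("stream", "")
        excluded = deselected.get(stream, set())
        for field in message.get("record", {}):
            if field in excluded and (stream, field) not in reported:
                reported.add((stream, field))
                warnings.append(
                    f"'{stream}.{field}' was deselected, but the tap still "
                    "emits it; the tap probably does not support field selection."
                )
    return warnings


# Hypothetical captured tap output for a "deals" stream.
sample_output = [
    '{"type": "SCHEMA", "stream": "deals", "schema": {}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "deals", "record": {"id": 1, "title": "a"}}',
    '{"type": "RECORD", "stream": "deals", "record": {"id": 2, "title": "b"}}',
    '{"type": "STATE", "value": {}}',
]
for warning in find_unexpected_fields(sample_output, {"deals": {"title"}}):
    print(warning)
```

Because it accepts any iterable of lines, the same function works on an open file of captured tap output or on `sys.stdin`.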
j
One important thing I've learned is that many extractors send the full schema through to BigQuery, even if you haven't selected all columns. So you need to override the schema for those columns as well as deselecting them. For example, I have a database column stored as a JSON array, which can't be automatically converted to a type BigQuery understands. So I override its schema with something small and silly, and then make sure the column is not selected:
```yaml
environments:
- name: prod
  config:
    plugins:
      extractors:
      - name: tap-postgres
        select:
        - '!*.experiences'
        schema:
          '*':
            experiences:
              type:
              - 'null'
              - boolean
```
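Conceptually, the `schema` override swaps out the JSON schema of matching properties before the loader sees it, so target-bigquery receives a simple nullable boolean type instead of an array type it can't convert. The sketch below imitates that at the Singer message level. It is an illustration only, not how Meltano implements the `schema` extra (Meltano applies the override to the catalog it passes to the tap). The function name and the `public-users` stream are hypothetical, and matching stream patterns with Python's `fnmatchcase` is an assumption about the glob semantics.

```python
from fnmatch import fnmatchcase
from typing import Any


def apply_schema_overrides(
    message: dict[str, Any], overrides: dict[str, dict[str, dict[str, Any]]]
) -> dict[str, Any]:
    """Return a copy of a Singer SCHEMA message with some property schemas replaced.

    `overrides` mirrors the shape of the `schema` extra:
    {stream_pattern: {property_name: json_schema}}.
    Non-SCHEMA messages are returned unchanged.
    """
    if message.get("type") != "SCHEMA":
        return message
    stream = message["stream"]
    properties = dict(message["schema"].get("properties", {}))
    for pattern, property_overrides in overrides.items():
        if not fnmatchcase(stream, pattern):
            continue
        for name, json_schema in property_overrides.items():
            # Only replace properties the stream actually has, so a "*"
            # pattern does not add "experiences" to unrelated streams.
            if name in properties:
                properties[name] = json_schema
    return {**message, "schema": {**message["schema"], "properties": properties}}


# Hypothetical SCHEMA message for a table with a JSON-array column.
schema_message = {
    "type": "SCHEMA",
    "stream": "public-users",
    "schema": {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "experiences": {"type": ["null", "array"], "items": {}},
        },
    },
    "key_properties": ["id"],
}
overrides = {"*": {"experiences": {"type": ["null", "boolean"]}}}
patched = apply_schema_overrides(schema_message, overrides)
print(patched["schema"]["properties"]["experiences"])
# {'type': ['null', 'boolean']}
```

The original message is left untouched because the function copies the `properties` mapping before replacing entries.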