Hey guys! I'm performing an extract with the Tap-G...
# plugins-general
l
Hey guys! I'm performing an extract with the Tap-Google-Sheets Airbyte variant and I can't set the sync_mode to overwrite the full refresh. How can I do this? I forgot to mention that the target is postgress
t
If you can point me to the place in the airbyte docs where it says how to specify the replication mode I can help you configure it for Meltano, but I’m not seeing in https://docs.airbyte.com/understanding-airbyte/connections/full-refresh-append/ how to do it
l
Hi Taylor. I was able to do a full_refresh overwrite using Meltano mapping. I don't know if it was the best option. Any suggestion?
Copy code
plugins:
  extractors:
  - name: tap-google-sheets--extract
    inherit_from: tap-google-sheets
    config:
      airbyte_config:
        spreadsheet_id: <My Spreedsheet>
    select:
    - Test.*

  mappers:
    - name: map-transformer--pk
      inherit_from: meltano-map-transformer
      mappings:
        - name: mapper_for_adding_key_properties
          config:
            stream_maps:
              Test:
                __key_properties__:
                  - "numero"
  loaders:
  - name: target-postgres--sheets
    inherit_from: target-postgres

jobs:
- name: sheets-pipeline
  tasks:
  - tap-google-sheets--extract mapper_for_adding_key_properties target-postgres--sheets
In this case, the modifications are linked to the table's primary keys.
t
ah ok, that’s a good solution. since the airbyte wrapper is written using the SDK, you should be able to move the stream_map up into the configuration for the extractor.
Copy code
plugins:
  extractors:
  - name: tap-google-sheets--extract
    inherit_from: tap-google-sheets
    config:
      airbyte_config:
        spreadsheet_id: <My Spreedsheet>
    select:
    - Test.*
    stream_maps:
      Test:
        __key_properties__:
          - "numero"
(check the spacing on that) but that should work and probably is a bit quicker than invoking the mapper separately
l
Do I need to install mappers in this case? Or only extractor and loaders?
t
you wouldn’t need to install the mapper. the SDK has the stream maps feature built in, so you can just use it https://sdk.meltano.com/en/latest/stream_maps.html#constructing-the-stream-maps-config-object
l
Taylor, is it possible to pass a sql command in pipeline.yml to drop the table?
t
not directly in the extractor. you could create use a python script to do any preprocessing you might need and register it as a custom utility. https://docs.meltano.com/concepts/plugins#custom-utilities or if you’re using dbt you could run it there as a pre-hook
l
Is that the idea was to perform a full_refresh overwrite on the destination. https://docs.airbyte.com/understanding-airbyte/connections/ But we couldn't configure either Tap-Google-Sheets or Target-Postgres without using the Mappers I passed above. We are not convinced that it is the best solution.
t
I’m not sure I understand your question. I proposed an alternate solution to use stream_maps directly in the extractor definition, but it’s not required that you use that method (just a suggestion). As for running custom SQL before or after a run taht would be a different action and it’s not something supported by the taps / targets. So I suggested a custom utility / python script for that.
l
Sorry, I think I didn't make my question clear. In Airbyte's Google Sheets source it is possible to configure (in Airbyte itself), at the time of extraction, if you want a full_refresh overwrie or full_refresh append of your table. I thought Tap would be able to somehow configure these options, but it only appends the table, duplicating all the information at each extraction. As our need is to make a drop table and load the table again at each extraction, I haven't found a solution without mappers (as I mentioned at the beginning of our conversations). Sorry again for the confusion
t
No worries - that makes sense. I think I just don’t understand how the configuration for append vs overwrite is sent to the Airbyte connector. Is it something in the configuration or something else? You might be able to set it via the metadata key though.
Copy code
extractors:
- name: your tap
  metadata:
    "*":
      destination_sync_mode:overwrite
I’m basing that on https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#configuredairbytestream and https://docs.meltano.com/guide/integration#setting-metadata perhaps @alexander_butler has set this before and knows more definitively?
a
You should set it like a regular tap, the translation will take care of the rest. So:
replication-method: FULL_TABLE
Copy code
plugins:
  extractors:
  - name: tap-google-sheets--extract
    inherit_from: tap-google-sheets
    config:
      airbyte_config:
        spreadsheet_id: <My Spreedsheet>
    select:
    - Test.*
    metadata:
      Test:
        replication-method: FULL_TABLE
l
Hi @alexander_butler I ran the idea you recommended me, but the extract keeps making an addition to the dataset still replicating the information.
a
Oh your mixing up the responsibilities of the "tap" and "target" The target is responsible for truncating / replacing a table, not the tap
The
replication-method: FULL_TABLE
just guarantees that the source pulls all the data on each invocation
l
Yes, yes. I know. 😀 The problem is that I couldn't make it work no tap or target. 🤣 @alexander_butler
a
You bet. Then don’t use a tap or target haha.