hey all I have a custom utiliy like so ```plugins utilities Meltano #troubleshooting

hey all! I have a custom utiliy like so: ```plugin...

Kevin Phan

03/07/2025, 3:29 PM

hey all! I have a custom utiliy like so:

Copy code

plugins:
  utilities:  
  - name: transform-google-sheets
      namespace: transform-google-sheets
      commands:
        run_script:
          executable: python
          args: extract/integrations/google_sheet_transform_helper.py

when i reference this in my meltano.yml

Copy code

- name: ingest_googlesheets_meltano
  tasks:
    - tap-google-sheets
    - transform-google-sheets:run_script
    - target-snowflake-googleSheets

i get some block violations -

block violates set requirements: Unknown command type or bad block sequence at index 1, starting block 'tap-google-sheets'

. Anyone know the proper syntax here?

✅ 1

Reuben (Matatika)

03/07/2025, 3:46 PM

A valid

run

block is either

[<tap>, <target>]

[<tap>, <mapping>, target]

, not

[<tap>, <utility>, <target>]

. As far as using

python

as the

executable

, I don't know if what you're trying to do is possible currently (i.e. run a script as a mapping between tap stdout and target stdin). Maybe try moving the definition under

mappers

(as opposed to

utilities

)? I suspect you will run into another block violation error since you would be referencing a mapper plugin, rather than a

mapping

(which I don't believe supports

commands

). https://github.com/meltano/meltano/pull/9105 might make it work as a mapper though...

👍 1

Kevin Phan

03/07/2025, 5:35 PM

got it, thanks! Ill give it a try 🙂 for context, im essentially flattening the data and just making new a couple new columns so nothing crazy (so a simple data manipulation). I initially had a mapper but i think that only renames columns or perhaps cast data types not make new columns explicitly. ie if i have in the gsheet

Copy code

STATUSA STATUSB STATUSC
user1   user2   user3

the result would be below before landing in snowflake

Copy code

user  status
user1 STATUSA
user2 STATUSB
user3 STATUSC

I could do this using other transformation tools after ingest but in this use case it would have to be transformed before it lands in snowflake which in this case is easiest to do with a siumple python script @Reuben (Matatika)

Reuben (Matatika)

03/07/2025, 9:21 PM

What new columns did you want to add? That is definitely possible with stream maps, which along with flattening

tap-google-sheets

supports out-the-box:

Copy code

- name: tap-google-sheets
    variant: matatika
    pip_url: <https://github.com/Matatika/tap-google-sheets.git|https://github.com/Matatika/tap-google-sheets.git>
    config:
      flattening_enabled: true
      flattening_max_depth: 1
      stream_maps:
        <sheet name>:
          new_property: '"test"'

👍 1

Kevin Phan

03/07/2025, 9:23 PM

Oh sweet , also new column would be a status column that maps each user to a status rather then each status being its own column of users

Reuben (Matatika)

03/07/2025, 9:25 PM

How are you determining what status to assign to a user?

Kevin Phan

03/07/2025, 9:26 PM

It’s actually already there in the sheet to begin with. If status is current , that column already has a list of user ids. Thing is , because we went to track historical changes to the user, it makes sense to have the user id be the primary key and a status column explicitly stating the status. So it’s just a re manipulation of the data. Not determining anything new

Kevin Phan

03/07/2025, 9:27 PM

Similar to one hot encoding

Reuben (Matatika)

03/07/2025, 9:32 PM

Right, so for a single row you want to extract each user out along with the rest of the row as a record i.e one row in the sheet with three user IDs should output three records?

Kevin Phan

03/07/2025, 9:35 PM

Yes!

Kevin Phan

03/07/2025, 9:40 PM

Actually well close, the three status columns would be a value in a general status column but nevertheless I feel this is possible so thanks for guiding

👍 1

Kevin Phan

03/10/2025, 2:08 PM

hmm so running into a small block and wondering if you knew how to get around this. so test google shee ti have is included in the pic below. when i add this stream map to the tap:

Copy code

stream_maps:
        testsheetkevin:  # Name of your Google Sheet tab
          __key_properties__: ["user_id", "status"]
          user_id: >
            record["Current"] if "Current" in record and record["Current"] else
            record["demoted"] if "demoted" in record and record["demoted"] else
            record["Watchlist"] if "Watchlist" in record and record["Watchlist"] else None
          status: >
            "Current" if "Current" in record and record["Current"] else
            "demoted" if "demoted" in record and record["demoted"] else
            "Watchlist" if "Watchlist" in record and record["Watchlist"] else None
          Current:  __NULL__
          demoted: __NULL__
          Watchlist: __NULL__

I get the output json

Copy code

{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_111","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933276+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_222","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933402+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_333","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933467+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_444","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933523+00:00"}

which is what i want but it seems to be only doing it for the

current

column. Any idea how i can get the same for demoted and watchlist as well? Thought id ask before trying something janky

Kevin Phan

03/10/2025, 2:10 PM

ideally i would like to have had

Copy code

{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_111","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933276+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_222","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933402+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_333","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933467+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_444","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933523+00:00"}

{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_555","status":"demoted"},"time_extracted":"2025-03-10T14:07:17.933276+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_666","status":"demoted"},"time_extracted":"2025-03-10T14:07:17.933402+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_777","status":"demoted"},"time_extracted":"2025-03-10T14:07:17.933467+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_888","status":"demoted"},"time_extracted":"2025-03-10T14:07:17.933523+00:00"}

{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_999","status":"Watchlist"},"time_extracted":"2025-03-10T14:07:17.933276+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_100","status":"Watchlist"},"time_extracted":"2025-03-10T14:07:17.933402+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_101","status":"Watchlist"},"time_extracted":"2025-03-10T14:07:17.933467+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_102","status":"Watchlist"},"time_extracted":"2025-03-10T14:07:17.933523+00:00"}

Reuben (Matatika)

03/10/2025, 2:41 PM

IIRC the tap handles one row as one record. I assume restructuring the sheet is not an option? You might be able to get something referencing the same sheet 3x in

sheets

config with a different

range

set for each. Might look something like this:

Copy code

config:
      sheets:
      - sheet_id: <sheet ID>
        child_sheet_name: testsheetkevin
        output_name: Current
        range: <Current range>
      - sheet_id: <sheet ID>
        child_sheet_name: testsheetkevin
        output_name: demoted
        range: <demoted range>
      - sheet_id: <sheet ID>
        child_sheet_name: testsheetkevin
        output_name: Watchlist
        range: <Watchlist range>
      stream_maps:
        "*":
          __key_properties__:
          - user_id
          - status
          user_id: record[__stream_name__]
          status: __stream_name__

👀 1

Kevin Phan

03/11/2025, 2:57 PM

thanks for the help! I ended up creating a custom extract and load and ran it like so:

Copy code

- name: extract_google_sheets
      namespace: extract_google_sheets
      env:
        SHEET_ID: "1test"
      commands:
        extract:
          executable: python
          args: extract/integrations/extract_google_sheets.py
          description: >
            Extracts Google Sheets data and stores in a JSONL file.

Copy code

- name: load_google_sheets_snowflake
      namespace: load_google_sheets_snowflake
      commands:
        load:
          executable: python
          args: extract/integrations/load_google_sheets_to_snowflake.py
          description: >
            Loads extracted Google Sheets data from JSONL into Snowflake.

Copy code

- name: ingest_google_sheets_pc
  tasks:
    - extract_google_sheets:extract
    - load_google_sheets_snowflake:load

This gave me autonomy over what i wanted to happen since the destination table in snowflake was going to be SCD type 2 along with a whole other set of requirements 👍

👍 1

7 Views

Open in Slack

Previous Next