Kevin Phan
03/07/2025, 3:29 PMplugins:
utilities:
- name: transform-google-sheets
namespace: transform-google-sheets
commands:
run_script:
executable: python
args: extract/integrations/google_sheet_transform_helper.py
when i reference this in my meltano.yml
- name: ingest_googlesheets_meltano
tasks:
- tap-google-sheets
- transform-google-sheets:run_script
- target-snowflake-googleSheets
i get some block violations - block violates set requirements: Unknown command type or bad block sequence at index 1, starting block 'tap-google-sheets'
. Anyone know the proper syntax here?Reuben (Matatika)
03/07/2025, 3:46 PMrun
block is either [<tap>, <target>]
or [<tap>, <mapping>, target]
, not [<tap>, <utility>, <target>]
.
As far as using python
as the executable
, I don't know if what you're trying to do is possible currently (i.e. run a script as a mapping between tap stdout and target stdin). Maybe try moving the definition under mappers
(as opposed to utilities
)? I suspect you will run into another block violation error since you would be referencing a mapper plugin, rather than a mapping
(which I don't believe supports commands
). https://github.com/meltano/meltano/pull/9105 might make it work as a mapper though...Kevin Phan
03/07/2025, 5:35 PMSTATUSA STATUSB STATUSC
user1 user2 user3
the result would be below before landing in snowflake
user status
user1 STATUSA
user2 STATUSB
user3 STATUSC
I could do this using other transformation tools after ingest but in this use case it would have to be transformed before it lands in snowflake which in this case is easiest to do with a siumple python script @Reuben (Matatika)Reuben (Matatika)
03/07/2025, 9:21 PMtap-google-sheets
supports out-the-box:
- name: tap-google-sheets
variant: matatika
pip_url: <https://github.com/Matatika/tap-google-sheets.git|https://github.com/Matatika/tap-google-sheets.git>
config:
flattening_enabled: true
flattening_max_depth: 1
stream_maps:
<sheet name>:
new_property: '"test"'
Kevin Phan
03/07/2025, 9:23 PMReuben (Matatika)
03/07/2025, 9:25 PMKevin Phan
03/07/2025, 9:26 PMKevin Phan
03/07/2025, 9:27 PMReuben (Matatika)
03/07/2025, 9:32 PMKevin Phan
03/07/2025, 9:35 PMKevin Phan
03/07/2025, 9:40 PMKevin Phan
03/10/2025, 2:08 PMstream_maps:
testsheetkevin: # Name of your Google Sheet tab
__key_properties__: ["user_id", "status"]
user_id: >
record["Current"] if "Current" in record and record["Current"] else
record["demoted"] if "demoted" in record and record["demoted"] else
record["Watchlist"] if "Watchlist" in record and record["Watchlist"] else None
status: >
"Current" if "Current" in record and record["Current"] else
"demoted" if "demoted" in record and record["demoted"] else
"Watchlist" if "Watchlist" in record and record["Watchlist"] else None
Current: __NULL__
demoted: __NULL__
Watchlist: __NULL__
I get the output json
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_111","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933276+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_222","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933402+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_333","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933467+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_444","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933523+00:00"}
which is what i want but it seems to be only doing it for the current
column. Any idea how i can get the same for demoted and watchlist as well? Thought id ask before trying something jankyKevin Phan
03/10/2025, 2:10 PM{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_111","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933276+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_222","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933402+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_333","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933467+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_444","status":"Current"},"time_extracted":"2025-03-10T14:07:17.933523+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_555","status":"demoted"},"time_extracted":"2025-03-10T14:07:17.933276+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_666","status":"demoted"},"time_extracted":"2025-03-10T14:07:17.933402+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_777","status":"demoted"},"time_extracted":"2025-03-10T14:07:17.933467+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_888","status":"demoted"},"time_extracted":"2025-03-10T14:07:17.933523+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_999","status":"Watchlist"},"time_extracted":"2025-03-10T14:07:17.933276+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_100","status":"Watchlist"},"time_extracted":"2025-03-10T14:07:17.933402+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_101","status":"Watchlist"},"time_extracted":"2025-03-10T14:07:17.933467+00:00"}
{"type":"RECORD","stream":"testsheetkevin","record":{"user_id":"US_102","status":"Watchlist"},"time_extracted":"2025-03-10T14:07:17.933523+00:00"}
Reuben (Matatika)
03/10/2025, 2:41 PMsheets
config with a different range
set for each. Might look something like this:
config:
sheets:
- sheet_id: <sheet ID>
child_sheet_name: testsheetkevin
output_name: Current
range: <Current range>
- sheet_id: <sheet ID>
child_sheet_name: testsheetkevin
output_name: demoted
range: <demoted range>
- sheet_id: <sheet ID>
child_sheet_name: testsheetkevin
output_name: Watchlist
range: <Watchlist range>
stream_maps:
"*":
__key_properties__:
- user_id
- status
user_id: record[__stream_name__]
status: __stream_name__
Kevin Phan
03/11/2025, 2:57 PM- name: extract_google_sheets
namespace: extract_google_sheets
env:
SHEET_ID: "1test"
commands:
extract:
executable: python
args: extract/integrations/extract_google_sheets.py
description: >
Extracts Google Sheets data and stores in a JSONL file.
- name: load_google_sheets_snowflake
namespace: load_google_sheets_snowflake
commands:
load:
executable: python
args: extract/integrations/load_google_sheets_to_snowflake.py
description: >
Loads extracted Google Sheets data from JSONL into Snowflake.
- name: ingest_google_sheets_pc
tasks:
- extract_google_sheets:extract
- load_google_sheets_snowflake:load
This gave me autonomy over what i wanted to happen since the destination table in snowflake was going to be SCD type 2 along with a whole other set of requirements π