I am trying to use tap-google-sheets to load the g...
# troubleshooting
s
I am trying to use tap-google-sheets to load the googlesheets data to postgresql database. I am having a strange issue. When I use
meltano run tap-google-sheets target-jsonl
, all the selected fields show up in the jsonl file. but if I use
meltano run tap-google-sheets target-postgres
, some of the selected fields are missing in the target table.
a
Can you give any more info? Which fields are missing from postgres? Do they have anything in common?
s
random fields are missing from the postgresql table.
here is the table created:
Copy code
CREATE TABLE staging.sheet1 (
	"__sdc_row" numeric NOT NULL,
	"__sdc_sheet_id" numeric NULL,
	"__sdc_spreadsheet_id" varchar NULL,
	"_sdc_batched_at" timestamp NULL,
	"_sdc_deleted_at" varchar NULL,
	"_sdc_extracted_at" timestamp NULL,
	field_a varchar NULL,
	field_b varchar NULL,
	field_c varchar NULL,
	field_d varchar NULL,
	CONSTRAINT sheet1_pkey PRIMARY KEY (__sdc_row)
);
CREATE INDEX i_sheet1__sdc_deleted_at ON staging.sheet1 USING btree (_sdc_deleted_at);
here are the selected fields from log:
Copy code
2023-03-17T05:22:30.241635Z [info     ] INFO Stream: Sheet1, selected_fields: ['__sdc_spreadsheet_id', '__sdc_sheet_id', '__sdc_row', 'id', 'field_a', 'field_b', 'field_c', 'field_d'] cmd_type=elb consumer=False name=tap-google-sheets producer=True stdio=stderr string_id=tap-google-sheets
you can see id field is missing in the table.
here is the jsonl file:
Copy code
{"__sdc_spreadsheet_id": "1Blj-u4-vcMtRoh2obHBBC3b5eBjkCtSDC4NbsmHDrw4", "__sdc_sheet_id": 0, "__sdc_row": 2, "id": "1", "field_a": "a1", "field_b": "b1", "field_c": "c1", "field_d": "d1"}
{"__sdc_spreadsheet_id": "1Blj-u4-vcMtRoh2obHBBC3b5eBjkCtSDC4NbsmHDrw4", "__sdc_sheet_id": 0, "__sdc_row": 3, "id": "2", "field_a": "a2", "field_b": "b2", "field_c": "c2", "field_d": "d2"}
{"__sdc_spreadsheet_id": "1Blj-u4-vcMtRoh2obHBBC3b5eBjkCtSDC4NbsmHDrw4", "__sdc_sheet_id": 0, "__sdc_row": 4, "id": "3", "field_a": "a3", "field_b": "b3", "field_c": "c3", "field_d": "d3"}
{"__sdc_spreadsheet_id": "1Blj-u4-vcMtRoh2obHBBC3b5eBjkCtSDC4NbsmHDrw4", "__sdc_sheet_id": 0, "__sdc_row": 5, "id": "4", "field_a": "a4", "field_b": "b4", "field_c": "c4", "field_d": "d4"}
id field is in each record.
a
Can you share the
meltano.yaml
for those two taps? The implementation will be different for each tap, which explains why some
_sdc
fields are present in postgres but not in jsonl, that's up to the tap developer's implementation as to what gets added to the output. Of the non-sdc fields, it looks like id is missing from postgres which is odd. Is
id
actually a column in your sheet, or just the row number?
s
id is a column in the sheet, and it is missing, _sdc fields are not critical.
here is the postgres one:
Copy code
- name: target-postgres-application
    inherit_from: target-postgres
    config:
Here is the jsonl one:
Copy code
- name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
here is the target-postgres:
Copy code
- name: target-postgres
    variant: transferwise
    pip_url: pipelinewise-target-postgres
a
Can you try renaming the id column? Or recording columns to put id at the end? Does the id column always disappear?
s
yes, I did renamed the id column, the new column was not loaded too. I put it at the end, same.
jsonl is always working fine.
a
Have you tried wiping out the tables on postgres completely and then forcing
--full-refresh
when running?
s
yes, I did.
@Andy Carter, Hi Andy, I am new to Meltano, what do you think the issue is from? Meltano framework, Tap-google-sheets or Target-postgres? is there any way to debug this?
a
Hard to say but the fact that jsonl comes out ok maybe points to postgres. Could you share the full meltano.yml with the config blocks too? WHat happens if you create an entirely new sheet without an
id
column? Do all columns load ok?
I wonder whether the postgres issue is that you are trying to create a column called
id
but it is not the primary key, perhaps that is forbidden? Can you try to explicitly set
id
as the key for the plugin? As always drop the table in postgres before rebuilding
Copy code
metadata:
      Sheet1:
        key-properties: id
s
no, I created a new sheet, and have the same problem.
Here is the config in meltano.yml:
Copy code
plugins:
  extractors:
  - name: tap-google-sheets
    variant: singer-io
    pip_url: git+<https://github.com/singer-io/tap-google-sheets.git>
    config:
      client_id: {client id}
      spreadsheet_id: {sheet id}
      start_date: '2022-01-01T00:00:00Z'
  loaders:
  - name: target-postgres
    variant: transferwise
    pip_url: pipelinewise-target-postgres
    config:
      host: ${DATA_WAREHOUSE_HOST}
      port: ${DATA_WAREHOUSE_PORT}
      user: ${DATA_WAREHOUSE_USER_NAME}
      dbname: ${DATA_WAREHOUSE_DB_NAME}
a
If you've tried the
metadata
approach of setting the PK manually, then I'm all out of ideas. I would raise the issue over at the
target-postgres
repo
s
after I used metadata, it is working now, but I got another issue. when I added a new column to a sheet, and re-ran the command using full-refresh option, the new column will not be loaded, even using target-jsonl. I also tried to remove the state, and still it is not working. what else do I need to test? If I copied the google sheet to a new one, it is working again. I don't understand where this tap caches the googlesheet metadata.
by the way, where can I find the document for
metadata
? I would like to know what are the detail configuration in it.
a
I don't think meltano can alter a SQL table once created so for a new column to appear you might need to drop the table from the db. When you create a new Google sheet that is seen as a new stream and therefore a new table is created in the db matching the new schema
More metadata info here: #
s
I added a new column in the googlesheet, target-jsonl is not working too, the new column is not showing in the jsonl file. if we can't load from the same sheet, it will be not convenient.
only I copied the sheet and it works by running with new sheet.
where
key-properties
is documented? I didn't see it.
d
For the key-properties docs: https://hub.meltano.com/singer/spec/#metadata, though looks like in the docs its been renamed to table-key-properties Also if you are still running into your missing columns issue using the Matatika variant of tap-google-sheets I get new columns added (to my postgres db) when I add them to my google sheet. I just updated the tap to have a built in key-properties setting to let you set your primary keys as well. https://hub.meltano.com/extractors/tap-google-sheets--matatika/