# plugins-general
farid_akhavan
Hi all, I get an error when trying to load into Snowflake. The problem (as I checked in Snowflake's query history) seems to be that when the lib tries to get the tables in the schema, it wraps the db and schema in quotes (`"`). The query runs okay without those quotes. Has anyone had a similar experience?
douwe_maan
@farid_akhavan Is it possible that `meltano` should be uppercase `MELTANO`? My hunch is that without quotes, it's treated as case-insensitive, and with quotes, the case needs to match. And Snowflake uses ALLCAPS for its objects, afaik.
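A minimal sketch of that resolution rule, emulated in Python for illustration (this is not Snowflake's actual parser):
```python
def resolve_identifier(name: str) -> str:
    """Emulate Snowflake's identifier resolution (illustration only).

    Unquoted identifiers are folded to uppercase before matching, so they
    behave case-insensitively; quoted identifiers must match exactly.
    """
    if name.startswith('"') and name.endswith('"'):
        return name[1:-1]  # quoted: case preserved, exact match required
    return name.upper()    # unquoted: effectively case-insensitive

# meltano and MELTANO resolve to the same object...
assert resolve_identifier("meltano") == resolve_identifier("MELTANO")
# ...but a quoted "meltano" only matches an object named in lowercase
assert resolve_identifier('"meltano"') != resolve_identifier("MELTANO")
```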
farid_akhavan
@douwe_maan You're right, that fixed it. However, I get another error now; it seems to be the same one I get when I try to load into Postgres. It looks like the table is created with the column `_SMART_SOURCE_BUCKET` as not nullable, but when it tries to copy the rows from the temp table to the main table, it fails because the column `_SMART_SOURCE_BUCKET` is null. Needless to say, I load the csv file using `tap-spreadsheets-anywhere` from a local directory, so I'm not sure the source bucket field should even apply.
douwe_maan
That corresponds to the `path` you specified in your config for the table in question
Can you share your config for the tap?
farid_akhavan
I can send the whole `meltano.yml` file if you want, but this is the part related to the extractor tap:
```yaml
  - name: tap-spreadsheets-anywhere
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    config:
      tables:
      - path: file:///Users/farid/WorkSpace/meltano-projects/meltano-projects/poc_local_to_postgres/input_directory
        name: customer_s3_uploads
        pattern: .*\.csv
        start_date: '2017-05-01T00:00:00Z'
        key_properties: []
        format: csv
```
douwe_maan
Hmm 😕 If you run `meltano elt` in debug mode (https://meltano.com/docs/command-line-interface.html#debugging), is `_smart_source_bucket` actually unset in the `RECORD` messages (meaning the tap is at fault), or is it there, but getting lost somewhere in target-snowflake?
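For reference, Singer taps print one JSON message per line on stdout; in the debug output you'd be looking for something like this (shown as a Python dict; the csv columns and path value are made up):
```python
# A healthy RECORD message would carry the tap's metadata column alongside
# the csv columns (the csv field names below are hypothetical):
record_message = {
    "type": "RECORD",
    "stream": "customer_s3_uploads",
    "record": {
        "id": 1,                              # parsed from the csv
        "name": "alice",                      # parsed from the csv
        "_smart_source_bucket": "file:///…",  # metadata added by the tap
    },
}
# If "_smart_source_bucket" is missing from "record", the tap is at fault.
```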
farid_akhavan
I think these are the record messages, right? They seem to only contain what's in the csv…
douwe_maan
Interesting! That's a bug in the tap then, because at a glance it looks like https://github.com/ets/tap-spreadsheets-anywhere/blob/cc03db106f4fe81fef7696c0743022af5e511a65/tap_spreadsheets_anywhere/file_utils.py#L38 inserts that metadata into `record`, just like the metadata schema is inserted into the record schema here: https://github.com/ets/tap-spreadsheets-anywhere/blob/c92a153a85551b47e715e9348b02bca79d90bca0/tap_spreadsheets_anywhere/__init__.py#L64. But that's not actually what `transformer.transform` does with its `metadata` argument: https://github.com/singer-io/singer-python/blob/2f618c6194f154be7ba37f288d456f6abd4d78ec/singer/transform.py#L148, which is only used to find out if a specific column should be selected or not: https://github.com/singer-io/singer-python/blob/2f618c6194f154be7ba37f288d456f6abd4d78ec/singer/transform.py#L128
So the tap is currently claiming that each row will have non-null `_smart_<etc>` fields, but they never actually end up in the `RECORD` messages
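Concretely, here's roughly how singer-python's transformer treats that argument: it only consults the stream/property metadata to filter fields, and never merges values into the record. A small runnable sketch (the record, schema, and metadata entries are invented for illustration):
```python
import singer
from singer import metadata

schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "_smart_source_bucket": {"type": "string"},  # advertised as non-null
    },
}
record = {"id": 1}  # what the tap actually emits: csv columns only

# Stream/property metadata in the form transform() expects: a selection
# map keyed by breadcrumb, not extra values to merge into the record.
mdata = metadata.to_map([
    {"breadcrumb": [], "metadata": {"selected": True}},
    {"breadcrumb": ["properties", "id"],
     "metadata": {"inclusion": "available", "selected": True}},
    {"breadcrumb": ["properties", "_smart_source_bucket"],
     "metadata": {"inclusion": "available", "selected": True}},
])

with singer.Transformer() as transformer:
    out = transformer.transform(record, schema, mdata)

# transform() only used mdata to decide which fields to keep; it added
# nothing, so the metadata column is still missing from the output.
assert out == {"id": 1}
```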
@farid_akhavan Can you please file an issue for that on that repo?
To fix, we should merge `metadata` into `record` again, and the actual stream-specific metadata that `transform.transform` wants to determine which fields to include should come from the `catalog` here: https://github.com/ets/tap-spreadsheets-anywhere/blob/2df74eb81ce8eee555b7a01de4813c8f48de4e2b/tap_spreadsheets_anywhere/__init__.py#L94 and be passed to `write_file` here: https://github.com/ets/tap-spreadsheets-anywhere/blob/2df74eb81ce8eee555b7a01de4813c8f48de4e2b/tap_spreadsheets_anywhere/__init__.py#L114
farid_akhavan
Sure thing man, on the tap-spreadsheets-anywhere repo, right?
douwe_maan
yessir
If you just want to get past this in your local project, the easiest option is to change https://github.com/ets/tap-spreadsheets-anywhere/blob/c92a153a85551b47e715e9348b02bca79d90bca0/tap_spreadsheets_anywhere/file_utils.py#L38 to:
```python
transformed_record = transformer.transform({**record, **metadata}, schema)
```
But that's not a complete fix, because we should also be passing the real stream metadata to `transformer.transform`. It's a start though!
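A hedged sketch of what that fuller fix could look like; the function and variable names here are assumptions for illustration, not the tap's actual code:
```python
from singer import metadata as singer_metadata

def transform_record(transformer, record, metadata_cols, schema, catalog_entry):
    """Combine both fixes: merge the tap's metadata columns back into the
    record, and hand the transformer the catalog's stream metadata."""
    # metadata_cols: the _smart_* values the tap computes per file
    # (previously merged straight into the record).
    merged = {**record, **metadata_cols}
    # The metadata argument should carry the catalog's stream/property
    # metadata, which is what transform() uses for field selection.
    stream_metadata = singer_metadata.to_map(catalog_entry.metadata)
    return transformer.transform(merged, schema, stream_metadata)
```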
farid_akhavan
I raised the issue. I think ideally the transform method should change to add the metadata into the record, right? @douwe_maan
douwe_maan
@farid_akhavan No, the `transform` method is working correctly, but it's talking about a different type of metadata. There are metadata columns that the tap outputs (the ones causing the issue), and there's metadata about streams and properties that is fed to the tap using a catalog file, so that you can say "I want this property, but not that one". `transform` deals with the latter type, but the tap's current implementation thinks it means the former, which is the bug
I'll have a look at the issue to see if I've got anything to add
michael_cooper
It's interesting that `transform` does not break for us. We're using this tap in production now and have never had an issue with it yet. Does it have to do with the source, by chance? We haven't tried it with local files, but we are using it to ingest GCS csv data just fine.
douwe_maan
@michael_cooper Are you still seeing the `_smart_*` fields being populated in the destination? You may be using a target that doesn't care that the fields were advertised as non-nullable, but are omitted from the `RECORD`s anyway
The problem is that before moving to the transformer, the metadata was getting merged into the record: https://github.com/ets/tap-spreadsheets-anywhere/commit/a59fdfd63c86cefb6abfb767f7d1cb216be65c6c#diff-7f8274d0c49665bb13a[…]a352f18890c7d2532c32cL35, and now it no longer is, since that's not what `transformer.transform` uses its `metadata` argument for
michael_cooper
Those are fields in the destination, though all of them are null. This is `target-bigquery`, for reference.
douwe_maan
All right, I've noticed before that `target-bigquery` does not (always?) listen to the tap when it comes to nullability
michael_cooper
Also, thank you for pointing that out in regards to `transform`. I didn't realize that the `metadata` argument worked that way.
douwe_maan
So the bug is that the SCHEMA still advertises those `_smart_` fields as non-nullable, but they actually no longer show up in the `RECORD`s. target-bigquery not caring is a happy coincidence for you 🙂
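In Singer terms, the mismatch looks something like this (a sketch as Python dicts; the stream name is from this thread, and the other fields are trimmed or invented):
```python
# What the tap advertises: no "null" in the type means non-nullable.
schema_message = {
    "type": "SCHEMA",
    "stream": "customer_s3_uploads",
    "schema": {
        "type": "object",
        "properties": {
            "_smart_source_bucket": {"type": "string"},  # non-nullable
        },
    },
    "key_properties": [],
}

# What the tap then emits: the metadata column never appears, so a strict
# target sees a null in a column it was told would never be null.
record_message = {
    "type": "RECORD",
    "stream": "customer_s3_uploads",
    "record": {"id": 1, "name": "alice"},
}
```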
yeah, it's not quite obvious that it uses `metadata` in a totally different way
farid_akhavan
@nick_hamlin I just saw your message in a previous thread. Any chance you had the same issue as mine above?
nick_hamlin
Sounds like there might be some similar symptoms, @farid_akhavan, but I don't suspect the same root cause. I'm using `pipelinewise-target-redshift`, not `target-bigquery`
farid_akhavan
@nick_hamlin The problem seems to be inside `tap-spreadsheets-anywhere`, not `target-bigquery`. Any chance you've been using that tap to extract from local files?
nick_hamlin
Unfortunately no - closest I've come to that is a little poking around `tap-google-sheets`, but I haven't gotten through setting up my credentials there yet
I've mostly been using `tap-mysql` and `tap-zendesk` so far, both of which had a few quirks in getting set up, but they seem to be running just fine now that I've ironed them out
farid_akhavan
Interesting, it was probably related to something else then. CC: @douwe_maan
douwe_maan
@farid_akhavan Sorry, what specific issues are we talking about now? 😄