# plugins-general
farid_akhavan
Hi all, I get an error when trying to load into Snowflake. The problem (as I checked in Snowflake's query history) seems to be that when the lib tries to get the tables in the schema, it wraps the db and schema in quotes (`"`). The query runs okay without those quotes. Has anyone had a similar experience?
douwe_maan
@farid_akhavan Is it possible that `meltano` should be uppercase `MELTANO`? My hunch is that without quotes, it's treated as case-insensitive, and with quotes, the case needs to match. And Snowflake uses ALLCAPS for its objects, afaik.
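A minimal sketch of that resolution rule, emulated in Python for illustration (this is not Snowflake's actual parser):
```python
def resolve_identifier(name: str) -> str:
    """Emulate Snowflake's identifier resolution (illustration only).

    Unquoted identifiers are folded to uppercase before matching, so they
    behave case-insensitively; quoted identifiers must match exactly.
    """
    if name.startswith('"') and name.endswith('"'):
        return name[1:-1]  # quoted: case preserved, exact match required
    return name.upper()    # unquoted: effectively case-insensitive

# meltano and MELTANO resolve to the same object...
assert resolve_identifier("meltano") == resolve_identifier("MELTANO")
# ...but a quoted "meltano" only matches an object named in lowercase
assert resolve_identifier('"meltano"') != resolve_identifier("MELTANO")
```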
farid_akhavan
@douwe_maan You're right, that fixed it. However, I get another error now; it seems to be the same one I get when I try to load into Postgres. It looks like the table is created with the column `_SMART_SOURCE_BUCKET` as not nullable, but when it tries to copy the rows from the temp table to the main table, it fails because the column `_SMART_SOURCE_BUCKET` is null. Needless to say, I load the csv file using `tap-spreadsheets-anywhere` from a local directory, so I'm not sure the source bucket field should even apply.
douwe_maan
That corresponds to the `path` you specified in your config for the table in question
Can you share your config for the tap?
farid_akhavan
I can send the whole `meltano.yml` file if you want, but this is the part related to the extractor tap:
```yaml
  - name: tap-spreadsheets-anywhere
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    config:
      tables:
      - path: file:///Users/farid/WorkSpace/meltano-projects/meltano-projects/poc_local_to_postgres/input_directory
        name: customer_s3_uploads
        pattern: .*\.csv
        start_date: '2017-05-01T00:00:00Z'
        key_properties: []
        format: csv
```
douwe_maan
Hmm 😕 If you run `meltano elt` in debug mode (https://meltano.com/docs/command-line-interface.html#debugging), is `_smart_source_bucket` actually unset in the `RECORD` messages (meaning the tap is at fault), or is it there, but getting lost somewhere in target-snowflake?
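For reference, Singer taps print one JSON message per line on stdout; in the debug output you'd be looking for something like this (shown as a Python dict; the csv columns and path value are made up):
```python
# A healthy RECORD message would carry the tap's metadata column alongside
# the csv columns (the csv field names below are hypothetical):
record_message = {
    "type": "RECORD",
    "stream": "customer_s3_uploads",
    "record": {
        "id": 1,                              # parsed from the csv
        "name": "alice",                      # parsed from the csv
        "_smart_source_bucket": "file:///…",  # metadata added by the tap
    },
}
# If "_smart_source_bucket" is missing from "record", the tap is at fault.
```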
farid_akhavan
I think these are the record messages, right? They seem to only contain what's in the csv…
douwe_maan
Interesting! That's a bug in the tap then, because at a glance it looks like https://github.com/ets/tap-spreadsheets-anywhere/blob/cc03db106f4fe81fef7696c0743022af5e511a65/tap_spreadsheets_anywhere/file_utils.py#L38 inserts that metadata into `record`, just like the metadata schema is inserted into the record schema here: https://github.com/ets/tap-spreadsheets-anywhere/blob/c92a153a85551b47e715e9348b02bca79d90bca0/tap_spreadsheets_anywhere/__init__.py#L64. But that's not actually what `transformer.transform` does with its `metadata` argument: https://github.com/singer-io/singer-python/blob/2f618c6194f154be7ba37f288d456f6abd4d78ec/singer/transform.py#L148, which is only used to find out if a specific column should be selected or not: https://github.com/singer-io/singer-python/blob/2f618c6194f154be7ba37f288d456f6abd4d78ec/singer/transform.py#L128
So the tap is currently claiming that each row will have non-null `_smart_<etc>` fields, but they never actually end up in the `RECORD` messages
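Concretely, here's roughly how singer-python's transformer treats that argument: it only consults the stream/property metadata to filter fields, and never merges values into the record. A small runnable sketch (the record, schema, and metadata entries are invented for illustration):
```python
import singer
from singer import metadata

schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "_smart_source_bucket": {"type": "string"},  # advertised as non-null
    },
}
record = {"id": 1}  # what the tap actually emits: csv columns only

# Stream/property metadata in the form transform() expects: a selection
# map keyed by breadcrumb, not extra values to merge into the record.
mdata = metadata.to_map([
    {"breadcrumb": [], "metadata": {"selected": True}},
    {"breadcrumb": ["properties", "id"],
     "metadata": {"inclusion": "available", "selected": True}},
    {"breadcrumb": ["properties", "_smart_source_bucket"],
     "metadata": {"inclusion": "available", "selected": True}},
])

with singer.Transformer() as transformer:
    out = transformer.transform(record, schema, mdata)

# transform() only used mdata to decide which fields to keep; it added
# nothing, so the metadata column is still missing from the output.
assert out == {"id": 1}
```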
@farid_akhavan Can you please file an issue for that on that repo?
To fix, we should merge `metadata` into `record` again, and the actual stream-specific metadata that `transform.transform` wants to determine which fields to include should come from the `catalog` here: https://github.com/ets/tap-spreadsheets-anywhere/blob/2df74eb81ce8eee555b7a01de4813c8f48de4e2b/tap_spreadsheets_anywhere/__init__.py#L94 and be passed to `write_file` here: https://github.com/ets/tap-spreadsheets-anywhere/blob/2df74eb81ce8eee555b7a01de4813c8f48de4e2b/tap_spreadsheets_anywhere/__init__.py#L114
farid_akhavan
Sure thing man, on the tap-spreadsheets-anywhere repo, right?
douwe_maan
yessir
If you just want to get past this in your local project, the easiest option is to change https://github.com/ets/tap-spreadsheets-anywhere/blob/c92a153a85551b47e715e9348b02bca79d90bca0/tap_spreadsheets_anywhere/file_utils.py#L38 to:
```python
transformed_record = transformer.transform({**record, **metadata}, schema)
```
But that's not a complete fix, because we should also be passing the real stream metadata to `transformer.transform`. It's a start though!
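A hedged sketch of what that fuller fix could look like; the function and variable names here are assumptions for illustration, not the tap's actual code:
```python
from singer import metadata as singer_metadata

def transform_record(transformer, record, metadata_cols, schema, catalog_entry):
    """Combine both fixes: merge the tap's metadata columns back into the
    record, and hand the transformer the catalog's stream metadata."""
    # metadata_cols: the _smart_* values the tap computes per file
    # (previously merged straight into the record).
    merged = {**record, **metadata_cols}
    # The metadata argument should carry the catalog's stream/property
    # metadata, which is what transform() uses for field selection.
    stream_metadata = singer_metadata.to_map(catalog_entry.metadata)
    return transformer.transform(merged, schema, stream_metadata)
```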
farid_akhavan
I raised the issue. I think ideally the transform method should change to add the metadata into the record, right? @douwe_maan
douwe_maan
@farid_akhavan No, the `transform` method is working correctly, but it's talking about a different type of metadata. There are metadata columns that the tap outputs (the ones causing the issue), and there's metadata about streams and properties that is fed to the tap using a catalog file, so that you can say "I want this property, but not that one". `transform` deals with the latter type, but the tap's current implementation thinks it means the former, which is the bug
I'll have a look at the issue to see if I've got anything to add
michael_cooper
It's interesting that `transform` does not break for us. We're using this tap in production now and have never had an issue with it yet. Does it have to do with the source, by chance? We haven't tried it with local files, but we are using it to ingest GCS csv data just fine.
douwe_maan
@michael_cooper Are you still seeing the `_smart_*` fields being populated in the destination? You may be using a target that doesn't care that the fields were advertised as non-nullable, but are omitted from the `RECORD`s anyway
The problem is that before moving to the transformer, the metadata was getting merged into the record: https://github.com/ets/tap-spreadsheets-anywhere/commit/a59fdfd63c86cefb6abfb767f7d1cb216be65c6c#diff-7f8274d0c49665bb13a[…]a352f18890c7d2532c32cL35, and now it no longer is, since that's not what `transformer.transform` uses its `metadata` argument for
michael_cooper
Those are fields in the destination, though all of them are null. This is `target-bigquery`, for reference.
douwe_maan
All right, I've noticed before that `target-bigquery` does not (always?) listen to the tap when it comes to nullability
michael_cooper
Also, thank you for pointing that out in regards to `transform`. I didn't realize that the `metadata` argument worked that way.
douwe_maan
So the bug is that the SCHEMA still advertises those `_smart_` fields as non-nullable, but they actually no longer show up in the `RECORD`s. target-bigquery not caring is a happy coincidence for you 🙂
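In Singer terms, the mismatch looks something like this (a sketch as Python dicts; the stream name is from this thread, and the other fields are trimmed or invented):
```python
# What the tap advertises: no "null" in the type means non-nullable.
schema_message = {
    "type": "SCHEMA",
    "stream": "customer_s3_uploads",
    "schema": {
        "type": "object",
        "properties": {
            "_smart_source_bucket": {"type": "string"},  # non-nullable
        },
    },
    "key_properties": [],
}

# What the tap then emits: the metadata column never appears, so a strict
# target sees a null in a column it was told would never be null.
record_message = {
    "type": "RECORD",
    "stream": "customer_s3_uploads",
    "record": {"id": 1, "name": "alice"},
}
```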
yeah, it's not quite obvious that it uses `metadata` in a totally different way
farid_akhavan
@nick_hamlin I just saw your message in a previous thread. Any chance you had the same issue as mine above?
nick_hamlin
Sounds like there might be some similar symptoms, @farid_akhavan, but I don't suspect the same root cause. I'm using `pipelinewise-target-redshift`, not `target-bigquery`
farid_akhavan
@nick_hamlin The problem seems to be inside `tap-spreadsheets-anywhere`, not `target-bigquery`. Any chance you've been using that tap to extract from local files?
nick_hamlin
Unfortunately no - closest I've come to that is a little poking around `tap-google-sheets`, but I haven't gotten through setting up my credentials there yet
I've mostly been using `tap-mysql` and `tap-zendesk` so far, both of which had a few quirks in getting set up, but they seem to be running just fine now that I've ironed them out
farid_akhavan
Interesting, it was probably related to something else then. CC: @douwe_maan
douwe_maan
@farid_akhavan Sorry, what specific issues are we talking about now? 😄