Hi all, I started up an instance of Snowflake and ...
# troubleshooting
j
Hi all, I started up an instance of Snowflake and have been attempting to extract from a local MSSQL DB via `tap-mssql --buzzcutnorman`and load it via Meltano's
target-snowflake
. Meltano was able to create the tables in my Warehouse, but when it tries to load data, it fails out. I have previously been testrunning these ingestion activities via the same mssql tap and the
target-jsonl
package and was able to execute without issue into JSONL. Details in 🧵 !
Attached SS is the tables that were created, as well as I can see .json.gz files being created in the root folder. Log file included.
Copy code
plugins:
  extractors:
  - name: tap-mssql
    variant: buzzcutnorman
    pip_url: git+<https://github.com/BuzzCutNorman/tap-mssql.git>
    config:
      dialect: mssql
      driver_type: pyodbc
      host: 127.0.0.1,4433
      port: 1433
      user: abc
      database: xyz
      sqlalchemy_eng_params:
        fast_executemany: 'True'
      sqlalchemy_url_query:
        driver: ODBC Driver 17 for SQL Server
        TrustServerCertificate: yes
  - name: tap-mssql-content
    inherit_from: tap-mssql
    select:
    - Content-ResourceUpdate.*
    - Content-*LocationUpdate.*
  loaders:
  - name: target-snowflake
    variant: meltanolabs
    pip_url: meltanolabs-target-snowflake
    config:
      account: abc
      add_record_metadata: true
      database: TS
      user: MELTANO_USER
      role: MELTANO_ROLE
      warehouse: TS_Warehouse
      default_target_schema: ${MELTANO_EXTRACT__LOAD_SCHEMA} # Default Schema that Meltano, can be adjusted
At the end of the log file I am seeing a lot of
Copy code
2023-07-12T16:02:21.858773Z [info     ] sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 253006: 253006: File doesn't exist: ['C:GitHubETLRedesignmeltano-projectMeltanoV2\target-snowflake--Content-ResourceLocationUpdate-c1b6705a-fbbc-4965-ae9c-3cf50ee17296-1.json.gz'] cmd_type=elb 
consumer=True name=target-snowflake producer=False stdio=stderr string_id=target-snowflake
2023-07-12T16:02:21.859776Z [info     ] [SQL: put '<file://C>:\GitHub\ETLRedesign\meltano-project\MeltanoV2\target-snowflake--Content-ResourceLocationUpdate-c1b6705a-fbbc-4965-ae9c-3cf50ee17296-1.json.gz' '@~/target-snowflake/Content-ResourceLocationUpdate-e3a47dbf-2469-4bda-a986-9c91e1cd1b79'] cmd_type=elb consumer=True name=target-snowflake producer=False stdio=stderr string_id=target-snowflake
2023-07-12T16:02:21.860774Z [info     ] (Background on this error at: <https://sqlalche.me/e/14/f405>) cmd_type=elb consumer=True name=target-snowflake producer=False stdio=stderr string_id=target-snowflake
2023-07-12T16:02:21.861775Z [info     ] """
But I see the JSON.GZ files being created. So I am not too sure how to proceed from here
t
oh interesting… I wonder if that’s a windows thing? I’m surprised a little that it’s writing gzip files out into the main directory. @visch have you seen this? cc @pat_nadolny since it’s using our target-snowflake
v
I haven't seen this myself, I don't use target snowflake though so I"m not sure. There's two things that pop out at me from the log 1.
C:GitHubETLRedesignmeltano-projectMeltanoV2
doesn't have slashes which seems like it could be the issue here 2. Windows can complain if file paths are really really long (I think it's #1 though)
j
For completion's sake I am executing
melatno run tap-mssql-content target-snowflake
v
Copy code
SQL: put '<file://C>:\GitHub\ETLRedesign\meltano-project\MeltanoV2\target-snowflake--Content-ResourceLocationUpdate-c1b6705a-fbbc-4965-ae9c-3cf50ee17296-1.json.gz' '@~/target-snowflake/Content-ResourceLocationUpdate-e3a47dbf-2469-4bda-a986-9c91e1cd1b79']
Makes me question it being #1 though
u
Yeah it looks a lot like a windows path issue. FWIW the target loads data to snowflake by writing a batch of jsonl data to a compressed file locally, then uploads the file to a snowflake internal stage, then copies it from the stage into the table.
v
I will say most of the time I haven't had pathing issues with python libraries things tend to "just work" on windows so take that fwiw!
j
The files are being created no problem. it's just that the slashes missing again when trying to load them up (like Derek mentioned) that are suspect to me but I am not sure if that's a me thing with the filepath being a tad long (?) or if there is something else at play
t
seems like some weird name conforming on the file. I’m poking through the code but nothing’s jumping out at me
v
If there's e2e tests on the repo maybe add in Windows to the matrix for
target-snowflake
and see if you get a replicated error?
b
Looking at the code maybe try adding add a
batch_config
to
tap-snowflake
. Here is an example one to use as a template.
Copy code
loaders:
  - name: target-snowflake
    variant: meltanolabs
    pip_url: meltanolabs-target-snowflake
    config:
      account: abc
      add_record_metadata: true
      database: TS
      user: MELTANO_USER
      role: MELTANO_ROLE
      warehouse: TS_Warehouse
      default_target_schema: ${MELTANO_EXTRACT__LOAD_SCHEMA} # Default Schema that Meltano, can be adjusted
      batch_config:
		encoding:
			format: jsonl
			compression: gzip
		storage:
			root: "<file://c>://GitHub/ETLRedesign/batches"
j
message has been deleted
b = \x08 ... weird escaping somewhere?
p
@BuzzCutNorman I think you're on to something. I was trying to step through this and I cant find where its adding the root to the file path but where ever thats happening is where its prepending the broken part of the path
'C:GitHubETLRedesignmeltano-projectMeltanoV2
. The root input default for the target is current working directory so its generating that part automatically somewhere
b
wait I am a dork you don't need the
"
in around the
root:
location
j
Same error with or without the
"
😞
b
ok give this a try
root: C:\GitHub\ETLRedesign\batches
I just tried this locally with a tap and the file path
c:\development\projects\batches
and it worked. Since
target-snowflake
is leveraging the same functions maybe it will work.
j
If it's any help at all, I have also been trying to escape the slashes in the config file with and without
'
and
"
and still returning the same error code with the
\b
being represented as
\x08
(e.g.)`\x08atches\target-snowflake`
Copy code
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 253006: 253006: File doesn't exist: ['c:GitHubETLRedesign\x08atches\target-snowflake--Content-ResourceUpdate-1dd772ee-39e5-4eca-a5da-b2086dca122a-1.json.gz'
Trying the new root is still giving me the same problem. Is it maybe a python specific version issue? I am currently on 3.10
message has been deleted
b
so
\b
is a backspace so maybe change to
root: C:\GitHub\ETLRedesign\snowballs
j
Something hungry keeps eating up all my slashes but kept the colon haha
u
Have you tried double slashes in there? I have no idea whats going on but its worth a try
j
Variations I have attempted includes
\/
,
\\
. In every case it produces the files but then somewhere eats the slashes and can't find the file anymore
c:GitHubETLRedesign\x08atches\target-snowflake--Content
root: "<file://c>://GitHub//ETLRedesign//batches"
root: "<file://c>:\\GitHub\\ETLRedesign\\batches"
root: "<file://c>:\/GitHub\/ETLRedesign\/batches"
root: c:\\GitHub\\ETLRedesign\\batches
root: "c:\\GitHub\\ETLRedesign\\batches"
u
root: <file://C>:/GitHub/ETLRedesign/batches
can you see what that says?
j
fs.opener.errors.UnsupportedProtocol: protocol '<file' is not supported
u
are you including the
<
or is it adding it automatically? I have
"root": "<file://C>:/GitHub/ETLRedesign/batches"
in my config and its just building the wrong file path for me but the slashes are correct so I thought maybe that would help
j
message has been deleted
That's what I got so far 😞 won't build the files now with that
p
oh no, I'm sorry I just realized a slack plugin that we use is manipulating my messages and adding those
<
I'm meaning to say
"root": "<file://C>:/GitHub/ETLRedesign/batches"
without
<
j
No worries! The files are being created again but now we're seeing the usual
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 253006: 253006: File doesn't exist: ['C:GitHubETLRedesign\x08atches\target-snowflake--Content-ResourceUpdate-1825b227-8fde-4f64-b8ef-35321edac790-1.json.gz']
message has been deleted
p
Unfortunately I'm stumped and I dont have a windows machine to test on. I created https://github.com/MeltanoLabs/target-snowflake/issues/87 to log the issue
v
https://github.com/MeltanoLabs/target-snowflake/pull/89 fixes this Ended up being the slash here was being removed by
sqlalchemy.Text
I used a bound parameter instead and it works. Not sure if we want to expand bound parameters to entire target or not but this works!
Copy code
C:GitHubETLRedesignmeltano-projectMeltanoV2\target-snowflake--Content-ResourceLocationUpdate-c1b6705a-fbbc-4965-ae9c-3cf50ee17296-1.json.gz
u
Following Derek's PR we put out a new release https://meltano.slack.com/archives/C01R0V100KB/p1689267304726929?thread_ts=1689267031.215679&amp;cid=C01R0V100KB. @joshua_janicas let us know if you have a chance to reinstall and tests. Run
meltano install loader target-snowflake --clean
to pull in the latest, assuming you havent pinned your pip_url in your meltano.yml (obvious update that first if you do have it pinned).
j
I can confirm that I was able to load ok!