# troubleshooting
i
Whenever I run my tap to target-snowflake, I get all these files in my project root. What's causing this behavior?
j
With Snowflake, if you look at the logs you will see that the files are generated and sent to a temp Internal Stage; once that completes, a COPY from that stage is run. This is optimal for loading data into Snowflake. Where I see this go wrong is that the files stay locally unless you are using batching, where the files are removed by setting clean_up_batch_files to true.
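In meltano.yml that toggle sits under the loader's config, something along these lines (the plugin name and variant are just examples; everything except clean_up_batch_files is a placeholder):

```yaml
# meltano.yml (sketch): only clean_up_batch_files is the setting being discussed here
plugins:
  loaders:
    - name: target-snowflake
      variant: meltanolabs          # placeholder; use whichever variant you installed
      config:
        clean_up_batch_files: true  # remove local batch files once they have been loaded
```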
👍 1
I also think that you should be able to select a Stage to use if you want to keep the batches for Data Lake or recovery purposes.
i
See, this is where I'm confused, though. If the clean_up_batch_files setting defaults to true, shouldn't it be cleaning up those batch files?
And I also set it to true in my config to double check.
j
Only if you are using batching. The default insert behavior is not batch but bulk; they act similarly due to how bulk inserts work within Snowflake.
With bulk insert you will see no clean-up within that method; with batch insert you will see a clean-up block within that method. This should be an easy fix, I may put in a PR for it in a couple of weeks when I have time. I broke that cardinal rule and shimmed it for my use case.
i
So is the fix to use batching? How do I use batching instead?
Do incremental replications (like when it's using a replication_key with state) bulk insert by default?
j
No, those are two different things. For batching you need to add batch_config --> docs
I would take some time to familiarize yourself with the Meltano SDK and dig into the tap / tgt repos to see how they work. The docs on the tap/tgt are light, leaving you to fill in a number of gaps.
In yaml, as an example:
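Something along these lines (the keys follow the SDK's encoding/storage layout; the format, root, and prefix below are placeholders, and double check in the docs above whether it belongs on the tap or the target side for your setup):

```yaml
# meltano.yml (sketch): enable batching via the SDK's batch_config
plugins:
  loaders:
    - name: target-snowflake
      config:
        batch_config:
          encoding:
            format: jsonl            # batch file format
            compression: gzip
          storage:
            root: "file://batches"   # where batch files are written
            prefix: "batch-"         # filename prefix for each batch file
```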
🙌 1
i
So this is only achievable once I've set a state backend, correct?
j
nope
State and batching are two separate things, meaning you do not need state to batch, or to do incremental, etc. Batching is more like pagination in that it will chunk up the stream into more consumable pieces.
i
Alright, that makes sense. What I was referring to is whether batching needs to be pointed at a storage account, similar to the state backend, when it's deployed to a prod environment. So will I just add that batch_config to my target config?
And could that storage root be pointed at a blob storage container?
j
Yes. I would dig into the SDK docs, as the Meltano docs and tap/tgt docs are light on this. In order to get S3 to work you have to add it as a dep to the pip_url; for S3 I am not sure if it is using Boto3 or something else.
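For example, something like this, assuming the SDK's batch storage goes through PyFilesystem and fs-s3fs is the S3 backend (verify against the repo; the bucket name is a placeholder):

```yaml
# meltano.yml (sketch): add an S3 filesystem dep next to the target,
# then point the batch storage root at a bucket
plugins:
  loaders:
    - name: target-snowflake
      pip_url: meltanolabs-target-snowflake fs-s3fs   # assumption: fs-s3fs supplies s3:// support
      config:
        batch_config:
          storage:
            root: "s3://my-bucket/meltano-batches"    # hypothetical bucket/prefix
```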
🙌 1
i
Gotcha - I'll take a look. And if I just set the batch_config in my Snowflake target config (without pointing them to blob store), the batch files will land in, and be cleaned up from, $MELTANO_PROJECT_ROOT as they are right now?
j
Tgt default: if you did not change storage.root then yes, as file:// is relative.
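And if you want them out of the project root, you should be able to point storage.root at an absolute path instead (the path below is just a placeholder):

```yaml
# sketch: an absolute storage root keeps batch files out of the project directory
batch_config:
  storage:
    root: "file:///var/tmp/meltano-batches"
```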
i
thank you very much
np 1
j
https://github.com/MeltanoLabs/target-snowflake/issues/95 - not sure if this is relevant at all, because I was encountering the same problem. I never saw this problem when working within my Linux container.
j
Possibly. I am going to be running it in Docker shortly; I will pass on my results.
i
Interesting. So I guess use a Linux distro for my container? lol
j
I use Debian (?) without much issue.
I think it had to do with how the file path is being constructed.
And it differs on Windows vs Linux.
i
Interesting - thank you!