# troubleshooting
i
Whenever I run my tap to target-snowflake, I get all these files in my project root. What's causing this behavior?
j
With Snowflake, if you look at the logs you will see that the files are generated and sent to a temp Internal Stage; once that completes, a COPY from that stage is run. This is optimal for loading data into Snowflake. Where I see this go wrong is that the files stay locally unless you are using batching, where the files are removed by setting clean_up_batch_files to true.
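In meltano.yml that toggle sits under the loader's config, something along these lines (the plugin name and variant are just examples; everything except clean_up_batch_files is a placeholder):

```yaml
# meltano.yml (sketch): only clean_up_batch_files is the setting being discussed here
plugins:
  loaders:
    - name: target-snowflake
      variant: meltanolabs          # placeholder; use whichever variant you installed
      config:
        clean_up_batch_files: true  # remove local batch files once they have been loaded
```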
👍 1
I also think that you should be able to select a Stage to use if you want to keep the batches for Data Lake or recovery purposes.
i
See, this is where I'm confused, though. If the clean_up_batch_files setting defaults to true, shouldn't it be cleaning up those batch files?
And I also set it to true in my config to double check.
j
Only if you are using batching. The default insert behavior is not batch but bulk; they act similarly due to how bulk inserts work within Snowflake.
With bulk insert you will see no clean-up within that method; with batch insert you will see a clean-up block within that method. This should be an easy fix, I may put in a PR for it in a couple of weeks when I have time. I broke that cardinal rule and shimmed it for my use case.
i
So is the fix to use batching? How do I use batching instead?
Do incremental replications (like when it's using a replication_key with state) bulk insert by default?
j
No, those are two different things. For batching you need to add batch_config --> docs
I would take some time to familiarize yourself with the Meltano SDK and dig into the tap / tgt repos to see how they work. The docs on the tap/tgt are light, leaving you to fill in a number of gaps.
In yaml, as an example:
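Something along these lines (the keys follow the SDK's encoding/storage layout; the format, root, and prefix below are placeholders, and double check in the docs above whether it belongs on the tap or the target side for your setup):

```yaml
# meltano.yml (sketch): enable batching via the SDK's batch_config
plugins:
  loaders:
    - name: target-snowflake
      config:
        batch_config:
          encoding:
            format: jsonl            # batch file format
            compression: gzip
          storage:
            root: "file://batches"   # where batch files are written
            prefix: "batch-"         # filename prefix for each batch file
```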
🙌 1
i
So this is only achievable once I've set a state backend, correct?
j
nope
State and batching are two separate things, meaning you do not need state to batch, or to do incremental, etc. Batching is more like pagination in that it will chunk up the stream into more consumable pieces.
i
Alright, that makes sense. What I was referring to is whether batching needs to be pointed at a storage account, similar to the state backend, when it's deployed to a prod environment. So will I just add that batch_config to my target config?
And could that storage root be pointed at a blob storage container?
j
Yes. I would dig into the SDK docs, as the Meltano docs and tap/tgt docs are light on this. In order to get S3 to work you have to add it as a dep to the pip_url; for S3 I am not sure if it is using Boto3 or something else.
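For example, something like this, assuming the SDK's batch storage goes through PyFilesystem and fs-s3fs is the S3 backend (verify against the repo; the bucket name is a placeholder):

```yaml
# meltano.yml (sketch): add an S3 filesystem dep next to the target,
# then point the batch storage root at a bucket
plugins:
  loaders:
    - name: target-snowflake
      pip_url: meltanolabs-target-snowflake fs-s3fs   # assumption: fs-s3fs supplies s3:// support
      config:
        batch_config:
          storage:
            root: "s3://my-bucket/meltano-batches"    # hypothetical bucket/prefix
```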
🙌 1
i
Gotcha - I'll take a look. And if I just set the batch_config in my Snowflake target config (without pointing them to blob store), the batch files will land in, and be cleaned up from, $MELTANO_PROJECT_ROOT as they are right now?
j
Tgt default: if you did not change storage.root then yes, as file:// is relative.
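And if you want them out of the project root, you should be able to point storage.root at an absolute path instead (the path below is just a placeholder):

```yaml
# sketch: an absolute storage root keeps batch files out of the project directory
batch_config:
  storage:
    root: "file:///var/tmp/meltano-batches"
```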
i
thank you very much
np 1
j
https://github.com/MeltanoLabs/target-snowflake/issues/95 - not sure if this is relevant at all, because I was encountering the same problem. I never saw this problem when working within my Linux container.
j
Possibly. I am going to be running it in Docker shortly; I will pass on my results.
i
Interesting. So I guess use a Linux distro for my container? lol
j
I use Debian (?) without much issue.
I think it had to do with how the file path is being constructed.
And it differs on Windows vs Linux.
i
Interesting - thank you!