janis_puris
06/26/2023, 9:06 PMtap-oracle target-snowflake
I was thinking of modifying the tap-oracle
to basically run up to N amount of records and call it done, then have the job run in loop until it no longer hits the set "max/limit".
---
Currently to my understanding, the tap-oracle
will do its best to download all of 300M rows at once, then serialise into JSONL and pass it to the target-snowflake
that supports batching i.e. 100k rows (S3 -> Stage -> MERGE INTO) until done.
The problem with this is that
1. It takes a long time to download the data from the tap
a. If this fails, one needs to do it all again
2. Compute/time to serialise the 300M record messages into JSONL
3. It would take a long time to then load the data into target
a. If any batch fails for whatever reason, again.. we need to do everything from beginning, because state is recorded only after a successful job (all or nothing)
---
Questions
1. What do you think of the proposed approach?
2. Is this tried and failed method?
3. There are some talks from late last year about SDK supporting some sort of "batch" mode. Is this something similar to what Stitch Data has been doing for long time or something different? Is there any info on how this may look like for the taps/targets created by SDK?
P.S. I can always transfer the initial "load" manually and then hack the meltano state, but I very much would prefer not to need to figure out how to do this with Oracle and batch it in a way Snowflake will spend massive amounts of compute time to digest that.luke_rodgers
06/26/2023, 10:52 PMkeshav_narayan_balot
06/27/2023, 6:16 AMjanis_puris
06/27/2023, 6:55 AMjanis_puris
06/27/2023, 1:20 PMtap-oracle
will only produce as many rows as the target-snowflake
is able to take in.
As soon as the target
reaches the set the limits, it will proceed to
• write to /tmp a "csv.gz" file
• upload said file to S3 and remove from local fs
• issue a MERGE INTO ... FROM @STAGE/...
command, which then loads the file
• removes the files from S3
It works just the way I want it!pat_nadolny
06/27/2023, 6:56 PMjanis_puris
06/27/2023, 7:18 PMtarget-snowflake
.
Very exciting!