# troubleshooting
Not sure of the best place to ask this question, since it stems from troubleshooting a toy example inspired by the modern data stack, so I'm happy to move this elsewhere. How could I improve the performance of a Meltano pipeline that loads a CSV of ~1M records into a DuckDB database? My pipeline uses the Meltano variant of tap-csv and the jwills variant of target-duckdb, and takes ~2 hours to complete. Importing the same CSV with DuckDB directly takes nowhere near as long (seconds to minutes).
This is likely because of the way Singer taps work: they serialize each row into a newline-delimited JSON RECORD message, which the target then has to parse and load one at a time. Since tap-csv is based on our SDK, we should be able to support BATCH messages, which would dramatically speed it up. I opened an issue for that: https://github.com/MeltanoLabs/tap-csv/issues/177. Batch message docs: https://sdk.meltano.com/en/latest/batch.html
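To make the overhead concrete, here is a rough stdlib-only sketch of the two message shapes. The stream name and manifest path are hypothetical; the structures follow the Singer spec and the Meltano SDK batch docs linked above, simplified:

```python
import json

row = {"id": 1, "name": "alice"}

# Per-row path: every record becomes its own JSON message that the
# target must read, parse, and validate individually -- for ~1M rows
# that is ~1M serialize/parse round trips.
record_msg = json.dumps(
    {"type": "RECORD", "stream": "users", "record": row}
)

# BATCH path (Meltano SDK extension): a single message hands the
# target a manifest of already-written JSONL files it can bulk-load.
batch_msg = json.dumps({
    "type": "BATCH",
    "stream": "users",
    "encoding": {"format": "jsonl", "compression": "gzip"},
    "manifest": ["file:///tmp/users-0001.jsonl.gz"],
})
print(record_msg)
print(batch_msg)
```

With BATCH messages, a DuckDB target could hand the manifest files to a bulk ingest path instead of inserting row by row, which is where the speedup comes from.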