# singer-targets
j
Hi everyone, I'm developing a tap that uses the `batch` feature to write to S3. Currently, it looks like the only encoding that is supported is JSONL (looking at the `get_batches` method in `core.py`). I want to use the Snowflake/DuckDB targets to load from the S3 files. Is it reasonable to assume that I need to add a `CSVEncoding` and update `get_batches` in the SDK?
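[Editor's note: a minimal sketch of what such an addition might look like, modeled loosely on the SDK's JSONL encoding. The class name `CSVEncoding`, its fields, and the helper below are assumptions for illustration, not the actual singer-sdk API.]

```python
import csv
import gzip
from dataclasses import dataclass


@dataclass
class CSVEncoding:
    """Hypothetical CSV encoding descriptor for batch files (assumed shape)."""

    format: str = "csv"
    compression: str = "gzip"


def write_csv_batch(records, schema, path):
    """Write one batch of records to a gzipped CSV file.

    `records` is an iterable of dicts; `schema` is the stream's JSON schema,
    used here only to fix the column order.
    """
    columns = list(schema["properties"])
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for record in records:
            writer.writerow(record)
```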
a
I believe DuckDB can also load from JSON, but it looks like it is perhaps not as robust as CSV/Parquet.
To your question, though: yes, I think that's a valid approach and we'd welcome a PR to add that functionality. One of the challenges, though, is that CSV has so many diverse dialects.
The hard part would actually be defining a dialect config that can support the requirements of the various platforms. https://duckdb.org/docs/data/csv#parameters
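[Editor's note: to make the dialect problem concrete, here is a sketch using Python's standard `csv` module, with DuckDB's `read_csv` parameter names shown in comments for comparison. The point is that every writer knob has a loader-side counterpart, and the two must agree.]

```python
import csv
import io

# Each of these writer options maps to a loader option on the other side
# (e.g. DuckDB's read_csv: delim, quote, escape, header, nullstr, ...).
# If writer and loader disagree, the load fails or silently corrupts data.
buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter=",",            # read_csv(..., delim=',')
    quotechar='"',            # read_csv(..., quote='"')
    doublequote=True,         # vs. an escape character; defaults differ by loader
    lineterminator="\r\n",    # some loaders expect plain \n
    quoting=csv.QUOTE_MINIMAL,
)
writer.writerow(["id", "note"])
writer.writerow([1, 'a "quoted", comma-laden value'])
print(buffer.getvalue())
```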
j
Hmm, with that in mind I wonder if adding a `ParquetEncoding` would be simpler, or if I should just stick with JSON for now and create a Snowflake stage to load the JSON.
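[Editor's note: for the stick-with-JSON option, DuckDB can read JSONL batch files directly. A minimal sketch; the bucket path is a placeholder, and S3 credentials are assumed to be configured in the environment.]

```python
import duckdb

con = duckdb.connect()
# The httpfs extension is needed for s3:// paths.
con.execute("INSTALL httpfs; LOAD httpfs;")

# read_json_auto infers the schema from the JSONL files
# (gzip compression is detected from the file extension).
rows = con.sql(
    "SELECT * FROM read_json_auto('s3://my-bucket/batches/*.jsonl.gz')"
)
print(rows.limit(5))
```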
a
Parquet encoding could be a good path forward. It has the benefit of being self-describing in terms of column names and data types, and, like JSONL, it's basically zero config.
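[Editor's note: a small sketch of the self-describing property using pyarrow. Unlike CSV, the Parquet file itself carries the column names and types, so the target needs no dialect configuration to read it back.]

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Build a tiny table with explicit types and write it as a batch file.
table = pa.table(
    {
        "id": pa.array([1, 2], type=pa.int64()),
        "updated_at": pa.array(["2023-01-01", "2023-01-02"]),
    }
)
pq.write_table(table, "batch-0001.parquet")

# Round trip: the schema (names and types) comes back from the file itself.
print(pq.read_table("batch-0001.parquet").schema)
```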
j
Sweet, when I have some free time I'll create an Issue. Thanks, AJ!