# singer-targets
j
Hi everyone, I'm developing a tap that uses the `batch` feature to write to S3. Currently, it looks like the only encoding that is supported is JSONL (looking at the `get_batches` method in `core.py`). I want to use the Snowflake/DuckDB targets to load from the S3 files. Is it reasonable to assume that I need to add a `CSVEncoding` and update `get_batches` in the SDK?
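[Editor's note: a minimal sketch of what such an addition might look like, modeled loosely on the SDK's JSONL encoding. The class name `CSVEncoding`, its fields, and the helper below are assumptions for illustration, not the actual singer-sdk API.]

```python
import csv
import gzip
from dataclasses import dataclass


@dataclass
class CSVEncoding:
    """Hypothetical CSV encoding descriptor for batch files (assumed shape)."""

    format: str = "csv"
    compression: str = "gzip"


def write_csv_batch(records, schema, path):
    """Write one batch of records to a gzipped CSV file.

    `records` is an iterable of dicts; `schema` is the stream's JSON schema,
    used here only to fix the column order.
    """
    columns = list(schema["properties"])
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for record in records:
            writer.writerow(record)
```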
a
I believe DuckDB can also load from JSON, but it looks like it is perhaps not as robust as CSV/Parquet.
To your question, though: yes, I think that's a valid approach and we'd welcome a PR to add that functionality. One of the challenges, though, is that CSV has so many diverse dialects.
The hard part would actually be defining a dialect config that can support the requirements of the various platforms. https://duckdb.org/docs/data/csv#parameters
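[Editor's note: to make the dialect problem concrete, here is a sketch using Python's standard `csv` module, with DuckDB's `read_csv` parameter names shown in comments for comparison. The point is that every writer knob has a loader-side counterpart, and the two must agree.]

```python
import csv
import io

# Each of these writer options maps to a loader option on the other side
# (e.g. DuckDB's read_csv: delim, quote, escape, header, nullstr, ...).
# If writer and loader disagree, the load fails or silently corrupts data.
buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter=",",            # read_csv(..., delim=',')
    quotechar='"',            # read_csv(..., quote='"')
    doublequote=True,         # vs. an escape character; defaults differ by loader
    lineterminator="\r\n",    # some loaders expect plain \n
    quoting=csv.QUOTE_MINIMAL,
)
writer.writerow(["id", "note"])
writer.writerow([1, 'a "quoted", comma-laden value'])
print(buffer.getvalue())
```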
j
Hmm, with that in mind I wonder if adding a `ParquetEncoding` would be simpler, or if I should just stick with JSON for now and create a Snowflake stage to load the JSON.
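[Editor's note: for the stick-with-JSON option, DuckDB can read JSONL batch files directly. A minimal sketch; the bucket path is a placeholder, and S3 credentials are assumed to be configured in the environment.]

```python
import duckdb

con = duckdb.connect()
# The httpfs extension is needed for s3:// paths.
con.execute("INSTALL httpfs; LOAD httpfs;")

# read_json_auto infers the schema from the JSONL files
# (gzip compression is detected from the file extension).
rows = con.sql(
    "SELECT * FROM read_json_auto('s3://my-bucket/batches/*.jsonl.gz')"
)
print(rows.limit(5))
```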
a
Parquet encoding could be a good path forward. It has the benefit of being self-describing in terms of column names and data types, and, like JSONL, it's basically zero config.
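[Editor's note: a small sketch of the self-describing property using pyarrow. Unlike CSV, the Parquet file itself carries the column names and types, so the target needs no dialect configuration to read it back.]

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Build a tiny table with explicit types and write it as a batch file.
table = pa.table(
    {
        "id": pa.array([1, 2], type=pa.int64()),
        "updated_at": pa.array(["2023-01-01", "2023-01-02"]),
    }
)
pq.write_table(table, "batch-0001.parquet")

# Round trip: the schema (names and types) comes back from the file itself.
print(pq.read_table("batch-0001.parquet").schema)
```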
j
Sweet, when I have some free time I'll create an Issue. Thanks, AJ!