Meltano

Hi, we are ingesting gzip-compressed CSV files from S3/MinIO into ClickHouse. ClickHouse can already ingest this data natively and very quickly if it knows the location of the file in S3/MinIO, but I would like to manage this process using Meltano. How should I structure a pipeline to do this:

1. extract the filenames/paths from MinIO/S3
2. issue SQL to ClickHouse to pull the files directly from MinIO using the filenames/paths?
I would like to use Meltano to manage state for this process (i.e. which files have been processed), to order the source files by timestamp, to trigger dbt jobs after ingestion, and to handle scheduling.

The standard way of pulling a large number of records through a Singer tap is very slow by comparison.
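
For illustration, the flow described above (skip files already processed, order the rest by timestamp, have ClickHouse pull each one) could be sketched roughly like this. This is only a sketch under assumptions: `SourceFile`, `pending_files`, and `build_insert_sql` are hypothetical names, the bookmark is a single "last modified" timestamp, the files have a header row (`CSVWithNames`), and S3/MinIO credentials are configured on the ClickHouse side rather than passed in the SQL. Listing the bucket and executing the statement are left out.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class SourceFile(NamedTuple):
    """One object in the bucket (hypothetical helper type)."""
    key: str
    last_modified: datetime


def pending_files(files: list[SourceFile], bookmark: Optional[datetime]) -> list[SourceFile]:
    """Return files modified after the bookmark, oldest first.

    Assumption: the stored state is one timestamp. Files sharing the exact
    bookmark timestamp are treated as already processed.
    """
    newer = [f for f in files if bookmark is None or f.last_modified > bookmark]
    return sorted(newer, key=lambda f: (f.last_modified, f.key))


def build_insert_sql(table: str, endpoint: str, bucket: str, key: str) -> str:
    """Build an INSERT that lets ClickHouse read the file itself via s3().

    Compression is left to ClickHouse, which detects it from the file
    extension by default. Credentials are assumed to be configured server-side.
    """
    url = f"{endpoint}/{bucket}/{key}"
    escaped = url.replace("\\", "\\\\").replace("'", "\\'")
    return f"INSERT INTO {table} SELECT * FROM s3('{escaped}', 'CSVWithNames')"


# Example with made-up file names and endpoint
files = [
    SourceFile("events/b.csv.gz", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    SourceFile("events/a.csv.gz", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    SourceFile("events/c.csv.gz", datetime(2024, 1, 3, tzinfo=timezone.utc)),
]
bookmark = datetime(2024, 1, 1, tzinfo=timezone.utc)
todo = pending_files(files, bookmark)
statements = [build_insert_sql("raw.events", "http://minio:9000", "data", f.key) for f in todo]
```

After each statement succeeds, the bookmark would move forward to that file's `last_modified`, so a failed run resumes from the last file that loaded.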

Hi Andy!

This is probably a good use case for <https://sdk.meltano.com/en/latest/batch.html|Batch messages>, but a few pieces are missing for that to work in your case:
• SDK support for CSV as a batch file format: <https://github.com/meltano/sdk/issues/1584>
• <https://github.com/shaped-ai/target-clickhouse|target-clickhouse> support for processing the messages, similar to target-snowflake's handling of JSONL batch files: <https://github.com/MeltanoLabs/target-snowflake/blob/3747a09195c24a6552f031a169405371a7ff1139/target_snowflake/sinks.py#L219-L242>
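
To give a rough idea of what the tap would emit, here is a minimal sketch of a batch message as described in the SDK batch docs: instead of individual records, the message carries an encoding plus a manifest of file paths for the target to load. The helper name `batch_message`, the stream name, and the file path are made up for illustration. The sketch uses `jsonl` because, per the issue linked above, CSV was not yet a supported batch format.

```python
import json


def batch_message(stream, manifest, file_format="jsonl", compression="gzip"):
    """Build a dict shaped like a BATCH message (hypothetical helper)."""
    return {
        "type": "BATCH",
        "stream": stream,
        "encoding": {"format": file_format, "compression": compression},
        # The manifest lists files the target is expected to load in bulk
        "manifest": list(manifest),
    }


# Illustrative stream name and path, not taken from a real pipeline
msg = batch_message("events", ["file://path/to/batch/file/1"])
line = json.dumps(msg)
```

A target that understands these messages can hand each manifest entry to the database's own bulk loader, which is what the target-snowflake code linked above does for JSONL files.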