Hi folks, now that I've got a first cut of a Clickhouse Loader... (Meltano #singer-target-development)


ben_theunissen

06/14/2023, 11:30 AM

Hi folks, now that I've got a first cut of a Clickhouse Loader working with full-table replication, I want to start knocking off the more exotic features, such as key-based incremental replication. Since I'm able to use the `SQLSink` base class with SQLAlchemy, I'm hoping to find the current "best example" of a Meltano SDK target that supports key-based replication, plus any unit tests for key-based incremental replication that I could pull in from the Singer SDK. I've taken a look at Meltano Target Postgres, but from reading the code I'm not sure whether it currently supports key-based replication.

visch

06/14/2023, 1:15 PM

I think you're missing a couple of foundational things in your understanding of taps, or maybe I'm just missing the question. Singer works like this: `tap | target`. Records are sent over stdout to the target, and the target doesn't know about anything except SCHEMA, RECORD, and STATE messages. That means you don't have to build special handling in the target for "key-based replication"; that's what's called an "INCREMENTAL" stream, which is a tap concern, not a target concern (or at least it shouldn't be a target concern; I've created a target that only handles full-table loads, but that's a separate issue). https://github.com/MeltanoLabs/target-postgres is a good example, I think. Note there's a PR open that refactors it to use SQLAlchemy for more of the work, which you should just steal from imo (eventually this stuff will make it into the SDK, I hope!)
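To make that concrete, here is a minimal, simplified sketch (not the Singer SDK's implementation) of what a target actually sees on stdin: newline-delimited JSON messages whose `type` is SCHEMA, RECORD, or STATE. The function name `read_singer_stream` and the sample stream are hypothetical, and a real target would also validate, batch, and write records before passing state along. The bookmark inside STATE stays opaque to the target; the SCHEMA message's `key_properties` is the only hint about which columns identify a row, which is why incremental logic lives in the tap.

```python
import io
import json
from typing import Dict, List, TextIO


def read_singer_stream(stream: TextIO) -> Dict[str, object]:
    """Dispatch Singer messages the way a target receives them (simplified sketch)."""
    schemas: Dict[str, dict] = {}
    key_properties: Dict[str, List[str]] = {}
    records: Dict[str, List[dict]] = {}
    last_state = None

    for line in stream:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        msg_type = message["type"]
        if msg_type == "SCHEMA":
            name = message["stream"]
            schemas[name] = message["schema"]
            # key_properties tells the target which columns identify a row.
            key_properties[name] = message.get("key_properties", [])
            records.setdefault(name, [])
        elif msg_type == "RECORD":
            name = message["stream"]
            if name not in schemas:
                raise ValueError(f"RECORD for stream {name!r} arrived before its SCHEMA")
            records[name].append(message["record"])
        elif msg_type == "STATE":
            # The target does not interpret the bookmark; it only keeps the
            # latest value to emit once the preceding records are written.
            last_state = message["value"]
        # Other message types are ignored in this sketch.

    return {
        "schemas": schemas,
        "key_properties": key_properties,
        "records": records,
        "state": last_state,
    }


# Hypothetical tap output for a single "users" stream.
tap_output = io.StringIO(
    '{"type": "SCHEMA", "stream": "users", "schema": {"properties": {"id": {"type": "integer"}}}, "key_properties": ["id"]}\n'
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}\n'
    '{"type": "STATE", "value": {"bookmarks": {"users": {"replication_key_value": 1}}}}\n'
)
summary = read_singer_stream(tap_output)
print(summary["records"]["users"], summary["state"])
```

Whether a stream is full-table or incremental never shows up as a message type here; the target only sees more or fewer RECORD messages and a different STATE payload.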

ben_theunissen

06/14/2023, 1:24 PM

I agree to some degree that this is a tap concern, in the sense of disambiguating upstream record changes. My main concern is that Clickhouse is a columnar data warehouse, so I want to make sure a merge upsert on repeated records is optimized as much as possible for the underlying DB engine. I'm having trouble tracing through `SQLSink` to see how a full-table replication scenario exercises the sink code paths differently from the incremental scenario. What I'm really looking for is a mapping between each target capability and the SDK methods I'd be best off overriding to provide that functionality. I'll take a look at the PR on Target Postgres, since it appears to have more of the customization I'm looking for.
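One possible way to act on the columnar-upsert concern, sketched here as an assumption rather than anything the Singer SDK or target-postgres actually does: collapse repeated keys within each batch in Python using the stream's `key_properties`, then let ClickHouse collapse duplicates across batches with its ReplacingMergeTree table engine, which keeps the row with the highest value of a version column for each sorting key. The helper names `dedupe_by_key` and `replacing_merge_tree_ddl`, the `users` table, and the `updated_at` version column are all hypothetical. ReplacingMergeTree only deduplicates during background merges, so queries that must see exactly one row per key need `SELECT ... FINAL` (or an explicit `OPTIMIZE TABLE ... FINAL`).

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def dedupe_by_key(records: Iterable[dict], key_properties: Sequence[str]) -> List[dict]:
    """Collapse repeated records to the last one that arrived for each key (hypothetical helper)."""
    latest: Dict[Tuple, dict] = {}
    for record in records:
        key = tuple(record[prop] for prop in key_properties)
        # Re-assigning an existing dict key keeps its original position,
        # so output order follows the first time each key was seen.
        latest[key] = record
    return list(latest.values())


def replacing_merge_tree_ddl(
    table: str,
    columns: Dict[str, str],
    key_properties: Sequence[str],
    version_column: str,
) -> str:
    """Build CREATE TABLE DDL for a ReplacingMergeTree ordered by the key properties (hypothetical helper)."""
    if not key_properties:
        raise ValueError("ReplacingMergeTree deduplication needs at least one key property")
    if version_column not in columns:
        raise ValueError(f"version column {version_column!r} must be one of the table columns")
    column_sql = ",\n    ".join(f"{name} {ch_type}" for name, ch_type in columns.items())
    order_by = ", ".join(key_properties)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n    {column_sql}\n)\n"
        f"ENGINE = ReplacingMergeTree({version_column})\n"
        f"ORDER BY ({order_by})"
    )


# Hypothetical batch in which key id=1 was updated twice.
batch = [
    {"id": 1, "name": "a", "updated_at": "2023-06-14 11:00:00"},
    {"id": 2, "name": "b", "updated_at": "2023-06-14 11:05:00"},
    {"id": 1, "name": "a2", "updated_at": "2023-06-14 11:10:00"},
]
print(dedupe_by_key(batch, ["id"]))
print(replacing_merge_tree_ddl(
    "users",
    {"id": "UInt64", "name": "String", "updated_at": "DateTime"},
    ["id"],
    "updated_at",
))
```

The in-batch step keeps the last record to arrive rather than the one with the largest `updated_at`, so it assumes the tap emits records in replication-key order; the engine-level version column is what resolves ordering across batches.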

visch

06/14/2023, 1:25 PM

OK, so the question is more about optimization? If that's the case, I'd peek at https://github.com/z3z1ma/target-bigquery

pat_nadolny

06/15/2023, 11:22 AM

I'd also add that we've been putting a lot of work into the MeltanoLabs target-snowflake over the last couple of weeks, so that could be a good example too, specifically its testing framework.
