# singer-target-development
b
Hi folks, now that I've got a first cut of a ClickHouse loader working with full-table replication, I want to start knocking off the more exotic features, such as key-based incremental replication. Since I'm able to use the `SQLSink` base class with SQLAlchemy, I'm hoping to find the current "best example" of a Meltano SDK target that supports key-based replication. I'd also like to know whether there are any unit tests for key-based incremental replication that I could pull in from the Singer SDK. I've taken a look at the Meltano Target Postgres, but from reading the code I'm not sure whether it currently supports key-based replication.
v
You have a couple of foundational things missing in your understanding of taps, I think, or maybe I'm just missing the question. Singer works like this: `tap | target`. Records are sent over stdout to the target. The target doesn't know about anything but the SCHEMA, RECORD, and STATE messages, which means you don't have to build special handling in the target for "key-based replication". That's what's called an "INCREMENTAL" stream, and it's a tap concern, not a target concern (or at least it shouldn't be a target concern; I've created a target that only handles full-table loads, but that's a separate issue). https://github.com/MeltanoLabs/target-postgres is a good example, I think. Note there's a PR open that refactors it to use SQLAlchemy for more things, which you should just steal from, imo (eventually this stuff will make it into the SDK, I hope!)
b
I agree to some degree that disambiguating upstream record changes is a tap concern, but my main concern is that ClickHouse is a columnar data warehouse, so I want to make sure a merge upsert on repeated records is optimized as much as possible for the underlying DB engine. I'm having trouble tracing through `SQLSink` to see how a full-table replication scenario exercises the sink code paths differently from the incremental scenario. Really, what I'm looking for is a mapping from each target capability to the SDK methods I'd be best off overriding to provide it. I'll take a look at the PR on Target Postgres, as that appears to have more of the customization I'm looking for.
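One ClickHouse-native way to get upsert-like behavior without row-level MERGE is to let the table engine deduplicate on the stream's `key_properties`. The sketch below assumes ClickHouse's `ReplacingMergeTree` engine, which keeps one row per `ORDER BY` key during background merges (the row with the highest version when a version column is given). Deduplication is eventual, so reads may need `FINAL` until merges run. The function `clickhouse_create_table_sql` is hypothetical, not part of the SDK or any existing target, and real code should quote identifiers rather than interpolating them.

```python
from typing import Dict, List, Optional


def clickhouse_create_table_sql(
    table: str,
    columns: Dict[str, str],
    key_properties: List[str],
    version_column: Optional[str] = None,
) -> str:
    """Hypothetical sketch: build DDL for a key-deduplicating ClickHouse table."""
    if not key_properties:
        # No primary key from the tap: plain append-only MergeTree, no sort key.
        engine = "MergeTree()"
        order_by = "tuple()"
    else:
        # ReplacingMergeTree collapses rows sharing the ORDER BY key during merges;
        # with a version column, the row with the highest version wins.
        engine = f"ReplacingMergeTree({version_column})" if version_column else "ReplacingMergeTree()"
        order_by = "(" + ", ".join(key_properties) + ")"

    # NOTE: identifiers are not quoted here; this is for illustration only.
    column_sql = ",\n    ".join(f"{name} {ch_type}" for name, ch_type in columns.items())
    return f"CREATE TABLE IF NOT EXISTS {table} (\n    {column_sql}\n) ENGINE = {engine}\nORDER BY {order_by}"


ddl = clickhouse_create_table_sql(
    "users",
    {"id": "Int64", "name": "String", "updated_at": "DateTime64(3)"},
    key_properties=["id"],
    version_column="updated_at",
)
print(ddl)
```

With this design, the sink can keep plain batched inserts for both full-table and incremental streams and push the "upsert" into the engine, trading immediate consistency for cheap writes.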
v
OK, so the question is more about optimization? If that's the case, I'd peek at https://github.com/z3z1ma/target-bigquery
p
I'd also add that we've been putting a lot of work into the MeltanoLabs target-snowflake over the last couple of weeks, so that could be a good example too, specifically its testing framework.