I am using LOG_BASED replication to ingest data in...
# best-practices
s
I am using LOG_BASED replication to ingest data into ClickHouse. One thing I noticed is that only new changes (inserts/updates/deletes made after setting up replication) are ingested into ClickHouse. To bring in the already existing data, do I need to use FULL_TABLE replication first?
p
Log-based replication reads the logs and pushes only changes, by design. Maybe you could run an initial full table replication and start replicating changes right after that, though it can be tough to find the exact point at which you took the snapshot so that CDC can start from that point forward.
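To make the snapshot-then-CDC handoff concrete, here is a minimal toy sketch in Python. It is not Meltano or ClickHouse code: the `Change` type, the integer log positions, and the in-memory tables are all assumptions made for illustration. The idea it shows is to record the log position before the snapshot starts, then replay every change after that position idempotently, so changes that landed during the snapshot are neither lost nor applied incorrectly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    position: int                # hypothetical, monotonically increasing log position
    op: str                      # "insert", "update" or "delete"
    key: int
    value: Optional[str] = None


def apply_change(table: dict, change: Change) -> None:
    """Apply one change idempotently: upsert on insert/update, drop on delete."""
    if change.op == "delete":
        table.pop(change.key, None)
    else:
        table[change.key] = change.value


def snapshot_then_cdc(snapshot_rows: dict, log: list, position_before_snapshot: int) -> dict:
    """Load the snapshot, then replay every change logged after the recorded position."""
    target = dict(snapshot_rows)  # the full load
    for change in log:
        if change.position > position_before_snapshot:
            apply_change(target, change)
    return target


# Position 10 was recorded just before the snapshot started.
log = [
    Change(11, "update", 1, "a2"),  # landed during the snapshot, and the snapshot saw it
    Change(12, "insert", 3, "c"),   # landed during the snapshot, but the snapshot missed it
    Change(13, "delete", 2),        # landed after the snapshot finished
]
snapshot_rows = {1: "a2", 2: "b"}

target = snapshot_then_cdc(snapshot_rows, log, position_before_snapshot=10)
```

Because the replay is an upsert/delete rather than a blind append, starting slightly too early is harmless; starting too late is what loses data. That is why the position is taken before the snapshot and not after it.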
s
@pablo_seibelt, do you think running with --full-refresh might do the trick?
p
Only if your binary logs have the whole history of your dataset, which I doubt. Most likely you have some retention policy that keeps just a fraction of it.
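A toy illustration of the retention point, again in Python and with made-up data (the event tuples and the `replay` helper are hypothetical, not part of any tap): replaying only the retained part of a log cannot rebuild rows whose inserts were already purged.

```python
def replay(events):
    """Rebuild a table from a list of (op, key, value) log events."""
    table = {}
    for op, key, value in events:
        if op == "delete":
            table.pop(key, None)
        else:
            table[key] = value
    return table


full_history = [
    ("insert", 1, "a"),
    ("insert", 2, "b"),
    ("update", 2, "b2"),
    ("insert", 3, "c"),
]
# Assume the retention policy has already purged the two oldest events.
retained = full_history[2:]

complete = replay(full_history)   # what the source table actually holds
partial = replay(retained)        # what a log-only replay can reconstruct
missing_keys = sorted(set(complete) - set(partial))
```

In this toy case row 1 never reappears in the log, so a replay that starts from the oldest retained event silently leaves it out.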
Even though I love Meltano, I think something like Debezium is much more mature for database replication: it does the full load and immediately starts replicating from the logs, without missing any change in between.
s
Yup, Meltano failed to live up to my expectations. Things don't work as expected most of the time.
p
Meltano is amazing for replicating data from external services (e.g. APIs). I don't think database CDC replication is its best use case, especially if you have a big database.
s
I would definitely agree that it is not a good tool for database replication. It has definitely given me a lot of pain.