Kuanysh Zhaksylyk
01/17/2025, 12:04 PM
I'm using tap-mysql (PipelineWise variant) and target-postgresql (Meltano variant). The ELT process is based on CDC using the binlog. Currently, the process only considers the primary key.
Is it possible to specify which column should be indexed when the process starts? If not, I’d like to know how the ELT process will behave if an index is created on an existing column in PostgreSQL after loading. Will this break the CDC process? Could it lead to a deviation in the binlog?
I’m running MySQL 5.7 and PostgreSQL 13. I was considering creating an index concurrently on the updated_at column. What do you think?
I’m interested in any opinions or strategies.
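To make the index idea concrete, here is a minimal Python sketch that only builds the statement I have in mind. The schema, table, and index names are placeholders, not my real ones, and the `idx_<table>_<column>` naming is just an assumption for illustration. Note that PostgreSQL does not allow `CREATE INDEX CONCURRENTLY` inside a transaction block, so whatever client runs it would need autocommit enabled.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_concurrent_index_sql(schema: str, table: str, column: str) -> str:
    """Build a CREATE INDEX CONCURRENTLY statement for a single column.

    The index name is a hypothetical convention (idx_<table>_<column>).
    """
    index_name = f"idx_{table}_{column}"
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {quote_ident(index_name)} "
        f"ON {quote_ident(schema)}.{quote_ident(table)} ({quote_ident(column)})"
    )


if __name__ == "__main__":
    # Placeholder names; replace with the real target schema and table.
    print(build_concurrent_index_sql("my_schema", "my_table", "updated_at"))
```

This would be run by hand against the target database after the initial load, outside of the Meltano pipeline.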
My tap and target configs:
plugins:
  extractors:
  - name: tap-mysql
    variant: transferwise
    pip_url: git+https://github.com/edgarrmondragon/pipelinewise-tap-mysql.git@patch-1
    config:
      database: ***
      engine: mysql
      session_sqls:
      - SET @@session.time_zone='+0:00'
      - SET @@session.wait_timeout=86400
      - SET @@session.net_read_timeout=86400
      - SET @@session.innodb_lock_wait_timeout=3600
    select:
    - schema-table.*
    metadata:
      '*':
        replication-method: LOG_BASED
  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: meltanolabs-target-postgres
    config:
      batch_size_rows: 50000
      hard_delete: true
      load_method: upsert
      use_copy: true
      validate_records: true
      sanitize_null_text_characters: true