Hello everyone. I encountered a problem with melta...
# troubleshooting
k
Hello everyone. I've run into a problem with Meltano skipping some rows from the source: out of 200 million rows, 51 were skipped. There is no obvious pattern, and it is not clear why this happens. I am transferring data via CDC (binlog) from MySQL to PostgreSQL. There were no errors during the process. First, I took a full snapshot of the full_table table using the
meltano run tap-mysql target-postgres
command. After that run completed successfully, I ran the same
meltano run tap-mysql target-postgres
command again to load the increment. My tap and target configs:
plugins:
  extractors:
  - name: tap-mysql
    variant: transferwise
    pip_url: git+https://github.com/edgarrmondragon/pipelinewise-tap-mysql.git@patch-1
    config:
      database: ***
      engine: mysql
      session_sqls:
      - SET @@session.max_execution_time=0     # No limit
      - SET @@session.time_zone='+0:00'
      - SET @@session.wait_timeout=86400
      - SET @@session.net_read_timeout=86400
      - SET @@session.net_write_timeout=86400
      - SET @@session.innodb_lock_wait_timeout=3600
    select:
    - schema-table.*
    metadata:
      '*':
        replication-method: LOG_BASED

  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: meltanolabs-target-postgres
    config:
      batch_size_rows: 50000
      hard_delete: true
      load_method: upsert
      use_copy: true
      validate_records: true
      sanitize_null_text_characters: true
e
I wonder if there's a time window that is missed between the full snapshot and the CDC sync
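
One rough way to check that idea: pull the primary keys from both sides, work out which ones are missing in Postgres, and see whether those rows were all modified between the start of the snapshot and the first binlog position the tap recorded. The sketch below is plain Python with made-up data. The function names, the sample keys, and the assumption that the table has a modification timestamp column are all hypothetical, not anything taken from tap-mysql or target-postgres.

```python
from datetime import datetime


def missing_keys(source_keys, target_keys):
    """Return primary keys present in the source but absent from the target, sorted."""
    return sorted(set(source_keys) - set(target_keys))


def keys_in_window(rows, window_start, window_end):
    """Return keys of rows whose modification time lies in [window_start, window_end].

    `rows` is an iterable of (key, modified_at) pairs. This assumes the source
    table has some kind of modification timestamp (hypothetical here).
    """
    return [key for key, modified_at in rows if window_start <= modified_at <= window_end]


# Hypothetical sample data standing in for real query results.
source = [1, 2, 3, 4, 5]
target = [1, 2, 5]
missing = missing_keys(source, target)

# Modification times of the missing rows, as read from the source (made up).
missing_rows = [
    (3, datetime(2024, 1, 1, 12, 5)),
    (4, datetime(2024, 1, 1, 12, 40)),
]

# Assumed window: snapshot start until the first binlog position was bookmarked.
snapshot_start = datetime(2024, 1, 1, 12, 0)
first_binlog_bookmark = datetime(2024, 1, 1, 12, 30)

in_window = keys_in_window(missing_rows, snapshot_start, first_binlog_bookmark)
print("missing:", missing)
print("missing and modified inside the window:", in_window)
```

If every missing key lands inside that window, the gap between snapshot and CDC is a likely suspect. If the missing rows are spread across unrelated times, the cause is probably somewhere else.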