Hello everyone. I hope you are having a fantastic ...
# singer-targets
b
Hello everyone. I hope you are having a fantastic day! I'm a new user of the tool, and having difficulties while trying to build a pipeline: We’re mirroring data from MS SQL Server to PostgreSQL using
tap-mssql
target-postgres
. A table with 1.1B rows and 50+ columns (
detalle_prod_emitidas
) is causing severe performance issues during upserts or incremental syncs. Here is the execution history: Attempt Method Duration Result 1 append-only 10 days Initial load (1.1B rows, 10k rows batches) 2 upsert 6 days Cancelled (timeout, 50M new rows, 10k rows batches) 3 append-only 4+ days Cancelled (timeout, 50M new rows, 10k rows batches) 4 upsert 1+ days Ongoing (30M new rows, testing new index in primary key and 100k batches) Questions for the Community 1. Upsert Optimization: ◦ How to configure
target-postgres
for faster merges? ◦ Recommended indexes/PostgreSQL tuning for tables >1B rows? 2. Incremental Extraction: ◦ Best practices for configuring
tap-mssql
to extract only new inserts (no updates) if
detalle_prod_emitidas
is append-only? 3. Alternative Approaches: ◦ Can we combine
append-only
for this table +
upsert
for others? ◦ Would using `COPY`/`pg_bulkload` for this table outside Meltano be advisable? Infrastructure: ◦ Meltano + PostgreSQL: Ubuntu server (same machine), installed on virtualenv, no containers. ◦ MS SQL: Windows Server (separate machine, same site). Here is the YML file config:
Copy code
version: 1
default_environment: dev
project_id: XXXXXXX-d7a5-4bc3-8f98-XXXXX
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-mssql
    variant: wintersrd
    pip_url: tap-mssql
    config:
      cursor_array_size: 100000
      database: db_source
      host: XXXXXX
      user: XXXX
    select:
    - facturacion-cat_clave_prod_serv.*
    - facturacion-cat_clave_unidad.*
    - facturacion-cat_regimen_fiscal.*
    - facturacion-cp33.*
    - facturacion-cp33_det_imp.*
    - facturacion-detalle_prod_emitidas.*
    - facturacion-nc_emitidas.*
  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: meltanolabs-target-postgres
    config:
      batch_size_rows: 100000
      database: dl_main
      default_target_schema: raw_data
      host: localhost
      load_method: upsert
      port: 5500
      user: postgres
A heads snap of the current log:
Copy code
2025-01-25T18:48:43.632638Z [info     ] time=2025-01-25 19:48:43 name=singer level=INFO message=Server Parameters: version: Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64) cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.632965Z [info     ]     Aug 22 2017 17:04:49          cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.633119Z [info     ]     Copyright (C) 2017 Microsoft Corporation cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.633246Z [info     ]     Enterprise Edition (64-bit) on Windows Server 2022 Standard 10.0 <X64> (Build 20348: ) cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.633398Z [info     ] , lock_timeout: -1,            cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.633546Z [info     ] time=2025-01-25 19:48:43 name=singer level=INFO message=Beginning sync cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.842427Z [info     ] time=2025-01-25 19:48:43 name=singer level=INFO message=Preparing Catalog cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.944933Z [info     ] 2025-01-25 19:48:43,944 | INFO     | target-postgres      | Target 'target-postgres' is listening for input from tap. cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-postgres name=target-postgres producer=False run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=target-postgres
2025-01-25T18:48:44.052279Z [info     ] time=2025-01-25 19:48:44 name=singer level=INFO message=Fetching tables cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.121602Z [info     ] time=2025-01-25 19:48:44 name=singer level=INFO message=Tables fetched, fetching columns cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.511912Z [info     ] time=2025-01-25 19:48:44 name=singer level=INFO message=ARRAYSIZE=100000 cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.564381Z [info     ] time=2025-01-25 19:48:44 name=singer level=INFO message=Columns Fetched cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.599467Z [info     ] time=2025-01-25 19:48:44 name=singer level=INFO message=Catalog ready cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601060Z [info     ] time=2025-01-25 19:48:44 name=singer level=INFO message=selected-by-default: False cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601220Z [info     ] time=2025-01-25 19:48:44 name=singer level=INFO message=database-name: facturacion cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601382Z [info     ] time=2025-01-25 19:48:44 name=singer level=INFO message=is-view: False cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601573Z [info     ] time=2025-01-25 19:48:44 name=singer level=INFO message=table-key-properties: ['id'] cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601742Z [info     ] time=2025-01-25 19:48:44 name=singer level=INFO message=selected: True cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601876Z [info     ] time=2025-01-25 19:48:44 name=singer level=INFO message=replication-method: FULL_TABLE cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
And a tails snap:
Copy code
2025-01-28T03:41:39.148292Z [info     ] 2025-01-28 04:41:39,148 | INFO     | target-postgres      | Target sink for 'facturacion-detalle_prod_emitidas' is full. Current size is '100000'. Draining... cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-postgres name=target-postgres producer=False run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=target-postgres
2025-01-28T03:41:41.015557Z [info     ] 2025-01-28 04:41:41,012 | INFO     | target-postgres.facturacion-detalle_prod_emitidas | Inserting with SQL: INSERT INTO ec70724a_6f93_4609_90e7_371371dbbc70 (id, idfactura, idproducto, idunidad, cantidad, subtotal, descuento, total, importe, "valorUnitario", iva_tra, tasa_iva_tra, iva_ret, tasa_iva_ret, isr_tra, tasa_isr_tra, isr_ret, tasa_isr_ret, ieps_tra, tasa_ieps_tra, ieps_ret, tasa_ieps_ret, estatus, fecha_alta, descripcion, unidad, tipo_factor_isr_ret, tipo_factor_iva_ret, tipo_factor_iva_tra, tipo_factor_ieps_ret, tipo_factor_ieps_tra, objeto_imp, _sdc_extracted_at, _sdc_received_at, _sdc_batched_at, _sdc_deleted_at, _sdc_sequence, _sdc_table_version, _sdc_sync_started_at) VALUES (:id, :idfactura, :idproducto, :idunidad, :cantidad, :subtotal, :descuento, :total, :importe, :valorUnitario, :iva_tra, :tasa_iva_tra, :iva_ret, :tasa_iva_ret, :isr_tra, :tasa_isr_tra, :isr_ret, :tasa_isr_ret, :ieps_tra, :tasa_ieps_tra, :ieps_ret, :tasa_ieps_ret, :estatus, :fecha_alta, :descripcion, :unidad, :tipo_factor_isr_ret, :tipo_factor_iva_ret, :tipo_factor_iva_tra, :tipo_factor_ieps_ret, :tipo_factor_ieps_tra, :objeto_imp, :_sdc_extracted_at, :_sdc_received_at, :_sdc_batched_at, :_sdc_deleted_at, :_sdc_sequence, :_sdc_table_version, :_sdc_sync_started_at) cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-postgres name=target-postgres producer=False run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=target-postgres
2025-01-28T03:41:56.105391Z [info     ] 2025-01-28 04:41:54,907 | INFO     | singer_sdk.metrics   | METRIC: {"type": "timer", "metric": "batch_processing_time", "value": 15.758918285369873, "tags": {"stream": "facturacion-detalle_prod_emitidas", "pid": 3457125, "status": "succeeded"}} cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-postgres name=target-postgres producer=False run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=target-postgres
2025-01-28T03:42:23.409094Z [info     ] 2025-01-28 04:42:23,408 | INFO     | singer_sdk.metrics   | METRIC: {"type": "counter", "metric": "record_count", "value": 91050, "tags": {"stream": "facturacion-detalle_prod_emitidas", "pid": 3457125}} cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-postgres name=target-postgres producer=False run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=target-postgres
2025-01-28T03:42:23.861373Z [info     ] time=2025-01-28 04:42:23 name=singer level=INFO message=METRIC: b'{"type":"counter","metric":"record_count","value":156011,"tags":{"database":"facturacion","table":"detalle_prod_emitidas"}}' cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
Thank you very much in advance! - Balder Huerta.
e
Have you tried target-postgres'
use_copy
option?