Balder Huerta
01/28/2025, 3:49 AMtap-mssql
→ target-postgres
. A table with 1.1B rows and 50+ columns (detalle_prod_emitidas
) is causing severe performance issues during upserts or incremental syncs.
Here is the execution history:
Attempt Method Duration Result
1 append-only 10 days Initial load (1.1B rows, 10k rows batches)
2 upsert 6 days Cancelled (timeout, 50M new rows, 10k rows batches)
3 append-only 4+ days Cancelled (timeout, 50M new rows, 10k rows batches)
4 upsert 1+ days Ongoing (30M new rows, testing new index in primary key and 100k batches)
Questions for the Community
1. Upsert Optimization:
◦ How to configure target-postgres
for faster merges?
◦ Recommended indexes/PostgreSQL tuning for tables >1B rows?
2. Incremental Extraction:
◦ Best practices for configuring tap-mssql
to extract only new inserts (no updates) if detalle_prod_emitidas
is append-only?
3. Alternative Approaches:
◦ Can we combine append-only
for this table + upsert
for others?
◦ Would using `COPY`/`pg_bulkload` for this table outside Meltano be advisable?
Infrastructure:
◦ Meltano + PostgreSQL: Ubuntu server (same machine), installed on virtualenv, no containers.
◦ MS SQL: Windows Server (separate machine, same site).
Here is the YML file config:
version: 1
default_environment: dev
project_id: XXXXXXX-d7a5-4bc3-8f98-XXXXX
environments:
- name: dev
- name: staging
- name: prod
plugins:
extractors:
- name: tap-mssql
variant: wintersrd
pip_url: tap-mssql
config:
cursor_array_size: 100000
database: db_source
host: XXXXXX
user: XXXX
select:
- facturacion-cat_clave_prod_serv.*
- facturacion-cat_clave_unidad.*
- facturacion-cat_regimen_fiscal.*
- facturacion-cp33.*
- facturacion-cp33_det_imp.*
- facturacion-detalle_prod_emitidas.*
- facturacion-nc_emitidas.*
loaders:
- name: target-postgres
variant: meltanolabs
pip_url: meltanolabs-target-postgres
config:
batch_size_rows: 100000
database: dl_main
default_target_schema: raw_data
host: localhost
load_method: upsert
port: 5500
user: postgres
A heads snap of the current log:
2025-01-25T18:48:43.632638Z [info ] time=2025-01-25 19:48:43 name=singer level=INFO message=Server Parameters: version: Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64) cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.632965Z [info ] Aug 22 2017 17:04:49 cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.633119Z [info ] Copyright (C) 2017 Microsoft Corporation cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.633246Z [info ] Enterprise Edition (64-bit) on Windows Server 2022 Standard 10.0 <X64> (Build 20348: ) cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.633398Z [info ] , lock_timeout: -1, cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.633546Z [info ] time=2025-01-25 19:48:43 name=singer level=INFO message=Beginning sync cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.842427Z [info ] time=2025-01-25 19:48:43 name=singer level=INFO message=Preparing Catalog cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:43.944933Z [info ] 2025-01-25 19:48:43,944 | INFO | target-postgres | Target 'target-postgres' is listening for input from tap. cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-postgres name=target-postgres producer=False run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=target-postgres
2025-01-25T18:48:44.052279Z [info ] time=2025-01-25 19:48:44 name=singer level=INFO message=Fetching tables cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.121602Z [info ] time=2025-01-25 19:48:44 name=singer level=INFO message=Tables fetched, fetching columns cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.511912Z [info ] time=2025-01-25 19:48:44 name=singer level=INFO message=ARRAYSIZE=100000 cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.564381Z [info ] time=2025-01-25 19:48:44 name=singer level=INFO message=Columns Fetched cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.599467Z [info ] time=2025-01-25 19:48:44 name=singer level=INFO message=Catalog ready cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601060Z [info ] time=2025-01-25 19:48:44 name=singer level=INFO message=selected-by-default: False cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601220Z [info ] time=2025-01-25 19:48:44 name=singer level=INFO message=database-name: facturacion cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601382Z [info ] time=2025-01-25 19:48:44 name=singer level=INFO message=is-view: False cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601573Z [info ] time=2025-01-25 19:48:44 name=singer level=INFO message=table-key-properties: ['id'] cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601742Z [info ] time=2025-01-25 19:48:44 name=singer level=INFO message=selected: True cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
2025-01-25T18:48:44.601876Z [info ] time=2025-01-25 19:48:44 name=singer level=INFO message=replication-method: FULL_TABLE cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
And a tails snap:
2025-01-28T03:41:39.148292Z [info ] 2025-01-28 04:41:39,148 | INFO | target-postgres | Target sink for 'facturacion-detalle_prod_emitidas' is full. Current size is '100000'. Draining... cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-postgres name=target-postgres producer=False run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=target-postgres
2025-01-28T03:41:41.015557Z [info ] 2025-01-28 04:41:41,012 | INFO | target-postgres.facturacion-detalle_prod_emitidas | Inserting with SQL: INSERT INTO ec70724a_6f93_4609_90e7_371371dbbc70 (id, idfactura, idproducto, idunidad, cantidad, subtotal, descuento, total, importe, "valorUnitario", iva_tra, tasa_iva_tra, iva_ret, tasa_iva_ret, isr_tra, tasa_isr_tra, isr_ret, tasa_isr_ret, ieps_tra, tasa_ieps_tra, ieps_ret, tasa_ieps_ret, estatus, fecha_alta, descripcion, unidad, tipo_factor_isr_ret, tipo_factor_iva_ret, tipo_factor_iva_tra, tipo_factor_ieps_ret, tipo_factor_ieps_tra, objeto_imp, _sdc_extracted_at, _sdc_received_at, _sdc_batched_at, _sdc_deleted_at, _sdc_sequence, _sdc_table_version, _sdc_sync_started_at) VALUES (:id, :idfactura, :idproducto, :idunidad, :cantidad, :subtotal, :descuento, :total, :importe, :valorUnitario, :iva_tra, :tasa_iva_tra, :iva_ret, :tasa_iva_ret, :isr_tra, :tasa_isr_tra, :isr_ret, :tasa_isr_ret, :ieps_tra, :tasa_ieps_tra, :ieps_ret, :tasa_ieps_ret, :estatus, :fecha_alta, :descripcion, :unidad, :tipo_factor_isr_ret, :tipo_factor_iva_ret, :tipo_factor_iva_tra, :tipo_factor_ieps_ret, :tipo_factor_ieps_tra, :objeto_imp, :_sdc_extracted_at, :_sdc_received_at, :_sdc_batched_at, :_sdc_deleted_at, :_sdc_sequence, :_sdc_table_version, :_sdc_sync_started_at) cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-postgres name=target-postgres producer=False run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=target-postgres
2025-01-28T03:41:56.105391Z [info ] 2025-01-28 04:41:54,907 | INFO | singer_sdk.metrics | METRIC: {"type": "timer", "metric": "batch_processing_time", "value": 15.758918285369873, "tags": {"stream": "facturacion-detalle_prod_emitidas", "pid": 3457125, "status": "succeeded"}} cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-postgres name=target-postgres producer=False run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=target-postgres
2025-01-28T03:42:23.409094Z [info ] 2025-01-28 04:42:23,408 | INFO | singer_sdk.metrics | METRIC: {"type": "counter", "metric": "record_count", "value": 91050, "tags": {"stream": "facturacion-detalle_prod_emitidas", "pid": 3457125}} cmd_type=elb consumer=True job_name=dev:tap-mssql-to-target-postgres name=target-postgres producer=False run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=target-postgres
2025-01-28T03:42:23.861373Z [info ] time=2025-01-28 04:42:23 name=singer level=INFO message=METRIC: b'{"type":"counter","metric":"record_count","value":156011,"tags":{"database":"facturacion","table":"detalle_prod_emitidas"}}' cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-postgres name=tap-mssql producer=True run_id=03a6a07c-6906-459e-ac92-9cc176b0aaf9 stdio=stderr string_id=tap-mssql
Thank you very much in advance!
- Balder Huerta.Edgar Ramírez (Arch.dev)
01/29/2025, 10:50 PMuse_copy
option?