Alex Novytskyi
05/28/2025, 12:54 PM
When using tap-jtl-mssql to connect to MSSQL, we are experiencing extremely long execution times during the --discover phase. According to the logs, the invocation:
2025-05-28T12:50:05.060658Z [debug] Invoking: [.../tap-jtl-mssql ... --discover]
takes 2–3 minutes to complete. It seems that the tap is trying to scan and analyze the entire database schema, even though a local catalog file is explicitly defined in the configuration:
catalog: /home/on/Projects/meltano_jtl_pim/project/tap_jtl_mssql/catalog.json
However, this file appears to be ignored, as the tap still initiates a full database scan via SQLAlchemy and launches discovery instead of using the pre-generated catalog.
Expected behavior:
When a catalog.json is provided, the tap should skip the discovery process and use the existing catalog instead.
Questions:
1. Is there a way to forcefully prevent the --discover step and work only with the existing catalog.json?
2. Is there a Meltano or tap configuration option to disable automatic discovery if the catalog file is already available?
3. Is there a way to cache discovery results to avoid hitting the database repeatedly?
Environment:
• Meltano version: 3.7.6
• OS: Ubuntu 22.04
• Python: 3.12.3
Edgar Ramírez (Arch.dev)
05/28/2025, 8:32 PM
"When a catalog.json is provided, the tap should skip the discovery process and use the existing catalog instead."
I can confirm that's the actual behavior. Where are you defining catalog.json in the context of the rest of the plugin config in meltano.yml?
"Is there a way to cache discovery results to avoid hitting the database repeatedly?"
That is also already the case. Are you running Meltano in an ephemeral environment, e.g. Docker?
Alex Novytskyi
05/29/2025, 9:49 AM
version: 1
send_anonymous_usage_stats: true
project_id: tap-mssql
default_environment: dev
venv:
  backend: uv
environments:
- name: dev
plugins:
  extractors:
  - name: tap-jtl-mssql
    namespace: tap_jtl_mssql
    pip_url: -e .
    capabilities:
    - state
    - catalog
    - discover
    - about
    - stream-maps
    # TODO: Declare settings and their types here:
    settings_group_validation:
    - [host, port, database, user, password, query]
    # TODO: Declare default configuration values here:
    settings:
    - name: host
      label: Host
      description: The DB Host
    - name: port
      label: Port
      description: The DB Port
    - name: database
      label: Database
      description: The DB Name
    - name: user
      label: User
      description: The DB User
    - name: password
      kind: string
      label: DB User Password
      description: DB User Password
      sensitive: true
    - name: query
      label: Query for select data from DB
      description: Query for select data from DB
    - name: driver
      label: Driver for DB connection
      description: Driver, default - ODBC Driver 17 for SQL Server
    # TODO: Declare required settings here:
    config:
      driver: ODBC Driver 18 for SQL Server
      host: 217.154.199.124
      #database: test-db
      database: eazybusiness
      user: sa
      #query: SELECT TOP 10 * FROM products
      query: |
        SELECT
        k.*,
        ks.*
        FROM tkategorie k
        LEFT JOIN tkategoriesprache ks ON ks.kKategorie = k.kKategorie
        WHERE kOberKategorie = 0;
      catalog: ./catalog.json
      select:
      - '*.*'
  ##################################################################################
  loaders:
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
    config:
      do_timestamp_file: false
      destination_path: ./output
  - name: target-mysql
    variant: thkwag
    pip_url: thk-target-mysql
    config:
      user: user
      database: db
      password: pass
      host: mysql
      port: "3306"
Alex Novytskyi
05/29/2025, 9:50 AM
Alex Novytskyi
05/30/2025, 8:55 AM
Edgar Ramírez (Arch.dev)
05/30/2025, 4:21 PM
catalog is nested incorrectly. It should look something like:
config:
  driver: ODBC Driver 18 for SQL Server
  ..
catalog: ./catalog.json
select:
- '*.*'
i.e. at the same level as config, not nested inside it.
Alex Novytskyi
06/10/2025, 10:37 AM
Edgar Ramírez (Arch.dev)
06/10/2025, 4:17 PM
You have both catalog and select set. The dbo-products stream most likely doesn't have the selected: true metadata in the catalog file.
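
One way to check the point above is to list the streams in a catalog file whose stream-level metadata does not carry selected: true. The sketch below is a minimal illustration, not part of the tap or of Meltano. It assumes the usual Singer catalog layout (a top-level "streams" list, each stream with a "metadata" list whose stream-level entry has an empty "breadcrumb"). The function name unselected_streams and the sample catalog are hypothetical; only the dbo-products stream name comes from the thread. It looks only at the explicit selected key and ignores other selection metadata such as selected-by-default.

```python
import json


def unselected_streams(catalog: dict) -> list[str]:
    """Return the tap_stream_id of every stream without selected: true
    in its stream-level metadata entry (the one with an empty breadcrumb)."""
    missing = []
    for stream in catalog.get("streams", []):
        selected = False
        for entry in stream.get("metadata", []):
            # Assumption: the stream-level entry has an empty breadcrumb.
            if not entry.get("breadcrumb"):
                selected = entry.get("metadata", {}).get("selected") is True
                break
        if not selected:
            missing.append(stream.get("tap_stream_id", "<unknown>"))
    return missing


# Hypothetical sample catalog: dbo-categories is invented for illustration.
example_catalog = json.loads("""
{
  "streams": [
    {
      "tap_stream_id": "dbo-products",
      "metadata": [
        {"breadcrumb": [], "metadata": {"table-key-properties": ["id"]}}
      ]
    },
    {
      "tap_stream_id": "dbo-categories",
      "metadata": [
        {"breadcrumb": [], "metadata": {"selected": true}}
      ]
    }
  ]
}
""")

print(unselected_streams(example_catalog))  # ['dbo-products']
```

To run the same check against a real file, load it with json.load(open("catalog.json")) and pass the result to the function. Any stream it reports would need selected: true added to its stream-level metadata before the tap can be expected to emit it.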