# singer-tap-development
d
I am in the process of modifying a copy of tap-mssql --variant singer-io to integrate better with Meltano. I got the discovery option working correctly with Meltano; that issue was reported here: https://gitlab.com/meltano/meltano/-/issues/1175#note_684743065. Currently I am trying to get the error stream to flow through properly in the Meltano run. What behavior does Meltano expect when an error is reported? Note: the error is actually due to how the DB schema is queried, which I am fixing as well, but I want to make sure that errors are handled properly while I have one to work with.
```shell
meltano run tap-mssql target-jsonl
```
The tap correctly reports an error:
```
2025-08-20T18:11:00.567018Z [info     ] FATAL [main] tap-mssql.core - Fatal Error Occured - Stream rss_test_dbo_c_logical_field_user_values has unsupported primary key(s): logical_field_sid cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-jsonl name=tap-mssql producer=True run_id=c8e6cbbf-26d4-415c-83d0-c420ccbf706c stdio=stderr string_id=tap-mssql
2025-08-20T18:11:00.567219Z [info     ] ERROR [main] #error {          cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-jsonl name=tap-mssql producer=True run_id=c8e6cbbf-26d4-415c-83d0-c420ccbf706c stdio=stderr string_id=tap-mssql
2025-08-20T18:11:00.567439Z [info     ]  :cause Stream rss_test_dbo_c_logical_field_user_values has unsupported primary key(s): logical_field_sid cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-jsonl name=tap-mssql producer=True run_id=c8e6cbbf-26d4-415c-83d0-c420ccbf706c stdio=stderr string_id=tap-mssql
```
But then Meltano gets a stack dump.
```
2025-08-20T18:11:00.905173Z [error    ] Extractor failed              
2025-08-20T18:11:00.905403Z [error    ] Block run completed            block_type=ExtractLoadBlocks duration_seconds=89.355 err=RunnerError('Extractor failed') exit_codes={<PluginType.EXTRACTORS: 'extractors'>: 1} run_id=c8e6cbbf-26d4-415c-83d0-c420ccbf706c set_number=0 success=False
2025-08-20T18:11:00.906956Z [info     ] Run completed                  duration_seconds=89.357 run_id=c8e6cbbf-26d4-415c-83d0-c420ccbf706c status=failure
2025-08-20T18:11:00.907373Z [error    ] Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to join our friendly Slack community.

Run invocation could not be completed as block failed: Extractor failed
╭─────────────────────────────── Traceback (most recent call last) ...
```
b
Meltano moved to using GitHub a few years back; here is a link: Meltano. I would also check out the Meltano Hub for an up-to-date listing of available taps and targets. Here is a link to the hub: Home - Meltano Hub.
There is also this section about logging in the SDK documentation: Logging - Meltano Singer SDK 0.48.0 documentation. I know the singer-io tap is written mostly in Clojure, but this at least gives you the basics of the log format. The process of seeing the error and then getting a stack dump after the tap stops is expected, at least in my experience. It seems that the tap is throwing a runtime error, which would stop the run.
```clojure
(if (not-empty unsupported-primary-keys)
      (throw (ex-info (format "Stream %s has unsupported primary key(s): %s"
                              stream-name
                              (string/join ", " unsupported-primary-keys)) {}))
      true)))
```
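As for what Meltano expects: judging from the `exit_codes={...: 1}` in the log above, Meltano treats any non-zero exit code from the tap process as an extractor failure and raises `RunnerError('Extractor failed')`, while everything the tap writes to stderr is relayed as log lines. A minimal Python sketch of that contract from the tap's side (hypothetical, not Meltano's or the tap's actual code):

```python
import sys


def fail_fatally(message: str) -> None:
    """Report a fatal error the way a Singer tap is expected to:
    log to stderr (Meltano relays the tap's stderr as log output)
    and exit with a non-zero code, which Meltano treats as an
    extractor failure and aborts the run."""
    print(f"FATAL {message}", file=sys.stderr)
    sys.exit(1)
```

So the stack dump you see after the tap's error message is Meltano reacting to the exit code, not a second error in the tap itself.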
d
Thanks @BuzzCutNorman, that is helpful. I tried the only other tap that uses Change Tracking for incremental replication, and it doesn't do the incremental sync properly (it probably works for some subset of table structures). I am testing this one because it is the one used by Stitch, so I am hoping it is more reliable. The sync code looks good. I am a SQL Server developer, not a Python or Clojure developer, so I am trying to keep the mods small.
b
I am not a pro developer either. Glad that you are willing to dive in and attempt to make life better for the SQL Server users out there. The singer-io tap seems to have most of the data types covered already. Just out of curiosity, what data type is `logical_field_sid`?
d
The issue is that it uses the JDBC catalog functions, and those return any SQL Server user-defined data type (UDDT) instead of the SQL Server base type, so it doesn't recognize any column that is defined as a UDDT. I just need to change it to query the information schema tables instead; those return the SQL Server base type if you select the correct column.
JDBC `getColumns` uses SQL Server's `sp_columns` to get the `type_name` used in `defn column->schema`. However, that returns the SQL Server UDDT instead of the corresponding SQL Server native data type. Using `DATA_TYPE` from `INFORMATION_SCHEMA.COLUMNS` gives the correct result.
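To illustrate the fix (a hypothetical Python sketch; the actual tap does this in Clojure): in SQL Server, `INFORMATION_SCHEMA.COLUMNS` reports the underlying system type in `DATA_TYPE`, while the UDDT name appears separately in `DOMAIN_NAME`, so selecting `DATA_TYPE` resolves a user-defined type to its base type. The function and parameter names are illustrative:

```python
def base_type_query(table_schema: str, table_name: str) -> str:
    """Build an INFORMATION_SCHEMA query that returns the SQL Server base
    type (DATA_TYPE) for every column, even when the column is declared
    with a user-defined data type (whose name is in DOMAIN_NAME).

    Note: values are interpolated here only for illustration; real code
    should use bind parameters to avoid SQL injection."""
    return (
        "SELECT COLUMN_NAME, DATA_TYPE, DOMAIN_NAME "
        "FROM INFORMATION_SCHEMA.COLUMNS "
        f"WHERE TABLE_SCHEMA = '{table_schema}' "
        f"AND TABLE_NAME = '{table_name}' "
        "ORDER BY ORDINAL_POSITION"
    )
```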
b
Sounds like you have the hard part already figured out. Hope it all goes smoothly for you.
d
@BuzzCutNorman, I finally got the discovery working correctly; Clojure is a bit obtuse... Currently I am having an issue with caching the catalog. I made changes so that Meltano can generate the catalog, but it isn't getting cached. I get this message on every meltano run:
```shell
meltano --log-level=debug run tap-mssql target-jsonl
```
`Cached catalog is outdated, running discovery...` Any ideas?
b
The first thing that comes to mind, and it is just a guess, is the following: https://docs.meltano.com/concepts/plugins/#use_cached_catalog-extra. You might try adding it to tap-mssql's config in the Meltano project and setting it to true. The main reason I think this: when a generated catalog is present, `use_cached_catalog` is the other condition that triggers a discovery in the Meltano code.
```python
if catalog_path.exists() and use_cached_catalog:
```
d
Looks like Meltano expects the tap to implement functionality to support a cached catalog. I am going to track down whether this is a Python-only feature supported by taps built with the Meltano SDK.
b
You are correct: taps and targets that are written with the Singer SDK do some of the background maintenance tasks for you. The tasks are still based on the Singer spec. The Python-based SDK does use SQLAlchemy to assist with SQL-based tasks, but I believe the catalog management is written by the community and Arch, so you should be able to match it in the singer-io tap-mssql. I am going to CC @Edgar Ramírez (Arch.dev), who is who I reach out to when I need to know what Meltano is expecting from a tap or target in certain scenarios.
e
@Don Venardos what does your `meltano.yml` look like? You may just be missing a `capabilities` entry.
d
These are the additions I made after installing from Meltano Hub. I didn't add any capabilities.
```yaml
plugins:
  extractors:
  - name: tap-mssql
    variant: singer-io
    executable: /home/dvenardos/source/singer-io/tap-mssql/bin/tap-mssql
    use_cached_catalog: true
    settings:
    - name: host
    - name: instanceName
    - name: port
      kind: integer
    - name: user
    - name: password
      kind: string
      sensitive: true
    - name: ssl
      kind: boolean
    - name: encrypt
    - name: trustServerCertificate
      kind: boolean
    - name: database
```
From https://hub.meltano.com/extractors/tap-mssql--singer-io/: "This plugin has the following capabilities: catalog, discover, state."
e
Gotcha, so I'm curious what you mean by "Looks like Meltano expects the tap to implement functionality to support a cached catalog". Is there an error message?
d
Every time it runs I get this: `Cached catalog is outdated, running discovery...` I was trying to track down what the meltano run command tests for to generate that message.
That message is coming from tap.py, and I didn't see run.py calling tap.py, but I am not a Python developer, so I am just trying to pick my way through the code.
e
What's the full `meltano run` command you're using?
d
```shell
meltano --log-level=debug run tap-mssql target-jsonl
```
```
2025-08-26T16:10:52.788060Z [debug    ] Created configuration at /home/dvenardos/source/meltano/singer_replication/.meltano/run/tap-mssql/tap.92844ea6-1add-4cad-88a6-3e2f27ace053.config.json run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.788469Z [debug    ] Could not find tap.properties.json in /home/dvenardos/source/meltano/singer_replication/.meltano/extractors/tap-mssql/tap.properties.json, skipping. run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.788689Z [debug    ] Could not find tap.properties.cache_key in /home/dvenardos/source/meltano/singer_replication/.meltano/extractors/tap-mssql/tap.properties.cache_key, skipping. run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.788866Z [debug    ] Could not find state.json in /home/dvenardos/source/meltano/singer_replication/.meltano/extractors/tap-mssql/state.json, skipping. run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.789033Z [debug    ] Could not find tap.singer_sdk_logging.json in /home/dvenardos/source/meltano/singer_replication/.meltano/extractors/tap-mssql/tap.singer_sdk_logging.json, skipping. run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.789194Z [debug    ] Could not find tap.pipelinewise_logging.conf in /home/dvenardos/source/meltano/singer_replication/.meltano/extractors/tap-mssql/tap.pipelinewise_logging.conf, skipping. run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.792521Z [info     ] Using systemdb state backend   run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.794141Z [debug    ] Cached catalog is outdated, running discovery... run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.795019Z [debug    ] Invoking: ['/home/dvenardos/source/singer-io/tap-mssql/bin/tap-mssql', '--config', '/home/dvenardos/source/meltano/singer_replication/.meltano/run/tap-mssql/tap.92844ea6-1add-4cad-88a6-3e2f27ace053.config.json', '--state', '/home/dvenardos/source/meltano/singer_replication/.meltano/run/tap-mssql/state.json', '--discover'] run_id=e1a45d6e-fa51-4384-8183-5257c981c391
```
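Judging from the "Could not find tap.properties.json" and "Could not find tap.properties.cache_key" lines in that debug output, Meltano stores the cached catalog as `tap.properties.json` alongside a `tap.properties.cache_key`; when either file is missing, or the stored key no longer matches the one derived from the current plugin settings, it re-runs discovery. A hypothetical Python sketch of that decision (an assumption based on the log, not Meltano's actual code):

```python
from pathlib import Path


def needs_discovery(catalog_path: Path, cache_key_path: Path, current_key: str) -> bool:
    """Return True when discovery must be re-run: the cached catalog or
    its cache-key file is missing, or the stored key differs from the
    current one (e.g. because the plugin's config changed)."""
    if not catalog_path.exists() or not cache_key_path.exists():
        return True
    return cache_key_path.read_text().strip() != current_key
```

In the log above, both files are missing, which would explain why every run falls through to "Cached catalog is outdated, running discovery...".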
d
Yes, that would be the issue...
e
An ugly way to have a "cached" catalog, then, would be to dump it, save it to a file, and point to it with the catalog extra for the plugin.
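That workaround might look like this in meltano.yml (the file path is illustrative; the catalog would first be dumped with something like `meltano invoke --dump=catalog tap-mssql > extract/tap-mssql.catalog.json`):

```yaml
plugins:
  extractors:
  - name: tap-mssql
    variant: singer-io
    # hypothetical path to the previously dumped catalog; with the
    # catalog extra set, Meltano skips discovery and uses this file
    catalog: extract/tap-mssql.catalog.json
```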
d
Yeah, I really need to have the select and metadata coming from the meltano.yml file. Catalog discovery is pretty fast; I will see if I can make it faster. It slowed down when I changed it to write the output in chunks, because it was previously writing the entire catalog as one string and exceeding Meltano's byte limit.
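The chunked write mentioned above could be sketched like this (a hypothetical Python illustration; the tap itself does this in Clojure, and the chunk size is an arbitrary assumption):

```python
import sys


def write_in_chunks(payload: str, out=sys.stdout, chunk_size: int = 8192) -> int:
    """Write a large string (e.g. a serialized catalog) in fixed-size
    chunks instead of one giant write, so no single write exceeds the
    consumer's per-read byte limit. Returns the number of chunks written."""
    chunks = 0
    for start in range(0, len(payload), chunk_size):
        out.write(payload[start:start + chunk_size])
        chunks += 1
    out.flush()
    return chunks
```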
Thanks for the help!
@Edgar Ramírez (Arch.dev), we are getting pretty close to deploying this in production. The amount of time the catalog discovery takes varies greatly with the processing power of whatever is running Meltano, so we are going to have to do the dump-and-save option. Before going that route: is there any chance of adding a Meltano option to turn off a forced catalog refresh? This seems like an option that would be good to have for cases where you own the data source, i.e. you only want a catalog refresh when you know there are DB schema changes to process, at which point you could turn the override off for a single run and then turn it back on. Basically, it gives you more control over when the catalog is considered stale.