# singer-tap-development
d
I am in the process of modifying a copy of tap-mssql --variant singer-io to integrate better with Meltano. I got the discovery option working correctly with Meltano; that issue was reported here: https://gitlab.com/meltano/meltano/-/issues/1175#note_684743065. Currently I am trying to get the error stream to flow through properly in the Meltano run. What behavior does Meltano expect when an error is reported? Note: the error is actually due to how the DB schema is queried, which I am fixing as well, but I want to make sure that errors are handled properly while I have one to work with.
```shell
meltano run tap-mssql target-jsonl
```
The tap correctly reports an error:
```
2025-08-20T18:11:00.567018Z [info     ] FATAL [main] tap-mssql.core - Fatal Error Occured - Stream rss_test_dbo_c_logical_field_user_values has unsupported primary key(s): logical_field_sid cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-jsonl name=tap-mssql producer=True run_id=c8e6cbbf-26d4-415c-83d0-c420ccbf706c stdio=stderr string_id=tap-mssql
2025-08-20T18:11:00.567219Z [info     ] ERROR [main] #error {          cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-jsonl name=tap-mssql producer=True run_id=c8e6cbbf-26d4-415c-83d0-c420ccbf706c stdio=stderr string_id=tap-mssql
2025-08-20T18:11:00.567439Z [info     ]  :cause Stream rss_test_dbo_c_logical_field_user_values has unsupported primary key(s): logical_field_sid cmd_type=elb consumer=False job_name=dev:tap-mssql-to-target-jsonl name=tap-mssql producer=True run_id=c8e6cbbf-26d4-415c-83d0-c420ccbf706c stdio=stderr string_id=tap-mssql
```
But then Meltano gets a stack dump.
```
2025-08-20T18:11:00.905173Z [error    ] Extractor failed              
2025-08-20T18:11:00.905403Z [error    ] Block run completed            block_type=ExtractLoadBlocks duration_seconds=89.355 err=RunnerError('Extractor failed') exit_codes={<PluginType.EXTRACTORS: 'extractors'>: 1} run_id=c8e6cbbf-26d4-415c-83d0-c420ccbf706c set_number=0 success=False
2025-08-20T18:11:00.906956Z [info     ] Run completed                  duration_seconds=89.357 run_id=c8e6cbbf-26d4-415c-83d0-c420ccbf706c status=failure
2025-08-20T18:11:00.907373Z [error    ] Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to join our friendly Slack community.

Run invocation could not be completed as block failed: Extractor failed
╭─────────────────────────────── Traceback (most recent call last) ...
```
b
Meltano moved to using GitHub a few years back; here is a link: Meltano. I would also check out the Meltano Hub for an up-to-date listing of available taps and targets. Here is a link to the hub: Home - Meltano Hub.
There is also this section about logging in the SDK documentation: Logging - Meltano Singer SDK 0.48.0 documentation. I know the singer-io tap is written mostly in Clojure, but this at least gives you the basics of the log format. The process of seeing the error and then getting a stack dump after the tap stops is expected, at least in my experience. It seems that the tap is throwing a runtime error, which would stop the run.
```clojure
(if (not-empty unsupported-primary-keys)
      (throw (ex-info (format "Stream %s has unsupported primary key(s): %s"
                              stream-name
                              (string/join ", " unsupported-primary-keys)) {}))
      true)))
```
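As for what Meltano expects: judging from the `exit_codes={...: 1}` in the log above, Meltano treats any non-zero exit code from the tap process as an extractor failure and raises `RunnerError('Extractor failed')`, while everything the tap writes to stderr is relayed as log lines. A minimal Python sketch of that contract from the tap's side (hypothetical, not Meltano's or the tap's actual code):

```python
import sys


def fail_fatally(message: str) -> None:
    """Report a fatal error the way a Singer tap is expected to:
    log to stderr (Meltano relays the tap's stderr as log output)
    and exit with a non-zero code, which Meltano treats as an
    extractor failure and aborts the run."""
    print(f"FATAL {message}", file=sys.stderr)
    sys.exit(1)
```

So the stack dump you see after the tap's error message is Meltano reacting to the exit code, not a second error in the tap itself.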
d
Thanks @BuzzCutNorman, that is helpful. I tried the only other tap that uses Change Tracking for incremental replication, and it doesn't do the incremental sync properly (it probably works for some subset of table structures). I am testing this one because it is the one used by Stitch, so I am hoping it is more reliable. The sync code looks good. I am a SQL Server developer, not a Python or Clojure developer, so I am trying to keep the mods small.
b
I am not a pro developer either. Glad that you are willing to dive in and attempt to make life better for the SQL Server users out there. The singer-io tap seems to have most of the data types covered already. Just out of curiosity, what data type is `logical_field_sid`?
d
The issue is that it uses the JDBC catalog functions, and those return any SQL Server user-defined data type (UDDT) instead of the SQL Server base type, so it doesn't recognize any column that is defined as a UDDT. I just need to change it to query the information schema tables instead; those return the SQL Server base type if you select the correct column.
JDBC `getColumns` uses SQL Server's `sp_columns` to get the `type_name` used in `defn column->schema`. However, that returns the SQL Server UDDT instead of the corresponding SQL Server native data type. Using `DATA_TYPE` from `INFORMATION_SCHEMA.COLUMNS` gives the correct result.
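To illustrate the fix (a hypothetical Python sketch; the actual tap does this in Clojure): in SQL Server, `INFORMATION_SCHEMA.COLUMNS` reports the underlying system type in `DATA_TYPE`, while the UDDT name appears separately in `DOMAIN_NAME`, so selecting `DATA_TYPE` resolves a user-defined type to its base type. The function and parameter names are illustrative:

```python
def base_type_query(table_schema: str, table_name: str) -> str:
    """Build an INFORMATION_SCHEMA query that returns the SQL Server base
    type (DATA_TYPE) for every column, even when the column is declared
    with a user-defined data type (whose name is in DOMAIN_NAME).

    Note: values are interpolated here only for illustration; real code
    should use bind parameters to avoid SQL injection."""
    return (
        "SELECT COLUMN_NAME, DATA_TYPE, DOMAIN_NAME "
        "FROM INFORMATION_SCHEMA.COLUMNS "
        f"WHERE TABLE_SCHEMA = '{table_schema}' "
        f"AND TABLE_NAME = '{table_name}' "
        "ORDER BY ORDINAL_POSITION"
    )
```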
b
Sounds like you have the hard part already figured out. Hope it all goes smoothly for you.
d
@BuzzCutNorman, I finally got the discovery working correctly; Clojure is a bit obtuse... Currently I am having an issue with caching the catalog. I made changes so that Meltano can generate the catalog, but it isn't getting cached. I get this message on every meltano run:
```shell
meltano --log-level=debug run tap-mssql target-jsonl
```
`Cached catalog is outdated, running discovery...` Any ideas?
b
The first thing that comes to mind, and it is just a guess, is the following: https://docs.meltano.com/concepts/plugins/#use_cached_catalog-extra. You might try adding it to tap-mssql's config in the Meltano project and setting it to true. The main reason I think this: when a generated catalog is present, `use_cached_catalog` is the other condition that triggers a discovery in the Meltano code.
```python
if catalog_path.exists() and use_cached_catalog:
```
d
Looks like Meltano expects the tap to implement functionality to support a cached catalog. I am going to track down whether this is a Python-only feature supported by taps built with the Meltano SDK.
b
You are correct: taps and targets that are written with the Singer SDK do some of the background maintenance tasks for you. The tasks are still based on the Singer spec. The Python-based SDK does use SQLAlchemy to assist with SQL-based tasks, but I believe the catalog management is written by the community and Arch, so you should be able to match it in the singer-io tap-mssql. I am going to CC @Edgar Ramírez (Arch.dev), who is who I reach out to when I need to know what Meltano is expecting from a tap or target in certain scenarios.
e
@Don Venardos what does your `meltano.yml` look like? You may just be missing a `capabilities` entry.
d
These are the additions I made after installing from Meltano Hub. I didn't add any capabilities.
```yaml
plugins:
  extractors:
  - name: tap-mssql
    variant: singer-io
    executable: /home/dvenardos/source/singer-io/tap-mssql/bin/tap-mssql
    use_cached_catalog: true
    settings:
    - name: host
    - name: instanceName
    - name: port
      kind: integer
    - name: user
    - name: password
      kind: string
      sensitive: true
    - name: ssl
      kind: boolean
    - name: encrypt
    - name: trustServerCertificate
      kind: boolean
    - name: database
```
From https://hub.meltano.com/extractors/tap-mssql--singer-io/: "This plugin has the following capabilities: catalog, discover, state."
e
Gotcha, so I'm curious what you mean by "Looks like Meltano expects the tap to implement functionality to support a cached catalog". Is there an error message?
d
Every time it runs I get this: `Cached catalog is outdated, running discovery...` I was trying to track down what the meltano run command tests for to generate that message.
That message is coming from tap.py, and I didn't see run.py calling tap.py, but I am not a Python developer, so I am just trying to pick my way through the code.
e
What's the full `meltano run` command you're using?
d
```shell
meltano --log-level=debug run tap-mssql target-jsonl
```
```
2025-08-26T16:10:52.788060Z [debug    ] Created configuration at /home/dvenardos/source/meltano/singer_replication/.meltano/run/tap-mssql/tap.92844ea6-1add-4cad-88a6-3e2f27ace053.config.json run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.788469Z [debug    ] Could not find tap.properties.json in /home/dvenardos/source/meltano/singer_replication/.meltano/extractors/tap-mssql/tap.properties.json, skipping. run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.788689Z [debug    ] Could not find tap.properties.cache_key in /home/dvenardos/source/meltano/singer_replication/.meltano/extractors/tap-mssql/tap.properties.cache_key, skipping. run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.788866Z [debug    ] Could not find state.json in /home/dvenardos/source/meltano/singer_replication/.meltano/extractors/tap-mssql/state.json, skipping. run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.789033Z [debug    ] Could not find tap.singer_sdk_logging.json in /home/dvenardos/source/meltano/singer_replication/.meltano/extractors/tap-mssql/tap.singer_sdk_logging.json, skipping. run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.789194Z [debug    ] Could not find tap.pipelinewise_logging.conf in /home/dvenardos/source/meltano/singer_replication/.meltano/extractors/tap-mssql/tap.pipelinewise_logging.conf, skipping. run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.792521Z [info     ] Using systemdb state backend   run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.794141Z [debug    ] Cached catalog is outdated, running discovery... run_id=e1a45d6e-fa51-4384-8183-5257c981c391
2025-08-26T16:10:52.795019Z [debug    ] Invoking: ['/home/dvenardos/source/singer-io/tap-mssql/bin/tap-mssql', '--config', '/home/dvenardos/source/meltano/singer_replication/.meltano/run/tap-mssql/tap.92844ea6-1add-4cad-88a6-3e2f27ace053.config.json', '--state', '/home/dvenardos/source/meltano/singer_replication/.meltano/run/tap-mssql/state.json', '--discover'] run_id=e1a45d6e-fa51-4384-8183-5257c981c391
```
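Judging from the "Could not find tap.properties.json" and "Could not find tap.properties.cache_key" lines in that debug output, Meltano stores the cached catalog as `tap.properties.json` alongside a `tap.properties.cache_key`; when either file is missing, or the stored key no longer matches the one derived from the current plugin settings, it re-runs discovery. A hypothetical Python sketch of that decision (an assumption based on the log, not Meltano's actual code):

```python
from pathlib import Path


def needs_discovery(catalog_path: Path, cache_key_path: Path, current_key: str) -> bool:
    """Return True when discovery must be re-run: the cached catalog or
    its cache-key file is missing, or the stored key differs from the
    current one (e.g. because the plugin's config changed)."""
    if not catalog_path.exists() or not cache_key_path.exists():
        return True
    return cache_key_path.read_text().strip() != current_key
```

In the log above, both files are missing, which would explain why every run falls through to "Cached catalog is outdated, running discovery...".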
d
Yes, that would be the issue...
e
An ugly way to have a "cached" catalog, then, would be to dump it, save it to a file, and point to it with the catalog extra for the plugin.
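That workaround might look like this in meltano.yml (the file path is illustrative; the catalog would first be dumped with something like `meltano invoke --dump=catalog tap-mssql > extract/tap-mssql.catalog.json`):

```yaml
plugins:
  extractors:
  - name: tap-mssql
    variant: singer-io
    # hypothetical path to the previously dumped catalog; with the
    # catalog extra set, Meltano skips discovery and uses this file
    catalog: extract/tap-mssql.catalog.json
```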
d
Yeah, I really need to have the select and metadata coming from the meltano.yml file. Catalog discovery is pretty fast; I will see if I can make it faster. It slowed down when I changed it to write the output in chunks, because it was previously writing the entire catalog as one string and exceeding Meltano's byte limit.
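The chunked write mentioned above could be sketched like this (a hypothetical Python illustration; the tap itself does this in Clojure, and the chunk size is an arbitrary assumption):

```python
import sys


def write_in_chunks(payload: str, out=sys.stdout, chunk_size: int = 8192) -> int:
    """Write a large string (e.g. a serialized catalog) in fixed-size
    chunks instead of one giant write, so no single write exceeds the
    consumer's per-read byte limit. Returns the number of chunks written."""
    chunks = 0
    for start in range(0, len(payload), chunk_size):
        out.write(payload[start:start + chunk_size])
        chunks += 1
    out.flush()
    return chunks
```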
Thanks for the help!
@Edgar Ramírez (Arch.dev), we are getting pretty close to deploying this in production. The amount of time the catalog discovery takes varies greatly with the processing power of whatever is running Meltano, so we are going to have to do the dump-and-save option. Before going that route: is there any chance of adding a Meltano option to turn off a forced catalog refresh? This seems like an option that would be good to have for cases where you own the data source, i.e. you only want a catalog refresh when you know there are DB schema changes to process, at which point you could turn the override off for a single run and then turn it back on. Basically, it gives you more control over when the catalog is considered stale.