I added one table (or so I believed) into meltano....
# troubleshooting
d
I added one table (or so I believed) into meltano.yml and tried to run the pipeline, but I get this big crash. I note the first problem is 'catalog discovery failed' - which again suggests the timeout problem. How can I extend this, or at least get Meltano to only look at a subset of tables? ```avid@kafka:~/BC-to-Snowflake$ meltano run tap-mssql target-snowflake 2023-09-05T054138.329477Z [info ] Environment 'dev' is active 2023-09-05T054138.712367Z [warning ] No state was found, complete import. 2023-09-05T054655.771965Z [error ] Cannot start plugin tap-mssql: Catalog discovery failed: command ['/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/bin/tap-mssql', '--config', '/home/david/BC-to-Snowflake/.meltano/run/tap-mssql/tap.e19ab3c5-af2f-4e03-9d8b-9302aa60c3dc.config.json', '--discover'] returned 1 with stderr: INFO Server Parameters: version: Microsoft SQL Server 2019 (RTM-CU18-GDR) (KB5021124) - 15.0.4280.7 (X64) Jan 23 2023 123713 Copyright (C) 2019 Microsoft Corporation Standard Edition (64-bit) on Windows Server 2019 Datacenter 10.0 <X64> (Build 17763: ) (Hypervisor) , lock_timeout: -1, INFO Preparing Catalog INFO Fetching tables INFO Tables fetched, fetching columns CRITICAL (20047, b'DB-Lib error message 20047, severity 9:\nDBPROCESS is dead or not enabled\n') Traceback (most recent call last): File "src/pymssql/_pymssql.pyx", line 459, in pymssql._pymssql.Cursor.execute File "src/pymssql/_mssql.pyx", line 1087, in pymssql._mssql.MSSQLConnection.execute_query File "src/pymssql/_mssql.pyx", line 1118, in pymssql._mssql.MSSQLConnection.execute_query File "src/pymssql/_mssql.pyx", line 1251, in pymssql._mssql.MSSQLConnection.format_and_run_query File "src/pymssql/_mssql.pyx", line 1789, in pymssql._mssql.check_cancel_and_raise File "src/pymssql/_mssql.pyx", line 1835, in pymssql._mssql.raise_MSSQLDatabaseException pymssql._mssql.MSSQLDatabaseException: (20047, b'DB-Lib error message 20047, severity 9:\nDBPROCESS is dead or not enabled\n') During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/bin/tap-mssql", line 8, in <module> sys.exit(main()) File "/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/lib/python3.10/site-packages/tap_mssql/__init__.py", line 780, in main raise exc File "/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/lib/python3.10/site-packages/tap_mssql/__init__.py", line 777, in main main_impl() File "/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/lib/python3.10/site-packages/tap_mssql/__init__.py", line 763, in main_impl do_discover(mssql_conn, args.config) File "/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/lib/python3.10/site-packages/tap_mssql/__init__.py", line 314, in do_discover discover_catalog(mssql_conn, config).dump() File "/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/lib/python3.10/site-packages/tap_mssql/__init__.py", line 232, in discover_catalog cur.execute( File "src/pymssql/_pymssql.pyx", line 479, in pymssql._pymssql.Cursor.execute pymssql._pymssql.OperationalError: (20047, b'DB-Lib error message 20047, severity 9:\nDBPROCESS is dead or not enabled\n') 2023-09-05T054655.772200Z [error ] Block run completed. block_type=ExtractLoadBlocks err=RunnerError('Cannot start plugin tap-mssql: Catalog discovery failed: command [\'/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/bin/tap-mssql\', \'--config\', \'/home/david/BC-to-Snowflake/.meltano/run/tap-mssql/tap.e19ab3c5-af2f-4e03-9d8b-9302aa60c3dc.config.json\', \'--discover\'] returned 1 with stderr:\n INFO Server Parameters: version: Microsoft SQL Server 2019 (RTM-CU18-GDR) (KB5021124) - 15.0.4280.7 (X64) \n\tJan 23 2023 123713 \n\tCopyright (C) 2019 Microsoft Corporation\n\tStandard Edition (64-bit) on Windows Serverโ€ฆ
c
That's odd. AFAIK, Error 20047 would mean that the underlying TDS session has expired on the SQL Server, but your client is trying to re-use it. Not sure why that would happen during Meltano Catalog Discovery. Catalog Discovery should be quick and this situation should be unlikely to occur.
d
Oh! I am not sure what to do with that ๐Ÿ™‚ but interesting and intriguing.
c
Well, basically the SQL Server thinks that the client has disappeared and SQL Server kills the session. When the client wakes up again during the meltano catalog discovery, it just keeps on going as if nothing happened and is trying to use a session ID that has been purged by the SQL Server already.
d
Interesting!
c
At least that's my interpretation of the situation.
d
Should I send the logs somewhere for analysis?
c
I'm not sure if there would be any useful logs at this stage. My go-to for SQL Server troubleshooting in pymssql is usually the FreeTDS tace log file, which can be pretty helpful. I think it should be possible to generate such FreeTDS trace log by simply invoking the meltano run with an extra environment variable added in
TDSDUMP=/tmp/freetds.log meltano ...
In general, you're already on the right path by reducing the scope for the extractor to just one table as a troubleshooting step.
One meltano "quirk" to note when doing this kind of "scoping" of which streams to include in the catalog: The meltano catalog data is cached by meltano after discovery. And you may want to force delete the cache file during your troubleshooting steps. The cache files will be in the
.meltano
directory in a subdirectory called
run/<nameofthetap>
(the name of the tap is probably going to be just
tap-mssql
) and the two files are
tap.properties.json
and
tap.properties.cache_key
. Deleting those 2 files will clear the cached catalog.
d
Thank you, @christoph, I appreciate the details and support. I will give this a try in the morning (GMT +10h here).
Also, I hear @visch is a Microsoft genius and may have insights ๐Ÿ˜„
v
no genius, just do a bunch of work in MS land ๐Ÿ˜„ I've had really bad luck with pymssql (https://github.com/pymssql/pymssql/issues/695) , to be fair I haven't had this issue specifically but I tend to recommend pyodbc Could give that a shot, not sure it'd fix your issue here though ๐Ÿคท
c
I couldn't disagree more. Pymssql and FreeTDS has been absolutetly tremenduous for me. I could not fathom using a piece of software like Microsoft ODBC drivers where I can't do any troubleshooting myself and have to rely on the vendor to tell me what the problem is. ๐Ÿคฎ Luckily, I have never even been in the situation where I actually managed to install the ODBC drivers! ๐Ÿ˜‚
I will give this a try in the morning (GMT +10h here)
I used to work with a David Williams once. I tried decyphering your profile picture, but I can't quite tell if it's you or not. ๐Ÿ˜‚
v
@christoph I'd be happy to use pymssql and freetds, I've just hit random issues like the one I linked to and there's no fix ๐Ÿ˜•
c
Interesting. Wasn't aware of that issue. I'll have a look through and see what I can find. Thanks for linking it here.
d
@christoph are you in Australia? My LinkedIn is https://linkedin.com/in/dataexcellence - would be funny if we ever did.
Ok - so switch out pymsql for pyodbc. Hmm, presumably that means a different extractor than tap-mssql? I will Google what I need to do here. Thanks for the pointer.
I really do like the elegance of Meltano so if I can get this to work it will be awesome.
v
c
Different David Williams ๐Ÿ˜ Yup. I'm Sydney based.
v
@david_m_williams sorry I kinda hijacked this thread We want to get you up and running, try the stuff Chris recommended and get the info back I don't care what you use as long as it works! Sending the logs here would give you a better shot of someone pointing you in the right direction!
d
Ok ... I tried removing the cache and setting the TDSDUMP env variable. Unfortunately, no change, and alas, no new log created. I then found the tap-mssql variant I was using - the default one - hasn't been updated for some time and some people (e.g., a GitLab post) suggest it is abandonware. I changed to 'buzzcutnorman'. Now my meltano job is not crashing ... but at the same time I'm not sure what it's doing. It's been running for almost two hours, but without any interesting output at all. I will leave it overnight and see what happens ....
message has been deleted
v
When you're debugging I'd start with just making sure one side works so
meltano invoke tap-mssql > out
no output is a bit concerning as you'd expect snowflake to say something as it's writing the data odd that it'd just hang
d
Ok, it eventually failed ...
Copy code
2023-09-06T06:23:38.986448Z [info     ] Environment 'dev' is active
2023-09-06T06:23:41.764464Z [warning  ] No state was found, complete import.

2023-09-06T20:38:56.095445Z [error    ] Cannot start plugin tap-mssql: Catalog discovery failed: command ['/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/bin/tap-mssql', '--config', '/home/david/BC-to-Snowflake/.meltano/run/tap-mssql/tap.655beb03-af65-471d-a83c-8d5360dbb793.config.json', '--discover'] returned 1 with stderr:
 2023-09-06 22:36:47,835 | ERROR    | sqlalchemy.pool.impl.QueuePool | Exception during reset or similar
Traceback (most recent call last):
  File "/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 763, in _finalize_fairy
    fairy._reset(pool, transaction_was_reset)
  File "/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1038, in _reset
    pool._dialect.do_rollback(self)
  File "/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/lib/python3.10/site-packages/sqlalchemy/dialects/mssql/base.py", line 2845, in do_rollback
    super(MSDialect, self).do_rollback(dbapi_connection)
  File "/home/david/BC-to-Snowflake/.meltano/extractors/tap-mssql/venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 683, in do_rollback
    dbapi_connection.rollback()
  File "src/pymssql/_pymssql.pyx", line 316, in pymssql._pymssql.Connection.rollback
  File "src/pymssql/_pymssql.pyx", line 300, in pymssql._pymssql.Connection.rollback
  File "src/pymssql/_mssql.pyx", line 1056, in pymssql._mssql.MSSQLConnection.execute_non_query
  File "src/pymssql/_mssql.pyx", line 1077, in pymssql._mssql.MSSQLConnection.execute_non_query
  File "src/pymssql/_mssql.pyx", line 1251, in pymssql._mssql.MSSQLConnection.format_and_run_query
  File "src/pymssql/_mssql.pyx", line 1789, in pymssql._mssql.check_cancel_and_raise
  File "src/pymssql/_mssql.pyx", line 1835, in pymssql._mssql.raise_MSSQLDatabaseException
pymssql._mssql.MSSQLDatabaseException: (20047, b'DB-Lib error message 20047, severity 9:\nDBPROCESS is dead or not enabled\n')
Traceback (most recent call last):
  File "src/pymssql/_pymssql.pyx", line 461, in pymssql._pymssql.Cursor.execute
  File "src/pymssql/_mssql.pyx", line 1087, in pymssql._mssql.MSSQLConnection.execute_query
  File "src/pymssql/_mssql.pyx", line 1118, in pymssql._mssql.MSSQLConnection.execute_query
  File "src/pymssql/_mssql.pyx", line 1251, in pymssql._mssql.MSSQLConnection.format_and_run_query
  File "src/pymssql/_mssql.pyx", line 1789, in pymssql._mssql.check_cancel_and_raise
  File "src/pymssql/_mssql.pyx", line 1835, in pymssql._mssql.raise_MSSQLDatabaseException
pymssql._mssql.MSSQLDatabaseException: (20047, b'DB-Lib error message 20047, severity 9:\nDBPROCESS is dead or not enabled\n')
c
Sounds like the same problem? Getting stuck during discovery?
b
No promises this will work but some have had luck by enabling
pool_pre_ping
. You should be able to enable it for tap-mssql buzzcutnorman variant with this command:
Copy code
meltano config tap-mssql set sqlalchemy_eng_params.pool_pre_ping "True"
s
This problem is most likely a timeout / network issue rather than anything to do with Meltano or any of the TAPs. I would be speaking to the DBA responsible for the database to understand what timeouts are defined or whether there is any network policies which will be terminating the connection. This page discusses remote query timeouts for MS SQL Server. https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-[โ€ฆ]y-timeout-server-configuration-option?view=sql-server-ver16