Hello everyone fire I m trying to keep a MSSQL database as a Meltano #troubleshooting

Hello everyone! :fire: I’m trying to keep a MSSQL...

anna

04/27/2023, 9:28 AM

Hello everyone! 🔥 I’m trying to keep a MSSQL database as a “mirror” of a external data source. For various reasons the whole data must be downloaded each time, and no unique keys exists to distinguish each row. I’ve written my own custom extractor, and then use the standard Microsoft MSSQL loader. My extractor uses the replication_method=“FULL_TABLE”, which I initially thought would replace the data, but it is actually appending. (I am also using a state in the extractor, could that affect anything? The state is not relevant to this question otherwise.) Is there a neat way of doing this using some config or setting? Otherwise my best idea is to, in the beginning of the extractor, connect to my MSSQL db and execute some drop table statements… Please advice 🙏 Thank you! TL;DR: I need help with replacing all data in my target db to “mirror” the content of an external data source. How do I do this neatly in my extractor (tap)?

user

04/27/2023, 10:06 AM

Hey @anna, there's two options for you here: 1. @visch e.g. uses dbt to clean out all tables and then does the load (inside dbt you can easily use macros to truncate). 2. You could also just fork the mssql loader and run the truncation first. Both options will work just fine. Btw.: I would usually recommend implementing the truncation slightly different: 1. Load all your data into "target_table_tmp" 2. Once this finishes, use a hook, or a renaming/swapping to swap old and new table 3. truncate the old table. That way you protect against failed loads causing downtime for whatever is behind your data. It's also the reason a lot of targets probably are hesitant to implement a truncation policy.

Henning Holgersen

04/27/2023, 10:52 AM

My two cents: if your extractor has a key specified, target-mssql will do an upsert instead of just appending - even with full table replication. This might work well, but it still won’t delete any data. For that, you need to truncate the target table first - and Sven is pretty spot on. Personally I use meltano with Prefect, so I have a separate truncate table statement running first for the tables I need to make sure are deleted. And by the way, I am the maintainer of the target-mssql. I have not planned to add a truncate table option, but PRs are always welcome and I’m happy to help.

phornkrit_hirunyanitiwatana

04/27/2023, 11:28 AM

@Henning Holgersen You use Prefect to Truncate Table and Load Data with Meltano am I right?

Henning Holgersen

04/27/2023, 11:29 AM

Correct. But all inside the same prefect flow.

phornkrit_hirunyanitiwatana

04/27/2023, 11:31 AM

https://prefecthq.github.io/prefect-sqlalchemy/

phornkrit_hirunyanitiwatana

04/27/2023, 11:31 AM

With prefect-sqlalchemy ? My Project use Prefect and Meltano. I will consider this way to solve my problems too. Thanks @Henning Holgersen

user

04/27/2023, 11:32 AM

Things are getting interesting 😄 That sounds cool and really helpful. Maybe someone (not me) should write a blog post about that!

Henning Holgersen

04/27/2023, 11:37 AM

I use just the normal sqlalchemy, but thanks for the prefect-sqlalchemy tip - I will check it out, maybe it simplifies something.

anna

04/27/2023, 11:44 AM

Thank you for the help guys! 🙌 @Henning Holgersen I do not use a key and it’s unfortunately no data field available that could be a suitable key, so I don’t think this is applicable… @Sven Balnojan I really like your suggestion with temporary data table and then swapping. However, while this is “safer” in terms of consistent data access, would this affect the computing/read/write times? I’m using only 10% of the full data now and I’m already experience quite slow pipelines…

user

04/27/2023, 11:52 AM

@anna no it won't. 99% of your data pipeline time is in reading, writing, and computing on top of data. Swapping tables/renaming/hooking are metadata operations and thus happen almost instantaneously (if implemented correctly, and that depends very much on your tech stack, but I haven't seen a single one where this isn't doable).

anna

04/27/2023, 12:01 PM

Great! I’ll try this out then! Thanks again for your quick support and expert advice 🔥

jacob_matson

04/27/2023, 11:48 PM

can you hash rows? I have done that from time to time with MSSQL in this scenario (not using meltano though)

jacob_matson

04/27/2023, 11:49 PM

something like MERGE on row hash + then flag Is_deleted = 1 for missing rows

anna

04/28/2023, 6:01 AM

@jacob_matson I have not looked into this! This might be a nice way to get incremental updates instead of updating everything. Thanks for the idea!

anna

04/28/2023, 8:06 PM

I’m still struggling with this 😞 I tried to go with the suggestion by @Sven Balnojan to load data into tmp table, swap names, then truncate the old data, using post-hooks. However, unlike target-postgres, target-mssql does not implement any

after_run_sql

configurations, so then I have to implement the hooks myself using a transformer, right? I tried installing dbt-sqlserver, but I cannot for my life get it to work on my MS SQL db instance. I’ve tried installing it through meltano but since it’s not one of the official transformers it is very unclear to me how I’m supposed to install it, and it keeps giving me very little error messages. Here is part of my meltano.yml file with the transformer.

Copy code

transformers:
  - name: dbt-sqlserver
    namespace: dbt-sqlserver
    executable: dbt
    pip_url: dbt-sqlserver==1.3.1
    commands:
      run:
        args: run --models +staging.*
    config:
      server: localhost
      port: '1433'
      database: master
      schema: dbo
      user: user
      password: password
      driver: ODBC Driver 18 for SQL Server
      #project_dir: /Users/anna/git/my-tap/transform # Does not work, must set with flag? Relative path does not work?
      #profiles_dir: /Users/anna/git/my-tap/transform/profile # Does not work, must set with flag?

I should have the necessary drivers for the MS SQL db. I’ve tried to install this through

meltano install

but when that didn’t work I also tried to install through pip in a local venv. This didn’t seem right so I backtracked? However now I’m stuck at this error:

Copy code

(venv) anna@Annas-Mac ~/git/my-tap (main) % meltano invoke dbt-sqlserver:run +staging --profiles-dir /Users/anna/git/my-tap/transform/profile --project-dir /Users/anna/git/my-tap/transform
Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to
join our friendly Slack community.

__init__() got an unexpected keyword argument 'request_timeout'

So likely I’ve not installed this correctly…. Am I even right in trying to install this dbt-sqlserver? I feel like I’m digging deeper and deeper and not sure if I’m even supposed to be digging here. 😟 😖 Thanks for any help! I really appreciate it 🙌

Henning Holgersen

04/28/2023, 8:39 PM

I’m not very well-versed in transformers and utilities, but looking at the dbt utility, I suspect it is possible to get it to work with mssql just by configuring it correctly and including the required libraries. https://github.com/meltano/dbt-ext - it really isn’t one repo for each DB type, but rather one repo with many configs. Alternatively I might try to run the table swap as a simple command line thing (python script maybe) directly on the command line outside of meltano. Especially since table swaps aren’t a thing in dbt.

jacob_matson

04/28/2023, 9:53 PM

jacob_matson

04/28/2023, 9:54 PM

Anna I am happy to jump on a call to troubleshoot. the dbt-ext is the way to go but dbt-sqlserver can be a bit squirrely!

jacob_matson

04/28/2023, 9:54 PM

if you steal the config from my mdsinabox.com project (https://github.com/matsonj/nba-monte-carlo), that should be a starting point for dbt-ext.

jacob_matson

04/28/2023, 10:00 PM

i haven't actually used dbt-sqlserver with meltano but i have used dbt-duckdb and I use dbt-sqlserver every day

Open in Slack

Previous Next