# troubleshooting
j
I am currently running tap-mssql (BuzzCutNorman) + target-snowflake (Meltano) for my ingestion, and Dagster for my orchestration. Got a theoretical question regarding parallelization of streams 🧵 I have split up my extractors into individual YMLs based on business logic, and I have Dagster run 4 sets of YMLs at a time (parallelization set to 4). I was wondering if there are settings in Meltano that tell the program to fan out each object under your `select` into its own "process", for lack of a better term
As an example, I broke out my tables into individual YMLs like so
Each YML looks something like this
```yaml
plugins:
  extractors:
  - name: tap-mssql-closures
    inherit_from: tap-mssql
    config:
      stream_maps:
        Admin-Closure:
          ClosureVersionId: __NULL__
    # Any table in the select list below without a metadata entry defaults to FULL_TABLE replication
    select:
    - Admin-Closure.*
    - Admin-ClosureAllocation.*
    - Admin-ClosureReasonCode.* # No incremental 
    - Admin-ClosureReasonCodeLocalization.* # No incremental 
    - Admin-ClosureSettings.* # No incremental 
    - Admin-ClosureVersion.*
    - Admin-ClosureZoneAllocation.*
    - Admin-ClosureZoneEntryAllocation.*
    metadata:
      Admin-Closure:
        replication-method: INCREMENTAL
        replication-key: ClosureId
      Admin-ClosureVersion:
        replication-method: INCREMENTAL
        replication-key: LastEditDate
      Admin-ClosureAllocation:
        replication-method: INCREMENTAL
        replication-key: LastEditDate
      Admin-ClosureZoneEntryAllocation:
        replication-method: INCREMENTAL
        replication-key: LastEditDate
      Admin-ClosureZoneAllocation:
        replication-method: INCREMENTAL
        replication-key: LastEditDate
```
So when Dagster runs, I've told it to pick up each extractor name and run it, so `tap-mssql-admin`, `tap-mssql-closures`, etc.
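For context, the Dagster side is roughly this shape (a simplified sketch, not my exact code: the extractor list, op names, and job config are stand-ins, and it just shells out to the Meltano CLI):
```python
import subprocess

from dagster import job, op

# One inherited extractor per YML; names here are illustrative.
EXTRACTORS = ["tap-mssql-admin", "tap-mssql-closures"]


def make_meltano_op(extractor: str):
    @op(name=f"run_{extractor.replace('-', '_')}")
    def _run() -> None:
        # Shell out to the Meltano CLI; Dagster's multiprocess executor
        # supplies the parallelism across ops.
        subprocess.run(["meltano", "run", extractor, "target-snowflake"], check=True)

    return _run


@job(
    config={
        # Cap the default multiprocess executor at 4 concurrent ops,
        # matching the "4 sets of YMLs at a time" setup.
        "execution": {"config": {"multiprocess": {"max_concurrent": 4}}}
    }
)
def meltano_ingestion():
    for extractor in EXTRACTORS:
        make_meltano_op(extractor)()
```
Each op is independent of the others, so the multiprocess executor runs up to 4 of them side by side.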
But sometimes some of the tables are MUCH bigger than the others, and even if they start right away, they take significantly longer than the other streams and would benefit from their own parallelization
I can break out those big tables into standalone streams, but then it's kind of annoying having tons of one-off YMLs for single tables
So I was wondering if there's any functionality or setting that tells Meltano to fan out the names under the `select` and do all of them in parallel (or up to a limit of how many you can do at once)
So that Dagster still runs my "4" extractors at a time, but each extractor then also fans out and runs with some kind of parallelization so that it's not stuck doing 1 table at a time
If that's not possible then I'll just stick with problematic tables having their own names/streams 😅
v
https://github.com/meltano/meltano/issues/2677 and there are some chats in the GitLab thread that go pretty far. To do this today I use the `select_filter` option in Meltano via env variables. I have a process that runs these, and then we schedule N number of jobs (300 in my case) that can all run independently of one another.
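Roughly, the shape of it is something like this (a simplified sketch, not my actual setup: the stream list, plugin name, and pool size are stand-ins, and it assumes Meltano's `<PLUGIN_NAME>__SELECT_FILTER` env var convention for plugin extras):
```python
import json
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Streams copied from the YML above; the list and pool size are illustrative.
STREAMS = [
    "Admin-Closure",
    "Admin-ClosureAllocation",
    "Admin-ClosureVersion",
]


def run_stream(stream: str) -> None:
    env = os.environ.copy()
    # Meltano reads plugin extras from env vars named <PLUGIN_NAME>__SELECT_FILTER
    # (a JSON array of stream names), which narrows this run to a single table.
    env["TAP_MSSQL_CLOSURES__SELECT_FILTER"] = json.dumps([stream])
    subprocess.run(
        ["meltano", "run", "tap-mssql-closures", "target-snowflake"],
        env=env,
        check=True,
    )


# Threads are fine here: each worker just waits on its own `meltano run` process.
with ThreadPoolExecutor(max_workers=4) as pool:
    for future in [pool.submit(run_stream, s) for s in STREAMS]:
        future.result()  # re-raise any failures
```
One caveat when running the same extractor concurrently: the runs share a state ID by default, so you'd want distinct state per stream (e.g. `meltano run --state-id-suffix`, if your Meltano version has it) to keep incremental bookmarks from clobbering each other.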
j
Melturbo
😂 ❤️
😂 1
Thanks Derek, I'll keep an eye on this
v
I don't know that anyone's working on it right now; I just know that people do it (including me), but it's really specific to our orchestrators and sometimes even the tap/target we're using
💯 1
j
Yeah, I am assuming it's not being looked at right now, and that's OK, I have workarounds.
➕ 1
But maybe one day 🎄 🎅
💯 1