# troubleshooting
s
Hey guys! I have implemented a Singer tap and target for my use case. The problem I'm facing now is that I need to run this combination for 100 clients, each with its own config file. I have that config data in an Airtable base (or, alternatively, in a Google Sheet). Is there a sensible way to use Meltano to iterate through these 100 pipelines and their configurations? I don't want to manually create and maintain 100 configurations.
a
Hey Steven. There are at least a few different options:
1. Write yourself a little script that loops through all the clients, sets the TAP_xxx_setting environment variables, and then invokes meltano.
2. Option 1 could be extended with a number of deployment options on your cloud infra.
3. Generate Meltano environment files and again call meltano, but this time Meltano sets up the environment.
4. Our API, which manages workspaces with one Meltano project per workspace, might be good for your use case? Integrated into your app, a client could set up their own data connection to you. Either create a workspace for each client or a pipeline for each client: https://www.matatika.com/docs/api/resources/workspaces https://www.matatika.com/docs/api/resources/pipelines
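Option 1 could look roughly like the sketch below. It assumes the sheet has been exported to a CSV file with a header row and the hypothetical columns `client_name,client_id,password` (no commas inside values), and it reuses the placeholder names `tap-abc`, `target-abc` and `TAP_ABC_*`; substitute whatever settings your tap actually exposes. `MELTANO_BIN` is a hypothetical override so the loop can be dry-run without Meltano.

```shell
# Sketch of option 1: run one Meltano pipeline per client listed in a CSV export.
# Assumptions (hypothetical): header row, columns client_name,client_id,password,
# no commas inside values; tap-abc / target-abc / TAP_ABC_* are placeholders.
run_all_clients() {
  local csv_file="$1"
  local meltano_bin="${MELTANO_BIN:-meltano}"  # e.g. MELTANO_BIN=echo for a dry run
  local failures=0 client_name client_id password

  # Read every data row (tail skips the header, tr drops CRLF line endings)
  while IFS=, read -r client_name client_id password; do
    [ -n "$client_name" ] || continue  # ignore blank rows
    # Subshell: one client's settings can never leak into the next iteration
    (
      export TAP_ABC_CLIENT_ID="$client_id"
      export TAP_ABC_PASSWORD="$password"
      # </dev/null stops meltano from swallowing the remaining CSV rows on stdin
      "$meltano_bin" elt tap-abc target-abc \
        --job_id="tap-abc2target-abc-$client_name" </dev/null
    ) || failures=$((failures + 1))
  done <<EOF
$(tail -n +2 "$csv_file" | tr -d '\r')
EOF

  # Keep going past individual failures, but report them at the end
  if [ "$failures" -gt 0 ]; then
    echo "$failures client run(s) failed" >&2
    return 1
  fi
}
```

Usage would be something like `run_all_clients clients.csv` after exporting the sheet, called from cron or any orchestrator. One failing client doesn't stop the others, but the function still returns non-zero so the scheduler notices.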
s
Thanks for your answer, @aaron_phethean. This project is internal only, so clients never touch it. The problem is that I can't see how to manage the complexity of keeping the configs in sync with the Google Sheet: entries get deleted, added, and updated.
a
So someone else will manage the Google Sheet? In any case, there are tons of ways to do this. Just shout if you need some more ideas. Cheers.
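One way to sidestep the sync problem is to never hand-maintain per-client config at all: regenerate it from a fresh export of the sheet on every run, so additions, updates and deletions all propagate automatically. A minimal sketch, assuming the same hypothetical `client_name,client_id,password` CSV export and a hypothetical output directory with one env file per client. These files are a convention of this sketch only; Meltano won't load them by itself.

```shell
# Sketch: rebuild one env file per client from a fresh CSV export of the sheet.
# Assumptions (hypothetical): header row, columns client_name,client_id,password,
# client names safe to use as file names, no commas or single quotes in values.
sync_client_env_files() {
  local csv_file="$1" out_dir="$2" tmp_dir client_name client_id password
  [ -n "$out_dir" ] || return 1  # never rm -rf an empty path
  tmp_dir=$(mktemp -d) || return 1

  while IFS=, read -r client_name client_id password; do
    [ -n "$client_name" ] || continue  # ignore blank rows
    # Values are single-quoted so the file can be sourced safely later
    {
      printf "TAP_ABC_CLIENT_ID='%s'\n" "$client_id"
      printf "TAP_ABC_PASSWORD='%s'\n" "$password"
    } > "$tmp_dir/$client_name.env"
  done <<EOF
$(tail -n +2 "$csv_file" | tr -d '\r')
EOF

  # Swap the new set in wholesale: added rows get a file, updated rows are
  # rewritten, and rows deleted from the sheet simply have no file any more.
  rm -rf "$out_dir" && mv "$tmp_dir" "$out_dir"
}
```

A runner could then loop over `clients/*.env` and, in a subshell per client, load one file with `set -a; . "$env_file"; set +a` before calling `meltano elt` as usual.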
s
Yes, the Google Sheet is managed internally. I've done this in the past with a library called Singer Runner, but I'm trying to understand what advantages Meltano would give me and how I could do this, possibly with Airflow or Dagster as shown in this post. 🤔
a
Yeah, that's a cool setup. Not for the faint-hearted: plenty of things to set up! My thoughts on Meltano vs. straight Singer:
• Environment management. In your case and ours there's no big advantage here, as we still have to manage the tenant-, client- and pipeline-specific properties on top of meltano.yml. (There are lots of layering options in meltano.yml, but you still have to manage those, e.g. transforming the Google Sheet into many meltano.yml files.)
• Python venv management. Being explicit about the tap version is good, and isolating each tap from the dependencies of other taps is a big plus.
• Job ID / incremental state. Meltano can take care of storing this state in a central database, with a separate ID for each client.
• Local environment. Perhaps an underrated benefit. Things go wrong, and when they do it's really helpful to be able to run the project locally, as is, in an environment identical to production: https://www.matatika.com/docs/getting-started/running-your-data-import-locally
• Managing and running other plugins. We like the idea of putting more plugins into Meltano and managing them as 'data project dependencies', e.g. reports and alerts.
• Metadata discovery and selection. We're not doing this yet ourselves, but can certainly see how useful it would be.
• I'm sure there's more.
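On the job ID point: Meltano looks up incremental state by job ID, so each client needs an ID that is unique and stays the same between runs. If the ID is built from a display name in the sheet, renaming a client would, as far as I understand the state lookup, quietly start that client from empty state. A minimal sketch of a hypothetical helper that turns a stable client key (ideally an immutable ID column rather than a name) into a safe job ID, again using the placeholder `tap-abc2target-abc-` prefix:

```shell
# Sketch: derive a stable, safe job ID from a client key (hypothetical helper).
# Lowercases the key and collapses every run of characters outside [a-z0-9]
# into a single hyphen, trimming hyphens at either end.
job_id_for_client() {
  local key="$1" slug
  slug=$(printf '%s' "$key" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed -e 's/^-//' -e 's/-$//')
  # Refuse keys that contain nothing usable, rather than sharing an empty ID
  if [ -z "$slug" ]; then
    echo "job_id_for_client: empty client key" >&2
    return 1
  fi
  printf 'tap-abc2target-abc-%s\n' "$slug"
}
```

In a runner this would replace the hand-built ID, e.g. `--job_id="$(job_id_for_client "$client_id")"`. Non-ASCII characters are simply turned into hyphens, which is crude but keeps the ID predictable.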
v
The only issue with that blog post is that solids are gone now, I guess, with the latest release of Dagster, so some of the code examples are already dated. An easy approach to doing what you want (solutions depend heavily on your use case, but I'll give it a go) would be to first figure out how to run your Meltano setup for one client. We'll say Dagster, and I'll hand-wave a bit, i.e. runner.sh:
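```shell
#!/bin/bash
# Placeholder values for a single client; tap-abc / target-abc are example plugin names
export TAP_ABC_PASSWORD=abc
export TAP_ABC_CLIENT_ID=id_for_client1
export CLIENT_NAME=client_name

# A separate job ID per client keeps each client's incremental state apart
meltano elt tap-abc target-abc --job_id="tap-abc2target-abc-$CLIENT_NAME"
```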
Cool, you now have a runner that works for any client, assuming you pass in the right data. Next:
1. Have Dagster fetch your list of clients from whatever dataset you want (Google Sheet / Airtable).
2. Run that script, injecting your variables as needed.
Simple and quick, and it lets you iterate toward your solution 😄
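Step 2 works best if the runner takes its values from outside instead of hard-coding them. A minimal sketch of such a variant, written as a function so it can be tried without Meltano installed: the orchestrator (Dagster, Airflow, cron) injects the same `TAP_ABC_*` and `CLIENT_NAME` variables, the runner refuses to start if any are missing, and `MELTANO_BIN` is a hypothetical override for dry runs.

```shell
# Hypothetical variant of runner.sh: values come from the environment instead of
# being hard-coded, so any orchestrator can run it for any client.
run_client() {
  # ${VAR:?message} aborts the (sub)shell with an error if VAR is unset or empty,
  # so a half-configured client never starts a run.
  : "${TAP_ABC_PASSWORD:?TAP_ABC_PASSWORD must be set}"
  : "${TAP_ABC_CLIENT_ID:?TAP_ABC_CLIENT_ID must be set}"
  : "${CLIENT_NAME:?CLIENT_NAME must be set}"
  # MELTANO_BIN is an optional override for dry runs (defaults to meltano)
  "${MELTANO_BIN:-meltano}" elt tap-abc target-abc \
    --job_id="tap-abc2target-abc-$CLIENT_NAME"
}
```

With the body of `run_client` saved as runner.sh, each client becomes one call such as `CLIENT_NAME=acme TAP_ABC_CLIENT_ID=id1 TAP_ABC_PASSWORD=secret sh runner.sh`, with the values pulled from the sheet by the orchestrator rather than typed in.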
s
Thanks a lot, guys! Do you have any experience doing this with Airflow and a custom DAG generator to process my list of clients and throw these DAGs at Meltano?
a
I would be tempted to start with Meltano's DAG generator, https://gitlab.com/meltano/files-airflow/-/blob/master/bundle/orchestrate/dags/meltano.py (this is the code that turns Meltano schedules into Airflow DAGs: https://meltano.com/docs/orchestration.html#installing-airflow).