@ripe-musician-59933, as mentioned in the other thread about the data type issue, I am working on automating my ETL pipeline as much as possible through Meltano.
The project I am on right now uses sensitive client data. I need to fetch it through different tap configurations and store it separately through different target configurations, so that the data never gets mixed up and I can manage fine-grained access rights.
Hence, I need to run about 80 to 100 different tap-target configurations. To make it even more complex, they sometimes change: new ones get added and others get deleted. I used the search box and looked into my options by reading some epics on GitLab and the documentation.
As far as I can tell, plugin inheritance is the approach you settled on. However, maintaining all of these configurations myself does not seem appropriate. I want the project owner or one of his employees to be able to maintain the settings without technical background knowledge, and I want to prevent them from breaking the settings.
To keep it simple, I set up a Google Sheet with these 100 rows to store and easily maintain all the information that changes, while keeping the settings shared across clients in a single config file. I like the concept of inheritance, but I am still looking for a way to let the client maintain config settings themselves without logging into the cloud VM and working on the CLI.
I can read the Google Sheet programmatically through its API, iterate over the rows, dynamically generate a configuration dictionary for each row, and call the matching tap-target combination with the singer-runner. This way I don't need to schedule every single tap-target combination individually; I only schedule the orchestration script itself. I also want to run the 100 combinations one after another: they are not time-critical, they only run once a week, and I want to keep everything deployed on a low-cost VM. Running them back to back meets these goals while keeping the overall runtime as short as possible. The runs vary in duration, so scheduling each one separately with a general buffer time between runs would increase the overall runtime.
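To make the idea concrete, here is a minimal sketch of the orchestration loop I have in mind. It is not my actual script: `build_config`, `run_all`, the `run_pair` callable, and the `client` column are hypothetical names I made up for illustration, and the rows are assumed to have already been read from the sheet as plain dictionaries. `run_pair` stands in for whatever actually launches one tap-target combination.

```python
from typing import Any, Callable, Dict, Iterable, List

Config = Dict[str, Any]


def build_config(shared: Config, row: Config) -> Config:
    """Merge the shared settings with one sheet row.

    Values from the row win over shared values. Empty cells are ignored,
    so a blank cell in the sheet falls back to the shared default.
    """
    overrides = {key: value for key, value in row.items() if value not in ("", None)}
    return {**shared, **overrides}


def run_all(
    rows: Iterable[Config],
    shared: Config,
    run_pair: Callable[[Config], None],
) -> List[str]:
    """Run every tap-target combination one after another.

    `run_pair` is a placeholder for the code that starts a single run.
    A failing run is recorded and the loop moves on to the next row.
    Returns the `client` values of the rows that failed.
    """
    failed: List[str] = []
    for row in rows:
        config = build_config(shared, row)
        try:
            run_pair(config)
        except Exception as exc:  # keep going so one client cannot block the rest
            print(f"run failed for {config.get('client', '<unknown>')}: {exc}")
            failed.append(str(config.get("client", "<unknown>")))
    return failed
```

Because each run starts as soon as the previous one finishes, no buffer time between runs is needed, and only this one script has to be scheduled.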
Do you have a suggestion on how to solve this with Meltano?