# getting-started
Hoping to get some feedback on an idea I had for using Meltano. I'd like to have a daily schedule to incrementally extract from tap-jdbc and load to target-s3-parquet. If that is successful then I want to run some R code that adds features and builds summary models from the S3 files and inserts some (much smaller) analytical tables into Postgres. Should I just develop that entire final transformation step in R and wrap it in a shell script, or is there a better way using the Meltano SDK? How do I configure an orchestrator job for that second part, with a run dependency on the first part, in the case where it is just a shell script? Thanks in advance.
The Meltano SDK is only for building Singer taps, targets, and mappers, so for your R transformation step a shell or Python wrapper script is more appropriate. You can then add it to your Meltano project as a Utility plugin (using the `executable` property instead of `pip_url`) and run it with `meltano run` after your EL steps: `meltano run tap-jdbc target-s3-parquet my-r-transformation`. That runs them serially. If you'd like more control, you can set up Airflow and define a DAG with one step that runs `meltano run tap target` and another that invokes the script.
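As a rough illustration of the wrapper approach, here is a minimal Python sketch that could serve as the Utility's `executable`. It assumes the R code lives in a hypothetical `transform.R` and that `Rscript` is on `PATH`; the names `run_step`, `main`, and `DEFAULT_CMD` are illustrative and not part of Meltano. What it shows is that the wrapper passes R's exit status straight through, so a failed R run surfaces as a failed step.

```python
"""Hypothetical wrapper for the R step, intended as a Utility plugin's executable.

It runs the R transformation and returns the R process's exit status, so a
failure in R is visible to `meltano run` or to an orchestrator task.
"""
import subprocess
import sys
from typing import Sequence

# Assumption: the R code lives in transform.R and Rscript is on PATH.
DEFAULT_CMD = ["Rscript", "transform.R"]


def run_step(cmd: Sequence[str], extra_args: Sequence[str] = ()) -> int:
    """Run cmd followed by extra_args and return the child's exit code."""
    full_cmd = [*cmd, *extra_args]
    try:
        completed = subprocess.run(full_cmd)
    except FileNotFoundError:
        # The interpreter (e.g. Rscript) is not installed or not on PATH;
        # 127 mirrors the shell's "command not found" status.
        print(f"command not found: {full_cmd[0]}", file=sys.stderr)
        return 127
    return completed.returncode


def main(argv: Sequence[str]) -> int:
    """Entry point: forward any command-line arguments to the R script."""
    return run_step(DEFAULT_CMD, argv)
```

If the file is invoked directly as the executable, it would also need a `#!/usr/bin/env python3` shebang, execute permission, and the usual `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))` guard at the end. A plain shell script running `set -e` and then `Rscript transform.R` gives the same exit-status behavior.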
Thank you! The Utility plugin docs are what I needed to find. Perhaps that documentation should be expanded to mention that a Utility can use a non-Python executable.