# getting-started
g
Is there any way to run arbitrary code at the end of a Meltano job, to do things like report on job stats and check whether the job was successful and then record the outcome? In this case I would want to be able to access a stream class's properties before they get cleared so I can record them elsewhere.
v
`meltano run tap-name target-name arbitrary-code`
g
Apologies for the dumb question, but what else would I need to do to make this work? Are there any meltano.yml properties I would need to define? Any methods to be overridden in Tap or Stream classes?
v
What have you tried? 😄
g
Sorry, I got sidetracked today so haven't tried anything yet. I just have a feeling it's not gonna be straightforward. I'm eventually going to try adding a plugin to meltano.yml that contains the code (based on this). I'm finding the Meltano docs pretty sparse overall for cases outside the basic source-to-target use case, so I find it challenging to know where to even start on things like this.
v
https://docs.meltano.com/reference/command-line-interface#add has the details; look at the custom plugin option. "Arbitrary" can mean any executable, or a Python project. It can be easy or complicated, it really all depends on what you're after 🤷 It's much easier to help after you've tried something and it didn't work, then you can share what didn't work for you
Also note that this works as well
```bash
#!/bin/bash
meltano run tap-name target-name
echo "Arbitrary command here is doing an echo!"
```
g
One complication here is that I want to be able to access the properties of the stream (e.g., a final count of records fetched from an API, stored in a property of my custom stream class like `MyStream.total_records_fetched`), so the execution environment needs to have access to that property; it can't be a totally independent context
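One way to make that count survive the run is to keep it on the class rather than an instance. Here's a minimal sketch of the pattern; the class names are hypothetical, and with the Meltano SDK the same wrapper would go around `Stream.get_records()`:

```python
# Hedged sketch: count records as the stream yields them, so post-run
# code can read the total from the class. BaseStream stands in for the
# SDK's Stream class; the three-record generator stands in for API calls.
class BaseStream:
    def get_records(self, context=None):
        yield from ({"id": i} for i in range(3))

class CountingStream(BaseStream):
    total_records_fetched = 0  # class-level, so it outlives any instance

    def get_records(self, context=None):
        for record in super().get_records(context):
            type(self).total_records_fetched += 1
            yield record

# Drain the stream, then read the count off the class itself.
list(CountingStream().get_records())
print(CountingStream.total_records_fetched)  # 3
```

Because the counter lives on the class, anything running in the same Python process can read it after the sync, but a separate shell script still can't, which is why the metric-logging approach below the thread may be a better fit.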
p
@garret_cree related to accessing the metrics of your run, check out https://sdk.meltano.com/en/latest/implementation/logging.html#custom-logging-configuration. Singer has a concept of metric messages where this type of metadata is stored. You can set up a logger to store them somewhere, then your post-process script can read them and do whatever you want with them
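As a rough sketch of what that could look like, here is a dictConfig-style logging YAML that routes metric messages to their own file. The logger name `singer_sdk.metrics` and the file path are assumptions; verify both against the linked SDK docs before using this:

```yaml
# Hedged sketch: send Singer metric messages to a dedicated file that a
# post-run script can read. Logger name and filename are assumptions.
version: 1
disable_existing_loggers: false
formatters:
  metrics:
    format: "{asctime} {message}"
    style: "{"
handlers:
  metrics_file:
    class: logging.FileHandler
    filename: metrics.jsonl
    formatter: metrics
loggers:
  singer_sdk.metrics:
    level: INFO
    handlers: [metrics_file]
    propagate: false
```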
Regarding the post-process script, using utility plugins is probably the most common approach. You could probably figure out a way to have Meltano run a simple .py script, but I've found it pretty easy to use the EDK to build a quick CLI as a standardized Meltano plugin, e.g. https://github.com/meltano/squared/tree/main/data/utilities/snowflake-cloner
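For the simple-script route, a custom utility plugin can be declared directly in `meltano.yml`. A minimal sketch, where the plugin name, namespace, and script path are all hypothetical:

```yaml
# meltano.yml fragment: a custom utility wrapping a post-run script.
# Name, namespace, and executable path are placeholders for illustration.
plugins:
  utilities:
    - name: post-run-checks
      namespace: post_run_checks
      executable: ./scripts/post_run_checks.sh
```

It would then run at the end of the pipeline with `meltano run tap-name target-name post-run-checks`.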
Also, can you share more details on your use case for this? I think this could be a generally useful feature for all users, so eventually getting it baked into Meltano itself would be awesome. Today there's no real support for post-run stats or anything like that, but it's come up a bunch of times.
g
Thanks for the extra info @pat_nadolny! In my case we want to do two things:
1. Compare the number of records written to the DB with a number that comes from the API or a metric gathered during the run (e.g., a count of order records processed compared with the number of rows written to the orders table for this job run), and log a warning if the numbers are substantially different. This could arise because of integrity constraint violations in the loading step, etc.
2. Set a `synced` flag to True in the DB after the job has finished. This one doesn't depend on any Stream class properties, but we would do it in the same post-run code anyway
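For the first check, a post-run script could parse `record_count` metric lines out of the captured log and compare them against row counts queried from the DB. A minimal sketch under the assumption that metric lines look like the classic Singer `METRIC: {...}` format (the exact log shape depends on your logging config, so verify against your own output):

```python
import json
import re

# Hedged sketch: extract per-stream record counts from Singer-style
# metric log lines. The "METRIC: {...}" line shape is an assumption
# based on the classic Singer convention.
METRIC_RE = re.compile(r"METRIC: ({.*})")

def parse_record_counts(log_text):
    counts = {}
    for line in log_text.splitlines():
        match = METRIC_RE.search(line)
        if not match:
            continue
        metric = json.loads(match.group(1))
        if metric.get("metric") == "record_count":
            stream = metric.get("tags", {}).get("stream", "?")
            counts[stream] = counts.get(stream, 0) + metric["value"]
    return counts

sample = 'INFO METRIC: {"metric": "record_count", "value": 120, "tags": {"stream": "orders"}}'
print(parse_record_counts(sample))  # {'orders': 120}
```

The post-run script would then compare each stream's count against a `SELECT COUNT(*)` (or similar) for that job run, log a warning when they diverge beyond some threshold, and flip the `synced` flag in the same pass.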