# getting-started
g
Is there any way to run arbitrary code at the end of a Meltano job, to do things like report on job stats and check whether the job was successful and then record the outcome? In this case I would want to be able to access a stream class's properties before they get cleared so I can record them elsewhere.
v
`meltano run tap-name target-name arbitrary-code`
g
Apologies for the dumb question, but what else would I need to do to make this work? Are there any meltano.yml properties I would need to define? Any methods to be overridden in Tap or Stream classes?
v
What have you tried? 😄
g
Sorry, I got sidetracked today so haven't tried anything yet. I just have a feeling it's not gonna be straightforward. I'm eventually going to try adding a plugin to meltano.yml that contains the code (based on this). I'm finding the Meltano docs pretty sparse overall for cases outside the basic source-to-target use case, so I find it challenging to know where to even start on things like this.
v
https://docs.meltano.com/reference/command-line-interface#add has the details; look at the custom plugin option. "Arbitrary" can mean any executable, or a Python project. It can be easy or complicated, it really all depends on what you're after 🤷 It's much easier to help after you've tried something and it didn't work, then you can share what didn't work for you
Also note that this works as well
```bash
#!/bin/bash
meltano run tap-name target-name
echo "Arbitrary command here is doing an echo!"
```
g
One complication here is that I want to be able to access the properties of the stream (e.g., a final count of records fetched from an API, stored in a property of my custom stream class like `MyStream.total_records_fetched`), so the execution environment needs to have access to that property; it can't be a totally independent context
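One way to make that count survive the run is to keep it on the class rather than an instance. Here's a minimal sketch of the pattern; the class names are hypothetical, and with the Meltano SDK the same wrapper would go around `Stream.get_records()`:

```python
# Hedged sketch: count records as the stream yields them, so post-run
# code can read the total from the class. BaseStream stands in for the
# SDK's Stream class; the three-record generator stands in for API calls.
class BaseStream:
    def get_records(self, context=None):
        yield from ({"id": i} for i in range(3))

class CountingStream(BaseStream):
    total_records_fetched = 0  # class-level, so it outlives any instance

    def get_records(self, context=None):
        for record in super().get_records(context):
            type(self).total_records_fetched += 1
            yield record

# Drain the stream, then read the count off the class itself.
list(CountingStream().get_records())
print(CountingStream.total_records_fetched)  # 3
```

Because the counter lives on the class, anything running in the same Python process can read it after the sync, but a separate shell script still can't, which is why the metric-logging approach below the thread may be a better fit.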
p
@garret_cree related to accessing the metrics of your run, check out https://sdk.meltano.com/en/latest/implementation/logging.html#custom-logging-configuration. Singer has a concept of metric messages where this type of metadata is stored. You can set up a logger to store them somewhere, then your post-process script can read them and do whatever you want with them
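As a rough sketch of what that could look like, here is a dictConfig-style logging YAML that routes metric messages to their own file. The logger name `singer_sdk.metrics` and the file path are assumptions; verify both against the linked SDK docs before using this:

```yaml
# Hedged sketch: send Singer metric messages to a dedicated file that a
# post-run script can read. Logger name and filename are assumptions.
version: 1
disable_existing_loggers: false
formatters:
  metrics:
    format: "{asctime} {message}"
    style: "{"
handlers:
  metrics_file:
    class: logging.FileHandler
    filename: metrics.jsonl
    formatter: metrics
loggers:
  singer_sdk.metrics:
    level: INFO
    handlers: [metrics_file]
    propagate: false
```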
Regarding the post-process script, using utility plugins is probably the most common approach. You could probably figure out a way to have Meltano run a simple .py script, but I've found it pretty easy to use the EDK to build a quick CLI as a standardized Meltano plugin, e.g. https://github.com/meltano/squared/tree/main/data/utilities/snowflake-cloner
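For the simple-script route, a custom utility plugin can be declared directly in `meltano.yml`. A minimal sketch, where the plugin name, namespace, and script path are all hypothetical:

```yaml
# meltano.yml fragment: a custom utility wrapping a post-run script.
# Name, namespace, and executable path are placeholders for illustration.
plugins:
  utilities:
    - name: post-run-checks
      namespace: post_run_checks
      executable: ./scripts/post_run_checks.sh
```

It would then run at the end of the pipeline with `meltano run tap-name target-name post-run-checks`.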
Also, can you share more details on your use case for this? I think this could be a generally useful feature for all users, so eventually getting it baked into Meltano itself would be awesome. Today there's no real support for post-run stats or anything like that, but it's come up a bunch of times.
g
Thanks for the extra info @pat_nadolny! In my case we want to do two things:
1. Compare the number of records written to the DB with a number that comes from the API or a metric gathered during the run (e.g., a count of order records processed compared with the number of rows written to the orders table for this job run), and log a warning if the numbers are substantially different. This could arise because of integrity constraint violations in the loading step, etc.
2. Set a `synced` flag to True in the DB after the job has finished. This one doesn't depend on any Stream class properties, but we would do it in the same post-run code anyway
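For the first check, a post-run script could parse `record_count` metric lines out of the captured log and compare them against row counts queried from the DB. A minimal sketch under the assumption that metric lines look like the classic Singer `METRIC: {...}` format (the exact log shape depends on your logging config, so verify against your own output):

```python
import json
import re

# Hedged sketch: extract per-stream record counts from Singer-style
# metric log lines. The "METRIC: {...}" line shape is an assumption
# based on the classic Singer convention.
METRIC_RE = re.compile(r"METRIC: ({.*})")

def parse_record_counts(log_text):
    counts = {}
    for line in log_text.splitlines():
        match = METRIC_RE.search(line)
        if not match:
            continue
        metric = json.loads(match.group(1))
        if metric.get("metric") == "record_count":
            stream = metric.get("tags", {}).get("stream", "?")
            counts[stream] = counts.get(stream, 0) + metric["value"]
    return counts

sample = 'INFO METRIC: {"metric": "record_count", "value": 120, "tags": {"stream": "orders"}}'
print(parse_record_counts(sample))  # {'orders': 120}
```

The post-run script would then compare each stream's count against a `SELECT COUNT(*)` (or similar) for that job run, log a warning when they diverge beyond some threshold, and flip the `synced` flag in the same pass.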