# singer-taps
r
Hey all, can you write custom log formatters for Meltano like you can in other Python tools that use standard logging? I’ve got a fairly simple transformation that I’ve written a formatter for, but I can’t get it to be recognized in my logging.yaml using the normal Python import path rules. At the moment I’m simply getting this:
{"run_id": "354f78d4-161d-4a72-a545-7e1807965a37", "state_id": "2024-01-16T065607--tap-name", "stdio": "stderr", "cmd_type": "extractor", "name": "tap-identity", "event": "time=2024-01-15 22:56:13 name=singer level=INFO message=METRIC: {\"type\": \"counter\", \"metric\": \"record_count\", \"value\": 27, \"tags\": {}}", "level": "info", "timestamp": "2024-01-16T06:56:13.224071Z"}

what I want is the below, where METRIC is pulled out as an event_type and event is just a sub-object of the JSON with objects inside of it (can't be bothered to get rid of all the escaped string quotes):

{"run_id": "354f78d4-161d-4a72-a545-7e1807965a37", "state_id": "2024-01-16T065607--tap-name", "stdio": "stderr", "cmd_type": "extractor", "name": "tap-identity", "event": {"event_type": "METRIC", "time":"2024-01-15 22:56:13", "name": "singer", "level":"INFO" "message":{\"type\": \"counter\", \"metric\": \"record_count\", \"value\": 27, \"tags\": {}}}, "level": "info", "timestamp": "2024-01-16T06:56:13.224071Z"}
As I say, I have decently testable code for this, but I’ve been banging my head against getting it to work with Meltano. I’m probably doing something silly because it’s late here.
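Roughly, the kind of formatter being described might look like the sketch below. The class name is invented and it assumes the raw Singer text line (time=... name=... level=... message=METRIC: {...}) arrives as the record's message, which may not match the actual code:

```python
import json
import logging
import re

# Roughly the shape of formatter described above (class name is made up for
# illustration). It assumes the raw Singer text line, e.g.
#   time=2024-01-15 22:56:13 name=singer level=INFO message=METRIC: {"type": ...}
# is available as the record's message.
SINGER_LINE = re.compile(
    r"time=(?P<time>\S+ \S+) "
    r"name=(?P<name>\S+) "
    r"level=(?P<level>\S+) "
    r"message=(?P<event_type>\w+): (?P<payload>\{.*\})$"
)

class SingerEventFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        raw = record.getMessage()
        match = SINGER_LINE.search(raw)
        if not match:
            # Not a Singer METRIC-style line; leave it alone.
            return raw
        event = {
            "event_type": match["event_type"],
            "time": match["time"],
            "name": match["name"],
            "level": match["level"],
            "message": json.loads(match["payload"]),
        }
        return json.dumps({"event": event, "level": record.levelname.lower()})
```

In plain dictConfig you would wire a class like that in via the special "()" key on a formatter entry, which is the piece that, as it turns out further down the thread, Meltano’s logging.yaml doesn’t seem to accept.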
d
I solved this using env vars tied to the tap/target’s own logging system, for example:
SINGER_SDK_LOG_CONFIG
LOGGING_CONF_FILE
That means you’re able to define the log config path for the tap/target itself. Unfortunately, different tap/target sources use different logging systems. The easiest way I found is to standardise the text log output format for every tap/target via their respective log configs and then parse the produced logs somewhere downstream.
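To make that concrete: as far as I can tell, the SDK-style config that SINGER_SDK_LOG_CONFIG points at is ordinary Python dictConfig material written as YAML (other taps may use the older fileConfig .conf format instead, hence the "different logging systems" caveat). Expressed as a Python dict, with illustrative handler/formatter names, it boils down to something like:

```python
import logging.config

# Illustrative only: the rough shape of a standard Python dictConfig, which is
# (to my understanding) what an SDK-style logging config boils down to once
# loaded. Handler and formatter names here are made up.
LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {
            "format": "time=%(asctime)s name=%(name)s level=%(levelname)s message=%(message)s",
        },
    },
    "handlers": {
        "stderr": {
            "class": "logging.StreamHandler",
            "formatter": "plain",
            "stream": "ext://sys.stderr",
        },
    },
    "root": {"level": "INFO", "handlers": ["stderr"]},
}

logging.config.dictConfig(LOG_CONFIG)
logging.getLogger("singer").info("hello from the tap")
```

The YAML file you point the env var at should just be the YAML rendering of that same structure, as far as I understand.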
r
Thanks, I’ll check them out. I was trying to parse my logs in Datadog, but I can’t seem to get a parser to work with JSON that has structured non-JSON data inside of it (this is quite easy with both Humio and Elastic), and I was trying to avoid putting Logstash/Fluentd/some sort of collector in between the two.
Not the hugest chore, just more complex than if I can work it out in the tools I already have.
d
In the end it’s just the tap/target’s log messages encapsulated in Meltano’s log messages, each with their own log format. Also, the METRIC message has its own format standard. One of the alternative options I’ve considered was to write the target’s logs into a separate stream (using the logging config) without Meltano’s wrapper log message. Also, I see @Edgar Ramírez (Arch.dev) was building a Singer metric processing project, maybe he has some better ideas.
r
Having trouble getting Meltano to see my
SINGER_SDK_LOG_CONFIG
when I run meltano locally like so:
SINGER_SDK_LOG_CONFIG=singer_sdk_logging.yaml meltano el tap-blah target-blah
Is that how you are doing it? It doesn’t error out, but it also doesn’t change anything about my logging config.
it seems like the “normal” logging.yaml isn’t 1:1 with standard Python dictConfig, because as far as I can tell from my testing you can only modify the format string of the
default
formatter; otherwise you have to use the pre-built formatters that come with Meltano
not the hugest deal, I probably should run Fluentd with it anyway for a variety of reasons 🙂
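For reference, this is the standard dictConfig behaviour that doesn’t seem to carry over: a formatter entry can point at an arbitrary factory or class via the special "()" key. The sketch below targets the stdlib logging.Formatter purely so it runs as-is; in practice you’d point it at your own formatter class on your import path.

```python
import logging
import logging.config

# Standard dictConfig allows a formatter entry to name an arbitrary
# factory/class via the special "()" key (any remaining keys are passed to it
# as keyword arguments). logging.Formatter is used here only so the snippet
# runs; a real setup would reference a custom class instead.
logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "custom": {
            "()": "logging.Formatter",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "custom",
        },
    },
    "root": {"level": "INFO", "handlers": ["console"]},
})

logging.getLogger(__name__).info("formatted via the '()' factory key")
```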
d
It depends on the tap/target’s internal approach to logging; some of them even have their own env var. If I remember correctly, Meltano’s SDK also uses the default Python logging lib with no extended formatters. For my goal I went through all my taps’ and targets’ sources and set up logging configs accordingly. In my case it was two different config formats using two different env vars declared in a .env file.
e
FWIW I've got a PR to make it easier to extract the metric as JSON in the SDK: https://github.com/meltano/sdk/pull/2162
🔥 1
r
Hey all, thought I’d loop back here: I ended up figuring out what I needed to do in Datadog, which was to create a secondary Grok parser that could parse the message field out the way I wanted. Now I have nicely structured logs for various things, including schema mutations. I noticed that the HTTPHandler doesn’t respect the
formatter
directive at all, because I wanted to send my logs via HTTP to Logstash and mutate them there, but it just sends everything as a minified string of key=val pairs.
I’ll take a look at the issues list on GitHub and submit one if there isn’t one already, but I thought I’d loop back here to thank you both and share my findings 🙂
🙌 1
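On the HTTPHandler point: that matches the stdlib behaviour as far as I know, since logging.handlers.HTTPHandler url-encodes the record’s attributes and never calls its formatter. One workaround is a small custom handler that POSTs the formatted record instead; the class name and endpoint below are placeholders:

```python
import logging
import urllib.request

# Minimal sketch of a workaround (class name and URL are placeholders): a
# handler that POSTs the *formatted* record, since the stdlib HTTPHandler
# url-encodes record attributes and never calls its formatter.
class JSONHTTPHandler(logging.Handler):
    def __init__(self, url: str) -> None:
        super().__init__()
        self.url = url

    def emit(self, record: logging.LogRecord) -> None:
        try:
            body = self.format(record).encode("utf-8")  # respects the configured formatter
            request = urllib.request.Request(
                self.url,
                data=body,
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(request, timeout=5)
        except Exception:
            self.handleError(record)
```

Paired with a JSON formatter, that gives Logstash (or whatever else is listening) structured bodies to work with.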
e
Thanks for looping back @Rhys Davies!