# singer-taps
r
Hey all, can you write custom log formatters for Meltano like you can in other Python tools that use standard logging? I’ve got a fairly simple transformation that I’ve written a formatter for, but I can’t get it to be recognized in my logging.yaml using the normal Python import path rules. At the moment I’m simply getting this:
{"run_id": "354f78d4-161d-4a72-a545-7e1807965a37", "state_id": "2024-01-16T065607--tap-name", "stdio": "stderr", "cmd_type": "extractor", "name": "tap-identity", "event": "time=2024-01-15 22:56:13 name=singer level=INFO message=METRIC: {\"type\": \"counter\", \"metric\": \"record_count\", \"value\": 27, \"tags\": {}}", "level": "info", "timestamp": "2024-01-16T06:56:13.224071Z"}

what I want is the below, where METRIC is pulled out as an event_type and event is just a sub-object of the JSON with objects inside of it (can't be bothered to get rid of all the escaped string quotes):

{"run_id": "354f78d4-161d-4a72-a545-7e1807965a37", "state_id": "2024-01-16T065607--tap-name", "stdio": "stderr", "cmd_type": "extractor", "name": "tap-identity", "event": {"event_type": "METRIC", "time":"2024-01-15 22:56:13", "name": "singer", "level":"INFO" "message":{\"type\": \"counter\", \"metric\": \"record_count\", \"value\": 27, \"tags\": {}}}, "level": "info", "timestamp": "2024-01-16T06:56:13.224071Z"}
As I say, I have decently testable code for this, but I’ve been banging my head against getting it to work with Meltano. I’m probably doing something silly because it’s late here.
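Roughly, the kind of formatter being described might look like the sketch below. The class name is invented and it assumes the raw Singer text line (time=... name=... level=... message=METRIC: {...}) arrives as the record's message, which may not match the actual code:

```python
import json
import logging
import re

# Roughly the shape of formatter described above (class name is made up for
# illustration). It assumes the raw Singer text line, e.g.
#   time=2024-01-15 22:56:13 name=singer level=INFO message=METRIC: {"type": ...}
# is available as the record's message.
SINGER_LINE = re.compile(
    r"time=(?P<time>\S+ \S+) "
    r"name=(?P<name>\S+) "
    r"level=(?P<level>\S+) "
    r"message=(?P<event_type>\w+): (?P<payload>\{.*\})$"
)

class SingerEventFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        raw = record.getMessage()
        match = SINGER_LINE.search(raw)
        if not match:
            # Not a Singer METRIC-style line; leave it alone.
            return raw
        event = {
            "event_type": match["event_type"],
            "time": match["time"],
            "name": match["name"],
            "level": match["level"],
            "message": json.loads(match["payload"]),
        }
        return json.dumps({"event": event, "level": record.levelname.lower()})
```

In plain dictConfig you would wire a class like that in via the special "()" key on a formatter entry, which is the piece that, as it turns out further down the thread, Meltano’s logging.yaml doesn’t seem to accept.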
d
I solved this using env vars tied to the tap/target’s own logging system, for example:
SINGER_SDK_LOG_CONFIG
LOGGING_CONF_FILE
That means you’re able to define the log config path for the tap/target itself. Unfortunately, different tap/target sources use different logging systems. The easiest way I found is to standardise the text log output format for every tap/target via their respective log configs and then parse the produced logs somewhere downstream.
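To make that concrete: as far as I can tell, the SDK-style config that SINGER_SDK_LOG_CONFIG points at is ordinary Python dictConfig material written as YAML (other taps may use the older fileConfig .conf format instead, hence the "different logging systems" caveat). Expressed as a Python dict, with illustrative handler/formatter names, it boils down to something like:

```python
import logging.config

# Illustrative only: the rough shape of a standard Python dictConfig, which is
# (to my understanding) what an SDK-style logging config boils down to once
# loaded. Handler and formatter names here are made up.
LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {
            "format": "time=%(asctime)s name=%(name)s level=%(levelname)s message=%(message)s",
        },
    },
    "handlers": {
        "stderr": {
            "class": "logging.StreamHandler",
            "formatter": "plain",
            "stream": "ext://sys.stderr",
        },
    },
    "root": {"level": "INFO", "handlers": ["stderr"]},
}

logging.config.dictConfig(LOG_CONFIG)
logging.getLogger("singer").info("hello from the tap")
```

The YAML file you point the env var at should just be the YAML rendering of that same structure, as far as I understand.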
r
Thanks, I’ll check them out. I was trying to parse my logs in Datadog, but I can’t seem to get a parser to work with JSON that has structured non-JSON data inside of it (this is quite easy with both Humio and Elastic), and I was trying to avoid putting Logstash/Fluentd/some sort of collector in between the two.
Not the hugest chore, just more complex than if I can work it out in the tools I already have.
d
In the end it’s just the tap/target’s log messages encapsulated in Meltano’s log messages, each with their own log format. Also, the METRIC message has its own format standard. One of the alternative options I’ve considered was to write the target’s logs into a separate stream (using the logging config) without Meltano’s wrapper log message. Also, I see @Edgar Ramírez (Arch.dev) was building a Singer metric processing project, maybe he has some better ideas.
r
Having trouble getting Meltano to see my
SINGER_SDK_LOG_CONFIG
when I run meltano locally like so:
SINGER_SDK_LOG_CONFIG=singer_sdk_logging.yaml meltano el tap-blah target-blah
Is that how you are doing it? It doesn’t error out, but it also doesn’t change anything about my logging config.
it seems like the “normal” logging.yaml isn’t 1:1 with standard Python dictConfig, because as far as I can tell from my testing you can only modify the format string of the
default
formatter; otherwise you have to use the pre-built formatters that come with Meltano
not the hugest deal, I probably should run Fluentd with it anyway for a variety of reasons 🙂
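For reference, this is the standard dictConfig behaviour that doesn’t seem to carry over: a formatter entry can point at an arbitrary factory or class via the special "()" key. The sketch below targets the stdlib logging.Formatter purely so it runs as-is; in practice you’d point it at your own formatter class on your import path.

```python
import logging
import logging.config

# Standard dictConfig allows a formatter entry to name an arbitrary
# factory/class via the special "()" key (any remaining keys are passed to it
# as keyword arguments). logging.Formatter is used here only so the snippet
# runs; a real setup would reference a custom class instead.
logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "custom": {
            "()": "logging.Formatter",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "custom",
        },
    },
    "root": {"level": "INFO", "handlers": ["console"]},
})

logging.getLogger(__name__).info("formatted via the '()' factory key")
```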
d
It depends on the tap/target’s internal approach to logging; some of them even have their own env var. If I remember correctly, Meltano’s SDK also uses the default Python logging lib with no extended formatters. For my goal I went through all my taps’ and targets’ sources and set up logging configs accordingly. In my case it was two different config formats using two different env vars declared in a .env file.
e
FWIW I've got a PR to make it easier to extract the metric as JSON in the SDK: https://github.com/meltano/sdk/pull/2162
🔥 1
r
Hey all, thought I’d loop back here: I ended up figuring out what I needed to do in Datadog, which was to create a secondary Grok parser that could parse the message field out the way I wanted. Now I have nicely structured logs for various things, including schema mutations. I noticed that the HTTPHandler doesn’t respect the
formatter
directive at all, because I wanted to send my logs via HTTP to Logstash and mutate them there, but it just sends everything as a minified string of key=val pairs.
I’ll take a look at the issues list on GitHub and submit one if there isn’t one already, but I thought I’d loop back here to thank you both and share my findings 🙂
🙌 1
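On the HTTPHandler point: that matches the stdlib behaviour as far as I know, since logging.handlers.HTTPHandler url-encodes the record’s attributes and never calls its formatter. One workaround is a small custom handler that POSTs the formatted record instead; the class name and endpoint below are placeholders:

```python
import logging
import urllib.request

# Minimal sketch of a workaround (class name and URL are placeholders): a
# handler that POSTs the *formatted* record, since the stdlib HTTPHandler
# url-encodes record attributes and never calls its formatter.
class JSONHTTPHandler(logging.Handler):
    def __init__(self, url: str) -> None:
        super().__init__()
        self.url = url

    def emit(self, record: logging.LogRecord) -> None:
        try:
            body = self.format(record).encode("utf-8")  # respects the configured formatter
            request = urllib.request.Request(
                self.url,
                data=body,
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(request, timeout=5)
        except Exception:
            self.handleError(record)
```

Paired with a JSON formatter, that gives Logstash (or whatever else is listening) structured bodies to work with.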
e
Thanks for looping back @Rhys Davies!