# random
i
Hi all, I ran into a small issue with orchestrating Meltano from Prefect, and wondered if anyone here had dealt with a similar problem. Invoking Meltano using prefect-shell works, but the logs are less than ideal. Meltano logs using its own logging config, then Prefect captures all of that output and logs it to stdout as plain text. Is there a way to make the Meltano logs go directly to the "outer" logging setup, with all metadata intact? I suppose one way would be to invoke Meltano as a Python function instead of a standalone CLI tool, but that seems tricky since it's a Click app and those aren't meant to be called as Python functions.
Example log output:
$ python flows/meltano_run.py 
13:46:35.544 | INFO    | prefect.engine - Created flow run 'nondescript-bird' for flow 'meltano-run'
13:46:35.546 | INFO    | Flow run 'nondescript-bird' - View at <snip>
13:46:37.367 | INFO    | Flow run 'nondescript-bird' - Running "meltano run"
13:46:37.373 | INFO    | Flow run 'nondescript-bird' - PID 9490 triggered with 1 commands running inside the PosixPath('/workspace/dp-brreg-example/meltano') directory.
13:46:38.253 | INFO    | Flow run 'nondescript-bird' - PID 9490 stream output:
2023-12-01T13:46:38.252976Z [info     ] Environment 'prod' is active
13:46:38.387 | INFO    | Flow run 'nondescript-bird' - PID 9490 stream output:
2023-12-01T13:46:38.387236Z [warning  ] No state was found, complete import.
13:46:39.675 | INFO    | Flow run 'nondescript-bird' - PID 9490 stream output:
2023-12-01T13:46:39.675307Z [info     ] 2023-12-01 13:46:39,675 | INFO     | tap-brreg            | Beginning incremental sync of 'enheter'... cmd_type=elb consumer=False name=tap-brreg producer=True stdio=stderr string_id=tap-brreg
13:46:39.677 | INFO    | Flow run 'nondescript-bird' - PID 9490 stream output:
2023-12-01T13:46:39.676438Z [info     ] 2023-12-01 13:46:39,676 | INFO     | tap-brreg            | Tap has custom mapper. Using 1 provided map(s). cmd_type=elb consumer=False name=tap-brreg producer=True stdio=stderr string_id=tap-brreg
13:46:40.633 | INFO    | Flow run 'nondescript-bird' - PID 9490 stream output:
2023-12-01T13:46:40.632724Z [info     ] 2023-12-01 13:46:40,632 | INFO     | tap_brreg.client     | Downloaded 1048576 of 94913233 bytes cmd_type=elb consumer=False name=tap-brreg producer=True stdio=stderr string_id=tap-brreg
13:46:40.813 | INFO    | Flow run 'nondescript-bird' - PID 9490 stream output:
2023-12-01T13:46:40.813572Z [info     ] 2023-12-01 13:46:40,813 | INFO     | target-snowflake     | Target 'target-snowflake' is listening for input from tap. cmd_type=elb consumer=True name=target-snowflake producer=False stdio=stderr string_id=target-snowflake
13:46:40.815 | INFO    | Flow run 'nondescript-bird' - PID 9490 stream output:
2023-12-01T13:46:40.813894Z [info     ] 2023-12-01 13:46:40,813 | INFO     | target-snowflake     | Initializing 'target-snowflake' target sink... cmd_type=elb consumer=True name=target-snowflake producer=False stdio=stderr string_id=target-snowflake
2023-12-01T13:46:40.814033Z [info     ] 2023-12-01 13:46:40,813 | INFO     | target-snowflake     | Initializing target sink for stream 'enheter'... cmd_type=elb consumer=True name=target-snowflake producer=False stdio=stderr string_id=target-snowflake
13:46:40.916 | INFO    | Flow run 'nondescript-bird' - PID 9490 stream output:
2023-12-01T13:46:40.915606Z [info     ] 2023-12-01 13:46:40,915 | INFO     | tap_brreg.client     | Downloaded 2097152 of 94913233 bytes cmd_type=elb consumer=False name=tap-brreg producer=True stdio=stderr string_id=tap-brreg
v
Most people I've seen write a log formatter in their orchestrator. I think in Prefect you write some regex and parse out the attributes you're after. I think I see what you're thinking; it's probably doable with a custom log handler, but I haven't ventured down that path: https://docs.meltano.com/reference/settings/#clilog_config
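Roughly what the regex approach could look like, as a minimal sketch rather than a tested Prefect integration. It assumes the Meltano console format shown in the output above (`<timestamp> [level] message`), and `parse_meltano_line`, `relog`, and the `meltano_ts` extra field are hypothetical names I made up for illustration. The logger passed in could be whatever the orchestrator gives you:

```python
import logging
import re

# Matches Meltano console lines as they appear in the output above, e.g.
# "2023-12-01T13:46:38.252976Z [info     ] Environment 'prod' is active".
# The exact format depends on Meltano's logging config, so treat this as an assumption.
MELTANO_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z)\s+\[(?P<level>\w+)\s*\]\s+(?P<msg>.*)$"
)

# Map Meltano's level names onto stdlib logging levels.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.CRITICAL,
}


def parse_meltano_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    match = MELTANO_LINE.match(line.strip())
    if match is None:
        return None
    level = LEVELS.get(match["level"].lower(), logging.INFO)
    return match["ts"], level, match["msg"]


def relog(line, logger):
    """Re-emit one captured line through `logger` at its original level."""
    parsed = parse_meltano_line(line)
    if parsed is None:
        # Unknown format: pass the line through unchanged at INFO.
        logger.info(line.rstrip())
        return
    ts, level, msg = parsed
    # Keep Meltano's own timestamp as an extra attribute on the record.
    logger.log(level, msg, extra={"meltano_ts": ts})
```

With this, the "No state was found" line would come through as a WARNING in the outer logger instead of an INFO-level "stream output" entry.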
The other thing to remember is that you won't get direct access to the tap/target logs, since they are actually separate processes and Meltano is aggregating their output for you. So even with a custom handler, I'm not sure you'd get what you're after.
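That said, the aggregated lines in the output above do carry the plugin name and stream as trailing `key=value` pairs, so some of the metadata can probably be recovered after the fact. A rough sketch, assuming the line shapes shown in that output (the pipe-separated inner format looks specific to these plugins); `split_context` and `parse_plugin_message` are hypothetical helper names:

```python
import re

# One trailing "key=value" token, e.g. "name=tap-brreg" or "stdio=stderr".
KV = re.compile(r"^(?P<key>[A-Za-z_]\w*)=(?P<value>\S+)$")

# Inner plugin line as seen in the output above, e.g.
# "2023-12-01 13:46:39,675 | INFO     | tap-brreg            | Beginning ..."
PLUGIN_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} [\d:,]+)\s+\|\s+(?P<level>\w+)\s+\|\s+"
    r"(?P<logger>\S+)\s+\|\s+(?P<msg>.*)$"
)


def split_context(msg):
    """Peel key=value tokens off the end of a message.

    Caveat: a message whose own text ends in something like "x=y" would be
    misread as context, and values containing spaces are not handled.
    """
    tokens = msg.split(" ")
    context = {}
    while tokens:
        match = KV.match(tokens[-1])
        if match is None:
            break
        context[match["key"]] = match["value"]
        tokens.pop()
    return " ".join(tokens).rstrip(), context


def parse_plugin_message(msg):
    """Return (text, context, inner), where inner is a dict or None."""
    text, context = split_context(msg)
    match = PLUGIN_LINE.match(text)
    inner = match.groupdict() if match else None
    return text, context, inner
```

For the tap-brreg lines above, `context["name"]` and `context["stdio"]` would tell you which plugin and which stream a line came from, and `inner["level"]` gives the plugin's own level. It still only sees what Meltano chose to forward.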
i
Cool, thanks. I see I have a few tools available to me with logging formatters, so I'll look into that. I also see there's a feature request for programmatic invocation of the Meltano CLI, which I think would solve my problem: https://github.com/meltano/meltano/issues/7769