just wondering here & I welcome your thoughts: Is it possib... (Meltano #random)

haleemur_ali

09/10/2024, 9:41 PM

Just wondering here, and I welcome your thoughts: is it possible to have debug logs emitted that exclude Singer `RECORD` messages? This could be useful where sensitive data is transported using Meltano and the logs are sent to Datadog or similar; the entire technical organization may have access to Datadog, which exposes the sensitive information. Debug logs with the record messages redacted could still be valuable to the data engineer troubleshooting an issue. In incident response scenarios the quickest way to get insight into what's failing in prod is to run the pipeline in prod with the `--log-level=debug` flag, so developers have an incentive to sacrifice some privacy / security for technical expediency. Omitting the record messages would help improve the security posture in these settings.

Edgar Ramírez (Arch.dev)

09/11/2024, 12:49 AM

Totally possible. Let me dig up an example using `cli.log_config`...

Edgar Ramírez (Arch.dev)

09/11/2024, 12:52 AM

Ok, so something like this should give you what you want:

version: 1
disable_existing_loggers: no

formatters:
  json:
    (): meltano.core.logging.json_formatter
handlers:
  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: json
    stream: "ext://sys.stderr"

loggers:
  # Disable logging of tap and target stdout
  meltano.core.block.extract_load:
    level: INFO

root:
  level: DEBUG
  handlers: [console]

minus the JSON formatter stuff. Essentially, this configures the `meltano.core.block.extract_load` logger, which is responsible for logging tap/target stdout, to log at INFO and above, which excludes the Singer stream from the logs.
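
For anyone who wants to see the mechanism in isolation, here is a minimal sketch using only Python's standard `logging` module, without Meltano. It assumes the Singer stream is logged at DEBUG on `meltano.core.block.extract_load`, as described above; the `example.other` logger name, the sample messages, and the in-memory buffer standing in for stderr are made up for illustration.

```python
import io
import logging
import logging.config

# Root stays at DEBUG; only the tap/target stdout logger is raised to INFO.
logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "loggers": {
        "meltano.core.block.extract_load": {"level": "INFO"},
    },
    "root": {"level": "DEBUG"},
})

# Capture output in memory instead of stderr so it can be inspected.
captured = io.StringIO()
logging.getLogger().addHandler(logging.StreamHandler(captured))

tap_output = logging.getLogger("meltano.core.block.extract_load")
other = logging.getLogger("example.other")  # hypothetical logger name

# Stand-in for a Singer RECORD line logged at DEBUG: dropped by the INFO level.
tap_output.debug('{"type": "RECORD", "record": {"email": "a@example.com"}}')
# INFO and above from the same logger still gets through.
tap_output.info("Block run completed")
# DEBUG output from every other logger is unaffected.
other.debug("Connecting to source")

emitted = captured.getvalue().splitlines()
print(emitted)
```

The record line never reaches the handler, while debug output from other loggers still does.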

haleemur_ali

09/11/2024, 2:41 AM

Thanks. I'll try this out!

Greg Koutsimpogiorgos

09/18/2024, 6:26 AM

Hello, thank you @Edgar Ramírez (Arch.dev) for providing this! I was wondering, could this actually be used in a production environment as an intermediate level between INFO and DEBUG? Would it have any impact on the performance of the pipeline? @haleemur_ali, did you have the opportunity to try it already, and could you share your thoughts as well?

haleemur_ali

09/18/2024, 1:11 PM

Hi Greg, this could be used in a production environment as an intermediate level between INFO and DEBUG. In my most recent client project, I have this config saved as `logging-info-extra.yml` and Meltano is invoked as

meltano --log-config=logging-info-extra.yml run ...

While setting up the pipeline, I used to run Meltano with the debug log level to ensure the data was flowing as expected. The debug logs are very helpful in validating that everything actually works, especially in scenarios where the tap or target is developed in-house and the only deployment is at this organization. Unfortunately, that meant all the record messages ended up in the logging tool and were retained there until the retention period expired. Many people in an organization can have access to the logging tool, so it's not ideal from a privacy / security perspective to have the actual data end up there. I'd consider the `info-extra` level to be good practice to adopt regardless.

👍 1

Edgar Ramírez (Arch.dev)

09/18/2024, 2:01 PM

We could also set the log level of `meltano.core.block.extract_load` to `INFO` by default in https://github.com/meltano/meltano/blob/c16d37dc4db4e04dd1bedb6073a3489a51641125/src/meltano/core/logging/utils.py#L102-L122
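
To make the idea of a safer default concrete, here is a rough sketch of what such a default could look like. This is not Meltano's actual code from the linked file: `default_logging_config` and its `include_singer_stream` opt-in are hypothetical names invented for illustration. The only thing taken from the thread is the logger name.

```python
import logging
import logging.config

# Logger named in the thread as carrying tap/target stdout.
TAP_TARGET_LOGGER = "meltano.core.block.extract_load"


def default_logging_config(log_level: str, include_singer_stream: bool = False) -> dict:
    """Hypothetical default config; NOT Meltano's actual implementation."""
    level = getattr(logging, log_level.upper())
    # Unless explicitly opted in, never let the tap/target logger go below INFO.
    tap_level = level if include_singer_stream else max(level, logging.INFO)
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "level": level,
                "stream": "ext://sys.stderr",
            },
        },
        "loggers": {TAP_TARGET_LOGGER: {"level": tap_level}},
        "root": {"level": level, "handlers": ["console"]},
    }


# With the debug level requested, everything except the Singer stream is at DEBUG.
logging.config.dictConfig(default_logging_config("debug"))
tap_logger = logging.getLogger(TAP_TARGET_LOGGER)
```

An explicit opt-in keeps the full Singer stream available for local debugging while the default stays safe for shared logging tools.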

👀 1
