# random
h
Just wondering here & I welcome your thoughts: is it possible to have debug logs emitted that exclude Singer RECORD messages? This could be useful in contexts where sensitive data is transported using Meltano and the logs are sent to Datadog or similar. The entire technical organization may have access to Datadog, which exposes the sensitive information. Debug logs with the record messages redacted could still be valuable to the data engineer troubleshooting an issue. In incident response scenarios, the quickest way to get insight into what's failing in prod is to run the pipeline in prod with the --log-level=debug flag, so developers have an incentive to sacrifice some privacy / security for technical expediency. Omitting the record messages would help improve the security posture in these settings.
e
Totally possible. Let me dig up an example using cli.log_config ...
Ok, so something like this should give you what you want:
version: 1
disable_existing_loggers: no

formatters:
  json:
    (): meltano.core.logging.json_formatter
handlers:
  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: json
    stream: "ext://sys.stderr"

loggers:
  # Disable logging of tap and target stdout
  meltano.core.block.extract_load:
    level: INFO

root:
  level: DEBUG
  handlers: [console]
minus the JSON formatter stuff. Essentially, it configures the meltano.core.block.extract_load logger, which is responsible for logging tap/target stdout, to log at INFO and above, which excludes the Singer stream.
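For anyone who wants to see the mechanism in isolation, here is a minimal sketch using only Python's standard logging module. It assumes Meltano hands this YAML to the standard dictConfig machinery and that tap/target output is logged at DEBUG on that logger, as described above. The in-memory buffer, the "plain" formatter, and the meltano.core.runner logger name are stand-ins picked for illustration, not Meltano internals.

```python
import io
import logging
import logging.config

# In-memory stand-in for stderr so the result can be inspected.
buffer = io.StringIO()

LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        # Plain formatter instead of Meltano's JSON formatter (illustration only).
        "plain": {"format": "%(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "plain",
            "stream": buffer,
        },
    },
    "loggers": {
        # Same idea as the YAML above: pin this one logger to INFO.
        "meltano.core.block.extract_load": {"level": "INFO"},
    },
    "root": {"level": "DEBUG", "handlers": ["console"]},
}

logging.config.dictConfig(LOG_CONFIG)

singer_io = logging.getLogger("meltano.core.block.extract_load")
other = logging.getLogger("meltano.core.runner")  # hypothetical logger name

singer_io.debug('{"type": "RECORD", "record": {"email": "someone@example.com"}}')
singer_io.info("Run completed")
other.debug("still visible at debug level")

output = buffer.getvalue()
print(output)
```

The DEBUG call on the pinned logger is dropped before it reaches any handler, while DEBUG messages from every other logger still propagate to the root handler.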
h
Thanks. I'll try this out!
g
Hello, thank you @Edgar Ramírez (Arch.dev) for providing this! I was wondering, could this actually be used in a production environment as an intermediate level between INFO and DEBUG? Would it have any impact on the performance of the pipeline? @haleemur_ali did you have the opportunity to try it already, and could you maybe share your thoughts as well?
h
Hi Greg, this could be used in a production environment as an intermediate level between INFO and DEBUG. In the most recent project for a client, I have this config saved as logging-info-extra.yml and Meltano is invoked as:
meltano --log-config=logging-info-extra.yml run ...
While setting up the pipeline, I used to run Meltano with the debug log level to ensure the data was flowing as expected. The debug logs are very helpful in validating that everything actually works, especially in scenarios where the tap or target is developed in house and the only deployment is at this organization. Unfortunately, that meant all the record messages ended up in the logging tool and stayed there until the retention period expired. Many people in an organization can have access to the logging tool, so it's not ideal from a privacy / security perspective to have the actual data end up there. I'd consider the info-extra level to be good practice to adopt regardless.
e
We could also set the log level of meltano.core.block.extract_load to INFO by default in https://github.com/meltano/meltano/blob/c16d37dc4db4e04dd1bedb6073a3489a51641125/src/meltano/core/logging/utils.py#L102-L122
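A rough sketch of what such a default could look like, in case it helps the discussion. The helper name default_config and its signature are hypothetical and are not copied from the linked file. The only point is that the built-in config dict would carry the same loggers entry as the YAML above.

```python
import logging
import logging.config


def default_config(log_level: str) -> dict:
    """Hypothetical default logging config; not Meltano's actual function."""
    level = log_level.upper()
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "level": level,
                "stream": "ext://sys.stderr",
            },
        },
        "loggers": {
            # Pinned to INFO regardless of the requested level, so tap/target
            # stdout (the Singer stream) stays out of debug logs by default.
            "meltano.core.block.extract_load": {"level": "INFO"},
        },
        "root": {"level": level, "handlers": ["console"]},
    }


config = default_config("debug")
logging.config.dictConfig(config)

singer_io = logging.getLogger("meltano.core.block.extract_load")
root = logging.getLogger()
```

With a default like this, a user who really does want the record messages could still opt back in through their own log config file by setting that logger to DEBUG.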