haleemur_ali
09/10/2024, 9:41 PM
RECORD messages.
This could be useful where sensitive data is transported using Meltano and the logs are sent to Datadog or a similar tool. The entire technical organization may have access to Datadog, which exposes the sensitive information.
Debug logs with the record messages redacted would still be valuable to a data engineer troubleshooting an issue. In incident response, the quickest way to see what is failing in prod is to run the pipeline in prod with the --log-level=debug flag, so developers have an incentive to sacrifice some privacy / security for technical expediency.
Omitting the record messages would help improve the security posture in these settings.
Edgar Ramírez (Arch.dev)
09/11/2024, 12:49 AM
cli.log_config ...
Edgar Ramírez (Arch.dev)
09/11/2024, 12:52 AM
version: 1
disable_existing_loggers: no
formatters:
  json:
    (): meltano.core.logging.json_formatter
handlers:
  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: json
    stream: "ext://sys.stderr"
loggers:
  # Disable logging of tap and target stdout
  meltano.core.block.extract_load:
    level: INFO
root:
  level: DEBUG
  handlers: [console]
minus the JSON formatter stuff. Essentially, it configures the meltano.core.block.extract_load logger, which is responsible for logging tap/target stdout, to log at INFO and above, which excludes the Singer stream.
haleemur_ali
09/11/2024, 2:41 AM
Greg Koutsimpogiorgos
09/18/2024, 6:26 AM
haleemur_ali
09/18/2024, 1:11 PM
logging-info-extra.yml and Meltano is invoked as
meltano --log-config=logging-info-extra.yml run ...
While setting up the pipeline, I used to run Meltano at the debug log level to ensure the data was flowing as expected. The debug logs are very helpful for validating that everything actually works, especially when the tap or target is developed in-house and this organization is its only deployment.
Unfortunately, that meant all the record messages ended up in the logging tool and stayed there until the retention period expired.
Many people in an organization can have access to the logging tool, so it's not ideal from a privacy / security perspective to have the actual data end up there.
I'd consider the info-extra level to be good practice to adopt regardless.
Edgar Ramírez (Arch.dev)
09/18/2024, 2:01 PM
meltano.core.block.extract_load at INFO by default in https://github.com/meltano/meltano/blob/c16d37dc4db4e04dd1bedb6073a3489a51641125/src/meltano/core/logging/utils.py#L102-L122
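
For anyone who wants to see the per-logger level trick from this thread in isolation, here is a minimal sketch using only Python's standard logging module, not Meltano itself. It assumes, as described above, that the Singer lines are emitted at DEBUG on the meltano.core.block.extract_load logger. The second logger name (meltano.core.runner), the sample messages, and the run_demo helper are hypothetical and only there for illustration, and an in-memory stream stands in for stderr so the result can be inspected.

```python
import io
import logging
import logging.config


def run_demo() -> str:
    """Configure logging like the YAML in this thread and return what was logged."""
    # In-memory stream standing in for stderr so the output can be inspected.
    captured = io.StringIO()

    log_config = {
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "level": "DEBUG",
                "stream": captured,
            },
        },
        "loggers": {
            # Same logger name as in the thread: INFO here drops its DEBUG records
            # even though the root logger stays at DEBUG.
            "meltano.core.block.extract_load": {"level": "INFO"},
        },
        "root": {"level": "DEBUG", "handlers": ["console"]},
    }
    logging.config.dictConfig(log_config)

    singer_io = logging.getLogger("meltano.core.block.extract_load")
    other = logging.getLogger("meltano.core.runner")  # hypothetical logger name

    # Made-up messages: a Singer-style RECORD line at DEBUG, plus two others.
    singer_io.debug('{"type": "RECORD", "record": {"email": "a@example.com"}}')
    singer_io.info("block finished")
    other.debug("debug detail from another logger")

    return captured.getvalue()


if __name__ == "__main__":
    print(run_demo())
```

The RECORD line never reaches the handler, while DEBUG output from every other logger still does, which is the "info-extra" behavior described above.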