# getting-started
m
Hey there, I am wondering whether and how I could anonymise PII before or after loading it into the data warehouse, as shown in the video here: https://www.talend.com/resources/anonymize-data/
t
Depends on what your infra looks like. If your org allows it, loading the raw data into the warehouse and then using dbt to mask / anonymize it is a good option.
You can also build some masking into a tap. tap-gmail does this by streaming aggregates https://github.com/Mashey/tap-gmail
a
@manuel_biermann - Most users choose either to leave sensitive data on the source (by deselecting those sensitive fields) or to transform the data into de-identified forms in dbt, as @taylor mentions. The ability to obfuscate data in transit is not yet supported, but it’s a topic I’ve also been thinking about recently. I’ve logged this as an idea for a new feature - feel free to thumbs up that issue if you are interested in seeing this added. Feature proposal: Support inline one-way hash transforms (#2651) · Issues · meltano / Meltano · GitLab
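To make the "one-way hash" idea concrete, here is a minimal Python sketch of what such a transform could look like on a single record. It is not Meltano's implementation (the feature above is only a proposal); the names `hash_value`, `anonymize_record`, and `SENSITIVE_FIELDS`, plus the choice of keyed HMAC-SHA256, are assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical list of PII fields; in practice this would come from config.
SENSITIVE_FIELDS = {"email", "phone"}


def hash_value(value, key: bytes) -> str:
    """Return a keyed one-way hash (HMAC-SHA256, hex encoded) of a value."""
    return hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, key: bytes, fields=SENSITIVE_FIELDS) -> dict:
    """Return a copy of the record with sensitive, non-null fields hashed."""
    return {
        name: hash_value(value, key) if name in fields and value is not None else value
        for name, value in record.items()
    }


# Example usage with a made-up record and key.
secret_key = b"replace-with-a-real-secret"
row = {"id": 1, "email": "jane@example.com", "phone": None}
masked = anonymize_record(row, secret_key)
```

Using a keyed hash rather than a bare SHA-256 makes it harder to reverse values by hashing guesses, and the same input still maps to the same output, so joins on the hashed column keep working.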
m
Thank you for the feedback @taylor and @aaronsteers!
n
fwiw the first strategy (i.e. don’t replicate it in the first place) is very wise 🙂. I really liked how Meltano gave you a pretty easy way to opt in to specific fields on your sources for this reason
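As a rough illustration of the opt-in idea, the Python sketch below keeps only an explicit allow-list of fields from each record, so anything not listed never leaves the source. This is not how Meltano's field selection is implemented or configured; `select_fields` and `ALLOWED_FIELDS` are hypothetical names used only to show the principle.

```python
# Hypothetical allow-list: only these fields are replicated.
ALLOWED_FIELDS = {"id", "created_at", "plan"}


def select_fields(record: dict, allowed=ALLOWED_FIELDS) -> dict:
    """Return a copy of the record containing only explicitly allowed fields."""
    return {name: value for name, value in record.items() if name in allowed}


# Example usage with a made-up record: the email field is dropped.
source_row = {"id": 7, "email": "jane@example.com", "plan": "pro"}
replicated = select_fields(source_row)
```

With an allow-list, a new sensitive column added to the source later stays out of the warehouse by default, which is the main advantage over a deny-list.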