Hi all, has anyone had the need to mask production...
# plugins-general
a
Hi all, has anyone had the need to mask production data with Meltano yet? Seems like a pretty common use case - maybe there are some good ideas here. As a first thought, I'm wondering if it would be generally useful to create a mapper plugin that uses Faker (https://faker.readthedocs.io/en/master/). That mapper could deal with the PII or client account names - I've explored Mostly.ai but haven't had much joy yet, to be honest.
v
For all my production PII use cases, I just don't pull the PII data: I either use a mapper to filter it out or tell the source to stay away from it.
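For anyone skimming later, the "filter it out" idea can be sketched in plain Python. This is only an illustration of the record-level logic, not Meltano's or the SDK's mapper API; `PII_FIELDS` and `drop_pii` are hypothetical names, and the field list is an assumption you would adjust per source.

```python
from __future__ import annotations

from typing import Any

# Hypothetical set of field names treated as PII; adjust per source.
PII_FIELDS = {"email", "full_name", "phone"}


def drop_pii(record: dict[str, Any], pii_fields: set[str] = PII_FIELDS) -> dict[str, Any]:
    """Return a copy of the record with the PII fields removed."""
    return {key: value for key, value in record.items() if key not in pii_fields}


record = {"id": 7, "email": "jane@example.com", "plan": "pro"}
cleaned = drop_pii(record)  # the original record is left unchanged
```

Dropping the fields before they leave the source side means nothing downstream ever sees the raw values, which is the appeal of this approach over masking.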
a
I haven't had the chance to apply this myself, but I definitely agree with using mappers for that use case. (That's probably the main use case for them, actually.)
I thought about Faker and similar libraries for tap-smoke-test, but not for the mapper class. I actually really like the idea, though, and I think it could be worth adding a built-in `faker()` function to our SDK built-in mapper functions, like the `md5()` helper function we already have, for instance.
There's a counterargument about dependencies, but this one doesn't seem to have much in the way of additional dependency requirements: https://github.com/joke2k/faker/blob/master/setup.py#L73-L77
e
Those two deps are already in the sdk 🙂
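To make the Faker idea a bit more concrete, here is a rough stdlib-only stand-in for what a faker-style helper might do: map each real value to a stable, realistic-looking replacement. It does not use Faker itself, and it is not the SDK's proposed `faker()` function; `fake_name`, the `salt` parameter, and the name lists are all hypothetical.

```python
import hashlib
import random

# Hypothetical pools of replacement values; a library like Faker would supply far richer ones.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan"]
LAST_NAMES = ["Reed", "Patel", "Nguyen", "Okafor", "Silva"]


def fake_name(real_value: str, salt: str = "") -> str:
    """Map a real value to a fake name, deterministically for a given salt."""
    # Seed a private RNG from a hash of the input so the same input always
    # yields the same fake name, which keeps joins across tables consistent.
    digest = hashlib.sha256((salt + real_value).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


masked = fake_name("Acme Corp")
```

Deriving the seed from the input is a design choice made here for repeatability; with pools this small, different inputs will often collide on the same fake name.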
a
As of today, you should already be able to use a hashing function like `md5()`, but having data that "looks like" real data could be really handy, especially for test environments! Chef's kiss.
@aaron_phethean - would you be interested in logging an issue for this and/or perhaps contributing the feature? If added to the SDK, you would also automatically get access to it in taps and targets that use the SDK, even without a standalone mapper between them.
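For reference, the hashing approach mentioned above amounts to something like the following in plain Python. This is a sketch of the idea using `hashlib`, not the SDK's own `md5()` helper, whose exact behaviour is not shown in this thread; `md5_hex` is a hypothetical name.

```python
import hashlib


def md5_hex(value: str) -> str:
    """Return the hex MD5 digest of a string value."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


record = {"id": 7, "email": "jane@example.com"}
# Replace only the sensitive field; other fields pass through untouched.
masked = {**record, "email": md5_hex(record["email"])}
```

The result is stable across runs, so it still works as a join key, but it is a 32-character hex string rather than something that looks like an email address, which is the gap a Faker-style function would fill.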
a
100% will do that first thing in the morning. Maybe PR by the afternoon 😁