re: Data Deanonymization, follow-up to <https://melta...
# troubleshooting
Been doing a lot of thinking about what the "right" way is to maintain taps/targets for the long term. End goal: tested taps/targets that all work together perfectly 🙂

One of the biggest challenges that's been brought up multiple times is "simply" the data: the data that comes out of your source system is unique to that instance. One part of the solution, at least for this situation, would be getting a group of people who use tap-quickbooks to offer up their anonymized data so that we could test against it.

Anonymized data, defined: parse every record coming out of a tap and generate new data that follows the schema and "shape" of the existing data. An MVP of this would be pretty simple: if the schema of record A is `{string, maxLength: 7000}`, generate a random string of 7000 characters in place of the actual data coming out of tap-quickbooks (see the sketch below).

Why go through the trouble? Situations like this could be tested against without needing to see anyone's proprietary data, and you could run test suites against large amounts of anonymized data to be sure that changes to targets end up with the results you're looking for.
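A minimal sketch of that MVP, assuming Singer-style records paired with a JSON Schema that has `properties`, `type`, and `maxLength`. The function name `deidentify_record` and the QuickBooks-ish field names in the usage example are hypothetical, not anything from an existing tap:

```python
import random
import string


def deidentify_record(record: dict, schema: dict) -> dict:
    """Replace each value with synthetic data matching the field's
    JSON Schema type and length, preserving the record's shape."""
    fake = {}
    for field, value in record.items():
        props = schema.get("properties", {}).get(field, {})
        ftype = props.get("type", "string")
        # JSON Schema allows a list of types, e.g. ["null", "string"]
        if isinstance(ftype, list):
            ftype = next((t for t in ftype if t != "null"), "string")

        if value is None:
            fake[field] = None
        elif ftype == "string":
            # MVP: random string of the same length, capped by maxLength if present
            length = min(len(value), props.get("maxLength", len(value)))
            fake[field] = "".join(
                random.choices(string.ascii_letters + string.digits, k=length)
            )
        elif ftype == "integer":
            # random integer with the same number of digits
            fake[field] = random.randint(0, 10 ** len(str(abs(int(value)))))
        elif ftype == "number":
            fake[field] = random.uniform(0, abs(float(value)) or 1.0)
        elif ftype == "boolean":
            fake[field] = random.choice([True, False])
        else:
            # objects/arrays: just blank them out for the MVP
            fake[field] = type(value)() if isinstance(value, (dict, list)) else value
    return fake


# Hypothetical usage with a QuickBooks-shaped record
schema = {"properties": {"CustomerName": {"type": ["null", "string"], "maxLength": 7000}}}
record = {"CustomerName": "Acme Plumbing LLC"}
print(deidentify_record(record, schema))  # e.g. {'CustomerName': 'kX3fTq9ZbL2mWcQr0'}
```

The "shape" is preserved (same fields, same types, same lengths) while the values themselves are throwaway, so a shared corpus built this way could be run through targets without exposing anyone's real data.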
I"m guessing my idea here isn't new, anyone have pointers to people who have tried things around this? I think it's going to have to be a part of the tap /target service solution I'm going with anyway. Part of maintaining your tap/target includes making sure your data will continue to work or something along those lines
TL;DR of everything above: test against actual data by storing a close, "de-identified" copy of it. Everything would work perfectly then 😉