aaronsteers
06/01/2021, 6:51 PM
aaronsteers
06/01/2021, 6:53 PM
aaronsteers
06/01/2021, 6:53 PM
taylor
06/01/2021, 8:22 PM
visch
06/02/2021, 1:21 PM
stream_maps is passed in via config.json; the idea with Meltano is to have it passed in as config in YAML.
So is this right-ish, matching your config.json?
config:
  host: yadayada
  stream_maps:
    customers:
      email: null
      email_domain: "owner_email.split('@')[-1]"
      email_hash: "md5(config['hash_seed'] + owner_email)"
  stream_map_config:
    hash_seed: "01AWZh7A6DzGm6iJZZ2T"
I have a target I need to write, and a use case for this in the next 4 weeks. I'll be giving this a spin, doing my best to use this instead of doing it myself hackily. Probably won't be able to open source this one though :/, I'll try my best to get it opened but no guarantees there. If this works well for me I have lots of ideas as you've heard :D
I was really curious about the hash seeding portion, glad you answered that! (It irks me when people think hashes are good enough without seeds)
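For anyone following along, here is a rough Python sketch of what the two expressions in that config would compute for a single record. It is only an illustration using the standard library, not the SDK's implementation: the helper names (email_domain, seeded_md5) and the sample record are made up, and it assumes the md5 available in expressions returns a hex digest.

```python
import hashlib


def email_domain(owner_email: str) -> str:
    # Mirrors the expression: owner_email.split('@')[-1]
    return owner_email.split("@")[-1]


def seeded_md5(seed: str, value: str) -> str:
    # Mirrors the expression: md5(config['hash_seed'] + owner_email)
    # Assumption: the result is a hex digest string.
    return hashlib.md5((seed + value).encode("utf-8")).hexdigest()


# Seed value taken from the config above; the record is hypothetical.
config = {"hash_seed": "01AWZh7A6DzGm6iJZZ2T"}
record = {"owner_email": "jane@example.com"}

mapped = {
    "email_domain": email_domain(record["owner_email"]),
    "email_hash": seeded_md5(config["hash_seed"], record["owner_email"]),
}
print(mapped)
```

The point of the seed: without it, anyone can hash a list of known email addresses and match them against the output. With a secret seed prepended, the same email hashes to a different value that can't be reproduced without knowing the seed.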
https://gitlab.com/meltano/singer-sdk/-/blob/63-custom-mappings-in-the-sdk-for-targets-including-renaming-and-basic-expressions/docs/stream_maps.md#remove-all-undeclared-streams-or-properties - Beautiful
https://gitlab.com/meltano/singer-sdk/-/blob/63-custom-mappings-in-the-sdk-for-targets-including-renaming-and-basic-expressions/docs/stream_maps.md#constructing-expressions - Beautiful
https://gitlab.com/meltano/singer-sdk/-/blob/63-custom-mappings-in-the-sdk-for-targets-including-renaming-and-basic-expressions/docs/stream_maps.md#filtering-out-records-from-a-stream-using-__filter__-operation - Beautiful
https://gitlab.com/meltano/singer-sdk/-/blob/63-custom-mappings-in-the-sdk-for-targets-including-renaming-and-basic-expressions/docs/stream_maps.md#security-implications-for-low-trust-environments - How scary is this from a source data side of things? Someone in Google changes their name to "eval rm -rf /"; that should be fine, theoretically. I'll have to look closer, but it sounds like the risk is very low from a source data side of things (i.e., malicious source data shouldn't be able to do anything).
aaronsteers
06/02/2021, 6:40 PM
> Probably won’t be able to open source this one though
Not a problem, man. 👍 You’re already improving the SDK just by using it, kicking the tires, and reporting and/or fixing bugs that you may run into along the way! 🙂
> I was really curious about the hash seeding portion, glad you answered that! (It irks me when people think hashes are good enough without seeds)
Ditto. This is top of mind for me too, as I’ve wished for it in past lives when I was creating and managing data pipelines myself. (Even logged a request for it back in Sep 2020.)
> How scary is this from a source data side of things?
I hadn’t thought of code injection via upstream source data itself, so I’m glad you’ve raised this. But no, that should not be a threat. We explicitly control which names and functions are permitted to be called in the expression, and unless the user of the tap tries to explicitly interpret a source data string as a code instruction, this would not be an issue. And further, if the simpleeval library does not already do this (which I think it might), we can and should just remove eval and similar from the callable functions allowed in expressions.
aaronsteers
06/03/2021, 8:40 PM
dan_ladd
06/03/2021, 8:46 PM
aaronsteers
06/03/2021, 8:53 PM
dan_ladd
06/03/2021, 9:06 PM
aaronsteers
06/03/2021, 9:07 PM
aaronsteers
06/03/2021, 9:08 PM
douwe_maan
06/03/2021, 9:09 PM
dan_ladd
06/03/2021, 9:09 PM
aaronsteers
06/03/2021, 9:11 PM