# singer-tap-development
a
In advance of our office hours discussion tomorrow, here’s the latest draft of the spec documentation for inline stream map transformations. Also available from the MR: Custom mappings in the SDK for targets, including renaming, aliasing, obfuscation, and basic transformations (!92)
This is still early but any feedback is much appreciated.
@visch, @taylor, @douwe_maan 👆
t
This proposal looks really good to me. You answered my salt question, and to my eye you covered all of the initial corner cases I could think of.
v
Wow, great work. Your writing here is great! My only question: since stream_maps is passed in via config.json, and the idea with Meltano is to pass it in as config in YAML, is this right-ish? It should match your config.json:
config:
  host: yadayada
  stream_maps:
    customers:
      email: null
      email_domain: "owner_email.split('@')[-1]"
      email_hash: "md5(config['hash_seed'] + owner_email)"
  stream_map_config:
    hash_seed: "01AWZh7A6DzGm6iJZZ2T"
I have a target I need to write, and a use case for this in the next 4 weeks. I'll be giving this a spin, doing my best to use this instead of doing it myself hackily. Probably won't be able to open source this one though :/, I'll try my best to get it opened but no guarantees there. If this works well for me I have lots of ideas as you've heard :D
I was really curious about the hash seeding portion, glad you answered that! (It irks me when people think hashes are good enough without seeds)
https://gitlab.com/meltano/singer-sdk/-/blob/63-custom-mappings-in-the-sdk-for-targets-including-renaming-and-basic-expressions/docs/stream_maps.md#remove-all-undeclared-streams-or-properties - Beautiful
https://gitlab.com/meltano/singer-sdk/-/blob/63-custom-mappings-in-the-sdk-for-targets-including-renaming-and-basic-expressions/docs/stream_maps.md#constructing-expressions - Beautiful
https://gitlab.com/meltano/singer-sdk/-/blob/63-custom-mappings-in-the-sdk-for-targets-including-renaming-and-basic-expressions/docs/stream_maps.md#filtering-out-records-from-a-stream-using-__filter__-operation - Beautiful
https://gitlab.com/meltano/singer-sdk/-/blob/63-custom-mappings-in-the-sdk-for-targets-including-renaming-and-basic-expressions/docs/stream_maps.md#security-implications-for-low-trust-environments - How scary is this from a source data side of things? Someone in Google changes their name to "eval rm -rf /" should be fine theoretically. I'll have to look closer, but it sounds like the risk is very low from a source data side of things (i.e. malicious source data shouldn't be able to do anything).
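For anyone following along, here is a rough plain-Python sketch of what the three customers entries in the config above are meant to produce for a single record. This is only an illustration of the intended result, not how the SDK evaluates stream maps: apply_customer_map and the local md5 helper are hypothetical names, and the sample record is made up.

```python
import hashlib


def md5(value: str) -> str:
    """Hypothetical helper: hex MD5 digest of a string."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def apply_customer_map(record: dict, config: dict) -> dict:
    """Hand-written equivalent of the three `customers` mappings (assumption)."""
    owner_email = record["owner_email"]
    out = dict(record)
    out.pop("email", None)  # email: null -> drop the property
    out["email_domain"] = owner_email.split("@")[-1]
    # Seeded hash: the same email hashes differently under a different seed.
    out["email_hash"] = md5(config["hash_seed"] + owner_email)
    return out


config = {"hash_seed": "01AWZh7A6DzGm6iJZZ2T"}
record = {"email": "jane@example.com", "owner_email": "jane@example.com"}
mapped = apply_customer_map(record, config)
```

The seed matters because an unseeded MD5 of an email address can be matched against a precomputed list of hashed addresses, while a seeded one cannot be matched without knowing the seed.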
a
@visch - Thanks for this detailed feedback and for adding your voice in the call today. To your questions…
Probably won’t be able to open source this one though
Not a problem, man. 👍 You’re already improving the SDK just by using it, kicking the tires, and reporting and/or fixing bugs that you may run into along the way! 🙂
I was really curious about the hash seeding portion, glad you answered that! (It irks me when people think hashes are good enough without seeds)
Ditto. This is top of mind for me too, as I’ve wished for it in past lives when I was creating and managing data pipelines myself. (Even logged a request for it back in Sep 2020.)
How scary is this from a source data side of things?
I hadn’t thought of code injection via upstream source data itself, so I’m glad you’ve raised this. But no, that should not be a threat. We explicitly control which names and functions are permitted to be called in the expression, and unless the user of the tap explicitly tries to interpret a source data string as code, this would not be an issue. Further, if the simpleeval library does not already do this (which I think it might), we can and should just remove eval and similar from the callable functions allowed in expressions.
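To make the allowlist idea concrete, here is a minimal standard-library sketch. It is not simpleeval and not the SDK's implementation; evaluate and ALLOWED_FUNCTIONS are hypothetical names, and the supported syntax is deliberately tiny. It shows two things: a hostile source value is only ever handled as a string, and a function that is not on the allowlist (such as eval) cannot be called from an expression.

```python
import ast

# Assumption: only these functions may be called from an expression.
ALLOWED_FUNCTIONS = {"len": len, "upper": str.upper}


def evaluate(expression: str, names: dict):
    """Evaluate constants, permitted names, and calls to allowlisted functions."""

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in names:
                raise NameError(f"name not permitted: {node.id}")
            return names[node.id]
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in ALLOWED_FUNCTIONS:
                raise NameError(f"function not permitted: {node.func.id}")
            if node.keywords:
                raise ValueError("keyword arguments are not supported")
            args = [visit(arg) for arg in node.args]
            return ALLOWED_FUNCTIONS[node.func.id](*args)
        # Anything else (attributes, subscripts, lambdas, ...) is rejected.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))


# The record value is data: it is passed to functions, never parsed as code.
record = {"name": "eval rm -rf /"}
shouted = evaluate("upper(name)", record)
```

Here only the expression from the config is parsed; the record value is looked up by name and handed to upper as an ordinary string. An expression such as eval(name) raises NameError because eval is not in the allowlist.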
For anyone who missed the discussion on stream_maps yesterday, the recap blog post is now live, and I’ve added links in the video description so you can quickly jump to the appropriate topic(s). We plan to launch this feature this month and your feedback ahead of launch is very much appreciated. https://meltano.com/blog/2021/06/03/office-hours-recap-2021-06-02/
d
Excited for the inline mapping 🙌, the obfuscation example is exactly what I'm looking for
a
@dan_ladd - That’s great confirmation for us to hear. If we could have you as one of our first testers, would you be using the middle-layer plugin (between a non-SDK tap and target), or are you thinking primarily of using this with an SDK-based tap or target?
d
Ah I missed that this is for within the SDK first. ATM, I would probably be interested in the "middle layer" transformations, but whenever I get comfortable using the SDK for a tap, I could see myself using it there as well.
a
Yeah, not a problem at all. 👍 I think the middle-layer solution will be the most popular at the beginning, since that option will basically work with all taps/targets.
I’ll let you know once we launch the solo mapper (eta ~2 weeks) - if you have time to be a tester for us, that will be a big help.
d
Will gladly do!
a
That’s right - as @douwe_maan links above, our vision is to let you configure this directly within meltano.yml - so when all is landed, you won’t need to think about the fact that there’s another plugin in the middle. It’ll just be another Meltano feature from the user perspective, but in the background it’s using the middle plugin layer. 😄