# singer-target-development
pat_nadolny
Discussion about working on target-s3 (all) 🧵
cc @andy_crowe
I know I mentioned that I was going to start on it, but I haven't made any real progress yet. Feel free to pick it up if you're interested!
I put it in the original issue, but it was brought up again today so it's top of mind: it might be helpful to leverage the `smart_open` package for this. Then the meat of the code is around writing to different file formats, which @aaronsteers mentioned might get pulled into the SDK eventually too! If that all works out, this could end up being a very slim (and high impact) target 🚀
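To illustrate why `smart_open` is appealing here: the same `open()` call handles `s3://`, `gs://`, `azure://`, and plain local paths, so a target's write loop can stay cloud-agnostic. A minimal sketch, assuming a JSONL output format (the URI and the builtin-`open` fallback are illustrative; the fallback exists only so the snippet runs where `smart_open` isn't installed):

```python
import json

try:
    # pip install smart_open[s3] -- transparently streams to s3://, gs://, etc.
    from smart_open import open as uri_open
except ImportError:
    # Fall back to the builtin so the sketch still runs for local paths.
    uri_open = open

def write_jsonl(records, uri):
    """Serialize records as JSON Lines to any URI smart_open understands."""
    with uri_open(uri, "w") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")

# Hypothetical usage -- an S3 URI would work the same way:
# write_jsonl(rows, "s3://my-bucket/stream/batch-0001.jsonl")
write_jsonl([{"id": 1}, {"id": 2}], "/tmp/batch-0001.jsonl")
```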
andy_crowe
@pat_nadolny — what repo should this live in? personal repo to start? I've got a good chunk of foundational code for the `parquet` format (and a pattern for future formats) here — still lots of tweaking needed, but I think this will be functional for me at the moment. Feedback welcome 🙂
@aaronsteers — I saw your comment re: `smart_open`. I think we could use this as a foundation and build a multi-cloud, multi-format target with this pattern. I didn't quickly see how to save `parquet` with the `smart_open` library, but I'd be happy to refactor it.
aaronsteers
Hi, @andy_crowe - thanks for your comment. I don't know if either system can natively generate the `parquet` file format. `DuckDB` or `arrow` could in theory be used to generate the Parquet dataset, then `smart_open` or `PyFilesystem` could be used to upload/write the file bytes to the respective cloud. I've added a comment to this effect to my new issue proposal for a generic multi-cloud target. To be clear, there may be other challenges I've not foreseen. In total, the dev effort could be significant... but I do feel that it would be a valuable addition to our currently available targets if there's a path forward here.
pat_nadolny
> what repo should this live in? personal repo to start?
@andy_crowe awesome to hear you've made progress! It's up to you, really. Many people leave them in their own personal or organization's GitHub repo. But if you don't want it in your personal repo, we created MeltanoLabs for that exact reason (see this blog post for what we view as the connector ownership models): you'd have the option to have the repo live there, but you'd still be the primary maintainer. Or you could always wait and migrate it out of your personal namespace down the line. It's up to you!
@andy_crowe I created a few issues in your repo for some stuff I saw while I was testing it out. Are you still working on that target?
andy_crowe
Awesome, thank you @pat_nadolny! Yes, this is the primary target we use in production (parquet/json). However, I've been using/modifying the target in our private repository — I can get the latest pushed to GitHub, which I think will solve a couple of the issues you submitted.
GitHub is updated with latest
pat_nadolny
@andy_crowe thanks for the update! I'm able to get json working now, but I'm still having trouble with parquet; it's slightly different now, so I'll open an issue to describe it.