# singer-target-development
pat_nadolny
Discussion about working on target-s3 (all) 🧵
cc @andy_crowe
I know I mentioned that I was going to start on it, but I haven't made any real progress yet. Feel free to pick it up if you're interested!
I put it in the original issue, but it was brought up again today so it's top of mind: it might be helpful to leverage the `smart_open` package for this. Then the meat of the code is around writing to different file formats, which @aaronsteers mentioned might get pulled into the SDK eventually too! If that all works out, this could end up being a very slim (and high impact) target 🚀
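To illustrate why `smart_open` is appealing here: the same `open()` call handles `s3://`, `gs://`, `azure://`, and plain local paths, so a target's write loop can stay cloud-agnostic. A minimal sketch, assuming a JSONL output format (the URI and the builtin-`open` fallback are illustrative; the fallback exists only so the snippet runs where `smart_open` isn't installed):

```python
import json

try:
    # pip install smart_open[s3] -- transparently streams to s3://, gs://, etc.
    from smart_open import open as uri_open
except ImportError:
    # Fall back to the builtin so the sketch still runs for local paths.
    uri_open = open

def write_jsonl(records, uri):
    """Serialize records as JSON Lines to any URI smart_open understands."""
    with uri_open(uri, "w") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")

# Hypothetical usage -- an S3 URI would work the same way:
# write_jsonl(rows, "s3://my-bucket/stream/batch-0001.jsonl")
write_jsonl([{"id": 1}, {"id": 2}], "/tmp/batch-0001.jsonl")
```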
andy_crowe
@pat_nadolny — what repo should this live in? personal repo to start? I've got a good chunk of foundational code for the `parquet` format (and a pattern for future formats) here — still lots of tweaking needed, but I think this will be functional for me at the moment. Feedback welcome 🙂
@aaronsteers — I saw your comment re: `smart_open`. I think we could use this as a foundation and build a multi-cloud, multi-format target with this pattern. I didn't quickly see how to save `parquet` with the `smart_open` library, but I'd be happy to refactor it.
aaronsteers
Hi, @andy_crowe - thanks for your comment. I don't know if either system can natively generate the `parquet` file format. `DuckDB` or `arrow` could in theory be used to generate the Parquet dataset, then `smart_open` or `PyFilesystem` could be used to upload/write the file bytes to the respective cloud. I've added a comment to this effect to my new issue proposal for a generic multi-cloud target. To be clear, there may be other challenges I've not foreseen. In total, the dev effort could be significant... but I do feel that it would be a valuable addition to our currently available targets if there's a path forward here.
pat_nadolny
> what repo should this live in? personal repo to start?
@andy_crowe awesome to hear you've made progress! It's up to you, really. Many people leave them in their own personal or organization's GitHub repo. But if you don't want it in your personal repo, we created MeltanoLabs for that exact reason (see this blog post for what we view as the connector ownership models): you'd have the option to have the repo live there, but you'd still be the primary maintainer. Or you could always wait and migrate it out of your personal namespace down the line. It's up to you!
@andy_crowe I created a few issues in your repo for some stuff I saw while I was testing it out. Are you still working on that target?
andy_crowe
Awesome, thank you @pat_nadolny! Yes, this is the primary target we use in production (parquet/json). However, I've been using/modifying the target in our private repository — I can get the latest pushed to GitHub, which I think will solve a couple of the issues you submitted.
GitHub is updated with latest
pat_nadolny
@andy_crowe thanks for the update! I'm able to get json working now, but I'm still having trouble with parquet; it's slightly different now, so I'll open an issue to describe it.