# plugins-general
d
Hi all, I have just started looking at Meltano, not sure if this is the right approach. I have a number of applications/DBs, and I want to run some queries, then send the data off to some SFTP site. I can see the Meltano Postgres tap, but how do I get the resulting data to an SFTP target?
t
I believe you’d have to pick a target file type (say target-csv) and then pipe those files to the SFTP site. I don’t believe that’s something that can be done with meltano right now, but if you’d like to open an issue about it, it’s definitely a valid use case I’ve seen!
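For that "produce files, then push them" workflow, a minimal sketch using paramiko after a target-csv run might look like this. The host, credentials, and directories are placeholders, and the output folder depends on how target-csv is configured:

```python
# Sketch: push CSVs produced by a tap-postgres -> target-csv run to an SFTP site.
# Host, credentials, and directories are placeholders -- adjust to your environment.
from pathlib import Path

import paramiko

OUTPUT_DIR = Path("output")   # where target-csv wrote its files (assumption)
REMOTE_DIR = "/incoming"      # destination directory on the SFTP server

transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="me", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)
try:
    for csv_file in OUTPUT_DIR.glob("*.csv"):
        sftp.put(str(csv_file), f"{REMOTE_DIR}/{csv_file.name}")
finally:
    sftp.close()
    transport.close()
```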
j
Would it make sense to have an SFTP target? Even if it were just calling paramiko or a wrapper around it, this would help integrate this workflow. Does a similar target exist for a Snowflake internal stage or an Azure Blob external stage?
v
The Snowflake target uses staging to do things, I believe, but I don't use Snowflake myself, just zoomed through the source a few times: https://github.com/datamill-co/target-snowflake (specifically the S3 section). Files are always interesting because there's a format you want for the file (csv, json, xml, etc.), then basically a final `scp` or `aws` (etc.) command that you need to run to finish the job. The loader seems like the right place to take care of that, but it's almost like you'd want a library they all share to do the "sftp, ftp, s3, azure blob, etc." load, and each file-related target (target-csv, target-json, ...) would use that library to push the file at the end of the job (see the sketch below).
I can think of some other use cases as well, like proprietary applications that take a CSV as an input, maybe `bcp` from MSSQL, that you might also want to call. But then you're getting into the territory of just wanting to run arbitrary commands after the loader is finished. Plugins could do that as well 🤷
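A rough sketch of what that shared "push the file at the end of the job" library could look like, dispatching on the destination URL scheme. `push_file` and the scheme handling are made up for illustration, and credentials/hosts are placeholders:

```python
# Hypothetical shared helper a file target could call once its output is written.
# Dispatches on the destination URL scheme; only two schemes sketched here.
from urllib.parse import urlparse


def push_file(local_path: str, destination: str) -> None:
    """Upload local_path to destination, e.g. 'sftp://user:pass@host/incoming/out.csv'."""
    scheme = urlparse(destination).scheme
    if scheme == "sftp":
        _push_sftp(local_path, destination)
    elif scheme == "s3":
        _push_s3(local_path, destination)
    else:
        raise ValueError(f"Unsupported destination scheme: {scheme!r}")


def _push_sftp(local_path: str, destination: str) -> None:
    import paramiko

    url = urlparse(destination)
    transport = paramiko.Transport((url.hostname, url.port or 22))
    transport.connect(username=url.username, password=url.password)
    sftp = paramiko.SFTPClient.from_transport(transport)
    try:
        sftp.put(local_path, url.path)
    finally:
        sftp.close()
        transport.close()


def _push_s3(local_path: str, destination: str) -> None:
    import boto3

    url = urlparse(destination)
    # bucket comes from the netloc, key from the path (assumption about URL layout)
    boto3.client("s3").upload_file(local_path, url.netloc, url.path.lstrip("/"))
```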
I have a use case for this as well today. Seems like a target thing more than a meltano thing but thought I'd mention it
SFTP Target is what I need
e
Hey @visch, I was experimenting with file sources in arbitrary file systems with this thing: https://github.com/edgarrmondragon/tap-dbf/pull/1/files. Maybe something like that makes sense for targets too
v
Thanks @edgar_ramirez_mondragon, I'll dive in. You and I should chat some time; we seem to be into the same stuff right now!
https://www.pyfilesystem.org/page/index-of-filesystems/ curious stuff @edgar_ramirez_mondragon. For a tap, it's almost like any file-format tap would want to use some module / abstraction layer that sits on top of pyfilesystem. A tap would want this to run before processing, like a staging or prep step before processing the data as normal. And you might want to chain them, something like:
1. sftp://folderstructure
2. If .zip, then unzip
3. Send to regular tap
Another example I've seen on the target side is something like:
1. Create your files
2. Zip them up with some kind of encryption
3. Send them to an SFTP site
For a file target it's almost the same idea, except it's a post-processing step.
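For that staging/prep chain, a rough sketch with pyfilesystem2 (the `fs` package, plus `fs.sshfs` for the ssh:// opener) might look like this; the host, credentials, and folder names are placeholders:

```python
# Sketch of a pre-processing chain: pull zips off SFTP, unpack them locally,
# then hand the extracted files to the tap as ordinary local files.
# Requires `fs` (pyfilesystem2) and `fs.sshfs`; connection details are placeholders.
import fs
from fs.copy import copy_file
from fs.zipfs import ZipFS

# 1. sftp://folderstructure -- open the remote folder via the ssh:// opener
remote = fs.open_fs("ssh://user:secret@sftp.example.com/exports")

# 2. If .zip, then unzip -- copy each archive down and extract its members
local = fs.open_fs("./staging", create=True)
for path in remote.listdir("/"):
    if path.endswith(".zip"):
        copy_file(remote, path, local, path)
        with ZipFS(f"./staging/{path}") as archive:
            for member in archive.listdir("/"):
                copy_file(archive, member, local, member)

# 3. Send to regular tap -- the tap now just reads ./staging/* as local files
```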
I've seen this go both ways before, and I'm not sure which is better. First way: bake this into the tap / target (filesystem chaining, etc.). Second way: keep the tap/target stupid and put something in front of it that gives the tap/target a file on the local file system (or in memory), like
pullsftpdata | tap-dbase | target-snowflake
There's an argument for Ansible to step in, since these things can get complicated: there's an auth component that can change based on your system type (things like Kerberos, LDAP, Windows Auth, SSH, yada yada). https://github.com/johanmeiring/ansible-sftp https://github.com/melmorabity/ansible-ftp https://docs.ansible.com/ansible/latest/collections/ansible/windows/win_share_module.html
Reminds me a bit of Apache Camel as well (https://camel.apache.org/camel-quarkus/latest/reference/index.html). The space gets kinda deep, but there's no clear winner here for what works long term.
e
thanks for your feedback @visch!
> For a tap, it's almost like any file-format tap would want to use some module / abstraction layer that sits on top of pyfilesystem
that's what I was going for with that approach, making the filesystem pluggable via some pip extras and credentials baked into the file path like here for S3
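As a tiny illustration of that idea with pyfilesystem URLs (the actual extras and URL formats in tap-dbf may differ; `fs-s3fs` provides an s3:// opener and `fs.sshfs` an ssh:// opener, and the credentials below are fake):

```python
# Illustration only: credentials live in the file URL, so the tap needs no
# separate connection settings. Requires the optional `fs-s3fs` / `fs.sshfs`
# backends; bucket, host, and credentials below are placeholders.
import fs

s3 = fs.open_fs("s3://MY_KEY:MY_SECRET@my-bucket")
sftp = fs.open_fs("ssh://user:password@sftp.example.com/exports")

print(s3.listdir("/"))
print(sftp.listdir("/"))
```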
> Second way: keep the tap/target stupid and put something in front of it that gives the tap/target a file on the local file system (or in memory)
an orchestrator like airflow helps a lot with that approach, but you end up with bespoke dependencies for your file taps
> pullsftpdata | tap-dbase | target-snowflake
composable ftw! (and a relevant meltano issue). I've actually been playing around with a "merge" transform for multi-tenant sources here: https://github.com/edgarrmondragon/singer-playground/tree/main/merge_streams
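One way a "merge" step like that could work (this is just a guess at the shape, not the actual singer-playground implementation) is as a stdin/stdout filter that rewrites stream names so records from multiple tenant taps land in the same stream:

```python
# Hypothetical inline "merge" step: read Singer messages on stdin, strip a
# per-tenant prefix from the stream name, and emit the messages on stdout so a
# single target sees one merged stream. Not the actual singer-playground code.
import json
import sys

TENANT_PREFIXES = ("tenant_a__", "tenant_b__")  # example prefixes


def merged_name(stream: str) -> str:
    for prefix in TENANT_PREFIXES:
        if stream.startswith(prefix):
            return stream[len(prefix):]
    return stream


for line in sys.stdin:
    message = json.loads(line)
    if "stream" in message:  # SCHEMA / RECORD messages carry a stream name
        message["stream"] = merged_name(message["stream"])
    sys.stdout.write(json.dumps(message) + "\n")
```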
v
Hmm, I didn't dive too much into the merge stuff you wrote, due to time today. How does that compare with stream maps?
More comprehensive?
Thanks for the pointers by the way, looks like I'm going to implement fs-ssh for target-csv. PR in 30 min? We'll see how slow I am
Having all kinds of problems with fs.sshfs when using setup.py. No idea what I'm doing wrong @edgar_ramirez_mondragon (https://github.com/althonos/fs.sshfs/issues/49). Going to drop this for now; I've dumped way too much time into it.