Hi everyone, just checking: can Meltano help at all with moving files (e.g. between SFTP and S3) without parsing them, such as PDFs?
Henning Holgersen
03/29/2023, 2:19 PM
Not really (as far as I know). I would check out rclone for that. I recently started using it directly from Python via the python-rclone package, and it works great.
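For anyone reading along, here is a minimal sketch of driving rclone from Python. It shells out to the rclone CLI with `subprocess` rather than using the python-rclone package mentioned above, so it does not show that package's API. The remote names `sftp-remote` and `s3-remote` and the paths are hypothetical and would have to be set up beforehand with `rclone config`.

```python
import shutil
import subprocess
from typing import List, Optional


def build_rclone_copy(source: str, dest: str, log_file: Optional[str] = None) -> List[str]:
    """Build an `rclone copy` command; files are copied as-is, never parsed."""
    cmd = ["rclone", "copy", source, dest, "--log-level", "INFO"]
    if log_file:
        # Write rclone's own transfer log to a file for later inspection.
        cmd += ["--log-file", log_file]
    return cmd


def copy_files(source: str, dest: str, log_file: Optional[str] = None) -> int:
    """Run the copy and return rclone's exit code (0 means success)."""
    if shutil.which("rclone") is None:
        raise RuntimeError("rclone is not installed or not on PATH")
    return subprocess.run(build_rclone_copy(source, dest, log_file), check=False).returncode


if __name__ == "__main__":
    # Hypothetical remotes and paths, for illustration only.
    print(build_rclone_copy("sftp-remote:outbox", "s3-remote:my-bucket/inbox", "rclone.log"))
```

Keeping the command construction separate from execution makes it easy to check what would run before pointing it at real remotes.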
tiberiu
03/29/2023, 2:26 PM
Nice, thanks! I was hoping for an end-to-end way to schedule file exchanges and see the logs, but so far I haven't found anything.
Wrapping rclone in a Dagster pipeline will work, but it won't be as polished.
Henning Holgersen
03/29/2023, 2:36 PM
It might be possible to wrap rclone in a Meltano utility. I haven’t tried that because I run Meltano from inside Prefect anyway.
aaron_phethean
03/30/2023, 11:18 AM
The utility idea is a winner. We created a GDrive utility for this purpose, and discussed creating a ‘files’ utility built on smart_open (the same underlying library used by tap-spreadsheets-anywhere).
https://hub.meltano.com/utilities/gdrive/
tiberiu
03/30/2023, 8:18 PM
So what do you see in Meltano when you use this? Can you use it to log/audit the actual file transfers?
aaron_phethean
03/30/2023, 9:41 PM
In Meltano it’s a job running the utility, e.g. ‘meltano run utility-gdrive’, after which you might process the file as CSV or call another utility. We use it to download markdown and publish it into our app:
‘meltano run utility-gdrive custom:publish’
It strikes me that you could either use the Meltano job history as the audit trail (we created tap-meltano for that) or run a utility that records that the exchange completed.
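As a rough illustration of the second option, a small script run after the transfer could append one audit record per file. This is only a sketch under assumptions: the function name `record_transfer`, the `transfers.jsonl` file name, and the record fields are all hypothetical and not part of Meltano or of any existing utility.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def record_transfer(file_path: str, source: str, dest: str,
                    audit_log: str = "transfers.jsonl") -> dict:
    """Append one JSON line describing a completed file transfer."""
    path = Path(file_path)
    data = path.read_bytes()  # read as raw bytes; the file is never parsed
    entry = {
        "file": path.name,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "source": source,
        "dest": dest,
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }
    # Append mode keeps earlier records, so the file grows into an audit trail.
    with open(audit_log, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

A checksum plus a timestamp per file gives something to reconcile against the receiving side, which a job-level success or failure status alone would not.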