# announcements

millions-toddler-72102

03/26/2021, 3:15 AM
Hello, I’m not sure where the best place is to ask this question. Is it possible to set Meltano up to output to multiple targets, i.e. could I push to, say, Postgres AND S3 in parallel? If that’s not possible, is there a best-practice approach to something like Source -> S3 -> Postgres (the technologies are just examples)? i.e. would this be two Meltano pipelines: Source -> S3, and S3 -> Postgres?

The reason for the question is that I’m interested in capturing all the raw data as part of a data lake, while also making sure that data is ingested into an operational data store to power an app. I’m hoping to avoid ingesting from the source twice, because that may put extra pressure on the source system, and we may not always have the ability to run multiple extractors. Better if we can just do it once.
Is it just as simple as:
meltano elt <src> target-s3
meltano elt tap-s3 target-postgres
?
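To make it concrete, I’m picturing something along these lines. Plugin names are only guesses at what we’d actually install (e.g. the pipelinewise S3 CSV tap and target), and the --job_id values are just there to keep the two pipelines’ state separate:

# Sketch only: assumes tap-s3-csv, target-s3-csv, and target-postgres
# have been added to the Meltano project.

# Pipeline 1: source -> S3, retaining the raw files as the data lake
meltano elt <src> target-s3-csv --job_id=src-to-s3

# Pipeline 2: S3 -> Postgres, the operational store behind the app
meltano elt tap-s3-csv target-postgres --job_id=s3-to-postgres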

salmon-actor-23953

03/26/2021, 8:45 AM
Hi, @millions-toddler-72102. This is an area I’ve been focusing on for a little while as well. As of today, many of the “big data” targets like Redshift and Snowflake already land their data in S3 prior to ingesting it into the target DB. For those cases, an emerging pattern is to simply retain those files in S3 after the load is complete. This essentially builds out the data lake while also populating the target DB.

However, this is not (at least not yet) part of the Postgres target, and there are challenges you’d run into if you landed your data in S3 and then ingested it again into the downstream target. First, if you use CSV as the file type in S3, you’d lose the ability to confidently detect data types after landing the CSV files. A good solution to that challenge would be to land the S3 data in a type-aware format such as Parquet, but to my knowledge we don’t yet have a stable S3-Parquet target. Another option is that you could manually or programmatically create a catalog file for the S3 CSV files which matches your upstream data types. This should be possible, but in practice I haven’t seen it done before, and it would likely take some trial and error in JSON manipulation. Depending on how many sources you have, I don’t know how well this would scale.

Can you confirm whether Postgres is your intended target? If so, I think one attractive option might be to fork an existing repo to function like the Redshift/Snowflake targets, landing data in S3 and then ingesting with the Postgres extension function
aws_s3.table_import_from_s3()
documented here. (It’s also possible someone is already working on this or has built it under their own target-postgres fork which I’m not yet aware of.) Does this help at all?
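For reference, here’s a rough sketch of what that import call could look like once a file has landed in S3. The table, bucket, file path, region, and connection string below are all made up, and this assumes the aws_s3 extension is enabled on an RDS/Aurora Postgres instance:

psql "$TARGET_POSTGRES_URI" <<'SQL'
-- Load one landed CSV file from the data lake bucket into a raw table
SELECT aws_s3.table_import_from_s3(
  'raw.orders',                 -- destination table (hypothetical)
  '',                           -- column list ('' = all columns)
  '(FORMAT csv, HEADER true)',  -- COPY-style options
  aws_commons.create_s3_uri('my-data-lake', 'orders/2021-03-26.csv', 'us-east-1')
);
SQL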

salmon-salesclerk-77709

03/26/2021, 10:22 AM
We also have an issue tracking this: https://gitlab.com/meltano/meltano/-/issues/2626

millions-toddler-72102

03/29/2021, 2:46 AM
https://github.com/fixdauto/target-s3 I wonder if this connector is any good for Parquet.
👀 1
@salmon-actor-23953 That helps clarify the situation, thanks. I think for now we’ll just go direct into Postgres and try out some other S3 options at a later date. Postgres is definitely our priority.
👍 1

salmon-actor-23953

03/29/2021, 3:34 PM
Thanks for the update, @millions-toddler-72102 👍
I’ve not seen this target before but I’ll definitely check it out!