# singer-taps
m
Not sure if this is the best channel for this, but I’m looking for some general guidance. My team is currently using an RDBMS to store our website events. As you can imagine, it’s getting quite large. I want to move it to S3 buckets (on its way to our DW), and I’m looking for advice on file format. I would think that Parquet or ORC would be best, with JSON coming next, but I don’t see any taps that can handle (1) the format and (2) reading from S3. I’m new to a lot of this, so I’m guessing I’m missing something obvious. Thanks!
v
You're in luck
I've had a conversation going around tap-csv about how to deal with pulling data via different mechanisms (FTP, FTPS, SFTP, S3, etc.). Right now the easiest thing to do is to run a command before your tap runs: use the AWS CLI to pull your Parquet file down locally, then run your tap against it.
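For what it's worth, here is a rough sketch of that "fetch first, then run the tap" flow as a small Python wrapper. It assumes the AWS CLI is installed and already has credentials configured. The bucket, key, directory, tap executable, and config file names are all hypothetical placeholders, so swap in whatever you actually use.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_fetch_command(bucket: str, key: str, local_dir: str) -> list[str]:
    """Build the `aws s3 cp` command that copies one S3 object into local_dir."""
    local_path = Path(local_dir) / Path(key).name
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", str(local_path)]


def build_tap_command(tap_executable: str, config_path: str) -> list[str]:
    """Build the tap invocation. Singer taps conventionally accept --config."""
    return [tap_executable, "--config", config_path]


def fetch_then_run(bucket: str, key: str, local_dir: str,
                   tap_executable: str, config_path: str) -> None:
    """Download the file, then run the tap against the local copy."""
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the download fails,
    # so the tap never runs against a missing or stale file.
    subprocess.run(build_fetch_command(bucket, key, local_dir), check=True)
    # The tap's stdout is inherited, so this script can still be piped
    # into a target the same way the tap itself would be.
    subprocess.run(build_tap_command(tap_executable, config_path), check=True)


# Example call (hypothetical names, assumption only):
# fetch_then_run("my-bucket", "events/events.parquet", "data",
#                "tap-example", "config.json")
```

The tap's config would then point at the downloaded file under `data/`, and how that path is specified depends on the tap you end up using.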
c
2nded for parquet.
m
Oh cool. That totally makes sense. I just need it local when I run the tap. Perfect. Thanks!