# getting-started
f
Hello! I'm building an ELT pipeline for county-level data for every county in a given state. There are 200 counties, each county has different CSV file schemas and a different number of CSV files, and the data is then inserted into Postgres. I'm currently using `tap-spreadsheets-anywhere` and `target-postgres`. All in all, I may have 1,000 different CSVs to ingest. Would it make sense to create different county-level meltano.yml files, and/or different Meltano projects? Or is it possible to use a different config.json for each county and pass them into the `meltano run` commands? How do people handle ingesting large numbers of files of different types?
👀 2
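One common answer to the question above is a single project parameterized per county via Meltano environments. A hedged sketch of what that could look like in meltano.yml (the county names, paths, and table entries here are illustrative, not from the thread; check the tap's docs for the full `tables` entry format):

```yaml
# meltano.yml sketch: one project, one Meltano environment per
# county, each overriding the extractor's `tables` config.
environments:
  - name: county-a
    config:
      plugins:
        extractors:
          - name: tap-spreadsheets-anywhere
            config:
              tables:
                - name: county_a_parcels
                  path: file://./data/county-a
                  pattern: "parcels.*\\.csv"
  - name: county-b
    config:
      plugins:
        extractors:
          - name: tap-spreadsheets-anywhere
            config:
              tables:
                - name: county_b_parcels
                  path: file://./data/county-b
                  pattern: "parcels.*\\.csv"
```

A run would then select the county with `meltano --environment=county-a run tap-spreadsheets-anywhere target-postgres`.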
a
Well, I use Matatika for that 😉. Essentially you only need one meltano.yml, and you set environment variables to run your pipeline in 1,000 different ways. Spreadsheets-anywhere / CSV files do create a certain issue: in practice I have found that first-row discovery of the CSV file is unreliable and often filled with junk of some kind, so you end up having to also supply the CSV fields. I prefer this in code, so you may be better off creating a separate, included meltano.yml for each of those. Check out https://github.com/Matatika/matatika-ce Happy to chat too. Cheers.
👍 1
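The "separate included meltano.yml per county" idea maps to Meltano's `include_paths` feature. A hedged sketch, assuming a hypothetical `counties/` directory (file names and the table definition are illustrative):

```yaml
# meltano.yml sketch: pull per-county plugin config in from
# separate files rather than one giant project file.
include_paths:
  - ./counties/*.meltano.yml
```

Each included file would then carry that county's explicit table definitions, so the pipeline doesn't depend on first-row discovery:

```yaml
# counties/county-a.meltano.yml (hypothetical): one county's
# table config for tap-spreadsheets-anywhere.
plugins:
  extractors:
    - name: tap-spreadsheets-anywhere
      config:
        tables:
          - name: county_a_parcels
            path: file://./data/county-a
            pattern: "parcels.*\\.csv"
```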
v
I'd probably make a custom tap just for the data type and pull in CSVs via that method instead of trying to fit a general tap-csv into this scenario. A lot of the time you can get some nice schema additions this way.
👍 1
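For context on what a custom tap ultimately produces: a Singer tap writes JSON `SCHEMA` and `RECORD` messages to stdout, which the target consumes. A minimal, hedged sketch of that wire format (the stream name, columns, and `emit_csv_as_singer` helper are illustrative; a real tap would use the Singer SDK and typed schemas):

```python
import csv
import io
import json
import sys


def emit_csv_as_singer(stream, fh, schema, out=sys.stdout):
    """Write a SCHEMA message, then one RECORD message per CSV row."""
    out.write(json.dumps({"type": "SCHEMA", "stream": stream,
                          "schema": schema, "key_properties": []}) + "\n")
    for row in csv.DictReader(fh):
        out.write(json.dumps({"type": "RECORD", "stream": stream,
                              "record": row}) + "\n")


# Illustrative schema with an explicit field list, sidestepping
# unreliable first-row discovery.
schema = {"type": "object",
          "properties": {"parcel_id": {"type": "string"},
                         "acres": {"type": "string"}}}

demo = io.StringIO("parcel_id,acres\nA1,2.5\nA2,10\n")
emit_csv_as_singer("parcels", demo, schema)
```

The advantage over a generic CSV tap is that the schema lives in code, so per-county quirks become explicit, reviewable logic.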
e
I was exploring using Pkl to generate a Meltano project dynamically in my dogfood project, so maybe that could become useful for this sort of use case.
👍 1
f
Thanks for the help! I forked `tap-spreadsheets-anywhere` to make some adjustments to how it imports the column names, and I'm setting env vars to use in the meltano.yml file.
👍 2
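The column-name adjustments mentioned above usually amount to sanitizing raw CSV headers into database-safe identifiers. A hedged sketch of that kind of normalization (the rules here are illustrative, not the actual fork's behavior):

```python
import re


def sanitize_header(name: str) -> str:
    """Normalize a raw CSV header into a safe Postgres column name:
    collapse runs of non-alphanumerics to underscores, trim stray
    underscores, lowercase, and prefix names that start with a digit."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    return f"_{name}" if name[:1].isdigit() else name


print(sanitize_header("Parcel ID#"))   # -> parcel_id
print(sanitize_header("2023 Acres"))  # -> _2023_acres
```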