# getting-started
f
Hello! I'm building an ELT pipeline for county-level data for every county in a given state. There are 200 counties, each county has different CSV file schemas and a different number of CSV files, and the data is then inserted into Postgres. I'm currently using `tap-spreadsheets-anywhere` and `target-postgres`. All in all, I may have 1,000 different CSVs to ingest. Would it make sense to create different county-level meltano.yml files, and/or different Meltano projects? Or is it possible to use a different config.json for each county and pass them into the `meltano run` commands? How do people handle ingesting large numbers of files of different types?
👀 2
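One common answer to the question above is a single project parameterized per county via Meltano environments. A hedged sketch of what that could look like in meltano.yml (the county names, paths, and table entries here are illustrative, not from the thread; check the tap's docs for the full `tables` entry format):

```yaml
# meltano.yml sketch: one project, one Meltano environment per
# county, each overriding the extractor's `tables` config.
environments:
  - name: county-a
    config:
      plugins:
        extractors:
          - name: tap-spreadsheets-anywhere
            config:
              tables:
                - name: county_a_parcels
                  path: file://./data/county-a
                  pattern: "parcels.*\\.csv"
  - name: county-b
    config:
      plugins:
        extractors:
          - name: tap-spreadsheets-anywhere
            config:
              tables:
                - name: county_b_parcels
                  path: file://./data/county-b
                  pattern: "parcels.*\\.csv"
```

A run would then select the county with `meltano --environment=county-a run tap-spreadsheets-anywhere target-postgres`.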
a
Well, I use Matatika for that 😉. Essentially you only need one meltano.yml, and you set environment variables to run your pipeline in 1,000 different ways. Spreadsheets-anywhere / CSV files do create a certain issue: in practice I have found that first-row discovery of the CSV file is unreliable and often filled with junk of some kind, so you end up having to also supply the CSV fields. I prefer this in code, so you may be better off creating a separate, included meltano.yml for each of those. Check out https://github.com/Matatika/matatika-ce Happy to chat too. Cheers.
👍 1
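The "separate included meltano.yml per county" idea maps to Meltano's `include_paths` feature. A hedged sketch, assuming a hypothetical `counties/` directory (file names and the table definition are illustrative):

```yaml
# meltano.yml sketch: pull per-county plugin config in from
# separate files rather than one giant project file.
include_paths:
  - ./counties/*.meltano.yml
```

Each included file would then carry that county's explicit table definitions, so the pipeline doesn't depend on first-row discovery:

```yaml
# counties/county-a.meltano.yml (hypothetical): one county's
# table config for tap-spreadsheets-anywhere.
plugins:
  extractors:
    - name: tap-spreadsheets-anywhere
      config:
        tables:
          - name: county_a_parcels
            path: file://./data/county-a
            pattern: "parcels.*\\.csv"
```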
v
I'd probably make a custom tap just for the data type and pull in CSVs via that method instead of trying to fit a general tap-csv into this scenario. A lot of the time you can get some nice schema additions this way.
👍 1
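For context on what a custom tap ultimately produces: a Singer tap writes JSON `SCHEMA` and `RECORD` messages to stdout, which the target consumes. A minimal, hedged sketch of that wire format (the stream name, columns, and `emit_csv_as_singer` helper are illustrative; a real tap would use the Singer SDK and typed schemas):

```python
import csv
import io
import json
import sys


def emit_csv_as_singer(stream, fh, schema, out=sys.stdout):
    """Write a SCHEMA message, then one RECORD message per CSV row."""
    out.write(json.dumps({"type": "SCHEMA", "stream": stream,
                          "schema": schema, "key_properties": []}) + "\n")
    for row in csv.DictReader(fh):
        out.write(json.dumps({"type": "RECORD", "stream": stream,
                              "record": row}) + "\n")


# Illustrative schema with an explicit field list, sidestepping
# unreliable first-row discovery.
schema = {"type": "object",
          "properties": {"parcel_id": {"type": "string"},
                         "acres": {"type": "string"}}}

demo = io.StringIO("parcel_id,acres\nA1,2.5\nA2,10\n")
emit_csv_as_singer("parcels", demo, schema)
```

The advantage over a generic CSV tap is that the schema lives in code, so per-county quirks become explicit, reviewable logic.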
e
I was exploring using Pkl to generate a Meltano project dynamically in my dogfood project, so maybe that could become useful for this sort of use case.
👍 1
f
Thanks for the help! I forked `tap-spreadsheets-anywhere` to make some adjustments to how it imports the column names, and I'm setting env vars to use in the meltano.yml file.
👍 2
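The column-name adjustments mentioned above usually amount to sanitizing raw CSV headers into database-safe identifiers. A hedged sketch of that kind of normalization (the rules here are illustrative, not the actual fork's behavior):

```python
import re


def sanitize_header(name: str) -> str:
    """Normalize a raw CSV header into a safe Postgres column name:
    collapse runs of non-alphanumerics to underscores, trim stray
    underscores, lowercase, and prefix names that start with a digit."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    return f"_{name}" if name[:1].isdigit() else name


print(sanitize_header("Parcel ID#"))   # -> parcel_id
print(sanitize_header("2023 Acres"))  # -> _2023_acres
```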