CSV - best of both worlds (tap-csv & tap-spreadsheets_anywhere) Meltano #plugins-general


chrish

12/22/2022, 2:30 PM

CSV - best of both worlds (tap-csv & tap-spreadsheets_anywhere)

I have csv files with some fixed columns and some ephemeral columns, meaning that some columns exist in some csv files but not in others. However, I want to load all of the csv files into the same table in postgres.

target-postgres (transferwise), in conjunction with tap-csv, lets me do this today using the data_flattening_max_level config setting. Each time I extract a new csv, target-postgres adds any columns that are missing. Unfortunately, tap-csv does not seem to support state: if I add a new file to the path: folder, tap-csv extracts all files, not just the new one. State support is also called out as missing here: Implement State Capability

On the other hand, while tap-spreadsheets-anywhere properly supports state, the new columns are not automatically created for some reason. (Not sure why; it seems that capability is part of target-postgres, not the extractor...)

I'd like to see one extractor with both features, but I'm wondering which one would be better to improve. tap-csv seems like the best place, but there are some other rich capabilities in tap-spreadsheets-anywhere that others and I would certainly benefit from. Any thoughts on combining the features of spreadsheets-anywhere into tap-csv?
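
For anyone wondering what "state support" would mean for a folder of csv files, here is a minimal sketch of the idea: remember a bookmark (the newest file modification time seen so far) and only pick up files newer than it on the next run. This is an illustration only, not how tap-csv or tap-spreadsheets-anywhere actually implement it; select_new_files and the bookmark value are hypothetical names.

```python
from pathlib import Path


def select_new_files(folder, bookmark=0.0):
    """Return (csv files modified after `bookmark`, updated bookmark).

    `bookmark` is a POSIX timestamp saved from the previous run
    (hypothetical; a real tap would persist it in its state payload).
    """
    candidates = []
    for path in Path(folder).glob("*.csv"):
        mtime = path.stat().st_mtime
        if mtime > bookmark:
            candidates.append((mtime, path))

    # Oldest first, so the last entry carries the newest timestamp.
    candidates.sort()
    new_bookmark = candidates[-1][0] if candidates else bookmark
    return [path for _, path in candidates], new_bookmark
```

On the first run the bookmark defaults to 0.0, so every file is selected; feeding the returned bookmark back in on the next run skips anything already extracted.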

jacob_matson

12/22/2022, 8:19 PM

In my nba-monte-carlo project, I'm adding a mapper to get the latest file date so I can handle versioning downstream. While that won't handle state explicitly, it's the workaround I'm using, since not all sources support state (e.g. a csv file hosted on the web).

chrish

01/03/2023, 1:16 PM

I found that the new columns aren't being added because the catalog is cached. I can manually resolve the issue by deleting the .meltano\run\tap-csv\tap.properties.json file, and the new columns will be added. Or, I can add -e to the beginning of the pip_url setting for the extractor, which treats the extractor as editable. tap.py in meltano core says: "# If the extractor is installed as editable, don't cache because the results of discovery could change at any time." See the specific thread discussing this subject here: https://meltano.slack.com/archives/C01TCRBBJD7/p1672680739956629?thread_ts=1658986134.766319&cid=C01TCRBBJD7
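
The manual workaround could be scripted roughly like this, so the cached catalog is cleared before each run. It is only a sketch: the path is the one quoted in this message, clear_cached_catalog is a hypothetical helper, and it assumes the script is pointed at the Meltano project root.

```python
from pathlib import Path


def clear_cached_catalog(project_root, plugin_name):
    """Delete the cached catalog for one extractor.

    Returns True if a cache file was removed, False if none existed.
    The location mirrors the path quoted above and is an assumption
    about where this project keeps the file.
    """
    cache = (
        Path(project_root) / ".meltano" / "run" / plugin_name / "tap.properties.json"
    )
    if cache.is_file():
        cache.unlink()
        return True
    return False
```

For example, calling clear_cached_catalog(".", "tap-csv") from the project root before invoking the extractor should force discovery to run again, which is what picks up the new columns.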
