# plugins-general
c
CSV - best of both worlds (tap-csv & tap-spreadsheets_anywhere)

I have CSV files with some fixed columns and some ephemeral columns: some columns exist in some CSV files but not in others. However, I want to load all of the CSV files into the same table in Postgres.

`target-postgres` (transferwise), in conjunction with `tap-csv`, lets me do this today using the `data_flattening_max_level` config setting. Each time I extract a new CSV, target-postgres adds any columns that are missing. Unfortunately, tap-csv does not seem to support state: if I add a new file to the `path:` folder, tap-csv extracts all files, not just the new one. Missing state support is also called out in the issue "Implement State Capability".

On the other hand, while `tap-spreadsheets-anywhere` properly supports state, the new columns are not automatically created for some reason. (I'm not sure why; that capability seems to belong to target-postgres, not the extractor.)

I'd like to see one extractor with both features, but which one would be better to improve? tap-csv seems like the best place, but tap-spreadsheets-anywhere has some other rich capabilities that others and I would certainly benefit from. Any thoughts on bringing the features of tap-spreadsheets-anywhere into tap-csv?
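The state that tap-csv is missing essentially means remembering which files were already extracted. Here is a minimal Python sketch of that idea. It assumes a hypothetical state shape `{"last_mtime": ...}` that bookmarks the newest file modification time seen so far. This is not the actual state format of tap-csv or tap-spreadsheets-anywhere, and `select_new_files` is an illustrative helper name.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def select_new_files(folder: Path, state: dict) -> tuple[list[Path], dict]:
    """Return CSV files modified after the bookmark in `state`, plus the new state.

    Hypothetical state shape: {"last_mtime": <float>}. Files whose mtime equals
    the bookmark are treated as already extracted.
    """
    last_mtime = state.get("last_mtime", 0.0)
    new_files = sorted(
        (p for p in folder.glob("*.csv") if p.stat().st_mtime > last_mtime),
        key=lambda p: p.stat().st_mtime,
    )
    # Advance the bookmark to the newest file seen; keep the old one if nothing is new.
    newest = max((p.stat().st_mtime for p in new_files), default=last_mtime)
    return new_files, {"last_mtime": newest}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "a.csv").write_text("id,x\n1,2\n")
        os.utime(folder / "a.csv", (1000, 1000))
        files, state = select_new_files(folder, {})  # first run picks up a.csv
        print([f.name for f in files], state)

        (folder / "b.csv").write_text("id,y\n3,4\n")
        os.utime(folder / "b.csv", (2000, 2000))
        files, state = select_new_files(folder, state)  # second run: only b.csv
        print([f.name for f in files], state)
```

Bookmarking by modification time is only one possible design. Tracking the set of already-processed file names would also work and avoids clock and mtime quirks, at the cost of a state that grows with the folder.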
j
In my nba-monte-carlo project, I'm adding a mapper to get the latest file date so I can handle versioning downstream. While that won't handle state explicitly, it's the workaround I'm using, since not all sources support state (e.g. a CSV file hosted on the web).
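The file-date workaround might look roughly like the following in plain Python. Each row is tagged with its source file's modification date, and a downstream step then keeps only the rows from the newest file. This illustrates the idea only. It is not Meltano's stream-map API, and the `_source_file_date` column and the helper names are hypothetical.

```python
from __future__ import annotations

import csv
import io
from datetime import datetime, timezone


def tag_rows(rows, file_mtime: float, column: str = "_source_file_date"):
    """Yield each row with its source file's modification date (UTC, ISO 8601)."""
    # Fixed seconds precision keeps every stamp in one format, so they sort chronologically.
    stamp = datetime.fromtimestamp(file_mtime, tz=timezone.utc).isoformat(timespec="seconds")
    for row in rows:
        yield {**row, column: stamp}


def latest_version(rows, column: str = "_source_file_date") -> list[dict]:
    """Downstream step: keep only the rows that came from the newest file."""
    rows = list(rows)
    if not rows:
        return []
    newest = max(r[column] for r in rows)
    return [r for r in rows if r[column] == newest]


if __name__ == "__main__":
    old = tag_rows(csv.DictReader(io.StringIO("id,pts\n1,10\n")), 1000.0)
    new = tag_rows(csv.DictReader(io.StringIO("id,pts\n1,12\n")), 2000.0)
    print(latest_version([*old, *new]))
```

Because every load is kept and versions are resolved afterwards, this approach tolerates sources with no usable state, such as a CSV fetched from a URL.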
c
I found that new columns weren't being added because the catalog was being cached. I can resolve the issue manually by deleting the `.meltano\run\tap-csv\tap.properties.json` file, after which the new columns are added. Alternatively, I can add `-e` to the beginning of the extractor's `pip_url` setting, which installs the extractor as editable. In that case `tap.py` in Meltano core skips the cache, as its comment says: `# If the extractor is installed as editable, don't cache because the results of discovery could change at any time.` See the specific thread discussing this subject here: https://meltano.slack.com/archives/C01TCRBBJD7/p1672680739956629?thread_ts=1658986134.766319&cid=C01TCRBBJD7
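To check whether a cached catalog is stale, you could compare the columns it records against the union of headers across the CSV files. A rough sketch follows. It assumes the cached file is a standard Singer catalog with column names under `streams[].schema.properties`. The stream name `games`, the temporary folder layout, and the helper names are hypothetical.

```python
from __future__ import annotations

import csv
import json
import tempfile
from pathlib import Path


def cached_columns(catalog_path: Path, stream: str) -> set[str]:
    """Column names the cached Singer catalog records for `stream`."""
    catalog = json.loads(catalog_path.read_text())
    for entry in catalog.get("streams", []):
        if stream in (entry.get("tap_stream_id"), entry.get("stream")):
            return set(entry.get("schema", {}).get("properties", {}))
    return set()


def csv_columns(folder: Path) -> set[str]:
    """Union of the header rows of every CSV file in `folder`."""
    columns: set[str] = set()
    for path in folder.glob("*.csv"):
        with path.open(newline="") as f:
            columns.update(next(csv.reader(f), []))
    return columns


def stale_columns(catalog_path: Path, stream: str, folder: Path) -> set[str]:
    """Columns present in the CSV headers but missing from the cached catalog."""
    return csv_columns(folder) - cached_columns(catalog_path, stream)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        catalog = folder / "tap.properties.json"  # stand-in for the cached file
        catalog.write_text(json.dumps({"streams": [{
            "tap_stream_id": "games",  # hypothetical stream name
            "schema": {"properties": {"id": {}, "pts": {}}},
        }]}))
        (folder / "old.csv").write_text("id,pts\n1,10\n")
        (folder / "new.csv").write_text("id,pts,ast\n1,12,3\n")  # adds a column
        print(stale_columns(catalog, "games", folder))  # columns the cache lacks
```

A non-empty result suggests that the cached discovery predates the newer files, which matches the symptom described above.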