Are there any examples of a tap for “fixed-width” ...
# getting-started
m
Are there any examples of a tap for “fixed-width” data? Like, a text file where the lines are 100 characters long, and the first 10 characters are “account id”, the next four characters are “account type”, etc etc. We have an in-house NodeJS tool to parse this sort of file that I think would be relatively straightforward to port to a Meltano extractor, but if there’s an existing tap I could use that’d be even better.
p
Would something like tap-csv + dbt staging work?
m
ooh, I hadn’t thought about using dbt to parse the columns out but that’s a really interesting idea. Thanks Pat!
that wouldn’t be too different from what we’re doing for MongoDB data today, loading a JSON blob into one db column with meltano and then parsing the columns out of the JSON with dbt
e
In a past life I used to parse fixed width files with Pandas read_fwf, but a Singer tap would make sense
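For reference, a minimal sketch of that read_fwf approach. The sample rows, column positions, and field names are made up for illustration (a 10-character account id, a 4-character account type, and a hypothetical 6-character balance); real files would need their own layout. Passing dtype=str keeps leading zeros intact, and as far as I know read_fwf strips surrounding whitespace from each field by default.

```python
import io

import pandas as pd

# Hypothetical sample data: 10-char account id, 4-char account type,
# 6-char balance. The layout is an assumption for illustration only.
SAMPLE = (
    "0000012345CHK 001250\n"
    "0000067890SAV 000300\n"
)

# Half-open (start, end) character offsets for each column.
COLSPECS = [(0, 10), (10, 14), (14, 20)]
NAMES = ["account_id", "account_type", "balance"]

df = pd.read_fwf(
    io.StringIO(SAMPLE),
    colspecs=COLSPECS,
    names=NAMES,
    header=None,  # the file has no header row
    dtype=str,    # keep every field as text so leading zeros survive
)

print(df)
```

In a real pipeline the StringIO would be replaced by a file path, and the resulting rows could be emitted as records by whatever extractor wraps this.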
h
Yeah, this really brings me back too. Fond memories of the day I found read_fwf. The tap-csv+transform idea should work, but I would keep an eye on leading and trailing whitespace. Some libraries might try to be a little too helpful and strip it. The annoying part, of course, is that if we parse in the tap, we need the full field definitions as part of the tap config (I’m assuming as a separate file, to preserve our sanity). That would also make the tap fairly inflexible to schema changes (maybe those aren’t really an issue with the type of sources that generate fwf files…)
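To make the "field definitions in a separate file" idea concrete, here is a rough sketch using only the Python standard library. Everything in it is an assumption for illustration: the JSON layout, the field names and widths, and the parse_line helper are hypothetical and not part of any existing tap. The JSON is inlined as a string here, but in practice it would live in its own file referenced from the tap config. Whitespace is preserved unless stripping is requested explicitly.

```python
import json

# Hypothetical field definitions. In practice these would be loaded from a
# separate file referenced by the tap config; names and widths are made up.
FIELDS_JSON = """
[
  {"name": "account_id", "width": 10},
  {"name": "account_type", "width": 4},
  {"name": "balance", "width": 6}
]
"""


def parse_line(line, fields, strip=False):
    """Slice one fixed-width line into a dict, one entry per field."""
    line = line.rstrip("\r\n")  # drop only the line terminator
    expected = sum(f["width"] for f in fields)
    if len(line) != expected:
        # A length mismatch usually means the layout changed upstream.
        raise ValueError(f"expected {expected} characters, got {len(line)}")
    record, pos = {}, 0
    for f in fields:
        raw = line[pos:pos + f["width"]]
        # Padding is kept as-is unless the caller opts in to stripping.
        record[f["name"]] = raw.strip() if strip else raw
        pos += f["width"]
    return record


fields = json.loads(FIELDS_JSON)
record = parse_line("0000012345CHK 001250\n", fields)
print(record)
```

Failing loudly on a length mismatch is one way to surface a schema change early instead of silently shifting every column.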