Are there any examples of a tap for fixed-width data? (Meltano #getting-started)

Matt Menzenski

05/12/2023, 4:40 PM

Are there any examples of a tap for “fixed-width” data? Like, a text file where the lines are 100 characters long, and the first 10 characters are “account id”, the next four characters are “account type”, etc. We have an in-house NodeJS tool to parse this sort of file that I think would be relatively straightforward to port to a Meltano extractor, but if there’s an existing tap I could use that’d be even better.
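For reference, the slicing described above fits in a few lines of Python. This is a minimal sketch, assuming a hypothetical layout where only the two fields from the question are defined (account id in characters 1 to 10, account type in 11 to 14); FIELDS, RECORD_LENGTH and parse_line are illustrative names, not part of any existing tap.

```python
# Hypothetical layout: (field name, 0-based start offset, width).
# Only the two fields named in the question are defined; a real layout
# would account for all 100 characters.
FIELDS = [
    ("account_id", 0, 10),
    ("account_type", 10, 4),
]

RECORD_LENGTH = 100  # every line is expected to be exactly this long


def parse_line(line: str) -> dict[str, str]:
    """Slice one fixed-width line into a dict of raw string fields."""
    line = line.rstrip("\r\n")  # drop the line terminator only, not padding
    if len(line) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(line)}")
    # Values stay as untrimmed strings so leading zeros and padding survive;
    # any trimming or type casting is left to a later step.
    return {name: line[start:start + width] for name, start, width in FIELDS}
```

Keeping every value as an untouched string means the tap never guesses at types or whitespace, which leaves that decision to a downstream transform.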

pat_nadolny

05/12/2023, 4:46 PM

Would something like tap-csv + dbt staging work?

Matt Menzenski

05/12/2023, 4:47 PM

ooh, I hadn’t thought about using dbt to parse the columns out but that’s a really interesting idea. Thanks Pat!

Matt Menzenski

05/12/2023, 4:48 PM

that wouldn’t be too different from what we’re doing for MongoDB data today: loading a JSON blob into one database column with Meltano and then parsing the columns out of the JSON with dbt.

edgar_ramirez_mondragon

05/12/2023, 4:48 PM

In a past life I used to parse fixed-width files with Pandas read_fwf, but a Singer tap would make sense.
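The read_fwf route looks roughly like this. It is a minimal sketch using the same hypothetical two-field layout as the question (column names and sample data are made up). Passing dtype=str keeps IDs like 0000012345 as text instead of letting pandas convert them to integers. As far as I know, read_fwf also trims spaces and tabs around each field by default (its delimiter argument controls which filler characters are stripped), which is the kind of helpfulness worth checking for your data.

```python
import io

import pandas as pd

# Two sample lines in the hypothetical layout:
# a 10-character account id followed by a 4-character account type.
raw = io.StringIO(
    "0000012345CHKG\n"
    "0000067890SAVG\n"
)

df = pd.read_fwf(
    raw,
    widths=[10, 4],                        # contiguous field widths
    names=["account_id", "account_type"],
    header=None,                           # the file has no header row
    dtype=str,                             # keep "0000012345" as text, not 12345
)
print(df)
```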

Henning Holgersen

05/12/2023, 5:11 PM

Yeah, this really brings me back too. Fond memories of the day I found read_fwf. The tap-csv + transform idea should work, but I would keep an eye on leading and trailing whitespace: some libraries might try to be a little too helpful. The annoying part, of course, is that if we parse in the tap, we need the full field definitions as part of the tap config, I’m assuming as a separate file to preserve our sanity. But that would be fairly inflexible to schema changes (maybe those aren’t really an issue with the type of sources that generate fwf files…)
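Parsing in the tap with the field definitions kept in a separate file could look something like the sketch below. Everything here is an assumption for illustration: the spec format (name/start/width/strip), the function names load_spec, parse and emit, and the stream name. It writes raw Singer SCHEMA and RECORD messages as JSON lines rather than using the Meltano Singer SDK, and shortens the record to 14 characters. Whitespace trimming is opt-in per field, and the length checks make a changed layout fail loudly instead of silently shifting columns.

```python
import json
import sys
from typing import Iterable, TextIO

# Hypothetical contents of a separate field-definition file.
FIELD_SPEC_JSON = """
{
  "record_length": 14,
  "fields": [
    {"name": "account_id",   "start": 0,  "width": 10, "strip": false},
    {"name": "account_type", "start": 10, "width": 4,  "strip": true}
  ]
}
"""


def load_spec(text: str) -> dict:
    """Parse the spec and check that the fields tile the record exactly."""
    spec = json.loads(text)
    offset = 0
    for field in spec["fields"]:
        if field["start"] != offset:
            raise ValueError(f"gap or overlap before {field['name']!r}")
        offset += field["width"]
    if offset != spec["record_length"]:
        raise ValueError("field widths do not add up to record_length")
    return spec


def parse(line: str, spec: dict) -> dict:
    """Slice one line according to the spec; reject lines of the wrong length."""
    line = line.rstrip("\r\n")
    if len(line) != spec["record_length"]:
        raise ValueError(f"line has {len(line)} chars, expected {spec['record_length']}")
    record = {}
    for f in spec["fields"]:
        value = line[f["start"]:f["start"] + f["width"]]
        # Trimming happens only where the spec asks for it.
        record[f["name"]] = value.strip() if f["strip"] else value
    return record


def emit(lines: Iterable[str], spec: dict, stream: str, out: TextIO = sys.stdout) -> None:
    """Write one Singer SCHEMA message, then one RECORD message per line."""
    schema = {
        "type": "object",
        "properties": {f["name"]: {"type": "string"} for f in spec["fields"]},
    }
    out.write(json.dumps({"type": "SCHEMA", "stream": stream,
                          "schema": schema, "key_properties": []}) + "\n")
    for line in lines:
        out.write(json.dumps({"type": "RECORD", "stream": stream,
                              "record": parse(line, spec)}) + "\n")
```

For example, emit(["0000012345CHK \n", "0000067890SAVG\n"], load_spec(FIELD_SPEC_JSON), stream="accounts") prints a SCHEMA line followed by two RECORD lines, with the padding on "CHK " trimmed only because that field sets strip to true.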
