Has anyone here had issues using `tap-spreadsheets-anywhere`? Meltano #troubleshooting

ian_lewis

05/15/2023, 7:25 AM

Has anyone here had issues using `tap-spreadsheets-anywhere`? We have a case where spreadsheet data is provided to us in a non-standard way. Basically, before the actual data begins there are a number of rows that comprise titles, descriptions and other preamble. There are also blank lines. It seems `tap-spreadsheets-anywhere` chokes on blank lines and fails to process any further lines. Any experience or suggestions with this issue?

aaron_phethean

05/15/2023, 10:43 AM

Hi @ian_lewis I've found this kind of junk in the headers (multiple rows of headers, blank lines, comments, etc) to be pretty common too! Here is an example of how to use 'skip_initial' to skip those and supply your own header: https://github.com/Matatika/matatika-ce/blob/main/plugins/extractors/tap-govuk-weekly-road-fuel-prices--matatika.yml NOTE - this is on our fork, but provided you use the latest default variant with this change you'll be ok: https://github.com/ets/tap-spreadsheets-anywhere/commit/379173323ee14f48bc408fd15a35fe581a51e317
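
For readers without access to the linked file, here is a minimal sketch of what a table entry using `skip_initial` with a supplied header might look like. The field names are the ones quoted later in this thread. Everything else (path, name, pattern, start date, key properties and the number of skipped rows) is a hypothetical placeholder, and the option names should be checked against the README of the tap variant you have installed.

```python
import json

# Hypothetical table entry for tap-spreadsheets-anywhere.
# Only "field_names" comes from this thread; every other value is a placeholder.
config = {
    "tables": [
        {
            "path": "file:///tmp/reports",            # placeholder location
            "name": "weekly_fuel_prices",             # placeholder stream name
            "pattern": "weekly_fuel_prices.*\\.csv",  # placeholder regex
            "start_date": "2023-01-01T00:00:00Z",     # placeholder
            "key_properties": ["Date"],               # assumption
            "format": "csv",
            # Number of leading rows (titles, descriptions, original headers)
            # to skip before data begins. 3 is an arbitrary example value.
            "skip_initial": 3,
            # Header supplied by hand, since the file's own header is skipped.
            "field_names": [
                "Date",
                "ULSP_per_litre",
                "ULSD_per_litre",
                "ULSP_duty",
                "ULSD_duty",
                "ULSP_vat_pc",
                "ULSD_vat_pc",
            ],
        }
    ]
}

# Emit the JSON form that would go into the tap's settings.
print(json.dumps(config, indent=2))
```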

ian_lewis

05/15/2023, 10:48 AM

Thank you @aaron_phethean, we will take a look. Very much appreciated! It would be a huge help if the latest version of `tap-spreadsheets-anywhere` had a release; it would save using specific commits 🤷

aaron_phethean

05/15/2023, 11:02 AM

Agreed! Supported taps, with an automated upgrade bump is the dream. I think we are edging closer to this (the 'royal we' as in the whole singer / meltano ecosystem) - but it's hard to say whether this is viable as a business proposition. Would anyone pay a small fee for a tap with 12-months support, regular patch releases, and an upgrade option in meltano?

craig_astill

05/15/2023, 11:26 AM

I've been looking at this a bit deeper for @ian_lewis. The original `skip_initial` changes (https://github.com/ets/tap-spreadsheets-anywhere/pull/37) supported skipping over rows with data in them. To skip over blank rows, it looks like the skip needs to be pushed into the Excel `generator_wrapper` (https://github.com/ets/tap-spreadsheets-anywhere/blob/main/tap_spreadsheets_anywhere/excel_handler.py#L9-L32) before the `header_row` is populated. This avoids the `IndexError` raised when this function parses a blank row. I'm thinking of cleaning up my experiment and raising an issue + PR. Although I will check out your links @aaron_phethean to see if there is a cleaner way.
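
The idea in the message above can be sketched roughly as follows. This is not the tap's actual `generator_wrapper` and not the code from the eventual PR. It is a standalone illustration in which plain tuples stand in for worksheet rows, and `is_blank` and `rows_to_records` are hypothetical names. It shows blank rows being dropped before the header is captured, so a row is never indexed against a missing or empty header.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Sequence


def is_blank(row: Sequence[Any]) -> bool:
    """True when the row has no cells, or every cell is None or empty text."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def rows_to_records(
    reader: Iterable[Sequence[Any]], skip_initial: int = 0
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per data row, ignoring preamble and blank rows.

    Hypothetical sketch: `reader` is any iterable of row tuples.
    """
    header_row: Optional[List[str]] = None
    for index, row in enumerate(reader):
        # Skip the first `skip_initial` rows, and blank rows wherever they occur,
        # before any attempt is made to treat a row as the header.
        if index < skip_initial or is_blank(row):
            continue
        if header_row is None:
            # The first surviving row becomes the header.
            header_row = [str(cell).strip() for cell in row]
            continue
        # zip stops at the shorter sequence, so a short row cannot
        # index past the end of the header.
        yield dict(zip(header_row, row))


rows = [
    ("Report title",),        # preamble, covered by skip_initial=1
    (),                       # blank line before the header
    ("Date", "Price"),        # header
    (None, None),             # blank line inside the data
    ("2023-05-15", 1.45),     # data
]
records = list(rows_to_records(rows, skip_initial=1))
print(records)
```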

aaron_phethean

05/15/2023, 11:29 AM

Nice one @craig_astill - I think perhaps we circumvented that header blank row problem by supplying the field names. Hope that helps

"field_names":["Date","ULSP_per_litre","ULSD_per_litre","ULSP_duty","ULSD_duty","ULSP_vat_pc","ULSD_vat_pc"],

craig_astill

05/15/2023, 11:31 AM

`field_names` didn't help. The tap blows up during sampling of the file in the discovery phase, instead of later on when `field_names` are used.

aaron_phethean

05/15/2023, 11:31 AM

ah, worth a shot!

craig_astill

05/15/2023, 2:13 PM

Busy day, but finally raised: https://github.com/ets/tap-spreadsheets-anywhere/issues/52. Will try to knock up a test PR for people to look at.

craig_astill

05/16/2023, 3:24 PM

Rough PR raised: https://github.com/ets/tap-spreadsheets-anywhere/pull/55.

craig_astill

05/22/2023, 4:50 PM

I've been digging into https://github.com/ets/tap-spreadsheets-anywhere/pull/56 to figure out why `zipfile.ZipFile(file_handl)` blows up on an S3 sourced file. Any ideas?
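
The thread never records the cause of this failure. One possibility worth ruling out, offered purely as an assumption, is that the handle returned for the S3 object is a non-seekable stream: `zipfile.ZipFile` needs to seek to the end of the file to read the archive's central directory. A minimal sketch of the usual workaround is to buffer the stream into memory first. The name `open_zip_buffered` is hypothetical, and this approach reads the whole object into RAM, so it only suits files that fit comfortably in memory.

```python
import io
import zipfile


def open_zip_buffered(file_handle) -> zipfile.ZipFile:
    """Read a (possibly non-seekable) binary stream fully into memory,
    then open it as a zip archive from a seekable BytesIO buffer."""
    buffered = io.BytesIO(file_handle.read())
    return zipfile.ZipFile(buffered)
```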

craig_astill

05/22/2023, 4:52 PM

Also saw @Matt Menzenski was helpful, when digging into issues on other slack threads. (Hope you don't mind the ping).

Matt Menzenski

05/22/2023, 11:46 PM

I have been added as a maintainer of tap-spreadsheets-anywhere, but I haven’t personally used it for any binary files - only JSONL and CSV

craig_astill

05/23/2023, 7:38 AM

Ah, no worries, but thanks for replying.

peter_s

05/28/2023, 8:17 PM

@aaron_phethean: "Would anyone pay a small fee for a tap with 12-months support, regular patch releases, and an upgrade option in meltano?"

We’d definitely be interested.
