# plugins-general
l
Hi everyone! I'm just getting started, so I probably have a bunch of silly questions. I'm trying to do what seems like the most basic test possible: using
tap-spreadsheets-anywhere
to read an Excel file on my local filesystem and load it into Postgres. I've gone through the getting started page twice, once with
tap-csv
which worked fine, but now with the new tap, I'm not seeing anything being loaded. When I run
meltano --log-level=debug elt...
I see:
tap-spreadsheets-anywhere       | INFO Found 6 files.
tap-spreadsheets-anywhere       | INFO Wrote 0 records for stream "my_stream_name".
There are indeed 6 files in that folder, so that seems correct, but I'm seeing no errors or warnings, so I'm not sure where to look next. I went through the plugin docs, but nothing obvious there. I've tried all sorts of regex patterns including
".*"
which I'd assume would catch any file in there, but no luck. Any suggestions where to look?
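A quick way to rule out the pattern itself is to test it against the folder outside of the tap. The sketch below is only an illustration: `matching_files` is a hypothetical helper, and it assumes `re.search` semantics on the bare file name, which may not be how tap-spreadsheets-anywhere actually applies its pattern.

```python
import re
import tempfile
from pathlib import Path


def matching_files(folder, pattern):
    """Return the sorted names of files in `folder` that match `pattern`.

    Assumption: the pattern is applied with re.search to the file name only.
    The tap may match against a longer path instead.
    """
    regex = re.compile(pattern)
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and regex.search(p.name)
    )


if __name__ == "__main__":
    # Build a throwaway folder so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.xlsx", "b.xls", "notes.txt"):
            (Path(tmp) / name).touch()
        print(matching_files(tmp, ".*"))
        print(matching_files(tmp, r"\.xlsx$"))
```

If `".*"` lists every file here but the tap still writes 0 records, the pattern is probably not the culprit and the problem is more likely in how the files are located or parsed.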
d
And can you share your configuration from
meltano.yml
for good measure?
l
No, can't see anything like
Syncing file...
in the logs
in the meantime I added
tap-csv
and now I'm getting a confusing error:
ELT could not be completed: Cannot start extractor: Catalog discovery failed: invalid catalog: Expecting value: line 1 column 1 (char 0)
Let me see if I can go back to the previous config and see the initial problem
d
Hmm 😕 How about we jump on a quick Zoom call so I can help you debug this for a bit?
l
sure, yeah!
d
l
Thanks for the debugging session Douwe!
@eric_simmerman I'm told you're the maintainer of
tap-spreadsheets-anywhere
We just found a couple of issues in it:
• there's a bug in how paths are handled for local files, which results in files not being discovered properly
• it looks like xlrd cannot load xlsx files at all
I'm going to file issues on GitHub about these, and try to suggest fixes.
e
@laurent happy to take a look, so thanks for filing some issues. We do have a test in the project that demonstrates reading from a local xlsx file & that’s passing. So curious to hear more about what you’ve encountered there.
d
@eric_simmerman We were seeing https://github.com/ets/tap-spreadsheets-anywhere/blob/master/tap_spreadsheets_anywhere/excel_handler.py#L32 raise an "xlsx not supported" error. https://pypi.org/project/xlrd/ currently states:
This library will no longer read anything other than
.xls
files. For alternatives that read newer file formats, please see http://www.python-excel.org/.
That's for version 2.0.1. The description for 1.2.0 (https://pypi.org/project/xlrd/1.2.0/) stated:
Extract data from Excel spreadsheets (.xls and .xlsx, versions 2.0 onwards) on any platform.
This suggests that the tap should either pin version 1.2.0, or use another library for xlsx as suggested in http://www.python-excel.org/
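To make the distinction concrete, here is a minimal sketch of how a reader could be chosen per file, based on the xlrd notes quoted above. It is not the tap's code: `sniff_excel_format` and `suggested_reader` are hypothetical helpers, and they rely on the fact that `.xlsx` files are zip archives while legacy `.xls` files use the OLE2 container, so the two can be told apart from their first bytes.

```python
import os
import tempfile
import zipfile

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls container signature
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx is a zip archive starting with a local file header


def sniff_excel_format(path):
    """Guess 'xlsx', 'xls', or 'unknown' from the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    if head == OLE2_MAGIC:
        return "xls"
    return "unknown"


def suggested_reader(path):
    """Map the sniffed format to a library name.

    Assumption: xlrd 2.x for .xls only, openpyxl for .xlsx.
    """
    return {"xlsx": "openpyxl", "xls": "xlrd"}.get(sniff_excel_format(path), "none")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "book.xlsx")
        # Any non-empty zip archive is enough to exercise the sniffing logic.
        with zipfile.ZipFile(path, "w") as zf:
            zf.writestr("dummy.txt", "x")
        print(suggested_reader(path))
```

Sniffing the content rather than trusting the extension also catches files that were renamed, though a zip signature alone does not prove the archive is a valid workbook.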
e
Committed a quick fix just now to pin xlrd to 1.2.0
d
@eric_simmerman Great, thanks!
l
Great! thanks @eric_simmerman!
e
Through the power of open source, I just merged a PR from https://github.com/JulesHuisman that moved us off xlrd and onto openpyxl
d
Wow: https://github.com/ets/tap-spreadsheets-anywhere/pull/9 Sounds like someone else hit the exact same issue just when you did @laurent!
l
awesome!