Hi! It’s me again about `tap-csv` (<https://github...
# troubleshooting
d
Hi! It’s me again about
tap-csv
(https://github.com/MeltanoLabs/tap-csv) 🙂 Documentation states “_path: Local path to the file to be ingested. Note that this may be a directory, in which case all files in that directory and any of its subdirectories will be recursively processed_”, but in the code I see
os.listdir
function used to get files list from the path (so only top-folder files processed). Most probably my question is specificaly to tap’s maintainer @pat_nadolny, but everyone else with the same issue is welcome inside!
My use case is the following structure: Source name 1 • Source account 1 ◦ File 1 ◦ File 2 • Source account 2 ◦ File 3 ◦ File 4 Source name 2 • Source account 3 ◦ File 5 • Source account 4 ◦ File 6 I’ve made some experiments and in order to keep the previous behaviour the same way, I propose to introduce a new files’ option that is disabled by default. @pat_nadolny what do you think?
p
@Denis I. thanks for raising this. I would consider this a bug vs a new feature. The expectation was that it would be recursive but I agree the code doesnt do that. Can you open an issue for it?
Also for context I ported this tap logic over from the legacy version https://gitlab.com/meltano/tap-csv. I must have not completed this recursive logic properly because I see in the old version that it does recursively call itself https://gitlab.com/meltano/tap-csv/-/blob/master/tap_csv/__init__.py#L35. I cant remember if there was a specific challenge to doing it in the new SDK version but I'm sure we can figure it out
From a skim of your code screenshot that looks like it does what we want!
Personally since I view it as a bug I'd vote to not require a config setting to get the behavior. What do you think?
d
Seems like the recursive call https://gitlab.com/meltano/tap-csv/-/blob/master/tap_csv/__init__.py#L46 didn’t survive the porting process 🙂
I can only imagine envs that already use the tap. Assuming that it’s a rare case when a folder with csv files has subfolders with unrelated csv files, I’d vote for the simple bugfix with recursive call ressurection.
Maybe just implement the
os.walk
version, since it’s a bit cleaner than the recursion call?
message has been deleted
p
I can only imagine envs that already use the tap. Assuming that it’s a rare case when a folder with csv files has subfolders with unrelated csv files, I’d vote for the simple bugfix with recursive call ressurection.
I agree - it was documented that it was recursive so its more of a bug that a breaking change.
I agree os.walk is clearer
d
Agreed! I’ve tested the code locally with the case scenario I mentioned above. I can create a pull request, go?
p
yeah that would be awesome! I can review it
p
@Denis I. I added a test https://github.com/TyShkan/tap-csv/pull/1. If that looks good to you then we should be good to merge it in
d
@pat_nadolny done! Looks like we are good to go!
p