# singer-tap-development
Hi everyone! I'm working on a tap for dbase files using the SDK. I'm PoCing a way to make it agnostic to the filesystem, so it can read files from the OS, S3, Google Drive, etc. That part is built on top of Will McGugan's PyFilesystem. I'm able to temporarily patch the `open` function so the `dbfread` package opens files with pyfilesystem2's methods. You can see that here: https://github.com/edgarrmondragon/tap-dbf/blob/patch-fs-open/tap_dbf/tap.py#L91.

The only issue is that `dbfread.DBF` can load records in two ways: read everything into memory during instantiation, or lazily iterate from the file. The first mode is not ideal for large files, but it works well with the patch since nothing else is trying to read files in the same context. The second one fails with `fs.errors.ResourceNotFound: resource '/etc/timezone' not found`. That is because the patch on `open` takes effect when the first record is read and stays active while records are iterated, and `pendulum.now()` in the SDK is also trying to read from the filesystem in the meantime.

This is not a bug in the SDK per se, but `pendulum.now(tz="UTC")` would fix the issue and afaik not break anything, since `singer.RecordMessage` converts `time_extracted` to UTC anyway. So, do you think this thing merits an issue? An MR? Or that I shouldn't try this patching witchery 😅?
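To make the failure mode concrete, here is a minimal sketch that reproduces the same shape of problem using only the standard library. Everything in it is a stand-in and an assumption, not the real code from the tap: `FAKE_FS` plays the role of a PyFilesystem object, `ResourceNotFound` mimics `fs.errors.ResourceNotFound`, `read_lazily` mimics lazy iteration in `dbfread.DBF`, and `read_system_file` mimics whatever `pendulum.now()` does when it looks up the local timezone.

```python
import builtins
import io
from contextlib import contextmanager
from unittest.mock import patch

# Hypothetical in-memory "remote" filesystem (stand-in for a PyFilesystem object).
FAKE_FS = {"/data/records.txt": b"a\nb\nc\n"}


class ResourceNotFound(Exception):
    """Stand-in for fs.errors.ResourceNotFound."""


def fs_open(path, mode="r", *args, **kwargs):
    """Replacement for open() that only knows about FAKE_FS."""
    try:
        data = FAKE_FS[path]
    except KeyError:
        raise ResourceNotFound(f"resource {path!r} not found") from None
    return io.BytesIO(data) if "b" in mode else io.StringIO(data.decode())


@contextmanager
def patched_open():
    """Temporarily replace builtins.open for every caller in the process."""
    with patch.object(builtins, "open", fs_open):
        yield


def read_eagerly(path):
    """Load everything up front: the patch is gone before anyone else runs."""
    with patched_open():
        with open(path) as f:
            return [line.strip() for line in f]


def read_lazily(path):
    """Yield records one by one: the patch stays active between yields."""
    with patched_open():
        with open(path) as f:
            for line in f:
                yield line.strip()


def read_system_file():
    """Stand-in for library code that reads a local file such as /etc/timezone."""
    with open("/etc/timezone") as f:
        return f.read()


if __name__ == "__main__":
    print(read_eagerly("/data/records.txt"))

    records = read_lazily("/data/records.txt")
    print(next(records))  # the patch is installed on the first record
    try:
        read_system_file()  # runs while the generator is suspended
    except ResourceNotFound as exc:
        print(f"fails while iterating: {exc}")
    finally:
        records.close()  # closing the generator restores the real open()
```

The eager reader never leaks the patch because the `with` block exits before it returns. The lazy reader suspends inside the `with` block at each `yield`, so any unrelated code that calls `open` between records gets the patched version, which is why a call that avoids the local timezone lookup sidesteps the error.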