# singer-tap-development
e
Hi everyone! I'm working on a tap for dBase files using the SDK. I'm building a PoC to make it filesystem-agnostic, so it can read files from the OS, S3, Google Drive, etc. That part is built on top of Will McGugan's PyFilesystem. I'm able to temporarily patch the `open` function so that the `dbfread` package opens files with pyfilesystem2's methods. You can see that here: https://github.com/edgarrmondragon/tap-dbf/blob/patch-fs-open/tap_dbf/tap.py#L91. The only issue is that `dbfread.DBF` can load records in two ways: read everything into memory during instantiation, or lazily iterate from the file. The first mode is not ideal for large files, but it works well with the patch since nothing else is trying to read files in the same context. The second one fails with `fs.errors.ResourceNotFound: resource '/etc/timezone' not found`. That is because I'm patching `open` when the first record is read, but `pendulum.now()` in the SDK is also trying to read from the filesystem. This is not a bug in the SDK per se, but `pendulum.now(tz="UTC")` would fix the issue and, afaik, not break anything, since `singer.RecordMessage` converts `time_extracted` to UTC anyway. So, do you think this merits an issue? An MR? Or that I shouldn't try this patching witchery 😅?
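For anyone following along, here is a rough, stdlib-only sketch of the clash described above, plus one possible way to narrow the patch. Everything in it is a hypothetical stand-in: `reader` plays the role of a library that calls the builtin `open`, `virtual_open` plays the role of a PyFilesystem-backed opener, and `unrelated_read` plays the role of other code (such as a timezone lookup) that reads a real OS file. It assumes the library resolves `open` through its own module globals, which would need to be checked for `dbfread`; it is not the code in the linked branch.

```python
import builtins
import io
import os
import tempfile
import types
from unittest import mock

# Hypothetical stand-in for a library module that calls the builtin open().
reader = types.ModuleType("reader")
exec(
    "def read_first_line(path):\n"
    "    with open(path) as f:\n"
    "        return f.readline()\n",
    reader.__dict__,
)


def virtual_open(path, *args, **kwargs):
    """Pretend virtual filesystem that only knows about one file."""
    files = {"/data/table.dbf": "record-1\n"}
    if path not in files:
        raise FileNotFoundError(path)
    return io.StringIO(files[path])


def unrelated_read(path):
    """Stand-in for unrelated code that needs the real OS filesystem."""
    with open(path) as f:
        return f.read()


# A real file on disk, playing the role of something like /etc/timezone.
fd, real_path = tempfile.mkstemp()
os.write(fd, b"UTC\n")
os.close(fd)

# 1) Patching builtins.open affects every caller while the patch is active.
with mock.patch.object(builtins, "open", virtual_open):
    try:
        unrelated_read(real_path)
        global_patch_broke_unrelated = False
    except FileNotFoundError:
        global_patch_broke_unrelated = True

# 2) Patching the name only inside the library's module leaves others alone.
with mock.patch.object(reader, "open", virtual_open, create=True):
    scoped_line = reader.read_first_line("/data/table.dbf")
    scoped_unrelated = unrelated_read(real_path)

os.remove(real_path)
```

With the module-scoped patch, the lazy read and the unrelated read can coexist in the same context, which is the situation the lazy-iteration mode runs into. Whether this is preferable to passing `tz="UTC"` on the SDK side is a separate question; the two are not mutually exclusive.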
a
Hi, @edgar_ramirez_mondragon! Thanks for sharing what you’re working on - it’s fun to hear about new applications and sources. And to your question, yes, an issue and/or MR would be welcome. Pendulum was chosen because it (overall) was seen to provide a more stable datetime conversion experience. That said, the pendulum details are intentionally not meant to be exposed to developers, exactly for the reason that we want to be able to refactor and continually improve compatibility in the backend without breaking any interfaces. It sounds like what you describe might help others as well. Happy to take on an issue or MR if you have time to contribute this. Thanks!
e
Missed this! I can certainly contribute an MR.