# troubleshooting
i
Hi folks, is there a loader that will load Parquet files to S3 and is not susceptible to the jsonschema multipleOf issue caused by poor handling of Decimal/float data? It is mentioned here: https://gitlab.com/meltano/target-csv/-/issues/3
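For anyone following along, here is a minimal sketch of why a multipleOf check can fail on float data. It is not jsonschema's actual implementation; the function names `is_multiple_float` and `is_multiple_decimal` are hypothetical and only illustrate the difference between binary float arithmetic and Decimal arithmetic.

```python
from decimal import Decimal


def is_multiple_float(value: float, multiple_of: float) -> bool:
    """Naive check using binary floating-point arithmetic (hypothetical helper)."""
    # 0.3 and 0.1 have no exact binary representation, so the remainder
    # is a tiny non-zero number instead of 0.
    return value % multiple_of == 0


def is_multiple_decimal(value, multiple_of) -> bool:
    """Same check using Decimal arithmetic (hypothetical helper)."""
    # Going through str() yields Decimal("0.3") rather than the exact
    # binary approximation that Decimal(0.3) would capture.
    return Decimal(str(value)) % Decimal(str(multiple_of)) == 0


if __name__ == "__main__":
    print(is_multiple_float(0.3, 0.1))    # float remainder is not exactly zero
    print(is_multiple_decimal(0.3, 0.1))  # decimal remainder is exactly zero
```

A value that a person would call a clean multiple can therefore be rejected by a float-based check, which matches the symptom of a pipeline breaking once more tables with decimal columns are included.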
u
Do you mind clarifying your question? You mentioned loading Parquet files to S3, but the link is for target-csv, and it's from a legacy GitLab variant that's no longer listed on the Meltano Hub page: https://hub.meltano.com/loaders/target-csv/
j
I'm pretty sure @ian_lewis is referring to this open issue for target-s3. I did some digging on this, though, and it seems there is an open issue for the jsonschema problem in the SDK's GitHub repo.
I'm wondering, is there a way I could fix this in the target?
i
Correct @john_kaustinen, that is what I am referring to. Apologies for the lack of clarity, @pat_nadolny. Although the issue I referred to is old, it still persists and makes targets that use jsonschema to validate floating-point data almost unusable. I ran into it today after some small-scale testing with a few tables, where I encountered no problems. When I widened the scope and included more tables from the source schema, my pipeline broke because of the jsonschema float validation problem. There doesn't seem to be a simple way to circumvent the issue, and I really need the output in Parquet format.
u
We’d have to dig into this deeper, especially if it’s affecting everything using the SDK, but for now you could try casting the problem column to a string using https://docs.meltano.com/guide/mappers, or, if you don’t need that column, you could deselect it from the stream altogether. It might be tedious if you have lots of them, though. Just an idea.
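To make the cast-to-string idea concrete, here is a rough Python sketch of what such a mapping does to a record and its schema before validation. This is not the mapper plugin's API; `cast_columns_to_str` and `cast_schema_to_str` are hypothetical helpers, and the sample stream with a `price` column is an assumption for illustration only.

```python
from decimal import Decimal


def cast_columns_to_str(record: dict, columns: set) -> dict:
    """Return a copy of the record with the named columns cast to str."""
    # None is left alone so nullable columns stay null.
    return {
        key: (str(value) if key in columns and value is not None else value)
        for key, value in record.items()
    }


def cast_schema_to_str(schema: dict, columns: set) -> dict:
    """Return a copy of the schema where the named columns are nullable strings."""
    properties = dict(schema.get("properties", {}))
    for name in columns:
        if name in properties:
            # Dropping the numeric definition also drops its multipleOf constraint.
            properties[name] = {"type": ["string", "null"]}
    return {**schema, "properties": properties}


if __name__ == "__main__":
    schema = {
        "properties": {
            "id": {"type": "integer"},
            "price": {"type": ["number", "null"], "multipleOf": 0.01},
        }
    }
    record = {"id": 1, "price": Decimal("19.99")}
    print(cast_schema_to_str(schema, {"price"}))
    print(cast_columns_to_str(record, {"price"}))
```

The trade-off is that the column lands in the Parquet output as a string, so it would need to be cast back to a numeric type downstream.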
i
Thank you! For me this is quite a big issue, as it affects my long-term choices around using Meltano. And I really enjoy using Meltano.
u
@ian_lewis after re-reading the SDK issue, this should be a pretty minor bug to fix given @edgar_ramirez_mondragon’s suggestion in https://github.com/meltano/sdk/issues/344. It would probably just need to be tested, but it should solve the problem for all SDK plugins.
i
Hi @pat_nadolny thank you for digging into this. Looks like @edgar_ramirez_mondragon is describing the fix I (we all) need.