# best-practices
p
What about sending over large CLOB data? Can Meltano/Singer do this? I have some cases of large documents stored in CLOB columns in a database, where fields may be many MB; I saw 40 MB in one case, but I do not know the upper limit. How does this translate to Meltano and Singer limitations? Will I hit some hard-coded limits? As I understand it, all data is sent as JSON. I don't think the JSON specification itself has any size limit, but I imagine there may be practical issues with how a particular Python implementation handles large data, including buffering and similar concerns. Please share best practices, and if you know of practical limitations, please share them.

Scenario: I am copying from tap-oracle to target-oracle. I see that target-oracle does not handle CLOBs correctly: it translates them to VARCHAR. I saw the client is hard-coded to use the thin client. I am thinking of following advice from @Edgar Ramírez (Arch.dev) and forking it to use the thick client rather than the thin one, hoping this resolves the issue. But I assume I might hit various other issues down the road. Maybe the translation from CLOB to VARCHAR is not a mistake, but a way to work around some other problem? Before going too far and hitting a wall, I want to check what is awaiting me there. Any feedback is more than welcome.
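For context, by "thick client" I mean something like the sketch below with python-oracledb (this is just an illustration, not the actual connector code; the DSN, credentials, and table name are placeholders):

```python
# Minimal sketch of the thin-vs-thick switch in python-oracledb:
# the driver runs in thin mode by default, and a single
# init_oracle_client() call made before connecting switches the
# process to thick mode (which requires Oracle Client libraries).
import oracledb

DSN = "dbhost.example.com/orclpdb1"  # placeholder connection string

oracledb.init_oracle_client()  # enable thick mode; omit to stay in thin mode

with oracledb.connect(user="scott", password="tiger", dsn=DSN) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM documents")  # hypothetical table
        print(cur.fetchone())
```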
e
I think Postgres' BYTEA is equivalent to Oracle's CLOB? If so, target-postgres decides to create a BYTEA column for a field based on the presence of
"contentEncoding": "base16"
in the schema. https://github.com/MeltanoLabs/target-postgres/blob/1ffe556b4bfa5369646c43e322c1f46ef60634d9/target_postgres/connector.py#L62-L67 If that's implemented in target-oracle, then it'd be a matter of adding that annotation with the schema extra, or, even better, updating tap-oracle to emit it when discovering the database catalog.
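For illustration, the annotation sits on the property inside the SCHEMA message the tap emits (stream and column names below are made up; in Meltano you'd typically supply this via the extractor's `schema` extra rather than editing the message by hand):

```python
# Hedged sketch of a Singer SCHEMA message whose property carries
# "contentEncoding": "base16", so a target that understands the
# annotation can pick a binary/LOB column type instead of VARCHAR.
schema_message = {
    "type": "SCHEMA",
    "stream": "documents",            # hypothetical stream name
    "key_properties": ["id"],
    "schema": {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "payload": {              # hypothetical LOB column
                "type": ["string", "null"],
                "contentEncoding": "base16",
            },
        },
    },
}
```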
p
No, I think that's equivalent to BLOB. CLOBs are instead basically very large strings; for instance, a large UTF-8 encoded document. Meanwhile, I have done the fork I was talking about and I am now testing. Smaller CLOBs are being copied fine, but the table with larger CLOBs (some of them a few MB) takes forever. The first 10,000 rows took 1 hour... still waiting for the test to complete. At least no errors seen yet, but I need to somehow optimize this. I will shoot a separate message, as this is now about performance.
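One thing I may try (untested, and assuming the tap uses python-oracledb under the hood) is telling the driver to fetch CLOBs directly as strings rather than as LOB locators, which by default cost an extra round trip per value:

```python
# Untested idea for the slow CLOB fetch: with fetch_lobs = False,
# python-oracledb returns CLOBs as str and BLOBs as bytes directly,
# avoiding the per-row LOB locator round trips.
import oracledb

oracledb.defaults.fetch_lobs = False  # applies to new connections

conn = oracledb.connect(user="scott", password="tiger",
                        dsn="dbhost.example.com/orclpdb1")  # placeholder DSN
with conn.cursor() as cur:
    cur.execute("SELECT doc_clob FROM documents")  # hypothetical table/column
    for (doc,) in cur:
        ...  # doc is a plain str here instead of an oracledb.LOB
```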