prashant_shukla
08/14/2023, 11:43 AM
/queries/run/json?limit=100&apply_formatting=false&apply_vis=false&cache=false&force_production=true&server_table_calcs=false
• Query Response Time: < 2.5 seconds
• REST API Method: POST
--data { ... }
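To reproduce the request outside the tap, the query string above can be assembled programmatically. This is a minimal sketch: `BASE_URL` is a hypothetical placeholder (the real host and path prefix are not shown in this message), `build_url` is an illustrative name, and the request body is left out because it is elided above.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real host and path prefix are not given above.
BASE_URL = "https://api.example.invalid"

# Query parameters exactly as they appear in the request above.
PARAMS = {
    "limit": 100,
    "apply_formatting": "false",
    "apply_vis": "false",
    "cache": "false",
    "force_production": "true",
    "server_table_calcs": "false",
}


def build_url(base_url: str = BASE_URL, params: dict = PARAMS) -> str:
    """Join the base URL, the endpoint path, and the encoded query string."""
    return f"{base_url}/queries/run/json?{urlencode(params)}"


print(build_url())
```

Timing a direct POST to this URL, separately from a full `meltano elt` run, would show how much of the total time is spent outside the API call.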
Result Summary:
• Authentication: Successful
• Discovery: Successful (command: meltano invoke tap-custom --discover) with dynamic schema
• Output: Successful (command: meltano elt tap-custom target-jsonl)
• Time Taken for Completion: Approximately 10 minutes
• Response Status: OK (HTTP 200)
• Response JSON Size: 167 KB
• Number of Records: 100
Issue:
• The main concern is the long completion time: around 50 minutes to process 500 records. This is significantly longer than expected, especially compared with other taps that handle similar data sizes much more efficiently.
• After careful verification of the code using various debugging techniques, I've identified a few points:
◦ The total completion time is consistently high.
◦ The response status, the contents of Response.json, and the catalog all seem to be in order. Similarly, the final output when using target-jsonl appears to be correct.
◦ Increasing the buffer size hasn't led to any noticeable improvement.
◦ Interestingly, I've observed that the tap's performance remains consistent both in a containerised environment and when running locally.
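For reference, the timings quoted in this message work out to the same per-record cost in both runs, which suggests the time grows with the record count rather than coming from a fixed startup cost. A minimal sketch of that arithmetic (the function name is illustrative; the figures are the ones reported above):

```python
def seconds_per_record(minutes: float, records: int) -> float:
    """Average wall-clock seconds spent per record for one run."""
    return minutes * 60 / records


# 100-record run: approximately 10 minutes end to end.
small_run = seconds_per_record(10, 100)
# 500-record run: around 50 minutes end to end.
large_run = seconds_per_record(50, 500)
# Upper bound for the API call itself: under 2.5 seconds for 100 records.
api_only = 2.5 / 100

print(small_run, large_run, api_only)  # 6.0 6.0 0.025
```

At roughly 6 seconds per record against at most 0.025 seconds per record for the API response, nearly all of the time appears to be spent after the response has been received.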
Expectation:
Our goal is to optimize the ETL completion time for pulling data, particularly when dealing with larger data sizes in megabytes.
Sample Records JSON:
[ { "history.created_time": "YYYY-MM-DD hh:mm:ss", "user.id": 9999, "user.name": "RAMBO CHARLES", "user.email": "rcharles@example.com", "sql_text.text": null } // ... other records ... ]
I appreciate your attention to this matter and any insights or assistance you can provide in resolving this performance concern. If there's any additional information you require, please don't hesitate to reach out.
Thank you!
visch
08/14/2023, 1:00 PM
visch
08/14/2023, 1:01 PM
visch
08/14/2023, 1:02 PM