# troubleshooting
d
Hi All, I was running a benchmark in Meltano and wanted to chunk the API response into 1024-byte pieces (using the tap-rest-api-msdk & target-s3 plugins). What I am trying to achieve: even though the API produces a huge number of rows (>10k), I want to write the API response to S3 in a streaming fashion (smaller data volumes at a time) and release the memory as I go. That should help us reduce memory consumption. It will result in a longer-running job, but that's acceptable.
In the stream class, I have set stream = True:

    def requests_session(self) -> requests.Session:
        # Reuse a single session and enable streaming so the response body
        # is downloaded lazily instead of being read into memory all at once.
        if not self._requests_session or not self._requests_session.stream:
            self._requests_session = requests.Session()
            self._requests_session.stream = True
        return self._requests_session


Reading the chunks from the response using:
for chunk in response.iter_content(chunk_size=1024):
    # yield records parsed from the JSON in this chunk
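Roughly what I'm picturing end to end (just a sketch on my side: it assumes the API returns newline-delimited JSON with one record per line, and `stream_records` is a made-up helper name, not the tap's actual code):

```python
import json
import requests


def stream_records(url: str):
    """Sketch: read the body in 1 KiB chunks and yield one record per complete JSON line."""
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        buffer = b""
        for chunk in response.iter_content(chunk_size=1024):
            buffer += chunk
            # A record can be split across chunk boundaries, so only parse complete lines.
            while b"\n" in buffer:
                line, buffer = buffer.split(b"\n", 1)
                if line.strip():
                    yield json.loads(line)
        if buffer.strip():
            yield json.loads(buffer)  # trailing record with no final newline
```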
Do you have any reference for how to do this? Is it possible to achieve the scenario described above?
e
Are you using the crowemi variant of target-s3?
d
@Edgar Ramírez (Arch.dev) yes, I am using the crowemi variant. Do I need to use any other variant to achieve this?
e
It's probably fine. The target has a `max_batch_size` setting which does something like what you need. I opened a PR to document some missing settings: https://github.com/meltano/hub/pull/1724
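For reference, setting it in meltano.yml would look roughly like this (a sketch only; double-check the setting's exact meaning and default against the target's docs):

```yaml
plugins:
  loaders:
    - name: target-s3
      variant: crowemi
      config:
        # Flush records to S3 in smaller batches instead of holding everything in memory.
        # (Verify the exact semantics of max_batch_size in the target's documentation.)
        max_batch_size: 1000
```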
d
@Edgar Ramírez (Arch.dev) are we saying we don't need to control the API response using iter_content(chunk_size=1024) in a streaming fashion? I was thinking of controlling it on the tap-rest-api-msdk end instead of in target-s3.
e
Not that you don't need it; rather, it's hard to get right, since it's not clear how you would parse records from a chunk of bytes.
(and you would need to make that change in the tap or upstream in Meltano's singer-sdk)
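To illustrate the kind of change that would be needed: an incremental parser such as ijson can pull records out of an open streamed response without buffering the whole body. This is only a sketch of the idea (it assumes the payload is one big JSON array, and `parse_streamed_records` is a made-up name), not existing tap or SDK code:

```python
import ijson  # third-party incremental JSON parser
import requests


def parse_streamed_records(url: str):
    """Sketch: yield records one by one while the body is still downloading."""
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        response.raw.decode_content = True  # let urllib3 handle gzip/deflate transparently
        # "item" addresses each element of a top-level JSON array.
        yield from ijson.items(response.raw, "item")
```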