Hi All,
I was running a benchmark in Meltano (using the tap-rest-api-msdk and target-s3 plugins) and wanted to read the API response in 1024-byte chunks.
What I'm trying to achieve: even though the API returns a huge number of rows (>10k), I want to write the response to S3 in a streaming fashion (a small amount of data at a time) and release the memory as I go. That would reduce memory consumption. It will make the job run longer, but that's acceptable.
In my stream class, I have set stream = True:
def requests_session(self) -> requests.Session:
    # Lazily create a session with stream=True so response bodies
    # are not downloaded eagerly when a request is sent.
    if not self._requests_session or not self._requests_session.stream:
        self._requests_session = requests.Session()
        self._requests_session.stream = True
    return self._requests_session
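For what it's worth, `stream` is a real default attribute on `requests.Session` (it starts out `False`), so setting it on the session as above should make every request through that session stream its body. A minimal check of that behavior:

```python
import requests

# A requests.Session carries default request settings; `stream` is one
# of them and defaults to False. Setting it to True means responses made
# through this session do not download their bodies up front -- the body
# is fetched only as you iterate it (iter_content / iter_lines).
session = requests.Session()
default_stream = session.stream  # False out of the box
session.stream = True
```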
I'm reading the response in chunks with:
    for chunk in response.iter_content(chunk_size=1024):
        # yield records parsed from the JSON chunk
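One caveat with `iter_content(chunk_size=1024)` is that it splits the body at arbitrary byte boundaries, so a single JSON record can be cut across two chunks. If the API emits newline-delimited JSON, one approach is to buffer bytes until a full line is available before parsing. A sketch (the helper name `records_from_chunks` is mine, and the NDJSON body format is an assumption):

```python
import json
from typing import Iterable, Iterator


def records_from_chunks(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Reassemble newline-delimited JSON records from raw byte chunks.

    A record may span two chunks, so buffer bytes until a complete
    line (one record) is available, then parse and yield it. Only the
    buffer is held in memory, never the whole response.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    if buffer.strip():  # trailing record without a final newline
        yield json.loads(buffer)


# Simulate a body split at awkward boundaries, as iter_content might do.
body = b'{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
chunks = [body[i:i + 10] for i in range(0, len(body), 10)]
records = list(records_from_chunks(chunks))
```

In the real stream, `chunks` would be `response.iter_content(chunk_size=1024)` instead of the simulated list.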
Do you have any reference for how to do this? Is the scenario described above achievable?
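If the body is line-delimited, `response.iter_lines()` may be simpler than `iter_content`, since it handles the chunk buffering for you (it only avoids loading the whole body when the request was made with `stream=True`). A sketch of a `parse_response`-style override, using a fake response object so it runs without a network call (the hook name mirrors the Singer SDK's `parse_response`, but treat the details as an assumption):

```python
import json


def parse_response(response):
    # Iterate the body line by line instead of calling response.json(),
    # so only one record at a time is held in memory. iter_lines() only
    # streams when the request was sent with stream=True.
    for line in response.iter_lines():
        if line:
            yield json.loads(line)


class _FakeResponse:
    """Stand-in for requests.Response, exposing only iter_lines()."""

    def __init__(self, body: bytes):
        self._body = body

    def iter_lines(self):
        return iter(self._body.splitlines())


rows = list(parse_response(_FakeResponse(b'{"id": 1}\n{"id": 2}\n')))
```

With a real `requests.Response` from a streaming session, the same generator would yield records as they arrive instead of after the full download.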