# singer-targets
d
I'm using target_snowflake and had some questions around batching. The config documentation here seems to suggest that it's disabled by default, with no default value for `batch_size_rows` and a default of `true` for `clean_up_batch_files`. However, I'm still seeing this in the logs:
```
target-snowflake     | Target sink for 'hist-SecurityReference' is full. Current size is '10000'. Draining... cmd_type=elb consumer=True job_name=default:tap-mssql-analytics-to-target-snowflake:hist-SecurityReference name=target-snowflake producer=False run_id=826fdf57-7c14-4765-8fa9-02207cabd6ef stdio=stderr string_id=target-snowflake
```
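For reference, the relevant part of my loader config looks roughly like this; the values are just what I've been experimenting with, not defaults from the docs:
```yaml
config:
  # Docs suggest no default, yet the log above drains at 10000 rows even without this set
  batch_size_rows: 50000
  # Documented default is true, but the intermediate .json.gz files are not removed
  clean_up_batch_files: true
```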
In addition, I'm seeing a bunch of .json.gz files showing up in my directory (see screenshot), which seems contrary to the documentation. If I manually set `batch_size_rows`, I do see that the size is adjusted to the new value. However, even with `clean_up_batch_files` explicitly set to `true`, the files are not getting cleaned up.

I'm also a bit unclear about batch messages and whether they're being used here or not. It sounds like they are, because those intermediate .json.gz files are being created on my local filesystem, and I presume those are batch files. What confuses me is that the bottom of the page notes some known limitations of batch, including that it doesn't support incremental replication or stream maps. On the incremental issue, people have commented that it was already fixed in target-snowflake, so maybe that's why it seems to be working for me? I will need to test in more detail. The stream maps issue is still open, though, yet my stream map to rename a column does seem to work. Was that also perhaps fixed specifically in target-snowflake?
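For context, the stream map I mentioned is shaped roughly like this; the stream and column names below are placeholders rather than my real ones:
```yaml
config:
  stream_maps:
    some_stream:
      # Copy the old column into the new name, then drop the original
      new_column: old_column
      old_column: __NULL__
```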
Is there also a max batch size of 10000? I tried setting the size to 50000 and it still ended up creating 5 batch files for 45k records, though the draining log message no longer appeared.
b
I am not sure about `clean_up_batch_files`, but I have seen others post that they needed to set both `batch_size_rows` and `batch_size` to the same number when using the meltanolabs target-snowflake. Here is an example I pulled from a post:
```yaml
batch_size_rows: 1000000
batch_config:
  batch_size: 1000000
```
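For completeness, in meltano.yml that would presumably sit under the loader's config block, something along these lines:
```yaml
plugins:
  loaders:
    - name: target-snowflake
      variant: meltanolabs
      config:
        batch_size_rows: 1000000
        batch_config:
          batch_size: 1000000
```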
d
So that does address the batch size, but it did uncover something new: by introducing `batch_config`, it also expects `encoding` and `storage`. Out of curiosity, I tried setting the encoding format to parquet, which the batch page says is supported. However, the batch files generated are still in JSON format. With a specific path set, `clean_up_batch_files` still isn't working either.
```yaml
config:
  batch_size_rows: 50000
  batch_config:
    encoding:
      format: parquet
    storage:
      root: file://.meltano/batch-staging
    batch_size: 50000
  clean_up_batch_files: true
```
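Since the files on disk are still .json.gz, I may just try making the config match what's actually being produced, assuming the encoding block accepts jsonl and gzip as the batch page seems to describe:
```yaml
config:
  batch_size_rows: 50000
  batch_config:
    encoding:
      # Match what actually shows up on disk (.json.gz) instead of parquet
      format: jsonl
      compression: gzip
    storage:
      root: file://.meltano/batch-staging
    batch_size: 50000
  clean_up_batch_files: true
```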