# singer-targets
d
I'm using target_snowflake and had some questions around batching. The config documentation here seems to suggest that it's disabled by default, with no default value for `batch_size_rows` and a default of `true` for `clean_up_batch_files`. However, I'm still seeing this in the logs:
```
target-snowflake     | Target sink for 'hist-SecurityReference' is full. Current size is '10000'. Draining... cmd_type=elb consumer=True job_name=default:tap-mssql-analytics-to-target-snowflake:hist-SecurityReference name=target-snowflake producer=False run_id=826fdf57-7c14-4765-8fa9-02207cabd6ef stdio=stderr string_id=target-snowflake
```
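For reference, the relevant part of my loader config looks roughly like this; the values are just what I've been experimenting with, not defaults from the docs:
```yaml
config:
  # Docs suggest no default, yet the log above drains at 10000 rows even without this set
  batch_size_rows: 50000
  # Documented default is true, but the intermediate .json.gz files are not removed
  clean_up_batch_files: true
```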
In addition, I'm seeing a bunch of .json.gz files showing up in my directory (see screenshot), which seems contrary to the documentation. If I manually set `batch_size_rows`, I do see that the size is adjusted to the new value. However, even with `clean_up_batch_files` explicitly set to `true`, the files are not getting cleaned up.

I'm also a bit unclear about batch messages and whether they're being used here or not. It sounds like they are, because those intermediate .json.gz files are being created on my local filesystem, and I presume those are batch files. What confuses me is that the bottom of the page notes some known limitations of batch, including that it doesn't support incremental replication or stream maps. On the incremental issue, people have commented that it was already fixed in target-snowflake, so maybe that's why it seems to be working for me? I will need to test in more detail. The stream maps issue is still open, though, yet my stream map to rename a column does seem to work. Was that also perhaps fixed specifically in target-snowflake?
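For context, the stream map I mentioned is shaped roughly like this; the stream and column names below are placeholders rather than my real ones:
```yaml
config:
  stream_maps:
    some_stream:
      # Copy the old column into the new name, then drop the original
      new_column: old_column
      old_column: __NULL__
```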
Is there also a max batch size of 10000? I tried setting the size to 50000 and it still ended up creating 5 batch files for 45k records, though the draining log message no longer appeared.
b
I am not sure about `clean_up_batch_files`, but I have seen others post that they needed to set both `batch_size_rows` and `batch_size` to the same number when using the meltanolabs target-snowflake. Here is an example I pulled from a post:
```yaml
batch_size_rows: 1000000
batch_config:
  batch_size: 1000000
```
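For completeness, in meltano.yml that would presumably sit under the loader's config block, something along these lines:
```yaml
plugins:
  loaders:
    - name: target-snowflake
      variant: meltanolabs
      config:
        batch_size_rows: 1000000
        batch_config:
          batch_size: 1000000
```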
d
So that does address the batch size, but it did uncover something new: by introducing `batch_config`, it also expects `encoding` and `storage`. Out of curiosity, I tried setting the encoding format to parquet, which the batch page says is supported. However, the batch files generated are still in JSON format. With a specific path set, `clean_up_batch_files` still isn't working either.
```yaml
config:
  batch_size_rows: 50000
  batch_config:
    encoding:
      format: parquet
    storage:
      root: file://.meltano/batch-staging
    batch_size: 50000
  clean_up_batch_files: true
```
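Since the files on disk are still .json.gz, I may just try making the config match what's actually being produced, assuming the encoding block accepts jsonl and gzip as the batch page seems to describe:
```yaml
config:
  batch_size_rows: 50000
  batch_config:
    encoding:
      # Match what actually shows up on disk (.json.gz) instead of parquet
      format: jsonl
      compression: gzip
    storage:
      root: file://.meltano/batch-staging
    batch_size: 50000
  clean_up_batch_files: true
```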