salmon-actor-23953
09/27/2022, 6:23 PM
`state_id_suffix` in `meltano.yml` environment definitions and the `--state-id-suffix` CLI option in `meltano run`. Thanks @flat-caravan-93758!
📚 Documentation Improvements
• #6764 Added Part 1 of an ELT tutorial (Link).
• #6769 Added a guide for migrating existing dbt projects to Meltano (Link).
• #6737 Added a Meltano at a Glance page to the Getting Started section (Link).
• #6753 Added a handy list of video tutorials and demos (Link).
• #6764 Moved the installation guide to the Getting Started section (Link).
• #6709 Added a tutorial for using Meltano with DataHub (Link).
• #6739 Added a tutorial for using Meltano with Jupyter (Link).
• #6743 Described how `project_id` is hashed (Link).
Singer SDK 0.11.1
✨ New
• #904 Add support for file-based processing with a new BATCH message type in taps and targets.
• #968 Add docs for VS Code debugging, including CLI entry points in cookiecutter templates.
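As a rough illustration of the BATCH idea in #904: rather than emitting one RECORD message per row, a tap writes records to files and emits a single message that points at them. The sketch below assumes a message shape with `encoding.format`, `encoding.compression` and a `manifest` list of file URIs, as described in the SDK's batch documentation around this release; check it against your SDK version. `write_batch` and `read_batch` are hypothetical helpers, not SDK functions.

```python
import gzip
import json
import tempfile
import uuid
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, Optional
from urllib.parse import urlparse
from urllib.request import url2pathname


def write_batch(
    stream: str,
    records: Iterable[Dict[str, Any]],
    directory: Optional[Path] = None,
) -> Dict[str, Any]:
    """Tap side: write records to one gzipped JSONL file, return a BATCH message."""
    # Hypothetical default: use a fresh temp dir when no location is given.
    directory = Path(tempfile.mkdtemp()) if directory is None else directory
    directory.mkdir(parents=True, exist_ok=True)
    # Path.as_uri() requires an absolute path, hence resolve().
    path = (directory / f"{stream}-{uuid.uuid4().hex}.jsonl.gz").resolve()
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")  # one JSON object per line
    # Assumed message shape; verify against the Singer SDK batch docs.
    return {
        "type": "BATCH",
        "stream": stream,
        "encoding": {"format": "jsonl", "compression": "gzip"},
        "manifest": [path.as_uri()],
    }


def read_batch(message: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Target side: yield the records referenced by a BATCH message's manifest."""
    for uri in message["manifest"]:
        path = Path(url2pathname(urlparse(uri).path))
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                yield json.loads(line)
```

In this sketch, a tap would write the returned dict to stdout as a single JSON line, and a target would call `read_batch` when it sees a message whose `type` is `BATCH`.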
🐛 Fixes
• #979 Resolve install failures on certain images due to missing wheels for `ciso8601`.
• #972 Resolve an issue where a `TypeError` is thrown by the `SQLConnector` cookiecutter implementation due to `super()` references.
⚙️ Under the hood
• #979 Remove dependency on `pipelinewise-singer-python` and move Singer library code into private module `singer_sdk._singerlib`.
We're excited for you to try out these new releases. Let us know what you think!
strong-garage-76760
09/27/2022, 9:50 PM
salmon-actor-23953
09/27/2022, 10:52 PM
strong-garage-76760
09/28/2022, 5:39 AM
salmon-actor-23953
09/30/2022, 1:13 AM
fancy-park-71136
09/30/2022, 3:59 AM
`pipelinewise-singer-python` and move Singer library code into private module `singer_sdk._singerlib`.
One thing we discovered in `pipelinewise-singer-python` is its use of `orjson.dumps` to format the JSON output, which gives quite a speed increase over the standard `json.dumps`. In fact, in my port of tap-oracle I used `pipelinewise-singer-python` to get better performance. We did, however, put through a patch to handle decimal data better: https://github.com/mjsqu/pipelinewise-singer-python/commit/2a839d245b92d1ec6d5801a30f5280fa77b22d0c
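The decimal patch linked above is not reproduced here. As a rough sketch of the general idea only: the hypothetical `dumps_message` helper below prefers `orjson.dumps` when orjson is installed and falls back to the standard library otherwise. Since orjson does not serialize `Decimal` natively, both paths route it through a `default` hook. Turning integral decimals into `int` and everything else into `float` is an assumption made for this example; floats can lose precision on very large or very precise values, which is exactly the kind of case a real patch has to decide on.

```python
import json
from decimal import Decimal
from typing import Any

try:
    import orjson  # optional third-party fast serializer
except ImportError:  # fall back to the standard library
    orjson = None


def _default(value: Any) -> Any:
    """Convert types the serializer can't handle natively (here: Decimal)."""
    if isinstance(value, Decimal):
        # Assumption: integral decimals -> int, everything else -> float.
        if value.is_finite() and value == value.to_integral_value():
            return int(value)
        return float(value)
    raise TypeError(f"Type is not JSON serializable: {type(value).__name__}")


def dumps_message(message: dict) -> str:
    """Serialize a Singer message to a compact JSON string."""
    if orjson is not None:
        # orjson.dumps returns bytes; decode for writing to a text stream.
        return orjson.dumps(message, default=_default).decode("utf-8")
    # Compact separators roughly match orjson's output format.
    return json.dumps(message, default=_default, separators=(",", ":"))
```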
I also noticed the new support for BATCH messages, which is a great feature.
On the performance theme, it's worth mentioning that many database clients support an `arraysize` setting or `fetchmany()` to speed up fetching over higher-latency networks. These options aren't always exposed, and the default of fetching one record at a time is slow. In our variants of tap-oracle, tap-mssql, tap-sybase, and tap-db2, we have exposed a config item that allows bigger fetches, which means fewer round-trips to the database. When pulling over a higher-latency network, these settings have doubled our throughput. It may be worth considering for the database section of the SDK.
This commit shows an example of exposing the `fetchmany` equivalent of an `arraysize` setting: https://github.com/s7clarke10/pipelinewise-tap-mssql/commit/673cc9e471f02de7cefb37f4179e9dc478e71ea4
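A minimal sketch of the batched-fetch idea, using the DB-API 2.0 `cursor.arraysize` attribute and `fetchmany()` method. Here `sqlite3` from the standard library stands in for a real Oracle or MSSQL driver: it runs in-process, so it shows the API shape but not the network savings. `iter_rows` and the `cursor_array_size` config name are hypothetical, not part of the Singer SDK or of the linked commit.

```python
import sqlite3
from typing import Any, Iterator, Sequence


def iter_rows(cursor: Any, query: str, batch_size: int = 1000) -> Iterator[Sequence[Any]]:
    """Yield rows from `query`, pulling `batch_size` rows per fetchmany() call."""
    # DB-API 2.0: arraysize is the default row count for fetchmany(). Some
    # drivers (cx_Oracle, for example) also use it to size internal fetch
    # buffers, so it is set up front, before execute().
    cursor.arraysize = batch_size
    cursor.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty sequence means the result set is exhausted
            break
        yield from batch


if __name__ == "__main__":
    # Hypothetical tap config; the setting name is illustrative only.
    config = {"cursor_array_size": 3}
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
    rows = list(iter_rows(conn.cursor(), "SELECT id FROM t ORDER BY id", config["cursor_array_size"]))
    print(len(rows))
```

The saving comes from drivers that honor the batch size on the wire: each `fetchmany` call (or internal prefetch) brings back many rows per round-trip instead of one.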