# plugins-general
j
Hi all, I've been trying to find documentation on configuring batch sizing for Meltano's version of target-snowflake. I saw https://github.com/meltano/sdk/pull/2248/ and was very excited to see this happening! Is there a timeline for when the target will also get this feature? (Not sure if this PR is related too: https://github.com/MeltanoLabs/target-snowflake/pull/154?)
e
> I was wondering if there was a timeline if the target was also going to have this feature?
The target's update to the latest Singer SDK is currently blocked by https://github.com/MeltanoLabs/target-snowflake/issues/151
j
Hi Edgar, sorry to bother you again on this. I'm just trying to get a handle on when that blocker is planned to be prioritized. This is my current setup:

Tap: tap-mssql (BuzzCutNorman variant, built on the Meltano SDK) https://hub.meltano.com/extractors/tap-mssql--buzzcutnorman/
Target: target-snowflake (MeltanoLabs variant) https://hub.meltano.com/loaders/target-snowflake--meltanolabs/

I would love to stick to the Meltano variants since they are the official ones, but execution time has been slow and I'm still stuck at 10,000 rows per file. I've been exploring my options over the past few months because some of my smaller databases already have long load times (it took 10 minutes to upload an entire schema containing 1.5 million rows in total), and extrapolating that rate to my largest databases with the same schema, an initial full load would take at least 4 days. I've played around with Dagster to parallelize the process so at least I don't have to wait for one load to finish before starting the next, but burning days' worth of Snowflake credits still feels bad.
Any suggestions or recommendations would be super appreciated 🙏
For what it's worth, these are my current YAML files (extractor)
plugins:
  extractors:
  - name: tap-mssql
    variant: buzzcutnorman
    pip_url: git+https://github.com/BuzzCutNorman/tap-mssql.git
    config:
      dialect: mssql
      driver_type: pyodbc
      host: 127.0.0.1,4433 # Docker Compose: sqlserver / non-Docker: 127.0.0.1,4433
      port: 1433
      #user: See .ENV
      #password: See .ENV
      #database: See .ENV
      sqlalchemy_eng_params:
        fast_executemany: 'True'
      sqlalchemy_url_query:
        driver: ODBC Driver 17 for SQL Server
        TrustServerCertificate: yes
  - name: tap-mssql-admin
    inherit_from: tap-mssql
    select:
    ...
  - name: tap-mssql-core
    inherit_from: tap-mssql
    select:
    ...
  - name: tap-mssql-content
    inherit_from: tap-mssql
    config:
      stream_maps:
        Content-AttributeDefinitionUpdate:
          AttributeDefinitionId: AttributeDefinitionId
          ContentUpdateId: ContentUpdateId
          DisplayOrder: Order
        Content-ContentUpdate:
          ContentUpdateId: ContentUpdateId
          IsDisabled: IsDisabled
          __else__: __NULL__
          __alias__: Content-ContentEnabledState
        Content-EnumAttributeDefinitionValue:
          ContentUpdateId: ContentUpdateId
          EnumValue: EnumValue
          DisplayOrder: Order
        Content-WorkflowStateHistory:
          ContentUpdateId: ContentUpdateId
          WorkflowState: WorkflowState
          WorkflowStateDate: WorkflowStateDate
          __else__: __NULL__
    select:
    ...
  - name: tap-mssql-inventory
    inherit_from: tap-mssql
    select:
    ...
  - name: tap-mssql-shopping
    inherit_from: tap-mssql
    select:
    ...
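Since each schema already has its own inherited extractor, one option (until larger batch sizes land) is to define a separate Meltano job per schema so an orchestrator like Dagster can schedule them concurrently. A sketch, assuming hypothetical job names and the existing inherited taps:

```yaml
# Hypothetical meltano.yml fragment: one job per inherited extractor,
# so an orchestrator (Dagster, cron, etc.) can launch them in parallel.
jobs:
- name: load-admin
  tasks:
  - tap-mssql-admin target-snowflake
- name: load-core
  tasks:
  - tap-mssql-core target-snowflake
- name: load-content
  tasks:
  - tap-mssql-content target-snowflake
```

Each job would then run as `meltano run <job-name>`, and because each inherited plugin gets its own state, the loads shouldn't step on each other.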
(loader)
plugins:
  loaders:
  - name: target-snowflake
    variant: meltanolabs
    pip_url: meltanolabs-target-snowflake
    config:
      add_record_metadata: false # Can enable if we want more metadata
      #account: See .ENV
      #database: TS See .ENV 
      #user: See .ENV
      #role: See .ENV
      #warehouse: See .ENV
      #password: See .ENV
      default_target_schema: Raw # ${MELTANO_EXTRACT__LOAD_SCHEMA} # Meltano chooses the schema based on the `name` of the extractor
      hard_delete: false
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
@BuzzCutNorman pinging you as well in case you have any wisdom to give to help my current plight 😅
b
During Office Hours on 2/28, @Edgar Ramírez (Arch.dev) mentioned that it would take a day or a couple of days to update MeltanoLabs target-snowflake's testing framework so the target can be bumped to the latest version of the Singer SDK, which includes `batch_size_rows`. I am not sure if the issue linked above is the best one to track that work.
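For reference, once the target is on an SDK version that includes the change from the PR above, the new setting should be configurable like any other loader setting. A hedged sketch (the `batch_size_rows` key comes from the SDK PR; the released target may expose it differently):

```yaml
# Hypothetical loader config once the SDK bump lands:
plugins:
  loaders:
  - name: target-snowflake
    variant: meltanolabs
    config:
      batch_size_rows: 100000  # raise from the current 10,000 rows/file
```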