# plugins-general
r
Also, any plans for adopting Prefect (workflow orchestrator) in the future?
e
It's on the roadmap! https://gitlab.com/meltano/meltano/-/issues/2668 If you have any experience or insights into what a Prefect integration should look like, we'd appreciate a comment in the issue 😄
r
Is S3 available for a target loader?
e
r
@edgar_ramirez_mondragon how does this process, https://github.com/transferwise/pipelinewise-target-s3-csv, differ from using Meltano? Are these independent processes?
e
It'd be part of the same process but you have to add it as a custom plugin
r
Got it, thank you!
What might cause this issue?
Installing loader 'target-athena'...
Loader 'target-athena' could not be installed: failed to install plugin 'target-athena'.
ERROR: Could not find a version that satisfies the requirement target-athena (from versions: none)
ERROR: No matching distribution found for target-athena
Failed to install plugin(s)
e
I think there isn't a package for target-athena on PyPI (though the repo seems to suggest there is, cc @aaronsteers), so you have to set the pip_url to point to the repo instead:
git+https://github.com/dataops-tk/target-athena.git
a
@royzac - The error you describe could also be caused by running on Python 3.9, which the SDK previously didn't support. I've just bumped that version restriction to allow Python 3.9 installs, so you may have better luck retrying. Also, as @edgar_ramirez_mondragon points out, you'll need to link directly to the repo since we aren't yet publishing that target to PyPI. (I've updated the README.md file to reflect that.)
r
@edgar_ramirez_mondragon @aaronsteers thanks for this! I was having some additional version dependency issues before, but have decided to take a step back and use a Poetry venv to mitigate some of these issues going forward. Have you had any issues adding Meltano and related packages to Poetry?
a
The only issue people repeatedly run into (especially if they are used to managing their own virtualenvs) is the fact that you can't do an -e (editable) pip install of a Poetry project. So when building taps with the SDK, we've provided a shell script that automatically executes the tap's executable from the Poetry venv in a single (portable) step.
We created a helper guide here but you might already have this level of context: Python Tips for SDK Developers — Meltano SDK 0.3.6 documentation
r
This is helpful, I'll look through it and circle back if I have any questions. Btw, the Athena plugin was successfully installed with python 3.9 and changing pip_url to point to the repo.
What is the proper syntax here? "meltano tap.shopify | target-s3-csv --config config.json" Can this pipeline be accessed and scheduled in the Meltano UI?
e
The correct syntax for an ELT job is:
meltano elt tap-shopify target-athena
You usually wouldn't need a config file, since Meltano can take settings from meltano.yml, .env, or environment variables. To be able to see it in the UI, you'll have to create a schedule:
meltano schedule shopify-to-athena tap-shopify target-athena @daily
r
What might be causing the issue displayed in the pic? Any further issues that you see?
e
I think the key in tap-shopify should be settings, not config.
r
Tried. No change in the error.
e
Ah, sorry, I had the key names backwards. The settings definition in target-athena should be an array of objects:
plugins:
  extractors:
    - name: tap-shopify
      variant: singer-io
      pip_url: tap-shopify
      config:
        shop: royzac
        start_date: '2021-06-29'
  loaders:
    - name: target-athena
      namespace: target_athena
      pip_url: git+https://github.com/MeltanoLabs/target-athena.git
      executable: target-athena
      settings:
        - name: aws_access_key_id
        - name: aws_secret_access_key
        - name: aws_session_token
        - name: aws_profile
        - name: s3_bucket
        - name: s3_key_prefix
        - name: s3_staging_dir
        - name: delimiter
        - name: quotechar
        - name: add_metadata_columns
        - name: encryption_type
        - name: encryption_key
        - name: compression
        - name: naming_convention
        - name: temp_dir
Meltano will automatically look for environment variables based on those names, exactly the ones you wrote, so you don't need to declare the env var names (e.g. TARGET_ATHENA_DELIMITER will also work).
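The naming rule can be sketched in a few lines of Python. This is only an illustration: it assumes the variable name is the plugin namespace and the setting name joined with an underscore and upper-cased, and `env_var_name` is a hypothetical helper, not something Meltano provides.

```python
def env_var_name(namespace: str, setting: str) -> str:
    """Return the environment variable name assumed for a plugin setting."""
    # Assumption: <namespace>_<setting>, with dashes and dots turned into
    # underscores, then upper-cased.
    raw = f"{namespace}_{setting}"
    return raw.replace("-", "_").replace(".", "_").upper()


if __name__ == "__main__":
    # Namespace and setting names taken from the YAML above.
    for setting in ("s3_bucket", "delimiter"):
        print(env_var_name("target_athena", setting))
```

With the namespace target_athena from the YAML, the delimiter setting maps to TARGET_ATHENA_DELIMITER under this assumption.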
r
@aaronsteers there seems to be a version constraint on tap-athena. Can this be relaxed to accommodate 3.9?
Touching base on this. Thanks!
e
Hi @royzac. I would say tap-athena is still very much experimental and depends on not-yet-released features of the SDK, but if you want to give it a try with Python 3.9 (and assuming you can actually install it), can you try pointing the pip_url to
git+https://github.com/edgarrmondragon/tap-athena@support-python3.9
a
@edgar_ramirez_mondragon and @royzac - This is now merged: https://github.com/MeltanoLabs/tap-athena/pull/1
(Relaxes the Python version restriction) Thanks, @edgar_ramirez_mondragon!
@royzac - To @edgar_ramirez_mondragon's other point, this is our first tap built on still-unreleased and experimental database tap support. If you can tolerate some changes over time and help us by reporting bugs, I think it might work well for you. If you are looking for a more polished/stable tap for Athena, I think it might take a few weeks (+/-) for this tap to arrive.
n
Found this by searching target-athena and found this conversation helpful. Any update here @aaronsteers and @edgar_ramirez_mondragon on whether target-athena is more stable now? Also looks like you were discussing target-athena, but switched topic to tap-athena for some reason.
a
Hi, @nicholas_degiacomo! We have been using target-athena and tap-athena for a short while now with good results for ELTP. ("P" being "publish", aka "reverse ETL".) We don't yet publish to PyPI, but you can install using the GitHub ref as the pip_url parameter. We hope to merge database support into the SDK very soon, and this is the reason the tap should not yet be considered "stable".
@nicholas_degiacomo - Can you say more about your use case? High volume or medium volume? Do you know if you need partitioning and/or if you have a preferred data format between CSV/JSON/Parquet/etc.?
Our current internal use cases are fairly low volume so we have not yet optimized for partitioned use cases or parquet data storage. Both could be added with some amount of effort if needed.
n
@aaronsteers I'm attempting to use the custom loader target-athena with minimal settings as a proof of concept. According to https://github.com/MeltanoLabs/target-athena, I need to set s3_bucket and athena_database at a minimum:
{
  "s3_bucket": "my_bucket",
  "athena_database": "my_database"
}
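A quick way to sanity-check a config like this before running anything is sketched below. It only assumes the two required keys named in this message; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, not part of target-athena.

```python
import json

# Assumption: these are the only two required keys, as stated in this message.
REQUIRED_KEYS = ("s3_bucket", "athena_database")


def missing_keys(config: dict) -> list:
    """Return the required keys that are absent or empty in config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


if __name__ == "__main__":
    config = json.loads(
        '{"s3_bucket": "my_bucket", "athena_database": "my_database"}'
    )
    print(missing_keys(config))
```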
I ran meltano add --custom loader target-athena and followed the prompts in the terminal:
namespace: I accepted the default
pip_url: git+https://github.com/dataops-tk/target-athena.git
s3_bucket:string, athena_database:string
Then running meltano elt tap-shopify target-athena gives me the error seen in the photo and copied below:
ELT could not be completed: Cannot start extractor: Catalog discovery failed: command ['/Users/nickdegiacomo/learn-meltano/wed-oct13/meltano-quickstart/my-meltano-project/.meltano/extractors/tap-shopify/venv/bin/tap-shopify', '--config', '/Users/nickdegiacomo/learn-meltano/wed-oct13/meltano-quickstart/my-meltano-project/.meltano/run/elt/2021-10-13T172736--tap-shopify--target-athena/eea3155b-20a7-4b2c-8e9e-9fecf7d266f6/tap.a9c1bfe0-f849-4c5d-b674-16af3a7bcefe.config.json', '--discover'] returned 1
If I change the YAML to use config rather than settings, I get a different error:
ValueError: argument type <class 'list'> is not in the flattenalbe types (<class 'collections.abc.Mapping'>,) 
argument type <class 'list'> is not in the flattenalbe types (<class 'collections.abc.Mapping'>,)
What would be next steps from here? Do I keep playing with the configs, or is this related to the stability of the packages?
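For reference, the two keys expect different shapes, which is consistent with the ValueError above: settings is a list of objects with a name, while config is a mapping of setting name to value. A minimal sketch of that check follows; `check_plugin_shape` is a hypothetical helper and the plugin dict stands in for one parsed entry from meltano.yml.

```python
from collections.abc import Mapping


def check_plugin_shape(plugin: dict) -> list:
    """Return a list of shape problems found in one plugin definition."""
    problems = []
    # config holds values, so it should be a mapping, not a list.
    config = plugin.get("config", {})
    if not isinstance(config, Mapping):
        problems.append("config must be a mapping of setting name to value")
    # settings declares which settings exist: a list of {"name": ...} objects.
    settings = plugin.get("settings", [])
    if not isinstance(settings, list) or not all(
        isinstance(item, Mapping) and "name" in item for item in settings
    ):
        problems.append("settings must be a list of objects with a 'name' key")
    return problems


if __name__ == "__main__":
    swapped = {"name": "target-athena", "config": [{"name": "s3_bucket"}]}
    print(check_plugin_shape(swapped))
```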
@aaronsteers seems like I get this issue if I just run meltano invoke tap-shopify. If I run meltano config tap-shopify list or meltano config tap-shopify, it seems to be properly configured… I installed it by running meltano add extractor tap-shopify and set environment variables, assuming my store's name was nick_test_store and the API key was nick_api_key:
export TAP_SHOPIFY_SHOP=nick_test_store
export TAP_SHOPIFY_START_DATE=2021-10-01T000000Z
export TAP_SHOPIFY_API_KEY=nick_api_key
Solved this issue, and many more after it. The immediate problem was the API key config: Meltano/Singer actually wanted the Shopify API password. Then I had an issue with the Shopify schema and had to put in:
plugins:
  extractors:
  - name: tap-shopify
    schema:
      orders:
          subtotal_price_set:
              type: ["string", "null"]
          total_discounts_set:
              type: ["string", "null"]
          total_line_items_price_set:
              type: ["string", "null"]
          total_price_set: 
              type: ["string", "null"]
          total_shipping_price_set:
              type: ["string", "null"]
          total_tax_set:
              type: ["string", "null"]
          discounted_price_set:
              type: ["string", "null"]
          price_set:
Now facing an issue where columns are misaligned in Athena.
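Since the schema override earlier in this message repeats the same type for every field, the mapping can also be generated. This is a sketch only: `schema_override` is a hypothetical helper, it covers just the fields whose entries are complete in the YAML (price_set is left out because its entry is cut off), and the result would still need to be written under the extractor's schema key by hand.

```python
# Fields copied from the override above; price_set is omitted because its
# entry is incomplete in the message.
PRICE_SET_FIELDS = [
    "subtotal_price_set",
    "total_discounts_set",
    "total_line_items_price_set",
    "total_price_set",
    "total_shipping_price_set",
    "total_tax_set",
    "discounted_price_set",
]


def schema_override(stream: str, fields: list, types=("string", "null")) -> dict:
    """Build a {stream: {field: {"type": [...]}}} mapping for the given fields."""
    return {stream: {field: {"type": list(types)} for field in fields}}


if __name__ == "__main__":
    override = schema_override("orders", PRICE_SET_FIELDS)
    for field, spec in override["orders"].items():
        print(field, spec)
```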