# plugins-general
r
I'd like to use the Shopify extractor to connect with Aurora DB through SQLAlchemy. Is this possible?
e
Hi @royzac! There is tap-shopify and, assuming your Aurora instance is Postgres-based, there is target-postgres (though it doesn't use SQLAlchemy to establish the connection or manipulate any db objects).
r
I see, so if I wanted to map out the data through ELT, the tables would have to be in place beforehand, correct? Also, could the pipeline interact with the Aurora Data API using a secret credential from Secrets Manager?
ETL*
e
Yeah, if you want to further transform the raw Shopify data, the conventional ELT approach (which Meltano embraces) is to do it in the database (Aurora Postgres in this case) using SQL. Meltano supports dbt for that stage of your pipeline. As for the Aurora Data API, I don't think any connector uses it; they all rely on the generic database clients (psycopg2, etc.). If you want to get data out of the Aurora instance with the serverless API, I can imagine forking one of the existing connectors and replacing calls to `psycopg2.connect` with `aurora_data_api.connect` from https://github.com/cloud-utils/aurora-data-api. To use a credential from Secrets Manager, depending on your deployment strategy, you could shim the secret as an environment variable in the Meltano container/EC2 instance, since environment variables are supported for configuring connectors without any additional effort.
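As a sketch of the environment-variable approach described above (all host, user, and database names are hypothetical placeholders, and the exact password variable name depends on the connector; this is not a confirmed target-postgres config):

```yaml
# meltano.yml sketch: the password setting is deliberately absent so it
# can be supplied at runtime via an environment variable, e.g. one
# populated from AWS Secrets Manager by your deployment tooling.
plugins:
  loaders:
  - name: target-postgres
    config:
      host: my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com  # placeholder
      port: 5432
      user: meltano
      dbname: warehouse
      # password intentionally omitted from the file; export it as an
      # env var instead so the secret never lands in version control
```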
r
This is very helpful Edgar, thank you!
How would I change the updated_at_min argument in the Shopify URL for the pipeline? It's not going far back enough, and changing start_date does not seem to affect it. Example below.
https://royzac.myshopify.com/admin/api/2021-04/checkouts.json?since_id=1&updated_at_min=2021-08-29+04%3A51%3A23%2B00%3A00&updated_at_max=2021-08-29+05%3A25%3A27%2B00%3A00&limit=175&status=any
e
`start_date` should work afaik. How are you passing configuration to the tap?
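For reference, a minimal sketch of how start_date is typically set in meltano.yml (the shop name and date are placeholders; Singer taps like tap-shopify generally expect an ISO 8601 timestamp here):

```yaml
plugins:
  extractors:
  - name: tap-shopify
    config:
      shop: royzac                        # placeholder shop name
      start_date: '2021-01-01T00:00:00Z'  # how far back the initial sync goes
      # api_key is usually supplied via an environment variable
```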
r
My yml file is attached.
Happy to do a very brief video tour of my process if it makes sense to.
e
@royzac if you run `meltano config tap-shopify list`, does it show the expected config keys (api key, start date, shop)? You can also run Meltano in debug mode with `meltano --log-level=debug invoke tap-shopify` to figure out what is going on in the tap.
> Happy to do a very brief video tour of my process if it makes sense to.
It'd be super helpful if you can do that! :simple_smile:
e
Looking at the logs in the first part of the video, it does seem that `start_date` is being used, right? Unless I'm missing something. In the second part you don't get any new data because a state with the job_id `shopify-to-csv` was present in Meltano's system db, and there don't seem to be any new records.
r
Thanks Edgar, that's helpful. Unfortunately there is another error coming up. I've attached a couple of logs. What would be causing this?
buceph-json-job-log.txt,buceph-csv-job-log.txt
e
Oh, for those two targets you have to create the configured output directory, in this case `output` (imo that should be their default behavior :simple_smile:).
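For example (assuming the default directory name `output`; adjust the path if you configured a different one):

```shell
# Create the directory the target writes into before running the pipeline
mkdir -p output
```

`mkdir -p` is idempotent, so it's safe to run even if the directory already exists.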
r
Thanks Edgar! I appreciate all your help!
e
np!
r
What specifically is this parameter referring to? th.Property("athena_database", th.StringType, required=True). Athena is a query engine, not a database. It looks like it's the last thing tripping up the ETL. Couple pics attached.
e
Ah, the readme is likely outdated. Here's the full config schema: https://github.com/MeltanoLabs/target-athena/blob/main/target_athena/target.py. I'm sure `athena_database` is the name of the database in the Athena catalog. So you need to define at least three settings: `s3_bucket`, `athena_database`, and `aws_region`.
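Putting those three settings together, a minimal meltano.yml sketch (all values are placeholders):

```yaml
plugins:
  loaders:
  - name: target-athena
    config:
      s3_bucket: my-staging-bucket    # bucket the target writes to
      athena_database: shopify_raw    # database name in the Athena catalog
      aws_region: us-east-1
```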
r
Ok, that makes sense. Unfortunately, I'm still getting an error. Not sure what's wrong here. The bucket is public.
e
can you run `meltano config target-athena list` to confirm the configs are set and have the right values?
r
This is the result.
Resolved: environment variables were changed to use a TARGET_ATHENA prefix, and s3_staging_dir had to be added.
It seems like my pipe is streaming data to S3 now, but as empty txt files, and there is a bug occurring. See attached. I suppose that's probably what's causing txt files to be produced instead of csv. Thoughts on the issue?
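For anyone hitting the same thing: Meltano maps a plugin's settings to environment variables by upper-casing them behind a plugin-name prefix, so for target-athena the variables look roughly like this (all values are placeholders; s3_staging_dir is the setting that had to be added above):

```shell
# Env-var names follow the TARGET_ATHENA_<SETTING> pattern
export TARGET_ATHENA_S3_BUCKET=my-staging-bucket
export TARGET_ATHENA_S3_STAGING_DIR=s3://my-staging-bucket/staging/
export TARGET_ATHENA_AWS_REGION=us-east-1
```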
e
It seems like some fields in `orders` are missing a `"type"` entry in their JSON schema: https://github.com/singer-io/tap-shopify/blob/master/tap_shopify/schemas/orders.json#L9-L14
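To make the failure mode concrete, here's a hypothetical illustration (not the actual tap-shopify schema) of a property whose JSON schema omits "type". An empty schema accepts any value, which a strictly typed target cannot map to a column type:

```python
# Hypothetical schema fragment illustrating the missing-"type" problem
orders_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "subtotal_price_set": {},  # no "type" entry: accepts anything
    },
}

# A typed target has to reject (or guess at) properties like this one
untyped = [name for name, prop in orders_schema["properties"].items()
           if "type" not in prop]
print(untyped)  # ['subtotal_price_set']
```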
r
Is there a work around for this? It's dummy data at the moment.
e
You can try overriding the properties using the schema extra. It would look like this:
```yaml
plugins:
  extractors:
  - name: tap-shopify
    schema:
      orders:
        subtotal_price_set:
          properties: {}
```
r
This is an interesting issue, since it didn't occur when I used a CSV loader. Why is this? Is this a bug that I could resolve in the repo?
e
Yes, this is a bug in the tap repo. I'm not familiar enough with the Shopify `orders` endpoint to know why those fields are left empty (maybe they're supposed to be variant JSON objects?).
r
Hm, that still begs the question of why the pipe works fine for a CSV loader and not Athena. I'll see if I can modify the settings somehow. I tried your example, but it didn't work out unfortunately.
e
`target-csv` must be better at handling "variant" fields (i.e. object fields without properties).
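A sketch of why that's plausible: a CSV target can serialize an arbitrary object into a string cell without ever needing a declared column type, whereas Athena requires a type per column. (Illustrative only; this is not target-csv's actual code.)

```python
import csv
import io
import json

# A "variant" value: an object field with no declared properties/type
record = {"id": 1,
          "subtotal_price_set": {"amount": "10.00", "currency": "USD"}}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "subtotal_price_set"])
writer.writeheader()
# Fall back to JSON-serializing anything that isn't a scalar; CSV
# doesn't care what "type" the cell is, it's all text anyway
writer.writerow({k: json.dumps(v) if isinstance(v, dict) else v
                 for k, v in record.items()})
print(buf.getvalue())
```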
n
I’m also interested in using tap-shopify and target-athena (but I’m happy with anywhere in AWS at this point, e.g. target-s3)… This process seems to work fine with a (GCP) target-bigquery.
r
@edgar_ramirez_mondragon what would cause this issue? I've changed my yaml file and it was working fine before.
e
hi @royzac. does `meltano invoke tap-shopify` work normally?
r
No
e
ok, can you run it in debug mode:

```shell
meltano --log-level=debug invoke tap-shopify
```
r
log.txt
e
ah ok, so then the `schema` definition in meltano.yml could be off. What's the output of `meltano config tap-shopify list --extras`?
r
Plugin 'tap-bitso' is not known to Meltano
Is this a custom plugin?
e
sorry, that's what I'm testing with 😅. Should be
meltano config tap-shopify list --extras
of course
r
no worries!
e
aha, I don't know how I didn't see this before. The YAML indentation of `variant` is wrong: Meltano thinks it's part of `schema`. In fact, that's true also for `pip_url` and `settings`.
r
Perfect, thanks Edgar!!