nicholas_degiacomo
11/05/2021, 3:56 PM
From my meltano.yml:
schema:
  orders:
    subtotal_price_set:
      type:
      - object
      - 'null'
    total_discounts_set:
      type:
      - object
      - 'null'
    total_line_items_price_set:
      type:
      - object
      - 'null'
    total_price_set:
      type:
      - object
      - 'null'
    total_shipping_price_set:
      type:
      - object
      - 'null'
    total_tax_set:
      type:
      - object
      - 'null'
    discounted_price_set:
      type:
      - object
      - 'null'
    price_set:
      type:
      - object
      - 'null'
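For context, a stream-schema override like this sits under the extractor entry in meltano.yml via Meltano's schema extra; a minimal sketch repeating just one field from the snippet above:

```yaml
# Sketch of where the override lives in meltano.yml (only one field repeated)
plugins:
  extractors:
  - name: tap-shopify
    schema:
      orders:
        subtotal_price_set:
          type:
          - object
          - 'null'
```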
2 - I’ve experimented with target-bigquery instead, which has its own problems. It fails because of the same schema issue in orders, but this time I’m not able to resolve it using the same schema fix I use for target-athena. So instead I need to make a catalog.json file.
TAP_SHOPIFY__CATALOG=extract/tap-shopify.catalog.json
Here the issue is that I’m only getting customers, orders, products, and transactions. I’m failing to get inventory_levels and inventory_item, which are supported by the tap. I added those two by running
meltano invoke tap-shopify --discover > test.json
and copying the streams into my tap-shopify.catalog from test. Why doesn’t the same schema override that works for athena work here? Why can’t I get inventory levels and items piping through?
visch
11/05/2021, 6:15 PM
meltano elt tap-shopify target-athena --job_id=abc ?
> When I load the data into Athena, all the columns come up as a varchar of type string, making it inconvenient to use.
From https://github.com/andrewcstewart/target-athena/blob/master/target_athena/utils.py#L162 , target-athena looks to build a struct in Athena, which means your DDL should work fine. What's the DDL for your table that's created in Athena? Is it a struct currently or a string?
> Why doesn’t the same schema override for athena work here?
Because you're overriding the catalog with TAP_SHOPIFY__CATALOG=extract/tap-shopify.catalog.json. Select doesn't work if you're passing in a catalog; select builds the catalog for you.
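As background on why hand-edited catalogs behave differently: a catalog passed via TAP_SHOPIFY__CATALOG goes to the tap as-is, and Singer taps decide which streams to emit from the "selected" flag in each stream's metadata. So streams copied in from discovery output (like inventory_levels) also need that flag set. A sketch of the convention (stream name from this thread; the metadata shape is standard Singer, but this helper is illustrative, not part of Meltano):

```python
import json

def select_stream(catalog: dict, stream_name: str) -> dict:
    """Mark a stream as selected in its top-level ("breadcrumb": []) metadata."""
    for stream in catalog["streams"]:
        if stream["tap_stream_id"] != stream_name:
            continue
        for entry in stream.get("metadata", []):
            if entry.get("breadcrumb") == []:
                entry["metadata"]["selected"] = True
    return catalog

# Minimal discovery-style catalog containing one unselected stream
catalog = {
    "streams": [
        {
            "tap_stream_id": "inventory_levels",
            "schema": {"type": "object", "properties": {}},
            "metadata": [{"breadcrumb": [], "metadata": {"inclusion": "available"}}],
        }
    ]
}
select_stream(catalog, "inventory_levels")
print(json.dumps(catalog["streams"][0]["metadata"][0]["metadata"]))
# prints: {"inclusion": "available", "selected": true}
```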
visch
11/05/2021, 6:37 PM
nicholas_degiacomo
11/06/2021, 5:31 PM
11/06/2021, 5:31 PMpresentment_currency
string COMMENT 'from deserializer',
subtotal_price_set
string COMMENT 'from deserializer',
total_discounts_set
string COMMENT 'from deserializer',
total_line_items_price_set
string COMMENT 'from deserializer',
total_price_set
string COMMENT 'from deserializer',
total_shipping_price_set
string COMMENT 'from deserializer',
total_tax_set
string COMMENT 'from deserializer',
total_price
string COMMENT 'from deserializer',
line_items
string COMMENT 'from deserializer',
processing_method
string COMMENT 'from deserializer',
order_number
string COMMENT 'from deserializer',
confirmed
string COMMENT 'from deserializer',
total_discounts
string COMMENT 'from deserializer',
total_line_items_price
string COMMENT 'from deserializer',
order_adjustments
string COMMENT 'from deserializer',
shipping_lines
string COMMENT 'from deserializer',
admin_graphql_api_id
string COMMENT 'from deserializer',
device_id
string COMMENT 'from deserializer',
cancel_reason
string COMMENT 'from deserializer',
currency
string COMMENT 'from deserializer',
payment_gateway_names
string COMMENT 'from deserializer',
source_identifier
string COMMENT 'from deserializer',
id
string COMMENT 'from deserializer',
processed_at
string COMMENT 'from deserializer',
referring_site
string COMMENT 'from deserializer',
contact_email
string COMMENT 'from deserializer',
location_id
string COMMENT 'from deserializer',
fulfillments
string COMMENT 'from deserializer',
customer
string COMMENT 'from deserializer',
test
string COMMENT 'from deserializer',
total_tax
string COMMENT 'from deserializer',
payment_details
string COMMENT 'from deserializer',
number
string COMMENT 'from deserializer',
email
string COMMENT 'from deserializer',
source_name
string COMMENT 'from deserializer',
landing_site_ref
string COMMENT 'from deserializer',
shipping_address
string COMMENT 'from deserializer',
total_price_usd
string COMMENT 'from deserializer',
closed_at
string COMMENT 'from deserializer',
discount_applications
string COMMENT 'from deserializer',
name
string COMMENT 'from deserializer',
note
string COMMENT 'from deserializer',
user_id
string COMMENT 'from deserializer',
source_url
string COMMENT 'from deserializer',
subtotal_price
string COMMENT 'from deserializer',
billing_address
string COMMENT 'from deserializer',
landing_site
string COMMENT 'from deserializer',
taxes_included
string COMMENT 'from deserializer',
token
string COMMENT 'from deserializer',
app_id
string COMMENT 'from deserializer',
total_tip_received
string COMMENT 'from deserializer',
browser_ip
string COMMENT 'from deserializer',
discount_codes
string COMMENT 'from deserializer',
tax_lines
string COMMENT 'from deserializer',
phone
string COMMENT 'from deserializer',
note_attributes
string COMMENT 'from deserializer',
fulfillment_status
string COMMENT 'from deserializer',
order_status_url
string COMMENT 'from deserializer',
client_details
string COMMENT 'from deserializer',
buyer_accepts_marketing
string COMMENT 'from deserializer',
checkout_token
string COMMENT 'from deserializer',
tags
string COMMENT 'from deserializer',
financial_status
string COMMENT 'from deserializer',
customer_locale
string COMMENT 'from deserializer',
checkout_id
string COMMENT 'from deserializer',
total_weight
string COMMENT 'from deserializer',
gateway
string COMMENT 'from deserializer',
cart_token
string COMMENT 'from deserializer',
cancelled_at
string COMMENT 'from deserializer',
refunds
…nicholas_degiacomo
11/06/2021, 6:00 PM
nicholas_degiacomo
11/06/2021, 6:04 PM
> Q: Why doesn’t the same schema override for athena work here?
> A: Because you’re overriding the catalog with TAP_SHOPIFY__CATALOG=extract/tap-shopify.catalog.json
Unfortunately, if I unset TAP_SHOPIFY__CATALOG, or even just start out with only the schema in the YML as I'm doing for athena, it fails because of the schema.
visch
11/06/2021, 8:35 PM
> subtotal_price string COMMENT 'from deserializer',
With your override this should be an object. Something's causing that not to happen.
Are you also running meltano with TAP_SHOPIFY__CATALOG set?
nicholas_degiacomo
11/06/2021, 9:00 PM
1. Athena: meltano elt tap-shopify target-athena
2. BigQuery: I run with TAP_SHOPIFY__CATALOG set for BigQuery, with the command meltano elt tap-shopify target-bigquery. This is run in its own virtual environment. This command fails if just using the schema in the yml (see error below). The way to resolve it is to use the catalog file I attached previously and to set the TAP_SHOPIFY__CATALOG environment variable within the virtualenv.
line 22, in get_type
    raise ValueError(
ValueError: 'type' or 'anyOf' are required fields in property: {}
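For what it's worth, that ValueError suggests some property in the schema reaching target-bigquery is an empty object {}. A minimal sketch modeled on the error message (not target-bigquery's actual source) of the kind of check that raises it:

```python
# Sketch: a Singer-to-BigQuery type lookup must find "type" or "anyOf"
# on every property; an empty property schema {} has neither.
def get_type(prop: dict):
    """Return the JSON Schema type(s) declared on a property."""
    if "anyOf" in prop:
        return prop["anyOf"]
    if "type" in prop:
        t = prop["type"]
        return t if isinstance(t, list) else [t]
    raise ValueError(f"'type' or 'anyOf' are required fields in property: {prop}")

# A property pruned down to {} -- e.g. by a hand-edited catalog or an
# override that replaced the full schema -- triggers exactly this failure:
try:
    get_type({})
except ValueError as e:
    print(e)  # prints: 'type' or 'anyOf' are required fields in property: {}
```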
nicholas_degiacomo
11/06/2021, 9:01 PM
visch
11/06/2021, 9:04 PM
visch
11/06/2021, 9:05 PM
nicholas_degiacomo
11/06/2021, 9:15 PM
visch
11/06/2021, 9:25 PM
What's curious is
> subtotal_price string COMMENT 'from deserializer',
With your override this should be an object. Something's causing that not to happen.
My best two guesses without diving in:
1. Either https://github.com/andrewcstewart/target-athena/blob/master/target_athena/utils.py#L162 isn't working at all (I doubt it, but I don't know)
2. The schema isn't actually set properly for subtotal_price
If you run meltano invoke tap-shopify > out and look at the schema for the orders stream, I wonder if subtotal_price is actually being set to object properly. If that's working, then I'd run cat out | meltano invoke target-athena and see if athena properly sets the tables for you. Then it's more obvious where to go, I think.
nicholas_degiacomo
11/06/2021, 9:33 PM
nicholas_degiacomo
11/06/2021, 9:39 PM
"subtotal_price": {"format": "singer.decimal", "type": ["null", "string"]}
visch
11/08/2021, 1:37 AM
subtotal_price is just wrong: the field you're overriding to be an object is actually subtotal_price_set, so I'm going to focus on that field right now.
Focusing just on subtotal_price_set, in the schema that you posted it is set properly to an object:
{"type": "SCHEMA", "stream": "orders", "schema": {"properties": {"presentment_currency": {"type": ["null", "string"]}, "subtotal_price_set": {"type": ["object", "null"]},...
From the Athena DDL:
`subtotal_price_set` string COMMENT 'from deserializer',
Odd for sure.
From here, the next steps I'd take are:
1. Take the schema data you listed for the orders stream.
2. Snag one record line for the orders stream and make a test file with just that; let's call it orders_test.
3. Run cat orders_test | meltano invoke target-athena; that should replicate the error again (I'd try to point to a different database just to separate your tests from the stuff in production).
4. After you have that easy file around to do some testing, you have a failing test and we just need to make it pass.
5. I'd throw a log line at https://github.com/andrewcstewart/target-athena/blob/master/target_athena/utils.py#L162 (or set up a breakpoint there in a debugger) and just see what's happening. Is this condition getting hit? What is attributes['type'] when subtotal_price_set is run through it?
visch
11/08/2021, 1:37 AM
Another option to skip all the debugging would be to change the column type of subtotal_price_set in athena to a struct instead.
nicholas_degiacomo
11/08/2021, 5:58 PM
meltano invoke tap-shopify > output_test.json
I then opened up output_test.json and picked a random row of type RECORD. I copied everything to the right of
{"type": "RECORD", "stream": "orders", "record":
into a file orders_test. When I tried to run
cat orders_test | meltano invoke target-athena
I got the following error:
RuntimeError: Config validation failed: 's3_bucket' is a required property
Failed validating 'required' in schema:
{'properties': {'add_metadata_columns': {'type': ['boolean', 'null']},
'athena_database': {'type': ['string']},
'aws_access_key_id': {'type': ['string', 'null']},
'aws_profile': {'type': ['string', 'null']},
'aws_region': {'type': ['string']},
'aws_secret_access_key': {'type': ['string', 'null']},
'aws_session_token': {'type': ['string', 'null']},
'compression': {'default': 'gzip',
'type': ['string', 'null']},
'delimiter': {'default': ',',
'type': ['string', 'null']},
'encryption_key': {'type': ['string', 'null']},
'encryption_type': {'type': ['string', 'null']},
'flatten_records': {'type': ['boolean', 'null']},
'naming_convention': {'type': ['string', 'null']},
'object_format': {'default': 'csv',
'type': ['string', 'null']},
'quotechar': {'default': '"',
'type': ['string', 'null']},
's3_bucket': {'type': ['string']},
's3_key_prefix': {'type': ['string', 'null']},
'stream_map_config': {'properties': {},
'type': ['object', 'null']},
'stream_maps': {'properties': {},
'type': ['object', 'null']},
'temp_dir': {'type': ['string', 'null']}},
'required': ['s3_bucket', 'athena_database', 'aws_region'],
'type': 'object'}
On instance:
{'aws_region': 'us-west-2',
'compression': 'gzip',
'delimiter': ',',
'object_format': 'jsonl',
'quotechar': '"',
'schema_name': 'shopify'}
JSONSchema was: {'type': 'object', 'properties': {'s3_bucket': {'type': ['string']}, 'athena_database': {'type': ['string']}, 'aws_region': {'type': ['string']}, 'aws_access_key_id': {'type': ['string', 'null']}, 'aws_secret_access_key': {'type': ['string', 'null']}, 'aws_session_token': {'type': ['string', 'null']}, 'aws_profile': {'type': ['string', 'null']}, 's3_key_prefix': {'type': ['string', 'null']}, 'naming_convention': {'type': ['string', 'null']}, 'object_format': {'type': ['string', 'null'], 'default': 'csv'}, 'compression': {'type': ['string', 'null'], 'default': 'gzip'}, 'encryption_type': {'type': ['string', 'null']}, 'encryption_key': {'type': ['string', 'null']}, 'add_metadata_columns': {'type': ['boolean', 'null']}, 'flatten_records': {'type': ['boolean', 'null']}, 'delimiter': {'type': ['string', 'null'], 'default': ','}, 'quotechar': {'type': ['string', 'null'], 'default': '"'}, 'temp_dir': {'type': ['string', 'null']}, 'stream_maps': {'type': ['object', 'null'], 'properties': {}}, 'stream_map_config': {'type': ['object', 'null'], 'properties': {}}}, 'required': ['s3_bucket', 'athena_database', 'aws_region']}
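Per the "On instance" dump, the invoked loader is missing the required s3_bucket and athena_database settings; when running meltano invoke target-athena directly, those must be resolvable from meltano.yml or the environment, just as in the elt run. A hypothetical fragment covering the three required settings (bucket and database names are placeholders):

```yaml
# Sketch: required target-athena settings in meltano.yml (values are placeholders)
plugins:
  loaders:
  - name: target-athena
    config:
      s3_bucket: my-athena-staging-bucket
      athena_database: shopify_test
      aws_region: us-west-2
```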
Not sure if that error is helpful at all.
Is the point of just taking a single order in orders_test just to speed up the failure process? One of the stores I'm pulling from has a very small amount of data, so it's not that much of a time save to rerun the whole pipe, so can I skip to 5? Thanks for showing that though, it can be useful in the future. Where can I find the code for target-athena relative to my meltano project?
nicholas_degiacomo
11/08/2021, 5:59 PM
> Another option to skip all the debugging would be to change the column type of subtotal_price_set in athena to a struct instead
Is there a shortcut other than copying to a new column and casting to struct by specifying the full schema? Can I get the assumed struct meltano wants to use anywhere?
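On "can I get the assumed struct anywhere": a hedged sketch (not target-athena's actual code, and the nested shape of the Shopify money-set fields here is an assumption) of how an object property with known sub-properties could be rendered as an Athena STRUCT. It also illustrates why an override that says only type: [object, 'null'] with no properties leaves a target nothing to build a struct from, so falling back to string is a plausible outcome:

```python
# Sketch (illustrative, not target-athena's implementation): render a
# JSON Schema property as an Athena DDL type, falling back to string.
def athena_type(prop: dict) -> str:
    types = prop.get("type", [])
    if isinstance(types, str):
        types = [types]
    types = [t for t in types if t != "null"]  # Athena columns are nullable anyway
    if types == ["object"]:
        inner = prop.get("properties", {})
        if not inner:
            return "string"  # no known shape: nothing to build a struct from
        fields = ",".join(f"{k}:{athena_type(v)}" for k, v in inner.items())
        return f"struct<{fields}>"
    scalar = {"integer": "bigint", "number": "double", "boolean": "boolean"}
    return scalar.get(types[0], "string") if types else "string"

# Assumed expansion of a Shopify money-set field (shop_money sub-object):
money_set = {
    "type": ["object", "null"],
    "properties": {
        "shop_money": {
            "type": ["object", "null"],
            "properties": {
                "amount": {"type": ["null", "string"]},
                "currency_code": {"type": ["null", "string"]},
            },
        },
    },
}
print(athena_type(money_set))
# prints: struct<shop_money:struct<amount:string,currency_code:string>>
```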
visch
11/08/2021, 6:11 PM
> RuntimeError: Config validation failed: 's3_bucket' is a required property
Have you tried to fix this one?
> Can I get the assumed struct meltano wants to use anywhere?
It's target-athena, not meltano. Subtle, but a very key difference.
> Is there a shortcut other than copying to a new column and casting to struct by specifying the full schema?
Maybe, I don't know. target-athena can do it based on the code I linked to, but you have to debug that.
nicholas_degiacomo
11/08/2021, 7:17 PM
nicholas_degiacomo
11/08/2021, 7:43 PM
visch
11/08/2021, 11:38 PM
There's a meltano.yml in the repo here https://github.com/MeltanoLabs/target-athena/blob/main/meltano.yml which points to the local executable and shows how to run a local copy pretty easily.
visch
11/08/2021, 11:39 PM
> So do I need to branch target-athena, make a change, push the code, and in the meltano.yml file point to my edited repo? Or can I reference a local version of my code?
Both can work, however you like to work.
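For reference, both options come down to the plugin's pip_url in meltano.yml: point it at a fork/branch or at a local editable checkout (the URL and path below are placeholders):

```yaml
# Sketch: two ways to install a modified target-athena (placeholder URL/path)
plugins:
  loaders:
  - name: target-athena
    # from a fork/branch:
    pip_url: git+https://github.com/YOUR_USER/target-athena.git@my-fix
    # or from a local editable checkout:
    # pip_url: -e /path/to/target-athena
```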
visch
11/08/2021, 11:40 PM