# singer-tap-development
trinath
Hi team, I am working on the Shopify tap which the Matatika team built (THANK YOU SO MUCH!!). I think the customer stream is failing when an empty array is returned in one of the attributes in the JSON response. Would you happen to know if I am dealing with some Python version issue or a BigQuery problem? Below are screenshots of the data and the error: (1) Screenshot #1 is the data from the stream, (2) Screenshot #2 is where I think the customer.json schema is failing, (3) Screenshot #3 is the error I am receiving. Any thoughts on where the issue might be? @aaron_phethean @Reuben (Matatika)
dan_ladd
target-bigquery doesn't allow empty records like that. You have to fill out any empty `{}` in the schema. I believe some database targets allow it, but not BigQuery. Ideally, the target should handle those fields as JSON. You can overcome this by using `--catalog your-fixed-schema.json` when running `meltano elt`.
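For illustration, the kind of fragment Dan is describing might look like this in the stream's schema: an array property whose `items` schema was left as an empty object (a hypothetical sketch, not copied from the actual tap).

```json
{
  "tax_exemptions": {
    "type": ["null", "array"],
    "items": {}
  }
}
```

BigQuery columns need a concrete type, so an empty `{}` like this gives the target nothing to map the field to.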
trinath
Thank you, Dan! Can you clarify what a sample JSON for "your-fixed-schema.json" would look like so that I can test-run it? Sorry, I am new to this stuff.
dan_ladd
Here's what a catalog file looks like: https://hub.meltano.com/singer/spec/#catalog-files. You would have to add in the schema you have here, and fill out the empty `{}` in `tax_exemptions` so that it describes whatever your data looks like.
If you don't need `tax_exemptions`, you can probably just remove it from your catalog and it should work.
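A minimal sketch of the filled-out version, assuming `tax_exemptions` is an array of string exemption codes (verify against your own data; that the array holds plain strings is an assumption here):

```json
{
  "tax_exemptions": {
    "type": ["null", "array"],
    "items": {
      "type": ["null", "string"]
    }
  }
}
```

With that in place in your fixed catalog, you would run something like `meltano elt tap-shopify target-bigquery --catalog your-fixed-schema.json`.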
aaron_phethean
Great work @dan_ladd - to confirm my understanding, we should make a PR to target-bigquery to fix it for everyone? We currently test daily with a Postgres target and weekly with a Snowflake target. Adding BigQuery testing has been on our plan for a while!
Glad you are getting use from the tap, @trinath. Let me know if you need any help. Another item on our plan was to release dbt models to the dbt hub. DM me if you want to chat about that or anything else.
dan_ladd
Yeah, ideally this would be fixed in the default target-bigquery with support for the JSON type. I haven't had a chance to look into it yet. @alexander_butler has been working on a new SDK-based target-bigquery that likely won't run into this issue, since everything is initially ingested as raw JSON, which can then be parsed out.
alexander_butler
Yeah, it's on the agenda to put it on the hub, but as-is it will already handle this case by using a fixed schema where all data is loaded into a JSON column. You can unpack it with dbt during staging. It can also automatically generate the DDL to create a view on top of the ingested data, which should unpack it and provide typing for you.
There are 3 denormalized load pathways (unpacking in flight) and 4 fixed-schema pathways, leveraging every BigQuery load method: streaming, the Storage Write API, a GCS staging bucket feeding a load job, and a batch load job from memory.
trinath
Thank you all for the help, really appreciate it. I think I might end up using Postgres to land all the data and then maybe load it into BigQuery for any analytic workloads. It seems like directly loading into BigQuery might not be a production-ready approach.
alexander_butler
I think the risk/reward of trying my variant of target-bigquery with tap-shopify to see if it just works might obviate the added complexity of a two-step pipeline? 🤷 I use it in production across many taps successfully, but I have no vested interest at the same time, so do what works for you 😃
trinath
Ah, I see the approach you are recommending. I will give this a shot. Appreciate the advice.