# singer-tap-development
trinath
Hi team, I am working on the Shopify tap which the Matatika team built (THANK YOU SO MUCH!!). I think the customer stream is failing when an empty array is returned in one of the attributes in the JSON response. Would you happen to know if I am dealing with some Python version issue or a BigQuery problem? Below are screenshots of the data and the error: (1) Screenshot #1 is the data from the stream, (2) Screenshot #2 is where I think the customer.json schema is failing, (3) Screenshot #3 is the error I am receiving. Any thoughts on where the issue might be? @aaron_phethean @Reuben (Matatika)
dan_ladd
target-bigquery doesn't allow empty records like that. You have to fill out any empty `{}` in the schema. I believe some database targets allow it, but not BigQuery. Ideally, the target should handle those fields as JSON. You can overcome this by using `--catalog your-fixed-schema.json` when running `meltano elt`.
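For illustration, the kind of fragment Dan is describing might look like this in the stream's schema: an array property whose `items` schema was left as an empty object (a hypothetical sketch, not copied from the actual tap).

```json
{
  "tax_exemptions": {
    "type": ["null", "array"],
    "items": {}
  }
}
```

BigQuery columns need a concrete type, so an empty `{}` like this gives the target nothing to map the field to.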
trinath
Thank you, Dan! Can you clarify what a sample JSON for "your-fixed-schema.json" would look like so that I can test-run it? Sorry, I am new to this stuff.
dan_ladd
Here's what a catalog file looks like: https://hub.meltano.com/singer/spec/#catalog-files. You would have to add in the schema you have here, and fill out the empty `{}` in `tax_exemptions` so that it describes whatever your data looks like.
If you don't need `tax_exemptions`, you can probably just remove it from your catalog and it should work.
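A minimal sketch of the filled-out version, assuming `tax_exemptions` is an array of string exemption codes (verify against your own data; that the array holds plain strings is an assumption here):

```json
{
  "tax_exemptions": {
    "type": ["null", "array"],
    "items": {
      "type": ["null", "string"]
    }
  }
}
```

With that in place in your fixed catalog, you would run something like `meltano elt tap-shopify target-bigquery --catalog your-fixed-schema.json`.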
aaron_phethean
Great work @dan_ladd - to confirm my understanding, we should make a PR to target-bigquery to fix it for everyone? We currently test daily with a Postgres target and weekly with a Snowflake target. Adding BigQuery testing has been on our plan for a while!
Glad you are getting use from the tap, @trinath. Let me know if you need any help. Another item on our plan was to release dbt models to the dbt hub. DM me if you want to chat about that or anything else.
dan_ladd
Yeah, ideally this would be fixed in the default target-bigquery with support for the JSON type. I haven't had a chance to look into it yet. @alexander_butler has been working on a new SDK-based target-bigquery that likely won't run into this issue, since everything is initially ingested as raw JSON, which can then be parsed out.
alexander_butler
Yeah, it's on the agenda to put it on the hub, but as-is it will already handle this case by using a fixed schema where all data is loaded into a JSON column. You can unpack it with dbt during staging. It can also automatically generate the DDL to create a view on top of the ingested data, which should unpack it and provide typing for you.
There are 3 denormalized load pathways (unpacking in flight) and 4 fixed-schema pathways, leveraging every BigQuery load method: streaming, the Storage Write API, a GCS staging bucket feeding a load job, and a batch load job from memory.
trinath
Thank you all for the help, really appreciate it. I think I might end up using Postgres to land all the data and then maybe load it into BigQuery for any analytic workloads. It seems like directly loading into BigQuery might not be a production-ready approach.
alexander_butler
I think the risk/reward of trying my variant of target-bigquery with tap-shopify to see if it just works might obviate the added complexity of a two-step pipeline? 🤷 I use it in production across many taps successfully, but I have no vested interest at the same time, so do what works for you 😃
trinath
Ah, I see the approach you are recommending. I will give this a shot. Appreciate the advice.