# troubleshooting
m
Hi, we have set up a pipeline for snapchat-to-bigquery; it's finally working and records are being copied into BigQuery. However, we are getting issues when trying to send data for the `creatives` and `campaign` schemas. The issue shows up in the log file. Please have a look below and let us know how we can solve this error so we can include these schemas as well:
```
2022-04-12T12:47:26.221664Z [info ] INFO Stream audience_segments, batch processed 6 records cmd_type=extractor job_id=snapchat-to-bigquery name=tap-snapchat-ads run_id=b5cf1c5e-0a68-4da3-bfd2-08c1a8fb8acf stdio=stderr
2022-04-12T12:47:26.221941Z [info ] INFO Synced Stream: audience_segments, page: 1, records: 1 to 6 cmd_type=extractor job_id=snapchat-to-bigquery name=tap-snapchat-ads run_id=b5cf1c5e-0a68-4da3-bfd2-08c1a8fb8acf stdio=stderr
2022-04-12T12:47:26.222196Z [info ] INFO Write state for Stream: audience_segments, ad_account ID: b9770de9-90e7-4f79-b952-31ce66c44570, value: 2022-03-11T17:46:05.784000Z cmd_type=extractor job_id=snapchat-to-bigquery name=tap-snap>
2022-04-12T12:47:26.222451Z [info ] INFO FINISHED Sync for Stream: audience_segments, parent_id: b9770de9-90e7-4f79-b952-31ce66c44570, total_records: 6 cmd_type=extractor job_id=snapchat-to-bigquery name=tap-snapchat-ads run_id=b5cf>
2022-04-12T12:47:26.222704Z [info ] INFO START Syncing: creatives cmd_type=extractor job_id=snapchat-to-bigquery name=tap-snapchat-ads run_id=b5cf1c5e-0a68-4da3-bfd2-08c1a8fb8acf stdio=stderr
2022-04-12T12:47:26.222977Z [debug ] {"type": "STATE", "value": {"currently_syncing": "organizations", "bookmarks": {"funding_sources": {"updated_at(parent_organization_id:5a044e78-3367-4f4b-91f5-54a603db56b2)": "2022-03-11T19:20:37.>
2022-04-12T12:47:26.223400Z [debug ] {"type": "SCHEMA", "stream": "creatives", "schema": {"properties": {"id": {"type": ["null", "string"]}, "updated_at": {"format": "date-time", "type": ["null", "string"]}, "created_at": {"format": >
2022-04-12T12:47:26.223698Z [info ] INFO START Sync for Stream: creatives, parent_stream: ad_accounts, parent_id: b9770de9-90e7-4f79-b952-31ce66c44570 cmd_type=extractor job_id=snapchat-to-bigquery name=tap-snapchat-ads run_id=b5cf1>
2022-04-12T12:47:26.223868Z [info ] INFO timezone = America/Los_Angeles cmd_type=extractor job_id=snapchat-to-bigquery name=tap-snapchat-ads run_id=b5cf1c5e-0a68-4da3-bfd2-08c1a8fb8acf stdio=stderr
2022-04-12T12:47:26.224060Z [info ] INFO START Sync for Stream: creatives cmd_type=extractor job_id=snapchat-to-bigquery name=tap-snapchat-ads run_id=b5cf1c5e-0a68-4da3-bfd2-08c1a8fb8acf stdio=stderr
2022-04-12T12:47:26.224263Z [info ] INFO Updating state with {'currently_syncing': 'organizations', 'bookmarks': {'funding_sources': {'updated_at(parent_organization_id:5a044e78-3367-4f4b-91f5-54a603db56b2)': '2022-03-11T19:20:37.90>
2022-04-12T12:47:26.224433Z [info ] INFO creatives schema: {'properties': {'id': {'type': ['null', 'string']}, 'updated_at': {'format': 'date-time', 'type': ['null', 'string']}, 'created_at': {'format': 'date-time', 'type': ['null',>
2022-04-12T12:47:26.224623Z [info ] WARNING the pipeline might fail because of undefined fields: an empty object/dictionary indicated as {} cmd_type=loader job_id=snapchat-to-bigquery name=target-bigquery run_id=b5cf1c5e-0a68-4da3-b>
2022-04-12T12:47:26.557134Z [info ] INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.33484792709350586, "tags": {"endpoint": "creatives", "http_status_code": 200, "status": "succeeded"}} cmd_type=extracto>
2022-04-12T12:47:26.558867Z [debug ] {"type": "RECORD", "stream": "creatives", "record": {"id": "0b5196a1-0a21-4c8e-87fe-6ce9b26d0dd6", "updated_at": "2022-03-11T19:32:24.797000Z", "created_at": "2022-03-11T19:20:52.320000Z", "name":>
2022-04-12T12:47:26.564066Z [info ] CRITICAL 'RECORD' cmd_type=loader job_id=snapchat-to-bigquery name=target-bigquery run_id=b5cf1c5e-0a68…
```
p
Hey @mohammad_alam! Are you able to run this Snapchat sync fully without the target? You can try something like
```
meltano --log-level=debug invoke tap-snapchat > output.json
```
to write the data to a file. I'm trying to understand if the issue is related to extracting data from Snapchat or loading data to BQ.
m
Hi @pat_nadolny, thanks for your reply. I checked using the command you shared; it seems the tap extracted the `campaigns` records too, since I searched for them and found them in the `output.json` file. Please have a look.
I would appreciate it if you could come back to this point, @pat_nadolny.
p
@mohammad_alam I'm not totally sure what's going on here; it's definitely something with the target. I'm able to run your output through target-jsonl, so it's not a JSON schema validation issue. I personally haven't used BigQuery, so I'm not able to help test, though. There's nothing obviously wrong with the schema or record when I looked at them. You can try
```
cat output.json | meltano --log-level=debug invoke target-bigquery
```
to confirm that exact output is failing (not sure if that gets you anywhere).
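(Editor's note: inspecting the tap's `output.json` by hand, as suggested above, can be sketched with a small helper. This is purely illustrative and not part of Meltano or the tap; the function name is hypothetical. It pulls the `SCHEMA` message and the first few `RECORD` messages for one stream so they can be eyeballed.)

```python
import json

def inspect_stream(path, stream, max_records=3):
    """Return (schema, records) for one Singer stream from a tap output file.

    Skips non-JSON lines (e.g. stray log output) and messages for other
    streams; keeps at most `max_records` records.
    """
    schema, records = None, []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                msg = json.loads(line)
            except json.JSONDecodeError:
                continue  # not a Singer message, skip it
            if msg.get("stream") != stream:
                continue
            if msg.get("type") == "SCHEMA":
                schema = msg["schema"]
            elif msg.get("type") == "RECORD" and len(records) < max_records:
                records.append(msg["record"])
    return schema, records
```

For example, `inspect_stream("output.json", "creatives")` would show whether the `creatives` schema the tap emitted actually contains type information for every field.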
Could this be related https://github.com/adswerve/target-bigquery/issues/32? Hey @ruslan_bergenov 👋 I saw you were on the issue - any ideas if this is related?
cc @edgar_ramirez_mondragon @aaronsteers in case you might see something I don't
r
@mohammad_alam, @pat_nadolny, the target gives a warning that the JSON schema is not complete: it contains an instance of empty properties, indicated as {}.
```
2022-04-12T12:47:26.224623Z [info     ] WARNING the pipeline might fail because of undefined fields: an empty object/dictionary indicated as {} cmd_type=loader job_id=snapchat-to-bigquery name=target-bigquery run_id=b5cf1c5e-0a68-4da3-b>
```
Later, the target tries to convert each JSON field to a BigQuery field. To do that, the target needs to know which BigQuery data type to convert each JSON data type to. Because data types are not specified in the JSON schema, the target fails. Our recommendation is to make sure the JSON schema is complete: no field should have an instance of an empty object/dictionary {}; each field should have an actual data type specified. After that, we can pass the JSON schema via the tap-catalog.json file during the sync.
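(Editor's note: the check Ruslan describes can be sketched roughly as follows. This is not target-bigquery's actual code; the function name is hypothetical. It walks a stream's JSON schema and reports any property whose definition is an empty `{}`, i.e. a field with no declared type for the target to map to a BigQuery column type.)

```python
def find_untyped_fields(schema, prefix=""):
    """List dotted paths of properties defined as an empty object {}."""
    untyped = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if not spec:  # the `{}` case from the warning message
            untyped.append(path)
        elif isinstance(spec, dict) and "properties" in spec:
            # recurse into nested object schemas
            untyped.extend(find_untyped_fields(spec, prefix=f"{path}."))
    return untyped
```

Running something like this over the `SCHEMA` message for `creatives` in `output.json` would pinpoint exactly which fields need types filled in before the catalog is passed to the sync.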
p
@ruslan_bergenov Thanks for the response! Very helpful. Again, I'm not super familiar with BQ or the target. I know that it's a best practice to define the schema in full so data is consistent, but there's also been discussion about having targets include a data type failsafe, i.e. string, where possible. If I understand the issue here, having a failsafe would allow data to be loaded rather than failing, with the downside of the data type being a more generic string vs. the exact data type. What do you think about that? Am I understanding the problem properly? Is that possible with BQ?
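(Editor's note: the failsafe Pat proposes could look something like the sketch below. The mapping table and function name are illustrative, not target-bigquery's real conversion logic: instead of failing on an empty `{}` or unrecognized schema, the field falls back to a generic BigQuery STRING column.)

```python
# Simplified JSON Schema type -> BigQuery column type mapping (illustrative).
_JSON_TO_BQ = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
    "object": "STRING",  # real targets may use RECORD; simplified here
    "array": "STRING",
}

def bq_type(field_schema, failsafe="STRING"):
    """Pick a BigQuery type for a field; fall back instead of failing."""
    types = field_schema.get("type", [])
    if isinstance(types, str):
        types = [types]
    for t in types:
        if t != "null" and t in _JSON_TO_BQ:
            return _JSON_TO_BQ[t]
    return failsafe  # empty {} or unknown type: load as a generic string
```

With this behavior, the `creatives` sync above would load the untyped field as a string instead of crashing with `CRITICAL 'RECORD'`, at the cost of losing the exact type.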
r
@pat_nadolny, "What do you think about that? Is that possible with BQ?" Yes, that makes sense, and it should be possible with BQ. We will add this to the wish list/roadmap for target-bigquery. "Am I understanding the problem properly?" Yes, I think you do. 🙂
p
@ruslan_bergenov awesome, that would be ideal! I created an issue in the repo to track it https://github.com/adswerve/target-bigquery/issues/35 cc @edgar_ramirez_mondragon @aaronsteers
r
@pat_nadolny, thank you for submitting an issue! 👍