# troubleshooting
j
I'm running into an issue with the `target-bigquery` loader: it hangs after reaching a certain point. I'm creating a new `users` table in BigQuery inside a dataset that already exists. The loader successfully creates the table (though without any non-generic columns) but doesn't write any rows to it. Here are the logs from when I run `meltano run tap-healthie target-bigquery` (`tap-healthie` is a custom extractor I've written):
```
Environment 'dev' is active
Beginning full_table sync of 'users'...
Tap has custom mapper. Using 1 provided map(s).
METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.453015, "tags": {"stream": "users", "endpoint": "", "http_status_code": 200, "status": "succeeded"}}
METRIC: {"type": "counter", "metric": "http_request_count", "value": 1, "tags": {"stream": "users", "endpoint": ""}}
METRIC: {"type": "timer", "metric": "sync_duration", "value": 0.5560169219970703, "tags": {"stream": "users", "context": {}, "status": "succeeded"}}
METRIC: {"type": "counter", "metric": "record_count", "value": 2, "tags": {"stream": "users", "context": {}}}
Using thread-based parallelism
Target 'target-bigquery' is listening for input from tap.
Initializing 'target-bigquery' target sink...
Initializing target sink for stream 'users'...
Setting up users
Target 'target-bigquery' completed reading 4 lines of input (2 records, (0 batch manifests, 1 state messages).
Adding worker 3486bcdbbde445e683c5c1447695f005
google.api_core.bidi | Thread-ConsumeBidirectionalStream exiting
```
This hangs for at least 10 minutes on the `google.api_core.bidi | Thread-ConsumeBidirectionalStream exiting` step before I kill the job. Any ideas why the BigQuery target would hang like this?
Here's my config in `meltano.yml`:
```yaml
- name: target-bigquery
  variant: z3z1ma
  pip_url: git+https://github.com/z3z1ma/target-bigquery.git
  config:
    credentials_json: ${BIGQUERY_CREDENTAILS_JSON}
    project: analytics-prod-383519
    dataset: source_healthie
    flattening_enabled: True
    flattening_max_depth: 1
    location: US
```
a
Try running the pipeline again like this:
```
TARGET_BIGQUERY_DEBUG=true meltano --log-level=debug run tap-healthie target-bigquery
```
We should be able to figure it out pretty quickly.
j
```
2023-04-13T08:20:11.368555Z [info     ] grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with: cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-04-13T08:20:11.368662Z [info     ]         status = StatusCode.PERMISSION_DENIED cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2023-04-13T08:20:11.368764Z [info     ]         details = "Permission 'TABLES_UPDATE_DATA' denied on resource 'projects/analytics-prod-383519/datasets/source_healthie/tables/users': Streaming insert is not allowed in free tier." cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
```
🎯
I changed the `method` config to `batch_job` and the data is now loading. Thanks, Alex!
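For reference, here is a sketch of the loader entry from the `meltano.yml` above with that change applied. Only `method: batch_job` is new. Every other key is copied from the original config. The comment restates the error message from the debug logs and does not describe documented behavior.

```yaml
- name: target-bigquery
  variant: z3z1ma
  pip_url: git+https://github.com/z3z1ma/target-bigquery.git
  config:
    credentials_json: ${BIGQUERY_CREDENTAILS_JSON}
    project: analytics-prod-383519
    dataset: source_healthie
    flattening_enabled: True
    flattening_max_depth: 1
    location: US
    # Switched to batch load jobs after the previous method failed with
    # "Streaming insert is not allowed in free tier."
    method: batch_job
```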
However, even with `flattening_enabled: True`, the source data is not being flattened. Do you know why? Currently all the data loaded into BigQuery ends up in a single `data` column as JSON.
a
That is because of the `denormalized` setting (a bit further down in this table: https://github.com/z3z1ma/target-bigquery#settings). It is `false` by default, which means the target wraps everything into a `data` column so it can support any tap regardless of schema quality or stability. Set it to `true` to denormalize into independent columns.
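Here is a minimal sketch of the corresponding `config` change. It assumes `denormalized` takes a plain boolean, as described above. `method: batch_job` is carried over from the earlier fix, and the remaining keys are elided because they are unchanged.

```yaml
- name: target-bigquery
  variant: z3z1ma
  config:
    # ...credentials_json, project, dataset, location, and flattening settings as before...
    method: batch_job
    # false (the default): each record is wrapped into a single JSON `data` column.
    # true: each property is loaded into its own column.
    denormalized: true
```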
j
I just realized you're a contributor to the `target-bigquery` package (based on this?). Thank you for creating and maintaining it, and for the help!! 🙇
I was able to set `denormalized: true` for two extractors that explicitly create typed columns, which is great. However, I just tried using this target to load data from `pipelinewise-tap-postgres` and am running into this error from the BigQuery target:
```
2023-04-18 161024,644 | INFO | target-bigquery | Initializing 'target-bigquery' target sink...
2023-04-18 161024,644 | INFO | target-bigquery | Initializing target sink for stream 'public-Prescription'...
2023-04-18 161025,131 | INFO | root | HERE:
2023-04-18 161025,131 | INFO | root | {'$ref': '#/definitions/sdc_recursive_number_array'}
2023-04-18 161025,131 | INFO | root | None
Traceback (most recent call last):
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/bin/target-bigquery", line 8, in <module>
    sys.exit(TargetBigQuery.cli())
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/target_base.py", line 578, in cli
    target.listen(file_input)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/io_base.py", line 34, in listen
    self._process_lines(file_input)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/target_base.py", line 278, in _process_lines
    counter = super()._process_lines(file_input)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/io_base.py", line 78, in _process_lines
    self._process_schema_message(line_dict)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/target_base.py", line 378, in _process_schema_message
    _ = self.get_sink(
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/target.py", line 472, in get_sink
    return self.add_sink(stream_name, schema, key_properties)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/singer_sdk/target_base.py", line 240, in add_sink
    sink = sink_class(
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/batch_job.py", line 101, in __init__
    super().__init__(*args, **kwargs)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/core.py", line 293, in __init__
    self.create_target(key_properties=key_properties)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/Users/jacob/code/miga/analytics/etl/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/jacob/code/miga/analytics/etl/.meltano…
```
Can you also explain what the `flattening_enabled` config does? Before this thread, I assumed flattening did what `denormalized` does.