#announcements

bulky-park-65916

01/22/2021, 3:25 PM
Hi! Yesterday one of my taps (tap-facebook) crashed with the following error: Main error:
[2021-01-22 09:21:53,814] {bash_operator.py:157} INFO - target-bigquery | CRITICAL 400 Provided Schema does not match Table ads_insights_platform_and_device. Field impression_device has changed mode from NULLABLE to REQUIRED
[2021-01-22 09:21:53,972] {bash_operator.py:157} INFO - ELT could not be completed: Target failed
Did the schema change on FB's side, or is it a Meltano-related issue? Why is a change from NULLABLE to REQUIRED causing an issue? Additional info, full log:
[2021-01-22 09:21:53,814] {bash_operator.py:157} INFO - target-bigquery | CRITICAL 400 Provided Schema does not match Table ads_insights_platform_and_device. Field impression_device has changed mode from NULLABLE to REQUIRED
[2021-01-22 09:21:53,817] {bash_operator.py:157} INFO - target-bigquery | CRITICAL ['Traceback (most recent call last):\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/target_bigquery/__init__.py", line 93, in main\n    for state in state_iterator:\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/target_bigquery/process.py", line 63, in process\n    for s in handler.on_stream_end():\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/target_bigquery/processhandler.py", line 260, in on_stream_end\n    self._do_temp_table_based_load(rows)\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/target_bigquery/processhandler.py", line 168, in _do_temp_table_based_load\n    raise e\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/target_bigquery/processhandler.py", line 160, in _do_temp_table_based_load\n    job_config=copy_config\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/google/cloud/bigquery/job.py", line 812, in result\n    return super(_AsyncJob, self).result(timeout=timeout)\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/google/api_core/future/polling.py", line 130, in result\n    raise self._exception\n', 'google.api_core.exceptions.BadRequest: 400 Provided Schema does not match Table digitalabi:abi.ads_insights_platform_and_device. Field impression_device has changed mode from NULLABLE to REQUIRED\n']
[2021-01-22 09:21:53,953] {bash_operator.py:157} INFO - meltano         | Loading failed (2): CRITICAL ['Traceback (most recent call last):\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/target_bigquery/__init__.py", line 93, in main\n    for state in state_iterator:\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/target_bigquery/process.py", line 63, in process\n    for s in handler.on_stream_end():\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/target_bigquery/processhandler.py", line 260, in on_stream_end\n    self._do_temp_table_based_load(rows)\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/target_bigquery/processhandler.py", line 168, in _do_temp_table_based_load\n    raise e\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/target_bigquery/processhandler.py", line 160, in _do_temp_table_based_load\n    job_config=copy_config\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/google/cloud/bigquery/job.py", line 812, in result\n    return super(_AsyncJob, self).result(timeout=timeout)\n', '  File "/projects/.meltano/loaders/target-bigquery/venv/lib/python3.6/site-packages/google/api_core/future/polling.py", line 130, in result\n    raise self._exception\n', 'google.api_core.exceptions.BadRequest: 400 Provided Schema does not match Table digitalabi:abi.ads_insights_platform_and_device. Field impression_device has changed mode from NULLABLE to REQUIRED\n']
[2021-01-22 09:21:53,972] {bash_operator.py:157} INFO - ELT could not be completed: Target failed
[2021-01-22 09:21:54,345] {bash_operator.py:161} INFO - Command exited with return code 1
[2021-01-22 09:21:54,386] {taskinstance.py:1150} ERROR - Bash command failed
Traceback (most recent call last):
  File "/projects/.meltano/orchestrators/airflow/venv/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/projects/.meltano/orchestrators/airflow/venv/lib/python3.6/site-packages/airflow/operators/bash_operator.py", line 165, in execute
    raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed

ripe-musician-59933

01/22/2021, 4:15 PM
@bulky-park-65916 tap-facebook still lists the type of the impression_device property as ["null", "string"] (https://gitlab.com/meltano/tap-facebook/-/blob/master/tap_facebook/schemas/ads_insights_platform_and_device.json#L17-22), so I'm not sure where target-bigquery is now getting the idea that the field should be required rather than nullable. 😕
If you run in debug mode (with meltano --log-level=debug), can you share the SCHEMA message for the ads_insights_platform_and_device stream that includes the impression_device property? I'd like to see if the type there still says ["null", "string"].
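
For reference, a minimal sketch of the convention being discussed here: in a Singer JSON Schema, a property whose type list includes "null" is optional, so a target would normally be expected to create a NULLABLE column for it. The helper name bq_mode_for is hypothetical and this is not target-bigquery's actual code, only an illustration of the expected mapping (it ignores arrays, which BigQuery models as REPEATED).

```python
def bq_mode_for(json_schema_type):
    """Map a JSON Schema "type" value to a BigQuery column mode.

    Hypothetical helper for illustration; not taken from target-bigquery.
    """
    # "type" may be a single string or a list of strings.
    if isinstance(json_schema_type, str):
        types = [json_schema_type]
    else:
        types = list(json_schema_type)
    # A property that may be null should end up as a NULLABLE column.
    return "NULLABLE" if "null" in types else "REQUIRED"


# Property definition as quoted above for impression_device.
impression_device = {"type": ["null", "string"]}
print(bq_mode_for(impression_device["type"]))
```

Under this convention the impression_device property would be NULLABLE, which is why a REQUIRED mode showing up on the target side is surprising.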

bulky-park-65916

01/22/2021, 4:54 PM
The SCHEMA message is returned at the end of the tap, right?

ripe-musician-59933

01/22/2021, 4:55 PM
SCHEMA messages need to come before RECORD messages for the same stream, so it's usually one of the first messages
The tap typically either outputs all SCHEMA messages for all streams in one go, or it goes stream-by-stream outputting a SCHEMA message followed by a bunch of RECORD messages, before it moves on to the next stream with its SCHEMA message, etc
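
A rough sketch of how the SCHEMA message for one stream could be picked out of a tap's output. It assumes each line is a raw Singer message as JSON (in Meltano's debug log the lines may carry a prefix that would need stripping first), and the sample messages below are made up for illustration rather than copied from tap-facebook.

```python
import json


def schema_messages(lines, stream):
    """Yield SCHEMA messages for `stream` from an iterable of Singer JSON lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON object
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if message.get("type") == "SCHEMA" and message.get("stream") == stream:
            yield message


# Made-up sample output in the stream-by-stream order described above:
# a SCHEMA message first, then RECORD messages for the same stream.
sample_output = [
    '{"type": "SCHEMA", "stream": "ads_insights_platform_and_device", '
    '"schema": {"properties": {"impression_device": {"type": ["null", "string"]}}}, '
    '"key_properties": []}',
    '{"type": "RECORD", "stream": "ads_insights_platform_and_device", '
    '"record": {"impression_device": "desktop"}}',
    '{"type": "STATE", "value": {}}',
]

for msg in schema_messages(sample_output, "ads_insights_platform_and_device"):
    print(msg["schema"]["properties"]["impression_device"]["type"])
```

Because the SCHEMA message precedes the RECORD messages for its stream, there is no need to wait for the tap to finish before looking for it.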

bulky-park-65916

01/22/2021, 5:01 PM
Got it, OK, will share once the tap is complete
image.png

ripe-musician-59933

01/22/2021, 5:53 PM
All right, that looks correct, so the question becomes: why does target-bigquery now think the field is no longer nullable?
What pip_url are you using for target-bigquery? Have you pinned a specific version or are you using the latest commit of https://github.com/adswerve/target-bigquery?

bulky-park-65916

01/22/2021, 5:57 PM

ripe-musician-59933

01/22/2021, 5:58 PM
OK, then I think target-bigquery broke in one of the recent commits to master: https://github.com/adswerve/target-bigquery/commits/master
I suggest pinning the plugin to the most recently released version, v0.10.2: https://github.com/adswerve/target-bigquery/releases/tag/v0.10.2

bulky-park-65916

01/22/2021, 5:58 PM
Hm, weird. It broke exactly yesterday
It's running on an hourly basis

ripe-musician-59933

01/22/2021, 5:59 PM
@bulky-park-65916 Did meltano install run again recently?

bulky-park-65916

01/22/2021, 5:59 PM
Nope

ripe-musician-59933

01/22/2021, 5:59 PM
😕

bulky-park-65916

01/22/2021, 5:59 PM
For 3 months it was stable
No new installs

ripe-musician-59933

01/22/2021, 5:59 PM
I have to jump on a call right now, but if you want I can help you debug over Zoom in 30 minutes?

bulky-park-65916

01/22/2021, 6:00 PM
Sure
Weird, it seems that it loads data now...
Hmm
I'll rerun all of my tap-facebook pipelines and get back to you if the issue persists. But it seems that it's working now. Perhaps I messed up the dataset in BigQuery yesterday, but it's weird because I've made the same changes to other tables that load tap-facebook's data, and only this one stopped working.

ripe-musician-59933

01/22/2021, 6:58 PM
Very weird 😕 The issue appears to be on the (target-)BigQuery side, but let me know if I can help debug!

bulky-park-65916

01/22/2021, 7:02 PM
Sure, thanks @ripe-musician-59933

lemon-london-61072

01/24/2021, 4:58 PM
@bulky-park-65916 did you manually copy data from one table to another, or something like that, in BigQuery?
I did see that once, when I copied the data manually (I was trying to remove duplicates)

bulky-park-65916

01/24/2021, 5:11 PM
Yup, something along those lines. Since I was optimizing the space my tables took, I made a backup table, then manipulated the original one (deleted some rows, but made no changes to the schema), then appended the deleted rows from the backup. The end result was supposedly the same as before the changes, but I don't know what exactly caused the error.

lemon-london-61072

01/24/2021, 6:28 PM
That is what I was thinking. In my case I copied the original table, changed the data, then dropped the original and renamed the copy. But even then, since the copied table is a copy of the original, the schema should not change. Anyway, I am still unable to explain why the schema was changed 🙂

bulky-park-65916

01/25/2021, 12:39 PM
Hm, weird..
@ripe-musician-59933 It appears that when one uses PARTITION and overwrites the table, it turns all fields to NULLABLE by design, which later triggers an error either from tap-facebook (most likely) or Meltano. Nil from the group has stumbled upon the same issue. The thing is: would it be reasonable at all to change the tap-facebook's schema fields to NULLABLE? The whole point of this task is to filter out old _time_loaded rows, as they hold redundant and unnecessary information. I've already figured out an additional process of filtering the data and appending it to a different table, so if it's not reasonable to change the schema, I'd just use this approach.

ripe-musician-59933

02/02/2021, 6:21 PM
"The thing is: would it be reasonable at all to change the tap-facebook's schema fields to NULLABLE?"
@bulky-park-65916 They already are nullable, aren't they? https://meltano.slack.com/archives/CFG3C3C66/p1611337929007500?thread_ts=1611329142.003700&cid=CFG3C3C66
target-bigquery complains that "Field impression_device has changed mode from NULLABLE to REQUIRED", but tap-facebook has always said that it's "nullable", so I'm not sure where it's getting the idea that it was changed to "required". As far as I can see, tap-facebook is doing everything right (correctly communicating that that field could be null), while target-bigquery is getting confused and likely has a bug somewhere.

bulky-park-65916

02/02/2021, 6:26 PM
That's weird! Then from where does BigQuery get different schema values? Here's a screenshot directly from BigQuery's table:
When I partition the table and overwrite the data, every field becomes NULLABLE and target-bigquery stops loading the data.
Is it possible that target-bigquery somehow checks if each row has a value in a column, and if that's true it automatically assigns REQUIRED mode?
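
One way to narrow this down is to compare the column modes of the existing table with the modes in the schema the load job supplies. BigQuery allows a column to be relaxed from REQUIRED to NULLABLE but not tightened in the other direction, which matches the wording of the error. The sketch below works on plain dictionaries of field name to mode; the function name and the sample modes are hypothetical, and where the REQUIRED mode actually comes from is not established in this thread.

```python
def tightened_fields(table_modes, incoming_modes):
    """Return fields that are NULLABLE in the table but REQUIRED in the incoming schema."""
    return sorted(
        name
        for name, mode in incoming_modes.items()
        if mode == "REQUIRED" and table_modes.get(name) == "NULLABLE"
    )


# Hypothetical modes: the table after it was overwritten, and a load schema
# that still marks impression_device as REQUIRED.
table_after_overwrite = {"impression_device": "NULLABLE", "impressions": "NULLABLE"}
incoming_schema = {"impression_device": "REQUIRED", "impressions": "NULLABLE"}

print(tightened_fields(table_after_overwrite, incoming_schema))
```

Any field this reports is one a load would try to change from NULLABLE to REQUIRED, which is the change the error message complains about.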

ripe-musician-59933

02/02/2021, 6:45 PM
That's really odd, and I don't know target-bigquery well enough to explain it 😕 Can you perhaps file an issue on https://gitlab.com/meltano/meltano describing exactly what you're doing step by step, along with how you're seeing the BQ table schema change, and when you start seeing errors? That'll help figure out where in the process it goes wrong, and what target-bigquery may be doing at that point

bulky-park-65916

02/02/2021, 7:04 PM
Alrighty 🙂