# troubleshooting
j
Hi, what is the expected behavior when there is a schema change for a table (added field) being sent to `target-bigquery`? According to this thread, it should handle the schema change, but I am getting `CRITICAL 400 Provided Schema does not match Table hubspot_prod.contacts. Cannot add fields (field: property_hs_date_entered_customer)`. I’m using `tap-hubspot` and it worked earlier this morning (I think Hubspot must have added new fields midday?), and I even ran `meltano install loaders` to make sure I had the latest BigQuery loader. I also have the loader set to `truncate`. The `select` in my `meltano.yml` is set to `contacts.*`, and I’m hoping I don’t need to use a catalog that fixes the schema in time. Thanks! https://meltano.slack.com/archives/CMN8HELB0/p1622053388058800?thread_ts=1622044438.058200&cid=CMN8HELB0
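For context, a `select` rule like that lives under the extractor definition in `meltano.yml`. A minimal illustrative sketch — only `tap-hubspot`, `target-bigquery`, `contacts.*`, the `truncate` setting, and the `hubspot_prod` dataset come from this thread; the exact keys and layout are assumptions, not a verified config:

```yaml
# meltano.yml (illustrative sketch, not a verified config)
plugins:
  extractors:
    - name: tap-hubspot
      select:
        - contacts.*              # select every property on the contacts stream
  loaders:
    - name: target-bigquery
      config:
        dataset_id: hubspot_prod  # assumed; matches the dataset in the error
        replication_method: truncate
```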
r
Hi @joe_bossalini, unfortunately target-bigquery doesn't support handling schema changes like adding a new field. The expected behavior is that it breaks with the error above. A workaround could be to delete the old table and do a backfill (I understand that this is inconvenient and inefficient).
Adswerve considered adding this feature to target-bigquery, but chose not to. Our rationale is in this thread in the singer.io Slack workspace.
j
Ah! Thank you for the quick response! Not what I had hoped to hear, but I’ll move forward with either manually adding the field to BQ or configure meltano to use a fixed schema. Thanks again!
r
Sorry for the inconvenience @joe_bossalini
j
no worries!
a
Thanks for sharing this thread. I will say that this is atypical amongst database targets. Given an expectation that the schema almost never changes, and that human cycles are available to remedy it, I can see the argument from the other side. But I think the expectation should be that taps will frequently add columns to their schemas, and we should strive for the wider EL community (even beyond Meltano/Singer) to treat column additions as a non-breaking change. For the time being, I agree the workarounds should be reasonable, but this target would likely be non-viable for a source like Salesforce, which has user-driven schema changes constantly. I'll open a new issue in the repo to see if we can make a case to reconsider. Cc @amanda.folson @taylor
@ruslan_bergenov - Do you know if there are significant performance hits to adding new columns in BigQuery? Or is it more related to complexity? (Many databases can make column additions almost instantly, but I don't know whether the same applies to BigQuery.)
j
@ruslan_bergenov I just noticed that Meltano was not running with the latest version of the target! The discovery catalog is hardcoded with an old version, `0.10.2`: https://gitlab.com/meltano/meltano/blob/master/src/meltano/core/bundle/discovery.yml#L1558 I removed the version number from my `meltano.yml`, re-ran the loader install, and now presumably have the most recent version. I’ve not done anything other than that, and the load is now progressing! It looks like it has dropped that table and is reloading from scratch, but maybe that is a side effect of reinstalling the plugin — I can live with that.
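For anyone hitting the same pin: a version constraint in the plugin's `pip_url` will keep `meltano install` on the old release, and removing it lets pip resolve the latest one. A hypothetical before/after (the exact pin syntax here is illustrative, not copied from the bundled discovery.yml):

```yaml
# Before (pinned to the old release -- illustrative):
loaders:
  - name: target-bigquery
    pip_url: target-bigquery==0.10.2

# After (unpinned, so `meltano install loaders` pulls the latest release):
loaders:
  - name: target-bigquery
    pip_url: target-bigquery
```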
r
@aaronsteers, I'm not sure. I don't think there should be a significant performance hit. We already handled this situation (adding new columns, changing schema) in our other code, separate from target-bigquery, which we didn't open source. If this feature is very important to you to have in target-bigquery, let me pitch it to our senior engineers again to see if we should add it to target-bigquery roadmap (I can't promise anything though, as it depends on our general workload).
@joe_bossalini, fingers crossed it works now. Yes, it should load the latest release now. You can also point to the latest explicitly: https://github.com/adswerve/target-bigquery/releases/tag/0.11.1
@joe_bossalini, when you run your sync, do you pass a tap-catalog file? Or do you do discovery and generate a new catalog file each time you run your sync?
j
Not passing a tap-catalog. I’m thinking that is the route I should take, and then handle any column changes manually
r
We recommend passing a tap-catalog file. However, it's up to you; there's no right or wrong answer. We personally pass a tap-catalog, as it helps us prevent schema changes (because the tap-catalog file governs the schema). However, for some taps it doesn't work: their schema changes anyway, and the tap doesn't "listen" to what we specified in the tap-catalog.
@aaronsteers, @joe_bossalini We were previously reluctant to add a new feature to Adswerve target-bigquery: being able to add a new column during the append operation. We discussed this feature again, and we're willing to meet you halfway, because we understand that this feature is important to you. However, we don't recommend supporting the following operations:
• Adding or deleting a nested field (we have concerns about performance)
• Deleting any field/column, whether at the top or nested level (this might compromise the integrity of the data)
• Changing a data type
We are willing to support adding a new column at the top level during the append operation. We can do it with a table patch method. The steps would be:
1. Compare the destination table schema and the new data schema.
2. Patch the destination table, adding the new column at the top level only.
Question: if we were to support adding a new column at the top level, would that solve about 80% of your issues with taps changing their schema? Potential next step: we could build this feature ourselves; we did some preliminary POC development.
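The two steps above could be sketched roughly as follows. The `diff_top_level` helper is pure Python and purely illustrative; the commented-out patch step uses the google-cloud-bigquery client's `get_table`/`update_table` calls, but the table name, field type, and `incoming_names` variable are hypothetical, and this is not target-bigquery's actual code:

```python
def diff_top_level(dest_fields, incoming_fields):
    """Step 1: find incoming top-level column names missing from the destination.

    Both arguments are iterables of top-level column names; the result keeps
    the incoming order so the patch is deterministic.
    """
    existing = set(dest_fields)
    return [name for name in incoming_fields if name not in existing]


# Step 2 (sketch only -- needs google-cloud-bigquery and real credentials):
#
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   table = client.get_table("hubspot_prod.contacts")   # hypothetical table
#   new_schema = list(table.schema)
#   for name in diff_top_level([f.name for f in table.schema], incoming_names):
#       new_schema.append(bigquery.SchemaField(name, "STRING", mode="NULLABLE"))
#   table.schema = new_schema
#   client.update_table(table, ["schema"])  # patches only the schema field


if __name__ == "__main__":
    dest = ["id", "email"]
    incoming = ["id", "email", "property_hs_date_entered_customer"]
    print(diff_top_level(dest, incoming))
    # → ['property_hs_date_entered_customer']
```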
My coworker just pointed out that if we do end up adding columns in target-bigquery, this might be an easier approach than checking schemas and sending a patch request:
```python
job_config.schema_update_options = [bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION]
```
we were able to add a new column during the append operation in our POC tests, but only at the top level
@joe_bossalini, @aaronsteers, sorry for the delay; wanted to follow up on the target-bigquery feature that you requested: the ability to add a column during the append operation. In the example below, we were able to add the `addresses` field, but we were not able to add something like `addresses.description` to the existing `addresses` field. Would it work for you if we supported adding a column at the top level (not the nested sub-fields)? Would this cover most of your use cases?
For example, here's the destination table schema before appending (screenshot), and here's the destination table schema after appending (screenshot).
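The top-level-only restriction described above can be illustrated with a small sketch. This is pure Python, not target-bigquery code: new top-level keys are merged into the schema, while a dotted path into an existing record (such as `addresses.description`) is rejected, mirroring the limitation discussed in this thread.

```python
def merge_top_level(schema, new_fields):
    """Return a copy of `schema` with unseen *top-level* fields added.

    `schema` maps column name -> type string. Dotted names such as
    "addresses.description" denote nested sub-fields and are rejected,
    since only top-level additions would be supported.
    """
    merged = dict(schema)
    for name, ftype in new_fields.items():
        if "." in name:
            raise ValueError(f"nested additions unsupported: {name}")
        merged.setdefault(name, ftype)  # never overwrite an existing column
    return merged


if __name__ == "__main__":
    schema = {"id": "STRING", "addresses": "RECORD"}
    # Adding a new top-level column works:
    print(merge_top_level(schema, {"phone": "STRING"}))
    # Adding a nested sub-field does not:
    try:
        merge_top_level(schema, {"addresses.description": "STRING"})
    except ValueError as exc:
        print(exc)
```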