# random
n
is storing Meltano state in my BigQuery warehouse a terrible idea? tried setting the database uri with `bigquery://...` but get a "dialect not found" error, even though `sqlalchemy-bigquery` is installed. can you point me in the right direction to fix? thx
a
The problem is not so much about storing state in BigQuery, but rather in using BigQuery as the `systemdb`, which expects an OLTP-like database. As a cloud-native, massively parallel (MPP) OLAP backend, BigQuery itself does not have basic OLTP-esque capabilities. A quick Google search led me to some helpful points in https://dev.to/stack-labs/brace-yourself-using-bigquery-as-an-operational-backend-3e1i:
• BigQuery is not optimized for writing, but for performing complex queries.
• Single-line inserts are discouraged.
• BigQuery does not enforce keys, foreign keys, or constraints.
• BigQuery does not perform well with a normalized schema; on the contrary, it encourages denormalization.
In theory, this could be possible. If you provide an exact error message, someone might have an idea on how to debug it - and if there's a way to add BigQuery as a supported `systemdb` backend, we would accept a PR. Since Meltano uses SQLAlchemy as its OLTP abstraction layer (as you've already found), anything that works with SQLAlchemy could also work in the `systemdb` implementation. If your primary goal is just to track incremental bookmark state for EL loads, and you don't care as much about other `systemdb` functions (logs, settings backend, etc.), then you could use our new State Backends feature with Google Cloud Storage or similar.
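As a rough sketch of what that fallback looks like (bucket name and prefix here are placeholders, and the `state_backend.uri` setting assumes a recent Meltano release with State Backends support):

```shell
# Keep the systemdb on SQLite/Postgres as usual, but store incremental
# EL bookmark state in a GCS bucket. Bucket/prefix are placeholders.
meltano config meltano set state_backend.uri "gs://my-meltano-state/prefix"

# State operations then work against GCS:
meltano state list
```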
> even though `sqlalchemy-bigquery` is installed
One follow-up here, if you'd like to keep debugging this: can you confirm that `sqlalchemy-bigquery` is installed in the same virtual env? If Meltano is installed with `pipx`, you may need to use the `inject` command to make sure the library lands in the same Python venv.
n
@aaronsteers thanks for the feedback! gonna use GCS as a fallback for now, but keeping it all in BQ would be interesting. I'll try the `pipx inject` and let you know.
got it connecting to BQ (w pipx inject) but getting an alembic error about enums :S
was worth a shot. might be something to consider in the future tho
a
> got it connecting to BQ (w pipx inject) but getting an alembic error about enums :S
Gotcha. Yeah, that is helpful and somewhat expected. Rather than "provider not available", the next round of compat issues would be with features that BigQuery doesn't implement, or implements differently. We've had internal discussions around moving off of `enum`s, so there's probably a path forward in the future that would get around that issue. I can't say, though, whether it's worth the investment or if we'd run into more compat issues after resolving that one. It's a good discussion though, and in the future I'm sure someone will search Slack and find this thread. 🙂
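To illustrate the kind of change that moving off `enum`s implies (a hedged sketch - the state names below are hypothetical, not Meltano's actual schema): store the value as a plain string column and validate it in application code, rather than relying on a database-native ENUM type that BigQuery doesn't provide in the form Alembic migrations expect.

```python
import enum

# Hypothetical job-state enum; Meltano's real column and state names
# may differ - this only illustrates the pattern.
class JobState(str, enum.Enum):
    IDLE = "IDLE"
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAIL = "FAIL"

# With a plain STRING column in the warehouse, validation happens in
# Python on read/write instead of via a native ENUM type:
def parse_state(raw: str) -> JobState:
    return JobState(raw)  # raises ValueError for unknown values
```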