I'm almost there but airflow is failing because the psycopg2 Meltano #infra-deployment

fred_reimer

10/28/2021, 4:04 PM

I'm almost there, but airflow is failing because the psycopg2 module is not installed.

File "/project/.meltano/orchestrators/airflow/venv/lib/python3.6/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 792, in dbapi
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'

Why would it not be installed with the meltano install in the Dockerfile for the image?

ken_payne

10/28/2021, 4:07 PM

Airflow doesn't install with psycopg2 by default, so best to add it to your pip url:

fred_reimer

10/28/2021, 4:10 PM

That's what I'm actually testing right now, with psycopg2-binary...

fred_reimer

10/28/2021, 4:53 PM

So that works as far as the module, but when meltano invokes airflow db init I get this:

File "/project/.meltano/orchestrators/airflow/venv/lib/python3.6/site-packages/alembic/script/revision.py", line 805, in _iterate_related_revisions
    ", ".join(r.revision for r in overlaps),
alembic.script.revision.RevisionError: Requested revision a13f7613ad25 overlaps with other requested revisions 13eb55f81627

I have no idea what that means. The scheduler created a bunch of tables, and the web created a user table. They are using the same image, so if this is the revision of the DB schema I'm not sure why that would be different.
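For context, Alembic records the revision(s) a database is currently at in a table named alembic_version, in a column named version_num. A linear migration history normally leaves a single row there. The sketch below is only an illustration: the helper name current_revisions is hypothetical, and an in-memory SQLite database stands in for the Airflow metadata database, seeded with the two ids from the error message above.

```python
import sqlite3


def current_revisions(conn):
    """Return the revision ids stored in Alembic's version table.

    `conn` is any DB-API connection to the database being migrated.
    """
    cur = conn.cursor()
    cur.execute("SELECT version_num FROM alembic_version ORDER BY version_num")
    rows = cur.fetchall()
    cur.close()
    return [row[0] for row in rows]


# Stand-in for the real metadata database (assumption: illustration only).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
demo.executemany(
    "INSERT INTO alembic_version VALUES (?)",
    [("a13f7613ad25",), ("13eb55f81627",)],
)

revisions = current_revisions(demo)
if len(revisions) > 1:
    print("more than one revision recorded:", revisions)
```

Against the real database the same query can be run from psql or any SQL client; more than one row is worth investigating before re-running the init.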

ken_payne

10/28/2021, 5:24 PM

Hmmm... are you using the same database for Airflow and Meltano? It may be that alembic (which is used in both projects for database change management) is fighting over the alembic table 🤔 Meltano initialises first, so Airflow can't win 😅

fred_reimer

10/28/2021, 6:52 PM

No, different databases. Same server, but different database names.

fred_reimer

10/29/2021, 12:35 PM

Although I'm not sure, I think it was a conflict between the web and scheduler portions of airflow. I don't know how alembic works as far as "revision." I basically deleted all tables in the DB and restarted the scheduler pod, and it did the db init and worked. There was only one revision number in the table when it was re-doing the db init. Originally there were two numbers in the revision table. Since they don't look like an actual revision (maybe a partial hash? See below) I'm assuming they are random numbers and that two processes were trying to init the DB concurrently. You'd think they would lock that out. I'll have to do some additional testing.

So based on what I read in stackoverflow, it looks like the revision is a UUID4 number. Then there is this: https://github.com/sqlalchemy/alembic/issues/633 - which suggests that concurrent DB schema upgrades (or init) are not supported with alembic natively or without special handling.

Not sure how to handle this at the moment, as I don't know a way of blocking one k8s pod based on the state of another. Since Meltano is invoking the airflow db init command, perhaps it can place a lock record in the meltano DB so that it does not try and run the invoke on two separate pods/processes concurrently? That is probably the cleanest method.
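One possible way to serialize the init across pods, sketched here as an alternative to a lock record, is a PostgreSQL session-level advisory lock taken before the init runs. This is not something Meltano or Airflow is confirmed to do in this thread. pg_advisory_lock and pg_advisory_unlock are real PostgreSQL functions; the names migration_lock, init_db_once and MIGRATION_LOCK_KEY are hypothetical, and the %s placeholder assumes a psycopg2-style connection.

```python
from contextlib import contextmanager

# Hypothetical key: any integer that every pod agrees on.
MIGRATION_LOCK_KEY = 6332021


@contextmanager
def migration_lock(conn, key=MIGRATION_LOCK_KEY):
    """Hold a session-level advisory lock while the block runs.

    `conn` is a DB-API connection to the PostgreSQL server
    (assumption: psycopg2, which uses %s placeholders).
    """
    cur = conn.cursor()
    # Blocks until no other session holds the same key.
    cur.execute("SELECT pg_advisory_lock(%s)", (key,))
    try:
        yield
    finally:
        # Release explicitly; the lock is also dropped if the session ends.
        cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
        cur.close()


def init_db_once(conn, run_init):
    """Run `run_init` (a callable that performs the db init) under the lock."""
    with migration_lock(conn):
        run_init()
```

Each pod would call init_db_once with its own connection and a callable that performs the init, so the second pod waits until the first has finished and then finds the schema already in place. How this would be wired into the way Meltano invokes airflow is left open here.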

fred_reimer

10/29/2021, 12:36 PM

And, if we want multiple Meltano UIs running concurrently for HA, which we would surely want load balanced in a production environment, does Meltano itself handle concurrent DB schema updates via alembic correctly?
