# troubleshooting
m
Hello, I have been working with Meltano all week, and nothing seems to get the webserver and the scheduler to communicate. I have verified they have network connectivity. What is it that could be missing? I get the message
The scheduler does not appear to be running. Last heartbeat was received 1 week ago.
in the web UI, and as you can see in the screenshot, the scheduler is running.
Oh, they need to share a filesystem to communicate? Is there no way around this?
It seems they need to share this filesystem to identify each other, because once I mounted a local volume it began to work. In production this would not be preferred, because I would like updates to take the form of updated Docker containers being deployed, rather than managing a shared filesystem that can become corrupted.
e
Hi Matthew! DAG Serialization may be what you want to decouple the webserver and scheduler
m
Ok, it looks like I need to add `compress_serialized_dags = False` to the `[core]` section of `airflow.cfg`. I'll add that and rebuild my container, then deploy to test. Thank you!
Hey @edgar_ramirez_mondragon, I have applied this change, but looking at the diagram it appears that it relies on a metadata DB, which must be either a file on the filesystem or a remote database. Does this mean I require a remote database configured to use this `DAG Serialization` feature?
e
Yeah, that's correct. You need to use something other than SQLite as the metadata DB.
m
@edgar_ramirez_mondragon Do you know of any documentation of how this is done in Meltano/Docker? All the Airflow docs I have found don't seem to cover this in much detail.
I tried setting the `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN` env var, but that's not doing anything; the airflow.cfg seems to just point back to SQLite.
u
If you're using Airflow as a Meltano plugin, https://hub.meltano.com/utilities/airflow#database-sql_alchemy_conn-setting might help
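As an aside on the env-var route: Airflow maps environment variables onto airflow.cfg entries as `AIRFLOW__<SECTION>__<KEY>`, with double underscores around the section name, which is easy to get wrong. A small shell sketch of the naming convention (pure string manipulation, no Airflow needed):

```shell
# Derive the Airflow env-var name for a given config section and key.
# Pattern: AIRFLOW__<SECTION>__<KEY>, both upper-cased, double underscores.
section="database"
key="sql_alchemy_conn"
var="AIRFLOW__$(echo "$section" | tr '[:lower:]' '[:upper:]')__$(echo "$key" | tr '[:lower:]' '[:upper:]')"
echo "$var"   # AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
```

A single underscore (e.g. `AIRFLOW_DATABASE_SQL_ALCHEMY_CONN`) is silently ignored by Airflow, which looks exactly like "the env var is not doing anything".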
m
I ran `meltano config airflow set database sql_alchemy_conn postgresql://airflow_user:airflow_password@10.30.0.5/airflow_db`, which updated my `meltano.yml` file. I then try to initialize the database with `meltano invoke airflow db init`; however, it returns the following, which does not provide me with an error to continue debugging:
```
root@e081afcf159a:/project# meltano invoke airflow db init
2023-07-13T21:44:26.730369Z [info     ] Environment 'dev' is active
Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
join our friendly Slack community.

'database'
```
I get the exact same error in the Airflow scheduler and web containers, so they continuously stop/start.
@edgar_ramirez_mondragon I have tested the connection string `AIRFLOW__CORE__SQL_ALCHEMY_CONN: 'postgresql://airflow_user:airflow_password@10.30.0.5/airflow_db'` with the official Airflow container and it works 100%, but it's not working with the Meltano container.
d
@matthew_van_zanten Can you please run the failing command again with `meltano --log-level=debug [etc]` so we can see exactly where the error originates?
m
I was trying to set that with the env var `AIRFLOW__CORE__LOGGING_LEVEL: DEBUG`, however that does not seem to work
I'll try your suggestion
Here are the debug logs, very verbose!
d
Looks like it’s complaining about the “database” section not existing in airflow.cfg. Can you please share your meltano.yml definition for Airflow, including the version number and config?
m
meltano.yml
`sql_alchemy_conn` is not being set in the yml, but is rather being set by the environment variable `AIRFLOW__CORE__SQL_ALCHEMY_CONN: 'postgresql://airflow_user:airflow_password@10.30.0.5/airflow_db'`
d
Part of the issue may be that `compress_serialized_dags` is currently under “database” rather than the correct “core”.
m
ok, I have removed it without change, but I'll move it:
```yaml
config:
  database:
    #sql_alchemy_conn: postgresql://airflow_user:airflow_password@10.30.0.5/airflow_db
  core:
    compress_serialized_dags: False
```
d
Can you comment out “database” as well?
m
yup! I'll rebuild and test
I think we are online, I will review the systems
so it didn't like the database config being present, but it actually put that there on its own when I configured via the meltano command
d
Yeah it seems like the doc on how to set the DB URL is incorrect, at least for the version you’re using
m
I still see the message
The scheduler does not appear to be running. Last heartbeat was received 1 hour ago.
I will create a schedule and see if it updates the web server
d
If the DB is being initialized correctly, this may now be an Airflow configuration issue rather than something Meltano-specific…
p
I'd also note that we've started recommending the airflow utility https://hub.meltano.com/utilities/airflow over the airflow orchestrator. I don't see why that would affect this issue, but I wanted to mention it
d
Ah good point Pat, I didn’t notice that mismatch
m
seems like it's not complaining, but it's not connected
the scheduler + webserver, that is
This config did work with vanilla Airflow containers
d
Meltano “just” generates the Airflow config based on what’s in meltano.yml and then invokes the airflow executable directly, so for any error coming from inside Airflow I’d suggest researching that as an Airflow issue rather than a Meltano issue. The heartbeat thing may show some search results. You may also want to consider moving from the Meltano Airflow orchestrator to the utility that Pat shared, which will put an airflow.cfg in your project that you can use to configure Airflow as usual
Then Meltano’s role is really just limited to invoking the executable for you, so less room for it to interfere
m
simplicity is what I'm looking to achieve
d
The utility will be simpler
m
so that means no more `airflow scheduler` task?
d
No, that part doesn't change, but the integration layer between Meltano and Airflow is thinner in the case of the utility plugin (see Pat's link above), so setting Airflow up will be the same as if you weren't using Meltano, as opposed to the current orchestrator plugin you're using, where Meltano interferes a bit more in configuration
m
ok, this is all being deployed as containers:
```
- meltano-ui
- airflow-webserver
- airflow-scheduler
```
so this makes reading the documentation and turning it into containers a bit tricky. I assume all these commands are for the Meltano container then?
d
The way the containers are set up wouldn’t change, as the Docker image uses “meltano” as the entry point and the command you pass is eg “invoke airflow scheduler”. That’s still correct. But inside your meltano.yml file I suggest swapping out the airflow orchestrator for the airflow utility, based on the doc on the Hub that Pat shared. I don’t think that’ll immediately solve your issue, but it’ll likely further rule out Meltano’s involvement as the culprit, so you can further debug the issue purely on the Airflow side (which we can assist with but are not experts in)
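A rough sketch of what that setup can look like in compose form (service and image names are assumptions pulled from elsewhere in the thread, not a verified configuration): the image's entrypoint is `meltano`, so each service's `command` is just the subcommand.

```yaml
# Hypothetical docker-compose sketch; image/service names assumed from the thread
services:
  meltano-ui:
    image: meltano-mine
    command: ["ui"]
  airflow-webserver:
    image: meltano-mine
    command: ["invoke", "airflow", "webserver"]
  airflow-scheduler:
    image: meltano-mine
    command: ["invoke", "airflow", "scheduler"]
```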
m
got it, I'll remove that from the yml and add in the utility
I get this kind of error a lot, and it never seems to make sense how I fix it:
```
Executable 'airflow_invoker' could not be found. Utility 'airflow' may not have been installed yet using `meltano install utility airflow`, or the executable name may be incorrect.
```
I ran the following:
```shell
docker run -v $(pwd):/projects -w /projects meltano/meltano add utility airflow
docker run -v $(pwd):/projects -w /projects meltano/meltano install utility airflow
```
Then built the container:
```shell
docker build . --tag meltano-mine
```
Then I run the latest build via compose:
```shell
docker-compose up -d
```
And then the error appears. It happened back when I was using the airflow orchestrator too, but eventually, after enough retries, it works with no noticeable change in my process
I'll delete some files and try again...
d
@pat_nadolny @edgar_ramirez_mondragon Do you know what may be going on here? The missing airflow_invoker executable suggests a failed installation or an incorrect pip_url. @matthew_van_zanten Can you please share the updated meltano.yml and Meltano version?
m
Here is the meltano.yml. I am always building with the latest Meltano Docker container, so I am unaware of the version:
```dockerfile
FROM meltano/meltano:latest
```
I can't tell you why, but I deleted the folder `./.meltano/utilities` and reran:
```shell
docker run -v $(pwd):/projects -w /projects meltano/meltano add utility airflow
docker run -v $(pwd):/projects -w /projects meltano/meltano install utility airflow
```
Then I again rebuilt the container, and it's working now. This is what I mean by having to try again until it magically works
It does not seem to respect my `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN` environment variable; it still defaults to SQLite. When we put the DB config directly into meltano.yml I get the following errors (containers continuously restarting):
```
ModuleNotFoundError: No module named 'psycopg2' cmd=airflow --help stdio_stream=stderr
```
Looks like the key error. By my understanding, this package would have to be installed during the `meltano install utility airflow` command, when it's downloading packages into its venv.
d
Can you add psycopg2 to the `pip_url` in meltano.yml and reinstall? It's not included with Airflow by default.
e
You might wanna make the dependency `psycopg2-binary` to avoid having to build it in the container
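If it helps, the suggested change would look roughly like this in meltano.yml (a sketch; the placeholder is intentional, keep whatever `pip_url` you already have and append the package):

```yaml
utilities:
  - name: airflow
    # append psycopg2-binary to whatever pip_url is already defined
    pip_url: <your existing pip_url> psycopg2-binary
```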
m
Giving this a go!
I think it's up and running with that last change. I set up my Airflow password after it finished its `db init`. Now I can test it out! Thank you!
d
Awesome, glad we could help!
m
Hey guys, me again. I have set up the ECS system now with a Meltano database via the `MELTANO_DATABASE_URI` env variable, the Airflow database via the `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN` env variable, and finally `AIRFLOW__CORE__COMPRESS_SERIALIZED_DAGS` set to `false`, like we discussed above. All systems fire up without errors; however, they do not seem to be registering jobs or schedules when I run the job and schedule creation commands. This is what I run on the scheduler:
```
root@4dfc79fff9de:/project# meltano job add tap-stackoverflow-sampledata-to-target-jsonl --tasks "tap-stackoverflow-sampledata target-jsonl"
2023-07-18T19:16:42.481438Z [info     ] The default environment 'dev' will be ignored for `meltano job`. To configure a specific environment, please use the option `--environment=<environment name>`.
Added job tap-stackoverflow-sampledata-to-target-jsonl: ['tap-stackoverflow-sampledata target-jsonl']
root@4dfc79fff9de:/project# meltano schedule add tap-stackoverflow-sampledata-to-target-jsonl --extractor tap-stackoverflow-sampledata --loader target-jsonl --transform run --interval "@daily"
2023-07-18T19:16:50.613843Z [info     ] The default environment 'dev' will be ignored for `meltano schedule`. To configure a specific environment, please use the option `--environment=<environment name>`.
/venv/lib/python3.9/site-packages/meltano/core/settings_service.py:445: RuntimeWarning: Unknown setting 'start_date' - the default value `None` will be used
  value, metadata = self.get_with_metadata(*args, **kwargs)
Scheduled elt 'tap-stackoverflow-sampledata-to-target-jsonl' at @daily
```
And then when I go to Airflow, I do not get any jobs or schedules appearing. What is it that I need to do? (This worked when it was using SQLite, just not with Postgres as the backend database.)
d
@matthew_van_zanten Do the Airflow scheduler logs say anything about being able to load from the DAG bag?
The schedules are only stored in meltano.yml, so make sure your Airflow Docker image/container has the latest version of that
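For reference, a `meltano schedule add` invocation like the one above should leave a stanza along these lines in meltano.yml (a sketch reconstructed from the command's flags, not copied from the actual file):

```yaml
schedules:
  - name: tap-stackoverflow-sampledata-to-target-jsonl
    interval: '@daily'
    extractor: tap-stackoverflow-sampledata
    loader: target-jsonl
    transform: run
```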
m
Here is my full airflow scheduler log:
```
2023-07-18T19:13:25.441961Z [debug ] /etc/timezone found, contents: Etc/UTC
2023-07-18T19:13:25.442586Z [debug ] /etc/localtime found
2023-07-18T19:13:25.443599Z [debug ] 2 found: {'/etc/timezone': 'Etc/UTC', '/etc/localtime is a symlink to': 'Etc/UTC'}
2023-07-18T19:13:25.448623Z [info  ] Environment 'dev' is active
2023-07-18T19:13:25.651772Z [debug ] Creating engine '<meltano.core.project.Project object at 0x7f7d0e529c40>@sqlite:////project/.meltano/meltano.db'
2023-07-18T19:13:25.716633Z [debug ] Found plugin parent parent=airflow plugin=airflow source=<DefinitionSource.LOCKFILE: 8>
2023-07-18T19:13:25.939589Z [debug ] Invoking: ['/project/.meltano/utilities/airflow/venv/bin/airflow_invoker', 'scheduler']
(Airflow ASCII art banner)
[2023-07-18 19:13:31,807] {scheduler_job.py:708} INFO - Starting the scheduler
[2023-07-18 19:13:31,808] {scheduler_job.py:713} INFO - Processing each file at most -1 times
[2023-07-18 19:13:31 +0000] [13] [INFO] Starting gunicorn 20.1.0
[2023-07-18 19:13:31 +0000] [13] [INFO] Listening at: http://0.0.0.0:8793 (13)
[2023-07-18 19:13:31 +0000] [13] [INFO] Using worker: sync
[2023-07-18 19:13:31 +0000] [14] [INFO] Booting worker with pid: 14
[2023-07-18 19:13:31,828] {executor_loader.py:105} INFO - Loaded executor: LocalExecutor
[2023-07-18 19:13:31 +0000] [87] [INFO] Booting worker with pid: 87
[2023-07-18 19:13:31,904] {manager.py:160} INFO - Launched DagFileProcessorManager with pid: 113
[2023-07-18 19:13:31,905] {scheduler_job.py:1233} INFO - Resetting orphaned tasks for active dag runs
[2023-07-18 19:13:31,941] {settings.py:55} INFO - Configured default timezone Timezone('UTC')
[2023-07-18 19:18:31,969] {scheduler_job.py:1233} INFO - Resetting orphaned tasks for active dag runs
[2023-07-18 19:18:31,973] {scheduler_job.py:1256} INFO - Marked 1 SchedulerJob instances as failed
[2023-07-18 19:23:32,007] {scheduler_job.py:1233} INFO - Resetting orphaned tasks for active dag runs
[2023-07-18 19:24:00,027] {serve_logs.py:57} WARNING - The Authorization header is missing: Host: localhost:8793
Connection: keep-alive
Sec-Ch-Ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Windows"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: session=19a694ea-b854-4538-a2eb-9e8352b53c02.jEqxqmJockKcwtl9Dg97HChac0U
.
[2023-07-18 19:24:00,028] {serve_logs.py:101} WARNING -…
```
The warning was from when I tried to hit the server in a browser
I don't get any message in the logs about a `dag bag`, and I can confirm that all 3 running containers have the latest version of `meltano.yml`; they all use the same image
I am able to log in to the Airflow system with the user I created in the database, so the connection should be OK
d
I see `{manager.py:160} INFO - Launched DagFileProcessorManager with pid: 113` in the logs. Are you able to find the logs for the dag_file_processor? I believe they may be under `.meltano/utilities/airflow/logs`
m
the
dag_file_processor
log says this
Copy code
[2023-07-18 19:53:20,599] {manager.py:480} INFO - Processing files using up to 2 processes at a time 
[2023-07-18 19:53:20,599] {manager.py:481} INFO - Process each file at most once every 30 seconds
[2023-07-18 19:53:20,599] {manager.py:482} INFO - Checking for new files in /project/orchestrate/airflow/dags every 300 seconds
[2023-07-18 19:53:20,599] {manager.py:690} INFO - Searching for files in /project/orchestrate/airflow/dags
[2023-07-18 19:53:20,600] {manager.py:693} INFO - There are 0 files in /project/orchestrate/airflow/dags
[2023-07-18 19:58:21,213] {manager.py:690} INFO - Searching for files in /project/orchestrate/airflow/dags
[2023-07-18 19:58:21,213] {manager.py:693} INFO - There are 0 files in /project/orchestrate/airflow/dag
And the folder `/project/orchestrate/airflow/dags` does not exist. I have a folder called `/project/orchestrate/dags` with the following files:
```
root@da7e06b353e8:/project/orchestrate/dags# ls
 __pycache__  'meltano (files-airflow).py'   meltano.py
```
So it seems like the project is pointed at the wrong folder?
I did not override that value, IIRC
d
Ah, I think that’s because of the switch from the airflow orchestrator to the utility -- when I set up a new project with the airflow utility a few weeks ago, it worked out of the box. Can you move them into `airflow/dags`, and keep only one of the 2 `.py` files (the most recently created one)?
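A quick sketch of that move on a scratch directory tree (the paths come from the log above; the file here is just a stand-in for the generated DAG file):

```shell
# Recreate a scratch copy of the layout described in the thread
mkdir -p orchestrate/dags orchestrate/airflow/dags
touch orchestrate/dags/meltano.py          # stand-in for the generated DAG file
# The suggested fix: move the DAG file to the path the scheduler actually scans
mv orchestrate/dags/meltano.py orchestrate/airflow/dags/
ls orchestrate/airflow/dags
```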
m
got it! will do
ok, this looks good. I'll promote to our AWS env and watch! thanks!
Ok, so now that I have put the project into the pipeline, there's no way around the error:
```
Executable 'airflow_invoker' could not be found. Utility 'airflow' may not have been installed yet using `meltano install utility airflow`, or the executable name may be incorrect.
```
What I am doing is:
1. Pull the Meltano code from the repo
2. `pip install --upgrade pip`
3. `pip install "meltano"`
4. `meltano install`
```
Installing 3 plugins...
Installing extractor 'tap-stackoverflow-sampledata'...
Installing loader 'target-jsonl'...
Installed loader 'target-jsonl'
Installing utility 'airflow'...
Installed extractor 'tap-stackoverflow-sampledata'
Installed utility 'airflow'
Installed 3/3 plugins
```
And then finally I run `docker build` and upload to ECR. No matter how many times I run this build, the container gets that error message when it gets deployed:
```
Executable 'airflow_invoker' could not be found. Utility 'airflow' may not have been installed yet using `meltano install utility airflow`, or the executable name may be incorrect.
```
It's clearly there, I can see it in the filesystem, but why does it not think it's there when the container starts?
```
#14 [9/9] RUN ls -la .meltano/utilities/airflow/venv/bin
#14 0.252 total 228
#14 0.252 drwxr-xr-x 3 root root 4096 Jul 18 22:57 .
#14 0.252 drwxr-xr-x 5 root root 4096 Jul 18 22:57 ..
#14 0.252 drwxr-xr-x 2 root root 4096 Jul 18 22:57 __pycache__
#14 0.252 -rw-r--r-- 1 root root 2285 Jul 18 22:56 activate
#14 0.252 -rwxr-xr-x 1 root root 3548 Jul 18 22:56 activate-global-python-argcomplete
#14 0.252 -rw-r--r-- 1 root root 1552 Jul 18 22:56 activate.csh
#14 0.252 -rw-r--r-- 1 root root 3115 Jul 18 22:56 activate.fish
#14 0.252 -rw-r--r-- 1 root root 2800 Jul 18 22:56 activate.nu
#14 0.252 -rw-r--r-- 1 root root 1650 Jul 18 22:56 activate.ps1
#14 0.252 -rw-r--r-- 1 root root 1337 Jul 18 22:56 activate_this.py
#14 0.252 -rwxr-xr-x 1 root root  291 Jul 18 22:56 airflow
#14 0.252 -rwxr-xr-x 1 root root  289 Jul 18 22:56 airflow_extension
#14 0.252 -rwxr-xr-x 1 root root  323 Jul 18 22:56 airflow_invoker
#14 0.252 -rwxr-xr-x 1 root root  289 Jul 18 22:56 alembic
#14 0.252 -rwxr-xr-x 1 root root  291 Jul 18 22:56 cmark
#14 0.252 -rwxr-xr-x 1 root root  288 Jul 18 22:56 connexion
#14 0.252 -rwxr-xr-x 1 root root  292 Jul 18 22:56 docutils
#14 0.252 -rwxr-xr-x 1 root root  290 Jul 18 22:56 email_validator
#14 0.252 -rwxr-xr-x 1 root root  297 Jul 18 22:56 fabmanager
#14 0.252 -rwxr-xr-x 1 root root  284 Jul 18 22:56 flask
#14 0.252 -rwxr-xr-x 1 root root 1727 Jul 18 22:56 get_objgraph
#14 0.252 -rwxr-xr-x 1 root root  293 Jul 18 22:56 gunicorn
#14 0.252 -rwxr-xr-x 1 root root  280 Jul 18 22:56 httpx
#14 0.252 -rwxr-xr-x 1 root root  289 Jul 18 22:56 jsonschema
#14 0.252 -rwxr-xr-x 1 root root  289 Jul 18 22:56 mako-render
#14 0.252 -rwxr-xr-x 1 root root  296 Jul 18 22:56 markdown-it
#14 0.252 -rwxr-xr-x 1 root root  290 Jul 18 22:56 markdown_py
#14 0.252 -rwxr-xr-x 1 root root  320 Jul 18 22:56 normalizer
#14 0.252 -rwxr-xr-x 1 root root  291 Jul 18 22:56 nvd3
#14 0.252 -rwxr-xr-x 1 root root  297 Jul 18 22:56 pip
#14 0.252 -rwxr-xr-x 1 root root  297 Jul 18 22:56 pip3
#14 0.252 -rwxr-xr-x 1 root root  297 Jul 18 22:56 pip3.10
#14 0.252 -rwxr-xr-x 1 root root  298 Jul 18 22:56 pybabel
#14 0.252 -rwxr-xr-x 1 root root  291 Jul 18 22:56 pygmentize
#14 0.252 lrwxrwxrwx 1 root root   16 Jul 18 22:56 python -> /usr/bin/python3
#14 0.252 -rwxr-xr-x 1 root root 2631 Jul 18 22:56 python-argcomplete-check-easy-install-script
#14 0.252 -rwxr-xr-x 1 root root  383 Jul 18 22:56 python-argcomplete-tcsh
#14 0.252 lrwxrwxrwx 1 root root    6 Jul 18 22:56 python3 -> python
#14 0.252 lrwxrwxrwx 1 root root    6 Jul 18 22:56 python3.10 -> python
#14 0.252 -rwxr-xr-x 1 root root 1993 Jul 18 22:56 register-python-argcomplete
#14 0.252 -rwxr-xr-x 1 root root  668 Jul 18 22:56 rst2html.py
#14 0.252 -rwxr-xr-x 1 root root  790 Jul 18 22:56 rst2html4.py
#14 0.252 -rwxr-xr-x 1 root root 1135 Jul 18 22:56 rst2html5.py
#14 0.252 -rwxr-xr-x 1 root root  867 Jul 18 22:56 rst2latex.py
#14 0.252 -rwxr-xr-x 1 root root  690 Jul 18 22:56 rst2man.py
#14 0.252 -rwxr-xr-x 1 root root  856 Jul 18 22:56 rst2odt.py
#14 0.252 -rwxr-xr-x 1 root root 1794 Jul 18 22:56 rst2odt_prepstyles.py
#14 0.252 -rwxr-xr-x 1 root root  675 Jul 18 22:56 rst2pseudoxml.py
#14 0.252 -rwxr-xr-x 1 root root  711 Jul 18 22:56 rst2s5.py
#14 0.252 -rwxr-xr-x 1 root root  947 Jul 18 22:56 rst2xetex.py
#14 0.252 -rwxr-xr-x 1 root root  676 Jul 18 22:56 rst2xml.py
#14 0.252 -rwxr-xr-x 1 root root  744 Jul 18 22:56 rstpep2html.py
#14 0.252 -rwxr-xr-x 1 root root  291 Jul 18 22:56 slugify
#14 0.252 -rwxr-xr-x 1 root root  292 Jul 18 22:56 sqlformat
#14 0.252 -rwxr-xr-x 1 root root  285 Jul 18 22:56 tabulate
#14 0.252 -rwxr-xr-x 1 root root  663 Jul 18 22:56 undill
#14 0.252 -rwxr-xr-x 1 root root  284 Jul 18 22:56 wheel
#14 0.252 -rwxr-xr-x 1 root root  284 Jul 18 22:56 wheel-3.10
#14 0.252 -rwxr-xr-x 1 root root  284 Jul 18 22:56 wheel3
#14 0.252 -rwxr-xr-x 1 root root  284 Jul 18 22:56 wheel3.10
```
d
Ah -- `meltano install` needs to happen inside the container, because otherwise it'll link the executables to the Python executable on your host system rather than the one that exists inside the container.
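A minimal Dockerfile sketch of that approach (the base image is the one used earlier in the thread; the COPY layout is an assumption about the project structure):

```dockerfile
FROM meltano/meltano:latest
WORKDIR /project
COPY . .
# Installing inside the image links the plugin venvs against the
# container's Python, not the host's
RUN meltano install
# The meltano/meltano base image already uses `meltano` as the entrypoint,
# so containers can be started with commands like `invoke airflow scheduler`
```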
m
ok, then I can run that in the Dockerfile!
Based on the build output, I am seeing more of what I expect to see; this is making lots of sense
🤞
works! thank you again!