# troubleshooting
Hi, I am unfortunately struggling with something I thought I had sorted out. I have a new Meltano installation on a k8s cluster, built from the image
meltano/meltano:v2.16.1-python3.8
I am using several taps and only target-snowflake as a loader. Meltano version: 2.16.1
Airflow is installed with the following config:
```yaml
orchestrators:
  - name: airflow
    pip_url: psycopg2 apache-airflow==2.3.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.3.2/constraints-3.8.txt
files:
  - name: airflow
    pip_url: git+https://github.com/meltano/files-airflow.git
```
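Worth keeping in mind here: Meltano installs each plugin into its own virtualenv under .meltano/ (for Airflow that is .meltano/orchestrators/airflow/venv, as the traceback below shows), so after changing pip_url the plugin has to be reinstalled for the new pins to take effect:
meltano install orchestrator airflow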
This results in Airflow v2.3.2 being installed. When I try to run
meltano invoke airflow dags list-import-errors
I get:
```
/projects/orchestrate/dags/meltano.py | Traceback (most recent call last):
                                      |   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
                                      |   File "/projects/orchestrate/dags/meltano.py", line 53, in <module>
                                      |     logger.info(f"Considering schedule '{schedule['name']}': {schedule}")
                                      | TypeError: string indices must be integers
```
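For anyone reading along: that TypeError means the schedule variable at line 53 of meltano.py is a plain string rather than a dict, since indexing a string with a string key is exactly what raises "string indices must be integers". A minimal sketch of one way this can happen (the JSON shape below is an assumption for illustration, not the actual Meltano output):

```python
import json
import logging

logger = logging.getLogger(__name__)

# Hypothetical payload: suppose the schedules JSON wraps the list in an
# object, while an older DAG generator expects a bare list of dicts.
result = json.loads('{"schedules": {"elt": [{"name": "xactly-to-snowflake"}]}}')

for schedule in result["schedules"]:
    # Iterating a dict yields its string keys, so `schedule` is "elt" here,
    # and "elt"["name"] raises: TypeError: string indices must be integers
    logger.info(f"Considering schedule '{schedule['name']}': {schedule}")
```

If the DAG generator in orchestrate/dags/meltano.py predates a change in the schedules payload it consumes, every schedule fails to import this way at once.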
Worth noting that I have also created a completely new CloudSQL instance with empty airflow and meltano databases. Happy to get any sort of insight into this; maybe the 2.3.2 version of Airflow won't work with Meltano 2.16.1? What am I doing wrong?
When I run
meltano invoke airflow scheduler
I get the following:
```
____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2023-04-08 12:03:03,499] {scheduler_job.py:696} INFO - Starting the scheduler
[2023-04-08 12:03:03,500] {scheduler_job.py:701} INFO - Processing each file at most -1 times
[2023-04-08 12:03:03 +0000] [4366] [INFO] Starting gunicorn 20.1.0
[2023-04-08 12:03:03 +0000] [4366] [ERROR] Connection in use: ('0.0.0.0', 8793)
[2023-04-08 12:03:03 +0000] [4366] [ERROR] Retrying in 1 second.
[2023-04-08 12:03:03,511] {executor_loader.py:105} INFO - Loaded executor: LocalExecutor
[2023-04-08 12:03:03,691] {manager.py:160} INFO - Launched DagFileProcessorManager with pid: 4502
[2023-04-08 12:03:03,693] {scheduler_job.py:1221} INFO - Resetting orphaned tasks for active dag runs
[2023-04-08 12:03:03,702] {settings.py:55} INFO - Configured default timezone Timezone('UTC')
[2023-04-08 12:03:04 +0000] [4366] [ERROR] Connection in use: ('0.0.0.0', 8793)
[2023-04-08 12:03:04 +0000] [4366] [ERROR] Retrying in 1 second.
[2023-04-08 12:03:05 +0000] [4366] [ERROR] Connection in use: ('0.0.0.0', 8793)
[2023-04-08 12:03:05 +0000] [4366] [ERROR] Retrying in 1 second.
[2023-04-08 12:03:06 +0000] [4366] [ERROR] Connection in use: ('0.0.0.0', 8793)
[2023-04-08 12:03:06 +0000] [4366] [ERROR] Retrying in 1 second.
[2023-04-08 12:03:07 +0000] [4366] [ERROR] Connection in use: ('0.0.0.0', 8793)
[2023-04-08 12:03:07 +0000] [4366] [ERROR] Retrying in 1 second.
[2023-04-08 12:03:08 +0000] [4366] [ERROR] Can't connect to ('0.0.0.0', 8793)
```
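Side note on the repeated Connection in use: ('0.0.0.0', 8793) errors: 8793 is Airflow's default worker log server port (the worker_log_server_port setting), which the gunicorn-based log server tries to bind on startup. The retries suggest another Airflow process in the same pod was still holding the port; something like
ss -ltnp | grep 8793
inside the container should show which process that is (assuming ss is available in the image).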
The Airflow db is already initialized (I ran it manually too, just in case):
meltano invoke airflow db init
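You can also verify that Airflow can reach its metadata database independently of the scheduler with:
meltano invoke airflow db check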
OK, now I somehow managed to see the list of DAGs, but I also get this error at the same time:
```
Error: Failed to load all files. For details, run `airflow dags list-import-errors`
dag_id                                           | filepath                   | owner   | paused
=================================================+============================+=========+=======
meltano_adaptive-to-snowflake                    | meltano (files-airflow).py | airflow | False
meltano_xactly-to-snowflake                      | meltano (files-airflow).py | airflow | False
meltano_zendesk-community-relations-to-snowflake | meltano (files-airflow).py | airflow | False
meltano_zendesk-to-snowflake                     | meltano (files-airflow).py | airflow | False
meltano_zengrc-to-snowflake                      | meltano (files-airflow).py | airflow | False
```
Interesting 🤔
This is how the schedule definition looks:
```yaml
schedules:
- name: xactly-to-snowflake
  interval: 0 5 * * *
  extractor: tap-xactly
  loader: target-snowflake--xactly
  transform: skip
  start_date: 2021-07-13
- name: zengrc-to-snowflake
  interval: 0 8 * * *
  extractor: tap-zengrc
  loader: target-snowflake--zengrc
  transform: skip
  start_date: 2021-07-13
- name: adaptive-to-snowflake
  interval: 0 4 * * *
  extractor: tap-adaptive
  loader: target-snowflake--adaptive
  transform: skip
  start_date: 2021-10-01
- name: zendesk-to-snowflake
  interval: 0 4 * * *
  extractor: tap-zendesk
  loader: target-snowflake--zendesk
  transform: skip
  start_date: 2012-12-30 00:00:00
- name: zendesk-community-relations-to-snowflake
  interval: 0 5 * * *
  extractor: tap-zendesk--community-relations
  loader: target-snowflake--zendesk-community-relations
  transform: skip
  start_date: 2017-01-01 00:00:00
```
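Each schedule here should map to one generated DAG, which matches the five meltano_* entries in the list above. As far as I understand the files-airflow generator, orchestrate/dags/meltano.py reads these schedules through the Meltano CLI, so you can inspect exactly what it iterates over with:
meltano schedule list --format=json
If the top-level shape of that JSON is not what line 53 of the DAG file expects (see the sketch further up), all of the pipelines fail to import at once.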
More details: running
meltano invoke airflow dags next-execution meltano_zendesk-to-snowflake
I get:
```
[2023-04-08 17:02:04,015] {dagbag.py:507} INFO - Filling up the DagBag from /projects/orchestrate/dags
[2023-04-08 17:02:05,713] {dagbag.py:320} ERROR - Failed to import: /projects/orchestrate/dags/meltano.py
Traceback (most recent call last):
  File "/projects/.meltano/orchestrators/airflow/venv/lib/python3.8/site-packages/airflow/models/dagbag.py", line 317, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/projects/orchestrate/dags/meltano.py", line 53, in <module>
    logger.info(f"Considering schedule '{schedule['name']}': {schedule}")
TypeError: string indices must be integers
```
So, if anyone else goes through this too, this is what helped. Delete the file
orchestrate/dags/meltano.py
then run
meltano add files files-airflow
The new files will be added, together with an update to meltano.yml, so for me I ended up with both:
orchestrate/dags/meltano.py
orchestrate/dags/meltano (files-airflow).py
This caused meltano invoke airflow dags list to error again for one of the pipelines, since it had somehow ended up in both files. So the last step was to remove the
orchestrate/dags/meltano.py
file once more -> seems like I have no other errors atm and the scheduler is working.
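To recap the fix as commands:
rm orchestrate/dags/meltano.py
meltano add files files-airflow
and, since the old meltano.py reappeared alongside meltano (files-airflow).py, remove it once more and confirm the imports are clean:
rm orchestrate/dags/meltano.py
meltano invoke airflow dags list-import-errors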