nick_hamlin
05/05/2021, 1:25 PM
I’d like to be able to run separate jobs with different --select args on the same tap (to allow some tables to get updated more often than others). As far as I can tell, the meltano schedule options don’t support this, so I need to use airflow directly. I was able to set up the custom DAG, start the airflow scheduler and webserver locally, and have everything run just fine. I then followed the instructions here, but when I bring everything up, I get [Errno 2] No such file or directory: '/project/.meltano/run/airflow/airflow.cfg'. Oddly, when I look in the equivalent directory in my working non-containerized example, I also don’t find that file. Any ideas what I might be doing wrong here?

nick_hamlin
05/05/2021, 1:26 PM

douwe_maan
05/05/2021, 2:21 PM
> I’d like to be able to run separate jobs with different --select args on the same tap (to allow some tables to get updated more often than others). As far as I can tell, the meltano schedule options don’t support this, so I need to use airflow directly
@nick_hamlin Have you considered using inheritance to create different plugins with their own select: or select_filter: definition, so that you can reference each inheriting plugin individually by name from meltano schedule and the schedules list in meltano.yml? That’s the official recommended solution:
plugins:
  extractors:
  - name: tap-foo--my_stream
    inherit_from: tap-foo
    select_filter:
    - my_stream
schedules:
- name: my_stream
  extractor: tap-foo--my_stream
  # ...
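Filled out, a schedule referencing the inheriting plugin might look like the sketch below. The loader, interval, and start_date are placeholders for illustration, not details from this thread:

```yaml
# Hypothetical meltano.yml fragment: loader, interval, and start_date
# are assumed values, not taken from the conversation above.
schedules:
- name: my_stream
  extractor: tap-foo--my_stream
  loader: target-postgres
  transform: skip
  interval: '@hourly'
  start_date: 2021-05-01 00:00:00
```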
Another option is to continue to use the same plugin, but to set the --select using the <EXTRACTOR>__SELECT_FILTER env var (https://meltano.com/docs/plugins.html#select-filter-extra) in the env dict under schedules (https://meltano.com/docs/integration.html#pipeline-specific-configuration):
schedules:
- name: ...
  env:
    TAP_FOO__SELECT_FILTER: '["my_stream"]'

douwe_maan
05/05/2021, 2:24 PM
> I then followed the instructions here, but when I bring everything up, I get [Errno 2] No such file or directory: '/project/.meltano/run/airflow/airflow.cfg'
Are you still invoking airflow through meltano invoke airflow? That command creates the airflow.cfg on the fly based on the Airflow config Meltano stores: https://gitlab.com/meltano/meltano/-/blob/master/src/meltano/core/plugin/airflow.py#L74
Can you change the command to meltano --log-level=debug invoke airflow ..., so that we can see if that debug log message shows up as expected, with the correct path?
> Oddly, when I look in the equivalent directory in my working non-containerized example, I also don’t find that file. Any ideas what I might be doing wrong here?
It’s automatically deleted when meltano invoke airflow finishes, so it’s expected that you wouldn’t find it there: https://gitlab.com/meltano/meltano/-/blob/master/src/meltano/core/plugin/airflow.py#L116
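The lifecycle described here (config file written just before the subprocess starts, removed once it exits) can be sketched as a generic context manager. This is an illustrative pattern only, not Meltano's actual implementation:

```python
# Illustrative pattern, not Meltano's actual code: write a config file
# before launching a subprocess, remove it once the subprocess is done.
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def transient_config(path, contents):
    """Create `path` for the duration of the with-block, then delete it."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(contents)
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)

# The file only exists while the block runs -- analogous to airflow.cfg
# existing only while `meltano invoke airflow` is running.
cfg = os.path.join(tempfile.mkdtemp(), "run", "airflow.cfg")
with transient_config(cfg, "[core]\n"):
    existed_during = os.path.exists(cfg)
exists_after = os.path.exists(cfg)
print(existed_during, exists_after)
```

This is why inspecting the directory after the process exits shows nothing: the file's absence afterward is the expected behavior, not the bug.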
nick_hamlin
05/05/2021, 2:55 PM
I’m running meltano invoke airflow scheduler and meltano invoke airflow webserver in my local testing, and the docker-compose is doing basically the same thing

douwe_maan
05/05/2021, 3:02 PM
> Can you change the command to meltano --log-level=debug invoke airflow ..., so that we can see if that debug log message shows up as expected, with the correct path?
And more complete output logs that include [Errno 2] No such file or directory: '/project/.meltano/run/airflow/airflow.cfg'. I wonder if that error is coming from Meltano or Airflow

nick_hamlin
05/05/2021, 3:04 PM

nick_hamlin
05/05/2021, 3:30 PM
airflow-scheduler_1 | [2021-05-05 15:23:51,056] [1|MainThread|meltano.cli.utils] [DEBUG] [Errno 2] No such file or directory: '/project/.meltano/run/airflow/airflow.cfg'
airflow-scheduler_1 | Traceback (most recent call last):
airflow-scheduler_1 | File "/usr/local/lib/python3.6/site-packages/meltano/core/plugin_invoker.py", line 220, in _invoke
airflow-scheduler_1 | yield (popen_args, popen_options, popen_env)
airflow-scheduler_1 | File "/usr/local/lib/python3.6/site-packages/meltano/core/plugin_invoker.py", line 228, in invoke
airflow-scheduler_1 | return subprocess.Popen(popen_args, **popen_options, env=popen_env)
airflow-scheduler_1 | File "/usr/local/lib/python3.6/subprocess.py", line 729, in __init__
airflow-scheduler_1 | restore_signals, start_new_session)
airflow-scheduler_1 | File "/usr/local/lib/python3.6/subprocess.py", line 1364, in _execute_child
airflow-scheduler_1 | raise child_exception_type(errno_num, err_msg, err_filename)
airflow-scheduler_1 | FileNotFoundError: [Errno 2] No such file or directory: '/project/.meltano/orchestrators/airflow/venv/bin/airflow': '/project/.meltano/orchestrators/airflow/venv/bin/airflow'
airflow-scheduler_1 |
airflow-scheduler_1 | The above exception was the direct cause of the following exception:
airflow-scheduler_1 |
airflow-scheduler_1 | Traceback (most recent call last):
airflow-scheduler_1 | File "/usr/local/lib/python3.6/site-packages/meltano/core/plugin_invoker.py", line 157, in prepared
airflow-scheduler_1 | self.prepare(session)
airflow-scheduler_1 | File "/usr/local/lib/python3.6/site-packages/meltano/core/plugin_invoker.py", line 141, in prepare
airflow-scheduler_1 | with self.plugin.trigger_hooks("configure", self, session):
airflow-scheduler_1 | File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
airflow-scheduler_1 | return next(self.gen)
airflow-scheduler_1 | File "/usr/local/lib/python3.6/site-packages/meltano/core/behavior/hookable.py", line 70, in trigger_hooks
airflow-scheduler_1 | self.__class__.trigger(self, f"before_{hook_name}", *args, **kwargs)
airflow-scheduler_1 | File "/usr/local/lib/python3.6/site-packages/meltano/core/behavior/hookable.py", line 97, in trigger
airflow-scheduler_1 | raise err
airflow-scheduler_1 | File "/usr/local/lib/python3.6/site-packages/meltano/core/behavior/hookable.py", line 89, in trigger
airflow-scheduler_1 | hook_func(target, *args, **kwargs)
airflow-scheduler_1 | File "/usr/local/lib/python3.6/site-packages/meltano/core/plugin/airflow.py", line 52, in before_configure
airflow-scheduler_1 | stderr=subprocess.DEVNULL,
airflow-scheduler_1 | File "/usr/local/lib/python3.6/site-packages/meltano/core/plugin_invoker.py", line 228, in invoke
airflow-scheduler_1 | return subprocess.Popen(popen_args, **popen_options, env=popen_env)
airflow-scheduler_1 | File "/usr/local/lib/python3.6/contextlib.py", line 99, in __exit__
airflow-scheduler_1 | self.gen.throw(type, value, traceback)
airflow-scheduler_1 | File "/usr/local/lib/python3.6/site-packages/meltano/core/plugin_invoker.py", line 224, in _invoke
airflow-scheduler_1 | ) from err
airflow-scheduler_1 | meltano.core.plugin_invoker.ExecutableNotFoundError: Executable 'airflow' could not be found. Orchestrator 'airflow' may not have been installed yet using `meltano install orchestrator airflow`, or the executable name may be incorrect.
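Reading the chain of exceptions: the root cause is the missing venv executable at /project/.meltano/orchestrators/airflow/venv/bin/airflow, and the ExecutableNotFoundError is Meltano's friendlier wrapper around it. A small, hypothetical helper (not part of Meltano) illustrating that diagnosis:

```python
# Hypothetical diagnostic helper, not part of Meltano: given the
# executable path from an ExecutableNotFoundError, suggest a likely cause.
import os

def diagnose_executable(path):
    """Report whether the plugin venv executable exists; messages illustrative."""
    if os.path.exists(path):
        return "found"
    # <plugin>/venv/bin/<exe> -> the venv root is two directory levels up
    venv = os.path.dirname(os.path.dirname(path))
    if not os.path.isdir(venv):
        # A missing venv directory usually means `meltano install` never
        # ran against the .meltano directory this container actually sees.
        return "venv missing: run `meltano install`"
    return "venv exists but executable missing: reinstall the plugin"

print(diagnose_executable("/project/.meltano/orchestrators/airflow/venv/bin/airflow"))
```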
nick_hamlin
05/05/2021, 3:31 PM

nick_hamlin
05/05/2021, 3:33 PM
version: '3.8'
x-meltano-image: &meltano-image
  image: globalgiving/meltano:latest
  volumes:
    - .:/project

services:
  meltano-ui:
    <<: *meltano-image
    command: ui
    expose:
      - 5000
    ports:
      - 5000:5000
    restart: unless-stopped

  # Uncomment if you are using the Airflow orchestrator, delete otherwise
  airflow-scheduler:
    <<: *meltano-image
    command: --log-level=debug invoke airflow scheduler
    expose:
      - 8793
    restart: unless-stopped

  airflow-webserver:
    <<: *meltano-image
    command: --log-level=debug invoke airflow webserver
    expose:
      - 8080
    ports:
      - 8080:8080
    restart: unless-stopped

douwe_maan
05/05/2021, 3:38 PM
Is globalgiving/meltano a containerized Docker image for the Meltano project in question, or just your fork of meltano/meltano? In the former case, you shouldn’t need to mount the project volume, since it should already be baked into the container

douwe_maan
05/05/2021, 3:39 PM

douwe_maan
05/05/2021, 3:39 PM
Try running docker-compose exec meltano-ui meltano install, since the last error you saw suggests that the executable can’t be found

nick_hamlin
05/05/2021, 3:40 PM

nick_hamlin
05/05/2021, 3:40 PM
ARG MELTANO_IMAGE=meltano/meltano:latest
FROM $MELTANO_IMAGE
WORKDIR /project
# Install any additional requirements
COPY ./requirements.txt .
RUN pip install -r requirements.txt
# Install all plugins into the `.meltano` directory
COPY ./meltano.yml .
RUN meltano install
# Pin `discovery.yml` manifest by copying cached version to project root
RUN cp -n .meltano/cache/discovery.yml . 2>/dev/null || :
# Don't allow changes to containerized project files
ENV MELTANO_PROJECT_READONLY 1
# Copy over remaining project files
COPY . .
# Expose default port used by `meltano ui`
EXPOSE 5000
ENTRYPOINT ["meltano"]

douwe_maan
05/05/2021, 3:42 PM

nick_hamlin
05/05/2021, 3:43 PM

douwe_maan
05/05/2021, 3:45 PM
That’s what docker-compose.prod.yml handles for you: https://gitlab.com/meltano/files-docker-compose/-/blob/master/bundle/README.md#production-usage

douwe_maan
05/05/2021, 3:45 PM
> If you’d like to use Docker Compose to experiment with a production-grade setup of your containerized project, you can add the appropriate docker-compose.prod.yml file to your project by adding the `docker-compose` file bundle
I think you’re currently using the non-production copy that assumes you’ll use meltano/meltano (not a containerized project) + mounting your project

douwe_maan
05/05/2021, 3:45 PM

nick_hamlin
05/05/2021, 3:48 PM

nick_hamlin
05/05/2021, 3:50 PM

nick_hamlin
05/05/2021, 3:55 PM

douwe_maan
05/05/2021, 4:00 PM

douwe_maan
05/05/2021, 4:01 PM

nick_hamlin
05/05/2021, 4:01 PM

nick_hamlin
05/05/2021, 8:31 PM

douwe_maan
05/05/2021, 8:32 PM

nick_hamlin
05/05/2021, 8:33 PM
The docker-compose defines named volumes (e.g. meltano_postgresql_data), and so I’d expect to find those somewhere on my local machine if I’m understanding correctly, but I’m not seeing them anywhere?

nick_hamlin
05/05/2021, 8:38 PM
I’ve cycled docker-compose up/down and I still have stuff persisting in my airflow UI the way I’d expect

nick_hamlin
05/05/2021, 9:39 PM
*** Log file does not exist: /project/.meltano/run/airflow/logs/meltano_test/extract_load/2021-05-05T21:01:16.002963+00:00/1.log
*** Fetching from: http://562d5dfe87fd:8793/log/meltano_test/extract_load/2021-05-05T21:01:16.002963+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='562d5dfe87fd', port=8793): Max retries exceeded with url: /log/meltano_test/extract_load/2021-05-05T21:01:16.002963+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f71e6ae64e0>: Failed to establish a new connection: [Errno 111] Connection refused',))

douwe_maan
05/05/2021, 10:18 PM
I’m not sure where the meltano_postgresql_data volume would actually be stored, but these docs should help you figure that out: https://docs.docker.com/compose/compose-file/compose-file-v3/#volume-configuration-reference

douwe_maan
05/05/2021, 10:20 PM
> looks like it’s not actually able to run jobs because of issues accessing the logs
Are the jobs actually failing to be run by the scheduler? Or is the UI failing to show their logs?
nick_hamlin
05/06/2021, 12:23 PM

nick_hamlin
05/06/2021, 12:24 PM

douwe_maan
05/06/2021, 3:11 PM

nick_hamlin
05/06/2021, 4:15 PM

nick_hamlin
05/10/2021, 1:51 PM
It turns out the docker-compose.prod.yml template assumes you’re using the meltano UI and not the airflow UI. It exposes the meltano logs between the various containers properly, but the airflow logs are technically separate/stored in a different place (even if they wind up containing essentially the same information). Following the existing pattern for setting up the shared volume for the meltano logs, it was straightforward to add another one to share the airflow logs between the scheduler and the webserver
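A sketch of that change, assuming the service names from the docker-compose.prod.yml bundle and the log path shown in the error above (/project/.meltano/run/airflow/logs); the volume name airflow_logs is made up here:

```yaml
# Hypothetical docker-compose fragment following the pattern described:
# share Airflow task logs between the scheduler (which writes them)
# and the webserver (which serves them in the UI).
services:
  airflow-scheduler:
    volumes:
      - airflow_logs:/project/.meltano/run/airflow/logs
  airflow-webserver:
    volumes:
      - airflow_logs:/project/.meltano/run/airflow/logs

volumes:
  airflow_logs:
```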
nick_hamlin
05/10/2021, 1:54 PM

nick_hamlin
05/10/2021, 1:55 PM

nick_hamlin
05/10/2021, 1:56 PM

douwe_maan
05/10/2021, 2:36 PM

nick_hamlin
05/10/2021, 2:39 PM

douwe_maan
05/10/2021, 2:40 PM

rodney_greenfield
06/04/2021, 12:35 AM
> I was able to workaround this for now by manually deleting the stuck records (I also tweaked the docker-compose to support direct connections to the meltano PG database, which was useful)
@nick_hamlin - can you please share your docker compose changes there?

nick_hamlin
06/04/2021, 12:28 PM
I just added this to the meltano-system-db section of the docker compose:
ports:
  - 5432:5432
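In context, the service definition would look roughly like the following; only the ports lines come from the thread, while the image and the meltano_postgresql_data volume mapping follow common conventions and are assumptions here:

```yaml
# Hypothetical fragment: only the `ports` lines are from the thread;
# the image and volume mapping are assumed for illustration.
services:
  meltano-system-db:
    image: postgres
    ports:
      - 5432:5432
    volumes:
      - meltano_postgresql_data:/var/lib/postgresql/data

volumes:
  meltano_postgresql_data:
```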
nick_hamlin
06/04/2021, 12:28 PM

or_barda
04/18/2022, 4:04 PM
I run meltano elt from my airflow task, and when airflow is interrupted sometimes that task is stuck in RUNNING state in the database. I am using v1.77.0

douwe_maan
04/18/2022, 4:28 PM

or_barda
04/18/2022, 4:34 PM