Hey guys, I am trying to dockerize a meltano proje...
# troubleshooting
c
Hey guys, I am trying to dockerize a meltano project that is running on my local machine. I loaded the docker file bundle and also the docker-compose file bundle. I can not get the docker to work. It keeps saying it can't find the taps. I look in the docker container and they are there. Any ideas?
e
Hi @cory_hurst! What's the exact message you're seeing about the taps not being found? There are a few things that could be happening: the taps are not installed, the tap venvs managed by Meltano are broken, etc.
c
ELT could not be completed: Cannot start extractor: Executable 'tap-salesforce' could not be found. Extractor 'tap-salesforce' may not have been installed yet using `meltano install extractor tap-salesforce`, or the executable name may be incorrect.
I would say that it may be the venv thing..
I use conda usually for local dev
Attempting to install the extractor results in:
[Errno 8] Exec format error: '/project/.meltano/extractors/tap-salesforce/venv/bin/python'
e
The docker-compose bundle includes a `docker-compose.yml` that mounts the entire project, including the `.meltano/` dir, so your venvs are also mounted, which is problematic. As a solution, you might want to run a clean install (remove and recreate) with:
meltano install --clean
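The `Exec format error` above is what the kernel reports when it is asked to execute a file whose binary format it does not recognize, which would be consistent with a venv built on the host being mounted into a Linux container. A rough way to check is to look at the first bytes of the venv's interpreter. This is only a sketch: `binary_format` is a hypothetical helper, not a Meltano or Docker command, and it only recognizes the ELF and Mach-O magic numbers.

```shell
# Print the binary format of an executable by inspecting its first four bytes.
# Hypothetical helper for illustration; recognizes only ELF and Mach-O.
binary_format() {
  # Read 4 bytes and turn them into a plain hex string, e.g. 7f454c46
  magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    7f454c46) echo "elf" ;;                       # Linux executables
    cffaedfe|cefaedfe|cafebabe) echo "mach-o" ;;  # macOS (64-bit, 32-bit, universal)
    *) echo "unknown" ;;                          # scripts, empty files, anything else
  esac
}
```

Inside the container, `binary_format /project/.meltano/extractors/tap-salesforce/venv/bin/python` printing anything other than `elf` would suggest the venv came from the host and needs to be recreated.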
c
Why is that problematic? Is there a way to get this to work just with `docker compose up`, without using `meltano install --clean`? (I ran that in the container and then it started working, so thanks. That must be related to the issue.)
I ran it on the container with the cli command
meltano elt tap target --job-id job1
But when I ran it through the UI which comes up with the docker compose, it fails and the logs spit out
meltano-ui_1  | 2022-01-13T18:53:53.509898Z [error    ] exception calling callback for <Future at 0x7f3634c090f0 state=finished raised FileNotFoundError> 
meltano-ui_1  | Traceback (most recent call last):
meltano-ui_1  |   File "/usr/local/lib/python3.6/concurrent/futures/_base.py", line 324, in _invoke_callbacks
meltano-ui_1  |     callback(self)
meltano-ui_1  |   File "/usr/local/lib/python3.6/site-packages/flask_executor/executor.py", line 28, in propagate_exceptions_callback
meltano-ui_1  |     raise exc
meltano-ui_1  |   File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
meltano-ui_1  |     result = self.fn(*self.args, **self.kwargs)
meltano-ui_1  |   File "/usr/local/lib/python3.6/site-packages/flask_executor/executor.py", line 20, in wrapper
meltano-ui_1  |     return fn(*args, **kwargs)
meltano-ui_1  |   File "/usr/local/lib/python3.6/site-packages/flask/ctx.py", line 158, in wrapper
meltano-ui_1  |     return f(*args, **kwargs)
meltano-ui_1  |   File "/usr/local/lib/python3.6/site-packages/meltano/api/executor/__init__.py", line 32, in defer_run_schedule
meltano-ui_1  |     env={"MELTANO_JOB_TRIGGER": "ui"},
meltano-ui_1  |   File "/usr/local/lib/python3.6/site-packages/meltano/core/schedule_service.py", line 156, in run
meltano-ui_1  |     ["elt", *schedule.elt_args, *args], env={**schedule.env, **env}, **kwargs
meltano-ui_1  |   File "/usr/local/lib/python3.6/site-packages/meltano/core/meltano_invoker.py", line 23, in invoke
meltano-ui_1  |     env=self._executable_env(env)
meltano-ui_1  |   File "/usr/local/lib/python3.6/subprocess.py", line 423, in run
meltano-ui_1  |     with Popen(*popenargs, **kwargs) as process:
meltano-ui_1  |   File "/usr/local/lib/python3.6/subprocess.py", line 729, in __init__
meltano-ui_1  |     restore_signals, start_new_session)
meltano-ui_1  |   File "/usr/local/lib/python3.6/subprocess.py", line 1364, in _execute_child
meltano-ui_1  |     raise child_exception_type(errno_num, err_msg, err_filename)
meltano-ui_1  | FileNotFoundError: [Errno 2] No such file or directory: '/project/.meltano/run/bin': '/project/.meltano/run/bin'
That file is present
When I use the instructions for dockerizing the project:
# For these examples to work, ensure that
  # Docker has been installed
  docker --version

  # Add Docker files to your project
  meltano add files docker

  # Build Docker image containing
  # Meltano, your project, and all of its plugins
  docker build --tag meltano-demo-project:dev .

  Your meltano-demo-project:dev Docker image is now ready for its first container!

  # View Meltano version
  docker run meltano-demo-project:dev --version

  # Run gitlab-to-jsonl pipeline with
  # mounted volume to exfiltrate target-jsonl output
  docker run \
    --volume $(pwd)/output:/project/output \
    meltano-demo-project:dev \
    elt tap-salesforce target-redshift --job_id=gitlab-to-postgres
I get the following:
➜  convertiv-salesforce git:(master) ✗ docker run \
>   --volume $(pwd)/output:/project/output \
>   meltano-demo-project:dev \
>   elt tap-salesforce target-redshift --job_id=gitlab-to-postgres
2022-01-13T19:39:02.291600Z [info     ] Running extract & load...      job_id=gitlab-to-postgres name=meltano run_id=6f43835a-0b71-45e0-a502-c1aba6499112
2022-01-13T19:39:02.414712Z [warning  ] No state was found, complete import.
2022-01-13T19:39:02.968743Z [info     ] ELT could not be completed: Cannot start extractor: Catalog discovery failed: command ['/project/.meltano/extractors/tap-salesforce/venv/bin/tap-salesforce', '--config', '/project/.meltano/run/elt/gitlab-to-postgres/6f43835a-0b71-45e0-a502-c1aba6499112/tap.e3e7749c-dfab-433e-b862-cdd8f3df5b4a.config.json', '--discover'] returned 1 cmd_type=elt job_id=gitlab-to-postgres name=meltano run_id=6f43835a-0b71-45e0-a502-c1aba6499112 stdio=stderr
ELT could not be completed: Cannot start extractor: Catalog discovery failed: command ['/project/.meltano/extractors/tap-salesforce/venv/bin/tap-salesforce', '--config', '/project/.meltano/run/elt/gitlab-to-postgres/6f43835a-0b71-45e0-a502-c1aba6499112/tap.e3e7749c-dfab-433e-b862-cdd8f3df5b4a.config.json', '--discover'] returned 1
Is it not possible to dockerize an existing project?
e
It is certainly possible. I was running dockerized Meltano on Kubernetes successfully a while ago. The hiccups are around the compose file bundle incorrectly binding to the host's `.meltano/` and probably passing secrets. Can you try running with debug enabled?
docker run \
    --volume $(pwd)/output:/project/output \
    --env MELTANO_LOG_LEVEL=debug \
    meltano-demo-project:dev \
    elt tap-salesforce target-redshift --job_id=gitlab-to-postgres
c
Ok, I'll try that. I have also tried building and running straight from the Docker image, and I get this:
ELT could not be completed: Cannot start extractor: Catalog discovery failed: command ['/project/.meltano/extractors/tap-salesforce/venv/bin/tap-salesforce', '--config', '/project/.meltano/run/elt/gitlab-to-postgres/b0ff6acc-506f-4479-a543-387c0448c185/tap.e43ad727-00d5-4351-baf7-d5e7a365e4cb.config.json', '--discover'] returned 1
what does that catalog discovery indicate?
debug response was this:
➜  convertiv-salesforce git:(master) ✗     docker run \
>       --volume $(pwd)/output:/project/output \
>       --env MELTANO_LOG_LEVEL=debug \
>       meltano-demo-project:dev \
>       elt tap-salesforce target-redshift --job_id=gitlab-to-postgres
2022-01-13T20:21:18.359226Z [info     ] Running extract & load...      job_id=gitlab-to-postgres name=meltano run_id=7724df1c-90dc-4a11-a7bf-c1a683bd632e
2022-01-13T20:21:18.507799Z [warning  ] No state was found, complete import.
2022-01-13T20:21:19.102020Z [info     ] ELT could not be completed: Cannot start extractor: Catalog discovery failed: command ['/project/.meltano/extractors/tap-salesforce/venv/bin/tap-salesforce', '--config', '/project/.meltano/run/elt/gitlab-to-postgres/7724df1c-90dc-4a11-a7bf-c1a683bd632e/tap.588558c0-c39f-4648-b0b5-6384b443ef1d.config.json', '--discover'] returned 1 cmd_type=elt job_id=gitlab-to-postgres name=meltano run_id=7724df1c-90dc-4a11-a7bf-c1a683bd632e stdio=stderr
ELT could not be completed: Cannot start extractor: Catalog discovery failed: command ['/project/.meltano/extractors/tap-salesforce/venv/bin/tap-salesforce', '--config', '/project/.meltano/run/elt/gitlab-to-postgres/7724df1c-90dc-4a11-a7bf-c1a683bd632e/tap.588558c0-c39f-4648-b0b5-6384b443ef1d.config.json', '--discover'] returned 1
e
what does that catalog discovery indicate?
The Singer spec uses a catalog file to store inspected schemas and metadata (like primary keys) and to allow you to modify it to update replication methods, etc. So Meltano first runs `<tap-executable> --discover` to generate that catalog file. Can you change `MELTANO_LOG_LEVEL` to `MELTANO_CLI_LOG_LEVEL`? For some reason the former is not being picked up by the CLI (cc @taylor, see https://gitlab.com/meltano/meltano/-/issues/3158)
c
Here is the output from the cli debug level
e
Ok so this is the error:
Cannot create credentials from config.
According to the code, it means at least one setting is missing for either basic or OAuth authentication. Looks like you're missing `client_secret` and `refresh_token`. You can check if/whence they're being picked up with `meltano config tap-salesforce list`.
c
I have those in a .env file... It seems to be copied into the container.
Yeah, they don't come out in the output of that.
Do I need to do something to make them visible or pass them, other than having them in the .env file in the project directory?
e
Do I need to do something to make them visible or pass them, other than having them in the .env file in the project directory?
Nope, having them in `.env` should be enough, which makes me think they might not have the right names in the file.
c
TARGET_REDSHIFT_PASSWORD=""
TARGET_REDSHIFT_AWS_ACCESS_KEY_ID=""
TARGET_REDSHIFT_AWS_SECRET_ACCESS_KEY=""
TAP_SALESFORCE_CLIENT_SECRET=""
TAP_SALESFORCE_REFRESH_TOKEN=""
Is that right? I saw in the code that it does not have the tap name prepended to the variable name, whereas here it does: https://meltano.com/docs/configuration.html#configuring-settings
e
Yeah, those seem ok. You can confirm from the output of `meltano config tap-salesforce list`:
client_id [env: ...] current value: xyz (from .env)
client_secret [env: ...] current value: ... (from ?)
...
The "env" part should tell you which variable names it expects.
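As a quick cross-check outside Meltano, the names in the `.env` listing above follow the pattern of plugin name plus setting name, uppercased, with dashes turned into underscores. The sketch below builds that name and checks whether an env file defines it with a non-empty value. Both `setting_env_var` and `env_file_has` are hypothetical helpers, not Meltano commands, and the check assumes simple one-line `KEY=value` or `KEY="value"` entries.

```shell
# Build the environment variable name expected for a plugin setting,
# e.g. tap-salesforce + client_secret -> TAP_SALESFORCE_CLIENT_SECRET.
# Hypothetical helper; the naming pattern is inferred from the .env shown above.
setting_env_var() {
  printf '%s_%s\n' "$1" "$2" | tr '[:lower:]' '[:upper:]' | tr '-' '_'
}

# Succeed only if the env file ($2) defines variable $1 with a non-empty value.
# Assumes simple KEY=value or KEY="value" lines.
env_file_has() {
  grep -Eq "^$1=\"?[^\"[:space:]]" "$2"
}

# Example (commented out so the snippet has no side effects):
# for s in client_secret refresh_token; do
#   env_file_has "$(setting_env_var tap-salesforce "$s")" .env || echo "missing: $s"
# done
```

This only tells you whether the file has the entries; `meltano config tap-salesforce list` remains the way to see what Meltano actually picks up.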
c
Actually, it looks like it is not picking up the .env... I am using the Dockerfile bundle, not the docker-compose one.
I mean it is not copying it, it seems.
Here is the Dockerfile:
ARG MELTANO_IMAGE=meltano/meltano:latest
FROM $MELTANO_IMAGE

WORKDIR /project

# Install any additional requirements
COPY ./requirements.txt . 
RUN pip install -r requirements.txt

# Install all plugins into the `.meltano` directory
COPY ./meltano.yml . 
RUN meltano install

# Pin `discovery.yml` manifest by copying cached version to project root
RUN cp -n .meltano/cache/discovery.yml . 2>/dev/null || :

# Don't allow changes to containerized project files
ENV MELTANO_PROJECT_READONLY 1

# Copy over remaining project files
COPY . .

# Expose default port used by `meltano ui`
EXPOSE 5000

ENTRYPOINT ["meltano"]
Looks like the .env was in the .dockerignore. I am running it now... Is it like that in the Dockerfile bundle? I don't think I changed anything.
e
Oh ok, so yeah, `.env` won't be COPYed into the image because it's ignored (see your `.dockerignore`). You can pass an env file to `docker run` and it will create the environment variables for you:
docker run --env-file .env ...
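One caveat worth checking: as far as I know, `docker run --env-file` reads each `VAR=VAL` line verbatim and does not apply shell-style quote handling, so a line like `TAP_SALESFORCE_CLIENT_SECRET="abc"` may end up with the quotes as part of the value. A minimal sketch for producing an unquoted copy, assuming simple one-line double-quoted values; `strip_env_quotes` and the `.env.docker` file name are hypothetical:

```shell
# Rewrite KEY="value" lines as KEY=value so the quotes do not become part of
# the value. Reads an env file on stdin and writes the result to stdout.
# Hypothetical helper; handles only simple one-line, double-quoted values
# and leaves every other line (comments, unquoted values) untouched.
strip_env_quotes() {
  sed -E 's/^([A-Za-z_][A-Za-z0-9_]*)="(.*)"$/\1=\2/'
}
```

Usage would be along the lines of `strip_env_quotes < .env > .env.docker`, then `docker run --env-file .env.docker ...`. Keep the generated file out of version control, since it holds the same secrets as `.env`.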
c
yeah that makes sense and actually, probably more convenient! Thank you for your help!