# troubleshooting
s
Hello, I am trying to set up my first Meltano pipeline using the console. I added the S3 tap to my project and ran `select`, which triggers a permission error. I'm not too sure where to look for more details on this. Could someone help me understand this "Catalog discovery failed" error? Did I miss a step in the getting started tutorial?
```
meltano add extractor tap-s3
meltano select --list --all tap-s3
```
Error produced:
```
Cannot list the selected attributes: Catalog discovery failed: command ['/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/bin/tap-airbyte', '--config', '/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/run/tap-s3/tap.0461fef6-3ada-4752-a6d6-ec9c8c47e155.config.json', '--discover'] returned 1 with stderr:
Traceback (most recent call last):
  File "/usr/lib64/python3.9/shutil.py", line 660, in _rmtree_safe_fd
    dirfd = os.open(entry.name, os.O_RDONLY, dir_fd=topfd)
PermissionError: [Errno 13] Permission denied: 'tmpzbuqebi0'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/bin/tap-airbyte", line 8, in <module>
    sys.exit(TapAirbyte.cli())
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/tap_airbyte/tap.py", line 269, in cli
    tap: TapAirbyte = cls(  # type: ignore
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/tap_airbyte/tap.py", line 314, in __init__
    super().__init__(*args, **kwargs)
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/singer_sdk/tap_base.py", line 97, in __init__
    self.mapper.register_raw_streams_from_catalog(self.catalog)
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/singer_sdk/tap_base.py", line 159, in catalog
    self._catalog = self.input_catalog or self._singer_catalog
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/singer_sdk/tap_base.py", line 251, in _singer_catalog
    for stream in self.streams.values()
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/singer_sdk/tap_base.py", line 122, in streams
    for stream in self.load_streams():
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/singer_sdk/tap_base.py", line 283, in load_streams
    for stream in self.discover_streams():
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/tap_airbyte/tap.py", line 688, in discover_streams
    for stream in self.airbyte_catalog["streams"]:
  File "/home/ec2-user/workspace/feature-store/meltano/first-pipe/.meltano/extractors/tap-s3/venv/lib/python3.9/site-packages/tap_a…
```
I dug a bit deeper into what's happening here: the `tap-airbyte` extractor is run to discover a catalog, and it creates some temporary files and then reads them back. Somehow those files are created as the root user rather than the current user, so they cannot be read. Would anyone know where to look to understand this behaviour? Is this expected, meaning I need to run as root? Or is there some other problem lurking?
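To see whether it's really the container writing root-owned files, something like this should reproduce the behaviour (a sketch using a throwaway alpine image rather than the actual connector):
```sh
mkdir -p /tmp/perm-test
# container processes run as root by default, so files they write into a
# bind mount end up owned by UID 0 on the host
docker run --rm -v /tmp/perm-test:/out alpine sh -c 'mkdir -p /out/tmpdir && touch /out/tmpdir/x'
ls -lnR /tmp/perm-test   # owner shows as 0 (root); a non-root user can't remove those entries
```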
c
The stack trace seems to be pointing to this line; you may need to check your Docker permissions.
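A few things worth checking, e.g. (just a sketch; paths assume a standard package install):
```sh
id -nG                                        # is 'docker' among your current groups?
ls -l /var/run/docker.sock                    # socket is normally root:docker
docker info --format '{{.SecurityOptions}}'   # shows name=rootless if rootless mode is active
```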
s
Hi @ceyhun_kerti, thanks for that pointer. What kind of Docker permissions are needed? I installed Docker on my Ubuntu host from packages and added the current user to the `docker` group.
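Is there a particular check I should run? So far I've only verified it along these lines (IIRC group changes only apply to new login sessions):
```sh
id -nG                        # confirm 'docker' shows up in the current shell
newgrp docker                 # or log out and back in after usermod -aG docker $USER
docker run --rm hello-world   # should succeed without sudo
```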
p
@shailesh_kochhar can you share your Meltano config with anything sensitive removed?
s
@pat_nadolny I think I ended up destroying the host where I was trying this out. I'll regenerate it and share it with you in a day or so. Sorry for taking a while to take you up on your assistance; I'm a bit caught up today.
@pat_nadolny here's the current `meltano.yml`:
```yaml
version: 1
default_environment: dev
project_id: d88d7541-5c12-4c90-8b59-03766a4e666c
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-s3
    variant: airbyte
    pip_url: git+https://github.com/MeltanoLabs/tap-airbyte-wrapper.git
    config:
      airbyte_config:
        dataset: XXXXXX-XXXXXX-logs
        path_pattern: dt=*/hr=*/*
        format:
          filetype: json
          infer_datatypes: true
  loaders:
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
```
And `config.json`:
```json
{
  "airbyte_spec": {
    "image": "airbyte/source-s3",
    "tag": "latest"
  }
}
```
p
@shailesh_kochhar where are you getting that config.json from? It doesn't look like it matches your meltano.yml config. If you run `meltano config tap-s3` you should get your full config. There's no need to manually pass in a config.json though; Meltano will manage that for you.
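You can see where that generated file goes in the stack trace above; at run time Meltano writes the merged config under `.meltano/run/`, e.g.:
```sh
# the merged config is written to a throwaway file while a command runs
# (that's the tap.<uuid>.config.json path in the traceback above)
ls .meltano/run/tap-s3/
```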
I think you might be missing a few config options too: `connector_config.provider.bucket` and `connector_config.path_pattern` are required, with docs at https://docs.airbyte.com/integrations/sources/s3/#s3-provider-settings. I don't think missing those configs would give you a permission error though. What commands are you running?
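If it helps, you can also set those from the CLI instead of editing meltano.yml by hand, something like this (placeholder values; your meltano.yml uses the `airbyte_config.*` namespace, so adjust the prefix to whatever your wrapper version expects):
```sh
meltano config tap-s3 set airbyte_config.provider.bucket 'my-bucket'
meltano config tap-s3 set airbyte_config.path_pattern 'dt=*/hr=*/*'
```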
One other thing to consider is that you could have an outdated image on your machine; you could try deleting your local image and letting it pull the latest again 🤔
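For example:
```sh
docker pull airbyte/source-s3:latest                        # refreshes a stale local 'latest' in place
docker inspect -f '{{.Created}}' airbyte/source-s3:latest   # or just check the local image's build date
```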
s
I run `meltano run tap-s3 target-jsonl`.
By outdated image, do you mean the airbyte-tap image being run by Docker?
@pat_nadolny the config.json is in the root of my project. I generated it manually once by running `meltano config tap-s3` to see what it contained; I don't pass it into the run command.
u
> By outdated image do you mean the airbyte-tap image being run by docker?
Yes. I doubt this is your problem, but if that image is really old it might have bugs; just something to try.
s
@pat_nadolny The config entry for `connector_config.path_pattern` is present in the snippet above. There's also another file in the directory named `.env` with sensitive parameters like the bucket, access key ID, and secret access key:
```
TAP_S3_AIRBYTE_CONFIG_PROVIDER_BUCKET='XXXXXXXXXXXX'
TAP_S3_AIRBYTE_CONFIG_PROVIDER_AWS_ACCESS_KEY_ID='XXXXXXXXXXXX'
TAP_S3_AIRBYTE_CONFIG_PROVIDER_AWS_SECRET_ACCESS_KEY='XXXXXXXX'
```
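(Those names follow Meltano's env var convention, if I've read the docs right: the plugin name plus the setting path, upper-cased, with dots replaced by underscores.)
```sh
# e.g. airbyte_config.provider.bucket -> TAP_S3_AIRBYTE_CONFIG_PROVIDER_BUCKET
meltano config tap-s3 list   # lists each setting and its current value
```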
Docker shows that the airbyte/source-s3 image is the latest version:
```
~/workspace/meltano/first-pipe$ docker image ls
REPOSITORY                            TAG       IMAGE ID       CREATED         SIZE
airbyte/source-s3                     latest    6cca1f00ca50   2 weeks ago     459MB
```
I have also tried explicitly deleting the image and re-running the S3 tap:
```
~/workspace/meltano/first-pipe$ docker rmi airbyte/source-s3
~/workspace/meltano/first-pipe$ docker image ls
REPOSITORY                            TAG       IMAGE ID       CREATED         SIZE
~/workspace/meltano/first-pipe$ meltano config tap-s3 test
~/workspace/meltano/first-pipe$ docker image ls
REPOSITORY          TAG       IMAGE ID       CREATED       SIZE
airbyte/source-s3   latest    6cca1f00ca50   2 weeks ago   459MB
```
The `meltano config` command fails with an 'Operation not permitted' error. Since there was an earlier mention of Docker permissions, I'm wondering whether Docker is correctly installed on the host. Is there a preferred way to install it? IIRC, I followed the instructions for installing Docker Engine on Ubuntu from the apt repository: https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
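To take Meltano out of the loop entirely I was also going to try the connector directly; `spec` needs no config, so something like this should tell Docker problems apart from tap problems:
```sh
docker run --rm airbyte/source-s3:latest spec   # prints the connector spec JSON if Docker itself is healthy
```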
a
Hi! Were you able to resolve this issue? I'm encountering the same error.