# troubleshooting
j
Hello, I'm trying to extract CSV files from S3 storage and keep getting this error:
```
Catalog discovery failed: command ['/project/.meltano/extractors/tap-s3-csv/venv/bin/tap-s3-csv', '--config', '/project/.meltano/run/tap-s3-csv/tap.7fc9a9ab-13e7-4217-b88f-8c73274b1601.config.json', '--discover'] returned 1 with stderr:
 time=2023-03-17 07:33:32 name=tap_s3_csv level=CRITICAL message=expected a dictionary @ data[0]
Traceback (most recent call last):
  File "/project/.meltano/extractors/tap-s3-csv/venv/bin/tap-s3-csv", line 8, in <module>
    sys.exit(main())
  File "/project/.meltano/extractors/tap-s3-csv/venv/lib/python3.9/site-packages/singer/utils.py", line 229, in wrapped
    return fnc(*args, **kwargs)
  File "/project/.meltano/extractors/tap-s3-csv/venv/lib/python3.9/site-packages/tap_s3_csv/__init__.py", line 85, in main
    config['tables'] = CONFIG_CONTRACT(config.get('tables', {}))
  File "/project/.meltano/extractors/tap-s3-csv/venv/lib/python3.9/site-packages/voluptuous/schema_builder.py", line 272, in __call__
    return self._compiled([], data)
  File "/project/.meltano/extractors/tap-s3-csv/venv/lib/python3.9/site-packages/voluptuous/schema_builder.py", line 647, in validate_sequence
    raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: expected a dictionary @ data[0]
```
This is my extractor:
```yaml
- name: tap-s3-csv
  variant: transferwise
  capabilities:  # This will override the capabilities declared in the lockfile
  - properties
  - discover
  - state
  pip_url: pipelinewise-tap-s3-csv
  config:
    start_date: '2020-01-01T00:00:00Z'
    bucket: my-bucket-name
    tables:
    - my-csv-file.csv
```
How would I add several .csv files?
s
It's telling you that you configured the "tables" key inside your YAML wrong: it expects a list of dictionaries, not plain filename strings.
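Concretely, each tables entry has to be a mapping rather than a bare filename. A minimal sketch for your file, with table_name being just an illustrative choice (table_name and search_pattern are the essential keys; others like key_properties and delimiter appear in the full config below):

```yaml
tables:
  - search_pattern: "my-csv-file.csv"
    table_name: "my_csv_file"
```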
Here's a complete working config (see https://github.com/sbalnojan/meltano-example-el/blob/main/new_project/meltano.yml):
```yaml
extractors:
  - name: tap-s3-csv
    variant: transferwise
    pip_url: pipelinewise-tap-s3-csv
    config:
      bucket: test
      tables:
        - search_prefix: ""
          search_pattern: "raw_customers.csv"
          table_name: "raw_customers"
          key_properties: ["id"]
          delimiter: ","
      start_date: '2000-01-01T00:00:00Z'
      aws_endpoint_url: http://host.docker.internal:5005
      aws_access_key_id: s
      aws_secret_access_key: s
      aws_default_region: us-east-1
```
If the files share the same schema, you can simply adapt the search_pattern to match multiple files. Otherwise you need to add a second table entry (another list item in the YAML) to create a second stream; a sketch of both options follows.
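For illustration, a minimal sketch of both options. The bucket, file names, and key_properties values here are invented, and search_pattern is a regular expression, so one pattern can match several keys:

```yaml
config:
  bucket: my-bucket-name
  tables:
    # Option 1: one stream over many files with the same schema.
    # The regex matches e.g. orders-2020.csv and orders-2021.csv,
    # and their rows are merged into a single "orders" stream.
    - search_prefix: ""
      search_pattern: 'orders-.*\.csv'
      table_name: "orders"
      key_properties: ["id"]  # assumes your files have an id column
      delimiter: ","
    # Option 2: a second list entry for files with a different
    # schema; each entry becomes its own stream.
    - search_prefix: ""
      search_pattern: 'customers\.csv'
      table_name: "customers"
      key_properties: ["id"]
      delimiter: ","
```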