# troubleshooting
b
I'm just getting started with Meltano, and I'm trying to connect to an API that has only one endpoint, where all auth information and query params are passed in the request body as JSON. My `meltano.yml` is:
```yaml
version: 1
default_environment: dev
project_id: 84ee0daa-98cb-442b-ac54-d65f476bfd32
environments:
  - name: dev
    config:
      plugins:
        extractors:
          - name: tap-rest-api-msdk
            config:
              api_url: https://<sub>.<domain>.com/default.aspx
              streams:
                - name: default
  - name: staging
  - name: prod
plugins:
  extractors:
    - name: tap-rest-api-msdk
      variant: widen
      pip_url: tap-rest-api-msdk
      config:
        api_url: https://<sub>.<domain>.com/default.aspx
        params:
          credentials:
            org: <pass in org value>
            username: <pass in username value>
            password: <pass in password value>
          database: <pass in db name>
        use_request_body_not_params: true
        streams:
          - name: default
  loaders:
    - name: target-jsonl
      variant: andyh1203
      pip_url: target-jsonl
```
I'm receiving the following error when running `meltano config tap-rest-api-msdk test`:
```
[info ] The default environment 'dev' will be ignored for `meltano config`. To configure a specific environment, please use the option `--environment=<environment name>`.
Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to join our friendly Slack community.
Plugin configuration is invalid
Catalog discovery failed: command ['/Users/bplexico/dev/meltano-projects/testetl/.meltano/extractors/tap-rest-api-msdk/venv/bin/tap-rest-api-msdk', '--config', '/Users/bplexico/dev/meltano-projects/testetl/.meltano/run/tap-rest-api-msdk/tap.8c88e553-e803-4657-8680-fc124c9b2a64.config.json', '--discover'] returned 1 with stderr:
2023-09-25 12:55:17,654 | INFO | tap-rest-api-msdk | No schema found. Inferring schema from API call.
Traceback (most recent call last):
  File "/Users/bplexico/dev/meltano-projects/testetl/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.11/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bplexico/dev/meltano-projects/testetl/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.11/site-packages/simplejson/__init__.py", line 514, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bplexico/dev/meltano-projects/testetl/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.11/site-packages/simplejson/decoder.py", line 386, in decode
    obj, end = self.raw_decode(s)
               ^^^^^^^^^^^^^^^^^^
  File "/Users/bplexico/dev/meltano-projects/testetl/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.11/site-packages/simplejson/decoder.py", line 416, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
simplejson.errors.JSONDecodeError: Expecting value: line 3 column 1 (char 4)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/bplexico/dev/meltano-projects/testetl/.meltano/extractors/tap-rest-api-msdk/venv/bin/tap-rest-api-msdk", line 8, in <module>
    sys.exit(TapRestApiMsdk.cli())
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/bplexico/dev/meltano-projects/testetl/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/bplexico/dev/meltano-projects/testetl/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.11/site-packages/click/core.py", line 1077, in main
    with self.make_context(prog_name, args, **extra) as ctx:
         ^^^^^^^^^^^^^^^^^^^^^^^…
```
l
That's probably a copy/paste issue, but there's a missing `'` in the `curl` command. Another thing to check is whether the JSON validates properly (is there any `"` in the password?).
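To check the second point, one option is to let Python parse the body before pasting it anywhere. A minimal sketch with the standard `json` module and a hypothetical password containing a double quote: a body built by string concatenation fails to parse, while one produced by `json.dumps` escapes the quote.

```python
import json

# Hypothetical password containing a double quote.
password = 'pa"ss'


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Body assembled by hand: the quote in the password ends the string early.
hand_built = '{"credentials": {"password": "' + password + '"}}'

# Body produced by json.dumps: the quote is escaped as \".
generated = json.dumps({"credentials": {"password": password}})

print(is_valid_json(hand_built))  # False
print(is_valid_json(generated))   # True
print(generated)                  # {"credentials": {"password": "pa\"ss"}}
```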
b
Yeah, the missing `'` was just from copy/paste. It's there in Postman, and the `curl` command properly returns data. I included it to show exactly what works, in case that helps someone point me in the right direction with the tap config.
l
The error seems related to JSON. Worst case, I would open
`/Users/bplexico/dev/meltano-projects/testetl/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.11/site-packages/requests/models.py`, line 971,
set a debugger there, and check what the issue is. It may also be worth opening the tap's temporary config file
`/Users/bplexico/dev/meltano-projects/testetl/.meltano/run/tap-rest-api-msdk/tap.8c88e553-e803-4657-8680-fc124c9b2a64.config.json`;
it might show what's wrong with the tap configuration, for example a missing value. Just my best guess here 🙂
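The position in `Expecting value: line 3 column 1 (char 4)` is itself a clue: the parser skipped four whitespace characters containing two line breaks and then hit something that is not JSON at all, such as an HTML page. A minimal sketch with Python's standard `json` module (the traceback comes from `simplejson`, which formats this message the same way), assuming a hypothetical body of two CRLF pairs followed by HTML; this is one body that reproduces exactly that message:

```python
import json

# Hypothetical response body: two CRLF pairs, then an HTML page instead of JSON.
body = "\r\n\r\n<!DOCTYPE html><html>...</html>"


def json_error(text: str):
    """Return the JSONDecodeError raised while parsing text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err
    return None


err = json_error(body)
print(err)  # Expecting value: line 3 column 1 (char 4)
```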
s
May I suggest you look at the requests being generated? If you uncomment lines 27-37 of https://github.com/s7clarke10/tap-rest-api-msdk/blob/02fdcd916658eafd1a0cd6e26ed161f5c0a8a34e/tap_rest_api_msdk/streams.py#L27-L37 in the tap's venv directory, it will log additional information. Run your tap with `--log-level=debug` to get more detail. I suspect the complexity of the dictionary of parameters sent in the body is contributing to this issue: the tap expects individual, flat parameters and may not handle the values nested inside the `credentials` parameter. It would be interesting to see what happens if you set the streams via an environment variable like so:
```shell
export TAP_REST_API_MSDK_STREAMS='[{"name": "default", "params": {"credentials": { "org": "<pass in org value>", "username": "<pass in username>", "password":"<pass in password>" }}, "path": "/", "primary_keys": []}]'
```
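Hand-quoting nested JSON inside a shell `export` is easy to get wrong (the missing `'` earlier in the thread is a good example). A sketch, assuming a POSIX shell, that builds the same value with `json.dumps` and quotes it with `shlex.quote`; the placeholder values are the ones from the command above:

```python
import json
import shlex

# Same stream definition as the export above; the values are placeholders.
streams = [
    {
        "name": "default",
        "params": {
            "credentials": {
                "org": "<pass in org value>",
                "username": "<pass in username>",
                "password": "<pass in password>",
            }
        },
        "path": "/",
        "primary_keys": [],
    }
]

# json.dumps guarantees valid JSON; shlex.quote makes it a single shell word,
# even if a value happens to contain a single quote.
value = json.dumps(streams)
print(f"export TAP_REST_API_MSDK_STREAMS={shlex.quote(value)}")
```

Pasting the printed line into the shell should set the variable to exactly the JSON that `json.dumps` produced.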
Then run in debug mode to see the requests going out and what is being generated, e.g.:
```shell
meltano --log-level=debug invoke tap-rest-api-msdk --discover
```
Looking at this error further, I believe the tap doesn't like the response it gets back and fails while parsing the JSON to build the schema during discovery: https://github.com/s7clarke10/tap-rest-api-msdk/blob/02fdcd916658eafd1a0cd6e26ed161f5c0a8a34e/tap_rest_api_msdk/tap.py#L535C36-L535C36
To debug what is going on, it would be useful to see the values of:
- `records_path`
- `r` (the response)

It might also help to copy the HTTP request debugging lines (commented out in `streams.py`, see my last post) to the beginning of `tap.py` and uncomment them there. Edit the files directly in the venv (follow the paths in the error messages). This will show what is happening with the request; run meltano with `--log-level=debug`. I suspect `r.json()` is not valid or not what is expected when line 535 parses it. Add these lines before line 535:
```python
self.logger.info(f"{records_path=}")
self.logger.info(f"{r=}")
```
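Note that `f"{r=}"` on a `requests.Response` only shows its repr, `<Response [200]>`, which won't reveal why parsing failed. A sketch of a hypothetical helper, `log_response`, that also logs the content type and the start of the body; it is exercised here with a stand-in object rather than a live request, and inside `tap.py` it would be called as `log_response(self.logger, r)`:

```python
import logging
from types import SimpleNamespace

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("tap-debug")  # hypothetical logger name


def log_response(logger: logging.Logger, r, limit: int = 300) -> str:
    """Log status, content type and the start of the body of a requests-style response.

    `r` only needs `status_code`, `headers` and `text`, which requests.Response provides.
    """
    summary = (
        f"status={r.status_code} "
        f"content_type={r.headers.get('Content-Type')!r} "
        f"body_start={r.text[:limit]!r}"
    )
    logger.info(summary)
    return summary


# Stand-in for a real response; the HTML body is a placeholder.
fake = SimpleNamespace(
    status_code=200,
    headers={"Content-Type": "text/html; charset=utf-8"},
    text="\r\n\r\n<!DOCTYPE html><html>...</html>",
)
log_response(logger, fake)
```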
b
After uncommenting the lines in streams.py as suggested above, I get the following. It looks like it doesn't handle the nesting of the params well.
```
send: b'GET /Default.aspx?ReturnUrl=%2Fdefault.aspx%2F%3Fcredentials%3Dorg%26credentials%3Dusername%26credentials%3Dpassword&credentials=org&credentials=username&credentials=password
```
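The repeated `credentials=org&credentials=username&credentials=password` pairs are what form encoding does to a nested dict. requests appears to encode dict `params=`/`data=` by iterating each value and passing the resulting pairs to `urllib.parse.urlencode`, so the standard library reproduces the pattern. A minimal sketch with placeholder values:

```python
from urllib.parse import urlencode

# Placeholder values with the same shape as the `params` block in meltano.yml.
params = {
    "credentials": {"org": "o", "username": "u", "password": "p"},
    "database": "db",
}

# With doseq=True every non-string value is iterated; iterating a dict yields
# only its keys, so the credential values are dropped entirely.
encoded = urlencode(params, doseq=True)
print(encoded)  # credentials=org&credentials=username&credentials=password&database=db
```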
If I add the logging in `tap.py`, the response code is 200, but the body sent back is the HTML of `default.aspx`, which means authentication didn't work. That isn't surprising, given the way the params are passed to the API.
To narrow down what's going on, I wrote a script that uses the requests library directly and found that the `data=` argument can't handle nested params; you have to use `json=` instead. See an explanation here. This works:
```python
import requests

r = requests.get(
    "https://<sub>.<domain>.com/default.aspx",
    json={
        "credentials": {"org": "<org_value>", "username": "<username_value>", "password": "<password_value>"},
        "database": "<database_value>",
        "type": "<type_value>",
    },
)
```
But `tap.py` (line 58) is only using `data=`:
```python
r = requests.get(
    self.config["api_url"] + path,
    auth=self.http_auth,
    data=params,
    headers=headers,
)
```
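For contrast with form encoding, here is a stdlib-only sketch of what `json=` puts on the wire: the payload serialized as UTF-8 JSON plus a `Content-Type: application/json` header, which requests sets on its own. The helper name `json_body` is hypothetical and the values are placeholders taken from the working script earlier in the thread:

```python
import json

# Placeholder payload with the same nesting as the working requests script.
payload = {
    "credentials": {"org": "<org_value>", "username": "<username_value>", "password": "<password_value>"},
    "database": "<database_value>",
    "type": "<type_value>",
}


def json_body(data: dict) -> tuple[bytes, dict]:
    """Hypothetical helper approximating what requests does for json=:
    serialize to JSON, encode as UTF-8 and label it application/json."""
    body = json.dumps(data).encode("utf-8")
    return body, {"Content-Type": "application/json"}


body, headers = json_body(payload)
# The nesting survives: decoding the body gives back the original dict.
print(json.loads(body) == payload)  # True
print(headers["Content-Type"])      # application/json
```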
s
Okay, I understand. `streams.py` supports sending params via the `json` parameter (when `use_request_body_not_params: true`), whereas `tap.py` doesn't have that logic in the schema discovery request, because it was assumed authentication would use some means other than params sent in the request body. Out of interest, if you change line https://github.com/s7clarke10/tap-rest-api-msdk/blob/02fdcd916658eafd1a0cd6e26ed161f5c0a8a34e/tap_rest_api_msdk/tap.py#L531C22-L531C22 from
```python
params=params,
```
to
```python
json=params,
```
does that allow authentication with a complex nested dictionary? Can you ingest data? If this doesn't work, it would appear that an additional authentication method is required to accept your authentication parameters, with an option to send them as part of the request body. That would need to be a PR to the current tap-rest-api-msdk.
You can also send params in the stream itself. See the last open search example in the README.md.
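If such a PR were written, one shape it could take is to honour the existing `use_request_body_not_params` setting in the discovery request too. The helper below, `discovery_request_kwargs`, is hypothetical and not part of tap-rest-api-msdk; it only sketches the branching, and whether the API then authenticates is a separate question:

```python
def discovery_request_kwargs(config: dict, params: dict, headers: dict) -> dict:
    """Hypothetical helper (not part of tap-rest-api-msdk): pick how the schema
    discovery request sends params, following the same use_request_body_not_params
    switch that streams.py already uses."""
    kwargs = {"headers": headers}
    if config.get("use_request_body_not_params"):
        kwargs["json"] = params    # JSON body keeps nested dicts intact
    else:
        kwargs["params"] = params  # query string, suited to flat key/value pairs
    return kwargs


# Inside tap.py this might be used roughly as:
#   r = requests.get(self.config["api_url"] + path, auth=self.http_auth,
#                    **discovery_request_kwargs(self.config, params, headers))
print(discovery_request_kwargs({"use_request_body_not_params": True},
                               {"credentials": {"org": "o"}}, {}))
```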
b
I tried changing `params=` to `json=`, and it still isn't working. 🤷