# troubleshooting
a
Hi team, I want to extract data from a public API. I am using https://dummy.restapiexample.com/api/v1/employees. I am going through the https://hub.meltano.com/extractors/tap-rest-api-msdk/#api_url-setting documentation and have set api_url, but when I test I am not getting any response. It's a simple public API without authentication. Please help with this.
r
Can you share your `meltano.yml` config and the error you are seeing (logs)?
I'm getting rate limited into oblivion by that API, but looks like this config might eventually work?
plugins:
  extractors:
  - name: tap-rest-api-msdk
    variant: widen
    pip_url: tap-rest-api-msdk
    config:
      api_url: https://dummy.restapiexample.com/api/v1
      headers:
        User-Agent: meltano
      streams:
      - name: employees
        path: /employees
        records_path: $.data[*]
        primary_keys: [id]
It was necessary to set `User-Agent` to get a successful response.
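For reference, a minimal Python sketch of what that config asks for, assuming the endpoint wraps its records in a top-level `data` array (which is what `records_path: $.data[*]` implies). The helper names and the sample payload values are made up for illustration; the field names come from the inferred schema in the logs further down, and none of this is the tap's actual code.

```python
import urllib.request

API_URL = "https://dummy.restapiexample.com/api/v1"  # base URL only, no stream path


def build_request(path: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request with an explicit User-Agent."""
    return urllib.request.Request(API_URL + path, headers={"User-Agent": "meltano"})


def extract_records(payload: dict) -> list:
    """Plain-Python equivalent of the JSONPath $.data[*]: every item under 'data'."""
    return list(payload.get("data", []))


# Hypothetical response body, shaped like the employees endpoint.
sample = {
    "status": "success",
    "data": [
        {"id": 1, "employee_name": "A"},
        {"id": 2, "employee_name": "B"},
    ],
}

request = build_request("/employees")
print(request.full_url)
print([record["id"] for record in extract_records(sample)])
```

Nothing is sent over the network here, so it is safe to run against a rate-limited API.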
a
Thank you! I will check now.
@Reuben (Matatika) while I test, I am getting the error below:
Plugin configuration is invalid
Catalog discovery failed: command ['/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/bin/tap-rest-api-msdk', '--config', '/opt/meltano/my-meltano-project/.meltano/run/tap-rest-api-msdk/tap.88aca75c-f8a7-4e8d-b9e3-75121b24843d.config.json', '--discover'] returned 1 with stderr:
2024-03-20 09:28:26,683 | INFO | tap-rest-api-msdk | No schema found. Inferring schema from API call.
2024-03-20 09:28:27,120 | ERROR | tap-rest-api-msdk | Error Connecting, message = {"message":"Error Occured! Page Not found, contact rstapi2example@gmail.com"}
Traceback (most recent call last):
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/bin/tap-rest-api-msdk", line 8, in <module>
    sys.exit(TapRestApiMsdk.cli())
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/click/core.py", line 1077, in main
    with self.make_context(prog_name, args, **extra) as ctx:
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/click/core.py", line 943, in make_context
    self.parse_args(ctx, args)
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/click/core.py", line 1408, in parse_args
    value, args = param.handle_parse_result(ctx, opts, args)
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/click/core.py", line 2400, in handle_parse_result
    value = self.process_value(ctx, value)
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/click/core.py", line 2362, in process_value
    value = self.callback(ctx, self, value)
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 528, in cb_discover
    tap.run_discovery()
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 288, in run_discovery
    catalog_text = self.catalog_json_text
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 308, in catalog_json_text
    return json.dumps(self.catalog_dict, indent=2)
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 299, in catalog_dict
    return t.cast(dict, self._singer_catalog.to_dict())
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 319, in _singer_catalog
    for stream in self.streams.values()
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 128, in streams
    for stream in self.load_streams():
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 352, in load_streams
    for stream in self.discover_streams():
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/tap_rest_api_msdk/tap.py", line 474, in discover_streams
    schema = self.get_schema(
  File "/opt/meltano/my-meltano-project/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.10/site-packages/tap_rest_api_msdk/tap.py", line 587, in get_schema
    raise ValueError(r.text)
ValueError: {"message":"Error Occured! Page Not found, contact rstapi2example@gmail.com"}
r
You get that error if your request URL is wrong.
I honestly don't know how you are managing to test with that API though - the tap makes the request for the schema and then I immediately get rate-limited after. It's effectively unusable for me.
a
I am trying to retrieve data from a sample (or any) public open API and then write the data to a table. Can you please help with this? Please find attached screenshots.
r
2024-03-20T10:36:20.611557Z [info     ] Environment 'dev' is active
2024-03-20 10:36:21,575 | INFO     | tap-rest-api-msdk    | No schema found. Inferring schema from API call.
2024-03-20 10:36:22,209 | INFO     | singer_sdk.helpers.jsonpath | JSONPath matches: 24
{"type": "STATE", "value": {}}
2024-03-20 10:36:22,211 | INFO     | tap-rest-api-msdk    | Beginning full_table sync of 'employees'...
2024-03-20 10:36:22,211 | INFO     | tap-rest-api-msdk    | Tap has custom mapper. Using 1 provided map(s).
{"type": "SCHEMA", "stream": "employees", "schema": {"properties": {"id": {"type": "integer"}, "employee_name": {"type": "string"}, "employee_salary": {"type": "integer"}, "employee_age": {"type": "integer"}, "profile_image": {"type": "string"}}, "type": "object", "required": ["employee_age", "employee_name", "employee_salary", "id", "profile_image"]}, "key_properties": ["id"]}
2024-03-20 10:36:22,212 | INFO     | tap-rest-api-msdk    | the next_page_token_jsonpath = $.next_page.
2024-03-20 10:36:22,780 | INFO     | singer_sdk.metrics   | METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.566765, "tags": {"stream": "employees", "endpoint": "/employees", "http_status_code": 429, "status": "failed"}}
2024-03-20 10:36:22,780 | INFO     | backoff              | Backing off _request(...) for 2.9s (singer_sdk.exceptions.RetriableAPIError: 429 Client Error: Too Many Requests for path: /api/v1/employees)
2024-03-20 10:36:22,780 | ERROR    | root                 | Backing off 2.94 seconds after 1 tries calling function <bound method RESTStream._request of <tap_rest_api_msdk.streams.DynamicStream object at 0x73a01cb24430>> with args (<PreparedRequest [GET]>, None) and kwargs {}
2024-03-20 10:36:26,002 | INFO     | singer_sdk.metrics   | METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.271849, "tags": {"stream": "employees", "endpoint": "/employees", "http_status_code": 429, "status": "failed"}}
2024-03-20 10:36:26,002 | INFO     | backoff              | Backing off _request(...) for 4.5s (singer_sdk.exceptions.RetriableAPIError: 429 Client Error: Too Many Requests for path: /api/v1/employees)
2024-03-20 10:36:26,002 | ERROR    | root                 | Backing off 4.51 seconds after 2 tries calling function <bound method RESTStream._request of <tap_rest_api_msdk.streams.DynamicStream object at 0x73a01cb24430>> with args (<PreparedRequest [GET]>, None) and kwargs {}
2024-03-20 10:36:30,783 | INFO     | singer_sdk.metrics   | METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.263652, "tags": {"stream": "employees", "endpoint": "/employees", "http_status_code": 429, "status": "failed"}}
2024-03-20 10:36:30,783 | INFO     | backoff              | Backing off _request(...) for 8.2s (singer_sdk.exceptions.RetriableAPIError: 429 Client Error: Too Many Requests for path: /api/v1/employees)
2024-03-20 10:36:30,783 | ERROR    | root                 | Backing off 8.18 seconds after 3 tries calling function <bound method RESTStream._request of <tap_rest_api_msdk.streams.DynamicStream object at 0x73a01cb24430>> with args (<PreparedRequest [GET]>, None) and kwargs {}
2024-03-20 10:36:39,577 | INFO     | singer_sdk.metrics   | METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.598462, "tags": {"stream": "employees", "endpoint": "/employees", "http_status_code": 429, "status": "failed"}}
2024-03-20 10:36:39,577 | INFO     | backoff              | Backing off _request(...) for 16.1s (singer_sdk.exceptions.RetriableAPIError: 429 Client Error: Too Many Requests for path: /api/v1/employees)
2024-03-20 10:36:39,577 | ERROR    | root                 | Backing off 16.07 seconds after 4 tries calling function <bound method RESTStream._request of <tap_rest_api_msdk.streams.DynamicStream object at 0x73a01cb24430>> with args (<PreparedRequest [GET]>, None) and kwargs {}
2024-03-20 10:36:56,230 | INFO     | singer_sdk.metrics   | METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.564422, "tags": {"stream": "employees", "endpoint": "/employees", "http_status_code": 429, "status": "failed"}}
2024-03-20 10:36:56,230 | ERROR    | backoff              | Giving up _request(...) after 5 tries (singer_sdk.exceptions.RetriableAPIError: 429 Client Error: Too Many Requests for path: /api/v1/employees)
2024-03-20 10:36:56,230 | INFO     | singer_sdk.metrics   | METRIC: {"type": "counter", "metric": "http_request_count", "value": 0, "tags": {"stream": "employees", "endpoint": "/employees"}}
2024-03-20 10:36:56,230 | INFO     | singer_sdk.metrics   | METRIC: {"type": "timer", "metric": "sync_duration", "value": 34.01871681213379, "tags": {"stream": "employees", "context": {}, "status": "failed"}}
2024-03-20 10:36:56,230 | INFO     | singer_sdk.metrics   | METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"stream": "employees", "context": {}}}
2024-03-20 10:36:56,230 | ERROR    | tap-rest-api-msdk    | An unhandled error occurred while syncing 'employees'
Traceback (most recent call last):
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/core.py", line 1187, in sync
    for _ in self._sync_records(context=context):
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/core.py", line 1081, in _sync_records
    for record_result in self.get_records(current_context):
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/rest.py", line 574, in get_records
    for record in self.request_records(context):
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/rest.py", line 395, in request_records
    resp = decorated_request(prepared_request, context)
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/rest.py", line 274, in _request
    self.validate_response(response)
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/rest.py", line 185, in validate_response
    raise RetriableAPIError(msg, response)
singer_sdk.exceptions.RetriableAPIError: 429 Client Error: Too Many Requests for path: /api/v1/employees
Traceback (most recent call last):
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/bin/tap-rest-api-msdk", line 8, in <module>
    sys.exit(TapRestApiMsdk.cli())
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 501, in invoke
    tap.sync_all()
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 460, in sync_all
    stream.sync()
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/core.py", line 1194, in sync
    raise ex
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/core.py", line 1187, in sync
    for _ in self._sync_records(context=context):
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/core.py", line 1081, in _sync_records
    for record_result in self.get_records(current_context):
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/rest.py", line 574, in get_records
    for record in self.request_records(context):
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/rest.py", line 395, in request_records
    resp = decorated_request(prepared_request, context)
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/rest.py", line 274, in _request
    self.validate_response(response)
  File "/tmp/p/.meltano/extractors/tap-rest-api-msdk/venv/lib/python3.8/site-packages/singer_sdk/streams/rest.py", line 185, in validate_response
    raise RetriableAPIError(msg, response)
singer_sdk.exceptions.RetriableAPIError: 429 Client Error: Too Many Requests for path: /api/v1/employees
It makes the first request to infer the schema fine, finds 24 records, and then I immediately get rate-limited. I suggest you find a different free public API: https://github.com/public-apis/public-apis
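The wait times in that log (2.94, 4.51, 8.18 and 16.07 seconds) look like a doubling delay plus up to a second of random jitter. A rough sketch of that pattern, as an assumption drawn from the log output rather than the SDK's actual code; `expo_delays` and `with_jitter` are made-up names:

```python
import random
from typing import Iterator


def expo_delays(factor: float = 2.0, base: float = 2.0) -> Iterator[float]:
    """Yield factor * base**n for n = 0, 1, 2, ... (2, 4, 8, 16, ...)."""
    n = 0
    while True:
        yield factor * base**n
        n += 1


def with_jitter(delay: float) -> float:
    """Add up to one second of random jitter to a base delay."""
    return delay + random.uniform(0.0, 1.0)


delays = expo_delays()
# Four waits happen before the fifth failed try gives up.
waits = [with_jitter(next(delays)) for _ in range(4)]
print([round(wait, 2) for wait in waits])
```

With only five tries and waits this short, the whole retry window is about half a minute, which is not enough to outlast an aggressive rate limit.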
You are getting the `Error Occured! Page Not found` error because your `api_url` is `https://dummy.restapiexample.com/api/v1/employees` and the `employees` stream `path` is `/employees`, so the tap is trying to make a request to `https://dummy.restapiexample.com/api/v1/employees/employees`.
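In other words, the stream `path` is appended to `api_url`, so the endpoint name should only appear in one of them. A small sketch of that, assuming plain string concatenation (which matches the doubled URL above); `request_url` is a hypothetical helper, not part of the tap:

```python
def request_url(api_url: str, path: str) -> str:
    """Append the stream path to the base URL (assumed concatenation)."""
    return api_url + path


# api_url already contains the endpoint, so the path gets doubled:
print(request_url("https://dummy.restapiexample.com/api/v1/employees", "/employees"))

# api_url stops at the API root, so the path lands where expected:
print(request_url("https://dummy.restapiexample.com/api/v1", "/employees"))
```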
a
Hi @Reuben (Matatika), I am also facing a similar issue with an API. Is there a way to add a delay in this tap (tap-rest-api-msdk)? The API I am trying to use has a rate limit of 200 requests per hour, so ideally I want to hit it at most once every 18 seconds. I tried setting the config below, but it does not resolve anything. Can you suggest a workaround? Or do you suggest building a custom tap to overcome this rate-limit scenario?
backoff_time_extension: 18
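The 18 second figure follows from 3600 s / 200 requests. As a sketch of what a fixed delay between requests would look like outside the tap (this is not a tap setting; `min_interval`, `throttled` and the `sleep` parameter are names made up for illustration):

```python
import time
from typing import Callable, Iterable, Iterator


def min_interval(limit: int, window_seconds: float = 3600.0) -> float:
    """Smallest spacing between requests that stays within `limit` per window."""
    return window_seconds / limit


def throttled(
    items: Iterable,
    interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator:
    """Yield items one at a time, pausing `interval` seconds between them."""
    for index, item in enumerate(items):
        if index:  # no pause before the first item
            sleep(interval)
        yield item


interval = min_interval(200)  # 3600 / 200
pauses = []
# Record the pauses instead of sleeping so the example runs instantly.
pages = list(throttled(["page-1", "page-2", "page-3"], interval, sleep=pauses.append))
print(interval, pages, pauses)
```

Passing the sleep function in as a parameter keeps the pacing logic easy to check without actually waiting.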
r
After looking through the README that would have been my first guess too, but it looks like you need to supply one of the supported values for `backoff_type` (`message` or `header`) so that the tap actually applies your `backoff_time_extension` configuration, rather than falling back to the default SDK behaviour: https://github.com/Widen/tap-rest-api-msdk/blob/761f4bbf463cef95a836dc1b567c8305eba8083d/tap_rest_api_msdk/streams.py#L267-L273
As for whether or not you should build a custom tap, that's very much dependent on your use case. I've always thought of `tap-rest-api-msdk` as a great prototyping tool, but the configuration is bound to be more complicated than with a specific tap, which can abstract away a lot of that logic. If you're planning to use this in a production environment eventually, I would at least have a look at creating a custom tap once you have a POC working. From a FOSS perspective, it's also worth considering whether there is (or could be) interest from users in integrating the source with the wider Singer ecosystem.
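To make the `header` option a bit more concrete, here is a rough sketch of the idea: read the wait time from a response header and add the configured extension on top. This is an assumption about the behaviour, not a copy of the linked lines; the `Retry-After` header name and the `wait_from_header` helper are illustrative choices, and the sketch only handles a numeric header value.

```python
from typing import Mapping


def wait_from_header(
    headers: Mapping[str, str],
    extension: int = 0,
    header_name: str = "Retry-After",
) -> int:
    """Seconds to wait: the header's value (0 if absent) plus a fixed extension."""
    return int(headers.get(header_name, 0)) + extension


# The server asks for 30 s and the config adds 18 s on top.
print(wait_from_header({"Retry-After": "30"}, extension=18))

# No header in the response: only the extension applies.
print(wait_from_header({}, extension=18))
```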