nicholas_van_kuren
03/17/2022, 9:48 PM
`streams.py` and added the following to the stream class:
rest_method = "POST"
It is at this point I am getting a 403 error. I am able to get this request to work via Postman, where I have to enter the data as "form-data" in the body, and the form-data section seems to correspond to the properties list in taps.py, but it's not working for some reason. My best guess is that it has something to do with the endpoint property. I have tried leaving it blank as mentioned, as well as specifying the whole URL and leaving the base URL blank and vice versa, but all return the same error. Any help would be much appreciated!
Reuben (Matatika)
03/18/2022, 12:04 AM
www.api.com
• There are no endpoints, so every request is made to www.api.com
• All requests are POST requests that submit entity data to access resources
---
Assuming you are using the Singer SDK, you will need to supply `url_base` and override `rest_method` in the base tap stream class generated for you in `client.py` - let’s refer to this as `APIStream`.
client.py
```python
from singer_sdk.streams import RESTStream


class APIStream(RESTStream):
    """API stream class."""

    url_base = "http://www.api.com"
    rest_method = "POST"
```
• RESTStream::url_base
---
The streams you define in `streams.py` that inherit from `APIStream` (e.g. `CharactersStream`) will then need to override the `RESTStream::prepare_request_payload` method to send entity data.
streams.py
```python
from typing import Any, Optional

from tap_api.client import APIStream


class CharactersStream(APIStream):
    """Characters stream class."""

    name = "stream_characters"

    # overrides RESTStream::prepare_request_payload
    def prepare_request_payload(
        self, context: Optional[dict], next_page_token: Optional[Any]
    ) -> Optional[dict]:
        # entity data
        return {
            "entity": "characters",
        }
```
• RESTStream::prepare_request_payload
---
If you want to get entity data passed as a setting to the tap, you can access this through `Stream::config`:
```python
def prepare_request_payload(
    self, context: Optional[dict], next_page_token: Optional[Any]
) -> Optional[dict]:
    return {
        "entity": self.config["characters_entity"],
    }
```
• Stream::config
This use-case doesn’t really make sense to me though, since you would need to supply an entity identifier setting for each entity stream, which doesn’t scale very well. If you have an `entities` setting that specifies an array of entity identifiers, a better approach might be to do something like override `Tap::discover_streams` in `tap.py` (done for you if you used the SDK project cookiecutter) and dynamically generate a stream for each entity from a generic stream class (e.g. `EntityStream`).
streams.py
```python
from typing import Any, Optional

from singer_sdk.plugin_base import PluginBase as TapBaseClass
from singer_sdk.streams import RESTStream


class EntityStream(RESTStream):
    """Entity stream class."""

    def __init__(self, tap: TapBaseClass, entity: str):
        super().__init__(tap, f"stream_{entity}")
        self.entity = entity

    def prepare_request_payload(
        self, context: Optional[dict], next_page_token: Optional[Any]
    ) -> Optional[dict]:
        return {
            "entity": self.entity,
        }
```
tap.py
```python
from typing import List

from singer_sdk import Stream, Tap

from tap_api.streams import EntityStream


class TapAPI(Tap):
    """API tap class."""

    def discover_streams(self) -> List[Stream]:
        """Return a list of discovered streams."""
        entities: List[str] = self.config["entities"]
        return [EntityStream(self, entity) for entity in entities]
```
• Tap::discover_streams
Reuben (Matatika)
03/18/2022, 12:05 AM
nicholas_van_kuren
03/18/2022, 1:19 PM
```python
config_jsonschema = th.PropertiesList(
    th.Property(
        "token",
        th.StringType,
        required=True,
        description="The token to authenticate against the API service"
    ),
```
I am adding a property for token and any other properties into the `SAMPLE_CONFIG` in the initial pytest setup, but I am still getting a 403 Forbidden error. I see your suggestion to add `url_base` directly to the `APIStream` class, but `Path` is a required attribute. That is why I have tried leaving this blank and just using the `url_base` in client.py to specify the full URL, and also tried adding the full URL to the `Path` attribute. Neither has worked. It seems like I need to potentially override the `Path` attribute?
Reuben (Matatika)
03/18/2022, 2:10 PM
`Path`, but `RESTStream::path` is not a required attribute. `RESTStream::get_url` will use `""` instead of `self.path` if not specified:
singer_sdk.streams.RESTStream
```python
def get_url(self, context: Optional[dict]) -> str:
    """Get stream entity URL.

    Developers override this method to perform dynamic URL generation.

    Args:
        context: Stream partition or context dictionary.

    Returns:
        A URL, optionally targeted to a specific partition or context.
    """
    url = "".join([self.url_base, self.path or ""])
    vals = copy.copy(dict(self.config))
    vals.update(context or {})
    for k, v in vals.items():
        search_text = "".join(["{", k, "}"])
        if search_text in url:
            url = url.replace(search_text, self._url_encode(v))
    return url
```
I assume you are writing unit tests (as opposed to integration tests) with `pytest`, in which case are you sure you are mocking the API calls correctly? If you are writing integration tests, then I would expect the API to return a `401 Unauthorized` response given invalid credentials, but this might not be the case since the API seems pretty unconventional. `403 Forbidden` implies you are authorised but not permitted to access the resource, which could be caused by a request made to an incorrect URL, as you say.
Maybe this is an issue with your client stream authenticator. What does your `auth.py` look like?
nicholas_van_kuren
03/18/2022, 2:55 PM
`poetry run pytest` and the default test is failing on this. I searched and I see no other reference to a path attribute, so I may need to dig a bit further into this test. Looks like it's using the Singer SDK for tests: `singer_sdk.testing`
Reuben (Matatika)
03/18/2022, 3:45 PM
`self.path` is not assigned if a falsy value for `path` is passed to `RESTStream::__init__`. So you will need to supply a `path` of `""` in your stream class after all, in order to circumvent this - sorry for the confusion!
Any thoughts on this @aaronsteers @edgar_ramirez_mondragon? Looks like a stream inheriting from `RESTStream` has to supply `path` as a static/class property, or assign `self.path` in an instance method before `RESTStream::get_url` is called.
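To make the `path` behaviour concrete, here is a minimal sketch that uses a simplified stand-in class rather than the SDK itself. `SketchRESTStream`, `NoPathStream`, `EmptyPathStream`, `try_get_url` and the base URL are all hypothetical names made up for illustration. The stand-in only mimics the behaviour described in this thread (a falsy `path` is never assigned in `__init__`).

```python
from typing import Optional


class SketchRESTStream:
    """Simplified stand-in for the behaviour described above (not the SDK's code)."""

    url_base = "https://api.com"  # hypothetical base URL

    def __init__(self, path: Optional[str] = None) -> None:
        # A falsy path is never assigned, so the attribute only exists
        # if the class (or a subclass) defines it.
        if path:
            self.path = path

    def get_url(self) -> str:
        return "".join([self.url_base, self.path or ""])


class NoPathStream(SketchRESTStream):
    """Defines no path at all."""


class EmptyPathStream(SketchRESTStream):
    """Supplies an empty path as a class attribute."""

    path = ""


def try_get_url(stream: SketchRESTStream) -> str:
    """Return the URL, or a short description of the AttributeError raised."""
    try:
        return stream.get_url()
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(try_get_url(NoPathStream()))     # reports the missing attribute
print(try_get_url(EmptyPathStream()))  # https://api.com
```

In this sketch, declaring `path = ""` on the class is enough for `get_url` to fall back to the bare `url_base`.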
Onto the `403 Forbidden` issue: given that your `SAMPLE_CONFIG` is correct, there must be an issue with your client authenticator class, if you are making a request to the same URL with the same credentials in Postman successfully.
nicholas_van_kuren
03/18/2022, 3:50 PM
nicholas_van_kuren
03/18/2022, 3:52 PM
FAILED tests/test_core.py::test_standard_tap_tests - requests.exceptions.MissingSchema: Invalid URL '': No scheme supplied.
Reuben (Matatika)
03/18/2022, 3:57 PM
Authentication happens via a token passed as a config setting
In Postman, do you pass the token as a URL parameter or in the request body?
```
requests.exceptions.MissingSchema: Invalid URL '': No scheme supplied.
```
Do you have `url_base` defined on your client stream?
nicholas_van_kuren
03/18/2022, 4:01 PM
Reuben (Matatika)
03/18/2022, 4:08 PM
full url_base specified
Do you have it prefixed with a scheme (i.e. `http://` or `https://`)? A scheme is required in `url_base`.
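As a quick way to check a `url_base` value for a scheme before running the tap, a small standard-library sketch might look like the following. `has_http_scheme` is a hypothetical helper, not part of the SDK or of `requests`.

```python
from urllib.parse import urlsplit


def has_http_scheme(url: str) -> bool:
    """Return True if the URL has an http or https scheme and a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# An empty string and a bare host both lack a scheme; only the last one passes.
for candidate in ("", "www.api.com", "https://api.com"):
    print(repr(candidate), has_http_scheme(candidate))
```

An empty string fails this check, which matches the `Invalid URL ''` in the `MissingSchema` error quoted above.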
the token is included in the request body
Great - in that case you will need to supply `token` in the `prepare_request_payload` method of your client stream!
```python
def prepare_request_payload(
    self, context: Optional[dict], next_page_token: Optional[Any]
) -> Optional[dict]:
    return {
        "token": self.config["token"],
    }
```
nicholas_van_kuren
03/18/2022, 4:30 PM
`https://`
When I add this as the base_url and leave path as an empty string, the specific error is:
403 Client Error: Forbidden for path:
So it seems like I need a way to override or ignore the path setting.
Reuben (Matatika)
03/18/2022, 4:37 PM
`path` is an empty string, naturally it is not displayed in the error message. What’s happening there is you are getting `403 Forbidden` on https://api.com (just the `url_base` value). There must be some difference in how you are supplying the value of `token` in your tap versus in Postman.
Can you share your client stream `prepare_request_payload` method implementation?
nicholas_van_kuren
03/18/2022, 4:57 PM
```python
# overrides RESTStream::prepare_request_payload
def prepare_request_payload(
    self, context: Optional[dict], next_page_token: Optional[Any]
) -> Optional[dict]:
    return {
        "token": self.config["token"],
        "content": self.config["content"],
        "action": self.config["action"],
        "format": self.config["format"],
    }
```
nicholas_van_kuren
03/18/2022, 4:57 PM
`test_core.py` in `SAMPLE_CONFIG`
nicholas_van_kuren
03/18/2022, 5:01 PM
Reuben (Matatika)
03/18/2022, 5:33 PM
Looks like you might have to override `RESTStream::prepare_request`, since the SDK does not provide the `files` parameter to the underlying `requests.Request` object, which seems to be required for `multipart/form-data` POST requests - see here.
```python
from typing import Any, Optional, cast

import requests
from singer_sdk.streams import RESTStream


class APIStream(RESTStream):
    """API stream class."""

    # overrides RESTStream::prepare_request_payload
    def prepare_request_payload(
        self, context: Optional[dict], next_page_token: Optional[Any]
    ) -> Optional[dict]:
        return {
            "token": self.config["token"],
            "content": self.config["content"],
            "action": self.config["action"],
            "format": self.config["format"],
        }

    # overrides RESTStream::prepare_request
    def prepare_request(
        self, context: Optional[dict], next_page_token: Optional[Any]
    ) -> requests.PreparedRequest:
        request_data = self.prepare_request_payload(context, next_page_token)
        request = cast(
            requests.PreparedRequest,
            self.requests_session.prepare_request(
                requests.Request(
                    method="POST",
                    url=self.url_base,
                    files=request_data,
                ),
            ),
        )
        return request
```
You then do not need to define the `rest_method` property in your client stream, since the method is now supplied inside `prepare_request`. `url_base` is still read there (`url=self.url_base`), so it does need to stay defined. No call to `RESTStream::get_url` will happen either, so you shouldn’t see the `AttributeError` for `path` you were getting earlier.
nicholas_van_kuren
03/18/2022, 6:55 PM
`prepare_request` updating `files` variable name to `data`. Really appreciate your help. If you're curious, I am working with an open-source research data capture system called RedCap, and it seems to not be so standard. Appreciate all of your help!
Reuben (Matatika)
03/19/2022, 12:47 AM
aaronsteers
03/22/2022, 8:38 PM
Looks like you might have to override `RESTStream::prepare_request`, since the SDK does not provide the `files` parameter to the underlying `requests.Request` object, which seems to be required for `multipart/form-data` POST requests - see here.
If there's a more natural integration point we can add into the SDK, we are always open to improvements. But for this case, overriding `prepare_request()` does seem like a great solution.
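For reference, the `files` to `data` change that resolved the issue comes down to body encoding. In `requests`, a dict passed as `data=` is sent as a URL-encoded form body, while `files=` produces a `multipart/form-data` body. The sketch below uses only the standard library to show roughly what the form-encoded body looks like. The payload values are hypothetical placeholders, not real settings for the API discussed here.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical config values; the real ones come from the tap settings.
payload = {
    "token": "example-token",
    "content": "record",
    "action": "export",
    "format": "json",
}

# Roughly what a form-encoded POST body looks like on the wire.
body = urlencode(payload)
content_type = "application/x-www-form-urlencoded"

print(content_type)
print(body)

# The server side can recover the same fields from the body.
print(parse_qs(body)["token"])
```

If the API accepts this encoding, as it appeared to in this thread, swapping `files=request_data` for `data=request_data` in the `prepare_request` override is the only change needed.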