Hi All I have built a custom tap using the cookie ...
# getting-started
t
Hi all, I have built a custom tap using the cookiecutter SDK. However, I have an issue I am hoping the community can help me with. One of the API calls I make is built using the ID column from another call. For example, the Employees API (http://companyAPI/Employees) returns all the EmployeeIDs for the company, and the EmployeeDetails API (http://companyAPI/Employees/{EmployeeID}) returns the detail information for the EmployeeID I pass to it from the Employees API. The issue I am having is that some of the EmployeeDetails API calls return a 404 File not found error. This happens because the URL doesn't exist when an employee has no EmployeeDetails; the API doesn't return an empty JSON document instead. Below are the classes in the streams.py file:
class Employees(LiveStream):
    primary_keys = ["id"]
    rest_method = "Get"
    path = '/3/employees'
    name = "Employees"
    schema_filepath = SCHEMAS_DIR / "Employees_Schema.json"    
    
    def get_child_context(self, record: dict, context: dict) -> dict:
        """Return a context dictionary for child streams."""
        return {
            "Id": record["Id"],
        }
		
class EmployeeDetails(EasiPayLiveStream):
    primary_keys = ["id"]
    rest_method = "Get"
    path = '/3/EmployeeDetails/{Id}'
    name = "EmployeeDetails"
    parent_stream_type = Employees
    schema_filepath = SCHEMAS_DIR / "EmployeeDetails_Schema.json"
I found validate_response(response: Response) → None in the documentation, but I have not been able to work out how to add it to my EmployeeDetails class. Is it possible to add a Countinue_On_Failure = True type of option to the EmployeeDetails class? My code would then look something like this:
class EmployeeDetails(EasiPayLiveStream):
    primary_keys = ["id"]
    rest_method = "Get"
    path = '/3/EmployeeDetails/{Id}'
    name = "EmployeeDetails"
    parent_stream_type = Employees
    schema_filepath = SCHEMAS_DIR / "EmployeeDetails.json"     
    Countinue_On_Failure = True
Thank you for any help. Regards
r
You're halfway there! Yes, you need to override `validate_response` for the `EmployeeDetails` stream to not error on `404 Not Found` - something like:
from http import HTTPStatus
from singer_sdk.exceptions import FatalAPIError

...

    def validate_response(self, response):
        try:
            super().validate_response(response)
        except FatalAPIError:
            # Tolerate 404 Not Found; re-raise every other fatal error
            if response.status_code != HTTPStatus.NOT_FOUND:
                raise
You will also need to override `parse_response` to yield no records in the case of `404 Not Found`:
    def parse_response(self, response):
        yield from super().parse_response(response) if response.status_code != HTTPStatus.NOT_FOUND else ()
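Putting the two overrides together, here is a minimal, self-contained sketch of the pattern. It does not import the SDK: `BaseStream`, `FakeResponse`, `fetch` and the local `FatalAPIError` are hypothetical stand-ins that only approximate the SDK defaults, so treat it as an illustration of the control flow rather than the real implementation. It also uses an early `return` in `parse_response`, which should behave the same as the one-line conditional above.

```python
from http import HTTPStatus


class FatalAPIError(Exception):
    """Stand-in (assumption) for singer_sdk.exceptions.FatalAPIError."""


class FakeResponse:
    """Hypothetical stand-in for requests.Response: status code and JSON body only."""

    def __init__(self, status_code, records=None):
        self.status_code = status_code
        self._records = records or []

    def json(self):
        # Mimic a 404 whose body is not JSON, so decoding it fails
        if self.status_code == HTTPStatus.NOT_FOUND:
            raise ValueError("Expecting value: line 1 column 1 (char 0)")
        return self._records


class BaseStream:
    """Simplified stand-in (assumption) for the SDK's default REST stream behaviour."""

    def validate_response(self, response):
        # Treat every 4xx status as fatal in this sketch
        if 400 <= response.status_code < 500:
            raise FatalAPIError(f"{response.status_code} Client Error")

    def parse_response(self, response):
        yield from response.json()


class EmployeeDetails(BaseStream):
    def validate_response(self, response):
        try:
            super().validate_response(response)
        except FatalAPIError:
            # Swallow 404 Not Found only; re-raise everything else
            if response.status_code != HTTPStatus.NOT_FOUND:
                raise

    def parse_response(self, response):
        if response.status_code == HTTPStatus.NOT_FOUND:
            return  # yield no records and never touch the body
        yield from super().parse_response(response)


def fetch(stream, response):
    """Hypothetical helper: validate first, then collect the parsed records."""
    stream.validate_response(response)
    return list(stream.parse_response(response))
```

With these stand-ins, a 200 response yields its records, a 404 yields nothing, and any other 4xx still raises.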
t
Hi Reuben, thank you for your reply. Just one question: do I put this code in my streams.py or client.py file? Regards, Tim
r
You need to add those methods to the `EmployeeDetails` stream in your `streams.py`.
t
Thank you, I shall give it a try.
r
One other thing: generally, I've found it's almost always a good idea to extend (as opposed to overwrite) the SDK default stream method implementations (i.e. the `super` calls in the overridden methods) if you are handling edge-case behaviour, in order to take advantage of the boilerplate and features the SDK provides. In this case, we still want to leverage the default `validate_response` and `parse_response` behaviour for all non-`404 Not Found` responses: https://github.com/meltano/sdk/blob/b4f9ac59742300b00e77571156d432ce97df3c05/singer_sdk/streams/rest.py#L149-L195 https://github.com/meltano/sdk/blob/b4f9ac59742300b00e77571156d432ce97df3c05/singer_sdk/streams/rest.py#L583-L595
t
Hi Reuben, thank you for your suggestions. I have managed to write the validate_response code in the EmployeeDetails class in streams.py, so the pipeline no longer fails on the 404 File not found error. However, I have been struggling with the "else" part of the statement in parse_response. No matter what I write, I get the error simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0). I am not sure if this is because the 404 returns an empty response or because the response has Content-Type: text/html. I have tried:
1. replacing response._content with valid JSON for one attribute found in the schema, for example {"Address1":"X"}
2. setting response.headers["Content-Type"]="application/x-json, charset=utf-8"
3. just writing return after the else
Sadly, nothing seems to be working. Are you or anyone else in the community able to help? Many thanks, Tim
Hi all, I managed to solve the issue. I was creating the JSON using json.dumps().encode(). The mistake was not putting 'utf-8' in the encode. The code below worked: a valid response was created and passed to parse_response without error.
x = {}
response._content = json.dumps(x).encode('utf-8')
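For anyone hitting the same message, here is a small sketch of where it likely comes from. It uses the standard-library `json` module rather than `simplejson` (an assumption made for portability; the wording of the error appears to be the same), and `describe_decode_error` is a hypothetical helper name.

```python
import json


def describe_decode_error(body: str):
    """Return the decode error message for a response body, or None if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# An empty body and an HTML error page both fail at the very first character
empty_error = describe_decode_error("")
html_error = describe_decode_error("<html>Not Found</html>")

# The workaround above swaps in a body that does parse: an empty JSON object
patched_body = json.dumps({}).encode("utf-8")
```

Both failing cases report `Expecting value: line 1 column 1 (char 0)`, so the message alone does not tell you whether the 404 body was empty or HTML.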
r
You shouldn't have to set an empty JSON body like that if you are overriding `parse_response` correctly (unless I'm missing something).
    def parse_response(self, response):
        yield from super().parse_response(response) if response.status_code != HTTPStatus.NOT_FOUND else ()
Here, for all valid non-`404 Not Found` responses, the default implementation of `parse_response` is called - this extracts the records from the JSON response body at the JSONPath defined for the stream (this involves decoding JSON, which is probably where you saw that error come from). For `404 Not Found` responses, the default implementation of `parse_response` is not called, and the method will yield from an empty tuple (`()`) instead (i.e. no JSON decode). This works due to the order of evaluation in conditional expressions. Feel free to ignore this if you are happy with your own change. 🙂
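To make the evaluation-order point concrete, here is a tiny sketch outside the SDK. `default_parse` is a hypothetical stand-in for the default `parse_response`, and responses are plain dicts purely for illustration. The condition is evaluated first, and only the selected branch runs, so the stand-in is never called for a 404.

```python
calls = []  # records every status the stand-in parser was asked to handle


def default_parse(response):
    """Hypothetical stand-in for the default parse_response; logs each call."""
    calls.append(response["status"])
    return iter(response["records"])


def parse_response(response):
    # Parsed as: yield from (default_parse(response) if <condition> else ())
    yield from default_parse(response) if response["status"] != 404 else ()


ok_records = list(parse_response({"status": 200, "records": [1, 2]}))
missing_records = list(parse_response({"status": 404, "records": []}))
```

After both calls, `calls` contains only the 200 status, which shows that the default parser was skipped entirely for the 404.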