# singer-tap-development
t
Hi all, I need some guidance. I am developing a custom tap that uses `self.get_starting_timestamp(context)`.
1. Question: Does `get_starting_timestamp` ensure that on the first run of the tap it uses the `start_date` value I set in the yml file, and then automatically store when the last run was, so that future runs use the last run timestamp automatically?
2. Issue: I am trying to get this function working in the custom tap I am developing, but I am stuck because it is returning an empty dict. The API endpoint needs the filter parameter to look like `/api/lists/?filter=greater-than(updated,2022-05-23T000000Z)`, so I updated the params with `filter` as the key and added `self.get_starting_timestamp(context)` (the screenshot below shows how I did it). I also made sure `replication_key` is 'updated' and set `start_date` to the above timestamp in the meltano.yml file, but it keeps returning an empty dict. Any thoughts on where I might have gone wrong? Any help is much appreciated, thank you in advance.
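Since the screenshot is not preserved here, the filter construction described above might look roughly like the sketch below. `build_url_params` is a hypothetical helper, not part of the SDK; in a tap it would presumably be called from the stream's `get_url_params` with the result of `self.get_starting_timestamp(context)`. The compact timestamp format is an assumption copied from the example URL in the message.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_url_params(start: Optional[datetime]) -> Dict[str, Any]:
    """Build query params for the list endpoint (hypothetical helper)."""
    params: Dict[str, Any] = {}
    if start is not None:
        # Assumed format: the compact form quoted in the thread,
        # e.g. 2022-05-23T000000Z. Pass a timezone-aware datetime;
        # a naive one would be interpreted as local time by astimezone().
        stamp = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
        params["filter"] = f"greater-than(updated,{stamp})"
    return params


# With no starting timestamp, no filter is added at all.
print(build_url_params(None))
print(build_url_params(datetime(2022, 5, 23, tzinfo=timezone.utc)))
```

Guarding on `None` matters because the starting timestamp can be missing when neither state nor a usable `start_date` is found.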
e
Hi @trinath! I see the potential replication key `attributes.updated` is a nested field. How did you declare the replication key for your stream?
t
I declared it as `replication_key = "updated"`. Is that incorrect, or should I declare it differently?
e
Nested replication keys are not currently supported, so a workaround for your stream would be to denest the field by overriding post_process:
def post_process(row, context):
  row["updated"] = row["attributes"].pop("updated")
  return row
and updating the schema:
schema = th.PropertiesList(
  ...
  th.Property("updated", th.DateTimeType)
).to_dict()
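As a standalone illustration of the denesting step, here is a minimal sketch written as a plain function so it can run without the SDK. The record shape (an `attributes` object holding `updated`) is an assumption based on the `attributes.updated` field mentioned above, and `denest_updated` is a hypothetical name.

```python
from typing import Any, Dict


def denest_updated(row: Dict[str, Any]) -> Dict[str, Any]:
    """Move attributes.updated to a top-level "updated" key, if present."""
    attributes = row.get("attributes") or {}
    if "updated" in attributes:
        # pop() removes the nested copy so the value lives in one place only.
        row["updated"] = attributes.pop("updated")
    return row


# Hypothetical record, shaped like the nested field discussed in the thread.
record = {
    "id": "1",
    "attributes": {"name": "List A", "updated": "2022-05-23T00:00:00Z"},
}
print(denest_updated(record))
```

Unlike a bare `row["attributes"].pop("updated")`, this version does not raise a `KeyError` when a record lacks the nested field.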
t
Thank you @edgar_ramirez_mondragon, but are you suggesting I make this change so that `get_starting_timestamp` picks up the value I set in the yml? I am a bit confused about why a row from the response would affect `get_starting_timestamp`.
e
Not the row, but the SDK is trying to get the JSON schema type for the replication key to determine if it’s a date-time string, and in that case parse it from the `start_value` if there’s no state.
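To make the point concrete, the check described here can be modelled roughly as below. This is a simplified illustration, not the SDK's actual code: `is_datetime_key` is a hypothetical function, and the schema dictionaries assume that a date-time property is declared as a string with `"format": "date-time"`.

```python
from typing import Any, Dict


def is_datetime_key(schema: Dict[str, Any], key: str) -> bool:
    """Return True if `key` is a top-level date-time property of `schema`."""
    prop = schema.get("properties", {}).get(key)
    if prop is None:
        # The key is not declared at the top level (for example, it is nested).
        return False
    return prop.get("format") == "date-time"


# Nested declaration: "updated" only exists under "attributes".
nested_schema = {
    "properties": {
        "id": {"type": "string"},
        "attributes": {
            "type": "object",
            "properties": {"updated": {"type": "string", "format": "date-time"}},
        },
    }
}

# Denested declaration: "updated" sits at the same level as "id".
flat_schema = {
    "properties": {
        "id": {"type": "string"},
        "updated": {"type": "string", "format": "date-time"},
    }
}

print(is_datetime_key(nested_schema, "updated"))
print(is_datetime_key(flat_schema, "updated"))
```

Under these assumptions only the flat schema passes the lookup, which is why the schema has to change even though the question is about the starting timestamp.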
t
Got it, I incorporated the change, but the issue persists. Can you see if I skipped a step or implemented your change the wrong way?
e
You have to declare `Property("updated", th.StringType)` in the schema at the same level as `id`
t
Got it. I changed `updated` to `th.DateTimeType` because it was throwing an error saying "The replication key updated is not of timestamp type", but with `updated` as `DateTimeType` it threw the error below. Thoughts?
e
Oh, `post_process` is missing the `self` argument:
def post_process(self, row, context):
t
Umm, now a new error, but I guess that's progress. It is raising a JSONDecodeError. I added some logs to find where it breaks, but I'm not sure what the cause could be.
This is the code in streams
e
get rid of those `print`s. They’re polluting the tap’s output 🙂
…or use `print(…, file=sys.stderr)`
t
Will do. Do you think that's the cause?
e
Yeah, the target is failing to parse those lines as JSON
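The failure mode can be reproduced in a few lines. The sketch below assumes the target reads the tap's stdout one JSON message per line; `parse_tap_output` and the logger name `tap-example` are hypothetical stand-ins, not SDK names. Debug output is routed to stderr through `logging` so that stdout stays clean.

```python
import json
import logging
import sys

# Send diagnostics to stderr so stdout carries only JSON messages.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logger = logging.getLogger("tap-example")  # hypothetical logger name


def parse_tap_output(lines):
    """Parse each stdout line as JSON, roughly as a target would."""
    return [json.loads(line) for line in lines]


clean = ['{"type": "RECORD", "stream": "lists", "record": {"id": "1"}}']
# A stray print() in the tap ends up as a non-JSON line on stdout.
polluted = ["reached post_process"] + clean

logger.info("parsed %d clean message(s)", len(parse_tap_output(clean)))

try:
    parse_tap_output(polluted)
except json.JSONDecodeError as err:
    logger.info("polluted output failed to parse: %s", err)
```

Inside a stream class, logging through the stream's `self.logger` is usually a more convenient option than a hand-built logger like this one.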
t
Excellent, good to know! Thank you very much for your help @edgar_ramirez_mondragon, it's working now. You all are amazing, and it's fascinating to see how collaborative the community is. Also, can you advise on my question #1? RE: Question: Does `get_starting_timestamp` ensure that on the first run of the tap it uses the `start_date` value I set in the yml file, and then automatically store when the last run was, so that future runs use the last run timestamp automatically?
e
RE: Question: Does `get_starting_timestamp` ensure that on the first run of the tap it uses the `start_date` value I set in the yml file, and then automatically store when the last run was, so that future runs use the last run timestamp automatically?
Yes 🙂
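The behaviour confirmed here can be pictured with the small model below. It is a sketch of the fallback order only, not the SDK's implementation: `resolve_start` is a hypothetical function, and it assumes both the bookmark and `start_date` are ISO 8601 strings and that the runner persists state between runs.

```python
from datetime import datetime
from typing import Optional


def resolve_start(bookmark: Optional[str], start_date: Optional[str]) -> Optional[datetime]:
    """Prefer the stored bookmark; fall back to the configured start_date."""
    value = bookmark or start_date
    if value is None:
        return None
    # fromisoformat() on older Python versions does not accept a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# First run: no state yet, so the configured start_date is used.
print(resolve_start(None, "2022-05-23T00:00:00Z"))

# Later run: the bookmark written by the previous run takes precedence.
print(resolve_start("2022-06-01T12:30:00Z", "2022-05-23T00:00:00Z"))
```

If state is not saved between runs, every run looks like the first one and starts again from `start_date`.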