# singer-tap-development
i
Hello there! I'm implementing a few taps and seem to be stuck on using bookmarks. I have https://github.com/ilkkapeltola/tap-userflow and https://github.com/ilkkapeltola/tap-sirene and both have the same issue: when I restart the Meltano job with the correct job_id, the job starts from the beginning, ignoring the stored state. I started both from the cookiecutter REST API template with Bearer token authorization, and haven't done anything for bookmarks. What am I missing?

EDIT1: Well, one thing I was missing is that while the API allows sorting results, it doesn't allow using the date as a filter ("get me records after this timestamp"). Instead, I can only use the uuid that identifies a record as a "start after". So I guess the replication_key needs to be this uuid. I can still say it's sorted, but the replication_key isn't a timestamp. Should this work out of the box? No, how could it. I still need a way to tell the API at the beginning where to start.
So to summarize:
• I can sort the results chronologically by created_at
• I cannot use created_at as a filter, e.g. start_date
• Each record comes with a uuid
• The API accepts the ID as a 'bookmark': "start after this uuid"
Can this kind of API be implemented to use bookmarks?
Where should I implement a custom state thing? What's the method I need to override?
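For illustration, here is a minimal sketch of how a request to the kind of API summarized above might be parameterized from a stored bookmark. It is plain Python, not SDK code. Only `starting_after` comes from the thread; `build_params`, `order_by`, `limit`, and `page_size` are hypothetical names. In an SDK-based tap this logic would presumably sit in the stream's `get_url_params` method, but that placement is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode


def build_params(last_seen_id: Optional[str], page_size: int = 100) -> dict:
    """Build query parameters for one request.

    `order_by` and `limit` are hypothetical parameter names;
    only `starting_after` appears in the thread.
    """
    params = {"order_by": "created_at", "limit": page_size}
    if last_seen_id is not None:
        # Resume right after the record whose id was stored in state.
        params["starting_after"] = last_seen_id
    return params


# First run: no bookmark yet, so the sync starts from the beginning.
first_run = build_params(None)

# Later run: the id of the last record seen is passed back to the API.
resumed = build_params("cc58bd14-3457-480a-a44d-2cea833fac24")
print(urlencode(resumed))
```

The point of the sketch is that the bookmark is an opaque cursor: it is never compared to anything, only handed back to the API.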
e
Where should I implement a custom state thing?
What's the behavior you're trying to modify in the tap? If it's just "filter out records older than X timestamp", we have an issue to implement it natively in the SDK, and also some suggestions in the original GitLab issue: https://gitlab.com/meltano/sdk/-/issues/227#note_926360644
i
Since the API only accepts the `id` field for `starting_after`, like so: `?starting_after=cc58bd14-3457-480a-a44d-2cea833fac24` and not like `?starting_after=2022-05-01T00:00:00`, if I specify the `id` as the `replication_key`, the "highest" `id` value will be tracked, which is often something like `ff31e109-f873-41b6-9f1c-2d88b5bf6a2a` (notice the leading `ff...`). Instead of that, since I can order the records based on `created_at`, what I want to do is take the `id` of the last record I got and store that in the state, instead of what would be the "highest".
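The difference between the two bookmarking strategies can be shown with a small sketch. It is plain Python and does not reproduce the SDK's actual state logic. The two ids are the ones quoted in the thread; the `created_at` values and both function names are made up for illustration.

```python
from typing import Iterable, Optional

# Records as the API might return them, sorted by created_at ascending.
# The created_at values are invented for this example.
records = [
    {"id": "ff31e109-f873-41b6-9f1c-2d88b5bf6a2a", "created_at": "2022-05-01T00:00:00"},
    {"id": "cc58bd14-3457-480a-a44d-2cea833fac24", "created_at": "2022-05-02T00:00:00"},
]


def bookmark_by_max(rows: Iterable[dict]) -> Optional[str]:
    """Keep the greatest id seen so far (a "greater than" style check)."""
    bookmark = None
    for row in rows:
        if bookmark is None or row["id"] > bookmark:
            bookmark = row["id"]
    return bookmark


def bookmark_by_last_seen(rows: Iterable[dict]) -> Optional[str]:
    """Keep the id of the most recently processed record, with no comparison."""
    bookmark = None
    for row in rows:
        bookmark = row["id"]
    return bookmark


print(bookmark_by_max(records))        # the id starting with "ff", not the newest record
print(bookmark_by_last_seen(records))  # the id of the newest record
```

With random uuids the "max" strategy points at an arbitrary record, so resuming from it would skip or repeat data. The "last seen" strategy relies on the API returning records in `created_at` order.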
e
I see, I've encountered that before: https://github.com/edgarrmondragon/tap-bitso/issues/6. It's not currently possible to override a method in order to change that behavior but the logic lives in https://github.com/meltano/sdk/blob/2166416c116528924ca0599e59b3db77d3be478e/singer_sdk/helpers/_state.py#L220-L223. Do log an issue, and if you feel like contributing we'd take a PR to support your use case.
a
@ilkka_peltola - Thanks for reaching out. Yes, `starting_after` being a non-sequential key is an interesting problem, which I don't think the SDK has documentation or handling around. I'm sure there is a way to do it by disabling the "if greater than" check and always taking the newest key, but a logged issue would be super helpful here to find and prove some reusable patterns for others facing a similar challenge. As @edgar_ramirez_mondragon suggests, do you mind logging something here in the SDK issue tracker?
This logged issue is also related: https://github.com/meltano/sdk/issues/226
I think many have implemented a 'soft' filtering for incremental streams when an API-based filtering is not available, but we'd love to have formal support for this use case.
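As a rough sketch of what such 'soft' filtering could look like: the tap fetches everything and drops records at or before the bookmark on the client side. This is plain Python under stated assumptions, not the SDK's implementation. `soft_filter` and the sample records are hypothetical, and `created_at` is assumed to be an ISO 8601 string.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional


def soft_filter(rows: Iterable[dict], bookmark: Optional[datetime]) -> Iterator[dict]:
    """Yield only records strictly newer than the bookmark.

    The API still returns everything; filtering happens client-side.
    """
    for row in rows:
        created_at = datetime.fromisoformat(row["created_at"])
        if bookmark is None or created_at > bookmark:
            yield row


# Invented sample data for illustration.
sample = [
    {"id": "a", "created_at": "2022-04-30T12:00:00"},
    {"id": "b", "created_at": "2022-05-01T00:00:00"},
    {"id": "c", "created_at": "2022-05-02T08:30:00"},
]

kept = [row["id"] for row in soft_filter(sample, datetime(2022, 5, 1))]
print(kept)
```

The strict `>` comparison avoids re-emitting the bookmarked record. Using `>=` instead would trade duplicates for a lower risk of missing records that share a timestamp.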
i
https://github.com/meltano/sdk/issues/729
https://github.com/meltano/sdk/pull/730
Hope that helps and was done right! I'm not super experienced with contributing to open source projects.