# singer-target-development
c
I hope everyone is enjoying their long weekend in the US. I am wondering if anyone has ever had a situation where they needed to capture (and store) the response values from an HTTP REST API target? I.e. I want to talk to a REST API with Meltano to create new records and update existing records in the destination system. The destination system assigns its own new primary key ID on create and provides it in the response to the POST request. I want to capture that primary key ID from the destination system and store it in my own internal "tracking" system to link it up with my own internal (original) copy of the record. My internal tracking system is basically the source of truth where new records are born, but I need to keep track of the primary key ID of the "synced" records from the external HTTP REST API destination. Has anybody ever thought of doing something like this with Meltano? (The closest thing I could think of was Derek's AutoIDM solution.)
Right now, my starting point is this `_after_process_record()` hook: https://github.com/meltano/sdk/blob/7962ebc9a9a0ff14e77e6aeb1b76d8e5cda2bc73/singer_sdk/sinks/core.py#L341-L347 I'd just need to figure out how much data will already be in the `context` and how much I would need to push onto the `context` in addition.
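Roughly what I have in mind, as a sketch only: the endpoint URL and the `_track_assigned_id` helper below are made up, and I'm assuming the private `_after_process_record()` hook receives the same `context` dict that `process_record()` filled in.

```python
"""Sketch (not production code) of capturing a destination-assigned ID."""
import requests
from singer_sdk.sinks import RecordSink


class MySaasSink(RecordSink):
    """Posts each record and remembers the ID the destination assigned."""

    def process_record(self, record: dict, context: dict) -> None:
        # Hypothetical REST endpoint; the real URL/auth would come from config.
        resp = requests.post(
            "https://api.example.com/records",
            json=record,
            timeout=30,
        )
        resp.raise_for_status()
        # Stash the destination's primary key on the context so the
        # post-processing hook can see it alongside the original record.
        context["assigned_id"] = resp.json().get("id")
        context["source_record"] = record

    def _after_process_record(self, context: dict) -> None:
        # Hand the (source record, destination key) pair to whatever tracking
        # store you use -- `_track_assigned_id` is a made-up helper.
        if "assigned_id" in context:
            self._track_assigned_id(
                source=context["source_record"],
                destination_id=context["assigned_id"],
            )
```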
a
@christoph - We haven't defined an official method of adding artifacts like these in the sync operation. Taps can freely put anything they want into `STATE`, but there's no similar option for targets. I've opened a new discussion on this topic here: SaaS targets: strategy to maintain state and/or other artifacts created · Discussion #1229 · meltano/sdk (github.com)
cc @visch, @edgar_ramirez_mondragon 👆
c
Thanks AJ! Makes perfect sense to me. My current use case actually falls squarely into the `surrogate_key_lookup_table` bucket, and that's what I have put together for now using the Target SDK.
Since my target is not a SQL target, I just use a Redis list as the storage (since I already have Redis in my tech stack), and then I have another pipeline that picks up all those JSON strings from the Redis list and puts them back into the data warehouse staging area, so I can use those lookup tables in my models for future "sync" runs. It's not the most elegant solution, but that's what I was able to cobble together for now without much hassle.
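For illustration, the Redis hand-off is roughly this; the list name and field names are just placeholders.

```python
"""Sketch of pushing one JSON string per synced record onto a Redis list.

A separate pipeline later drains the list into the warehouse staging area
to build the surrogate-key lookup table.
"""
import json

import redis

r = redis.Redis(host="localhost", port=6379, db=0)


def record_key_mapping(internal_id: str, external_id: str, stream: str) -> None:
    """Append an (internal id, destination id) pair to the lookup queue."""
    payload = json.dumps(
        {
            "stream": stream,
            "internal_id": internal_id,   # our source-of-truth key
            "external_id": external_id,   # key assigned by the REST API
        }
    )
    # "surrogate_key_lookup" is an assumed list name.
    r.rpush("surrogate_key_lookup", payload)
```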
v
My thought with this is that if you think about the target like a "mapper", in the sense that it's both a tap and a target, you kind of get all of this for "free" (minus a guarantee that something is consuming the stdout data). Today I just log the parts of the JSON response I want, if there are any. There are some very valid use cases for tracking things like: what did the target change? For SaaS use cases this can be very nice. Right now I kind of fake it, as I keep track of what was sent to the target, not what the target actually changed.
I'm not doing it but that's the extent of my thoughts this far on it 🤷
> But I need to keep track of the primary key ID of the "synced" records from the external HTTP REST API destination.
I have needed this a few times, but luckily it's always within the same system, e.g. create an account in AzureAD, then I need to add a manager to the account with that ID, and add that ID to groups in AzureAD. So it's pretty simple. For other integrations I tend to say that they should use tap-azuread (in this case) to pull the data themselves.
My conclusion thus far is that I don't need it to get the job done, and it keeps things pretty simple. It would definitely help simplify some of my transformation logic if I followed a standard like "SaaS targets must output the record sent to the target as a record in stdout".
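Something like this is what I have in mind for that standard. It's just a sketch of writing a Singer RECORD message to stdout after a successful POST, not an existing SDK feature; the function name and fields are made up.

```python
"""Sketch of the "target as mapper" idea: after a successful POST, the
target re-emits the record (now including the destination-assigned ID)
as a Singer RECORD message on stdout for a downstream consumer.
"""
import json
import sys
from datetime import datetime, timezone


def emit_synced_record(stream: str, record: dict, assigned_id: str) -> None:
    message = {
        "type": "RECORD",
        "stream": stream,
        "record": {**record, "id": assigned_id},
        "time_extracted": datetime.now(timezone.utc).isoformat(),
    }
    # Anything consuming this target's stdout (if anything is) sees what
    # the destination actually stored, not just what we sent.
    sys.stdout.write(json.dumps(message) + "\n")
    sys.stdout.flush()
```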
Ok, one new thought that seems interesting: we want SaaS targets to define a schema anyway for what they accept. The schema we define for the output from the SaaS target mapper would be the same schema (maybe?)
Maybe you'd even call them something other than a target, since we'd enforce the behavior? Anyway, there are some ideas 😄
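As a rough illustration of the schema-reuse idea (again not an existing SDK feature; `sink` here stands for a generic SDK Sink instance, and the output convention is hypothetical):

```python
"""Sketch: the target's own declared input schema doubles as the SCHEMA
message for whatever it re-emits, so no second schema definition is needed.
"""
import json
import sys


def emit_output_schema(sink) -> None:
    # The same JSON Schema the target validates incoming records against
    # describes the records it writes back out.
    message = {
        "type": "SCHEMA",
        "stream": sink.stream_name,
        "schema": sink.schema,
        "key_properties": sink.key_properties,
    }
    sys.stdout.write(json.dumps(message) + "\n")
    sys.stdout.flush()
```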