justin_wong
02/15/2023, 6:37 PM
justin_wong
02/15/2023, 6:38 PM
exportVersions, which calls an API that returns a list of report version_id's, i.e.
["Forecast Report1", "Forecast Report2"]
The new stream will need to use the version_id's to request an exportData endpoint, something like:
for version_id in version_ids:
    requests.get(exportData_url, params={'version_id': version_id})
Given these two requirements:
1. Need to get version_id's from existing stream and pass them to new stream
2. New stream needs to call exportData endpoint 'n' times, where 'n' is the number of version_id's passed in. All other streams only call their respective endpoint 1 time.
What would be the best design pattern? Thank you in advance.
cc @jstark
aaronsteers
02/15/2023, 6:44 PM
exportVersions as its parent...
aaronsteers
02/15/2023, 6:45 PM
aaronsteers
02/15/2023, 6:46 PM
context - which it can then use to make additional calls to get those data elements which are specific to each parent item (to each version in this use case).
aaronsteers
02/15/2023, 6:49 PM
VersionsStream as a parent, so you might be able to model after this one.
Does this help at all?
justin_wong
02/15/2023, 6:58 PM
justin_wong
02/15/2023, 7:00 PM
One challenge for me has been following the flow of what happens after you run meltano invoke, I'm not sure which functions (in this case within client.py) are called when.
aaronsteers
02/15/2023, 7:10 PM
Regarding the second requirement of making the request multiple times... Can you give a specific example of the requests you'd be making? The example endpoints, for instance, or the differences between those extra calls?
aaronsteers
02/15/2023, 7:12 PM
"One challenge for me has been following the flow of what happens after you run meltano invoke, I'm not sure which functions (in this case within client.py) are called when."
Internally, this is managed by two layers: Meltano itself calls the tap's CLI (meltano invoke tap-mysql --help is roughly equivalent to tap-mysql --help). And the tap's CLI is mostly handled by the SDK - so you only have to write the unique handling logic without worrying about Singer Spec and CLI arg passing.
justin_wong
02/15/2023, 7:28 PM
import requests

# returned from Versions endpoint
version_ids = ["Forecast Report1", "Forecast Report2"]
final_payload = []
# some_new_endpoint and upload_to_snowflake are placeholders
for version_id in version_ids:
    response = requests.get(some_new_endpoint, params={'version_id': version_id})
    payload = response.json()
    # if it's possible to upload each payload directly to snowflake
    upload_to_snowflake(payload)
    # else save each payload in memory for final upload via tap-snowflake
    final_payload.append(payload)
justin_wong
02/15/2023, 7:31 PM
version_id, I need to request the endpoint with each version_id that is returned from the versionStream.
That's the only difference between the calls.
edgar_ramirez_mondragon
02/15/2023, 7:49 PM
versionStream returns multiple version_id values in a list?
justin_wong
02/15/2023, 7:54 PM
The challenge is that the new stream needs to call its endpoint multiple times, once for every version_id returned by the versionStream.
Every other stream only needs to call its respective endpoint once.
edgar_ramirez_mondragon
02/15/2023, 9:39 PM
1. Stream.get_context_from_parent(parent_context: dict) -> Iterable[dict] that by default just yields the parent context, and Stream._sync_children iterates over the generated contexts
2. Supporting passing the context to RESTStream.get_new_paginator. This would allow use cases where the dev needs to paginate over a set of fixed values.
cc @aaronsteers wdyt?
justin_wong
02/15/2023, 10:18 PM
aaronsteers
02/15/2023, 10:37 PM
"The challenge is that the new stream needs to call its endpoint multiple times, once for every version_id returned by the versionStream."
Just to make sure I understand: VersionStream returns one record per version_id, correct? And you just need to make one additional call per version record of the parent stream... is that right?
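The pattern being confirmed here (one parent record per version, one child call per parent record) can be sketched in plain Python. This is a minimal model of the control flow only, not SDK code: in the Singer SDK the equivalent hooks are parent_stream_type on the child stream and get_child_context on the parent. The names parent_records, child_records, sync and fake_fetch are hypothetical, and fake_fetch stands in for the real exportData request.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]
Context = Dict[str, str]


def parent_records() -> Iterator[Record]:
    """Stand-in for the versions stream: yields one record per version."""
    yield {"version_id": "Forecast Report1"}
    yield {"version_id": "Forecast Report2"}


def get_child_context(record: Record) -> Context:
    """Reduce a parent record to the context the child stream needs."""
    return {"version_id": record["version_id"]}


def child_records(
    context: Context, fetch: Callable[[str], Iterable[Record]]
) -> Iterator[Record]:
    """Runs once per context and makes one request for that version."""
    for row in fetch(context["version_id"]):
        # Tag each row with the version it came from.
        yield {**row, "version_id": context["version_id"]}


def sync(fetch: Callable[[str], Iterable[Record]]) -> List[Record]:
    """Fan out: one child sync per parent record."""
    out: List[Record] = []
    for record in parent_records():
        out.extend(child_records(get_child_context(record), fetch))
    return out


def fake_fetch(version_id: str) -> List[Record]:
    # Hypothetical stand-in for a GET against the exportData endpoint.
    return [{"value": "data-for-" + version_id}]


rows = sync(fake_fetch)
```

With two parent records, fake_fetch is called twice, which matches the "n calls for n version_id's" requirement from the original question.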
aaronsteers
02/15/2023, 10:38 PM
aaronsteers
02/15/2023, 10:42 PM
aaronsteers
02/15/2023, 10:43 PM
justin_wong
02/15/2023, 10:54 PM
"Just to make sure I understand: VersionStream returns one record per version_id, correct?"
Correct. Here's a sample payload returned from VersionStream:
{
  "version": [
    {
      "@id": "144",
      "@name": "Forecast1"
    },
    {
      "@id": "1683",
      "@name": "Forecast2"
    },
    {
      "@id": "1543",
      "@name": "Forecast3"
    },
    {
      "@id": "1563",
      "@name": "Forecast4"
    }
  ]
}
"And you just need to make one additional call per version record of the parent stream... is that right?"
Correct: one additional call per version record, so 4 version_id's = 4 additional calls to the new stream. One caveat, probably insignificant, is that some version_id's would be dropped based on a regex pattern, so in reality 4 version_id's might yield, say, 2 additional calls.
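As a rough illustration of the caveat above, the sample payload can be turned into one child context per surviving version. The drop pattern below is purely hypothetical (the thread does not say what the real regex is), and child_contexts is an illustrative helper, not an SDK method.

```python
import re
from typing import Dict, Iterator, List, Pattern

# Sample payload as posted above.
payload: Dict[str, List[Dict[str, str]]] = {
    "version": [
        {"@id": "144", "@name": "Forecast1"},
        {"@id": "1683", "@name": "Forecast2"},
        {"@id": "1543", "@name": "Forecast3"},
        {"@id": "1563", "@name": "Forecast4"},
    ]
}

# Hypothetical pattern: drop any version whose name ends in 3 or 4.
DROP_PATTERN = re.compile(r"[34]$")


def child_contexts(
    data: Dict[str, List[Dict[str, str]]], drop: Pattern[str]
) -> Iterator[Dict[str, str]]:
    """Yield one context per version that the pattern does not drop."""
    for version in data["version"]:
        if drop.search(version["@name"]):
            continue  # filtered out, so no child call is made for it
        yield {"version_id": version["@id"], "version_name": version["@name"]}


contexts = list(child_contexts(payload, DROP_PATTERN))
```

Here four versions produce two contexts, so the child stream would make two calls.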
justin_wong
03/20/2023, 7:49 PM
"It looks like the version stream is passing a context with a specific version 'name' to its child streams here: https://gitlab.com/gitlab-data/meltano_taps/-/blob/main/tap-adaptive/tap_adaptive/streams.py#L271"
You are correct, currently the version stream is only passing a specific version to its child streams. You also said:
"So, from the context of the child, there should be only a single version to fetch info about - if I understand correctly"
No. You're right, that is how the code was implemented: it passes a single version, BUT there are actually dozens of versions that need to be passed down to a child stream, and the child stream needs to make a call for each version. My understanding as it stands is that Meltano does not support this type of workflow. It looks like this type of logic was implemented in a Zendesk Singer tap, but I'm not sure if I can implement something like this using the current Meltano abstraction.
aaronsteers
03/21/2023, 3:10 PM
"You are correct, currently, the version stream is only passing a specific version to its child streams."
If you have visibility into all versions, are you able to pass all the versions in the child context? Then the child stream can iterate through each.
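The suggestion here (pass every version in a single child context and let the child loop) might look roughly like the following. It is a plain-Python sketch under assumptions: build_child_context, child_records and fake_fetch are hypothetical names, the version_ids key is an invented context field, and fake_fetch replaces the real HTTP request.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def build_child_context(versions: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Pack every version id into one context for the child stream."""
    return {"version_ids": [v["@id"] for v in versions]}


def child_records(
    context: Dict[str, List[str]],
    fetch: Callable[[str], Iterable[Dict[str, str]]],
) -> Iterator[Dict[str, str]]:
    """Single child sync that makes one request per version in the context."""
    for version_id in context["version_ids"]:
        for row in fetch(version_id):
            yield {"version_id": version_id, **row}


def fake_fetch(version_id: str) -> List[Dict[str, str]]:
    # Hypothetical stand-in for the per-version endpoint call.
    return [{"value": "data-for-" + version_id}]


versions = [
    {"@id": "144", "@name": "Forecast1"},
    {"@id": "1683", "@name": "Forecast2"},
]
context = build_child_context(versions)
all_rows = list(child_records(context, fake_fetch))
```

The difference from one context per parent record is that the child runs once and does its own looping, so the iteration logic lives in the child stream rather than in the parent-to-child hand-off.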
justin_wong
03/21/2023, 4:10 PM