# singer-tap-development
s
Hey Team! Question 2 😄 I'm trying to use stream inheritance to query archived deals (it's the same query with different params => `archived=True`). I was thinking I could create a stream inheriting from my main stream called `ArchivedStream` and then simply add a parameter to `get_url_params` where `archived = True`. Are there best practices for doing this? From my understanding of Python, inheritance does not take instance attributes, only methods and class attributes. Should I be redefining the init function?
e
Inheritance is indeed one way to do it. If you want to reuse some url parameters, you can call `super()`:
```python
class BaseStream(RESTStream):
    def get_url_params(self, context, next_page_token):
        return {"base": "param"}

class ArchivedStream(BaseStream):
    def get_url_params(self, context, next_page_token):
        params = super().get_url_params(context, next_page_token)
        params["archived"] = True
        return params
```
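As a side note, if the archived variant is its own class, giving it its own `name` and registering both in the tap keeps their records and bookmarks separate. A minimal, hypothetical sketch (the tap name here is made up):

```python
from singer_sdk import Tap


class TapExample(Tap):
    name = "tap-example"

    def discover_streams(self):
        # Both classes become separate streams; ArchivedStream should set its
        # own `name` so its records and state don't collide with BaseStream's.
        return [BaseStream(tap=self), ArchivedStream(tap=self)]
```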
I think @visch has come up with a way to accomplish this using partitions so you wouldn't need extra classes.
s
@edgar_ramirez_mondragon my issue is that I define my schema as a property in the stream. Would this be inherited in a child stream?
e
You mean as a decorated `@property`? Then yeah, it will be inherited. It's just syntactic sugar for a method without arguments 🙂
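For example (a minimal sketch, not taken from the tap in question), a `schema` defined as a `@property` on the parent carries over to the child unchanged:

```python
from singer_sdk import typing as th
from singer_sdk.streams import RESTStream


class BaseStream(RESTStream):
    @property
    def schema(self) -> dict:
        # A plain method under the hood, so subclasses inherit it like any other.
        return th.PropertiesList(
            th.Property("id", th.StringType),
            th.Property("date_updated", th.DateTimeType),
        ).to_dict()


class ArchivedStream(BaseStream):
    # No schema redefinition (and no __init__ override) needed here.
    pass
```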
v
Yes, the big question is: do you need state to be managed separately for `archived=True` and `archived=False`? If the answer to that is no, it makes this a lot easier, as you can 100% just go with what @edgar_ramirez_mondragon pointed you to. If the answer is yes, then look at https://gitlab.com/meltano/sdk/-/issues/273 and https://github.com/AutoIDM/tap-clickup/pull/117/files#diff-4490b7dbe49a82fea4fdf63d08c6c503536d30f56841c428f2998bf03b4b94a2R124-R138, which lets you use partitions with child streams to create `len(Partitions) * len(ChildStreams)` partitions. There are a few rough edges with it, but it works well.
You'll end up with a state that looks something like this
```json
{
  "bookmarks": {
    "task": {
      "partitions": [
        {
          "context": {
            "team_id": "18011725",
            "archived": "true"
          },
          "replication_key": "date_updated",
          "replication_key_value": "1635447757140"
        },
        {
          "context": {
            "team_id": "18011725",
            "archived": "false"
          },
          "replication_key": "date_updated",
          "replication_key_value": "1650312773788"
        },
        {
          "context": {
            "team_id": "18016848",
            "archived": "true"
          }
        },
        {
          "context": {
            "team_id": "18016848",
            "archived": "false"
          },
          "replication_key": "date_updated",
          "replication_key_value": "1636847634320"
        }
      ]
    }
  }
}
```
(filtered this down a bit to get the point across)
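For reference, a state shaped like that generally comes from the stream's `partitions` property returning one context per combination; very roughly (this is not the code from that PR, and the team ids are just the ones from the snippet above):

```python
from singer_sdk.streams import RESTStream


class TaskStream(RESTStream):
    name = "task"

    @property
    def partitions(self):
        # One context per (team_id, archived) pair. In the linked PR the team
        # ids come from a parent stream; they're hard-coded here only to
        # mirror the state example.
        return [
            {"team_id": team_id, "archived": archived}
            for team_id in ("18011725", "18016848")
            for archived in ("true", "false")
        ]
```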
s
Thanks @visch! But no, I don't need separate states, so @edgar_ramirez_mondragon's solution should work just fine 😉
If I were to use stream partitioning, how do you translate the additional field (in this case, archived) to the schema / end result?
v
That's not really about stream partitioning.
You want to change the schema when you're querying archived data?
s
I'd like to indicate that it IS archived
v
If it's part of the child context, it's already in the record.
Just add the key to your schema and you're good; otherwise you'll notice a warning in the logs saying there's data there but no schema matches it
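Something like this, roughly (using the SDK's typing helpers; the other property names are just placeholders):

```python
from singer_sdk import typing as th

schema = th.PropertiesList(
    th.Property("id", th.StringType),
    th.Property("date_updated", th.DateTimeType),
    # The "archived" key carried in the context needs a matching schema entry,
    # otherwise you get the no-matching-schema warning mentioned above.
    th.Property("archived", th.BooleanType),
).to_dict()
```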
s
Ok glorious, I shall attempt it and get back with results 😉
Sorry to bother again; where do you define in a child stream how to query the data? I think the concept of stream partitioning is slightly confusing to me. I'm trying to have 2 streams, one with the request url `/bla/bla/?archived=true` and one with `/bla/bla/?archived=false`. How would I go about achieving this?
v
I think you're mixing up abstractions
`params["archived"] = True` could turn into `params["archived"] = context["archived"]`, maybe? Depends on your code.
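Spelled out a bit, the override on the stream class might look like this (a sketch, assuming the partition or child context carries an `archived` key):

```python
def get_url_params(self, context, next_page_token):
    params = super().get_url_params(context, next_page_token)
    # `context` is whichever partition/child context is currently being
    # synced, so the flag is read from it rather than hard-coded per class.
    params["archived"] = context["archived"]
    return params
```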
s
Ok so stream partitioning is almost like making a child stream out of your own stream, in which you can utilise context to manipulate different params in the same stream?
v
No, I think I'd just pass on partitions for you
You don't need state to be queried independently; if you do, then you'll know you need it
I would have to think for a while to sum up what partitions actually are succinctly
s
Hahaha yeah I think you lost me there
v
Thought about this a little this morning. I think going with partitioning may be easiest here even if you don't need the state stuff (you will need the code I linked to in order to use it with parent and child streams). Another approach is to override `get_records` and loop over archived true and false for your stream (add that to your context or something)
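That second approach might look roughly like this (a sketch; `DealsStream` is just a stand-in for one of your streams, and it assumes `get_url_params` reads `archived` from the context as above):

```python
from singer_sdk.streams import RESTStream


class DealsStream(RESTStream):
    # ...name/path/schema as usual...

    def get_records(self, context):
        # Run the normal request cycle once per archived flag and stamp the
        # flag onto each record so it shows up in the output.
        for archived in (True, False):
            archived_context = {**(context or {}), "archived": archived}
            for record in super().get_records(archived_context):
                record["archived"] = archived
                yield record
```

Note that with this approach both passes share a single bookmark, which is fine given you said you don't need separate state.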
s
Yeah that was my thought as well. I'm just unsure how to operate with partitioning; I did a 5 minute try, but context wasn't passing so I gave up. Do you have a link to additional docs on what partitioning is / how to apply it? Here is the branch I will be trying to modify, if you're interested. I have 3 streams, `ArchivedDealsStream`, `ArchivedCompaniesStream` and `ArchivedContactsStream`, which I want to delete and replace with partitioning: https://github.com/potloc/tap-hubspot/tree/DE-223-hubspot_v3-introduce-stream-partitionning-to-archived-data
I definitely want to learn how to apply partitioning; the solution I have right now makes me feel like I'm carrying technical debt
v
Use the code link I posted earlier. Otherwise, going with Edgar's option of an archived stream is easy as well
s
Awesome thanks
v
And got it: you already did the archived stream stuff, but you want to optimize
In that PR I linked it's a bit messy, but the magic is in `client.py`