# singer-tap-development
s
Hey Team! Question 2 😄 I'm trying to use stream inheritance to query archived deals (it's the same query with different params => `archived=True`). I was thinking I could create a stream inheriting from my main stream called `ArchivedStream` and then simply add a parameter to `get_url_params` where `archived = True`. Are there best practices for doing this? From my understanding of Python, inheritance does not take instance attributes, only methods and class attributes. Should I be redefining the init function?
e
Inheritance is indeed one way to do it. If you want to reuse some url parameters, you can call `super()`:
```python
class BaseStream(RESTStream):
    def get_url_params(self, context, next_page_token):
        return {"base": "param"}

class ArchivedStream(BaseStream):
    def get_url_params(self, context, next_page_token):
        params = super().get_url_params(context, next_page_token)
        params["archived"] = True
        return params
```
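As a side note, if the archived variant is its own class, giving it its own `name` and registering both in the tap keeps their records and bookmarks separate. A minimal, hypothetical sketch (the tap name here is made up):

```python
from singer_sdk import Tap


class TapExample(Tap):
    name = "tap-example"

    def discover_streams(self):
        # Both classes become separate streams; ArchivedStream should set its
        # own `name` so its records and state don't collide with BaseStream's.
        return [BaseStream(tap=self), ArchivedStream(tap=self)]
```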
I think @visch has come up with a way to accomplish this using partitions so you wouldn't need extra classes.
s
@edgar_ramirez_mondragon my issue is that I define my schema as a property in the stream. Would this be inherited in a child stream?
e
You mean as a decorated `@property`? Then yeah, it will be inherited. It's just syntactic sugar for a method without arguments 🙂
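For example (a minimal sketch, not taken from the tap in question), a `schema` defined as a `@property` on the parent carries over to the child unchanged:

```python
from singer_sdk import typing as th
from singer_sdk.streams import RESTStream


class BaseStream(RESTStream):
    @property
    def schema(self) -> dict:
        # A plain method under the hood, so subclasses inherit it like any other.
        return th.PropertiesList(
            th.Property("id", th.StringType),
            th.Property("date_updated", th.DateTimeType),
        ).to_dict()


class ArchivedStream(BaseStream):
    # No schema redefinition (and no __init__ override) needed here.
    pass
```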
v
Yes, the big question is: do you need state to be managed separately for `archived=True` and `archived=False`? If the answer to that is no, it makes this a lot easier, as you can 100% just go with what @edgar_ramirez_mondragon pointed you to. If the answer is yes, then look at https://gitlab.com/meltano/sdk/-/issues/273 and https://github.com/AutoIDM/tap-clickup/pull/117/files#diff-4490b7dbe49a82fea4fdf63d08c6c503536d30f56841c428f2998bf03b4b94a2R124-R138, which lets you use partitions with child streams to create `len(Partitions) * len(ChildStreams)` partitions. There are a few rough edges with it, but it works well.
You'll end up with a state that looks something like this
```json
{
  "bookmarks": {
    "task": {
      "partitions": [
        {
          "context": {
            "team_id": "18011725",
            "archived": "true"
          },
          "replication_key": "date_updated",
          "replication_key_value": "1635447757140"
        },
        {
          "context": {
            "team_id": "18011725",
            "archived": "false"
          },
          "replication_key": "date_updated",
          "replication_key_value": "1650312773788"
        },
        {
          "context": {
            "team_id": "18016848",
            "archived": "true"
          }
        },
        {
          "context": {
            "team_id": "18016848",
            "archived": "false"
          },
          "replication_key": "date_updated",
          "replication_key_value": "1636847634320"
        }
      ]
    }
  }
}
```
(filtered this down a bit to get the point across)
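For reference, a state shaped like that generally comes from the stream's `partitions` property returning one context per combination; very roughly (this is not the code from that PR, and the team ids are just the ones from the snippet above):

```python
from singer_sdk.streams import RESTStream


class TaskStream(RESTStream):
    name = "task"

    @property
    def partitions(self):
        # One context per (team_id, archived) pair. In the linked PR the team
        # ids come from a parent stream; they're hard-coded here only to
        # mirror the state example.
        return [
            {"team_id": team_id, "archived": archived}
            for team_id in ("18011725", "18016848")
            for archived in ("true", "false")
        ]
```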
s
Thanks @visch! But no, I don't need separate states, so @edgar_ramirez_mondragon's solution should work just fine 😉
If I were to use stream partitioning, how do you translate the additional field (in this case, archived) to the schema / end result?
v
That's not really about stream partitioning.
You want to change the schema when you're querying archived data?
s
I'd like to indicate that it IS archived
v
If it's part of the child context, it's already in the record.
Just add the key to your schema and you're good; otherwise you'll notice a warning in the logs saying there's data there but no schema matches it
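Something like this, roughly (using the SDK's typing helpers; the other property names are just placeholders):

```python
from singer_sdk import typing as th

schema = th.PropertiesList(
    th.Property("id", th.StringType),
    th.Property("date_updated", th.DateTimeType),
    # The "archived" key carried in the context needs a matching schema entry,
    # otherwise you get the no-matching-schema warning mentioned above.
    th.Property("archived", th.BooleanType),
).to_dict()
```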
s
Ok glorious, I shall attempt it and get back with results 😉
Sorry to bother again; where do you define in a child stream how to query the data? I think the concept of stream partitioning is slightly confusing to me. I'm trying to have 2 streams, one with the request url `/bla/bla/?archived=true` and one with `/bla/bla/?archived=false`. How would I go about achieving this?
v
I think you're mixing up abstractions
`params["archived"] = True` could turn into `params["archived"] = context["archived"]`, maybe? Depends on your code.
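Spelled out a bit, the override on the stream class might look like this (a sketch, assuming the partition or child context carries an `archived` key):

```python
def get_url_params(self, context, next_page_token):
    params = super().get_url_params(context, next_page_token)
    # `context` is whichever partition/child context is currently being
    # synced, so the flag is read from it rather than hard-coded per class.
    params["archived"] = context["archived"]
    return params
```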
s
Ok so stream partitioning is almost like making a child stream out of your own stream, in which you can utilise context to manipulate different params in the same stream?
v
No, I think I'd just pass on partitions for you
You don't need state to be queried independently; if you do, then you'll know you need it
I would have to think for a while to sum up what partitions actually are succinctly
s
Hahaha yeah I think you lost me there
v
Thought about this a little this morning. I think going with partitioning may be easiest here even if you don't need the state stuff (you will need the code I linked to in order to use it with parent and child streams). Another approach is to override `get_records` and loop over archived true and false for your stream (add that to your context or something)
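That second approach might look roughly like this (a sketch; `DealsStream` is just a stand-in for one of your streams, and it assumes `get_url_params` reads `archived` from the context as above):

```python
from singer_sdk.streams import RESTStream


class DealsStream(RESTStream):
    # ...name/path/schema as usual...

    def get_records(self, context):
        # Run the normal request cycle once per archived flag and stamp the
        # flag onto each record so it shows up in the output.
        for archived in (True, False):
            archived_context = {**(context or {}), "archived": archived}
            for record in super().get_records(archived_context):
                record["archived"] = archived
                yield record
```

Note that with this approach both passes share a single bookmark, which is fine given you said you don't need separate state.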
s
Yeah that was my thought as well. I'm just unsure how to operate with partitioning; I did a 5 minute try, but context wasn't passing so I gave up. Do you have a link to additional docs on what partitioning is / how to apply it? Here is the branch I will be trying to modify, if you're interested. I have 3 streams, `ArchivedDealsStream`, `ArchivedCompaniesStream` and `ArchivedContactsStream`, which I want to delete and replace with partitioning: https://github.com/potloc/tap-hubspot/tree/DE-223-hubspot_v3-introduce-stream-partitionning-to-archived-data
I definitely want to learn how to apply partitioning; the solution I have right now makes me feel like I'm carrying technical debt
v
Use the code link I posted earlier. Otherwise, going with Edgar's option of an archived stream is easy as well
s
Awesome thanks
v
And got it: you already did the archived stream stuff, but you want to optimize
In that PR I linked it's a bit messy, but the magic is in `client.py`