Hey folks, what's the best way to run the same tap...
# troubleshooting
a
Hey folks, what's the best way to run the same tap multiple times with a slightly different config and output all the records to the same table? I'm developing a Ticketmaster tap to pull concerts from different venues. I only want to pull data for certain venues, which I'm specifying in my meltano.yml under the tap config:
config:
      venues:
      - name: brooklyn_steel
        venue_id: KovZ917AC-V
      - name: music_hall_of_williamsburg
        venue_id: KovZpZA77eIA
      - name: terminal_5
        venue_id: KovZpZAFkn6A
      - name: webster_hall
        venue_id: KovZpa6WFe
      - name: irving_plaza
        venue_id: KovZpaFPje
      - name: mercury_lounge
        venue_id: KovZpZAJAkAA
      - name: gramercy_theatre
        venue_id: KovZpZAEAdaA
      - name: brooklyn_mirage
        venue_id: KovZ917AINX
      - name: sony_hall
        venue_id: KovZ917Ah5Q
      - name: elsewhere_brooklyn
        venue_id: KovZpZA6enaA
In discover_streams I'm trying to instantiate the stream multiple times:
def discover_streams(self) -> list[streams.TicketmasterStream]:
    # ...
    # Each item in config["venues"] is a dict with "name" and "venue_id".
    return [streams.ConcertStream(self, venue) for venue in self.config["venues"]]
And because I want the data from each venue to end up in the same output table, I'm hardcoding the name in the stream class:
class ConcertStream(TicketmasterStream):
    """Concert Stream."""

    def __init__(
        self,
        tap: TapBaseClass,
        venue: Dict[str, Any],
        name: Optional[str] = None,
        schema: Optional[Union[Dict[str, Any], Schema]] = None,
        path: Optional[str] = None,
    ) -> None:
        self.name = "concerts"
        self.venue_name = venue["name"]
        self.venue_id = venue["venue_id"]
        self.path = "/events.json"
        self.replication_key = None
But as a result, my stream only ends up running for the last venue in the tap config, in this case 'elsewhere_brooklyn'. What's the proper way to set this up so I can pull data for each specified venue and send the data to the same table output? I'd rather not have to union all the tables downstream and instead just have each venue's events sent to an 'events' table.
e
It might make sense for the tap to accept multiple venues, e.g. with a venue_ids config. Then initialize a partition for each venue ID, similar to https://github.com/MeltanoLabs/tap-dbt/blob/e2aab53b1d2638c1fb02d364a39a8505c402b088/tap_dbt/streams.py#L44-L51
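A minimal sketch of the partition idea, not taken from tap-dbt or from this thread. In the Singer SDK, a stream's `partitions` property returns a list of dicts, and each dict is handed to the stream as `context` for one pass. The helper name `build_partitions` and the `venue_name` key are assumptions for illustration, and the sketch reuses the `venues` config shape from the question rather than a flat `venue_ids` list.

```python
from typing import Any


def build_partitions(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Build one partition context per configured venue (hypothetical helper)."""
    return [
        {"venue_id": venue["venue_id"], "venue_name": venue["name"]}
        for venue in config.get("venues", [])
    ]


# Sample config mirroring two of the venues from the question.
sample_config = {
    "venues": [
        {"name": "brooklyn_steel", "venue_id": "KovZ917AC-V"},
        {"name": "terminal_5", "venue_id": "KovZpZAFkn6A"},
    ]
}
partitions = build_partitions(sample_config)
```

In a real tap this list would presumably be returned from the `partitions` property of a single stream named `concerts`, so that one stream instance covers every venue.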
a
Will this method allow me to pass a venue_id per partition to get_url_params? That's how I'm specifying the venue in the API call:
def get_url_params(
    self,
    context: dict | None,
    next_page_token: Any | None,
) -> dict[str, Any]:
    # ...
    params: dict = {}
    if next_page_token:
        params["page"] = next_page_token
    params["size"] = 200
    params["venueId"] = self.venue_id
    return params
e
Yeah, you’d get the venue_id from the context object.
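A hedged sketch of what reading the venue from the context could look like, written as a standalone function so it runs without the SDK (in the tap it would be a method taking `self`). The `venue_id` context key is an assumption; it has to match whatever keys the stream's partitions actually define.

```python
from typing import Any, Optional


def get_url_params(
    context: Optional[dict],
    next_page_token: Optional[Any],
) -> dict[str, Any]:
    """Build query params, taking the venue from the partition context."""
    params: dict[str, Any] = {"size": 200}
    if next_page_token:
        params["page"] = next_page_token
    if context:
        # "venue_id" is an assumed key name, not confirmed by the thread.
        params["venueId"] = context["venue_id"]
    return params


# One call per partition: the context carries that partition's venue.
example = get_url_params({"venue_id": "KovZ917AC-V"}, next_page_token=2)
```

The point of the change is that the venue is no longer stored on the stream instance (`self.venue_id`), so a single stream can serve every venue.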
a
That did it! Thank you!
e
Awesome!
a
So what's happening under the hood that caused Meltano to only run the stream for the last venue in my list? I suspect it's because they all had the same name.
e
I think it was the fact that in the tap you were instantiating multiple instances of your stream class with the same name. Internally, the SDK uses that attribute to fill a dictionary of streams, so each instance replaced the previous one under the same key and only the last one was kept.
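The effect can be reproduced with a plain dictionary. This is an illustration of the behavior described, not the SDK's actual code; `FakeStream` and `streams_by_name` are made-up names.

```python
class FakeStream:
    """Stand-in for a stream class whose name is hardcoded by the caller."""

    def __init__(self, name: str, venue_name: str) -> None:
        self.name = name
        self.venue_name = venue_name


venues = ["brooklyn_steel", "terminal_5", "elsewhere_brooklyn"]
instances = [FakeStream("concerts", venue) for venue in venues]

# Keying by name means every later instance overwrites the earlier one.
streams_by_name = {stream.name: stream for stream in instances}
surviving_venue = streams_by_name["concerts"].venue_name
```

Three instances go in, but only one key exists, and it holds the last venue in the list.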
a
Hey! Sorry to revive this thread! I was wondering what the best place to put the list of venues is. I could see it getting long and distracting in meltano.yml. Do y'all recommend I put them in a config file? If so, what's the best way to read that file as the tap's config in Meltano?
e
One option is to split that extractor’s configuration from meltano.yml using include_paths and declare the plugin there.