# plugins-general
d
Hi All, I am kind of new to Meltano and trying to solve multiple use cases with the available extractor tap-rest-api-msdk. I'd appreciate your guidance on whether the use cases below can be solved directly with tap-rest-api-msdk. If not, would you recommend writing a custom extractor, or is there an existing tap that already solves them?

Use case 1: chunking of data. We have a historical API to load historical data. User input: start date and end date, for example start date 2024-03-01 and end date 2024-03-19. Scenario: even if the user provides such a date range, can we chunk the extraction internally? Can we configure it to pull the data in daily or weekly chunks? (A rough sketch of what I mean is at the end of this message.)

Use case 2: relay race of tokens. Scenario: call the first API to get a token, then use that token to fetch data from the second API.

Use case 3: relay race of data. Scenario: we want to solve an SQL-join-like condition between two API calls. We receive values in different fields from the first API call, then form the next API call's parameters from those returned values. Example: API 1 returns a=1, b=2, c=3; API 2: GET https://xyz.com/service/path?a=1&b=2&c=3
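To illustrate use case 1, here is a minimal, tap-agnostic sketch of the chunking I have in mind: splitting the user-supplied start/end dates into one window per day, each of which could become its own request to the historical API (weekly chunking would just change the step size):

```python
from datetime import date, timedelta


def daily_chunks(start: date, end: date):
    """Yield one date per day in the inclusive range [start, end]."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


# Example: 2024-03-01 .. 2024-03-19 yields 19 daily windows.
for day in daily_chunks(date(2024, 3, 1), date(2024, 3, 19)):
    print(day)
```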
s
Hi Debashis, here are some thoughts on your points.

1. Chunking data. This may come down to the API's ability to query a range of data: some APIs let you request a date range or a specific date. If you want to chunk the extraction on a daily or weekly basis, I would look at a combination of an orchestration tool (to produce the right query pattern for the days you wish to extract) and tap-rest-api-msdk. Natively, I don't believe a chunking feature exists.

2. Relay race of tokens. tap-rest-api-msdk has a single API URL; it was designed with the API URL as a top-level config item, querying a single API and working with the multiple endpoints/resources that API exposes. Credentials are cached from one stream to another. You could try setting top-level parameters like the API URL inside a stream config, but I suspect what you are looking for will not work. Worth trying.

3. Relay race of data. This capability is not built into the tap; it would be hard for the tap to determine which results need to be available for the next call. I see this as an orchestration problem. A tool like Airflow could have a specific DAG to achieve this:
a. Call API 1: retrieve the results.
b. Call API 2: use the results from call 1 as parameters for call 2.
There is a rough sketch of that pattern below.
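To make point 3 a bit more concrete, here is a minimal sketch of that a/b pattern as an Airflow TaskFlow DAG (assuming Airflow 2.4+ and the requests library; the first endpoint URL is a placeholder, and the second one is taken from your example):

```python
from datetime import datetime

import requests
from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 3, 1), catchup=False)
def relay_race_of_data():
    @task
    def call_api_1() -> dict:
        # Hypothetical first API that returns the join keys, e.g. {"a": 1, "b": 2, "c": 3}.
        resp = requests.get("https://xyz.com/service/keys")  # placeholder URL
        resp.raise_for_status()
        return resp.json()

    @task
    def call_api_2(params: dict):
        # Use the fields returned by the first call as query parameters for the second call.
        resp = requests.get("https://xyz.com/service/path", params=params)
        resp.raise_for_status()
        return resp.json()

    call_api_2(call_api_1())


relay_race_of_data()
```

The dict returned by the first task is passed to the second task via XCom, so the "join" between the two calls happens entirely inside the DAG.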
v
Personally I lean towards making a custom tap, as it's really hard for me to fit my use cases into a generalized tap. If the data mechanism is very simple and "standard", then I'd look at the generalized REST tap.
Use case 1: if the tap pulls all this data for you, you don't have to worry about "chunking"? I think I just don't understand what you mean when you say chunking. Generally a tap tracks the state for you, and every time you run it, the tap pulls everything from your last run up to today and then saves that latest state. (Ideally the bookmark wouldn't be a date like 2024-03-19; ideally it'd be an id, or a timestamp that goes down to a few milliseconds or less.) There's a sketch of how that state tracking looks below.
Use case 2: a tap can do this, but the real question is why you think this is so important.
Use case 3: if you need separate taps, that's fine; load them all into one DB / DW and then do your transformations there after the data lands (joins, etc.).
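For reference, here is a rough sketch of how that state tracking looks in a custom tap built with the Meltano Singer SDK; the stream name, URL, and query parameter are made up for illustration:

```python
from singer_sdk import RESTStream
from singer_sdk import typing as th


class EventsStream(RESTStream):
    """Hypothetical incremental stream: the SDK saves the replication-key bookmark as state."""

    name = "events"
    url_base = "https://xyz.com"     # placeholder base URL
    path = "/service/events"         # placeholder endpoint
    primary_keys = ["id"]
    replication_key = "updated_at"   # incremental bookmark field

    schema = th.PropertiesList(
        th.Property("id", th.IntegerType),
        th.Property("updated_at", th.DateTimeType),
    ).to_dict()

    def get_url_params(self, context, next_page_token):
        params = {}
        # Resume from the last saved bookmark (or the configured start_date on the first run).
        start = self.get_starting_timestamp(context)
        if start:
            params["updated_since"] = start.isoformat()  # made-up query parameter
        return params
```

On each run the SDK reads the saved bookmark from state, `get_starting_timestamp` returns it, and the request only asks for records newer than the last sync.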
a
I'm with Derek on this one: if you are comfortable with Python and extending an existing class, the SDK is a great tool. Can you explain what you mean by chunking? Do you plan to do different things with each chunk of data?