# singer-tap-development
t
Hi all, I am new to Meltano tap development, so I apologize if this is already covered in the documentation. I am working on a tap that pulls a list of values from one API, and I then want to use each of those values, one by one, to call another API so that I can fetch the results in a stream and load them into the database. I think the Meltano SDK documentation calls this an external API lookup (as shown in the screenshot below). Could you please confirm whether that is the right term for this functionality? If so, is there sample code I can look at to see how to “implement a custom Mapper plugin with inline lookup logic”, as stated in the documentation, or some reading where I can learn more about developing that functionality for my tap? If I have misunderstood the description, could you point me to the right resource? Thanks a ton in advance for all your help.
s
Hey @trinath, if someone else has something at hand, that would be great, because I don't think we have anything good available (@aaronsteers / @edgar_ramirez_mondragon?). If you're looking for a simple mapper to start from, this one is quite simple:
1. https://github.com/MeltanoLabs/meltano-map-transform/blob/main/meltano_map_transform/mapper.py
2. You're basically looking to provide only map_record_message and do the lookup within it. If you want to add a column for the lookup values, you will also need to change the schema message.
Hope that helps!
a
Thanks for the ping, @Sven Balnojan. @trinath - The lookup functionality is explicitly out of scope for meltano-map-transform and the native mapping capability. That said, you could create your own mapper that performs lookups. What you'd probably want to do is fork meltano-map-transform and add a custom function to the mapper that performs the API lookup. Performance will also be much better if you can retrieve and cache all lookup values in advance, or at minimum cache the values that have already been seen; otherwise, you'll pay the round-trip lookup cost once per record.
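The caching advice above can be sketched in a few lines. This is a minimal illustration, not code from meltano-map-transform: `CachedLookup`, `fake_api_lookup`, and the `code_label` field are hypothetical names, and the "API" is a local stand-in function so the example runs without network access.

```python
from typing import Callable, Dict, Hashable, Optional


class CachedLookup:
    """Wrap a slow lookup so each distinct key is fetched at most once."""

    def __init__(
        self,
        fetch: Callable[[Hashable], Optional[str]],
        preload: Optional[Dict[Hashable, Optional[str]]] = None,
    ) -> None:
        self._fetch = fetch
        # Values retrieved in advance (if any) never trigger a round trip.
        self._cache: Dict[Hashable, Optional[str]] = dict(preload or {})
        self.calls = 0  # number of real round trips made so far

    def get(self, key: Hashable) -> Optional[str]:
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]


def fake_api_lookup(key: Hashable) -> str:
    # Hypothetical stand-in for the real HTTP call to the second API.
    return f"value-for-{key}"


lookup = CachedLookup(fake_api_lookup)
records = [
    {"id": 1, "code": "A"},
    {"id": 2, "code": "B"},
    {"id": 3, "code": "A"},  # repeated key: served from the cache
]
enriched = [{**r, "code_label": lookup.get(r["code"])} for r in records]
```

Three records are enriched here with only two calls to the lookup function. Passing a `preload` dict covers the "retrieve everything in advance" variant.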
e
Also, if the first API where you source the values from doesn’t require, you could probably override partitions to generate a context dict for each value.
s
Btw., since we are currently brainstorming E&L best practices, I do have to say:
1. I would consider it a best practice to default to ingesting mapping (lookup) tables on their own and then joining them together later on. (This allows batching and makes it much more fail-proof.)
2. If you do choose to do a lookup while ingesting, you should do so by adding a column; again, this is more fail-proof. (What if the lookup fails? Simply fill the column with a NULL value. If you replace values in, or add your looked-up values to, an existing column, this isn't as easy as it sounds.)
3. Note: This is a fair request! I can think of a bunch of reasons to do a lookup while ingesting. A lot of security and privacy issues would fall into this category, e.g. looking up whether a customer was marked "for deletion" by some GDPR-related request.
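Point 2 can be illustrated with plain Singer-style messages. The sketch below is an assumption-laden illustration, not the meltano-map-transform API: the two functions are standalone helpers that borrow the hook names mentioned in this thread, their signatures are hypothetical, and `lookup_value`, `flaky_lookup`, and the sample messages are made up.

```python
import copy
from typing import Callable, Optional

LOOKUP_COLUMN = "lookup_value"  # hypothetical name for the added column


def map_schema_message(message: dict) -> dict:
    """Return a copy of a SCHEMA message with a nullable lookup column added."""
    out = copy.deepcopy(message)
    properties = out["schema"].setdefault("properties", {})
    properties[LOOKUP_COLUMN] = {"type": ["string", "null"]}
    return out


def map_record_message(
    message: dict, lookup: Callable[[str], str], key_field: str
) -> dict:
    """Return a copy of a RECORD message with the lookup result in its own column."""
    out = copy.deepcopy(message)
    try:
        value: Optional[str] = lookup(out["record"][key_field])
    except Exception:
        # The lookup failed: keep the record and leave the new column NULL.
        value = None
    out["record"][LOOKUP_COLUMN] = value
    return out


def flaky_lookup(customer_id: str) -> str:
    # Hypothetical lookup that fails for one customer.
    if customer_id == "c2":
        raise TimeoutError("lookup service unavailable")
    return "active"


schema_msg = {
    "type": "SCHEMA",
    "stream": "customers",
    "schema": {"type": "object", "properties": {"customer_id": {"type": "string"}}},
    "key_properties": ["customer_id"],
}
record_msgs = [
    {"type": "RECORD", "stream": "customers", "record": {"customer_id": "c1"}},
    {"type": "RECORD", "stream": "customers", "record": {"customer_id": "c2"}},
]

new_schema = map_schema_message(schema_msg)
mapped = [map_record_message(m, flaky_lookup, "customer_id") for m in record_msgs]
```

Because the looked-up value lives in its own nullable column, the failed lookup for `c2` leaves the original `customer_id` untouched and the record still loads.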
t
Thank you so much, Sven, AJ & Edgar! Appreciate the guidance. I am working through the documentation to learn how to modify meltano-map-transform to implement the above suggestions. On your points above, @Sven Balnojan, RE: Point #1, are you suggesting that I ingest the mapping lookup into the database, or just hold it in a cache? If cache or memory, I was looking at the Git code you shared, but I am unclear on where to start. Should I modify the map_record_message function so that it calls the second API in my use case?
s
Hey @trinath:
1. For what you need to do, I put up a (very simple!) example of how to modify this mapper. I'm pulling in random beer names as the lookup, plus the word "foobar", but I think you get the idea 😉 https://github.com/sbalnojan/meltano-map-transform-to-beer (This is WIP and might be a terrible implementation; I'm still getting feedback on it and will adapt the example.)
2. For the recommended lookup best practice (mapping inside the database, not before), here is what I've done in the past. We, for instance, imported currency conversion rates for specific dates:
   a. I connect to my database and check which values I need for my lookup. (I have an orders table with orders coming in in EUR, GBP, USD, ...)
   b. I then take this (very long) list, push it into my API, get the currency rates, and store them in a separate table.
   c. I then build a final model in my database with revenue converted to my currency of choice (USD).
   d. Yep, that requires orchestration.
The other approach is to simply get "all data from the lookup". Sometimes that is feasible, sometimes it is not (in the currency case it is not: you have conversion rates for every day and every pair).
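Steps a to c of the currency example can be sketched end to end with an in-memory SQLite database. Everything here is assumed for illustration: the table and column names are hypothetical, `fetch_rates` is a stand-in for the real currency API, and the rates are made-up numbers, not real exchange rates.

```python
import sqlite3
from typing import List, Tuple


def fetch_rates(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, float]]:
    """Hypothetical stand-in for the currency API; the rates are made up."""
    made_up_rates = {"EUR": 1.10, "GBP": 1.25, "USD": 1.0}
    return [(currency, day, made_up_rates[currency]) for currency, day in pairs]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, currency TEXT, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "EUR", "2023-01-01", 100.0),
        (2, "GBP", "2023-01-01", 200.0),
        (3, "EUR", "2023-01-01", 50.0),
        (4, "USD", "2023-01-02", 10.0),
    ],
)

# a. Check which lookup values are needed: one per distinct (currency, date).
pairs = conn.execute(
    "SELECT DISTINCT currency, order_date FROM orders ORDER BY currency, order_date"
).fetchall()

# b. Fetch the rates in one batch and store them in a separate table.
conn.execute(
    "CREATE TABLE fx_rates (currency TEXT, rate_date TEXT, rate_to_usd REAL)"
)
conn.executemany("INSERT INTO fx_rates VALUES (?, ?, ?)", fetch_rates(pairs))

# c. Build the final model: revenue converted to USD via a join.
revenue_usd = conn.execute(
    """
    SELECT o.id, o.amount * r.rate_to_usd AS amount_usd
    FROM orders AS o
    LEFT JOIN fx_rates AS r
      ON r.currency = o.currency AND r.rate_date = o.order_date
    ORDER BY o.id
    """
).fetchall()
```

The `LEFT JOIN` keeps every order even if a rate is missing, in which case `amount_usd` comes back as NULL rather than the order being dropped.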
t
This has been super helpful, thank you Sven! I piggybacked on this and learned more about the mapping lookup. However, I later realized that I could use the parent-child stream functionality to pass variables and then make the API calls to the subsequent APIs. Below is how I implemented it. I'm not sure if it's the recommended approach, but it seems to work fine for now. I haven't tested it fully yet (incremental loads etc.), so I still have to check.
e
Using parent-child streams is indeed the recommended way to do it 👍
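For readers landing here, the parent-child flow can be mimicked in plain Python. This is not the implementation referred to above (that code is not preserved in this thread) and it does not import the Meltano SDK. The classes, the `sync` helper, and the data are hypothetical. In the SDK, as far as I understand it, the corresponding hooks are `get_child_context` on the parent stream and `parent_stream_type` on the child stream.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Made-up data standing in for the two APIs.
FAKE_LIST_API: List[Dict] = [{"id": "a"}, {"id": "b"}]
FAKE_DETAIL_API: Dict[str, List[Dict]] = {
    "a": [{"n": 1}, {"n": 2}],
    "b": [{"n": 3}],
}


class ItemsStream:
    """Parent: returns the list of values from the first API."""

    name = "items"

    def get_records(self, context: Optional[Dict] = None) -> Iterator[Dict]:
        yield from FAKE_LIST_API

    def get_child_context(self, record: Dict, context: Optional[Dict] = None) -> Dict:
        # Pass the value the child needs for its own API call.
        return {"item_id": record["id"]}


class ItemDetailsStream:
    """Child: called once per parent record with that record's context."""

    name = "item_details"
    parent_stream_type = ItemsStream

    def get_records(self, context: Dict) -> Iterator[Dict]:
        for row in FAKE_DETAIL_API[context["item_id"]]:
            yield {"item_id": context["item_id"], **row}


def sync(parent: ItemsStream, child: ItemDetailsStream) -> Iterator[Tuple[str, Dict]]:
    """Hypothetical driver: emit each parent record, then its child records."""
    for record in parent.get_records():
        yield parent.name, record
        child_context = parent.get_child_context(record)
        for child_record in child.get_records(child_context):
            yield child.name, child_record


emitted = list(sync(ItemsStream(), ItemDetailsStream()))
```

Each parent record produces one context dict, and the child stream makes one call per context, which matches the "use each value one by one to call another API" requirement from the original question.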