Hello to Everyone! I’m developing a data warehous...
# singer-tap-development
r
Hello everyone! I’m developing a data warehouse for Jira -> Vertica DB. This is my first ELT project ever. Since I’m here on this Slack, one could assume I’m using Meltano for that :). With tap-jira I can only set the start_date in the config file. I understand the regular daily loading... but how am I supposed to do the first extract, say from 2018 until today? I've seen that some taps also have a configurable end_date parameter, which allows one to extract in controllable chunks, or alternatively some batch size control option. How would one add this configurable parameter to tap-jira, or is there some workaround for that problem? There is a Vertica target, https://github.com/full360/pipelinewise-target-vertica, but it seems to be no longer operational: it is not listed as a supported PipelineWise target, and it does not install on my local system as an independent Singer target either! All help is much appreciated, guys!
p
You can probably add that target anyway by referencing the GitHub repository
You just need to set pip_url to
git+https://github.com/full360/pipelinewise-target-vertica.git
r
Thanks! I'll try it
p
Regarding tap-jira, I think it will try to fetch all the data in batches until it catches up, so the initial run will take a long time. You could fork it and implement an end_date to avoid that. Subsequent runs will pick up from the latest data you synced.
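For illustration, here is a minimal Python sketch of the "controllable chunks" idea: splitting the 2018-to-today backfill into fixed-size date windows that a forked tap could be run over one at a time. Note that `end_date` is a hypothetical setting here (tap-jira only exposes `start_date`, as noted above), and `date_windows` and the 90-day window size are assumptions made up for this example.

```python
from datetime import date, timedelta


def date_windows(start, end, days=90):
    """Yield (window_start, window_end) pairs covering [start, end).

    Each window spans at most `days` days; the last one may be shorter.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + timedelta(days=days), end)
        yield cursor, window_end
        cursor = window_end


# Example: one year of history in 90-day chunks.
windows = list(date_windows(date(2018, 1, 1), date(2019, 1, 1), days=90))
for window_start, window_end in windows:
    # `end_date` is hypothetical: it would only exist in a forked tap.
    print(f"start_date={window_start.isoformat()} end_date={window_end.isoformat()}")
```

Each window would correspond to one run of the forked tap, so a failed chunk can be retried without touching the others.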
r
How much work is it to implement an end_date?
Can you point me to a place in the tap dev docs where I could get started with the implementation? I'm not a very experienced developer yet 🙂
p
The tap is not using Meltano's SDK, so this project's dev docs won't help you here. This tap uses singer-tools (https://github.com/singer-io/singer-tools)
a
I would recommend just setting the start date to the 2018 date you want and working on replicating the data. There is no benefit to be gained from the dev work of adding an “end date”. That’s why the taps don’t have it to begin with. Think of it this way: the tap is going to pull your data serially and periodically emit STATE messages. If you stop it at any point, assuming you are using Meltano, it will just resume, since Meltano saves the state as long as it was emitted. It isn’t going to load many years of data into memory, nor will it have to restart from the beginning if you cancel the load after 30 minutes, for example.
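A rough Python sketch of the resume behaviour described above, under stated assumptions: the `RECORD` and `STATE` message shapes follow the Singer message format, but the `issues` stream name, the `updated` bookmark key, and the helper functions are invented for illustration and are not taken from tap-jira's code. The "runner" is simulated by keeping only the last `STATE` seen before the run is cut off.

```python
import json


def state_message(bookmark):
    """Build a Singer-style STATE message (bookmark layout is an assumption)."""
    return {"type": "STATE", "value": {"bookmarks": {"issues": {"updated": bookmark}}}}


def sync(records, state, emit, state_every=2):
    """Emit records newer than the saved bookmark, oldest first,
    with a STATE message after every `state_every` records."""
    bookmark = state.get("bookmarks", {}).get("issues", {}).get("updated", "")
    unsaved = 0
    for record in sorted(records, key=lambda r: r["updated"]):
        if record["updated"] <= bookmark:
            continue  # already covered by the saved bookmark
        emit({"type": "RECORD", "stream": "issues", "record": record})
        bookmark = record["updated"]
        unsaved += 1
        if unsaved == state_every:
            emit(state_message(bookmark))
            unsaved = 0
    if unsaved:
        emit(state_message(bookmark))


def last_state(messages):
    """Return the value of the last STATE message seen, as a runner would save it."""
    state = {}
    for message in messages:
        if message["type"] == "STATE":
            state = message["value"]
    return state


records = [{"id": i, "updated": f"2018-01-0{i}"} for i in range(1, 6)]

# First run, cancelled after four messages: RECORD, RECORD, STATE, RECORD.
first_run = []
sync(records, {}, first_run.append)
saved_state = last_state(first_run[:4])

# Second run resumes from the saved bookmark instead of from the beginning.
second_run = []
sync(records, saved_state, second_run.append)
resumed_ids = [m["record"]["id"] for m in second_run if m["type"] == "RECORD"]

print(json.dumps(saved_state))
print(resumed_ids)
```

In this sketch the record sent after the last saved `STATE` is sent again on the next run, so the target should be able to handle an occasional duplicate row.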
p
Idk about Jira, but some APIs rate limit you when you make too many requests
The auto-retry is not always good enough
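To make the auto-retry concern concrete, here is a generic Python sketch of retrying a rate-limited request with exponential backoff. It is not how tap-jira handles rate limits; `RateLimited`, `call_with_backoff`, and `flaky_request` are hypothetical names, and the fake request stands in for a real API call.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised when the API answers 'too many requests'."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `request()` and retry on RateLimited, doubling the wait each time.

    A server-provided `retry_after` takes precedence over the computed delay.
    The last failure is re-raised once `max_attempts` is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)


# Fake request that is rate limited twice, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RateLimited()
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = call_with_backoff(flaky_request, sleep=waits.append)
print(result, waits)
```

When the attempts run out, the exception propagates and the run stops; with state saved along the way, the next run can continue from the last bookmark.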
a
If the tap hasn’t already accounted for it, then letting the tap terminate is fine. The next run would pick up automatically with no added work.
The tap already accounts for rate limiting here. I would just focus on the Vertica side of things, but of course do what's best for you! My only point is that I think an end date would be useless