# getting-started
m
Anyone have advice on how to insert next_page_token into the record? In this case it is a date, and I would like to store that date somewhere in the record. I have already tried passing it into post_process and assigning it to row['current_date']. I have also tried messing with request_records and assigning paginator.current_value to a new context attribute, but that gives "'NoneType' object does not support item assignment" (not exactly sure what is in context anyway).
r
You were looking in the right place - generally, `post_process` is the best place to modify record data. From our previous conversation, am I correct in assuming you want the date from the URL as a record property? Unfortunately, `post_process` only provides access to the record `row` and `context`, as you have seen, and modifying the stream context directly is considered bad practice. You may have to do this in `parse_response`, where you have access to the response and can pull the date out of the request URL. Of course, you will then have to update your stream schema to pick up the new record property (e.g. `current_date`).
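A minimal sketch of that approach, written as a plain function so it runs without the Meltano SDK. It assumes the response body is CSV (the tap in this thread reads it with `csv.DictReader`) and that the date travels in a query parameter of the request URL; the parameter name `date` and the function name `parse_rows_with_request_date` are hypothetical. In a Singer SDK `RESTStream`, the same logic would sit inside an overridden `parse_response(self, response)`, using `response.request.url` and `response.text`.

```python
import csv
import io
from typing import Dict, Iterator, Optional
from urllib.parse import parse_qs, urlparse


def parse_rows_with_request_date(
    request_url: str, body: str, param: str = "date"
) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one record per CSV row, tagged with a date taken from the request URL.

    `param` is a hypothetical query-parameter name; adjust it to whatever
    the API actually uses to carry the date / page token.
    """
    query = parse_qs(urlparse(request_url).query)
    # parse_qs maps each key to a list of values; take the first one, or None.
    request_date = query.get(param, [None])[0]
    for row in csv.DictReader(io.StringIO(body)):
        # Build a new dict that keeps every CSV column and adds the new property,
        # rather than overwriting the row itself.
        yield {**row, "current_date": request_date}


# Inside a Singer SDK RESTStream (not executed here), roughly:
#     def parse_response(self, response):
#         yield from parse_rows_with_request_date(response.request.url, response.text)
```

Each yielded record keeps the original CSV columns, so the stream schema only needs the one extra `current_date` property.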
@edgar_ramirez_mondragon Is my summary of `context` correct here? As the docs suggest, I assume it is not a good idea to be arbitrarily setting properties in `context` (like you can in `click`, for example).
e
That's correct, changing the context impacts incremental replication. And you're right that `parse_response` is the right place to accomplish this.
m
Thank you guys, that is very helpful. I wasn't looking at parse_response as a possible solution; I will try messing with it to get what I want.
Okay, so I can self.logger.info the response.request.url, but I'm having trouble actually getting it into the record. I created a stream property for current_date, but when I try to iterate through the csv.DictReader object and assign the URL to current_date, it just empties the existing record. Sending the parse_response and output of meltano invoke tap-caiso - it's all metric and no record.
edit: I just saw that you already did it lol, thank you so much once again Reuben. One issue: it shows up in meltano invoke but not in the .jsonl stream files?
r
Did you add `current_date` to the `DemandStream` schema yet?
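For reference, declaring the extra property could look like the fragment below. It is a sketch that assumes the stream's schema is a plain JSON Schema dict (Singer SDK streams can also build one with `singer_sdk.typing`); apart from `Time` and `current_date`, the property names are hypothetical. The helper `undeclared_keys` (also hypothetical) lists record keys the schema does not declare, which is a quick way to spot a property that may be dropped when records are conformed to the schema.

```python
from typing import Any, Dict, Set

# Hypothetical DemandStream schema as a plain JSON Schema dict.
DEMAND_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "Time": {"type": ["string", "null"]},
        # Hypothetical metric column.
        "Demand": {"type": ["number", "null"]},
        # New property populated in parse_response / post_process.
        "current_date": {"type": ["string", "null"], "format": "date-time"},
    },
}


def undeclared_keys(record: Dict[str, Any], schema: Dict[str, Any]) -> Set[str]:
    """Return the record keys that are missing from the schema's properties."""
    return set(record) - set(schema.get("properties", {}))
```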
m
Yes
Wait a minute, will re-running meltano run tap-caiso target-jsonl overwrite the existing output .jsonl or just append to what was already there? Just answered my own question: I was looking at old records thinking it would overwrite. It works as it should, thank you.
r
Yeah, it just appends to the `.jsonl` file. I run `rm output/<stream>.jsonl; meltano run <tap> target-jsonl` a fair bit to get around that behaviour.
m
Trying to use pandas to localize and convert a datetime and getting ModuleNotFoundError: No module named 'pandas'. It's definitely installed: pip install pandas says requirement already satisfied, and pip show pandas gives no error.
r
I wouldn't have thought you need `pandas` just to localize a date... I think it is a fairly big package also, so adding it as a dependency is probably going to increase plugin install time significantly. As far as adding a dependency to the tap, the easiest way to do this is `poetry add <package>`. This adds entries in `pyproject.toml` and `poetry.lock`, as well as installing the package to the Poetry-managed virtual environment for the project (probably not where your `pip install` was targeting).
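When `pip show` finds a package but the import still fails, the two are usually looking at different environments. A small diagnostic like the one below (a sketch; `describe_environment` is a hypothetical name) reports which interpreter is running and whether it can see a given module. Running it from inside the tap's environment, e.g. with `poetry run python`, shows what the tap itself sees.

```python
import importlib.util
import sys


def describe_environment(module_name: str) -> dict:
    """Report the running interpreter and whether it can import `module_name`."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,   # interpreter actually executing this code
        "prefix": sys.prefix,       # root of the active (virtual) environment
        "module": module_name,
        "found": spec is not None,  # False means `import module_name` would fail
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    print(describe_environment("pandas"))
```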
m
Yeah, I guess it was a little ambitious. I wanted to get the date as UTC with the Pacific offset included and thought that was the easiest way to do it - I may be severely overcomplicating this.
r
I would have a read into the inbuilt `datetime` module in Python to see how you can leverage that in the way you want (no doubt it is possible) - from a quick search I see a library called `pytz` mentioned a fair amount, so maybe look at how that works with `datetime` for handling timezones. https://docs.python.org/3/library/datetime.html https://pypi.org/project/pytz/
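A minimal sketch of that idea using only the standard library: attach a timezone to a naive datetime, then convert it to UTC. The example passes a fixed `-07:00` offset; on Python 3.9+ you could pass `zoneinfo.ZoneInfo("America/Los_Angeles")` instead so daylight saving time is handled for you. If you go with `pytz`, attach the zone with `tz.localize(naive)` rather than `replace(tzinfo=...)`, since the latter picks up a historical local-mean-time offset. The function name `localize_to_utc` is hypothetical.

```python
import datetime


def localize_to_utc(naive: datetime.datetime, tz: datetime.tzinfo) -> datetime.datetime:
    """Interpret a naive datetime as local time in `tz` and convert it to UTC."""
    if naive.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    aware = naive.replace(tzinfo=tz)  # same wall-clock time, now offset-aware
    return aware.astimezone(datetime.timezone.utc)


pacific_daylight = datetime.timezone(datetime.timedelta(hours=-7))
local = datetime.datetime(2023, 8, 25, 15, 36)
print(localize_to_utc(local, pacific_daylight).isoformat())  # 2023-08-25T22:36:00+00:00
```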
Is it the `current_date` timestamp value in the record you are trying to apply the offset for?
m
Yes, trying to use the 'Time' field returned by the request to make a full datetime object, converting it to UTC with the offset and storing it in current_date.
r
>>> import datetime
>>> today = datetime.date.today()
>>> time = datetime.time.fromisoformat("15:36")
>>> str(today)
'2023-08-25'
>>> str(time)
'15:36:00'
>>> combined = datetime.datetime.combine(today, time)
>>> str(combined)
'2023-08-25 15:36:00'
Not sure about the offset bit though...
You can decide if you want to remove `Time` from the record in `post_process` (I'm assuming you're already in `post_process` for the conversion), since it will be unneeded once `current_date` carries the translated time component.
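Putting those steps together, a `post_process`-style step could combine the date with the row's `Time` value, convert to UTC, and drop `Time`. This is a sketch that assumes `current_date` already holds an ISO date string (e.g. as set in `parse_response`), that `Time` looks like `HH:MM`, and that a fixed `-07:00` offset is acceptable; `add_utc_timestamp` and `PACIFIC` are hypothetical names. In a Singer SDK stream the body would live in `post_process(self, row, context)`, which returns the modified row.

```python
import datetime
from typing import Any, Dict

# Assumption: a fixed Pacific Daylight Time offset, as in the thread.
# A zoneinfo.ZoneInfo("America/Los_Angeles") object would track DST instead.
PACIFIC = datetime.timezone(datetime.timedelta(hours=-7))


def add_utc_timestamp(row: Dict[str, Any], tz: datetime.tzinfo = PACIFIC) -> Dict[str, Any]:
    """Merge `current_date` (a date) and `Time` into one UTC ISO-8601 timestamp."""
    row = dict(row)  # work on a copy so the caller's dict is left untouched
    day = datetime.date.fromisoformat(row["current_date"])
    time = datetime.time.fromisoformat(row.pop("Time"))  # drop the now-redundant field
    local = datetime.datetime.combine(day, time, tzinfo=tz)
    row["current_date"] = local.astimezone(datetime.timezone.utc).isoformat()
    return row
```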
>>> str(combined.replace(tzinfo=datetime.timezone(datetime.timedelta(hours=-7))))
'2023-08-25 15:36:00-07:00'
🤔