# troubleshooting
m
Hello, I’m not sure what’s happening with a tap I worked on. An empty json record is returned for the meltano.yml configuration under
actions
- set
tickers: [ "AYO.F" ]
. I believe my code handles empty data by yielding an empty record with the required properties. When I run
meltano invoke tap-yfinance
the tap works properly and you can see
actions
ran fine (no errors and contains the required properties). However, when running
meltano el tap-yfinance target-jsonl --select actions
it fails with
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
By stepping through in debug mode (I’m using PyCharm), I was able to narrow it down. This version breaks:
def get_records(self, context: dict | None) -> Iterable[dict]:
        logging.info(f"\n\n\n*** Running ticker {context['ticker']} *** \n\n\n")
        state = self.get_context_state(context)

        financial_tap = FinancialTap(schema=self.schema, ticker=context["ticker"], config=self.config, name=self.name)
        df = getattr(financial_tap, self.method_name)(ticker=context["ticker"]) # returns empty df

        yield {"timestamp": "2021-01-01 00:00:00"}
This version works fine:
def get_records(self, context: dict | None) -> Iterable[dict]:
    logging.info(f"\n\n\n*** Running ticker {context['ticker']} *** \n\n\n")
    state = self.get_context_state(context)

    financial_tap = FinancialTap(schema=self.schema, ticker=context["ticker"], config=self.config, name=self.name)
    # df = getattr(financial_tap, self.method_name)(ticker=context["ticker"])

    yield {"timestamp": "2021-01-01 00:00:00"}
In debug mode,
df = getattr(<>)
returns an empty df as expected, and going through the lines of code it yields
{"timestamp": "2021-01-01 00:00:00"}
as expected. I’m really stuck now, because I’m not sure why a line whose result is never used breaks the run, and only when calling
meltano el
tldr;
meltano el
breaks, but
meltano invoke
works. Error is
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
but the data is valid JSON.
meltano el
breaks because of a line of code whose result isn’t even used
e
simplejson.scanner.JSONDecodeError
makes me think it's an error coming from a call to
simplejson.load
, and that is not done anywhere in the SDK afaict. So, it's probably coming from
target-jsonl
or upstream from it in the singer-python library (maybe here?), that's why
meltano invoke
works but
meltano el
does not. I'd inspect the output of
meltano invoke
line by line to see if there's something that's not valid JSON and is causing the target to crash.
@matt_elgazar did you figure it out?
m
Hey Edgar! I was writing a message, but when I retried it I somehow couldn’t reproduce the error 🤔 - I will try to reproduce it within a Docker container and see if I get the same error. I thought it was jsonl as well, but I don’t think that’s the case. If it is the case, that’s also bad, because then other loaders will likely break too. I dug into this a bit more over the weekend and I think it has something to do with the update of the
yfinance
library. My hypothesis is that they integrated some sort of
logging
change where if the ticker returns an empty dataset it calls
logging.error
or something somewhere, but I haven’t validated that. I’ll have to dig into it further, because when I downgrade to yfinance version
0.38
then meltano works fine, but
0.40
breaks
meltano el
. I’m not sure why that is, since the command doesn’t necessarily return an error per se.
e
My suspicion is that the library is printing something to stdout. Maybe that's what changed between releases. I'd still suggest looking at the tap output with the newer library version, maybe doing something like
meltano invoke tap-yfinance > inspect.me.jsonl
and searching the file for a line that's not valid json.
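The search suggested above can be sketched in a few lines of Python. This is only an illustration: `find_invalid_json_lines` is a hypothetical helper name, not part of Meltano or the SDK, and it only checks that each line parses as JSON, not that it is a well-formed Singer message.

```python
import json
from typing import Iterable, Iterator


def find_invalid_json_lines(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (line_number, text) for each non-empty line that is not valid JSON."""
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # skip blank lines, they carry no message
        try:
            json.loads(text)
        except json.JSONDecodeError:
            yield number, text


# Example with in-memory lines; for a real run, pass an open file handle,
# e.g. open("inspect.me.jsonl", encoding="utf-8").
sample = [
    '{"type": "RECORD", "stream": "actions", "record": {}}\n',
    "AYO.F: No price data found, symbol may be delisted\n",
    "\n",
]
bad_lines = list(find_invalid_json_lines(sample))
```

Any line the helper reports is a candidate for whatever is crashing the target.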
m
Will try that, and makes sense that they may have written something to
print
to stdout!
why does meltano fail when
print
is called?
e
The Singer spec, which Meltano's extractors and loaders use, relies on stdout to send data from source to destination. Each line of the output is expected to be valid JSON and a valid Singer message.
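To make that concrete, here is a rough simulation of a target reading a tap's stdout when a library has printed a stray message between two Singer messages. The stream contents are made up for illustration (the stray line is a shortened form of the kind of message yfinance prints for a delisted symbol), and `parse_like_a_target` is a hypothetical stand-in, not the actual target-jsonl code.

```python
import json

# Simulated tap stdout: two Singer messages with a stray library message in between.
tap_stdout = [
    '{"type": "SCHEMA", "stream": "actions", "schema": {"properties": {}}, "key_properties": []}',
    "AYO.F: No price data found, symbol may be delisted",
    '{"type": "RECORD", "stream": "actions", "record": {"timestamp": "2021-01-01 00:00:00"}}',
]


def parse_like_a_target(lines):
    """Parse every line as JSON, roughly the way a target consumes tap output."""
    messages = []
    for line in lines:
        messages.append(json.loads(line))  # raises on the stray, non-JSON line
    return messages


try:
    parse_like_a_target(tap_stdout)
    error_text = None
except json.JSONDecodeError as exc:
    error_text = str(exc)

print(error_text)  # Expecting value: line 1 column 1 (char 0)
```

The message matches the one in the traceback: the parser fails at the very first character of the stray line, because plain text is not a JSON value.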
m
ahh interesting, so that could be an issue. Take a look at this:
import yfinance as yf

yf.Ticker('XYZKALSFJKSDLF1231293098F').actions
XYZKALSFJKSDLF1231293098F: No timezone found, symbol may be delisted
Series([], dtype: object)
^ Of course that’s an invalid ticker. Now if we look at a ticker that used to be valid:
yf.Ticker('AYO.F').actions
AYO.F: No price data found, symbol may be delisted (1d 1925-06-21 -> 2024-05-28)
Series([], dtype: object)
^ this output may be the issue, because it’s not valid JSON
By setting the following in meltano.yml (say, under
actions
) you may be able to reproduce it:
tickers: ["AYO.F"]
e
m
In the tap I’m testing, I’m not calling
price_history_wide
so it doesn’t use
pdr
It might be here though! https://github.com/ranaroussi/yfinance/blob/930b305327e2e3769b5d62115b3ab25bc58f28de/yfinance/utils.py#L58-L62
calling the
actions
tap calls
FinancialStream
(a base stream) which calls
ActionsStream
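If the library really is printing to stdout, one possible workaround is to redirect stdout to stderr around the library call, so only Singer messages reach the target. The sketch below assumes the message is written with `print` (or `sys.stdout`) at call time; it would not help if the library logs through a handler that captured stdout earlier. `noisy_library_call` and `call_quietly` are hypothetical names used for illustration only.

```python
import contextlib
import io
import sys


def noisy_library_call():
    """Hypothetical stand-in for a library call that prints a warning to stdout."""
    print("AYO.F: No price data found, symbol may be delisted")
    return []


def call_quietly(func, *args, **kwargs):
    """Run func with anything it writes to stdout redirected to stderr."""
    with contextlib.redirect_stdout(sys.stderr):
        return func(*args, **kwargs)


# Capture stdout to confirm that nothing from the wrapped call leaks into it.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = call_quietly(noisy_library_call)
```

In the tap, the same idea would mean wrapping the `getattr(financial_tap, self.method_name)(...)` call inside `get_records`, while the `yield` stays outside the redirect so records still go to stdout.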