# troubleshooting
s
Hello everyone, I am working on my own ETL pipeline and have developed a custom tap and target. When I run the pipeline through the CLI without an orchestrator (as specified by Singer), it works as desired. However, I want to streamline the process and use Meltano for it in the future, so I set it up and tried to run my pipeline. When doing so, I get an error:
```
jsonschema.exceptions.ValidationError: '-6667.0' is not of type 'null', 'number'
```
I debugged my tap code and confirmed that it works as intended and prints the values as floats. When I run my tap (without Meltano) into target-csv, I also get correct values and data types. When running it through Meltano, however, the field's data type gets converted into a string at some point, and validation against the specified schema fails.
```json
"subt_nat_amount": {
    "type": ["null", "number"],
    "format": "singer.decimal"
}
```
I’ve experienced the same problem with the open source singer-runner. I’m not sure why and where this conversion is introduced, or how to fix it. I guess it doesn’t happen in my tap code, since the values print correctly to the CLI and to CSV when I run it manually. Thanks, Steven
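To make the failure concrete: the schema allows `null` or a JSON number, and a JSON string such as `"-6667.0"` is neither, even though it looks numeric. The sketch below is a simplified stand-in for the JSON Schema `type` keyword using only the standard library (it is not the `jsonschema` library itself, and it ignores the `"format": "singer.decimal"` annotation). It shows why the quoted value fails while the bare number passes.

```python
import json

# The property schema quoted above.
SCHEMA = {"type": ["null", "number"], "format": "singer.decimal"}

# Simplified stand-in for the JSON Schema "type" keyword (NOT the jsonschema library):
# a JSON "number" is any int/float except bool; "null" is None.
JSON_TYPE_CHECKS = {
    "null": lambda v: v is None,
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}

def matches_type(value, schema):
    """Return True if value satisfies at least one type listed in schema["type"]."""
    types = schema["type"]
    if isinstance(types, str):
        types = [types]
    return any(JSON_TYPE_CHECKS[t](value) for t in types)

# The first payload mirrors the failing run, the second the working one.
bad = json.loads('{"subt_nat_amount": "-6667.0"}')["subt_nat_amount"]
good = json.loads('{"subt_nat_amount": -6667.0}')["subt_nat_amount"]

print(repr(bad), matches_type(bad, SCHEMA))    # '-6667.0' False
print(repr(good), matches_type(good, SCHEMA))  # -6667.0 True
```

In other words, the target is right to reject the record; the question is where the string comes from.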
v
I'd be curious whether running `meltano invoke tap-yourtap > output` gives you the value in the correct form or not. I'd assume it wouldn't, but if it differs from just running `tap-yourtap` directly, that gives a good place to start troubleshooting from.
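Once both captures exist, comparing them by hand can be tedious. A minimal sketch, assuming each file holds one Singer JSON message per line (Singer taps should log to stderr, so the redirected stdout should contain only messages): it records which Python types each `RECORD` field has in each run and reports the fields where the two runs disagree. The stream name `report` in the demo is hypothetical.

```python
import json
from collections import defaultdict

def field_types(lines):
    """Map (stream, field) -> set of Python type names seen in RECORD messages."""
    seen = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue
        msg = json.loads(line)
        if msg.get("type") != "RECORD":
            continue  # SCHEMA, STATE, etc. carry no record values
        for field, value in msg["record"].items():
            seen[(msg["stream"], field)].add(type(value).__name__)
    return seen

def diff_field_types(lines_a, lines_b):
    """Return {(stream, field): (types_in_a, types_in_b)} for fields whose types differ."""
    a, b = field_types(lines_a), field_types(lines_b)
    return {
        key: (sorted(a.get(key, ())), sorted(b.get(key, ())))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }

# Made-up single-record captures mirroring the values quoted in this thread.
meltano_run = ['{"type": "RECORD", "stream": "report", "record": {"subt_nat_amount": "10000.0"}}']
direct_run = ['{"type": "RECORD", "stream": "report", "record": {"subt_nat_amount": 49.99}}']
print(diff_field_types(meltano_run, direct_run))
# {('report', 'subt_nat_amount'): (['str'], ['float'])}
```

With real captures, pass open file objects instead of lists, e.g. `diff_field_types(open("output_meltano"), open("output_direct"))` (file names hypothetical). An empty result means both runs emit the same types.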
d
Also try running `meltano --log-level=debug elt <tap> <target>` (https://meltano.com/docs/command-line-interface.html#debugging), which will print the tap's output lines with a `<tap> (out)` prefix. Do those `RECORD` messages have the number as expected?
s
@visch, thanks a lot for your response and for helping me. I ran your command and checked the values in the output file. They are also strings rather than floats:
`"subt_nat_amount": "10000.0"`
Just to double-check, I invoked my tap directly again, and it produces the desired result:
`"subt_nat_amount": 49.99`
Any idea how I could approach further debugging?
@douwe_maan, thanks a lot for your response and help as well. Running in debug mode also shows the wrong (string) values in the `(out)` messages, just before crashing with:
```
ERROR Loading failed (1):   '-6667.0'
DEBUG ELT could not be completed: Loader failed
```
v
OK, so unless Meltano is messing with your output (which I doubt, though it's of course technically possible), maybe you're running different versions of the tap/target in Meltano than when you run the taps/targets directly?
So the output from invoking the tap through Meltano clearly shows that you're getting data you don't expect. Now it's just a matter of figuring out why. The tap version could be it: you could isolate that by running the tap directly from the same venv that Meltano is using. That way you bypass Meltano completely but still run the exact same tap, which tells you whether Meltano itself is involved. If running the tap directly from the same install Meltano uses gives different data, the next step is to make sure the config settings you pass in via Meltano are the same as the ones you use when running directly. These are just my guesses and how I'd work through this, but maybe there are better ways. Personally, I haven't had an issue with Meltano changing data on me.
s
@visch, running the tap that was installed into Meltano was a very good idea. I ran the extractor installed at `.meltano/extractors/tap-quickbooks-reports/venv/bin/tap-quickbooks-reports` directly, passing in the same config file I used when running my tap on its own. And, indeed, I get a string printed rather than a float! So it seems Meltano is not introducing the problem. However, I am using the very same git repository, on the most recent version and the correct branch, with the same Python version in both venvs (managed through pyenv). This seems very weird to me.
v
Yeah, that is weird, but at least we're closer.
Maybe run `git log -1` in each directory to see if you're on the same version?
s
I just did a fresh install of my tap, and there too the values are printed as strings. Hence, it's definitely not a Meltano issue. I checked `git log -1`, and I am on the same version in each directory.
Could it be a problem with any of the installed dependencies?
v
It could still be a lot of things, but it's isolated, which is good. From here it's really about what's different between the tap run from the Meltano venv and your other tap run. My guess, at least, is that the most likely cause is differences in environment variables. The config at https://github.com/goodeggs/tap-quickbooks-report/blob/master/tests/data/test.config.json has an environment setting; maybe production and sandbox data differ?
Invoking the tap in debug mode prints your environment variables, and `meltano config --format=env <plugin>` does too. I wonder what happens if you use those environment variables directly in your working setup.
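To act on that, you could diff the two sets of environment variables instead of eyeballing them. A minimal sketch, assuming both dumps are plain `KEY=VALUE` lines (values optionally quoted), which is what I'd expect from `meltano config --format=env <plugin>`; for the working setup you could dump the shell's environment the same way or compare against `dict(os.environ)`. The `TAP_YOURTAP_*` names are hypothetical placeholders.

```python
def parse_env_dump(text):
    """Parse assumed dotenv-style KEY=VALUE lines into a dict (skips blanks and # comments)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop matching surrounding quotes
        env[key.strip()] = value
    return env

def diff_env(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys missing from one side or differing."""
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}

# Hypothetical dumps: one from the Meltano side, one from the working setup.
meltano_env = parse_env_dump('TAP_YOURTAP_ENVIRONMENT="sandbox"\nTAP_YOURTAP_START_DATE=2021-01-01\n')
direct_env = parse_env_dump("TAP_YOURTAP_ENVIRONMENT=production\nTAP_YOURTAP_START_DATE=2021-01-01\n")
print(diff_env(meltano_env, direct_env))
# {'TAP_YOURTAP_ENVIRONMENT': ('sandbox', 'production')}
```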
Dependencies are low on my list of guesses, but of course it can always be anything :D. I constantly think about ways Meltano could make finding problems easier, but normally it turns out to be something that's pretty clearly my fault.
In this case, something like a run-diff tool would be interesting, but I don't know. The other thing I wonder is how you know that all of your records are outputting incorrectly formatted data. Are you sure it's not just a few records that are failing? Those are my best guesses. I can't hop on a session today, but maybe Monday or Tuesday we could do a debug session. The Meltano folks may have some other ideas too when they hop on.
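One way to answer whether every record or only a few are affected is to count, per field, how many `RECORD` values are strings although the stream's `SCHEMA` message declares the field as `number`. A minimal sketch, assuming the tap's stdout was captured to a file with one Singer message per line; the stream name `report` and the sample messages are made up for illustration.

```python
import json
from collections import Counter

def _declared_types(spec):
    """Normalize a property's "type" (a string or a list) to a list."""
    t = spec.get("type", [])
    return t if isinstance(t, list) else [t]

def count_string_numbers(lines):
    """Per (stream, field): (string values, total values) for fields declared as "number"."""
    number_fields = {}  # stream -> names of properties whose type includes "number"
    bad, total = Counter(), Counter()
    for line in lines:
        if not line.strip():
            continue
        msg = json.loads(line)
        if msg.get("type") == "SCHEMA":
            props = msg["schema"].get("properties", {})
            number_fields[msg["stream"]] = {
                name for name, spec in props.items() if "number" in _declared_types(spec)
            }
        elif msg.get("type") == "RECORD":
            stream, record = msg["stream"], msg["record"]
            for field in number_fields.get(stream, ()):
                if field in record:
                    total[(stream, field)] += 1
                    if isinstance(record[field], str):
                        bad[(stream, field)] += 1
    return {key: (bad[key], total[key]) for key in total}

sample = [
    '{"type": "SCHEMA", "stream": "report", "schema": {"properties": '
    '{"subt_nat_amount": {"type": ["null", "number"]}}}, "key_properties": []}',
    '{"type": "RECORD", "stream": "report", "record": {"subt_nat_amount": 49.99}}',
    '{"type": "RECORD", "stream": "report", "record": {"subt_nat_amount": "-6667.0"}}',
]
print(count_string_numbers(sample))  # {('report', 'subt_nat_amount'): (1, 2)}
```

If the first number equals the second for every field, the whole run is affected; a small first number points at specific records instead.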
s
Really appreciate your effort on this, man! To clarify, I did a fresh install of my tap completely without Meltano, just like my original installation. So everything should be the same except the dependencies. I thought about running `pip freeze` on the original installation and then doing another install (without Meltano) from that generated requirements file. If that install works correctly, I guess it must be about the dependencies; if not, something else is different. Does this logic make sense? I mean, I can’t imagine what else could be wrong if the code and the dependencies are the same. 🤷‍♂️ I’ll keep you posted and investigate further. 👍
v
`pip freeze` isn't a bad idea; if it's the dependencies, that should catch it! My guess is still that the config variables are different, producing a different result. Another thing to check would be the actual data coming through, which could differ due to caching strategies on the QuickBooks side.
Data being different is a long shot, but as they say in IT, "it's always DNS."
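The `pip freeze` comparison can be automated. Capture `pip freeze` from each venv (for the Meltano install, presumably via the `pip` inside `.meltano/extractors/tap-quickbooks-reports/venv/bin/`) and diff the two outputs. A minimal sketch; package names are normalized as in PEP 503 so that `Example_Lib` and `example-lib` compare equal, and the packages and versions in the demo are invented.

```python
import re

def parse_freeze(text):
    """Parse `pip freeze` output into {normalized_name: requirement_line}."""
    pkgs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-")):
            continue  # skip comments and options such as -e / -i
        # The name is everything before the first version operator, URL marker, or extra.
        name = re.split(r"[=<>!~ @;\[]", line, maxsplit=1)[0]
        pkgs[re.sub(r"[-_.]+", "-", name).lower()] = line
    return pkgs

def diff_freeze(a_text, b_text):
    """Return {name: (line_in_a, line_in_b)} for packages that are missing or pinned differently."""
    a, b = parse_freeze(a_text), parse_freeze(b_text)
    return {n: (a.get(n), b.get(n)) for n in sorted(set(a) | set(b)) if a.get(n) != b.get(n)}

# Invented freeze outputs for the original venv and the Meltano venv.
original = "example-lib==1.0.0\nrequests==2.25.1\n"
meltano = "Example_Lib==2.0.0\nrequests==2.25.1\n"
print(diff_freeze(original, meltano))
# {'example-lib': ('example-lib==1.0.0', 'Example_Lib==2.0.0')}
```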
s
@visch, I just figured out what caused the issue: dependencies. 🤦‍♂️ I reinstalled my tap into another directory using the requirements file from the original installation, and suddenly it works. 🥳 So I carefully specified the dependency versions, reinstalled the tap into Meltano, and there we go! Thanks so much for your help figuring this out!
v
Weird! With venvs you'd hope they'd be isolated. Was it a global dependency or something? Glad you figured it out. I wonder how Meltano could have helped you find that problem, or whether it could give you some kind of warning.
s
Yes, I always install things into separate venvs, and I also maintain the requirement entries in my setup files. I just hadn’t specified versions for some packages, and, apparently, something changed somewhere…
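Pinning exact versions taken from the known-good `pip freeze` output (for example `"example-lib==1.0.0"` in `install_requires`) is what resolved it here. As an optional extra guard, a tap could check at startup, or in CI, that the installed versions still match those pins. A minimal sketch using `importlib.metadata` from the standard library; `example-lib`, `other-lib`, and the versions are hypothetical, and `get_version` is injectable only so the check can be demonstrated without those packages installed.

```python
from importlib import metadata

def check_pins(expected, get_version=metadata.version):
    """Return human-readable mismatches between installed and expected package versions."""
    problems = []
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, expected {wanted}")
    return problems

def fake_version(name):
    """Hypothetical stand-in for metadata.version, used only for this demo."""
    versions = {"example-lib": "2.0.0"}
    if name not in versions:
        raise metadata.PackageNotFoundError(name)
    return versions[name]

print(check_pins({"example-lib": "1.0.0", "other-lib": "0.1"}, fake_version))
# ['example-lib: installed 2.0.0, expected 1.0.0', 'other-lib: not installed (expected 0.1)']
```

In a real tap you would call `check_pins(PINS)` with the default `metadata.version` and log or fail on any returned mismatch.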