# troubleshooting
j
After a few weeks I ran a pipeline locally again, and it fails:
(.venv) jacek@holly:~/work/src/panther-internal-analytics$ make extract_load 
cd "data_pipeline" && meltano --environment $ELT_ENVIRONMENT run tap-salesforce $MELTANO_TARGET
Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to
join our friendly Slack community.

Emitter.__init__() got an unexpected keyword argument 'request_timeout'
make: *** [Makefile:25: extract_load] Error 1
I deleted the virtualenv and .meltano and reinstalled everything from scratch, but it did not help. Has anyone hit this issue?
w
@edgar_ramirez_mondragon I believe we saw this recently with `tap-jaffle-shop-template`.
@jan_soubusta Could you please run `pip freeze` and provide us with the output?
j
pip freeze result
w
You're using `snowplow-tracker==0.10.0`, which is surprising since the `Emitter` class for that version has the keyword argument `request_timeout`: https://github.com/snowplow/snowplow-python-tracker/blob/0.10.0/snowplow_tracker/emitters.py#L61
What version of Meltano are you using @jan_soubusta?
Have you installed anything else into the virtualenv with Meltano? You have about 30 more dependencies than I would expect from a clean installation.
j
Meltano 2.15.3. To simplify onboarding, devs install meltano, dbt and some other libs into the same virtual environment.
That is only done locally; in the pipeline I build separate images.
w
Perhaps we could get some useful information by running with `--log-level=debug`. Another thing to check is what the output of the following is:
python -c 'import inspect, snowplow_tracker; print(snowplow_tracker.__version__); print(inspect.signature(snowplow_tracker.emitters.Emitter))'
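The one-liner above prints the signature for a human to read. As a minimal sketch, the same `inspect.signature` check can be turned into a yes/no answer for one keyword. The two functions below are hypothetical stand-ins, not the real `Emitter` signatures of any `snowplow-tracker` release; in practice you would pass `snowplow_tracker.emitters.Emitter` instead.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with `name` as a keyword argument."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins (assumptions for illustration only).
def emitter_without_timeout(endpoint, protocol="http", buffer_size=None):
    pass


def emitter_with_timeout(endpoint, protocol="http", request_timeout=None):
    pass


print(accepts_keyword(emitter_without_timeout, "request_timeout"))  # False
print(accepts_keyword(emitter_with_timeout, "request_timeout"))  # True
```

If the check returns False for the class that is actually imported, the installed module is not the one the caller was written against, whatever `pip freeze` reports.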
u
"devs install meltano, dbt and some other libs into the same virtual environment"
If you change the installation order that might fix it, i.e. installing meltano after dbt and other libs. I suspect dbt’s snowplow tracker requirement is the offender.
w
I figured it might be something like that, but the `pip freeze` results suggest that a valid version of `snowplow-tracker` is installed, so now I'm not sure.
j
Here is the result of the execution with `--log-level=debug`.
Looks like core/tracker.py is responsible; not sure what the purpose of this tracker is.
Ah, I see:
"""Meltano tracker backed by Snowplow."""
(.venv) jacek@holly:~/work/src/panther-internal-analytics/data_pipeline$ python -c 'import inspect, snowplow_tracker; print(snowplow_tracker.__version__); print(inspect.signature(snowplow_tracker.emitters.Emitter))'
0.0.2
(endpoint, protocol='http', port=None, method='get', buffer_size=None, on_success=None, on_failure=None, byte_limit=None)
(.venv) jacek@holly:~/work/src/panther-internal-analytics/data_pipeline$ pip list | grep snowplow
minimal-snowplow-tracker   0.0.2
snowplow-tracker           0.10.0
So minimal-snowplow-tracker is the one that actually gets imported in this case, if I understand it correctly.
`pip install --upgrade snowplow-tracker` helps. It upgrades from 0.10.0 to 0.15.0. Will set the version in my requirements file.
Eh, now dbt does not work:
TypeError: Emitter.__init__() got an unexpected keyword argument 'buffer_size'
I am so sad 😉
OK, now I am totally confused. I installed Meltano 2.19 and, as part of the installation, the old snowplow-tracker==0.10.0 was installed. Now both meltano and dbt work properly.
w
We recommend installing Meltano into its own virtual environment to avoid dependency conflicts like this. Would pipx be an option for you @jan_soubusta? Anything installed with it will automatically get its own virtual environment.
j
If I understand the docs correctly (and also an explanation from ChatGPT), I would still need to switch between such venvs using deactivate/activate commands, which would be sort of annoying...
w
That is not correct. When using `pipx`, the installed packages become globally available. If I run `pipx install meltano`, then I can run `meltano` from within any venv (or outside of one) so long as there is no `PATH` conflict (e.g. from also installing `meltano` within a venv).
j
Let's say I need to install meltano and dbt and iterate quickly with the commands `meltano run ...` and `dbt run`. If I run `pipx install meltano` and `pipx install dbt`, both (including conflicting dependencies) will be installed into the same global scope, right? So the problem persists, right?
u
@jan_soubusta No, by installing with pipx, Meltano and dbt will each be installed in its own virtualenv (so no dependency conflicts) and their corresponding executable added to PATH, so you'll be able to call `meltano run` and `dbt run` outside any virtualenv (so no activation required).
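To check for the `PATH` conflict mentioned earlier, one option is to see which executable each command name resolves to. This is a minimal sketch using only the standard library; `resolve_commands` is a name made up for this example, and where pipx places its executables depends on your pipx configuration.

```python
import shutil
import sys


def resolve_commands(names):
    """Map each command name to the executable found on PATH, or None."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # A path under the active environment's prefix means a venv-installed
    # copy is found first and shadows any pipx-installed one.
    print("active prefix:", sys.prefix)
    for name, path in resolve_commands(["meltano", "dbt"]).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Running this once inside the project venv and once outside it shows whether both commands resolve to the same executables in each case.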
j
Wow, that sounds really great! What a fool I have been! 😉
n
@edgar_ramirez_mondragon One question I’ve received from colleagues is whether pipx can lead to conflicts in Python versioning. Say, for example, the pipx-installed meltano lives in a pyenv with Python 3.9 but I’m developing a meltano pipeline with Python 3.10. What’s the best way to handle the potential pyenv conflicts?
u
@niall_keleher You can specify the Python version you want the venv to be created with by using the `--python` option. I actually use both pyenv and pipx and have a few Meltano versions installed side-by-side:
$ pipx list | grep meltano@cp
   package meltano 2.19.0 (meltano@cp310), installed using Python 3.10.8
    - meltano@cp310
   package meltano 2.19.0 (meltano@cp37), installed using Python 3.7.16
    - meltano@cp37
   package meltano 2.19.0 (meltano@cp38), installed using Python 3.8.14
    - meltano@cp38
   package meltano 2.19.0 (meltano@cp39), installed using Python 3.9.16
    - meltano@cp39
n
perfect! thanks for the tip @edgar_ramirez_mondragon!
j
Getting back to this issue. `pipx` does not provide a `-r requirements.txt` option. I cannot install each package separately. I can no longer use the workaround with two venvs: I want to use Dagster to orchestrate both dbt and meltano from the same (v)env. Thinking about how to solve it...
btw the issue with `snowplow-tracker` still exists, it just manifests differently 😉
Meanwhile I upgraded both Meltano (3.1) and dbt (1.5), but it does not help.
e
@jan_soubusta Can you log an issue in the Meltano issue tracker to avoid initializing the tracker when usage stats are disabled? That may help fix this issue for folks that have no option but to share a venv with dbt.
j
https://github.com/meltano/meltano/issues/8256 Is there a chance that it will be delivered any time soon?
Could I implement a solution myself? If anyone could point me to the correct place in the code, I can offer my time and Python knowledge 😉
e
j
Cool! It's 10:30pm here in Prague. Tomorrow we have a company offsite. If no one else prepares a PR by Friday, I will try my best and create one.
j
I think I'm running into the same issue, Meltano started failing for me 5 days ago with this error:
ImportError: cannot import name 'SelfDescribing' from 'snowplow_tracker' (/Users/jacob/.pyenv/versions/3.10.11/envs/peach-meltano/lib/python3.10/site-packages/snowplow_tracker/__init__.py)
Reverting from `meltano==2.20.0` to `meltano==2.19` in my requirements.txt file "fixed" the issue.
e
The issue is accepting PRs. A community member volunteered but I have not heard back so removed their assignment, if anyone's interested in picking it up 🙂
j
Eh, sad. I will try to find time for this and develop a solution.
Commented in the issue as well.
Setting up meltano/meltano based on the CONTRIBUTING doc. When installing the pre-commit hooks, it fails with:
npm ERR! Cannot read properties of undefined (reading 'isStream')
node v18.17.1 npm 9.6.7 Python 3.10.12
j
thanks @jan_deelstra!
e
@jan_soubusta Thanks for the interest in contributing! It's probably the `eslint` hook. FWIW I got npm 9.8.1, so that's probably it: https://github.com/npm/cli/issues/6665#issuecomment-1645728379
j
How do I execute the tests locally?
Can't find it in README/CONTRIBUTING/...
OK, Google helped:
poetry run pytest