# plugins-general
j
Hi everyone, I'm working on updating Superset to v2.0.2, and for whatever reason Meltano seems to be installing Superset inside Python 3.10, which is not compatible with Superset. My Docker container is built on a 3.9 image, but for some reason only Superset seems to be trying to install inside 3.10?? Am I going crazy?
relevant error:
Utility 'superset' could not be installed: failed to install plugin 'superset'.
  error: subprocess-exited-with-error
  
  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [944 lines of output]
      Ignoring numpy: markers 'python_version < "3.9"' don't match your environment
relevant superset config:
utilities:
    - name: superset
      variant: apache
      pip_url: apache-superset>=2.0.0 markupsafe==2.0.1 Werkzeug==2.0.3 WTForms==2.3.0 duckdb-engine==0.6.4 cryptography==3.4.7
      config:
        ENABLE_PROXY_FIX: true
e
It seems that numpy (transitive dep of superset) requires Python <= 3.8, but you’re on 3.9
(you could try meltano install --force to ignore the python version marker, but I don’t know if that will then cause pip to try building the numpy binary 🤔)
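A note on the "Ignoring numpy: markers ... don't match your environment" line in the log: pip prints it when it skips a requirement whose environment marker evaluates to false, so on its own it is informational rather than the failure. The sketch below evaluates only the simple python_version form of such a marker, using tuple comparison. The function name is hypothetical, and real tools use a full marker parser instead of this shortcut.

```python
import sys


def python_version_marker_matches(op, bound, version=None):
    """Evaluate a marker of the form: python_version <op> "<bound>".

    Hypothetical helper for illustration; it handles only major.minor bounds.
    """
    if version is None:
        version = sys.version_info[:2]
    current = tuple(version[:2])
    bound_tuple = tuple(int(part) for part in bound.split("."))
    comparisons = {
        "<": current < bound_tuple,
        "<=": current <= bound_tuple,
        ">": current > bound_tuple,
        ">=": current >= bound_tuple,
        "==": current == bound_tuple,
        "!=": current != bound_tuple,
    }
    return comparisons[op]


if __name__ == "__main__":
    # The marker from the log: python_version < "3.9"
    matches = python_version_marker_matches("<", "3.9")
    print("requirement applies" if matches else "requirement is ignored")
```

On Python 3.9 or 3.10 the marker is false, so that particular numpy requirement line is skipped and a different one applies.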
j
ok super odd bc I get this error down below in the logs:
setup.py:67: RuntimeWarning: NumPy 1.19.4 may not yet support Python 3.10.
c
Superset 2.0.2 runs perfectly fine on Python 3.10
NumPy 1.19.4
That sounds like an incorrect numpy version ...
e
Right, one of the other deps (e.g. duckdb-engine) might require that older numpy
j
ok that does make sense
let me check duckdb engine etc and see if there are conflicts
c
duckdb-engine is pinned to an unsupported SQLAlchemy as well ..
e
duckdb-engine is liberal with numpy itself (*) but some other dep is causing numpy to resolve to 1.19.5: https://github.com/Mause/duckdb_engine/blob/f4f277cdccd10f78e3a2621a47cfc8ae95ec881f/poetry.lock#L322-L328
j
ok updating duckdb-engine to latest release - 0.6.6
i'm not quite sure how to page through these dependencies and figure out where numpy is coming from tho
c
Well if you can somehow manage to install a working installation, you could run pipdeptree to analyse the dependency chain https://github.com/tox-dev/pipdeptree
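If installing pipdeptree into the environment is awkward, a rough stand-in using only the standard library could look like the sketch below. It scans the metadata of every installed distribution and lists the ones that declare a requirement on numpy. The function names are hypothetical, the name parsing is deliberately simplified, and requirements that only apply to optional extras are listed too.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        return None
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def who_requires(target):
    """Map each installed distribution to its requirement strings for target."""
    target = re.sub(r"[-_.]+", "-", target).lower()
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name", "<unknown>")
        # dist.requires is None when a distribution declares no requirements.
        for requirement in dist.requires or []:
            if requirement_name(requirement) == target:
                found.setdefault(name, []).append(requirement)
    return found


if __name__ == "__main__":
    for name, requirements in sorted(who_requires("numpy").items()):
        for requirement in requirements:
            print(f"{name}: {requirement}")
```

Run it with the interpreter of the environment being inspected, since it only sees packages installed there.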
j
yeah it builds on my mac inside a docker image, just failing to build in CI, which uses a slightly different image
c
Without doing an installation, pipgrip can actually do the dependency analysis as well ... https://github.com/ddelange/pipgrip
Boom ...
pipgrip --tree duckdb-engine
duckdb-engine (0.6.6)
├── duckdb>=0.4.0 (0.6.1)
│   └── numpy>=1.14 (1.24.1)
├── numpy (1.24.1)
└── sqlalchemy<2.0.0,>=1.3.19 (1.4.46)
    └── greenlet!=0.4.17 (2.0.1)
So. duckdb itself has a very low numpy version minimum
Not sure where the maximum constraint would be coming from in your case. Probably cryptography??
Hmmm .. nope.
pipgrip --tree cryptography
cryptography (39.0.0)
└── cffi>=1.12 (1.15.1)
    └── pycparser (2.21)
j
i'm peeling back the pinned version and seeing if I can get CI to build with different errors
c
Hmmm .. I think the problem may actually be in the apache-superset 2.0.1 sdist itself.
Somehow the pyarrow constraints in Superset 2.0.1 are really old ... https://github.com/apache/superset/blob/507a7562e099707ab1103f4173d6c3b0ade2ec2d/setup.py#L105
I'm not quite sure why that is the case tbh
I'll try and find out why 2.0.1 has this old pyarrow constraint. I think most of the Preset folks are still away this week though. In general, the sdist distribution on PyPI for apache-superset is notoriously neglected, as many of the maintainers seem not to install it from there ...
j
much appreciated!!
c
Can you confirm in your docker image where the old numpy version gets selected from with a pipdeptree output?
Regarding the old Pyarrow constraint ... basically it looks like this commit was never cherry-picked into 1.5 or 2.0 branches ... https://github.com/apache/superset/pull/21002
j
Can you confirm in your docker image where the old numpy version gets selected from with a pipdeptree output?
Yeah, just not sure how to do that inside of the github action template images. hasn't been an issue until today :/
so will research what is possible 😄
c
Yeah, just not sure how to do that inside of the github action template images.
Oh. Sorry, I thought you had a docker image you can launch locally and inspect via a shell ... I think I have a py3.9 interpreter still lying around on my laptop that I can use to do a superset 2.0.1 install and check for numpy myself.
j
would appreciate it, but not required.
c
But yeah. I think the old numpy is definitely getting pulled in via the old version of pyarrow
I've left a note for the Preset team in the release strategy Slack channel as well. https://apache-superset.slack.com/archives/C032Z7FSD9A/p1672790053379589
j
you rock!! thank you Christoph. Is the preferred method for deploying superset locally via the superset docker image?
c
Is the preferred method for deploying superset locally via the superset docker image?
Not for me, no. I'm not a docker guy
j
it broke very recently, last 1-2 weeks at most. was building fine before that.
c
And Docker also is not really a technology that the majority of users should have to deal with. And by 'users' I am referring to non-developer type people, like data scientists and just general engineers who happen to use scientific Python packages like numpy and scipy for their day-to-day work.
it broke very recently, last 1-2 weeks at most. was building fine before that.
That is odd. 2.0.1 was released on December 21st and 2.0.0 was released on July 15th ... but that particular pyarrow 5.0 constraint has been present in both, 2.0.0 and 2.0.1 .. so, something else, maybe specific to your environment would have changed.
j
this config in meltano worked on dec 16th (I know bc i tested it that day)
    - name: superset
      variant: apache
      pip_url: apache-superset>=1.5.0 markupsafe==2.0.1 Werkzeug==2.0.3 WTForms==2.3.0 duckdb-engine==0.6.4
      config:
        ENABLE_PROXY_FIX: true
unsure which version it would have installed - probably 1.5.2 or 2.0.0.
building locally in my docker container now, so I can run pipdeptree once it's stood up
c
building locally in my docker container now, so I can run pipdeptree once it's stood up
It should definitely be coming from pyarrow.
apache-superset>=1.5.0
That should not have worked in Python 3.10, only in Python 3.9
j
yes that is right.
c
So, the thing that changed is an upgrade from Python 3.9 to Python 3.10 in your environment.
Also, it looks like Superset is following quite a conservative release strategy for the Python support ... https://github.com/apache/superset/issues/22582#issuecomment-1370341706
Superset 2.0.2 runs perfectly fine on Python 3.10
For clarification. I am actually running from Superset master branch. I didn't realise that the 2.0 branch was already so old that it didn't have the Python 3.10 support in it ...
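Since the change turned out to be the interpreter version, a quick way to confirm which Python a given environment actually uses is a small script like the sketch below. The 3.9 upper bound is taken from this thread's conclusion about the 2.0.x releases on PyPI, not from Superset's own documentation, and the function name is hypothetical.

```python
import sys

# Assumption taken from this thread: the apache-superset 2.0.x releases on
# PyPI install on Python 3.9 but not on 3.10 or newer.
NEWEST_SUPPORTED = (3, 9)


def describe_interpreter(version_info=None, executable=None):
    """Report which Python is running and whether it is within the assumed range."""
    version_info = sys.version_info if version_info is None else version_info
    executable = sys.executable if executable is None else executable
    major, minor = version_info[0], version_info[1]
    return {
        "executable": executable,
        "version": f"{major}.{minor}",
        "supported": (major, minor) <= NEWEST_SUPPORTED,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print(f"{info['executable']} -> Python {info['version']}")
    if not info["supported"]:
        print("Newer than 3.9: the old numpy pulled in via pyarrow may fail to build.")
```

To check the plugin's own environment, run the script with that environment's interpreter. With Meltano this is probably something like .meltano/utilities/superset/venv/bin/python, but that path is an assumption about the default layout.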
j
ahh you know this might actually be from a change to a meltano version... edit: it was not.
i'm in
fixed - adding some dependencies to the pip_url cleared up my issue.
    - name: superset
      variant: apache
      pip_url: apache-superset>=1.5.0 markupsafe==2.0.1 Werkzeug==2.0.3 WTForms==2.3.0 duckdb-engine==0.6.6 jinja2<3.1.0 cryptography==3.4.7
      config:
        ENABLE_PROXY_FIX: true
c
Yup. That's the other problem with the PyPI sdist distribution of apache-superset. There is no constraints file provided which would deal with these transitive dependency issues ...
So, it's always a bit of a song and dance to figure out which of the transitive dependencies has gotten upgraded and now is no longer compatible with Superset.
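One way to reduce that song and dance, sketched here as an idea rather than an established workflow: once an environment installs and runs correctly, record the exact versions of the troublesome packages and carry them into pip_url as pins. The helper below does this with the standard library. The function name and the package list are illustrative assumptions.

```python
from importlib import metadata


def build_pip_url(names, get_version=metadata.version):
    """Join the given package names into a pip_url string of exact pins.

    Packages that are not installed are left unpinned.
    """
    pins = []
    for name in names:
        try:
            pins.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            pins.append(name)
    return " ".join(pins)


if __name__ == "__main__":
    # Illustrative list based on the packages pinned in this thread.
    packages = ["apache-superset", "markupsafe", "Werkzeug", "WTForms", "jinja2"]
    print(build_pip_url(packages))
```

Run it inside the working environment and paste the output into the pip_url field of the plugin definition.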
a
Just catching up on this thread. Does this need an update in the hub yaml definition?
c
@aaronsteers: In the end, this is what I surmise is the current situation:
1. Superset PyPI releases still don't run on anything higher than Python 3.9
2. The lack of a constraints file managed by the Superset team means that the user always has to figure out which transitive dependency updates are breaking Superset installs
Point 2 is addressed by Jacob via adding the specific version downgrades for markupsafe, werkzeug, wtforms and jinja2
a
Okay, yes that makes a lot of sense. Thanks for this context. Maybe then we just add a section to the hub install docs mentioning the weird behaviors for 3.10, perhaps with a link to this thread or similar with troubleshooting assist?
cc @pat_nadolny, @Sven Balnojan
j
At what point is it better to use the docker image instead of installing "locally"
I know that it can be done with non-python plugins, and it seems like it might be better to do superset that way too if it makes it easier to get started
c
At what point is it better to use the docker image instead of installing "locally"
That's a WHOLE different topic. The Superset team's release strategy for their official docker images is currently 180 degrees opposite of the PyPI releases .....
The superset official docker image currently always pulls a nightly build from master if you keep the default latest tag.
<- (Not a docker guy) I don't know how easy it would be to use a different tag
j
ok reading that thread. perhaps using PyPI is not so bad after all
c
Well, I personally run a private fork of Superset that lags master by no more than a month. And I build my own wheels that get installed into a virtualenv completely outside the purview of meltano.
So, I take the best of both worlds: latest features + ease of deployment (barring the constraints file issue)
That of course only works for me because I am following the development of Superset as close as you can possibly do
I think for the general Meltano userbase, the PyPI releases might still be the best option for now, with the caveats outlined by me above in response to AJ. (Not compatible with 3.10 or 3.11 and transitive dependencies will keep breaking your installs as those come out with new versions)
j
yeah, i think recommending to pin with a list of dependencies is probably the way to go.