Support passing of index URLs to piplite (CLI, runtime configuration, and requirements files)#169
Here, we just assume one index per requirement file as this is the pip behaviour. Multiple indices will require different requirements files in subsequent commands.
|
cc: @bollwyvl I can write TypeScript to get by, but it is still very much Greek to me. Even though this is a draft, I would like to hear more from you about my approach and whether I'm going in the right direction with these changes when you have time. :) I think splitting the changes into multiple PRs might also be possible. Also, you mentioned "support parsing of said feature as a line of |
A bit more testing reveals that this works well. Here are a few points about things I noticed, and some questions:
|
Co-Authored-By: Nicholas Bollweg <bollwyvl@users.noreply.github.com>
d9f112e to
3fc2381
Compare
|
I think this PR is ready to garner some more feedback. About the #166 (comment) mentioned adding it as a site-level configurable, but in this case, this acts more like I feel we should limit my scope with this PR, but I am happy to implement what sounds best either upstream or here. |
Yes. That's what site-level configurations should do. We're answering a lot of different use cases, from the wide open playground, hosted on a dumb public CDN to an exam setting with very locked down CORS and a hot API token and icky telemetry. This is why we also provide the ability to disable PyPI fallback altogether. While
Yes, for the time being, a site administrator would be on the hook to configure the Or if they intend to have just a few local packages: We use the
... From a |
|
I've been spending some time on this PR over the weekend, and I believe we are ready now – I'll mark it as such. I've made my best pass at addressing the comments from the conversations we had over a year ago. It was a bit of a mess reading up on the code needed to resolve the merge conflicts 😅, but I think they should be resolved now. |
# Matches --index-url or -i directives (with optional = separator and optional quotes)
INDEX_URL_SPEC = (
    r'^(--index-url|-i)\s*=?\s*(?:"([^"]*)"|\047([^\047]*)\047|([^\s]*))\s*$'
similar to the above about uri-reference, but also, trying to write "real" URL matches in regexen is one of those things that will almost certainly break in some case.
Makes sense. I can shlex it if that's easier, though maybe I should also check what pip does.
We need to handle cases like --index-url URL, --index-url=URL, -i URL, -i=URL, but I don't know if I am missing any.
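For illustration, the four spellings listed above could be handled with `shlex` rather than a regex. This is only a sketch under stated assumptions: `parse_index_directive`, `INDEX_FLAGS`, and `COMMENT_RE` are hypothetical names, not what the PR or pip implements, and the URL value is returned untouched so that a real URL parser can deal with it later.

```python
import re
import shlex
from typing import Optional

# Hypothetical names for this sketch; the PR's own parser may differ.
INDEX_FLAGS = ("--index-url", "-i")
# Treat "#" as a comment only at line start or after whitespace,
# so a "#" inside a URL value is left alone.
COMMENT_RE = re.compile(r"(^|\s+)#.*$")


def parse_index_directive(line: str) -> Optional[str]:
    """Return the index URL if ``line`` is an index directive, else ``None``."""
    try:
        # shlex strips matching quotes, including quotes after "=".
        tokens = shlex.split(COMMENT_RE.sub("", line))
    except ValueError:
        # Unbalanced quotes: not something we can treat as a directive.
        return None
    if not tokens:
        return None
    head = tokens[0]
    for flag in INDEX_FLAGS:
        if head == flag and len(tokens) == 2:
            # "--index-url URL" or "-i URL"
            return tokens[1]
        if head.startswith(flag + "=") and len(tokens) == 1:
            # "--index-url=URL" or "-i=URL"
            return head[len(flag) + 1:] or None
    return None
```

Only the left-hand side (the flag) is interpreted here; whatever follows is passed through as an opaque string.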
sure, parsing the left side of the thing, which we control, is fine, but trying to parse the value containing :// is better handled by an actual URL parser... if even needed. pyodide even defines some of its own conventions like emfs: which would probably work with this, if done correctly, but might need deeper configuration: relative URLs, for example, might need to have the base URL appended in the upstream config-utils.js, as a worker may not get something accurate.
Yes, someone putting a relative path like -i ../my-index/simple in a requirements file is a different aspect. This belongs in config-utils.js upstream I suppose (it should recurse into nested objects such as pipliteInstallDefaultOptions to resolve index_urls), but we can bridge that in index.ts right now using the same new URL(value, baseUrl).href pattern I see we have for loadPyodideOptions. Edit: see f15c661
But yeah, requirements-file -i values are a known limitation since they're parsed in the worker without context of base URLs, so I am not sure what we can do about that...
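For reference, the Python analogue of the `new URL(value, baseUrl).href` pattern is `urllib.parse.urljoin`. The sketch below assumes the worker is handed a base URL by the caller, which is exactly the piece that is missing today; `resolve_index_url` is a hypothetical helper, not part of piplite.

```python
from urllib.parse import urljoin, urlsplit


def resolve_index_url(value: str, base_url: str) -> str:
    """Resolve ``value`` against ``base_url``; URLs with a scheme pass through."""
    # Anything with a scheme (https:, emfs:, ...) is already absolute.
    if urlsplit(value).scheme:
        return value
    # Relative references such as "./simple/" are joined onto the base URL.
    return urljoin(base_url, value)
```

With a base of `http://localhost:8000/`, the value `./simple/` would become `http://localhost:8000/simple/`, while absolute index URLs are left unchanged.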
If we used jsStyleKeys, a config-based key ending in Url or Urls will "just work":
https://github.com/jupyterlite/jupyterlite/blob/v0.7.2/app/config-utils.js#L228
Not sure how this would work with runtime-provided -i values.
> If we used jsStyleKeys, a config-based key ending in Url or Urls will "just work":
Thanks. We will need to go back to pipliteIndexUrls in that case, but then we should also move it out of pipliteInstallDefaultOptions and make it a top-level plugin config key?
> Not sure how this would work with runtime-provided -i values.
The answer is already in piplite.py. _install will use index_urls from the CLI if given, and otherwise falls back to _PIPLITE_DEFAULT_INSTALL_ARGS.get("index_urls") from pipliteIndexUrls/pipliteInstallDefaultOptions.index_urls/whatever.
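The fallback described here can be modelled in a few lines. This is a sketch of the rule only, not the code in piplite.py: `effective_index_urls` and `DEFAULT_INSTALL_ARGS` are hypothetical stand-ins, and both inputs are assumed to already be lists of strings.

```python
from typing import Optional, Sequence

# Stand-in for piplite's module-level defaults (assumption for this sketch);
# in the PR the real dict is filled in from the site config at kernel start.
DEFAULT_INSTALL_ARGS: dict = {"index_urls": ["https://pypi.org/simple"]}


def effective_index_urls(
    cli_index_urls: Optional[Sequence[str]],
    defaults: dict = DEFAULT_INSTALL_ARGS,
) -> Optional[list]:
    """Index URLs given at install time win; otherwise use the site default."""
    if cli_index_urls:
        # Runtime-provided values replace the defaults entirely (no merging).
        return list(cli_index_urls)
    fallback = defaults.get("index_urls")
    # None means: let micropip use its own default index.
    return list(fallback) if fallback else None
```

Note that the runtime value replaces the site default rather than being merged with it, which matches the "no per-package fallback" behaviour discussed later in this thread.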
> go back to pipliteIndexUrls
Sure, that's one option. That config value is used in the wild (though sometimes the very wild, as in private repos referencing private values). Wherever it eventually lands, at least one release should have both.
The magic _ values are underscored because they are not intended to be public, but we all know how that goes.
> index_urls from the CLI if given
Right, but if they are relative, resolving them accurately might need, e.g. _JUPYTERLITE_REAL_BASE_URL or something: maybe they can "just" be emfs:// links. It's a little difficult to reason about these changes purely from the code perspective, and might require bringing a more exhaustive set of demonstrative, interactive tests over, such as the python-packages.ipynb notebook into this repo so these kind of changes are directly reviewable in a representative setting (e.g. RTD).
> > go back to pipliteIndexUrls
> Sure, that's one option. That config value is used in the wild (though sometimes the very wild, as in private repos referencing private values). Wherever it eventually lands, at least one release should have both.
> The magic _ values are underscored because they are not intended to be public, but we all know how that goes.
I can make this change in a while – I'll make pipliteIndexUrls take precedence over pipliteInstallDefaultOptions.index_urls though and merge them in worker.ts. I didn't get what you meant by it being used in the wild though, since pipliteIndexUrls doesn't exist right now?
> > index_urls from the CLI if given
> Right, but if they are relative, resolving them accurately might need, e.g. _JUPYTERLITE_REAL_BASE_URL or something: maybe they can "just" be emfs:// links. It's a little difficult to reason about these changes purely from the code perspective, and might require bringing a more exhaustive set of demonstrative, interactive tests over, such as the python-packages.ipynb notebook into this repo so these kind of changes are directly reviewable in a representative setting (e.g. RTD).
Thanks for being thorough on this. I'd still lean toward saying that hopefully no one is (going to be) writing -i ./my-local-index in a requirements file inside a JupyterLite notebook. Could we punt that to a future PR, i.e., as a known limitation for now? I think the more realistic use case (for me and various others) would be for them to always be absolute (PyPI, Anaconda.org, any other CORS-enabled hosted index, and so on).
I can add that notebook to this PR, yes, though the changes right now are mostly testable on RTD and work well. Let me try out a few scenarios with various paths, relative links, and the like, and comment here.
`);
}
const pythonConfig = [
If this is a dict/object, it could probably be something like
const pyJson = JSON.stringify({
  piplite_urls: pipliteUrls,
  disable_pypi: disablePyPIFallback,
  ...(pipliteInstallDefaultOptions || {}),
});
const pythonConfig = [
  "import piplite.piplite, json",
  `from_js = json.loads("""${pyJson}""")`,
  `piplite.piplite._PIPLITE_DEFAULT_INSTALL_ARGS.update(from_js)`,
];
Not sure about the None/null case.
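On the None/null question, one possible treatment on the Python side is sketched below. It assumes that a JSON `null` should mean "keep the Python-side default"; that is a design choice, not something the PR settles. `apply_js_config` and the stand-in dict are hypothetical. (`JSON.stringify` already omits object properties whose value is `undefined`, so only explicit `null` values reach Python.)

```python
import json

# Stand-in for piplite.piplite._PIPLITE_DEFAULT_INSTALL_ARGS; the keys follow
# the suggestion above and are assumptions of this sketch.
DEFAULT_INSTALL_ARGS = {"piplite_urls": [], "disable_pypi": False, "index_urls": None}


def apply_js_config(py_json: str, target: dict = DEFAULT_INSTALL_ARGS) -> dict:
    """Merge JSON produced on the JS side into the install defaults."""
    from_js = json.loads(py_json)
    # JSON null decodes to None; skip those keys so an unset option
    # does not overwrite an existing Python-side default.
    target.update({key: value for key, value in from_js.items() if value is not None})
    return target
```

If `null` should instead reset an option, the filter can simply be dropped and `update(from_js)` used as in the suggestion.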
#: a list of Warehouse-like API endpoints or derived multi-package all.json
_PIPLITE_URLS: list[str] = []
Yep, dunno about just dropping these; might be a case that needs to be deprecated, but supported through this major version. micropip API changes give us enough surprises without adding our own breaking changes.
Sure, can restore both as aliases. _PIPLITE_URLS points to the same list object so mutations still work, but for _PIPLITE_DISABLE_PYPI we can only give a snapshot of the initial value.
We can deprecate them properly in a follow-up later!
Edit: see a3fff15
Co-Authored-By: Nicholas Bollweg <bollwyvl@users.noreply.github.com>
191bda6 to
f15c661
Compare
|
With the help of Claude and prompting it for a bunch of scenarios to test this PR, I was able to compile the following document and test out many of these cases:
# Scenario 1 — Baseline: nightly numpy via pipliteIndexUrls WORKING
"pipliteIndexUrls": [
  "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple",
  "https://pypi.org/simple"
]
%pip install numpy
import importlib; import numpy; importlib.reload(numpy)
numpy.__version__
Expected: a dev version like 2.3.0.dev0. Confirms the nightly index is hit first.
# Scenario 2 — Index order matters: PyPI first WORKING
"pipliteIndexUrls": [
  "https://pypi.org/simple",
  "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"
]
%pip install numpy
import numpy; numpy.__version__
Expected: a stable version like 2.2.x. Confirms PyPI is tried first when it's listed first.
# Scenario 3 — pipliteIndexUrls takes precedence over legacy pipliteInstallDefaultOptions.index_urls
"pipliteIndexUrls": ["https://pypi.org/simple"],
"pipliteInstallDefaultOptions": {
  "index_urls": ["https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"]
}
%pip install scikit-learn
import sklearn; sklearn.__version__
Expected: stable scikit-learn from PyPI (not nightly). Confirms pipliteIndexUrls wins.
# Scenario 4 — Legacy pipliteInstallDefaultOptions.index_urls still works alone
"pipliteInstallDefaultOptions": {
  "index_urls": [
    "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple",
    "https://pypi.org/simple"
  ]
}
(No pipliteIndexUrls at all)
%pip install scikit-learn
import sklearn; sklearn.__version__
Expected: nightly version. Confirms backwards compatibility.
# Scenario 5 — Cell-level --index-url overrides site config
"pipliteIndexUrls": ["https://pypi.org/simple"]
%pip install numpy --index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple
import numpy; numpy.__version__
Expected: nightly numpy. Confirms cell-level --index-url beats pipliteIndexUrls.
# Scenario 6 — Requirements file --index-url overrides site config
First, write a requirements file in a notebook cell:
%%writefile requirements.txt
--index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple
numpy
scikit-learn
Config:
"pipliteIndexUrls": ["https://pypi.org/simple"]
%pip install -r requirements.txt
import numpy; numpy.__version__
Expected: nightly numpy. Confirms -i in a requirements file overrides site config.
# Scenario 7 — Single nightly index, package missing there, PyPI fallback active
"pipliteIndexUrls": ["https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"]
(No disablePyPIFallback, no PyPI in the list)
%pip install requests
import requests; requests.__version__
Expected: installs from PyPI (micropip falls back since requests isn't in the nightly index). Confirms fallback works when only one index is specified and the package isn't found there.
# Scenario 8 — disablePyPIFallback + single nightly index + package not in nightly
"pipliteIndexUrls": ["https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"],
"disablePyPIFallback": true
%pip install requests
Expected: an error, something like PiplitePyPIDisabled: requests could not be installed: PyPI fallback is disabled. This checks that disablePyPIFallback blocks the escape to PyPI.
# Scenario 9 — No pipliteIndexUrls at all, default micropip behaviour
{} (Completely empty plugin config)
%pip install numpy
import numpy; numpy.__version__
Expected: stable numpy from PyPI. Confirms the feature is fully opt-in and doesn't break the default case.
# Scenario 10 — import numpy bypasses piplite entirely
"pipliteIndexUrls": ["https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"]
import numpy; numpy.__version__  # NO %pip install first
Expected: Pyodide's built-in numpy version (from the lock file, e.g. 2.0.0), not nightly. This is the known "gotcha": import loads from the Pyodide lock file, so %pip install is needed first to get the nightly version.
# Scenario 11 — Relative pipliteIndexUrls resolved by config-utils.js (devtools verification, no real index needed)
"pipliteIndexUrls": ["./non-existent-simple/", "https://pypi.org/simple"]
%pip install requests
import requests; requests.__version__
Expected: requests installs fine (from PyPI, second in the list).
Key thing to verify in the browser devtools Network tab: the first outgoing request
is to an absolute URL like http://localhost:8000/non-existent-simple/requests/
(or wherever jupyter lite is served from) — NOT the literal string ./non-existent-simple/.
This 404s, then micropip falls through to pypi.org and succeeds.
This confirms config-utils.js resolved the ./ relative path to an absolute URL before
the extension ever saw it.
# Scenario 12 — Relative pipliteIndexUrls with a real local PEP 503 index
Setup: in the examples/ directory create a minimal static simple repository.
examples/
simple/
index.html ← root index listing all packages
cowsay/
index.html ← package index linking to the wheel
wheels/
cowsay-6.1-py3-none-any.whl ← download from PyPI
examples/simple/index.html:
<!DOCTYPE html><html><body><a href="cowsay/">cowsay</a></body></html>
examples/simple/cowsay/index.html:
<!DOCTYPE html><html><body>
<a href="../../wheels/cowsay-6.1-py3-none-any.whl">cowsay-6.1-py3-none-any.whl</a>
</body></html>
Then run: jupyter lite build && jupyter lite serve
Config:
"pipliteIndexUrls": ["./simple/"]
%pip install cowsay
import cowsay; cowsay.cow("it works")
Expected: cowsay installs from the local relative index (no network request to PyPI).
Verify in devtools: request goes to http://localhost:PORT/simple/cowsay/ (resolved).
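The manual setup for Scenario 12 could also be generated by a short script. This is a sketch: `build_simple_index` and `normalize` are hypothetical helpers, the layout mirrors the tree shown above (`simple/` next to `wheels/`), and it only writes the HTML pages; real wheels still have to be downloaded into `wheels/` separately.

```python
import os
import re
from pathlib import Path


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def build_simple_index(root: Path, wheels_dir: Path) -> None:
    """Write root/simple/index.html plus one index.html per project."""
    simple = root / "simple"
    projects: dict = {}
    for wheel in sorted(wheels_dir.glob("*.whl")):
        # The distribution name is the first "-"-separated field of a wheel filename.
        projects.setdefault(normalize(wheel.name.split("-")[0]), []).append(wheel)

    simple.mkdir(parents=True, exist_ok=True)
    root_links = "".join(f'<a href="{name}/">{name}</a>' for name in projects)
    (simple / "index.html").write_text(
        f"<!DOCTYPE html><html><body>{root_links}</body></html>\n", encoding="utf-8"
    )

    for name, wheels in projects.items():
        project_dir = simple / name
        project_dir.mkdir(exist_ok=True)
        # Link to each wheel relative to the project page, with "/" separators.
        links = "\n".join(
            '<a href="{}">{}</a>'.format(
                os.path.relpath(wheel, project_dir).replace(os.sep, "/"), wheel.name
            )
            for wheel in wheels
        )
        (project_dir / "index.html").write_text(
            f"<!DOCTYPE html><html><body>\n{links}\n</body></html>\n", encoding="utf-8"
        )
```

Calling `build_simple_index(Path("examples"), Path("examples/wheels"))` before `jupyter lite build` would be one way to use it, assuming the wheel has already been placed in `examples/wheels/`.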
# Scenario 13 — Relative -i in a requirements file (known limitation: NOT resolved)
Config:
"pipliteIndexUrls": ["https://pypi.org/simple"]
In a cell:
%%writefile requirements.txt
--index-url ./simple/
numpy
%pip install -r requirements.txt
Expected: fails — the ./simple/ path is NOT resolved by config-utils.js because this
parsing happens inside the Python worker, which has no base URL context. micropip
receives the literal string ./simple/ as an index URL and cannot fetch it.
This documents the known limitation: only pipliteIndexUrls in jupyter-lite.json
gets relative-URL resolution; -i flags in requirements files do not.
# Scenario 14 — Relative -i in requirements file, site-level absolute fallback does NOT kick in
This is a follow-on to Scenario 13. One might expect that if ./simple/ fails, pipliteIndexUrls
would take over as a fallback. It does not — once a requirements file sets --index-url,
that value is passed directly to micropip as index_urls, completely replacing the site-level
default (pipliteIndexUrls). There is no per-package fallback.
Config:
"pipliteIndexUrls": ["https://pypi.org/simple"]
%%writefile requirements.txt
--index-url ./simple/
numpy
%pip install -r requirements.txt
Expected: same failure as Scenario 13 — the site-level https://pypi.org/simple is NOT
used as a fallback. The only fix is to use an absolute URL in the requirements file.
I'd like some advice on how we can test this PR in the CI here in an apt manner, based on these listed scenarios. We can indeed add https://github.com/jupyterlite/jupyterlite/blob/main/examples/pyodide/python-packages.ipynb from the JupyterLite repo, as was suggested to me in #169 (comment). However, it will also mean adding many test harnesses, and it will involve extra work to review. Also, I'm still a bit hesitant to immediately support local index URLs for installation in JupyterLite, especially since it seems like a less common scenario right now, given that my aim with this PR has been to support absolute URLs in the first go. Do we think our users would find it helpful? Thanks! :)
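On the CI question, one lightweight option might be to cover the config-precedence rules (Scenarios 3, 4, and 9) with table-driven checks that need no browser. The sketch below is a Python model of logic that, per the discussion above, would live in worker.ts; `site_index_urls`, `CASES`, and `run_cases` are hypothetical names, and the scenarios that need a running kernel are not covered by it.

```python
from typing import Optional

NIGHTLY = "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"
PYPI = "https://pypi.org/simple"


def site_index_urls(plugin_config: dict) -> Optional[list]:
    """Model of the merge rule: pipliteIndexUrls wins over the legacy key."""
    top_level = plugin_config.get("pipliteIndexUrls")
    if top_level:
        return list(top_level)
    legacy = (plugin_config.get("pipliteInstallDefaultOptions") or {}).get("index_urls")
    # None means no site-level index was configured at all.
    return list(legacy) if legacy else None


# (label, plugin config, expected site-level index URLs)
CASES = [
    (
        "scenario 3: top-level key wins",
        {
            "pipliteIndexUrls": [PYPI],
            "pipliteInstallDefaultOptions": {"index_urls": [NIGHTLY]},
        },
        [PYPI],
    ),
    (
        "scenario 4: legacy key alone",
        {"pipliteInstallDefaultOptions": {"index_urls": [NIGHTLY, PYPI]}},
        [NIGHTLY, PYPI],
    ),
    ("scenario 9: empty config", {}, None),
]


def run_cases() -> None:
    """Check every case; the label is used as the assertion message."""
    for label, config, expected in CASES:
        assert site_index_urls(config) == expected, label
```

The same table could be fed to `pytest.mark.parametrize`, or mirrored in a TypeScript unit test, if the project prefers that.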
Description
Users can now specify alternate package indices when installing packages through piplite via three interfaces: the CLI (-i/--index-url), requirements files, and JupyterLite-site-wide configuration in jupyter-lite.json.
Closes #166
Changes made
These are mostly along the lines of #166 (comment).
In the Python layer (piplite)
* -i/--index-url flag to the piplite CLI, forwarded as index_urls to micropip.install
* --index-url directives are extracted from -r requirements.txt files and applied to all packages in that file, matching the behaviour of pip (https://stackoverflow.com/a/2477610). However, the CLI flag takes priority if both are provided.
* _PIPLITE_DEFAULT_INSTALL_ARGS dict (replacing the previous separate _PIPLITE_URLS and _PIPLITE_DISABLE_PYPI module-level variables), which now also carries index_urls from pipliteInstallDefaultOptions
In the TypeScript/configuration layer
* IPipliteInstallOptions interface for representation of piplite install options
* kernel.v0.schema.json to accept pipliteInstallDefaultOptions, with index_urls accepting a single URL string or a list
* pipliteInstallDefaultOptions from jupyter-lite.json ➡️ PyodideKernel ➡️ initRemoteOptions ➡️ worker ➡️ sets _PIPLITE_DEFAULT_INSTALL_ARGS keys via runPythonAsync on kernel start
The intended behaviour in piplite will look like the following:
* %pip install numpy -i https://pypi.myindex.com/simple
* requirements.txt
  and then %pip install -r requirements.txt processes the index at the top line.
* jupyter-lite.json (i.e., a site-wide default)
  {
    "jupyter-config-data": {
      "litePluginSettings": {
        "@jupyterlite/pyodide-kernel-extension:kernel": {
          "pipliteInstallDefaultOptions": {
            "index_urls": ["https://pypi.myindex.com/simple"]
          }
        }
      }
    }
  }
Note that this still requires PyPI to be used effectively, since some packages the kernel relies on may not be present in myindex and need to be installed from PyPI. So, in real-life situations, we're most likely to see both myindex and PyPI configured.
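Since the schema lets index_urls be either a single URL string or a list, the Python side presumably needs to normalize the value before forwarding it. A minimal sketch of that step follows; `normalize_index_urls` is a hypothetical helper name, not necessarily what the PR uses.

```python
from typing import Optional, Sequence, Union


def normalize_index_urls(value: Union[str, Sequence[str], None]) -> Optional[list]:
    """Accept a single URL string, a sequence of URLs, or None."""
    if value is None:
        return None
    if isinstance(value, str):
        # A bare string is one index; iterating it would yield characters.
        return [value]
    return list(value)
```

For the site-wide example above, listing both indices would then look like `normalize_index_urls(["https://pypi.myindex.com/simple", "https://pypi.org/simple"])`, while a bare `"https://pypi.myindex.com/simple"` becomes a one-element list.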