Set notebook progressively#393
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
jupyter_ydoc/ynotebook.py
Outdated
| await lowlevel.checkpoint() | ||
| gen = self._set(value) | ||
| while True: | ||
| async with self._ydoc.transaction(): |
There was a problem hiding this comment.
The async context manager makes it possible to register async document observer callbacks, so that back-pressure can be applied e.g. to send updates over the wire.
tests/test_ynotebook.py
Outdated
| async def test_modify_single_cell(modifications, expected_events, doc_action): | ||
| do, is_async = doc_action | ||
| if is_async: | ||
| pytest.skip(reason="FIXME: check progressive updates") |
There was a problem hiding this comment.
It's not easy to test events, since they are now more granular.
There was a problem hiding this comment.
This is gone in f537967, and we update the notebook in a single transaction in this test.
| return val | ||
|
|
||
| async def aset(self, value: dict) -> None: | ||
| async def aset(self, value: dict, progressive: bool = False) -> None: |
There was a problem hiding this comment.
Since the progressive creation of the notebook is behind a flag, this PR keeps the previous behavior by default.
tests/test_ynotebook.py
Outdated
| nb.observe(record_changes) | ||
| await do(nb, "set", model) | ||
| if is_async: | ||
| await do(nb, "set", model, progressive=True) |
There was a problem hiding this comment.
Would it be better to parametrize this test with progressive True/False? I guess it's a bit confusing when async=False but we could skip (async=False, progressive=True) as not implemented.
| if progressive: | ||
| gen = self._set(value) | ||
| while True: | ||
| async with self._ydoc.transaction(): |
There was a problem hiding this comment.
I'm not sure about enforcing async transactions. They currently require global collaboration in the code base. On the other hand it's the only way to support back-pressure.
|
I don't know if we should set all cell inputs first, then cell outputs. That might complicate the logic in _set? |
Maybe we could contain the delayed output insertion to jupyter_ydoc/jupyter_ydoc/ynotebook.py Lines 157 to 193 in 3181bf2 I initially suggested Regardless of the way we implement it, I feel like this could be a separate PR. |
|
I also think this could be done in another PR, this one already allows us to experiment. I started doing that in Jupyverse and it seems to work fine. |
krassowski
left a comment
There was a problem hiding this comment.
I think this is useful, but not sure if final. I would recommend merging after flagging progressive as experimental.
I started doing that in Jupyverse and it seems to work fine.
Did you manage to get the cells to show up as websocket streams? So far I only can confirm that streaming works fine as in instead of one big blob over websocket we get many small blobs as a function of cell count (good). I wonder if we will need to also emit sync step messages in between updates so that frontend knows when to process a batch?
|
Should we also add |
|
I think the sync handshake should happen first (with an empty document), and then we start populating the document. This raises the question of how a frontend will react to an empty document. For instance, I don't think the frontend will like an empty notebook (not only no cell, but also no metadata), as it will think it is corrupted (it doesn't conform to nbformat). |
|
After some more attempts I had a reasonable success with this patch: diff --git a/packages/docprovider/src/yprovider.ts b/packages/docprovider/src/yprovider.ts
index bbbd10a..116a971 100644
--- a/packages/docprovider/src/yprovider.ts
+++ b/packages/docprovider/src/yprovider.ts
@@ -135,6 +135,7 @@ export class WebSocketProvider implements IDocumentProvider, IForkProvider {
}
);
+ this._yWebsocketProvider.on('status', this._onStatus);
this._yWebsocketProvider.on('sync', this._onSync);
this._yWebsocketProvider.on('connection-close', this._onConnectionClosed);
}
@@ -165,6 +166,7 @@ export class WebSocketProvider implements IDocumentProvider, IForkProvider {
private _disconnect(): void {
this._yWebsocketProvider?.off('connection-close', this._onConnectionClosed);
this._yWebsocketProvider?.off('sync', this._onSync);
+ this._yWebsocketProvider?.off('status', this._onStatus);
this._yWebsocketProvider?.destroy();
this._yWebsocketProvider = null;
}
@@ -208,6 +210,14 @@ export class WebSocketProvider implements IDocumentProvider, IForkProvider {
const state = this._sharedModel.ydoc.getMap('state');
state.set('document_id', this._yWebsocketProvider.roomname);
}
+ // TODO replace it with synced? Or use a new promise for connected
+ // and use that instead of `provider.ready` in ydrive?
+ this._ready.resolve();
+ }
+ };
+
+ private _onStatus = (status: {status: 'connecting' | 'connected' | 'disconnected'}) => {
+ if (status.status === 'connected') {
this._ready.resolve();
}
};
diff --git a/projects/jupyter-server-ydoc/jupyter_server_ydoc/rooms.py b/projects/jupyter-server-ydoc/jupyter_server_ydoc/rooms.py
index 9cc4543..13731da 100644
--- a/projects/jupyter-server-ydoc/jupyter_server_ydoc/rooms.py
+++ b/projects/jupyter-server-ydoc/jupyter_server_ydoc/rooms.py
@@ -160,10 +160,16 @@ class DocumentRoom(YRoom):
self._room_id,
self._file.path,
)
- await self._document.aset(model["content"])
+ awaitable = self._document.aset(model["content"], progressive=True) # TODO check if progressive supported
+ self.ready = True # as in ready to start streaming, not ready as all content is shown
+ self._emit(LogLevel.INFO, "initialize", "Room initialized")
+ self.create_task(awaitable)
+ self._document.dirty = False
+ return
if self.ystore:
await self.ystore.encode_state_as_update(self.ydoc)
self._document.dirty = False
self.ready = TrueI did not see any issue on the frontend. The only issue so far was an error in saving routine: This is the |
Yes. Unless we find a way to replay the transactions by following a similar
"no cell" is no longer a problem, "no metadata" is kind of a problem because it will show "Choose kernel dialog", but we could enforce sending metadata first and delaying "ready" signal (or whatever we will call it) until metadata was received. |
The problem is that it's taken care of in the handshake mechanism, so not something we directly control. I don't think we can just replay changes, because they depend on the differences between the peers. |
I will open an issue on yjs to see if that was explored/could be supported in the future.
Well, it would already be great for YStore disabled (no-op implementation) use case. |
I opened y-crdt/y-crdt#602.
I don't understand what you mean, I don't see how it can work except for the very first client. |
See jupyterlab/jupyter-collaboration#552.