Running code in an isolate group rather than isolate #4379

mraleph · 2025-05-16T12:58:35Z

During the recent meeting we have discussed progress of shared native memory multithreading prototype. One of the topics of contention was behavior of the Dart code which executes within an isolate group but outside any isolate.

As currently described in the proposal and implemented in the prototype an attempt to access static state not marked as shared by the code which is running outside of the isolate throws an exception. For example:

int i = 0;

void foo() { 
  i++; // (*)
}

void main() {
  Isolate.runShared(() {
    foo(); // Throws AccessError at (*)
  });
}

Language team raised a concern about this behavior - we will use this issue to discuss arguments for and against this behavior.

This behavior was chosen in the original proposal because in our (implementors opinion) it makes behavior of callbacks entering Dart from native on an arbitrary thread cleaner. The only other viable option is to create a temporary isolate which would exist for the duration of the execution and is destroyed after the call.

Creating temporary isolate is fast but not free: baseline cost is measured in tenths of microseconds, plus additional cost of initializing statics which are accessed by the callback itself. Such initialization will also generate garbage as isolate is destroyed making this approach unsuitable for any sort of high-frequency callbacks.
State silently resetting between callback invocations can lead to hard to understand bugs - as opposed to developer immediately hitting a runtime error.

If users want to use code that requires isolated static state - they can always manage that explicitly, e.g. something along the line of

void callback() {  // invoked inside a group, but not specific isolate
  final isolate = Isolate.create();  // Can even cache this if needed.
  try {
    isolate.run(() {
      // use static isolate state.
    });
  } finally {
    isolate.destroy();
  }
}

Essentially we are giving developers a way to write code which they currently need to write in C/C++ using embedding API.

cc @lrhn @leafpetersen

mraleph · 2025-05-16T12:58:48Z

cc @aam @mkustermann

lrhn · 2025-05-16T14:51:49Z

My main issue with the "code crashes if you touch non-shared static variables" is that it's an invisible and uncontrollable limit on what you can do inside such a not-real-isolate.

Unless the code you run is completely authored by yourself, which it never is because you will use platform libraries, there is no way to predict whether code will crash.
You can run it and see, and hope that it doesn't only update its static cache on Thursdays.

If you choose to rely on code you haven't written, your code can break in the very next update, if that third-party code decides to access a non-shared static variable. Accessing static variables is not considered a breaking change, so nothing will prevent others from doing it. And accessing a new static variable must not be considered a breaking change. That's far too onerous a restriction. (I'll personally be happy to close any issue on core packages for not accessing static variables with a "No thanks".)

That is: The vast majority of code, and all code written so far, has no incentive to make itself compatible with shared isolates.
It's tricky, it may not be possible while retaining functionality, and it may be costly (I assume a shared variable will be protected by locks.)

The code you can use in a shared isolate will be very limited. That doesn't worry my as much as it being fragile, if you don't even know that the code you have tested will keep working. You might be incentivized to pin yourself to specific version numbers of things you depend on, because those are known to work, and a patch-level increment of a package might stop working.

aam · 2025-05-16T15:39:08Z

How isolategroup-shared callbacks restriction on non-shared static field access is different from, for example, restriction on what can be sent in a message to an isolate? Some class from some third-party package didn't use to have some non-sendable state in it, now it does, so your code that used to send an instance of that class will now throw and have to be changed. That seems to be acceptable.

Basically, isolategroup-shared callbacks have limitations, but they are very useful for integration with native libraries(remove a need to write/build native code). If errors are clear and actionable(send message to an isolate that would run the code that accesses non-shared static field), why rob users of this ability?

leafpetersen · 2025-05-16T22:42:08Z

@mraleph I think I could get comfortable with this if it was made more explicit that this is not intended to be a way to run arbitrary Dart code. So something like:

Change the name from Isolate.runShared to something like Isolate.unsafeRunWithoutState with the emphasis on having the word unsafe in it (much like Haskell's unsafePerformIO).
Explicitly call out in the documentation that it is not recommended to run any code that you do not control within that isolate (including core library code)
Explicitly call out in the documentation that any code that you run which you do not control within that isolate might suddenly stop working at any point, and that this is explicitly not part of the breaking change contract of any code (including the core libraries)
Explicitly include in our breaking change documentation that changes to the core libraries which cause code running in unsafe shared isolates to break are outside of the scope of our breaking change policy.

@sigmundch This feels connected to things you've been exploring around limiting what code dynamic modules can access.

mkustermann · 2025-05-19T09:21:25Z

Here's some of my thoughts:

We want to use shared memory multithreading support for callbacks from C, especially for synchronous callbacks where C calls Dart and expects a synchronous return value. This may require the Dart code to do arbitrary work - sometimes it may do very little work, sometimes it may do much more work.

I'd argue that those callbacks should be able to use RegExp, Uri.parse(), protobuf decoding, print(), ... - all of which would throw-at-access atm because they rely on global fields. So we need to to have a solution for this.

If we created a fresh static field state for each callback invocation, then

each invocation of the callback may require O(<size-of-program>) memory state to be allocated & default-initialized - plus the time it takes to initialize all these global fields that are actually accessed
the print() function (and some other things) would still not work, because those are initialized by the embedder on normal isolate creation, so we'd need to add even more overhead in terms of allocation & initialization

For high-frequency C->Dart calls this is just way too much overhead. e.g. Imagine C calls Dart and passes a simple protobuf message describing some state and Dart should return a boolean whether proceed or not. Now the overhead we'd be adding is enormous: We'd maybe allocate xx KB of memory for static field state & default initialize it, lazy-initialize various static protobuf metadata fields on-first-access and finally decode a few bytes of proto message and return a boolean.

=> IMHO We should have a solution where the callbacks perform only the work they actually need to perform - no O(<program-size>) extra memory overhead and no O(<program initialization>) time.

So to make this more performant I'm convinced that we want to make any global fields (corelib & user-defined static fields) that are on the hot path to be initialized only once and ready-to-use for all callback invocations. We have the new shared fields for that. Whatever those fields on the hot path contain has to be sharable (and we should have a discussion whether that can be mutable dart objects as well - possibly via opting in individual classes, ...)

Once we have this "run a dart closure with zero extra overhead" concurrently with normal dart isolate, we have a powerful mechanism. We're going to use that mechanism then also to parallelize programs: Create N threads, do some work in parallel and join those N threads (similar to using lightweight isolates today -- but now avoiding the extra O(<program-size>) memory and O(<program initialization>) time as well as more sharing of state)

Coming back to the original issue: I can see the point that users will have a hard time to predict what packages, libraries or core library features they can use in C callbacks - or whether they may get exceptions -- and how they may silently get broken with a pub upgrade. We cannot avoid this problem entirely as code that triggers microtasks, timers, anything async will fail - as there's no event loop running after the callback returns to C.

What I could imagine is that we don't throw on normal global field access but

make the cost O(<number of fields accessed>) memory as opposed to O(<number of fields accessed>) memory
have 0 eager work done by embedder (e.g. installing print closure)

Currently the VM represents the static fields of a program as a large array indexed by a field id - this is very fast to access. An isolate creation involves a copy of that large array with default values & sentinels. Though one could make shared isolates use a different representation that can grow as the number of fields accessed grows (e.g. hashmap). That would make us only pay for what's actually used (and a little code size cost and a little access time cost for normal isolates accessing normal static fields - a "am I shared isolate" bit test and branch)

All the state we currently initialize by embedder on isolate creation (e.g. print closure, etc) will be migrated to use shared fields.

That would make us end up in a place where the overhead of invoking a callback is purely based on what that callback needs (no extra embedder setup, no setup for fields that aren't used, ...) while still not throwing when accessing normal static fields. When profiling, one will observe slowness due to using non-shared field initializers and start migrating them to be shared fields.

lrhn · 2025-05-19T15:16:52Z

So to make this more performant I'm convinced that we want to make any global fields (corelib & user-defined static fields) that are on the hot path to be initialized only once and ready-to-use for all callback invocations. We have the new shared fields for that. Whatever those fields on the hot path contain has to be sharable (and we should have a discussion whether that can be mutable dart objects as well - possibly via opting in individual classes, ...)

The problem is then that some static variables are mutable and not shareable.

What if we had a special "initialize-once" shared variable, which does have per-isolate state, but it is guaranteed to be able to share the initialization value, so if that requires complicated computation, it'll still only happen once. Then the resulting value is cached and the next initialization is just "read value from isolate-shared cache".

If we used a SharedVariable class to represent shared variables, and maybe an IsolateVariable to represent per-isolate state that is accessible in a shared isolate, then that could be:

  const _uriParserTable = IsolateVariable.late(_createParserTables);

which would invoke that function the first time it's read, store the value in a real shared variable (or a global authority array of initialization values), then initialize the isolate variable with that value.

You will need O(#sharedVariables) shared space, O(#isolateVariables) per-isolate state initialized, which can be copied from a global authority array for each shared isolate creation, or initialize to null and copy on first use, but you won't be able to access arbitrary static variables that are not declared as IsolateVariable.
Unless everybody makes their static variables into IsolateVariables, it's less of an issue than accessing all static variables.

as code that triggers microtasks, timers, anything async will fail - as there's no event loop running after the callback returns to C.

And it won't even get that far, since it tries to read the Zone._current static variable before doing anything asynchronous. That variable cannot be shared, it must not be mutated concurrently, and must have the correct value for each concurrent thread. It is per-exection thread, which currently means per-isolate.

(And for the record: Uri.parse has been fixed, it now uses a const String table.)

mraleph · 2025-05-20T12:52:40Z

@leafpetersen

@mraleph I think I could get comfortable with this if it was made more explicit that this is not intended to be a way to run arbitrary Dart code. So something like:

What if there is no Isolate.runShared only NativeCallable.shared (or some other name, e.g. isolateGroupLocal or NativeCallable.withoutIsolate or similar) which creates a native callback which is invoked in an isolate group rather than an isolate? Of course we would then document all the restrictions on what it means to run without isolate state.

I would really like to avoid unsafe as a prefix. It does not communicate what is actually unsafe about the function - so it does not add any value except looking dangerous.

We already have NativeCallable.isolateLocal which just crashes the process when invoked on a wrong thread - but we did not prefix that with unsafe. Similarly SendPort.send(o) throws an exception if o transitively references something unsendable but it is not called SendPort.unsafeSend(o).

leafpetersen · 2025-05-20T18:18:52Z

It does not communicate what is actually unsafe about the function - so it does not add any value except looking dangerous.

Looking dangerous is the value. Seriously. runShared communicates nothing about the function either (what does "shared" mean anyway? it is no more shared than any other isolate is), but it specifically does not communicate that the isolate that you are about to create has non-standard and dangerous semantics. I'm not particularly set on the specifics of the naming, but the whole point is that you should not be able to pick this thing up without it being immediately obvious that you are in dangerous of losing a toe.

NativeCallable.shared

Something like this is probably reasonable. In general I think that going through a native interface is a good signal that you're holding something sharp. I still don't understand the "shared" terminology.

Similarly SendPort.send(o) throws an exception if o transitively references something unsendable but it is not called SendPort.unsafeSend(o).

Perhaps it should? But SendPort only has one API, so you can't accidentally stumble onto the wrong one. If there were SendPort.send which worked in all cases and SendPort.sendShared which blew up occasionally I think I would have the same concerns.

Another possible approach might be to just add an optional parameter to the existing Isolate.run? Something like {..., bool throwOnStaticAccess = false}?

aam · 2025-05-20T23:42:41Z

NativeCallable.shared

I still don't understand the "shared" terminology.

So far the name for the ffi callback that we've been using is NativeCallable.isolateGroupShared. Meaning that it's the isolate_group state that is shared - such ffi callback has access only to that. The states of all isolates remains private, isolated, not shared, not accessible to such ffi callbacks.

leafpetersen · 2025-05-21T01:02:35Z

Meaning that it's the isolate_group state that is shared

But that state is shared with all of the other isolates as well. That is, the distinguishing feature of the isolate you get from calling this api is not that it has access to shared state - all of the isolates in the group have access to that state. The distinguishing feature is that it does not have access to any other state of its own.

mraleph added the question Further information is requested label May 16, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Running code in an isolate group rather than isolate #4379

Running code in an isolate group rather than isolate #4379

mraleph commented May 16, 2025

mraleph commented May 16, 2025

lrhn commented May 16, 2025

aam commented May 16, 2025

leafpetersen commented May 16, 2025

mkustermann commented May 19, 2025

lrhn commented May 19, 2025 •

edited

Loading

mraleph commented May 20, 2025

leafpetersen commented May 20, 2025

aam commented May 20, 2025

leafpetersen commented May 21, 2025

Running code in an isolate group rather than isolate #4379

Running code in an isolate group rather than isolate #4379

Comments

mraleph commented May 16, 2025

mraleph commented May 16, 2025

lrhn commented May 16, 2025

aam commented May 16, 2025

leafpetersen commented May 16, 2025

mkustermann commented May 19, 2025

lrhn commented May 19, 2025 • edited Loading

mraleph commented May 20, 2025

leafpetersen commented May 20, 2025

aam commented May 20, 2025

leafpetersen commented May 21, 2025

lrhn commented May 19, 2025 •

edited

Loading