-
Notifications
You must be signed in to change notification settings - Fork 14
Improve Classpath Implementation #416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think there is a simpler but related stretch target. It seems from the example that you are proposing to share between globals, but if we limit it to a new run with the same settings, but a changed file-system I think that would be easier and closer to the IDE flow If we are to support some reuse of the symbols, it would be easier I think to allow reuse when the classpath doesn't change, but either the source and/or the jar content has changed. This is the typical case for a multi-module build running in parallel. For instance if we detect that the difference between a previous compile cycle and a new one has only changed sources and jars a set of packages, can we invalidate those package symbols and all children, and thing that depend on then I think that this would work well for an IDE, as typically a downstream project has changed some jar/classes and/or the user has edited some source xsbt-dependency provides some information about the dependencies for the sources the classPath could provide info on what changed between the prior run and current snap reasonably easyly |
Jars that are no involved in an active compilation should not need sbt support should it? If the Run is compiling (or being constructed) its active otherwise it isn't. That is what I tried in scala/scala#5957 |
-D scala.classpath.closeZip is a measurable overhead in real world benchmarks |
I think it should be a design constraint (or at least a strong target) to not require SBT to implement changes. I think that the compiler should be standalone, as there are many build systems. Certainly we should allow APIs to tools builders to make the environment better, but should not rely on these API for the functional behaviour or the compiler |
I've added support for benchmarking the resident compiler to get a feel for how much we can speedup with symbol table re-use. |
|
@fommil with small changes to the way classpath works we can remove this overhead completely, and speedup general classpath access, while not locking jar files (and not having to restart sbt). The 'measured overhead' is increasing, and will increase further due to changes since you left. Maybe it just what is regarded as 'significant' though. Either way it seems like a free wil |
Anything that avoids locking jar files is awesome in my book. I don't care about Windows anymore but I do care about the memory leak in ensime... especially so now that we have a compiler process per project/configuration 👍 |
Here's a straw-man proposal for a hook to support reading classfiles from a cache or a yet-to-be-written classfile from another trait ClasspathPlugin {
def replaceClasspath(cp: Classpath): Classpath
} A new variant of compiler plugin could implement this interface. The plugin could then replace one or more elements of the classpath, and override the likes of: def classes(inPackage: String): Seq[ClassFileEntry] We need to do a bit of a cleanup of the
I would have liked to replace At that point, the custom classpath could then return a Going a bit further, we could avoid the need for the dummy classfile with: This trait ClasspathPlugin {
def info(file: AbstractFile): Option[ClassfileInfo]
}
trait ClassfileInfo {
def pickle: Option[java.nio.ByteBuffer]
def isScalaRaw: Boolean
def isScala: Boolean
} |
(fyi, class monkey does this already) |
Background
The Scala compiler uses its Classpath abstraction to present the compilation classpath and sourcepath to the compiler. Entries in these paths are typically directories, JARs, or, more recently, the jrt:// virtual filesystem exposed by Java 9.
Scala 2.12 switched to a refactored implementation of the classpath (known as the "flat classpath"). This work was primary motivated by reducing the memory footprint of the classpath representation. Improved compilation speed by saving JAR listings is a secondary benefit.
An example of an environment where this matters is in a modularized build loaded into Scala IDE. There may be hundreds of instances of
Global
, each with classpaths numbering in hundreds of elements, that point at some smaller set of JARs.Two main changes were made:
Like any new implementation, there have been some teething problems.
Symbol
to the corresponding classpath entry becameO(N)
with poor constant factors. This was fixed in scala/scala#5956.exportJars =: true
mode of SBT. A workaround is to use-YdisableFlatCpCaching
Furthermore, the cost of navigating through the classpath to a
.class
and converting it to aSymbol
is a non-neglible part of some compilation use cases (in particular, very small compile runs), so finding ways to make this faster or avoid parts of it via reliable caching is appealing.Goals
New Globals don't see stale JAR listings
Improve the current caching mechanism so that If a JAR is overwritten, a newly constructed
Global
should use the listing from the new JAR, rather than the stale cached info.If aggregates are cached, these need to be invalidated when a component is invalidated.
Thread Safety of shared instances
Code review and adversarial testing should be used to flush out any thread safety problems in
Classpath
as it is shared by instance ofGlobal
.JARs are not locked when there are no active compilations
There are two approaches here:
-D scala.classpath.closeZip
that open and closes zips around the extraction of each class file. While this hurts performance in a microbenchmark, it might be acceptable in the context of the compiler.Global
and/orRun
(perhaps involving SBT) to implement a reference counting for cached classpath elements that correspond to openZipFile
-s , and close them when the ref-count drops to 0. I've prototyped this in retronym:ticket/10295Performant Implementation
Create benchmarks and tests to measure:
O(n)
, aggregates should indexpackage => entry
New
Run
-s respect overwritten JAR / .class filesWhile we're in the neighborhood of cache invalidation, we should set a stretch target of making multi-run compilation more sure of foot when treading over the shifting sands of overwritten JARs and
.class
files.The naive approach would be to wipe out the symbol table on any change to the underlying files that had been witnessed during its construction.
A more nuanced approach would be able to prune only symbols that had changed, or had depended (perhaps transitively) on something that has changed.
As a concrete example, in a multi-module SBT:
scalac -cp static-lib.jar -d a/target/classes a/src/...
scalac -cp static-lib.jar:a/target/classes -d b/target/classes b/src/...
scalac -cp static-lib.jar -d a/target/classes a/src/...
Overwrites some classfilesscalac -cp static-lib.jar:a/target/classes -d b/target/classes b/src/...
We could reuseSymbol
-s representingstatic-lib.jar
,rt.jar
, and reset_root_.a
back to its initial state. This should be possible because no signature in the those JARs refers to_root_.a.*
This has been tried in the past without success, so we must assume it isn't trivial to get right. As such, this is a stretch target.
Non Goals
Mutation of JAR or
.class
files during compilation currently has undefined behaviour, we don't plan to improve that.The highest possible performance of the classpath in isolation. Performance tuning of the classpath should be motivated by scenarios in which we measure that it imposes an unreasonable overhead to a real world compilation use case.
The text was updated successfully, but these errors were encountered: