Commit 165c1a3

Add a port of mimalloc, a fast and scalable multithreaded allocator (#20651)
The new allocator can be used with -sMALLOC=mimalloc.

On the benchmark added in this PR, dlmalloc does quite poorly here (getting actually slower with each additional core, because the lock contention is much larger than the actual work in the artificial benchmark). mimalloc, in comparison, scales the same as natively: more cores keeps helping. So mimalloc can be a significant speedup in codebases that have lock contention on malloc.

mimalloc is significantly larger than dlmalloc, however, so we do not want it on by default. It also uses more memory, because of how mimalloc works and also due to #20645.

Design-wise, this layers mimalloc on top of emmalloc. emmalloc functions as the "system allocator", which is more powerful than just using raw sbrk - sbrk can't free holes in the middle, for example.

Code-wise, all of system/lib/mimalloc is unchanged from upstream (see README.emscripten) except for an ifdef or two, and then the new backend which is in system/lib/mimalloc/src/prim/emscripten/prim.c. That file has more comments explaining the design of the port.

A new test is added which is also usable as a benchmark, test/other/test_mimalloc.cpp, which is where the numbers above come from.

Fixes #18369
1 parent 90ab3a7 commit 165c1a3
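The layering described in the commit message can be pictured with a small sketch. This is not the code in prim.c: `ChunkBackend` and `run_layering_demo` are hypothetical names, and `std::malloc`/`std::free` stand in for the "system allocator" role that emmalloc plays in the port. The only point illustrated is that a backend which can release any chunk is more flexible than a break pointer that can only move at the top of the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the "system allocator" layer. In the port that
// role is played by emmalloc; here std::malloc/std::free are used instead.
struct ChunkBackend {
  std::size_t live_chunks = 0;

  void* alloc_chunk(std::size_t size) {
    void* p = std::malloc(size);
    if (p) ++live_chunks;
    return p;
  }

  // Any chunk can be returned, regardless of allocation order. A raw
  // sbrk-style break pointer cannot release a region in the middle.
  void free_chunk(void* p) {
    if (p) {
      std::free(p);
      --live_chunks;
    }
  }
};

// Allocates three chunks, frees the middle one first, and reports how many
// chunks the backend still considers live at that point.
std::size_t run_layering_demo() {
  ChunkBackend backend;
  std::vector<void*> chunks;
  for (int i = 0; i < 3; ++i) {
    void* p = backend.alloc_chunk(64 * 1024);
    assert(p != nullptr);
    chunks.push_back(p);
  }

  backend.free_chunk(chunks[1]);  // a "hole in the middle"
  chunks[1] = nullptr;
  const std::size_t live_after_hole = backend.live_chunks;

  for (void* p : chunks) backend.free_chunk(p);  // null entries are skipped
  return live_after_hole;
}
```

A front-end allocator built on such a backend can hand whole chunks back as soon as they become empty, wherever they sit in the address space.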

52 files changed: +17498 -5 lines changed

ChangeLog.md

Lines changed: 2 additions & 0 deletions
@@ -20,6 +20,8 @@ See docs/process.md for more on how version tagging works.

 3.1.50 (in development)
 -----------------------
+- Add a port of mimalloc, a fast and scalable multithreaded allocator. To use
+it, build with `-sMALLOC=mimalloc`. (#20651)
 - When compiling, Emscripten will now invoke `clang` or `clang++` depending only
 on whether `emcc` or `em++` was run. Previously it would determine which to
 run based on individual file extensions. One side effect of this is that you

embuilder.py

Lines changed: 2 additions & 0 deletions
@@ -56,6 +56,8 @@
 'libemmalloc-memvalidate',
 'libemmalloc-verbose',
 'libemmalloc-memvalidate-verbose',
+'libmimalloc',
+'libmimalloc-mt',
 'libGL',
 'libhtml5',
 'libsockets',

site/source/docs/optimizing/Optimizing-Code.rst

Lines changed: 9 additions & 0 deletions
@@ -221,6 +221,15 @@ Enable :ref:`debugging-EMCC_DEBUG` to output files for each compilation phase, i

 .. _optimizing-code-unsafe-optimisations:

+Allocation
+----------
+
+The default ``malloc/free`` implementation used is ``dlmalloc``. You can also
+pick ``emmalloc`` (``-sMALLOC=emmalloc``) which is smaller but less fast, or
+``mimalloc`` (``-sMALLOC=mimalloc``) which is larger but scales better in a
+multithreaded application with contention on ``malloc/free`` (see
+:ref:`Allocator_performance`).
+
 Unsafe optimizations
 ====================

site/source/docs/porting/pthreads.rst

Lines changed: 18 additions & 0 deletions
@@ -148,6 +148,24 @@ The Emscripten implementation for the pthreads API should follow the POSIX stand

 Also note that when compiling code that uses pthreads, an additional JavaScript file ``NAME.worker.js`` is generated alongside the output .js file (where ``NAME`` is the basename of the main file being emitted). That file must be deployed with the rest of the generated code files. By default, ``NAME.worker.js`` will be loaded relative to the main HTML page URL. If it is desirable to load the file from a different location e.g. in a CDN environment, then one can define the ``Module.locateFile(filename)`` function in the main HTML ``Module`` object to return the URL of the target location of the ``NAME.worker.js`` entry point. If this function is not defined in ``Module``, then the default location relative to the main HTML file is used.

+.. _Allocator_performance:
+
+Allocator performance
+=====================
+
+The default system allocator in Emscripten, ``dlmalloc``, is very efficient in a
+single-threaded program, but it has a single global lock which means if there is
+contention on ``malloc`` then you can see overhead. You can use
+`mimalloc <https://github.com/microsoft/mimalloc>`_
+instead by using ``-sMALLOC=mimalloc``, which is a more sophisticated allocator
+tuned for multithreaded performance. ``mimalloc`` has separate allocation
+contexts on each thread, allowing performance to scale a lot better under
+``malloc/free`` contention.
+
+Note that ``mimalloc`` is larger in code size than ``dlmalloc``, and also uses
+more memory at runtime (so you may need to adjust ``INITIAL_MEMORY`` to a higher
+value), so there are tradeoffs here.
+
 Running code and tests
 ======================
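The contention scenario described in the documentation above can be reproduced with a small stress loop. The following is a minimal sketch, not the benchmark in test/other/test_mimalloc.cpp: `churn`, `ChurnResult`, and `run_churn` are hypothetical names, and the allocation sizes and iteration counts are arbitrary. Every thread does nothing but small `malloc`/`free` pairs, so the allocator's locking behaviour dominates the run time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Performs `iterations` small malloc/free pairs on the calling thread and
// returns how many allocations succeeded.
static std::size_t churn(std::size_t iterations) {
  std::size_t succeeded = 0;
  for (std::size_t i = 0; i < iterations; ++i) {
    void* p = std::malloc(16 + (i % 64));
    if (p) {
      // Touch the memory so the pair cannot be optimized away.
      *static_cast<volatile char*>(p) = 1;
      ++succeeded;
    }
    std::free(p);
  }
  return succeeded;
}

struct ChurnResult {
  std::size_t allocations;  // successful allocations across all threads
  double seconds;           // wall-clock time for the whole run
};

// Runs `churn` on `num_threads` threads at once and times the whole run.
ChurnResult run_churn(unsigned num_threads, std::size_t iterations_per_thread) {
  std::vector<std::size_t> counts(num_threads, 0);
  std::vector<std::thread> workers;
  const auto start = std::chrono::steady_clock::now();

  for (unsigned t = 0; t < num_threads; ++t) {
    // Each worker writes only its own slot, so no extra locking is needed.
    workers.emplace_back([&counts, t, iterations_per_thread] {
      counts[t] = churn(iterations_per_thread);
    });
  }
  for (auto& w : workers) w.join();

  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  std::size_t total = 0;
  for (std::size_t c : counts) total += c;
  return {total, elapsed.count()};
}
```

To compare allocators, one would build the same program twice with pthreads enabled, once with the default allocator and once with ``-sMALLOC=mimalloc``, and observe how `seconds` changes as `num_threads` grows. Actual numbers depend on the machine and the build configuration.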

src/settings.js

Lines changed: 3 additions & 0 deletions
@@ -108,6 +108,9 @@ var STACK_SIZE = 64*1024;
 // * emmalloc-verbose - use emmalloc with assertions + verbose logging.
 // * emmalloc-memvalidate-verbose - use emmalloc with assertions + heap
 // consistency checking + verbose logging.
+// * mimalloc - a powerful multithreaded allocator. This is recommended in
+// large applications that have malloc() contention, but it is
+// larger and uses more memory.
 // * none - no malloc() implementation is provided, but you must implement
 // malloc() and free() yourself.
 // dlmalloc is necessary for split memory and other special modes, and will be

system/lib/mimalloc/LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2018-2021 Microsoft Corporation, Daan Leijen
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

system/lib/mimalloc/README.emscripten

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+
+This contains mimalloc 4e50d6714d471b72b2285e25a3df6c92db944593 with
+Emscripten backend additions.
+
+Origin: https://github.com/microsoft/mimalloc
+
+For the Emscripten port design see src/prim/emscripten/prim.c
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+/* ----------------------------------------------------------------------------
+Copyright (c) 2018-2020 Microsoft Research, Daan Leijen
+This is free software; you can redistribute it and/or modify it under the
+terms of the MIT license. A copy of the license can be found in the file
+"LICENSE" at the root of this distribution.
+-----------------------------------------------------------------------------*/
+#pragma once
+#ifndef MIMALLOC_NEW_DELETE_H
+#define MIMALLOC_NEW_DELETE_H
+
+// ----------------------------------------------------------------------------
+// This header provides convenient overrides for the new and
+// delete operations in C++.
+//
+// This header should be included in only one source file!
+//
+// On Windows, or when linking dynamically with mimalloc, these
+// can be more performant than the standard new-delete operations.
+// See <https://en.cppreference.com/w/cpp/memory/new/operator_new>
+// ---------------------------------------------------------------------------
+#if defined(__cplusplus)
+#include <new>
+#include <mimalloc.h>
+
+#if defined(_MSC_VER) && defined(_Ret_notnull_) && defined(_Post_writable_byte_size_)
+// stay consistent with VCRT definitions
+#define mi_decl_new(n) mi_decl_nodiscard mi_decl_restrict _Ret_notnull_ _Post_writable_byte_size_(n)
+#define mi_decl_new_nothrow(n) mi_decl_nodiscard mi_decl_restrict _Ret_maybenull_ _Success_(return != NULL) _Post_writable_byte_size_(n)
+#else
+#define mi_decl_new(n) mi_decl_nodiscard mi_decl_restrict
+#define mi_decl_new_nothrow(n) mi_decl_nodiscard mi_decl_restrict
+#endif
+
+void operator delete(void* p) noexcept { mi_free(p); };
+void operator delete[](void* p) noexcept { mi_free(p); };
+
+void operator delete (void* p, const std::nothrow_t&) noexcept { mi_free(p); }
+void operator delete[](void* p, const std::nothrow_t&) noexcept { mi_free(p); }
+
+mi_decl_new(n) void* operator new(std::size_t n) noexcept(false) { return mi_new(n); }
+mi_decl_new(n) void* operator new[](std::size_t n) noexcept(false) { return mi_new(n); }
+
+mi_decl_new_nothrow(n) void* operator new (std::size_t n, const std::nothrow_t& tag) noexcept { (void)(tag); return mi_new_nothrow(n); }
+mi_decl_new_nothrow(n) void* operator new[](std::size_t n, const std::nothrow_t& tag) noexcept { (void)(tag); return mi_new_nothrow(n); }
+
+#if (__cplusplus >= 201402L || _MSC_VER >= 1916)
+void operator delete (void* p, std::size_t n) noexcept { mi_free_size(p,n); };
+void operator delete[](void* p, std::size_t n) noexcept { mi_free_size(p,n); };
+#endif
+
+#if (__cplusplus > 201402L || defined(__cpp_aligned_new))
+void operator delete (void* p, std::align_val_t al) noexcept { mi_free_aligned(p, static_cast<size_t>(al)); }
+void operator delete[](void* p, std::align_val_t al) noexcept { mi_free_aligned(p, static_cast<size_t>(al)); }
+void operator delete (void* p, std::size_t n, std::align_val_t al) noexcept { mi_free_size_aligned(p, n, static_cast<size_t>(al)); };
+void operator delete[](void* p, std::size_t n, std::align_val_t al) noexcept { mi_free_size_aligned(p, n, static_cast<size_t>(al)); };
+void operator delete (void* p, std::align_val_t al, const std::nothrow_t&) noexcept { mi_free_aligned(p, static_cast<size_t>(al)); }
+void operator delete[](void* p, std::align_val_t al, const std::nothrow_t&) noexcept { mi_free_aligned(p, static_cast<size_t>(al)); }
+
+void* operator new (std::size_t n, std::align_val_t al) noexcept(false) { return mi_new_aligned(n, static_cast<size_t>(al)); }
+void* operator new[](std::size_t n, std::align_val_t al) noexcept(false) { return mi_new_aligned(n, static_cast<size_t>(al)); }
+void* operator new (std::size_t n, std::align_val_t al, const std::nothrow_t&) noexcept { return mi_new_aligned_nothrow(n, static_cast<size_t>(al)); }
+void* operator new[](std::size_t n, std::align_val_t al, const std::nothrow_t&) noexcept { return mi_new_aligned_nothrow(n, static_cast<size_t>(al)); }
+#endif
+#endif
+
+#endif // MIMALLOC_NEW_DELETE_H
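The header above relies on the C++ rule that a program may supply its own definitions of the global `operator new` and `operator delete`. The following is a minimal sketch of that mechanism, assuming no mimalloc is available: it forwards to `std::malloc`/`std::free` and counts calls instead of calling `mi_new`/`mi_free`. The counters, `NewDeleteCounts`, and `count_one_object` are hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters; the real header forwards to mi_new/mi_free instead.
static std::atomic<std::size_t> g_new_calls{0};
static std::atomic<std::size_t> g_delete_calls{0};

// Replacement for the global throwing operator new.
void* operator new(std::size_t n) {
  g_new_calls.fetch_add(1, std::memory_order_relaxed);
  if (n == 0) n = 1;  // operator new must return a unique non-null pointer
  if (void* p = std::malloc(n)) return p;
  throw std::bad_alloc();
}

// Replacement for the global unsized operator delete.
void operator delete(void* p) noexcept {
  if (p) g_delete_calls.fetch_add(1, std::memory_order_relaxed);
  std::free(p);
}

// Sized variant (C++14), which some compilers select for `delete expr`.
void operator delete(void* p, std::size_t) noexcept {
  if (p) g_delete_calls.fetch_add(1, std::memory_order_relaxed);
  std::free(p);
}

struct NewDeleteCounts {
  std::size_t news;
  std::size_t deletes;
};

// Allocates and frees one int and reports how many replacement calls ran.
NewDeleteCounts count_one_object() {
  const std::size_t news_before = g_new_calls.load();
  const std::size_t deletes_before = g_delete_calls.load();

  // `volatile` keeps the compiler from eliding the new/delete pair.
  int* volatile value = new int(42);
  assert(*value == 42);
  delete value;

  return {g_new_calls.load() - news_before,
          g_delete_calls.load() - deletes_before};
}
```

As the header's own comment notes, replacement definitions like these belong in exactly one source file; defining them in several translation units would produce duplicate symbols at link time.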
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+/* ----------------------------------------------------------------------------
+Copyright (c) 2018-2020 Microsoft Research, Daan Leijen
+This is free software; you can redistribute it and/or modify it under the
+terms of the MIT license. A copy of the license can be found in the file
+"LICENSE" at the root of this distribution.
+-----------------------------------------------------------------------------*/
+#pragma once
+#ifndef MIMALLOC_OVERRIDE_H
+#define MIMALLOC_OVERRIDE_H
+
+/* ----------------------------------------------------------------------------
+This header can be used to statically redirect malloc/free and new/delete
+to the mimalloc variants. This can be useful if one can include this file on
+each source file in a project (but be careful when using external code to
+not accidentally mix pointers from different allocators).
+-----------------------------------------------------------------------------*/
+
+#include <mimalloc.h>
+
+// Standard C allocation
+#define malloc(n) mi_malloc(n)
+#define calloc(n,c) mi_calloc(n,c)
+#define realloc(p,n) mi_realloc(p,n)
+#define free(p) mi_free(p)
+
+#define strdup(s) mi_strdup(s)
+#define strndup(s,n) mi_strndup(s,n)
+#define realpath(f,n) mi_realpath(f,n)
+
+// Microsoft extensions
+#define _expand(p,n) mi_expand(p,n)
+#define _msize(p) mi_usable_size(p)
+#define _recalloc(p,n,c) mi_recalloc(p,n,c)
+
+#define _strdup(s) mi_strdup(s)
+#define _strndup(s,n) mi_strndup(s,n)
+#define _wcsdup(s) (wchar_t*)mi_wcsdup((const unsigned short*)(s))
+#define _mbsdup(s) mi_mbsdup(s)
+#define _dupenv_s(b,n,v) mi_dupenv_s(b,n,v)
+#define _wdupenv_s(b,n,v) mi_wdupenv_s((unsigned short*)(b),n,(const unsigned short*)(v))
+
+// Various Posix and Unix variants
+#define reallocf(p,n) mi_reallocf(p,n)
+#define malloc_size(p) mi_usable_size(p)
+#define malloc_usable_size(p) mi_usable_size(p)
+#define cfree(p) mi_free(p)
+
+#define valloc(n) mi_valloc(n)
+#define pvalloc(n) mi_pvalloc(n)
+#define reallocarray(p,s,n) mi_reallocarray(p,s,n)
+#define reallocarr(p,s,n) mi_reallocarr(p,s,n)
+#define memalign(a,n) mi_memalign(a,n)
+#define aligned_alloc(a,n) mi_aligned_alloc(a,n)
+#define posix_memalign(p,a,n) mi_posix_memalign(p,a,n)
+#define _posix_memalign(p,a,n) mi_posix_memalign(p,a,n)
+
+// Microsoft aligned variants
+#define _aligned_malloc(n,a) mi_malloc_aligned(n,a)
+#define _aligned_realloc(p,n,a) mi_realloc_aligned(p,n,a)
+#define _aligned_recalloc(p,s,n,a) mi_aligned_recalloc(p,s,n,a)
+#define _aligned_msize(p,a,o) mi_usable_size(p)
+#define _aligned_free(p) mi_free(p)
+#define _aligned_offset_malloc(n,a,o) mi_malloc_aligned_at(n,a,o)
+#define _aligned_offset_realloc(p,n,a,o) mi_realloc_aligned_at(p,n,a,o)
+#define _aligned_offset_recalloc(p,s,n,a,o) mi_recalloc_aligned_at(p,s,n,a,o)
+
+#endif // MIMALLOC_OVERRIDE_H
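The static redirection used by this header is plain macro substitution: once `malloc(n)` is defined as a function-like macro, every later call spelled `malloc(...)` in that source file is rewritten by the preprocessor. The sketch below shows the technique with stand-ins, assuming mimalloc itself is not available: `demo_malloc`, `demo_free`, the counters, and `redirected_pair_is_counted` are hypothetical names that play the role of `mi_malloc` and `mi_free`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for mi_malloc/mi_free that count their calls.
static std::size_t g_demo_allocs = 0;
static std::size_t g_demo_frees = 0;

static void* demo_malloc(std::size_t n) {
  ++g_demo_allocs;
  return std::malloc(n);
}

static void demo_free(void* p) {
  if (p) ++g_demo_frees;
  std::free(p);
}

// From here on, calls spelled malloc(...) and free(...) are rewritten by the
// preprocessor, in the same way the header rewrites them to mi_* functions.
#define malloc(n) demo_malloc(n)
#define free(p) demo_free(p)

// Returns true if one malloc/free pair went through the stand-ins.
bool redirected_pair_is_counted() {
  const std::size_t allocs_before = g_demo_allocs;
  const std::size_t frees_before = g_demo_frees;

  void* p = malloc(32);  // expands to demo_malloc(32)
  assert(p != nullptr);
  free(p);               // expands to demo_free(p)

  return g_demo_allocs == allocs_before + 1 &&
         g_demo_frees == frees_before + 1;
}

// Limit the redirection to the code above.
#undef malloc
#undef free
```

Because the rewrite is purely textual, it only affects source files that see the macros, and while they are active a qualified call such as `std::malloc(32)` would be rewritten too and fail to compile. Code compiled without them keeps using the original allocator, which is why the header warns against mixing pointers from different allocators.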
