mm: Allow userspace to reserve VA range for use by userspace only
Add support for ELF binaries to reserve address ranges. An address
range can be reserved at load time by adding an ELF NOTE section, or
at run time by calling mprotect() with the PROT_RESERVED flag.
Reserved ranges can later be allocated with mmap(..., MAP_FIXED, ...)
and shmat(..., SHM_REMAP). All reserved address ranges are annotated
with "[rsvd]" in /proc/<pid>/maps output. A binary can check whether
the kernel supports VA range reservation by checking the value of
the auxiliary vector entry AT_VA_RESERVATION.
VA reservation is done by adding a special NOTE section to the binary
using declarations similar to the following:
    .section .note.rsvd_range, "a", @note
    .p2align 2
    .long 1f - 0f          # name size (not including padding)
    .long 3f - 2f          # desc size (not including padding)
    .long 0x07c10001       # note type
0:  .asciz "Reserved VA"   # name
1:  .p2align 2
2:  .quad 0x7f2000000000   # range 1 start
    .quad 0x7f2000e00000   # range 1 end
    .quad 0x7f5000200000   # range 2 start
    .quad 0x7f500d000000   # range 2 end
3:  .p2align 2
Each reserved range is specified as a pair of addresses (start and
end). This note section is read by the kernel ELF loader, and the
address ranges are reserved for the lifetime of the process. A maximum
of 64 such pairs can be placed in the NOTE section. Execution of a
binary file with more than 64 pairs of addresses in this note section
will be terminated with ENOEXEC.
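
The descriptor layout described above can be decoded roughly as
follows. This is a userspace sketch, not the kernel's loader code:
the names parse_rsvd_desc(), struct va_range and MAX_RSVD_RANGES are
hypothetical, the payload is assumed to be in host byte order, and the
-EINVAL checks for odd-sized payloads and inverted ranges are
assumptions that the text does not state.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RSVD_RANGES 64   /* limit stated in the commit text */

struct va_range {
    uint64_t start;
    uint64_t end;
};

/* Decode a note descriptor made of 64-bit (start, end) pairs into out[],
 * which must have room for MAX_RSVD_RANGES entries.
 * Returns the number of ranges, or a negative errno value. */
static int parse_rsvd_desc(const void *desc, size_t desc_size,
                           struct va_range *out)
{
    size_t n, i;

    if (desc_size % sizeof(struct va_range) != 0)
        return -EINVAL;      /* assumption: a dangling half pair is rejected */

    n = desc_size / sizeof(struct va_range);
    if (n > MAX_RSVD_RANGES)
        return -ENOEXEC;     /* more than 64 pairs, per the commit text */

    for (i = 0; i < n; i++) {
        /* memcpy avoids alignment assumptions about the note payload */
        memcpy(&out[i], (const char *)desc + i * sizeof(out[i]),
               sizeof(out[i]));
        if (out[i].start >= out[i].end)
            return -EINVAL;  /* assumption: empty/inverted ranges rejected */
    }
    return (int)n;
}
```

Applied to the four .quad values in the example note, this yields two
ranges.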
NOTE: The kernel cannot guarantee that all VA ranges in the NOTE
section will be reserved. If an address range is valid but is already
in use (possibly by a shared library loaded earlier), execution of
the binary will be terminated with ENOMEM.
NOTE: This feature needs two VMA flag bits. There are no free bits
available in the lower 32 bits. As a result, this feature can only be
supported on architectures that support high VMA flag bits (bits
32-63).
NOTE: Due to limitations in the implementation, when mapping a
range over one or more reserved ranges, the range must be
entirely contained within a reserved range or a contiguous set of
reserved ranges. mmap() will fail and set errno to EINVAL if
the range to map is only partly reserved.
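
The containment rule can be modelled in userspace as a simple
interval check. The sketch below is illustrative only and is not the
kernel implementation: classify_request(), struct rsvd_range and the
enum values are hypothetical names, ranges are assumed to be half-open
[start, end), and the reserved list is assumed to be sorted by start
address and non-overlapping. RSVD_PARTIAL corresponds to the case in
which mmap() would fail with EINVAL.

```c
#include <stddef.h>
#include <stdint.h>

struct rsvd_range {
    uint64_t start;   /* inclusive */
    uint64_t end;     /* exclusive (assumption) */
};

enum rsvd_cover {
    RSVD_NONE,        /* request touches no reserved range */
    RSVD_FULL,        /* request lies inside contiguous reserved ranges */
    RSVD_PARTIAL      /* request is only partly reserved */
};

/* Classify the request [start, end) against reserved ranges r[0..n),
 * which must be sorted by start address and must not overlap. */
static enum rsvd_cover classify_request(const struct rsvd_range *r, size_t n,
                                        uint64_t start, uint64_t end)
{
    uint64_t covered_to = start;  /* [start, covered_to) is known reserved */
    int overlaps = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (r[i].end <= start || r[i].start >= end)
            continue;             /* no intersection with the request */
        overlaps = 1;
        if (r[i].start > covered_to)
            return RSVD_PARTIAL;  /* unreserved hole inside the request */
        if (r[i].end > covered_to)
            covered_to = r[i].end;
    }
    if (!overlaps)
        return RSVD_NONE;
    return covered_to >= end ? RSVD_FULL : RSVD_PARTIAL;
}
```

Two adjacent reserved ranges count as one contiguous set here, so a
request that straddles their shared boundary is classified as fully
reserved.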
-----------------------------
Upstream status of this patch
-----------------------------
This patch will not be submitted upstream. It solves a specific
problem for the database (DB) in a way that mostly works for DB. It
will be very difficult to get this patch accepted upstream. There are
two issues that make it difficult: (1) these changes do not solve the
problem fully, and (2) there are other ways to solve this problem
without kernel changes.
The reason these changes do not solve this problem fully is that the
kernel does not get called to load the ELF binary until after the
loader has already loaded all the libraries. It is possible that one
of the libraries might get loaded at the address we want to reserve,
and by the time the kernel gets a chance to reserve the address range,
it is already too late.
There are three other ways to solve this problem besides modifying
the kernel:
1. Create a binary that gets started before DB starts, reserves
address ranges using mmap(MAP_FIXED) and then launches DB as a child
process with the address ranges reserved. This still leaves open the
possibility that address ranges were already consumed by libraries,
but I believe this is roughly the solution used on Solaris.
2. Use LD_PRELOAD to preload a special library which reserves
address ranges in its init routine. When DB starts, it can call
into this special library and get addresses for all the address
ranges that have been reserved. This can work conceptually, but it
needs to be prototyped and tested to see whether LD_PRELOAD libraries
get loaded before other libraries and whether address reservation
using mmap in the special library survives.
3. A custom loader which reserves address ranges before
loading any other libraries. This is the only solution that can
guarantee DB will get the address ranges it wants to reserve.
This feature became more or less a requirement for DB to be able to
enable ASLR, which customers were asking for. A custom loader can
provide a potential solution even with ASLR.
Orabug: 30135230
Signed-off-by: Khalid Aziz <[email protected]>
Signed-off-by: Anthony Yznaga <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Reviewed-by: Khalid Aziz <[email protected]>