Commit 59893b3

sirlucjan authored and opsiff committed
FROMEXT: sched: import bore v6.6.1 from cachyos
bore inclusion category: feature

[Overview]
BORE (Burst-Oriented Response Enhancer) is an enhanced version of the EEVDF (Earliest Eligible Virtual Deadline First) Linux scheduler. It aims to maintain the scheduler's high performance while delivering resilient responsiveness to user input under as wide a range of load scenarios as possible.

To achieve this, BORE introduces a dimension of flexibility known as "burstiness" for each individual task, partially departing from CFS's inherent "complete fairness" principle. Burstiness refers to the score derived from the accumulated CPU time a task consumes after explicitly relinquishing the CPU by sleeping, waiting for I/O, or yielding. This score represents a broad range of temporal characteristics, spanning from nanoseconds to hundreds of seconds, and varies across tasks.

Leveraging this burstiness metric, BORE dynamically adjusts scheduling properties such as weights and delays for each task. Consequently, in systems experiencing diverse types of load, BORE prioritizes tasks that require high responsiveness, improving overall system responsiveness and the user experience.

[How it works]
* The scheduler tracks each task's burst time, which is the amount of CPU time the task has consumed since it last yielded, slept, or waited for I/O.
* While a task is active, its burst score is continuously calculated by taking the bit count of its normalized burst time and adjusting it using a pre-configured offset and coefficient.
* The burst score functions similarly to "niceness" and takes a value in the range 0-39. For each decrease in value by 1, the task can consume an approximately 1.25x longer timeslice.
* This process acts as a radix conversion from binary logarithm to common logarithm, converting between two different magnitudes (a nanoseconds-to-minutes timescale to a roughly 0.01-100x scale) dimensionlessly.
* As a result, less "greedy" tasks are given more timeslice and wakeup preemption aggressiveness, while greedier tasks that yield their timeslice less frequently are weighted less.
* The burst score of newly spawned processes is calculated in a unique way to prevent tasks like "make" from overwhelming interactive tasks by forking many CPU-hungry children.
* The final effect is an equilibrium between opposing greedy and weak tasks (usually CPU-bound batch tasks) and modest and strong tasks (usually I/O-bound interactive tasks), providing a more responsive user experience when various types of workloads coexist.

[Tunables]

[Example]
`$ sudo sysctl -w kernel.sched_bore=1`

[sched_bore (range: 0 - 1, default: 1)]
1: Enables the BORE mechanism.
0: Disables the BORE mechanism.

[sched_burst_cache_lifetime (range: 0 - 4294967295, default: 75000000)]
How many nanoseconds to keep cached the on-fork calculated average burst time of each task's child tasks. Increasing this value results in less frequent recalculation of the average burst time, at the cost of coarser-grained (lower time resolution) on-fork burst time adjustments.

[sched_burst_inherit_type (range: 0 - 2, default: 2)]
0: Disables the inheritance of the average child burst time from ancestor processes.
1: Enables the inheritance from parent processes.
2: Enables the inheritance of the average child burst time from ancestor processes using a topological hub/stub style hierarchy tree, rather than the traditional parent-to-child style. When this feature is enabled, nodes with only one child process are ignored when finding and calculating ancestor/descendant processes for inheritance. Enabling this feature may improve system responsiveness in situations with massive process forking, such as kernel builds.

[sched_burst_penalty_offset (range: 0 - 63, default: 24)]
How many bits to subtract from the burst time bit count when calculating the burst score. Increasing this value prevents tasks with shorter burst times from being too strong. Increasing this value also lengthens the effective burst time range.

[sched_burst_penalty_scale (range: 0 - 4095, default: 1536)]
How strongly tasks are discriminated according to their burst time ratio, scaled in 1/1024 of its precursor value. Increasing this value makes the burst score grow rapidly as the burst time grows. That means tasks that run longer without sleeping, yielding, or waiting for I/O rapidly lose their power against those that run for shorter periods. Decreasing it has the opposite effect.

[sched_burst_smoothness (range: 0 - 3, default: 1)]
A task's actual burst score is the larger of its latest calculated score and its "historical" score, which inherits past score(s). This is done to smooth the user experience under "burst spike" situations. Every time the burst score is updated (when the task is dequeued or yields), its historical score is also updated by mixing burst_time into the previous history in an exponential moving average style. burst_smoothness=1 means no smoothening.

Link: https://github.com/firelzrd/bore-scheduler/blob/178fd0a4bec8ad7b66facdee879602eb157c17a1/README.md
Link: https://github.com/firelzrd/bore-scheduler/blob/178fd0a4bec8ad7b66facdee879602eb157c17a1/patches/stable/linux-6.18-bore/0001-linux6.18.3-bore-6.6.1.patch
Link: https://github.com/CachyOS/kernel-patches/blob/fd1f0c2b7020e735de3981dcceada77fe1305580/6.18/sched/0001-bore.patch
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
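The score calculation described under [How it works] can be sketched in userspace C. This is a minimal illustration built only from the prose above, not the code in kernel/sched/bore.c: it assumes an integer-only bit count (any fractional precision and the "normalization" step are ignored), and the names `penalty_offset`, `penalty_scale`, `bit_count` and `burst_score_sketch` are hypothetical stand-ins chosen for this example.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the sysctl tunables, set to the documented defaults. */
static const uint32_t penalty_offset = 24;   /* sched_burst_penalty_offset */
static const uint32_t penalty_scale  = 1536; /* sched_burst_penalty_scale, in 1/1024 units */

#define MAX_SCORE 39u /* the commit message gives the score range as 0-39 */

/* Number of significant bits in v: 0 for v == 0, otherwise floor(log2(v)) + 1. */
static uint32_t bit_count(uint64_t v)
{
	uint32_t bits = 0;

	while (v) {
		bits++;
		v >>= 1;
	}
	return bits;
}

/*
 * Assumed simplification: subtract the offset from the bit count of the
 * burst time, scale the remainder by penalty_scale / 1024, and clamp the
 * result to the 0-39 range.
 */
static uint32_t burst_score_sketch(uint64_t burst_time_ns)
{
	uint32_t bits = bit_count(burst_time_ns);
	uint32_t excess = bits > penalty_offset ? bits - penalty_offset : 0;
	uint32_t score = excess * penalty_scale / 1024;

	return score > MAX_SCORE ? MAX_SCORE : score;
}
```

Under these assumptions and the default tunables, a 1 ms burst (20 significant bits) stays below the offset and scores 0, while a 1 s burst (30 significant bits) scores 9. Raising the offset moves the point at which penalties begin, and raising the scale steepens the growth, which matches the tunable descriptions above.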
1 parent aa48928 commit 59893b3

14 files changed (+743, -15 lines changed)

include/linux/sched.h

Lines changed: 34 additions & 0 deletions
@@ -817,6 +817,37 @@ struct kmap_ctrl {
 #endif
 };
 
+#ifdef CONFIG_SCHED_BORE
+#define BORE_BC_TIMESTAMP_SHIFT 16
+
+struct bore_bc {
+	union {
+		struct {
+			u64 timestamp: 48;
+			u64 penalty: 16;
+		};
+		u64 value;
+	};
+};
+
+struct bore_ctx {
+	u64 burst_time;
+	u16 prev_penalty;
+	u16 curr_penalty;
+	union {
+		u16 penalty;
+		struct {
+			u8 _;
+			u8 score;
+		};
+	};
+	bool stop_update;
+	bool futex_waiting;
+	struct bore_bc subtree;
+	struct bore_bc group;
+} ____cacheline_aligned;
+#endif /* CONFIG_SCHED_BORE */
+
 struct task_struct {
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 	/*
@@ -875,6 +906,9 @@ struct task_struct {
 #ifdef CONFIG_SCHED_CLASS_EXT
 	struct sched_ext_entity scx;
 #endif
+#ifdef CONFIG_SCHED_BORE
+	struct bore_ctx bore;
+#endif /* CONFIG_SCHED_BORE */
 	const struct sched_class *sched_class;
 
 #ifdef CONFIG_SCHED_CORE
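The anonymous union in struct bore_ctx overlays a 16-bit `penalty` with two bytes, the second of which is named `score`. One plausible reading, stated here as an assumption rather than something the patch documents, is that on a little-endian machine `score` aliases the high byte of `penalty`, so the penalty behaves like an 8.8 fixed-point value whose integer part is the score. The userspace sketch below mirrors only that union; `penalty_view`, `frac` and the helper functions are hypothetical names for illustration.

```c
#include <stdint.h>

/* Userspace mirror of the penalty/score union in struct bore_ctx. */
union penalty_view {
	uint16_t penalty;
	struct {
		uint8_t frac;  /* named "_" in the patch; low byte on little-endian */
		uint8_t score; /* high byte on little-endian */
	};
};

/* Read the score through the union, as the overlay in the patch would. */
static uint8_t score_via_union(uint16_t penalty)
{
	union penalty_view v = { .penalty = penalty };

	return v.score;
}

/* Endianness-independent equivalent: the integer part of an 8.8 value. */
static uint8_t score_via_shift(uint16_t penalty)
{
	return (uint8_t)(penalty >> 8);
}

/* Returns 1 when the least significant byte is stored first. */
static int is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}
```

On a little-endian host, `score_via_union(0x1280)` and `score_via_shift(0x1280)` both yield 0x12. On a big-endian host the two bytes would swap roles, so the union view and the shift would disagree.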

include/linux/sched/bore.h

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+#ifndef _KERNEL_SCHED_BORE_H
+#define _KERNEL_SCHED_BORE_H
+
+#include <linux/sched.h>
+#include <linux/sched/cputime.h>
+#include <linux/atomic.h>
+#include <linux/list.h>
+#include <linux/rcupdate.h>
+
+#define SCHED_BORE_AUTHOR "Masahito Suzuki"
+#define SCHED_BORE_PROGNAME "BORE CPU Scheduler modification"
+
+#define SCHED_BORE_VERSION "6.6.1"
+
+extern u8 __read_mostly sched_bore;
+extern u8 __read_mostly sched_burst_inherit_type;
+extern u8 __read_mostly sched_burst_smoothness;
+extern u8 __read_mostly sched_burst_penalty_offset;
+extern uint __read_mostly sched_burst_penalty_scale;
+extern uint __read_mostly sched_burst_cache_lifetime;
+
+extern u8 effective_prio_bore(struct task_struct *p);
+extern void update_curr_bore(struct task_struct *p, u64 delta_exec);
+extern void restart_burst_bore(struct task_struct *p);
+extern void restart_burst_rescale_deadline_bore(struct task_struct *p);
+extern void task_fork_bore(struct task_struct *p, struct task_struct *parent,
+	u64 clone_flags, u64 now);
+extern void sched_init_bore(void);
+extern void reset_task_bore(struct task_struct *p);
+
+extern int sched_bore_update_handler(const struct ctl_table *table,
+	int write, void __user *buffer, size_t *lenp, loff_t *ppos);
+extern int sched_burst_inherit_type_update_handler(const struct ctl_table *table,
+	int write, void __user *buffer, size_t *lenp, loff_t *ppos);
+
+extern void reweight_entity(
+	struct cfs_rq *cfs_rq, struct sched_entity *se, unsigned long weight);
+
+#endif /* _KERNEL_SCHED_BORE_H */

init/Kconfig

Lines changed: 17 additions & 0 deletions
@@ -1423,6 +1423,23 @@ config CHECKPOINT_RESTORE
 
 	  If unsure, say N here.
 
+config SCHED_BORE
+	bool "Burst-Oriented Response Enhancer"
+	default y
+	help
+	  In Desktop and Mobile computing, one might prefer interactive
+	  tasks to keep responsive no matter what they run in the background.
+
+	  Enabling this kernel feature modifies the scheduler to discriminate
+	  tasks by their burst time (runtime since it last went sleeping or
+	  yielding state) and prioritize those that run less bursty.
+	  Such tasks usually include window compositor, widgets backend,
+	  terminal emulator, video playback, games and so on.
+	  With a little impact to scheduling fairness, it may improve
+	  responsiveness especially under heavy background workload.
+
+	  If unsure, say Y here.
+
 config SCHED_AUTOGROUP
 	bool "Automatic process group scheduling"
 	select CGROUPS

kernel/Kconfig.hz

Lines changed: 17 additions & 0 deletions
@@ -57,3 +57,20 @@ config HZ
 
 config SCHED_HRTICK
 	def_bool HIGH_RES_TIMERS
+
+config MIN_BASE_SLICE_NS
+	int "Default value for min_base_slice_ns"
+	default 2000000
+	help
+	  The BORE Scheduler automatically calculates the optimal base
+	  slice for the configured HZ using the following equation:
+
+	  base_slice_ns =
+	    1000000000/HZ * DIV_ROUNDUP(min_base_slice_ns, 1000000000/HZ)
+
+	  This option sets the default lower bound limit of the base slice
+	  to prevent the loss of task throughput due to overscheduling.
+
+	  Setting this value too high can cause the system to boot with
+	  an unnecessarily large base slice, resulting in high scheduling
+	  latency and poor system responsiveness.
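The equation in the MIN_BASE_SLICE_NS help text can be evaluated for common HZ values with a short userspace sketch. It assumes plain integer division throughout, as the help text's notation suggests; `base_slice_ns`, `div_round_up` and the local `NSEC_PER_SEC` definition are illustrative names for this example, not the kernel's own helpers.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Integer division rounding up; d must be nonzero. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * base_slice_ns = 1000000000/HZ * DIV_ROUNDUP(min_base_slice_ns, 1000000000/HZ)
 *
 * The result is the smallest whole number of ticks that covers
 * min_base_slice_ns. hz must be between 1 and NSEC_PER_SEC.
 */
static uint64_t base_slice_ns(uint64_t hz, uint64_t min_base_slice_ns)
{
	uint64_t tick_ns = NSEC_PER_SEC / hz;

	return tick_ns * div_round_up(min_base_slice_ns, tick_ns);
}
```

With the default lower bound of 2000000 ns, this gives 2000000 ns at HZ=1000 (two 1 ms ticks), 3333333 ns at HZ=300, 4000000 ns at HZ=250, and 10000000 ns at HZ=100 (one tick each). The base slice is therefore always a whole multiple of the tick period and never below the configured minimum.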

kernel/exit.c

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ static void __unhash_process(struct release_task_post *post, struct task_struct
 		detach_pid(post->pids, p, PIDTYPE_SID);
 
 		list_del_rcu(&p->tasks);
-		list_del_init(&p->sibling);
+		list_del_rcu(&p->sibling);
 		__this_cpu_dec(process_counts);
 	}
 	list_del_rcu(&p->thread_node);

kernel/fork.c

Lines changed: 10 additions & 1 deletion
@@ -116,6 +116,10 @@
 /* For dup_mmap(). */
 #include "../mm/internal.h"
 
+#ifdef CONFIG_SCHED_BORE
+#include <linux/sched/bore.h>
+#endif /* CONFIG_SCHED_BORE */
+
 #include <trace/events/sched.h>
 
 #define CREATE_TRACE_POINTS
@@ -2320,6 +2324,11 @@ __latent_entropy struct task_struct *copy_process(
 	p->start_time = ktime_get_ns();
 	p->start_boottime = ktime_get_boottime_ns();
 
+#ifdef CONFIG_SCHED_BORE
+	if (likely(p->pid))
+		task_fork_bore(p, current, clone_flags, p->start_time);
+#endif /* CONFIG_SCHED_BORE */
+
 	/*
 	 * Make it visible to the rest of the system, but dont wake it up yet.
 	 * Need tasklist lock for parent etc handling!
@@ -2393,7 +2402,7 @@ __latent_entropy struct task_struct *copy_process(
 			 */
 			p->signal->has_child_subreaper = p->real_parent->signal->has_child_subreaper ||
 							 p->real_parent->signal->is_child_subreaper;
-			list_add_tail(&p->sibling, &p->real_parent->children);
+			list_add_tail_rcu(&p->sibling, &p->real_parent->children);
 			list_add_tail_rcu(&p->tasks, &init_task.tasks);
 			attach_pid(p, PIDTYPE_TGID);
 			attach_pid(p, PIDTYPE_PGID);

kernel/futex/waitwake.c

Lines changed: 11 additions & 0 deletions
@@ -4,6 +4,9 @@
 #include <linux/sched/task.h>
 #include <linux/sched/signal.h>
 #include <linux/freezer.h>
+#ifdef CONFIG_SCHED_BORE
+#include <linux/sched/bore.h>
+#endif /* CONFIG_SCHED_BORE */
 
 #include "futex.h"
 
@@ -355,7 +358,15 @@ void futex_do_wait(struct futex_q *q, struct hrtimer_sleeper *timeout)
 		 * is no timeout, or if it has yet to expire.
 		 */
 		if (!timeout || timeout->task)
+#ifdef CONFIG_SCHED_BORE
+		{
+			current->bore.futex_waiting = true;
+#endif /* CONFIG_SCHED_BORE */
 			schedule();
+#ifdef CONFIG_SCHED_BORE
+			current->bore.futex_waiting = false;
+		}
+#endif /* CONFIG_SCHED_BORE */
 	}
 	__set_current_state(TASK_RUNNING);
 }

kernel/sched/Makefile

Lines changed: 1 addition & 0 deletions
@@ -37,3 +37,4 @@ obj-y += core.o
 obj-y += fair.o
 obj-y += build_policy.o
 obj-y += build_utility.o
+obj-$(CONFIG_SCHED_BORE) += bore.o
