hamadmarri/cacule-cpu-scheduler

sched_yield tweaks

xalt7x opened this issue · 31 comments

Some alternative schedulers have a "yield_type" tunable.
From MuQSS description:

This determines what type of yield calls to sched_yield will perform.
 0: No yield.
 1: Yield only to better priority/deadline tasks. (default)
 2: Expire timeslice and recalculate deadline.

From "Project C" (BMQ/PDS) description:

0 - No yield.
1 - Deboost and requeue task. (default)
2 - Set run queue skip task.
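Both schedulers expose the tunable through sysctl, so it can be changed at runtime. A typical session might look like this (the `kernel.yield_type` path is what MuQSS and Project C use; the file name under /etc/sysctl.d is just an example):

```shell
# Read the current yield behaviour (requires a kernel carrying the patch)
sysctl kernel.yield_type

# Temporarily switch to "no yield" (not persisted across reboots)
sudo sysctl -w kernel.yield_type=0

# Persist a value across reboots
echo 'kernel.yield_type = 1' | sudo tee /etc/sysctl.d/99-yield.conf
```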

I guess "0" is a questionable value (especially with BMQ, where my system could get stuck on a single core at the lowest frequency), but "1" is fine.

So the question is how to implement it with CFS.

In kernel/sched/core.c there's this part:
static void do_sched_yield(void)

{
	struct rq_flags rf;
	struct rq *rq;

	rq = this_rq_lock_irq(&rf);

	schedstat_inc(rq->yld_count);
	current->sched_class->yield_task(rq);

	preempt_disable();
	rq_unlock_irq(rq, &rf);
	sched_preempt_enable_no_resched();

	schedule();
}

With "yield_type=0", MuQSS and PrjC put a "return" before "rq = this_rq_lock_irq(&rf);":

static void do_sched_yield(void)
{
	struct rq *rq;
	struct rq_flags rf;

	if (!sched_yield_type)
		return;

	rq = this_rq_lock_irq(&rf);

But for "current->sched_class->yield_task(rq);" they have their own code, which they use for "yield_type=2".
So is it safe to just comment out
current->sched_class->yield_task(rq);
to get something similar to "yield_type=1" for CFS?

Hi @Alt37

Thank you for proposing this idea. I am studying the differences in yield effects. Is there any test or benchmark that can show the effects of different yield types?

Thank you

Hi @hamadmarri

Is there any test or benchmark that can show the effects of different yield types?

You can find some user feedback on CK's blog. However, the results are mixed.
http://ck-hack.blogspot.com/2016/12/linux-49-ck1-muqss-version-0150.html
http://ck-hack.blogspot.com/2017/02/linux-410-ck1-muqss-version-0152-for.html

Hi @Alt37 @hamadmarri

I am trying the following patch to replicate the do_sched_yield() from MuQSS. I am not sure if this provides similar sched_yield() behaviour to MuQSS/PDS/BMQ though.

--- a/kernel/sched/core.c	2021-07-07 22:26:52.000000000 +1000
+++ b/kernel/sched/core.c	2021-07-08 14:20:23.952787349 +1000
@@ -80,6 +80,7 @@
  */
 int sysctl_sched_rt_runtime = 950000;
 
+int sched_yield_type __read_mostly = 1;
 
 /*
  * Serialization rules:
@@ -6949,10 +6950,14 @@
 	struct rq_flags rf;
 	struct rq *rq;
 
+	if (!sched_yield_type)
+		return;
+
 	rq = this_rq_lock_irq(&rf);
 
 	schedstat_inc(rq->yld_count);
-	current->sched_class->yield_task(rq);
+	if (sched_yield_type > 1)
+		current->sched_class->yield_task(rq);
 
 	preempt_disable();
 	rq_unlock_irq(rq, &rf);
--- a/kernel/sysctl.c	2021-07-07 22:26:52.000000000 +1000
+++ b/kernel/sysctl.c	2021-07-08 14:26:24.121562132 +1000
@@ -120,6 +120,7 @@
 static int one_hundred = 100;
 static int two_hundred = 200;
 static int one_thousand = 1000;
+extern int sched_yield_type;
 #ifdef CONFIG_PRINTK
 static int ten_thousand = 10000;
 #endif
@@ -1843,6 +1844,15 @@
 		.extra1		= SYSCTL_ONE,
 	},
 #endif
+	{
+		.procname	= "yield_type",
+		.data		= &sched_yield_type,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= &two,
+	},
 #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
 	{
 		.procname	= "sched_energy_aware",

Hi @Alt37 @raykzhao

I have read about the yield issue, and also read Con's approach, where he gives more options for how to deal with yield.

Probably yield = 0 is almost the best solution, but I have a different idea which can take advantage of yield = 0 (i.e. no yield) while still yielding when there is more than one task in the runqueue. For example, if the current task asks to yield while it is the only task in the runqueue, then do nothing, as yield = 0 does. However, if there are other tasks, then don't lock and don't call schedule(); only mark the current task with a label or a mask such as curr->yield_asked = 1 <- or any other approach, such as adding another state besides RUNNABLE.

Here are the algorithms:

yield function

  • if a runqueue has more tasks than the current (which asked to yield)
    • mark current with YIELD
  • else, do nothing

check_preempt function

  • if current is marked with YIELD
    • then preempt
  • else, do the normal checks

_enqueue_entity function

  • if task is marked with YIELD
    • place it at the tail (the end of the queue), and clear the YIELD mark
  • else, do the normal enqueuing

pick_next function

  • if the curr task (which is not in the queue yet, because it has been dequeued when it was picked) is marked with YIELD
    • then do not compare with the se at the runqueue head (i.e. pick se and preempt curr)
    • unless the head is NULL, in which case just pick curr <- which is very rare, since check_preempt already asked to resched_curr when there were more tasks in the runqueue.
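The steps above can be sketched in user space. This is only an illustration of the flow: the struct, the `yield_marked` flag, and the helper names are all made up here; the kernel's actual task_struct and CacULE code look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative task with a yield flag, not the kernel's task_struct. */
struct task {
	int id;
	bool yield_marked;
};

/* yield: mark current only if other tasks are queued; otherwise no-op. */
static void yield_task(struct task *curr, size_t nr_other_queued)
{
	if (nr_other_queued > 0)
		curr->yield_marked = true;
}

/* check_preempt: a marked current always allows preemption. */
static bool should_preempt(const struct task *curr)
{
	return curr->yield_marked; /* else: the normal checks (omitted) */
}

/* pick_next: skip a marked curr when a queue head exists; clear the mark. */
static struct task *pick_next(struct task *curr, struct task *head)
{
	if (curr->yield_marked) {
		curr->yield_marked = false; /* mark is consumed */
		if (head)
			return head;        /* do not compare; preempt curr */
	}
	return curr ? curr : head;          /* normal pick (simplified) */
}
```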

Please let me know your opinion on this approach.

Thank you

We also can adjust the algorithm a bit to make it work with RDB as the following:

If current task is marked with YIELD (with the modification of "if it is the only task in the runqueue") then try to pull some tasks from other runqueues.

That COULD be an improvement for RDB.
I'm ready to test it as a daily RDB user.

@hamadmarri thanks for the explanation; this sounds like a great solution. I remember a game where different yield types (in PDS, BMQ, MuQSS) had a great impact. There have probably been lots of optimisations in the kernel/wine since then, but I will gladly test it if you have it ready. I'm mostly running without RDB, but will test both again.

I am thinking that simply bitwise-ORing the vruntime with
0x800000.... as a yield mark will have the same impact as the algorithms I posted.
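A tiny user-space check of that trick, assuming comparisons treat vruntime as an unsigned 64-bit key. The constants mirror the YIELD_MARK/YIELD_UNMARK values used in the patch; the helper functions are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define YIELD_MARK	0x8000000000000000ULL	/* set the top bit */
#define YIELD_UNMARK	0x7FFFFFFFFFFFFFFFULL	/* clear the top bit */

/* ORing in the top bit makes a marked vruntime larger than any unmarked
 * one, so the marked task loses every "smaller key wins" comparison;
 * ANDing with YIELD_UNMARK restores the original value exactly. */
static uint64_t mark_yield(uint64_t vruntime)
{
	return vruntime | YIELD_MARK;
}

static uint64_t unmark_yield(uint64_t vruntime)
{
	return vruntime & YIELD_UNMARK;
}
```

Note that this only works as long as real vruntime values never reach the top bit, which is what the normalize_lifetime() unmark in the patch helps guarantee.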

Hi @Alt37 @raykzhao @ptr1337 @JohnyPeaN

Could you please test this patch on top of any cacule (the patch should work with any cacule version, and it should also work with RDB)?

Notice that yield_type = 0 means the patch is disabled (i.e. just like normal CFS), and yield_type = 1 means the changes are enabled (i.e. the modifications to the normal CFS yield are active).

So it is quite the opposite of Con's patch, where yield_type=0 modifies the normal yield.

commit 02befde445c400f6db2e9943684c4bb405024743
Author: Hamad Al Marri <hamad@cachyos.org>
Date:   Mon Jul 12 20:50:42 2021 +0300

    yield_rework

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 5a66fc5826fc..d17e2d60a6ab 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -36,6 +36,7 @@ extern unsigned int sysctl_sched_wakeup_granularity;
 extern unsigned int interactivity_factor;
 extern unsigned int interactivity_threshold;
 extern unsigned int cacule_max_lifetime;
+extern int sched_yield_type;
 #endif
 
 enum sched_tunable_scaling {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 88131d66856f..b14b172a4f13 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -82,6 +82,10 @@ const_debug unsigned int sysctl_sched_nr_migrate = 32;
  */
 unsigned int sysctl_sched_rt_period = 1000000;
 
+#ifdef CONFIG_CACULE_SCHED
+int __read_mostly sched_yield_type = 1;
+#endif
+
 __read_mostly int scheduler_running;
 
 /*
@@ -6968,6 +6972,15 @@ static void do_sched_yield(void)
 	struct rq_flags rf;
 	struct rq *rq;
 
+#ifdef CONFIG_CACULE_SCHED
+	struct task_struct *curr = current;
+	struct cacule_node *cn = &curr->se.cacule_node;
+
+	if (sched_yield_type) {
+		cn->vruntime |= YIELD_MARK;
+		return;
+	}
+#endif
 	rq = this_rq_lock_irq(&rf);
 
 	schedstat_inc(rq->yld_count);
@@ -7136,6 +7149,12 @@ int __sched yield_to(struct task_struct *p, bool preempt)
 	unsigned long flags;
 	int yielded = 0;
 
+// not sure about yield_to
+//#ifdef CONFIG_CACULE_SCHED
+	//if (sched_yield_type)
+		//return 0;
+//#endif
+
 	local_irq_save(flags);
 	rq = this_rq();
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 162395e3fda2..56585f578ad1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1072,7 +1072,7 @@ static void update_tg_load_avg(struct cfs_rq *cfs_rq)
 static void normalize_lifetime(u64 now, struct sched_entity *se)
 {
 	struct cacule_node *cn = &se->cacule_node;
-	u64 max_life_ns, life_time;
+	u64 max_life_ns, life_time, old_hrrn_x;
 	s64 diff;
 
 	/*
@@ -1085,8 +1085,12 @@ static void normalize_lifetime(u64 now, struct sched_entity *se)
 	diff		= life_time - max_life_ns;
 
 	if (diff > 0) {
+		// unmark YIELD. No need to check or remark since
+		// this normalize action doesn't happen very often
+		cn->vruntime &= YIELD_UNMARK;
+
 		// multiply life_time by 1024 for more precision
-		u64 old_hrrn_x	= (life_time << 7) / ((cn->vruntime >> 3) | 1);
+		old_hrrn_x = (life_time << 7) / ((cn->vruntime >> 3) | 1);
 
 		// reset life to half max_life (i.e ~15s)
 		cn->cacule_start_time = now - (max_life_ns >> 1);
@@ -4919,6 +4923,9 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 		/* in !on_rq case, update occurred at dequeue */
 		update_load_avg(cfs_rq, prev, 0);
 	}
+
+	prev->cacule_node.vruntime &= YIELD_UNMARK;
+
 	cfs_rq->curr = NULL;
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0affe3be7c21..ff9ebf5da738 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -159,6 +159,11 @@ extern void call_trace_sched_update_nr_running(struct rq *rq, int count);
  */
 #define RUNTIME_INF		((u64)~0ULL)
 
+#ifdef CONFIG_CACULE_SCHED
+#define YIELD_MARK	0x8000000000000000ULL
+#define YIELD_UNMARK	0x7FFFFFFFFFFFFFFFULL
+#endif
+
 static inline int idle_policy(int policy)
 {
 	return policy == SCHED_IDLE;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e8cdedf74fed..e1146d89ef9e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1758,6 +1758,15 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "yield_type",
+		.data		= &sched_yield_type,
+		.maxlen		= sizeof (int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= &one_ul,
+	},
 #endif
 #ifdef CONFIG_SCHEDSTATS
 	{

yield-rework.zip

Thanks

I've built the RDB and normal CacULE kernels for Arch Linux with my general patches and the yield patches; they are built with -march=x86-64-v3.

Can be found here:

https://aur.cachyos.org/?dir=cachyos-aur/x86_64_v3

@hamadmarri

Could you please test this patch on top of any cacule

Unfortunately, on my machine the system feels unbalanced (input latency is unexpectedly more noticeable) when yield is completely disabled (similar impression with the CFS tweak, MuQSS & BMQ). "yield_type=1" of MuQSS, BMQ and possibly CFS (with @raykzhao's patch) seems best.

Notice that yield_type = 0 means the patch is disabled (i.e. just like normal CFS), and yield_type = 1 means the changes are enabled (i.e. the modifications to the normal CFS yield are active).
So it is quite the opposite of Con's patch, where yield_type=0 modifies the normal yield.

I guess it's better to have values that provide a similar effect to those found in MuQSS and BMQ/PDS, or maybe even rename the tunable. Otherwise it would confuse users who try to compare different schedulers (and especially those who set values through sysctl.conf).

@Alt37

So the patch performs worse?

@hamadmarri
At least in my case yes.

@hamadmarri
I haven't tested the game that had problems which could be alleviated with yield overrides, but so far I've noticed that this probably introduced lags. A few times even the mouse pointer froze during loading (disk activity). I will test this (switching yield, dropping caches between tests, without RDB) during the weekend to see if it is related.
Also, I was confused by the yield_type values.

Hi @hamadmarri @JohnyPeaN

Same here. I also notice lags under heavy multithreading loads, e.g. LTO using lld. I have tested CacULE without RDB and it doesn't seem to help in my case.

@raykzhao, maybe try compiling with LTO using GCC and the GCC march patch, and see whether this problem also happens.

Maybe we still need the normal yield schedule() call, while only making sure the task is not going to be re-picked.

Could you please try this fix

commit 52dc256da20c5dac596a74d728cfa7176630ce89
Author: Hamad Al Marri <hamad@cachyos.org>
Date:   Fri Jul 16 12:32:23 2021 +0300

    keep yield work but ensure yield flag is set only

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b14b172a4f13..c294c3bc2356 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6978,7 +6978,6 @@ static void do_sched_yield(void)
 
 	if (sched_yield_type) {
 		cn->vruntime |= YIELD_MARK;
-		return;
 	}
 #endif
 	rq = this_rq_lock_irq(&rf);

yield-rework-fix1.zip

@hamadmarri
With "yield-rework-fix1.patch" applied on top of "cacule-5.13.patch" + "yield_rework.patch", latency is definitely better than the previous attempt and "yield_type=0".
Thank you.

Hi @Alt37 @hamadmarri

Unfortunately it seems that the new fix doesn't help the heavy multithreading load case under RDB. When I switched to kernel.yield_type=0 under such load, my system suddenly became more responsive.

I will try to see how CacULE+yield_rework+fix without RDB performs.

Hi @raykzhao

Are you using RDB + autogroup?

Hi @raykzhao

Are you using RDB + autogroup?

No. I'm not using autogroup. I also tested the fix without RDB and got similar issues (w/o autogroup).

By the way, I am using CONFIG_PREEMPT_VOLUNTARY=y right now. Not sure if sched_yield would work differently between CONFIG_PREEMPT_VOLUNTARY=y and CONFIG_PREEMPT=y.

What kind of CPU load did you test? Compiling?
I am trying to see if I/O is related to yield.

Thank you

By the way, I am using CONFIG_PREEMPT_VOLUNTARY=y right now. Not sure if sched_yield would work differently between CONFIG_PREEMPT_VOLUNTARY=y and CONFIG_PREEMPT=y.

It could be, but I don't think the changes I made to yield are related to preempt.

What kind of CPU load did you test? Compiling?
I am trying to see if I/O is related to yield.

Thank you

Yes, the lag happened during compiling, especially during the LTO linking. In my case I think I/O could also be a factor, since I am using XFS on top of dm-crypt on my laptop, and both XFS and dm-crypt spawn lots of kworkers during compilation.

I would like to test if there would be a difference between XFS and ext4 during compilation workload. However, due to the COVID restrictions in my city, I cannot access the workstation with ext4 in my workplace right now.

Hi @hamadmarri,

With RDB+autogroup, the yield-rework patch with the fix works well on my laptop. It seems that the yield rework may work best on CONFIG_SCHED_AUTOGROUP=y builds.

Thanks.

Hi @Alt37 @raykzhao @JohnyPeaN @ptr1337

Should we add the yield work to CacULE, or wait for a couple more tests?

I can flip the yield_type to match the convention of Con's work to prevent confusion.

I think we may add it to CacULE, but disabled by default, i.e. keeping the original CFS behaviour. People who want to test it can enable it manually via sysctl. We may suggest enabling it together with autogroup in the suggested configuration.

Maybe with a sysctl option for the yield type.

RDB with autogroup behaved really weirdly on my AMD CPU; it didn't perform the way it normally does.

Hi @hamadmarri,

I just tested the yield rework with RDB+autogroup on the workstation with an ext4 filesystem, and somehow the system feels smoother under load than my laptop, even with a lower CONFIG_HZ. The I/O-related configurations are:
Workstation: SSHD with an ext4 filesystem, using the bfq I/O scheduler
Laptop: SSD with an xfs filesystem, using the mq-deadline I/O scheduler

Here is the difference in kernel configuration:

--- config-5.13.5-laptop	2021-07-28 14:52:32.877669998 +1000
+++ config-5.13.5-workstation	2021-07-28 14:52:26.217687389 +1000
@@ -410,7 +410,7 @@
 CONFIG_NR_CPUS_RANGE_BEGIN=2
 CONFIG_NR_CPUS_RANGE_END=512
 CONFIG_NR_CPUS_DEFAULT=64
-CONFIG_NR_CPUS=4
+CONFIG_NR_CPUS=8
 CONFIG_SCHED_SMT=y
 CONFIG_SCHED_MC=y
 # CONFIG_SCHED_MC_PRIO is not set
@@ -477,13 +477,13 @@
 CONFIG_EFI=y
 CONFIG_EFI_STUB=y
 CONFIG_EFI_MIXED=y
-# CONFIG_HZ_100 is not set
+CONFIG_HZ_100=y
 # CONFIG_HZ_250 is not set
 # CONFIG_HZ_300 is not set
-CONFIG_HZ_500=y
+# CONFIG_HZ_500 is not set
 # CONFIG_HZ_1000 is not set
 # CONFIG_HZ_2000 is not set
-CONFIG_HZ=500
+CONFIG_HZ=100
 CONFIG_SCHED_HRTICK=y
 # CONFIG_KEXEC is not set
 # CONFIG_KEXEC_FILE is not set
@@ -503,7 +503,7 @@
 # CONFIG_LEGACY_VSYSCALL_XONLY is not set
 CONFIG_LEGACY_VSYSCALL_NONE=y
 CONFIG_CMDLINE_BOOL=y
-CONFIG_CMDLINE="page_alloc.shuffle=1 nohz_full=1-3 rcupdate.rcu_expedited=1"
+CONFIG_CMDLINE="page_alloc.shuffle=1 nohz_full=1-3,5-7 rcupdate.rcu_expedited=1"
 # CONFIG_CMDLINE_OVERRIDE is not set
 # CONFIG_MODIFY_LDT_SYSCALL is not set
 CONFIG_HAVE_LIVEPATCH=y
@@ -532,7 +532,7 @@
 CONFIG_PM_TRACE_RTC=y
 CONFIG_PM_CLK=y
 CONFIG_PM_GENERIC_DOMAINS=y
-CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y
+# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
 CONFIG_PM_GENERIC_DOMAINS_SLEEP=y
 # CONFIG_ENERGY_MODEL is not set
 CONFIG_ARCH_SUPPORTS_ACPI=y
@@ -608,12 +608,12 @@
 CONFIG_CPU_FREQ_GOV_ATTR_SET=y
 CONFIG_CPU_FREQ_GOV_COMMON=y
 CONFIG_CPU_FREQ_STAT=y
-# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
+CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
 # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
 # CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
 # CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
 # CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
-CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL=y
+# CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
 CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
 CONFIG_CPU_FREQ_GOV_POWERSAVE=y
 CONFIG_CPU_FREQ_GOV_USERSPACE=y
@@ -741,7 +741,7 @@
 CONFIG_VIRTUALIZATION=y
 CONFIG_KVM=m
 CONFIG_KVM_INTEL=m
-# CONFIG_X86_SGX_KVM is not set
+CONFIG_X86_SGX_KVM=y
 CONFIG_KVM_AMD=m
 CONFIG_KVM_AMD_SEV=y
 CONFIG_KVM_XEN=y
@@ -945,7 +945,7 @@
 #
 CONFIG_MQ_IOSCHED_DEADLINE=y
 CONFIG_MQ_IOSCHED_KYBER=m
-CONFIG_IOSCHED_BFQ=m
+CONFIG_IOSCHED_BFQ=y
 CONFIG_BFQ_GROUP_IOSCHED=y
 # CONFIG_BFQ_CGROUP_DEBUG is not set
 # end of IO Schedulers
@@ -8884,14 +8884,14 @@
 CONFIG_FS_IOMAP=y
 # CONFIG_EXT2_FS is not set
 # CONFIG_EXT3_FS is not set
-CONFIG_EXT4_FS=m
+CONFIG_EXT4_FS=y
 CONFIG_EXT4_USE_FOR_EXT2=y
 CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_EXT4_FS_SECURITY=y
 # CONFIG_EXT4_DEBUG is not set
-CONFIG_JBD2=m
+CONFIG_JBD2=y
 # CONFIG_JBD2_DEBUG is not set
-CONFIG_FS_MBCACHE=m
+CONFIG_FS_MBCACHE=y
 CONFIG_REISERFS_FS=m
 # CONFIG_REISERFS_CHECK is not set
 CONFIG_REISERFS_PROC_INFO=y
@@ -8903,7 +8903,7 @@
 CONFIG_JFS_SECURITY=y
 # CONFIG_JFS_DEBUG is not set
 CONFIG_JFS_STATISTICS=y
-CONFIG_XFS_FS=y
+CONFIG_XFS_FS=m
 CONFIG_XFS_SUPPORT_V4=y
 CONFIG_XFS_QUOTA=y
 CONFIG_XFS_POSIX_ACL=y
@@ -8947,7 +8947,7 @@
 CONFIG_FILE_LOCKING=y
 # CONFIG_MANDATORY_FILE_LOCKING is not set
 CONFIG_FS_ENCRYPTION=y
-CONFIG_FS_ENCRYPTION_ALGS=m
+CONFIG_FS_ENCRYPTION_ALGS=y
 CONFIG_FS_ENCRYPTION_INLINE_CRYPT=y
 CONFIG_FS_VERITY=y
 # CONFIG_FS_VERITY_DEBUG is not set
@@ -9585,7 +9585,7 @@
 CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
 CONFIG_ARCH_USE_SYM_ANNOTATIONS=y
 CONFIG_CRC_CCITT=m
-CONFIG_CRC16=m
+CONFIG_CRC16=y
 CONFIG_CRC_T10DIF=y
 CONFIG_CRC_ITU_T=m
 CONFIG_CRC32=y
@@ -9597,7 +9597,7 @@
 CONFIG_CRC64=m
 CONFIG_CRC4=m
 CONFIG_CRC7=m
-CONFIG_LIBCRC32C=y
+CONFIG_LIBCRC32C=m
 CONFIG_CRC8=m
 CONFIG_XXHASH=y
 # CONFIG_RANDOM32_SELFTEST is not set

Hi @raykzhao

Soon I will patch cacule with the yield rework. Do you think it is a good idea to enable it by default?

Thank you

I think maybe we can enable the yield rework by default now, since the performance issue on my device is likely due to I/O and the filesystem.