Corosync Main Process Was Not Scheduled for x ms
This article explains what the Corosync log message “Corosync main process was not scheduled for x ms” means and where to look to address it.
The “Corosync main process was not scheduled for x ms” messages are logged when Corosync has not been scheduled for CPU time for more than 2 seconds (2000 ms). Corosync runs as a real-time process. Real-time processes receive the highest priority for CPU time on a Linux system, so Corosync should be scheduled promptly whenever it needs to run. If it is not, then either the system is severely overloaded or, in the case of a virtualized cluster node, the virtual machine (VM) is not getting the CPU time it requires from the hypervisor.
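To see how long each pause lasted and when it happened, the “x ms” value can be pulled out of the log lines. The following is a minimal sketch, not an official tool. The function name `find_pauses` and the sample log lines (host name, PID, timestamps, durations) are made up for illustration, and the text surrounding the message may differ between Corosync versions, so the pattern only relies on the wording quoted in this article.

```python
import re

# Matches the message quoted in this article and captures the "x ms" value.
# Any extra text between "scheduled" and "for" is tolerated, since the exact
# wording around the message is an assumption and may vary by version.
PAUSE_RE = re.compile(
    r"Corosync main process was not scheduled.*?for\s+([0-9]+(?:\.[0-9]+)?)\s*ms"
)


def find_pauses(lines):
    """Return a list of (log_line, pause_ms) for every matching line."""
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append((line.rstrip("\n"), float(match.group(1))))
    return pauses


# Hypothetical sample lines, invented for this example only.
SAMPLE_LOG = [
    "Dec 01 02:00:14 node-a corosync[1234]: Corosync main process was not scheduled for 2889.8477 ms",
    "Dec 01 02:00:15 node-a corosync[1234]: some unrelated message",
    "Dec 01 02:03:41 node-a corosync[1234]: Corosync main process was not scheduled for 5120.0000 ms",
]

if __name__ == "__main__":
    for line, pause_ms in find_pauses(SAMPLE_LOG):
        print(f"{pause_ms / 1000:.1f} s pause: {line}")
```

In practice the lines would come from the node's system log rather than `SAMPLE_LOG`. Comparing the timestamps of the matches against scheduled jobs on the node or the hypervisor can help narrow down what was consuming the CPU at the time.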
No amount of Corosync tuning will mitigate these messages. If this is a virtualized cluster, investigate the load on the hypervisor to verify that it has enough resources to host the cluster. Also check that the hypervisor is not “freezing” the cluster VMs for any reason, such as taking backups or performing automated live migrations.
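From inside a Linux VM, one rough indicator that the hypervisor is withholding CPU time is the “steal” counter on the aggregate `cpu` line of `/proc/stat`. The sketch below computes the share of steal time between two samples. The helper names (`parse_cpu_line`, `steal_percent`, `measure_steal`) and the 5-second default interval are assumptions chosen for this example, and a VM that is fully paused for a backup or migration may not show up as steal time at all, so treat the result as a hint rather than proof.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints.

    Field order: user nice system idle iowait irq softirq steal guest guest_nice
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of CPU time reported as 'steal' between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    if len(deltas) < 8:
        return 0.0  # kernel does not report a steal counter
    # Sum only the first eight fields: guest time is already counted in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def measure_steal(interval_s=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, interval_s seconds apart (Linux only)."""
    with open(path) as stat_file:
        before = parse_cpu_line(stat_file.read())
    time.sleep(interval_s)
    with open(path) as stat_file:
        after = parse_cpu_line(stat_file.read())
    return steal_percent(before, after)
```

Calling `measure_steal()` on a cluster node returns the steal percentage over the sampling interval. A persistently high value suggests the VM is competing for physical CPU on the hypervisor, which is where the investigation should continue.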
Reviewed 2020/12/01 - DGT