“If we would like to get real-time performance on single-CPU systems it is necessary to adapt the entire system, e.g. using the PREEMPT_RT patch or an RTOS. This is not always necessary in a multicore system.”
Architecture layout
Before:
lstopo
After:
sudo gedit /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=1,5"
Run sudo update-grub to regenerate the GRUB configuration, then reboot and check that the parameter was passed to the kernel:
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.15.0-43-generic root=UUID=47226ffd-864a-4f7d-9f15-779cdee4bdf3 ro quiet splash isolcpus=1,5 vt.handoff=1
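The check above can also be scripted. The following is a minimal sketch, assuming the isolcpus value is a plain CPU list (single ids and ranges such as 2-4) with no flags in front of it; the function names parse_cpu_list and isolated_cpus are made up for this example. On a live system the input would be the contents of /proc/cmdline.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "1,5" or "2-4,7" into a set of CPU ids.

    Assumption: only plain ids and ranges, no flags such as "domain".
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(cmdline):
    """Return the CPUs named by an isolcpus= parameter, or an empty set."""
    for token in cmdline.split():
        if token.startswith("isolcpus="):
            return parse_cpu_list(token[len("isolcpus="):])
    return set()


# The kernel command line shown above, used here as sample input.
sample = ("BOOT_IMAGE=/boot/vmlinuz-4.15.0-43-generic "
          "root=UUID=47226ffd-864a-4f7d-9f15-779cdee4bdf3 "
          "ro quiet splash isolcpus=1,5 vt.handoff=1")
print(sorted(isolated_cpus(sample)))  # [1, 5]
```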
Processes per CPU before CPU isolation
cpu: 0 pid: 37.0
cpu: 1 pid: 49.0
cpu: 2 pid: 48.0
cpu: 3 pid: 45.0
cpu: 4 pid: 41.0
cpu: 5 pid: 46.0
cpu: 6 pid: 45.0
cpu: 7 pid: 32.0
Processes per CPU after CPU isolation
cpu: 0 pid: 51.0
cpu: 1 pid: 8.0
cpu: 2 pid: 49.0
cpu: 3 pid: 42.0
cpu: 4 pid: 42.0
cpu: 5 pid: 8.0
cpu: 6 pid: 63.0
cpu: 7 pid: 57.0
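One way to produce per-CPU counts like these is to count the PSR column of ps, which reports the processor each process last ran on. This is only a sketch and not necessarily how the numbers above were obtained; the helper names count_per_cpu and live_counts are made up for this example, and live_counts assumes the procps version of ps.

```python
import subprocess
from collections import Counter


def count_per_cpu(psr_column):
    """Count processes per CPU from the output of `ps -eo psr=`.

    Each line of the input holds one CPU number; other lines are ignored.
    """
    counts = Counter()
    for line in psr_column.splitlines():
        value = line.strip()
        if value.isdigit():
            counts[int(value)] += 1
    return counts


def live_counts():
    """Run `ps -eo psr=` on the current machine and count processes per CPU."""
    result = subprocess.run(["ps", "-eo", "psr="],
                            capture_output=True, text=True, check=True)
    return count_per_cpu(result.stdout)


# Small made-up sample so the example does not depend on the machine it runs on.
sample_column = "0\n1\n0\n5\n0\n"
for cpu, n in sorted(count_per_cpu(sample_column).items()):
    print(f"cpu: {cpu} processes: {n}")
```

Calling live_counts() before and after adding isolcpus gives a comparable before/after picture.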
The 8 processes that remain on each of the two isolated CPUs are per-CPU kernel threads:
- cpuhp: PRI RT, NI 0
- watchdog: PRI RT, NI 0
- migration: PRI RT, NI 0
- ksoftirqd: PRI 20, NI 0
- kworker/1:0: PRI 20, NI 0
- kworker/1:0H: PRI 0, NI -20
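Since only kernel threads are left on the isolated CPUs, a user task has to be pinned there explicitly before it runs on one. A minimal sketch using Python's os.sched_setaffinity, which is available on Linux only; the wrapper name pin_current_process is made up for this example.

```python
import os


def pin_current_process(cpus):
    """Restrict the calling process to the given CPUs and return the new affinity."""
    os.sched_setaffinity(0, set(cpus))  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


# Harmless demo: re-apply the affinity the process already has.
print(sorted(pin_current_process(os.sched_getaffinity(0))))
```

On the machine described here, pin_current_process({1}) would move the process to isolated CPU 1. The call raises OSError if the requested CPU does not exist.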
Linux kernel software interrupts
stress -d 4 --hdd-bytes 20M -c 4 -i 4 -m 4 --vm-bytes 15M -t 40s
Ethernet Optimization
Network questions
- Is there one transmit queue per interface?
- Can I increase the rate at which TCP execution occurs?
- What does increasing the interrupt rate of the NIC driver do?
rx-tx optimization
/sys/class/net holds all the network interfaces. /sys/class/net/eno1/queues holds the queues for receiving and sending data over the network interface card eno1. In my case I have:
- rx-0
- tx-0
That is, one queue for receiving data (rx-0) and one for sending data (tx-0).
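The same lookup can be done from a script. This is a minimal sketch, assuming the usual sysfs layout described above; the function name list_queues and its sysfs_net parameter are made up for this example, and eno1 is the interface name on my machine.

```python
from pathlib import Path


def list_queues(interface, sysfs_net="/sys/class/net"):
    """Return (rx_queues, tx_queues) for a network interface, sorted by index."""
    queues_dir = Path(sysfs_net) / interface / "queues"
    names = [entry.name for entry in queues_dir.iterdir()]

    def by_index(name):
        # "tx-10" -> 10, so that tx-10 sorts after tx-2
        return int(name.split("-", 1)[1])

    rx = sorted((n for n in names if n.startswith("rx-")), key=by_index)
    tx = sorted((n for n in names if n.startswith("tx-")), key=by_index)
    return rx, tx


# Only query the real sysfs tree if the interface exists on this machine.
if Path("/sys/class/net/eno1/queues").is_dir():
    print(list_queues("eno1"))
```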
Low-latency Ubuntu kernel
sudo apt-get install linux-lowlatency
Low-latency vs. generic Ubuntu kernel
diff config-4.15.0-50-generic config-4.15.0-50-lowlatency
Compared with the generic config, the low-latency config has the following additions and removals:
- CONFIG_IRQ_FORCED_THREADING_DEFAULT=y
- CONFIG_TREE_RCU (removed)
- CONFIG_PREEMPT_RCU=y
- CONFIG_UNINLINE_SPIN_UNLOCK=y
- CONFIG_PREEMPT_VOLUNTARY=y (removed)
- CONFIG_PREEMPT=y
- CONFIG_PREEMPT_COUNT=y
- CONFIG_HZ_1000=y
- CONFIG_HZ=1000
- CONFIG_CEC_PIN=y (removed)
- CONFIG_CEC_GPIO=m
- CONFIG_LATENCYTOP=y
- CONFIG_PREEMPT_TRACER (removed)
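A plain diff of two kernel config files is noisy, so a small script that compares option by option can help. The following is a sketch only; parse_config and diff_configs are names made up for this example, and the two sample strings are short hand-written fragments for illustration, not the real Ubuntu config files.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; options marked "is not set" map to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = None
    return options


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs.

    An option that is absent and one that "is not set" are both treated
    as None, so they do not count as a difference.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Hand-written fragments for illustration only.
generic_sample = "CONFIG_PREEMPT_VOLUNTARY=y\n# CONFIG_PREEMPT is not set\n"
lowlatency_sample = ("# CONFIG_PREEMPT_VOLUNTARY is not set\n"
                     "CONFIG_PREEMPT=y\nCONFIG_HZ=1000\n")
for name, (before, after) in diff_configs(generic_sample, lowlatency_sample).items():
    print(name, before, "->", after)
```

With the real files, the inputs would be the contents of /boot/config-4.15.0-50-generic and /boot/config-4.15.0-50-lowlatency.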
BIOS settings
- C-State: The C-State represents the power state of a processor core and is more commonly known as the core's “idle” state. C-State values range from C0 to Cn, where n depends on the specific processor. When the core is active and executing instructions it is in the C0 state. Higher C-State numbers indicate deeper idle states.
- P-State: A voltage/frequency pair is known as a Device and Processor Performance State (P-State). P0 is the highest voltage/frequency pairing; higher-numbered P-States have lower voltage and frequency levels. The processor takes longer to complete a task in a high P-State, but consumes less energy.
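The idle states that the kernel actually exposes for a core can be inspected through the cpuidle directory in sysfs, where each state has a name and an exit latency in microseconds. A minimal sketch, assuming the cpuidle driver is loaded; the function name idle_states and its sysfs_cpu parameter are made up for this example.

```python
from pathlib import Path


def idle_states(cpu=0, sysfs_cpu="/sys/devices/system/cpu"):
    """Return [(state_name, exit_latency_us), ...] for one CPU, shallowest first.

    Returns an empty list if the CPU has no cpuidle directory.
    """
    base = Path(sysfs_cpu) / f"cpu{cpu}" / "cpuidle"
    if not base.is_dir():
        return []
    states = []
    # Directories are named state0, state1, ...; sort them by their index.
    for state_dir in sorted(base.glob("state[0-9]*"),
                            key=lambda p: int(p.name[len("state"):])):
        name = (state_dir / "name").read_text().strip()
        latency_us = int((state_dir / "latency").read_text())
        states.append((name, latency_us))
    return states


for name, latency_us in idle_states(0):
    print(f"{name}: exit latency {latency_us} us")
```

Deeper states generally show a larger exit latency, which is why they matter for real-time behaviour.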
Links
- Ethernet optimization
- Networking bonding
- C++11 thread affinity
- Improving real time properties
- How To Optimize Performance
- Linux kernel IRQ affinity is a good resource on the subject.
- SMP-affinity