Create an Open Source Ecosystem - DapuStor's Fix for an Inherent FIO Problem
FIO, the open source I/O testing tool developed by Jens Axboe, is one of the best tools available for SSD performance testing. It is very powerful and is generally available in the repositories of the major Linux distributions.
Below we briefly describe a long-standing problem in earlier versions of FIO that DapuStor recently identified and resolved while using the tool. The fix was submitted to the FIO GitHub repository and has been merged upstream:
https://github.com/axboe/fio/pull/1479
Problems Found During PCIe 5.0 SSD Performance Testing with FIO
In our daily SSD performance tests, we usually turn off hyper-threading and pin the FIO test process to specific CPUs with the cpus_allowed parameter, so that the test results are more stable and reproducible. In doing so we ran into a new problem: in certain system environments, setting cpus_allowed causes FIO to exit with an error.
The following figure shows the lscpu output of a Linux system running on an Intel i9-12900K with hyper-threading turned off.
The lscpu output shows that the On-line CPU(s) list consists of the non-consecutive IDs 0,2,4,6,8,10,12,14,16-19, for a total of 12 available CPUs.
When FIO is run with cpus_allowed=14, it reports an error and fails to start the test, with the following error message:
Solution from DapuStor Engineers
Based on the error message above, a global search for "too large" in the FIO source code shows that the message comes from the function set_cpus_allowed. Further analysis shows that this function uses sysconf(_SC_NPROCESSORS_ONLN) - 1 as the maximum CPU ID.
Checking the documentation with man sysconf, we found that _SC_NPROCESSORS_ONLN is not the right value for this check and that _SC_NPROCESSORS_CONF suits the cpus_allowed check logic better. _SC_NPROCESSORS_ONLN is the number of CPUs currently online. The system here has only 12 online CPUs, but the available CPU IDs are 0,2,4,6,8,10,12,14,16-19, so a valid CPU ID can be greater than the number of online CPUs minus 1. _SC_NPROCESSORS_CONF is the number of CPUs configured in the operating system.
To confirm the above analysis, we wrote a small test program that prints both values. On the current system, sysconf returns 12 for _SC_NPROCESSORS_ONLN and 20 for _SC_NPROCESSORS_CONF. With _SC_NPROCESSORS_CONF the maximum CPU ID becomes 19, which covers every online CPU, so it fits the check logic of the set_cpus_allowed function better.
After replacing the sysconf parameter _SC_NPROCESSORS_ONLN with _SC_NPROCESSORS_CONF, we recompiled FIO, and it ran normally with the parameter cpus_allowed=14.
Based on the above analysis, our engineers changed the FIO code and submitted a GitHub pull request to FIO at https://github.com/axboe/FIO/pull/1479. After community discussion and a cleanup of historical code, the commit was merged.
DapuStor has always supported open source technology and has joined open source communities such as AliCloud PolarDB and openEuler. Open source platforms let us benefit from the work that countless engineers have shared and contributed. As a participant in and contributor to the open source ecosystem, DapuStor actively works to promote the development of open infrastructure. We hope to keep embracing the new value of the digital age with leading technology and an open attitude, sparking more innovation with the industry and building a prosperous open ecosystem.