Knowledge: Disk Performance Hints & Tips

Use I/O Threads for best disk performance

By default, all I/O requests are handled in a single event loop within QEMU's main thread. Virtual servers with more than one virtual CPU running I/O-intensive workloads can therefore experience lock contention, leading to a noticeable performance impact.
Using separate threads for I/O event handling can significantly improve throughput of virtual disks. I/O threads must be allocated explicitly, and disks must be associated with them. The allocation of I/O threads is requested with the iothreads tag in the libvirt domain XML. For example,
      <domain>
         <iothreads>2</iothreads>
         [...]
will allocate 2 I/O threads for the QEMU process, and
      <disk type='block' device='disk'>
         <driver name='qemu' type='raw' iothread='2'/>
will assign the disk to the I/O thread number 2.
Note that the gain in I/O performance comes at the cost of CPU consumption. Therefore, the number of I/O threads and their distribution across virtual disks need to be chosen carefully.
Rules of thumb:
  • The number of I/O threads should not exceed the number of host CPUs.
  • Over-provisioning of I/O threads should be avoided: A good starting point would be to have one I/O thread for every two to three virtual disks.
  • Even a single I/O thread will instantly improve the overall I/O performance compared to default behavior and should therefore always be configured.
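
As an illustration of the rules of thumb above, the following sketch distributes four virtual disks over two I/O threads, i.e. one I/O thread for every two disks. The source device paths and target names are hypothetical placeholders, and the layout is only a possible starting point, not a tuned configuration. I/O threads are numbered starting at 1.

```xml
<domain type='kvm'>
  <!-- other domain elements omitted -->
  <!-- allocate two I/O threads for the QEMU process -->
  <iothreads>2</iothreads>
  <devices>
    <!-- disks 1 and 2 share I/O thread 1 -->
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' iothread='1'/>
      <source dev='/dev/disk/by-id/example-disk-1'/> <!-- hypothetical path -->
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' iothread='1'/>
      <source dev='/dev/disk/by-id/example-disk-2'/> <!-- hypothetical path -->
      <target dev='vdb' bus='virtio'/>
    </disk>
    <!-- disks 3 and 4 share I/O thread 2 -->
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' iothread='2'/>
      <source dev='/dev/disk/by-id/example-disk-3'/> <!-- hypothetical path -->
      <target dev='vdc' bus='virtio'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' iothread='2'/>
      <source dev='/dev/disk/by-id/example-disk-4'/> <!-- hypothetical path -->
      <target dev='vdd' bus='virtio'/>
    </disk>
  </devices>
</domain>
```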

Choosing the right AIO mode

In order to achieve the best possible throughput, QEMU performs disk I/O operations asynchronously, either
  • through a pool of userspace threads (not to be confused with I/O threads), or
  • by means of Linux kernel AIO (Asynchronous I/O).
By default, the userspace method is used, which is supposed to work in all environments. However, it is not as efficient as kernel AIO.
If the virtual disks are backed by block devices, raw file images or pre-allocated QCOW2 images, it is recommended to use kernel AIO, which can be enabled using the following libvirt XML snippet:
      <disk [...]>
         <driver name='qemu' type='raw' io='native' cache='none'/>
Note the cache='none' attribute, which should always be specified together with io='native' to prevent QEMU from falling back to userspace AIO. It might also be necessary to increase the system limit for asynchronous I/O requests; see this article.
When space-efficient image files are used (QCOW2 without pre-allocation, or sparse raw images), the default of io='threads' may be better suited. This is because writing to sectors that are not yet allocated may temporarily block the virtual CPU and thus decrease I/O performance.
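
For comparison, a disk backed by a QCOW2 image without pre-allocation could be left on the userspace thread pool by stating io='threads' explicitly. This is only a sketch: the image path and target name are hypothetical, and since io='threads' is the default, the attribute merely documents the choice.

```xml
<disk type='file' device='disk'>
  <!-- sparse QCOW2 image: stay with the userspace thread pool -->
  <driver name='qemu' type='qcow2' io='threads'/>
  <source file='/var/lib/libvirt/images/example.qcow2'/> <!-- hypothetical path -->
  <target dev='vde' bus='virtio'/>
</disk>
```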


  1. Hello Stefan,

    I noticed that my read numbers are better by running a microbenchmark, say fio. But, the write numbers aren't that great. Do you have any recommendations for boosting write numbers?


    1. Hi,
      Are you using QCOW images? These are known to be slow on first write as the underlying sectors are allocated on demand. Writes that do not demand additional allocations should perform fine.
      One way to work around this is to use the "preallocation" option (see the qemu-img man page) when creating the QEMU image.