Monday, January 28, 2019

libvirt v4.10 released, providing PCI passthrough support

libvirt v4.10, available for download at the libvirt project website, adds support for PCI passthrough devices on IBM Z (requires Linux kernel 4.14 and QEMU v2.11).
To set up passthrough for a PCI device, follow these steps:
  1. Make sure the vfio-pci module is available, e.g. using the modinfo command:
       $ modinfo vfio-pci
       filename:       /lib/modules/4.18.0/kernel/drivers/vfio/pci/vfio-pci.ko
       description:    VFIO PCI - User Level meta-driver
  2. Verify, using your distro's package manager, that the pciutils package, which provides the lspci command among others, is installed.
  3. Determine the PCI device's address using the lspci command:
       $ lspci
       0002:06:00.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family
                    [ConnectX-3/ConnectX-3 Pro Virtual Function]
  4. Add the following element to the guest domain XML's devices section:
       <hostdev mode='subsystem' type='pci' managed='yes'>
         <source>
           <address domain='0x0002' bus='0x06' slot='0x00' function='0x0'/>
         </source>
       </hostdev>

    Note that if the managed attribute is set to no (which is the default), it becomes the user's duty to unbind the PCI device from its current device driver and to rebind it to vfio-pci in the host before starting the guest.
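For the managed='no' case, the manual rebinding on the host can be sketched roughly as follows. This is a minimal sketch, not libvirt's own procedure: it assumes the standard Linux sysfs interface (the driver_override and drivers_probe files), a loaded vfio-pci module, and root privileges. The function name rebind_to_vfio and the SYSFS_ROOT variable are hypothetical and only exist to keep the example self-contained.

```shell
#!/bin/sh
# Sketch: hand a PCI device over to vfio-pci on the host (for managed='no').
# Assumes root privileges and a loaded vfio-pci module (modprobe vfio-pci).
# SYSFS_ROOT is a hypothetical override, useful for trying this out safely.
rebind_to_vfio() {
    addr=$1    # host PCI address as printed by lspci, e.g. 0002:06:00.0
    dev="${SYSFS_ROOT:-/sys}/bus/pci/devices/$addr"

    # Detach the device from its current host driver, if one is bound.
    if [ -e "$dev/driver/unbind" ]; then
        printf '%s\n' "$addr" > "$dev/driver/unbind"
    fi

    # Restrict the device to vfio-pci ...
    printf '%s\n' vfio-pci > "$dev/driver_override"
    # ... and ask the PCI core to probe it again, so that vfio-pci binds.
    printf '%s\n' "$addr" > "${SYSFS_ROOT:-/sys}/bus/pci/drivers_probe"
}
```

On the host this would be invoked as rebind_to_vfio 0002:06:00.0 before starting the guest. Alternatively, virsh nodedev-detach pci_0002_06_00_0 lets libvirt perform the detach itself.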
Once done and the guest is started, running the lspci command in the guest should show the PCI device, and one can proceed to configure it as needed.
It is well worth checking out the expanded domain XML:
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0002' bus='0x06' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0002' bus='0x00' slot='0x01' function='0x0'>
        <zpci uid='0x0001' fid='0x00000000'/>
      </address>
    </hostdev>

Theoretically, the PCI address in the guest can change between boots. However, the <zpci> element guarantees address persistence inside the guest. The actual address of the passthrough device is based solely on the uid attribute: the uid becomes the PCI domain, and all remaining parts of the address (PCI bus, slot and function) are set to zero. Therefore, in this example, the PCI address in the guest would be 0001:00:00.0.
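The mapping from uid to guest address described above can be illustrated with a small shell helper. The function name guest_pci_address is hypothetical. The result is written in the domain:bus:slot.function notation that lspci prints.

```shell
#!/bin/sh
# Sketch: derive the guest-visible PCI address from a zpci uid.
# The uid becomes the PCI domain; bus, slot and function are always zero.
guest_pci_address() {
    uid=$1    # e.g. 0x0001, as in <zpci uid='0x0001' .../>
    printf '%04x:00:00.0\n' "$uid"
}

guest_pci_address 0x0001    # prints 0001:00:00.0
```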
Take note of the fid attribute, whose value is required to hotplug/hotunplug PCI devices within a guest.
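As a rough illustration of how the fid is used, the following sketch toggles the power state of a PCI function from inside the guest. It assumes that the guest's Linux kernel exposes s390 hotplug slots under /sys/bus/pci/slots/, named after the fid written as eight hex digits; verify this on your system before relying on it. The function name zpci_set_power and the SLOTS_DIR variable are hypothetical.

```shell
#!/bin/sh
# Sketch: power a passthrough PCI function on or off from inside the guest.
# Assumption: slots appear as /sys/bus/pci/slots/<fid>, fid as 8 hex digits.
# SLOTS_DIR is a hypothetical override, useful for trying this out safely.
zpci_set_power() {
    fid=$1      # e.g. 0x00000000, as in <zpci ... fid='0x00000000'/>
    state=$2    # 1 = power on (plug), 0 = power off (unplug)
    slot=$(printf '%08x' "$fid")
    printf '%s\n' "$state" > "${SLOTS_DIR:-/sys/bus/pci/slots}/$slot/power"
}
```

For example, zpci_set_power 0x00000000 0 would unplug the device from the example above, and zpci_set_power 0x00000000 1 would plug it back in. Pinning the fid in the domain XML is what allows such a script to hardcode the slot name.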
Furthermore, note that the target PCI address (the domain, bus, slot and function attributes of the <address> element) is not visible anywhere except within the QEMU process. It is unrelated to the PCI address observed within the KVM guest and could be set to an arbitrary value. However, choosing the "wrong" values might have subtle, undesired side effects in QEMU. We therefore strongly recommend not specifying a target address and relying on auto-assignment instead. If the guest's PCI address has to be fixed, restrict the target address element to at most uid (which defines the PCI address) and fid (so that, for example, scripts in the guest that hotplug PCI devices can rely on a specific value), as follows:
   <address type='pci'>
     <zpci uid='0x0001' fid='0x00000000'/>
   </address>


For further (rather technical) details see here and here (git commit).