Configuring PCI passthrough - Virtual machines | Virtualization (2025)

To prepare a host device for PCI passthrough by using the CLI, create a MachineConfig object and add kernel arguments to enable the Input-Output Memory Management Unit (IOMMU). Bind the PCI device to the Virtual Function I/O (VFIO) driver and then expose it in the cluster by editing the permittedHostDevices field of the HyperConverged custom resource (CR). The permittedHostDevices list is empty when you first install the OpenShift Virtualization Operator.

To remove a PCI host device from the cluster by using the CLI, delete the PCI device information from the HyperConverged CR.

Adding kernel arguments to enable the IOMMU driver

To enable the IOMMU (Input-Output Memory Management Unit) driver in the kernel, create the MachineConfig object and add the kernel arguments.

Prerequisites

  • Administrative privilege to a working OpenShift Container Platform cluster.

  • Intel or AMD CPU hardware.

  • Intel Virtualization Technology for Directed I/O extensions or AMD IOMMU in the BIOS (Basic Input/Output System) is enabled.

Procedure

  1. Create a MachineConfig object that identifies the kernel argument. The following example shows a kernel argument for an Intel CPU.

    apiVersion: machineconfiguration.openshift.io/v1kind: MachineConfigmetadata: labels: machineconfiguration.openshift.io/role: worker (1) name: 100-worker-iommu (2)spec: config: ignition: version: 3.2.0 kernelArguments: - intel_iommu=on (3)...
    1Applies the new kernel argument only to worker nodes.
    2The name indicates the ranking of this kernel argument (100) among the machine configs and its purpose. If you have an AMD CPU, specify the kernel argument as amd_iommu=on.
    3Identifies the kernel argument as intel_iommu for an Intel CPU.
  2. Create the new MachineConfig object:

    $ oc create -f 100-worker-kernel-arg-iommu.yaml

Verification

  • Verify that the new MachineConfig object was added.

    $ oc get MachineConfig

Binding PCI devices to the VFIO driver

To bind PCI devices to the VFIO (Virtual Function I/O) driver, obtain the values for vendor-ID and device-ID from each device and create a list with the values. Add this list to the MachineConfig object. The MachineConfig Operator generates the /etc/modprobe.d/vfio.conf on the nodes with the PCI devices, and binds the PCI devices to the VFIO driver.

Procedure

  1. Run the lspci command to obtain the vendor-ID and the device-ID for the PCI device.

    $ lspci -nnv | grep -i nvidia

    Example output

    02:01.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] [10de:1eb8] (rev a1)
  2. Create a Butane config file, 100-worker-vfiopci.bu, binding the PCI device to the VFIO driver.

    See "Creating machine configs with Butane" for information about Butane.

    Example

    variant: openshiftversion: 4.12.0metadata: name: 100-worker-vfiopci labels: machineconfiguration.openshift.io/role: worker (1)storage: files: - path: /etc/modprobe.d/vfio.conf mode: 0644 overwrite: true contents: inline: | options vfio-pci ids=10de:1eb8 (2) - path: /etc/modules-load.d/vfio-pci.conf (3) mode: 0644 overwrite: true contents: inline: vfio-pci
    1Applies the new kernel argument only to worker nodes.
    2Specify the previously determined vendor-ID value (10de) and the device-ID value (1eb8) to bind a single device to the VFIO driver. You can add a list of multiple devices with their vendor and device information.
    3The file that loads the vfio-pci kernel module on the worker nodes.
  3. Use Butane to generate a MachineConfig object file, 100-worker-vfiopci.yaml, containing the configuration to be delivered to the worker nodes:

    $ butane 100-worker-vfiopci.bu -o 100-worker-vfiopci.yaml
  4. Apply the MachineConfig object to the worker nodes:

    $ oc apply -f 100-worker-vfiopci.yaml
  5. Verify that the MachineConfig object was added.

    $ oc get MachineConfig

    Example output

    NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE00-master d3da910bfa9f4b599af4ed7f5ac270d55950a3a1 3.2.0 25h00-worker d3da910bfa9f4b599af4ed7f5ac270d55950a3a1 3.2.0 25h01-master-container-runtime d3da910bfa9f4b599af4ed7f5ac270d55950a3a1 3.2.0 25h01-master-kubelet d3da910bfa9f4b599af4ed7f5ac270d55950a3a1 3.2.0 25h01-worker-container-runtime d3da910bfa9f4b599af4ed7f5ac270d55950a3a1 3.2.0 25h01-worker-kubelet d3da910bfa9f4b599af4ed7f5ac270d55950a3a1 3.2.0 25h100-worker-iommu 3.2.0 30s100-worker-vfiopci-configuration 3.2.0 30s

Verification

  • Verify that the VFIO driver is loaded.

    $ lspci -nnk -d 10de:

    The output confirms that the VFIO driver is being used.

    Example output

    04:00.0 3D controller [0302]: NVIDIA Corporation GP102GL [Tesla P40] [10de:1eb8] (rev a1) Subsystem: NVIDIA Corporation Device [10de:1eb8] Kernel driver in use: vfio-pci Kernel modules: nouveau

Exposing PCI host devices in the cluster using the CLI

To expose PCI host devices in the cluster, add details about the PCI devices to the spec.permittedHostDevices.pciHostDevices array of the HyperConverged custom resource (CR).

Procedure

  1. Edit the HyperConverged CR in your default editor by running the following command:

    $ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
  2. Add the PCI device information to the spec.permittedHostDevices.pciHostDevices array. For example:

    Example configuration file

    apiVersion: hco.kubevirt.io/v1kind: HyperConvergedmetadata: name: kubevirt-hyperconverged namespace: openshift-cnvspec: permittedHostDevices: (1) pciHostDevices: (2) - pciDeviceSelector: "10DE:1DB6" (3) resourceName: "nvidia.com/GV100GL_Tesla_V100" (4) - pciDeviceSelector: "10DE:1EB8" resourceName: "nvidia.com/TU104GL_Tesla_T4" - pciDeviceSelector: "8086:6F54" resourceName: "intel.com/qat" externalResourceProvider: true (5)...
    1The host devices that are permitted to be used in the cluster.
    2The list of PCI devices available on the node.
    3The vendor-ID and the device-ID required to identify the PCI device.
    4The name of a PCI host device.
    5Optional: Setting this field to true indicates that the resource is provided by an external device plugin. OpenShift Virtualization allows the usage of this device in the cluster but leaves the allocation and monitoring to an external device plugin.

    The above example snippet shows two PCI host devices that are named nvidia.com/GV100GL_Tesla_V100 and nvidia.com/TU104GL_Tesla_T4 added to the list of permitted host devices in the HyperConverged CR. These devices have been tested and verified to work with OpenShift Virtualization.

  3. Save your changes and exit the editor.

Verification

  • Verify that the PCI host devices were added to the node by running the following command. The example output shows that there is one device each associated with the nvidia.com/GV100GL_Tesla_V100, nvidia.com/TU104GL_Tesla_T4, and intel.com/qat resource names.

    $ oc describe node <node_name>

    Example output

    Capacity: cpu: 64 devices.kubevirt.io/kvm: 110 devices.kubevirt.io/tun: 110 devices.kubevirt.io/vhost-net: 110 ephemeral-storage: 915128Mi hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 131395264Ki nvidia.com/GV100GL_Tesla_V100 1 nvidia.com/TU104GL_Tesla_T4 1 intel.com/qat: 1 pods: 250Allocatable: cpu: 63500m devices.kubevirt.io/kvm: 110 devices.kubevirt.io/tun: 110 devices.kubevirt.io/vhost-net: 110 ephemeral-storage: 863623130526 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 130244288Ki nvidia.com/GV100GL_Tesla_V100 1 nvidia.com/TU104GL_Tesla_T4 1 intel.com/qat: 1 pods: 250

Removing PCI host devices from the cluster using the CLI

To remove a PCI host device from the cluster, delete the information for that device from the HyperConverged custom resource (CR).

Procedure

  1. Edit the HyperConverged CR in your default editor by running the following command:

    $ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
  2. Remove the PCI device information from the spec.permittedHostDevices.pciHostDevices array by deleting the pciDeviceSelector, resourceName and externalResourceProvider (if applicable) fields for the appropriate device. In this example, the intel.com/qat resource has been deleted.

    Example configuration file

    apiVersion: hco.kubevirt.io/v1kind: HyperConvergedmetadata: name: kubevirt-hyperconverged namespace: openshift-cnvspec: permittedHostDevices: pciHostDevices: - pciDeviceSelector: "10DE:1DB6" resourceName: "nvidia.com/GV100GL_Tesla_V100" - pciDeviceSelector: "10DE:1EB8" resourceName: "nvidia.com/TU104GL_Tesla_T4"...
  3. Save your changes and exit the editor.

Verification

  • Verify that the PCI host device was removed from the node by running the following command. The example output shows that there are zero devices associated with the intel.com/qat resource name.

    $ oc describe node <node_name>

    Example output

    Capacity: cpu: 64 devices.kubevirt.io/kvm: 110 devices.kubevirt.io/tun: 110 devices.kubevirt.io/vhost-net: 110 ephemeral-storage: 915128Mi hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 131395264Ki nvidia.com/GV100GL_Tesla_V100 1 nvidia.com/TU104GL_Tesla_T4 1 intel.com/qat: 0 pods: 250Allocatable: cpu: 63500m devices.kubevirt.io/kvm: 110 devices.kubevirt.io/tun: 110 devices.kubevirt.io/vhost-net: 110 ephemeral-storage: 863623130526 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 130244288Ki nvidia.com/GV100GL_Tesla_V100 1 nvidia.com/TU104GL_Tesla_T4 1 intel.com/qat: 0 pods: 250
Configuring PCI passthrough - Virtual machines | Virtualization (2025)
Top Articles
Latest Posts
Recommended Articles
Article information

Author: Delena Feil

Last Updated:

Views: 5980

Rating: 4.4 / 5 (65 voted)

Reviews: 80% of readers found this page helpful

Author information

Name: Delena Feil

Birthday: 1998-08-29

Address: 747 Lubowitz Run, Sidmouth, HI 90646-5543

Phone: +99513241752844

Job: Design Supervisor

Hobby: Digital arts, Lacemaking, Air sports, Running, Scouting, Shooting, Puzzles

Introduction: My name is Delena Feil, I am a clean, splendid, calm, fancy, jolly, bright, faithful person who loves writing and wants to share my knowledge and understanding with you.