[Tutorial] Enabling SR-IOV for Intel NIC (X550-T2) on Proxmox 6

The most up-to-date tutorial can be found here:
https://www.reddit.com/r/Proxmox/comments/cm81tc/tutorial_enabling_sriov_for_intel_nic_x550t2_on/

As I struggled through setting up SR-IOV with an Intel NIC (and succeeded, yay!), I decided to do a little write-up for myself and to share it. I am not an expert in Linux/KVM/networking, so my implementation might not be the best. I would be glad if you could point out anything that can be improved in the steps.
In this tutorial I try to outline the essential steps to get SR-IOV up and running by enabling virtual functions (VFs) on your NIC in a PVE system. You can set this up without even connecting to the internet. Evidently, you will need a compatible system to begin with.
First of all, here is my system for reference:
  • Athlon 200GE
  • Asrock X470 Fatality ITX
  • Intel X550-T2
I run a standalone PVE host housing a router VM (ClearOS) and a few NAS/web servers. SR-IOV is not strictly required for this, but I figured it doesn't hurt to learn how it can be used, so I bothered.
In the following, I assume you have a fresh installation of PVE. You can skip the steps you have already performed.

Part 1 - Enable IOMMU

First you want to make sure IOMMU and SR-IOV are enabled in the BIOS. Also, set ACS to Enabled in the BIOS. If you do not have these settings in the BIOS, it is highly likely your system does not support SR-IOV to begin with and little can be done. For reference, some of Asrock's consumer boards have these settings, e.g. the Asrock X470 Fatality ITX.
Here, I will follow PVE's Wiki: https://pve.proxmox.com/wiki/PCI(e)_Passthrough
To edit a config file, you can use either vi or nano, e.g. nano /etc/default/grub. In nano, press Ctrl+O then Enter to save, and Ctrl+X to exit.

Enabling IOMMU
  • edit /etc/default/grub; on the GRUB_CMDLINE_LINUX_DEFAULT line, add two more flags after the quiet flag (an example of the finished line follows this list):
  • For Intel CPU
intel_iommu=on iommu=pt
  • For AMD CPU
amd_iommu=on iommu=pt

The second flag allows higher performance when using SR-IOV VFs.
  • update grub

update-grub
  • next, add the vfio modules to /etc/modules

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
  • followed by updating initramfs

update-initramfs -u -k all
  • reboot the PVE host
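For reference, after the edit the whole line should look roughly like this on my AMD system (keep any flags that are already there, such as quiet):

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"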
Verifying IOMMU

After rebooting, you can check whether IOMMU is functioning by reading the kernel messages:

dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
It should report that IOMMU, Directed I/O or Interrupt Remapping is enabled; the exact message varies depending on hardware and kernel. On my AMD system, the latter is shown.
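The relevant line on an AMD system looks roughly like the following (treat it as an approximation, the wording differs between kernel versions; Intel systems print DMAR / Directed I/O messages instead):

AMD-Vi: Interrupt remapping enabled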
At this point you should be able to assign PCI(e) devices to guests. To pass devices through as PCIe (rather than legacy PCI), you will need to use the q35 machine type for the guest; the default i440fx machine only supports legacy PCI passthrough. If you haven't already, this is a good point to go ahead and try a pass-through to make sure it works.
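As a quick sketch of what such a test pass-through could look like from the CLI (the VM ID 100 and the PCI address 01:00.0 are placeholders for illustration; the same can be done in the GUI under the VM's Hardware tab):

qm set 100 --machine q35
qm set 100 --hostpci0 01:00.0,pcie=1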

Part 2 - Enable SR-IOV

Enabling SR-IOV is surprisingly straightforward (when it works). I am following this guide:
https://software.intel.com/en-us/articles/using-sr-iov-to-share-an-ethernet-port-among-multiple-vms
Checking the name of your NICs
First of all, you need to find out the names of the NICs you want to pass through. This can be done by checking them in the tree in the PVE GUI: click on the node --> System --> Network. In my case the X550-T2's two ports are named enp1s0f0 and enp1s0f1, respectively. Yours may be different.
Alternatively, you can see them in the terminal by executing

ip link

With the names of the NICs known, you can now test whether SR-IOV can be switched on.
  • execute the command below. Replace N with the number of VFs you want, and <name of your NIC> with the NIC name you found in the previous step.
echo N > /sys/class/net/<name of your NIC>/device/sriov_numvfs
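For example, to create four VFs on the first port of my X550-T2 (the count of 4 is just an arbitrary choice for illustration):

echo 4 > /sys/class/net/enp1s0f0/device/sriov_numvfs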

If it works, you should see no output. Otherwise, here are a few possible errors.
Debugging
  • Device busy.
    Probably VFs have already been assigned for some reason. You can try setting N to 0 first, then to the number you want.
  • -bash: echo: write error: Cannot allocate memory
    This one can be more troublesome and is related to BIOS settings. Check the kernel messages for any debugging tips:
dmesg | grep sriov

For the second error, you may try to solve it by adding to the grub command line (https://access.redhat.com/solutions/37376), after iommu=pt, the following flags:

pci_pt_e820_access=on pci=assign-busses

Reboot and try the echo command again. I have not tested the functionality of the VFs with these flags, so it might not work.
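For reference, with all the flags mentioned so far, the line would look something like this on an AMD system (run update-grub again after editing, before rebooting):

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pci_pt_e820_access=on pci=assign-busses"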
Verifying the assignment of virtual functions
Now assuming you have successfully enabled the VFs, we can check if they exist by looking them up.
  • execute
lspci -vv | grep Eth

You should now see many more Ethernet controllers, a bunch of them labelled Virtual Function. Congratulations, it worked!

Part 3 - Setting SR-IOV up for use in PVE

Making the VF persistent

The above assignment was a test to check whether your system can assign VFs. To make the assignment persistent (surviving reboots), you need to make the system create the VFs automatically. We can do it the Debian way using systemd. Alternatively, you could also do the same using rc.local.
We will first need to set up a service.
  • create a service file at this location: /etc/systemd/system/sriov-NIC.service (you can pick a different name)
  • paste the content below into that service file. Again, replace N with the number of VFs, and <name of your NICx> with your NIC names.
[Unit]
Description=Script to enable SR-IOV on boot

[Service]
Type=oneshot
ExecStart=/usr/bin/bash -c '/usr/bin/echo N > /sys/class/net/<name of your NIC1>/device/sriov_numvfs'
ExecStart=/usr/bin/bash -c '/usr/bin/echo N > /sys/class/net/<name of your NIC2>/device/sriov_numvfs'

[Install]
WantedBy=multi-user.target
  • enable the service
systemctl enable sriov-NIC

It is good to test the script once. First, repeat the echo command from Part 2 with N set to 0. Check by executing "lspci -vv | grep Eth" that the VFs are gone. Then try to start the service and read its status:

systemctl start sriov-NIC
systemctl status sriov-NIC

You should see that the status reads 0/SUCCESS for each echo command. From now on your system will have VFs assigned on boot. To disable the assignment on boot, execute "systemctl disable sriov-NIC".

Setting PF to UP on boot

To use the VFs, you will actually need the PF (Physical Function) to be brought up first. To have the PFs come up automatically on boot, we can set this up in the GUI: go to your node, System --> Network, double-click your NIC and check the Autostart box. Alternatively, this can be set by adding a line "auto <name of your nic>" in /etc/network/interfaces.
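As a minimal sketch of the /etc/network/interfaces entry, using my first port's name (the "inet manual" line simply brings the port up without giving it an address on the host; adjust if your port is part of a bridge or configured differently):

auto enp1s0f0
iface enp1s0f0 inet manual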
Block VFs from being loaded by PVE
As we plan to assign the VFs to guests, we can prevent PVE from loading them to avoid any conflicts. First, we need to know which VF driver is being loaded.
  • execute
lspci -nnk | grep -A4 Eth

Look for the line "Kernel driver in use:" and see what is being loaded. With my X550-T2, it is ixgbevf. We can then blacklist this module.
  • edit /etc/modprobe.d/pve-blacklist.conf and add the following at the bottom

# <your VF module>
blacklist <your VF module>
  • then execute
update-initramfs -u -k all
  • Reboot the PVE host
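For my X550-T2, for example, the entry added to /etc/modprobe.d/pve-blacklist.conf reads:

# ixgbevf
blacklist ixgbevf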
With this you should have prepared your system for passing VFs to the guests. Happy virtualizing!

Hints on assigning VFs

Here I am sharing some experience in using VFs.
Knowing which port is assigned to which VF
It is crucial to know which VF corresponds to which port when a multi-port NIC is used. This can be roughly determined by looking at the last digit of the device ID.
  • execute
lspci | grep Eth

From the list of Ethernet controllers, you will see multiple device IDs belonging to the virtual functions. With a two-port NIC, the assignment is such that the 1st port's VFs always have an even last digit, i.e. 01:10.0, 01:10.2, 01:10.4 and so on, while the 2nd port's VFs always have an odd last digit, i.e. 01:10.1, 01:10.3, 01:10.5 and so on.
I have not tested this with a four-port NIC, but I guess the assignment wraps around, such that ports 1-4 get 01:10.0-01:10.3 and the pattern then repeats (someone please correct me if I am wrong).
Checking the assignment of individual VFs
Furthermore, you can check whether a VF is really being used by the guest. Take an Ubuntu guest as an example: once you have brought the VF adapter's link UP inside the guest, you should see a MAC address assigned to it on the PVE host (use ip link); before that, the MACs may all be zeros. It is always good to verify that the VF is functional.
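On the PVE host, you can list the VFs and their currently assigned MAC addresses under the parent port, for example:

ip link show enp1s0f0

Each VF appears as a "vf N" entry with its MAC address in that output.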

Fixing the MAC address of VFs

Today I ran into an issue where my router VM failed to initialize the network after rebooting itself. This also happened after I rebooted the PVE host. After some checks, I realized it was caused by the fact that the VFs' MACs are randomly assigned by the guest VMs. For a router this proves problematic, as the ISP (modem) now sees a different MAC every time it boots and has to reassign a new IP, which can run into trouble if done too often.

This can easily be solved by fixing the MAC for a given VF. Again, we can do this with the same service we created earlier.
  • edit /etc/systemd/system/sriov-NIC.service with the following:
[Unit]
Description=Script to enable SR-IOV on boot

[Service]
Type=oneshot
# Starting SR-IOV
ExecStart=/usr/bin/bash -c '/usr/bin/echo N > /sys/class/net/<name of your NIC1>/device/sriov_numvfs'
ExecStart=/usr/bin/bash -c '/usr/bin/echo N > /sys/class/net/<name of your NIC2>/device/sriov_numvfs'
# Setting static MAC for VFs
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set <name of your NIC1> vf M mac <mac addr of vf M of NIC1>'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set <name of your NIC2> vf M mac <mac addr of vf M of NIC2>'

[Install]
WantedBy=multi-user.target

As usual, replace N and <name of your NICx> with your configuration. On the two added lines, also replace M with the index of the VF of NICx you want to fix the MAC for, and <mac addr of vf M of NICx> with your desired MAC address.
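As a filled-in sketch, assuming the first port enp1s0f0, VF 0, and a made-up locally administered MAC address (pick your own), the added line would be:

ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp1s0f0 vf 0 mac 02:00:00:00:00:01'

You can also run the ip link set command interactively first to verify that it is accepted.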

After setting the above, you can follow the same method as before to test the service.
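Note that systemd caches unit files, so after editing the service, reload the daemon before restarting it:

systemctl daemon-reload
systemctl restart sriov-NIC
systemctl status sriov-NIC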

Finally, for reference, I have tested a VF with my virtual router and it easily gives the full 1 Gbps link speed. I have yet to test 10 Gbps, but OpenVPN (routing all traffic, tested inside the LAN) gave me 600 Mbps of sustained throughput (80% CPU utilization on the router). It looks like the VFs are working well with minimal performance hit.

[ Last edited by Sandbo on 2019-8-6 06:36 ]
