
[Solved] NIC cannot create VFs (Intel/Mellanox): write error: Cannot allocate memory, "not enough MMIO resources for SR-IOV"

Phenomenon.
# echo 2 > /sys/class/infiniband/mlx5_0/device/mlx5_num_vfs
write error: Cannot allocate memory
# echo 8 > /sys/class/net/enp1s0f0/device/sriov_numvfs
write error: Cannot allocate memory
Verification.
dmesg shows the error "not enough MMIO resources for SR-IOV".
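The check can be scripted. The following is a minimal sketch; the function name has_sriov_mmio_error is hypothetical, and it only searches for the exact message quoted above in whatever log text it receives on standard input.

```shell
# Return success if the kernel log text on stdin contains the SR-IOV MMIO error.
# The function name is illustrative; only the message string comes from the post.
has_sriov_mmio_error() {
    grep -q 'not enough MMIO resources for SR-IOV'
}
```

Typical use would be: dmesg | has_sriov_mmio_error && echo "MMIO shortage reported"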
Analysis.
Because of BIOS limitations or bugs, the kernel's PCI code cannot reallocate enough MMIO space. SR-IOV support in RHEL requires enough resources to map all possible VFs; otherwise MMIO allocation fails for every VF.
Solution.
1. The BIOS does not provide enough MMIO space for the VFs. Contact your hardware vendor for a firmware or BIOS update.
2. As a workaround, pass "pci=realloc" to the kernel at boot (kernel 2.6.32-228.el6 or later).
Implementation.
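Append pci=realloc to GRUB_CMDLINE_LINUX in /etc/default/grub (it is the last parameter on that line below), then regenerate grub.cfg and reboot.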
[root@localhost ~]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/cl-swap rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet iommu=pt intel_iommu=on pci=assign-busses pci=realloc"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
[root@localhost ~]#
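The edit to the GRUB defaults file can be made repeatable. This is a sketch under assumptions: the helper name add_kernel_param is hypothetical, the file is assumed to contain a single double-quoted GRUB_CMDLINE_LINUX="..." line as in the listing above, and the parameter is assumed not to contain characters special to sed such as | or &.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX unless it is already there.
# Usage: add_kernel_param FILE PARAM   (helper name is illustrative)
add_kernel_param() {
    grub_file=$1
    param=$2
    line=$(grep -E '^GRUB_CMDLINE_LINUX=' "$grub_file" | head -n 1)
    [ -n "$line" ] || return 1          # no GRUB_CMDLINE_LINUX line found
    # Strip the variable name and the surrounding double quotes.
    args=${line#GRUB_CMDLINE_LINUX=}
    args=${args#\"}
    args=${args%\"}
    # Nothing to do if the parameter is already present as a whole token.
    case " $args " in
        *" $param "*) return 0 ;;
    esac
    # Insert the parameter just before the closing double quote.
    sed "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"|\1 $param\"|" "$grub_file" > "$grub_file.tmp" &&
        cat "$grub_file.tmp" > "$grub_file" &&
        rm -f "$grub_file.tmp"
}
```

For example: add_kernel_param /etc/default/grub pci=realloc. The change normally takes effect only after the GRUB configuration is regenerated (on RHEL-like systems typically with grub2-mkconfig -o pointed at the grub.cfg path of your system) and the machine is rebooted.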
Verification:
[root@localhost ~]# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt9)/vmlinuz-4.18.0-240.22.1.el8_3.x86_64 root=/dev/mapper/cl-root ro crashkernel=auto resume=/dev/mapper/cl-swap rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet iommu=pt intel_iommu=on pci=assign-busses pci=realloc
[root@localhost ~]#
[root@localhost ~]#
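Checking the running kernel's command line can also be automated. A minimal sketch follows; cmdline_has_param is a hypothetical helper that matches a parameter only as a whole space-separated token.

```shell
# Return success if PARAM appears as a whole token in the given command line.
# Usage: cmdline_has_param "CMDLINE" PARAM   (helper name is illustrative)
cmdline_has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: cmdline_has_param "$(cat /proc/cmdline)" pci=realloc && echo "pci=realloc is active"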
[root@localhost ~]# lspci
00:00.0 Host bridge: Intel Corporation Device 9b33 (rev 05)
00:01.0 PCI bridge: Intel Corporation 6th-9th Gen Core Processor PCIe Controller (x16) (rev 05)
00:02.0 VGA compatible controller: Intel Corporation Device 9bc5 (rev 05)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6/E3-1500 v5/6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Comet Lake PCH Thermal Controller
00:14.0 USB controller: Intel Corporation Comet Lake USB 3.1 xHCI Host Controller
00:14.2 RAM memory: Intel Corporation Comet Lake PCH Shared SRAM
00:15.0 Serial bus controller [0c80]: Intel Corporation Comet Lake PCH Serial IO I2C Controller #0
00:16.0 Communication controller: Intel Corporation Comet Lake HECI Controller
00:17.0 SATA controller: Intel Corporation Device 06d2
00:1b.0 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #21 (rev f0)
00:1c.0 PCI bridge: Intel Corporation Device 06bd (rev f0)
00:1c.6 PCI bridge: Intel Corporation Device 06be (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device 0687
00:1f.3 Audio device: Intel Corporation Comet Lake PCH cAVS
00:1f.4 SMBus: Intel Corporation Comet Lake PCH SMBus Controller
00:1f.5 Serial bus controller [0c80]: Intel Corporation Comet Lake PCH SPI Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (11) I219-LM
01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
01:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
01:00.2 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
01:00.3 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
01:00.4 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
01:00.5 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
02:00.0 Non-Volatile memory controller: Intel Corporation SSD 660P Series (rev 03)
03:00.0 PCI bridge: Texas Instruments XIO2001 PCI Express-to-PCI Bridge
05:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)
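To confirm that the VFs were created without reading the whole listing, the lspci output can be filtered. This sketch assumes the VFs are labelled "Virtual Function" in the device description, as the ConnectX-5 VFs are above; count_vfs is a hypothetical name.

```shell
# Print the number of lines on stdin that describe a virtual function.
# Assumes lspci labels VFs with the words "Virtual Function".
count_vfs() {
    grep -c 'Virtual Function'
}
```

For example: lspci | count_vfs. Note that grep -c exits with a non-zero status when the count is 0, although it still prints 0.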
Related commands:
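# modprobe mlx5_core max_vfs=8
# mst start    // Mellanox Software Tools (MST); check with: mst status
# mlxconfig -d /dev/mst/mt4119_pciconf0 q
# mlxconfig -d /dev/mst/mt4119_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=8
modprobe options (for example in a file under /etc/modprobe.d/):
options mlx4_core num_vfs=4 port_type_array=1,2 probe_vf=1
Reset the VF count to 0 before writing a new value:
echo 0 > /sys/class/net/enp1s0f0/device/sriov_numvfs
echo 8 > /sys/class/net/enp1s0f0/device/sriov_numvfs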
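The reset-then-set sequence can be wrapped in a small function. This is a sketch, not a definitive tool: set_num_vfs is a hypothetical name, and it relies on the standard sysfs attributes sriov_totalvfs and sriov_numvfs in the device directory.

```shell
# Set the number of VFs for a PCI device directory,
# e.g. /sys/class/net/enp1s0f0/device.
# Usage: set_num_vfs DEVICE_DIR COUNT   (helper name is illustrative)
set_num_vfs() {
    dev_dir=$1
    want=$2
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    # Refuse requests above what the device advertises.
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, device supports at most $total" >&2
        return 1
    fi
    # A non-zero VF count cannot be changed directly, so reset to 0 first.
    echo 0 > "$dev_dir/sriov_numvfs" || return 1
    echo "$want" > "$dev_dir/sriov_numvfs"
}
```

For example, as root: set_num_vfs /sys/class/net/enp1s0f0/device 8. If the MMIO problem described above is still present, the final write fails with the same "Cannot allocate memory" error.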

NVIDIA NVML Driver/library version mismatch

If the CUDA and NVIDIA driver versions do not match, the nvidia-smi command fails with "Failed to initialize NVML: Driver/library version mismatch". Check which NVIDIA driver version is installed and whether it matches.
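One way to see the mismatch directly is to compare the version of the loaded kernel module with the version installed on disk. The sketch below is an assumption about how to do that, not part of the original post: extract_driver_version is a hypothetical helper, and it expects text in the format of /proc/driver/nvidia/version, where the driver version is the first dotted number.

```shell
# Print the first dotted version number found on stdin, e.g. 440.95.01.
# Intended for the contents of /proc/driver/nvidia/version (assumed format).
extract_driver_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Possible usage on a machine with the NVIDIA driver loaded:
#   loaded=$(extract_driver_version < /proc/driver/nvidia/version)
#   installed=$(modinfo -F version nvidia)
#   [ "$loaded" = "$installed" ] || echo "loaded $loaded, installed $installed"
```

If the two values differ, the driver packages were probably updated while the old module was still loaded, which is why the reboot at the end of this post resolves the error.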

List the available drivers with the following command:

ubuntu-drivers devices

It shows the candidate driver packages:

driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-440-server - distro non-free
driver : nvidia-driver-435 - distro non-free
driver : nvidia-driver-440 - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin

Install the recommended drivers automatically with:

sudo ubuntu-drivers autoinstall

Then reboot:

sudo reboot

After the reboot, nvidia-smi shows the graphics card information, with CUDA version 10.2 and driver version 440, so the versions now match.

Tue Aug 4 21:05:21 2020
+-----------------------------------------------------------------------------+
| Driver Version: 440.95.01    CUDA Version: 10.2                             |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:19:00.0 Off |                  N/A |
| 41%   23C    P8     2W / 260W |     12MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+