Proxmox VE

Fixing an Intel NIC That Keeps Resetting on Proxmox

2020-05-03  DarrickBM

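Symptom

The NIC keeps resetting, which cuts off network communication for every service on the Proxmox host.

Error

/var/log/kern.log shows the following errors (the hang and reset cycle repeats over and over):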

May  3 17:43:41 pve kernel: [409387.721072] vmbr0: port 1(eno1) entered blocking state
May  3 17:44:09 pve kernel: [409416.200046] vmbr0: port 1(eno1) entered disabled state
May  3 17:44:17 pve kernel: [409423.956276] vmbr0: port 1(eno1) entered blocking state
May  3 17:44:19 pve kernel: [409425.959886] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May  3 17:44:19 pve kernel: [409425.959886]   TDH                  <0>
May  3 17:44:19 pve kernel: [409425.959886]   TDT                  <9>
May  3 17:44:19 pve kernel: [409425.959886]   next_to_use          <9>
May  3 17:44:19 pve kernel: [409425.959886]   next_to_clean        <0>
May  3 17:44:19 pve kernel: [409425.959886] buffer_info[next_to_clean]:
May  3 17:44:19 pve kernel: [409425.959886]   time_stamp           <10618bd00>
May  3 17:44:19 pve kernel: [409425.959886]   next_to_watch        <0>
May  3 17:44:19 pve kernel: [409425.959886]   jiffies              <10618be88>
May  3 17:44:19 pve kernel: [409425.959886]   next_to_watch.status <0>
May  3 17:44:19 pve kernel: [409425.959886] MAC Status             <40080083>
May  3 17:44:19 pve kernel: [409425.959886] PHY Status             <796d>
May  3 17:44:19 pve kernel: [409425.959886] PHY 1000BASE-T Status  <3800>
May  3 17:44:19 pve kernel: [409425.959886] PHY Extended Status    <3000>
May  3 17:44:19 pve kernel: [409425.959886] PCI Status             <10>
May  3 17:44:23 pve kernel: [409429.991735] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May  3 17:44:23 pve kernel: [409429.991735]   TDH                  <0>
May  3 17:44:23 pve kernel: [409429.991735]   TDT                  <9>
May  3 17:44:23 pve kernel: [409429.991735]   next_to_use          <9>
May  3 17:44:23 pve kernel: [409429.991735]   next_to_clean        <0>
May  3 17:44:23 pve kernel: [409429.991735] buffer_info[next_to_clean]:
May  3 17:44:23 pve kernel: [409429.991735]   time_stamp           <10618bd00>
May  3 17:44:23 pve kernel: [409429.991735]   next_to_watch        <0>
May  3 17:44:23 pve kernel: [409429.991735]   jiffies              <10618c278>
May  3 17:44:23 pve kernel: [409429.991735]   next_to_watch.status <0>
May  3 17:44:23 pve kernel: [409429.991735] MAC Status             <40080083>
May  3 17:44:23 pve kernel: [409429.991735] PHY Status             <796d>
May  3 17:44:23 pve kernel: [409429.991735] PHY 1000BASE-T Status  <3800>
May  3 17:44:23 pve kernel: [409429.991735] PHY Extended Status    <3000>
May  3 17:44:23 pve kernel: [409429.991735] PCI Status             <10>
May  3 17:44:25 pve kernel: [409432.007628] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May  3 17:44:25 pve kernel: [409432.007628]   TDH                  <0>
May  3 17:44:25 pve kernel: [409432.007628]   TDT                  <9>
May  3 17:44:25 pve kernel: [409432.007628]   next_to_use          <9>
May  3 17:44:25 pve kernel: [409432.007628]   next_to_clean        <0>
May  3 17:44:25 pve kernel: [409432.007628] buffer_info[next_to_clean]:
May  3 17:44:25 pve kernel: [409432.007628]   time_stamp           <10618bd00>
May  3 17:44:25 pve kernel: [409432.007628]   next_to_watch        <0>
May  3 17:44:25 pve kernel: [409432.007628]   jiffies              <10618c470>
May  3 17:44:25 pve kernel: [409432.007628]   next_to_watch.status <0>
May  3 17:44:25 pve kernel: [409432.007628] MAC Status             <40080083>
May  3 17:44:25 pve kernel: [409432.007628] PHY Status             <796d>
May  3 17:44:25 pve kernel: [409432.007628] PHY 1000BASE-T Status  <3800>
May  3 17:44:25 pve kernel: [409432.007628] PHY Extended Status    <3000>
May  3 17:44:25 pve kernel: [409432.007628] PCI Status             <10>
May  3 17:44:50 pve kernel: [409456.326913] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May  3 17:44:50 pve kernel: [409456.326969] vmbr0: port 1(eno1) entered forwarding state
May  3 17:44:52 pve kernel: [409458.346577] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May  3 17:44:52 pve kernel: [409458.346577]   TDH                  <0>
May  3 17:44:52 pve kernel: [409458.346577]   TDT                  <3>
May  3 17:44:52 pve kernel: [409458.346577]   next_to_use          <3>
May  3 17:44:52 pve kernel: [409458.346577]   next_to_clean        <0>
May  3 17:44:52 pve kernel: [409458.346577] buffer_info[next_to_clean]:
May  3 17:44:52 pve kernel: [409458.346577]   time_stamp           <10618dc50>
May  3 17:44:52 pve kernel: [409458.346577]   next_to_watch        <0>
May  3 17:44:52 pve kernel: [409458.346577]   jiffies              <10618de29>
May  3 17:44:52 pve kernel: [409458.346577]   next_to_watch.status <0>
May  3 17:44:52 pve kernel: [409458.346577] MAC Status             <40080083>
May  3 17:44:52 pve kernel: [409458.346577] PHY Status             <796d>
May  3 17:44:52 pve kernel: [409458.346577] PHY 1000BASE-T Status  <3800>
May  3 17:44:52 pve kernel: [409458.346577] PHY Extended Status    <3000>
May  3 17:44:52 pve kernel: [409458.346577] PCI Status             <10>
May  3 17:45:16 pve kernel: [409482.437698] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May  3 17:45:16 pve kernel: [409482.437698]   TDH                  <0>
May  3 17:45:16 pve kernel: [409482.437698]   TDT                  <8>
May  3 17:45:16 pve kernel: [409482.437698]   next_to_use          <8>
May  3 17:45:16 pve kernel: [409482.437698]   next_to_clean        <0>
May  3 17:45:16 pve kernel: [409482.437698] buffer_info[next_to_clean]:
May  3 17:45:16 pve kernel: [409482.437698]   time_stamp           <10618ee00>
May  3 17:45:16 pve kernel: [409482.437698]   next_to_watch        <0>
May  3 17:45:16 pve kernel: [409482.437698]   jiffies              <10618f5b0>
May  3 17:45:16 pve kernel: [409482.437698]   next_to_watch.status <0>
May  3 17:45:16 pve kernel: [409482.437698] MAC Status             <40080083>
May  3 17:45:16 pve kernel: [409482.437698] PHY Status             <796d>
May  3 17:45:16 pve kernel: [409482.437698] PHY 1000BASE-T Status  <3800>
May  3 17:45:16 pve kernel: [409482.437698] PHY Extended Status    <3000>
May  3 17:45:16 pve kernel: [409482.437698] PCI Status             <10>
May  3 17:45:18 pve kernel: [409484.453566] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May  3 17:45:18 pve kernel: [409484.453566]   TDH                  <0>
May  3 17:45:18 pve kernel: [409484.453566]   TDT                  <8>
May  3 17:45:18 pve kernel: [409484.453566]   next_to_use          <8>
May  3 17:45:18 pve kernel: [409484.453566]   next_to_clean        <0>
May  3 17:45:18 pve kernel: [409484.453566] buffer_info[next_to_clean]:
May  3 17:45:18 pve kernel: [409484.453566]   time_stamp           <10618ee00>
May  3 17:45:18 pve kernel: [409484.453566]   next_to_watch        <0>
May  3 17:45:18 pve kernel: [409484.453566]   jiffies              <10618f7a8>
May  3 17:45:18 pve kernel: [409484.453566]   next_to_watch.status <0>
May  3 17:45:18 pve kernel: [409484.453566] MAC Status             <40080083>
May  3 17:45:18 pve kernel: [409484.453566] PHY Status             <796d>
May  3 17:45:18 pve kernel: [409484.453566] PHY 1000BASE-T Status  <3800>
May  3 17:45:18 pve kernel: [409484.453566] PHY Extended Status    <3000>
May  3 17:45:18 pve kernel: [409484.453566] PCI Status             <10>
May  3 17:45:18 pve kernel: [409485.061421] vmbr0: port 1(eno1) entered disabled state
May  3 17:45:26 pve kernel: [409492.477592] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May  3 17:45:26 pve kernel: [409492.477648] vmbr0: port 1(eno1) entered forwarding state
May  3 17:45:28 pve kernel: [409494.501256] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May  3 17:45:28 pve kernel: [409494.501256]   TDH                  <0>
May  3 17:45:28 pve kernel: [409494.501256]   TDT                  <1>
May  3 17:45:28 pve kernel: [409494.501256]   next_to_use          <1>
May  3 17:45:28 pve kernel: [409494.501256]   next_to_clean        <0>
May  3 17:45:28 pve kernel: [409494.501256] buffer_info[next_to_clean]:
May  3 17:45:28 pve kernel: [409494.501256]   time_stamp           <106190000>
May  3 17:45:28 pve kernel: [409494.501256]   next_to_watch        <0>
May  3 17:45:28 pve kernel: [409494.501256]   jiffies              <106190178>
May  3 17:45:28 pve kernel: [409494.501256]   next_to_watch.status <0>
May  3 17:45:28 pve kernel: [409494.501256] MAC Status             <40080083>
May  3 17:45:28 pve kernel: [409494.501256] PHY Status             <796d>
May  3 17:45:28 pve kernel: [409494.501256] PHY 1000BASE-T Status  <3800>
May  3 17:45:28 pve kernel: [409494.501256] PHY Extended Status    <3000>
May  3 17:45:28 pve kernel: [409494.501256] PCI Status             <10>
May  3 17:45:30 pve kernel: [409496.517174] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May  3 17:45:30 pve kernel: [409496.517174]   TDH                  <0>
May  3 17:45:30 pve kernel: [409496.517174]   TDT                  <1>
May  3 17:45:30 pve kernel: [409496.517174]   next_to_use          <1>
May  3 17:45:30 pve kernel: [409496.517174]   next_to_clean        <0>
May  3 17:45:30 pve kernel: [409496.517174] buffer_info[next_to_clean]:
May  3 17:45:30 pve kernel: [409496.517174]   time_stamp           <106190000>
May  3 17:45:30 pve kernel: [409496.517174]   next_to_watch        <0>
May  3 17:45:30 pve kernel: [409496.517174]   jiffies              <106190370>
May  3 17:45:30 pve kernel: [409496.517174]   next_to_watch.status <0>
May  3 17:45:30 pve kernel: [409496.517174] MAC Status             <40080083>
May  3 17:45:30 pve kernel: [409496.517174] PHY Status             <796d>
May  3 17:45:30 pve kernel: [409496.517174] PHY 1000BASE-T Status  <3800>
May  3 17:45:30 pve kernel: [409496.517174] PHY Extended Status    <3000>
May  3 17:45:30 pve kernel: [409496.517174] PCI Status             <10>
May  3 17:45:31 pve kernel: [409498.116850] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
May  3 17:45:31 pve kernel: [409498.116929] vmbr0: port 1(eno1) entered disabled state
May  3 17:45:38 pve kernel: [409504.505067] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May  3 17:45:42 pve kernel: [409508.548719] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May  3 17:45:42 pve kernel: [409508.548719]   TDH                  <0>
May  3 17:45:42 pve kernel: [409508.548719]   TDT                  <8>
May  3 17:45:42 pve kernel: [409508.548719]   next_to_use          <8>
May  3 17:45:42 pve kernel: [409508.548719]   next_to_clean        <0>
May  3 17:45:42 pve kernel: [409508.548719] buffer_info[next_to_clean]:
May  3 17:45:42 pve kernel: [409508.548719]   time_stamp           <106190b80>
May  3 17:45:42 pve kernel: [409508.548719]   next_to_watch        <0>
May  3 17:45:42 pve kernel: [409508.548719]   jiffies              <106190f30>
May  3 17:45:42 pve kernel: [409508.548719]   next_to_watch.status <0>
May  3 17:45:42 pve kernel: [409508.548719] MAC Status             <40080083>
May  3 17:45:42 pve kernel: [409508.548719] PHY Status             <796d>
May  3 17:45:42 pve kernel: [409508.548719] PHY 1000BASE-T Status  <3800>
May  3 17:45:42 pve kernel: [409508.548719] PHY Extended Status    <3000>
May  3 17:45:42 pve kernel: [409508.548719] PCI Status             <10>
May  3 17:45:44 pve kernel: [409510.564636] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May  3 17:45:44 pve kernel: [409510.564636]   TDH                  <0>
May  3 17:45:44 pve kernel: [409510.564636]   TDT                  <8>
May  3 17:45:44 pve kernel: [409510.564636]   next_to_use          <8>
May  3 17:45:44 pve kernel: [409510.564636]   next_to_clean        <0>
May  3 17:45:44 pve kernel: [409510.564636] buffer_info[next_to_clean]:
May  3 17:45:44 pve kernel: [409510.564636]   time_stamp           <106190b80>
May  3 17:45:44 pve kernel: [409510.564636]   next_to_watch        <0>
May  3 17:45:44 pve kernel: [409510.564636]   jiffies              <106191128>
May  3 17:45:44 pve kernel: [409510.564636]   next_to_watch.status <0>
May  3 17:45:44 pve kernel: [409510.564636] MAC Status             <40080083>
May  3 17:45:44 pve kernel: [409510.564636] PHY Status             <796d>
May  3 17:45:44 pve kernel: [409510.564636] PHY 1000BASE-T Status  <3800>
May  3 17:45:44 pve kernel: [409510.564636] PHY Extended Status    <3000>
May  3 17:45:44 pve kernel: [409510.564636] PCI Status             <10>
May  3 17:45:46 pve kernel: [409512.580469] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May  3 17:45:46 pve kernel: [409512.580469]   TDH                  <0>
May  3 17:45:46 pve kernel: [409512.580469]   TDT                  <8>
May  3 17:45:46 pve kernel: [409512.580469]   next_to_use          <8>
May  3 17:45:46 pve kernel: [409512.580469]   next_to_clean        <0>
May  3 17:45:46 pve kernel: [409512.580469] buffer_info[next_to_clean]:
May  3 17:45:46 pve kernel: [409512.580469]   time_stamp           <106190b80>
May  3 17:45:46 pve kernel: [409512.580469]   next_to_watch        <0>
May  3 17:45:46 pve kernel: [409512.580469]   jiffies              <106191320>
May  3 17:45:46 pve kernel: [409512.580469]   next_to_watch.status <0>
May  3 17:45:46 pve kernel: [409512.580469] MAC Status             <40080083>
May  3 17:45:46 pve kernel: [409512.580469] PHY Status             <796d>
May  3 17:45:46 pve kernel: [409512.580469] PHY 1000BASE-T Status  <3800>
May  3 17:45:46 pve kernel: [409512.580469] PHY Extended Status    <3000>
May  3 17:45:46 pve kernel: [409512.580469] PCI Status             <10>
May  3 17:46:24 pve kernel: [409551.106819] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
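
To check whether another host shows the same pattern, the two key messages in the log above can be counted with grep. This is only a minimal sketch: the function name count_nic_events is hypothetical, and the default path is the log file mentioned above.

```shell
#!/bin/sh
# Count e1000e hang and reset messages in a kernel log.
# count_nic_events is a hypothetical helper name, not a Proxmox tool.
# The default path is the log file used in this article.
count_nic_events() {
    log="${1:-/var/log/kern.log}"
    # grep -c prints the number of matching lines (0 if none match).
    hangs=$(grep -c 'Detected Hardware Unit Hang' "$log")
    resets=$(grep -c 'Reset adapter unexpectedly' "$log")
    echo "hangs=$hangs resets=$resets"
}
```

Calling count_nic_events with no argument reads /var/log/kern.log; a different file can be passed as the first argument. A hang count that keeps growing suggests the host is hitting the same problem.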

Environment

Hardware: NUC8i5BEH
Software:

root@pve:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-7 (running version: 6.1-7/13e58d5e)
pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-13
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-4
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-21
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-8
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-3
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-6
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

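Cause

A problem with the Intel e1000e driver (the driver named in the log above); see: https://forum.proxmox.com/threads/e1000-driver-hang.58284/

Solution (reportedly at some cost to performance):

vim /etc/network/interfaces

At the end of the iface vmbr0 inet static stanza, add post-up ethtool -K eno1 tso off gso off, which turns off TCP segmentation offload (TSO) and generic segmentation offload (GSO) on eno1. For example: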

auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.117
        gateway 192.168.1.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        post-up ethtool -K eno1 tso off gso off

Then save and exit with :wq
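
A post-up line only runs when the interface is brought up, so the edit above does nothing until the network is restarted or the host reboots. As a hedged sketch, the same ethtool command can be run by hand to apply the setting immediately, and the result checked with ethtool -k. The helper name offloads_disabled is hypothetical; it only parses the text that ethtool -k prints.

```shell
#!/bin/sh
# Apply the setting right away (as root), without waiting for post-up:
#   ethtool -K eno1 tso off gso off
# Then check the result:
#   ethtool -k eno1 | offloads_disabled && echo "TSO/GSO are off on eno1"

# offloads_disabled (hypothetical helper) reads `ethtool -k <iface>` output
# on stdin and succeeds only if both TSO and GSO are reported as off.
offloads_disabled() {
    awk '
        /^tcp-segmentation-offload:/     { tso = $2 }
        /^generic-segmentation-offload:/ { gso = $2 }
        END { exit !(tso == "off" && gso == "off") }
    '
}
```

If the check fails after a reboot, the post-up line was probably not picked up; make sure it sits inside the vmbr0 stanza and that ethtool is installed.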
