NIC hangs, then resets during speedtest, high throughput.

General questions.
Post Reply
User avatar
mpyusko
Posts: 6
Joined: July 19th, 2018, 6:41 pm
Location: Rochester, NY

NIC hangs, then resets during speedtest, high throughput.

Post by mpyusko » July 19th, 2018, 11:00 pm

Home = 400 Mbps x 25 Mbps (peaks at 480x35)
Rack = 1Gbps connection ("iperf3 -s" and Speedtest.net are available at this data center.)

ipfire (core 120)is running on:
  • Intel(R) Pentium(R) Dual CPU E2200 @ 2.20GHz
  • Intel DG31PR Motherboard
  • 2GB DDR @ 667 MHz
  • Seagate Barracuda 7200.10 250 GB
  • 04:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
    Subsystem: Intel Corporation PRO/1000 PT Quad Port Server Adapter
  • (not in use)06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 01)
    Subsystem: Intel Corporation RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
Problem:
  • A direct connection from my laptop to the cable modem "pegs the line" (@480 Mbps down and 35 Mbps up) via speedtest and iperf3
  • An iperf3 -c (ip of rack) -P20 -R from my ipfire box pegs the line.
  • An iperf3 -s -B (ip of red) on my ipfire box is available.
  • iperf3 -c (ip of ipfire red) -P20 -R from a Linux box connected to green hits 1 Gbps consistently
  • iperf3 -c (ip of rack) -P20 -R from a Linux box connected to green hits 480 Mbps, but consistently fails after 3 to 5 seconds.
  • Speedtest from any device connected to green, blue and orange.... downstream dies out after 3 to 5 seconds. Upstream usually completes @ 35 Mbps
The following appears in the log EVERY time I run a speed test through the firewall. (Substitute blue0 and orange0 for green0. It's all the same when I do a speedtest or iperf3 to the Internet.

Code: Select all

[root@skynet ~]# cat /var/log/messages |grep -A15 Hang
Jul 19 17:50:48 skynet kernel: e1000e 0000:03:00.0 green0: Detected Hardware Unit Hang:
Jul 19 17:50:48 skynet kernel:   TDH                  <41>
Jul 19 17:50:48 skynet kernel:   TDT                  <9f>
Jul 19 17:50:48 skynet kernel:   next_to_use          <9f>
Jul 19 17:50:48 skynet kernel:   next_to_clean        <34>
Jul 19 17:50:48 skynet kernel: buffer_info[next_to_clean]:
Jul 19 17:50:48 skynet kernel:   time_stamp           <100024758>
Jul 19 17:50:48 skynet kernel:   next_to_watch        <41>
Jul 19 17:50:48 skynet kernel:   jiffies              <100024cb2>
Jul 19 17:50:48 skynet kernel:   next_to_watch.status <0>
Jul 19 17:50:48 skynet kernel: MAC Status             <80383>
Jul 19 17:50:48 skynet kernel: PHY Status             <792d>
Jul 19 17:50:48 skynet kernel: PHY 1000BASE-T Status  <3c00>
Jul 19 17:50:48 skynet kernel: PHY Extended Status    <3000>
Jul 19 17:50:48 skynet kernel: PCI Status             <10>
Jul 19 17:50:51 skynet kernel: e1000e 0000:03:00.0 green0: Reset adapter unexpectedly
--
Jul 19 17:51:04 skynet kernel: e1000e 0000:04:00.1 red0: Detected Hardware Unit Hang:
Jul 19 17:51:04 skynet kernel:   TDH                  <de>
Jul 19 17:51:04 skynet kernel:   TDT                  <d>
Jul 19 17:51:04 skynet kernel:   next_to_use          <d>
Jul 19 17:51:04 skynet kernel:   next_to_clean        <de>
Jul 19 17:51:04 skynet kernel: buffer_info[next_to_clean]:
Jul 19 17:51:04 skynet kernel:   time_stamp           <100025a06>
Jul 19 17:51:04 skynet kernel:   next_to_watch        <e2>
Jul 19 17:51:04 skynet kernel:   jiffies              <100025f72>
Jul 19 17:51:04 skynet kernel:   next_to_watch.status <0>
Jul 19 17:51:04 skynet kernel: MAC Status             <80387>
Jul 19 17:51:04 skynet kernel: PHY Status             <792d>
Jul 19 17:51:04 skynet kernel: PHY 1000BASE-T Status  <3800>
Jul 19 17:51:04 skynet kernel: PHY Extended Status    <3000>
Jul 19 17:51:04 skynet kernel: PCI Status             <10>
Jul 19 17:51:06 skynet kernel: e1000e 0000:04:00.1 red0: Reset adapter unexpectedly
Any Idea what is causing this? I have two identical machines and they are experiencing the exact same issue.

(Note: We have ipfire running in the datacenter on lower spec'd hardware and it is stable at the 1Gbps passing through to dozens of VMs and hundreds of sites.)
Dell PowerEdge 1850
8GB DDR2 RAM
Dual Intel(R) Xeon(TM) CPU 2.80GHz (socket 604)
03:0c.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
Subsystem: Intel Corporation PRO/1000 MT Dual Port Server Adapter


Thanks.
-mpyusko

Image

Image

Image

silverknight
Posts: 15
Joined: June 27th, 2010, 2:01 pm

Re: NIC hangs, then resets during speedtest, high throughput.

Post by silverknight » July 20th, 2018, 6:05 am

Best I can tell it may be assigning the wrong driver. looks like its using the e1000e driver however a few years back the 82546GB wasn't compatible with it and instead used the e1000 driver. Only advice I can give would be to try blacklisting the e1000e driver and try loading the cards with the e1000. I have no idea how old the IPFire Intel driver is and this could be entirely unhelpful but its where I would start.

User avatar
mpyusko
Posts: 6
Joined: July 19th, 2018, 6:41 pm
Location: Rochester, NY

Re: NIC hangs, then resets during speedtest, high throughput.

Post by mpyusko » July 20th, 2018, 2:21 pm

That's interesting. I think you might be right. It certainly feels like a driver issue. A quick lsmod on the production fw shows it is using the e1000 driver.

Thanks,
-mpyusko

Image

Image

Image

User avatar
mpyusko
Posts: 6
Joined: July 19th, 2018, 6:41 pm
Location: Rochester, NY

Re: NIC hangs, then resets during speedtest, high throughput.

Post by mpyusko » July 22nd, 2018, 5:58 pm

Code: Select all

echo "blacklist e1000e" >> /etc/modprobe.d/blacklist
echo e1000 >> /etc/sysconfig/modules
cd / var/ipfire/ethernet
edited "settings", changed e1000e to e1000
edited "scanned_nics", changed e1000e to e1000

Rebooted. Intel NICs were not found. modprobe e1000e detected the NICs.

Even running the setup program will not detect the NICs with only e1000 loaded. Ugh.... this is Frustrating!
-mpyusko

Image

Image

Image

User avatar
Arne.F
Core Developer
Core Developer
Posts: 8388
Joined: May 7th, 2006, 8:57 am
Location: BS <-> NDH
Contact:

Re: NIC hangs, then resets during speedtest, high throughput.

Post by Arne.F » July 23rd, 2018, 5:00 am

E1000 is the driver for most Intel gigabit pci chips.
E1000E for the some PCI-EXPRESS chips
This are complete defferent cards and the drivers are not interchangeable.

The problem is not the nic itself. It is the interrupt handling on the mainboard. If you serch for "tx hang" or "e1000e reset" you find varios threads for many Intel cards (e1000, e1000e, ibg ...)
First try to update the bios, then check the IntMode parameters of the e1000e driver. If this not help disable the hardware accleration. (GSO, GRO ...)
Arne

Support the project on the donation!

Image

Image

Image
PS: I will not answer support questions via email and ignore IPFire related messages on my non IPFire.org mail addresses.

User avatar
mpyusko
Posts: 6
Joined: July 19th, 2018, 6:41 pm
Location: Rochester, NY

Re: NIC hangs, then resets during speedtest, high throughput.

Post by mpyusko » July 23rd, 2018, 6:04 am

Code: Select all

ethtool -K green0 gso off gro off tso off
Seemed to fix it. I'm not sure about the performance hit though, if any. It pegs the line and that's all I can ask for right now.

Thank you.
-mpyusko

Image

Image

Image

User avatar
makara
Posts: 11
Joined: March 16th, 2013, 8:25 pm
Location: New Zealand

Re: NIC hangs, then resets during speedtest, high throughput.

Post by makara » November 18th, 2018, 8:09 pm

I have had the NIC reset problem happening for a long time and used the command:

"ethtool -K green0 gso off gro off tso off"

which fixed he problem.

In the last month or 2 I have been getting the result:

"Cannot get device udp-fragmentation-offload settings: Operation not supported"

Could this be because of a kernel change in a recent IPFire update?

User avatar
Arne.F
Core Developer
Core Developer
Posts: 8388
Joined: May 7th, 2006, 8:57 am
Location: BS <-> NDH
Contact:

Re: NIC hangs, then resets during speedtest, high throughput.

Post by Arne.F » November 19th, 2018, 11:58 am

I think we have also updated ethtool which display more detailed warnings... But it should still work with this warning.
Arne

Support the project on the donation!

Image

Image

Image
PS: I will not answer support questions via email and ignore IPFire related messages on my non IPFire.org mail addresses.

doiiido
Posts: 1
Joined: November 26th, 2018, 4:25 pm

Re: NIC hangs, then resets during speedtest, high throughput.

Post by doiiido » November 26th, 2018, 4:40 pm

Hi!
Got myself a similar problem:
My Realtek 8139 NIC is stopping at loads such as 10 MBps, but not aways also.
I´m at 2.21 (x86_64) - Core Update 124 fresh install and have had this issue since core 121 (when I started to use ipfire).
Is there a similar issue associated to that NIC?
Also, there is no info on logs, just that green0 seems to disappear when it goes down.
I can bring it back up with:
ip link set green0 down
ip link set green0 up

Post Reply