I actually have a similar issue. My server goes unresponsive/freezes after N hours of uptime. N varies; so far I've measured between 6 and 72 hours.
I tried working around it by auto-rebooting the server each night, but it still sometimes happens before the 24-hour mark.
Nothing in logs, so my best option is to auto-reboot at this time. 😆
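Since the logs stay empty, one thing that might help pin down N is a heartbeat file: write a timestamp to disk every minute, then look for gaps after the next freeze. A minimal Python sketch follows. The path, the interval, and the function names are all made up for illustration, and it assumes the disk itself still accepts writes right up to the freeze.

```python
import os
import time
from datetime import datetime, timezone

HEARTBEAT_PATH = "/var/tmp/heartbeat.log"  # hypothetical location
INTERVAL_S = 60                            # hypothetical interval


def write_heartbeat(path=HEARTBEAT_PATH):
    """Append one UTC timestamp and force it out to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(datetime.now(timezone.utc).isoformat() + "\n")
        f.flush()
        os.fsync(f.fileno())  # so the line survives a hard freeze


def find_gaps(path=HEARTBEAT_PATH, interval_s=INTERVAL_S, slack=3.0):
    """Return (last_seen, next_seen) pairs further apart than slack * interval_s."""
    stamps = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                stamps.append(datetime.fromisoformat(line.strip()))
            except ValueError:
                continue  # skip blank or half-written lines
    return [
        (a, b)
        for a, b in zip(stamps, stamps[1:])
        if (b - a).total_seconds() > slack * interval_s
    ]


def run_forever(path=HEARTBEAT_PATH, interval_s=INTERVAL_S):
    """Heartbeat loop; start it from a service or an @reboot cron entry."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

After a freeze and reboot, `find_gaps()` shows when the last heartbeat landed, which gives the freeze time to within about a minute. Planned nightly reboots also show up as gaps, so those need to be discounted by hand.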
Do you have an Intel Ethernet NIC? That’s a known issue, in particular with the more recent Linux kernels used in Debian-based distros. This also means it extends to TrueNAS, Proxmox, etc. There’s a known fix for it too (or you could just downgrade the kernel).
I had a bad NVMe drive that caused that on two separate computers.
On one of them I slowly replaced every single piece of hardware except the NVMe, and it still crashed about once a day. Finally sucked it up and bought a new drive, and magically everything stopped crashing.
Then it started happening on my server, so I just immediately replaced the NVMe drive, and magically no crashes anymore.
Zero issues in the logs, no failures on bootup, no issues with any hardware scanners, just hard freeze randomly.
Hey, I have the same thing with my second router, which works as an extender to cover a remote area. I auto-reboot it every 3 hours during the day (not during the night). Sometimes it stops transmitting data before the 3-hour mark, so I have to go and physically reboot it. A physical reboot always helps, whereas on very rare occasions the software reboot does not.
I have no idea what’s going on. I bought it cheap as a broken one, but re-flashing it with OpenWrt seems to have solved all its issues. However, I’m not qualified to say there are no issues with it. It’s just that from a user perspective, it works exceptionally well. I see no issues, except this forced auto-reboot thing, but I think it could be me not understanding networking properly and doing something wrong or suboptimal. It gets the signal wirelessly via the 5 GHz band (for speed) and shares it via the 2.4 GHz band (for distance). I fixed some obvious mistakes with the help of a GPT, and it seems to work better now. But I’m not really sure. It could be that it was winter and cold in there; I have to see how it behaves during the summer.
Honestly, I even started thinking maybe it has no issues now, and I can remove that cron job. But I think I can live with being offline for a minute or two a few times a day, when I’m in that remote location.
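One alternative to rebooting blindly every 3 hours would be a small watchdog that pings the upstream router and reboots only after several checks in a row fail. Here is a rough Python sketch of that logic. The function names and thresholds are invented for illustration, the `ping -c 1 -W` flags assume a Linux-style `ping`, and a stock OpenWrt install may not have Python at all, so treat it as a description of the idea rather than something to drop onto the router as-is.

```python
import subprocess
import time


def link_is_up(host, timeout_s=2):
    """Send one ping to host; True if it answered (Linux-style ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watchdog(check, reboot, max_failures=3, interval_s=60,
             sleep=time.sleep, max_rounds=None):
    """Call reboot() once check() has failed max_failures times in a row.

    max_rounds=None loops forever; a number limits the loop (handy for testing).
    """
    failures = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        if check():
            failures = 0          # any success resets the streak
        else:
            failures += 1
            if failures >= max_failures:
                reboot()
                failures = 0
        sleep(interval_s)


# Example wiring (the upstream address is a placeholder):
# watchdog(lambda: link_is_up("192.168.1.1"),
#          lambda: subprocess.run(["reboot"]))
```

This would only cover the cases where a software reboot actually helps; the rare freezes that need a physical power cycle would still need one.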
Yeah, I mean, I just tried to complement your story with mine.
I just solved this exact issue after living with it for a few months.
For me it was a bad PSU: a voltage drop was probably stopping the HDDs and SSDs, which knocked over the kernel.
Hm. The PSU is the one delivered with the system. And the system is rated to handle this and more. I really hope it’s not a bad PSU.
It’d be better than bad RAM! Or a bad CPU, probably.