A few days ago my server froze out of nowhere. Now it happens two or three times every day, and each time a different process invokes the oom-killer. After one to three minutes everything runs smoothly again.
At the moment the server freezes, every processor's activity climbs to 98%.
Can someone help me find out what triggers the oom-killer?
I also had such a case in the past. Due to a programming fault of my own, a program continuously requested memory but never freed any. You should trace your memory consumption somehow.
Not at the moment. One of your processes is requesting too much memory… probably a programming bug. To find out which process, study memory usage with top or htop.
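For example, a quick one-shot way to spot the biggest memory consumers from a shell (this assumes a GNU/Linux `ps` that supports `--sort`):

```shell
# List the five biggest memory consumers, sorted by resident memory share (%MEM).
ps aux --sort=-%mem | head -n 6

# Or watch it live: run top and press Shift+M to sort by memory.
```

Run it a few times while memory is climbing; the rogue process will rise to the top of the list.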
Identify the rogue process and kill it.
If the rogue process is something vital, you are going to need to look for alternatives.
And how much RAM does each instance use on the server? You see, RDP runs the users' applications on the server they connect to, NOT on the client.
I’m not sure this helps you, but I use earlyoom. It kicks in sooner than the kernel’s OOM killer, which in my experience never finishes in a sane time. earlyoom kills the biggest memory-eating process before a real OOM situation can happen, once memory is running low.
Maybe you could try it.
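As a sketch of how the thresholds can be tuned, assuming a Debian/Ubuntu-style packaging where the daemon reads its flags from `/etc/default/earlyoom` (the 5% and 10% values here are examples, not recommendations):

```shell
# /etc/default/earlyoom -- assumed Debian-style config fragment.
# -m 5  : start killing when less than 5% of RAM is available
# -s 10 : ... and less than 10% of swap is free
# --prefer : regex of process names to kill first (hypothetical example)
EARLYOOM_ARGS="-m 5 -s 10 --prefer '(^|/)java$'"
```

After editing, restart the service (e.g. `systemctl restart earlyoom`) for the flags to take effect.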
The user who (or whose process) causes the OOM just experiences a crash of that process, without any further notification. This may lose any unsaved work from that process, but in general the rest of the system survives and stays responsive.
It’s not my issue, but IMHO prepending another watcher or guardian sounds like fighting symptoms.
If a system runs out of resources regularly, spotting and removing the cause is, for me, the only way to go.
That’s right. In my case it means to replace me, as the main cause.
I have some virtual machines to test things. Some of them have a sizeable amount of memory assigned. That would be fine, but as I’m sometimes impatient or a bit inattentive, I rarely, but occasionally, launch a new VM before the previous one has completely shut down. That creates the OOM problem instantly.
earlyoom comes in handy at times like that.
Up to now, the OP has never mentioned the available physical memory of his server. If you are referring to the swap size, you might be right. But then something is still blindly eating memory, so that’s no solution.
These oom-killer apps seem to be suffering from a lack of intelligence.
What may be needed is some sort of extension of the nice command to cover memory usage as well as CPU usage. … For example, one might want to flag a process as
kill me if I hog memory
don’t kill me under any circumstances
That would help the OOM killer distinguish between laziness, lack of foresight, and a real need to keep a process going.
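Linux actually ships a knob close to this: each process exposes `/proc/<pid>/oom_score_adj`, ranging from -1000 ("never kill me") to +1000 ("kill me first"). A minimal sketch, assuming a Linux system; lowering the value for another process needs root:

```shell
# Inspect the current OOM adjustment of this shell:
cat /proc/self/oom_score_adj

# Volunteer the current shell (and its children) as a preferred
# OOM victim -- the "kill me if I hog memory" flag. Raising the
# value needs no special privileges:
echo 1000 > /proc/self/oom_score_adj

# Protecting a process completely ("don't kill me") needs root, e.g.:
#   echo -1000 | sudo tee /proc/<pid>/oom_score_adj
```

So the distinction asked for above already exists per-process; what’s missing is mainly the habit of setting it.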