Ubuntu 20.04.3 freezing after recent install

After my old desktop finally died recently, I inherited a Lenovo Ideacentre 700 and initially used the Windows 10 that came with it, with the intention of eventually installing Ubuntu. But it was running fine and I’m a bit lazy, so I kept using Windows 10. Then, after a few weeks, it started randomly freezing, and then freezing more frequently. Despite doing extensive research on this problem and discovering that many others were having similar issues, I was never able to solve the freezing, so I finally got my lazy butt in gear and installed Ubuntu 20.04.3, dual booting it with Windows 10.

At first, Ubuntu ran just fine, but after a couple of weeks, it also started freezing. The first thing I noticed after a reboot was that one of the four CPUs was running constantly at near 100%.
[Image: CPU-3]

After another freeze, I saw the same thing, but on one of the other CPUs. I’m not sure if that’s significant.

I restarted the PC before another freeze occurred and all the CPUs seemed to be running fine. I thought, thanks to the ‘magic’ of turning it off and on again, problem solved. But several days later it froze once again. When it froze this time I had the system monitor open and the CPUs appeared to be running fine moments before the freeze. Reboots after the recent freezes did not stop one of the CPUs from running high.
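
One way to see which core is pegged in the moments before a freeze is to log per-core load to a file and read the last lines after the reboot. The following is only a minimal sketch, assuming a Linux `/proc/stat` with the usual field order (user, nice, system, idle, iowait, ...). The function names are made up for illustration and do not come from any tool mentioned in this thread.

```python
import time


def parse_proc_stat(text):
    """Return {"cpu0": (busy, total), ...} in jiffies for each per-core line."""
    samples = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-core lines ("cpu0", "cpu1", ...); skip the aggregate "cpu" line
        # and unrelated lines such as "intr" or "ctxt".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        total = sum(values[:8])    # user .. steal (guest time is already in user)
        idle = sum(values[3:5])    # idle + iowait are treated as "not busy" here
        samples[fields[0]] = (total - idle, total)
    return samples


def busy_percent(before, after):
    """Percentage of time each core was busy between two samples."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def read_sample(path="/proc/stat"):
    with open(path) as f:
        return parse_proc_stat(f.read())


def watch(interval=5.0, samples=None):
    """Print one timestamped line of per-core load every `interval` seconds."""
    previous = read_sample()
    taken = 0
    while samples is None or taken < samples:
        time.sleep(interval)
        current = read_sample()
        loads = busy_percent(previous, current)
        line = " ".join(f"{cpu}={pct:.0f}%" for cpu, pct in sorted(loads.items()))
        print(time.strftime("%H:%M:%S"), line, flush=True)
        previous = current
        taken += 1
```

Calling `watch()` from a script and redirecting its output to a file on disk would leave a record of the last readings before a freeze, which could then be compared against what `top` shows after the reboot.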

I ran ‘top’ after several of the freezes and it showed various root-owned processes running high, depending on which CPU was running at near 100% at that time.
[Image: top output]
Again, after much research, I have not yet found a solution to this problem. Because this freezing happened in both operating systems I fear it may be a hardware problem but I’m hoping that there’s something I can do in Ubuntu that will solve this. Any thoughts or suggestions will be greatly appreciated. Here are some of the tech details:

inxi -Fxz
System:
Kernel: 5.11.0-43-generic x86_64 bits: 64 compiler: N/A
Desktop: Gnome 3.36.9 Distro: Ubuntu 20.04.3 LTS (Focal Fossa)
Machine:
Type: Desktop System: LENOVO product: 90ED0009US v: ideacentre 700-25ISH
serial:
Mobo: LENOVO model: SKYBAY v: SDK0J40700 WIN 3258025733565
serial: UEFI: LENOVO v: FWKT05A date: 09/11/2015
Memory:
RAM: total: 7.55 GiB used: 1.39 GiB (18.5%)
RAM Report:
permissions: Unable to run dmidecode. Root privileges required.
CPU:
Topology: Quad Core model: Intel Core i5-6400 bits: 64 type: MCP
arch: Skylake-S rev: 3 L2 cache: 6144 KiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3
bogomips: 21599
Speed: 3243 MHz min/max: 800/3300 MHz Core speeds (MHz): 1: 3254 2: 3300
3: 3217 4: 3200
Graphics:
Device-1: Intel HD Graphics 530 vendor: Lenovo driver: i915 v: kernel
bus ID: 00:02.0
Display: x11 server: X.Org 1.20.13 driver: i915
resolution: 1440x900~60Hz, 1440x900~75Hz
OpenGL: renderer: Mesa Intel HD Graphics 530 (SKL GT2) v: 4.6 Mesa 21.0.3
direct render: Yes
Audio:
Device-1: Intel 100 Series/C230 Series Family HD Audio vendor: Lenovo
driver: snd_hda_intel v: kernel bus ID: 00:1f.3
Sound Server: ALSA v: k5.11.0-43-generic
Network:
Device-1: Intel Ethernet I219-LM vendor: Lenovo driver: e1000e v: kernel
port: f040 bus ID: 00:1f.6
IF: eno1 state: up speed: 100 Mbps duplex: full mac:
Device-2: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
vendor: Lenovo driver: ath10k_pci v: kernel port: f040 bus ID: 01:00.0
IF: wlp1s0 state: down mac:
Device-3: Qualcomm Atheros type: USB driver: btusb bus ID: 1-4:2
Drives:
Local Storage: total: 1.14 TiB used: 12.23 GiB (1.1%)
ID-1: /dev/sda vendor: Samsung model: SSD 860 EVO 250GB size: 232.89 GiB
ID-2: /dev/sdb vendor: Seagate model: ST1000DX001-SSHD-8GB
size: 931.51 GiB temp: 25 C
Partition:
ID-1: / size: 640.82 GiB used: 12.20 GiB (1.9%) fs: ext4 dev: /dev/sdb3
Sensors:
System Temperatures: cpu: 25.0 C mobo: N/A
Fan Speeds (RPM): N/A
Info:
Processes: 257 Uptime: 11m Init: systemd runlevel: 5 Compilers: gcc: N/A
Shell: bash v: 5.0.17 inxi: 3.0.38

dmesg --time-format iso -l err,crit,alert,emerg
2021-12-21T17:15:28,107110-05:00 x86/cpu: VMX (outside TXT) disabled by BIOS
2021-12-21T17:15:28,416104-05:00 tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80
2021-12-21T17:15:28,416116-05:00 tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80
2021-12-21T17:15:30,432023-05:00 usb usb1-port13: over-current condition
2021-12-21T17:15:30,568023-05:00 usb usb1-port14: over-current condition

Just a wild guess: It might be a problem with the power management.

If I were you, I’d look into the BIOS/UEFI settings after boot and disable any kind of smart (or not so smart) power management features.

I had a similar problem once with an old laptop and that “fixed” it for the last months before retirement.

In my experience random freezes can be related to (among other things)

  1. Bad RAM,
  2. Badly seated RAM,
  3. Poorly-applied, or hardened, heat-conductive compound between the heat sinks and the chips they are meant to be cooling.
  4. Dust and fluff in the cooling fans or heat-exchange radiators.
  5. Dirt and fluff on a motherboard that is being used in a humid climate, or a place where it can get very cold then be exposed to warm humid air which allows moisture to condense inside it. After a few hours it dries out and the problem goes away, which makes the problem look like an unpredictable and random (subtle difference) failure.

These are all hardware issues but they can be easy to fix if you are brave and dexterous. I have fixed many problems by opening up the machine and tackling these problems. I am successful 9 times out of 10. The tenth eludes me and I have to give up.

  1. Bad RAM. If you have two RAM DIMMs take one out. Fire up the machine and see if the problem goes away. If not, shut down the machine, open it up again, pull out the RAM DIMM that was in it and replace it with the one you took out earlier. I have been able to track down bad RAM by this process of elimination.
    Be suspicious of the DIMM slots too. Test each one individually with one known good RAM DIMM. It might take anywhere from 5 minutes to half an hour to do these swap tests.
    Also, to save time you often don’t need to fully re-assemble the machine case after swapping RAM to test it. Some computers work quite well “undressed”.
  2. Badly seated RAM. Pull out each RAM DIMM, look carefully at each gold contact pad to see if any are scored or damaged. Use a good magnifier or loupe. If they look okay just re-insert them. This can resolve bad-contact issues. Make sure they don’t look dirty.
  3. Heat-conductive compound. This means removing the heat sinks from the motherboard, cleaning the old grease off (acetone on a tiny cotton-bud works well) but you need to be fastidiously clean. Apply new heat compound / grease and re-assemble. Be brave. I’ve done it many, many times and haven’t killed a motherboard yet. Also check the motherboard for any signs of water damage, or corrosion, look for fluffy white or green stuff around any of the solder joints of all components.
  4. Check the fans for built-up dust, and check the ports that they blow air through. Clean dust and dirt out if necessary.

Some fixes are not so clear. A friend’s 10 y.o. Toshiba kept crashing, blue screen. It had Win 7 as originally installed. I checked all programs. All were good but the problem remained. I applied all updates and updated all relevant drivers. Still the problem remained but less often. I totally disassembled it, cleaned RAM contacts and slots, pulled out the motherboard and cleaned it, unscrewed all heatsinks and replaced the thermal grease, and cleaned the fans and ports. But to no avail; the problem remained when I turned it on. I updated it to Win 10 and all problems went away. This was not satisfactory because I still don’t know what the problem was even though my blind tinkering fixed it. But this doesn’t sound like your problem.

Sometimes I have just opened a machine, unplugged all cables and daughter boards and plugged them all in again, and it has fixed random crash issues.

If the motherboard looks dirty don’t be afraid to clean it with water. I remove the motherboard and anything that is plugged into it. Then I clean it in the kitchen sink under running warm water (if you don’t like this idea don’t read any further). Scrub the motherboard lightly with a SOFT toothbrush, not a HARD one. Dry out the motherboard quickly (in say 1-2 hours), being fast enough to evaporate the water off it before it has time to corrode anything (corrosion normally takes 24 hours or more). You can dry it by putting it out in the sun on a hot summer’s day, or by placing it in an oven at about the boiling temperature of water but not hot enough to melt any solder joints or plastic parts. Heat the oven up to about 100C / 212F, turn it OFF, then put the motherboard in and close the door. That will dry it pretty quickly (normally less than an hour) without damaging anything. Remember that all the components were soldered onto the board, which involved high temperature, and that didn’t destroy it. Don’t forget to turn the oven off before putting the motherboard in!!

By this method I have washed motherboards from Thinkpads, an industrial DEC server motherboard so covered in dirt that I could not read any markings on it, Apple Macbook Pros, various other brand laptops, and various mobile phones. All of them worked perfectly after they had dried out and I put them back together. Speed and cleanliness are very important.

And obey all sensible anti-static precautions. Water is a good cleaner because it is electrically conductive and it will NOT carry a static charge. Blown air on the other hand carries a higher risk because fast-moving air blown over a non-conductive surface can generate a static charge on that surface depending on its properties. This is what causes the build-up of static charge that makes lightning.

Random crashes / freezes can be anywhere from very easy to extremely difficult to find so be mentally prepared for it. And don’t be too disappointed if you don’t succeed.

I tried this, disabled a couple of things in the BIOS/UEFI settings (not sure exactly what I disabled as it was a couple of weeks ago) and it’s been running smoothly since then, all CPUs running fine and no freezes. Thanks for this info!

When I inherited this PC, before it started freezing up, I did give a pretty thorough dusting and cleaning, and also made sure the RAM was properly installed and seated as far as I could tell. I hopefully have solved the issue via another suggestion but if I encounter similar problems in the future I will try some of your more extensive cleanings and fixes. Thanks for these suggestions.

Well, I’ve gone and jinxed myself by thinking the problem was solved. After an Ubuntu update just now, one of the CPUs was running at near 100% again, so I went back into the BIOS/UEFI settings and I must have accidentally done something wrong, because now the PC freezes at the Lenovo start-up screen. It will not continue on to start up properly, nor will it let me into the start-up menu! I’ve tried restarting, unplugging, etc. but it freezes at that screen.

  1. What model is it?
    It reminds me of a similar problem I encountered with a Lenovo machine a couple of years ago (there is a neuron blinking in the depths of my memory). In the meantime I’ll check my notes.
    And is your machine set up to dual boot (or multi-boot), or does it have just one OS installed?

Please ignore my previous question.
I can see from reading your initial notes that your machine is an Ideacentre 700, and it is set up to dual boot Win 10 and Ubuntu.

Thanks for looking into this. I thought the problem had somehow resolved itself but it froze up again today. Any info would be much appreciated.

I have been troubleshooting computer problems since about 1989 and I have come across some extremely odd problems. So I can appreciate that issues like this can be VERY difficult to resolve.

However an important step is to check the drives in case a read error (or “unable to read” error) is causing the freeze.
I see you have a Seagate 1 TB drive and a Samsung 250 GB SSD.
Have you fully tested the drives yet? Please let me know.
If you haven’t done so already, I suggest you download Seagate’s Seatools drive diagnostic software and run diagnostics, including complete surface scans, on both drives. Seatools should be able to check both drives. It used to be that Seatools would check drives from other manufacturers (e.g. Samsung) as long as they were installed in a machine that had at least one Seagate product installed too.
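
Seatools is the tool suggested above. As a quick first check from within Ubuntu, the drives’ own SMART health verdict can also be read with `smartctl` from the smartmontools package, which is a different tool swapped in here purely for illustration. The sketch below assumes smartmontools is installed, that the script is run with root privileges, and that the drives report in the ATA format shown in the comment. The function names are hypothetical, and `/dev/sda` and `/dev/sdb` are the device names from the inxi output earlier in the thread.

```python
import re
import subprocess

# ATA drives print a line like:
#   SMART overall-health self-assessment test result: PASSED
HEALTH_LINE = re.compile(r"self-assessment test result:\s*(\S+)")


def parse_health(output):
    """Return True if the drive reports PASSED, False for any other verdict,
    or None when no verdict line is present (e.g. SMART unsupported)."""
    match = HEALTH_LINE.search(output)
    if match is None:
        return None
    return match.group(1) == "PASSED"


def check_drive(device):
    """Run `smartctl -H` on one device and interpret its output.

    Assumption: smartctl is on PATH and the caller has root privileges.
    """
    completed = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
    )
    return parse_health(completed.stdout)


def check_all(devices=("/dev/sda", "/dev/sdb")):
    """Map each device name to its health verdict."""
    return {device: check_drive(device) for device in devices}
```

A passing verdict does not rule out a failing drive, so this would complement a full surface scan rather than replace it.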

And here is a story just for laughs.
A friend asked me if I could look at his desktop computer that used to crash at odd times when he was playing a certain game. During my testing I noticed that it crashed when a certain beep tone sounded. I was testing it at night and didn’t want to wake my wife with all the beeping so I unplugged the speaker. I found that when the speaker was unplugged it didn’t crash. I wrote a BASIC program to play a scale of tones and I found that it crashed when a certain tone (frequency) was played but it didn’t crash if the speaker was unplugged. I reasoned that perhaps an inductive resonance was being set up and causing feedback, or perhaps an inductive transient spike was causing the CPU to misinterpret its instructions, but I couldn’t tell without an oscilloscope. There is an old adage in electronics that says “if you suspect a transient spike from a component solder a small capacitor across it. It’s very unlikely to hurt anything and it might just solve the problem.”
The smallest capacitor I had handy was 10 pF. I figured that 10 pF was too small to adversely affect the speaker so I soldered it across the speaker terminals and all problems went away. The computer never crashed again and I was regarded as “a genius” and “solver of all problems”. But that was truly a “once in a lifetime” fault that I have never heard of again or encountered again in 30 years.

I noticed two lines in your log above saying:
2021-12-21T17:15:30,432023-05:00 usb usb1-port13: over-current condition
2021-12-21T17:15:30,568023-05:00 usb usb1-port14: over-current condition

What was plugged into your USB ports when the freezes occurred?
The spec for USB 2.0 ports is that they must provide a potential difference of 5 volts and be able to supply 500 mA of current (which is not a lot). USB 3.0 ports raise that to 900 mA.
Did you have any external drives plugged in? USB external drives often draw 400 or 500 or even 700 mA of current.
Be aware that some manufacturers design their motherboards and USB ports so that they are better than the USB specification and can supply more than 500 mA. I have come across USB ports that happily supply 700 mA or even more, but the USB specification says that they don’t have to be able to supply that much. They only have to be able to supply 500 mA in order to meet the spec.
If you have any USB devices that draw more than 500 mA they need to have their own power supply, or run off a powered USB hub, to make up the extra current they need.
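
The current budget described above comes down to simple arithmetic. Here is a rough sketch, using the 500 mA figure quoted in this post as the per-port limit. The device names and their current draws are invented for illustration (only the 400 to 700 mA range for external drives comes from the post), so real numbers would have to come from a USB meter or the device’s label.

```python
PORT_LIMIT_MA = 500  # per-port figure quoted in the post


def over_budget(devices, limit_ma=PORT_LIMIT_MA):
    """Return {device: excess_mA} for every device drawing more than the port limit."""
    return {
        name: draw_ma - limit_ma
        for name, draw_ma in devices.items()
        if draw_ma > limit_ma
    }


# Hypothetical measurements in mA, one device per port.
measured = {
    "keyboard": 100,
    "mouse": 100,
    "external drive": 700,
}

for name, excess in over_budget(measured).items():
    print(f"{name}: {excess} mA over the limit, needs its own supply or a powered hub")
```

With the made-up figures above, only the external drive exceeds the limit, by 200 mA.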

Could you have a faulty USB device, or faulty USB cable plugged in that is drawing excess current?

Note that USB-C ports are very different. They can supply up to around 2,000 mA (if I remember correctly). And USB-C devices can draw much more current than the 500 mA that older USB ports can supply.

You can check current draw with a small, plug-in, USB tester. Look on ebay for < USB voltage current meter >, they cost about $10 or less. They are invaluable. I have one and bought another for my wife; she loves it and uses it to check all her USB chargers!

Thanks for pointing this out to me, very interesting.

The only things plugged into my USB ports are the keyboard, mouse, and printer, using three of the four USB ports in the back of the machine. The two in front I’ve only used to initially load Ubuntu onto this machine. Now that I think back, I had a bit of trouble with the PC recognizing the flash drive, but unfortunately I can’t remember the exact situation or what I did to make it work.

For a little while now things appeared to be running smoothly, the CPUs appeared normal, but when I just did a restart after the latest Ubuntu update one of the CPUs went back to running high again.

Also, when this is occurring I cannot put the PC into suspend mode; it immediately wakes up again. Sometimes when I shut it down completely and restart later, the CPUs appear to run normally until the next restart or freeze. The last freeze actually occurred when all four CPUs were running very low. Go figure…

Anyway, below is the most recent error report I ran just after the last restart. It shows those USB port issues again along with some errors that appear related to the failure to suspend. Some of those error lines seem to be connected to USBs in some way, for example:

2022-01-10T18:16:12,382170-05:00 PM: Device usb2 failed to suspend async: error -16

I’ll try looking into those USB cables and see what happens. Thanks again for your insights.

dmesg --time-format iso -l err,crit,alert,emerg
2022-01-10T18:12:45,108136-05:00 x86/cpu: VMX (outside TXT) disabled by BIOS
2022-01-10T18:12:45,413145-05:00 tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80
2022-01-10T18:12:45,413156-05:00 tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80
2022-01-10T18:12:47,457048-05:00 usb usb1-port13: over-current condition
2022-01-10T18:12:47,593048-05:00 usb usb1-port14: over-current condition
2022-01-10T18:16:12,382164-05:00 PM: dpm_run_callback(): usb_dev_suspend+0x0/0x20 returns -16
2022-01-10T18:16:12,382170-05:00 PM: Device usb2 failed to suspend async: error -16
2022-01-10T18:16:13,461997-05:00 PM: Some devices failed to suspend, or early wake event detected
2022-01-10T18:16:14,941800-05:00 PM: dpm_run_callback(): usb_dev_suspend+0x0/0x20 returns -16
2022-01-10T18:16:14,941806-05:00 PM: Device usb2 failed to suspend async: error -16
2022-01-10T18:16:15,684121-05:00 PM: Some devices failed to suspend, or early wake event detected
2022-01-10T18:16:38,733326-05:00 PM: dpm_run_callback(): usb_dev_suspend+0x0/0x20 returns -16
2022-01-10T18:16:38,733345-05:00 PM: Device usb2 failed to suspend async: error -16
2022-01-10T18:16:39,464833-05:00 PM: Some devices failed to suspend, or early wake event detected
2022-01-10T18:16:40,555615-05:00 PM: dpm_run_callback(): usb_dev_suspend+0x0/0x20 returns -16
2022-01-10T18:16:40,555621-05:00 PM: Device usb2 failed to suspend async: error -16
2022-01-10T18:16:41,307720-05:00 PM: Some devices failed to suspend, or early wake event detected
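
The repeated lines in a log like the one above can be tallied to see which ports and devices keep coming up. This is only a sketch: the regular expressions are written against the exact message wording shown in this thread, and the function name is made up. The negative number in the suspend errors is a standard errno value, and 16 is `EBUSY` ("device or resource busy").

```python
import errno
import re
from collections import Counter

OVER_CURRENT = re.compile(r"usb (usb\d+-port\d+): over-current condition")
SUSPEND_FAIL = re.compile(
    r"PM: Device (\S+) failed to suspend(?: async)?: error (-?\d+)"
)


def summarize(log_text):
    """Count over-current reports per port and suspend failures per (device, errno name)."""
    ports = Counter()
    failures = Counter()
    for line in log_text.splitlines():
        match = OVER_CURRENT.search(line)
        if match:
            ports[match.group(1)] += 1
            continue
        match = SUSPEND_FAIL.search(line)
        if match:
            code = abs(int(match.group(2)))
            # Translate the numeric code to its symbolic name where known.
            name = errno.errorcode.get(code, str(code))
            failures[(match.group(1), name)] += 1
    return ports, failures
```

Saving the output of `dmesg --time-format iso -l err,crit,alert,emerg` to a text file and passing its contents to `summarize` would show at a glance whether the same ports and the same device are involved on every boot.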

It probably goes without saying, but have you gone to the Lenovo site / Support, and downloaded and installed the latest USB drivers for your machine?

That being said I had a Lenovo Thinkpad T500 that exhibited similar problems to those you are describing. I.e., it would run fine for a week or so, then fail at the most inconvenient times. That was in 2011. I was working at sorting out the problems with it but somebody broke into my friend’s car and stole it before I had finished my troubleshooting plan.