Question regarding kernel panic

Rosika · November 1, 2023, 1:52pm

Hi all,

I ran into something weird today.

Here´s the setup scenario:

My main system, Linux Lite 6.2, is installed on an external HDD. It´s a “Western Digital Technologies, Inc. Elements 25A2”.
So I always boot my system from there. I´ve been doing it this way for years and it´s been working really well.

Today however, shortly after pressing the on/off button, the status messages reported something about “kernel panic”, and the system wouldn´t boot.
I wish I had taken a photo of the screen so that I could be more specific about it. Well, that´s all I know.

So I pressed the on/off button (holding it down for a while) in order to power the machine down. The OS hadn´t booted anyway.

Now I tried it again and this time it worked flawlessly. No message of that kind anymore and the OS booted alright.

I believe the kernel panic message related to a USB device 1-1.2.3. I´m almost (but not totally) sure of that.
Now I looked it up:

lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/6p, 480M
        |__ Port 4: Dev 3, If 0, Class=Vendor Specific Class, Driver=rtsx_usb, 480M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
            |__ Port 2: Dev 4, If 1, Class=Human Interface Device, Driver=usbhid, 12M
            |__ Port 2: Dev 4, If 0, Class=Human Interface Device, Driver=usbhid, 12M
            |__ Port 3: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 480M
            |__ Port 4: Dev 6, If 0, Class=Hub, Driver=hub/4p, 480M
[...]

It seems to be device 5. Then:

lsusb
[...]
Bus 001 Device 005: ID 1058:25a2 Western Digital Technologies, Inc. Elements 25A2
Bus 001 Device 004: ID 0458:0185 KYE Systems Corp. (Mouse Systems) Wireless Mouse
Bus 001 Device 003: ID 05e3:0608 Genesys Logic, Inc. Hub
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

… which seems to point at the external HDD. I don´t like that.
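A name like 1-1.2.3 in kernel messages reads as bus 1 followed by the port chain 1, 2, 3, which matches the Port 1 / Port 2 / Port 3 path to the Mass Storage device in the lsusb -t tree above. A minimal sketch for confirming this through sysfs, assuming the usual /sys/bus/usb/devices layout; usb_path_info is only an illustrative helper name:

```shell
# Print "vendor:product name" for a USB topology path such as "1-1.2.3"
# (bus 1, port chain 1 -> 2 -> 3).
# Optional second argument: sysfs base directory (defaults to the real one).
usb_path_info() {
    dev="${2:-/sys/bus/usb/devices}/$1"
    if [ ! -d "$dev" ]; then
        echo "no such USB device: $1" >&2
        return 1
    fi
    printf '%s:%s %s\n' \
        "$(cat "$dev/idVendor")" \
        "$(cat "$dev/idProduct")" \
        "$(cat "$dev/product" 2>/dev/null)"
}

# Usage on the affected machine:
#   usb_path_info 1-1.2.3
```

If the path really belongs to the external drive, the IDs printed should match the 1058:25a2 entry in the plain lsusb listing.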

The next step was to have a look at the HDD´s health for any clues:

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-87-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   122   108   021    Pre-fail  Always       -       4875
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3434
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       12103
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2399
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       163
193 Load_Cycle_Count        0x0032   178   178   000    Old_age   Always       -       66425
194 Temperature_Celsius     0x0022   121   104   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

I couldn´t find anything unusual.
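For repeated checks, the attributes that are commonly watched for early trouble (reallocated, pending and uncorrectable sectors, plus interface CRC errors) can be filtered out of the table automatically. A minimal sketch, assuming the table format shown above; smart_flags is a hypothetical helper name, and /dev/sdX is a placeholder for the real device:

```shell
# Read `smartctl -A` output on stdin and print every watched attribute
# (IDs 5, 196, 197, 198, 199) whose raw value (last column) is not zero.
# No output means none of the watched attributes was flagged.
smart_flags() {
    awk '$1 ~ /^(5|196|197|198|199)$/ && $NF != 0 { print $2 " raw=" $NF }'
}

# Usage (some USB bridges need the extra option "-d sat"):
#   sudo smartctl -A /dev/sdX | smart_flags
```

With the values posted above this would print nothing, since all of those raw values are 0.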

To make sure everything still works as it should I powered down the PC again and initiated a cold start once more.
Again: everything´s alright.
So out of 3 attempts it´s just the first one which wasn´t successful.

I´ve no idea what might have initiated that kernel panic…

I also searched the log files but couldn´t find anything regarding that matter. Probably there would be no log mentioning the kernel panic as the system didn´t start the first time.

Hmm… I´m not quite sure what to do now, if anything.
What do you think?

Many greetings from Rosika

easyt50 · November 1, 2023, 3:51pm

Hi Rosika,

I was doing a little research / reading on kernel panic. I found a site that I thought was interesting. In the article was this statement:

“A single, isolated kernel panic error doesn’t mean something is wrong with the hardware or software.”

Here is the link to the site if you would like to visit it.

Rosika · November 1, 2023, 4:05pm

Hi Howard,

thank you so much for your response and for looking up some information on my behalf.
I´ve done a bit of research myself but didn´t stumble across this article.

I´ll read it through thoroughly.

Like you pointed out:

A single, isolated kernel panic error doesn’t mean something is wrong with the hardware or software.
[However, if the system goes into a state of kernel panic often, tracking down the root of the issue is highly recommended. ]

That´s a bit reassuring.

But the “tracking down the root of the issue” part seems a bit strange to me.
How would I go about that if there are no log files present…

Well, as I said, I´ll have to read the whole of the article yet.

Thank you so much, Howard, for your help.

Many greetings from Rosika

P.S.:

I´ve read the whole of the article now.
Seems pretty informative.

So /var/log/kern.log seems to be the log-file to look out for.

I looked at it and it didn´t return anything hinting towards kernel panic:

rosika@rosika-Lenovo-H520e ~> cat /var/log/kern.log | grep panic
rosika@rosika-Lenovo-H520e ~ [0|1]>

The kernel logs for today´s date started at 1:46 pm, which refers to my second (successful) boot attempt.

This is the first line:

Nov 1 13:46:19 rosika-Lenovo-H520e kernel: [ 0.000000] microcode: microcode updated early to revision 0x21, date = 2019-02-13

So there´s no entry as far as the previous one is concerned.
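A panic that happens before the root filesystem is mounted cannot be written to the logs on disk, which would be consistent with the empty result. To be thorough, the rotated logs and the systemd journal can be searched too. A minimal sketch, assuming rotated logs are named kern.log.1, kern.log.2.gz and so on, and that the journal is kept across reboots; find_panics is a hypothetical helper name:

```shell
# Search current and rotated kernel logs (plain or gzip-compressed) for
# panic messages. Optional argument: log directory (default /var/log).
find_panics() {
    zgrep -i -h 'kernel panic' "${1:-/var/log}"/kern.log* 2>/dev/null
}

# Usage:
#   find_panics                 # may need sudo to read rotated logs
#   journalctl --list-boots     # boots known to the systemd journal
#   journalctl -k -b -1         # kernel messages of the previous boot
```

If the failed boot does not show up in journalctl --list-boots at all, nothing from it reached the disk.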

Steve_Rider · November 1, 2023, 9:42pm

I suspect your external drive was not completely up to speed when the error occurred.
Another possibility is that you may have connected or disconnected a different USB device just as it was booting.

I would not worry about this, flukes do happen.
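If a slow spin-up of the external drive is suspected, one way to test the idea is the kernel parameter rootdelay=N, which makes the boot process wait N seconds before it tries to mount the root filesystem. A minimal sketch, assuming GRUB is the boot loader and /etc/default/grub is in use; the value 10 and the helper name has_rootdelay are only illustrative:

```shell
# Succeeds if the given kernel command line already contains rootdelay=N.
has_rootdelay() {
    printf '%s\n' "$1" | grep -Eq '(^| )rootdelay=[0-9]+( |$)'
}

# Usage:
#   has_rootdelay "$(cat /proc/cmdline)" && echo "rootdelay already set"
#
# To add it permanently (assumption: a stock /etc/default/grub):
#   1. Edit /etc/default/grub so that the default line contains the delay, e.g.
#        GRUB_CMDLINE_LINUX_DEFAULT="quiet splash rootdelay=10"
#   2. Regenerate the GRUB configuration:
#        sudo update-grub
```

This only helps if the failure happens at the point where the root filesystem is mounted; it does nothing for problems earlier in the boot.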

nevj · November 1, 2023, 10:26pm

Look at the last few lines.

The most common cause of a kernel panic is that the RAM image has booted, the boot process tries to switch to the HD, and it can’t find the / partition.
So some glitch with the USB drive… the only other possibilities would be a software update or a mistake editing fstab.
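The fstab possibility can be ruled out quickly by checking that every UUID named in /etc/fstab belongs to a block device that is currently attached. A minimal sketch, assuming fstab refers to partitions by UUID=…; fstab_uuids is a hypothetical helper name:

```shell
# List the UUIDs referenced in an fstab file, skipping comment lines.
# Optional argument: path to the fstab file (default /etc/fstab).
fstab_uuids() {
    grep -v '^[[:space:]]*#' "${1:-/etc/fstab}" \
        | grep -o 'UUID=[^[:space:]]*' \
        | cut -d= -f2
}

# Usage: report fstab UUIDs that no attached block device has.
#   for u in $(fstab_uuids); do
#       sudo blkid -U "$u" >/dev/null || echo "missing: $u"
#   done
```

No "missing" lines means fstab and the attached partitions agree.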

kovacslt · November 2, 2023, 1:16pm

When I see mysterious things happen, the first thing I check is storage (SMART) and then the RAM (Memtest).

Reading your story, the question of the SMART data came to mind at the first paragraph, but you provided it in the next one.

Looking at it, your drive seems to be perfect.

If that phenomenon repeats, maybe gets regular, I’d investigate in the direction of the case of that drive, maybe the cable being used to connect it.
If it doesn’t repeat, and it was a “once in a decade”, I think it’s impossible to find the culprit.

Rosika · November 2, 2023, 1:59pm

Hi all,

Thank you for your replies.

@Steve_Rider :

Hi Steve, and welcome to the community.

Thanks for providing some possible explanations.

Well, I can eliminate this one. Perhaps it was your first suggestion then.

@nevj :

I would, Neville, but I can´t.

Like I said in my post #3:

Surely you meant the kernel log entries of the failed boot, but there are none.
I could just provide the last few lines of the succeeding one, which I guess wouldn´t be of much help.

I didn´t edit fstab. So that would leave the software update then.
That´s always a possibility, I think.

BTW:

I updated my system only yesterday, right after the “problem” had occurred. I even got a new kernel version (5.15.0-88-generic). Perhaps it´ll help…

@kovacslt :

Thanks, László, for your evaluation. That´s good to know.

O.K., that´s a possibility as well.

Looking at the article @easyt50 provided yesterday I realized they provide a screenshot as well:

This seems familiar. Apart from the digits I think that was the last entry with me as well.

In the article they mentioned a number of reasons why a kernel panic would occur.
On the other hand they pointed out:

A single, isolated kernel panic error doesn’t mean something is wrong with the hardware or software.

That´s what Howard referred to as well.

Well, I keep asking myself: “How can that be?”
I mean if everything´s O.K. with both the hardware and the software there shouldn´t be a kernel panic in the first place, right?

Well, I guess that´s this “once in a decade” event you were talking about, László.

BTW:

After the kernel panic I booted my system for the fourth time now. No problem since then.

Many thanks to all of you and many greetings from Rosika

kovacslt · November 2, 2023, 3:46pm

Right.

Something WAS definitely not OK when that panic happened.
So if that was a single random error, you don’t need to worry, and it’s unlikely you’ll find the cause.
If it is recurring, then it’s possible to find the pattern and track down the cause.

Rosika · November 2, 2023, 3:56pm

Thanks, László.

I´ll keep an eye on it. Couldn´t do otherwise, even if I wanted to.

Here I found a question regarding

Cannot boot because: Kernel panic - not syncing: Attempted to kill init!

… which seems pretty much the same I ran into.
The article is 11 years old, though. But it seems things like that happened back then as well.

One commentator remarked:

From my experience, I think that this problem is caused by upgrading to a newer kernel version.

Well, as I said, I upgraded to the latest kernel after that event.
Now I´m looking for some sort of log listing all the changes that were applied to kernel 5.15.0-88.98.

I didn´t find anything on the net.
Do you by any chance know where such things can be looked up, or if there´s a site dealing with kernel changes at all?

Thanks a lot and many greetings from Rosika

Rosika · November 3, 2023, 1:50pm

Hi all,

I did some more research on the matter and I finally came up with 2 methods that worked for me:

Within synaptic package manager I looked for “linux-image-5.15.0-88-generic”.
It found the package (which of course is already installed on my system) and from within synaptic I could click on the “get changelog” tab, which provided me with the info I was looking for.
I also used the terminal.
The command
apt changelog linux-image-5.15.0-88-generic
worked as well.
Curiously enough I couldn´t find the “changelog” option in apt´s man pages. It didn´t mention it. But: looking for it in apt-get´s man pages was successful. Here the “changelog” option of the command is mentioned.

I haven´t looked through the results yet; I just wanted to let you know about (at least some of) the ways to get changelog information about packages.

I´m sure most of you knew that already. But perhaps there´s still someone around who didn´t and would be interested in it as well.

Many greetings from Rosika

P.S.

In the meantime I had a look at the results.
Seems they didn´t get me far after all:

Get:1 https://changelogs.ubuntu.com linux-signed 5.15.0-88.98 Changelog [117 kB]
linux-signed (5.15.0-88.98) jammy; urgency=medium

  * Master version: 5.15.0-88.98

  * Miscellaneous Ubuntu changes
    - debian/tracking-bug -- update from master

 -- Roxana Nicolescu <roxana.nicolescu@canonical.com>  Mon, 02 Oct 2023 15:44:00 +0200
[...]

Hmm… doesn´t seem to be that useful to me.
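The output above is the changelog of linux-signed, the source package that wraps the signed kernel image, which is probably why it says so little. On Ubuntu the actual kernel changes are normally recorded in the changelog of the linux source package. A minimal sketch for finding out which source package a binary package comes from; source_of is a hypothetical helper, and the claim that linux-modules-5.15.0-88-generic is built from linux is an assumption worth verifying with the same check:

```shell
# Print the source package named in `apt-cache show` style output on stdin.
# Prints nothing if there is no Source field (binary and source names match).
source_of() {
    awk -F': ' '$1 == "Source" { print $2; exit }'
}

# Usage:
#   apt-cache show linux-image-5.15.0-88-generic | source_of
#   apt-cache show linux-modules-5.15.0-88-generic | source_of
#   apt changelog linux-modules-5.15.0-88-generic   # assumed to show the full kernel changelog
```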

nevj · November 3, 2023, 11:44pm

Hi Rosika,
I did not know either of those methods, but I do know that the changelog files for software that you have installed can be found in /usr/share/doc... and maybe also in /usr/local/share/doc. I think maybe only Debian-based distros have this. BSD seems to have the info, but not in files called changelog.

Regards
Neville

Rosika · November 4, 2023, 12:44pm

Hi Neville,

thanks for the info.

ll /usr/local/share/doc
ls: cannot access '/usr/local/share/doc': No such file or directory

So you´re right. I don´t have it.
But the other one seems to be available:

ll /usr/share/doc/linux-image-5.15.0-88-generic
total 24K
-rw-r--r-- 1 root root  17K Okt  2 15:44 changelog.Debian.gz
-rw-r--r-- 1 root root 1,5K Mär 13  2023 copyright

I´ll try to have a look at it.

file /usr/share/doc/linux-image-5.15.0-88-generic/changelog.Debian.gz 
/usr/share/doc/linux-image-5.15.0-88-generic/changelog.Debian.gz: gzip compressed data, max compression, from Unix, original size modulo 2^32 116926

So I have to de-compress the file first.
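The file does not have to be unpacked on disk; it can be streamed through gzip and read directly. A minimal sketch; changelog_head is a hypothetical helper name:

```shell
# Print the first lines of a gzip-compressed changelog without unpacking it
# to disk. Arguments: file, optional number of lines (default 20).
changelog_head() {
    gzip -dc "$1" | head -n "${2:-20}"
}

# Usage:
#   changelog_head /usr/share/doc/linux-image-5.15.0-88-generic/changelog.Debian.gz
#   zless /usr/share/doc/linux-image-5.15.0-88-generic/changelog.Debian.gz   # page through all of it
```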

Thanks again and many greetings from Rosika

Rosika · November 5, 2023, 3:40pm

Hi Neville,

taking up the method you suggested I looked at what /usr/share/doc/linux-image-5.15.0-88-generic/changelog.Debian.gz had to say.

It´s this:

linux-signed (5.15.0-88.98) jammy; urgency=medium

  * Master version: 5.15.0-88.98

  * Miscellaneous Ubuntu changes
    - debian/tracking-bug -- update from master

 -- Roxana Nicolescu <roxana.nicolescu@canonical.com>  Mon, 02 Oct 2023 15:44:00 +0200
[...]

So no news there. It provides exactly the same content as I got by applying the other two methods.

Such a shame. It doesn´t get me any further, it seems.

Still: thanks a lot for your help, Neville.

Cheers from Rosika

nevj · November 5, 2023, 4:09pm

I expected it to be the same.

Rosika · November 10, 2023, 1:15pm

Hi all,

Update:

That kernel panic business happened again today, I´m sorry to say.

Last time (and first time) it happened was 9 days ago.
I use my PC on a daily basis, so I powered on the PC at least 9 times since then (actually more often due to reboots in the wake of updates) and I didn´t encounter any difficulties… until just now.

This time I remembered to take a screenshot of the monitor with my smartphone.
(sorry for the mediocre quality):

These lines I don´t like:

failed to execute /init (error -2) [line 3]
Kernel panic - not syncing … [line 13]
panic … [line 22]
end kernel panic [last line]

Does this info help you at all?

The same situation as 9 days ago:

I powered down the PC with the on/off button.
I tried again and this time it worked flawlessly.

I don´t understand it: There can´t be anything fundamentally wrong, can there?
It worked well for 9 days and even the second attempt to boot was o.k. today.

I´m getting worried now.
Any advice?

Many greetings from Rosika.

P.S.:

I think (and hope) I may rule out any hardware issues.
I already checked the HDD´s health, which seems fine.
I also checked the RAM with memtester and also with memtest. No issues reported.

The intermittent nature of the problem seems weird.
Why would the PC start the second time without fail (plus the fact that the problem didn´t occur for 9 days in a row)?
No changes whatsoever were applied in the meantime…
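In the line "failed to execute /init (error -2)", the -2 is the kernel's error code ENOENT ("No such file or directory"): the kernel did not find /init in the initial RAM disk it had unpacked. One thing that can be checked from the running system is whether the initramfs image on disk actually contains init. A minimal sketch, assuming initramfs-tools (which provides lsinitramfs) and an image named /boot/initrd.img-<kernel version>; initrd_has_init is a hypothetical helper name:

```shell
# Succeeds if an initramfs file listing on stdin contains a top-level "init"
# (listed either as "init" or as "./init").
initrd_has_init() {
    grep -Eqx '(\./)?init'
}

# Usage:
#   lsinitramfs "/boot/initrd.img-$(uname -r)" | initrd_has_init \
#       && echo "init present" || echo "init missing"
```

If init is present, the image itself is probably fine, and an occasional faulty read of it from the USB drive at boot time would be a more plausible suspect than the file.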

JoelA · November 10, 2023, 5:30pm

Probably not the problem, but might be worth trying.
Replace the CMOS battery. A low CMOS battery can cause all kinds of strange random trouble.

In the past, I’ve had a random issue that wasn’t consistent but started occurring more and more frequently. It turned out to be the CMOS battery.

kovacslt · November 10, 2023, 6:20pm

It looks like it got more frequent, and that pattern suggests to me that this is a hardware problem that is just now developing.
Replace that CMOS battery, that won’t hurt.
Maybe it helps.
I suspect the culprit to be the enclosure of your external drive, or more likely the USB cable. If you regularly remove the cable and then reconnect it, the connection may get loose, causing random errors at first and systematic errors later.
A loose USB cable drove me crazy with a USB capture dongle: it worked flawlessly, then suddenly it froze. Checking dmesg led me to the answer: it had been disconnected, but reconnected within a fraction of a second. Moving and bending the cable then made it possible to reproduce my problem 100%, so I changed that cable.
Try slightly moving and bending the cable while booting, especially near the connectors (at both ends, one after the other), and see if you can CAUSE the problem to appear.
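The dmesg check described above can be narrowed down to the relevant lines. A minimal sketch, assuming the usual wording of the kernel's USB messages ("USB disconnect, device number N" and "reset high-speed USB device number N"); usb_glitches is a hypothetical helper name:

```shell
# Filter kernel log lines on stdin for USB disconnects and resets.
usb_glitches() {
    grep -Ei 'usb [0-9.-]+: (USB disconnect|reset .* USB device)'
}

# Usage:
#   sudo dmesg | usb_glitches
#   journalctl -k -b -1 | usb_glitches     # same check for the previous boot
```

Hits for the drive's own path (1-1.2.3 in this thread) while nobody touched the cable would point towards the cable or the enclosure.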

JoelA · November 10, 2023, 7:14pm

FYI-
One of my issues was an inconsistent error during bootup- I can’t remember the exact error. It was years ago. Another time it was an inconsistent issue with a usb keyboard not functioning correctly.
Both times, replacing the cmos battery fixed the issue.

nevj · November 10, 2023, 8:27pm

I think it is trying to sync your wireless mouse when the panic occurs.
You might try a new mouse battery.

No, it’s not the mouse, it is
usb1,1,2,3 Product Elements 25A2
what is that?

Intermittent problems are usually hardware, so maybe the CMOS battery.

I think the boot is past the point where it starts using the HDD… so it may not be that, at least this time.

Can you run a memtest?

Don’t you panic … the kernel is doing it for you.

easyt50 · November 10, 2023, 9:07pm

Hi Rosika,

As you and many already know, an intermittent error is one of the hardest kinds of error to solve. I, too, feel you may have a hardware problem, but right now it is also intermittent. I believe it is hardware since you have not performed any software changes.

I performed a Google search for “failed to execute /init error 2”. There were many hits, but most seemed not to apply to you, or I did not fully understand why the person was getting the error.

Through my reading, I found 2 items that did sound interesting.

1 - “This only happens when the EFI partition is on an external disk connected by USB. And only on my older macbook pro from 2012.” This item was interesting because he was booting off of a USB device.

2 - “Looks like it was a hardware issue. I found a cheap stick of DDR4 RAM on ebay to use as a test. No issues since replacing the RAM. The RAM that was causing the problem didn’t show any errors with memtest, or with the Windows Memory Diagnostic. It also ran fine with various stress tests.”
Again the finger was pointing to hardware.

You may want to start a list of things that you try to resolve the kernel panic problem. Some items that come to mind that you might try are:

Try a different USB port on your PC
Replace the USB cable
Replace the USB external controller
Re-seat the memory sticks
If you have 2 memory sticks, remove one stick of memory, even though your system would run slower.
Replace the external disk with a spare if you have one.

The problem with this type of error is after you make a change, it may take 2 weeks or more to know if you are going to get the kernel panic again. Another option is to do nothing at this time since you are able to just re-boot and everything is fine. Wait to see how often it happens, document to see if the error message changes any, have a backup plan (like a live USB) or a good backup to restore in case the PC fails to boot.
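Since weeks may pass between occurrences, a dated plain-text log makes it easier to see afterwards which change came before which panic. A minimal sketch; the function name note, the variable PANIC_LOG and the default file name are all arbitrary choices for illustration:

```shell
# Append a timestamped note to a plain-text troubleshooting log.
# PANIC_LOG overrides the (arbitrary) default file name.
note() {
    printf '%s  %s\n' "$(date '+%Y-%m-%d %H:%M')" "$*" \
        >> "${PANIC_LOG:-$HOME/kernel-panic-notes.txt}"
}

# Usage:
#   note "panic on first cold boot of the day; second attempt OK"
#   note "changed: moved the drive to a different USB port"
```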

I think it is interesting that the error, so far, has only occurred for the first boot of the day. This may be a clue. It is almost like the external disk is not quite ready yet.

Good Luck and report back on your progress.

Take care,
Howard