Tight Loop in Linux

Hi,
I have LM at the current release level, 22.1.
A few rare times, Linux has gone into a very tight CPU (?) loop where neither the mouse nor the keyboard responds. This has happened not just with LM 22.1, but with different versions of Linux Mint. The only way out of this type of loop is to turn off the PC. Once the PC comes back on, all is fine.

Today was one of those rare times when the PC hung. So it is a minor problem, but I would like to know which log files or other places to look at to get an idea of what caused the loop.

Back in my old mainframe days, the software had a trace table, so if you had a memory (core) dump, you could go to the trace table and see what the last entry was.

Thanks,
Howard

So, do you have any suggestions I could use to look into this or any other problem I might have with Linux?


Hi Howard,
Try /var/log/messages if you have syslog running.
If you only have the systemd journal, I can't help.
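A quick way to see which plain-text logs a system actually keeps is sketched below; `check_logs` is a hypothetical helper name. On Ubuntu-based systems such as Linux Mint, rsyslog normally writes to /var/log/syslog rather than /var/log/messages, and the X server log is usually /var/log/Xorg.0.log, so the sketch checks all three.

```shell
#!/bin/sh
# check_logs: report which of the classic plain-text logs exist on this machine.
# The three paths are the usual locations; they are assumptions, not guarantees.
check_logs() {
    for f in /var/log/syslog /var/log/messages /var/log/Xorg.0.log; do
        if [ -e "$f" ]; then
            echo "present: $f"
        else
            echo "missing: $f"
        fi
    done
}
```

Running `check_logs` prints one "present:" or "missing:" line per path.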

Linux rarely core dumps, so you can't use those old techniques.

Maybe when it hangs, you could switch to another console (Ctrl+Alt+F2, F3, etc.), log in and see which process is using all the CPU time. If it really is a tight loop, there will be a process consuming time.
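From that second console (or an SSH session), a process list sorted by CPU usage shows the hog directly. A minimal sketch: `top_cpu` is a hypothetical helper name, and it relies on the procps `ps` shipped with Linux Mint (the `-eo` and `--sort` options).

```shell
#!/bin/sh
# top_cpu [N]: list the N (default 5) processes currently using the most CPU.
# The extra line passed to head keeps the column header.
top_cpu() {
    ps -eo pid,user,%cpu,comm --sort=-%cpu | head -n "$(( ${1:-5} + 1 ))"
}
```

`top_cpu 10` shows the top ten; plain `top` gives the same information interactively.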

Regards
Neville


With recent kernels, the kernel will display a QR-code when it panics. If there’s no kernel panic, then there’s no QR-code and it’s something else which is causing the problem.

I would look at /var/log/messages; /var/log/Xorg.0.log would also be of interest.


Yes, LM uses systemd. I will have to look into the option, if I have one, to run syslog.


I did not see a QR-code, the PC just locked up, so I doubt it was a kernel panic.


You should have it; it may even be running already.
ps ax | grep syslogd
will tell you if it is running.

or
What do you have in /var/log?
Are there any log files?

Debian keeps some conventional log files in addition to the systemd journal.
Alternatively, learn how to search the systemd journal.
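Searching the journal of the previous boot is the closest thing to the old trace table: the last entries before the hang are the last lines of that boot. A minimal sketch, assuming the journal is persistent (stored under /var/log/journal) so earlier boots survive the power-off; the function names are hypothetical, and reading system logs may need sudo.

```shell
#!/bin/sh
# prev_boot_tail [N]: last N (default 100) journal entries of the previous
# boot, i.e. the session that hung.
prev_boot_tail() {
    journalctl -b -1 -n "${1:-100}" --no-pager
}

# prev_boot_kernel_errors: kernel messages of priority "err" or worse
# from the previous boot.
prev_boot_kernel_errors() {
    journalctl -k -b -1 -p err --no-pager
}
```

`journalctl --list-boots` shows which earlier boots are recorded. In a hard lockup the final messages may never reach the disk, so a quiet tail is not proof that nothing happened.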


What he said… Alternatively, if you have more than one computer, SSH to it from another one. I sometimes have to do this when Brave Browser eats my main Pop!_OS desktop machine, or sometimes ffmpeg does: I can SSH in from another Linux or macOS computer, run btop or top to have a look, use “ps -ef” to find PIDs, kill the CPU hog or dodgy process and get my system back…
e.g.

ssh titan
btop
ps -ef |grep -i brave
kill -9 $(ps -ef |grep -i brave | awk '{print $2}')

or

ps -ef |grep -i ffmpeg
kill -9 $(ps -ef |grep -i ffmpeg | awk '{print $2}')

Sometimes ffmpeg will cause my PC to hang if I forget to limit the number of threads it can use…

Sometimes a seemingly hung system, where even Ctrl+Alt+F1/F2/F3 won't get you a TTY, will still respond over SSH. That's assuming you're running sshd (I think most Ubuntu-based distros don't install openssh-server/sshd by default). I personally don't see the point of not having sshd running; I guess it's good security practice, but I think of Linux and UNIX as server AND desktop machine simultaneously :smiley:
– edit (correction) –
I should also have filtered out grep itself with grep -v above:
So

ps -ef |grep -i ffmpeg
kill -9 $(ps -ef |grep -i ffmpeg|grep -v grep | awk '{print $2}')

It's not a biggie, but without the “grep -v grep” it will probably try to kill the grep process as well, which probably won't exist any more by then…
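Rather than chaining a second grep, the filtering can be done in a single awk pass. A minimal sketch, assuming `ps -ef` output where the PID is column 2 and the command starts in column 8; `pids_matching` is a hypothetical helper name. On procps systems, `pgrep -f` and `pkill -f` do the same job and never match themselves.

```shell
#!/bin/sh
# pids_matching PATTERN: read `ps -ef` output on stdin and print the PID
# (column 2) of every process whose line contains PATTERN, ignoring case.
# The header line is skipped, and so are lines whose command (column 8)
# ends in "grep" or "awk": the processes of this very pipeline can appear
# in the ps output with PATTERN in their arguments.
pids_matching() {
    awk -v pat="$1" '
        NR > 1 && index(tolower($0), tolower(pat)) > 0 && $8 !~ /(grep|awk)$/ { print $2 }
    '
}

# Usage on a live system (kill only if something matched):
#   pids=$(ps -ef | pids_matching ffmpeg)
#   [ -n "$pids" ] && kill -9 $pids
```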


In that case:
Could you post /var/log/messages and /var/log/Xorg.0.log from around the time of the crash?


@easyt50 :

Hi Howard, :wave:

I know it's not a solution to your problem; it's rather a side note.

Instead of turning off the PC (I guess you meant with the on/off switch),
it's advisable to perform this action with the REISUB sequence.

In order to get it working do the following:

  • edit the file /etc/sysctl.d/10-magic-sysrq.conf. At the end there should be the entry:
    kernel.sysrq = 244 (see remarks below)

  • Press “Alt” and “Print” at the same time and keep them pressed

  • Then type: R - E - I - S - U - B, preferably allowing a second to elapse before hitting the next key

Explanation:

  • R: unraw : Takes the keyboard out of raw mode (switches it to XLATE mode)
  • E: term : Sends a SIGTERM to all processes except the system’s init process
  • I: kill : Sends a SIGKILL to all processes except init
  • S: sync : Syncs all mounted filesystems, ensuring data is saved
  • U: umount: Remounts all filesystems in read-only mode, to protect data
  • B: reboot : Reboots the system immediately, without unmounting filesystems

Sources:

remarks:

There exists a less radical way than rebooting the whole system. If the SysRq key works, you can kill processes one by one using Alt+SysRq+F. The kernel will kill the most «expensive» process each time. If you want to kill all processes for one console, you can issue Alt+SysRq+K.

NOTE: You should explicitly enable these key combinations. Ubuntu ships with the sysrq default setting 176 (128+32+16), which allows only the SUB part of the REISUB combination to run. You can change it to 1 (all commands enabled) or 244, which is potentially less harmful.

(from source #3, bold by me)
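The mask values in this quote are sums of bits, so a small shell function can decode any value into the function groups it enables. A sketch using the bit meanings from the kernel's sysrq documentation; `decode_sysrq` is a hypothetical helper name, and the bracketed letters are the REISUB keys each bit unlocks.

```shell
#!/bin/sh
# decode_sysrq VALUE: list the SysRq function groups a kernel.sysrq value enables.
# 0 and 1 are special values; anything else is treated as a bitmask.
decode_sysrq() {
    v=$1
    if [ "$v" -eq 0 ]; then
        echo "0: SysRq disabled"
        return 0
    fi
    if [ "$v" -eq 1 ]; then
        echo "1: all SysRq functions enabled"
        return 0
    fi
    [ $((v & 2)) -ne 0 ]   && echo "2: control of console logging level"
    [ $((v & 4)) -ne 0 ]   && echo "4: keyboard control (SAK, unraw) [R]"
    [ $((v & 8)) -ne 0 ]   && echo "8: debugging dumps of processes"
    [ $((v & 16)) -ne 0 ]  && echo "16: sync command [S]"
    [ $((v & 32)) -ne 0 ]  && echo "32: remount read-only [U]"
    [ $((v & 64)) -ne 0 ]  && echo "64: signalling of processes (term, kill, oom-kill) [E, I]"
    [ $((v & 128)) -ne 0 ] && echo "128: reboot/poweroff [B]"
    [ $((v & 256)) -ne 0 ] && echo "256: nicing of all RT tasks"
    return 0
}
```

For example, `decode_sysrq 176` lists only the sync, remount and reboot groups (S, U, B), while `decode_sysrq 244` adds keyboard control and process signalling (R, E, I). To inspect the running kernel, pass it `"$(cat /proc/sys/kernel/sysrq)"`; `sudo sysctl -w kernel.sysrq=244` applies a new value until the next reboot, and the file in /etc/sysctl.d keeps it across reboots.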

Hope it helps a bit.

Many greetings from Rosika :slightly_smiling_face:


Here is some of the log. Some of the lines were highlighted in red. I turned the time stamp to bold for the lines that were in red.

Feb 22 18:06:43 HP kernel: sd 5:0:0:0: [sdd] Attached SCSI disk
Feb 22 18:06:43 HP kernel: scsi 5:0:0:1: Wrong diagnostic page; asked for 1 got 8
Feb 22 18:06:43 HP kernel: scsi 5:0:0:1: Failed to get diagnostic page 0x1
Feb 22 18:06:43 HP kernel: scsi 5:0:0:1: Failed to bind enclosure -19
Feb 22 18:06:43 HP kernel: ses 5:0:0:1: Attached Enclosure device
Feb 22 18:06:43 HP systemd-udevd[393]: /etc/udev/rules.d/99-megasync-udev.rules:1 The line has no effect, ignoring.
Feb 22 18:06:44 HP udisksd[907]: Error probing device: Error sending ATA command IDENTIFY DEVICE to ‘/dev/sdd’: Unexpected sense data returned:
0000: f0 00 01 00 50 00 01 0a 80 00 00 00 00 1d 00 00 …P…
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
(g-io-error-quark, 0)
Feb 22 18:06:44 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC= SRC=192.168.1.230 DST=239.255.255.250 LEN=635 TOS=0x00 PREC=0x00 TTL=1 ID=11275 DF PROTO=UDP >
Feb 22 18:06:44 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC= SRC=fe80:0000:0000:0000:4808:148a:a9b2:af96 DST=ff02:0000:0000:0000:0000:0000:0000:000c LEN=6>
Feb 22 18:06:49 HP kernel: FAT-fs (dm-1): error, fat_get_cluster: invalid cluster chain (i_pos 0)
Feb 22 18:06:49 HP kernel: FAT-fs (dm-1): Filesystem has been set read-only
Feb 22 18:07:27 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC=f4:39:09:08:f1:1b:18:78:d4:31:5c:04:08:00 SRC=74.120.8.24 DST=192.168.1.230 LEN=52 TOS=0x00 PR>
Feb 22 18:07:27 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC=f4:39:09:08:f1:1b:18:78:d4:31:5c:04:08:00 SRC=74.120.9.92 DST=192
Feb 22 18:08:16 HP kernel: FAT-fs (dm-1): error, fat_get_cluster: invalid cluster chain (i_pos 0)
Feb 22 18:08:28 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC=f4:39:09:08:f1:1b:18:78:d4:31:5c:04:08:00 SRC=74.120.8.24 DST=192.168.1.230 LEN=52 TOS=0x00 PR>
Feb 22 18:08:29 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC=f4:39:09:08:f1:1b:18:78:d4:31:5c:04:08:00 SRC=74.120.9.92 DST=192.168.1.230 LEN=52 TOS=0x00 PR>
Feb 22 18:08:32 HP kernel: FAT-fs (dm-1): error, fat_get_cluster: invalid cluster chain (i_pos 0)

The output of the command.
ps ax | grep syslogd
854 ? Ssl 0:00 /usr/sbin/rsyslogd -n -iNONE
4780 pts/1 S+ 0:00 grep --color=auto syslogd

I have never used SSH. Maybe I need to do some reading and testing to learn how to use it.

I had not heard of sysrq, but here is what mine looks like:

# For example, to enable both control of console logging level and
# debugging dumps of processes: kernel.sysrq = 10

kernel.sysrq = 176

To all, thank you for your suggestions. It is all new to me and will require a lot of reading and testing to try and understand some of it. I posted an update.


It seems to be having trouble with a FAT filesystem?

Do you have a USB drive plugged in, maybe?
The only other likely FAT filesystem would be the EFI partition, and it would not be looking at that after boot has completed.
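To pull only the filesystem and device complaints out of a long log like the excerpt above, a simple filter can help. A sketch; the patterns come straight from that excerpt, and `fs_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# fs_errors: read syslog-style lines on stdin and keep only the complaints seen
# in this thread's excerpt: FAT-fs errors, read-only remounts and udisksd probe
# errors. Everything else, such as [UFW BLOCK] lines, is discarded.
fs_errors() {
    grep -E 'FAT-fs|read-only|udisksd\[[0-9]+\]: Error'
}
```

It can be fed from a file (`fs_errors < /var/log/syslog`) or from the journal (`journalctl -b -1 --no-pager | fs_errors`).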

also

Is this some USB disk enclosure?
You might check the connections.


Update to Linux Loop

Yes, it is an external disk plugged into a USB port.

The loop happened a second time! And now I know what caused the loop, but not why.

I had performed a permanent delete (Shift+Delete) of a directory on the attached disk. The disk was formatted as FAT and was encrypted with TrueCrypt. I used TrueCrypt so that I could use the disk on both Linux and Windows.

When I mounted the disk on Windows, Windows said the disk needed to be scanned for errors. After the scan and repair on Windows, I remounted the disk on Linux. I deleted another directory and this time Linux did not go into a loop.
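A similar check of the FAT filesystem can also be run from Linux with `fsck.fat` from dosfstools. A minimal sketch, assuming the TrueCrypt volume is unlocked but its filesystem is not mounted; `check_fat` is a hypothetical helper name, and the device (the log showed dm-1) is an assumption that can change between sessions.

```shell
#!/bin/sh
# check_fat DEVICE: read-only consistency check of an unmounted FAT filesystem.
# -n tells fsck.fat to report problems without writing anything to the disk.
check_fat() {
    sudo fsck.fat -n "$1"
}
```

For example `check_fat /dev/dm-1`; only after reviewing the report would a repairing run (`fsck.fat -a`) make sense.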


I see a problem here.
Can you check the SMART data of /dev/sdd?
And of course the connections to that drive: if it's not the drive itself, I suspect an unreliable data or power connection to it.
SMART will show.

Edit:

I see…

How is it powered? Only from USB?
Do you plug it into USB3 or USB2?


I'm pretty sure it is in a USB 2 port. It's hard to see, as the port is in the back of the desktop. I will see if I can check. Yes, all the ports in the back of the PC are USB 2.

The disk stand has its own power supply.

Linux ‘Disks’ does not show any SMART data for it.


Try running smartctl on that disk.
Disks may not see removable drives.


Hi Howard, :wave:

Right. That's the default setting.
The explanation from the 3rd source I mentioned above is:

You should explicitly enable these key combinations. Ubuntu ships with the sysrq default setting 176 (128+32+16), which allows only the SUB part of the REISUB combination to run. You can change it to 1 (all commands enabled) or 244, which is potentially less harmful.

This procedure is always part of my to-do tasks after setting up a new system. :wink:

Hope it helps.

Many greetings from Rosika :slightly_smiling_face:


@nevj :

Hi Neville and all, :wave:

Good idea.
However, if it's an external disk, then …

… External USB enclosures usually have a converter chip (e.g. a USB-SATA bridge) which translates between the different interfaces/protocols. It may happen that S.M.A.R.T. queries are not implemented correctly…

… and thus hard disks in external enclosures appear as “not supported”, although they are perfectly S.M.A.R.T.-capable. Using a special option provides a remedy:

sudo smartctl -A -d sat /dev/sdX

(from ubuntuusers # original source in German)

Might be worth keeping in mind.
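Building on the `-d sat` option, a minimal sketch that prints both the overall health verdict and the attribute table for a disk behind a USB-SATA bridge; `smart_report` is a hypothetical helper name, and the device path has to be supplied.

```shell
#!/bin/sh
# smart_report DEVICE: SMART health verdict plus attribute table, talking to
# the disk through a USB-SATA bridge with -d sat. The two calls are separate
# so the attributes are still shown when the health check reports a failure.
smart_report() {
    sudo smartctl -H -d sat "$1"   # overall health self-assessment
    sudo smartctl -A -d sat "$1"   # vendor attributes (reallocated, pending sectors, ...)
}
```

In this thread that would be `smart_report /dev/sdd`, although the letter can change after replugging; `sudo smartctl --scan` lists the devices smartctl can see.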

Cheers from Rosika :slightly_smiling_face:
