Hi,
I am running Linux Mint (LM) at the current release, 22.1.
On a few rare occasions Linux has gone into a very tight CPU (?) loop in which neither the mouse nor the keyboard responds. This has happened not just with LM 22.1 but with several versions of Linux Mint. The only way out of this type of loop is to turn off the PC. Once the PC comes back on, all is fine.
Today was one of those rare times when the PC hung. It is a minor problem, but I would like to know which log files or other places to look at to get an idea of what caused the loop.
Back in my old mainframe days, the software had a trace table. If you had a memory (core) dump, you could go to the trace table and see what the last entry was.
Thanks,
Howard
So, do you have any suggestion that I could use to look into this or any other problem I might have with Linux?
Hi Howard,
Try /var/log/messages if you have syslog running.
If you only have the systemd journal, I can't help.
Linux rarely core dumps, so you can't use those old techniques.
Maybe when it hangs, you could switch to another console (Ctrl+Alt+F2, F3, etc.), log in, and see which process is using all the CPU time. If it really is a tight loop, there will be a process consuming time.
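Once you are logged in on another console, something like the following shows which process is eating the CPU. It is only a sketch: top_cpu is a made-up helper name, and it assumes the procps version of ps that ships with Linux Mint and most mainstream distros.

```shell
# Print the N processes using the most CPU, highest first (default N=5).
# The extra line in the head count keeps the column header.
top_cpu() {
    ps -eo pid,pcpu,pmem,etime,comm --sort=-pcpu | head -n "$(( ${1:-5} + 1 ))"
}

top_cpu 5
```

A process stuck in a tight loop will normally sit at the top with a %CPU value close to 100. If nothing stands out, the hang is probably not a user-space loop.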
With recent kernels, the kernel will display a QR code when it panics. If there's no kernel panic, then there's no QR code and something else is causing the problem.
I would look at /var/log/messages, also /var/log/Xorg.log would be of interest.
What he said... Alternatively - if you have more than one computer - SSH to it from another... I sometimes have to do this when Brave Browser eats my main Pop!_OS desktop machine - or - sometimes ffmpeg - I can SSH from another Linux or macOS computer, run btop or top to have a look, "ps -ef" to find PIDs, kill the CPU hog or dodgy process and get my system back...
e.g.
Sometimes ffmpeg will cause my PC to hang if I forget to limit the number of threads it can use...
Sometimes a seemingly hung system, where even "Ctrl+Alt+F1,2,3" won't get the TTY, will still respond over SSH. That's assuming you're running sshd (I think the default in most Ubuntu-based distros is to not install openssh-server / sshd). I personally don't see the point of not having sshd running. I guess it's good security practice, but I think of Linux and UNIX as server AND desktop machine simultaneously.
Edit (correction): I should also grep -v for grep itself above:
So
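A minimal sketch of the "grep -v for grep itself" idea, for anyone following along. The helper name match_procs is made up for illustration, and the two sample lines are the ones from the ps output posted further down this thread.

```shell
# Keep lines of a process listing that mention PATTERN,
# then drop the line for the grep command itself.
# Reads `ps` output on stdin.
match_procs() {
    grep -- "$1" | grep -v grep
}

# Demo on two captured lines; only the rsyslogd line survives.
printf '%s\n' \
    '  854 ?        Ssl  0:00 /usr/sbin/rsyslogd -n -iNONE' \
    ' 4780 pts/1    S+   0:00 grep --color=auto syslogd' | match_procs syslogd

# On a live system:
#   ps -ef | match_procs ffmpeg
# Where pgrep is installed, this avoids the self-match altogether:
#   pgrep -a ffmpeg
```

Note that "grep -v grep" also hides any real process whose command line happens to contain the word grep, which is one reason pgrep is often the tidier choice.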
I know it's not a solution to your problem; it's rather a side note.
Instead of turning off the PC (I guess you meant with the on/off switch),
it's advisable to use the REISUB sequence.
To get it working, do the following:
Edit the file /etc/sysctl.d/10-magic-sysrq.conf. At the end there should be the entry: kernel.sysrq = 244 (see remarks below)
Press "Alt" and "Print" (the PrtSc/SysRq key) at the same time and keep them pressed
Then type: R - E - I - S - U - B, preferably allowing a second to elapse before hitting the next key
Explanation:
R: unraw : Takes the keyboard out of raw mode, so the kernel handles key presses again
E: term : Sends a SIGTERM to all processes except the system's init process
I: kill : Sends a SIGKILL to all processes except init
S: sync : Syncs all mounted filesystems, ensuring data is saved
U: umount: Remounts all filesystems in read-only mode, to protect data
B: reboot : Reboots the system immediately, without unmounting filesystems
There is a less radical way than rebooting the whole system. If the SysRq key works, you can kill processes one by one using Alt+SysRq+F. The kernel will kill the most "expensive" process each time. If you want to kill all processes on one console, you can issue Alt+SysRq+K.
NOTE: You have to enable these key combinations explicitly. Ubuntu ships with a sysrq default setting of 176 (128+32+16), which allows only the SUB part of the REISUB combination to run. You can change it to 1 (all commands enabled) or to 244, which is potentially less harmful.
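To see why 176 only covers S, U and B while 244 covers the whole sequence, the value can be split into its bits. This is a sketch with a made-up helper name (decode_sysrq); the bit meanings are the ones listed in the kernel's SysRq documentation.

```shell
# Decode a kernel.sysrq bitmask into the function groups it enables.
decode_sysrq() {
    mask=$1
    if [ "$mask" -eq 0 ]; then echo "sysrq disabled"; return 0; fi
    if [ "$mask" -eq 1 ]; then echo "all functions enabled"; return 0; fi
    for entry in \
        "2:console-loglevel" \
        "4:keyboard(unraw,SAK)" \
        "8:process-dumps" \
        "16:sync" \
        "32:remount-read-only" \
        "64:signal-processes(term,kill,oom-kill)" \
        "128:reboot-poweroff" \
        "256:nice-rt-tasks"
    do
        bit=${entry%%:*}     # number before the colon
        name=${entry#*:}     # label after the colon
        [ $(( mask & bit )) -ne 0 ] && echo "$bit $name"
    done
    return 0
}

decode_sysrq 176   # 16, 32, 128: sync, remount, reboot (S, U, B)
decode_sysrq 244   # adds 4 (R) and 64 (E, I, and Alt+SysRq+F)

# Value currently in effect on a running system:
#   cat /proc/sys/kernel/sysrq
```

After editing the file under /etc/sysctl.d, the new value takes effect at the next boot, or immediately with "sudo sysctl --system".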
Here is some of the log. Some of the lines were highlighted in red. I turned the time stamp to bold for the lines that were in red.
Feb 22 18:06:43 HP kernel: sd 5:0:0:0: [sdd] Attached SCSI disk
Feb 22 18:06:43 HP kernel: scsi 5:0:0:1: Wrong diagnostic page; asked for 1 got 8
Feb 22 18:06:43 HP kernel: scsi 5:0:0:1: Failed to get diagnostic page 0x1
Feb 22 18:06:43 HP kernel: scsi 5:0:0:1: Failed to bind enclosure -19
Feb 22 18:06:43 HP kernel: ses 5:0:0:1: Attached Enclosure device
Feb 22 18:06:43 HP systemd-udevd[393]: /etc/udev/rules.d/99-megasync-udev.rules:1 The line has no effect, ignoring.
Feb 22 18:06:44 HP udisksd[907]: Error probing device: Error sending ATA command IDENTIFY DEVICE to '/dev/sdd': Unexpected sense data returned:
0000: f0 00 01 00 50 00 01 0a 80 00 00 00 00 1d 00 00 ....P...........
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
(g-io-error-quark, 0)
Feb 22 18:06:44 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC= SRC=192.168.1.230 DST=239.255.255.250 LEN=635 TOS=0x00 PREC=0x00 TTL=1 ID=11275 DF PROTO=UDP >
Feb 22 18:06:44 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC= SRC=fe80:0000:0000:0000:4808:148a:a9b2:af96 DST=ff02:0000:0000:0000:0000:0000:0000:000c LEN=6>
Feb 22 18:06:49 HP kernel: FAT-fs (dm-1): error, fat_get_cluster: invalid cluster chain (i_pos 0)
Feb 22 18:06:49 HP kernel: FAT-fs (dm-1): Filesystem has been set read-only
Feb 22 18:07:27 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC=f4:39:09:08:f1:1b:18:78:d4:31:5c:04:08:00 SRC=74.120.8.24 DST=192.168.1.230 LEN=52 TOS=0x00 PR>
Feb 22 18:07:27 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC=f4:39:09:08:f1:1b:18:78:d4:31:5c:04:08:00 SRC=74.120.9.92 DST=192
Feb 22 18:08:16 HP kernel: FAT-fs (dm-1): error, fat_get_cluster: invalid cluster chain (i_pos 0)
Feb 22 18:08:28 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC=f4:39:09:08:f1:1b:18:78:d4:31:5c:04:08:00 SRC=74.120.8.24 DST=192.168.1.230 LEN=52 TOS=0x00 PR>
Feb 22 18:08:29 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC=f4:39:09:08:f1:1b:18:78:d4:31:5c:04:08:00 SRC=74.120.9.92 DST=192.168.1.230 LEN=52 TOS=0x00 PR>
Feb 22 18:08:32 HP kernel: FAT-fs (dm-1): error, fat_get_cluster: invalid cluster chain (i_pos 0)
The output of the command.
ps ax | grep syslogd
854 ? Ssl 0:00 /usr/sbin/rsyslogd -n -iNONE
4780 pts/1 S+ 0:00 grep --color=auto syslogd
I have never used SSH. Maybe I need to do some reading and testing to learn how to use it.
Had not heard of sysrq, but here is what mine looks like.
For example, to enable both control of console logging level and
debugging dumps of processes: kernel.sysrq = 10
kernel.sysrq = 176
To all, thank you for your suggestions. It is all new to me and will require a lot of reading and testing to try and understand some of it. I posted an update.
It seems to be having trouble with a FAT filesystem?
Do you have a USB drive plugged in, maybe?
The only other likely FAT filesystem would be the EFI partition, and it would not be looking at that after boot is completed.
also
Is this some USB disk enclosure?
You might check the connections.
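With all the firewall lines in the way, the disk messages are easy to miss. Here is a rough sketch of a filter that keeps only kernel lines reporting an error, a failure, or a read-only remount. The name kernel_errors is made up, and the sample lines are taken from the log posted above.

```shell
# Drop UFW firewall noise and keep kernel lines that report trouble.
# Reads syslog or journal text on stdin.
kernel_errors() {
    grep ' kernel: ' | grep -v 'UFW BLOCK' | grep -i -E 'error|fail|read-only'
}

# Demo on four lines from the posted log; the UFW line is dropped.
printf '%s\n' \
    'Feb 22 18:06:43 HP kernel: scsi 5:0:0:1: Failed to get diagnostic page 0x1' \
    'Feb 22 18:06:44 HP kernel: [UFW BLOCK] IN=eno1 OUT= MAC= SRC=192.168.1.230' \
    'Feb 22 18:06:49 HP kernel: FAT-fs (dm-1): error, fat_get_cluster: invalid cluster chain (i_pos 0)' \
    'Feb 22 18:06:49 HP kernel: FAT-fs (dm-1): Filesystem has been set read-only' | kernel_errors

# On a live system (the second form needs a persistent journal):
#   kernel_errors < /var/log/syslog
#   journalctl -k -b -1 | kernel_errors    # kernel messages from the previous boot
```

On Mint and other Ubuntu-based systems rsyslog normally writes to /var/log/syslog rather than /var/log/messages, so that is the file assumed here.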
Yes, it is an external disk plugged into a USB port.
The loop happened a second time! And now I know what caused the loop, but not why.
I had performed a permanent delete (Shift+Delete) of a directory on the attached disk. The disk was formatted as FAT and was encrypted with Truecrypt. I used Truecrypt so that I could use the disk on both Linux & Windows.
When I mounted the disk on Windows, Win said the disk needed to be scanned for errors. After the scan and repair on Win, I remounted the disk on Linux. I deleted another directory and this time Linux did not go into a loop.
I see a problem here.
Can you check the SMART data of /dev/sdd?
And of course the connections to that drive: if it is not the drive itself, I suspect an unreliable data or power connection to that drive.
The SMART data will show.
Edit:
I see...
How is it powered? Only from USB?
Is it plugged into USB3 or USB2?
I'm pretty sure it is in a USB 2 port. Hard to see as the port is in the back of the desktop. I will see if I can check. Yes, all the ports in back of the PC are USB2.
The disk stand has its own power supply.
Linux "Disks" does not show any SMART data available.
Right. That's the default setting.
The explanation from the 3rd source I mentioned above is:
You have to enable these key combinations explicitly. Ubuntu ships with a sysrq default setting of 176 (128+32+16), which allows only the SUB part of the REISUB combination to run. You can change it to 1 (all commands enabled) or to 244, which is potentially less harmful.
This procedure is always part of my to-do tasks after setting up a new system.
Good idea.
However, if it's an external disk then ...
... External USB enclosures usually have a converter chip (e.g. a USB-SATA bridge), which translates between the different interfaces/protocols. It may happen that a S.M.A.R.T. query is not implemented correctly...
... and thus hard disks in external enclosures appear as "not supported", although they are in fact S.M.A.R.T.-capable. The use of a special option provides a remedy: