Why would Clonezilla suddenly go slow?

nevj · June 20, 2024, 12:56pm

I use Clonezilla to back up my 3 disks (2 x HDD, 1 x SSD).
Normal time to do all the settings up till when it starts writing data is half an hour.
Tonight it took 3 hours, just to get to the point where it starts to backup the first partition… It spent ages detecting disks and partitions.
Then, when it started writing data, it went ahead at the normal speed.
There were no error messages at any stage.

I am trying to decide whether there is a failing disk or if my clonezilla usb drive is somehow corrupted?
Tomorrow when the backup is completed, I will check the disks with SMART tools,
and I may download a new version of clonezilla.

Before I ran the backup, the computer was operating normally.
I used clonezilla in failsafe mode, because of AMD card issues. I have done that before without issues.

Does anyone have suggestions?

Rosika · June 20, 2024, 1:23pm

Hi Neville,

I´m not quite sure if I understand you correctly.

Could you give an example of what you do as preparatory work (which normally takes half an hour)?

Cheers from Rosika

kovacslt · June 20, 2024, 1:31pm

Exactly that:

How does this Clonezilla start? Is it a separate boot?
While it runs do you have a chance to have a look around to see what happens, what is loaded, what is not?
I mean things like top and iostat -x -t 1 and such…

nevj · June 20, 2024, 1:37pm

I start Clonezilla from a USB drive, then:
local backup
disk-image… i.e. savedisk
tell it which disk is for /home/partimag
tell it which disk to back up
there are a few options, like fsck the disk, check the image, don't encrypt
then it says … let's do it (y/n)

That normally takes about half an hour (I do have 3 disks with 26 partitions in total, so it has to detect all these).
Tonight it took 3 hours. It spent forever detecting disks and partitions, and checking for RAID disks (I don't have any).
Then, when it started writing data, it was normal speed.

Mystery. I used the same clonezilla usb drive I have been using for over a year.
Not a new version.

nevj · June 20, 2024, 1:40pm

It is a separate boot from a usb drive. I do it that way so all disks are unmounted.
So I can't interrupt it and look at what it is doing… it is a console terminal… no windows… I don't think I could ssh into it… it is Linux.

Rosika · June 20, 2024, 1:49pm

Thanks for the info, Neville

O.K., I get it.
That´s exactly the same method I employ.

So I may tell you this procedure takes up roughly 10 minutes of my time.
But then my setup is somewhat different.

In my case clonezilla (booted from a USB stick) has to detect (apart from the saving HDD):

my PC´s internal HDD
the external HDD with Linux Lite on it (3 partitions)
2 data sticks

So detecting your 26 partitions taking half an hour surely makes sense.
Seems a good value in view of the setup, I guess.

Hmm, off the top of my head I don´t seem to come up with a good explanation.
Sorry, Neville.

… which seems to be a good sign after all.

I´ll try to look for some info.

Cheers from Rosika

Rosika · June 20, 2024, 2:08pm

@nevj :

just threw the question at ChatGPT.
Nothing altogether new I suppose, but here´s its output:

It’s interesting that the Clonezilla process suddenly took much longer than usual for your friend. Several factors could contribute to this unexpected delay. Here are some potential reasons and troubleshooting steps:

Potential Reasons

USB Drive Issues:

The USB drive might be experiencing read/write issues or slowdowns due to wear and tear.

Disk Health:

The source or target disks might be experiencing issues. Disk health can degrade over time, leading to slower read/write speeds.

File System Issues:

File system inconsistencies or errors might slow down the process. Even if the disks are healthy, file system errors can cause delays.

Hardware Changes:

Changes in the system hardware or connections (e.g., loose cables) could cause slower detection times.

Background Processes:

Other processes running in the background could be using disk I/O, leading to slower performance.

RAID Checks:

Although your friend doesn’t have RAID disks, Clonezilla’s RAID detection step might be taking longer due to misconfigurations or false positives.

Troubleshooting Steps

Check Disk Health:

Use tools like smartctl to check the SMART status of the disks.

sudo smartctl -a /dev/sdX

Replace /dev/sdX with the appropriate disk identifier.

Check File System:

Run file system checks on the partitions.

sudo fsck /dev/sdXn

Replace /dev/sdXn with the appropriate partition identifier.

Test the USB Drive:

Try using a different USB drive or re-create the Clonezilla bootable USB.

Monitor System Resources:

Use tools like top or htop to monitor system resource usage during the Clonezilla process.

Examine Logs:

Check Clonezilla logs for any warnings or errors that might indicate the cause of the delay.

Update Clonezilla:

Even though the version hasn’t changed, updating to the latest version can sometimes resolve issues.

Check for RAID Configuration:

Ensure that there are no leftover RAID configurations that Clonezilla might be detecting.

Next Steps

Have your friend follow these steps to diagnose and possibly resolve the issue. If the problem persists, gathering more specific information about the delays (e.g., exact time spent on each step) might help pinpoint the cause.

Cheers from Rosika

P.S.

ChatGPT sends you its greetings.

nevj · June 20, 2024, 2:15pm

Hi Rosika,
That is comprehensive, thank you.
Lots to look at… the backup is still running, but no errors.
I am betting on a clonezilla version issue.
Regards
Neville

Rosika · June 20, 2024, 2:22pm

Hi Neville,

you´re welcome.

That might be it.

I myself invariably use clonezilla version ‘20160210-wily-amd64’.
Quite old, as you can see. But it never gave me any trouble. So I never replaced it by something newer.

I think it can still be downloaded here:
Download clonezilla-live-20160210-wily-amd64.iso (Clonezilla) .

Good luck with your backup.

Many greetings from Rosika

pdecker · June 20, 2024, 4:04pm

Clonezilla didn’t change but suddenly has an issue?

Are you thinking something else was updated and caused Clonezilla to have an issue?

Certainly not the filesystem (ext4, xfs, etc) or the partition types. Right?

callpaul.eu · June 20, 2024, 4:49pm

Why boot from a USB version?

nevj · June 20, 2024, 5:23pm

I want all the disks unmounted.
I know you can run Clonezilla from your own distro, but then it can't back up the root filesystem, because you can't unmount /.
The traditional way to do a backup is to take the machine down and boot from some external medium.

nevj · June 20, 2024, 5:36pm

Right. Last change there was more than 6 months ago when I added an SSD and shifted my main 2 Linuxes onto it.

Yes, but I can't guess what changed. Not my software; it is not running, as Clonezilla boots from its USB drive. Has to be filesystem content or grub or some glitch when I booted Clonezilla. My first thought was a corrupted partition table… but I think it keeps two copies of that on each disk?
Clonezilla normally reports any disk errors and I always tell it to fsck both the source and destination disks.

callpaul.eu · June 20, 2024, 6:32pm

Thanks, I have never used Clonezilla and did not think about it not copying some drives.

pdecker · June 20, 2024, 6:54pm

My first thought was the fsck took longer. Maybe that’s it? Is the fsck output visible?

nevj · June 21, 2024, 2:31am

@pdecker , @kovacslt , @callpaul.eu , @Rosika
There seems to be an issue with disk sdb. That is NOT the disk I was backing up (that was sda, and it passes the smartctl test).

root@trinity:/home/nevj# smartctl -a /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.0-1mx-ahs-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-9YN164
Serial Number:    S2408NP5
LU WWN Device Id: 5 000c50 04aacb35c
Firmware Version: CC4C
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database 7.3/5319
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jun 21 12:20:13 2024 AEST

==> WARNING: A firmware update for this drive is available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  600) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 263) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x3085)	SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       79437888
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   096   096   020    Old_age   Always       -       4747
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   064   060   030    Pre-fail  Always       -       120354969335
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5472
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   096   096   020    Old_age   Always       -       4707
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   082   082   000    Old_age   Always       -       18
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       17 17 17
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   076   057   045    Old_age   Always       -       24 (Min/Max 20/24)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       155
193 Load_Cycle_Count        0x0032   076   076   000    Old_age   Always       -       49596
194 Temperature_Celsius     0x0022   024   043   000    Old_age   Always       -       24 (0 15 0 0 0)
197 Current_Pending_Sector  0x0012   099   099   000    Old_age   Always       -       208
198 Offline_Uncorrectable   0x0010   099   099   000    Old_age   Offline      -       208
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       4690h+50m+03.638s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       89941329049766
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       218632277254403

SMART Error Log Version: 1
ATA Error Count: 18 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 18 occurred at disk power-on lifetime: 4616 hours (192 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 01 ff ff ff 4f 00      01:25:33.275  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      01:25:33.275  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      01:25:33.274  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      01:25:33.274  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      01:25:33.274  SET FEATURES [Set transfer mode]

Error 17 occurred at disk power-on lifetime: 4616 hours (192 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 01 ff ff ff 4f 00      01:25:30.391  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00      01:25:30.391  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      01:25:30.391  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      01:25:30.391  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      01:25:30.390  IDENTIFY DEVICE

Error 16 occurred at disk power-on lifetime: 4616 hours (192 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 01 ff ff ff 4f 00      01:25:27.524  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00      01:25:27.524  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00      01:25:27.524  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      01:25:27.523  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      01:25:27.523  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 15 occurred at disk power-on lifetime: 4616 hours (192 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 01 ff ff ff 4f 00      01:25:24.649  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00      01:25:24.649  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00      01:25:24.649  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00      01:25:24.648  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      01:25:24.648  SET FEATURES [Enable SATA feature]

Error 14 occurred at disk power-on lifetime: 4616 hours (192 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 01 ff ff ff 4f 00      01:25:21.757  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00      01:25:21.757  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00      01:25:21.757  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00      01:25:21.756  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00      01:25:21.756  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

That is at least a possible cause of the problem

I tested all disks ( sda,sdb,sdc) and only sdb returned errors.
Also tested the backup USB drive, it returned no errors.

What do you think? Has that triggered the clonezilla issue?

pdecker · June 21, 2024, 2:42am

Seems like sdb to me. Even if that isn’t the cause of the Clonezilla slowness the SMART status seems to show problems. Is the drive that old? SMART seems to think so.
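Two numbers in the smartctl report above are worth a quick sanity check. The sketch below only does arithmetic on values copied from that report; the function names are made up for illustration, and it assumes attribute 9 and the error log's "power-on lifetime" count the same hours.

```shell
#!/bin/sh
# Values copied from the smartctl report for /dev/sdb above.
power_on_hours=5472      # attribute 9 (Power_On_Hours), RAW_VALUE
last_error_hours=4616    # "Error 18 occurred at disk power-on lifetime"

# Power-on hours elapsed since the most recent logged error.
hours_since_error() {
    echo $(( $1 - $2 ))
}

# Largest address that fits in a 28-bit LBA field.
lba28_max() {
    echo $(( (1 << 28) - 1 ))
}

echo "hours since last logged error: $(hours_since_error "$power_on_hours" "$last_error_hours")"
echo "28-bit LBA maximum:            $(lba28_max)"
echo "LBA shown in the error log:    $(( 0x0fffffff ))"
```

If those assumptions hold, the five logged UNC errors happened about 856 power-on hours ago, so they probably predate last night's slow run, while the 208 pending sectors describe the disk's current state. The LBA printed for every error equals the 28-bit maximum, so it may be a placeholder rather than the real failing sector; the extended error log (smartctl -l xerror, or smartctl -x) has room for 48-bit addresses and would be the place to look.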

nevj · June 21, 2024, 3:34am

Yes, both sda and sdb are 2 TB HDDs that have been in the machine since it was built in about 2011.
That is why I put in a new 4 TB SSD recently.
It is strange… it says passed the test, then reports those errors… I think the platter might be OK, but there may be a problem in the controller circuitry, so it takes forever to respond to probes.

I have a new spare sitting in the drawer. I will copy what is on sdb to the SSD, then replace sdb with the spare. I need to be careful: sdb has a swap partition, and removing that may stop things booting.
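On the swap concern: one way to see whether the boot configuration refers to a swap partition is to list the swap lines in /etc/fstab and compare them with the partitions on the old disk. A minimal sketch, assuming a standard fstab layout; the function name fstab_swap_entries is hypothetical:

```shell
#!/bin/sh
# Print the device field (often UUID=...) of every non-comment swap line
# in an fstab file. Defaults to /etc/fstab when no file is given.
fstab_swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "${1:-/etc/fstab}"
}

# Usage, on the installed system:
#   fstab_swap_entries     # swap entries the system expects at boot
#   swapon --show          # swap areas currently in use
#   blkid                  # match a UUID from fstab to a /dev/sdXn partition
```

If one of the entries turns out to live on the disk being removed, commenting that line out, or pointing it at a swap partition on another disk, before pulling the drive should avoid a surprise at the next boot.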

kovacslt · June 21, 2024, 4:28am

That disk has pending sectors; those are about to be moved to the spare area.
While this is officially allowed, and the disk passes the test, the phenomenon indicates a problem on the surface.
You could run badblocks -wsv against the disk (note that -w is a destructive write test that erases the disk, so only do it after the data has been copied off) and see whether the weak sectors get moved to spare (the Reallocated_Sector_Ct attribute tells you that), or whether the pending count increases.
I just replace such a disk ASAP, no matter that it officially passes. It has lost my trust, so it must go…
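To compare the pending and reallocated counts before and after such a test, it helps to pull just those rows out of the attribute table. A rough sketch, assuming the ten-column layout that smartctl -A printed earlier in this thread; the function name smart_watch_attrs is made up:

```shell
#!/bin/sh
# Read `smartctl -A` style output on stdin and print "ID NAME RAW_VALUE"
# for the attributes discussed here:
#   5   Reallocated_Sector_Ct
#   187 Reported_Uncorrect
#   197 Current_Pending_Sector
#   198 Offline_Uncorrectable
smart_watch_attrs() {
    awk 'NF >= 10 && ($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198) {
             print $1, $2, $10
         }'
}

# Usage (as root):
#   smartctl -A /dev/sdb | smart_watch_attrs
```

Saving the output to a file before the test and running it again afterwards makes it easy to see whether the 208 pending sectors were reallocated or the count grew.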

kovacslt · June 21, 2024, 4:31am

A brand new drive can fail too
At any time…