I use Clonezilla to back up my 3 disks (2 x HDD, 1 x SSD).
Normally it takes about half an hour to get through all the settings, up to the point where it starts writing data.
Tonight it took 3 hours just to get to the point where it starts to back up the first partition… It spent ages detecting disks and partitions.
Then, when it started writing data, it went ahead at the normal speed.
There were no error messages at any stage.
I am trying to decide whether a disk is failing or my Clonezilla USB drive is somehow corrupted.
Tomorrow, when the backup has completed, I will check the disks with SMART tools, and I may download a new version of Clonezilla.
Before I ran the backup, the computer was operating normally.
I used Clonezilla in failsafe mode because of AMD card issues. I have done that before without problems.
How does Clonezilla start? Is it a separate boot?
While it runs, do you get a chance to look around and see what is happening, what is loaded and what is not?
I mean things like top, iostat -x -t 1 and so on…
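For what it's worth, the main per-disk numbers iostat -x prints (throughput and %util) come from /proc/diskstats. Here is a rough Python sketch of the same idea, assuming a Linux system with Python 3 available (I don't know whether the Clonezilla live image ships Python). It is a simplification, not iostat's exact algorithm, and the function names are my own:

```python
import time

# /proc/diskstats counts sectors in 512-byte units regardless of the disk's sector size.
SECTOR_BYTES = 512


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written, ms_doing_io)}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:  # major, minor, name + at least 11 counters
            continue
        # fields[5] = sectors read, fields[9] = sectors written, fields[12] = ms spent doing I/O
        stats[fields[2]] = (int(fields[5]), int(fields[9]), int(fields[12]))
    return stats


def rates(before, after, interval_s):
    """Per-device (read kB/s, write kB/s, %util) between two snapshots."""
    result = {}
    for dev, (rd1, wr1, io1) in after.items():
        if dev not in before:
            continue
        rd0, wr0, io0 = before[dev]
        read_kbs = (rd1 - rd0) * SECTOR_BYTES / 1024 / interval_s
        write_kbs = (wr1 - wr0) * SECTOR_BYTES / 1024 / interval_s
        # %util: share of wall-clock time the device had at least one I/O in flight
        util = min(100.0, (io1 - io0) / (interval_s * 1000) * 100)
        result[dev] = (read_kbs, write_kbs, util)
    return result


def watch(interval_s=1.0, samples=10, path="/proc/diskstats"):
    """Print one line per disk every interval, a bit like `iostat -x 1`."""
    with open(path) as f:
        prev = parse_diskstats(f.read())
    for _ in range(samples):
        time.sleep(interval_s)
        with open(path) as f:
            cur = parse_diskstats(f.read())
        for dev, (rk, wk, util) in sorted(rates(prev, cur, interval_s).items()):
            print(f"{dev:10s} read={rk:9.1f} kB/s write={wk:9.1f} kB/s util={util:5.1f}%")
        prev = cur
```

Calling watch() prints ten samples. A disk that sits near 100% util while moving almost no data could be a hint that it is stalling on retries, though that is only a heuristic.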
I start Clonezilla from a USB drive, then:
local backup
disk-image, i.e. savedisk
tell it which disk is for /home/partimag
tell it which disk to back up
set a few options, like fsck the disk, check the image, don't encrypt
then it says … let's do it (y/n)
That normally takes about half an hour (I do have 3 disks with 26 partitions in total, so it has to detect all of them).
Tonight it took 3 hours. It spent forever detecting disks and partitions, and checking for RAID disks (I don't have any).
Then, when it started writing data, it ran at normal speed.
Mystery. I used the same Clonezilla USB drive I have been using for over a year, not a new version.
It is a separate boot from a USB drive. I do it that way so that all disks are unmounted.
So I can't interrupt it and look at what it is doing… It is a console terminal, no windows, and I don't think I could ssh into it… it is Linux.
I just threw the question at ChatGPT.
Nothing altogether new, I suppose, but here's its output:
It’s interesting that the Clonezilla process suddenly took much longer than usual for your friend. Several factors could contribute to this unexpected delay. Here are some potential reasons and troubleshooting steps:
Potential Reasons
USB Drive Issues:
The USB drive might be experiencing read/write issues or slowdowns due to wear and tear.
Disk Health:
The source or target disks might be experiencing issues. Disk health can degrade over time, leading to slower read/write speeds.
File System Issues:
File system inconsistencies or errors might slow down the process. Even if the disks are healthy, file system errors can cause delays.
Hardware Changes:
Changes in the system hardware or connections (e.g., loose cables) could cause slower detection times.
Background Processes:
Other processes running in the background could be using disk I/O, leading to slower performance.
RAID Checks:
Although your friend doesn’t have RAID disks, Clonezilla’s RAID detection step might be taking longer due to misconfigurations or false positives.
Troubleshooting Steps
Check Disk Health:
Use tools like smartctl to check the SMART status of the disks.
sudo smartctl -a /dev/sdX
Replace /dev/sdX with the appropriate disk identifier.
Check File System:
Run file system checks on the partitions.
sudo fsck /dev/sdXn
Replace /dev/sdXn with the appropriate partition identifier.
Test the USB Drive:
Try using a different USB drive or re-create the Clonezilla bootable USB.
Monitor System Resources:
Use tools like top or htop to monitor system resource usage during the Clonezilla process.
Examine Logs:
Check Clonezilla logs for any warnings or errors that might indicate the cause of the delay.
Update Clonezilla:
Even though the version hasn’t changed, updating to the latest version can sometimes resolve issues.
Check for RAID Configuration:
Ensure that there are no leftover RAID configurations that Clonezilla might be detecting.
Next Steps
Have your friend follow these steps to diagnose and possibly resolve the issue. If the problem persists, gathering more specific information about the delays (e.g., exact time spent on each step) might help pinpoint the cause.
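Along the lines of the "Check Disk Health" step, here is a small Python sketch that picks out a few SMART attributes whose raw counts are commonly treated as warning signs when non-zero. The watch-list (IDs 5, 187, 188, 197, 198) is my assumption, not a standard, and the parser assumes the column layout that smartctl -A prints:

```python
import re

# Assumed watch-list: Reallocated_Sector_Ct, Reported_Uncorrect, Command_Timeout,
# Current_Pending_Sector, Offline_Uncorrectable. Not an official list.
WATCH = {5, 187, 188, 197, 198}


def warning_counters(smartctl_output):
    """Return sorted [(id, name, raw)] for watched attributes with a non-zero raw value."""
    hits = []
    for line in smartctl_output.splitlines():
        f = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if len(f) < 10 or not f[0].isdigit() or not f[2].startswith("0x"):
            continue
        attr_id = int(f[0])
        m = re.match(r"\d+", f[9])  # first number only, e.g. 17 from "17 17 17"
        if attr_id in WATCH and m and int(m.group()) > 0:
            hits.append((attr_id, f[1], int(m.group())))
    return sorted(hits)
```

To feed it live data, subprocess.run(["smartctl", "-A", "/dev/sdX"], capture_output=True, text=True).stdout should work when run as root. Don't treat a non-zero return code as a crash, since smartctl's exit status is a bit mask that is also set for things like logged device errors.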
Hi Rosika,
That is comprehensive, thank you.
Lots to look at… the backup is still running, but no errors so far.
I am betting on a Clonezilla version issue.
Regards
Neville
I myself invariably use Clonezilla version ‘20160210-wily-amd64’.
Quite old, as you can see, but it has never given me any trouble, so I never replaced it with something newer.
I want all the disks unmounted.
I know you can run Clonezilla from your own distro, but then it can't back up the root filesystem, because you can't unmount /.
The traditional way to do a backup is to take the machine down and boot from some external medium.
Right. The last change there was more than 6 months ago, when I added an SSD and moved my two main Linux systems onto it.
Yes, but I can't guess what changed. It is not my software; that isn't running, since Clonezilla boots from its own USB drive. It has to be filesystem content, or grub, or some glitch when I booted Clonezilla. My first thought was a corrupted partition table… but I think it keeps two copies of that on each disk?
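On the "two copies" question: that holds for GPT, which keeps a primary header at LBA 1 and a backup header at the disk's last LBA. An MBR partition table lives only in sector 0, so it has a single copy. The thread doesn't say which scheme these disks use. Below is a read-only Python sketch that checks one GPT header's signature and CRC32 and follows it to the backup, assuming 512-byte logical sectors (what smartctl reports for sdb) and root access to the device. It checks the headers only, not the partition entry array; sgdisk --verify /dev/sdX does a far more thorough job:

```python
import struct
import zlib

SECTOR = 512  # assumption: 512-byte logical sectors


def check_gpt_header(sector: bytes):
    """Validate one GPT header sector and return (my_lba, alternate_lba)."""
    if sector[:8] != b"EFI PART":
        raise ValueError("no GPT signature (MBR-only disk, or damaged header)")
    # Offset 12: header size, offset 16: header CRC32 (both little-endian uint32)
    header_size, stored_crc = struct.unpack_from("<II", sector, 12)
    if not 92 <= header_size <= len(sector):
        raise ValueError(f"implausible header size {header_size}")
    # The header CRC32 covers header_size bytes with the CRC field itself zeroed.
    header = bytearray(sector[:header_size])
    header[16:20] = bytes(4)
    if zlib.crc32(header) != stored_crc:
        raise ValueError("header CRC32 mismatch")
    # Offset 24: LBA of this header, offset 32: LBA of the other copy
    my_lba, alternate_lba = struct.unpack_from("<QQ", sector, 24)
    return my_lba, alternate_lba


def check_both_headers(device):
    """Read the primary header at LBA 1, then the backup it points to (read-only)."""
    with open(device, "rb") as disk:
        disk.seek(1 * SECTOR)
        primary = check_gpt_header(disk.read(SECTOR))
        disk.seek(primary[1] * SECTOR)
        backup = check_gpt_header(disk.read(SECTOR))
    return primary, backup
```

Called as check_both_headers("/dev/sdX") with the real device name, both headers should validate, and each one's alternate LBA should point at the other.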
Clonezilla normally reports any disk errors and I always tell it to fsck both the source and destination disks.
@pdecker, @kovacslt, @callpaul.eu, @Rosika
There seems to be an issue with disk sdb. That is NOT the disk I was backing up (that was sda, and it passes the smartctl test).
root@trinity:/home/nevj# smartctl -a /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.0-1mx-ahs-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST2000DM001-9YN164
Serial Number: S2408NP5
LU WWN Device Id: 5 000c50 04aacb35c
Firmware Version: CC4C
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database 7.3/5319
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jun 21 12:20:13 2024 AEST
==> WARNING: A firmware update for this drive is available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 263) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always   -           79437888
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   096   096   020    Old_age   Always   -           4747
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   064   060   030    Pre-fail  Always   -           120354969335
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always   -           5472
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   096   096   020    Old_age   Always   -           4707
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always   -           0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   082   082   000    Old_age   Always   -           18
188 Command_Timeout         0x0032   100   099   000    Old_age   Always   -           17 17 17
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   076   057   045    Old_age   Always   -           24 (Min/Max 20/24)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always   -           0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           155
193 Load_Cycle_Count        0x0032   076   076   000    Old_age   Always   -           49596
194 Temperature_Celsius     0x0022   024   043   000    Old_age   Always   -           24 (0 15 0 0 0)
197 Current_Pending_Sector  0x0012   099   099   000    Old_age   Always   -           208
198 Offline_Uncorrectable   0x0010   099   099   000    Old_age   Offline  -           208
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline  -           4690h+50m+03.638s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline  -           89941329049766
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline  -           218632277254403
SMART Error Log Version: 1
ATA Error Count: 18 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 18 occurred at disk power-on lifetime: 4616 hours (192 days + 8 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 01 ff ff ff 4f 00 01:25:33.275 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 01:25:33.275 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 01:25:33.274 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 00 01:25:33.274 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 00 01:25:33.274 SET FEATURES [Set transfer mode]
Error 17 occurred at disk power-on lifetime: 4616 hours (192 days + 8 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 01 ff ff ff 4f 00 01:25:30.391 READ FPDMA QUEUED
60 00 01 ff ff ff 4f 00 01:25:30.391 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 01:25:30.391 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 01:25:30.391 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 00 01:25:30.390 IDENTIFY DEVICE
Error 16 occurred at disk power-on lifetime: 4616 hours (192 days + 8 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 01 ff ff ff 4f 00 01:25:27.524 READ FPDMA QUEUED
60 00 01 ff ff ff 4f 00 01:25:27.524 READ FPDMA QUEUED
60 00 01 ff ff ff 4f 00 01:25:27.524 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 01:25:27.523 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 00 01:25:27.523 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
Error 15 occurred at disk power-on lifetime: 4616 hours (192 days + 8 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 01 ff ff ff 4f 00 01:25:24.649 READ FPDMA QUEUED
60 00 01 ff ff ff 4f 00 01:25:24.649 READ FPDMA QUEUED
60 00 01 ff ff ff 4f 00 01:25:24.649 READ FPDMA QUEUED
60 00 01 ff ff ff 4f 00 01:25:24.648 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 01:25:24.648 SET FEATURES [Enable SATA feature]
Error 14 occurred at disk power-on lifetime: 4616 hours (192 days + 8 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 01 ff ff ff 4f 00 01:25:21.757 READ FPDMA QUEUED
60 00 01 ff ff ff 4f 00 01:25:21.757 READ FPDMA QUEUED
60 00 01 ff ff ff 4f 00 01:25:21.757 READ FPDMA QUEUED
60 00 01 ff ff ff 4f 00 01:25:21.756 READ FPDMA QUEUED
60 00 01 ff ff ff 4f 00 01:25:21.756 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
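One detail in that output: all five logged errors report the same LBA, 0x0fffffff = 268435455, which is exactly 2^28 - 1, the largest value a 28-bit LBA field can hold (the old 137 GB limit). As I understand it, the summary error log that smartctl -a prints only has room for 28-bit LBAs, while READ FPDMA QUEUED is a 48-bit command, so that number may be a saturated placeholder rather than the real failing sector. smartctl -l xerror /dev/sdb shows the extended error log, which has room for 48-bit LBAs. The arithmetic, as a small Python check using the capacity from the output above (describe_lba is just an illustrative name):

```python
def describe_lba(lba, sector_bytes=512, capacity_bytes=2_000_398_934_016):
    """Relate an LBA from the SMART error log to the disk size (defaults: sdb above)."""
    total_sectors = capacity_bytes // sector_bytes
    return {
        "lba": lba,
        "is_28bit_max": lba == (1 << 28) - 1,  # 0x0FFFFFFF
        "byte_offset": lba * sector_bytes,
        "fraction_of_disk": lba / total_sectors,
    }


info = describe_lba(0x0FFFFFFF)
print(info["lba"], info["is_28bit_max"])  # 268435455 True
print(f"{info['byte_offset'] / 1e9:.1f} GB into a {2_000_398_934_016 / 1e12:.2f} TB disk")
# 137.4 GB into a 2.00 TB disk
```

If the extended log shows LBAs well past that point, the 0x0fffffff values were just the 28-bit ceiling.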
That is at least a possible cause of the problem.
I tested all disks (sda, sdb, sdc) and only sdb returned errors.
I also tested the backup USB drive; it returned no errors.
What do you think? Could that have triggered the Clonezilla issue?
It seems like sdb to me. Even if that isn't the cause of the Clonezilla slowness, the SMART status seems to show problems. Is the drive that old? SMART seems to think so.
Yes, both sda and sdb are 2 TB HDDs that have been in the machine since it was built in about 2011.
That is why I put in a new 4 TB SSD recently.
It is strange… it says it passed the test, then reports those errors… I think the platter might be OK, but there may be a problem in the controller circuitry, so it takes forever to respond to probes.
I have a new spare sitting in the drawer. I will copy what is on it to the SSD, then replace it. I need to be careful: it has a swap partition, and removing that may stop things booting.
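On "it says passed, then reports those errors": as far as I know, the overall PASSED line is the drive's own verdict, and it only turns to FAILED once a normalized VALUE drops to or below the vendor's THRESH. Raw counts such as the 208 pending sectors don't enter into it directly. A Python sketch of that comparison on smartctl -A style rows; treating THRESH 000 as "can never fail" is my assumption about the usual convention, and the function names are illustrative:

```python
def parse_rows(smartctl_output):
    """Map attribute name -> (VALUE, THRESH) from smartctl -A style rows."""
    rows = {}
    for line in smartctl_output.splitlines():
        f = line.split()
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if len(f) >= 10 and f[0].isdigit() and f[2].startswith("0x"):
            rows[f[1]] = (int(f[3]), int(f[5]))
    return rows


def threshold_check(rows):
    """Return (failing names, [(margin, name)] sorted smallest margin first)."""
    # Assumption: a THRESH of 0 means the attribute can never fail, so skip it.
    scored = {name: vt for name, vt in rows.items() if vt[1] > 0}
    failing = sorted(name for name, (value, thresh) in scored.items() if value <= thresh)
    margins = sorted((value - thresh, name) for name, (value, thresh) in scored.items())
    return failing, margins
```

Run over the full sdb table above, the failing list comes out empty, which is consistent with the PASSED verdict even though the raw counters look bad.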
That disk has pending sectors; those are waiting to be moved to the spare area.
While this is officially allowed, and the disk passes the test, the phenomenon indicates a problem on the surface.
You could run badblocks -swv against the disk and see whether the weak sectors get moved to the spare area (the Reallocated_Sector_Ct attribute tells you that) or the pending count increases. Note that -w is the destructive write-mode test, so only do that after the data has been copied off.
I just replace such a disk ASAP, no matter that it officially passes; it has lost my trust, so it must go…
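To follow the before/after comparison suggested above, one option is to save the smartctl -A output before and after the badblocks run and diff the relevant raw counters. A Python sketch; the verdict wording is mine, and it assumes the usual ATA behaviour that a pending sector is either remapped (Reallocated_Sector_Ct goes up) or simply cleared when a rewrite succeeds:

```python
import re

REALLOCATED, PENDING = 5, 197  # Reallocated_Sector_Ct, Current_Pending_Sector


def raw_counts(smartctl_output, ids=(REALLOCATED, PENDING)):
    """Return {attribute id: first number of RAW_VALUE} for the requested IDs."""
    counts = {}
    for line in smartctl_output.splitlines():
        f = line.split()
        if len(f) >= 10 and f[0].isdigit() and f[2].startswith("0x") and int(f[0]) in ids:
            m = re.match(r"\d+", f[9])
            counts[int(f[0])] = int(m.group()) if m else 0
    return counts


def compare_runs(before_output, after_output):
    """Return (reallocated delta, pending delta, short verdict)."""
    before, after = raw_counts(before_output), raw_counts(after_output)
    realloc = after.get(REALLOCATED, 0) - before.get(REALLOCATED, 0)
    pending = after.get(PENDING, 0) - before.get(PENDING, 0)
    if pending > 0:
        verdict = "pending count increased"
    elif realloc > 0:
        verdict = "sectors were remapped to the spare area"
    elif pending < 0:
        verdict = "pending sectors cleared without remapping"
    else:
        verdict = "no change"
    return realloc, pending, verdict
```

Either way, as said above, a disk in this state is probably better replaced than trusted.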