Hard drive failure?
- Cressy Snr
- Amstrad Tower of Power
- Posts: 10582
- Joined: Wed May 30, 2007 12:25 am
- Location: South Yorks.
#1 Hard drive failure?
Hi Guys
I know a bit about computers but not to the depth of going into file system structures.
I have two bootable Lacie 500GB firewire hard drives, one holds all my music. The SqueezeBox server reads from this. The drive backs up every two days to the other one.
I started getting music scan failures on my Squeezebox Classic today with the "scan terminated unexpectedly" error message.
Looking at the server logs there were lots of "error reading header" messages. These started the alarm bells ringing.
I set the OSX disk utility to repair the disk and, sure enough, during the scan prior to repair "incorrect number of file hard links" appeared.
To cut a long story short, after the utility finished its scan and started the repair, thousands of "orphaned file inode (id =*******)" messages have started to scroll by in the disk repair window.
It's still scrolling by as I write this.
So whilst the first drive is being (hopefully) repaired I changed over to the backup and started the music scan again. This has also terminated unexpectedly with the log showing "error reading headers" messages again.
So it looks like both my external drives are exhibiting problems.
Is this a sign of impending doom or am I being paranoid?
Steve
- pre65
- Amstrad Tower of Power
- Posts: 21400
- Joined: Wed Aug 22, 2007 11:13 pm
- Location: North Essex/Suffolk border.
#2
Sounds a bit terminal !
Do solid state hard drives have these issues ?
The only thing necessary for the triumph of evil is for good men to do nothing.
Edmund Burke
G-Popz THE easy listening connoisseur. (Philip)
#3
I'd be a bit suspicious of them both going down simultaneously - modern hard drives have become pretty good. I have had problems with external disks and have usually tracked it down to the blummin' caddies.
When you do a backup, are you copying the data between them file by file, which is most likely, or are you cloning the disks, copying the whole drive sector by sector?
If you are doing sector by sector you might have been copying corruptions across.
Perhaps your firewire controller is going south?
I'd be tempted not to touch the second one, just in case it's not the drive and it's the controller. Can you connect via USB?
Andrew
#4
It sounds like the filesystem has been corrupted. The big question is why. If it was a creeping hard drive failure on the first drive, it's possible the corruption was copied over to the second drive, but it seems odd. Is the copy done by reading the filesystem using something like rsync, or is it done below the filesystem by some kernel/filesystem level process? If it was rsync, I don't see how it could spread the corruption. Is there anything in the console log (I don't know if OSX has dmesg)? It should show any I/O errors.
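As a rough sketch of what looking for I/O errors in a log could involve, the Python below filters log lines for a few disk-error phrases. The phrase list and the function name `find_io_errors` are my own assumptions for illustration; real kernel messages vary between systems and versions, so treat it as a starting point only.

```python
import re

# Assumed phrases, for illustration only; real kernel messages differ
# between operating systems and versions.
ERROR_PATTERN = re.compile(r"i/o error|media error|read error", re.IGNORECASE)


def find_io_errors(lines):
    """Return the lines that contain any of the assumed error phrases."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]
```

To try it, open whichever log file your system keeps (the location is something you would need to check) and pass the file object to `find_io_errors`, then print whatever comes back.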
Whenever an honest man discovers that he's mistaken, he will either cease to be mistaken or he will cease to be honest.
- Cressy Snr
- Amstrad Tower of Power
- Posts: 10582
- Joined: Wed May 30, 2007 12:25 am
- Location: South Yorks.
#5
Hi Nick/Andrew
I use a backup utility called SuperDuper, which does file by file backups, only copying changes over after the initial cloning operation, which was done about a year ago.
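For anyone wondering what "file by file, only copying changes" means in practice, here is a minimal Python sketch of the general idea. It is not how SuperDuper itself works internally (I don't know that); the function name `copy_changed_files` and the folder layout are made up for illustration.

```python
import filecmp
import shutil
from pathlib import Path


def copy_changed_files(source: Path, backup: Path) -> list:
    """Copy files that are missing from, or differ in, the backup tree.

    Returns the relative paths that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        dst = backup / src.relative_to(source)
        # shallow=True treats files with the same type, size and
        # modification time as equal without reading their contents.
        if dst.exists() and filecmp.cmp(src, dst, shallow=True):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(str(src.relative_to(source)))
    return copied
```

Because each file is read through the filesystem and written afresh, damage to the source drive's directory structures is not carried over the way it could be with a sector by sector clone. A file whose contents are already damaged would still be copied as it is.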
The repair was successful.
After listing around 4000 orphaned files the disk utility produced this:
2010-08-23 16:40:51 +0100: Look for missing items in lost+found directory.
2010-08-23 16:41:12 +0100: Rechecking volume.
2010-08-23 16:41:12 +0100: Checking Journaled HFS Plus volume.
2010-08-23 16:41:12 +0100: Checking Extents Overflow file.
2010-08-23 16:41:12 +0100: Checking Catalog file.
2010-08-23 16:41:50 +0100: Checking multi-linked files.
2010-08-23 16:42:08 +0100: Checking Catalog hierarchy.
2010-08-23 16:43:25 +0100: Checking Extended Attributes file.
2010-08-23 16:43:25 +0100: Checking volume bitmap.
2010-08-23 16:43:26 +0100: Checking volume information.
2010-08-23 16:43:26 +0100: 2010-08-23 16:43:26 +0100: The volume Backup_2 was repaired successfully.
Repair tool completed: 2010-08-23 16:43:27 +0100
2010-08-23 16:43:28 +0100:
2010-08-23 16:43:28 +0100:
Looks like it grubbed around in a lost + found directory and found the missing files.
The other drive, the one it backs up to, checked out OK using the disk utility, so it looks like no corruption has been copied across.
Question is though, why would the system throw a load of files into a lost/found directory? I can see this as being very useful, vital even, but why did it lose them in the first place?
OSX does have a console application which displays all manner of console logs, but I wouldn't know where to start looking for I/O error information.
Though the repair worked, I'm still a bit dubious about committing 8000 plus FLAC files to this drive, so I'm hanging fire at the moment and have turned off backups until I'm sure there isn't some fault in the hardware.
Steve
#6
Normally (at least on *nix) lost+found is where fsck (file system check, Unix folk don't like typing more than is needed) will put file fragments that it finds but can't match to a directory entry. This sort of file-system damage is normally the result of the system being powered off without being properly shut down. But I can't say too much about OSX as I don't know that much about what they have messed with.
It may be a sign of an ailing disk, but I am surprised that a file by file copy program can duplicate the corruption; it should be higher up the food chain.
Whenever an honest man discovers that he's mistaken, he will either cease to be mistaken or he will cease to be honest.
- Cressy Snr
- Amstrad Tower of Power
- Posts: 10582
- Joined: Wed May 30, 2007 12:25 am
- Location: South Yorks.
#8
Nick wrote: It may be a sign of an ailing disk, but I am surprised that a file by file copy program can duplicate the corruption; it should be higher up the food chain.
Hi Nick
The file by file copy didn't duplicate any of the corruption thankfully.
The "bad" drive backs up every couple of days to the other firewire drive, but that one checks clean; no orphaned files, no issues whatever, so that backup is good. I've managed to get the Squeezebox server to scan the duplicate music on that drive without any bother. The SB is up and running again.
What I would say, however, is that anyone running a Squeezebox or other digital streaming technology ought to have a backup of a backup like I have, and no doubt yourself, Ed, Chris (stratmangler) and Andrew do too.
In fact my iTunes (horrors) downloads are backed up in three places.
Losing all one's music to a bad HD is a good way to spoil your day.
Andrew,
The dirty drive checks out OK now
The other drive is also fine.
Come to think of it, we did have a couple of power cuts; three in the same night around six weeks ago. The main Mac is set to restart automatically after a power failure, so that was three hard shutdowns in about six hours.
Could that have buggered up the data on the "bad" drive?
Maybe I'd better boot up from one of the external drives and run a repair on the main Mac.
Steve
#9
The power cut could have done it, but the drive should have been checked when the machine started back up; the shutdown at power loss would not have been clean.
I would keep a close eye on drive #1, like Nick says, it might be suspect.
I use two external drives, one has the music, the other has the backup of the music.
Andrew
#10
Andrew wrote: The power cut could have done it, but the drive should have been checked when the machine started back up; the shutdown at power loss would not have been clean. I would keep a close eye on drive #1, like Nick says, it might be suspect. I use two external drives, one has the music, the other has the backup of the music.
It does depend how the OS is set. Using ext3 (for example), as long as the drive has not reached its time for a new check, Ubuntu (for example) doesn't run fsck at startup; it assumes the filesystem journaling will keep track of it all.
Running fsck on startup has become a costly operation in the world of 500GB disks.
Having the server set to restart on its own can be a killer during a power cut. It's common for the power to come back for a couple of minutes then fail again; that's enough to boot and then be pulled down again, adding to the file system woes every time.
Whenever an honest man discovers that he's mistaken, he will either cease to be mistaken or he will cease to be honest.
- Cressy Snr
- Amstrad Tower of Power
- Posts: 10582
- Joined: Wed May 30, 2007 12:25 am
- Location: South Yorks.
#11
Nick wrote: Having the server set to restart on its own can be a killer during a power cut. It's common for the power to come back for a couple of minutes then fail again; that's enough to boot and then be pulled down again, adding to the file system woes every time.
I'll disable that option Nick
If it goes down in a power failure in future, it can stay down.
Cheers chaps
#12
SteveTheShadow wrote: I'll disable that option Nick. If it goes down in a power failure in future, it can stay down.
If it's not vital that it comes back, it's best that way. The other option is a UPS to bring it down gracefully.
Whenever an honest man discovers that he's mistaken, he will either cease to be mistaken or he will cease to be honest.
#13
Just returned from holiday so coming at this late; I don't think I can add to what has already been said by Nick and Andrew. I would keep an eye on disk #1 though - just in case.
It might be worth re-emphasising for others building music libraries using computer technology (or for storing personal documents and the like for that matter) that having disk mirroring or RAID doesn't, at least in my book, constitute a backup. A backup should be a separate copy, ideally kept somewhere away from the computer you've backed up (a backup isn't much use if the burglar takes it with the computer or it's in the same house that burns down). Often neglected is the need to check your backups occasionally to ensure you can actually read them to be able to restore if it proves necessary.
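One way of checking that a backup can actually be read is to read every file back and compare checksums against the original. The Python below is only a sketch of that idea: the names `file_digest` and `verify_backup` are made up for illustration, and it assumes the backup is a plain folder tree mirroring the source.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list:
    """Return relative paths that are missing, unreadable or different."""
    problems = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        try:
            if not dst.is_file() or file_digest(src) != file_digest(dst):
                problems.append(str(rel))
        except OSError:
            # A read error on either copy is reported as a problem too.
            problems.append(str(rel))
    return problems
```

An empty list means every file in the source was found in the backup and read back identically. It says nothing about extra files that exist only in the backup.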
Ray
#14
Nick wrote: If it's not vital that it comes back, it's best that way. The other option is a UPS to bring it down gracefully.
A UPS in itself won't shut it down gracefully; you need to detect that the UPS has taken over and use that to trigger a clean shutdown.
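To illustrate the detect-then-shut-down idea, here is a hedged Python sketch for a Mac. It assumes the UPS is connected so that the OS can see it, and that `pmset -g batt` reports the source in a first line of the form "Now drawing from '...'"; anything other than 'AC Power' is treated as running on backup power. The function names and the 30 second polling interval are my own choices, and the shutdown command needs root.

```python
import re
import subprocess
import time


def parse_power_source(pmset_output: str):
    """Pull the power source name out of `pmset -g batt` style output."""
    match = re.search(r"Now drawing from '([^']+)'", pmset_output)
    return match.group(1) if match else None


def on_backup_power(source) -> bool:
    """Assumption: any known source other than 'AC Power' means mains has gone."""
    return source is not None and source != "AC Power"


def watch_and_shutdown(poll_seconds: int = 30) -> None:
    """Poll the power source and halt cleanly once mains power is lost."""
    while True:
        result = subprocess.run(
            ["pmset", "-g", "batt"], capture_output=True, text=True, check=True
        )
        if on_backup_power(parse_power_source(result.stdout)):
            subprocess.run(["shutdown", "-h", "now"], check=True)  # needs root
            return
        time.sleep(poll_seconds)
```

In practice you would probably want a grace period before shutting down, so a brief flicker doesn't take the server down, and most UPS makers supply software that does this job for you.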
Ray
#15
pre65 wrote: Do solid state hard drives have these issues?
Having no moving parts, solid state 'disks' are more reliable, quicker and quieter (silent); however, they are not immune to failure. During my IT career I've encountered several computer issues that were odd and ultimately came down to problems with memory or motherboard chips.
Ray