Hard drive failure?

I think we all know by now what this section is for.
Cressy Snr
Amstrad Tower of Power
Posts: 10582
Joined: Wed May 30, 2007 12:25 am
Location: South Yorks.

#1 Hard drive failure?

Post by Cressy Snr »

Hi Guys

I know a bit about computers but not to the depth of going into file system structures.

I have two bootable LaCie 500GB FireWire hard drives. One holds all my music, and the Squeezebox server reads from it. That drive backs up every two days to the other one.

I started getting music scan failures on my Squeezebox Classic today, with the "scan terminated unexpectedly" error message.
Looking at the server logs there were lots of "error reading header" messages. These started the alarm bells ringing.

I set the OSX disk utility to repair the disk and, sure enough, during the scan prior to repair "incorrect number of file hard links" appeared.

To cut a long story short, after the utility finished its scan and started the repair, thousands of "orphaned file inode (id =*******)" messages have started to scroll by in the disk repair window.

It's still scrolling by as I write this.

So whilst the first drive is being (hopefully) repaired I changed over to the backup and started the music scan again. This has also terminated unexpectedly with the log showing "error reading headers" messages again.

So it looks like both my external drives are exhibiting problems.

Is this a sign of impending doom or am I being paranoid?

Steve
pre65
Amstrad Tower of Power
Posts: 21400
Joined: Wed Aug 22, 2007 11:13 pm
Location: North Essex/Suffolk border.

#2

Post by pre65 »

Sounds a bit terminal ! :shock:

Do solid state hard drives have these issues ?
The only thing necessary for the triumph of evil is for good men to do nothing.

Edmund Burke

G-Popz THE easy listening connoisseur. (Philip)
Andrew
Eternally single
Posts: 4206
Joined: Thu May 24, 2007 2:18 pm

#3

Post by Andrew »

I'd be a bit suspicious of them both going down simultaneously - modern hard drives have become pretty good. I have had problems with external disks and have usually tracked it down to the blummin' caddies.

When you do a backup, are you copying the data between them file by file, which is most likely, or are you cloning the disks, copying the whole drive sector by sector instead of file by file?

If you are doing sector by sector you might have been copying corruptions across.

Perhaps your firewire controller is going south?

I'd be tempted not to touch the second one, just in case it's not the drive and it's the controller. Can you connect via USB?

Andrew
Nick
Site Admin
Posts: 15759
Joined: Sun May 06, 2007 10:20 am
Location: West Yorkshire

#4

Post by Nick »

It sounds like the filesystem has been corrupted. The big question is why. If it was a creeping hard drive failure on the first drive, it's possible the corruption was copied over to the second drive, but it seems odd. Is the copy done by reading the filesystem using something like rsync, or is it done below the filesystem by some kernel/filesystem-level process? If it was rsync, I don't see how it could spread the corruption. Is there anything in the console log (I don't know if OSX has dmesg)? It should show any I/O errors.
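
As a rough starting point for that search, here is a minimal Python sketch that filters log lines for signs of I/O trouble. The patterns and the helper name `find_io_errors` are assumptions for illustration, not a list of messages OSX is known to emit; adjust them to whatever the console log actually contains.

```python
import re
from typing import Iterable, List

# Assumed patterns: generic phrases that often accompany disk trouble.
# Illustrative only, not an authoritative list for any operating system.
IO_ERROR_PATTERN = re.compile(r"i/o error|disk error|read error", re.IGNORECASE)


def find_io_errors(lines: Iterable[str]) -> List[str]:
    """Return the lines that match IO_ERROR_PATTERN, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if IO_ERROR_PATTERN.search(line)]


if __name__ == "__main__":
    # Made-up sample lines for demonstration, not real console output.
    sample = [
        "example entry: I/O error on external volume\n",
        "example entry: application started\n",
    ]
    for hit in find_io_errors(sample):
        print(hit)
```

To use it on a real log, pass an open file object instead of the sample list, since a file iterates line by line.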
Whenever an honest man discovers that he's mistaken, he will either cease to be mistaken or he will cease to be honest.
Cressy Snr
Amstrad Tower of Power
Posts: 10582
Joined: Wed May 30, 2007 12:25 am
Location: South Yorks.

#5

Post by Cressy Snr »

Hi Nick/Andrew

I use a backup utility called SuperDuper, which does file by file backups, only copying changes over after the initial cloning operation, which was done about a year ago.
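
For illustration only, a file-by-file incremental copy of this general kind can be sketched in Python as below. This is not a description of how SuperDuper works internally, which this thread doesn't cover; `needs_copy` and `incremental_backup` are hypothetical names, and the size/timestamp comparison is an assumed change-detection rule.

```python
import os
import shutil
from typing import List


def needs_copy(src: str, dst: str) -> bool:
    """True if dst is missing, differs in size, or is older than src."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    # Whole seconds only, so coarse filesystem timestamps don't force re-copies.
    return s.st_size != d.st_size or int(s.st_mtime) > int(d.st_mtime)


def incremental_backup(src_root: str, dst_root: str) -> List[str]:
    """Copy new or changed files to dst_root; return the relative paths copied."""
    copied = []
    for folder, _dirs, files in os.walk(src_root):
        rel_folder = os.path.relpath(folder, src_root)
        target_folder = os.path.normpath(os.path.join(dst_root, rel_folder))
        os.makedirs(target_folder, exist_ok=True)
        for name in files:
            src = os.path.join(folder, name)
            dst = os.path.join(target_folder, name)
            if needs_copy(src, dst):
                shutil.copy2(src, dst)  # copy2 also preserves the modification time
                copied.append(os.path.normpath(os.path.join(rel_folder, name)))
    return sorted(copied)
```

A copy like this reads each file through the filesystem, so it carries over file contents rather than the on-disk directory structures. Note the limitation of the assumed rule: a file rewritten within the same second at the same size would be skipped.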


The repair was successful.

After listing around 4000 orphaned files

the disk utility produced this:


2010-08-23 16:40:51 +0100: Look for missing items in lost+found directory.
2010-08-23 16:41:12 +0100: Rechecking volume.
2010-08-23 16:41:12 +0100: Checking Journaled HFS Plus volume.
2010-08-23 16:41:12 +0100: Checking Extents Overflow file.
2010-08-23 16:41:12 +0100: Checking Catalog file.
2010-08-23 16:41:50 +0100: Checking multi-linked files.
2010-08-23 16:42:08 +0100: Checking Catalog hierarchy.
2010-08-23 16:43:25 +0100: Checking Extended Attributes file.
2010-08-23 16:43:25 +0100: Checking volume bitmap.
2010-08-23 16:43:26 +0100: Checking volume information.

2010-08-23 16:43:26 +0100: 2010-08-23 16:43:26 +0100: The volume Backup_2 was repaired successfully.
Repair tool completed: 2010-08-23 16:43:27 +0100
2010-08-23 16:43:28 +0100:
2010-08-23 16:43:28 +0100:

Looks like it grubbed around in a lost+found directory and found the missing files.

The other drive, the one it backs up to, checked out OK using the disk utility, so it looks like no corruption has been copied across.

Question is, though, why would the system throw a load of files into a lost+found directory? I can see this being very useful, vital even, but why did it lose them in the first place?

OSX does have a console application which displays all manner of console logs, but I wouldn't know where to start looking for I/O error information.

Though the repair worked, I'm still a bit dubious about committing 8000 plus FLAC files to this drive, so I'm hanging fire at the moment and have turned off backups until I'm sure there isn't some fault in the hardware.
Steve
Nick
Site Admin
Posts: 15759
Joined: Sun May 06, 2007 10:20 am
Location: West Yorkshire

#6

Post by Nick »

Normally (at least on *nix) lost+found is where fsck (file system check; Unix folk don't like typing more than is needed) will put file fragments that it finds but can't match to a directory entry. This sort of file-system damage is normally the result of the system being powered off without being properly shut down. But I can't say too much about OSX as I don't know that much about what they have messed with.

It may be a sign of an ailing disk, but I am surprised that a file by file copy program can duplicate the corruption; it should be higher up the food chain.
Andrew
Eternally single
Posts: 4206
Joined: Thu May 24, 2007 2:18 pm

#7

Post by Andrew »

OK, so let me check: the second drive checked clean?

The first drive was dirty (errors) but after fsck it reported clean?

Andrew
Cressy Snr
Amstrad Tower of Power
Posts: 10582
Joined: Wed May 30, 2007 12:25 am
Location: South Yorks.

#8

Post by Cressy Snr »

Nick wrote:
It may be a sign of an ailing disk, but I am surprised that a file by file copy program can duplicate the corruption; it should be higher up the food chain.
Hi Nick

The file by file copy didn't duplicate any of the corruption thankfully.

The "bad" drive backs up every couple of days to the other firewire drive, but that one checks clean; no orphaned files, no issues whatever, so that backup is good. I've managed to get the Squeezebox server to scan the duplicate music on that drive without any bother. The SB is up and running again.

What I would say, however, is that anyone running a Squeezebox or other digital streaming technology ought to have a backup of a backup, as I have and as no doubt yourself, Ed, Chris (stratmangler) and Andrew do too.

In fact my iTunes (horrors) downloads are backed up in three places.

Losing all one's music to a bad HD is a good way to spoil your day.

Andrew,

The dirty drive checks out OK now.
The other drive is also fine.

Come to think of it, we did have a couple of power cuts; three in the same night around six weeks ago. The main Mac is set to restart automatically after a power failure, so that was three hard shutdowns in about six hours.

Could that have buggered up the data on the "bad" drive?

Maybe I'd better boot up from one of the external drives and run a repair on the main Mac.

Steve
Andrew
Eternally single
Posts: 4206
Joined: Thu May 24, 2007 2:18 pm

#9

Post by Andrew »

The power cut could have done it, but the drive should have been checked when the machine started back up; the shutdown at power loss would not have been clean.

I would keep a close eye on drive #1, like Nick says, it might be suspect.

I use two external drives, one has the music, the other has the backup of the music.

Andrew
Nick
Site Admin
Posts: 15759
Joined: Sun May 06, 2007 10:20 am
Location: West Yorkshire

#10

Post by Nick »

Andrew wrote:The power cut could have done it, but the drive should have been checked when the machine started back up; the shutdown at power loss would not have been clean.

I would keep a close eye on drive #1, like Nick says, it might be suspect.

I use two external drives, one has the music, the other has the backup of the music.

Andrew
It does depend on how the OS is set up. Using ext3 (for example), as long as the drive has not reached its time for a new check, Ubuntu (for example) doesn't run fsck at startup; it assumes the filesystem journaling will keep track of it all.

Running fsck on startup has become a costly operation in the world of 500GB disks.

Having the server set to restart on its own can be a killer during a power cut. It's common for the power to come back for a couple of minutes then fail again; that's enough to boot and then be pulled down again, adding to the file system woes every time.
Cressy Snr
Amstrad Tower of Power
Posts: 10582
Joined: Wed May 30, 2007 12:25 am
Location: South Yorks.

#11

Post by Cressy Snr »

Nick wrote: Having the server set to restart on its own can be a killer during a power cut. It's common for the power to come back for a couple of minutes then fail again; that's enough to boot and then be pulled down again, adding to the file system woes every time.
I'll disable that option Nick :oops:

If it goes down in a power failure in future, it can stay down.

Cheers chaps
Nick
Site Admin
Posts: 15759
Joined: Sun May 06, 2007 10:20 am
Location: West Yorkshire

#12

Post by Nick »

SteveTheShadow wrote:
Nick wrote: Having the server set to restart on its own can be a killer during a power cut. It's common for the power to come back for a couple of minutes then fail again; that's enough to boot and then be pulled down again, adding to the file system woes every time.
I'll disable that option Nick :oops:

If it goes down in a power failure in future, it can stay down.

Cheers chaps
If it's not vital that it comes back, it's best that way. The other option is a UPS to bring it down gracefully.
Ray P
No idea why I do this anymore
Posts: 6323
Joined: Thu Nov 22, 2007 5:18 pm
Location: Somerset

#13

Post by Ray P »

Just returned from holiday so coming at this late; I don't think I can add to what has already been said by Nick and Andrew. I would keep an eye on disk #1 though - just in case.

It might be worth re-emphasising for others building music libraries using computer technology (or storing personal documents and the like, for that matter) that having disk mirroring or RAID doesn't, at least in my book, constitute a backup. A backup should be a separate copy, ideally kept somewhere away from the computer you've backed up (a backup isn't much use if the burglar takes it with the computer, or it's in the same house that burns down). Often neglected is the need to check your backups occasionally, to ensure you can actually read them and restore from them if it proves necessary.
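
One way to make that occasional check concrete is to read every file on both drives and compare checksums. The Python sketch below is a minimal example under stated assumptions: `file_digest` and `verify_backup` are hypothetical names, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
import os
from typing import Dict, List


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(src_root: str, backup_root: str) -> Dict[str, List[str]]:
    """Compare each file under src_root with its counterpart in backup_root."""
    report: Dict[str, List[str]] = {"missing": [], "different": [], "unreadable": []}
    for folder, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(folder, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(backup_root, rel)
            if not os.path.isfile(dst):
                report["missing"].append(rel)
                continue
            try:
                if file_digest(src) != file_digest(dst):
                    report["different"].append(rel)
            except OSError:
                # A read failure on either side is itself worth knowing about.
                report["unreadable"].append(rel)
    for paths in report.values():
        paths.sort()
    return report
```

Three empty lists in the returned report mean every source file was found on the backup and read back identically. Files that exist only on the backup are not reported by this sketch.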

Ray
Ray P
No idea why I do this anymore
Posts: 6323
Joined: Thu Nov 22, 2007 5:18 pm
Location: Somerset

#14

Post by Ray P »

Nick wrote: If it's not vital that it comes back, it's best that way. The other option is a UPS to bring it down gracefully.
A UPS in itself won't shut it down gracefully; you need to detect that the UPS has taken over and use that to trigger a clean shutdown.
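
In outline, that detection is a polling loop. The Python sketch below shows only the logic: `read_on_battery` and `shutdown` are hypothetical callables that you would have to supply, since how to query a particular UPS or halt a particular machine depends on the model and the OS and isn't covered here. The grace period and poll interval are assumed values.

```python
import time
from typing import Callable, Optional


def watch_ups(
    read_on_battery: Callable[[], bool],   # hypothetical: True while on battery
    shutdown: Callable[[], None],          # hypothetical: starts a clean shutdown
    grace_seconds: float = 120.0,          # assumed: how long to ride out an outage
    poll_seconds: float = 5.0,             # assumed: how often to ask the UPS
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll the UPS and call shutdown() once mains has been out for grace_seconds."""
    outage_started: Optional[float] = None
    while True:
        if read_on_battery():
            if outage_started is None:
                outage_started = clock()   # the outage has just begun
            elif clock() - outage_started >= grace_seconds:
                shutdown()
                return
        else:
            outage_started = None          # mains is back: forget the outage
        sleep(poll_seconds)
```

The grace period means a brief flicker doesn't take the machine down, while a longer outage triggers a clean shutdown before the battery runs out.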

Ray
Ray P
No idea why I do this anymore
Posts: 6323
Joined: Thu Nov 22, 2007 5:18 pm
Location: Somerset

#15

Post by Ray P »

pre65 wrote:Do solid state hard drives have these issues ?
Having no moving parts, solid state 'disks' are more reliable, quicker and quieter (silent); however, they are not immune to failure. During my IT career I've encountered several odd computer issues that ultimately came down to problems with memory or motherboard chips.

Ray