3 WD Purples failing @ the same time.

c hris527

Known around here
Joined
Oct 12, 2015
Messages
1,805
Reaction score
2,115
Location
NY
Hi All,

Just want to throw this out there. Last week I got a call about a Dahua 16 Channel POE NVR that was acting weird. I went to the site and at first I believed one of the three drives was failing. I looked more in depth at the logs and quickly came to the conclusion that all three were tripping errors in the Storage logs.
At this point I was not believing it and thought the NVR was taking a dump. I decided to pull it from the rack and get it on the bench. Here are a few log files pulled from the NVR.

Hot.JPG deverror.JPG Error.JPG

Turns out all three drives were bad and had failed at the same moment. We did have some bad weather that weekend, but the first error was not until around 8 am on Monday morning, well after the bad weather went through. I tested the NVR for 4 days with different drives, and if the NVR itself was eating them it should have shown up. I did have a conversation with Western Digital about what might have happened; they were pointing in the direction of bad sectors due to "Something" but were more than happy to RMA all three drives. Since the system has really good surge protection and nothing else in that building had issues, it is really bothering me what caused this. I did notice that in the room where the rack is stored there are stacks of containers, and I had to move all that crap out of the way to swing open the rack.
I now wonder if somebody got careless and slammed the stack of crates into the NVR causing head crashes.
All three drives have the same failure error. What is your take on the HD Error?
 

looney2ns

IPCT Contributor
Joined
Sep 25, 2016
Messages
15,720
Reaction score
23,070
Location
Evansville, In. USA
Sounds like you may have come up with the cause: slamming crates. What are the odds of 3 drives taking a dump at once otherwise?
With my Tinfoil hat on.......Unlikely but possible, someone did it on purpose to cover up a recording they didn't want seen.
 

archedraft

Getting the hang of it
Joined
Sep 11, 2018
Messages
138
Reaction score
91
Location
USA
It’s certainly possible that all the drives could fail “around” the same time. As a rule of thumb, for server applications, it’s best not to buy all the hard drives (same size and brand) at one time, as we have seen correlations where hard drives from the same “lot” seem to fail around the same time. In a server RAID configuration you usually have some redundancy, but it’s never a good thing when multiple drives fail around the same time, or when one drive dies and a second one dies during the rebuild of the first.

Now all that said, I highly doubt that all three failed at the same time and it was due to the manufacturing. Likely someone did something they should not have which caused it.

Also, when you first buy hard drives, I would always recommend a good stress test to weed out any DOA drives. There are many ways to do this, but an unsophisticated yet effective method is to take the manufacturer's testing application and run, in order: a short SMART test, a long SMART test, a complete zero-fill of the drive, another long SMART test, and a final short SMART test. At that point you should have a pretty good idea whether the drive is going to give you issues or not.
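
For anyone who would rather script that sequence than click through a vendor tool, here is a minimal sketch. It is an assumption on my part, not the manufacturer's application: it swaps in `smartctl` (from smartmontools) for the SMART tests and `dd` for the zero-fill, and it only builds and prints the commands rather than running them. The function name `burn_in_plan` and the device path `/dev/sdX` are hypothetical placeholders.

```python
from typing import List


def burn_in_plan(device: str) -> List[List[str]]:
    """Return the burn-in commands, in order, for one drive.

    Nothing is executed here. smartctl only *starts* a self-test in the
    background, so wait for each test to finish before the next step.
    """
    short_test = ["smartctl", "-t", "short", device]
    long_test = ["smartctl", "-t", "long", device]
    # Prints the self-test log so each result can be checked by eye.
    show_log = ["smartctl", "-l", "selftest", device]
    # Overwrites every sector with zeros. This DESTROYS all data on the
    # device. status=progress assumes GNU dd.
    zero_fill = ["dd", "if=/dev/zero", f"of={device}", "bs=1M", "status=progress"]

    steps = [
        short_test, show_log,
        long_test, show_log,
        zero_fill,
        long_test, show_log,
        short_test, show_log,
    ]
    # Return copies so callers can edit one step without changing another.
    return [list(step) for step in steps]


if __name__ == "__main__":
    # /dev/sdX is a placeholder; substitute the real device before use.
    for step in burn_in_plan("/dev/sdX"):
        print(" ".join(step))
```

Printing the plan first is deliberate: the zero-fill step is destructive, so it is worth reading the device path twice before running anything by hand.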
 

c hris527

I can tell you for sure these all came at the same time; heck, the S/Ns are pretty close. Something did in fact happen to cause this. If it was electrical in nature, it did not affect the NVR, any other computers nearby, or the 40" TV that is also on the same circuit.
 

alastairstevenson

Staff member
Joined
Oct 28, 2014
Messages
15,984
Reaction score
6,805
Location
Scotland
c hris527 said: "I now wonder if somebody got careless and slammed the stack of crates into the NVR causing head crashes."
It's a pity that the SMART attributes for those drives don't include a g-sensor count.
With 3 together at the same time there is clearly a common cause, and you have indicated that electrical problems are unlikely due to surge protection and lack of incidents on other equipment.
My money would be on a shock event.

Any suspicious marks on the rack?
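
If you still have the drives before they go back for RMA, a rough way to compare what each one reports is to save the `smartctl -A` output per drive and pull out the attributes that tend to move after physical damage. This is only a sketch under assumptions: the attribute names are the ones smartmontools commonly prints, the watch list is my own choice, and `parse_attributes` and `damage_indicators` are hypothetical helper names. A drive that does not report `G-Sense_Error_Rate` simply will not have it in the result.

```python
from typing import Dict

# Attribute names as smartctl commonly labels them; the selection is an
# assumption, not an official list of shock indicators.
WATCHED = (
    "Reallocated_Sector_Ct",
    "G-Sense_Error_Rate",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_attributes(report: str) -> Dict[str, str]:
    """Map attribute name to raw value from `smartctl -A` text output."""
    attrs: Dict[str, str] = {}
    for line in report.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least ten
        # columns; the raw value is the tenth column onward.
        if len(parts) >= 10 and parts[0].isdigit():
            attrs[parts[1]] = " ".join(parts[9:])
    return attrs


def damage_indicators(report: str) -> Dict[str, str]:
    """Return only the watched attributes that the drive actually reports."""
    attrs = parse_attributes(report)
    return {name: attrs[name] for name in WATCHED if name in attrs}
```

Running `damage_indicators` on the saved output from each of the three drives and putting the results side by side would show whether they all report the same kind of damage.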
 

c hris527

You know, I did not really look. I have been so busy with all this stuff that I never even thought at the time to examine the rack. But I will be going back this week to do a checkup on the system. You know, I thought I might have seen dry smeared blood on the floor (just kidding). Anyhow, I will look.
Thanks all
 