Monday, November 7, 2016

How to check your system components for errors in a dell r510

Whether you're buying hardware direct from the manufacturer, from an auction on eBay, or from a government surplus website, you want to confirm that your purchase is working as intended before you put in some serious work. Many eBay auctions come with a 14-30 day DOA warranty, so you'll need to run as many tests as quickly as you can to be able to start the return process if something is wrong.

This blog tries to make zero assumptions about previous knowledge. I know that when I've done something completely out of my comfort zone, I start to wonder, 'Is this really supposed to be taking this long?' This post is going to read half instructional and half like a Twitter feed, just so you can see what's possible with my Dell R510 hardware and how much time this process takes. With that preface, let's get this show on the road.

*Note: eBay has a new policy that counts an opened dispute as a negative mark against the seller, even when it is resolved to the customer's satisfaction. When possible, deal directly with the seller before opening a dispute. Sellers now have a reason to avoid disputes, but be ready to open one if things are not resolved to your satisfaction. It's not a threat to make against a seller, merely another tool to resolve issues. I'm not sure what the maximum time is between receiving the item and filing a dispute, so double-check eBay's rules.

I hit F10 at the startup/boot screen and went into the Unified Server Configurator. This brought up a window with the choices listed below, one of which was hardware diagnostics. This is built into my Dell R510. If you don't have a built-in diagnostic tool for some reason, or want to run a newer diagnostic tool because you haven't updated your server's firmware in a while, I'll cover that in a different post. TODO

Step 1: After hitting F10, scroll down to Hardware Diagnostics, tab over to the right window, and select Run Hardware Diagnostics with the arrow keys. Hit <Enter>.

If you get this warning, don't worry: it took 4 seconds, not 4 minutes, to get to the next screen. All is well.


Click Run Diags. We'll look at MpMemory later on.

Now I chose Custom Test. I didn't know what's involved with the others, so I wanted to see a list of possibilities.

You have the option to only select specific hardware to test.

Thankfully, I chose Custom and went into Hard Drive, my original goal for this test. I had the top of the root\hard drive\ tree selected when I hit Enter, so it kicked off a scan of all drives with all tests. There are also a few tests under each drive; these can be selected individually, or you can click on a drive to run all the available tests on that drive only. Double-clicking also selects and tests everything under the tree; it doesn't expand the tree.


Note: The keyboard isn't responding to anything during the test. I assume it will read the ESC key to abort, but all Tab, Shift+Tab, and Ctrl+Tab operations have stopped. There is also a warning at the very bottom of the screen, unfortunately cut off, saying that the mouse is inactive during this test. The progress bar at the top shows that something is progressing.

Here you see the Hard Drive test running. It loaded the first drive and went to the first scan option, the Confidence Test.

The scan options for each hard drive are:
  • Confidence Test
  • Device Self Test
  • Drive Self Test (Long) (grayed out)
  • Drive Self Test (Short)
  • Read Test (grayed out)
  • SMART Test
  • Verify Test (grayed out)
  • Buffer Test
It has now been running for 15 minutes and I'm at 58% Test Progress | 2% Overall Progress.

OK, at <8% Overall it quickly jumped to Device Self Test, and then to Drive Self Test (Short).
It then ran the SMART Test and Buffer Test extremely quickly.

It then auto jumped to Drive 2 and started the Confidence Test with a scan, and then stopped... We'll call this the confidence test pre-check.

It then asked me if I want to continue with the confidence test (pictured below) because it will take a couple minutes.  That'll be annoying if you wanted to scan all the hard drives at once while away and come back to a full report.

Currently at 41% test progress, 21% overall progress at 40 minutes in. Just a reminder, my Dell R510 has five 160 GB 7200 rpm drives and 2 measly E5503s running at 2.0 GHz. Since all 5 hard drives are the same size, the tool appears to assume each drive's full test is 20% of the overall progress: 100% / 5 hard drives = 20% per hard drive. I will share this bit of good news though: I don't hear the hard drives clunking around at all with all the scanning they're doing. Pretty impressed with the WD Re drives. I need to see how Seagate, Fujitsu, and Maxtor drives compare if I can acquire some. Moving along...
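If you're wondering "is this really supposed to be taking this long", the progress numbers above can be turned into a rough time estimate. This is only a back-of-the-envelope sketch: the function names are my own, and it assumes the Overall Progress bar moves at a steady rate, which the diagnostic tool does not promise.

```python
def percent_per_drive(drive_count: int) -> float:
    """Share of Overall Progress per drive, assuming equal-size drives."""
    if drive_count <= 0:
        raise ValueError("drive_count must be positive")
    return 100.0 / drive_count


def estimate_remaining_minutes(elapsed_minutes: float, overall_percent: float) -> float:
    """Linear extrapolation: assume the rest runs at the same average rate."""
    if not 0 < overall_percent <= 100:
        raise ValueError("overall_percent must be in (0, 100]")
    estimated_total = elapsed_minutes * 100.0 / overall_percent
    return estimated_total - elapsed_minutes


if __name__ == "__main__":
    # Numbers from this run: 5 drives, 40 minutes in, 21% Overall Progress.
    print(f"Each drive is worth {percent_per_drive(5):.0f}% of the overall bar")
    print(f"Roughly {estimate_remaining_minutes(40, 21):.0f} minutes to go")
```

Treat the result as a ballpark only; the early tests on each drive run at very different speeds, so the estimate jumps around depending on when you take the reading.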

WD-WCAT26387671 passed scanning (HD0)
WD-WCAT26387879 scanning right now (HD1). I didn't realize the tool showed serial numbers.

I started these tests around 4:00 pm.
4:59 Second hard drive went into Drive Self Test (Short).
5:02 Short, SMART, and Buffer tests all finished.
5:02 Pre-test on drive 3, and the prompt returns.
WD-WCAT26387879 passed scanning (HD1)
WD-WCAT26386594 scanning right now. (HD2)
~5:38 Hit the prompt again for hard drive 4 (might have been delayed).
WD-WCAT26386594 passed scanning (HD2)
WD-WCAT26387842 scanning right now. (HD3)

Needless to say it's continuing....

...and finally finished. Each 160 GB drive took around 30 minutes to scan.
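As a rough cross-check of that per-drive figure, the clock times I jotted down earlier can be turned into minutes per drive. This is just a sketch: the checkpoint list and function name are my own, the 4:00 start is approximate, and the ~5:38 prompt may have sat on screen for a while before I noticed it.

```python
from datetime import datetime

# (clock time, number of drives finished at that point), taken from my notes.
# 4:00 is an approximate start; ~5:38 might have been delayed.
CHECKPOINTS = [("4:00", 0), ("5:02", 2), ("5:38", 3)]


def minutes_per_drive(checkpoints):
    """Average minutes per drive between each pair of consecutive checkpoints."""
    results = []
    for (t0, done0), (t1, done1) in zip(checkpoints, checkpoints[1:]):
        start = datetime.strptime(t0, "%H:%M")
        end = datetime.strptime(t1, "%H:%M")
        elapsed = (end - start).total_seconds() / 60
        results.append(elapsed / (done1 - done0))
    return results


if __name__ == "__main__":
    for minutes in minutes_per_drive(CHECKPOINTS):
        print(f"about {minutes:.0f} minutes per drive")
```

With my notes this comes out a little over 30 minutes per drive, which is in line with what I saw.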

Here you can see how as each test passes, we get our green check.



When it finally ended, I checked how the keyboard and mouse could be used. Tab and Shift+Tab traverse sections. Arrow keys move between items within a section. The mouse generally works like normal, but double-clicking Hard Drive did not expand the category; it started running the entire subtree, with all drives and all available drive tests. In other words, double-clicking to expand a tree means running every test under it.

ALL HDs passed. Go me.

Now let's hunt around and see what else we can test.

Next I went looking for something easier/quicker to test.
I tried System Management\IPMI\ and double-clicked.
The whole test took less than 30 seconds, and it finishes by asking you to check the front LCD screen and type in the output.

Running the scan on System Management\SMBIOS\System Information\ showed the version of my Dell R510 to be A02, which I took to mean I can run Xeon 5600-series chips. Revision II for the win.
*I WAS WRONG. So, if one is good, more is better: I bought another R510 with a 5630 installed, and it had a revision of A04. However, the iDRAC system page does show that my revision is II.
*iDRAC 6 version 1.80 build 17 on dell r510 showing System Revision


*iDRAC 6 version 1.40 build 13 on dell r510 showing System Revision



I tested every other option except the DVD-ROM Confidence Test. Most of the tests outside of the hard drive test took less than 1 minute each. The CPU test was the longest, excluding the hard drive test, at about 4 minutes.
Warning:
The DVD-ROM test said it was going to eject the tray and pull it back in, so I needed to remove my front bezel. The test ejected the tray, but it never pulled back in. I do not believe my DVD-ROM tray is built for that operation; it's more like a laptop CD drive than a desktop one that pulls the tray back inside when you hit the eject button a second time. This is also good to know in case you ever forget to remove your bezel: my drive bay door DOES NOT push against my bezel. CHECK YOUR OWN DVD/CD-ROM BEFORE ASSUMING THE SAME.

One option I see is the ability to export the log. I need to find out how to use the checkbox 'Log output file pathname: <path>'. I'm not sure what kind of path I can put here given the ESXi/Linux file systems, and there is no browse function at all.

After exiting the individual hardware selector, I followed the next option, the MpMemory test. I did a quick MpMemory RAM test by choosing Custom, and it did break down the long-duration vs. short-duration RAM checks. Running ALL the short-duration tests took less than 10 minutes on 12 GB of RAM.





Reboot, the last of the 3 choices, exited the testing utility and did a full reboot of the system.

If anyone has their own tests, leave a comment below and we'll see if I can add them to this list.
I think there's a Dell OpenManage tool to do this from within a running OS. TODO
