Monday, July 6, 2009

A Long and Winding Story

Thursday, I received the parts for the new desktop/workstation build I'd been planning.  Straight from Newegg came two AMD Opteron 2427 6-core CPUs, two 2GB ECC/reg DDR2-800 DIMMs, and a Supermicro H8DAE-2 dual Socket F motherboard with HT3 support.

The adventure started when I first tried to boot on Thursday.  I actually had video!  I was expecting nothing since only the latest BIOS added support for Istanbul CPUs.  "What a lucky break!" I thought when I also discovered that in the manual there was an "Emergency BIOS Recovery" procedure for updating the BIOS even without a CPU.  All that was supposedly required to use it was the motherboard, a power supply, and a keyboard.  Hold down Home and Ctrl during boot and the chipset and BIOS boot block itself have sufficient logic to update from a floppy with a file named SUPER.ROM (the updated BIOS image).

I decided I'd try this.  Heck, it almost worked already.  So, I powered up with the appropriate keys held in on the keyboard.  All it did was beep repeatedly after accessing the floppy.  After some searching, I found that the manual was simply wrong--the file had to be named AMIBOOT.ROM in order for this procedure to work.

Booting again into the recovery mode with the file renamed to AMIBOOT.ROM yielded an on-screen status of what was happening.  This again was in opposition to what the manual said, since it indicated status would be audio-only (four beeps upon finishing) and nothing would be on screen.  After about 30 seconds, the process finished and the screen printed "Success" briefly before rebooting itself.

...Nothing happened.

After the flash, nothing would power up except for the chipset fan, and that's probably just hard wired into the power plane.  Not even the CPU fan would spin up.  I decided to install the second CPU just to be safe.  Who knows?  Maybe the update flipped which socket is seen as primary (required).  No joy.  I was dead in the water.

Then, the old technique I'd heard about back in the nForce2 chipset days made its way to the front of my mind after seeing it mentioned online for a different person's similar problem.  Hot flashing.  No, not post-PMS hot flashing.  In this context, the meaning is to boot a board with a compatible BIOS chip/controller and remove its BIOS chip while running.  Then you insert the ROM you want to re-program (the dead one from my Supermicro in this case due to the apparently bad boot block after the update) and do a BIOS update.  The update gets written to the inserted ROM and once complete, you simply power down, pop it out, place it in the original board, and re-insert that board's original ROM as well.  In theory, you end up with everything being back to how it should be.

On the only board I had handy that I could do the hot flash with (most of my other modern boards have soldered flash chips rather than socketed ones), the update failed.  Heck, upon closer examination I observed that the board couldn't even update its own BIOS, let alone help me flash another.  Since I happened to have two of these boards handy, I tried the other one.  Same issue.  I looked online for ages trying to find a solution, but there were just others posting that they too had a problem updating the hardware.  No solutions.

By this point, I'd spent all of Thursday night and all of Friday on the steps described above.

Being somewhat tenacious when it comes to solving mysteries like this one, I ventured further.  I had replaced my dad's motherboard with a fancier one during his last upgrade a few months back.  I went over to his place and checked.  Sure enough, his old board had a compatible BIOS type.  Armed with this old hardware and his current CPU (borrowed with permission of course heh) to run it, I went back home to try again.  Using a new flashing utility I found called Uniflash, I tried flashing my Supermicro's 8Mbit ROM chip.  No luck!  Three cells near the end failed to verify after flashing.  Since most of the data did verify, however, I decided to try it.  All I needed to at least try the BIOS recovery procedure again was a working boot block, and as far as I knew that only comprised the first 8KB of a BIOS.  Apparently, the boot block was stored in those final cells, as I was still unable to boot or do anything useful with the Supermicro board with this reflashed ROM.
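The question above (did the cells that failed to verify land inside the boot block?) can be sketched in a few lines of Python.  This is only an illustration and not what Uniflash does internally: the function names, the all-zero stand-in image, and the simulated failing offsets are made up for the example.  The 8KB boot block size comes from the paragraph above, but where it actually sits in the chip depends on the BIOS, so the sketch checks both the first and the last 8KB.

```python
from typing import List

ROM_SIZE = 8 * 1024 * 1024 // 8   # an 8Mbit chip holds 1,048,576 bytes
BOOT_BLOCK_SIZE = 8 * 1024        # 8KB, per the text; treat as an assumption


def find_mismatches(expected: bytes, actual: bytes) -> List[int]:
    """Return every byte offset where the read-back differs from the image."""
    if len(expected) != len(actual):
        raise ValueError("image and read-back must be the same length")
    return [i for i, (a, b) in enumerate(zip(expected, actual)) if a != b]


def offsets_in_region(offsets: List[int], start: int, length: int) -> List[int]:
    """Keep only the offsets that fall inside [start, start + length)."""
    return [o for o in offsets if start <= o < start + length]


if __name__ == "__main__":
    # Stand-in data: a blank "image" and a read-back with three bad bytes
    # near the end. Real use would read both from files instead.
    image = bytes(ROM_SIZE)
    readback = bytearray(image)
    for off in (ROM_SIZE - 3, ROM_SIZE - 2, ROM_SIZE - 1):
        readback[off] ^= 0xFF

    bad = find_mismatches(image, bytes(readback))
    in_first = offsets_in_region(bad, 0, BOOT_BLOCK_SIZE)
    in_last = offsets_in_region(bad, ROM_SIZE - BOOT_BLOCK_SIZE, BOOT_BLOCK_SIZE)

    print("bad offsets:", [hex(o) for o in bad])
    print("inside first 8KB:", len(in_first))
    print("inside last 8KB:", len(in_last))
```

If any bad offsets show up in whichever region really holds the boot block, the recovery procedure has nothing sound to run from, which would match what happened here.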

It was at this point that I decided to go to lunch.  I needed a break.

Once back, I remembered that I saw a BIOS chip in my old DFI ICFX3200-T2R/G motherboard I used to run.  It died during a bad BIOS flash over a year prior, and it just so happened to have an identical SST 49LF080A 8Mbit flash ROM--just like the Supermicro.  I hot flashed this ROM like I'd previously done to the other chip.  While it failed too, different cells failed to verify.  It was at this point that I became somewhat discouraged.  It was possible that both ROMs were physically damaged, and the fact that both chips failed but in different areas (yet both failed in their own respective cells each time they were re-flashed again and again) suggested such physical failure.
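The reasoning in that last sentence boils down to two set operations: which offsets fail on every attempt for a given chip, and which offsets fail on every chip.  Here is a rough Python sketch of it.  The chip names and offsets are invented for illustration, and the conclusion it points to is only a heuristic, not proof of anything.

```python
from typing import Dict, List, Set


def persistent_failures(attempts: List[Set[int]]) -> Set[int]:
    """Offsets that failed to verify on every flash attempt for one chip."""
    if not attempts:
        return set()
    return set.intersection(*attempts)


def shared_failures(per_chip: Dict[str, Set[int]]) -> Set[int]:
    """Offsets that failed on every chip."""
    if not per_chip:
        return set()
    return set.intersection(*per_chip.values())


if __name__ == "__main__":
    # Hypothetical failing offsets from three flash attempts per chip.
    attempts = {
        "supermicro_rom": [{0xFFFF0, 0xFFFF4, 0xFFFF8}] * 3,
        "dfi_rom": [{0x12000, 0x12004}] * 3,
    }
    persistent = {chip: persistent_failures(a) for chip, a in attempts.items()}
    common = shared_failures(persistent)

    for chip, offsets in persistent.items():
        print(chip, "always fails at", sorted(hex(o) for o in offsets))
    print("fails on every chip:", sorted(hex(o) for o in common))
```

Repeatable failures that differ from chip to chip hint at the chips themselves; failures at the same offsets on every chip would hint at whatever is doing the flashing.  Either reading is only a hint.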

It was approaching night time, and I decided a nice shower was in order.  While I often get breakthroughs in the shower when it comes to intellectual conundrums, I had no such luck that time.  Lying on my bed afterward, however, I received what I was after.  Half a year ago, I'd put together a Tyan dual-CPU system that was in much the same situation as my present one.  The CPUs were too new to be supported by the BIOS, and I hadn't even tried to boot it until a new BIOS chip with the latest firmware flashed to it arrived via mail (for $27).  Being a server board, it was more likely to have an 8Mbit ROM like I needed.  All of the desktop boards I'd surveyed around my house that had socketed ROMs were 4Mbit other than that DFI.

I went to my dad's place again where I'd built that machine, and sure enough...in the pile of parts on the floor from that build was an electrostatic bag with the Tyan's original BIOS chip in it.  It too was an SST 49LF080A!  It could hardly be damaged.  It was handled with care, stored carefully, never updated/flashed, etc.  With it in hand I raced home.

The hot flash on this new chip could save the day!  Could.  It didn't.  It too failed like both of the other chips.  Since it was almost impossible for the chip to be damaged, that pointed to the possibility that there was simply something wrong with the board I was flashing on when it came to these chips.

Desperate, I asked around for other motherboards that might work for flashing.  My dad's old old old dual-CPU board from years and years ago was available.  Being an old AMD 760MPX based board, it had been around for a while.  I took that board home with me and tried again.  Success on the first attempt!  I had a properly flashed chip...probably.

After dropping the ROM into my Supermicro, I tried to turn it on.  Fire!  No, not the literal kind.  I mean the machine turned on and was working.  I was now operational.  Finally.

Kinda.

While the machine worked, it didn't seem to work when two CPUs were installed.  Sometimes it would fail to boot at all.  Other times it would boot but the second CPU wouldn't be detected.  Ack!  I swapped CPUs and it still did the same thing, but that did at least tell me each CPU was individually operational.

That leaves me where I am today.  I don't have enough memory modules to run two DIMMs on each CPU.  The manual says two DIMMs are required on CPU1 when two CPUs are installed.  It also says no DIMMs are required on CPU2, but it is preferred.  I don't know how up to date or accurate the manual is.  It may be that CPU1 does require 2 DIMMs but that CPU2 must have its own memory as well before it'll work.  Some old, really slow ECC/reg DDR2 DIMMs are coming from a brother in the area on Tuesday.  At that time I'll be able to confirm or reject this current theory by testing it.

Until then, I'm just a one-CPU pony.  Or is that a one-trick pony?  I don't know.  All I know is that I'm exhausted.  That concludes my time between Thursday night and Sunday evening.  Man I'm a geek.
