
Maintenance Continues

June 23rd, 2008

Our “Extended Preventative Maintenance” has continued to be full of fun and excitement. The box from Intel arrived Wednesday morning, but the part inside was a Switch Module, not a Storage Controller. The two modules look very similar and their part numbers differ by just one digit, so one can understand the mistake. We needed Part # d91231-002 and we received Part # d91241-002. Needless to say, this was very frustrating, since I immediately realized it meant another 24 hours before we could receive the correct part.

Jeremie called Intel to alert them to the problem and was told he would receive a call from an RMA specialist within the hour to resolve our replacement hardware problem. For the next six hours we called Intel every hour and were told each time to wait for a callback. At 4 pm I joined Jeremie on the phone, and we were told that the parts depot's inventory appeared to show the Storage Module on back order. Needless to say, steam, fumes, and a few other things started coming out of my ears.

We were talking to a support technician named Carol, who informed me that transferring us to her supervisor, a shift manager, a floor manager, or anyone else at Intel was against policy, and that we would have to wait for the callback. After about 5-10 minutes of this discussion going in circles, Carol informed me that she was ending the call. ARE YOU KIDDING ME? A customer calls to tell you that 75% of the production network is down, and you tell them to wait for a callback and hang up? We immediately called back, reached another support technician, and he escalated the case again. Within 10 minutes we received a call from Oscar, an RMA specialist who had talked with the staff at the parts depot. He confirmed that the Storage Module was on back order… and that our option was to purchase a new module from a distributor, with Intel reimbursing us. How can a mission-critical part for a system that is only 6 months old be on back order???

We called about 10-15 distributors, and all were out of stock. Finally we found our new best friend at SHOPBLT.COM. Harold returned the voicemail we had left after their posted business hours and apologized for not answering, explaining that they were under a tornado warning in Connecticut, but he wanted to know how he could help us. He had one module in a warehouse; he checked that it was actually on the shelf, and the warehouse manager in Illinois said he would have it on a truck that night. We asked where in Illinois the warehouse was and whether we could pick up the part ourselves, but Harold told us he couldn’t disclose the location because they were a defense contractor…

Meanwhile, we had contacted our Intel Channel Partner, Mark, and he arranged for Intel Engineering to run a test confirming that replacing the module was as simple as putting in the new one. Intel’s tests confirmed that the replacement worked as documented.

Thursday morning the part arrived. We put the original Mid-Plane back into the chassis, installed the new Storage Module, and said a prayer. When we powered on the chassis, all was well and we could see and access all our data. Yahoo! What a God thing. Unlike our SAN, this hardware was designed correctly: it stores the RAID array configuration on the drives themselves, so the volumes were completely intact after we inserted the new Storage Module.

We copied the data off to our recently enlarged SAN and booted all the virtual servers on various hosts to make sure the data was intact… and it was. With no virtual servers running on the chassis, Jeremie then installed the new Mid-Plane and upgraded the firmware on the module, all successfully. Once MasterFlex was back online, we copied the virtual servers back and booted them up… and all production servers were back online.

Follow-Up:

- Why don’t we have a second Storage Controller in the blade system? The second module costs $1,800. Would it have prevented this issue? Possibly, unless the secondary module had also fried during the firmware update… So now we need to budget for the secondary module.

Questions for Intel:

- Why was there no documentation, either shipped with the part or online, about replacing the Mid-Plane or about checking the firmware version it would push to all the modules?

- Why is the Storage Module on back order? What would Plan C have been if we couldn’t have purchased a module?

- Why is there no escalation process for the Customer Support/Technical Support call queue?
