Who, Me? Remember when September seemed so far away? Those of you still working from your bedroom since March should probably have changed your pyjamas by now. We’ll wait. When you’re ready, enjoy a tale from the Who, Me? vault courtesy of a reader who knows all about unplanned undergarment changes.
“O” spent the early part of this century in the capable hands of Windows Server 2003, and was always keen to find new ways of keeping his systems ticking over: “Being the only experienced professional running the systems with a few ‘independent consultants’ who looked down at any really hard server work,” he said, modestly, “there was always an area to improve.”
The consultants managed only one of the company’s five or so sites and while complexity was swerved where possible, they were “very vocal about any issue that made them look good”.
The ad-hoc spaghetti of the gang’s various solutions had been beaten into submission through “Quality Control” and “Change Management” although disk space woes remained a constant. The issue, recalled O, was “due to a surprisingly high resistance to RAID5 (and the crazy cost of those SCSI disks!)”
Upgrading the primary server (lurking in the same location as the consultants) was to be a simple job. O would fly in, plan, start and finish the task. Everyone was happy.
Well, not quite.
“A new cluster with plenty of disk space seemed to draw out the complaints on disk space across the other regions,” O explained. So he put together a script that could read a list of user-approved sacrificial directories when disk space was getting low, and wipe the things.
A cautious fellow, he also had the script check which server it was running on (“to prevent misfires!”) and added a “reasonable amount of intelligence and automation”.
As with so many things in the IT world, O’s handy script sat in the background and was soon forgotten about as new nodes, BDCs and the like were added with the expansion of the company.
It wasn’t until a planned weekend outage that the wheels first showed signs of coming off. O noted that some servers were blocking updates due to disk issues.
Not a problem. He remoted into the consoles of the servers, double-checked where he was and ran his handy clean-up script.
“I watched it start,” he said, “then went off to do the other patching reboots.”
The calls began eight hours later. A newly added BDC in the south of the country had died, showing winnt errors on reboot. Odd, but after fettling a boot disk (and thanks given to how the disk system, NTFS, managed Access Control Lists) things were soon back up and running.
“Initial root cause was bad disk,” said O, “because of the lack of investment into RAID5.”
Time passed, and even the BDC incident began to fade into memory. And then another call came in, this time from the site with the cluster and that team of consultants.
“A failover had happened, and things looked bad,” explained O. “A PDC going down was bad news, but adding to the complexity was the main site and … the consultants.”
They looked at the problem and helpfully determined that, no, this wasn’t RAID-related. Instead the passive node’s C: drive looked like somebody had run del *.* /s on it.
How on earth could such a destructive command (which would have a crack at deleting all files in the current directory and all subdirectories) possibly have been run?
“Looking back at the ‘bulletproof’ script in hindsight as I was cleaning my desk,” sighed O, “the script was not designed to run where it should not run.
“It was designed to check for the server name and execute based on where it was meant to run.”
However, if it didn’t find the name, it did not simply exit. Oh no.
“The script added ‘del specific drivepath’, but when running on a new node for which it hadn’t been told about it became only ‘del *.* …’”
As well as the /s parameter, O added /f (to force the deletion of read-only files) and /q (to stop Windows asking if the user was sure about that wildcard).
/s, /f and /q: the three options of the apocalypse.
It was, he said ruefully, “the cherry on top”.
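For the curious, the failure mode O describes can be sketched in a few lines. This is a hypothetical reconstruction in Python, not O’s actual script (which the story does not show): the server names, paths and function names are invented for illustration, and the code only builds command strings and never runs them.

```python
# Hypothetical illustration only: server names and paths are invented,
# and nothing here executes a delete. It just builds the command text.

# Per-server directories that users approved for clean-up (assumed layout).
APPROVED_PATHS = {
    "SRV-NORTH": "D:\\scratch\\",
    "SRV-SOUTH": "E:\\temp\\",
}


def build_command_fail_open(server: str) -> str:
    """Mimic the bug: an unknown server yields an empty path prefix,
    so the command collapses to a bare wildcard."""
    path = APPROVED_PATHS.get(server, "")
    return "del " + path + "*.* /s /f /q"


def build_command_fail_closed(server: str) -> str:
    """Safer variant: refuse to build any command for an unknown server."""
    path = APPROVED_PATHS.get(server)
    if not path:
        raise LookupError(
            f"no approved clean-up path for server {server!r}; refusing to run"
        )
    return "del " + path + "*.* /s /f /q"


if __name__ == "__main__":
    print(build_command_fail_open("SRV-NORTH"))
    print(build_command_fail_open("NEW-BDC"))
    try:
        build_command_fail_closed("NEW-BDC")
    except LookupError as err:
        print("blocked:", err)
```

The first function quietly degrades to the wildcard when handed a server it has never heard of. The second declines to produce a command at all, which is roughly the difference between “designed to run where it should” and “designed not to run where it shouldn’t”.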
Ever created what you thought was the neatest utility ever, only to realise that you have unleashed a data-destroying monster? Or conducted an impromptu test of your company’s backup and restore strategy? You have? Then an email to Who, Me? will clear your conscience. ®