While writing a review of The Phoenix Project I realised that it might seem irrelevant to current-day developers. Many people have never lived with separate “Development” and “IT Operations” teams and deploy their code automatically / with little fear of failure (e.g. just pushing to a repo where an automated build is triggered). That’s great, except sometimes it’s good to reflect on bad history in order to understand the important things about our current situation.
To aid reflection, I offer this memory of dysfunction from a real-life situation in the mid 2000s.
(I have observed in a few places that for any group of people who are ignorant of an experience, there is another group who know only that experience. If you find it weird that there could ever be software without people whose only job is to deploy and run it, The Phoenix Project is highly recommended)
My first job was as a “Technology Consultant”, aka contract software developer for a contracting house that wrote whole new systems for big corporates that didn’t have capacity for those projects. Typically we would write a system and then leave it in the hands of an internal IT Operations team to maintain. Their job was to keep new software and old software running smoothly on the same servers, and handle issues with the operating environment.
One project required some data to be migrated before we could go live, and before that we had to run the migration on their staging environment to prove it would work. Unfortunately these imports took several hours at a time. To avoid clogging the network that their staging environment was on (it was already barely usable during working hours), we logged on after midnight, night after night, for some weeks.
Why didn’t we schedule a cron job to do this? This was pre-cloud, so all software ran on shared hardware with no abstractions (containers, VMs etc.) protecting one system from another. This was a multi-national with global-scale red tape, including mandatory multi-continent conference calls weeks before any deploy, so we were forbidden to install anything ourselves. Operations people had to do everything.
Why didn’t the operations people do it? Because the supposed Unix admins were hopelessly underskilled and unsupported by their parent company (also an outsourced third party). We literally had conversations over the phone like this:
“Now tell me what is in this directory. What? Type ls. No, type the character l, then the character s. ENTER. Ok now read me what’s on your screen.”
Of course we were blind to the output ourselves and the Internet connection to their office couldn’t support shared video.
Instead of having to talk these people through installing a script, we added a button to the software that called code to import files from an FTP server. Because the script took so long on the network, we had to split up the work over 20-ish imports. We took turns setting our alarms to wake us up after midnight and press it, for weeks, if we did in fact get it right. One night I slept through the alarm and suddenly we were a day behind schedule.
This was pretty laughable and everyone who heard about it asked the “why didn’t you set up a cron job?” question.
With hindsight, we really should’ve pushed through the bad operations experience and gotten that script installed, but I didn’t want to deal with the red tape and the ornery, semi-competent reception from our overstressed colleagues. We’d have had to get it right over the phone in conversations like the one above, and if it went wrong we’d have had to deal with the blame game and excuses, still with no access with which to diagnose the problem.
Taking a moment to reflect on the bottlenecks that led me to a bad decision:
- globally centralised release planning with severe red tape
- lack of technology to support independent releases
- technology that prevented collaboration
- long feedback loops / separation from the operating environment of our code
- operations people who couldn’t use the systems they were supposed to operate
- operations people fearful of the consequences of getting anything wrong
… this seems like the opposite of a few good ideas in the software world, for instance agile (cf. Joshua Kerievsky’s four Modern Agile principles: make people awesome; deliver value continuously; make safety a prerequisite; experiment and learn rapidly) and devops culture (flow, feedback, continual learning).
Nowadays, my job has few of the above problems. I’m thankful that technology makes some of these very easy to deal with (e.g. automated builds), but it’s a mistake to fix the technology problems alone. You also need to deal with people and culture, and THAT’s why I still find The Phoenix Project valuable.