Uncohesive data, four years later

I’ve been doing a handover of some systems that I’ve been involved with in one way or another for around four years. Communicating about all the issues that we’ve faced has given me some fresh perspective.

Context

More and more of the business had been added to a system that initially served only an online application process and some customer management: the classic organically grown monolith. When the company hit its growth phase, the CTO recognised that there would be enough developers working on it to start stepping on each other’s toes. He created self-organising teams and mandated that they write services outside the monolith. When I joined, I immediately started working on one of that first batch of systems.

That first system’s job was really to improve the lives of a group of people working on a giant spreadsheet. It had lots of columns added by seemingly every different part of the business, and because it was so central there were strict conditions for collaborating on it. It was only safe for one person at a time to edit it, and enough different people needed to edit it that some people found themselves waiting until midnight for their turn at it.

It’s easy to imagine a spreadsheet being converted into CRUD forms. Plain sailing, albeit with a lengthy migration of data out of the monolith (the part of the data that existed in both the spreadsheet and the monolith).

Fighting for cohesion and losing

The early decision makers (even before coding started) had their work cut out pushing to maintain cohesiveness. Parts of that spreadsheet were really tangential to what the system was meant to do: “Rather keep column X in the monolith”, “column Y is a calculated field, people shouldn’t have to type it in”, “column Z belongs in another new system we just started, can you go speak to team Z?”, and so on.

I’m calling data that serves the main purpose “cohesive”, and data that doesn’t, “uncohesive”.

They won some fights and lost others. Usually they lost because there wasn’t yet a good place to put the data. No problem: we would add that data “here” and migrate it later on.

At a guess, 30-40% of the data that the system eventually mastered was uncohesive. The users took spreadsheets (mercifully smaller ones) from other departments and typed the results into the system, after some fuzzy translation of that other department’s ideas. Sometimes creating the data took a string of Slack messages going back and forth.

Negative consequences of uncohesive data

In a relatively short space of time, the system became important to a large portion of the business. The data is used in almost every part of the customer journey, which is consistent with the data (even the good, cohesive part) being central to the larger product.

Unfortunately, the uncohesive parts have had some negative effects.

First, there were requests for changes from departments that are organisationally very far away, at the other end of a game of broken telephone. It was difficult to be confident that the changes were for the better: any given request could have been redundant, or there might have been a better way of addressing the problem. While we tried to get those answers, the people who wanted the change grew frustrated at the lack of delivery.

Second, seemingly innocuous changes could break weird calculations that I’d never heard of. “Something’s wrong” is easy to understand, but understanding the problem well enough to make sure that it didn’t happen again was difficult.

As a result, decision-making slowed to a crawl. For example, I frequently want to know whether a given complex feature justifies its high maintenance cost. Let’s call it X. I’d have conversations like:

Me: "Hi A, can you tell me if you use X?"
A: "I don't use X, but speak to person B before you change anything."

Me: "Hi B, can you tell me if you use X?"
B: "I don't use X, but speak to person C before you change anything."

Me: "Hi C, can you tell me if you use X?"
C: "I don't know anything about X, but I know that A and B do, go speak to them."

Me: "Hi A & B, it seems no-one uses this thing, I'm going to delete it."
A: "I'm not really sure about this. I don't know that much about X, and I need something else from your team."

It took days to get time with each person, and there have been so many of these conversations that I can’t remember whether we ended up being “allowed” to make that change. Good ideas got smothered. If the data had been more cohesive, we wouldn’t have had to speak to so many different people before making any given change.

Positive and negative forces

Why aren’t the teams that send the spreadsheets just typing into their own systems instead of creating spreadsheets? In one case, because that area of the business has been a low priority for automation. We still don’t have good places for some of that data.

In other cases, it’s because of momentum. Originally these were “quick wins”. Over time we forget that we were meant to be temporary owners: “someone needs X2, we have X1, so we should add X2 here”. I’m in the unusual position of having seen all four years, and with staff turnover it becomes harder to remind people that X1 was supposed to be temporary, and that we should put X1 and X2 somewhere else rather than double down on the original. We can hardly call it a “quick win” anymore.

On the plus side, I’m happy to say that everyone (not just technical people) now agrees that this is a problem. The users have to mindlessly type things in and suffer the consequences when, understandably, boredom leads to mistakes, so they’re not happy.

Also positive is that as more of the business is automated, including those parts that were underserved, more technical people are able to weigh in and ask the question: “I’m responsible for X over here, why is X1 being looked after by Fritz over there?”

Progress to this point has been slow: over four years, maybe a tenth of the poorly placed data has been migrated out of this system. That’s a sobering realisation, given that we thought our system would own the uncohesive data for, say, six months before a more suitable place could be found.

Conway’s Law

All of this seems predictable in hindsight. Conway’s Law says that a system’s architecture will come to resemble the organisation that created it, because communicating across organisational boundaries is difficult (note: I’m not a Conway’s Law expert, more of the “I checked Wikipedia and this seems right” kind). We tried to flout this by making our system responsible for data from far and wide, and progress became really difficult because communication was difficult. It’s taken me four years to make this connection.