The Bus Factor
The Bus Factor (also known as the Truck Factor) is a somewhat morbid term for how concentrated knowledge is in individual members of a team or, equivalently, how poorly knowledge is shared across the team. Since the exact meaning of the term varies slightly, let’s define it for the purposes of this article as:
Bus Factor: The minimum number of people on a team who have to leave before some essential or important responsibility of that team can no longer be accomplished.
As such, the Bus Factor is one measure of risk to a team: the risk that the unavailability of a certain set of team members could impact that team’s ability to deliver.
Characteristics of the Bus Factor
The maximum value of the Bus Factor for a team of size N is N. In this case, everyone on the team knows how to do all essential parts of the team’s work, though perhaps not as well as everyone else. In other words, to completely remove the team’s knowledge, you’d have to remove the entire team. This is probably an overly idealistic scenario rather than a pragmatic one or even a goal to aim for. In real life, people aren’t replaceable cogs, nor should you strive for them to be.
The minimum value for the Bus Factor is 0. In this case, the team has already lost one or more people who were the only ones who knew how to perform some essential task! This team will probably struggle for a while until they re-learn how to accomplish those tasks. A real-life example of a Bus Factor of 0 was seen recently during the COVID-19 pandemic, when the state of New Jersey put out a call for COBOL programmers (erroneously referred to as “Cobalt”) to help improve the state’s unemployment systems, which could not keep up with the huge influx of applications.
A Bus Factor of 1 means there is an essential task on the team that only a single person knows how to accomplish. If this task occurs frequently and is important, then by extension, that person becomes very important for the team. Perhaps they like having this responsibility, but in my experience, it tends to place an unfair burden on that single individual, which may eventually lead to burnout and, in extreme cases, to that individual leaving the team.
Following this observation, I am going to propose a corollary:
Unless acted on by an outside force, Bus Factors of 1 tend to become 0.
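To make the definition concrete, here is a minimal sketch of how the Bus Factor could be computed under the definition used in this article. It assumes you can list each essential task along with the people who know how to do it; the task names, team members, and the `bus_factor` and `weakest_tasks` helpers are all hypothetical and exist only for illustration.

```python
def bus_factor(task_owners):
    """Return the Bus Factor for a mapping of task -> set of people who can do it.

    Under this article's definition, the Bus Factor is the size of the
    smallest group of people covering any single essential task.
    """
    if not task_owners:
        raise ValueError("at least one essential task is required")
    return min(len(people) for people in task_owners.values())


def weakest_tasks(task_owners):
    """Return the tasks whose coverage equals the Bus Factor, sorted by name."""
    factor = bus_factor(task_owners)
    return sorted(task for task, people in task_owners.items() if len(people) == factor)


# Hypothetical team of three and their essential tasks.
team_tasks = {
    "deploy service": {"ana", "ben", "chen"},
    "rotate certificates": {"ana"},
    "run billing report": {"ben", "chen"},
}

print(bus_factor(team_tasks))     # 1
print(weakest_tasks(team_tasks))  # ['rotate certificates']
```

A task that nobody knows how to do maps to an empty set, which gives a Bus Factor of 0. If every task is covered by the whole team, the result equals the team size N.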
Avoid Low Bus Factors
Now that we know that a low Bus Factor is a risk for a team, how should we avoid it? What is the ideal Bus Factor?
Let’s first address the latter question. As mentioned before, the maximum value of the Bus Factor for any given team is the size of the team. However, I believe this is an unrealistic goal, and probably a counter-productive one, since it works against deep specialization by forcing everyone to spread themselves too thin. A reasonable goal might be to aim for a Bus Factor of at least 3, though I’ll admit that’s just a number pulled out of thin air and based only on personal observation. The actual number will vary depending on the team size and the particular situation of your team, but I will say that a Bus Factor of 1 is something that should almost always be avoided.
The basic ways to avoid having a low Bus Factor are to incentivize or encourage knowledge sharing and discourage knowledge silos. Here are some examples of both:
Encouraging Knowledge Sharing
- Experts should be incentivized to share their knowledge through presentations such as deep-dives, tutorials, and hands-on sessions.
- Anyone should be able to make a change to any part of the code base that the team owns, subject, of course, to code review.
- For significant parts of the code base that were written by one individual, encourage “hand-offs” of that code to others so that a single individual is not the only one who understands it.
- Oncall responsibilities should be shared with everyone on the team on a rotating basis. New team members should “shadow” the oncall for 1-2 rotations to learn the ropes.
- If there is a service/application owned by your team, everyone should be able to deploy it and know how to debug issues with it. This will make everyone familiar with the deployment/production environment, and also make them aware of the importance of code quality in preventing production issues. Deployment responsibilities can be shared through a rotating “Release Captain” schedule, similar to, or even combined with, oncall duties, depending on the specifics of your team.
- Reward not only the creation of documentation, but also the act of keeping documentation up to date.
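As a rough illustration of the rotating oncall or “Release Captain” idea above, here is a small sketch of a round-robin schedule in which a new team member shadows the first couple of rotations. The `build_rotation` function, its parameters, and the names are hypothetical, and the sketch assumes one rotation per week; a real schedule would also need to handle vacations, swaps, and adding the new member to the rotation once shadowing ends.

```python
def build_rotation(members, weeks, shadow=None, shadow_rotations=2):
    """Return a list of (week, primary, shadow) tuples.

    Primaries are assigned round-robin so everyone takes a turn.
    If `shadow` is given, that person shadows the first
    `shadow_rotations` weeks and is not primary in this sketch.
    """
    if not members:
        raise ValueError("at least one team member is required")
    schedule = []
    for week in range(weeks):
        primary = members[week % len(members)]
        shadowing = shadow if shadow is not None and week < shadow_rotations else None
        schedule.append((week + 1, primary, shadowing))
    return schedule


# Hypothetical team: three existing members, one new joiner shadowing.
for week, primary, shadowing in build_rotation(["ana", "ben", "chen"], 4, shadow="dee"):
    note = f" (shadowed by {shadowing})" if shadowing else ""
    print(f"week {week}: {primary}{note}")
```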
Discouraging Knowledge Silos
- No part of the team’s code base should be owned by a single individual. Code is owned by a team, not an individual.
- Avoid custom frameworks/libraries that were coded by a single individual, usually on a whim. Such custom frameworks/libraries, in my experience, tend to be poorly documented, and tend to suffer from a lack of updates. Instead, rely on frameworks/libraries that are widely known and used beyond a single team.
- Don’t always assign specific tasks to an expert. Accept a temporary productivity loss as other team members learn tasks that were previously done by only one individual. It may be faster to get a certain task done by letting the expert do it, but that is a local optimum that is easy to get stuck in: it does not optimize for long-term team productivity, because it prevents other team members from gaining the knowledge necessary to do that task.
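One rough way to spot potential silos is to look for parts of the code base that only one person has ever changed. The sketch below assumes you already have a list of (path, author) pairs, for example exported from your version control history. The `single_author_paths` helper and the sample data are hypothetical, and authorship is only a proxy for who actually understands the code.

```python
from collections import defaultdict


def single_author_paths(changes):
    """Return the sorted paths that have been changed by exactly one author.

    `changes` is an iterable of (path, author) pairs, one per change.
    """
    authors_by_path = defaultdict(set)
    for path, author in changes:
        authors_by_path[path].add(author)
    return sorted(
        path for path, authors in authors_by_path.items() if len(authors) == 1
    )


# Hypothetical change history.
changes = [
    ("billing/report.py", "ben"),
    ("billing/report.py", "ben"),
    ("api/server.py", "ana"),
    ("api/server.py", "chen"),
]

print(single_author_paths(changes))  # ['billing/report.py']
```

Paths flagged this way are reasonable candidates for a hand-off or for assigning the next change to someone other than the original author.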
If you want to introduce your team to these behaviours, I would suggest starting slowly by picking one or two that are most applicable to your team, and gauging the feedback to the changes. Often, these sorts of culture changes take time to be accepted.
Furthermore, don’t expect that everyone will eventually be able to accomplish specific tasks at the same level as an expert; that is an unrealistic expectation. The goal here is to ensure that others can at least accomplish those tasks with a basic level of proficiency, ensuring there isn’t a functional gap in a team’s abilities should the expert become unavailable.
Conclusion
Avoiding a Bus Factor of 1 should be a high priority for any team. Doing so ensures the team can continue to function effectively should a team member become unavailable. It also ensures that knowledge is properly shared across your team and not siloed within individual team members. Furthermore, raising your team’s Bus Factor avoids putting an unfair burden on any single team member for some important or essential responsibility of the team. By promoting a culture of knowledge sharing within a team, you improve not only that team’s culture but also its ability to onboard new team members, which helps keep the team cohesive through individual departures and additions.