Repo style wars: mono vs multi

This essay was originally written when consulting for Eero who has graciously allowed me to share it.

The Fundamental Law of Repo Topology is that you must not have cyclical dependencies between repos. If you do you are in for a world of hurt when you have to perform a series of non-atomic changes to update libraries.1 Going with a monorepo has the advantage that you never have this problem because there’s only one repo. On the other hand, working in a monorepo implies certain things about the rest of your development process and even philosophy of development.

Two philosophies

The fundamental difference between the monorepo and multirepo philosophies boils down to a difference about what will allow teams working together on a system to go fastest. The multirepo view, in extreme form, is that if you let every sub-team live in its own repo, they have the flexibility to work in their area however they want, using whatever libraries, tools, development workflow, etc. will maximize their productivity. The cost, obviously, is that anything not developed within a given repo has to be consumed as if it were a third-party library or service, even if it was written by the person sitting one desk over. If you find a bug in, say, a library you use, you have to fix it in the appropriate repo, get a new artifact published, and then go back to your own repo to make the change to your code. In the other repo you have to deal with not only a different code base but potentially with different libraries and tools or even a different workflow. Or maybe you just have to ask someone who owns that system to make the change for you and wait for them to get around to it.

The monorepo view, on the other hand, is that that friction, especially when dealing with more complicated dependency graphs, is much more costly than multirepo advocates recognize and that the productivity gains to be had by letting different teams go their own way aren’t really all that significant: While it may be the case that some teams will find a locally optimal way of working, it is also likely that their gains will be offset by other teams choosing a sub-optimal way of working, probably by cargo culting some other team’s approach and then letting it bitrot. By putting all your eggs in the one basket of the monorepo you can then afford to invest in watching that basket carefully. Clearly the friction of having to make changes in other repos or, worse, having to wait for other teams to make changes for you, is largely avoided in a monorepo because anyone can (and is in fact encouraged to) change anything. If you find a bug in a library, you can fix it and get on with your life with no more friction than if you had found a bug in your own code.

But this does point at what is perhaps the sharpest practical difference between the monorepo and multirepo philosophies, the difference in who is responsible for making the changes necessary to deal with library changes. That is, if the owners of Library X make a change to their API, who’s responsible for fixing the code that uses X? In a monorepo it has to be the Library X author because they can’t check in their change until the whole build is clean; they have to find and fix any code that their change would break. In other words, library authors are forced to balance the benefits of making breaking changes against the cost of fixing all the code they break because they’re going to do the fixing themselves.

In a multirepo world, on the other hand, the Library X folks would simply check in their changes to their repo and then publish a new versioned artifact and it falls to the teams working in other repos to—at some point—change their dependency on X to the newer version and make whatever changes they need to to deal with it.

Again, this choice really rests on what you think will let developers go fastest. Obviously for library authors, the multirepo approach requires less work in the short term—they just make the changes they want and publish a new binary. And consumers of the library are not slowed down by having to deal with the Library X changes and their code base does not get destabilized by having someone come in and muck with it. So life seems good—they can wait until an opportune time to adopt the new version and make the necessary changes to their own code.

But there’s no such thing as a free lunch. In the long term, both library maintainers and library users pay for their short term gains. Maintainers have to support multiple versions of their library since not everyone will upgrade right away. And consumers not only have to eventually upgrade to new versions of things, they may be blocked from upgrading when they want to because other code they depend on hasn’t upgraded yet.2

Implications of a monorepo

The main challenge of running a monorepo is that it will naturally get larger over time and since you can’t scale horizontally (by splitting into multiple repos) you have to scale vertically and not all your tools will necessarily be up to the challenge. In particular:

Some other considerations:

Implications of a multirepo

Perhaps the main advantage of a multirepo is that it avoids the tooling challenges of a monorepo since existing tools are already designed to deal with the scale of project you’re likely to put in a single repo. However there are other issues, even if you believe that the multirepo approach will get you productivity gains:

Best of both worlds?

Can you get the best of both worlds somehow? Probably not. You could certainly tool up around a multirepo to make multiple repos act more like a single monorepo (this is what Android does with a tool called repo) and you could establish a consistent toolchain, etc. across all repos to reduce the costs of making changes in other repos. But at that point it’s not clear what benefit you are getting from multiple repos unless there are other reasons to have some code in separate repos.

Or, in a monorepo, you could use branches to mimic the local stability you get in a multirepo—every project would live in a long-lived branch and only merge changes from other projects when needed. But then you get all the potential for irresolvable diamond dependencies preventing you from upgrading when you actually want to.

Further reading

1 To take a simple case: suppose you have code A and B that live in one repo and code C that lives in another repo. If C depends on A and B depends on C, even though there are no cycles in the code dependency graph, the cycle between the two repos means that updating A requires the following dance:

  • Change A and publish a new version of A’s artifact.

  • Go to C’s repo and change C to consume the new version of A.

  • Publish a new artifact for C.

  • Go back to the A/B repo and change B to consume the new C artifact.

This dance will require at least three separate CI runs as well as probably some branching and merging since the changes to A are incompatible with B until it is updated to the new (not yet existent) version of C.
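The trap here is that the cycle is invisible at the module level and only appears once modules are grouped into repos. A minimal sketch of that (module names, repo assignments, and the cycle check are all hypothetical, not from any particular build tool):

```python
# A <- C <- B at the code level is acyclic, but with A and B in one repo and
# C in another, the repo-level graph contains a cycle.
code_deps = {"C": ["A"], "B": ["C"], "A": []}   # module: [its dependencies]
repo_of   = {"A": "repo1", "B": "repo1", "C": "repo2"}

def repo_deps(code_deps, repo_of):
    """Collapse module-level dependencies into repo-level dependencies."""
    deps = {r: set() for r in set(repo_of.values())}
    for mod, uses in code_deps.items():
        for dep in uses:
            if repo_of[mod] != repo_of[dep]:
                deps[repo_of[mod]].add(repo_of[dep])
    return deps

def has_cycle(graph):
    """Detect a cycle via depth-first search with three node colors."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    def visit(n):
        if color[n] == GRAY:      # found a back edge
            return True
        if color[n] == BLACK:     # already fully explored
            return False
        color[n] = GRAY
        if any(visit(m) for m in graph[n]):
            return True
        color[n] = BLACK
        return False
    return any(visit(n) for n in graph)

print(has_cycle(code_deps))                      # False: module graph is fine
print(has_cycle(repo_deps(code_deps, repo_of)))  # True: repo graph cycles
```

Running a check like this over your actual module-to-repo mapping is one way to enforce the Fundamental Law before the non-atomic update dance bites you.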

2 Suppose both Libraries B and C depend on X and Application A depends on both B and C. This is the classic diamond dependency that has the consequence that A can only depend on versions of B and C which agree on what version of X they want. So if A wants to upgrade C to the latest version but that version has moved ahead to a newer version of X than the latest B is on, A is stuck until B moves forward. Even a half diamond, where A depends directly on X and on B which also depends on X, is sufficient to cause problems if A wants to be on a newer version of X than B is. Now imagine a more complex dependency graph and you can see that the coordination costs of upgrading a low-level library are going to be non-trivial.
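The "stuck" state is just an empty intersection of version constraints. A toy sketch (library names and version numbers are invented for illustration):

```python
# A depends on B and C; both depend on X. Each entry lists the versions of X
# that the latest release of that library accepts.
constraints = {
    "B": {"X": {1, 2}},   # latest B still requires X 1 or 2
    "C": {"X": {3}},      # latest C has moved on to X 3
}

def compatible_x_versions(deps):
    """Versions of X acceptable to every one of A's direct dependencies."""
    acceptable = None
    for lib, wants in deps.items():
        acceptable = wants["X"] if acceptable is None else acceptable & wants["X"]
    return acceptable

print(compatible_x_versions(constraints))  # set(): no X works; A is stuck
```

Until B publishes a release accepting X 3, the intersection stays empty and A cannot pick up the latest C, no matter how badly it wants the fix in it.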