Multi-repos KISS
I have never worked in a monorepo professionally.
One project, multiple repositories, multiple build pipelines, multiple artifacts, multiple deployment jobs, for one system, always.
Frontend vs backend splits, microservices, code vs infrastructure, "this is just a spike"... somebody always seems to have a "reason" why this is a good idea. "We want to be able to release components separately" is inevitably followed by "But our release testing doesn't work, so we need a CAB meeting with 7 days' notice. In reality, we only ever release everything at once, maybe once a month."
Multiple development pipelines are set up, one for each repository, some with subtle dependencies on one another. An easy solution is found: just use the latest merged build from the other repository. Two hours later, a developer struggles to test and merge a change that needs branches in both repositories. This is fine, because automated pull request build validation is far beyond the horizon: the development lead eyes the diffs and merges anyway.
Wikipedia's page on monorepos provides a non-exhaustive list of "Limitations and disadvantages", but not once have I heard one of those items given as a reason for abandoning a monorepo, nor have I heard a reason I felt should be added to it. I would love to see others contribute valid reasons. Looking at what's currently there more closely:
- Loss of version information (semantic versioning breaks when the same build number is used across the whole project)
  - Semantic versioning is another of those things I rarely see used in a professional setting in the first place...
- Lack of per-project access control
  - I would argue that this is no longer a loss: modern collaboration tools solve it with branch policies that block merges without specific approvals and/or passing build validation. Build validation can easily enforce different policies per path, and some collaboration tools, such as Azure DevOps, provide first-party path filters on branch policies.
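To make per-path policies a little more concrete, here's a minimal Python sketch of the kind of check a build validation step could run. It is not any collaboration tool's real API: the glob patterns, team names (`ops-team`, `frontend-team`, `backend-team`) and the `PATH_POLICIES` table are all hypothetical. The idea is simply to map the files changed in a pull request to the approvals that change needs.

```python
from fnmatch import fnmatchcase

# Hypothetical policy table: glob pattern -> teams whose approval is required.
# Note: in fnmatch, "*" also matches "/", so "infra/*" covers nested files too.
PATH_POLICIES = {
    "infra/*": {"ops-team"},
    "frontend/*": {"frontend-team"},
    "backend/*": {"backend-team"},
}


def required_approvers(changed_paths):
    """Return the set of teams that must approve a change touching changed_paths."""
    required = set()
    for path in changed_paths:
        for pattern, teams in PATH_POLICIES.items():
            if fnmatchcase(path, pattern):
                required |= teams
    return required


def can_merge(changed_paths, approvals):
    """A change may merge only if every required team has approved it."""
    return required_approvers(changed_paths) <= set(approvals)


if __name__ == "__main__":
    changed = ["infra/terraform/main.tf", "backend/api/app.py"]
    print(sorted(required_approvers(changed)))  # ['backend-team', 'ops-team']
    print(can_merge(changed, ["ops-team"]))     # False
```

A single repository with a table like this behaves much like per-project access control, while a change that spans `infra/` and `backend/` still lands as one atomic commit.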
"Atomic Commits" is the item on the Advantages list that I feel is most often ignored. I see organisations reinvent process, procedure and bureaucracy on top of the development process to regain the atomicity they still want. Inevitably, this devolves into a manual process, with an engineer down the line eyeing up versions at deployment time. Mistakes happen.
I argue that multi-repos are almost always a form of premature optimisation, one that isn't called out or documented well enough. I'm no authority on Dev: structure your code however you want, and use microservices if you think they're the only way to manage your complexity. Six engineers and 20 microservices? Sure, you're the expert, good luck!
But to Ops engineers I say: play tough and make your team's lives easier by arguing the case for a monorepo.
- Make it work
  - KISS: Keep It Simple, Stupid! Stick it in a single repo with a single build pipeline.
  - Figure out how your project is going to end up before you optimise for it.
  - Dependencies you didn't anticipate will crop up here, and they'll be easier to manage.
- Make it right
  - This is where you really build up your CI/CD, testing, everything you can afford.
  - Enforce some form of automated pull request build validation, even if it's just a green build with no tests.
  - Since everything is in one place, it's easy to work out what assurances you want to build into the system.
  - Don't think about templating pieces until you're doing something for the third time and already need a template.
- Make it fast
  - Your build pipeline already has multiple parallel steps; there's probably one that is far slower than the rest.
  - I bet 50p you can speed up that step in an obvious, acceptable way.
  - Build VMs are also cheap.
  - This is possibly the point at which to give in to demands to split the pipeline out.
  - Reuse pipeline templates in different ways: perhaps pull requests build and test everything, which takes longer, while pushes run only a subset?
  - Agree a better versioning strategy than "append the Jenkins job number".
  - You can't avoid a proper artifact management strategy now...
  - Work out dependencies and triggers between pipelines.
  - A million other options you can think of, all easy to implement iteratively because everything is still in a single repository, so each one can land in one clean commit.
  - I bet you can still deliver a single artifact by this point, even if you think you're ready to move on to the next step.
- Do what you want. (Pad that CV!)
  - Multi-repo? If you really want to, go for it. Chopping up an existing repo and build pipeline will be simpler than building everything separately from the start. You'll also be warned earlier when things break, because you'll see the existing build systems fail even if you don't have proper tests. You've still got to work out further dependencies, versioning strategies and all the other things you would have needed to figure out at the start but probably skipped just to get it working.
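As a rough illustration of the "Make it fast" ideas (dependencies between parts of the build, and pull requests running everything while pushes run a subset), here's a minimal Python sketch of how a single-repo pipeline might decide what to build. Everything in it is an assumption for illustration: the component names, directory prefixes, dependency graph and the `"pull_request"`/`"push"` event strings are hypothetical, not taken from any particular CI system.

```python
from collections import deque

# Hypothetical components: directory prefix -> component name.
COMPONENTS = {
    "shared/": "shared",
    "backend/": "backend",
    "frontend/": "frontend",
    "infra/": "infra",
}

# Hypothetical dependency graph: component -> components it depends on.
DEPENDS_ON = {
    "shared": set(),
    "backend": {"shared"},
    "frontend": {"shared"},
    "infra": {"backend", "frontend"},
}


def directly_changed(changed_paths):
    """Components that own at least one changed file."""
    return {
        name
        for path in changed_paths
        for prefix, name in COMPONENTS.items()
        if path.startswith(prefix)
    }


def affected(changed_paths):
    """Changed components plus everything that transitively depends on them."""
    # Invert the graph so we can walk from a component to its dependents.
    dependents = {name: set() for name in DEPENDS_ON}
    for name, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].add(name)
    # Breadth-first walk outwards from the directly changed components.
    result = directly_changed(changed_paths)
    queue = deque(result)
    while queue:
        for dependent in dependents[queue.popleft()]:
            if dependent not in result:
                result.add(dependent)
                queue.append(dependent)
    return result


def build_targets(event, changed_paths):
    """Pull requests build and test everything; pushes build only the affected subset."""
    if event == "pull_request":
        return set(DEPENDS_ON)
    return affected(changed_paths)


if __name__ == "__main__":
    print(sorted(build_targets("push", ["backend/api/app.py"])))  # ['backend', 'infra']
```

Because the graph and the trigger rules live in the same repository as the code, changing them is just another reviewable commit; splitting into multiple repos later means turning this graph into cross-repo triggers and versioned artifacts.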
Multi-repos aren't necessarily bad, you just don't need them yet.