From a Single Repo, to Multi-Repos, to Monorepo, to Multi-Monorepo

I’ve been working on the same project for several years. Its initial version was a huge monolithic app containing thousands of files. It was poorly architected and non-reusable, but it was hosted in a single repo, which made it easy to work with. Later, I “fixed” the mess in the project by splitting the codebase into autonomous packages, hosting each of them on its own repo, and managing them with Composer. The codebase became properly architected and reusable, but being split across multiple repos made it a lot more difficult to work with.

As the code was re-architected time and again, its hosting in the repo also had to adapt, going from the initial single repo, to multiple repos, to a monorepo, to what may be called a “multi-monorepo.”

Let me take you on the journey of how this took place, explaining why and when I felt I had to switch to a new approach. The journey consists of four stages (so far!) so let’s break it down like that.

Stage 1: Single repo

The project is leoloso/PoP and it’s been through several hosting schemes, following how its code was re-architected at different times.

It was born as a WordPress site, comprising a theme and several plugins. All of the code was hosted together in the same repo.

Some time later, I needed another site with similar features so I went the quick and easy way: I duplicated the theme and added its own custom plugins, all in the same repo. I got the new site running in no time.

I did the same for another site, and then another one, and another one. Eventually the repo was hosting some 10 sites, comprising thousands of files.

A single repository hosting all our code.

Issues with the single repo

While this setup made it easy to spin up new sites, it didn’t scale well at all. The big thing is that a single change involved searching for the same string across all 10 sites. That was completely unmanageable. Let’s just say that copy/paste/search/replace became a routine thing for me.

So it was time to start coding PHP the right way.

Stage 2: Multirepo

Fast forward a couple of years. I completely split the application into PHP packages, managed via Composer and dependency injection.

Composer uses Packagist as its main PHP package repository. In order to publish a package, Packagist requires a composer.json file placed at the root of the package’s repo. That means we are unable to have multiple PHP packages, each of them with its own composer.json, hosted on the same repo.

As a consequence, I had to switch from hosting all of the code in the single leoloso/PoP repo, to using multiple repos, with one repo per PHP package. To help manage them, I created the organization “PoP” in GitHub and hosted all repos there, including getpop/root, getpop/component-model, getpop/engine, and many others.

In the multirepo, each package is hosted on its own repo.

Issues with the multirepo

Handling a multirepo can be easy when you have a handful of PHP packages. But in my case, the codebase comprised over 200 PHP packages. Managing them was no fun.

The reason the project was split into so many packages is that I also decoupled the code from WordPress (so that the packages could also be used with other CMSs), for which every package must be very granular, dealing with a single goal.

Now, 200 packages is not ordinary. But even if a project comprises only 10 packages, it can be difficult to manage across 10 repositories. That’s because every package must be versioned, and every version of a package depends on some version of another package. When creating pull requests, we need to configure the composer.json file on every package to use the corresponding development branch of its dependencies. It’s cumbersome and bureaucratic.

I ended up not using feature branches at all and simply pointed every package to the dev-master version of its dependencies (i.e. I was not versioning packages). I wouldn’t be surprised to learn that this is a common practice.

There are tools to help manage multiple repos, like meta. It creates a project composed of multiple repos, and doing git commit -m "some message" on the project executes a git commit -m "some message" command on every repo, allowing them to be in sync with each other.
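To make the idea concrete, here is a minimal shell sketch of that fan-out: it runs one command in every repo checked out under a common folder. This is only an approximation of the behavior described above, not how meta is implemented; the for_each_repo name and the layout (one clone per subfolder) are assumptions made for illustration.

```shell
#!/bin/sh
# Run the same command in every Git repo found directly under a root folder.
# Assumption: each package repo is cloned into its own subfolder of $1.
for_each_repo() {
  root=$1
  shift
  for dir in "$root"/*/; do
    # Skip folders that are not Git checkouts
    [ -e "${dir}.git" ] || continue
    echo "==> ${dir}"
    # Run in a subshell so the caller's working directory is untouched;
    # stop at the first repo where the command fails
    ( cd "$dir" && "$@" ) || return 1
  done
  return 0
}
```

With all clones under a folder such as ~/code/getpop (a made-up path), a call like for_each_repo ~/code/getpop git commit -am "some message" would commit in each of them in turn.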

However, meta will not help manage the versioning of each dependency in the packages’ composer.json files. Even though it helps alleviate the pain, it is not a definitive solution.

So, it was time to bring all packages to the same repo.

Stage 3: Monorepo

The monorepo is a single repo that hosts the code for multiple projects. Since it hosts different packages together, we can version control them together too. This way, all packages can be published with the same version, and linked across dependencies. This makes pull requests very simple.

The monorepo hosts multiple packages.

As I mentioned earlier, we are not able to publish PHP packages to Packagist if they are hosted on the same repo. But we can overcome this constraint by decoupling development and distribution of the code: we use the monorepo to host and edit the source code, and multiple repos (at one repo per package) to publish them to Packagist for distribution and consumption.

The monorepo hosts the source code, multiple repos distribute it.

Switching to the Monorepo

Switching to the monorepo approach involved the following steps:

First, I created the folder structure in leoloso/PoP to host the multiple projects. I decided to use a two-level hierarchy, first under layers/ to indicate the broader project, and then under packages/, plugins/, clients/ and whatnot to indicate the category.

The monorepo layers indicate the broader project.

Then, I copied all source code from all repos (getpop/engine, getpop/component-model, etc.) to the corresponding location for that package in the monorepo (i.e. layers/Engine/packages/engine, layers/Engine/packages/component-model, etc.).

I didn’t need to keep the Git history of the packages, so I just copied the files with Finder. Otherwise, we can use hraban/tomono or shopsys/monorepo-tools to port repos into the monorepo, while preserving their Git history and commit hashes.
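The plain copy could equally be scripted. Below is a minimal sketch, assuming each package repo is already cloned locally; import_package and its arguments are made-up names for illustration. It copies the files without the .git folder, so the package's history is deliberately left behind.

```shell
#!/bin/sh
# Copy a package's files into the monorepo's two-level layout, dropping its Git history.
# Run from the monorepo root.
# Usage: import_package <path-to-cloned-repo> <layer> <package-name>
import_package() {
  src=$1
  dest="layers/$2/packages/$3"
  mkdir -p "$dest" || return 1
  # "src/." copies the folder's contents, dotfiles included
  cp -R "$src"/. "$dest"/ || return 1
  # The monorepo tracks these files itself, so the old repo metadata is removed
  rm -rf "$dest/.git"
}
```

For example, import_package ../engine Engine engine would place the files under layers/Engine/packages/engine.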

Next, I updated the description of all downstream repos so that it starts with [READ ONLY].

The downstream repo’s “READ ONLY” is located in the repo description.

I executed this task in bulk via GitHub’s GraphQL API. I first obtained all of the descriptions from all of the repos, with this query:

{
  repositoryOwner(login: "getpop") {
    repositories(first: 100) {
      nodes {
        id
        name
        description
      }
    }
  }
}

…which returned a list like this:

{
  "data": {
    "repositoryOwner": {
      "repositories": {
        "nodes": [
          {
            "id": "MDEwOlJlcG9zaXRvcnkxODQ2OTYyODc=",
            "name": "hooks",
            "description": "Contracts to implement hooks (filters and actions) for PoP"
          },
          {
            "id": "MDEwOlJlcG9zaXRvcnkxODU1NTQ4MDE=",
            "name": "root",
            "description": "Declaration of dependencies shared by all PoP components"
          },
          {
            "id": "MDEwOlJlcG9zaXRvcnkxODYyMjczNTk=",
            "name": "engine",
            "description": "Engine for PoP"
          }
        ]
      }
    }
  }
}

From there, I copied all descriptions, added [READ ONLY] to them, and for every repo generated a new query executing the updateRepository GraphQL mutation:

mutation {
  updateRepository(
    input: {
      repositoryId: "MDEwOlJlcG9zaXRvcnkxODYyMjczNTk="
      description: "[READ ONLY] Engine for PoP"
    }
  ) {
    repository {
      description
    }
  }
}
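Writing one mutation per repo by hand gets tedious, so the generation step can be scripted. The following is a minimal shell sketch, not the script actually used: print_mutation is a hypothetical helper that takes a repo ID and its current description (as returned by the first query) and prints the corresponding updateRepository mutation. Descriptions that already carry the prefix are left as they are.

```shell
#!/bin/sh
# Print an updateRepository mutation that prefixes a description with [READ ONLY].
# Usage: print_mutation <repository-id> <current-description>
print_mutation() {
  id=$1
  description=$2
  # Do not add the prefix twice
  case $description in
    "[READ ONLY]"*) ;;
    *) description="[READ ONLY] $description" ;;
  esac
  # Escape backslashes and double quotes for the GraphQL string literal
  escaped=$(printf '%s' "$description" | sed 's/\\/\\\\/g; s/"/\\"/g')
  cat <<EOF
mutation {
  updateRepository(
    input: {
      repositoryId: "$id"
      description: "$escaped"
    }
  ) {
    repository {
      description
    }
  }
}
EOF
}
```

Calling print_mutation "MDEwOlJlcG9zaXRvcnkxODYyMjczNTk=" "Engine for PoP" prints the mutation shown above. How each mutation is then sent to the API (and with which token) is left out of this sketch.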

Finally, I introduced tooling to help “split the monorepo.” Using a monorepo relies on synchronizing the code between the upstream monorepo and the downstream repos, triggered whenever a pull request is merged. This action is called “splitting the monorepo.” Splitting the monorepo can be achieved with a git subtree split command but, because I’m lazy, I’d rather use a tool.

I chose Monorepo builder, which is written in PHP. I like this tool because I can customize it with my own functionality. Other popular tools are the Git Subtree Splitter (written in Go) and Git Subsplit (bash script).
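For comparison, the manual route with git subtree split could be sketched as below. This is not what Monorepo builder does internally; the split_package function, the temporary branch name, the target branch (master) and the remote URL are all assumptions. By default the script only prints the commands (a dry run); set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Print (or, with DRY_RUN=0, execute) a command
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Split one folder of the monorepo into its own history and push it downstream.
# Usage: split_package <folder-in-monorepo> <downstream-remote-url>
split_package() {
  prefix=$1
  remote=$2
  branch="split/$(basename "$prefix")"
  # Synthesize a history containing only the contents of $prefix
  # on a temporary local branch
  run git subtree split --prefix="$prefix" -b "$branch" || return 1
  # Publish that history as the downstream repo's master branch
  run git push "$remote" "$branch:master" || return 1
  # Remove the temporary branch
  run git branch -D "$branch"
}
```

For instance, split_package layers/Engine/packages/engine git@github.com:getpop/engine.git prints the three git commands that would sync that package to its downstream repo.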

What I like about the Monorepo

I feel at home with the monorepo. The speed of development has improved because dealing with 200 packages feels pretty much like dealing with just one. The boost is most evident when refactoring the codebase, i.e. when executing updates across many packages.

The monorepo also allows me to release multiple WordPress plugins at once. All I need to do is provide a configuration to GitHub Actions via PHP code (when using the Monorepo builder) instead of hard-coding it in YAML.

To generate a WordPress plugin for distribution, I had created a generate_plugins.yml workflow that triggers when creating a release. With the monorepo, I have adapted it to generate not just one, but multiple plugins, configured via PHP through a custom command, plugin-config-entries-json, and invoked like this in GitHub Actions:

- id: output_data
  run: |
    echo "::set-output name=plugin_config_entries::$(vendor/bin/monorepo-builder plugin-config-entries-json)"
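A note for readers setting this up today: GitHub has since deprecated the ::set-output workflow command in favor of appending name=value lines to the file whose path is given by the GITHUB_OUTPUT environment variable. A minimal sketch of the equivalent step body follows; set_step_output is a made-up helper name, and the sketch assumes the command prints its JSON on a single line (multi-line values need GitHub's delimiter syntax instead).

```shell
#!/bin/sh
# Append a single-line step output to the file GitHub Actions exposes
# through $GITHUB_OUTPUT (the replacement for "::set-output").
set_step_output() {
  name=$1
  value=$2
  printf '%s=%s\n' "$name" "$value" >> "$GITHUB_OUTPUT"
}

# Inside the workflow step, the call would be:
#   set_step_output plugin_config_entries "$(vendor/bin/monorepo-builder plugin-config-entries-json)"
```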

With this workflow, I can generate my GraphQL API plugin and other plugins hosted in the monorepo all at once. The configuration defined via PHP is this one:

class PluginDataSource
{
    public function getPluginConfigEntries(): array
    {
        return [
            // GraphQL API for WordPress
            [
                'path' => 'layers/GraphQLAPIForWP/plugins/graphql-api-for-wp',
                'zip_file' => 'graphql-api.zip',
                'main_file' => 'graphql-api.php',
                'dist_repo_organization' => 'GraphQLAPI',
                'dist_repo_name' => 'graphql-api-for-wp-dist',
            ],
            // GraphQL API - Extension Demo
            [
                'path' => 'layers/GraphQLAPIForWP/plugins/extension-demo',
                'zip_file' => 'graphql-api-extension-demo.zip',
                'main_file' => 'graphql-api-extension-demo.php',
                'dist_repo_organization' => 'GraphQLAPI',
                'dist_repo_name' => 'extension-demo-dist',
            ],
        ];
    }
}

When creating a release, the plugins are generated via GitHub Actions.

This figure shows plugins generated when a release is created.

If, in the future, I add the code for yet another plugin to the repo, it will also be generated without any trouble. Investing some time and energy producing this setup now will definitely save plenty of time and energy in the future.

Issues with the Monorepo

I believe the monorepo is particularly useful when all packages are coded in the same programming language, tightly coupled, and relying on the same tooling. If instead we have multiple projects based on different programming languages (such as JavaScript and PHP), composed of unrelated parts (such as the main website code and a subdomain that handles newsletter subscriptions), or relying on different tooling (such as PHPUnit and Jest), then I don’t believe the monorepo provides much of an advantage.

That said, there are downsides to the monorepo:

We must use the same license for all of the code hosted in the monorepo; otherwise, we’re unable to add a LICENSE.md file at the root of the monorepo and have GitHub pick it up automatically. Indeed, leoloso/PoP initially provided several libraries using MIT and the plugin using GPLv2. So, I decided to simplify it using the lowest common denominator between them, which is GPLv2.
There’s a lot of code, a lot of documentation, and plenty of issues, all from different projects. As such, potential contributors that were attracted to a specific project can easily get confused.
When tagging the code, every package is versioned with that tag, whether or not its particular code was updated. This is an issue with the Monorepo builder and not necessarily with the monorepo approach (Symfony has solved this problem for its monorepo).
The issues board needs proper management. In particular, it requires labels to assign issues to the corresponding project, or it risks becoming chaotic.

The issues board can become chaotic without labels that are associated with projects.

All these issues are not roadblocks though. I can cope with them. However, there is an issue that the monorepo cannot help me with: hosting both public and private code together.

I’m planning to create a “PRO” version of my plugin which I plan to host in a private repo. However, the code in a repo is either public or private, so I’m unable to host my private code in the public leoloso/PoP repo. At the same time, I want to keep using my setup for the private repo too, particularly the generate_plugins.yml workflow (which already scopes the plugin and downgrades its code from PHP 8.0 to 7.1) and the possibility of configuring it via PHP. And I want to keep it DRY, avoiding copy/pastes.

It was time to switch to the multi-monorepo.

Stage 4: Multi-monorepo

The multi-monorepo approach consists of different monorepos sharing their files with each other, linked via Git submodules. At its most basic, a multi-monorepo comprises two monorepos: an autonomous upstream monorepo, and a downstream monorepo that embeds the upstream repo as a Git submodule and can therefore access its files:

The upstream monorepo is contained within the downstream monorepo.

This approach satisfies my requirements by:

having the public repo leoloso/PoP be the upstream monorepo, and
creating a private repo leoloso/GraphQLAPI-PRO that serves as the downstream monorepo.

A private monorepo can access the files from a public monorepo.

leoloso/GraphQLAPI-PRO embeds leoloso/PoP under subfolder submodules/PoP (notice how GitHub links to the specific commit of the embedded repo):

This figure shows how the public monorepo is embedded within the private monorepo in the GitHub project.

Now, leoloso/GraphQLAPI-PRO can access all the files from leoloso/PoP. For instance, script ci/downgrade/downgrade_code.sh from leoloso/PoP (which downgrades the code from PHP 8.0 to 7.1) can be accessed under submodules/PoP/ci/downgrade/downgrade_code.sh.
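Since those files only exist once the submodule has been checked out, downstream scripts can guard against a missing checkout before calling into the upstream repo. A minimal sketch, in which upstream_file and the SUBMODULE_DIR override are hypothetical and submodules/PoP is the embedding folder mentioned above:

```shell
#!/bin/sh
# Resolve a path inside the embedded upstream monorepo, failing with a hint
# when the submodule has not been checked out.
# Usage: upstream_file <path-relative-to-the-upstream-root>
upstream_file() {
  file="${SUBMODULE_DIR:-submodules/PoP}/$1"
  if [ ! -e "$file" ]; then
    echo "Missing $file (try: git submodule update --init --recursive)" >&2
    return 1
  fi
  printf '%s\n' "$file"
}
```

A downstream script could then start with script=$(upstream_file ci/downgrade/downgrade_code.sh) || exit 1 and fail early with a readable message instead of a "file not found" error further down.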

In addition, the downstream repo can load the PHP code from the upstream repo and even extend it. This way, the configuration to generate the public WordPress plugins can be overridden to produce the PRO plugin versions instead:

class PluginDataSource extends UpstreamPluginDataSource
{
    public function getPluginConfigEntries(): array
    {
        return [
            // GraphQL API PRO
            [
                'path' => 'layers/GraphQLAPIForWP/plugins/graphql-api-pro',
                'zip_file' => 'graphql-api-pro.zip',
                'main_file' => 'graphql-api-pro.php',
                'dist_repo_organization' => 'GraphQLAPI-PRO',
                'dist_repo_name' => 'graphql-api-pro-dist',
            ],
            // GraphQL API Extensions
            // Google Translate
            [
                'path' => 'layers/GraphQLAPIForWP/plugins/google-translate',
                'zip_file' => 'graphql-api-google-translate.zip',
                'main_file' => 'graphql-api-google-translate.php',
                'dist_repo_organization' => 'GraphQLAPI-PRO',
                'dist_repo_name' => 'graphql-api-google-translate-dist',
            ],
            // Events Manager
            [
                'path' => 'layers/GraphQLAPIForWP/plugins/events-manager',
                'zip_file' => 'graphql-api-events-manager.zip',
                'main_file' => 'graphql-api-events-manager.php',
                'dist_repo_organization' => 'GraphQLAPI-PRO',
                'dist_repo_name' => 'graphql-api-events-manager-dist',
            ],
        ];
    }
}

GitHub Actions will only load workflows from under .github/workflows, and the upstream workflows are under submodules/PoP/.github/workflows; hence we need to copy them. This is not ideal, though we can avoid editing the copied workflows and treat the upstream files as the single source of truth.

To copy the workflows over, a simple Composer script can do:

{
  "scripts": {
    "copy-workflows": [
      "php -r \"copy('submodules/PoP/.github/workflows/generate_plugins.yml', '.github/workflows/generate_plugins.yml');\"",
      "php -r \"copy('submodules/PoP/.github/workflows/split_monorepo.yaml', '.github/workflows/split_monorepo.yaml');\""
    ]
  }
}

Then, every time I edit the workflows in the upstream monorepo, I also copy them to the downstream monorepo by executing the following command:

composer copy-workflows

Once this setup is in place, the private repo generates its own plugins by reusing the workflow from the public repo:

This figure shows the PRO plugins generated in GitHub Actions.

I am extremely satisfied with this approach. I feel it has removed all of the burden from my shoulders concerning the way projects are managed. I read about a WordPress plugin author complaining that managing the releases of his 10+ plugins was taking a considerable amount of time. That doesn’t happen here: after I merge my pull request, both public and private plugins are generated automatically, like magic.

Issues with the multi-monorepo

First off, it leaks. Ideally, leoloso/PoP should be completely autonomous and unaware that it is used as an upstream monorepo in a grander scheme, but that’s not the case.

When doing git checkout, the downstream monorepo must pass the --recurse-submodules option so as to also check out the submodules. In the GitHub Actions workflows for the private repo, the checkout must be done like this:

- uses: actions/checkout@v2
  with:
    submodules: recursive

Consequently, we have to add submodules: recursive to the downstream workflow, but not to the upstream one, even though they both use the same source file.

To solve this while maintaining the public monorepo as the single source of truth, the workflows in leoloso/PoP receive the value for submodules via an environment variable, CHECKOUT_SUBMODULES, like this:

env:
  CHECKOUT_SUBMODULES: ""

jobs:
  provide_data:
    steps:
      - uses: actions/checkout@v2
        with:
          submodules: ${{ env.CHECKOUT_SUBMODULES }}

The environment value is empty for the upstream monorepo, so doing submodules: "" works well. And then, when copying over the workflows from upstream to downstream, I replace the value of the environment variable with "recursive" so that it becomes:

env:
  CHECKOUT_SUBMODULES: "recursive"

(I have a PHP command to do the replacement, but we could also pipe sed in the copy-workflows Composer script.)
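A minimal sketch of the sed variant, assuming the env entry sits on a single line as in the snippets above. The copy_workflow name is made up, and this is not the PHP command just mentioned:

```shell
#!/bin/sh
# Copy an upstream workflow downstream, switching CHECKOUT_SUBMODULES to "recursive".
# Run from the root of the downstream monorepo.
# Usage: copy_workflow <workflow-file-name>
copy_workflow() {
  src="submodules/PoP/.github/workflows/$1"
  dest=".github/workflows/$1"
  mkdir -p .github/workflows || return 1
  # Keep the indentation and the key, replace whatever value follows the colon
  sed 's/^\([[:space:]]*CHECKOUT_SUBMODULES:\).*$/\1 "recursive"/' "$src" > "$dest"
}
```

Running copy_workflow generate_plugins.yml and copy_workflow split_monorepo.yaml would then stand in for the two php -r "copy(...)" entries. Only lines that begin with the CHECKOUT_SUBMODULES key are rewritten, so the ${{ env.CHECKOUT_SUBMODULES }} reference inside the job is left untouched.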

This leakage reveals another issue with this setup: I must review all contributions to the public repo before they are merged, or they could break something downstream. The contributors would also be completely unaware of those leakages (and they couldn’t be blamed for it). This situation is specific to the public/private-monorepo setup, where I am the only person who is aware of the full setup. While I share access to the public repo, I am the only one accessing the private one.

As an example of how things could go wrong, a contributor to leoloso/PoP might remove CHECKOUT_SUBMODULES: "" since it is superfluous. What the contributor doesn’t know is that, while that line is not needed upstream, removing it will break the private repo.

I guess I need to add a warning!

env:
  ### ☠️ Do not delete this line! Or bad things will happen! ☠️
  CHECKOUT_SUBMODULES: ""

Wrapping up

My repo has gone through quite a journey, being adapted to the new requirements of my code and application at different stages:

It started as a single repo, hosting a monolithic app.
It became a multirepo when splitting the app into packages.
It was switched to a monorepo to better manage all the packages.
It was upgraded to a multi-monorepo to share files with a private monorepo.

Context means everything, so there is no “best” approach here, only solutions that are more or less suitable to different scenarios.

Has my repo reached the end of its journey? Who knows? The multi-monorepo satisfies my current requirements, but it hosts all private plugins together. If I ever need to grant contractors access to a specific private plugin, while preventing them from accessing other code, then the monorepo may not be the ideal solution for me, and I’ll need to iterate again.

I hope you have enjoyed the journey. And, if you have any ideas or examples from your own experiences, I’d love to hear about them in the comments.

The post From a Single Repo, to Multi-Repos, to Monorepo, to Multi-Monorepo appeared first on CSS-Tricks.
