Define subtrees and submodules
Now you are going to learn an important skill. You will work on all sorts of projects on GitHub, and some may be very extensive. They may require libraries from several repositories.
How can we integrate code libraries from other repositories into our own repo?
To use these libraries without disrupting your work, you will have to split your codebase into separate modules. This is called modularization. Git offers two ways to achieve it: subtrees and submodules. Depending on the requirements of the project, one may be more useful than the other.
What are subtrees?
If you think of your Git repository as a tree, imagine a subtree as a smaller version inside your main tree.
It's kind of like a file directory with subdirectories inside. Let's say that you have a beautiful video game, and it uses a backdrop with happy trees and animals, but they are defined in a code library in another repository. You'd use a subtree to bring a copy of that other repository, with its trees and animals, into your own. Once that copy is integrated into your repository, you can work with it like the rest of your code.
✅ The benefits to using a subtree:
It's straightforward, efficient, and easy to use.
Since it is stored as commits, it's easy to use with your repository.
It provides all of the same functions a normal repository would.
If you need to change things up:
the repository provides easy access to where the subtree is integrated
if you want cleaner code, you can squash all the commits
it is easily reverted with Git
❌ When won't a subtree work for you?
You have many complex dependencies.
Your dependencies need constant updating.
Constant updates and working with other dependencies can get messy!
Getting started with subtrees
You remember how to clone a project, right? Adding a subtree is very similar: instead of git clone, you run git subtree add from Git Bash, using the desired repository's HTTPS URL on GitHub. This gives you a copy of that external repository inside your own repository. Let's walk through the steps!
First, it's important to create a remote connection to your external repo from Git Bash. Doing this first allows you to refer to the repository by a short name without having to type its address every time. Let's call this remote forest. The subdirectory that will hold your subtree gets the same name when you add the subtree in the next step.
It looks like this:
git remote add forest https://github.com/forestlibraries/animalsandtrees.git
After the remote connection is established, you can add your subtree with a prefix for your subtree's subdirectory.
git subtree add --prefix=forest/ forest master --squash
Notice that for the main repository and the subtree, we are using the master branch. Also notice that I used the --squash option. It brings in the other repository as a single commit instead of its whole commit history. This makes your subtree more manageable, and any new updates that come in are easier to work with.
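The two commands above can be tried safely on throwaway repositories before you touch a real project. The sketch below assumes that git subtree is available in your Git installation. The local library repository, the game project, the file names, and the Demo identity are all hypothetical stand-ins, and a local path replaces the forest URL so that the script runs offline.

```shell
#!/bin/sh
# Sketch: add a local stand-in library to a project as a squashed subtree.
set -eu

# Throwaway identity so commits work without any Git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# 1. A stand-in for the external forest library, with one commit on master.
git init -q "$work/library"
cd "$work/library"
git symbolic-ref HEAD refs/heads/master
echo "oak" > trees.txt
git add trees.txt
git commit -q -m "Add trees"

# 2. The main project (the video game), also on master.
git init -q "$work/game"
cd "$work/game"
git symbolic-ref HEAD refs/heads/master
echo "my game" > README.txt
git add README.txt
git commit -q -m "Start the game"

# 3. The same two commands as in the text, with a local path instead of the URL.
git remote add forest "$work/library"
git subtree add --prefix=forest/ --squash forest master

# The library's files now live in forest/ as ordinary files of the game repo.
ls forest
git log --oneline
```

The log should show your own commit plus the commits that git subtree created for the squashed copy, rather than every commit of the library.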
What about updates?
Your subtree is not directly connected to its original repository. How can you update it from its remote repository? Can you guess the command?
Yes! Shocking, right? 😉 You're basically replacing the add command with pull. Because the subtree was added with --squash, keep that option when you pull:
git subtree pull --prefix=forest/ forest master --squash
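Here is a minimal sketch of a full update cycle: add the subtree, let the external repository gain a commit, then pull it in. It assumes git subtree is installed. The repositories, file names, and Demo identity are hypothetical and live in a temporary directory. The pull keeps --squash because the subtree was added that way, and a message is passed with -m so that no editor opens.

```shell
#!/bin/sh
# Sketch: update a squashed subtree after the external repository changes.
set -eu

# Throwaway identity so commits work without any Git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in for the external forest library.
git init -q "$work/library"
cd "$work/library"
git symbolic-ref HEAD refs/heads/master
echo "oak" > trees.txt
git add trees.txt
git commit -q -m "Add trees"

# Main project with the library added as a subtree.
git init -q "$work/game"
cd "$work/game"
git symbolic-ref HEAD refs/heads/master
echo "my game" > README.txt
git add README.txt
git commit -q -m "Start the game"
git remote add forest "$work/library"
git subtree add --prefix=forest/ --squash forest master

# The external library moves on without us.
cd "$work/library"
echo "fox" > animals.txt
git add animals.txt
git commit -q -m "Add animals"

# Back in the game: the subtree does not change until we pull.
cd "$work/game"
git subtree pull --prefix=forest/ --squash -m "Update forest subtree" forest master

# forest/ now also contains animals.txt.
ls forest
```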
But how do I push my local repo to remote with the subtree?
Make sure that all of the subtree's files are in its subdirectory. Then switch to master if you are on another branch, using git checkout master. Then use the command:
git push origin master
Now you're all set!
Submodules work a little bit differently, and depending on the project, they may be useful to some coders. So far, you've seen that with subtrees, you pull an entire codebase from another repository into your own. It's a seamless way to manage it if you understand basic Git.
Now submodules are another ballgame. Let's take a look at how they work.
Let's go back to the animal and tree code library example called forest. You need to connect back to the forest repository to pick up and integrate some specific trees and animals into the background images of your game. What is another way to handle this?
With submodules, the external code is not stored in your repository's own history. Rather, your repository includes pointers (or references) that connect back to the forest repository on GitHub. Each pointer combines the repository's URL with a specific commit in that external repository.
How do we get started?
Using the git submodule command, create a submodule that will save the path and URL references in a file called .gitmodules.
Here is the workflow:
Open Git Bash to your project directory.
Go to your external repository on GitHub, and copy the HTTPS clone URL.
On Git Bash, type in:
git submodule add https://github.com/yoursubmoduleclonelink.git
After the submodule is cloned, Git creates a folder for it in your project.
Initialize the submodule in the local repository by typing in:
git submodule init
git submodule update
Now you are ready to use those submodules in your code. Just like with subtrees, the entire clone is in a subfolder. Submodules are different because the main repository contains a .gitmodules file that holds references to that external repository.
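The workflow above can be sketched end to end on throwaway repositories. Everything here is a hypothetical stand-in: the local library repository replaces the GitHub URL, and the file names and Demo identity are made up. Recent Git versions restrict local-path submodules by default, so the sketch passes -c protocol.file.allow=always. With a real HTTPS URL that setting should not be needed.

```shell
#!/bin/sh
# Sketch: add a local stand-in library as a submodule and inspect the result.
set -eu

# Throwaway identity so commits work without any Git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in for the external forest library.
git init -q "$work/library"
cd "$work/library"
git symbolic-ref HEAD refs/heads/master
echo "oak" > trees.txt
git add trees.txt
git commit -q -m "Add trees"

# Main project.
git init -q "$work/game"
cd "$work/game"
git symbolic-ref HEAD refs/heads/master
echo "my game" > README.txt
git add README.txt
git commit -q -m "Start the game"

# Add the submodule into the folder forest/, then initialize and update it.
git -c protocol.file.allow=always submodule add "$work/library" forest
git submodule init
git -c protocol.file.allow=always submodule update
git commit -q -m "Add forest as a submodule"

# .gitmodules is a plain text file recording the path and the URL.
cat .gitmodules

# The main repository stores only a commit reference for forest/
# (an entry of type "commit"), not the library's files.
git ls-tree HEAD forest
```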
What makes submodules different?
When you cloned the repository into a subfolder and created the references in the .gitmodules file, this all happened on the latest commit that was in that external repo. Your new commit in the main repository does not contain the actual source code of that external repository! It only contains the references. You are now frozen at that commit unless you update the submodule.
That's right! Changes in the external repository will not automatically update in your submodule. You will have to run an update manually if you want your submodule to point to a newer commit in that external repository.
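To bring your local copy in line with the commit recorded in the main repository, you run the same commands you used when introducing the submodule to the project:
git submodule init
git submodule update
To move the submodule to the newest commit of the external repository itself, run git submodule update --remote, then commit the changed reference in your main repository.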
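As a hedged illustration of that manual update: a plain git submodule update checks out the commit the main repository already records, while the --remote option is the usual way to move to a newer commit of the external repository. The sketch below uses hypothetical local repositories, file names, and a Demo identity. The -c protocol.file.allow=always setting is only there because the stand-in library is a local path.

```shell
#!/bin/sh
# Sketch: point a submodule at a newer commit of its external repository.
set -eu

# Throwaway identity so commits work without any Git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in for the external forest library.
git init -q "$work/library"
cd "$work/library"
git symbolic-ref HEAD refs/heads/master
echo "oak" > trees.txt
git add trees.txt
git commit -q -m "Add trees"

# Main project with forest as a submodule.
git init -q "$work/game"
cd "$work/game"
git symbolic-ref HEAD refs/heads/master
echo "my game" > README.txt
git add README.txt
git commit -q -m "Start the game"
git -c protocol.file.allow=always submodule add "$work/library" forest
git commit -q -m "Add forest as a submodule"

# The external library gains a commit. The submodule stays frozen.
cd "$work/library"
echo "fox" > animals.txt
git add animals.txt
git commit -q -m "Add animals"

# Fetch the newer commit into the submodule, then record the new
# reference in the main repository.
cd "$work/game"
git -c protocol.file.allow=always submodule update --remote forest
git add forest
git commit -q -m "Point forest at the newer commit"

ls forest
```

The final commit matters: without it, the main repository would still reference the older commit, and teammates would keep getting the old version.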
Understand the best use-cases for subtrees and submodules
Here's a quick refresher:
Subtrees are copies of external repositories put into subfolders in a main repository. Subtrees contain the source code and, unless you squash it, the entire commit history of that external repository.
Submodules are references to a specific commit of an external repository. The main repository does not store the submodule's source code, and only links to the commit that was the latest at the time you added and initialized that submodule.
So, do I choose subtrees or submodules?
Subtrees are more efficient, especially when working on team projects. They contain source code and commit history, so they are easy to work with as a subdirectory. It is possible to keep the entire commit history in the subtree if you don't squash it when you add your subtree.
It is also easy to see the code that your repository is linking to, and update it to later commits if required. Lastly, when cloning a project with a subtree, the source code will be a part of that clone, so there are no additional steps necessary. Subtrees are often considered the easier option because they work seamlessly with the main repository. 👍
On the other hand, submodules use references and stay on the commit that was recorded. The main repository does not save the source code of the external repository, only references to it. Moreover, the source code is not carried along when the main repository is cloned, so fetching it requires additional steps. Since a submodule links to a repository hosted on a service such as GitHub, your ability to use that submodule disappears if that repository does. 😧 Submodules can be used if space is an issue, because the main repository doesn't hold the entire external repository.
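The "additional steps" after cloning can be seen in a small experiment. In this sketch a hypothetical teammate clones a project that uses a submodule: the forest folder starts out empty and is only filled in after the init and update commands. All repositories, names, and the Demo identity are made up, and -c protocol.file.allow=always is only needed because the stand-in library is a local path.

```shell
#!/bin/sh
# Sketch: what a teammate sees when cloning a project that uses a submodule.
set -eu

# Throwaway identity so commits work without any Git configuration (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in for the external forest library.
git init -q "$work/library"
cd "$work/library"
git symbolic-ref HEAD refs/heads/master
echo "oak" > trees.txt
git add trees.txt
git commit -q -m "Add trees"

# Main project with forest as a submodule.
git init -q "$work/game"
cd "$work/game"
git symbolic-ref HEAD refs/heads/master
echo "my game" > README.txt
git add README.txt
git commit -q -m "Start the game"
git -c protocol.file.allow=always submodule add "$work/library" forest
git commit -q -m "Add forest as a submodule"

# A teammate clones the main project.
git clone -q "$work/game" "$work/teammate"
cd "$work/teammate"

# Right after the clone, forest/ exists but holds no files.
if [ -z "$(ls -A forest)" ]; then state_before=empty; else state_before=populated; fi
echo "forest/ right after clone: $state_before"

# The additional steps: initialize and update the submodule.
git submodule init
git -c protocol.file.allow=always submodule update

ls forest
```

With a subtree, the same clone would already contain the files in forest/, which is the difference the comparison above describes.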
Subtrees will clone an external repository, so it can be used in your main repository.
Submodules will clone an external repository, and use a link when referencing it in the main repository.
Subtrees allow the user to edit the external repository's code directly inside the main repository.
The copy of the submodules in your main repository will need to be updated when the external repository has changed.