How to manage Git Submodules with JGit

Home  >>  Common  >>  How to manage Git Submodules with JGit

How to manage Git Submodules with JGit

On April 16, 2014, Posted by , In Common,Eclipse, With No Comments

For a larger project with Git you may find yourself wanting to share code among multiple repositories. Whether it is a shared library between projects or perhaps templates and such used among multiple different products. The Git built-in answer to this problem are submodules. They allow putting a clone of another repository as a subdirectory within a parent repository (sometimes also referred to as the superproject). A submodule is a repository in its own. You can commit, branch, rebase, etc. from inside it, just as with any other repository.

JGit offers an API that implements most of the Git submodule commands. And this API it is I would like to introduce you to.

The Setup

The code snippets used throughout this article are written as learning tests1. Simple tests can help to understand how third-party code works and adopting new APIs. They can be viewed as controlled experiments that allow you to discover exactly how the third-party code behaves.

A helpful side effect is that, if you keep the tests, they can help you to verify new releases of the third-party code. If your tests cover how you use the library, then incompatible changes in the third-party code will show themselves early on.

Back to the topic at hand: all tests share the same setup. See the full source code for details. There is an empty repository called parent. Next to it there is a library repository. The tests will add this as a submodule to the parent. The library repository has an initial commit with a file named readme.txt in it. A setUp method creates both repositories like so:

Git git = Git.init().setDirectory( "/tmp/path/to/repo" ).call();

The repositories are represented through the fields parent and library of type Git. This class wraps a repository and gives access to all Commands available in JGit. As I explained here earlier, each Command class corresponds to a native Git pocelain command. To invoke a command the builder pattern is used. For example, the result from the Git.commit() method is actually a CommitCommand. After providing any necessary arguments you can invoke its call() method.

Add a Submodule

The first and obvious step is to add a submodule to an existing repository. Using the setup outlined above, the library repository should be added as a submodule in the modules/library directory of the parent repository.

@Test
public void testAddSubmodule() throws Exception {
  String uri 
    = library.getRepository().getDirectory().getCanonicalPath();
  SubmoduleAddCommand addCommand = parent.submoduleAdd();
  addCommand.setURI( uri );
  addCommand.setPath( "modules/library" );
  Repository repository = addCommand.call();
  repository.close();
  
  F‌ile workDir = parent.getRepository().getWorkTree();
  F‌ile readme = new F‌ile( workDir, "modules/library/readme.txt" );
  F‌ile gitmodules = new F‌ile( workDir, ".gitmodules" );
  assertTrue( readme.isF‌ile() );
  assertTrue( gitmodules.isF‌ile() );
}

The two things the SubmoduleAddCommand needs to know are from where the submodule should be cloned and a where it should be stored. The URI (shouldn’t it be called URL?) attribute denotes the location of the repository to clone from as it would be given to the clone command. And the path attribute specifies in which directory – relative to the parent repositories’ work directory root – the submodule should be placed. After the commands was run, the work directory of the parent repository looks like this:

The library repository is placed in the modules/library directory and its work tree is checked out. call() returns a Repository object that you can use like a regular repository. This also means that you have to explicitly close the returned repository to avoid leaking file handles.

The image reveals that the SubmoduleAddCommand did one more thing. It created a .gitmodules file in the root of the parent repository work directory and added it to the index.

[submodule "modules/library"]
path = modules/library
url = git@example.com:path/to/lib.git

If you ever looked into a Git config file you will recognize the syntax. The file lists all the submodules that are referenced from this repository. For each submodule it stores the mapping between the repository’s URL and the local directory it was pulled into. Once this file is committed and pushed, everyone who clones the repository knows where to get the submodules from (later more on that).

Inventory

Once we have added a submodule we may want to know that it is actually known by the parent repository. The first test did a naive check in that it verified that certain files and directories existed. But there is also an API to list the submodules of a repository. This is what the code below does:

@Test
public void testListSubmodules() throws Exception {
  addLibrarySubmodule();
  
  Map<String,SubmoduleStatus> submodules 
    = parent.submoduleStatus().call();
  
  assertEquals( 1, submodules.size() );
  SubmoduleStatus status = submodules.get( "modules/library" );
  assertEquals( INITIALIZED, status.getType() );
}

The SubmoduleStatus command returns a map of all the submodules in the repository where the key is the path to the submodule and the value is a SubmoduleStatus. With the above code we can verify that the just added submodule is actually there and INITIALIZED. The command also allows to add one or more paths to limit the status reporting to.

Speaking of status, JGit’s StatusCommand isn’t at the the same level as native Git. Submodules are always treated as if the command was run with ‐‐ignore-submodules=dirty: changes to the work directory of submodules are ignored.

Updating a Submodule

Submodules always point to a specific commit of the repository that they represent. Someone who clones the parent repository somewhen in the future will get the exact same submodule state although the submodule may have new commits upstream.

In order to change the revision, you must explicitly update a submodule like outlined here:

@Test
public void testUpdateSubmodule() throws Exception {
  addLibrarySubmodule();
  ObjectId newHead = library.commit().setMessage( "msg" ).call();
  
  File workDir = parent.getRepository().getWorkTree();
  Git libModule = Git.open( new F‌ile( workDir, "modules/library" ) );
  libModule.pull().call();
  libModule.close();
  parent.add().addF‌ilepattern( "modules/library" ).call();
  parent.commit().setMessage( "Update submodule" ).call();
    
  assertEquals( newHead, getSubmoduleHead( "modules/library" ) );
}

This rather lengthy snippet first commits something to the library repository (line 4) and then updates the library submodule to the latest commit (line 7 to 9).

To make the update permanent, the submodule must be committed (line 10 and 11). The commit stores the updated commit-id of the submodule under its name (modules/library in this example). Finally you usually want to push the changes to make them available to others.

Updating Changes to Submodules in the Parent Repository

Fetching commits from upstream into the parent repository may also change the submodule configuration. The submodules themselvs, however are not updated automatically.

This is what the SubmoduleUpdateCommand solves. Using the command without further parametrization will update all registered submodules. The command will clone missing submodules and checkout the commit specified in the configuration. Like with other submodule commands, there is an addPath() method to only update submodules within the given paths.

Cloning a Repository with Submodules

You probably got the pattern meanwhile, everything to do with submodules is manual labor. Cloning a repository that has a submodule configuration does not clone the submodules by default. But the CloneCommand has a cloneSubmodules attribute and setting this to true, well, also clones the configured submodules. Internally the SubmoduleInitCommand and SubmoduleUpdateCommand are executed recursively after the (parent) repository was cloned and its work directory was checked out.

Removing a Submodule

To remove a submodule you would expect to write something like
git.submoduleRm().setPath( ... ).call();
Unfortunately, neither native Git nor JGit has a built-in command to remove submodules. Hopefully this will be resolved in the future. Until then we must manually remove submodules. If you scroll down to the removeSubmodule() method you will see that it is no rocket science.

First, the respective submodule section is removed from the .gitsubmodules and .git/config files. Then the submodule entry in the index is also removed. Finally the changes – .gitsubmodules and the removed submodule in the index – are committed and the submodule content is deleted from the work directory.

For-Each Submodule

Native Git offers the git submodule foreach command to execute a shell command for each submodule. While JGit doesn’t exactly support such a command, it offers the SubmoduleWalk. This class can be used to iterate over the submodules in a repository. The following example fetches upstream commits for all submodules.

@Test
public void testSubmoduleWalk() throws Exception {
  addLibrarySubmodule();

  int submoduleCount = 0;
  Repository parentRepository = parent.getRepository();
  SubmoduleWalk walk = SubmoduleWalk.forIndex( parentRepository );
  while( walk.next() ) {
    Repository submoduleRepository = walk.getRepository();
    Git.wrap( submoduleRepository ).fetch().call();
    submoduleRepository.close();
    submoduleCount++;
  }
  walk.release();
  
  assertEquals( 1, submoduleCount );
}

With next() the walk can be advanced to the next submodule. The method returns false if there are no more submodules. When done with a SubmoduleWalk, its allocated resources should be freed by calling release(). Again, if you obtain a Repository instance for a submodule do not forget to close it.

The SubmoduleWalk can also be used to gather detailed information about submodules.
Most of its getters relate to properties of the current submodule like path, head, remote URL, etc.

Sync Remote URLs

We have seen before that submodule configurations are stored in the .gitsubmodules file at the root of the repository work directory. Well, at least the remote URL can be overridden in .git/config. And then there is the config file of the submodule itself. This in turn can have yet another remote URL. The SubmoduleSyncCommand can be used to reset all remote URLs to the settings in .gitmodules

As you can see, the support for submodules in JGit is almost at level with native Git. Most of its commands are implemented or can be emulated with little effort. And if you find that something is not working or missing you can always ask the friendly and helpful JGit community for assistance.


  1. The term is taken from the section on ‘Exploring and Learning Boundaries’ in Clean Code by Robert C. Martin

Rüdiger Herrmann

Rüdiger started writing software more than 20 years ago. From the very beginning he focused on a pragmatic approach to writing and delivering software.

Currently he is co-leading the gonsole project. His interests include promoting test driven development and agile methods to deliver clean code that works.

(EMail: rherrmann@codeaffine.com)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>