How to Clone Git Repositories with JGit

Posted on November 30, 2015, in Eclipse

Whatever you plan to do with an existing repository, first a clone has to be created. Whether you plan to contribute or just want to peek at its history, a local copy of the repository is needed.

While cloning a repository with JGit isn’t particularly difficult, there are a few details that might be worth noting. And because there are few online resources on the subject, this article summarizes how to use the JGit API to clone from an existing Git repository.

Cloning Basics

To make a local copy of a remote repository, the CloneCommand needs at least to be told where the remote is to be found:

Git git = Git.cloneRepository()
  .setURI( "https://github.com/eclipse/jgit.git" )
  .call();

The Git factory class has a static cloneRepository() method that returns a new instance of a CloneCommand. setURI() tells it where to clone from and, as with all JGit commands, the call() method actually executes the command.

Though remote repositories – like the name suggests – are usually stored on a remote host, setURI() can also specify a path to a local resource.

If no more information is given, JGit will choose the directory in which the cloned repository will be stored for you. Based on the current directory and the repository name that is derived from its URL, a directory name is built. In the example above it would be ‘/path/to/current/jgit’.
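As a rough illustration of how such a default could be derived, the sketch below builds a directory from the current directory and the last path segment of the URL. The class and method names are made up for this example and the logic is a simplification; JGit's own derivation (see URIish.getHumanishName()) covers more cases.

```java
import java.io.File;

public class CloneDirectoryName {

  // Hypothetical helper: returns the last path segment of the URL
  // without trailing slashes and without a ".git" suffix.
  static String humanishName(String uri) {
    String s = stripTrailingSlashes(uri);
    if (s.endsWith(".git")) {
      s = stripTrailingSlashes(s.substring(0, s.length() - ".git".length()));
    }
    // Consider both '/' (URLs, paths) and ':' (scp-like URLs) as separators
    int separator = Math.max(s.lastIndexOf('/'), s.lastIndexOf(':'));
    return s.substring(separator + 1);
  }

  private static String stripTrailingSlashes(String s) {
    while (s.endsWith("/")) {
      s = s.substring(0, s.length() - 1);
    }
    return s;
  }

  // Combines the current directory with the derived repository name
  static File defaultCloneDirectory(String uri) {
    return new File(System.getProperty("user.dir"), humanishName(uri));
  }

  public static void main(String[] args) {
    // Prints a path ending in "jgit" below the current directory
    System.out.println(defaultCloneDirectory("https://github.com/eclipse/jgit.git"));
  }
}
```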

But usually, you would want to have more control over the destination directory and explicitly state where to store the local clone.

The setDirectory() method specifies where the working directory should be, and setGitDir() sets the location of the metadata directory (.git). If setGitDir() is omitted, the .git directory is created directly underneath the working directory.

The example below

Git git = Git.cloneRepository()
  .setURI( "https://github.com/eclipse/jgit.git" )
  .setDirectory( new File( "/path/to/repo" ) )
  .call();

will create a local repository whose work directory is located at ‘/path/to/repo’ and whose metadata directory is located at ‘/path/to/repo/.git’.

However the destination location is chosen, explicitly through your code or by JGit, the designated directory must either be empty or must not exist. Otherwise, an exception will be thrown.
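If you would rather check the destination up front than deal with the exception, a small helper along these lines may do. It relies on java.io.File only; the class and method names are hypothetical and not part of JGit. Keep in mind that the directory may still change between the check and the actual clone.

```java
import java.io.File;

public class CloneTargetCheck {

  // Returns true if 'dir' does not exist or is an empty directory,
  // i.e. if it fulfills the precondition described above.
  static boolean isUsableCloneTarget(File dir) {
    if (!dir.exists()) {
      return true;
    }
    // list() returns null if 'dir' is not a directory or cannot be read
    String[] entries = dir.list();
    return entries != null && entries.length == 0;
  }

  public static void main(String[] args) {
    File target = new File("/path/to/repo");
    System.out.println("usable: " + isUsableCloneTarget(target));
  }
}
```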

The settings for setDirectory(), setGitDir() and setBare() (see below) are forwarded to the InitCommand that is used internally by the CloneCommand. More details on these settings can therefore be found in Initializing Git Repositories with JGit.


The Git instance that is returned by CloneCommand.call() provides access to the repository itself (git.getRepository()) and can be used to execute further commands targeting this repository. When finished using the repository it must be closed (git.close()) or otherwise the application may leak file handles.
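Since Git implements AutoCloseable, a try-with-resources block is one way to make sure that close() is called. The following is a minimal sketch; the class and method names are invented for this example.

```java
import java.io.File;
import java.io.IOException;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.api.errors.GitAPIException;

public class CloneAndClose {

  // Clones 'uri' into 'directory' and returns the name of the branch
  // that was checked out. The Git instance (and with it the underlying
  // repository) is closed when the try block is left.
  static String cloneAndGetBranch(String uri, File directory)
      throws GitAPIException, IOException {
    try (Git git = Git.cloneRepository()
        .setURI(uri)
        .setDirectory(directory)
        .call()) {
      return git.getRepository().getBranch();
    }
  }
}
```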

To later regain a Repository (or Git) instance, the path to the work directory or .git directory is sufficient. The article How to Access a Git Repository with JGit has detailed information on the subject.

Upstream Configuration

As a last step, the clone command updates the configuration file of the local repository to register the source repository as a so-called remote.

When looking at the configuration file (.git/config) the remote section looks like this:

[remote "origin"]
  url = https://github.com/eclipse/jgit.git
  fetch = +refs/heads/*:refs/remotes/origin/*

If no remote name is given, the default ‘origin’ is used. To have the CloneCommand use a particular name under which the remote repository is registered, use setRemote().

The refspec given by ‘fetch’ determines which branches are transferred by default when fetching from the remote repository and under which local names they are stored.
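To make the effect of the fetch refspec more tangible, here is a simplified sketch of the mapping it describes: the part before the colon selects refs in the remote repository, the part after it names the local tracking refs, and the leading ‘+’ permits non-fast-forward updates. The class below is purely illustrative and handles a single trailing wildcard only; it is not JGit's RefSpec class.

```java
public class FetchRefSpecSketch {

  // Maps a remote ref name to its local tracking ref according to a
  // refspec of the form [+]<source>:<destination>. Returns null if the
  // ref is not matched by the refspec.
  static String mapRef(String refSpec, String remoteRef) {
    String spec = refSpec.startsWith("+") ? refSpec.substring(1) : refSpec;
    int colon = spec.indexOf(':');
    if (colon < 0) {
      throw new IllegalArgumentException("Not a <source>:<destination> refspec: " + refSpec);
    }
    String source = spec.substring(0, colon);
    String destination = spec.substring(colon + 1);
    if (source.endsWith("*") && destination.endsWith("*")) {
      // Wildcard refspec: replace the source prefix with the destination prefix
      String sourcePrefix = source.substring(0, source.length() - 1);
      String destinationPrefix = destination.substring(0, destination.length() - 1);
      if (!remoteRef.startsWith(sourcePrefix)) {
        return null;
      }
      return destinationPrefix + remoteRef.substring(sourcePrefix.length());
    }
    // Plain refspec: only an exact match is mapped
    return source.equals(remoteRef) ? destination : null;
  }

  public static void main(String[] args) {
    String fetch = "+refs/heads/*:refs/remotes/origin/*";
    // prints refs/remotes/origin/master
    System.out.println(mapRef(fetch, "refs/heads/master"));
  }
}
```

With the default refspec, tags such as refs/tags/v1.0 are not matched, which is why mapRef() returns null for them.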

Cloning Branches

By default, the clone command creates a single local branch. It looks at the HEAD ref of the remote repository and creates a local branch with the same name as the remote branch referenced by it.

But the clone command can also be told to clone and check out certain branch(es). Assuming that the remote repository has a branch named ‘extra’, the following lines will clone this branch.

Git git = Git.cloneRepository()
  .setURI( "https://github.com/eclipse/jgit.git" )
  .setDirectory( new File( "/path/to/repo" ) )
  .setBranchesToClone( singleton( "refs/heads/extra" ) )
  .setBranch( "refs/heads/extra" )
  .call();

With setBranchesToClone(), the command clones only the specified branches. Note that the setBranch() directive is necessary to also check out the desired branch. Otherwise, JGit would attempt to check out the ‘master’ branch. While this isn’t a problem from a technical point of view, it is usually not what you want.

If all branches of the remote repository should be cloned, you can advise the command like so:

Git git = Git.cloneRepository()
  .setURI( "https://github.com/eclipse/jgit.git" )
  .setDirectory( new File( "/path/to/repo" ) )
  .setCloneAllBranches( true )
  .call();

To prevent the current branch from being checked out at all, the setNoCheckout() method can be used.


Listing Remote Branches

If you want to know which branches a remote repository has to offer, the LsRemoteCommand comes to the rescue. To list all branches of the JGit repository, use Git’s lsRemoteRepository() as shown below.

Collection<Ref> remoteRefs = Git.lsRemoteRepository()
  .setHeads( true )
  .setRemote( "https://github.com/eclipse/jgit.git" )
  .call();

To list tags as well, call setTags( true ) on the command.

For reasons I would rather not know, JGit requires a local repository for certain protocols in order to list remote refs. In this case, Git.lsRemoteRepository() will throw a NotSupportedException. The workaround is to create a temporary local repository and use git.lsRemote() instead of Git.lsRemoteRepository(), where git wraps the temporary repository.
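The workaround could look roughly like the sketch below. It always goes through a temporary repository rather than reacting to a failure first, and it removes the temporary directory afterwards. The class and method names are made up for this example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Collection;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.api.errors.GitAPIException;
import org.eclipse.jgit.lib.Ref;
import org.eclipse.jgit.util.FileUtils;

public class LsRemoteWorkaround {

  // Lists the branches of the repository at 'uri' with the help of a
  // temporary, empty local repository.
  static Collection<Ref> listRemoteHeads(String uri)
      throws GitAPIException, IOException {
    File tempDir = Files.createTempDirectory("ls-remote").toFile();
    try (Git git = Git.init().setDirectory(tempDir).call()) {
      return git.lsRemote()
          .setRemote(uri)
          .setHeads(true)
          .call();
    } finally {
      // The repository has been closed at this point, so its files can be deleted
      FileUtils.delete(tempDir, FileUtils.RECURSIVE);
    }
  }
}
```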

Cloning Bare Repositories

If the local repository does not need a work directory, the clone command can be instructed to create a bare repository.

By default, non-bare repositories are created, but with setBare( true ) a bare repository is created as shown below:

Git git = Git.cloneRepository()
  .setBare( true )
  .setURI( "https://github.com/eclipse/jgit.git" )
  .setGitDir( new File( "/path/to/repo" ) )
  .call();

Here the destination directory is specified via setGitDir() instead of using setDirectory().
The resulting repository’s isBare() will return true, getGitDir() will return /path/to/repo and since there is no work directory getWorkTree() will throw a NoWorkTreeException.

Note that ‘bare’ here only applies to the destination repository. Whether the source repository is bare or not doesn’t make a difference when cloning.

Cloning Submodules

If the remote repository is known to have submodules or if you wish to include submodules in case there are any, the clone command can be instructed to do so:

Git git = Git.cloneRepository()
  .setCloneSubmodules( true )
  .setURI( "https://github.com/eclipse/jgit.git" )
  .setDirectory( new File( "/path/to/repo" ) )
  .call();

The above example advises the clone command to also clone any submodule that is found.

If setCloneSubmodules( true ) wasn’t specified while cloning the repository, you can catch up on the missing submodules later. For more details see the article How to manage Git Submodules with JGit.

Cloning with Authentication

Of course, JGit also allows accessing repositories that require authentication. Common protocols like SSH and HTTP(S) and their authentication methods are supported. A detailed explanation on how to use authentication support can be found in the JGit Authentication Explained article.

What’s Next

If you are wondering what to do next with a repository, you may want to read Getting Started with JGit. The tutorial explains the most commonly used Git commands and their respective JGit counterparts. It walks through the steps to create a repository, fetch contents from a remote, add and remove files to/from the history, inspect the history, and finally push back the changes to the originating repository.

Concluding How to Clone Git Repositories with JGit

For almost all features of the native Git clone command, there is a counterpart in JGit. There is even a progress monitor, which may be useful when JGit is embedded in interactive applications. And for the missing mirror option, a workaround apparently exists. Only the often requested shallow clones (e.g. git clone --depth 2) weren’t supported by JGit at the time of writing.

The snippets shown throughout this article are excerpts from a learning test that illustrates the common use cases of the CloneCommand. The full version can be found here:
https://gist.github.com/rherrmann/84089f0e38d9eb875601

In order to help with setting up the development environment, you may want to also read An Introduction to the JGit Sources. If you still have difficulties or further questions, please leave a comment or ask the friendly and helpful JGit community for assistance.

Rüdiger Herrmann

45 Comments so far:

  1. renga says:

    In my local Git repository, I have more than one commit for a file. How can I find out what action was performed on the file in a given commit? I mean, how can I know whether, in a particular commit, the file was newly added, modified, or labeled? I am looking for the correct Eclipse API to get these details.

    • Rüdiger Herrmann says:

      I am afraid I don’t quite understand your question. In order to examine what a commit has changed compared to its parent commit, use the DiffCommand. Its call() method returns a list of DiffEntries that describe each changed file.

      • renga says:

        Sorry, I didn’t get your point correctly. My requirement is to find out, for a particular commit, what action has been performed on a particular file. So far, I get the log through RevCommit rev = git.log().call(). From that rev, I am able to get the author name and the commit date and time. Now I want to get the action for that particular file. For example, if I have added the file for the first time, I should get the action ‘Added’. Can you help with this?

        • Rüdiger Herrmann says:

          Did you look at the DiffCommand? It will tell you which files have been added, deleted, and changed in a commit compared to its ancestor commit.

  2. Nara says:

    How do you do a sparse checkout programmatically using the JGit API? I am able to do it on the command line as shown below:
    git config core.sparseCheckout true
    echo "SpecificSubFolder" >> .git/info/sparse-checkout

    appreciate any examples

  3. Daniela says:

    I have two questions so far:

    1) Does git need to be installed in the machine where JGit runs?
    2) How soon will JGit include the implementation for shallow clones?

    btw, thank you for this awesome library and for providing tutorials like this one.

    • Rüdiger Herrmann says:

      Hi Daniela,

      though my stake in providing JGit is minimal, I am nonetheless glad you like the tutorials. JGit is a pure Java library implementing the Git version control system (see https://eclipse.org/jgit/). Therefore, native Git need not be installed. However, be aware that since JGit is an independent implementation of Git, there are (small) gaps and differences between the two.
      I can’t tell how soon a certain feature will be available in JGit. It is best to track the respective bug report and state your interest there or on the mailing list.

      Best
      Rüdiger

      • Daniela says:

        Thank you Rüdiger for always replying in stackoverflow. Nice work here!

  4. Benoit says:

    I’d like to clone only if the remote repository is bare (since, if I understood correctly, only bare repo can be pushed to). Is it possible to know if the remote repo is bare ?

    • Rüdiger Herrmann says:

      AFAIK it is not possible to check if a repository is bare as long as you cannot access the repository locally.

      However, according to this post (http://stackoverflow.com/questions/1764380/push-to-a-non-bare-git-repository), pushing to a non-bare remote is possible. And at least in theory, the bare state of a repository may change.

      I wouldn’t pro-actively prevent cloning a non-bare repository but rather transform the result when pushing to a non-bare repository fails into a meaningful error message.

  5. Eric says:

    Thanks for the article!

    I was wondering, how can it be done if the source URL is local and you want to make sure there are no hardlinks?

    In git, we’d normally use: git clone --no-hardlinks /path/to/source /path/to/dest

    Any idea how it can be done?

    PS: We’ve been testing the file:// protocol, but for bigger repos (~10 GB), it’s much slower than the local --no-hardlinks clone above

    • Eric says:

      ..forgot to add:

      For comparison on the git clone operations done on our big repo via shell:
      – git clone file:///path/to/source/repo /path/to/dest/repo # takes about 25-30 minutes
      – git clone --no-hardlinks /path/to/source/repo /path/to/dest/repo # takes less than 2 minutes

      I suspect that the reason why the file:// protocol is much slower is because it goes through the compression/send/decompression phase while the other one doesn’t

  6. Rüdiger Herrmann says:

    Eric,

    thank you for your feedback. From what I see, the --no-hardlinks option is not supported by JGit. If you think this should be supported, please file an enhancement request: https://eclipse.org/jgit/support/

    Regarding the performance observations, you may want to forward these to the JGit mailing list: https://dev.eclipse.org/mailman/listinfo/jgit-dev

    – Rüdiger

    • Eric says:

      Hi Rüdiger,

      Thanks for the reply. Too bad “--local --no-hardlinks” isn’t supported.

      I will look into the suggestions you’ve provided.

      Cheers!

  7. ramya A says:

    is it possible to read the git repo files without cloning?

    • Rüdiger Herrmann says:

      In short, no! You need to first clone a repository before you can access its contents.

  8. Aleksandr says:

    Hi,

    Thank you for your blog!

    Maybe you can give me advice with my issue. If I clone repository like:

    Git git = Git.cloneRepository()
    .setURI(repositoryURI)
    .setCredentialsProvider(new UsernamePasswordCredentialsProvider(credentials.getUserName(), credentials.getPassword()))
    .setDirectory(localRepositoryDir)
    .call();

    And after this call push command like

    PushCommand pushCommand = repositoryGit
    .push()
    .setCredentialsProvider(new UsernamePasswordCredentialsProvider(credentials.getUserName(), credentials.getPassword()))
    .setForce(true)
    .setPushAll();

    All works fine. But after this, when I just check out a branch and push, no data is sent to the server. There are no errors, but no results either.

    Thank you for help!

    • Rüdiger Herrmann says:

      If you call the PushCommand right after the CloneCommand, it will do nothing, because your local repository is in sync with the remote repository.

      A checkout does not change the situation either, unless there are new commits on the checked out branch that have not been published to the remote repository yet.

      To find out what the PushCommand did, you can evaluate the PushResult returned by call(). It provides very detailed information about every ref that participated in the push operation.

      For a general understanding of what push does, you may also want to read this SO post.

      • Aleksandr says:

        Hi,

        Thank you for help!

        I found solution. Problem was in incorrect checkout branch name.

        Regards,
        Aleksandr.

  9. Naga says:

    Hi, I have a requirement to clone a git repository and look for pom.xml and read it, is there a way I can do it without cloning whole repository in the directory?

  10. Eliz says:

    is it possible to do
    “git remote rename origin old-origin”
    using jgit

    • Rüdiger Herrmann says:

      JGit does not provide a ready-to-use command to rename a remote repository. However with the help of other APIs, you should be able to do that.

      I would start by renaming all remote branches. With the ListBranchCommand you can obtain a list of all branches, filter the relevant remote branches and use RenameBranchCommand to change the remote part of their name (e.g. refs/remotes/old-origin/foo -> refs/remotes/new-origin/foo).

      Thereafter, only the configuration section should remain with the old name. repository.getConfig() gives you access to the repository’s configuration and provides API to copy the remote config section and delete the old one. Finally, save the changes with StoredConfig::save.

      Have I forgotten anything that git remote rename does while renaming a remote?

  11. David Warner says:

    Can you provide me the whole source code of cloning repository from github because it will very much useful for me.

    • Rüdiger Herrmann says:

      David,

      cloning a repository hosted on GitHub should be no different than cloning any other repository. The article lists various examples that actually clone a GitHub repository, namely https://github.com/eclipse/jgit.git.

      If you are stuck with a specific problem, have you looked on StackOverflow? Over the years, it has become a popular place for JGit questions and answers.

  12. Neeraj says:

    Hi,
    Getting java.io.EOFException: Packfile is truncated, while cloning repo, from java code.
    Can you please help.

  13. Divya says:

    Is it possible to clone only a specific directory from a Git repository??

  14. Vivek says:

    Hi Rüdiger,

    I used the Maven SCM plugin to clone repositories. I plan to use JGit now. I see that JGit clones a lot more data than the plugin, as the .git directory is pretty big now. Any suggestions? I need to clone one branch only.

    • Rüdiger Herrmann says:

      You can restrict the number of branches that are cloned with setBranchesToClone. For example:


      Git git = Git.cloneRepository()
      .setBranchesToClone(Arrays.asList("refs/heads/master"))
      .setDirectory(...)
      .setURI(...)
      .call();

      Presumably, the main reason that JGit clones a lot more data is that it clones the entire history. Until JGit has support for shallow clones (see https://bugs.eclipse.org/bugs/show_bug.cgi?id=475615), I recommend staying with C Git.

  15. Vivek says:

    Oh i see ! that’s the reason then.
    Yeah i used the same code but still.

    Thanks for the insight.

  16. Abdur Billah says:

    Hello Experts,

    I wonder whether it is possible to have paging in JGit. For example, I may have hundreds of files checked out / modified. How can I limit them so that I can show, say, 10 at a time, when I run the StatusCommand in JGit.

    Regards,
    -Abdur

    • Rüdiger Herrmann says:

      Hello Abdur,

      JGit is a Java implementation of Git. It has no built-in support for paging. You will have to handle this aspect in your application. For example, the HTTP endpoint that serves the list of changes would run the StatusCommand and return only the requested slice of entries. If you see that holding the results of the command in memory is an issue, you can use the lower-level APIs (those that the StatusCommand uses itself) to stream the result. Looking into the source code of the respective command should get you started.

      HTH
      Rüdiger

      • Abdur Billah says:

        Hello Rüdiger,

        Thank you very much for your detailed reply and the tips for possible implementation. I got to be involved in another more urgent task and could not get back to this task yet.
        I’ll look into the implementation of StatusCommand and see how I can implement the paging with status command.

        My apologies for the delayed reply.
        I appreciate your valuable feedback very highly.

        Regards,
        -abdur

  17. Rahul Talekar says:

    Hi Rüdiger Herrmann,

    I wanted to know how I can clone ALL repositories along with all branches present on git using JGit.

    For example, I have a git repositories as follows

    Repo A: (Having two branches)
    Repo B: (Having three branches)
    Repo C:(Having five branches)

    Basically, I wanted to traverse all repos with branches and clone them in my local.

    Thanks in advance

    • Rüdiger Herrmann says:

      As described in the article, you can use setCloneAllBranches to tell the CloneCommand to clone all branches.

      In order to clone multiple repositories, you will need to traverse over the list of repository URLs and call the CloneCommand for each of the entries.

  18. Saqeeb Shaikh says:

    Thanks for the article. I would like to know if there is some java lib that enables us to clone git repo in memory rather than on the file system. I didn’t get any relevant documentations or articles specific to in-memory git operations. I also read some of your answers on SO.

  19. Rishi says:

    Hi, thanks a lot for the article. I have a question regarding JGit and would be really grateful if you could help. If I have cloned a repo using user1’s credentials, can I then push the changes with user2’s credentials? Both users have access to the remote repo. It’s a multi-user scenario.

  20. Rüdiger Herrmann says:

    You should be able to use different CredentialsProviders with the PushCommand to pass each user’s credentials.

  21. Hardik says:

    How do I create a feature branch from the master branch using Java code? Any suggestions?

  22. Rüdiger Herrmann says:

    This question has already been answered here: https://stackoverflow.com/q/55089678/2986905