
About git: Download a single folder or directory from a GitHub repo

Question Detail

How can I download only a specific folder or directory from a remote Git repo hosted on GitHub?

Say the example GitHub repo lives here:

git@github.com:foobar/Test.git

Its directory structure:

Test/
├── foo/ 
│   ├── a.py
│   └── b.py   
└── bar/
    ├── c.py
    └── d.py

I want to download only the foo folder and not clone the whole Test project.

Question Answer

Update Apr. 2021: there are a few tools created by the community that can do this for you:

  • Download Directory (Credits to fregante)
    • It has also been integrated into the excellent Refined GitHub Chrome extension as a button in the GitHub web UI.
  • GitZip (Credits to Kino – see his answer here)
  • DownGit (Credits to Minhas Kamal – see his answer here)

Note: if you’re trying to download a large number of files, you may need to provide a token to these tools to avoid rate limiting.


Original (manual) approach: Checking out an individual directory is not supported by git natively, but GitHub can do this via SVN. If you check out your code with Subversion, GitHub will essentially convert the repo from git to Subversion on the backend, then serve up the requested directory.

Here’s how you can use this feature to download a specific folder. I’ll use the popular javascript library lodash as an example.

  1. Navigate to the folder you want to download. Let’s download /test from master branch.

  2. Modify the URL for subversion. Replace tree/master with trunk.

    https://github.com/lodash/lodash/tree/master/test

    https://github.com/lodash/lodash/trunk/test

  3. Download the folder. Go to the command line and grab the folder with SVN.

svn checkout https://github.com/lodash/lodash/trunk/test

You might not see any activity immediately because Github takes up to 30 seconds to convert larger repositories, so be patient.

Full URL format explanation:

  • If you’re interested in master branch, use trunk instead. So the full path is trunk/foldername
  • If you’re interested in foo branch, use branches/foo instead. The full path looks like branches/foo/foldername
  • Protip: You can use svn ls to see available tags and branches before downloading if you wish
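The URL rules above can be sketched as a small Python helper. This is only an illustration, not part of the original answer: the function name `github_tree_url_to_svn` and the `default_branch` parameter are hypothetical, it assumes branch names contain no slashes, and it assumes GitHub's SVN endpoint is still served for the repository in question.

```python
from urllib.parse import urlsplit, urlunsplit


def github_tree_url_to_svn(url: str, default_branch: str = "master") -> str:
    """Rewrite a GitHub folder URL into the SVN-style URL described above.

    .../tree/<default_branch>/<path> becomes .../trunk/<path>
    .../tree/<other_branch>/<path>   becomes .../branches/<other_branch>/<path>
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: owner / repo / "tree" / branch / optional folder path
    if len(segments) < 4 or segments[2] != "tree":
        raise ValueError(f"not a GitHub folder URL: {url}")
    owner, repo, _, branch, *folder = segments
    # Assumption: the default branch maps to trunk, everything else to branches/
    ref = ["trunk"] if branch == default_branch else ["branches", branch]
    new_path = "/" + "/".join([owner, repo, *ref, *folder])
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))
```

Calling `github_tree_url_to_svn("https://github.com/lodash/lodash/tree/master/test")` yields the trunk URL used in step 3, which can then be passed to svn checkout or svn export.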

That’s all! Github supports more subversion features as well, including support for committing and pushing changes.

Go to DownGit > Enter Your URL > Download!

You can directly download, or create a download link for, any public GitHub directory or file with DownGit.

You may also configure properties of the downloaded file (see detailed usage).


Disclaimer: I fell into the same problem as the question-asker and could not find any simple solution. So, I developed this tool for my own use first, then opened it for everyone 🙂

Two options for this feature:

Option 1: GitZip Browser Extension

Chrome Extension, Edge Extension, Firefox Addon

Usage:

  1. Browse any Github repository page.
  2. Two ways to download:
    1. Choose the items:
      1. By default, you can double-click items or tick the checkbox in front of them.
      2. Click the download button at the bottom-right of the page.
    2. In context menu:
      1. Click “GitZip Download” > “Whole Repository” or “Current Folder”.
      2. Move the mouse cursor over the item and click “GitZip Download” > “Selected Folder/File”.
      3. Click “GitZip Download” > “Checked Items” after doing 2-1-1.
  3. Watch the progress dashboard and wait for the browser to trigger the download.
  4. Get the ZIP file.

Get Token:

  1. Click the GitZip Extension icon in your browser.
  2. Click the “Normal” or “Private” link beside “Get Token”.
  3. Authorize GitZip permission on the GitHub auth page.
  4. Go back to the repo page you started from.
  5. Continue to use.

Option 2: GitHub gh-page

http://kinolien.github.io/gitzip, built with the GitHub API and the JSZip and FileSaver.js libraries.

Step 1: Enter the GitHub URL in the field at the top-right.
Step 2: Press Enter or click Download to download the zip directly, or click Search to view the list of sub-folders and files.
Step 3: Click the “Download Zip File” or “Get File” button to get the files.

In most cases it works fine, except when the folder contains more than 1,000 files, because of a GitHub Trees API limitation (see GitHub API#Contents).

It also supports private as well as public repos and a higher rate limit if you have a GitHub account and use the “get token” link on the site.

If you have svn, you can use svn export to do this:

svn export https://github.com/foobar/Test.git/trunk/foo

Notice the URL format:

  • The base URL is https://github.com/
  • /trunk appended at the end

Before you run svn export, it’s good to first verify the content of the directory with:

svn ls https://github.com/foobar/Test.git/trunk/foo

For a Generic git Repo:

If you want to download files, not clone the repository with history, you can do this with git-archive.

git-archive makes a compressed zip or tar archive of a git repository. Some things that make it special:

  1. You can choose which files or directories in the git repository to archive.
  2. It doesn’t archive the .git/ folder, or any untracked files in the repository it’s run on.
  3. You can archive a specific branch, tag, or commit. Projects managed with git often use this to generate archives of versions of the project (beta, release, 2.0, etc.) for users to download.

An example of creating an archive of the docs/usage directory from a remote repo you’re connected to with ssh:

# in terminal
$ git archive --format tar --remote ssh://server.org/path/to/git HEAD docs/usage > /tmp/usage_docs.tar

More information in this blog post and the git documentation.

Note on GitHub Repos:

GitHub doesn’t allow git-archive access. ☹️

After trying all the answers, the best solution for me was:

GitHub’s vscode based editor.

Pros:

  1. doesn’t require any extra tool like svn or API tokens.
  2. No limit on size of content
  3. Saves as a directory or file, and not archive.

Instructions

  1. Go to any repo. (ex. https://github.com/RespiraWorks/Ventilator/tree/master/software)
  2. Press . or replace .com with .dev in URL to open the repo in GitHub’s internal editor
  3. In Explorer pane (left side or press Ctrl+Shift+E), Right click on the required file/folder and select download.
  4. In the Select Folder dialog box, choose the directory on your disk under which you want the selected file/folder to exist.

Note

I tried other solutions like in accepted answer but,

  1. Don’t want to install and learn svn only for this.
  2. Other tools like Download Directory, Refined GitHub, GitZip, DownGit either require API tokens or cannot download large directories.

Other options

  • VSCode with Remote Repositories extension to open the repo and download the file/folder.

Nothing wrong with other answers but I just thought I’d share step-by-step instructions for those wandering through this process for the first time.

How to download a single folder from a github repository (Mac OS X):

~ To open Terminal just click spotlight and type terminal then hit enter

  1. On a Mac you likely already have SVN (to check, open Terminal and type “svn” or “which svn”, without the quote marks)
  2. On GitHub: locate the path to the folder you want (not the repo) by clicking the specific folder name within a repo
  3. Copy the path from the address bar of the browser
  4. Open Terminal and type: svn export
  5. Next paste in the address (e.g.):
    https://github.com/mingsai/Sample-Code/tree/master/HeadsUpUI
  6. Replace the words: tree/master
  7. with the word: trunk
  8. Type a space (press the spacebar, do not type the word “space”), then the destination folder for the files (in this example, I store the target folder inside of the Downloads folder for the current user)
  9. ~/Downloads/HeadsUpUI
  10. The final terminal command shows the full command to download the folder (compare the address to step 5):
    svn export https://github.com/mingsai/Sample-Code/trunk/HeadsUpUI ~/Downloads/HeadsUpUI

BTW – If you are on Windows or some other platform you can find a binary download of subversion (svn) at http://subversion.apache.org

~ If you want to checkout the folder rather than simply download it try using the svn help (tldr: replace export with checkout)

Update

Regarding the comment on resuming an interrupted download/checkout. I would try running svn cleanup followed by svn update. Please search SO for additional options.

If you are working on one specific folder and only need that folder, follow the steps below, which use sparse checkout.

  1. Create a directory.

  2. Initialize a Git repository. (git init)

  3. Enable Sparse Checkouts. (git config core.sparsecheckout true)

  4. Tell Git which directories you want (echo 2015/brand/May >> .git/info/sparse-checkout, where 2015/brand/May stands for the folder you want to work on)

  5. Add the remote (git remote add -f origin https://jafartke.com/mkt-imdev/DVM.git)

  6. Fetch the files (git pull origin master )

You cannot; unlike Subversion, where each subdirectory can be checked out individually, Git operates on a whole-repository basis.

For projects where finer-grained access is necessary, you can use submodules — each submodule is a separate Git project, and thus can be cloned individually.

It is conceivable that a Git front-end (e.g. GitHub’s web interface, or gitweb) could choose to provide an interface for you to extract a given folder, but to my knowledge none of them do that (though they do let you download individual files, so if the folder does not contain too many files, that is an option)

Edit – GitHub actually offers access via SVN, which would allow you to do just this (as per comment). See https://github.com/blog/1438-improved-svn-here-to-stay-old-svn-going-away for latest instructions on how to do this

2019 Summary

There are a variety of ways to handle this, depending on whether or not you want to do this manually or programmatically.

There are four options summarized below. And for those that prefer a more hands-on explanation, I’ve put together a YouTube video: Download Individual Files and Folders from GitHub.

Also, I’ve posted a similar answer on StackOverflow for those that need to download single files from GitHub (as opposed to folders).


1. GitHub User Interface

  • There’s a download button on the repository’s homepage. Of course, this downloads the entire repo, after which you would need to unzip the download and then manually drag out the specific folder you need.

2. Third Party Tools

  • There are a variety of browser extensions and web apps that can handle this, with DownGit being one of them. Simply paste in the GitHub URL to the folder (e.g. https://github.com/babel/babel-eslint/tree/master/lib) and press the “Download” button.

3. Subversion

  • GitHub does not support git-archive (the git feature that would allow us to download specific folders). GitHub does however, support a variety of Subversion features, one of which we can use for this purpose. Subversion is a version control system (an alternative to git). You’ll need Subversion installed. Grab the GitHub URL for the folder you want to download. You’ll need to modify this URL, though. You want the link to the repository, followed by the word “trunk”, and ending with the path to the nested folder. In other words, using the same folder link example that I mentioned above, we would replace “tree/master” with “trunk”. Finally, open up a terminal, navigate to the directory that you want the content to get downloaded to, type in the following command (replacing the URL with the URL you constructed): svn export https://github.com/babel/babel-eslint/trunk/lib, and press enter.

4. GitHub API

  • This is the solution you’ll need if you want to accomplish this task programmatically. And this is actually what DownGit is using under the hood. Using GitHub’s REST API, write a script that does a GET request to the content endpoint. The endpoint can be constructed as follows: https://api.github.com/repos/:owner/:repo/contents/:path. After replacing the placeholders, an example endpoint is: https://api.github.com/repos/babel/babel-eslint/contents/lib. This gives you JSON data for all of the content that exists in that folder. The data has everything you need, including whether or not the content is a folder or file, a download URL if it’s a file, and an API endpoint if it’s a folder (so that you can get the data for that folder). Using this data, the script can recursively go through all content in the target folder, create folders for nested folders, and download all of the files for each folder. Check out DownGit’s code for inspiration.
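The recursive walk described above could look roughly like the following Python sketch. It is not DownGit's code: the function names (`fetch_json`, `fetch_bytes`, `download_folder`) are made up for illustration, the requests are unauthenticated (so the rate limit mentioned earlier applies), and the fetchers are passed in as parameters only so the walk can be exercised without network access. It relies on the "type", "name", "url" and "download_url" fields of the contents endpoint's JSON.

```python
import json
import os
import urllib.request


def fetch_json(url: str):
    """GET a GitHub API URL and decode the JSON body (unauthenticated)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_bytes(url: str) -> bytes:
    """GET a raw file URL and return its bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def download_folder(api_url, dest, get_json=fetch_json, get_bytes=fetch_bytes):
    """Recursively mirror the folder behind a contents-API URL into dest."""
    os.makedirs(dest, exist_ok=True)
    for entry in get_json(api_url):
        target = os.path.join(dest, entry["name"])
        if entry["type"] == "dir":
            # For folders, "url" is the API endpoint of the nested listing
            download_folder(entry["url"], target, get_json, get_bytes)
        elif entry["type"] == "file" and entry.get("download_url"):
            with open(target, "wb") as fh:
                fh.write(get_bytes(entry["download_url"]))
        # Other entry types (symlinks, submodules) are skipped in this sketch
```

With the example endpoint from the text, a call would be `download_folder("https://api.github.com/repos/babel/babel-eslint/contents/lib", "lib")`.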

If you truly just want to just “download” the folder and not “clone” it (for development), the easiest way to simply get a copy of the most recent version of the repository (and therefore a folder/file within it), without needing to clone the whole repo or even install git in the first place, is to download a zip archive (for any repo, fork, branch, commit, etc.) by going to the desired repository/fork/branch/commit on GitHub (e.g. http(s)://github.com/<user>/<repo>/commit/<Sha1> for a copy of the files as they were after a specific commit) and selecting the Downloads button near the upper-right.

This archive format contains none of the git-repo magic, just the tracked files themselves (and perhaps a few .gitignore files if they were tracked, but you can ignore those :p) – that means that if the code changes and you want to stay on top, you’ll have to manually re-download it, and it also means you won’t be able to use it as a git repository…

Not sure if that’s what you’re looking for in this case (again, “download”/view vs “clone”/develop), but it can be useful nonetheless…

There’s a Python3 pip package called githubdl that can do this*:

export GIT_TOKEN=1234567890123456789012345678901234567890123
pip install githubdl
githubdl -u http://github.com/foobar/test -d foo

The project page is here

* Disclaimer: I wrote this package.

git clone --filter from git 2.19 now works on GitHub

Tested 2020-09-18, git 2.25.1.

This option was added together with an update to the remote protocol, and it truly prevents objects from being downloaded from the server.

E.g., to clone only objects required for d1 of this repository: https://github.com/cirosantilli/test-git-partial-clone I can do:

git clone \
  --depth 1 \
  --filter=blob:none \
  --no-checkout \
  https://github.com/cirosantilli/test-git-partial-clone \
;
cd test-git-partial-clone
git checkout master -- d1

I have covered this in more detail at: Git: How do I clone a subdirectory only of a Git repository?

If you are comfortable with unix commands, you don’t need special dependencies or web apps for this. You can download the repo as a tarball and untar only what you need.

Example (woff2 files from a subdirectory in fontawesome):

curl -L https://api.github.com/repos/FortAwesome/Font-Awesome/tarball | tar xz --wildcards "*/web-fonts-with-css/webfonts/*.woff2" --strip-components=3
  • More about the link format: https://developer.github.com/v3/repos/contents/#get-archive-link (including how to get a zip file or specific branches/refs)
  • Keep the initial part of the path (*/) to match any directory. GitHub creates a wrapper directory with the commit ref in the name, so it can’t be known in advance.
  • You probably want --strip-components to be the same as the number of slashes (/) in the path (previous argument).

This will download the whole tarball. Use the SVN method mentioned in the other answers if this has to be avoided or if you want to be nice to the GitHub servers.
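For those who would rather script this than pipe curl into tar, here is a rough Python equivalent using only the standard library. It is a sketch under assumptions: the name `extract_subdir` is hypothetical, the archive is assumed to be a gzipped tarball with a single wrapper directory at the top (as described above), and `dest` must be a non-empty path. Like the curl version, it still downloads the whole tarball and only filters while unpacking.

```python
import io
import os
import tarfile


def extract_subdir(tar_stream, subdir: str, dest: str) -> list:
    """Extract only files under <wrapper>/<subdir>/ from a gzipped tarball.

    The unknown wrapper directory and the subdir prefix are both stripped,
    similar in spirit to tar's --strip-components.
    """
    prefix = subdir.strip("/") + "/"
    written = []
    with tarfile.open(fileobj=tar_stream, mode="r|gz") as tar:
        for member in tar:
            # Drop the wrapper directory at the top of the archive
            _, _, inner = member.name.partition("/")
            if not (member.isfile() and inner.startswith(prefix)):
                continue
            rel = inner[len(prefix):]
            if ".." in rel.split("/"):
                continue  # refuse paths that would escape dest
            target = os.path.join(dest, *rel.split("/"))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as out:
                out.write(tar.extractfile(member).read())
            written.append(rel)
    return written


# Possible usage (needs network access; urlopen follows the redirect):
#   import urllib.request
#   url = "https://api.github.com/repos/FortAwesome/Font-Awesome/tarball"
#   with urllib.request.urlopen(url) as resp:
#       extract_subdir(resp, "web-fonts-with-css/webfonts", "webfonts")
```

Unlike the shell one-liner, this keeps every file under the folder rather than only *.woff2; a filename check inside the loop would narrow it down.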

Another specific example:

For example, I want to download the ‘Pro iOS Geo’ folder from the URL

https://github.com/alokc83/APRESS-Books-Source-Code-/tree/master/%20Pro%20iOS%20Geo

and I can do so via

svn checkout https://github.com/alokc83/APRESS-Books-Source-Code-/trunk/%20Pro%20iOS%20Geo

Note trunk in the path

Edited: (as per Tommie C’s comment)

Yes, using export instead of checkout gives a clean copy without the extra repository metadata files.

svn export https://github.com/alokc83/APRESS-Books-Source-Code-/trunk/%20Pro%20iOS%20Geo

Edited: If tree/master is not in the URL, fork the repo; the forked URL will contain it.

This is how I do it with git v2.25.0, also tested with v2.26.2. This trick doesn’t work with v2.30.1

TLDR

git clone --no-checkout --filter=tree:0 https://github.com/opencv/opencv
cd opencv

# requires git 2.25.x to 2.26.2
git sparse-checkout set data/haarcascades

You can use Docker to avoid installing a specific version of git

git clone --no-checkout --filter=tree:0 https://github.com/opencv/opencv
cd opencv

# requires git 2.25.x to 2.26.2
docker run --rm -it -v $PWD/:/code/ --workdir=/code/ alpine/git:v2.26.2 sparse-checkout set data/haarcascades

Full solution

# bare minimum clone of opencv
$ git clone --no-checkout --filter=tree:0 https://github.com/opencv/opencv
...
Resolving deltas: 100% (529/529), done.

# Downloaded only ~7.3MB , takes ~3 seconds
# du = disk usage, -s = summary, -h = human-readable
$ du -sh opencv
7.3M    opencv/

# Set target dir
$ cd opencv
$ git sparse-checkout set data/haarcascades
...
Updating files: 100% (17/17), done.
# Takes ~10 seconds, depending on your specs

# View downloaded files
$ du -sh data/haarcascades/
9.4M    data/haarcascades/
$ ls data/haarcascades/
haarcascade_eye.xml                      haarcascade_frontalface_alt2.xml      haarcascade_licence_plate_rus_16stages.xml  haarcascade_smile.xml
haarcascade_eye_tree_eyeglasses.xml      haarcascade_frontalface_alt_tree.xml  haarcascade_lowerbody.xml                   haarcascade_upperbody.xml
haarcascade_frontalcatface.xml           haarcascade_frontalface_default.xml   haarcascade_profileface.xml
haarcascade_frontalcatface_extended.xml  haarcascade_fullbody.xml              haarcascade_righteye_2splits.xml
haarcascade_frontalface_alt.xml          haarcascade_lefteye_2splits.xml       haarcascade_russian_plate_number.xml

References

  • git-sparse-checkout-blog
  • git-sparse-checkout-docs
  • git-filter-props-docs

You can use git-svn in the following way.

First, replace tree/master with trunk.
Then install git-svn with sudo apt install git-svn.

git svn clone https://github.com/lodash/lodash/trunk/test

This way you don’t have to go through the pain of setting up svn, which matters especially for Windows users.

You can do a simple download of the directory tree:

git archive --remote git@github.com:foobar/Test.git HEAD:foo | tar xf -

But if you mean to check it out, and be able to do commits and push them back, no you can’t do that.

None of the answers helped in my situation. If you are developing for Windows, you likely don’t have svn. In many situations one can’t count on users to have Git installed either, or don’t want to download entire repositories for other reasons. Some of the people that answered this question, such as Willem van Ketwich and aztack, made tools to accomplish this task. However, if the tool isn’t written for the language you are using, or you don’t want to install a third party library, these don’t work.

However, there is a much easier way. GitHub has an API that allows you to download a single file or an entire directory’s contents using GET requests. You can access a directory using https://api.github.com/repos/:owner/:repo_name/contents/:path that returns a JSON object enumerating all the files in the directory. Included in the enumeration is a link to the raw content of the file, the download_url parameter. The file can then be downloaded using that URL.

It’s a two step process that requires the ability to make GET requests, but this can be implemented in pretty much any language, on any platform. It can be used to get files or directories.
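As a small illustration of the first step, a folder URL copied from the browser can be mapped onto that endpoint. The helper below is a sketch with a made-up name (`contents_api_url`); it assumes branch names without slashes, and it passes the branch through the endpoint's `ref` query parameter, which the answer itself does not mention.

```python
from urllib.parse import quote, urlsplit


def contents_api_url(folder_url: str) -> str:
    """Map https://github.com/<owner>/<repo>/tree/<branch>/<path>
    to the matching contents endpoint of the GitHub REST API."""
    segments = urlsplit(folder_url).path.strip("/").split("/")
    # Expected shape: owner / repo / "tree" / branch / optional folder path
    if len(segments) < 4 or segments[2] != "tree":
        raise ValueError(f"not a GitHub folder URL: {folder_url}")
    owner, repo, _, branch, *folder = segments
    path = "/".join(folder)
    base = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    # Assumption: select the branch explicitly via the "ref" parameter
    return f"{base}?ref={quote(branch, safe='')}"
```

The JSON returned from that URL is the listing whose download_url values form the second step.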

git sparse-checkout

Git 2.25.0 includes a new experimental git sparse-checkout command that makes the existing feature easier to use, along with some important performance benefits for large repositories. (The GitHub Blog)

Example with current version:

git clone --filter=blob:none --sparse https://github.com/git/git.git
cd git
git sparse-checkout init --cone
git sparse-checkout add t

Most notably

  • --sparse checks out only top-level directory files of git repository into working copy
  • git sparse-checkout add t incrementally adds/checks out t subfolder of git

Other elements

  • git sparse-checkout init does some preparations to enable partial checkouts
  • --filter=blob:none optimizes data fetching by downloading only necessary git objects (take a look at the partial clone feature for further information)
  • --cone also speeds up performance by applying more restricted file inclusion patterns
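If you need to repeat these commands for several repositories, they can be wrapped in a short script. The following Python sketch simply replays the example above through subprocess; the function names are hypothetical, and it assumes a git new enough to have the sparse-checkout command (2.25 or later, per the text) is on the PATH.

```python
import subprocess


def sparse_clone_commands(repo_url: str, folder: str, dest: str) -> list:
    """Build the git invocations from the example above as argv lists."""
    return [
        # Partial clone: skip blobs until needed, check out top-level files only
        ["git", "clone", "--filter=blob:none", "--sparse", repo_url, dest],
        # Enable cone-mode sparse checkout inside the new clone
        ["git", "-C", dest, "sparse-checkout", "init", "--cone"],
        # Add the wanted subfolder to the working copy
        ["git", "-C", dest, "sparse-checkout", "add", folder],
    ]


def sparse_clone(repo_url: str, folder: str, dest: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for argv in sparse_clone_commands(repo_url, folder, dest):
        subprocess.run(argv, check=True)
```

For the example above this would be `sparse_clone("https://github.com/git/git.git", "t", "git")`.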

GitHub status

GitHub is still evaluating this feature internally while it’s enabled on a select few repositories […]. As the feature stabilizes and matures, we’ll keep you updated with its progress. (docs)

Just 5 steps to go

  • Download SVN from here.
  • Open CMD and go to SVN bin directory like:
    cd %ProgramFiles%\SlikSvn\bin
  • Let’s suppose I want to download this directory URL
    https://github.com/ZeBobo5/Vlc.DotNet/tree/develop/src/Samples
  • Replace
    tree/develop
    or tree/master with trunk
  • Now fire this last command to download folder in same directory.
svn export https://github.com/ZeBobo5/Vlc.DotNet/trunk/src/Samples

You can use ghget with any URL copied from the address bar:

ghget https://github.com/fivethirtyeight/data/tree/master/airline-safety

It’s a self-contained portable shell script that doesn’t use SVN (which didn’t work for me on a big repo). It also doesn’t use the API so it doesn’t require a token and isn’t rate-limited.

Disclaimer: I made it.

Just to amplify the answers above, a real example from a real GitHub repository to a local directory would be:

svn ls https://github.com/rdcarp/playing-cards/trunk/PumpkinSoup.PlayingCards.Interfaces

svn export https://github.com/rdcarp/playing-cards/trunk/PumpkinSoup.PlayingCards.Interfaces  /temp/SvnExport/Washburn

Sometimes a concrete example helps clarify the substitutions proposed.

I use Linux, so I put this in ~/.bashrc (also called $HOME/.bashrc 😀):

git-downloadfolder(){
  # Swap "tree/master" for "trunk" to get the SVN path, then check it out
  a="$1"
  svn checkout "${a/tree\/master/trunk}"
}

then refresh the shell with

source ~/.bashrc 

then use it with git-downloadfolder blablabla 😀

It’s one of the few places where SVN is better than Git.

In the end we’ve gravitated towards three options:

  1. Use wget to grab the data from GitHub (using the raw file view).
  2. Have upstream projects publish the required data subset as build artifacts.
  3. Give up and use the full checkout. It’s a big hit on the first build, but unless you get a lot of traffic, it’s not too much hassle in the following builds.

For whatever reason, the svn solution does not work for me, and since I have no need of svn for anything else, it did not make sense to spend time trying to make it work, so I looked for a simple solution using tools I already had. This script uses only curl and awk to download all files in a GitHub directory described as "/repos/:user/:repo/contents/:path".

The body returned by the GitHub REST API call "GET /repos/:user/:repo/contents/:path" is an object that includes a "download_url" link for each file in a directory.

This command-line script calls that REST API using curl and sends the result through AWK, which filters out all but the “download_url” lines, erases quote marks and commas from the links, and then downloads the links using another call to curl.

curl -s https://api.github.com/repos/:user/:repo/contents/:path | awk \
     '/download_url/ { gsub("\"|,", "", $2); system("curl -O " $2); }'

Our team wrote a bash script to do this because we didn’t want to have to install SVN on our bare bones server.

https://github.com/ojbc/docker/blob/master/java8-karaf3/files/git-download.sh

It uses the github API and can be run from the command line like this:

git-download.sh https://api.github.com/repos/ojbc/main/contents/shared/ojb-certs

I work with CentOS 7 servers on which I don’t have root access, nor git, svn, etc (nor want to!) so made a python script to download any github folder: https://github.com/andrrrl/github-folder-downloader

Usage is simple, just copy the relevant part from a github project, let’s say the project is https://github.com/MaxCDN/php-maxcdn/, and you want a folder where some source files are only, then you need to do something like:

$ python gdownload.py "/MaxCDN/php-maxcdn/tree/master/src" /my/target/dir/
(will create target folder if doesn’t exist)

It requires the lxml library, which can be installed with easy_install lxml.
If you don’t have root access (like me) you can create a .pydistutils.cfg file in your $HOME dir with these contents:

[install]
user=1

And easy_install lxml will just work (ref: https://stackoverflow.com/a/33464597/591257).

Open the repo in CodeSandbox by replacing github with githubbox in the URL, then export it as a zip from the File menu.

For following repo:
https://github.com/geist-org/react/tree/master/examples/custom-themes

Enter following url:
https://githubbox.com/geist-org/react/tree/master/examples/custom-themes

In codesandbox go to file menu and Export it as a Zip.

Try it:

https://github.com/twfb/git-directory-download

usage: gitd [-h] [-u URL] [-r] [-p] [--proxy PROXY]

optional arguments:
  -h, --help         show this help message and exit
  -u URL, --url URL  github url, split by ",", example: "https://x, http://y"
  -r, --raw          download from raw url
  -p, --parse        download by parsing html
  --proxy PROXY      proxy config, example "socks5://127.0.0.1:7891"

Example:
  1. download by raw url: gitd -u "https://github.com/twfb/git-directory-download"
  2. download by raw url: gitd -r -u "https://github.com/twfb/git-directory-download"
  3. download by parsing: gitd -p -u "https://github.com/twfb/git-directory-download"
  4. download by raw url with proxy: gitd -r -u "https://github.com/twfb/git-directory-download" --proxy "socks5://127.0.0.1:7891"
