Git and Large Repositories
by Peter Jones / February 19, 2008
Abstract
Import a large repository into Git and perform some testing to see how
fast it really is.
Importing from a Large CVS Repository
The FreeBSD CVS repository was chosen because it provides an
appropriately sized code base to work from, both in the number of
files and in the amount of history available.
As of this writing, the FreeBSD src CVS repository had 67,311 files,
weighing in at 1.7 GB of data. The first commit was made in
1993, providing us with 15 years' worth of historical information.
Getting a Local Copy of the FreeBSD CVS Repository
For performance reasons, it's probably a good idea to get a local copy
of the CVS repository you wish to import into Git. Using the list of
FreeBSD rsync sites, locate a mirror that contains the entire FreeBSD
FTP site.
You can then use rsync to download the entire repository:
$ mkdir freebsd
$ cd freebsd
$ rsync -vaz --delete rsync://a-freebsd-mirror/FreeBSD/development/FreeBSD-CVS/src .
Preparing the CVS Repository
The FreeBSD mirror that you pulled the src directory from probably
doesn't have the CVSROOT directory. We'll create that now.
Create an empty directory, run cvs init on it, and move the resulting
CVSROOT directory next to src.
$ cd ..
$ mkdir empty
$ cvs -d $PWD/empty init
$ mv empty/CVSROOT freebsd/
$ rmdir empty
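The same steps can be wrapped in a small function that skips the work when the mirror already shipped a CVSROOT. This is only a sketch: the name ensure_cvsroot is hypothetical, and it assumes mktemp and cvs are on your PATH.

```shell
# Hypothetical helper (not part of CVS): make sure REPO contains a CVSROOT.
# Usage: ensure_cvsroot freebsd
ensure_cvsroot() {
    repo=$1
    # Nothing to do if the mirror already provided CVSROOT.
    if [ -d "$repo/CVSROOT" ]; then
        echo "CVSROOT already present in $repo"
        return 0
    fi
    # Let cvs initialise a scratch repository, then keep only its CVSROOT.
    scratch=$(mktemp -d) || return 1
    cvs -d "$scratch" init || return 1
    mv "$scratch/CVSROOT" "$repo/" && rmdir "$scratch"
}
```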
Preparing the CVS Conversion Tools
I wanted to use the git-cvsimport tool, but alas, it didn't work.
Therefore, I used the cvs2svn tool, which supports Git as of version
2.1.
Tools Needed:
- Python Version >= 2.2
- Python GDBM Bindings (should be part of Python)
- cvs2svn Version >= 2.1
Prepare cvs2svn for the conversion:
$ fetch http://cvs2svn.tigris.org/files/documents/1462/41596/cvs2svn-2.1.0.tar.gz
$ tar xzf cvs2svn-2.1.0.tar.gz
$ mkdir freebsd-to-git
$ cd freebsd-to-git
$ cp ../cvs2svn-2.1.0/cvs2svn-example.options .
$ cp ../cvs2svn-2.1.0/test-data/main-cvsrepos/cvs2svn-git.options .
You then need to edit the cvs2svn-git.options file. Change the path
to the CVS repository (look near run_options.add_project).
Make sure you remove the comment character before the
fallback_encoding lines. Otherwise the log message conversion will
fail because it can't convert all commit messages to ASCII. I also
removed the comment character before the utf8 string, just above
fallback_encoding, because some of the FreeBSD commit messages were in
UTF-8.
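Since a forgotten comment character only shows up as a failure partway through a very long conversion, it may be worth checking the options file before starting. A minimal sketch, assuming the setting appears on a line of its own and that a leading # is what disables it; check_fallback_encoding is a hypothetical name, not part of cvs2svn.

```shell
# Hypothetical pre-flight check for the edited options file.
# Usage: check_fallback_encoding cvs2svn-git.options
check_fallback_encoding() {
    options_file=$1
    # Any line that still has a '#' in front of fallback_encoding is a problem.
    if grep -Eq '^[[:space:]]*#.*fallback_encoding' "$options_file"; then
        echo "still commented out"
        return 1
    fi
    # At least one active fallback_encoding line must remain.
    if grep -q 'fallback_encoding' "$options_file"; then
        echo "enabled"
        return 0
    fi
    echo "not found"
    return 2
}
```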
Importing the CVS Repository
Everything should now be ready for the actual conversion. The following
steps took me approximately 48 hours. Your mileage may vary.
$ env PYTHONPATH="$PWD/../cvs2svn-2.1.0/contrib" python ../cvs2svn-2.1.0/cvs2svn --options=cvs2svn-git.options
$ git init
$ cat cvs2svn-tmp/git-blob.dat cvs2svn-tmp/git-dump.dat | git fast-import
I noticed (and ignored) several warnings along the lines of:
branch '1.1.1' already has name 'ISC',
cannot also have name 'VIXIE', ignoring the latter
Examining Git Performance and Scalability
The following tests were performed using a local repository so that
network access times would not be a factor. The most significant
issue with remote repositories should be the initial cloning of a
large Git repository.
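The timings in the following sections look like the output of the shell's time builtin (the format resembles zsh's default). If your shell reports times differently, a rough wall-clock measurement can be sketched as below. The bench function is hypothetical, only has one-second resolution, and assumes a date that supports +%s (GNU and BSD date both do).

```shell
# Hypothetical harness: run a command with its output discarded and
# report elapsed wall-clock seconds plus the command's exit status.
# Usage: bench "git status" git status
bench() {
    label=$1
    shift
    start=$(date +%s)
    if "$@" > /dev/null 2>&1; then
        status=0
    else
        status=$?
    fi
    end=$(date +%s)
    echo "$label: $((end - start))s (exit $status)"
}
```

Run each command from the matching working directory, and repeat it a few times, since a warm disk cache can change the result considerably.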
Size of Repository
$ du -hs cvs
$ du -hs svn
$ du -hs git
| CVS | 1.7 GB |
| SVN | 3.9 GB |
| Git | 511 MB |
Size of Working Directory
$ du -hs .
$ du -hs .
$ du -hs .
| CVS | 534 MB |
| SVN | 1.1 GB |
| Git | 995 MB |
Time Required to Perform a Checkout
$ cvs -Q -d `pwd`/cvs co src
$ svn co -q file://`pwd`/svn/trunk
$ git clone git freebsd.git
| CVS | 11.56s user 79.36s system 24% cpu 6:09.16 total |
| SVN | 38.64s user 203.96s system 17% cpu 22:57.05 total |
| Git | 3.53s user 42.25s system 19% cpu 3:53.25 total |
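The "total" column mixes minutes:seconds and plain seconds, which makes the rows awkward to compare by eye. Here is a small sketch for converting those values to seconds and computing a ratio; to_seconds and ratio are hypothetical helper names, and the inputs in the comments are copied from the checkout timings above.

```shell
# Convert an elapsed time such as "6:09.16" or "14.810" into seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        s = 0
        # Fold each colon-separated field in, most significant first.
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        printf "%.2f\n", s
    }'
}

# Print how many times longer the first elapsed time is than the second.
ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# to_seconds 6:09.16        prints 369.16   (CVS checkout)
# ratio 22:57.05 3:53.25    prints 5.9      (SVN checkout vs. Git clone)
```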
Time Required to Export HEAD
Time Required to Create and Checkout a Branch
$
$ svn copy -m "make branch" TRUNK_URL BRANCH_URL; svn switch BRANCH_URL
$ git checkout -b pjjexp
| CVS | |
| SVN | 1.35s user 12.22s system 3% cpu 5:49.05 total |
| Git | 0.37s user 4.09s system 7% cpu 56.524 total |
Time Required to Retrieve Change Status
$ cvs status > /dev/null 2>&1
$ svn status
$ git status
| CVS | 1.42s user 29.60s system 18% cpu 2:45.72 total |
| SVN | 0.74s user 6.34s system 5% cpu 2:16.22 total |
| Git | 0.42s user 3.64s system 27% cpu 14.810 total |
Time Required to Tag a Branch
Time Required to Retrieve History on a Single File
$ cvs log Makefile > /dev/null
$ svn log src/Makefile > /dev/null
$ git log src/Makefile > /dev/null
| CVS | 0.00s user 0.03s system 89% cpu 0.029 total |
| SVN | 0.06s user 2.05s system 19% cpu 10.751 total |
| Git | 2.46s user 4.13s system 37% cpu 17.627 total |
Time Required to Retrieve Entire Project History
$ cvs log > /dev/null 2>&1
$ svn log > /dev/null 2>&1
$ git log > /dev/null 2>&1
| CVS | 6.99s user 71.31s system 31% cpu 4:09.50 total |
| SVN | 11.38s user 130.37s system 15% cpu 14:53.16 total |
| Git | 0.66s user 5.20s system 36% cpu 15.853 total |