The Better "SVN -> Git" Guide


Converting a Subversion repository to Git can be a pain since there's no simple tool that automates the whole process. Worse, most guides and scripts out there completely ignore a couple of basic differences between SVN and Git:

  • Unlike Git, SVN supports empty directories, and many SVN projects rely on them. In Git, they need to be preserved with placeholder files.
  • SVN repos don't usually get cloned, so they're more likely to contain large binary files. For example, I used to be in the habit of keeping my precompiled binaries in the repository in addition to the source code. Since Git repos are always cloned, it's good to prune out such files when converting.

I've recently gone through the pains of converting a few of my SVN projects to Git. So here are my notes on how to do it (just as much for my own reference as for anyone else).

One important note that Windows users (like me) aren't going to like to hear: since Git is heavily Linux-oriented and all the needed scripts are Linux shell scripts, you'll have to do this on a Linux system. (The "Git Bash" that comes with Git on Windows might be sufficient, but I haven't tried it.) If you don't have a Linux machine, and Git Bash gives you problems, I recommend installing Linux into a VM using Oracle's free VirtualBox.

Also, you will need at least Git v1.7.7 (and also git-svn). Anything older than v1.7.7 lacks the --preserve-empty-dirs switch we'll be using. You can check your version of Git with:

$git --version
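If you'd rather let the shell do the comparison, here's a minimal sketch. The helper name version_at_least is my own invention, and it assumes your sort supports -V (the one in GNU coreutils does):

```shell
#!/bin/sh
# version_at_least REQUIRED ACTUAL
# Succeeds when ACTUAL is the same as, or newer than, REQUIRED.
# Assumes a "sort" that understands -V (version sort), e.g. GNU coreutils.
version_at_least() {
    required=$1
    actual=$2
    lowest=$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

# "git --version" prints something like "git version 2.39.2";
# the third word is the version number.
if command -v git >/dev/null 2>&1; then
    installed=$(git --version | awk '{print $3}')
    if version_at_least 1.7.7 "$installed"; then
        echo "Git $installed is new enough."
    else
        echo "Git $installed is too old; 1.7.7 or later is needed."
    fi
fi
```

This only checks Git itself. Whether git-svn is installed is a separate question, since many distributions package it separately.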

If you need to upgrade it, and you're on a system that uses apt-get, remember: with apt-get, you upgrade a single program with the install command, not the upgrade command. I.e.:

$sudo apt-get install git git-svn

1. Copy SVN Repo to Local System

We'll be creating a lot of files and directories, so we should work in a clean directory:

$mkdir my-proj-convert-vcs
$cd my-proj-convert-vcs

The SVN repository needs to be copied to your local system if it isn't already there. If your only way of accessing the repo is through SVN itself, you can do it like this:

$svnadmin create my-local-svn-repo
$cd my-local-svn-repo
$echo '#!/bin/sh' > hooks/pre-revprop-change
$chmod +x hooks/pre-revprop-change
$svnsync init file://`pwd` https://url_to_svn_repo
$svnsync sync file://`pwd`
$cd ..

That svnsync sync... command may take a while as it downloads each revision in order.

2. Prune the Repo

If you don't have any big binary files (or anything else) that you want pruned out of the repo, you can skip this step.

First, dump the SVN repo:

$svnadmin dump my-local-svn-repo > my-local-svn-repo.dump
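If you're not sure which paths are worth pruning, the dump file itself can give you a hint. Each file node in a dump has a Node-path: header and a Text-content-length: header, so a rough sketch like the one below lists the biggest entries. The function name largest_nodes is made up for this example, and since the dump contains raw file contents, a stray line inside a file could in theory look like a header, so treat the output as a guide rather than gospel:

```shell
#!/bin/sh
# largest_nodes DUMPFILE [COUNT]
# Print the COUNT (default 10) largest file nodes in an SVN dump,
# as "size-in-bytes<TAB>path", biggest first.
largest_nodes() {
    # -a makes grep treat the (partly binary) dump as text.
    grep -a -E '^(Node-path|Text-content-length): ' "$1" |
    awk '/^Node-path: /           { path = substr($0, 12) }
         /^Text-content-length: / { print $2 "\t" path }' |
    sort -rn |
    head -n "${2:-10}"
}

# Only runs if the dump from the previous command exists.
if [ -f my-local-svn-repo.dump ]; then
    largest_nodes my-local-svn-repo.dump 20
fi
```

A path shows up once for every revision that changed it, so a binary that was committed many times will appear many times. That's usually exactly the sort of thing you want to prune.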

Subversion has an official svndumpfilter tool for removing content from a dumped repo, but it's known to be crap. It didn't even work at all for me. Instead, you should use the vastly superior svndumpsanitizer.

Download, extract and compile svndumpsanitizer:

$wget http://miria.linuxmaniac.net/svndumpsanitizer/svndumpsanitizer-0.8.4.tar.bz2
$tar xvjf svndumpsanitizer-0.8.4.tar.bz2
$gcc svndumpsanitizer-0.8.4/svndumpsanitizer.c -o svndumpsanitizer

Now, create a little script to run svndumpsanitizer. Open it in whichever editor suits your system (KDE-based, GNOME-based, or text-only):

$kate prune-repo.sh &
or
$gedit prune-repo.sh &
or
$pico prune-repo.sh

Enter something like this (note that svndumpsanitizer doesn't support wildcards):

#!/bin/sh
./svndumpsanitizer --infile my-local-svn-repo.dump --outfile my-pruned-svn-repo.dump \
    --exclude trunk/bin/myApp1 \
    --exclude trunk/bin/myApp1.exe \
    --exclude trunk/bin/myApp2 \
    --exclude trunk/bin/myApp2.exe \
    --exclude branches/fooBranch/myApp1 \
    --exclude branches/fooBranch/myApp1.exe \
    --exclude branches/fooBranch/myApp2 \
    --exclude branches/fooBranch/myApp2.exe

Save that, and then back at the command prompt, run it:

$chmod +x prune-repo.sh
$./prune-repo.sh

Double-check that it actually pruned the files by comparing the file sizes and making sure the pruned version is indeed smaller:

$ls -l
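If you don't feel like eyeballing the ls output, a small sketch like this does the comparison for you. The function name is_smaller is just something I made up for illustration; the file names are the ones used above:

```shell
#!/bin/sh
# is_smaller PRUNED ORIGINAL
# Succeeds when PRUNED has fewer bytes than ORIGINAL.
is_smaller() {
    pruned_size=$(wc -c < "$1")
    original_size=$(wc -c < "$2")
    [ "$pruned_size" -lt "$original_size" ]
}

# Only runs if both dump files from the steps above exist.
if [ -f my-local-svn-repo.dump ] && [ -f my-pruned-svn-repo.dump ]; then
    if is_smaller my-pruned-svn-repo.dump my-local-svn-repo.dump; then
        echo "OK: the pruned dump is smaller."
    else
        echo "Hmm: the pruned dump is not smaller. Check your --exclude paths."
    fi
fi
```

A smaller dump only tells you that something was removed, not that every path you listed was matched, so it's still worth a quick look at the result.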

Now, we create our newly-pruned SVN repo:

$svnadmin create my-pruned-svn-repo
$svnadmin load --ignore-uuid my-pruned-svn-repo < my-pruned-svn-repo.dump

That last command may take a while. It creates a new SVN repository one commit at a time.

3. Convert the Authors

Make an empty checkout of any of your project's SVN repos. The original SVN repo works just as well as any:

$svn co --depth empty https://url_to_svn_repo my-working-copy

Create this file and name it svn-authors.sh:

#!/usr/bin/env bash
svn log -q | \
awk -F '|' \
'/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | \
sort -u

Make it executable, and run it on your checked out working copy:

$chmod +x svn-authors.sh
$cd my-working-copy
$../svn-authors.sh > ../my-repo-authors.txt
$cd ..

The file my-repo-authors.txt now contains a list of all the authors who have committed to the repo. It looks like this:

User1 = User1 <User1>
User2 = User2 <User2>
User3 = User3 <User3>

Edit that file, changing the right-hand side to the user's name and email for Git, in the form Full Name <email address>. Don't change the left-hand side; those are the SVN user names.
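It's easy to miss an entry when the list is long. As a rough sanity check, the sketch below prints any line that doesn't look like "svnuser = Some Name <something@something>". The function name unconverted_authors is hypothetical, and the pattern is deliberately loose: it only checks for an @ between the angle brackets and doesn't validate the address.

```shell
#!/bin/sh
# unconverted_authors FILE
# Print every line of FILE that is NOT of the form
#   svnuser = Full Name <user@host>
# Succeeds (exit status 0) when at least one such line was found.
unconverted_authors() {
    grep -v -E '^[^=]+ = .+ <[^<>@[:space:]]+@[^<>@[:space:]]+>$' "$1"
}

# Only runs if the authors file from this step exists.
if [ -f my-repo-authors.txt ]; then
    if unconverted_authors my-repo-authors.txt; then
        echo "The lines above still need a real name and email."
    else
        echo "All authors look converted."
    fi
fi
```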

4. Fix git-svn

This part is a bit of an annoyance. From v1.7.7 onward, git-svn has a --preserve-empty-dirs switch. Problem is, the damn thing's broken. If you try to use it as-is, the whole operation will likely just fail partway through. It has to be fixed.

First, find your git-svn file:

$find / 2> /dev/null | grep git-svn

For me, it was at /usr/libexec/git-core/git-svn. Open it in your favorite editor:

$sudo [your favorite editor] /path/to/git-svn

Now, in this git-svn file, search for the text die "Failed to strip path (it should be somewhere near line 4583). Change that die to print and save. Your git-svn is now fixed.
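If you'd rather not hand-edit a system file, the same change can be sketched with sed, writing to a separate copy so you can inspect it first. The function name patch_git_svn and the output file name are my own; this also assumes the die "Failed to strip path text appears in your git-svn exactly as described above, which may not hold for every Git version:

```shell
#!/bin/sh
# patch_git_svn ORIGINAL PATCHED
# Write a copy of ORIGINAL to PATCHED in which
#   die "Failed to strip path ...
# has become
#   print "Failed to strip path ...
# Every other line, including other die calls, is left untouched.
patch_git_svn() {
    sed 's/die "Failed to strip path/print "Failed to strip path/' "$1" > "$2"
}

# Only runs if git-svn is where it was on my system; adjust the path to yours.
if [ -f /usr/libexec/git-core/git-svn ]; then
    patch_git_svn /usr/libexec/git-core/git-svn ./git-svn.patched
    diff /usr/libexec/git-core/git-svn ./git-svn.patched
fi
```

If the diff shows exactly the one changed line, copy the patched file over the original with sudo. If it shows nothing at all, the text wasn't found, and you'll need to go looking for it by hand.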

5. Convert to Git

As you may have already guessed, we're going to use git-svn. For very large repos (e.g., ten or so thousand commits, hundreds of branches/tags, and thousands of files) git-svn has been known to take forever and then crap out. Allegedly, such repos can be converted quickly with svn-fe and git-fast-import, but good luck actually figuring out how to do it without screwing up your branches, tags, and empty dirs. Personally, I just gave up. This git-svn method may not be suitable for such huge repos, but at least it's actually feasible for mere mortals.

The exact flags to use depend on the structure of your SVN repo. If your repo uses the standard SVN trunk/branches/tags layout, then the proper command is:

$git svn clone file://`pwd`/my-pruned-svn-repo --preserve-empty-dirs \
    --placeholder-filename=.stupidgit --authors-file=my-repo-authors.txt \
    --stdlayout my-temp-git-repo

The traditional name for the empty-directory-preserving placeholder file is .gitignore (and that's the default), but I think .stupidgit is much more appropriate (and satisfying).

Note that the above command is equivalent to:

$git svn clone file://`pwd`/my-pruned-svn-repo --preserve-empty-dirs \
    --placeholder-filename=.stupidgit --authors-file=my-repo-authors.txt \
    --trunk=trunk --branches=branches --tags=tags \
    my-temp-git-repo

So if your SVN repo uses a non-standard layout for trunk/branches/tags, you handle it like this:

$git svn clone file://`pwd`/my-pruned-svn-repo --preserve-empty-dirs \
    --placeholder-filename=.stupidgit --authors-file=my-repo-authors.txt \
    --trunk=whatever/trunk/path --branches=whatever/branches/path \
    --tags=whatever/tags/path my-temp-git-repo

Even though we now have a Git repo, we're still not done yet.

6. Clean Up the Mess Git Left Behind

First, we'll convert the ignore list, since Git didn't bother to do that automatically:

$cd my-temp-git-repo
$git svn show-ignore > .gitignore
$git add .gitignore
$git commit -m 'Convert svn:ignore properties to .gitignore.'

Even though Git was able to insert dummy files to preserve your empty directories, it was still too dumb to know when to actually get rid of them. So now you likely have a bunch of useless old directories that had already been deleted in SVN which Git wasn't intelligent enough to mimic the removal of. These directories are being held in existence by the .stupidgit placeholder files. You may also have unneeded .stupidgit files in directories that already have other files. So while some of your .stupidgit files are holding legitimate empty directories in existence, we need to remove the rest of them from version control. For each of these useless .stupidgit files, run:

$git rm path/to/useless/placeholder/.stupidgit
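Hunting these down by hand gets old fast. Here's a rough sketch that finds one of the two kinds: placeholders sitting in directories that already contain something else. The function name redundant_placeholders is hypothetical, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find both do). It can't tell you which directories had been deleted in SVN; those you still have to judge for yourself.

```shell
#!/bin/sh
# redundant_placeholders [DIR]
# Print every .stupidgit file under DIR (default: the current directory)
# whose directory also contains at least one other file or subdirectory.
# Git's own .git directory is skipped.
redundant_placeholders() {
    find "${1:-.}" -name .git -prune -o -type f -name .stupidgit -print |
    while IFS= read -r placeholder; do
        dir=$(dirname "$placeholder")
        # Count everything directly inside the directory except the placeholder.
        others=$(find "$dir" -mindepth 1 -maxdepth 1 ! -name .stupidgit | wc -l)
        if [ "$others" -gt 0 ]; then
            printf '%s\n' "$placeholder"
        fi
    done
}
```

Run it from inside my-temp-git-repo as redundant_placeholders . and look over the list before feeding any of it to git rm.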

Once you've gotten them all (but none of the ones you legitimately want to keep!), commit the changes:

$git commit -m 'Remove superfluous .stupidgit files.'

Now we'll create a new bare Git repository (ie, a repository without a working copy):

$cd ..
$git init --bare my-bare-git-repo.git
$cd my-bare-git-repo.git
$git symbolic-ref HEAD refs/heads/trunk
$cd ../my-temp-git-repo
$git remote add bare ../my-bare-git-repo.git
$git config remote.bare.push 'refs/remotes/*:refs/heads/*'
$git push bare
$cd ../my-bare-git-repo.git
$git branch -m trunk master
$cd ..

At this point, you can delete the temporary Git repo if you want:

$rm -rf my-temp-git-repo

Create a script to convert the tags from Git branches into actual Git tags:

$[your favorite editor] clean-tags.sh

Enter the following:

#!/bin/sh
git for-each-ref --format='%(refname)' refs/heads/tags |
cut -d / -f 4 |
while read ref
do
    git tag "$ref" "refs/heads/tags/$ref";
    git branch -D "tags/$ref";
done

Save, then exit back to the command line and run it on the bare Git repo:

$chmod +x clean-tags.sh
$cd my-bare-git-repo.git
$../clean-tags.sh
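In case the cut stage of clean-tags.sh looks like magic: it splits each ref name on / and keeps the fourth piece, which is the tag name. The sketch below runs the same cut on a few made-up ref names (sample_refs and the names in it are purely illustrative), and also shows a variant, -f 4-, that keeps everything from the fourth piece onward:

```shell
#!/bin/sh
# Made-up ref names, in the form that
#   git for-each-ref --format='%(refname)' refs/heads/tags
# prints them.
sample_refs() {
    printf '%s\n' \
        refs/heads/tags/v1.0 \
        refs/heads/tags/v1.1 \
        refs/heads/tags/release/2.0
}

# Field 4 only, as in clean-tags.sh.
# Prints: v1.0, v1.1, release
sample_refs | cut -d / -f 4

# Field 4 onward, which keeps names that contain a slash.
# Prints: v1.0, v1.1, release/2.0
sample_refs | cut -d / -f 4-
```

The difference only matters if one of your tag names contains a slash. If that's the case for your repo, the -f 4- form is probably the one you want in clean-tags.sh.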

Finally, you're done! You can copy your my-bare-git-repo.git to whatever computer you want, clone from it, push it to BitBucket, etc.

8 comments for "The Better "SVN -> Git" Guide"

  1. (Guest) greg
    2012-04-13 13:05

    Hey there,

    I'm facing some of the same issues you've had, in particular with the empty directories. It seems somehow they're created in all git branches, regardless of which svn branch/tag they were supposed to have been created in. Have you noticed that too ?

    Also, wouldn't you need to apply some of the clean up steps (adding .gitignore and removal of .stupidgit files) in branches rather than just the current (master?) one ?

    Cheers, thanks for the good write up anyway. I guess I'll also publish mine once I'm actually done …

  2. 2012-04-16 15:17

    To be honest, I've never really used SVN's branching, just the tagging. So no, I never noticed that. :( I have no idea offhand how to deal with that.

    "I guess I'll also publish mine once I'm actually done"

    Please do! And post a link to it here, I'd love to take a look.

  3. (Guest) Pelle
    2012-08-23 05:11

    Some really good stuff on this page, thanks!

    I have a repository which makes use of several svn externals. I would like to see some additional stuff on externals, git-svn doesn't do externals out of the box.

  4. 2013-06-27 08:58

    I used the steps you have outlined with great success. Thanks very much for taking the time to succinctly put it in a blog post.

    One little thing pertaining to script to fix the tags, it is missing the loop closing statement "done". The correct one is below:
    #!/bin/sh
    git for-each-ref --format='%(refname)' refs/heads/tags |
    cut -d / -f 4 |
    while read ref
    do
    git tag "$ref" "refs/heads/tags/$ref";
    git branch -D "tags/$ref";
    done

    If you could correct the blog post, it will immensely help people copy & pasting and following your instructions.

    Cheers
    Babu

  5. 2013-07-29 19:43

    @Babu Annamalai: Fixed, thanks!

  6. 2013-11-27 20:20

    Thanks for the amazing tutorial!

    I'm having a slight problem, running Ubuntu.

    After doing the first command in step 5, I get an error saying "Unable to open an ra_local session to URL". Any ideas why?

    Cheers,
    Opender

  7. 2013-12-19 02:31

    No idea, sorry. I'm far from being a Linux expert :( I'd recommend asking over at http://www.linuxquestions.org if you haven't already. They tend to have some very knowledgeable people there.

  8. 2014-01-05 23:21

    Thanks for the reply! I managed to fix it by using 'svnserve -d -r <repo location>', and then using svn://localhost as the git-svn URL.

    Note, when finished, you should also kill the service.

    ps -ef | grep svnserve will list the svn services. Find the service number of the appropriate service, and run 'sudo kill <service number>'.
