
Converting from Subversion to Git

Note: This tutorial is about completely replacing a server-side Subversion repository by a Git repository, for a workflow with a central Git repository. If you just want to use Git as a frontend to a Subversion repository, you are probably better off with the standard Git SVN documentation.

For a personal project I wanted to convert from Subversion to Git. There are a lot of tutorials out there. Why add another one? Well, I haven't found one that is generic and tells me everything I want to do. The most notable omission from most tutorials is what happens to your Subversion branches and tags. I want them to be properly converted, and I want to replace my server-side SVN repo with a Git repo, so that I have a central place somewhere that stores my work.

So, I'm recording the steps I need to take, as much for my own future reference as for the "public benefit". I'm by no means a Git expert (I hope to become one once I have converted a few of my projects to Git), so please bear with me and forgive any mistakes I might make (or rather, please point them out in the comments).

Importing the subversion history

We'll assume you have some version of Git on your system. I'm using msysgit on Windows, but this should work equally well on any install. Older versions of Git may not accept a subcommand after the single git entry-point command, in which case "git svn" becomes "git-svn". I do believe that would indicate your Git install is pretty old, though (correct me if I'm wrong).

I started out with these commands to create a home for the imported SVN repo. The directory is called project_git here; in my own conversion it was mscn_temp (mscn for Mini Seven Club Nederland, the project I'm converting, and temp because we're going to throw it away later: we'll move the result to the server and clone that for our working repo), which is the name that shows up in the clone command further down.

$ mkdir project_git
$ cd project_git

Now it's time to initialize this repo as a Subversion-enabled repository (the URL is the location of the project on my local network):

$ git svn init svn://example.com/project --stdlayout --no-metadata
Initialized empty Git repository in c:/data/Project/project_git/.git/

The --stdlayout is there to tell Git that our Subversion repository has the standard layout, i.e. trunk, branches and tags directories at the root level of the repository. This will make sure that these are imported correctly. If omitted, you'll get a Git repository with a single branch, containing your entire SVN structure. You don't want that.

The --no-metadata option tells Git not to add a git-svn-id line (containing the original SVN URL and revision) to every commit message. We won't need those, and they would only be legacy clutter, since this is a one-way conversion.

Before we continue with the actual import, there's one more step. We need to prepare a file that tells the import process how to translate usernames in the SVN repository (a simple username like "jdoe") to the format Git uses (a more elaborate name of the form "John Doe <j.doe@example.com>"). Now, this repo has been through a lot already; it actually started out as a CVS repository and has been hosted at different locations, so even though I am the only person who's ever worked on it, several usernames have been used to check in stuff.

Update 30/11/2012: When Googling for this again to see whether there are any more good procedures, I found a good description by John Albin Wilkins, who displays some strong command-line fu to create some scaffolding for this file. Execute the following from your terminal (Mac/Linux/...) to get a file listing all usernames associated with revisions in the SVN repo:

$ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > users.txt

Note: take care when copy-pasting this; depending on where you copy it from, the angle brackets may have been turned into HTML entities. Anything like &lt; must be replaced with < and anything like &gt; with >.

It looks like this command only looks at the history for the current location in your working copy (the current branch, unless you are at the root), so you may want to check whether there are any authors exclusive to a branch you did not run this command on.
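
To see what the one-liner produces without touching a real repository, here is a minimal sketch that feeds it a made-up excerpt in the shape of `svn log -q` output. The sample log and the function name `authors_from_log` are assumptions for illustration; only the awk program itself comes from the command above.

```shell
#!/bin/sh
# authors_from_log (hypothetical helper): read `svn log -q` output on stdin
# and print one "user = user <user>" scaffold line per distinct author.
authors_from_log() {
    awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' |
        LC_ALL=C sort -u
}

# Made-up sample in the shape of `svn log -q` output.
sample_log='------------------------------------------------------------------------
r3 | eelke | 2009-11-30 21:27:00 +0100 (Mon, 30 Nov 2009)
------------------------------------------------------------------------
r2 | eblok | 2005-12-25 00:01:00 +0100 (Sun, 25 Dec 2005)
------------------------------------------------------------------------
r1 | eelke | 2005-12-24 22:59:00 +0100 (Sat, 24 Dec 2005)
------------------------------------------------------------------------'

# Prints one scaffold line for eblok and one for eelke.
printf '%s\n' "$sample_log" | authors_from_log
```

Against a real repository you would pipe `svn log -q` into the function instead of the sample. Pointing `svn log -q` at the repository root URL (svn://example.com/project in this tutorial) rather than at a working copy should pick up authors from all branches and tags in one go.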

Edit the file to add actual names and email addresses, like so:

eelke = Eelke Blok <eelke@example.com>
eblok = Eelke Blok <eelke@example.com>
Eelke = Eelke Blok <eelke@example.com>
Administrator = Eelke Blok <eelke@example.com>
cvsowner = Eelke Blok <eelke@example.com>
eelkecvs = Eelke Blok <eelke@example.com>
(no author) = Eelke Blok <eelke@example.com>

The last line I learned the hard way; apparently, the CVS-to-SVN conversion (way back when) produced commits without an author, which led Git to tell me (after issuing a fetch, see below):

Author: (no author) not defined in users.txt file

After a bit of fiddling, I found out that adding this line to the users.txt file let the process run through successfully (it was my second guess, after leaving the space before the = empty :)).

I've stored the file as users.txt in the directory where we're creating the git repo.
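
Because a missing author only surfaces once the fetch reaches the offending revision, it can save time to compare the two lists up front. The following is a sketch and not part of the original procedure: `missing_authors` is a made-up helper, the sample log is invented, and it assumes the "name = Full Name <email>" layout shown above.

```shell
#!/bin/sh
# missing_authors (hypothetical helper): print every SVN username found in
# `svn log -q` output (stdin) that has no "name = ..." line in the authors file.
# Usage: svn log -q | missing_authors users.txt
missing_authors() {
    authors_file=$1
    awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2}' |
        LC_ALL=C sort -u |
        while IFS= read -r name; do
            # Compare the text before " = " with the username, as a whole string.
            awk -F ' = ' -v want="$name" \
                '$1 == want {found = 1} END {exit !found}' "$authors_file" ||
                printf '%s\n' "$name"
        done
}

# Demo with made-up data: SYSTEM has no mapping yet.
demo_dir=$(mktemp -d)
cat > "$demo_dir/users.txt" <<'EOF'
eelke = Eelke Blok <eelke@example.com>
(no author) = Eelke Blok <eelke@example.com>
EOF

sample_log='------------------------------------------------------------------------
r3 | eelke | 2009-11-30 21:27:00 +0100 (Mon, 30 Nov 2009)
------------------------------------------------------------------------
r2 | SYSTEM | 2005-12-25 00:01:00 +0100 (Sun, 25 Dec 2005)
------------------------------------------------------------------------
r1 | (no author) | 2005-12-24 22:59:00 +0100 (Sat, 24 Dec 2005)
------------------------------------------------------------------------'

# Prints the usernames that still need a line in users.txt.
printf '%s\n' "$sample_log" | missing_authors "$demo_dir/users.txt"
```

An empty result means every username in the log already has an entry.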

Next, let's tell Git about this file:

$ git config svn.authorsfile users.txt

All set, let's see how this works out.

$ git svn fetch

Git will start importing your SVN revisions one by one. Should you also run into the problem that your SVN repo contains an author you didn't specify in the authors file, don't worry. Simply add the appropriate line and rerun the fetch; it will continue where it left off.

You might notice that Git is importing your SVN branches into a namespace called refs/remotes. We don't like that, because we plan on using this repository as the central (remote) repository, where these should just be ordinary branches. We'll fix this later.

Anyway, get yourself a coffee/tea/soda/beer/cocktail, depending on the time of day and your preference. Heck, the time of day may become appropriate for whatever you like, this might take a while.

Cleaning the SVN stuff out

Ready? Didn't have too many cocktails so your thoughts are still reasonably coherent? OK, so now we have a plain old SVN-enabled Git repository. You can have a look at the branches that were created as such:

$ git branch -a
* master
 remotes/Pre-version
 remotes/banner
 remotes/bannermove
 remotes/dev
 remotes/devel
 remotes/eelkeblok
 remotes/ledenvoordeel
 remotes/mscn4
 remotes/register
 remotes/registrations
 remotes/remove_shop
 remotes/tags/demo
 remotes/tags/merge_dev_to_stable
 remotes/tags/online-20051224-2259
 remotes/tags/online-20051224-2311
 remotes/tags/online-20051224-2329
 remotes/tags/online-20051225-0001
[...]
 remotes/tags/online-20091116-2114
 remotes/tags/online-20091130-2127
 remotes/tags/start
 remotes/templateupdate-2.0.18
 remotes/trunk
 remotes/upgrade_forum_3_0
 remotes/wiki

You'll notice that the Subversion "tags" (which in Subversion aren't really tags at all, they're just branches without any subsequent revisions) were converted into Git branches within their own namespace. This is what you might use to work with a project that has its main repository still on SVN; you can use Git on your workstation and have most of its benefits, while the project itself remains on SVN. Neat. But not what we came here to do.

We basically want to do two things:

  • Move the generated branches out of the remotes space
  • Turn the SVN "tags" into proper Git tags

(The following steps were taken directly from reference 5).

First, let's convert the tags. Note: It has been reported that these steps may remove extra commits made in a tag on SVN (in SVN, tags really are no different from branches, except that we agreed to not commit to them, by convention). Proceed with caution.
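
To get a feel for that risk before converting, you could list the tag branches whose tip commit actually changes files. The conversion script below tags the parent of each tip, so those are the tags where content would be left behind. This is only a sketch: `list_modified_tags` is a made-up helper, the throwaway repository merely imitates what git svn leaves under refs/remotes/tags, and the check looks at the tip commit only.

```shell
#!/bin/sh
# list_modified_tags (hypothetical helper): print the name of every
# refs/remotes/tags/* branch whose tip commit changes files compared to its
# parent. Tagging "tip^" would leave those changes out of the tag.
list_modified_tags() {
    git for-each-ref --format='%(refname)' refs/remotes/tags |
        while IFS= read -r ref; do
            if ! git diff --quiet "$ref^" "$ref"; then
                printf '%s\n' "${ref#refs/remotes/tags/}"
            fi
        done
}

# Throwaway repository that imitates the refs git svn creates.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "version 1" > file.txt
git add file.txt
git commit -q -m "r1: work on trunk"

# An SVN tag normally shows up as a commit that changes no files.
git commit -q --allow-empty -m "r2: create tag 'clean'"
git update-ref refs/remotes/tags/clean HEAD

# Someone committed a fix directly into a tag: the tip now changes a file.
echo "version 1, hotfix" > file.txt
git commit -q -am "r3: fix made inside tag 'patched'"
git update-ref refs/remotes/tags/patched HEAD

# Lists only the tag whose tip carries a real change.
list_modified_tags
```

For any tag this reports, you may prefer to tag the tip itself (without the trailing ^) by hand.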

Create a script with the following contents (Windows users should be able to execute this through Git Bash, which was installed along with msysgit):

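#!/bin/sh
# Turn every remote branch under tags/ into a real Git tag, then delete the branch.
for t in $(git branch -r | grep 'tags/' | sed 's_tags/__'); do
    # Tag the parent of the branch tip; the tip itself is normally just the SVN commit that created the tag.
    git tag "$t" "tags/$t^"
    git branch -d -r "tags/$t"
done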

Save the script e.g. as converttags.sh and execute it.

$ sh converttags.sh
Deleted remote branch tags/demo (was 5cfe8f1).
Deleted remote branch tags/merge_dev_to_stable (was 2d47421).
Deleted remote branch tags/online-20051224-2259 (was 5a50f9d).
Deleted remote branch tags/online-20051224-2311 (was c6aeda6).
Deleted remote branch tags/online-20051224-2329 (was 92c1ad6).
Deleted remote branch tags/online-20051225-0001 (was 3432051).
Deleted remote branch tags/online-20051225-0008 (was f574ea8).
[...]

That's our tags converted. Let's also remove SVN compatibility references:

$ git branch -d -r trunk
Deleted remote branch trunk (was 6ade13f).

$ git config --remove-section svn-remote.svn

$ rm -rf .git/svn .git/{logs/,}refs/remotes/svn/

And let's convert the remaining remote branches to local ones:

$ git config remote.origin.url .

$ git config --add remote.origin.fetch +refs/remotes/*:refs/heads/*

$ git fetch

All set. We now have a local Git repository with all the contents from our old SVN repository.
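
If the fetch trick looks like magic: pointing origin at "." makes the repository fetch from itself, and the refspec copies every ref under refs/remotes/ to a ref with the same name under refs/heads/. Here is a sketch of that in a throwaway repository; the branch name dev and the single commit are made up for the demonstration.

```shell
#!/bin/sh
# Throwaway repository with one commit.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "r1: initial import"

# Stand-in for a branch imported by git svn: it only exists under refs/remotes/.
git update-ref refs/remotes/dev HEAD

# The same three commands as above (refspec quoted so the shell leaves the * alone).
git config remote.origin.url .
git config --add remote.origin.fetch '+refs/remotes/*:refs/heads/*'
git fetch -q

# "dev" now shows up as an ordinary local branch.
git branch --list dev
```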

Getting a bare repository onto the server

We're only a few steps away from victory. We need to turn our repository into a bare repository, which means just the Git data, not the accompanying working copy. We'll do this by cloning our current repository:

$ cd ..
$ git clone --bare mscn_temp mscn.git

This will go superfast, presumably because Git can use hard links when both repositories are on the same file system.
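
Before uploading, it may be worth confirming that the clone really is bare and that the branches and tags came along. The sketch below does this on a made-up stand-in repository (one commit, a dev branch and a start tag); on your real conversion you would run only the last two commands against mscn.git.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1

# Made-up stand-in for the converted repository (mscn_temp in the text).
git init -q mscn_temp
git -C mscn_temp config user.name "Demo User"
git -C mscn_temp config user.email "demo@example.com"
git -C mscn_temp commit -q --allow-empty -m "r1: initial import"
git -C mscn_temp branch dev
git -C mscn_temp tag start

git clone -q --bare mscn_temp mscn.git

# A bare repository reports "true" here.
git -C mscn.git rev-parse --is-bare-repository

# Lists the branches and tags that made it into the bare clone.
git -C mscn.git for-each-ref --format='%(refname)' refs/heads refs/tags
```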

Upload your bare repository to wherever you would like it on your server, using whatever tool you normally use to upload things. For the next steps, it is important that you have SSH access to your server; if you don't, there are other ways to reach your Git repository, although chances are slim that you'll actually be able to set up any of them if you don't even get SSH access from your provider. Make sure that the user you are planning to use to connect to the remote repository has read-write access to the repository.

Cloning the remote repository to create a working repository

Now we're ready to test the remote repository by cloning it to create our final working repository on our local system:

$ git clone eelke@myserver.net:/data/git/mscn.git
Initialized empty Git repository in c:/Data/Project/mscn/.git/
remote: Counting objects: 24568, done.
remote: Compressing objects: 100% (13012/13012), done.
remote: Total 24568 (delta 11682), reused 23571 (delta 10964)
Receiving objects: 100% (24568/24568), 52.27 MiB | 1022 KiB/s, done.
Resolving deltas: 100% (11682/11682), done.
Checking out files: 100% (1675/1675), done.

So, there you are. You have all existing history of your project on your server in a central Git repository and a local working repository to start Gittin' with it.

References

  1. Cleanly Migrate Your Subversion Repository To a GIT Repository, John Madox
  2. Converting Subversion repositories to Git, Redline's Weblog
  3. git-svn(1) Manual Page
  4. How to convert from Subversion to Git, Paul Dowman
  5. Convert a SVN Alioth repository to Git
  6. Converting a Subversion repository to Git, John Albin Wilkins

Comments

Hello,

Great tutorial! I am following it but I have encountered a problem with svn asking me for password each time it "changes to a tag" (while fetching).

e.g.
r7 = be190c07e15782279a357d6e5e7254ebc95e69d2 (refs/remotes/trunk)
Found possible branch point: svn+ssh://USER@svne1.XXXXX.XXXXX.com/svnroot/PROJECT/trunk => svn+ssh://USER@svne1.XXXXX.XXXXX.com/svnroot/PROJECT/tags/common-gen, 7
Found branch parent: (refs/remotes/tags/common-gen) be190c07e15782279a357d6e5e7254ebc95e69d2
Following parent with do_switch
USER@svne1.XXXXX.XXXXX.com's password:
Could not chdir to home directory /XXXXX/home/USER: No such file or directory
Successfully followed parent
USER@svne1.XXXXX.XXXXX.com's password:
Could not chdir to home directory /isource/home/USER: No such file or directory
r8 = c72e933e0341fcc1078a814e6c8a58fc9b98b329 (refs/remotes/tags/common-gen)

what can I do to prevent this? I have tried everything contained in here http://stackoverflow.com/questions/2899209/how-to-save-password-when-us… and http://stackoverflow.com/questions/2599281/cant-make-svn-store-password…

Hi, Thanks for the information.

How can I convert the SVN files which are linked through svn:externals to the GIT repository?
and

How can the svn:externals links be preserved in Git?

I'm not that familiar with svn externals, I certainly don't have much hands-on experience with them, so the below is mostly based on the quick glance in the svn book I just took.

If your externals are also moving to Git (or there are Git equivalents to the svn repositories), you might want to look into Git submodules or the subtree command. Submodules are most like svn externals (they are basically just a versioned reference to a commit in another Git repository), although there are also voices saying submodules are evil and you should use something like subtree (which results in the external stuff to be actually part of the main repository, which has both advantages and disadvantages).

If the externals need to remain on svn, your answer probably isn't going to be in Git. You could look into some sort of build tool that does the svn checkout as part of a build process, but you will need to make sure to add the checkout to your .gitignore, or Git will version the svn checkout, which is most probably not what you want.

If your externals are in fact internal to the current repository (something, I found in the svn book, is also an option), there's not really an equivalent in Git (Git is fundamentally different from svn in the way it handles directory trees, which makes "referring to another part of the repository" not really a thing in Git). If you're in a UNIXy environment, and so is everyone else who needs to work with the project, you could look into symbolic links. I'm not sure if there is an equivalent to symbolic links in Windows that would let itself be versioned (and allows relative references, which is what you would need). Or, maybe things are set up in such a way that it actually makes sense to split the single SVN repository into several Git repositories, which lands you in the scenario where your externals have their own Git repo again.

That's a really clear and useful tutorial that, in normal situations, worked
well for me.

I have a problem with one of my svn projects that I started with a
non-standard layout and then moved to the standard "trunk branches tags".

Applying your procedure recovers only the history "before" the standardization
(that's obvious).

I wonder how to recover the whole log, considering the one before
standardization as part of the trunk.

Maybe I need to apply the procedure up to the revision before reorganization
without the --stdlayout flag and then "merge" it with the other? I don't see
any option to pass to init to tell "git svn" to fetch only up to a given
revision ...

Any suggestion?

Alberto

I don't have a pre-baked solution, but I'm thinking along the lines of doing two conversions and then trying to combine them afterwards. It might be as easy as doing a rebase of the second conversion on the first (you would need to get both histories in the same repository, e.g. by defining one as a remote on the other). Or, you might find you need to attach a subtree of one repository into the other; Git has a subtree command for that which you might want to check out.

But I think it could look something like this:

- Convert with svn root as Git master (conversion 1)
- Convert with svn trunk as Git master (conversion 2)
- Get rid of everything after the move to standard svn layout in conversion 2.
- Define conversion 2 as a remote in conversion 1.
- Rebase master of conversion 2 onto master of conversion 1.

This will leave your commits with a committer date of the moment you do the rebase, but the original time should be preserved in the author date.

I do not get any errors, but my GIT repository appears to be empty after following these instructions. I wonder what I am doing wrong?

One thing I can think of off the top of my head: is your SVN repository structured in the standard way, with trunk, branches and tags folders at the base level? If not, you will have to remove the --stdlayout option and instead tell git-svn about the structure of your repo with the separate --trunk, --tags and --branches arguments (see e.g. https://www.kernel.org/pub/software/scm/git/docs/git-svn.html). If you don't have any such structure (you effectively have just a single line of development in the root of your svn repository) you can leave out these options completely.

I am getting this error when I try to run "git svn fetch". I have been racking my brain trying to fix it; could you please help me with this?
"Author: SYSTEM not defined in Authors.txt file"

Sorry, I missed your legit reply in between all the spam. Have you tried simply making an entry in your authors.txt file with "SYSTEM =" and then a suitable author email (e.g. yourself)?

I got all the names of the authors, but I can see an "sss" in the authors list that I don't recognize.
For that sss, can I just map it like this?

owner = Eelke Blok
eelkecvs = Eelke Blok
sss = Eelke Blok

For sss, can I give my own name and email address? I am also using svn2git,
following this (https://github.com/nirvdrum/svn2git).

Am I right? If I am wrong, please help me out. Thanks, and happy weekend, guys.
Awaiting your reply/help.

thanks.

Yes, that doesn't sound unreasonable. Especially if you can't remember where "sss" came from.

Thank you for this great article with detailed instructions. I have successfully migrated 5 repositories from SVN to Git. Nice!
