Engine of a Mini Seven racing Mini
Photo: Copyright © 2009 Eelke Blok

Converting from Subversion to Git

Note: This tutorial is about completely replacing a server-side Subversion repository by a Git repository, for a workflow with a central Git repository. If you just want to use Git as a frontend to a Subversion repository, you are probably better off with the standard Git SVN documentation.

For a personal project I wanted to convert from Subversion to Git. There are a lot of tutorials out there. Why add another one? Well, I haven't found one that is generic and tells me everything I want to do. The most notable omission from most tutorials is what happens to your Subversion branches and tags. I want them to be properly converted, and I want to replace my server-side SVN repo with a Git repo, so that I have a central place somewhere that stores my work.

So, I'm recording the steps I need to take, as much for my own future reference as for the "public benefit". I'm by no means a Git expert (I hope to become one once I have converted a few of my projects to Git), so please bear with me and forgive any mistakes I might make (or rather, please point them out in the comments).

Importing the subversion history

We'll assume you have some version of Git on your system. I'm using msysgit on Windows, but this should work equally well on any install. Older versions of Git may require you to not use the single git entrypoint command, but have you combine it with the subcommand, so "git svn" becomes "git-svn". I do believe this is an indication your Git install is pretty old, though (correct me if I'm wrong).

I started out with these commands to create a home for the imported SVN repo. The directory name combines mscn, for Mini Seven Club Nederland (the project I'm converting), with temp, because this repository is only temporary: we're going to move it to the server and clone that for our working repo later on.

$ mkdir mscn_temp
$ cd mscn_temp

Now it's time to initialize this repo as a Subversion-enabled repository (the URL is the location of the project on my local network):

$ git svn init svn://example.com/project --stdlayout --no-metadata
Initialized empty Git repository in c:/data/Project/mscn_temp/.git/

The --stdlayout is there to tell Git that our Subversion repository has the standard layout, i.e. trunk, branches and tags directories at the root level of the repository. This will make sure that these are imported correctly. If omitted, you'll get a Git repository with a single branch, containing your entire SVN structure. You don't want that.

The --no-metadata tells Git not to record the original SVN URL locations, which we won't need and will only be legacy clutter, since this is a one way conversion.
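For reference, --stdlayout is shorthand for naming the three directories explicitly with --trunk, --branches and --tags. The sketch below only prints the resulting command line so you can inspect it before running it; svn_init_cmd is a hypothetical helper of my own, and the second layout is made up for illustration.

```shell
# Sketch: build the `git svn init` command line for an arbitrary layout.
# svn_init_cmd is a hypothetical helper, not part of Git.
svn_init_cmd() {
    url=$1; trunk=$2; branches=$3; tags=$4
    printf 'git svn init %s --trunk=%s --branches=%s --tags=%s --no-metadata\n' \
        "$url" "$trunk" "$branches" "$tags"
}

# The standard layout, spelled out (same effect as --stdlayout):
svn_init_cmd svn://example.com/project trunk branches tags

# A made-up non-standard layout:
svn_init_cmd svn://example.com/project main/trunk main/branches main/releases
```

If your repository has no trunk/branches/tags structure at all, you would leave these options out altogether.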

Before we continue with the actual import, there's one more step. We need to prepare a file that describes to the import process how to translate usernames in the SVN repository (a simple username like "jdoe") to the format Git uses (a more elaborate name of the form "John Doe <j.doe@example.com>"). Now, this repo has been through a lot already; it actually started out as a CVS repository and has been hosted at different locations, so even though I am the only person who's ever worked on it, several usernames have been used to check in stuff.

Update 30/11/2012: When Googling for this again to see whether there are any more good procedures, I found a good description by John Albin Wilkins, who displays some strong command-line fu to create some scaffolding for this file. Execute the following from your terminal (Mac/Linux/...), inside an SVN working copy of the project, to get a file listing all usernames associated with revisions in the SVN repo:

$ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > users.txt

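Note: take care when copy-pasting this. Depending on where you copy it from, the angle brackets may come through as HTML entities (an ampersand, then lt or gt, then a semicolon). Everything from the & up to and including the ; will need to be replaced with < for lt and > for gt.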

It looks like this command only looks at the history for the current location in your working copy (the current branch, if you are at its root), so you may want to check whether there are any authors exclusive to a branch you did not run this command on.
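To see what the one-liner does without touching a real repository, here is a minimal sketch that wraps the same awk logic in a function and feeds it a made-up log. Both authors_scaffold and sample_log are hypothetical names of my own, and the three revisions are invented.

```shell
# Turn `svn log -q` output (on stdin) into scaffolding for users.txt.
# authors_scaffold is a hypothetical helper wrapping the awk one-liner.
authors_scaffold() {
    awk -F '|' '/^r[0-9]+ / {
        name = $2
        sub(/^ +/, "", name)    # strip the space after the first "|"
        sub(/ +$/, "", name)    # strip the space before the second "|"
        print name " = " name " <" name ">"
    }' | LC_ALL=C sort -u       # fixed locale, so the order is predictable
}

# Made-up sample in the format `svn log -q` prints.
sample_log() {
cat <<'EOF'
------------------------------------------------------------------------
r3 | eelke | 2009-11-30 21:27:02 +0100 (Mon, 30 Nov 2009)
------------------------------------------------------------------------
r2 | (no author) | 2005-12-24 22:59:11 +0100 (Sat, 24 Dec 2005)
------------------------------------------------------------------------
r1 | eblok | 2005-12-24 22:11:40 +0100 (Sat, 24 Dec 2005)
------------------------------------------------------------------------
EOF
}

sample_log | authors_scaffold
```

On a real project you would pipe the log in instead, for example svn log -q svn://example.com/project | authors_scaffold > users.txt. Pointing svn log at the URL of the project root rather than at a working copy of a single branch should pick up authors from all branches and tags in one go.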

Edit the file to add actual names and email addresses, like so:

eelke = Eelke Blok <eelke@example.com>
eblok = Eelke Blok <eelke@example.com>
Eelke = Eelke Blok <eelke@example.com>
Administrator = Eelke Blok <eelke@example.com>
cvsowner = Eelke Blok <eelke@example.com>
eelkecvs = Eelke Blok <eelke@example.com>
(no author) = Eelke Blok <eelke@example.com>

The last line I've learned the hard way; apparently, the CVS to SVN conversion process (way back when) has resulted in commits without an author, resulting in Git telling me (after issuing a fetch, see further):

Author: (no author) not defined in users.txt file

After a bit of fiddling, I found out that adding this line to the users.txt file let the process run through successfully (it was my second guess, after leaving the space before the = empty :)).

I've stored the file as users.txt in the directory where we're creating the git repo.
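Since a missing author stops the import halfway, it can pay to check the file before starting. The following is a minimal sketch, assuming one username per line on stdin; missing_authors is a hypothetical helper of my own, and the demo data is made up.

```shell
# missing_authors is a hypothetical helper, not part of Git or Subversion.
# stdin: SVN usernames, one per line. $1: the authors file.
# Prints every username that has no "name = ..." line in the authors file.
missing_authors() {
    awk -v file="$1" '
        BEGIN {
            while ((getline line < file) > 0) {
                n = index(line, "=")
                if (n == 0) continue
                name = substr(line, 1, n - 1)
                sub(/[ \t]+$/, "", name)     # drop whitespace before the "="
                known[name] = 1
            }
        }
        !($0 in known)
    '
}

# Made-up demo data: two known usernames and one that was forgotten.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
eelke = Eelke Blok <eelke@example.com>
(no author) = Eelke Blok <eelke@example.com>
EOF
printf '%s\n' 'eelke' '(no author)' 'cvsowner' | missing_authors "$demo_file"
rm -f "$demo_file"
```

The demo prints the one username without an entry (cvsowner). Against a real repository, the usernames could come from something like svn log -q svn://example.com/project | awk -F '|' '/^r/ {gsub(/^ +| +$/, "", $2); print $2}' | sort -u.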

Next, let's tell Git about this file:

$ git config svn.authorsfile users.txt

All set, let's see how this works out.

$ git svn fetch

Git will start importing your SVN revisions one by one. Should you also run into the problem that your SVN repo contains an author you didn't specify in the authors file, don't worry. Simply add the appropriate line and rerun the fetch; it will continue where it left off.

You might notice that Git is importing your SVN branches into a namespace called refs/remotes. We don't like that, because we plan on using this repository as a remote repository, where these should just be ordinary branches. We'll fix this later.

Anyway, get yourself a coffee/tea/soda/beer/cocktail, depending on the time of day and your preference. Heck, the time of day may become appropriate for whatever you like, this might take a while.

Cleaning the SVN stuff out

Ready? Didn't have too many cocktails so your thoughts are still reasonably coherent? OK, so now we have a plain old SVN-enabled Git repository. You can have a look at the branches that were created as such:

$ git branch -a
* master
 remotes/Pre-version
 remotes/banner
 remotes/bannermove
 remotes/dev
 remotes/devel
 remotes/eelkeblok
 remotes/ledenvoordeel
 remotes/mscn4
 remotes/register
 remotes/registrations
 remotes/remove_shop
 remotes/tags/demo
 remotes/tags/merge_dev_to_stable
 remotes/tags/online-20051224-2259
 remotes/tags/online-20051224-2311
 remotes/tags/online-20051224-2329
 remotes/tags/online-20051225-0001
[...]
 remotes/tags/online-20091116-2114
 remotes/tags/online-20091130-2127
 remotes/tags/start
 remotes/templateupdate-2.0.18
 remotes/trunk
 remotes/upgrade_forum_3_0
 remotes/wiki

You'll notice that the Subversion "tags" (which in Subversion aren't really tags at all, they're just branches without any subsequent revisions) were converted into Git branches within their own namespace. This is what you might use to work with a project that has its main repository still on SVN; you can use Git on your workstation and have most of its benefits, while the project itself remains on SVN. Neat. But not what we came here to do.

We basically want to do two things:

  • Move the generated branches out of the remotes space
  • Turn the SVN "tags" into proper Git tags

(The following steps were taken directly from reference 5).

First, let's convert the tags. Note: It has been reported that these steps may remove extra commits made in a tag on SVN (in SVN, tags really are no different from branches, except that we agreed to not commit to them, by convention). Proceed with caution.

Create a script with the following contents (Windows users should be able to execute this through Git Bash, which was installed along with msysgit):

for t in `git branch -r | grep 'tags/' | sed s_tags/__` ; do
     git tag $t tags/$t^
     git branch -d -r tags/$t
done

Save the script e.g. as converttags.sh and execute it.

$ sh converttags.sh
Deleted remote branch tags/demo (was 5cfe8f1).
Deleted remote branch tags/merge_dev_to_stable (was 2d47421).
Deleted remote branch tags/online-20051224-2259 (was 5a50f9d).
Deleted remote branch tags/online-20051224-2311 (was c6aeda6).
Deleted remote branch tags/online-20051224-2329 (was 92c1ad6).
Deleted remote branch tags/online-20051225-0001 (was 3432051).
Deleted remote branch tags/online-20051225-0008 (was f574ea8).
[...]
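If you are not sure whether anyone ever committed to a tag, a more careful variant of the loop is possible. The sketch below is my own, not from reference 5: it only tags the parent commit when the tag-branch tip has a parent and does not differ from it, and otherwise tags the tip itself. convert_tag and convert_all_tags are hypothetical names, and the check only looks at the last commit of each tag-branch.

```shell
# Convert one remote "tags/<name>" branch into a Git tag (hypothetical helper).
convert_tag() {
    t=$1
    ref="refs/remotes/tags/$t"
    if git rev-parse --verify --quiet "$ref^" >/dev/null &&
       git diff --quiet "$ref^" "$ref"; then
        target="$ref^"    # pure copy: tag the commit it was copied from
    else
        target="$ref"     # no parent, or changes made in the tag: keep the tip
    fi
    git tag "$t" "$target" && git branch -d -r "tags/$t"
}

# Run convert_tag for every branch under refs/remotes/tags.
convert_all_tags() {
    git for-each-ref --format='%(refname)' refs/remotes/tags |
    while read -r ref; do
        convert_tag "${ref#refs/remotes/tags/}"
    done
}

# Usage, from inside the imported repository:
#   convert_all_tags
```

Using the full refs/remotes/tags/... name avoids any ambiguity once a Git tag with the same short name exists.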

That's our tags converted. Let's also remove SVN compatibility references:

$ git branch -d -r trunk
Deleted remote branch trunk (was 6ade13f).

$ git config --remove-section svn-remote.svn

$ rm -rf .git/svn .git/{logs/,}refs/remotes/svn/

And let's convert the remaining remote branches to local ones:

$ git config remote.origin.url .

$ git config --add remote.origin.fetch +refs/remotes/*:refs/heads/*

$ git fetch

All set. We now have a local Git repository with all the contents from our old SVN repository.

Getting a bare repository onto the server

We're only a few steps away from victory. We need to turn our repository into a bare repository, which means just the Git data, not the accompanying working copy. We'll do this by cloning our current repository:

$ cd ..
$ git clone --bare mscn_temp mscn.git

This will go superfast, presumably because Git uses hard links because we're still on the same system.
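Before throwing the temporary repository away, it may be worth checking that the bare clone really carries the same branches and tags. A minimal sketch, assuming both repositories are on the local disk; list_refs and same_refs are hypothetical helper names of my own.

```shell
# Print "<commit id> <ref name>" for every branch and tag in a Git directory.
list_refs() {
    git --git-dir="$1" for-each-ref \
        --format='%(objectname) %(refname)' refs/heads refs/tags
}

# Succeed only if both repositories have the same (non-empty) branches and tags.
same_refs() {
    a=$(list_refs "$1") || return 1
    b=$(list_refs "$2") || return 1
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage, from the directory that holds both repositories:
#   same_refs mscn_temp/.git mscn.git && echo "bare clone is complete"
```

Only refs/heads and refs/tags are compared, because those are what a bare clone copies; the leftover refs/remotes entries in the temporary repository are deliberately ignored.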

Upload your bare repository to your server, wherever you would like it to live, using whatever tool you normally use to upload things to your server. For the next steps, it is important that you have SSH access to your server; if you don't, there are other ways to reach your Git repository, although chances are slim that you'll be able to set up any of them if your provider doesn't even give you SSH access. Make sure that the user you are planning to use to connect to the remote repository has read-write access to the repository.
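If you have SSH access anyway, scp is one way to do the upload. The sketch below assumes the same user, host and path as the clone command in the next section; upload_bare_repo is a hypothetical helper of my own, and with DRY_RUN=1 it only prints the command instead of running it.

```shell
# Print (DRY_RUN=1) or run the scp upload of a bare repository.
# upload_bare_repo is a hypothetical helper, not part of Git.
upload_bare_repo() {
    repo=$1; dest=$2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'scp -r %s %s\n' "$repo" "$dest"
    else
        scp -r "$repo" "$dest"
    fi
}

# Show what would be executed, without contacting the server:
DRY_RUN=1 upload_bare_repo mscn.git eelke@myserver.net:/data/git/
```

Any other upload tool works just as well; a bare repository is only a directory of files.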

Cloning the remote repository to create a working repository

Now we're ready to test the remote repository by cloning it to create our final working repository on our local system:

$ git clone eelke@myserver.net:/data/git/mscn.git
Initialized empty Git repository in c:/Data/Project/mscn/.git/
remote: Counting objects: 24568, done.
remote: Compressing objects: 100% (13012/13012), done.
remote: Total 24568 (delta 11682), reused 23571 (delta 10964)
Receiving objects: 100% (24568/24568), 52.27 MiB | 1022 KiB/s, done.
Resolving deltas: 100% (11682/11682), done.
Checking out files: 100% (1675/1675), done.

So, there you are. You have all existing history of your project on your server in a central Git repository and a local working repository to start Gittin' with it.

References

  1. Cleanly Migrate Your Subversion Repository To a GIT Repository, Jon Maddox
  2. Converting Subversion repositories to Git, Redline's Weblog
  3. git-svn(1) Manual Page
  4. How to convert from Subversion to Git, Paul Dowman
  5. Convert a SVN Alioth repository to Git
  6. Converting a Subversion repository to Git, John Albin Wilkins

Comments

[...] follows a step-to-step migration history, loosely based on this article, that I found as the most useful resource on the Web among those suggested by Google. I publish the [...]

Better depends on your criteria. Also, this (closed source) product is targeted at mirroring (and thus keeping the SVN repository around). The above tutorial is about leaving SVN behind.

@Eelke Well, git-svn is open source. But did you see the code? I did. As for me, closed-source product with good support is better than open-source tool with no support at all (I doubt someone can provide any reasonable support for git-svn).

You don't need to keep SVN repository around after subgit converted it. Just drop it and you're good to go with git.

Regards, Simon.

Thanks. It really helped me out.
I made a script out of your instructions and executed it on a per project basis

Migrating JAIML from SVN to Git...

Some time ago, I decided to revive my Java AIML Interpreter - I've been programming in Smalltalk for the past 4 years and I'm becoming a bit rusty. My trusty old SVN server has been dead for a while, so I decided to migrate to github ...

If you have ever committed to a tag, the "tags/$t^" advice in this post WILL throw that last commit away!

Hmm... Have you tried? IIRC, the Git tag is placed on the tip of the tag branch and I would expect that to also hold true when there are more than one commit in the tag "branch" (you really shouldn't be committing to tags, though ;)). Anyway, I'll update the post with a warning.

Quick question: why the ^ at the end of:
git tag $t tags/$t^

^ at the end of the ref usually means "the commit before"... so wouldn't this tag the commit before the one we want to tag?

I'm probably missing something obvious here, but all the similar scripts, including the other websites you reference, don't use the caret.

I only happened to notice it due to a particular tag I had not having a previous ref and thus getting:
fatal: ambiguous argument 'tags/foo^': unknown revision or path not in the working tree.
yet referencing 'tags/foo' works just fine.

Thanks.

I got the script from reference 5, so really my guess is as good as yours, in a way :) However, IIRC, at least when I last went through this process, svn tags would effectively become branches with a single commit branching off whatever branch the tag was placed on. This makes sense, because tagging in subversion effectively is creating a copy of part of the tree in a different location of the tree, creating a new tree revision. However, if we want to get a "proper" Git tag, we'd have to place the tag on the commit the tag-branch branches off from, i.e. one commit up in the tag-branch.

Hi,

This blog is really nice.

I have started to migrate my SVN repo to Git:

http://192.168.0.58:8888/svn/IT
tags/Archive
tags/Baseline
trunk/HSC_Apps
trunk/Compliance
After the migration I am able to see IT/HSC_Apps and IT/Compliance in my Git repo.

I am not able to see my tags in my Git repo.

Can you please help me out with the steps I need to take to see my tags as well?

[gituser@ggns1git01 IT]$ git branch -r
tags/Archive
tags/Baseline
tags/Baselined
tags/LATEST
trunk
Could anyone please help me with this?

Appreciate your support.

Regards,
Justin

Re:

"Quick question: why the ^ at the end of:
git tag $t tags/$t^"

I had to remove the trailing caret as half my tags failed when executing converttags.sh. My question is: What's the worst-case scenario with this removed?

git tag -l

shows all tags correctly, so I'm curious if I'm missing anything important. Otherwise, an excellent guide that allowed me to migrate our WikkaWiki codebase to github. Thanks!

@Brian Koontz: Also refer to comment 9. I believe the caret is intended to put the Git tag on the second to last commit in the Git *branch* that was created for the *SVN* tag (still with me?). The assumption is that the branch is only one commit long and the contents of that commit do not differ from its parent (because if you properly tagged in SVN, that would have been the situation there as well; you can, *but should not*, commit to tags in SVN).

I'm not sure why this fails for you, but leaving it off should not have too many problems, except that all your converted tags will be at the tip of a separate branch that has no other purpose than the tag.

@Justin Raja Kumar: Sorry for the late response. Unfortunately, I am not sure I understand your problem. Do take into account the things that have already been noted about migrating SVN tags to Git tags. When following this guide to the letter, it is extremely important that you never committed to tags in SVN; you can, but you shouldn't and the assumption of this guide is that you haven't.

If you have, that is not a huge problem; you can modify the converttags.sh script so that it puts the Git tag on the tip of the tag-branch. Just change:

git tag $t tags/$t^

to:

git tag $t tags/$t

(Untested, but from the other commenters I get that some people have done this with success - I am actually not sure if it is possible to delete the branch when there is a tag on it). This would result in a less clean repository history, but it will preserve any history that may have been in your tags-that-were-really-branches ;)

Hi Eelke,
thanks for this great post! I have tried several ways now to migrate from svn to git, including yours, but I always seem to lose all svn commit messages. Only when I don't use the --no-metadata option in the beginning do I get commit messages at all. But this is not what I want; I need the 'real' commit messages from the old svn repository. Does anybody else have this problem, too? I usually use 'qgit --all' to check the new git repo and to read the commit messages.

@Daniel: Sorry, no, this doesn't sound at all familiar. The --no-metadata switch will, as far as I know, only prevent Git from recording information that is required for it to be able to write back to the SVN repository, i.e. in case you use Git as an SVN client. Other data, like author and commit message, should be preserved (conceivably, you could call that metadata as well, but I could not imagine any scenario where you would want to leave *that* out).

Thanks for the instructions! After following these steps, it seemed like I couldn't delete my temp repository, since the bare repo was hard linked to it, so I created a bundle and cloned it into a bare repo. So far so good.

So instead of "Getting a bare repository onto the server", I did this:

git bundle create mybundle --all && \
cd .. && \
git clone --bare mscn_temp/mybundle -b master mscn.git

Hello,

Great tutorial! I am following it but I have encountered a problem with svn asking me for password each time it "changes to a tag" (while fetching).

e.g.
r7 = be190c07e15782279a357d6e5e7254ebc95e69d2 (refs/remotes/trunk)
Found possible branch point: svn+ssh://USER@svne1.XXXXX.XXXXX.com/svnroot/PROJECT/trunk => svn+ssh://USER@svne1.XXXXX.XXXXX.com/svnroot/PROJECT/tags/common-gen, 7
Found branch parent: (refs/remotes/tags/common-gen) be190c07e15782279a357d6e5e7254ebc95e69d2
Following parent with do_switch
USER@svne1.XXXXX.XXXXX.com's password:
Could not chdir to home directory /XXXXX/home/USER: No such file or directory
Successfully followed parent
USER@svne1.XXXXX.XXXXX.com's password:
Could not chdir to home directory /isource/home/USER: No such file or directory
r8 = c72e933e0341fcc1078a814e6c8a58fc9b98b329 (refs/remotes/tags/common-gen)

what can I do to prevent this? I have tried everything contained in here http://stackoverflow.com/questions/2899209/how-to-save-password-when-us… and http://stackoverflow.com/questions/2599281/cant-make-svn-store-password…

Hi, Thanks for the information.

How can I convert the SVN files which are linked through svn:externals to the Git repository?
and

How can the svn:externals links be preserved in Git?

I'm not that familiar with svn externals, I certainly don't have much hands-on experience with them, so the below is mostly based on the quick glance in the svn book I just took.

If your externals are also moving to Git (or there are Git equivalents to the svn repositories), you might want to look into Git submodules or the subtree command. Submodules are most like svn externals (they are basically just a versioned reference to a commit in another Git repository), although there are also voices saying submodules are evil and you should use something like subtree (which results in the external stuff to be actually part of the main repository, which has both advantages and disadvantages).

If the externals need to remain on svn, your answer probably isn't going to be in Git. You could look into some sort of build tool that does the svn checkout as part of a build process, but you will need to make sure to add the checkout to your .gitignore or Git will version the svn checkout, which is most probably not what you want.

If your externals are in fact internal to the current repository (something, I found in the svn book, is also an option), there's not really an equivalent in Git (Git is fundamentally different from svn in the way it handles directory trees, which makes "referring to another part of the repository" not really a thing in Git). If you're in a UNIXy environment, and so is everyone else who needs to work with the project, you could look into symbolic links. I'm not sure if there is an equivalent to symbolic links in Windows that would let itself be versioned (and allows relative references, which is what you would need). Or, maybe things are set up in such a way that it actually makes sense to split the single SVN repository into several Git repositories, which lands you in the scenario where your externals have their own Git repo again.

That's a really clear and useful tutorial that, in normal situations, worked
well for me.

I have a problem with one of my svn projects that I started with a
non-standard layout and then moved to the standard "trunk branches tags".

Applying your procedure recovers only the history "before" the standardization
(that's obvious).

I wonder how to recover the whole log, considering the one before
standardization as part of the trunk.

Maybe I need to apply the procedure up to the revision before the reorganization
without the --stdlayout flag and then "merge" it with the other? I don't see
any option to pass to init to tell "git svn" to fetch only up to a given
revision ...

Any suggestion?

Alberto

I don't have a pre-baked solution, but I'm thinking along the lines of doing two conversions and then combining them afterwards. It might be as easy as rebasing the second conversion onto the first (you would need to get both histories into the same repository, e.g. by defining one as a remote on the other). Or you might find you need to attach a subtree of one repository to the other; Git has a subtree command for that which you might want to check out.

But I *think* it could look something like this:

- Convert with svn root as Git master (conversion 1)
- Convert with svn trunk as Git master (conversion 2)
- Get rid of everything after the move to standard svn layout in conversion 2.
- Define conversion 2 as a remote in conversion 1.
- Rebase master of conversion 2 onto master of conversion 1.

This will leave you with commits whose committer date is the moment you did the rebase, but the original time should be preserved in the author date.

I do not get any errors, but my GIT repository appears to be empty after following these instructions. I wonder what I am doing wrong?

One thing I can think of off the top of my head: is your SVN repository structured in the standard way, with trunk, branches and tags folders at the base level? If not, you will have to remove the --stdlayout option and instead tell git-svn about the structure of your repo with the separate --trunk, --tags and --branches arguments (see e.g. https://www.kernel.org/pub/software/scm/git/docs/git-svn.html). If you don't have any such structure (you effectively have just a single line of development in the root of your svn repository), you can leave out these options completely.

I am getting this error when I try to run "git svn fetch". I have been racking my brain trying to fix it; could you please help me with this?
"Author: SYSTEM not defined in Authors.txt file"

Sorry, I missed your legit reply in between all the spam. Have you tried simply making an entry in your authors.txt file with "SYSTEM =" and then a suitable author email (e.g. yourself)?

I got all the author names, but I also see "sss" in the authors list and I don't know where it comes from.
For that sss, can I just use something like this?

owner = Eelke Blok
eelkecvs = Eelke Blok
sss = Eelke Blok

So for sss, can I give my own name and email address? Also, I am using svn2git,
following this (https://github.com/nirvdrum/svn2git).

Am I right? If I am wrong, please help me out. Thanks, and happy weekend, guys.
Awaiting a reply/help.

thanks.

Yes, that doesn't sound unreasonable. Especially if you can't remember where "sss" came from.
