It is important to note that this is very different from most SCM systems that you may be familiar with. Subversion, CVS, Perforce, Mercurial and the like all use Delta Storage systems - they store the differences between one commit and the next. Git does not do this - it stores a snapshot of what all the files in your project look like in this tree structure each time you commit. This is a very important concept to understand when using Git.
Actually, I didn't believe that at first ... I thought it would make Git the most space-inefficient SCM ever created. Data is compressed before it is stored, but still, every change to a file, however small, makes Git store the WHOLE file again in the repository!!!
So to check, I wrote the following script:
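#!/bin/bash
# expects a file named "a" to already exist here; otherwise the first "git add" fails
git init
i=0
while [ $i -lt 20 ]
do
    echo commit $i
    du -hs                                  # size of the current directory, .git included
    git add a > /dev/null
    git commit -m "commit $i" > /dev/null
    i=$(($i+1))
    echo $i >> a                            # append the counter for the next commit
    echo '--------------'
done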
On each pass the script prints the size of the directory (working copy plus .git), commits the file, then appends a few characters (the loop counter) to it for the next commit.
Over the 20 commits, the size grew from 1020K to 7.1 MB!!!
Of course, that is not the full story. Git uses something called packing: it stores a large number of objects in a single pack file using delta compression, so data shared between versions is not written more than once. Newly created objects are still stored whole as loose objects, though, so periodic repacking is needed to keep the repo size down.
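To see those loose objects for yourself, here is a minimal sketch. It assumes git is installed and works in a throwaway directory from mktemp; the file name a and the demo identity are just placeholders I made up for the example.

```shell
#!/bin/bash
# Sketch: count loose vs. packed objects in a fresh repo, before any gc.
set -e
cd "$(mktemp -d)"                           # throwaway directory, not your real repo
git init -q
git config user.email "demo@example.com"    # placeholder identity for the demo
git config user.name "Demo"

for i in 1 2 3 4 5; do
    echo "line $i" >> a
    git add a
    git commit -q -m "commit $i"
done

# "count" is the number of loose objects, "in-pack" the number stored in pack files
loose=$(git count-objects -v | awk '/^count:/ {print $2}')
packed=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose objects: $loose, packed objects: $packed"
```

Each commit adds three objects (a commit, a tree and a blob), and all of them stay loose until something repacks them.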
Let's try running git gc in our repo.
It prints the following:
Counting objects: 60, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (40/40), done.
Writing objects: 100% (60/60), done.
Total 60 (delta 20), reused 0 (delta 0)
And the size of the repo dropped to 1.3 MB!!! That is just great ...
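If you want to check where the savings come from, a rough sketch like this should do. It assumes git and seq are installed and uses a throwaway repo; the file a, the seq seed data and the demo identity are made up for illustration. It runs git gc and then counts the delta entries listed by git verify-pack -v (delta lines carry two extra fields, a depth and a base object, so they have 7 fields instead of 5).

```shell
#!/bin/bash
# Sketch: after gc, confirm objects moved into a pack and some are stored as deltas.
set -e
cd "$(mktemp -d)"                           # throwaway directory, not your real repo
git init -q
git config user.email "demo@example.com"    # placeholder identity for the demo
git config user.name "Demo"

seq 1 2000 > a                              # seed the file so versions are worth deltifying
for i in 1 2 3 4 5; do
    echo "line $i" >> a
    git add a
    git commit -q -m "commit $i"
done

git gc -q

# loose objects left after gc
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
# delta entries have 7 fields in verify-pack's verbose listing
deltas=$(git verify-pack -v .git/objects/pack/pack-*.idx | awk 'NF == 7 {n++} END {print n+0}')

echo "loose objects after gc: $loose_after"
echo "delta entries in pack: $deltas"
```

The file is seeded with some data first because, as far as I can tell, Git does not bother to deltify very small objects.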