Wednesday, June 23, 2010

missing unix2dos and dos2unix in ubuntu 10.04

unix2dos and dos2unix are handy tools for converting text files' line endings between the Windows (CRLF) and Linux (LF) conventions.
After installing Ubuntu 10.04 and then the tofrodos package as usual, I was surprised to find that the two programs didn't exist. After some searching I discovered that they had simply been given different names.
So:
unix2dos became todos
and dos2unix became fromdos
You can adapt to these new names or give them aliases if you wish...
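
If you go the alias route, something like the following in ~/.bashrc should do it. This is only a sketch: it assumes the tofrodos package is installed so that todos and fromdos are on the PATH, and that you want the old names back exactly as they were.

```shell
# Sketch for ~/.bashrc: keep the old command names working.
# Assumes the tofrodos package (which provides todos and fromdos) is installed.
alias unix2dos='todos'
alias dos2unix='fromdos'
```

After opening a new terminal, typing dos2unix file.txt would run fromdos file.txt. Keep in mind that bash only expands aliases in interactive shells by default, so scripts that call the old names would still need to be updated.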
Hope that was useful.

Saturday, June 19, 2010

git packing

While I was reading the Git Community Book, I noticed this paragraph in the first chapter:

It is important to note that this is very different from most SCM systems that you may be familiar with. Subversion, CVS, Perforce, Mercurial and the like all use Delta Storage systems - they store the differences between one commit and the next. Git does not do this - it stores a snapshot of what all the files in your project look like in this tree structure each time you commit. This is a very important concept to understand when using Git.
Actually, I didn't believe that... I thought it would make git the most space-inefficient SCM ever created. Data is compressed before being stored, but every change to a file, however small, would still make git store the WHOLE file again in the repository!

So to check, I wrote the following script:

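#!/bin/bash
# Assumes a file named "a" already exists in the current directory;
# otherwise the first "git add a" fails.
git init
i=0
while [ "$i" -lt 20 ]
do
 echo "commit $i"
 # size of the whole directory (working tree plus .git) before this commit
 du -hs
 git add a > /dev/null
 git commit -m "commit $i" > /dev/null
 i=$((i + 1))
 # append the counter as a new line so the next commit has a change
 echo "$i" >> a
 echo '--------------'
done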

The script appends one short line (the loop counter) to the file and then commits the change; before each commit it prints the size of the directory, repository included.
Over the 20 commits the size grew from 1020K to 7.1 MB!
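
A quick back-of-the-envelope check of those two numbers, assuming du counts 1 MB as 1024K (the variable names are only for illustration):

```shell
# Rough per-commit growth from the two sizes reported by du.
start_kb=1020
end_kb=$(awk 'BEGIN { printf "%d", 7.1 * 1024 }')   # 7.1 MB expressed in K
commits=20
per_commit_kb=$(( (end_kb - start_kb) / commits ))
echo "about ${per_commit_kb}K added per commit"
```

So each commit added a few hundred K, far more than the couple of bytes that actually changed.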

Of course that is not the full story. Git uses something called packing: it stores a large number of objects in a single pack file using delta compression, so unchanged data is not written more than once. Newly created objects are still stored whole as loose objects, so periodic repacking is needed to keep the repo size down.

Let's try running git gc in our repo.
It prints the following:
Counting objects: 60, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (40/40), done.
Writing objects: 100% (60/60), done.
Total 60 (delta 20), reused 0 (delta 0)

And the size of the repo became 1.3 MB! That is just great...
So git is really space efficient after all, once its objects are packed.
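
If you want to see the packing itself rather than just the du numbers, git count-objects -v reports how many objects are loose and how many sit in packs. Here is a minimal sketch that builds a throwaway repo and compares the counts around git gc; the temporary directory, the file name a and the demo identity are all assumptions made for the example.

```shell
#!/bin/bash
# Sketch: watch objects move from loose files into a pack when git gc runs.
# The temp directory, file name "a" and demo identity are assumptions for illustration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "demo"
git config user.email "demo@example.com"

# five small commits, each changing the same file
for i in 1 2 3 4 5
do
 echo "$i" >> a
 git add a
 git commit -q -m "commit $i"
done

# "count" is the number of loose objects, "in-pack" the number of packed ones
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
echo "packed objects after gc: $packed_after"
```

Before git gc every commit, tree and file snapshot is a separate loose object; afterwards the loose count should drop to zero and the same objects should show up as packed.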

5 years of kotbcorp

Two days ago this blog completed its 5th year :). So congratulations to myself, and thanks to the few followers that have been here for ...