Silence is Foo Mental notes on Ruby, Git, Rails and whatever geeky thing

7 Apr 2011

Understanding the staging area, index and cache in Git

When you are first learning Git, some of its terminology is a little confusing: sometimes it uses different words for the same thing, and sometimes the same word for different things. The confusion isn't limited to the documentation, either; it extends to the names of some options.

Let's start with the --cached option. Look at this example:

$ git rm --cached some_file

Taken from the git-rm(1) Manual Page:

    When --cached is given, the staged content has to match either the tip of the branch or the file on disk, allowing the file to be removed from just the index
    --cached
        Use this option to unstage and remove paths only from the index. Working tree files, whether modified or not, will be left alone.

Please note these three words: cache, stage and index. We'll talk about them later.

In short, we use the --cached option to untrack a file without actually deleting it. Look at the following example:

$ git rm --cached README
rm 'README'

$ git status

# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	deleted:    README
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	README

Notice that Git stages the deletion (the file is no longer tracked), but the file itself is still there, now listed as untracked.

Now, let's try without the --cached option:

$ git rm README
rm 'README'

$ git status

# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	deleted:    README
#

In this case there aren't any untracked files, because the file was also deleted from the working tree.

Let's see another case: I modified the README file but haven't added it to the staging area, and I modified the something.rb file and added those changes to the staging area.
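To try both variants without touching a real project, here is a minimal sketch that runs them in a throwaway repository. The temporary directory, the second file (NOTES) and the demo identity are assumptions made up for the example.

```shell
# Throwaway repository (path, file names and identity are hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "hello" > README
echo "notes" > NOTES
git add README NOTES
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial commit"

# --cached: remove README from the index only; the file stays on disk
git rm -q --cached README
if [ -f README ]; then kept_on_disk=yes; else kept_on_disk=no; fi

# no --cached: remove NOTES from the index and delete it from disk
git rm -q NOTES
if [ -f NOTES ]; then notes_on_disk=yes; else notes_on_disk=no; fi

# Paths still present in the index (none are left at this point)
tracked=$(git ls-files)
echo "README on disk: $kept_on_disk, NOTES on disk: $notes_on_disk, tracked: '$tracked'"
```

Both files end up untracked from Git's point of view; the only difference is whether the file survives in the working tree.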

$ git status

# On branch master
#
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	modified:   something.rb
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   README
#

The following command shows the diff of the README file:

$ git diff

And either of the following commands shows the diff of the something.rb file, because its changes are already in the staging area/index (--staged is a synonym for --cached):

$ git diff --cached
$ git diff --staged
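The scenario above can be reproduced with a short script. This is only a sketch: the temporary repository, the file contents and the demo identity are assumptions, and --name-only is used just to keep the output short.

```shell
# Throwaway repository (path, contents and identity are hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "one" > README
echo "one" > something.rb
git add README something.rb
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial commit"

echo "two" >> README          # modified, not staged
echo "two" >> something.rb    # modified...
git add something.rb          # ...and staged

unstaged=$(git diff --name-only)                # working tree vs. index
staged=$(git diff --cached --name-only)         # index vs. last commit
staged_alias=$(git diff --staged --name-only)   # same thing, other spelling
echo "unstaged: $unstaged"
echo "staged:   $staged"
```

Plain git diff compares the working tree with the index, which is why the staged file disappears from it.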

So, what's the cache? It is the old name for the index.

In older documentation you may see the index called the "current directory cache" or just the "cache".

THE GIT INDEX

The index is thus a sort of temporary staging area, which is filled with a tree which you are in the process of working on. Let's try an easier definition: it's the place where changes for the next commit get registered.

What else? It is a binary file (generally kept in .git/index) containing a sorted list of path names, each with permissions and the SHA1 of a blob object.

Is there a way to see the contents of the .git/index file? Yes, git ls-files can show you the contents of the index:

    --stage
        Show staged contents' object name, mode bits and stage number in the output.

$ git ls-files --stage

100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0  something.rb

    --cached
        Show cached files in the output (default)

$ git ls-files --cached

something.rb
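The object name in that output, e69de29..., is the SHA-1 of an empty blob, so the entry can be reproduced by staging an empty file. A minimal sketch, assuming a throwaway repository and Git's default SHA-1 object format:

```shell
# Throwaway repository (the path is hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q
: > something.rb            # create an empty file
git add something.rb        # register it in the index

entry=$(git ls-files --stage)    # mode, object name, stage number, path
names=$(git ls-files --cached)   # path names only
echo "$entry"
echo "$names"
```

Nothing has been committed here; both commands read the index directly.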

Remember how the --cached and --staged options of git-diff were exactly the same thing? That's not the case with git-ls-files: --stage and --cached both read the index file, but their output is different. Confused? Don't worry, it's going to get worse.

During a merge the index can store multiple versions of a single file (called stages). The third column in the git ls-files --stage output above is the stage number, and will take on values other than 0 for files with merge conflicts.
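To see stage numbers other than 0, you can provoke a conflict on purpose. The following is a sketch in a throwaway repository; the branch name "other", the file contents and the demo identity are assumptions for the example.

```shell
# Throwaway repository (path, branch name and identity are hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
echo "base" > README
git add README
git commit -q -m "base"

# Change the same line on two branches
git checkout -q -b other
echo "theirs" > README
git commit -q -am "change on other"
git checkout -q -               # back to the branch we started on
echo "ours" > README
git commit -q -am "change on the original branch"

# The merge stops with a conflict, so ignore its exit status
git merge other >/dev/null 2>&1 || true

# Third column of each index entry for README, joined with spaces
stages=$(git ls-files --stage README | awk '{ printf "%s%s", sep, $3; sep = " " }')
echo "stage numbers for README: $stages"
```

While the conflict is unresolved, README has three entries in the index: stage 1 is the common ancestor, stage 2 is our version and stage 3 is the version being merged in.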

So, what's the difference between the index, the cache, and the staging area?

THEY ARE ALL THE SAME THING!!!!!

But this is Git, right? It wouldn't be that fun if it were that easy. Look at this: git apply has both --index and --cached, with different semantics:

Taken from the git-apply Manual Page:

    With the --index option the patch is also applied to the index, and with the --cached option the patch is only applied to the index.

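Whaaaaaat? Oh, yeah, perfectly clear! Well, not that much. In other words, --index applies the patch to both the working tree and the index, while --cached applies it to the index and leaves the working tree untouched.

We all know Git really enjoys confusing us. Here's another example: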
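The difference is easier to see by applying the same patch both ways. This is a sketch under assumptions: the throwaway repository, the patch file name change.patch and the demo identity are made up for the example.

```shell
# Throwaway repository (path, patch name and identity are hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
echo "one" > README
git add README
git commit -q -m "initial commit"

# Build a patch, then put README back to its committed state
echo "two" >> README
git diff > change.patch
git checkout -q -- README

# --cached: only the index receives the patch
git apply --cached change.patch
cached_in_index=$(git diff --cached --name-only)   # index now differs from HEAD
cached_in_tree=$(grep -c two README)               # lines containing "two" on disk

# Start over, then --index: index and working tree both receive it
git reset -q --hard
git apply --index change.patch
index_in_index=$(git diff --cached --name-only)
index_in_tree=$(grep -c two README)

echo "--cached: staged=$cached_in_index, lines with 'two' on disk=$cached_in_tree"
echo "--index:  staged=$index_in_index, lines with 'two' on disk=$index_in_tree"
```

With --cached the change is staged but the file on disk is untouched; with --index the file on disk changes as well.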

Let's say we have modified a file and we want to know the status of it:

$ git status

# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   README
#

"Changed but not updated"? What is "not updated" even supposed to mean in this sentence?

"Not updated" means the changes are not in the staging area/index yet, or that they are not yet cached.

Maybe a better sentence would be one of these:

    Changed but not yet cached
    Changed but not staged
    Changed but not in index

Why use yet another term for the same thing? So far we have four terms meaning the same thing. Well, kind of. I don't know why.

I think Git borrows the term "update" from caching systems, where a cache is:

    "dirty" when it's not in sync with what it caches,
    "clean" when it is,
    "updated" when it's synced

OPPOSITE OPERATIONS

There's another area where Git gets really confusing: opposite operations. If you use "add" to add something, you expect to use "remove" to do exactly the opposite. Well, not in Git. Let's see some examples.

Adding changes to the index/staging area

$ git add file_name

Opposite:

$ git reset HEAD file_name

Tracking files (adds to the index/staging area too)

$ git add file_name

Opposite:

If you want to delete the file:

$ git rm file_name

If you want to keep the file (either command works):

$ git rm --cached file_name
$ git update-index --force-remove file_name
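The first pair (add and its opposite) can be checked with a short script. It is a sketch in a throwaway repository with a made-up identity; file_name is the same placeholder used above.

```shell
# Throwaway repository (path and identity are hypothetical)
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
echo "one" > file_name
git add file_name
git commit -q -m "initial commit"

echo "two" >> file_name
git add file_name                              # stage the change
after_add=$(git diff --cached --name-only)     # what is staged now

git reset -q HEAD file_name                    # the "opposite" of add
after_reset=$(git diff --cached --name-only)   # nothing staged any more
still_modified=$(git diff --name-only)         # the edit itself survives

echo "staged after add: '$after_add'"
echo "staged after reset: '$after_reset', still modified: '$still_modified'"
```

Note that the reset only empties the staging area; the change stays in the working tree. Newer versions of Git (2.23 and later) also offer git restore --staged file_name for the same purpose.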

Hope this helps you to understand how Git works. See you later!

Filed under: git
21 Aug 2010

git submodules statistics in the terminal prompt

When I'm working on projects with submodules and I make a change in one of them, I have to go into every submodule directory to review the changes. Sometimes I haven't changed anything in a submodule at all, but my memory isn't good enough to know that, and checking is a waste of time.

The solution I found was to add some aliases to my ~/.gitconfig that show the status of the submodules, and to add the stats (number of modified files) to my terminal prompt.


We need to add the aliases to the ~/.gitconfig file. If you don't have one, create it:

$ touch ~/.gitconfig

Now we need an [alias] section to hold our aliases; we'll add them all just below it.

[alias]
   alias_1=...
   alias_2=...
   alias_n=...

Let's say we have 3 submodules in our project: Silence, Foo and Bar.


1. Alias to show the modified files in each submodule
    substatus = "!git submodule foreach git ls-files -m"

The output would look something like this:

$ git substatus
Entering 'Silence'
     src/com/vo/LoremVO.as
Entering 'Foo'
Entering 'Bar'
     bar.rb
     baz.rb

NOTE: substatus is not a git status. It shows neither untracked nor staged files, only modified ones. See the ls-files command help (git ls-files -h).

2. Alias to show the number of modified files in all the submodules
    substat = "!git submodule --quiet foreach 'echo $path: `git ls-files -m | wc -l`'"

The output would look something like this:

$ git substat
Silence: 1
Foo: 0
Bar: 2

If you compare this with the output of git substatus, you'll see that the number of files matches.

3. Show the number of modified files in all the submodules in the prompt

Add the following function to your ~/.bash_profile or ~/.bashrc:
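If you don't have a project with submodules at hand, the counting pipeline can be tried in a throwaway setup. This is a sketch under assumptions: the directory names, the file lorem.txt and the demo identity are made up, and protocol.file.allow=always is only there because recent Git versions refuse to clone a submodule from a local path by default.

```shell
work=$(mktemp -d)

# A tiny repository that plays the role of the "Silence" submodule
git init -q "$work/silence-src"
cd "$work/silence-src"
echo "v1" > lorem.txt
git add lorem.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial commit"

# The superproject, with that repository added as a submodule
git init -q "$work/project"
cd "$work/project"
git -c protocol.file.allow=always submodule --quiet add "$work/silence-src" Silence >/dev/null 2>&1
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add submodule"

# Modify a tracked file inside the submodule
echo "v2" > Silence/lorem.txt

# Same pipeline the aliases rely on: list modified files, then count them
count=$(git submodule --quiet foreach git ls-files -m | wc -l | tr -d ' ')
echo "modified files in submodules: $count"
```

The tr -d ' ' at the end strips the padding that some wc implementations put in front of the number.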

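function __git_submodule_stat
{
    # Only do the work at the top level of a repository
    if [ -d .git ]; then
        # Count the modified files across all submodules
        local stat
        stat=$(git submodule --quiet foreach git ls-files -m | wc -l | tr -d ' ')
        # Print nothing when there is nothing to report
        if [ "$stat" -eq 0 ]; then return; fi
        echo "[$stat]"
    fi
}

then set your PS1 (in single quotes, so the functions are run every time the prompt is drawn rather than once when PS1 is assigned)

PS1='[\u:\h \W ] $(__git_ps1)$(__git_submodule_stat) $ '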

and your prompt would be shown this way:

[user:host folder ] (master)[3] $ _

which would mean you are in the master branch of your repository and that there are 3 modified files in the submodules.
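A detail worth checking in a PS1 like this is the quoting. Inside double quotes the shell runs $(...) once, when the variable is assigned; inside single quotes the text is stored as-is and Bash expands it each time it draws the prompt. The sketch below imitates that with a hypothetical report function standing in for __git_submodule_stat, and uses eval to play the role of the prompt expansion.

```shell
# Hypothetical stand-in for __git_submodule_stat: its output changes over time
n=1
report() { echo "[$n]"; }

eager="$(report)"    # double quotes: report runs now, the result is frozen
lazy='$(report)'     # single quotes: the text $(report) is stored untouched

n=3                  # pretend more files were modified later on

# Bash re-expands PS1 before each prompt; eval imitates that step here
eager_shown=$(eval "printf '%s' \"$eager\"")
lazy_shown=$(eval "printf '%s' \"$lazy\"")
echo "double-quoted: $eager_shown, single-quoted: $lazy_shown"
```

Only the single-quoted version picks up the new value, which is the behaviour you want from a prompt that reports the current state of the repository.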

In this case I don't care which submodules have been modified. I only want to know whether there are modified files in any of them; if so, I use git substatus to find out exactly which files they are.

The foreach argument to the git submodule command is pretty interesting: it runs whatever command you pass it in every submodule, and that command can just as well be a bash or ruby script containing any number of commands.

$ git submodule foreach ~/my_script.sh
$ git submodule foreach ~/my_script.rb

If you know a better solution, I'd appreciate it if you let me know.

Please make a comment, thanks for reading!

Filed under: git