Silence is Foo Mental notes on Ruby, Git, Rails and whatever geeky thing

7 Apr 2011

Understanding the staging area, index and cache in Git

When you are first learning Git, some of its terminology is a little confusing: sometimes it uses different words for the same thing, and sometimes the same word for different things. The confusion isn't limited to the documentation; it extends to the names of some options.

Let's start with the --cached option. Look at this example:

$ git rm --cached some_file

Taken from the git-rm(1) Manual Page:

    When --cached is given, the staged content has to match either the tip of the branch or the file on disk, allowing the file to be removed from just the index
    --cached
        Use this option to unstage and remove paths only from the index. Working tree files, whether modified or not, will be left alone.

Please note these three words: cache, stage and index. We'll talk about them later.

In short, we use the --cached option to untrack a file without actually deleting it. Look at the following example:

$ git rm --cached README
rm 'README'

$ git status

# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	deleted:    README
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	README

Notice that we are deleting the file from the index (untracking it), but the file is still there as an untracked file.

Now, let's try without the --cached option:

$ git rm README
rm 'README'

$ git status

# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	deleted:    README
#

In this case there aren't any untracked files, because we deleted the file from the working tree too.

Let's see another case: I modified the README file but haven't added it to the staging area, and I modified the something.rb file and added those changes to the staging area.
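If you want to try both variants side by side, here is a minimal sketch. It assumes a throwaway repository created with mktemp, and the file names kept.txt and gone.txt are made up for the demo:

```shell
# Throwaway repository; paths and identity are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "keep me" > kept.txt
echo "delete me" > gone.txt
git add kept.txt gone.txt
git commit -q -m "initial commit"

git rm -q --cached kept.txt   # removed from the index only, stays on disk
git rm -q gone.txt            # removed from the index and from disk

tracked=$(git ls-files)              # paths still in the index
untracked=$(git ls-files --others)   # paths on disk that are not in the index
echo "tracked:   [$tracked]"
echo "untracked: [$untracked]"
```

After running it, kept.txt should show up as untracked, while gone.txt is gone from both places.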

$ git status

# On branch master
#
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	modified:   something.rb
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   README
#

The following command shows the diff of the README file:

$ git diff

And the following commands show the diff of the something.rb file, because it's already in the staging area/index:

$ git diff --cached
$ git diff --staged
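To check this yourself, the sketch below rebuilds the same situation in a temporary repository (the file contents are placeholders) and asks each command only for the names of the files it would show:

```shell
# Throwaway repository; identity and file contents are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "v1" > README
echo "v1" > something.rb
git add README something.rb
git commit -q -m "initial commit"

echo "v2" > README          # modified, not staged
echo "v2" > something.rb
git add something.rb        # modified and staged

unstaged=$(git diff --name-only)            # working tree vs. index
cached=$(git diff --cached --name-only)     # index vs. HEAD
staged=$(git diff --staged --name-only)     # same comparison, other spelling
echo "git diff:          $unstaged"
echo "git diff --cached: $cached"
echo "git diff --staged: $staged"
```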

So, what's the cache? It is the old name for the index.

In older documentation you may see the index called the "current directory cache" or just the "cache".

THE GIT INDEX

The index is thus a sort of temporary staging area, which is filled with a tree which you are in the process of working on. Let's try an easier definition: it's the place where changes for the next commit get registered.

What else? It is a binary file (generally kept in .git/index) containing a sorted list of path names, each with permissions and the SHA-1 of a blob object.

Is there a way to see the content of the .git/index file? Yes, git ls-files can show you the contents of the index:

    --stage
        Show staged contents' object name, mode bits and stage number in the output.

$ git ls-files --stage

100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	something.rb

    --cached
        Show cached files in the output (default)

$ git ls-files --cached

something.rb

Do you remember that, with git-diff, the --cached and --staged options were exactly the same thing? That's not the case with git-ls-files: both options read the index file, but their output is different. Confused? Don't worry, it's going to get worse.
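Here is a small sketch of how the --stage output can be taken apart. It assumes a throwaway repository with a single empty file called something.rb, and it compares the object name stored in the index with what git hash-object computes for the same file:

```shell
# Throwaway repository with one empty, staged file.
repo=$(mktemp -d)
cd "$repo"
git init -q
: > something.rb
git add something.rb

entry=$(git ls-files --stage)    # "<mode> <object> <stage><TAB><path>"
names=$(git ls-files --cached)   # just the path

# Split the entry on whitespace into its four fields.
set -- $entry
mode=$1; object=$2; stage=$3; path=$4

# The object name in the index is the blob hash of the staged content
# (e69de29... for an empty file in a SHA-1 repository).
blob=$(git hash-object something.rb)

echo "mode=$mode stage=$stage path=$path"
echo "index object: $object"
echo "hash-object:  $blob"
```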

During a merge the index can store multiple versions of a single file (called stages). The third column in the git ls-files --stage output above is the stage number, and will take on values other than 0 for files with merge conflicts.
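A quick way to see those stage numbers is to provoke a conflict on purpose. This is only a sketch: the branch name topic and the file contents are invented, and the default branch name is read from the repository rather than assumed.

```shell
# Throwaway repository; identity, branch name and contents are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "base" > README
git add README
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

git checkout -q -b topic
echo "topic change" > README
git commit -q -am "topic"

git checkout -q "$main"
echo "main change" > README
git commit -q -am "main"

# The merge is expected to stop with a conflict in README.
git merge topic >/dev/null 2>&1 || true

git ls-files --stage README
# Collect the third column (the stage number) of every index entry.
stages=$(git ls-files --stage README | awk '{print $3}' | xargs)
echo "stages: $stages"
```

README should now appear three times in the index: stage 1 is the common ancestor, stage 2 is our side and stage 3 is the side being merged in.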

So, what's the difference between the index, the cache, and the staging area?

THEY ARE ALL THE SAME THING!!!!!

But this is Git, right? It wouldn't be that fun if it were that easy. Look at this: git apply has both --index and --cached, with different semantics:

Taken from the git-apply Manual Page:

    With the --index option the patch is also applied to the index, and with the --cached option the patch is only applied to the index.
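The difference is easier to see than to read. The sketch below assumes a throwaway repository and a patch file created with mktemp; the patch simply appends one line to README. After each git apply it counts the lines of README in the index and in the working tree:

```shell
# Throwaway repository and patch file; all names are placeholders.
repo=$(mktemp -d)
patch=$(mktemp)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "one" > README
git add README
git commit -q -m "initial commit"

echo "two" >> README
git diff > "$patch"          # a patch that appends one line
git checkout -q -- README    # discard the edit, keep the patch

git apply --cached "$patch"                      # index only
cached_index=$(git show :README | grep -c '')    # lines in the index copy
cached_tree=$(grep -c '' README)                 # lines in the working tree copy

git reset -q --hard                              # back to the committed state

git apply --index "$patch"                       # index and working tree
index_index=$(git show :README | grep -c '')
index_tree=$(grep -c '' README)

echo "--cached: index=$cached_index lines, working tree=$cached_tree lines"
echo "--index:  index=$index_index lines, working tree=$index_tree lines"
```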

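Whaaaaaat? Oh, yeah, perfectly clear! Well, not that much. In other words, --index updates both the working tree and the index, while --cached updates the index and leaves the working tree alone.

We all know Git really enjoys getting us confused. Here's another example: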

Let's say we have modified a file and we want to know the status of it:

$ git status

# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   README
#

"Changed but not updated"? What is "not updated" even supposed to mean in this sentence?

"Not updated" means the file is not in the staging area/index or that it's not yet cached.

Maybe a better sentence would be something like this:

    Changed but not yet cached
    Changed but not staged
    Changed but not in index
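If the wording of the long status output gets in the way, the short format sidesteps it with two columns per file: the first column is the state of the index and the second is the state of the working tree. A minimal sketch, assuming a throwaway repository with the same two files as before:

```shell
# Throwaway repository; identity and file contents are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "v1" > README
echo "v1" > something.rb
git add README something.rb
git commit -q -m "initial commit"

echo "v2" > README          # changed, not staged
echo "v2" > something.rb
git add something.rb        # changed and staged

git status --short
# Keep only the two status columns for each file.
readme_state=$(git status --short README | cut -c1-2)
rb_state=$(git status --short something.rb | cut -c1-2)
echo "README:       [$readme_state]"
echo "something.rb: [$rb_state]"
```

README should show " M" (nothing staged, modified on disk) and something.rb should show "M " (modification staged, nothing further on disk).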

Why use another term for the same thing? So far we have four terms meaning the same thing, well, kinda. I don't know why.

Git borrows, I think, the term "update" from caching systems, where a cache is:

    "dirty" when it's not in sync with what it caches,
    "clean" when it is,
    "updated" when it's synced

OPPOSITE OPERATIONS

There's another area where Git gets really confusing: opposite operations. I mean, if you use "add" to add something, you expect to use "remove" to do exactly the opposite. Well, not in Git. Let's see some examples.

Adding changes to the index/staging area

$ git add file_name

Opposite:

$ git reset HEAD file_name
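A small sketch of that round trip, assuming a throwaway repository that already has one commit containing file_name (git reset HEAD needs a HEAD to reset to):

```shell
# Throwaway repository; identity and file contents are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "v1" > file_name
git add file_name
git commit -q -m "initial commit"

echo "v2" > file_name
git add file_name
staged_before=$(git diff --cached --name-only)   # staged after "git add"

git reset -q HEAD file_name
staged_after=$(git diff --cached --name-only)    # nothing staged any more
content=$(cat file_name)                         # the edit itself survives

echo "staged before reset: [$staged_before]"
echo "staged after reset:  [$staged_after]"
echo "file content:        [$content]"
```

The reset only undoes the staging; the change in the working tree stays where it was.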

Tracking files (adds to the index/staging area too)

$ git add file_name

Opposite:

If you want to delete the file:

$ git rm file_name

If you want to keep the file:

$ git rm --cached file_name
$ git update-index --force-remove file_name
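Both of the "keep the file" commands can be compared in a throwaway repository. This is a sketch with made-up file names (a.txt and b.txt), one file per command:

```shell
# Throwaway repository; identity and file names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "initial commit"

git rm -q --cached a.txt                # porcelain way
git update-index --force-remove b.txt   # plumbing way

tracked=$(git ls-files)                        # paths left in the index
untracked=$(git ls-files --others | xargs)     # paths on disk but not in the index
echo "tracked:   [$tracked]"
echo "untracked: [$untracked]"
```

Either way the index entry goes away and the file stays on disk as an untracked file.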

Hope this helps you to understand how Git works. See you later!

About raf

Ruby Developer