Understanding the staging area, index and cache in Git
When you are first learning Git, some of its terminology is a little confusing: sometimes it uses different words for the same thing, sometimes it uses the same word for different things, and sometimes the confusion is not only in the documentation but also in the names of command options.
Let's start with the --cached option. Look at this example:
$ git rm --cached some_file
Taken from the git-rm(1) Manual Page:
When --cached is given, the staged content has to match either the tip of the branch or the file on disk, allowing the file to be removed from just the index.

--cached
    Use this option to unstage and remove paths only from the index. Working tree files, whether modified or not, will be left alone.
Please take note of these three words: cache, stage and index. We'll talk about them later.
In short, we use the --cached option to untrack a file without actually deleting it. Look at the following example:
$ git rm --cached README
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       deleted:    README
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       README
Notice that Git records the file as deleted (it is no longer tracked), but the file is still there on disk as an untracked file.
Now, let's try without the --cached option:
$ git rm README
rm 'README'
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       deleted:    README
#
In this case there aren't any untracked files because the file was actually deleted from disk.
Let's look at another case: I modified the README file but haven't added it to the staging area, and I modified the something.rb file and added those changes to the staging area.
$ git status
# On branch master
#
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       modified:   something.rb
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   README
#
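To compare the two behaviours side by side, here is a minimal sketch that can be run in a throwaway directory. The NOTES file, the commit message and the user identity are made up for the demo; only README comes from the examples above.

```shell
#!/bin/sh
# Throwaway repository; file names other than README are hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "hello" > README
echo "notes" > NOTES
git add README NOTES
git commit -q -m "initial commit"

# With --cached: README leaves the index but stays on disk.
git rm -q --cached README
# Without --cached: NOTES leaves the index and is deleted from disk.
git rm -q NOTES

if [ -f README ]; then readme_on_disk=yes; else readme_on_disk=no; fi
if [ -f NOTES ]; then notes_on_disk=yes; else notes_on_disk=no; fi
tracked=$(git ls-files)   # paths still recorded in the index

echo "README on disk: $readme_on_disk"
echo "NOTES on disk:  $notes_on_disk"
echo "tracked files:  '$tracked'"
git status --short
```

Both files end up staged as deleted, but only README survives in the working tree, where it shows up as untracked.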
The following command shows the diff of the README file:
$ git diff
And the following commands show the diff of the something.rb file because its changes are already in the staging area/index:
$ git diff --cached
$ git diff --staged
So, what's the cache? It is the old name for the index.
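The same situation can be reproduced with a short script. This is only a sketch in a temporary repository; the file contents and the user identity are placeholders, and --name-only is used just to keep the output short.

```shell
#!/bin/sh
# Throwaway repository reproducing the README / something.rb scenario.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "v1" > README
echo "v1" > something.rb
git add README something.rb
git commit -q -m "initial commit"

echo "v2" > README          # modified, not staged
echo "v2" > something.rb    # modified ...
git add something.rb        # ... and staged

unstaged=$(git diff --name-only)               # working tree vs index
staged=$(git diff --cached --name-only)        # index vs HEAD
staged_alias=$(git diff --staged --name-only)  # same comparison as --cached

echo "git diff:          $unstaged"
echo "git diff --cached: $staged"
echo "git diff --staged: $staged_alias"
```

Plain git diff compares the working tree with the index, while --cached (or its synonym --staged) compares the index with the last commit.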
In older documentation you may see the index called the "current directory cache" or just the "cache".
THE GIT INDEX
The index is thus a sort of temporary staging area, which is filled with a tree which you are in the process of working on. Let's try an easier definition: it's the place where changes for the next commit get registered.
What else? It is a binary file (generally kept in .git/index) containing a sorted list of path names, each with permissions and the SHA1 of a blob object.
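A quick way to see the old "cache" name inside the file itself: the first four bytes of the index are the signature DIRC, short for "dircache". The sketch below assumes a fresh temporary repository with a single staged file; the user identity is not needed because nothing is committed.

```shell
#!/bin/sh
# Throwaway repository with one staged file, so that .git/index exists.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo "puts 'hi'" > something.rb
git add something.rb        # staging the file creates .git/index

# Read the 4-byte signature at the start of the binary index file.
signature=$(head -c 4 .git/index)
echo "index signature: $signature"
```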
Is there a way to see the contents of the .git/index file? Yes, git ls-files can show you the contents of the index:

--stage
    Show staged contents' object name, mode bits and stage number in the output.
$ git ls-files --stage
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 something.rb

--cached
    Show cached files in the output (default)
$ git ls-files --cached
something.rb
Do you remember that with git diff the --cached and --staged options were exactly the same thing? With git ls-files that is not the case: the two options do different things. Both of them read the index file, but the output is different. Confused? Don't worry, it's going to get worse.
During a merge the index can store multiple versions of a single file (called stages). The third column in the git ls-files --stage output above is the stage number, and will take on values other than 0 for files with merge conflicts.
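To watch the stage numbers change, here is a small sketch that forces a merge conflict in a temporary repository. The branch name topic, the file contents and the user identity are assumptions made for the demo.

```shell
#!/bin/sh
# Throwaway repository where two branches edit the same line.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "base" > something.rb
git add something.rb
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

git checkout -q -b topic
echo "topic" > something.rb
git commit -q -am "topic change"

git checkout -q "$main"
echo "main" > something.rb
git commit -q -am "main change"

# The merge is expected to stop with a conflict in something.rb.
git merge topic >/dev/null 2>&1 || true

# Third column of --stage output: 1 = common ancestor, 2 = ours, 3 = theirs.
stages=$(git ls-files --stage | awk '{printf "%s ", $3}')
git ls-files --stage
echo "stage numbers: $stages"
```

While the conflict is unresolved, the index holds three entries for the same path instead of the usual single entry at stage 0.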
So, what's the difference between the index, the cache, and the staging area?
THEY ARE ALL THE SAME THING!!!!!
But this is Git, right? It wouldn't be that fun if it were that easy. Look at this: git apply has --index and --cached, with different semantics.
Taken from the git-apply Manual Page:
With the --index option the patch is also applied to the index, and with the --cached option the patch is only applied to the index.
Whaaaaaat? Oh, yeah, perfectly clear! Well, not really.
We all know Git really enjoys confusing us. Here's another example:
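A small experiment makes the difference concrete. This is a sketch in a temporary repository; the patch is generated from a made-up change to README, and the identity settings are placeholders.

```shell
#!/bin/sh
# Throwaway repository plus a patch file kept outside of it.
repo=$(mktemp -d)
patch=$(mktemp)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "v1" > README
git add README
git commit -q -m "initial commit"

echo "v2" > README
git diff > "$patch"            # patch that turns v1 into v2
git checkout -q -- README      # back to v1 everywhere

# --cached: only the index changes; the file on disk is left alone.
git apply --cached "$patch"
cached_worktree=$(cat README)
cached_staged=$(git diff --cached --name-only)

git reset -q --hard            # start over from the commit

# --index: the working tree and the index both receive the patch.
git apply --index "$patch"
index_worktree=$(cat README)
index_staged=$(git diff --cached --name-only)
index_unstaged=$(git diff --name-only)

echo "--cached: file on disk is $cached_worktree, staged: $cached_staged"
echo "--index:  file on disk is $index_worktree, staged: $index_staged"
```

So --cached touches the index and nothing else, while --index keeps the working tree and the index in step.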
Let's say we have modified a file and we want to know the status of it:
$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   README
#
"Changed but not updated"? What is "not updated" supposed to mean in this sentence?
"Not updated" means the changes are not in the staging area/index yet or, in other words, not yet cached.
Maybe a better sentence would be something like this:
- Changed but not yet cached
- Changed but not staged
- Changed but not in index
Why use another term for the same thing? So far we have four terms that mean the same thing (well, kind of). I don't know why.
Git borrows, I think, the term "update" from caching systems, where a cache is:
- "dirty" when it's not in sync with what it caches,
- "clean" when it is,
- "updated" when it's synced
OPPOSITE OPERATIONS
There's another area where Git gets really confusing: opposite operations. If you use "add" to add something, you would expect "remove" to do exactly the opposite. Well, not in Git. Let's see some examples.
Adding changes to the index/staging area
$ git add file_name
Opposite:
$ git reset HEAD file_name
Tracking files (adds to the index/staging area too)
$ git add file_name
Opposite:
If you want to delete the file:
$ git rm file_name
If you want to keep the file:
$ git rm --cached file_name
$ git update-index --force-remove file_name
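The pairs above can be checked in a temporary repository. This is only a sketch; the file names new_file and README, their contents, and the user identity are placeholders.

```shell
#!/bin/sh
# Throwaway repository with one commit, so that HEAD exists for git reset.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "v1" > README
git add README
git commit -q -m "initial commit"

# Stage a change, then undo the staging with git reset.
echo "v2" > README
git add README
staged_before=$(git diff --cached --name-only)
git reset -q HEAD README
staged_after=$(git diff --cached --name-only)
content_after=$(cat README)     # the edit itself is still on disk

# Start tracking a new file, then untrack it while keeping it on disk.
echo "new" > new_file
git add new_file
git rm -q --cached new_file
tracked=$(git ls-files)
if [ -f new_file ]; then new_file_on_disk=yes; else new_file_on_disk=no; fi

echo "staged before reset: $staged_before"
echo "staged after reset:  '$staged_after' (README still contains $content_after)"
echo "tracked files: $tracked, new_file on disk: $new_file_on_disk"
```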
Hope this helps you to understand how Git works. See you later!