Missing Semester 06 - Version Control (Git)
Version control systems (VCSs) are tools used to track changes to source code (or other collections of files and folders) π€.
πΉ YouTube Link
βοΈ Official Notes
I also learned a lot from Alex' git workshop
Git Data Model permalink
Git models the history of a collection of files and folders within some top-level directory (root directory, the directory in which you do git init
) as a series of snapshots (commits).
π€ git init creates the .git
directory, which is where Git stores all of its information. If you delete everything except .git
, you can still rebuild your entire repository.
In git folders are trees, and files are blobs.
In pseudocode:
- A file is a bunch (array) of bytes:
type blob = array<byte>
- A directory contains named files and directories (mappings from the file/directory name to the actual contents - contents being another tree or a blob):
type tree = map<string, tree | blob>
- A commit has parents, metadata, and the top-level tree:
type commit = struct {
parent: array<commit>
author: string
message: string
snapshot: tree
}
Objects and content-addressing permalink
π€ What do we mean by content-addressable? It means that at the core of Git is a simple key-value data store. This means you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content. More on git objects from Git Book
π€ Also, the content of an object determines its ID (two files with the same content will have the same id!). The technical name for this is a content-addressable filesystem.
- An object is a blob, tree, or commit (see above):
type object = blob | tree | commit
In Git data store, all objects are content-addressed by their SHA-1 hash:
objects = map<string, object>
def store(object):
// hash-object command
id = sha1(object)
objects[id] = object
def load(id):
return objects[id]
π€ A typical repo has thousands of objects, so Git breaks up objects into subdirectories to avoid any one directory becoming too large.
π€ What is a hash function?
A hash function is a function that takes in a big piece of data and turns it into a short string.
References permalink
Branches and refs are human-readable labels to specific commits. "HEAD" is a reference to where we currently are.
references = map<string, string>
// references are mutable
def update_reference(name, id):
references[name] = id
def read_reference(name):
return references[name]
// cat-file
def load_reference(name_or_id):
if name_or_id in references:
return load(references[name_or_id])
else:
return load(name_or_id)
π‘ At a high level, all git command line commands are just manipulations of references data or objects data:
git add
is a combination ofhash-object
andupdate-index
.git commit
comes fromwrite-tree
,commit-tree
, and if youβre on a branch,update-ref
git checkout
usesupdate-ref
orupdate-symbolic-ref
If you ls .git
in a new git repository:
HEAD description info refs config hooks objects
The HEAD file tells Git which branch weβre working on
The description file is only used by the GitWeb program β it can be ignored.
The info directory has a single file, exclude, which contains a list of per-repo ignores. Like a gitignore file, but it doesnβt need to be checked in.
The refs directory is empty
The config file contains repo-specific configuration
The hooks directory is used to store scripts that fire on certain events β for example, running a linter before you commit
The objects directory should be empty (aside from two more empty directories)
You'll get the
index
file once you've added your changes to the staging area (a.k.a. index)
π‘ Staging area is you telling git what changes should be included in the next snapshot.
git cat-file -p [commit sha]
Git's internal command to print out the contents of the commit (used to inspect objects stored in Git). With this command, we can restore our file even if we delete it β because the object is kept safe in the .git
directory:
rm animals.txt`
git cat-file -p b13311e04762c322493e8562e6ce145a899ce570 > animals.txt
Misc Git Commands permalink
git commit -a
To commit all the changes for files that are already tracked - so you can potentially skip git add .
π‘ Think of merging as being the opposite of branching.
π€ git pull
=== git fetch & git merge
git add -p
For interactively deciding what you want to add.
git blame [filename]
To see who made changes to the file.
git show [commit sha]
To see the changes made in the commit.
git bisect
Use binary search to find the commit that introduced a bug.