Flynn Duniho edited this page May 29, 2024 · 1 revision

XKCD describes how a lot of people (and internet tutorials) apply Git to their workflow.

Here are our evolving notes on Git and how we currently understand it. It's a work in progress. If you learn through understanding models instead of recipes, this article is for you.

Essential Git Theory

Git is our source control software. Git DOES NOT use the model of a "central repository" that synchronizes "the latest files" between a server and multiple clients (as SVN does). Instead, Git is used to manage "sets of changes to files" that can be shared between repos of equal authority.

Git is essentially just a collection of utilities that manage sets of changes called "commit objects" or, more colloquially, commits.

There are three important ideas around Git commits that may make it easier to understand how Git does its magic.

  • A commit records a set of changes to files. Internally, Git stores each commit as a snapshot of the project and works out the changes by comparing it with its parent, but it is a workable model to think of a commit as instructions that Git can "replay" to change files on-the-fly.
  • Every commit contains a reference to the commit that immediately preceded it (its parent; a merge commit has more than one), and every commit is uniquely identified. Therefore, Git can reconstruct the entirety of a project's history by following the "chain of commits" given a starting commit.
  • Because each commit contains a list of changes and is linked to a parent, it is possible to "replay" chains of changes in-order. Chains of commits are called branches, and each branch is designated by the unique identifier (called a hash) of the most recent commit object in its chain.
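The three ideas above can be sketched as a toy model in Python. This is only an illustration of the concept, not how Git actually lays out its objects: the `Commit` class, its fields, and the `history` helper are all hypothetical names, and the hash is computed over a made-up payload rather than Git's real object format.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    message: str
    snapshot: str          # stand-in for the project contents at this commit
    parent: Optional[str]  # hash of the preceding commit, None for the first

    @property
    def commit_hash(self) -> str:
        # The identifier is derived from the commit's own contents, including
        # the parent hash, so the same change on a different parent gets a
        # different identifier.
        payload = f"{self.parent}\n{self.snapshot}\n{self.message}"
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def history(objects: Dict[str, Commit], start: str) -> List[str]:
    """Follow parent links from `start` back to the first commit."""
    messages: List[str] = []
    current: Optional[str] = start
    while current is not None:
        commit = objects[current]
        messages.append(commit.message)
        current = commit.parent
    return messages


# Build a chain of three commits, each pointing at the one before it.
objects: Dict[str, Commit] = {}
tip: Optional[str] = None
for msg, snap in [("first draft", "a"), ("fix typo", "ab"), ("add section", "abc")]:
    commit = Commit(message=msg, snapshot=snap, parent=tip)
    objects[commit.commit_hash] = commit
    tip = commit.commit_hash

log = history(objects, tip)
print(log)
```

Knowing only the hash of the last commit (`tip`) is enough to recover the whole chain, which is why a branch can be nothing more than a name attached to one hash.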

We use the Git utilities to:

  • Create a "working set of files" that you can edit
  • Track your changes and pack them into a new commit
  • Add the new commit to your repo
  • Create and switch between different "branches" of code
  • Share and receive commits from other repos on the network

Overview

For NetCreate, we use GitHub to host the public repository. The formal steps are:

  • Clone our GitHub repo to your own computer
  • Check out the dev branch
  • Create your own private "feature branch" named dev- followed by your initials, a /, then the feature you are working on. Dave's is dev-ds/feature, where "feature" is something like "add-standalone-mode".
  • Make a commit to your private feature branch. Each commit consists of one or more file changes, and should be explainable with a short sentence.
  • If you want to share your feature, use Git to push your private branch to GitHub.
  • If you want your feature to be added to the GitHub dev branch, merge the current dev branch into your private feature branch, test, and push it to GitHub again. Then use the pull request function on the GitHub site to generate the Pull Request. See past pull requests for examples of how we write them.
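For readers who use the command line rather than a GUI client, the steps above might map onto Git commands roughly as follows. This is a sketch under assumptions: the helper names `feature_branch` and `workflow_commands` are made up for illustration, `<repo-url>` and the bracketed placeholders stand in for values this page does not give, and fetching and merging `origin/dev` is just one way to bring the current dev branch into a feature branch. The script only builds and prints the command strings; it does not run Git.

```python
from typing import List


def feature_branch(initials: str, feature: str) -> str:
    """Build a branch name following the dev-<initials>/<feature> convention."""
    return f"dev-{initials}/{feature}"


def workflow_commands(repo_url: str, initials: str, feature: str) -> List[str]:
    """Return the Git commands for the workflow, in order, as plain strings."""
    branch = feature_branch(initials, feature)
    return [
        f"git clone {repo_url}",            # copy the GitHub repo to your computer
        "git checkout dev",                 # switch to the dev branch
        f"git checkout -b {branch}",        # create and switch to the feature branch
        "git add <changed files>",          # stage the changes for the commit
        'git commit -m "<short sentence describing the change>"',
        f"git push -u origin {branch}",     # share the feature branch on GitHub
        "git fetch origin",                 # get the latest commits from GitHub
        "git merge origin/dev",             # merge current dev into the feature branch
        f"git push origin {branch}",        # push again before opening the pull request
    ]


commands = workflow_commands("<repo-url>", "ds", "add-standalone-mode")
for command in commands:
    print(command)
```

The commands after `git clone` are meant to be run from inside the cloned directory. The pull request itself is still created on the GitHub site.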

Here's an example video showing how to do these operations in SourceTree.

Workflow and Terminology

Git repositories use a finer-grained approach to tracking changes; they are collections of linked commit objects that specify transformations to the contents of files. This is more nuanced than tracking simple sets of files, and it means that there are more magical spaces that are not immediately apparent.

MAGICAL SPACES IN THE LOCAL REPO

Locally, you have a WORKING DIRECTORY, a STAGING AREA, and a REPOSITORY. As you make changes to your WORKING DIRECTORY, you can choose to STAGE them to prepare for a COMMIT. When you want to save your changes to the repo, Git uses the files in the invisible STAGING area to prepare a COMMIT that is added to the REPOSITORY. Note that a REPOSITORY is not the same as the collection of files; it's the data structure that's used to track changes and branches.
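A toy model may make the three local spaces easier to picture. It is a deliberately simplified sketch: plain dictionaries stand in for the WORKING DIRECTORY and STAGING AREA, a list of snapshots stands in for the REPOSITORY, and the `stage` and `commit` functions are hypothetical names, not Git's API.

```python
from typing import Dict, List

# Files as you see them on disk.
working_directory: Dict[str, str] = {"README.md": "hello", "app.js": "v1"}
# The staging area starts out matching the last commit.
staging_area: Dict[str, str] = dict(working_directory)
# The repository, reduced here to a list of committed snapshots.
repository: List[Dict[str, str]] = [dict(working_directory)]


def stage(path: str) -> None:
    """Copy the current version of one file into the staging area."""
    staging_area[path] = working_directory[path]


def commit() -> None:
    """Record whatever is staged as a new snapshot in the repository."""
    repository.append(dict(staging_area))


# Edit two files, but stage only one of them before committing.
working_directory["README.md"] = "hello world"
working_directory["app.js"] = "v2"
stage("README.md")
commit()

print(repository[-1])
print(working_directory)
```

The new commit contains the README change but still holds the old `app.js`, because the commit is built from the staging area and not from the files on disk.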

The REPOSITORY itself consists of two important storage areas: the collection of all commit objects and the list of named branches. Each branch "points" to a specific commit object that is the "head" (or tip) of that branch; Git's special HEAD reference names whichever branch you currently have checked out. Since every commit object refers to a parent commit object, it's possible to recreate the chain of changes that represent a specific code branch.
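The two storage areas can be sketched as two dictionaries. This is a minimal illustration with made-up commit labels (`c1` to `c5`) in place of real hashes; `chain` and `commit_on` are hypothetical helpers, not Git commands.

```python
from typing import Dict, List, Optional, Tuple

# Storage area 1: every commit object, keyed by id -> (parent id, message).
commits: Dict[str, Tuple[Optional[str], str]] = {
    "c1": (None, "initial commit"),
    "c2": ("c1", "add login form"),
    "c3": ("c2", "style login form"),
    "c4": ("c2", "start standalone mode"),
}

# Storage area 2: named branches, each pointing at exactly one commit.
branches: Dict[str, str] = {
    "dev": "c3",
    "dev-ds/add-standalone-mode": "c4",
}


def chain(branch: str) -> List[str]:
    """Recreate a branch's history by following parent links from its pointer."""
    ids: List[str] = []
    current: Optional[str] = branches[branch]
    while current is not None:
        ids.append(current)
        current = commits[current][0]
    return ids


def commit_on(branch: str, new_id: str, message: str) -> None:
    """Add a commit whose parent is the branch's current commit, then move the pointer."""
    commits[new_id] = (branches[branch], message)
    branches[branch] = new_id


dev_before = chain("dev")
feature_before = chain("dev-ds/add-standalone-mode")
commit_on("dev-ds/add-standalone-mode", "c5", "finish standalone mode")
feature_after = chain("dev-ds/add-standalone-mode")
dev_after = chain("dev")
print(dev_after, feature_after)
```

Both branches share `c2` and `c1`, and committing on the feature branch moves only that branch's pointer; `dev` is untouched.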

MAGICAL SPACES IN THE REMOTE REPO

A Git repo can be aware of remote repositories. These 'remotes' are given names so you can easily refer to them when exchanging commit objects with them over the network.

In the case of GitHub, a remote named origin is created when you clone NetCreate. This name is just the default that Git gives to the remote you cloned from; it is a convention, not something special to GitHub.

The remote repository has the same kind of collection of commit objects and named branches, but this collection is NOT exactly the same as yours simply because many people have cloned their own repos from the GitHub-hosted NetCreate. Since Git repos do not have the notion of a "centralized master server", synchronizing code changes between different repos means sharing commit objects between them.

SHARING COMMITS BETWEEN REPOS

Generally, we want to get the most recent changes from GitHub, which in practical terms means updating our files in the WORKING DIRECTORY. Doing this takes two steps, each touching different magical spaces:

  1. FETCH all new commit objects and named branch pointers from the remote named 'origin'. This updates your local REPOSITORY collection of commit objects, but does not actually update any files in your WORKING DIRECTORY.
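  2. MERGE the fetched commits into your current branch. This is the step that actually updates the files in your WORKING DIRECTORY.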
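The fetch step, and the merge that follows it, can be modeled in a few lines. This is a simplified sketch under assumptions: commit ids are made-up labels, `remote_tracking` stands for the local record of where the remote's branches point (Git calls these remote-tracking branches, such as `origin/dev`), and the merge shown handles only the simplest "fast-forward" case where your branch has no commits of its own. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Commit = Tuple[Optional[str], Dict[str, str]]  # (parent id, files at that commit)

# The repo on GitHub has moved ahead to c3.
remote_commits: Dict[str, Commit] = {
    "c1": (None, {"README.md": "v1"}),
    "c2": ("c1", {"README.md": "v2"}),
    "c3": ("c2", {"README.md": "v3"}),
}
remote_branches: Dict[str, str] = {"dev": "c3"}

# The local repo was cloned earlier, when dev still pointed at c1.
local_commits: Dict[str, Commit] = {"c1": (None, {"README.md": "v1"})}
local_branches: Dict[str, str] = {"dev": "c1"}
remote_tracking: Dict[str, str] = {"origin/dev": "c1"}
working_directory: Dict[str, str] = dict(local_commits["c1"][1])


def fetch() -> List[str]:
    """Copy missing commit objects and record where the remote's branches point."""
    new_ids = sorted(cid for cid in remote_commits if cid not in local_commits)
    for cid in new_ids:
        local_commits[cid] = remote_commits[cid]
    for name, tip in remote_branches.items():
        remote_tracking[f"origin/{name}"] = tip
    return new_ids


def is_ancestor(ancestor: str, descendant: str) -> bool:
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    current: Optional[str] = descendant
    while current is not None:
        if current == ancestor:
            return True
        current = local_commits[current][0]
    return False


def merge_fast_forward(branch: str, tracking: str) -> None:
    """Move the local branch pointer forward and refresh the working files."""
    target = remote_tracking[tracking]
    if not is_ancestor(local_branches[branch], target):
        raise ValueError("histories diverged; a real merge commit would be needed")
    local_branches[branch] = target
    working_directory.clear()
    working_directory.update(local_commits[target][1])


fetched = fetch()
files_after_fetch = dict(working_directory)
merge_fast_forward("dev", "origin/dev")
files_after_merge = dict(working_directory)
print(fetched, files_after_fetch, files_after_merge)
```

After the fetch, the local repository holds `c2` and `c3` but the files on disk are unchanged. Only the merge moves the local `dev` pointer and rewrites the working files.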

Man, this is getting long...
