Fundamentals of Software Version Control

NOTES FROM

Fundamentals of Software Version Control

Michael Lehman

1. Overview of Software Version Control

Overview of software version control

Version Control is the process of keeping track of your creative output as it evolves over the course of a project or product. It tracks what is changed, it tracks who changes it, and it tracks why it was changed.

Understanding version control concepts

As you work, each new version is kept in a simple database-like system. A Version Control system is an application or a set of applications that manage creating a repository, which is the name of the specialized database where your files and history are stored. In some systems, this is in an actual database, and in some systems this is done by creating a set of small change files which are identified by some identifier and managed by the software. The next is working set. Your files and folders can be organized into projects within the repository in a manner similar to how you manage files by keeping them in folders on your hard drive.

The files on your hard drive, from the point of view of the Version Control software, are considered your working set. You change the files in the working set and then update the repository to reflect those updates, automatically creating a backup and denoting the purpose of the changes in a process– that’s called add when you are adding new files or committing or checking in for existing files. Once you have added or updated files in the repository, you can easily revert your working set with any version of the data in the repository.

If you’re working by yourself, you can grab an older version. If you are working in a team, you can grab files that have been updated by your team members. That process is either called updating or checking out, and finally as we talked about before, you can mark the state of an entire project tree or an entire repository with an identifier called either a tag, or a label depending on the vendor, when significant events occur such as shipping a release or delivering it to a client.

2. Terminology 

Repository – the database where your changes are tracked;

Working set – your local files with potential changes not yet in the repository;

Add and check-in – ways of getting data into the repository;

Checkout and update – ways of getting data from the repository into your working set;

Revert or rollback – ways of updating your working set to match specific versions of the repository.

When sending changes from one repository to another, you can do it via two methods, one is to use a method which uses HTTP or SSH to communicate the information to a remote server, the other is a way of extracting data from a repository to be able to send by email. Push sends the changes from your repository to the server’s repository, and pull gets the changes from the server and adds them to your local repository. As I said, this is almost often done via a server listing on HTTP or via SSH.

capture

Typically, you initialize the repository because the repository is on your local box.Similar to the Centralized Version Control, you still have a working set, and you add things and you update them. In addition, you can also have this optional remote repository, which is a way of backing things up. Now typically, people that use Git or Mercurial use a system like this, and they will use a hosted system like GitHub in order to be able to back up their local repository into the cloud, and that’s done as we mentioned just before in terminology by using push and pull.

4. Version Control Concepts

Getting files in and out of a repository

The first thing you need to know to use Version Control is how to create either the entire repository or how to create a portion of the repository for the project your’re working on. With Distributed Version Control, you’ll create the repository yourself at least on your local machine. And then you may synchronize it with another repository created on the server by an administrator. So you’ll create the repository by using a command that’s sometimes called initialize and sometimes called create. capture

And what this does is create an empty space for you to put your files and directories in, as we did in our sample before where we simply created myprog.c and checked it in. If your files already exist, meaning you’re adding an existing project to Version Control, you can use the Add feature of your Version Control Software to bulk add all the files and directories into your working set and then check them in. In Distributed Version Control, there’s usually a concept between your file system and the repository that’s called the staging area. I did git commit -a -m and the -a meant add any files that are in the working set that aren’t already in the repository automatically.

Saving changes and tracking history
capture

Once you’ve got your files in, and you can get your files out, the typical work flow when using any Version Control System looks like this in your repository. At Time 1, you can see that there’s a file in your repository that contains A, B, and C. Let’s say you check it out and then you delete B and you add D. Now when you save your files back into the repository, use a command that’s called commit, check in, or update. When you command the Version Control System to save your updated file, it will ask you for a short description of the changes you’ve made. This is oftentimes called a commit message, and this is a crucial part of the history of tracking information.

Reverting to a prior version

capture

This is one of the places where Version Control systems can really help you out.So you’ve got your file in the repository at time T1, you would check it out or update it as necessary, and you begin to make some changes.

You delete feature B and add feature D and then you realize that’s not the way you want it. So with Version Control software, you can go back and say please overwrite the file in my working set with the most recently saved one, or even overwrite the file or the entire working set with the version from last Thursday. This is sometimes called reverting, and in some systems it’s called rollback. Now, the key thing is that you have to identify the specific version you want to roll back to. Each time you commit or check in, the Version Control Software creates a change set identifier.

Creating tags and labels

capture

Sometimes you want to be able to identify the entire state of the repository at a particular point in time, such as a customer release or a major milestone of the project. Rather than writing down a cryptic change set identifier, Version Control systems allow you to supply a human readable name describing the current state of the whole repository or project, in our case here, V1.0. This is called tagging in some systems and labeling in others.

Branching and merging

capture

You start out with your main branch and your existing files, and you create a new branch, and now you have another set of identical files that you can make changes to. Because all branching means is that you’re taking the current state of the repository and creating a new copy.

In the new branch, as opposed to the main branch–which is also sometimes called the trunk–you can make any changes you want and they don’t affect the product that is already working.This is a perfect sandbox in which you can experiment and you can refactor code. You can delete old unnecessary features and add brand-new ones. You really have full freedom to do anything you want to.

In our example, a repository contains a file containing features A, B, and C. Once you branch you can see the main trunk still contains A, B, and C, and so does the branch.

Now, at T2 we may delete C and add feature D. And this doesn’t have anything to do with what we can do in the main branch, where we might, for example, add feature F. And back in the branch we might delete feature B now and add feature E. And this is all without changing our main production code.

Hand in hand with branching is the reverse process, called merging. This allows you to take changes that you’ve made in a branch and add them back into the main trunk or into another branch. So let’s take a look. We have our main branch, there is our main file, we’ll create a new branch, as we did before, feature 1, and we’ll take the copy and we’ll add feature D. And now back in the main branch, we’ll add feature F. And here in parallel, we’re going to add feature E in the feature 1 branch.

So as you do these changes, you might decide you like both sets of changes and want to combine them. In this case, we don’t have any conflicting changes because we didn’t delete anything, we just added. So now we ask our Version Control system to merge the two branches together, and now we end up with a new file 1, checked into the main branch with all the features from both the main branch and the feature branch.

When there are conflicts, such as two changes in each branch that effectively did the same thing, for example, if we deleted feature B in both the main branch and the feature 1 branch, the Version Control Software sometimes can’t tell exactly what you meant to do and so therefore it declares a merge conflict and requests that you resolve this manually.

 

 

 

 

 

Advertisements

One thought on “Fundamentals of Software Version Control

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s