Git that code!

I learned about Git several months ago. A scientist was helping me out with some data processing and had his code hosted on GitHub. At the very beginning it was all confusion. With time, I figured out that programmers use something called version control extensively to develop applications in a collaborative environment, and that GitHub is only one of the many hosting services available.

Version control is a system that keeps track of all changes in our electronic documents. These could be plain text with our newest novel or programming code. Version control creates extra (hidden) files in our working directory that track any difference in the file (or files) we are working on. The mechanics of version control are rather complex and beyond the scope of this post. However, I want to guide you through a simple workflow that I use every day and show the power of one of the version control systems available: Git.

Git has several options with a broad spectrum of functionality, from simple tasks to complex endeavors. Something important to keep in mind:

Working files in your project are stored on your local computer, but most of the time you will be working with local and remote copies. Remote copies of your files will be stored on a Git hosting service such as GitHub or GitLab.

What Git does is keep track of all changes in all the files and folders placed in the same working directory. For example, if your working directory is called watchmaker, each subfolder and file within watchmaker will be tracked. Also, Git can merge files and resolve conflicts when two changes in the same file seem inconsistent, but this is a more advanced feature that I will not cover here.

The following examples are based on my daily experience. I have been developing code by myself so this is a very simple example. When you develop code with others, extra complications may come up and you will probably need to manage other commands as well.

I will assume you are familiar with Unix-based commands. I would recommend using GitHub as the remote repo since it is well documented and there is a good amount of help in different forums.

1. A new idea? Create a repo

As I mentioned earlier, repos are folders containing your project. I like to organize my repos by having them in a folder named Github located in my home directory. Let’s create a new repo called watchmaker :

We create a local repo with the init command. The terminal should show:
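A minimal sketch of this step. It assumes a scratch directory created with mktemp in place of the real home directory, so nothing on your machine is touched; the folder names Github and watchmaker come from the text above.

```shell
# Scratch location standing in for the home directory (assumption)
base=$(mktemp -d)

# Folder that holds all repos, plus the new project folder
mkdir -p "$base/Github/watchmaker"
cd "$base/Github/watchmaker"

# Turn the folder into a Git repo; Git keeps its bookkeeping in the hidden .git folder
git init

# List hidden entries to confirm that .git was created
ls -a
```

Git should answer with a line that begins with "Initialized empty Git repository in", followed by the path to the new .git folder.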

2. Add and commit a file

Let’s create and save a simple Python file called hello_watchmaker.py containing the following line:

We check the status (i.e. if there is any change in our working directory) with the status  command:

The above command returns:

As the message says, we have a new untracked file and if we want to track the changes we need to use the add  command:

Now the file has been added to the repo. If we check the status:
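The steps of this section so far can be sketched in a scratch repo as follows. The content of hello_watchmaker.py is a hypothetical stand-in, and the --short flag is used here only to keep the status output compact.

```shell
repo=$(mktemp -d)        # scratch repo standing in for watchmaker (assumption)
cd "$repo"
git init -q

# Hypothetical one-line script; any content would do
printf '%s\n' 'print("Hello watchmaker")' > hello_watchmaker.py

git status --short       # "??" marks an untracked file
git add hello_watchmaker.py
git status --short       # "A" marks a new file that is now in the Index

# Keep the last status in a variable for inspection
staged=$(git status --short)
```

The plain git status command reports the same information in a longer, more descriptive form.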

Git is now aware of the new file called hello_watchmaker.py. Any change from this point onward will be tracked by Git. Let’s modify the single line of our file (note I added '\n'  characters):

Checking the status:

To explain what is going on here, I will make use of this simple scheme:

[Figure: git_tree (working directory, Index, and HEAD)]

So far we added our new file to the Index; however, Git will not be happy until we add and commit the new changes, so that all the changes in the working directory reach the HEAD. You can think of the HEAD as the last committed version of your working directory. In other words, we first add the file to the Index and then commit, so that the HEAD is updated.

The commit command moves the changes to the HEAD. The commit message (the string after -m, between quotation marks) is usually short and should summarize what has been changed. The commit command returns:

I will not go into the details of the above message. However, you should notice that Git is telling you that 1 file changed, with 1 insertion and no deletions. Whenever you use the commit command you will get a similar message, depending on the changes in your working directory.
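A minimal sketch of the add-then-commit sequence in a scratch repo. The file content, the commit message, and the user name and email are placeholders chosen for illustration.

```shell
repo=$(mktemp -d)                               # scratch repo (assumption)
cd "$repo"
git init -q
git config user.name "Watch Maker"              # placeholder identity, local to this repo
git config user.email "watchmaker@example.com"

printf '%s\n' 'print("Hello watchmaker")' > hello_watchmaker.py   # hypothetical content

git add hello_watchmaker.py                     # working directory -> Index
git commit -m "Add hello_watchmaker.py"         # Index -> HEAD

git status --short                              # prints nothing: everything is committed
git log --oneline                               # one line per commit, newest first
```

The summary printed by git commit includes the number of files changed and the insertions and deletions in them.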

Checking the status:

This means that Git is happy and that all our changes in the working directory are updated in HEAD.

3. What’s the difference?

After you make a commit you may wonder: what was the change I just committed? Git can show it afterwards (for example with git show or git log -p), but those commands can be confusing at first, so I will not talk about them here. In any case, it is good practice to use the diff command before you add the file to the Index. Let’s change and save hello_watchmaker.py again:

Checking the status:

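Now, we can see what changed in the working directory since the last commit with the diff command (strictly speaking, git diff compares the working directory with the Index, which still matches the HEAD at this point):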

which returns:

In this way, lines starting with the - sign are things you removed, whereas lines starting with the + sign are things you added. In the example above, I removed the second '\n' and added an entire line. Again, this only works if you haven’t added the file to the Index yet. Thus, if you use git add and then git diff you will get nothing.
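The behaviour described above can be sketched in a scratch repo. The file content is hypothetical. The last command, git diff --staged, is not used in this post; it compares the Index with the HEAD, so it still shows a change after git add.

```shell
repo=$(mktemp -d)                               # scratch repo (assumption)
cd "$repo"
git init -q
git config user.name "Watch Maker"              # placeholder identity
git config user.email "watchmaker@example.com"

printf '%s\n' 'print("Hello watchmaker")' > hello_watchmaker.py   # hypothetical content
git add hello_watchmaker.py
git commit -q -m "Add hello_watchmaker.py"

# Append a second hypothetical line and leave it unstaged
printf '%s\n' 'print("Tick tock")' >> hello_watchmaker.py

git diff             # working directory vs. Index: the new line appears with a leading +
git add hello_watchmaker.py
git diff             # prints nothing: working directory and Index now match
git diff --staged    # Index vs. HEAD: the same change is visible here
```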

Checking the status:

In case you forgot to use git diff before git add, you can still take the file out of the Index (known as unstaging) and check the difference using the command reset HEAD. For example:

Then, using git diff  will give you the difference relative to HEAD.
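A minimal sketch of unstaging, again in a scratch repo with hypothetical file content. It stages a change, takes it back out of the Index with reset HEAD, and then runs git diff.

```shell
repo=$(mktemp -d)                               # scratch repo (assumption)
cd "$repo"
git init -q
git config user.name "Watch Maker"              # placeholder identity
git config user.email "watchmaker@example.com"

printf '%s\n' 'print("Hello watchmaker")' > hello_watchmaker.py   # hypothetical content
git add hello_watchmaker.py
git commit -q -m "Add hello_watchmaker.py"

# Change the file and stage it without looking at the diff first
printf '%s\n' 'print("Tick tock")' >> hello_watchmaker.py
git add hello_watchmaker.py

# Unstage: the edit stays in the working directory, only the Index is reset
git reset HEAD hello_watchmaker.py

git diff             # shows the change again
```

Note that reset HEAD needs at least one existing commit. Newer Git versions (2.23 and later) also offer git restore --staged for the same purpose.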

4. Push those changes!

Make a remote repo with the same name as the local repo. Here are the instructions for GitHub. I usually create the README.md file in my local repo, so I leave the “initialize this repository with a README” option unchecked.

In the working directory use:

This registers the remote repository in your local repo. Be careful when using an https address, since it can lead to the 403 error described further down. origin will be the name of the remote (like an alias). Then, push (aka upload) the files to the remote repo:

This means that you are copying the local repo to the master branch in the remote repo (or origin). Branches are a whole topic by themselves, so I will talk about them in a future post. For now, assume that you work with a single branch called master. If you get the following message:
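A minimal sketch of these two commands. To keep it runnable without a GitHub account, it assumes a local bare repo as a stand-in for the remote; with GitHub, the address passed to git remote add would be the one shown on the repo page. The branch name is read from Git instead of being hard-coded, because it is master in this post but may be main on other setups.

```shell
work=$(mktemp -d)
remote="$work/watchmaker.git"                   # local bare repo standing in for GitHub (assumption)
git init -q --bare "$remote"

mkdir "$work/watchmaker"
cd "$work/watchmaker"
git init -q
git config user.name "Watch Maker"              # placeholder identity
git config user.email "watchmaker@example.com"
printf '%s\n' 'print("Hello watchmaker")' > hello_watchmaker.py   # hypothetical content
git add hello_watchmaker.py
git commit -q -m "Add hello_watchmaker.py"

# Register the remote under the alias "origin" and check it
git remote add origin "$remote"
git remote -v

# Push the current branch to the remote
branch=$(git symbolic-ref --short HEAD)
git push origin "$branch"
```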

error: The requested URL returned error: 403 Forbidden while accessing https://github.com/rvalenzuelar/watchmaker.git/info/refs

you need to edit the config file within the .git folder located in your working directory. As the name says, this file contains basic configuration of your local repo, like the url where you are pushing the changes. To fix this error replace the url line:

with:
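The same edit can be made from the command line with git remote set-url, which rewrites the url line in .git/config for you. This is a sketch under an assumption: that the intended fix is to switch from the https address in the error message to the SSH form of the same address, which requires an SSH key registered with GitHub. Nothing here contacts the network.

```shell
repo=$(mktemp -d)                               # scratch repo (assumption)
cd "$repo"
git init -q

# https address taken from the error message above
git remote add origin https://github.com/rvalenzuelar/watchmaker.git
git config --get remote.origin.url

# Switch to the SSH form of the same address (assumed fix)
git remote set-url origin git@github.com:rvalenzuelar/watchmaker.git
git config --get remote.origin.url

# The url line inside .git/config now carries the new address
grep 'url' .git/config
```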

You can simplify the push command when you are always pushing to the same branch (in this case our single branch master). To do so, use the following command:

In this way, you can call:

to copy your changes to the remote repo. Keep in mind that this upstream setting applies only to the branch you were on (here master); when you start using branches, each new branch needs its own upstream before a plain git push will work.
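A minimal sketch of setting the upstream and then pushing with the short command. As before, it assumes a local bare repo in place of GitHub and hypothetical file content.

```shell
work=$(mktemp -d)
remote="$work/watchmaker.git"                   # local bare repo standing in for GitHub (assumption)
git init -q --bare "$remote"

mkdir "$work/watchmaker"
cd "$work/watchmaker"
git init -q
git config user.name "Watch Maker"              # placeholder identity
git config user.email "watchmaker@example.com"
printf '%s\n' 'print("Hello watchmaker")' > hello_watchmaker.py   # hypothetical content
git add hello_watchmaker.py
git commit -q -m "Add hello_watchmaker.py"
git remote add origin "$remote"

# First push: also record origin/<branch> as the upstream of the current branch
branch=$(git symbolic-ref --short HEAD)
git push --set-upstream origin "$branch"

# Later pushes from this branch no longer need the remote or branch name
printf '%s\n' 'print("Tick tock")' >> hello_watchmaker.py
git commit -q -a -m "Add second line"
git push
```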

5. Pull the remote

Let’s say you pushed changes to watchmaker while you were working at your office. Then, if you want to work on the same repo at home (assuming you are using different computers), you have to add an SSH key on your home computer and then clone the repo. To do so, while in your working directory (at home) use:

After the remote is copied to your working directory you will see the same files you left in your working directory at your office.

Now, let’s say you go back to your office the next day, make some changes to hello_watchmaker.py , push those changes, and go back home again hoping you can keep working on hello_watchmaker.py . This time, instead of cloning the repo you pull the changes:

In this way, the changes you pushed at your office will be updated in your working directory at home.
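The office-and-home round trip can be sketched on a single machine. The sketch assumes two scratch folders named office and home playing the two computers, and a local bare repo playing the remote; with GitHub, the address passed to git clone would be the one shown on the repo page.

```shell
work=$(mktemp -d)
remote="$work/watchmaker.git"                   # local bare repo standing in for GitHub (assumption)
git init -q --bare "$remote"

# Office: create the repo and push the first commit
mkdir "$work/office"
cd "$work/office"
git init -q
git config user.name "Watch Maker"              # placeholder identity
git config user.email "watchmaker@example.com"
printf '%s\n' 'print("Hello watchmaker")' > hello_watchmaker.py   # hypothetical content
git add hello_watchmaker.py
git commit -q -m "Add hello_watchmaker.py"
git remote add origin "$remote"
git push -q --set-upstream origin "$(git symbolic-ref --short HEAD)"

# Home, first evening: clone once
git clone -q "$remote" "$work/home"

# Office, next day: change the file and push
printf '%s\n' 'print("Tick tock")' >> hello_watchmaker.py
git commit -q -a -m "Add second line"
git push -q

# Home again: pull instead of cloning
cd "$work/home"
git pull -q
cat hello_watchmaker.py                         # now includes the line added at the office
```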

6. Workflow I use the most

If you are already developing a project you might have your own workflow. Otherwise, you can use something similar to what I do on a daily basis.

I check the status of my working directory:

If I am working on two different computers (i.e. at home and at the office), I update my local working directory:

I modify files and check the changes:

I add all the changes to the Index:

I commit the files, sometimes all of them with the same message:

…sometimes I specify a message for a single file:

…or use the same message for a group of files:

Finally, I push the changes:
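The whole routine above can be sketched end to end. The sketch assumes a local bare repo in place of GitHub, and the files gears.py and springs.py are hypothetical additions so that there is more than one file to commit.

```shell
work=$(mktemp -d)
remote="$work/watchmaker.git"                   # local bare repo standing in for GitHub (assumption)
git init -q --bare "$remote"
mkdir "$work/watchmaker"
cd "$work/watchmaker"
git init -q
git config user.name "Watch Maker"              # placeholder identity
git config user.email "watchmaker@example.com"
git remote add origin "$remote"

# First round: all files committed with the same message
for f in hello_watchmaker.py gears.py springs.py; do   # gears.py and springs.py are hypothetical
    printf '%s\n' "print(\"$f\")" > "$f"
done
git status --short
git add .
git commit -q -m "Add first watchmaker scripts"
git push -q --set-upstream origin "$(git symbolic-ref --short HEAD)"

# Second round: update, edit, review, stage, commit in pieces, push
git pull -q
for f in hello_watchmaker.py gears.py springs.py; do
    printf '%s\n' '# tweak' >> "$f"
done
git diff --stat
git add .
git commit -q -m "Tweak greeting" hello_watchmaker.py        # message for a single file
git commit -q -m "Tweak moving parts" gears.py springs.py    # same message for a group of files
git push -q
```

Naming files after the commit message commits only those files, so the rest stay in the Index for the next commit.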

7. Final thoughts

In my opinion, using a version control system like Git is key when you want to develop ideas (especially code) in a collaborative environment. In addition, Git helps scientists expose their code to peers, so that everyone can check how they processed a certain data set or which algorithm they employed to solve a certain problem, encouraging the advance of scientific endeavors. Of course, the authorship of the original ideas has to be acknowledged.

Keep in mind that GitHub is free for public repositories, which means that anyone can access your files. At the time of writing, a private repo on GitHub requires a paid plan. On the other hand, services like GitLab offer private repos for free.
