A couple of weeks ago I encouraged a college student to make his project open source and post it on Github (or Gitlab or Bitbucket or whatever). What had not occurred to me is that this person had never had to collaborate with other coders professionally or privately. He did not understand the implications or the process of pushing code to Github. The following is adapted from an email I wrote to this gent to explain the ins and outs, whys and wherefores of collaborating on Github.
First, why go through this at all?
- It looks great on your resume! As a hiring manager, I look for people with extra-curricular activities: a history on Github, blogging, conference talks, etc.
- You’ll have to learn how to collaborate eventually. Might as well start now. School will teach you how to code but it won’t teach you how to work with other coders.
What to remember:
- You are the boss of this project, for as long as you want to be. You control what goes in; you are the “benevolent dictator” [1] until you don’t want to do it anymore. I want to make it clear: this is your show. No one is trying to steal your thunder (well, I hope not anyway).
git is a source control program. Source control exists mainly to let several people work on the same code base and to keep a version history of the code. Once you are in a real environment with users, you need to be able to “roll back” code if a release has a bad bug, and you need to be able to do this quickly.
Here are your most important git commands. If you don’t have git installed, it is easy enough to get. If you are using macOS, I strongly recommend homebrew. If you have brew, from the Terminal you just type:
brew install git
The git commands below are all typed from the Terminal command line, in your code folder.
git init – create a new repo in the current directory.
git clone <url> – Get a copy of a repository from <url>. This is how a new contributor gets a copy of an existing repository.
git add <file> – “stage” a file for the next commit.
git add . – stage all new and changed files in the current folder and its subfolders.
git commit – commit your staged files. This creates a version control checkpoint, so you can roll back to this version of the code at any time. Commit often, at least once a day. Some people commit much more frequently than that.
git push – Upload your code to a remote server. For our purposes, the remote server is Github.
git pull – Download code changes from a remote server. Again, Github.
git checkout – Check out code. Used to switch to a different branch, or check out a previous version of the code. For example, if you made changes to a file and you want to reset it back to what it was, you can just git checkout <file>.
git branch – list branches.
git checkout -b <branch_name> – Create a new branch, checkout that branch.
git merge <branch> – Merge code from <branch> into the current checked-out code.
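If you want to see the local commands in action before touching your real project, here is a minimal sketch you can paste into a terminal. It works in a throwaway folder; the file name notes.txt, the commit messages, and the "Demo User" identity are placeholders I made up for illustration.

```shell
#!/bin/sh
set -e  # stop at the first command that fails

# Work in a throwaway folder so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"

git init -q                                  # create a new repo here
git symbolic-ref HEAD refs/heads/master      # call the first branch "master" whatever the local default is
git config user.name "Demo User"             # placeholder identity, stored in this repo only
git config user.email "demo@example.com"

echo "version 1" > notes.txt
git add notes.txt                            # stage one file
git commit -q -m "Add notes"                 # first checkpoint

echo "version 2" > notes.txt
git add .                                    # stage everything new or changed
git commit -q -m "Update notes"              # second checkpoint

echo "an edit I regret" > notes.txt
git checkout notes.txt                       # throw the edit away, back to the last commit
cat notes.txt                                # prints "version 2"

git checkout -q HEAD~1 -- notes.txt          # fetch the file as it was one commit earlier
cat notes.txt                                # prints "version 1"

git log --oneline                            # lists both checkpoints, newest first
```

The last checkout is the small-scale version of rolling back: every commit is a checkpoint you can return to.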
So, that is a big knowledge dump. Now I am going to explain why you need all that.
git is magical because of its ability to branch code. These days, it is accepted as a given, but in the days of CVS / SVN, branching large codebases could literally take hours. In git, it takes seconds. This revolutionized how we manage code.
A branch is, in effect, a copy, or snapshot, of the repository at a certain point in time. (Under the hood it is only a lightweight pointer to a commit, which is why creating one is so fast.) You can then make changes to that code. When you are done making changes, you can merge that code back into the production branch. Example:
- Let’s say all production code is delivered from branch “master” (master is the traditional name of the default branch in git; newer repositories, including those created on Github, often call it main).
- I want to work on “Cool Feature X”.
- I create a new branch:
git checkout -b cool_feature_x
- I work on the code for a while. I test it, it looks good.
- I merge the code back into master:
git checkout master && git merge cool_feature_x
- I can now push the changes to production as a new finished version of the product.
Using this pattern, I can be working on many features simultaneously and in isolation. In fact, many people can be working on many different features simultaneously without impacting one another.
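The branch-and-merge steps above can be tried end to end with a sketch like this one. It runs in a temporary folder; app.txt, the commit messages, and the demo identity are made up for the example.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
cd "$demo"

git init -q
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is named "master"
git config user.name "Demo User"             # placeholder identity for this repo only
git config user.email "demo@example.com"

echo "stable code" > app.txt
git add app.txt
git commit -q -m "Initial release"

git checkout -q -b cool_feature_x            # create the feature branch and switch to it
echo "cool feature x" >> app.txt
git add app.txt
git commit -q -m "Add Cool Feature X"

git checkout -q master                       # back to production
cat app.txt                                  # only "stable code": the feature is isolated on its branch

git merge -q cool_feature_x                  # bring the feature into master
cat app.txt                                  # now shows both lines

git branch                                   # lists cool_feature_x and master, with * on master
```

Notice that master did not change at all until the merge. That isolation is the whole point.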
Play around with creating and merging branches a bit; this is going to be important for what comes next.
So, for the most part, Github gives us two things: a destination for a “source-of-truth” repository that multiple people can work on; and a nice user interface. To make this work, everyone needs to push and pull code from Github. “Push” updates the code on Github to make sure everyone is in sync, and pulling brings the code back onto your local machine to make sure you are in sync. You can push and pull any branch to Github, not just master. This is a nice way to let other people play around with new features without having to merge them into the production code (master).
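You can rehearse pushing and pulling without an internet connection. In the sketch below a local "bare" repository (hub.git) stands in for Github, and two folders named alice and bob stand in for two developers. All of those names, plus the files and branch name, are assumptions of mine for the demo; with a real project the remote would be your Github URL.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Placeholder identity for every commit made by this script.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# A bare repository plays the role of the shared repo on Github.
git init -q --bare hub.git
git -C hub.git symbolic-ref HEAD refs/heads/master

# Alice starts the project and pushes master to the shared repo.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "First commit"
git remote add origin "$work/hub.git"        # tell git where the shared repo lives
git push -q -u origin master                 # upload, and remember origin/master as the upstream
cd ..

# Bob gets his own copy, then shares an experiment on its own branch.
git clone -q hub.git bob
cd bob
git checkout -q -b new_idea
echo "experiment" > idea.txt
git add idea.txt
git commit -q -m "Try a new idea"
git push -q -u origin new_idea               # push a branch other than master
cd ../alice

# Alice syncs up. Her master is untouched, but she can now try Bob's branch.
git pull -q
git checkout -q new_idea                     # creates a local branch from origin/new_idea
cat idea.txt                                 # prints "experiment"
```

Bob's experiment never touched master, yet Alice could download it and play with it.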
This is where your job as code maintainer / repository owner comes into play.
Anyone can make a copy of the repository and work on the code, but not just anyone can merge code back into your repository. Depending on how you set up your Github project, it is likely that you are the only person who can merge changes. This gives you, the project maintainer, a chance to review and approve all changes. As the project grows, you may add additional maintainers who also have the authority to review and accept changes. Github has special functionality to manage this process, which is not built into git itself: the pull request.
A pull request is basically me asking you to approve my code for submission to your repo. This is your opportunity to review the code, download the branch, try it out, and make suggestions for improvement. In most cases, the person making the pull request has made a personal copy of your entire repository, called a fork.
Github makes it easy to review and comment on the code in the pull request. It makes a nice “diff” that shows you what changed. And so on. At the end of the day, a pull request is just a fancier version of git merge. Before the merge, you get to approve or deny the merge. That’s basically it.
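To make the "fancier git merge" idea concrete, here is a rough approximation of a pull request using nothing but git and local folders. This is not how Github implements it; it is only a sketch of the same review-then-merge flow. The folders upstream and fork, the branch names fix_typo and review/fix_typo, and the file are all hypothetical.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Placeholder identity for every commit made by this script.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# The maintainer's repository.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo "original" > feature.txt
git add feature.txt
git commit -q -m "Initial commit"
cd ..

# The contributor's "fork": a full personal copy with a change on its own branch.
git clone -q upstream fork
cd fork
git checkout -q -b fix_typo
echo "improved" >> feature.txt
git add feature.txt
git commit -q -m "Improve feature"
cd ../upstream

# The maintainer downloads the proposed branch without merging it...
git fetch -q "$work/fork" fix_typo:review/fix_typo

# ...looks at what it would change relative to master (the "diff" in a pull request)...
git diff master...review/fix_typo

# ...and, if happy, approves by merging. --no-ff records the merge as its own commit.
git merge -q --no-ff -m "Merge fix_typo" review/fix_typo
git log --oneline --graph
```

Until that last merge runs, nothing in the maintainer's master has changed. Denying the request simply means never running it.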
A note on setup
So you added me to the repo as a “contributor.” I think I have the ability to approve my own Pull Requests, but I am not going to do that. As long as you are the guy with your name on the manifest, and you are submitting code to the extension stores, you should be controlling what goes in.
A couple of people (myself and another guy) have submitted changes that affect the look and feel of the extension’s output. If you don’t like what you see, drop a note on the ticket or pull request and tell us what to change. Or if you don’t like that direction at all, just say so.
Final note: Github Issues
Github has a feature called Issues. This is a great place to track both bugs and new features, and have conversations around specific features. This might seem kind of overkill as a lone wolf developer, but I’d suggest checking it out and seeing how it works. Issues are the focal point for discussion about the code and product. This is where you can help other people to help you. Be clear about what you want and don’t want for the direction of the product. The right people will show up, and the wrong people will disappear.
Usually you want every pull request to be tied to a specific issue. That way, you can track the history of a feature or bug from inception as an issue to its final merge into the codebase.
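One common way to tie work to an issue is to mention the issue number in the branch name and commit message. Github recognizes keywords such as "Fixes #12" and closes the linked issue once the change lands on the default branch. In this sketch the issue number 12, the branch name, and the file are hypothetical.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
cd "$demo"

git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"             # placeholder identity for this repo only
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Name the branch after the (hypothetical) issue so the link is obvious.
git checkout -q -b issue_12_fix_crash
echo "bug fix" >> app.txt
git add app.txt

# The first -m is the summary line, the second becomes the body of the message.
git commit -q -m "Fix crash on empty input" -m "Fixes #12"

git log -1 --format=%B                       # show the full message of the latest commit
```

When a branch like this is pushed and turned into a pull request, the issue and the code change point at each other.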
There is much left unsaid, and yet it feels too long as it is. Hopefully this is enough info for a prospective newcomer to get started managing their open source project. Please leave feedback in the comments!
[1] There are other methods of governance, but as a single creator just starting out, you will almost certainly start off as the sole operator of the project.