Basic options for bioinformatics data analysis

This post is written in the spirit of “if you write it in an email to more than one person, turn it into a blog”.

It’s for beginners who are just starting out in bioinformatics and who need help deciding which programming language to learn.

—————————

Firstly, you will need to understand how the command line works, because most bioinformatics tools don’t have graphical user interfaces (GUIs) and are instead run from the command line (aka the terminal). The command line is a powerful way to use a computer, but it has a learning curve, and there is less of a safety net, so you can make mistakes really quickly (like deleting your entire file system if you’re not careful). A tutorial is a good way to get familiar with terminal/command line/shell commands (all words which usually mean the same thing), but really it’s just the kind of thing which comes with practice.

When it comes to more involved analyses, you have two basic options for learning the skills you need to do bioinformatics.

The options are:

  1. Learn bash & R programming.
  2. Learn Python.

bash and R option

Basically, in this option, you will use bash scripting to automate running tasks on the command line, and then you will use R for all the data analysis. Here are some materials if you decide on this option:

  1. Learn some bash programming (bash is a type of “shell”, sorry, lots of new jargon!) https://www.learnshell.org/
  2. Swirl – an introduction to the basics of R – https://swirlstats.com/students.html
  3. Learn R for data science – https://r4ds.had.co.nz/index.html

Python option

Python is a more general-purpose language than R, so we can use it both for automating tasks on the command line and for data analysis. You probably don’t need to work through all of the material below; I’ve just included a few different options for you to explore yourself.

  1. The basics of Python – https://learnpythonthehardway.org/python3/
  2. Python tutorials on YouTube – https://www.youtube.com/playlist?list=PL-osiE80TeTt2d9bfVyTiXJA-UTHn6WwU
  3. Python basics – http://swcarpentry.github.io/python-novice-gapminder/
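
To give a feel for what “automating tasks on the command line and data analysis” in one language can look like, here is a minimal sketch. It is only an illustration, not taken from any of the tutorials above: the helper names `run_command` and `gc_content` are made up for this example, and the “tool” being run is just the Python interpreter printing a short made-up DNA sequence, so that the snippet works on any machine. In real work you would swap in the bioinformatics tool you actually need.

```python
import subprocess
import sys
from collections import Counter


def run_command(args):
    """Run a command-line program and return the text it printed."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def gc_content(sequence):
    """Return the fraction of bases in a DNA sequence that are G or C."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


# Automation step: call an external program from Python.
# Here the "program" is the Python interpreter itself (a stand-in for a real tool),
# which prints a toy sequence.
output = run_command([sys.executable, "-c", "print('ACGTGGCC')"])
sequence = output.strip()

# Analysis step: compute a simple statistic on the tool's output.
print(sequence, gc_content(sequence))
```

The same pattern (run a tool, capture its output, analyse the result) is what the bash and R option splits across two languages.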

Which one should I choose?

Personally, I would go with the one you have more experience with, or the one that is used by the lab/group/people around you, so that you can easily get advice if you get stuck. There is a lot of (mostly joking) animosity between the R and Python communities, but truth be told, they’re both amazing options with fantastic ecosystems which let you do incredible things.
