Bioinformatics 101

Claire Chung
Jul 9, 2021

Welcome to the world of bioinformatics!

Image Credit: CI Photos/Shutterstock.com

Having mentored quite a number of students, and having emailed this introduction to dozens of them throughout my postgraduate study, I guess it’s good to share this piece somewhere for a broader audience. As it was originally meant as a catalog of resources to supplement a verbal introduction, it’s mainly in point form, and I will keep it this way (lazily haha).

So, here we go! No worries about not having learnt anything about programming and bioinformatics. We all need a starting point for anything.

Basic skills and knowledge

To work on bioinformatics, here is the basic knowledge required:

1. Sequencing technologies:

  • What is it? Why and when to do sequencing?
  • What current technologies are there?

2. Sequencing analysis workflow

  • What are the common file formats?
  • Standard procedures: e.g., quality control, read trimming, alignment
  • What are the common tools?

It is sufficient to first understand the big picture and become familiar with the most common tools. The fine details are specific to each project and come with experience.
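To make the "quality control" and "read trimming" steps a bit more concrete, here is a minimal sketch in plain Python. It assumes the common FASTQ layout (four lines per read, Phred+33 quality encoding), and the two reads are made up for illustration. Real projects would use established tools for this; the function names `read_fastq`, `phred_scores` and `trim_3prime` are my own and not from any library.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:  # end of input
            break
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual  # drop the leading '@'

def phred_scores(qual, offset=33):
    """Convert a quality string to a list of Phred scores (assumes Phred+33)."""
    return [ord(char) - offset for char in qual]

def trim_3prime(seq, qual, min_q=20):
    """Remove bases from the 3' end until one has a score of at least min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

# Two made-up reads: 'I' encodes Phred 40, '#' encodes Phred 2.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIII##\n"
    "@read2\nGGGTTTAA\n+\nIIIIIIII\n"
)
records = list(read_fastq(example))
trimmed = [(name, *trim_3prime(seq, qual)) for name, seq, qual in records]

for name, seq, qual in trimmed:
    print(name, seq, qual)
```

Here the two low-quality bases at the end of read1 are cut off, while read2 is left untouched.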

3. Basic Linux commands and programming skills

Not all bioinformaticians will develop new software, but it is necessary to become comfortable writing simple scripts to automate tasks and handle data, e.g., table manipulation and string extraction.
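As a small taste of what "table manipulation and string extraction" can look like, here is a sketch that pulls gene names out of GTF-style annotation lines and computes gene lengths. The three lines, the gene names and the helper `gene_lengths` are made up for illustration; only the tab-separated, nine-column layout with 1-based inclusive coordinates follows the GTF convention.

```python
import re

# Made-up GTF-style lines: 9 tab-separated columns, attributes in the last one.
gtf_lines = [
    'chr1\tdemo\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_name "ABC1";',
    'chr1\tdemo\texon\t100\t300\t.\t+\t.\tgene_id "G1"; gene_name "ABC1";',
    'chr2\tdemo\tgene\t50\t450\t.\t-\t.\tgene_id "G2"; gene_name "XYZ2";',
]

# Regular expression that captures the text inside gene_name "..."
GENE_NAME = re.compile(r'gene_name "([^"]+)"')

def gene_lengths(lines):
    """Return {gene_name: length} for rows whose feature type is 'gene'."""
    result = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[2] != "gene":  # keep only gene rows (filter by condition)
            continue
        start, end = int(fields[3]), int(fields[4])
        match = GENE_NAME.search(fields[8])  # string extraction
        if match:
            result[match.group(1)] = end - start + 1  # 1-based, inclusive
    return result

lengths = gene_lengths(gtf_lines)
print(lengths)
```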

4. Statistics

To get better at bioinformatics or any data analysis, it’s good to know at least the principles of common statistical tests, clustering, and dimensionality reduction methods.
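To illustrate the principle behind a statistical test, here is a minimal sketch of an exact permutation test written with the Python standard library only. It asks: if the group labels were shuffled in every possible way, how often would the difference in means be at least as large as the one observed? The numbers and the function name `permutation_p_value` are made up for illustration, and in practice you would reach for an established statistics package instead.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Made-up measurements for two small groups.
p = permutation_p_value([1, 2, 3], [4, 5, 6])
print(p)
```

With only three values per group there are 20 possible splits, and only two of them are as extreme as the observed one, so the p-value cannot go below 0.1 no matter how different the groups look. That is a nice reminder of why sample size matters.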

Advanced knowledge

5. More advanced computer skills

System administration, data structures, algorithms, software engineering, etc.

6. Bioinformatics algorithms and data structures

This helps you better understand and evaluate existing tools and research, which is important when designing your own programs and especially when developing software.
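As one tiny example of this kind of data structure, here is a sketch of k-mer counting, i.e. counting every substring of length k in a sequence. I picked it only for illustration because it is short; the sequence and the function name `count_kmers` are made up.

```python
from collections import Counter

def count_kmers(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

counts = count_kmers("ATATAT", 2)
print(counts.most_common())
```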

7. Current advancement

As in any field, read the current literature. Of course, it will only make sense after grasping the basic concepts.

First IT knowledge to acquire

There is much to talk about for each of the above areas. We will start with the one you can begin anywhere, anytime.

  • Linux: a family of open-source operating systems (OS), just as Windows and macOS are OS, but common on servers for its good performance.
  • Bash shell: a “typing” way to get around without a mouse on servers that have no graphical interface, and to automate repetitive things that would otherwise take you tonnes of time. Google and learn these basic commands:
    ls, cd, less, head, tail, cp, mv, cat, wc, wget, gzip, tar, vi, rm, ...
    Be careful with rm (removing) and mv (moving, which may replace existing files xd)
    sed and awk are useful commands for data manipulation
  • Python: a more sophisticated language for handling more complicated operations, and among the most widely learnt for years. The language is also intuitive for programming beginners. Many libraries (ready-to-use functions written by others) are available.
  • R: while more and more packages are published in Python, many bioinformatics packages are written in R and distributed through Bioconductor, the bioinformatics software repository. A common example is DESeq2. You need basic knowledge of R to run them. Being familiar with the tidyverse family of data analytics packages will make life easier.
  • Concepts — these are most important as they apply regardless of language syntax
    - Variables: Boolean (True or False), integer, float(ing point number), string, list/array
    - Logic: IF-THEN-ELSE, FOR-loop, WHILE-loop, functions, etc.
  • Data manipulation & visualization
    - Table manipulation: extract columns/rows by condition, transformation, etc.
    - Graph plotting: basic chart types like line, scatter and box plots to volcano plots and heatmaps
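To tie the concepts above together, here is a minimal Python sketch that uses variables, IF-ELSE logic, a FOR-loop and a function to manipulate a small table. The table is made up; the column names `log2FoldChange` and `padj` are only chosen to resemble a typical differential expression result, and the cutoffs are arbitrary assumptions. The last step prepares the x/y values one would plot in a volcano plot, without doing the plotting itself.

```python
import csv
import io
import math

# A made-up table in CSV form (string variable).
table_text = """gene,log2FoldChange,padj
geneA,2.5,0.001
geneB,-0.3,0.8
geneC,-1.8,0.01
geneD,1.2,0.2
"""

def classify(row, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Label one row as 'up', 'down' or 'ns' (not significant)."""
    lfc = float(row["log2FoldChange"])  # float variable
    padj = float(row["padj"])
    significant = padj < padj_cutoff and abs(lfc) >= lfc_cutoff  # Boolean
    if not significant:
        return "ns"
    elif lfc > 0:
        return "up"
    else:
        return "down"

# Read the table into a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(table_text)))

# FOR-loop: apply the function to every row.
labels = {}
for row in rows:
    labels[row["gene"]] = classify(row)

# Table transformation: x = log2 fold change, y = -log10(adjusted p-value).
volcano_points = [
    (float(r["log2FoldChange"]), -math.log10(float(r["padj"]))) for r in rows
]

# Extract rows by condition: keep only the significant genes.
significant_genes = [gene for gene, label in labels.items() if label != "ns"]
print(labels)
print(significant_genes)
```

The same logic carries over to R or to a table library once you know the concepts; only the syntax changes.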

Resources

Recommended Reviews

Sequencing and bioinformatics

Genomics

RNA-seq

Single-cell transcriptomics

Spatial transcriptomics

Integrated omics

Deep learning

Recommended Books

Quite a lot of useful IT books and resources from O’Reilly and Packt are accessible via an O’Reilly online learning subscription. If you are a university student, you very likely have free educational access through your library. Do check it out!

Recommended online courses

Recommended Sites

Hope this can serve as a starter for anyone who may be interested. It’s always a nice challenge to pick up some knowledge and skills at your own pace in your free time.

And of course, we learn best by having use cases. While there is a myriad of data and problems out there, you may want some guidance and exposure to actual practice in a research lab. If you are really interested, just look for a lab and talk to the professor about the chance of an internship, just like my mentees did. You usually have a better chance of being admitted if you ask well before summer.

Enjoy the journey!
