Challenges of “Big Data” in Biology

First of all, what is this “Big Data”?

If you are reading this post, you probably already know something about big data and big data analytics. So what is big data, and why is there such sudden hype around it? Big data is a general term for volumes of structured and unstructured data that are too large and complex to manipulate or interrogate with standard methods or tools. Specifically, Big Data relates to data creation, storage, retrieval and analysis that is remarkable in terms of volume, velocity, and variety.


The new world of “Big Data”

Now the question is how to deal with this monstrous amount of data. This is where analytics comes in. Big data analytics refers to the process of collecting, organizing and analyzing large sets of data to discover patterns and useful information. You will be amazed to know that every day we create about 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. Analysts also predict that by 2020 the amount of data will be 50 times what it is today.
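To get a feel for how big "2.5 quintillion bytes" really is, here is a small back-of-the-envelope conversion. It is only a sketch: it assumes decimal (SI) units, where one exabyte is 10^18 bytes, and a 365-day year at a constant daily rate. The variable names are my own.

```python
# Daily figure quoted above: 2.5 quintillion bytes (1 quintillion = 10**18).
BYTES_PER_DAY = 2_500_000_000_000_000_000

# Assumption: decimal (SI) exabyte, i.e. 10**18 bytes.
BYTES_PER_EXABYTE = 10**18

daily_exabytes = BYTES_PER_DAY / BYTES_PER_EXABYTE

# Assumption: the daily rate stays constant over a 365-day year.
yearly_exabytes = daily_exabytes * 365

print(f"{daily_exabytes} EB per day, about {yearly_exabytes} EB per year")
```

In other words, the quoted daily figure is the same thing as 2.5 exabytes per day.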

Problems and challenges that Big Data Analytics faces today:

One of the many challenges that Big Data Analytics faces today is creating platforms that can pull in unstructured data as easily as structured data. Another problem is segmentation. To put it in a funny way, Des MacHale (professor of mathematics and author) once said that “the average human has one breast and one testicle”. A bit ridiculous, but true. The moral of the story is to separate men and women when analyzing the number of sexual organs. This is one of the major problems the field faces today: with so much data, it is critical to segment it properly before extracting information, otherwise the analysis can give a totally absurd result.
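The Des MacHale joke is easy to reproduce in a few lines. The snippet below is a minimal sketch on made-up toy records (the `people` list and both helper functions are hypothetical, invented just for illustration): it computes the average over everyone, and then the same average after segmenting by sex.

```python
from statistics import mean

# Hypothetical toy records: (sex, number of breasts, number of testicles).
people = [
    ("female", 2, 0),
    ("male", 0, 2),
    ("female", 2, 0),
    ("male", 0, 2),
]

def average_counts(records):
    """Return (mean breasts, mean testicles) over the given records."""
    return (mean(r[1] for r in records), mean(r[2] for r in records))

def segmented_averages(records):
    """Group the records by sex, then average within each group."""
    groups = {}
    for record in records:
        groups.setdefault(record[0], []).append(record)
    return {sex: average_counts(rows) for sex, rows in groups.items()}

overall = average_counts(people)      # the unsegmented "average human"
by_sex = segmented_averages(people)   # one average per segment

print("Overall:", overall)
print("By sex:", by_sex)
```

The overall average describes a person who does not exist in the data, while the per-segment averages match the actual records. That is the whole point of segmenting before summarizing.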

How has “Big Data” entered the field of biology?


Biology Word Cloud

As we know, data is growing exponentially in all sciences today, and science is becoming heavily data driven. Physical and life sciences have been converging through Big Data, and cheap DNA sequencing together with bioinformatics has led to a massive explosion in data. It is also surprising to know that new DNA sequences are pouring into the international databases at 1.5 billion bases per month.

Significance of Genome Projects in generating Big Data:

This is very important, since most of the data in biology is being generated by genome projects. So what are these projects, and why are they important for us?

Genome projects are aimed at determining the complete genome sequence of an organism, e.g. bacteria, viruses, fungi, plants, animals, etc. The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism.
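To make the idea of a genome as "the collective DNA sequences of each chromosome" concrete, here is a tiny sketch. The `toy_genome` dictionary and its three very short "chromosomes" are invented for illustration only; real chromosomes run to millions of bases and are usually stored in files rather than in a Python dict.

```python
from collections import Counter

# Hypothetical toy genome: chromosome name -> DNA sequence.
toy_genome = {
    "chr1": "ATGCGTAC",
    "chr2": "GGCATT",
    "chr3": "TTAACG",
}

def genome_length(genome):
    """Total number of bases across all chromosomes."""
    return sum(len(sequence) for sequence in genome.values())

def base_counts(genome):
    """Count how often each base occurs across the whole genome."""
    counts = Counter()
    for sequence in genome.values():
        counts.update(sequence)
    return counts

total_bases = genome_length(toy_genome)
counts = base_counts(toy_genome)

print("Genome length:", total_bases)
print("Base counts:", dict(counts))
```

Sequencing a genome means recovering every one of these per-chromosome strings, which is why the amount of data grows so quickly with the number of organisms sequenced.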


Human Genome Project

One of the major genome projects was the Human Genome Project (HGP), which started in 1990. The HGP was one of the great feats of exploration in history: an inward voyage of discovery rather than an outward exploration of the planet or the cosmos, and an international research effort to sequence and map all of the genes of members of our species, Homo sapiens. Completed in April 2003, the HGP gave us the ability, for the first time, to read nature’s complete genetic blueprint for building a human being. It was a landmark project that is already having a major impact on research across the life sciences, with potential for spurring numerous medical and commercial developments.

Amount of data being generated in Biology and how to tackle it:


Sequencing Data Trend

It is really fascinating to know that annual genome sequence data generation is forecast to exceed 1 exabyte by 2018. With so much data being generated in biology, there is a need to extract knowledge from diverse and complex datasets using new e-tools and data resources. That is where Bioinformatics and Computational biology come in.

So what is this Bioinformatics?

I did a lot of searching on the net to find a proper definition of this field, and to find out whether there is a difference between Bioinformatics and Computational biology, and I found this answer the best:

“Bioinformatics is a broad field that interfaces a variety of life science disciplines (biology, genetics, biochemistry, biophysics, etc) with a variety of quantitative sciences (mathematics, statistics, computer science, engineering, etc). Bioinformatics techniques typically involve developing and applying software and algorithms to computationally intensive biological questions, such as those common in structural biology, genomics, sequence analysis, and systems biology.”

Some scientists draw a distinction between the terms bioinformatics and computational biology. While these areas are indeed broad and diverse, the distinctions between the terms are not consistent or well-defined. From here on I’ll simply use the term bioinformatics. The picture below gives a brief idea of the major areas that bioinformatics touches:


Areas of Bioinformatics

In my future posts I’ll try to dig deeper into these areas and share my experience of working on projects related to bioinformatics.

Stay tuned, guys, for some more interesting stuff...