Statistical Analysis of Metagenomic Data *SLIDES NOW AVAILABLE*

Wednesday 5th October 2011

Metagenomics is the study of the total genomic content of microbial communities. In metagenomics studies, DNA material is sampled collectively from the microorganisms that populate the environment of interest (e.g. agricultural soil, ocean water, or the human gut). The extracted DNA sequences are subsequently used to profile the environment and its biodiversity, its dominant microbial classes or biological functions, and to investigate whether and how this profile differs from those of other environments.
Metagenomic analysis is distinct from other forms of genomic analysis principally in three ways: the scientific questions asked are often at the level of communities of organisms; these organisms and their evolutionary relationships are mostly unknown; and, when using second-generation sequencing technologies, their sampled sequence data are sparse, fragmented and pooled.
Insights derived from metagenomics studies have become increasingly relevant in areas as diverse as human health (e.g. evaluation of the effects of antibiotics and other drugs) and biodefense (e.g. monitoring of food, air and water quality). Other fields that benefit from metagenomics contributions include:
  • Agriculture: investigating soil-borne crop diseases, assessing the impact of agricultural treatments and practices, developing methods for the bioremediation of contaminated soils and the monitoring of animal health.
  • Medical and epidemiological sciences: developing disease diagnosis and treatment strategies based on the composition of bacterial and viral communities in gut, dental caries, tumours and skin.
  • Environmental sciences: profiling of microbial communities in marine sediments or ancient samples of soil to understand the impacts of pollution and climate change.
  • Bioenergy: advancing technologies that process crops or waste with microbial systems to generate renewable forms of energy.
  • Palaeontology: differentiating between DNA that originates from bacterial, fungal, and human contaminants in fossil samples. This is crucial for studies that aim at elucidating the evolutionary processes of extinct species.
This meeting brings together researchers from different fields working on statistical methods for analysis of metagenomic data.
Date: 5th October 2011
Location: Fowden Conference Hall, Rothamsted Research.
Rothamsted Research is on the south side of Harpenden, just off the A1081 heading from Harpenden town towards St Albans. The site is about 15 minutes' walk from Harpenden station, which has direct trains from London St Pancras (~30 minutes) and Luton Parkway (for Luton Airport). Further information can be found at  (click on ‘How to find us’ at the bottom of the page). Car parking is available in the main car park; a permit must be obtained from Reception on arrival.
Call for posters: To facilitate further discussion, participants may display their own posters on work in the area during lunch. As space is limited, please email (before 1st October) to reserve a space.
Costs: Pre-registration is required, and charges for the meeting include lunch. Biometric Society members £30; student members £15; non-members £50. Payment by cheque, made payable to the Biometric Society, should be made by 1st October. Send (with your name and email address) to: Sue Welham, Metagenomics Meeting, BAB, Rothamsted Research, Harpenden AL5 2JQ.
Non-members are encouraged to join the Society to take immediate advantage of the members' registration fee. Note: student membership is free!
Queries: email



10:00 - 11:30 Introductory tutorial: an overview of scientific questions, bioinformatic and statistical challenges

Ian Clark, Michael Defoin-Platel, Elisa Loza, Wally Gilks (Rothamsted Research)

This session aims to give a brief introduction to the area of metagenomic analysis. We will give an overview of how changes in sequencing technology have enabled new questions to be asked about microbial communities in many different scientific areas. Work in soil science at Rothamsted will be used to give a more detailed example of approaches and questions of interest. The efficient processing and management of sequence data is an essential step prior to statistical analysis. An overview of these tasks will be given, with some comments on the bioinformatic tools currently available. Finally, we will describe briefly how statistics has been used to answer some of the scientific questions, including issues of experimental design, profiling of communities and comparison of communities across different conditions. We will also consider areas for future statistical research.
11:30 - 12:00 Coffee
12:00 - 12:45 Comparative meta-genome analysis

Suparna Mitra (Tuebingen University)

Metagenomics is a rapidly developing science, promising expansion towards discoveries that can help in the comprehension, cure and prevention of many diseases, in monitoring the impact of pollutants on ecosystems and in mining the rich genetic resource of non-culturable microbes that may lead to the discovery of new genes, enzymes, and natural products.

The recent development of new, less expensive, ultra-high-throughput sequencing technologies that can produce huge numbers of DNA reads at an affordable cost has boosted the number and scope of metagenomic sequencing projects. This has resulted in a dramatic increase in the volume of sequence data that must be analyzed. The analysis of metagenomic datasets is an immense conceptual and computational challenge.

The analysis often starts by asking the questions "who is out there?", "what are they doing?" and "how do they compare?". This talk will briefly describe how these computational questions can be addressed using MEGAN, the MEtaGenome ANalyzer program. I will first show how to analyze the taxonomic and functional content of a single dataset, and then, more specifically, how such analyses can be performed in a comparative fashion. I will demonstrate how to compare different datasets using ecological indices and other distance measures. The discussion will be conducted using a number of published marine datasets comprising metagenomic, metatranscriptomic and 16S rRNA data.
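
As a rough illustration of the kind of comparison mentioned in this abstract, the sketch below computes one common ecological index (the Shannon diversity index) for each sample and one common distance measure (Bray-Curtis dissimilarity on relative abundances) between two samples. It is not MEGAN's implementation; the function names and the read counts per taxon are hypothetical and chosen only for illustration.

```python
import math


def relative_abundances(counts):
    """Convert raw read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with p > 0."""
    props = relative_abundances(counts)
    return -sum(p * math.log(p) for p in props.values() if p > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity computed on relative abundances.

    Because both profiles sum to 1, this reduces to
    1 - sum over taxa of min(p_a, p_b): 0 for identical profiles,
    1 for profiles with no taxa in common.
    """
    pa = relative_abundances(counts_a)
    pb = relative_abundances(counts_b)
    taxa = set(pa) | set(pb)
    shared = sum(min(pa.get(t, 0.0), pb.get(t, 0.0)) for t in taxa)
    return 1.0 - shared


if __name__ == "__main__":
    # Hypothetical read counts assigned to taxa in two samples.
    sample_a = {"Proteobacteria": 120, "Cyanobacteria": 60, "Bacteroidetes": 20}
    sample_b = {"Proteobacteria": 40, "Cyanobacteria": 10, "Firmicutes": 50}

    print("Shannon index, sample A:", round(shannon_index(sample_a), 3))
    print("Shannon index, sample B:", round(shannon_index(sample_b), 3))
    print("Bray-Curtis dissimilarity:", round(bray_curtis(sample_a, sample_b), 3))
```

Normalising to relative abundances before comparing is one simple way to handle samples with different sequencing depths; other normalisations and distance measures are equally plausible choices.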

12:45 - 14:00 Lunch and posters
14:00 - 14:30 Statistical and computational applications of short read DNA sequencing for viral sequence discovery

Vincent Plagnol (University College London)

It is becoming increasingly clear that human pathogens play a role in multiple disorders which do not have an obvious infectious basis. This is in particular the case for autoimmune disorders such as type 1 diabetes, as the interplay between the host immune system and infectious agents shapes our immunity and has a long-term effect on human biology. All active pathogens, and in particular viruses, leave an RNA signature in affected tissues. The advent of high-throughput DNA sequencing techniques, in particular transcriptome sequencing or RNA-Seq, is an opportunity to interrogate human tissues for the presence of these infectious agents. However, the use of short sequencing reads limits the sensitivity and specificity of viral identification. To overcome these issues, it is necessary to efficiently combine read assembly with homology/BLAST-based methods in order to increase read length and provide a more powerful tool for viral sequence detection. In this talk, I will present ongoing methodological work to address this question. Using a combination of simulations and actual RNA-Seq data, I will highlight the reasons why metagenomic analysis of short read sequence data is challenging and show how these limitations can be overcome.

14:30 - 15:00 Statistical analysis of microevolutionary variation in metagenomic data

Daniel Falush (Max Planck Institute Leipzig)

This talk will outline the challenges of studying the evolution of bacteria based on metagenomic data. Differences in composition between metagenomic samples provide information on evolutionary change as well as organismal composition, but in order to access this information new statistical algorithms are required that take into account the uncertainty about which organism each sequence read comes from. I will outline first-generation algorithms that address this problem.


15:00 - 15:30 Tea
15:30 - 16:00 Quantifying diversity and abundance in soil microbial communities from high-throughput sequence data

E. Loza, M. Defoin-Platel, K. Dawson, S. Welham and W. Gilks (Rothamsted Research)

Soil microbial communities are essential in many different ways. Understanding the complex structure of soil microbial communities is crucial to better manage agricultural soils and minimise the negative impact of agricultural practices. 
We address the problem of identifying bacterial groups present in soil and estimating their relative abundance using high-throughput data sampled from agricultural fields. As a first step, we demonstrate the ability of our method to correctly discriminate between different bacterial groups and to quantify their relative abundance using simulated data. We then go on to analyse Roche/454 data sampled from one of Rothamsted’s long-term experimental fields. We present preliminary results on this analysis.
16:00 - 16:30 Discussion

