Statistical Analysis of Metagenomic Data *SLIDES NOW AVAILABLE*
Wednesday 5th October 2011
- Agriculture: investigating soil-borne crop diseases, assessing the impact of agricultural treatments and practices, developing methods for the bioremediation of contaminated soils and the monitoring of animal health.
- Medical and epidemiological sciences: developing disease diagnosis and treatment strategies based on the composition of bacterial and viral communities in gut, dental caries, tumours and skin.
- Environmental sciences: profiling of microbial communities in marine sediments or ancient samples of soil to understand the impacts of pollution and climate change.
- Bioenergy: advancing technologies that process crops or waste with microbial systems to generate renewable forms of energy.
- Palaeontology: differentiating between DNA that originates from bacterial, fungal, and human contaminants in fossil samples. This is crucial for studies that aim to elucidate the evolutionary processes of extinct species.
Programme | Document downloads for IBS members | ||
10:00 - 11:30 | Introductory tutorial: an overview of scientific questions, bioinformatic and statistical challenges Ian Clark, Michael Defoin-Platel, Elisa Loza, Wally Gilks (Rothamsted Research) This session aims to give a brief introduction to the area of metagenomic analysis. We will give an overview of how changes in sequencing technology have enabled new questions to be asked about microbial communities in many different scientific areas. Work in soil science at Rothamsted will be used to give a more detailed example of approaches and questions of interest. The efficient processing and management of sequence data is an essential step prior to statistical analysis. An overview of these tasks will be given, with some comments on the bioinformatic tools currently available. Finally, we will describe briefly how statistics has been used to answer some of the scientific questions, including issues of experimental design, profiling of communities and comparison of communities across different conditions. We will also consider areas for future statistical research. | ||
11:30 - 12:00 | Coffee | ||
12:00 - 12:45 | Comparative meta-genome analysis Suparna Mitra (Tuebingen University) Metagenomics is a rapidly developing science, promising expansion towards discoveries that can help in the comprehension, cure and prevention of many diseases, in monitoring the impact of pollutants on ecosystems and in mining the rich genetic resource of non-culturable microbes that may lead to the discovery of new genes, enzymes, and natural products. The recent development of new, less expensive, ultra-high throughput sequencing technologies that can produce huge numbers of DNA reads at an affordable cost has boosted the number and scope of metagenomic sequencing projects. This has resulted in a dramatic increase in the volume of sequence data that must be analyzed. The analysis of metagenomic datasets is an immense conceptual and computational challenge. The analysis often starts by asking the questions "who is out there?", "what are they doing?" and "how do they compare?". This talk will briefly describe how these computational questions can be addressed using MEGAN, the MEtaGenome ANalyzer program. First, I will show how to analyze the taxonomic and functional content of a single dataset, and then, more specifically, how such analyses can be performed in a comparative fashion. I will demonstrate how to compare different datasets using ecological indices and other distance measures. The discussion will be conducted using a number of published marine datasets comprising metagenomic, metatranscriptomic and 16S rRNA data. | ||
12:45 - 14:00 | Lunch and posters | ||
14:00 - 14:30 | Statistical and computational applications of short read DNA sequencing for viral sequence discovery Vincent Plagnol (University College London) It is becoming increasingly clear that human pathogens play a role in multiple disorders which do not have an obvious infectious basis. This is in particular the case for autoimmune disorders such as type 1 diabetes, as the interplay between the host immune system and infectious agents shapes our immunity and has a long-term effect on human biology. All active pathogens, and in particular viruses, leave an RNA signature in affected tissues. The advent of high throughput DNA sequencing techniques, in particular transcriptome sequencing or RNA-Seq, is an opportunity to interrogate human tissues for the presence of these infectious agents. However, the use of short sequencing reads limits the sensitivity and specificity of viral identification. To overcome these issues, it is necessary to efficiently combine read assembly with homology/BLAST-based methods in order to increase read length and provide a more powerful tool for viral sequence detection. In this talk, I will present ongoing methodological work to address this question. Using a combination of simulations and actual RNA-Seq data, I will highlight the reasons why metagenomic analysis of short read sequence data is challenging and show how these limitations can be overcome. | ||
14:30 - 15:00 | Statistical analysis of microevolutionary variation in metagenomic data Daniel Falush (Max Planck Institute Leipzig) This talk will outline the challenges of studying the evolution of bacteria based on metagenomic data. Differences in composition between metagenomic samples provide information on evolutionary change as well as organismal composition, but in order to access this information new statistical algorithms are required that take into account the uncertainty about which organism each sequence read comes from. I will outline first generation algorithms that address this problem. | ||
15:00 - 15:30 | Tea | ||
15:30 - 16:00 | Quantifying diversity and abundance in soil microbial communities from high-throughput sequence data E. Loza, M. Defoin-Platel, K. Dawson, S. Welham and W. Gilks (Rothamsted Research) Soil microbial communities are essential in many different ways. Understanding the complex structure of soil microbial communities is crucial to better manage agricultural soils and minimise the negative impact of agricultural practices.
We address the problem of identifying bacterial groups present in soil and estimating their relative abundance using high-throughput data sampled from agricultural fields. As a first step, we use simulated data to demonstrate the ability of our method to correctly discriminate between different bacterial groups and to quantify their relative abundance. We then go on to analyse Roche/454 data sampled from one of Rothamsted’s long-term experimental fields. We present preliminary results from this analysis. | ||
16:00 - 16:30 | Discussion |
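The comparative talk in the programme above mentions comparing datasets using ecological indices and other distance measures. As a rough illustration only, the sketch below computes two widely used examples, the Shannon diversity index and the Bray-Curtis dissimilarity, from per-taxon read counts. It is not MEGAN's implementation and is not taken from any of the talks; the function names, the taxon labels and the read counts are all hypothetical, chosen purely to make the example self-contained.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile contains no reads")
    return -sum(
        (c / total) * math.log(c / total) for c in counts.values() if c > 0
    )


def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity computed on relative abundances.

    Counts are first normalised to proportions so that samples of different
    sequencing depth can be compared. For proportions the dissimilarity
    reduces to 1 - sum_i min(p_i, q_i): 0 for identical profiles,
    1 for profiles with no taxa in common.
    """
    total_a = sum(profile_a.values())
    total_b = sum(profile_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("profile contains no reads")
    taxa = set(profile_a) | set(profile_b)
    shared = sum(
        min(profile_a.get(t, 0) / total_a, profile_b.get(t, 0) / total_b)
        for t in taxa
    )
    return 1.0 - shared


if __name__ == "__main__":
    # Hypothetical read counts per taxon for two samples (illustration only).
    sample_a = {"Proteobacteria": 420, "Bacteroidetes": 310, "Cyanobacteria": 270}
    sample_b = {"Proteobacteria": 150, "Bacteroidetes": 50, "Firmicutes": 300}

    print("Shannon index, sample A:", round(shannon_index(sample_a), 3))
    print("Shannon index, sample B:", round(shannon_index(sample_b), 3))
    print("Bray-Curtis dissimilarity:", round(bray_curtis(sample_a, sample_b), 3))
```

Normalising to relative abundances before comparing is one design choice among several; depending on the question, rarefying the samples to a common depth or working with raw counts may be more appropriate.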