# Model selection for genetic and epidemiological data

### Thursday 29th March 2012

International Biometric Society - British and Irish Region (IBS-BIR) Spring meeting and LSHTM Centre for Statistical Methodology Meeting

**Title**: Model selection for genetic and epidemiological data

**Date**: 29 March 2012, 1:30PM-5PM

**Location**: Manson Lecture theatre, LSHTM

**Cost and Registration:** £20 for International Biometric Society-British and Irish Region members, £40 for non-members and free for student members (payment by PayPal, or by cheque on site). Note that it is free for students to join the Biometric Society and it costs £40 to join the Biometric Society as a full member.

## Programme

**13:30 - 14:15** Stijn Vansteelandt (Ghent University and LSHTM): *Challenges for model selection in etiologic studies*

Over the past three decades, enormous progress has been made in understanding and relaxing the conditions under which causal inferences can be drawn from observational studies. Most available procedures assume that a set of covariates is available which is sufficient, in some sense, to adjust for confounding of the association between exposure and outcome. Because this set may be high-dimensional, some reduction is often necessary in samples of typical size. Interestingly, this important and widespread problem has been largely ignored in the causal inference literature. In this talk, I will reflect on the challenges for model/covariate selection in etiologic studies. I will argue that routinely applied variable selection procedures, while potentially relevant for constructing outcome prediction models, are sub-optimal for selecting covariates in causal analyses, and I will therefore propose a procedure that directly targets the quality of the exposure effect estimator. I will discuss the roles of causal inference procedures based on outcome regression models versus propensity score models. It will be shown that certain strategies for inferring causal effects have the desirable features (a) of producing (approximately) valid confidence intervals, even when the covariate-selection process is ignored, and (b) of being robust against certain forms of misspecification of the association of covariates with …

**14:15 - 15:00** Christian Robert (Université Paris Dauphine): *ABC model choice and relevant summary statistics*

Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models. Having implemented ABC-based model choice for a wide range of phylogenetic models in the DIYABC software (Cornuet et al., 2008), we first present the theoretical background showing why generic use of ABC for model choice is ungrounded: it depends on an unknown amount of information loss induced by the use of a summary statistic (Robert et al., 2011). We then present necessary and sufficient conditions on the summary statistics for an ABC-based model choice procedure to be consistent. This solution avoids the need for additional empirical checks of the performance of the ABC procedure, such as those available in DIYABC and advocated by Ratmann et al. (2011). Note: this is joint work with J.M. Cornuet, J.M. Marin, N. Pillai and J. Rousseau.
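The basic rejection-ABC approach to model choice discussed in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the DIYABC implementation or the speakers' procedure. The two competing models (Poisson versus geometric counts), their priors, the tolerance rule and the helper names `simulate`, `abc_model_choice`, `summary_mean` and `summary_mean_var` are all hypothetical choices made for the example. The estimated posterior probability of each model is the fraction of accepted simulations generated under it. That estimate therefore depends directly on the summary statistic. The sample mean alone can carry little information for telling these two models apart. Adding the sample variance helps, because the variance is close to the mean under a Poisson model but larger under a geometric one.

```python
import numpy as np


def simulate(model, n, rng):
    """Draw a parameter from its (hypothetical) prior and simulate n counts.

    model 0: Poisson(lam) with lam ~ Exp(1)
    model 1: geometric on {0, 1, 2, ...} with success probability p ~ U(0, 1]
    """
    if model == 0:
        lam = rng.exponential(1.0)
        return rng.poisson(lam, size=n)
    p = 1.0 - rng.random()  # in (0, 1], since numpy's geometric requires p > 0
    return rng.geometric(p, size=n) - 1  # numpy counts trials starting at 1; shift to start at 0


def summary_mean(x):
    return np.array([x.mean()])


def summary_mean_var(x):
    return np.array([x.mean(), x.var()])


def abc_model_choice(x_obs, summary, n_sims, quantile, rng):
    """Rejection ABC: estimate P(model 1 | summary of x_obs) under a uniform model prior."""
    s_obs = summary(x_obs)
    models = rng.integers(0, 2, size=n_sims)  # model index drawn from its uniform prior
    dists = np.empty(n_sims)
    for i, m in enumerate(models):
        s_sim = summary(simulate(m, len(x_obs), rng))
        dists[i] = np.linalg.norm(s_sim - s_obs)  # unscaled Euclidean distance
    eps = np.quantile(dists, quantile)  # tolerance chosen so roughly `quantile` of draws are kept
    accepted = models[dists <= eps]
    return float(np.mean(accepted == 1))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_obs = rng.poisson(2.0, size=100)  # pseudo-observed data, generated under model 0
    for name, summ in [("mean only", summary_mean), ("mean and variance", summary_mean_var)]:
        p_geom = abc_model_choice(x_obs, summ, n_sims=20_000, quantile=0.01, rng=rng)
        print(f"{name}: estimated P(geometric | data) = {p_geom:.2f}")
```

A common refinement is to scale each summary statistic, for example by its prior-predictive standard deviation, before computing distances. It is omitted here to keep the sketch short.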

**15:00 - 15:30** Tea/coffee break

**15:30 - 16:15** Doug Speed (University College London): *Improved Heritability Estimation using Linear Mixed Models*

There is continued discussion regarding the so-called "missing heritability" problem. By applying a linear mixed model to whole-genome SNP data, a series of papers led by Yang et al. have presented strong evidence that many complex traits are highly polygenic. While common variants can explain most of the heritability, each contributes so little on average that detecting them with GWAS of standard size is almost impossible. We have investigated the use of the linear mixed model for heritability estimation and found that it is highly sensitive to the correlations induced by linkage disequilibrium (LD). In particular, it struggles to capture the variance explained by rarer variants, even when they are typed, because their signals are on average more poorly represented by the SNP array. We have devised a solution to this problem that allows unbiased estimation of heritability in spite of LD. Using our revised method, we have been able to show that almost all of the heritability of epilepsy can be explained by common SNPs, suggesting that effective prediction models should be possible.

Reference: J. Yang, P. Visscher et al. (2011). Genome partitioning of genetic variation for complex traits using common SNPs. *Nature Genetics*.

**16:15 - 17:00** David Clayton (University of Cambridge): *Link functions in multi-locus genetic models*

"Complex" diseases are, by definition, influenced by multiple causes, both genetic and environmental. For more than 40 years, statistical work on the joint action of multiple risk factors has been dominated by the generalized linear model (GLM). In genetics, models for dichotomous traits have traditionally been approached via the model of an underlying, normally distributed liability. This corresponds to the GLM with binomial errors and a probit link function. Elsewhere in epidemiology, however, the logistic regression model, a GLM with a logit link function, has been the tool of choice, largely because of its convenient properties in case-control studies. The choice of link function has usually been dictated by mathematical convenience, but it has important implications for (a) the choice of association test statistic in the presence of existing strong risk factors, (b) the ability to predict disease from genotype given its heritability, and (c) the definition and interpretation of epistasis (or epistacy). I will review these issues and propose a new association test.
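Two points from this abstract can be illustrated with a small sketch. The first is the correspondence between a normally distributed liability and the probit link. The second is point (c): whether two loci "interact" depends on the link. The sketch assumes a hypothetical two-locus model in which the two loci act additively on the liability, with no interaction term on the probit scale. The baseline and per-locus effects `BASE`, `BETA_A` and `BETA_B` are made-up values, and all function names are illustrative. A Monte Carlo draw of liabilities checks that the probability of exceeding the threshold equals the standard normal CDF of the linear predictor. The same four genotype risks are then re-expressed on the logit scale, where a non-zero interaction contrast appears. This illustrates only the general point, not the new association test proposed in the talk.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1): its CDF is the inverse of the probit link

# Hypothetical effects on the liability (probit) scale; no A x B interaction term.
BASE, BETA_A, BETA_B = -2.0, 0.5, 0.5


def linear_predictor(a, b):
    """Linear predictor for carrier indicators a, b in {0, 1}."""
    return BASE + BETA_A * a + BETA_B * b


def risk(a, b):
    """P(affected) under the probit GLM, i.e. the liability threshold model."""
    return STD_NORMAL.cdf(linear_predictor(a, b))


def simulated_risk(a, b, n=200_000, seed=1):
    """Monte Carlo version: liability = eta + N(0, 1) noise; affected if liability > 0."""
    rng = random.Random(seed)
    eta = linear_predictor(a, b)
    return sum(eta + rng.gauss(0.0, 1.0) > 0.0 for _ in range(n)) / n


def probit(p):
    return STD_NORMAL.inv_cdf(p)


def logit(p):
    return math.log(p / (1.0 - p))


def interaction(link):
    """Two-locus interaction contrast on the scale defined by `link` (0 = no interaction)."""
    return link(risk(1, 1)) - link(risk(1, 0)) - link(risk(0, 1)) + link(risk(0, 0))


if __name__ == "__main__":
    print(f"risk(1, 1): exact {risk(1, 1):.4f}, simulated {simulated_risk(1, 1):.4f}")
    print(f"interaction on probit scale: {interaction(probit):+.4f}")
    print(f"interaction on logit scale:  {interaction(logit):+.4f}")
```

Because the logit and probit scales are related by a non-linear transformation, a zero interaction contrast on one scale generally becomes non-zero on the other. The presence or absence of "epistasis" in such a model is therefore tied to the chosen link.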
