Past and Upcoming Events
Spring 2013
Time: Wednesday, May 22 at 11 am
Speaker: John Lafferty, Louis Block Professor; Department of Statistics and Department of Computer Science; University of Chicago
Title: Variants on Nonparametric Additive Models for Regression and Graphical Modeling
Abstract: We present recent research on nonparametric additive models for different families of statistical estimation problems, including multivariate regression and graphical modeling. The focus is on scaling to high dimensions while controlling the loss in statistical and computational efficiency compared with linear models. Convex geometry and optimization play a central role. Time permitting, we will also discuss some recent efforts to develop new courses and computational infrastructure for statistics and machine learning at UChicago.
Time: Wednesday, May 1 at 11 am
Speaker: Jim Berger, The Arts and Sciences Professor of Statistics; Department of Statistical Science; Duke University
Title: Reproducibility of Science: P-values and Multiplicity
Abstract: Published scientific findings seem to be increasingly failing efforts at replication. This is undoubtedly due to many sources, including specifics of individual scientific cultures and overall scientific biases such as publication bias. While these will be briefly discussed, the talk will focus on the all-too-common misuse of p-values and failure to properly account for multiplicities as two likely major contributors to the lack of reproducibility. The Bayesian approaches to both testing and multiplicity will be highlighted as possible general solutions to the problem.
Time: Wednesday, April 24 at 11 am
Speaker: Osnat Stramer, Associate Professor; Department of Statistics and Actuarial Science; University of Iowa
Title: Bayesian Inference for Diffusion Processes
Abstract: The problem of formal likelihood-based (either classical or Bayesian) inference for discretely observed multi-dimensional diffusions is particularly challenging. In principle this involves data augmentation of the observed data to give representations of the entire diffusion trajectory. In this talk, we provide a generic and transparent framework for data augmentation for diffusions. We introduce a generic program which can be followed in order to identify appropriate auxiliary variables, to construct Markov chain Monte Carlo algorithms that are valid even in the limit where continuous paths are imputed, and to approximate these limiting algorithms. We also present the pseudo-marginal (PM) approach for Bayesian inference in diffusion models. The PM approach can be viewed as a generalization of the popular data augmentation schemes that sample jointly from the missing paths and the parameters of the diffusion volatility. The efficacy of the proposed algorithms is demonstrated in a simulation study of the Heston models, and the algorithms are applied to the bivariate S&P 500 and VIX implied volatility data.
Time: Wednesday, April 10 at 11 am
Speaker: Mary Lindstrom, Professor; Department of Biostatistics and Medical Informatics; University of Wisconsin-Madison
Title: Modeling the shape and variability of curves
Abstract: I will present an overview of self-modeling for functional data. Functional data occur when the ideal observation for each experimental unit (individual) is a curve and each individual's data consist of a number of noisy observations at points along their curve. The self-modeling approach to the analysis of functional data is based on the assumption that all individuals' unknown curves have a common, possibly complex, shape and that a particular individual's curve is a low-dimensional, parametric transformation of the common shape curve. This simple idea works surprisingly well in practice. I will discuss the history of self-modeling and the natural extension of modeling the parameters in the individual transformations as random effects. I will also describe generalizations of the model to groups of curves that have similar but not identical shape, outlines (closed curves), curves with a nested structure and 2-dimensional, time-parameterized curves such as those arising from studies of motion.
Winter 2013
Time: Wednesday, March 13 at 11 am
Speaker: Robert D. Gibbons, Professor of Biostatistics in the Departments of Medicine and Health Studies, Director of the Center for Health Statistics, University of Chicago
Title: Statistical Issues in Drug Safety: The curious case of Antidepressants, Anticonvulsants, ...., and Suicide
Abstract: In 2003, the U.S. FDA, MHRA in the U.K., and EU released public health advisories for a possible causal link between antidepressant treatment and suicide in children and adolescents ages 18 and under. This led the U.S. FDA to issue a black box warning for antidepressant treatment of childhood depression in 2004, which was later extended to include young adults (18-24) in 2006. Following these warnings, rather than observing the anticipated decrease in youth suicide rates, increases in youth suicide rates were observed in both the U.S. and Europe. In this presentation, I review this literature and discuss new statistical and experimental design approaches to post-marketing drug safety surveillance.
Time: Wednesday, Jan. 16, 11am
Speaker: Ping Li, Assistant Professor of Statistics, Cornell University
Title: BigData: Probabilistic Methods for Efficient Search and Statistical Learning in Extremely High-Dimensional Data
Abstract: This talk will present a series of works on probabilistic hashing methods which typically transform a challenging (or infeasible) massive data computational problem into a probability and statistical estimation problem. For example, fitting a logistic regression (or SVM) model on a dataset with a billion observations and a billion (or a billion squared) variables would be difficult. Searching for similar documents (or images) in a repository of a billion web pages (or images) is another challenging example. In certain important applications in the search industry, a web page is often represented as a binary (0/1) vector in a billion squared (2 to the power 64) dimensions. For those data, both data reduction (i.e., reducing the number of nonzero entries) and dimensionality reduction are crucial for achieving efficient search and statistical learning.
This talk will present two closely related probabilistic methods: (1) b-bit minwise hashing and (2) one permutation hashing, which simultaneously perform effective data reduction and dimensionality reduction on massive, high-dimensional, binary data. For example, training an SVM for classification on a text dataset of size 24GB took only 3 seconds after reducing the dataset to merely 70MB using our probabilistic methods. Experiments on close to 1TB of data will also be presented. Several challenging probability problems still remain open.
Key references: [1] P. Li, A. Owen, C-H Zhang, One Permutation Hashing, NIPS 2012; [2] P. Li, C. Konig, Theory and Applications of b-Bit Minwise Hashing, Research Highlights in Communications of the ACM 2011.
Wednesday Jan. 9th, 11am
Speaker: Ryan Martin, Assistant Professor of Statistics, UIC
Title: A Bayesian test of normality versus a Dirichlet process mixture alternative.
Abstract: In this talk I will describe a new Bayesian test of normality of univariate or multivariate data against alternative nonparametric models characterized by Dirichlet process mixture distributions. The alternative models are based on the principles of embedding and predictive matching. They can be interpreted to offer random granulation of a normal distribution into a mixture of normals with mixture components occupying a smaller volume the farther they are from the distribution center. A scalar parametrization based on latent clustering is used to cover an entire spectrum of separation between the normal distributions and the alternative models. An efficient sequential importance sampler is developed to calculate Bayes factors.
Simulations indicate the proposed test can effectively detect non-normality without favoring the nonparametric alternative when normality holds. (Joint work with Surya Tokdar at Duke.)
Fall 2012
Wednesday Nov. 14th, 11am
Speaker: Yuan Ji, Director of Cancer Informatics, Center for Clinical and Research Informatics, NorthShore University HealthSystem
Title: Bayesian Models for Next Generation Sequencing Data on Epigenetics
Abstract: In this talk, I will describe how Bayesian models are successfully applied to the field of epigenetics, which is concerned with the regulatory mechanisms of gene expression. Epigenetics, one of the most heavily researched and challenging fields in biology, increasingly draws attention from statisticians due to breakthroughs in bioengineering and biotechnology that allow large-scale and high-throughput experiments to be routinely conducted at affordable cost. A central topic of epigenetics is to understand the chromatin state -- modifications to histones and other proteins that package the DNA. A complex mechanism called the "histone code" is believed to dictate the dynamics of DNA expression. As a step towards deciphering the histone code, we develop Bayesian models based on genome-wide mapping of histone modifications. Such models are only initial attempts to decipher the complex histone code but highlight the need for Bayesian inference in research on gene regulation, which has received a relatively small amount of attention from statisticians. I will summarize our recent work and results using a comprehensive ChIP-Seq data set.
Wednesday Nov. 7th, 11am
Speaker: Hongyuan Cao, Assistant Professor of Biostatistics, University of Chicago
Title: Analysis of sparse asynchronous longitudinal data
Abstract: We consider estimation of regression models for sparse asynchronous longitudinal observations, where time-dependent response and covariates are observed intermittently within subjects. Unlike with synchronous data, where response and covariates are observed at the same time point, with asynchronous data, the observation times are mismatched. Simple kernel weighted estimating equations are proposed for generalized linear models with either time-invariant or time-dependent coefficients. The time-dependent covariates are assumed to be smooth in time but sparsely observed, while the time-varying response may be continuous, categorical, or count data. For models with either time-invariant or time-dependent coefficients, the estimators are consistent and asymptotically normal. However, they converge at rates which are slower than the rates which may be achieved with synchronous longitudinal data with response and covariates measured at the same time points. Simulation studies evidence that the methods perform well with realistic sample sizes and may be superior to methods for synchronous data based on an ad hoc last value carry forward approach. The practical utility of the methods is illustrated on data from an HIV study.
Wednesday October 17, 11am
Place: Classroom, Department of Statistics, 2006 Sheridan Road
Speaker: Yuguo Chen, Associate Professor of Statistics, UIUC
Title: Sampling for Conditional Inference on Network Data
Abstract: Random graphs with given vertex degrees have been widely used as a model for many real-world complex networks. We describe a sequential sampling method for sampling networks with a given degree sequence. These samples can be used to approximate closely the null distributions of a number of test statistics involved in such networks, and provide an accurate estimate of the total number of networks with given vertex degrees. We apply our method to a range of examples to demonstrate its efficiency in real problems.
Wednesday October 10, 11am
Speaker: Michael Zhu, Associate Professor of Statistics, Purdue University (http://www.stat.purdue.edu/~yuzhu/)
Title: Statistical Model-Based Methods for Transcript Expression Level Quantification and Their Comparison
Abstract: Further advancement and application of RNA-Seq technology call for the development of effective normalization methods for RNA-Seq data. In this talk, we propose to use finite Poisson mixture models to characterize the generating mechanism of RNA-Seq read counts and develop a procedure called MP-Seq to quantify transcript expression level.
Furthermore, we propose to use a system of measurement error models based on qRT-PCR, Microarray and RNA-Seq gene expression data to compare and validate RNA-Seq normalization methods. As an application, we apply the system to show that MP-Seq outperforms other existing quantification methods in the literature.
Winter 2012
Wednesday March 7, 11am
Speaker: Bruce Lindsay, Department of Statistics, Pennsylvania State University
Wednesday February 15, 11am
Speaker: Zhengyuan Zhu, Department of Statistics, Iowa State University
Title: Spatial Sampling Design and Wireless Sensor Networks
Abstract: Spatial sampling design problems have been studied by statisticians for many different application areas such as agriculture, soil science, ecology, and environmental science. Though many of the methodologies in spatial sampling design can be used to help design the sampling plan of wireless sensor networks (WSN), WSN have some characteristics, such as energy and communication constraints, which are not present in a traditional sampling network and pose new challenges to statisticians. In this talk we will give an overview of spatial sampling design and discuss its relationship to the sampling design for WSN. An example of maximum-information predictive designs for model-based geostatistics and some preliminary results on the optimal sampling design of a WSN for parameter estimation under energy and communication constraints will be presented.
Wednesday February 8, 11am
Speaker: Wenxuan Zhong, Department of Statistics, University of Illinois, Urbana-Champaign
Title: Variable selection using dimension reduction model
Abstract: In this talk, a stepwise procedure will be discussed for variable selection under the sufficient dimension reduction framework, in which the response variable is influenced by a subset of predictors through an unknown function of a few linear combinations of them. Unlike linear stepwise regression, our proposed method does not impose a special form of relationship (such as linear) between the response variable and the predictor variables. Our method, referred to below as the COP procedure, selects variables that attain the maximum correlation between the transformed response and the linear combination of the variables. Various asymptotic properties of the COP procedure are established, and in particular, its variable selection performance under a diverging number of predictors and sample size has been investigated. The empirical performance of the COP procedure will be demonstrated in functional genomic analysis.
Wednesday January 11, 11am
Speaker: Yinxiao Huang, Department of Statistics, University of Illinois, Urbana-Champaign
Title: Nonparametric Online Inference for Time Series
Abstract: Online learning is concerned with the task of making real-time updates as new observations become available. This is especially relevant in time series contexts such as speech recognition and image processing, where large volumes of data arrive sequentially. However, classical nonparametric methods do not accommodate real-time updates.
In the literature, online learning with kernels has been extensively studied in the fields of both statistics and computer science. To my knowledge, most of this work assumes either i.i.d. conditions, which are not realistic for time series, or mixing conditions, which are hard to verify. In this talk, we consider online kernel estimation for time series data. The asymptotic behavior of our online kernel estimators, both for density and regression function, is explored for a general class of stationary time series under the dependence framework developed by Wu (2005). We establish asymptotic normality and almost sure convergence for the online kernel estimators and, in particular, a law of the iterated logarithm (LIL) for the online kernel density estimator, while one generally does not have such a sharp convergence rate for traditional estimators. Our approach can be extended further to nonstationary processes.
Fall 2011
Wednesday November 30, 11am
Speaker: Blake McShane, Kellogg School of Management, Northwestern University
Title: Modeling Time Series Dependence for Scoring Sleep in Mice
Abstract: Current methods for scoring sleep behavior in mice are expensive, invasive, and labor intensive, thus leading to considerable interest in high-throughput automated systems which would allow many mice to be scored cheaply and quickly. Previous efforts have been able to differentiate sleep from wakefulness, but cannot differentiate the rare and important state of REM sleep from non-REM sleep. Key difficulties in detecting REM are that (i) REM is much rarer than non-REM and wakefulness, (ii) REM looks similar to non-REM in terms of the observed covariates, (iii) the data is noisy, and (iv) the data contains strong time dependence structures crucial for differentiating REM from non-REM. We develop a novel approach which combines statistical learning methods with generalized Markov models, thereby enhancing the former to account for the time dependence in our data. Our proposed methodology can accommodate very general and very long-term time dependence structures in an easily estimable and computationally tractable fashion. Furthermore, it shows improved differentiation of REM from non-REM sleep in our application to sleep scoring in mice.
Wednesday November 23, 11am
Speaker: Dacheng Xiu, Booth School of Business, University of Chicago
Title: A Tale of Two Option Markets: State-Price Densities Implied from S&P 500 and VIX Option Prices
Abstract: The S&P 500 options and VIX options reveal the dynamics of the index return and its volatility. To study their dynamics, we perform a nonparametric analysis of state-price densities implicit in both option prices. We find that the state-price density of the index strongly depends on the current VIX level not only in the short run, but also in the long term, when VIX options become unavailable. The short-run dependence is compatible with and can be explained by the state-price density of the VIX. Furthermore, we conduct nonparametric specification tests of the state-of-the-art parametric models, and offer some insights on modeling the dynamics.
Wednesday November 2, 11am
Speaker: Elie Tamer, Department of Economics, Northwestern University
Title: Sensitivity Analysis in some Econometric Models
Wednesday October 12, 11am
Speaker: Jules van Binsbergen, Kellogg School of Management, Northwestern University
Title: On the Timing and Pricing of Dividends
Abstract: We recover prices of dividend strips on the aggregate stock market using data from derivatives markets. The price of a k-year dividend strip is the present value of the dividend paid in k years. The value of the stock market is the sum of all dividend strip prices across maturities. We study the properties of strips and find that expected returns, Sharpe ratios, and volatilities on short-term strips are higher than on the aggregate stock market, while their CAPM betas are well below one. Short-term strip prices are more volatile than their realizations, leading to excess volatility and return predictability.
Spring 2011
Wednesday May 25, 11am
Speaker: Hui Xie, School of Public Health, University of Illinois at Chicago
Title: An Integrated Adaptive Approach to Data Fusion
Abstract: Data fusion combines data items from various sources based on a common set of variables. The fused database overcomes the limitations of a single-source dataset and offers the opportunity to answer important managerial questions that cannot be addressed with a single-source dataset. However, performing proper data fusion in a general, adaptive, flexible and robust fashion is challenging. In this article, we propose an integrated adaptive data fusion framework that integrates both the limited-information approach and the full-information approach to data fusion. Other useful features of the proposed methods include: they can handle a mixture of continuous, semi-continuous and discrete variables in a robust manner in that no parametric distributional assumptions are required for variables in the datasets; the joint predictive distribution of the variables of interest has probability mass concentrating on a finite number of points, thereby substantially simplifying a direct approach to data fusion in general situations. Under the integrated framework for data fusion, researchers have full access to an array of data fusion approaches for a wide range of types of marketing data and applications, and have the flexibility to choose the most suitable one for the data at hand. We conduct simulation studies to evaluate the performance of the proposed methods. We then apply the methods to an application that studies counterfeit exposure and purchasing behaviors, combining survey data and consumer databases. These analyses demonstrate several advantages of the proposed method, as compared with alternative approaches. This is joint work with Dr. Yi Qian at Northwestern University.
Wednesday May 18, 11am
Speaker: Feng Liang, Department of Statistics, University of Illinois at Urbana-Champaign
Title: A Bayesian Approach to Structured Sparsity with Application to Market Segmentation
Abstract: Benefit segmentation, that is, grouping consumers into different segments based on their product preference, is an essential problem of marketing theory and practice. Modern marketing environments impose some new challenges on traditional segmentation methods. For example, companies are adding more and more features into a single product, while the data we can collect from each consumer is of relatively small size. Although most methods in benefit segmentation assume consumers use every product feature in their decision making, recent research has shown that consumers only consider a subset of features. Further, the heterogeneity among consumers in selecting important product features should be used as an additional index for market segmentation and for new product development. In response to these challenges, we propose a Bayesian approach for collaborative inference among consumers. The proposed method is a Bayesian approach for multi-task learning problems with structured sparsity, where the structures we consider are stochastic groups and graphs. Connections with existing work on structured sparsity are discussed. We demonstrate the utility of our method on several simulated data sets and a real case study on online shopping websites. The talk is based on joint work with Jianfeng Xu (UIUC) and Sunghoon Kim (PSU).
Wednesday May 11, 11am
Speaker: Yuan Liao, Department of Operations Research and Financial Engineering, Princeton University
Title: High Dimensional Covariance Matrix Estimation in Approximate Factor Model
Abstract: The variance-covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Classical methods of estimating the covariance matrices are based on strict factor models assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming a sparse error covariance matrix, we allow for the presence of cross-sectional correlation even after taking out common factors. The sparse covariance is estimated by the adaptive thresholding technique as in Cai and Liu (2011). The covariance matrix of the outcome is then estimated based on the factor structure. We first consider the case of observable factors, then extend the results to the case of unobservable factors. It is shown that in both cases, the estimated covariance matrix is still nonsingular regardless of its dimensionality, and is consistent under various norms. Finally, extensions to seemingly unrelated regression are considered.
Wednesday May 4, 11am
Speaker: Michael Stein, Department of Statistics, University of Chicago
Title: When does the screening effect hold?
Abstract: When using optimal linear prediction to interpolate point observations of a mean square continuous stationary spatial process, one might generally expect that the interpolant mostly depends on those observations located nearest to the predictand. This phenomenon is in fact commonly observed in practice and is called the screening effect. However, there are situations in which a screening effect does not hold in a reasonable asymptotic sense and theoretical support for the screening effect is limited to some rather specialized settings for the observation locations. This talk explores conditions on the observation locations and the process model under which an asymptotic screening effect holds. A series of examples shows the difficulty in formulating a general result, especially for processes with different degrees of smoothness in different directions, which can naturally occur for spatial-temporal processes. These examples motivate a general conjecture and I describe two theorems covering special cases of it. The key condition on the process is that its spectral density should change slowly at high frequencies. I will argue that models not satisfying this condition of slow high-frequency change should generally not be used in practice.
Wednesday April 27, 11am
Speaker: Robert Gramacy, Booth School of Business, University of Chicago
Title: Simulation-based Regularized Logistic Regression
Abstract: We develop an omnibus framework for regularized logistic regression by simulation-based inference, exploiting two important results on scale mixtures of normals. By carefully choosing a hierarchical model for the likelihood by one type of mixture, and how regularization may be implemented by another, we obtain subtly different MCMC schemes with varying efficiency depending on the data type (binary v. binomial, say) and the desired estimator (maximum likelihood, maximum a posteriori, posterior mean, etc.). Advantages of this umbrella approach include flexibility, computational efficiency, application in p >> n settings, uncertainty estimates, variable selection, and an ability to assess the optimal degree of regularization in a fully Bayesian setup. We compare the statistical and algorithmic efficiency of each of our proposed methods against each other, and against modern alternatives on synthetic and real data.
Thursday April 21, 11am
Speaker: Jim Berger, Department of Statistical Science, Duke University
Location: ITW Auditorium, Ford Engineering Building, 2133 Sheridan Road
Title: Risk Assessment for Pyroclastic Flows
Abstract: The problem of risk assessment for rare natural hazards - such as volcanic pyroclastic flows - is addressed, and illustrated with the Soufriere Hills Volcano on the island of Montserrat. Assessment is approached through a combination of mathematical computer modeling, statistical modeling of geophysical data, and extreme-event probability computation. A mathematical computer model of the natural hazard is used to provide the needed extrapolation to unseen parts of the hazard space. Statistical modeling of the available geophysical data is needed to determine the initializing distribution for exercising the computer model. In dealing with rare events, direct simulations involving the computer model are prohibitively expensive, so computation of the risk probabilities requires a combination of adaptive design of computer model approximations (emulators) and rare event simulation.
Wednesday April 20, 11am
Speaker: Peter Qian, Department of Statistics, University of Wisconsin-Madison
Title: Sudoku Based Space-Filling designs
Abstract: Sudoku is played by millions of people across the globe. It has simple rules and is very addictive. The game-board is a nine-by-nine grid of numbers from one to nine. Several entries within the grid are provided and the remaining entries must be filled in subject to each row, column, and three-by-three subsquare containing no duplicated numbers. By exploiting these three types of uniformity, we propose an approach to constructing a new type of design, called Sudoku based space-filling design, intended for data pooling. Such a design can be divided into groups of subdesigns such that the complete design and each subdesign achieve maximum uniformity in both univariate and bivariate margins. Several unexpected applications of experimental design techniques will also be discussed, including stochastic optimization, parallel computing, cross-validation and variable selection.
Wednesday April 6, 11am
Speaker: Shaoli Wang, School of Statistics and Management, Shanghai University of Finance and Economics
Title: Laplace error penalty based variable selection in ultrahigh dimensions
Abstract: Variable selection is of fundamental importance in high-dimensional modeling and data analysis. The method based on the L0 penalty function, which gives rise to subset selection, is generally believed to be optimal among various penalized procedures. However, it is unstable and computationally infeasible as the dimension grows. In this talk we propose a novel penalty function, Laplace Error Penalty (LEP), for variable selection. LEP is bounded and infinitely differentiable everywhere except at the origin. With an extra tuning parameter, LEP approximates the L0 penalty much faster than competing methods, and therefore achieves consistent model selection and accurate parameter estimation simultaneously. We show that penalized least squares via LEP has a unique global minimizer, and the resulting estimator satisfies oracle properties. The LEP procedure allows fast computation and works well for high-dimensional data. Its performance is demonstrated through simulations and real data analysis.
Winter 2011
Wednesday March 9, 11am
Speaker: Guang Cheng, Department of Statistics, Purdue University
Title: How Many Iterations are Sufficient for Semiparametric Estimation?
Abstract: Iterative estimation is a common way to obtain an efficient estimate of the Euclidean parameter in semiparametric models. A rigorous theoretical study of the semiparametric iterative estimation approach is the main purpose of this talk. We first show that the grid search algorithm can be used to produce the desired initial estimate with the proper convergence rate. Our major contribution is to provide a formula for calculating the minimal number of iterations k needed to produce an efficient estimate. We discover that k depends on the convergence rates of the initial estimate and functional nuisance estimate, and k iterations are also sufficient for recovering the estimation sparsity in high dimensional data. These general conclusions hold, in particular, when the nuisance parameter is not estimable at root-n rate, and apply to semiparametric models estimated under various regularizations, e.g., kernel or penalized estimation. In practice, our results may be useful in reducing the bootstrap computational cost for semiparametric models.
Wednesday March 2, 11am
Speaker: Fengqing Zhang, Department of Statistics, Northwestern University
Title: Imaging Mass Spectrometry Data Biomarker Selection and Classification
Abstract: Imaging Mass Spectrometry (IMS) has shown great potential and is very promising in proteomics. However, data processing remains challenging due to the difficulty of analyzing high-dimensional data, the fact that the number of predictors is significantly larger than the number of observations, and the need to consider both spectral and spatial information in order to represent the advantage of IMS technology. In this talk, I’ll present some recent progress on IMS data analysis using multivariate analysis methods. First, we incorporate a spatial penalty term into the elastic net (EN) model for IMS data processing. The EN-based model fully utilizes not only the spectrum information within individual pixels but also the spatial information for the whole IMS image cube. Both the simulation and real data analysis results show that the EN-based model works effectively and efficiently for IMS data processing. We then propose a weighted elastic net (WEN) model combining ion intensity spreading information directly with the elastic net model. Properties including variable selection accuracy of the WEN model are also discussed. Finally, we develop a software package, called IMSmining, including visualization and analysis tools for IMS data processing.
Wednesday February 23, 11am
Speaker: Tim McMurry, Department of Mathematical Sciences, DePaul University
Title: Robust Empirical Bayes With Application to Genome-Wide Association Studies
Abstract:
Large scale technologies such as gene expression microarrays and genome-wide association studies measure a large number of parallel parameters on a usually much smaller number of subjects. Bayesian and empirical Bayes analyses are natural for large scale data because of their ability to infer the collective structure of the many underlying parameters and to borrow information from the other observations. This talk proposes a rank-conditioned procedure in which the inference is based on the conditional distribution of the error given the rank of the (raw) estimate among all other estimates as opposed to conditioning on the raw estimate itself. Our method is particularly suited for correcting ranking bias in large scale estimation and for constructing valid confidence intervals for selected top-ranked parameters. The new method is almost as efficient as the corresponding Bayesian analysis when the prior is correctly specified. When the prior is incorrectly specified, the new method can be much more robust in the sense that it continues to provide accurate point and interval estimates. The efficacy of the proposed method is demonstrated through application to two genome-wide association studies from the Wellcome Trust Case Control Consortium (2007).
Fall 2010
Wednesday Nov 10, 11am
Speaker: Brent Logan, Associate Professor
Division of Biostatistics, Medical College of Wisconsin
Title: Marginal Models for Clustered Time-to-Event Data with Competing Risks Using Pseudovalues
Abstract:
Many time-to-event studies are complicated by the presence of competing risks and by nesting of individuals within a cluster, such as patients in the same center in a multicenter study. Several methods have been proposed for modeling the cumulative incidence function with independent observations. However, when subjects are clustered, one needs to account for the presence of a cluster effect either through frailty modeling of the hazard or subdistribution hazard, or by adjusting for the within-cluster correlation in a marginal model. We propose a method for modeling the marginal cumulative incidence function directly. We compute leave-one-out pseudo-observations from the cumulative incidence function at several time points. These are used in a generalized estimating equation to model the marginal cumulative incidence curve, and obtain consistent estimates of the model parameters. A sandwich variance estimator is derived to adjust for the within-cluster correlation. The method is easy to implement using standard software once the pseudovalues are obtained, and is a generalization of several existing models. Simulation studies show that the method works well to adjust the SE for the within-cluster correlation. We illustrate the method on a dataset looking at outcomes after bone marrow transplantation.
Wednesday Oct 27, 11am
Speaker: Dabao Zhang, Associate Professor
Department of Statistics, Purdue University
Title: Penalized Orthogonal-Components Regression for Selecting Sparse Variables in High-Dimensional Data
Abstract: We propose a penalized orthogonal-components regression (POCRE) to select variables from high-dimensional low-sample-size data. Orthogonal components are sequentially constructed to maximize, upon standardization, their correlation to the response residuals. A new penalization framework, implemented via empirical Bayes thresholding, is presented to effectively identify sparse predictors of each component. POCRE is computationally efficient owing to its sequential construction of leading sparse principal components. In addition, such construction offers other properties such as grouping highly correlated predictors and allowing for collinear or nearly collinear predictors. With multivariate responses, POCRE can construct common components and thus build up latent-variable models for high dimensional data.
Wednesday Oct 13, 11am
Speaker: Liping Tong, Assistant Professor
Department of Mathematics and Statistics, Loyola University
Title: Co-evolution Model for Dynamic Social Network and Behavior
Abstract: An individual's behaviors, such as hours spent watching television, playing sports, and unhealthy eating habits, may be influenced by the behaviors of friends. However, preferences for these behaviors may also influence the choice of friends; for example, two children who enjoy playing the same sport are more likely to become friends. To study the interdependence of social network and behavior, Snijders et al. have developed the stochastic actor-based modeling (SABM) methods for the co-evolution process, which turn out to be useful when dealing with longitudinal social network and behavior data when behavior variables are discrete and have a limited number of possible values. Unfortunately, since the evolution function for the behavior variable is in exponential format, the SABM can generate unrealistic results when the behavior variable is continuous or has a large range. To realistically model a continuous behavior variable, we propose a co-evolution process in which the network evolution is based on an exponential random graph model and the behavior evolution is based on a linear model.
Wednesday Sep 29, 11am
Speaker: Mihails Levins, Associate Professor
Department of Statistics, Purdue University
Title: An EM algorithm for estimation of finite nonparametric mixtures of multivariate densities
Abstract: We propose an algorithm for nonparametric estimation of finite mixtures of multivariate random vectors that is a true EM algorithm. The vectors are assumed to have independent coordinates conditional upon knowing the mixture component they come from, but otherwise their density functions are completely unspecified.
To the best of our knowledge, this is the first algorithm for estimation of nonparametric mixtures that specifies an explicit likelihood function and has a verifiable ascent property. We show that, under reasonable regularity conditions, there exists a unique solution of the likelihood maximization problem that specifies an algorithm and that the proposed algorithm indeed converges to that solution. The algorithm can be applied to a mixture with any finite number of components and any dimensionality of the underlying random vectors. Performance of the algorithm is illustrated using both simulated and real data.
Spring 2010
Wednesday May 26, 11am
Speaker: Leland Wilkinson, Professor
SYSTAT
Department of Computer Science, University of Illinois at Chicago
Dept. of Statistics, Northwestern University
Title: Linf: An L-infinity Classifier
Abstract: We introduce a classifier based on the L-infinity norm. This classifier, called Linf, is a composition of four stages (transforming, projecting, binning, and covering) that are designed to deal with the curse of dimensionality, computational complexity, and nonlinear separability. Linf is not a hybrid or modification of existing classifiers; it employs a new covering algorithm. The accuracy of Linf on widely-used benchmark classification datasets exceeds, on average, the accuracy of competitive classifiers such as Support Vector Machines and Random Forests. Its computational complexity is sub-linear in number of instances and number of variables and subquadratic in number of classes. This is work with Anushka Anand and Dang Nhon Tuan, Dept. of Computer Science, University of Illinois at Chicago.
Wednesday May 5, 11am
Speaker: Jing Wang, Assistant Professor of Statistics
University of Illinois at Chicago
Title: Efficient and fast spline-backfitted kernel smoothing of additive models
Abstract: A great deal of effort has been devoted to inference for additive models in the last decade. Among existing procedures, the kernel type are too costly to implement for high dimensions or large sample sizes, while the spline type provide no asymptotic distribution or uniform convergence. We propose a one-step backfitting estimator of the component function in an additive regression model, using spline estimators in the first stage followed by kernel/local linear estimators. Under weak conditions, the proposed estimator's pointwise distribution is asymptotically equivalent to that of a univariate kernel/local linear estimator, hence the dimension is effectively reduced to one at any point. This dimension reduction holds uniformly over an interval under assumptions of normal errors. Monte Carlo evidence supports the asymptotic results for dimensions ranging from low to very high, and sample sizes ranging from moderate to large. The proposed confidence band is applied to the Boston housing data for linearity diagnosis. This paper is joint work with Professor Lijian Yang at Michigan State University.
Wednesday April 21, 11am
Speaker: Xuming He, Professor of Statistics
University of Illinois at Urbana-Champaign
Title: On Dimensionality of Mean Structure from a Single Data Matrix
Abstract: We consider inference from data matrices that have low dimensional mean structures. In educational testing and in probe-level microarray data, estimation and inference are often made from a single data matrix believed to have a uni-dimensional mean structure. In this talk, we focus on probe-level microarray data to examine the adequacy of a uni-dimensional summary for characterizing the data matrix of each probe-set. To do so, we propose a low-rank matrix model, and develop a useful framework for testing the adequacy of uni-dimensionality against targeted alternatives. We analyze the asymptotic properties of the proposed test statistics as the number of rows (or columns) of the data matrix tends to infinity, and use Monte Carlo simulations to assess their small sample performance. Applications of the proposed tests to GeneChip data show that evidence against a uni-dimensional model is often indicative of practically relevant features of a probe-set. (Part of the talk is based on ongoing work of Xingdong Feng, a doctoral student at the University of Illinois.)
Wednesday April 7, 11am
Speaker: Fangfang Wang, Assistant Professor,
Department of Information and Decision Sciences
University of Illinois at Chicago
Title: The HYBRID GARCH Class of Models
Abstract: We propose a general GARCH framework that allows the use of different frequency returns to model conditional heteroskedasticity. We call the class of models High FrequencY Data-Based PRojectIon-Driven (HYBRID) GARCH models as the GARCH dynamics are driven by what we call HYBRID processes. We study three broad classes of HYBRID processes: (1) parameter-free processes that are purely data-driven, (2) structural HYBRIDs where one assumes an underlying DGP for the high frequency data and finally (3) HYBRID filter processes. We develop the asymptotic theory of various estimators and study their properties in small samples via simulations. This is work with Xilong Chen and Eric Ghysels.
Winter 2010
Wednesday March 10, 2010, 11am
Speaker: Stephen Stigler, Professor of Statistics, University of Chicago
Title: Darwin, Galton, and the Statistical Enlightenment
Abstract: Francis Galton invented multivariate analysis in 1885. The main outline of that advance is fairly well known, but the link to Darwin has been inadequately studied. I will discuss that link and a hitherto unnoticed major step in that development, and I will tell how this advance led to a remarkable 50 year period that might justly be called the Statistical Enlightenment, a period that included the reinvention of Bayesian inference. Galton's algorithm for simulating posterior distributions will be highlighted. A previously unsung hero of the story, working in Lake Forest Illinois in 1889, will be saluted.
Wednesday Feb 24, 2010, 11am
Speaker: Dale Rosenthal, Assistant Professor of Finance, UIC
Title: A Network Model of Counterparty Risk
Abstract:
Two network structures of contracts on a risky asset are explored in a two-period model. One structure represents a bilaterally-cleared OTC market, the other represents a centrally-cleared market. An exogenous bankruptcy occurs before period one inducing counterparties to trade with price impact. The two different market structures are shown to yield different price impact and volatility. Further, market-induced bankruptcy of a large (financial) firm is shown to yield two undesirable phenomena in bilateral markets: checkmate and hunting. Checkmate occurs when a counterparty cannot expect to prevent impending bankruptcy. The other occurs when counterparties push markets further than a central counterparty would, inducing further bankruptcies. These counterparties may even expect to profit from such follow-on bankruptcies. The results suggest that bilateral OTC markets have externalities (larger distress volatility) which can be priced relative to centrally-cleared markets. This might offer guidance on when and how much incentive to offer for markets to transition from one structure to another. The results also suggest that in times of distress coordination by market authorities has value.
Wednesday Feb 10, 2010, 11am
Speaker: Zhengjun Zhang, Assistant Professor of Statistics, University of Wisconsin at Madison
Title: Examining Extremal Dependence in Continental USA Climate Data
Abstract: In recent years, extremal climatic conditions have been observed more often, where some climate variables are not only dependent, but also extremely dependent. Identification of extremal dependence among observations is challenging and remains an open problem. This talk introduces a class of tail quotient correlation coefficients (TQCC) which allows the underlying threshold values to be random. The limit distribution of the TQCC under the null hypothesis of extremal independence is derived. Test statistics for extremal independence are constructed and shown to be consistent under the alternative hypothesis of extremal dependence. Motivated by TQCC, the talk introduces a broader class of nonlinear quotient correlation coefficients (NQCC) for characterizing nonlinear dependence between random variables. We apply TQCC and NQCC to investigate extremal dependence and nonlinear dependence of daily precipitation in the US during 1950–1999 recorded at 5873 stations from the National Climate Data Center rain gauge data. Our results indicate nonstationarity, spatial clusters, and extremal dependence and nonlinear dependence in the data. They provide useful information for next generation climate models.
Past Seminars
Fall 2009
Wednesday December 2, 2009, 11am
Speaker: Beth Andrews, Assistant Professor of Statistics, Northwestern University
Title: Rank-Based Estimation for Time Series Model Parameters
Abstract: The focus of this talk is rank-based estimation for time series model parameters. The parameter estimates considered minimize the sum of mean-corrected model residuals weighted by a function of residual rank, and are similar to the rank estimates proposed by L.A. Jaeckel [Estimating regression coefficients by minimizing the dispersion of the residuals, Ann. Math. Statist. 43 (1972) 1449–1458] for estimating linear regression parameters. Rank estimates are known to be robust and relatively efficient. It will be shown this is true in the case of parameter estimation for standard linear and nonlinear time series processes. The estimation technique is robust because the rank estimates are n^{1/2}-consistent (n represents sample size) and asymptotically normal under mild conditions. Since the weight function can be chosen so that rank estimation has the same asymptotic efficiency as maximum likelihood estimation, rank estimation is also relatively efficient. In addition, rank estimation dominates traditional Gaussian quasi-maximum likelihood estimation with respect to both robustness and asymptotic efficiency.
Wednesday November 18, 2009, 11am
Speaker: Jie Yang, Assistant Professor, Dept. of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago
Title: Classification Model Based on Permanent Process
Abstract: This talk introduces a new statistical model based on a permanent process for supervised classification problems. Unlike many research works in the literature, the permanent model assumes only exchangeability instead of independence on observations. Regardless of the number of classes or the dimension of the feature variables, the model may require only 2-3 parameters for fitting the covariance structure within clusters. It works well even if the class occupies non-convex, disjoint regions, or regions overlapped with other classes in the feature space. The application to DNA microarray analysis indicates that the permanent model is more capable of handling high-dimensional data. It can employ more feature variables in an efficient way and reduce the prediction error significantly. This is critical when the true classification relies on non-reducible high-dimensional features.
Wednesday November 4, 2009, 11am
Speaker: Donald Hedeker, Professor of Biostatistics, University of Illinois at Chicago
Title: Multilevel Models for Ecological Momentary Assessment (EMA) Data: An Application of a Mixed-Effects Location Scale Model
Abstract: For longitudinal data, multilevel models include random subject effects to indicate how subjects influence their responses over the repeated assessments. The error variance and the variance of the random effects are usually considered to be homogeneous. These variance terms characterize the within-subjects (i.e., error variance) and between-subjects (i.e., random-effects variance) variation in the data. In studies using Ecological Momentary Assessment (EMA), up to thirty or forty observations are often obtained for each subject, and interest frequently centers around changes in the variances, both within- and between-subjects. In this presentation, focus is on an adolescent smoking study using EMA where interest is on characterizing changes in mood variation. In terms of the multilevel model, covariates are allowed to influence the mood variances to address this. Also, a subject-level random effect is added to the within-subject variance specification. This permits subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their mood responses. Additionally, the location and scale random effects are allowed to be correlated. These mixed-effects location scale models have useful applications in many research areas where interest centers on the joint modeling of the mean and variance structure.
Wednesday October 21, 2009, 11am
Speaker: Dr. Liqun Xi, Northwestern University
Title: The minimum capture proportion for reliable estimation in capture-recapture models
Abstract: For capture-recapture models, a reliable estimate for the population size is possible only for a reasonably large capture proportion, especially for a heterogeneous population. In all capture-recapture models, how large the capture proportion should be to ensure a reliable estimate for the population size is an important question. In this seminar, an idea for obtaining the minimum capture proportions for reliable estimation in capture-recapture models is introduced. Some results are presented, as well as an application of the proposed results to real capture-recapture data.
Wednesday October 7, 2009, 11am
Speaker: Yuan Liao, Northwestern University
Title: Posterior Consistency of Nonparametric Conditional Moment Restricted Models
Abstract: This paper considers the nonparametric conditional moment restricted model that was previously studied by Ai and Chen (2003). We look at the estimation of the nonparametric structural function in a Bayesian way, starting by transforming the conditional restrictions equivalently into an infinite number of unconditional moment restrictions, and then derive the posterior distribution of the parameter of interest based on the limited information likelihood. We focus on the frequentist properties of the posterior distribution, allowing the nonparametric structural function to be partially identified. It is shown that the posterior converges to any neighborhood of the identified region. Finally, we apply our results to the nonparametric instrumental regression model and the single index model.
Winter 2009
Friday, April 17, 2009 at 2 pm (Unusual TIME and LOCATION)
Joint Seminar with IEMS Department
Seminar Room: Tech, Room M228 (2145 Sheridan Road, Evanston)
Speaker: Cyrus R. Mehta, President of Cytel Corporation & Adjunct Professor of Biostatistics, Harvard University
Title: Design and Implementation of Late-Stage Adaptive Trials: Experiences of an Industry Consultant
Abstract: Sound statistical principles combined with careful planning of the logistical details are essential for successful implementation of an adaptive clinical trial. In this presentation I will share my experience as a consultant involved in several late stage adaptive designs. Topics that I will cover include sample size re-estimation, population enrichment and seamless phase II/III design. Each topic will be illustrated with a real case study. The crucial role of simulation will be highlighted. Regulatory experiences will be discussed.
Wednesday, May 6, 2009 at 11 am
Speaker: Joseph Kang, Assistant Professor of Biostatistics, Department of Preventive Medicine, Northwestern University
Title: Causal inference for weight control behaviors among adolescent girls
Abstract: Overweight and obesity often begin in childhood, but few successful models exist for prevention and treatment of obesity in children and adolescents. Among adolescent girls, dieting may prospectively predict weight gain. Due to possible reciprocal causality between diet and weight gain, a quantitative causal analysis is necessary. During the past two decades, the Rubin Causal Model (RCM) has been known to idealize a successful design to quantify causal effects. In this talk, some inferential strategies using RCM will be discussed in order to estimate causal effects of weight control behaviors. The first session of this talk will be dedicated to a discussion of 1) recent semiparametric methods to infer causal estimands in RCM and 2) the importance of moderators that change the causal effects of dieting. The second session will discuss an extended RCM to adjust for the measurement error of weight control behaviors using a latent class model. We use the National Longitudinal Study of Adolescent Health (Add Health) data set for all analyses.
Wednesday, May 13, 2009 at 11 am
Speaker: Viktor Todorov, Assistant Professor, Department of Finance, Kellogg School of Management, Northwestern University
Title: Limit Theorems for Power Variations of Pure-Jump Processes with Application to Activity Estimation
Abstract: This paper derives the asymptotic behavior of realized power variation of pure-jump Ito semimartingales as the sampling frequency within a fixed interval increases to infinity. We prove convergence in probability and an associated central limit theorem for the realized power variation on the space of functions of the power equipped with a local uniform topology. We apply the limit theorems to propose an efficient adaptive estimator of the activity of a discretely-sampled Ito semimartingale over a fixed interval.
Wednesday, May 27, 2009 at 11 am
Speaker: Junhui Wang, Assistant Professor, Department of Mathematics, Statistics, and Computer Science, UIC
Title: To be announced
Abstract: To be announced
Fall 2008
Friday, October 3 at 1 pm
Speaker: Professor Terry Speed, Statistics, UC Berkeley and Bioinformatics, WEHI
Title: Some statistical issues arising with next-generation DNA sequencing data
Abstract: Next generation sequencing machines are now producing tens of millions of short sequencing reads. These need to be mapped back to a reference genome, if there is one, and then further processed in a way which varies with the task. For mRNA-seq, these need to be assigned to genes, exons, or other transcriptional units, and counted. For ChIP-seq, we need to find putative binding sites. What sort of statistical issues arise, and how should we proceed with the analyses? Some initial ideas will be presented.
Thursday, October 23 at 4 pm
Speaker: Professor Ingram Olkin, Department of Statistics, Stanford University
Title: Life Distributions in Survival Analysis and Reliability: Structure of Semiparametric Families
Abstract: Semiparametric families are families that have both a real parameter and a parameter that is itself a distribution. A number of semiparametric families suitable for lifetime data in survival or reliability are introduced: scale, power, frailty (proportional hazards), age, moment, and others. Interesting results on stochastic orderings are obtained for these families. The coincidence of two families provides a characterization of the underlying distribution. Some of the characterization results provide a rationale for the use of certain families. In this talk we provide an overview of these semiparametric families, and present several characterizations.
This work is joint with Albert W. Marshall.
Spring 2008
Wednesday, April 9 at 12 pm
Speaker: Professor Heping Zhang, Biostatistics, Director of Collaborative Center for Statistics in Science, Yale University.
Title: Joint Modeling of Time Series Measures and Recurrent Events and Analysis of the Effects of Air Quality on Respiratory Symptoms
Abstract: Exposure to ambient pollutants at concentrations above defined standards is a risk factor for respiratory symptoms, especially in sensitive children. Many studies have been undertaken to monitor air quality and to assess its association with respiratory symptoms. We propose a joint mixed effects regression model of time series measures and recurrent events to analyze the air quality and respiratory symptom data from the Yale Mothers and Infants Health Study.
Three mothers' symptoms (runny nose, cough, and sore throat) and three infants' symptoms (runny nose, cough, and general sickness) were investigated. To alleviate the computational complexity, a two-stage maximum likelihood based estimation procedure is introduced to estimate the parameters, and simulation studies are conducted to assess the validity of this estimation procedure.
Our analysis reveals differences in the etiology of respiratory symptoms between mothers and infants. Most notably, coarse particles of mass between 2.5 and 10 microns in diameter increased the risks of mothers' runny nose and cough symptoms, but had no significant impact on any of the three infants' symptoms. The sulfate level was negatively associated with the risk of infants' runny nose and cough symptoms, but had no significant effects on any of the three mothers' symptoms. A high level of humidity was negatively associated with the mothers' cough incidence, but had no significant association with any of the three infants' symptoms. Such differences not only reveal the sensitivity of the mothers and infants to the air quality, but also call for further understanding of the differences. It is possible that actions taken to overcome humidity by mothers may inadvertently affect the infants.
This is joint work with Yuanqing Ye, Peter Diggle, and Jian Shi.
Wednesday, April 23 at 12 pm
Speaker: Professor Dan Nordman, Department of Statistics, Iowa State University
Title: Tapered empirical likelihood for time series data
Abstract: This talk aims to motivate and describe a formulation of empirical likelihood for time series inference based on tapered data blocks. Data blocks are a device for capturing the time dependence and the proposed method involves tapering these blocks in a special way. The resulting empirical likelihood has chi squared limits for nonparametrically calibrating confidence intervals for time series parameters, such as means and correlations. Tapering is shown to improve the chi-squared approximation and enhance the coverage accuracy of intervals compared to untapered empirical likelihood versions. Simulation evidence is provided and block choices are considered as well.
Wednesday, May 7 at 12 pm
Speaker: Professor Ginger Davis, Department of Systems and Information Engineering, University of Virginia
Title: Hierarchical Bayesian Markov Switching Models with Application to Predicting Spawning Success of Shovelnose Sturgeon
Abstract: The timing of spawning in fish is tightly linked to environmental factors; however, these factors are not very well understood for many species. Specifically, little information is available to guide recruitment efforts for endangered species such as the sturgeon. Therefore, we propose a Bayesian hierarchical model for predicting spawning success of the shovelnose sturgeon which uses both biological and behavioral (longitudinal) data. In particular, we use data produced from a tracking study conducted in the Lower Missouri River. The data produced from this study consist of biological variables associated with readiness to spawn along with longitudinal behavioral data collected using telemetry and data storage device sensors. These high frequency data are complex both biologically and in the underlying behavioral process. To accommodate such complexity, the model we developed uses an eigenvalue predictor, derived from the transition probability matrix of a two-state Markov switching model with GARCH dynamics, as a generated regressor in a hierarchical linear regression model. Finally, in order to minimize the computational burden associated with estimation of this model, a parallel computing approach is proposed.
Wednesday, May 14 at 12 pm
Speaker: Professor Xiaofeng Shao, Department of Statistics, University of Illinois at Urbana-Champaign
Title: Portmanteau tests in time series
Abstract: This talk consists of two parts. In the first part, we will talk about testing for white noise and its applications to goodness-of-fit of long memory time series models. The limitation of the current asymptotic theory for portmanteau tests will be pointed out and new theoretical results will be discussed. In the second part, we will introduce generalized portmanteau type test statistics in the frequency domain to test independence between two stationary time series. Unlike the existing tests, each time series is allowed to possess short memory, long memory or anti-persistence. Under the null hypothesis of independence, the asymptotic null distributions of the proposed statistics are standard normal. The results from a simulation study will also be presented.
Winter 2008
Wednesday, February 13 at 12 pm
Speaker: Lu Tian, Assistant Professor, Department of Preventive Medicine, Northwestern University
Title: Lasso Regularization for the Accelerated Failure Time Model
Abstract: It is challenging to develop a stable regression model for predicting failure time outcomes when the dimension of the covariates is big relative to the sample size. Further complication arises due to the fact that failure time responses are often not completely observed because of right censoring. In this paper, we propose to couple the LASSO-type regularization methods with Gehan's rank-based estimator in the setting of the accelerated failure time model to construct a stable and parsimonious prediction model. Unlike the inverse probability weighting approach, the proposed estimators are valid under the general noninformative censoring assumption. We also propose an efficient numerical algorithm for obtaining the entire regularization path to facilitate the adaptive selection of the tuning parameter. We illustrate the proposed methods with an application to predict the survival time of breast cancer patients based on a set of clinical prognostic factors and collected gene signatures and evaluate their finite sample performance through a simulation study.
Wednesday, February 27 at 12 pm
Speaker: Peter McCullagh, John D. MacArthur Distinguished Service Professor, Department of Statistics, University of Chicago
Title: Sampling bias and logistic models
Abstract: In a regression model, the joint distribution for each finite sample of units is determined by a function px(y) depending only on the list of covariate values x = (x(u1), . . . , x(un)) on the sampled units. No random sampling of units is involved. In biological work, random sampling is frequently unavoidable, in which case the joint distribution p(y, x) depends on the sampling scheme. Regression models can be used for the study of dependence provided that the conditional distribution p(y | x) for random samples agrees with px(y) as determined by the regression model for a fixed sample having a non-random configuration x. This paper develops a model that avoids the concept of a fixed population of units, thereby forcing the sampling plan to be incorporated into the sampling distribution. For a quota sample having a predetermined covariate configuration x, the sampling distribution agrees with the standard logistic regression model with correlated components. For most natural sampling plans such as sequential or simple random sampling, the conditional distribution p(y | x) is not the same as the regression distribution unless px(y) has independent components. In this sense, most natural sampling schemes involving binary random-effects models are biased. The implications of this formulation for subject-specific and population-averaged procedures are explored.
Wednesday, March 5 at 12 pm
Speaker: Sandy L. Zabell, Professor, Department of Statistics and Department of Mathematics, Northwestern University
Title: On Student’s 1908 paper “The probable error of a mean”
Abstract: This month marks the one-hundredth anniversary of the appearance of William Sealy Gosset’s celebrated paper “The probable error of a mean”. Gosset’s elegant contribution represented the first in a series of exact, “small-sample” results that were developed by Gosset, Fisher, and others to form a central component of the modern theory of statistical inference. This talk celebrates the centenary of Gosset’s paper by discussing both its background and impact on modern statistical theory and practice.
Wednesday, March 12 at 12 pm
Speaker: Rong Chen, Professor, Department of Statistics, Rutgers University
Title: Constrained Sequential Monte Carlo (CSMC)
Abstract: Sequential Monte Carlo (SMC) methodologies have been shown to have great promise in solving very high dimensional and complex problems often encountered in applications such as communication, bioinformatics and financial data analysis. The key to a successful SMC implementation is efficiency, not only in terms of statistical inference accuracy, but also in terms of computational complexity. Efficiency is directly related to the design of the key components of SMC, including the intermediate distributions, the trial 'growth' distribution, and the resampling method. Many problems in application share a common feature: the target distribution is highly constrained. That is, the target distribution is a truncated distribution on an ill-shaped subspace of a high dimensional space. The constraints, without careful treatment, are a main source of obstacles to successful implementations of SMC. In this talk, we develop a set of algorithms categorized as Constrained Sequential Monte Carlo (CSMC) for solving such problems, including strategies for designing the intermediate distributions, the trial distributions, the resampling steps and Markov moves with CSMC.
Spring 2007
Tuesday, March 27 at 11 am
Speaker: Wei Biao Wu, Assistant Professor, Department of Statistics, The University of Chicago
Title: New Perspectives in the Theory of Time Series
Abstract: I will present a unified framework for a large-sample theory of time series. Topics in classical time series analysis will be revisited and they include the estimation of covariances, spectral densities and long-run variances and linear prediction. I will also talk about high dimensional covariance matrices estimation and inference of mean and quantiles of non-stationary processes. In the second part I will discuss dependence, a fundamental concept in statistics. Our viewpoint provides new insights in the study of complicated random systems. I will also discuss relations with nonlinear system theory, experimental design, information theory, risk-metrics theory and high dimensional covariance matrices estimation.
Tuesday, April 9 at 1 pm
Speaker: Vadim Linetsky, Professor, Department of Industrial Engineering and Management Sciences, Northwestern University
Title: Time-Changed Markov Processes in Asset Pricing
Abstract: The procedure of time changing a stochastic process, going back to S. Bochner, allows one to construct new processes from a given process by running it on a new clock that can itself be a non-decreasing stochastic process (random time). When the process to be time changed is a Markov process and the Laplace transform of the time change is known, there is an explicit representation of the expectation operator of the time changed process in terms of the resolvent of the original Markov process and the Laplace transform of the time change. We use this result to build a rich tool box of analytically tractable asset pricing models in finance that incorporate stochastic volatility, state-dependent jumps, and state-dependent killing rates (or default intensities). Among the resulting models is a new credit-equity model that is an extension of the constant elasticity of variance (CEV) model with stochastic volatility, jumps, and default, as well as extensions of the Cox-Ingersoll-Ross and the Ornstein-Uhlenbeck models with mean-reverting jumps.
Tuesday, May 8 at 11 am
Speaker: Hui Xie, Assistant Professor, School of Public Health, University of Illinois at Chicago
Title: A Local Sensitivity Analysis Approach to Longitudinal Non-Gaussian Data with Nonignorable Dropout
Abstract: Longitudinal non-Gaussian data subject to potentially nonignorable dropout is a challenging problem. Very often data contain little information about the dropout mechanism. As a result, frequently an analysis has to rely on some strong but unverifiable assumptions, among which ignorability is a key one. Sensitivity analysis has been advocated to assess the likely effect of alternative assumptions about dropout mechanism on such an analysis. Previously Ma et al. (2005) applied a general index of local sensitivity to nonignorability (ISNI) (Troxel et al. 2004) to measure the sensitivity of MAR estimates to small departures from ignorability for multivariate normal outcomes. In this paper, we extend the ISNI methodology to handle longitudinal non-Gaussian data subject to nonignorable dropout. Specifically we propose to quantify the sensitivity of inferences in the neighborhood of an MAR generalized linear mixed model (GLMM) for longitudinal data. Through a simulation study, we evaluate the performance of the proposed methodology. We then illustrate the methodology in one real example: Smoking Cessation Data.
Tuesday, May 22 at 11 am
Speaker: Hira L. Koul, Professor, Department of Statistics and Probability, Michigan State University
Title: Model Diagnostics via Martingale Transforms
Abstract: Classical problems in statistics are to fit a distribution up to unknown location-scale parameters and to fit a parametric model to the regression-autoregressive function. The first problem is generic to many other statistical models, including the celebrated regression, autoregressive, and generalized autoregressive conditionally heteroscedastic (ARCH-GARCH) models, where one is testing that innovations are from a given distribution. It will be argued that Khmaladze's martingale transformation of the residual empirical process, which yields asymptotically distribution free tests for the one sample location-scale model, does the same thing for a parametric heteroscedastic regression model and for ARCH-GARCH models. Analogous tests for the second problem will also be discussed.
Friday, May 25 at 3 pm
Speaker: Cliff Spiegelman, Professor, Department of Statistics, Texas A&M University
Title: Statistical considerations on the process of discovering and validating biomarker candidates using MS platforms
(Joint with Lorenzo J. Vega Montoto and Asokan Mulayath Variyath)
Abstract: Claims have been made that the application of supervised pattern recognition methodology can be used with MS proteomic data to achieve near perfect sensitivity and specificity for detecting early stage cancer. So far those claims have not been verified in part due to the use of less than optimal experimental design, but in the interim significant effort has been spent on proteomic biomarker discovery research (without significant positive results) largely using tandem MS platforms. Underpinning the proteomics studies are several key components including standardization of materials, bioinformatics, reagent development, MS improvements, and statistics. This presentation discusses the NCI CPTAC program generally and a related mouse studies project. Several areas where statistical design of experiment input is present will be discussed.
Thursday, June 7 at 11 am
Speaker: Archana Singh, PhD student, Department of Computer Science, University of Tsukuba, Japan and National Food Research Institute, Tsukuba, Japan
Title: Robustness of FDR Method in Brain Mapping Studies using Functional near Infra-Red Spectroscopy
(Joint with Ippeita Dan)
Abstract: Near infrared spectroscopy (NIRS) is an emerging non-invasive technique that allows monitoring of brain activity in infants, patients, and healthy subjects with greater ease of application than other techniques, because it is portable, is more tolerant of subjects' movements, and allows brain monitoring in a more ecologically valid setting. It allows simultaneous measurements through many channels, ranging from below ten to around two hundred, thus escalating the issue of multiple testing. To date, only a few studies have considered this issue using Bonferroni correction, which tends to be conservative in spatially correlated fNIRS data. In addition, its power is inversely proportional to the number of channels, which varies among fNIRS experiments depending on the selected region of interest (ROI), thereby leading to a subjective inference. This problem may be well circumvented by a more contemporary approach, called false discovery rate (FDR). In this session, I will illustrate how the application of FDR procedures can provide a more objective and also more powerful inference than the Bonferroni method in neuroimaging analysis, using real data. In addition, I will present the results from a simulation analysis showing that FDR provides greater sensitivity while maintaining the conventional specificity control.
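The contrast between Bonferroni correction and FDR control described in this abstract can be illustrated with a small sketch. The abstract does not say which FDR procedure the speaker used; the Benjamini-Hochberg step-up procedure is assumed here, and the eight channel p-values are made up for illustration only.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the familywise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method).

    Sort the p-values, find the largest rank k with p_(k) <= k * alpha / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Hypothetical p-values, one per measurement channel (not from the talk).
pvals = [0.001, 0.008, 0.012, 0.020, 0.030, 0.200, 0.450, 0.700]

print("Bonferroni rejections:", sum(bonferroni(pvals)))
print("BH rejections:        ", sum(benjamini_hochberg(pvals)))
```

On these made-up values Bonferroni rejects only the smallest p-value, while the step-up rule rejects the five smallest, which is the kind of power difference the abstract refers to. The Bonferroni threshold alpha / m shrinks as the number of channels m grows, whereas the BH thresholds adapt to the ordered p-values.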
Winter 2007
Tuesday, January 23 at 2 pm
Speaker: Joel L. Horowitz, Charles E. and Emma H. Morrison Professor of Market Economics, Department of Economics, Northwestern University
Title: Nonparametric Instrumental Variables Estimation of a Quantile Regression Model
(Joint with Sokbae Lee)
Abstract: We consider nonparametric estimation of a regression function that is identified by requiring a specified quantile of the regression "error" conditional on an instrumental variable to be zero. The resulting estimating equation is a nonlinear integral equation of the first kind, which generates an ill-posed-inverse problem. The integral operator and distribution of the instrumental variable are unknown and must be estimated nonparametrically. We show that the estimator is mean-square consistent, derive its rate of convergence in probability, and give conditions under which this rate is optimal in a minimax sense. The results of Monte Carlo experiments show that the estimator behaves well in finite samples.
Tuesday, February 6 at 2 pm
Speaker: Leah J. Welty, Assistant Professor, Department of Preventive Medicine, Northwestern University
Title: Bayesian Distributed Lag Models: Estimating Effects of Particulate Matter Air Pollution on Daily Mortality
Abstract: A distributed lag model (DLM) is a regression model that includes lagged exposure variables as covariates; its corresponding distributed lag (DL) function describes the relationship between the lag and the coefficient of the lagged exposure variable. DLMs have recently been used in environmental epidemiology for quantifying the cumulative effects of weather and air pollution on mortality and morbidity. Standard methods for formulating DLMs include unconstrained, polynomial, and penalized spline DLMs. These methods may fail to take full advantage of prior information about the shape of the DL function for environmental exposures, or for any other exposure with effects that are believed to smoothly approach zero as lag increases, and are therefore at risk of producing sub-optimal estimates.
We propose a Bayesian DLM (BDLM) that incorporates prior knowledge about the shape of the DL function and also allows the degree of smoothness of the DL function to be estimated from the data. In a simulation study, we compare our Bayesian approach with alternative methods that use unconstrained, polynomial and penalized spline DLMs. We also show that BDLMs encompass penalized spline DLMs: under certain assumptions, imposing a prior on the DL coefficients is analogous to smoothing the DL coefficients with a penalty specified by the prior. We apply our BDLM to data from the National Morbidity, Mortality, and Air Pollution Study (NMMAPS) to estimate the short term health effects of particulate matter air pollution on mortality from 1987-2000 for Chicago, Illinois.
Tuesday, February 20 at 2 pm
Speaker: Ruey S. Tsay, H.G.B. Alexander Professor of Econometrics and Statistics, Graduate School of Business, The University of Chicago
Title: The Dynamics of Threshold Interest Rate Models
(Joint with Shang C. Chiou)
Abstract: We propose a two-factor arbitrage-free term structure model for interest rates, where the short-term interest rate follows a threshold model with stochastic volatility. Under the proposed model, the number of thresholds is unknown and must be endogenously determined by a model selection procedure. To estimate the proposed model, we develop an efficient Bayesian method by transforming the threshold problem into a structural-break problem. A simulation study shows that the proposed Bayesian method provides accurate estimates of the thresholds and the associated parameters of the model. In applications, the U.S. data strongly favor the newly proposed model over other models with constant volatility. We further compare the threshold model to its affine counterpart and the Markov-switching model, demonstrating the significant difference made by using the thresholds. We find that the fitted threshold model implies a kinked yield function and can generate an inverted yield curve. In addition, for U.S. monthly bond yields with 11 maturities (1 to 6 months and 1 to 5 years), the threshold model has smaller out-of-sample pricing errors than other models, especially for the long-term yields.
Tuesday, March 6 at 2 pm
Speaker: Ying Wei, Assistant Professor, Department of Biostatistics, Columbia University
Title: A Dynamic Quantile Regression Transformation Model for Longitudinal Data
Abstract: This paper describes a flexible nonparametric quantile regression model for longitudinal data. The basic elements of the model are a time-dependent power transformation on the longitudinal dependent variable and a varying-coefficient model for conditional quantile functions. A two-step estimation procedure is proposed to fit the model, and its consistency property is established. Tuning parameters are chosen with generalized cross validation in conjunction with a Schwarz-type information criterion. The proposed method is illustrated with data on the time evolution of CD4 cell counts in HIV-1 infected patients under three different treatments. The quantile regression approach for longitudinal data enables construction of pointwise prediction bands for individual trajectories without requiring parametric distributional assumptions. This is joint work with Prof. Yunming Mu at Texas A&M University.
Fall 2006
Monday, September 25, 2006, 11 am
Speaker: Professor Ji-Ping Wang, Department of Statistics, Northwestern University
Title: Statistical models for nucleosome DNA alignment and linker length prediction in Eukaryotic cells
Abstract: Eukaryotic DNAs exist in a highly compacted form known as chromatin. The nucleosome is the fundamental repeating subunit of chromatin, formed by wrapping a short stretch of DNA, 147 bp in length, around four pairs of histone proteins. Nucleosome DNA obtained by experiments, however, varies in length due to imperfect digestion. We develop a mixture model that characterizes the known dinucleotide periodicity probabilistically to improve the alignment of nucleosomal DNAs. To further investigate chromatin structure, we experimentally cloned and sequenced di-nucleosome sequences from yeast. Each dinucleosome sequence roughly covers two nucleosomes (located toward the two ends) with a linker DNA in between. An HMM is trained based on the nucleosome sequence alignment for prediction of nucleosome positioning. Results show that Eukaryotic cells do favor periodic linker length in chromatin forming on a roughly 10 bp basis.
Monday, October 9, 2006, 11 am
Speaker: Professor Zhigang Zhang, Department of Statistics, Oklahoma State University
Title: A Class of Transformed Mean Residual Life Models with Censored Survival Data
Monday, October 23, 2006, 11 am
Speaker: Dr. Lanju Zhang, Department of Biostatistics, MedImmune Inc.
Title: Response-Adaptive Randomization for Survival Trials: The Parametric Approach
Abstract: Few papers in the literature deal with response-adaptive randomization procedures for survival outcomes and those that do either dichotomize the outcomes or use a nonparametric approach. In this talk, the optimal allocation approach and a parametric response-adaptive randomization procedure are used under exponential and Weibull distributions. The optimal allocations are derived for both distributions and the doubly-adaptive biased coin design is applied to target the optimal allocations. The asymptotic variance of the procedure is obtained for the exponential distribution. The effect of intrinsic delay of survival outcomes is treated. These findings are based on rigorous theory, but also verified by simulation. We illustrate our procedure by redesigning a clinical trial.
Monday, October 30, 2006, 11 am
Speaker: Professor Jan Hannig, Department of Statistics, Colorado State University
Title: Statistical Model for Tracking with Applications
Abstract: We propose a new tracking model that allows for birth, death, splitting and merging of targets. Targets are also allowed to go undetected for several frames. The splitting and merging of targets is a novel addition for a statistically based tracking model. This addition is essential for the tracking of storms, which is the motivation for this work. The utility of this tracking method extends well beyond the tracking of storms. It can be valuable in other tracking applications that have splitting or merging, such as vortexes, radar/sonar signals, or groups of people. The method assumes that the location of a target behaves like a Gaussian Process when it is observable. A Markov Chain model decides when the birth, death, splitting, or merging of targets takes place. The tracking estimate is achieved by an algorithm that finds the tracks that maximize the conditional density of the unknown variables given the data. The problem of how to quantify the confidence in a tracking estimate is addressed as well. Finally, some sufficient conditions for consistency of this tracking estimate are presented and almost sure convergence of the tracking estimate to the true path is proved. The practical suitability of this method is then demonstrated on simulated and real data.
Based on joint work with Thomas C.M. Lee and Curtis B. Storlie.
Monday, November 6, 2006 at 10 am
Speaker: Professor Alfred Rademaker, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University
Title: The design and analysis of cancer clinical trials
Abstract: Cancer clinical trials run the spectrum from Phase 0 feasibility studies to Phase IV surveillance studies. This talk will focus on statistical methods related to Phase I and Phase II clinical trials. For Phase I, the standard 3+3 design as well as the continual reassessment method will be discussed. For Phase II studies, the Simon 2-stage design will be described, as well as the use of conditional methods for interval estimation of response rate. Other design variations, such as randomized Phase II or combined Phase II/Phase III studies, will also be presented.
Spring 2006
Tuesday, March 7, 2006 at 11 am
Speaker: Ms. Cindy Xin Wang, Department of Statistics, Northwestern University
Title: Gatekeeping Procedures Based on Weighted Bonferroni Tests for Multiple Endpoints in Dose Finding Studies
Abstract: In many dose finding studies there are hierarchically ordered endpoints (e.g., primary, secondary, etc.) and a given dose is compared with a control on any endpoint conditional on the tests on the higher-ordered endpoints being significant (serial gatekeeping). It is required to control the familywise error rate at a designated level taking into account multiplicity of tests. We give a closed procedure (Marcus, Peritz and Gabriel 1976) for this problem by applying the general and flexible tree-structured testing approach to gatekeeping problems developed in Dmitrienko, Wiens, Tamhane and Wang (2006). The proposed procedure uses weighted Bonferroni tests for testing intersection hypotheses. For an easier implementation of this closed procedure, we give an equivalent stepwise procedure that uses penalized Bonferroni tests for all endpoints except the last, for which it uses a penalized Holm test. The penalty charged at each step of testing is inversely proportional to a so-called rejection gain factor, which depends on the number of rejections at earlier steps and the weights assigned to those rejected hypotheses. The method is applied to diabetes drug trial data with three endpoints. Extensions in which the Bonferroni test is replaced with the Simes, resampling, or Dunnett test are indicated.
Tuesday, April 4, 2006, 11 am
Speaker: Ms. Yang Ge, Department of Statistics, Northwestern University
Title: On Consistency of Bayesian Inference with Mixtures of Logistic Regression Models
Abstract: This is a theoretical study of the consistency properties of Bayesian inference using mixtures of logistic regression models. When standard logistic regression models are combined in a ‘mixtures of experts’ set-up, a flexible model is formed to model the relationship between a binary (yes-no) response y and a vector of predictors x. Bayesian inference conditional on the observed data can then be used for regression and classification. This study gives conditions on choosing the number of experts (i.e., number of mixing components) k, or choosing a prior distribution for k, so that Bayesian inference is ‘consistent’, in the sense of ‘often approximating’ the underlying true relationship between y and x. The resulting classification rule is also ‘consistent’, in the sense of having near-optimal performance in classification. We show these desirable consistency properties with a nonstochastic k growing slowly with the sample size n of the observed data, or with a random k that takes large values with nonzero but small probabilities.
Monday, April 24, 2006 at 3:30 pm
Speaker: Professor Mohsen Pourahmadi, Division of Statistics, Northern Illinois University
Title: Generalized Linear Models for the Covariance Matrix of Longitudinal Data
Abstract: We survey the progress made in modelling covariance matrices from the perspective of generalized linear models (GLM) and show how one can move beyond the use of the identity and logarithmic link functions, and prespecified structures. Observing that most time-domain models (ARMA, state-space, ...) in time series analysis are means to diagonalize a Toeplitz covariance matrix via a unit lower triangular matrix (Cholesky decomposition), we discuss the distinguished role of the Cholesky decomposition in providing a systematic and data-based procedure for formulating and fitting parsimonious models for general covariance matrices guaranteeing the positive-definiteness of the estimates. Pulling together some techniques from regression and time series analyses provides the necessary tools for the procedure, which reduces the unintuitive task of modelling covariance matrices to that of a sequence of regression models. The procedure is illustrated using a real longitudinal dataset. Once a bona fide GLM framework for modelling covariances is found, its Bayesian, nonparametric, generalized additive and other extensions can be developed in direct analogy with the respective extensions of the traditional GLM.
Tuesday, May 2, 2006 at 11 am
Speaker: Professor Hakan Demirtas, Division of Epidemiology and Biostatistics, University of Illinois at Chicago
Title: Multiple imputation under Bayesianly smoothed random-coefficient hierarchical pattern-mixture models for nonignorably missing longitudinal data
Abstract: Conventional pattern-mixture models can be highly sensitive to model misspecification. In many longitudinal studies, where the nature of the drop-out and the form of the population model are unknown, interval estimates from any single pattern-mixture model may suffer from undercoverage, because uncertainty about model misspecification is not taken into account. In this talk, I will introduce a new class of Bayesian random coefficient pattern-mixture models to address potentially non-ignorable drop-out. Instead of imposing hard equality constraints to overcome inherent inestimability problems in pattern-mixture models, I propose to smooth the polynomial coefficient estimates across patterns using a hierarchical Bayesian model that allows random variation across groups. Using real and simulated data, I show that multiple imputation under a three-level linear mixed-effects model which accommodates a random level due to drop-out groups can be an effective method to deal with non-ignorable drop-out by allowing model uncertainty to be incorporated into the imputation process.
Papers that are relevant to this talk:
Demirtas, H. & Schafer, J.L. (2003). On the performance of random-coefficient pattern-mixture models for non-ignorable drop-out. Statistics in Medicine, 22, 2553-2575.
Demirtas, H. (2004). Modeling incomplete longitudinal data. Journal of Modern Applied Statistical Methods, Volume 3, No 2, 305-321.
Demirtas, H. (2005). Multiple imputation under Bayesianly smoothed pattern-mixture models for non-ignorable drop-out. Statistics in Medicine, 24, 2345-2363.
Demirtas, H. (2005). Bayesian analysis of hierarchical pattern-mixture models for clinical trials data with attrition and comparisons to commonly used ad-hoc and model-based approaches. Journal of Biopharmaceutical Statistics, Volume 15, Issue 3, 383-402.
Tuesday, May 9, 2006 at 11 am
Speaker: Professor Peter Song, Department of Statistics and Actuarial Science, University of Waterloo
Title: Maximization by Parts in Likelihood Inference
Abstract: In this talk I will present a new algorithm for solving a score equation for the maximum likelihood estimate in certain problems of practical interest. The method circumvents the need to compute second order derivatives of the full likelihood function. It exploits the structure of certain models that yield a natural decomposition of a very complicated likelihood function. In this decomposition, the first part is a log likelihood from a simply analyzed model and the second part is used to update estimates from the first. Convergence properties of this iterative (fixed point) algorithm are examined and asymptotics are derived for estimators obtained by using only a finite number of iterations. I will illustrate several examples in the presentation, including multivariate Gaussian copula models, nonnormal random effects models, generalized linear mixed models, and state space models. Properties of the algorithm and of estimators are discussed in detail via simulation studies on a bivariate copula model and a nonnormal linear random effects model.
Tuesday, May 16, 2006 at 11 am
Speaker: Professor Edward C. Malthouse, Department of Integrated Marketing Communications, Medill School, Northwestern University
Title: Conceptualizing and Measuring Media Engagement and its Effects
Abstract: We propose measuring the latent construct “media engagement” with a third-order confirmatory factor analysis model. The approach is tested using five large reader surveys of 100 and 50 newspapers, 100 magazines, and 39 and 8 media web sites. Over 400 qualitative interviews generated samples of items (questions) from the construct domain for the three media platforms. Consumer surveys measured the items on samples of readers. We used exploratory factor analysis (EFA) to develop scales measuring different dimensions of engagement and confirmatory factor analysis (CFA) to purify the scales further. Additional EFA was used to identify higher-order factors, which were then tested with confirmatory models. We contrast the higher-order factor structure for the three media platforms. Predictive validity is assessed by relating the engagement factors to outcome measures such as usage with random coefficient models and ridge regression. Three quasi-experiments evaluate the effect of engagement on advertising effectiveness.
(This is joint research with Bobby Calder, Marketing Department, Kellogg.)
Tuesday, May 30, 2006 at 11 am
Speaker: Professor Torben G. Andersen, Kellogg School of Management, Northwestern University and NBER
Title: Continuous-Time Models, Realized Volatilities, and Testable Distributional Implications for Daily Stock Returns
Abstract: We provide a framework for analyzing and understanding daily return distributions within the context of traditional continuous-time asset price processes. We develop a sequence of simple-to-implement distributional tests from transformed inter-daily returns. They hinge on the availability of intraday data for construction of nonparametric realized variation measures and jump detection statistics. Each step speaks to key features of the process underlying the discretely observed prices and should help in developing empirically more realistic models. For thirty large stocks, we find that time-varying diffusive volatility, jumps and leverage effects are all critical in order to describe the dynamic dependencies in the observed prices.
Coauthors:
Tim Bollerslev, Dept. of Economics and Fuqua School, Duke University and NBER
Per H. Frederiksen, Jyske Bank, Denmark
Morten Ø. Nielsen, Dept. of Economics, Cornell University
Fall 2005
Monday, October 17, 2005 at 11 am
Speaker: Professor Denise Scholtens, Department of Preventive Medicine, Northwestern University
Title: Local modeling of global interactome networks
Abstract: Accurate systems biology modeling requires a complete catalog of protein complexes and their constituent proteins. We discuss a graph theoretic/statistical algorithm for local dynamic modeling of protein complexes using data from affinity purification-mass spectrometry experiments. The algorithm readily accommodates multicomplex membership by individual proteins and dynamic complex composition, two biological realities not accounted for in existing topological descriptions of the overall protein network. A penalized likelihood approach guides the protein complex modeling algorithm. With an accurate complex membership catalog in place, systems biology can proceed with greater precision.
Monday, October 31, 2005 at 11 am
Speaker: Professor Hua Yun Chen, Department of Epidemiology & Biostatistics, University of Illinois at Chicago
Title: Approximation to locally semiparametric efficient scores in missing data problems through likelihood robustification
Abstract: In parametric/semiparametric models with missing data, the efficient estimator often cannot be obtained without additional model assumptions, even if the efficient estimator has a simple form when no data are missing. Robins et al. proposed finding the locally efficient estimator as a compromise and showed that the locally efficient estimator has the doubly robust property when the data are missing at random (MAR) in Rubin's sense. In practice, the approach proposed by Robins et al. for finding a locally efficient estimator can be very challenging to implement. We propose an alternative representation of the efficient score through likelihood robustification. The proposed representation is straightforward to obtain, can be applied to missing data with arbitrary missing patterns, and is amenable to computing the locally efficient score. The estimator based on the proposed representation has the doubly robust property when missing data are MAR, and only requires correct specification of the missing data mechanism model for consistency when missing data are nonignorable. Estimation and inference procedures for the parameters are proposed. Applications of the proposed method are illustrated by examples. The performance of the approach is examined by a simulation study.
Monday, November 14, 2005 at 11 am
Speaker: Dr. Guei-Feng (Cindy) Tsai, Department of Statistics, Northwestern University
Title: Semi-nonparametric Models and Inference for High Dimensional Microarray Data
Abstract: We develop a new approach to analyze high dimensional cell-cycle microarray data with no replicates. There are two kinds of correlations for cell-cycle microarray data. Measurements are correlated within a gene, and measurements are also correlated between genes since some genes may be biologically related. The proposed procedure combines a classification method, the quadratic inference function method and nonparametric techniques for complex high dimensional data. We first perform a gene classifying analysis to classify genes into classes with similar cell-cycle patterns, including a class with no cell-cycle phenomena at all. We use genes within the same class as pseudo-replicates to build nonparametric models and inference functions. In order to incorporate correlation of longitudinal measurements, the quadratic inference function method is also applied. This approach allows us to perform chi-squared tests for testing whether the coefficients are time varying or not. This also allows us to determine whether certain genes regulate cell cycles. A real data example on cell-cycle microarray data and simulation studies are presented.
Friday, December 9, 2005 at 11 am
Speaker: Dr. Alex Dmitrienko, Eli Lilly
Title: Branching tests in clinical trials with multiple objectives
Abstract: This talk discusses branching multiple tests with clinical trial applications. Branching tests arise in clinical trials with hierarchically ordered multiple objectives, for example, in the context of multiple dose-control tests with logical restrictions or analysis of multiple endpoints. The proposed branching approach is based on the principle of closed testing and generalizes the serial and parallel gatekeeping approaches. The branching testing methodology will be illustrated using a clinical trial with multiple endpoints (primary, secondary and tertiary) and multiple objectives (superiority and non-inferiority testing) as well as a dose-finding trial with multiple endpoints.
