Past and Upcoming Events

To receive seminar reminders via e-mail, please send a request to hongmei@northwestern.edu or jzwang@northwestern.edu.
All the seminars are held in the meeting room of the Department of Statistics, 2006 Sheridan Road, Evanston, IL 60208.

 

Fall 2009

Wednesday November 18, 2009, 11am

Speaker: Jie Yang, Assistant Professor, Dept. of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago

Title: Classification Model Based on Permanent Process
Abstract: This talk introduces a new statistical model based on a permanent process for supervised classification problems. Unlike many models in the literature, the permanent model assumes only exchangeability, rather than independence, of the observations. Regardless of the number of classes or the dimension of the feature variables, the model may require only 2-3 parameters to fit the covariance structure within clusters. It works well even if a class occupies non-convex or disjoint regions, or regions that overlap with other classes in the feature space. An application to DNA microarray analysis indicates that the permanent model is better able to handle high-dimensional data: it can employ more feature variables efficiently and reduce the prediction error significantly. This is critical when the true classification relies on non-reducible high-dimensional features.

 

Wednesday December 2, 2009, 11am

Speaker: Beth Andrews, Assistant Professor of Statistics, Northwestern University

Title:
Abstract:

 

Wednesday November 4, 2009, 11am

Speaker: Donald Hedeker, Professor of Biostatistics, University of Illinois at Chicago

Title: Multilevel Models for Ecological Momentary Assessment (EMA) Data: An Application of a Mixed-Effects Location Scale Model
Abstract: For longitudinal data, multilevel models include random subject effects to indicate how subjects influence their responses over the repeated assessments. The error variance and the variance of the random effects are usually considered to be homogeneous. These variance terms characterize the within-subjects (i.e., error variance) and between-subjects (i.e., random-effects variance) variation in the data. In studies using Ecological Momentary Assessment (EMA), up to thirty or forty observations are often obtained for each subject, and interest frequently centers on changes in the variances, both within and between subjects. This presentation focuses on an adolescent smoking study using EMA, where the interest is in characterizing changes in mood variation. To address this, the multilevel model allows covariates to influence the mood variances. In addition, a subject-level random effect is added to the within-subject variance specification. This allows subjects to influence both the mean, or location, and the variability, or (square of the) scale, of their mood responses. The location and scale random effects are also allowed to be correlated. These mixed-effects location scale models have useful applications in many research areas where interest centers on the joint modeling of the mean and variance structure.

 

Wednesday October 21, 2009, 11am

Speaker: Dr. Liqun Xi, Northwestern University

Title: The minimum capture proportion for reliable estimation in capture-recapture models
Abstract: For capture-recapture models, a reliable estimate of the population size is possible only for a reasonably large capture proportion, especially for a heterogeneous population. In all capture-recapture models, how large the capture proportion must be to ensure a reliable estimate of the population size is an important question. In this seminar, an approach for obtaining the minimum capture proportion for reliable estimation in capture-recapture models is introduced. Some results are presented, along with an application of the proposed results to a real capture-recapture data set.

 

Wednesday October 7, 2009, 11am

Speaker: Yuan Liao, Northwestern University

Title: Posterior Consistency of Nonparametric Conditional Moment Restricted Models
Abstract: This paper considers the nonparametric conditional moment restricted model previously studied by Ai and Chen (2003). We estimate the nonparametric structural function in a Bayesian way: we first transform the conditional restrictions equivalently into an infinite number of unconditional moment restrictions, and then derive the posterior distribution of the parameter of interest based on the limited information likelihood. We focus on the frequentist properties of the posterior distribution, allowing the nonparametric structural function to be partially identified. It is shown that the posterior converges to any neighborhood of the identified region. Finally, we apply our results to the nonparametric instrumental regression model and the single index model.

 

Some Past Seminars

Spring 2009

Friday, April 17, 2009 at 2 pm (note the unusual time and location)

Joint Seminar with IEMS Department
Seminar Room: Tech, Room M228 (2145 Sheridan Road, Evanston)
Speaker: Cyrus R. Mehta, President of Cytel Corporation & Adjunct Professor of Biostatistics, Harvard University
Title: Design and Implementation of Late-Stage Adaptive Trials: Experiences of an Industry Consultant
Abstract: Sound statistical principles combined with careful planning of the logistical details are essential for successful implementation of an adaptive clinical trial. In this presentation I will share my experience as a consultant involved in several late-stage adaptive designs. Topics that I will cover include sample size re-estimation, population enrichment, and seamless phase II/III designs. Each topic will be illustrated with a real case study. The crucial role of simulation will be highlighted. Regulatory experiences will be discussed.

 

Wednesday, May 6, 2009 at 11 am 

Speaker: Joseph Kang, Assistant Professor of Biostatistics, Department of Preventive Medicine, Northwestern University

Title: Causal inference for weight control behaviors among adolescent girls

Abstract: Overweight and obesity often begin in childhood, but few successful models exist for the prevention and treatment of obesity in children and adolescents. Among adolescent girls, dieting may prospectively predict weight gain. Because of possible reciprocal causality between dieting and weight gain, a quantitative causal analysis is necessary. Over the past two decades, the Rubin Causal Model (RCM) has become an established framework for designing studies that quantify causal effects. In this talk, some inferential strategies using the RCM will be discussed in order to estimate the causal effects of weight control behaviors. The first part of this talk will be dedicated to a discussion of 1) recent semiparametric methods to infer causal estimands in the RCM and 2) the importance of moderators that change the causal effects of dieting. The second part will discuss an extended RCM that adjusts for the measurement error of weight control behaviors using a latent class model. We use the National Longitudinal Study of Adolescent Health (Add Health) data set for all analyses.



Wednesday, May 13, 2009 at 11 am 

Speaker: Viktor Todorov, Assistant Professor, Department of Finance, Kellogg School of Management, Northwestern University

Title: Limit Theorems for Power Variations of Pure-Jump Processes with Application to Activity Estimation

Abstract: This paper derives the asymptotic behavior of the realized power variation of pure-jump Itô semimartingales as the sampling frequency within a fixed interval increases to infinity. We prove convergence in probability and an associated central limit theorem for the realized power variation, viewed as a function of the power, in a function space equipped with a local uniform topology. We apply the limit theorems to propose an efficient adaptive estimator of the activity of a discretely sampled Itô semimartingale over a fixed interval.
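For readers unfamiliar with the statistic, the realized power variation of a path observed at equally spaced times is the sum of the absolute increments raised to the power p. The sketch below assumes equally spaced observations x[0], ..., x[n] on the fixed interval; the function names are hypothetical, and it only computes the raw statistic (on a grid of powers, echoing the "function of the power" view). It does not reproduce the paper's scaling, limit theorems, or activity estimator.

```python
def realized_power_variation(x, p):
    """Sum of |x[i] - x[i-1]| ** p over consecutive observations on a fixed interval."""
    if p <= 0:
        raise ValueError("the power p must be positive")
    # zip(x, x[1:]) pairs each observation with the next one.
    return sum(abs(b - a) ** p for a, b in zip(x, x[1:]))


def power_variation_curve(x, powers):
    """Realized power variation evaluated on a grid of powers, i.e. as a function of p."""
    return {p: realized_power_variation(x, p) for p in powers}
```

For example, `power_variation_curve([0.0, 1.0, -1.0, 2.0], [1, 2])` returns `{1: 6.0, 2: 14.0}`, since the increments are 1, -2, and 3.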

 

Wednesday, May 27, 2009 at 11 am 

Speaker: Junhui Wang, Assistant Professor, Department of Mathematics, Statistics, and Computer Science, UIC

Title: To be announced

Abstract: To be announced

 

Fall 2008

Friday, October 3 at 1 pm

Speaker: Professor Terry Speed, Statistics, UC Berkeley and Bioinformatics, WEHI

Title: Some statistical issues arising with next-generation DNA sequencing data

Abstract: Next-generation sequencing machines are now producing tens of millions of short sequencing reads. These need to be mapped back to a reference genome, if there is one, and then further processed in a way that varies with the task. For mRNA-seq, the reads need to be assigned to genes, exons, or other transcriptional units, and counted. For ChIP-seq, we need to find putative binding sites. What sort of statistical issues arise, and how should we proceed with the analyses? Some initial ideas will be presented.

 

Thursday, October 23 at 4 pm

Speaker: Professor Ingram Olkin, Department of Statistics, Stanford University

Title: Life Distributions in Survival Analysis and Reliability: Structure of Semiparametric Families

Abstract: Semiparametric families are families that have both a real parameter and a parameter that is itself a distribution. A number of semiparametric families suitable for lifetime data in survival or reliability are introduced: scale, power, frailty (proportional hazards), age, moment, and others. Interesting results on stochastic orderings are obtained for these families. The coincidence of two families provides a characterization of the underlying distribution. Some of the characterization results provide a rationale for the use of certain families. In this talk we provide an overview of these semiparametric families and present several characterizations.

This work is joint with Albert W. Marshall.


Spring 2008

Wednesday, April 9 at 12 pm

Speaker: Professor Heping Zhang, Biostatistics, Director of Collaborative Center for Statistics in Science, Yale University.

Title: Joint Modeling of Time Series Measures and Recurrent Events and Analysis of the Effects of Air Quality on Respiratory Symptoms

Abstract: Exposure to ambient pollutants at concentrations above defined standards is a risk factor for respiratory symptoms, especially in sensitive children. Many studies have been undertaken to monitor air quality and to assess its association with respiratory symptoms. We propose a joint mixed effects regression model of time series measures and recurrent events to analyze the air quality and respiratory symptom data from the Yale Mothers and Infants Health Study.

Three mothers' symptoms (runny nose, cough, and sore throat) and three infants' symptoms (runny nose, cough, and general sickness) were investigated. To alleviate the computational complexity, a two-stage maximum likelihood based estimation procedure is introduced to estimate the parameters, and simulation studies are conducted to assess the validity of this estimation procedure.

Our analysis reveals differences in the etiology of respiratory symptoms between mothers and infants. Most notably, coarse particulate matter between 2.5 and 10 microns in diameter increased the risks of mothers' runny nose and cough symptoms, but had no significant impact on any of the three infants' symptoms. The sulfate level was negatively associated with the risk of infants' runny nose and cough symptoms, but had no significant effect on any of the three mothers' symptoms. A high level of humidity was negatively associated with mothers' cough incidence, but had no significant association with any of the three infants' symptoms. Such differences not only reveal the sensitivity of mothers and infants to air quality, but also call for further understanding of the differences. It is possible that actions taken by mothers to overcome humidity may inadvertently affect the infants.

This is joint work with Yuanqing Ye, Peter Diggle, and Jian Shi.

 

Wednesday, April 23 at 12 pm

Speaker: Professor Dan Nordman, Department of Statistics, Iowa State University

Title: Tapered empirical likelihood for time series data

Abstract: This talk aims to motivate and describe a formulation of empirical likelihood for time series inference based on tapered data blocks. Data blocks are a device for capturing time dependence, and the proposed method involves tapering these blocks in a special way. The resulting empirical likelihood has chi-squared limits for nonparametrically calibrating confidence intervals for time series parameters, such as means and correlations. Tapering is shown to improve the chi-squared approximation and enhance the coverage accuracy of intervals compared to untapered empirical likelihood versions. Simulation evidence is provided, and block choices are considered as well.

 

 

Wednesday, May 7 at 12 pm

Speaker: Professor Ginger Davis, Department of Systems and Information Engineering, University of Virginia

Title: Hierarchical Bayesian Markov Switching Models with Application to Predicting Spawning Success of Shovelnose Sturgeon

Abstract: The timing of spawning in fish is tightly linked to environmental factors; however, these factors are not well understood for many species. Specifically, little information is available to guide recruitment efforts for endangered species such as the sturgeon. Therefore, we propose a Bayesian hierarchical model for predicting spawning success of the shovelnose sturgeon that uses both biological and behavioral (longitudinal) data. In particular, we use data produced from a tracking study conducted in the Lower Missouri River. These data consist of biological variables associated with readiness to spawn, along with longitudinal behavioral data collected using telemetry and data storage device sensors. These high-frequency data are complex both biologically and in the underlying behavioral process. To accommodate such complexity, our model uses an eigenvalue predictor, derived from the transition probability matrix of a two-state Markov switching model with GARCH dynamics, as a generated regressor in a hierarchical linear regression model. Finally, to minimize the computational burden associated with estimating this model, a parallel computing approach is proposed.

 

Wednesday, May 14 at 12 pm

Speaker: Professor Xiaofeng Shao, Department of Statistics, University of Illinois at Urbana-Champaign

Title: Portmanteau tests in time series

Abstract: This talk consists of two parts. In the first part, we will talk about testing for white noise and its applications to goodness-of-fit testing of long memory time series models. The limitations of the current asymptotic theory for portmanteau tests will be pointed out, and new theoretical results will be discussed. In the second part, we will introduce generalized portmanteau-type test statistics in the frequency domain to test independence between two stationary time series. Unlike existing tests, these allow each time series to possess short memory, long memory, or anti-persistence. Under the null hypothesis of independence, the asymptotic null distributions of the proposed statistics are standard normal. The results of a simulation study will also be presented.
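As background for the first part, a classical portmanteau test for white noise is the Ljung-Box statistic Q = n(n+2) * sum_{k=1}^{m} r_k^2 / (n - k), where r_k is the lag-k sample autocorrelation; under an i.i.d. null it is conventionally compared with a chi-squared distribution on m degrees of freedom. The sketch below computes only this standard statistic. It is not the speaker's new theory or the frequency-domain statistics described above, and the function names are hypothetical. The talk's point is precisely that the usual asymptotic theory for such tests has limitations, for example for long memory models.

```python
def sample_autocorrelation(x, k):
    """Lag-k sample autocorrelation r_k, using the full-sample mean and variance."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum(dev[t] * dev[t - k] for t in range(k, n))
    return num / denom


def ljung_box(x, m):
    """Ljung-Box portmanteau statistic using autocorrelations at lags 1..m."""
    n = len(x)
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < len(x)")
    return n * (n + 2) * sum(
        sample_autocorrelation(x, k) ** 2 / (n - k) for k in range(1, m + 1)
    )
```

For the alternating series [1, -1, 1, -1], r_1 = -0.75 and `ljung_box([1, -1, 1, -1], 1)` returns 4.5.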

Winter 2008

Wednesday, February 13 at 12 pm

Speaker: Lu Tian, Assistant Professor, Department of Preventive Medicine, Northwestern University

Title: Lasso Regularization for the Accelerated Failure Time Model

Abstract: It is challenging to develop a stable regression model for predicting failure time outcomes when the dimension of the covariates is large relative to the sample size. A further complication arises because failure time responses are often not completely observed due to right censoring. In this paper, we propose to couple LASSO-type regularization methods with Gehan's rank-based estimator in the setting of the accelerated failure time model to construct a stable and parsimonious prediction model. Unlike the inverse probability weighting approach, the proposed estimators are valid under the general noninformative censoring assumption. We also propose an efficient numerical algorithm for obtaining the entire regularization path to facilitate the adaptive selection of the tuning parameter. We illustrate the proposed methods with an application to predicting the survival time of breast cancer patients based on a set of clinical prognostic factors and collected gene signatures, and we evaluate their finite-sample performance through a simulation study.


Wednesday, February 27 at 12 pm

Speaker: Peter McCullagh, John D. MacArthur Distinguished Service Professor, Department of Statistics, University of Chicago

Title: Sampling bias and logistic models

Abstract: In a regression model, the joint distribution for each finite sample of units is determined by a function p_x(y) depending only on the list of covariate values x = (x(u_1), ..., x(u_n)) on the sampled units. No random sampling of units is involved. In biological work, random sampling is frequently unavoidable, in which case the joint distribution p(y, x) depends on the sampling scheme. Regression models can be used for the study of dependence provided that the conditional distribution p(y | x) for random samples agrees with p_x(y) as determined by the regression model for a fixed sample having a non-random configuration x. This paper develops a model that avoids the concept of a fixed population of units, thereby forcing the sampling plan to be incorporated into the sampling distribution. For a quota sample having a predetermined covariate configuration x, the sampling distribution agrees with the standard logistic regression model with correlated components. For most natural sampling plans such as sequential or simple random sampling, the conditional distribution p(y | x) is not the same as the regression distribution unless p_x(y) has independent components. In this sense, most natural sampling schemes involving binary random-effects models are biased. The implications of this formulation for subject-specific and population-averaged procedures are explored.

 

Wednesday, March 5 at 12 pm

Speaker: Sandy L. Zabell, Professor, Department of Statistics and Department of Mathematics, Northwestern University

Title: On Student’s 1908 paper “The probable error of a mean”

Abstract: This month marks the one-hundredth anniversary of the appearance of William Sealey Gosset’s celebrated paper “The probable error of a mean”. Gosset’s elegant contribution represented the first in a series of exact, “small-sample” results that were developed by Gosset, Fisher, and others to form a central component of the modern theory of statistical inference. This talk celebrates the centenary of Gosset’s paper by discussing both its background and its impact on modern statistical theory and practice.

Wednesday, March 12 at 12 pm

Speaker: Rong Chen, Professor, Department of Statistics, Rutgers University

Title: Constrained Sequential Monte Carlo (CSMC)

Abstract: Sequential Monte Carlo (SMC) methodologies have shown great promise in solving very high-dimensional and complex problems often encountered in applications such as communications, bioinformatics, and financial data analysis. The key to a successful SMC implementation is efficiency, not only in terms of statistical inference accuracy but also in terms of computational complexity. Efficiency is directly related to the design of the key components of SMC, including the intermediate distributions, the trial 'growth' distribution, and the resampling method. Many applied problems share a common feature: the target distribution is highly constrained. That is, the target distribution is a truncated distribution on an ill-shaped subspace of a high-dimensional space. Without careful treatment, the constraints are a main obstacle to successful implementation of SMC. In this talk, we develop a set of algorithms, categorized as Constrained Sequential Monte Carlo (CSMC), for solving such problems, including strategies for designing the intermediate distributions, the trial distributions, the resampling steps, and Markov moves within CSMC.


Spring 2007

Tuesday, March 27 at 11 am

Speaker: Wei Biao Wu, Assistant Professor, Department of Statistics, The University of Chicago

Title: New Perspectives in the Theory of Time Series

Abstract: I will present a unified framework for a large-sample theory of time series. Topics in classical time series analysis will be revisited, including the estimation of covariances, spectral densities, and long-run variances, as well as linear prediction. I will also talk about the estimation of high-dimensional covariance matrices and inference on the mean and quantiles of non-stationary processes. In the second part I will discuss dependence, a fundamental concept in statistics. Our viewpoint provides new insights into the study of complicated random systems. I will also discuss relations with nonlinear system theory, experimental design, information theory, risk-metrics theory, and high-dimensional covariance matrix estimation.


Tuesday, April 9 at 1 pm

Speaker: Vadim Linetsky, Professor, Department of Industrial Engineering and Management Sciences, Northwestern University

Title: Time-Changed Markov Processes in Asset Pricing

Abstract: Time-changing a stochastic process, a procedure going back to S. Bochner, allows one to construct new processes from a given process by running it on a new clock that can itself be a non-decreasing stochastic process (a random time). When the process to be time-changed is a Markov process and the Laplace transform of the time change is known, there is an explicit representation of the expectation operator of the time-changed process in terms of the resolvent of the original Markov process and the Laplace transform of the time change. We use this result to build a rich toolbox of analytically tractable asset pricing models in finance that incorporate stochastic volatility, state-dependent jumps, and state-dependent killing rates (or default intensities). Among the resulting models are a new credit-equity model that extends the constant elasticity of variance (CEV) model with stochastic volatility, jumps, and default, as well as extensions of the Cox-Ingersoll-Ross and Ornstein-Uhlenbeck models with mean-reverting jumps.

 

Tuesday, May 8 at 11 am

Speaker: Hui Xie, Assistant Professor, School of Public Health, University of Illinois at Chicago

Title: A Local Sensitivity Analysis Approach to Longitudinal Non-Gaussian Data with Nonignorable Dropout

Abstract: Analyzing longitudinal non-Gaussian data subject to potentially nonignorable dropout is a challenging problem. Very often the data contain little information about the dropout mechanism. As a result, an analysis frequently has to rely on strong but unverifiable assumptions, among which ignorability is a key one. Sensitivity analysis has been advocated to assess the likely effect of alternative assumptions about the dropout mechanism on such an analysis. Previously, Ma et al. (2005) applied a general index of local sensitivity to nonignorability (ISNI) (Troxel et al. 2004) to measure the sensitivity of MAR estimates to small departures from ignorability for multivariate normal outcomes. In this paper, we extend the ISNI methodology to handle longitudinal non-Gaussian data subject to nonignorable dropout. Specifically, we propose to quantify the sensitivity of inferences in the neighborhood of an MAR generalized linear mixed model (GLMM) for longitudinal data. We evaluate the performance of the proposed methodology through a simulation study, and then illustrate it with a real example using smoking cessation data.


Tuesday, May 22 at 11 am

Speaker: Hira L. Koul, Professor, Department of Statistics and Probability, Michigan State University

Title: Model Diagnostics via Martingale Transforms

Abstract: Two classical problems in statistics are fitting a distribution up to unknown location-scale parameters and fitting a parametric model to the regression-autoregressive function. The first problem is generic to many other statistical models, including the celebrated regression and autoregressive models and the (generalized) autoregressive conditionally heteroscedastic (ARCH-GARCH) models, where one tests whether the innovations come from a given distribution. It will be argued that Khmaladze's martingale transformation of the residual empirical process, which yields asymptotically distribution-free tests for the one-sample location-scale model, does the same for parametric heteroscedastic regression models and ARCH-GARCH models. Analogous tests for the second problem will also be discussed.


Friday, May 25 at 3 pm

Speaker: Cliff Spiegelman, Professor, Department of Statistics, Texas A&M University

Title: Statistical considerations on the process of discovering and validating biomarker candidates using MS platforms
(Joint with Lorenzo J. Vega Montoto and Asokan Mulayath Variyath)

Abstract: Claims have been made that the application of supervised pattern recognition methodology can be used with MS proteomic data to achieve near perfect sensitivity and specificity for detecting early stage cancer. So far those claims have not been verified in part due to the use of less than optimal experimental design, but in the interim significant effort has been spent on proteomic biomarker discovery research (without significant positive results) largely using tandem MS platforms. Underpinning the proteomics studies are several key components including standardization of materials, bioinformatics, reagent development, MS improvements, and statistics. This presentation discusses the NCI CPTAC program generally and a related mouse studies project. Several areas where statistical design of experiment input is present will be discussed.

 

Thursday, June 7 at 11 am

Speaker: Archana Singh, PhD student, Department of Computer Science, University of Tsukuba, Japan and National Food Research Institute, Tsukuba, Japan

Title: Robustness of FDR Method in Brain Mapping Studies using Functional Near-Infrared Spectroscopy
(Joint with Ippeita Dan)

Abstract: Near-infrared spectroscopy (NIRS) is an emerging non-invasive technique that allows monitoring of brain activity in infants, patients, and healthy subjects with greater ease of application than other techniques, because it is portable, is more permissive of subjects' movements, and allows brain monitoring in a more natural setting. It allows simultaneous measurements through many channels, ranging from fewer than ten to around two hundred, thus escalating the issue of multiple testing. To date, only a few studies have considered this issue, using the Bonferroni correction, which tends to be conservative for spatially correlated fNIRS data. In addition, its power is inversely proportional to the number of channels, which varies among fNIRS experiments depending on the selected region of interest (ROI), thereby leading to subjective inference. This problem may be circumvented by a more contemporary approach, the false discovery rate (FDR). In this session, I will illustrate how the application of FDR procedures can provide more objective and more powerful inference than the Bonferroni method in neuroimaging analysis of real data. In addition, I will present the results of a simulation analysis showing that FDR provides greater sensitivity while maintaining conventional specificity control.
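To make the Bonferroni versus FDR contrast concrete, the sketch below compares the Bonferroni rule with the Benjamini-Hochberg step-up procedure, one standard FDR procedure. The abstract does not say which FDR procedures the talk uses, so this choice is an assumption, and the function names and p-values are hypothetical. Note that Benjamini-Hochberg guarantees FDR control under independence or positive dependence of the test statistics, which is relevant for spatially correlated channels.

```python
def bonferroni(pvals, alpha=0.05):
    """Indices rejected by Bonferroni: p_i <= alpha / m (controls the familywise error rate)."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]


def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by increasing p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank  # largest rank whose p-value clears its threshold rank * alpha / m
    # Reject every hypothesis whose p-value ranks at or below k_max.
    return sorted(order[:k_max])
```

With four channels and p-values [0.01, 0.02, 0.03, 0.5] at alpha = 0.05, Bonferroni uses the single threshold 0.0125 and rejects only channel 0, while Benjamini-Hochberg compares the sorted p-values with 0.0125, 0.025, 0.0375, and 0.05 and rejects channels 0, 1, and 2. The Bonferroni threshold shrinks as channels are added, which is the dependence on the channel count that the abstract criticizes.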

Winter 2007

Tuesday, January 23 at 2 pm

Speaker: Joel L. Horowitz, Charles E. and Emma H. Morrison Professor of Market Economics, Department of Economics, Northwestern University

Title: Nonparametric Instrumental Variables Estimation of a Quantile Regression Model
(Joint with Sokbae Lee)

Abstract: We consider nonparametric estimation of a regression function that is identified by requiring a specified quantile of the regression "error" conditional on an instrumental variable to be zero. The resulting estimating equation is a nonlinear integral equation of the first kind, which generates an ill-posed-inverse problem. The integral operator and distribution of the instrumental variable are unknown and must be estimated nonparametrically. We show that the estimator is mean-square consistent, derive its rate of convergence in probability, and give conditions under which this rate is optimal in a minimax sense. The results of Monte Carlo experiments show that the estimator behaves well in finite samples.

 

Tuesday, February 6 at 2 pm

Speaker: Leah J. Welty, Assistant Professor, Department of Preventive Medicine, Northwestern University

Title: Bayesian Distributed Lag Models: Estimating Effects of Particulate Matter Air Pollution on Daily Mortality

Abstract: A distributed lag model (DLM) is a regression model that includes lagged exposure variables as covariates; its corresponding distributed lag (DL) function describes the relationship between the lag and the coefficient of the lagged exposure variable. DLMs have recently been used in environmental epidemiology for quantifying the cumulative effects of weather and air pollution on mortality and morbidity. Standard methods for formulating DLMs include unconstrained, polynomial, and penalized spline DLMs. These methods may fail to take full advantage of prior information about the shape of the DL function for environmental exposures, or for any other exposure with effects that are believed to smoothly approach zero as lag increases, and are therefore at risk of producing sub-optimal estimates.

We propose a Bayesian DLM (BDLM) that incorporates prior knowledge about the shape of the DL function and also allows the degree of smoothness of the DL function to be estimated from the data. In a simulation study, we compare our Bayesian approach with alternative methods that use unconstrained, polynomial, and penalized spline DLMs. We also show that BDLMs encompass penalized spline DLMs: under certain assumptions, imposing a prior on the DL coefficients is analogous to smoothing the DL coefficients with a penalty specified by the prior. We apply our BDLM to data from the National Morbidity, Mortality, and Air Pollution Study (NMMAPS) to estimate the short-term health effects of particulate matter air pollution on mortality from 1987-2000 for Chicago, Illinois.

 

Tuesday, February 20 at 2 pm

Speaker: Ruey S. Tsay, H.G.B. Alexander Professor of Econometrics and Statistics, Graduate School of Business, The University of Chicago

Title: The Dynamics of Threshold Interest Rate Models
(Joint with Shang C. Chiou)

Abstract: We propose a two-factor arbitrage-free term structure model for interest rates, where the short-term interest rate follows a threshold model with stochastic volatility. Under the proposed model, the number of thresholds is unknown and must be determined endogenously by a model selection procedure. To estimate the proposed model, we develop an efficient Bayesian method by transforming the threshold problem into a structural-break problem. A simulation study shows that the proposed Bayesian method provides accurate estimates of the thresholds and the associated model parameters. In applications, the U.S. data strongly favor the newly proposed model over other models with constant volatility. We further compare the threshold model with its affine counterpart and the Markov-switching model, demonstrating the significant difference made by using thresholds. We find that the threshold model implies a kinked yield function and can generate an inverted yield curve. In addition, for U.S. monthly bond yields with 11 maturities (1 to 6 months and 1 to 5 years), the threshold model has smaller out-of-sample pricing errors than other models, especially for long-term yields.

Tuesday, March 6 at 2 pm

Speaker: Ying Wei, Assistant Professor, Department of Biostatistics, Columbia University

Title: A Dynamic Quantile Regression Transformation Model for Longitudinal Data

Abstract: This paper describes a flexible nonparametric quantile regression model for longitudinal data. The basic elements of the model are a time-dependent power transformation of the longitudinal dependent variable and a varying-coefficient model for the conditional quantile functions. A two-step estimation procedure is proposed to fit the model, and its consistency is established. Tuning parameters are chosen by generalized cross-validation in conjunction with a Schwarz-type information criterion. The proposed method is illustrated with data on the time evolution of CD4 cell counts in HIV-1 infected patients under three different treatments. The quantile regression approach for longitudinal data enables the construction of pointwise prediction bands for individual trajectories without requiring parametric distributional assumptions. This is joint work with Prof. Yunming Mu of Texas A&M University.


Fall 2006

Monday, September 25, 2006, 11 am

Speaker: Professor Ji-Ping Wang, Department of Statistics, Northwestern University

Title: Statistical models for nucleosome DNA alignment and linker length prediction in Eukaryotic cells

Abstract: Eukaryotic DNA exists in a highly compacted form known as chromatin. The nucleosome is the fundamental repeating subunit of chromatin, formed by wrapping a short stretch of DNA, 147 bp in length, around four pairs of histone proteins. Nucleosomal DNA obtained by experiments, however, varies in length because of imperfect digestion. We develop a mixture model that characterizes the known dinucleotide periodicity probabilistically to improve the alignment of nucleosomal DNA sequences. To further investigate chromatin structure, we experimentally cloned and sequenced di-nucleosome sequences from yeast. Each di-nucleosome sequence roughly covers two nucleosomes (located toward the two ends) with a linker DNA in between. A hidden Markov model (HMM) is trained on the nucleosome sequence alignment to predict nucleosome positioning. Results show that, in chromatin formation, eukaryotic cells favor linker lengths that are periodic on a roughly 10 bp basis.

Monday, October 9, 2006, 11 am

Speaker: Professor Zhigang Zhang, Department of Statistics, Oklahoma State University

Title: A Class of Transformed Mean Residual Life Models with Censored Survival Data

 

Monday, October 23, 2006, 11 am

Speaker: Dr. Lanju Zhang, Department of Biostatistics, MedImmune Inc.

Title: Response-Adaptive Randomization for Survival Trials: The Parametric Approach

Abstract: Few papers in the literature deal with response-adaptive randomization procedures for survival outcomes, and those that do either dichotomize the outcomes or use a nonparametric approach. In this talk, the optimal allocation approach and a parametric response-adaptive randomization procedure are used under exponential and Weibull distributions. The optimal allocations are derived for both distributions, and the doubly-adaptive biased coin design is applied to target them. The asymptotic variance of the procedure is obtained for the exponential distribution. The effect of the intrinsic delay in observing survival outcomes is also addressed. These findings are established by rigorous theory and verified by simulation. We illustrate our procedure by redesigning a clinical trial.
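The doubly-adaptive biased coin design randomizes each new patient with a probability that depends on both the current allocation proportion and the current estimate of the target allocation. Below is a minimal sketch of the allocation function as it is commonly stated (Hu and Zhang, 2004). The function name dbcd_allocation is hypothetical, the target rho is simply taken as given (in the talk it would come from the derived optimal allocation, which is not reproduced here), and gamma = 2 is only an illustrative default.

```python
def dbcd_allocation(x, rho, gamma=2.0):
    """Probability of assigning the next patient to arm A (hypothetical name).

    x     : proportion of patients assigned to arm A so far
    rho   : current estimate of the target allocation to arm A, 0 < rho < 1
    gamma : tuning parameter; gamma = 0 randomizes with probability rho, and
            larger values push the allocation back toward rho more strongly
    """
    if x <= 0.0:          # nobody on arm A yet: assign to A
        return 1.0
    if x >= 1.0:          # everybody on arm A so far: assign to B
        return 0.0
    a = rho * (rho / x) ** gamma
    b = (1.0 - rho) * ((1.0 - rho) / (1.0 - x)) ** gamma
    return a / (a + b)
```

For example, with target rho = 0.5 and only 40% of patients so far on arm A, dbcd_allocation(0.4, 0.5) is 9/13, about 0.69, so the next patient is pushed toward arm A.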


Monday, October 30, 2006, 11 am

Speaker: Professor Jan Hannig, Department of Statistics, Colorado State University

Title: Statistical Model for Tracking with Applications

Abstract: We propose a new tracking model that allows for the birth, death, splitting, and merging of targets. Targets are also allowed to go undetected for several frames. The splitting and merging of targets are a novel addition to a statistically based tracking model. This addition is essential for tracking storms, which motivated this work, but the utility of the method extends well beyond storms: it can be valuable in other tracking applications that involve splitting or merging, such as vortices, radar/sonar signals, or groups of people. The method assumes that the location of a target behaves like a Gaussian process while it is observable. A Markov chain model decides when the birth, death, splitting, or merging of targets takes place. The tracking estimate is obtained by an algorithm that finds the tracks that maximize the conditional density of the unknown variables given the data. The problem of quantifying confidence in a tracking estimate is also addressed. Finally, sufficient conditions for consistency of the tracking estimate are presented, and almost sure convergence of the tracking estimate to the true path is proved. The practical suitability of the method is demonstrated on simulated and real data.

Based on joint work with Thomas C.M. Lee and Curtis B. Storlie.


Monday, November 6, 2006 at 10 am

Speaker: Professor Alfred Rademaker, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University

Title: The design and analysis of cancer clinical trials

Abstract: Cancer clinical trials run the spectrum from Phase 0 feasibility studies to Phase IV surveillance studies.  This talk will focus on statistical methods related to Phase I and Phase II clinical trials. For Phase I, the standard 3+3 design as well as the continual reassessment method will be discussed.  For Phase II studies, the Simon 2-stage design will be described, as well as the use of conditional methods for interval estimation of response rate.  Other design variations, such as randomized Phase II or combined Phase II/Phase III studies, will also be presented.
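The operating characteristics of the Simon two-stage design follow directly from binomial probabilities. The sketch below assumes the usual notation (r1/n1 for stage 1 and r/n overall, with the drug declared ineffective when responses are at or below those bounds). The function names are hypothetical, and the design 1/10, 5/29 evaluated at p0 = 0.10 and p1 = 0.30 is used purely as an illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def simon_two_stage(r1, n1, r, n, p):
    """Operating characteristics of a Simon two-stage design at true response rate p.

    Stage 1: treat n1 patients; stop for futility if responses <= r1.
    Stage 2: treat n - n1 more; declare the drug ineffective if total responses <= r.
    Returns (P(early termination), P(declare ineffective), expected sample size).
    """
    pet = binom_cdf(r1, n1, p)
    p_fail = pet + sum(
        binom_pmf(x1, n1, p) * binom_cdf(r - x1, n - n1, p)
        for x1 in range(r1 + 1, min(n1, r) + 1)
    )
    expected_n = n1 + (1 - pet) * (n - n1)
    return pet, p_fail, expected_n

# Illustrative design 1/10, 5/29 evaluated at p0 = 0.10 and p1 = 0.30
pet0, fail0, en0 = simon_two_stage(1, 10, 5, 29, 0.10)
_, fail1, _ = simon_two_stage(1, 10, 5, 29, 0.30)
alpha, power = 1 - fail0, 1 - fail1   # type I error and power
print(f"PET={pet0:.3f} EN={en0:.1f} alpha={alpha:.3f} power={power:.3f}")
```

The same function can be used to screen candidate (r1, n1, r, n) combinations for the smallest expected sample size under p0 that still meets the required type I error and power.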

Spring 2006

Tuesday, March 7, 2006 at 11 am

Speaker: Ms. Cindy Xin Wang, Department of Statistics, Northwestern University

Title: Gatekeeping Procedures Based on Weighted Bonferroni Tests for Multiple Endpoints in Dose Finding Studies

Abstract: Many dose finding studies have hierarchically ordered endpoints (e.g., primary, secondary, etc.), and a given dose is compared with a control on an endpoint only if the tests on the higher-ordered endpoints are significant (serial gatekeeping). The familywise error rate must be controlled at a designated level, taking into account the multiplicity of tests. We give a closed procedure (Marcus, Peritz and Gabriel 1976) for this problem by applying the general and flexible tree-structured testing approach to gatekeeping problems developed in Dmitrienko, Wiens, Tamhane and Wang (2006). The proposed procedure uses weighted Bonferroni tests for testing intersection hypotheses. For easier implementation of this closed procedure, we give an equivalent stepwise procedure that uses penalized Bonferroni tests for all endpoints except the last, for which it uses a penalized Holm test. The penalty charged at each step of testing is inversely proportional to a so-called rejection gain factor, which depends on the number of rejections at earlier steps and the weights assigned to those rejected hypotheses. The method is applied to data from a diabetes drug trial with three endpoints. Extensions in which the Bonferroni test is replaced by the Simes test, a resampling-based test, or the Dunnett test are indicated.
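Weighted Bonferroni and Holm tests are standard building blocks for procedures of this kind. The sketch below shows the ordinary weighted Holm step-down procedure (Holm 1979), which at each step applies a weighted Bonferroni test to the intersection of the hypotheses not yet rejected. It is not the penalized gatekeeping procedure from the talk, and the function name weighted_holm is hypothetical.

```python
def weighted_holm(pvalues, weights, alpha=0.05):
    """Weighted Holm step-down test (hypothetical name); weights must be positive.

    At each step the remaining weights are renormalized to sum to 1, and the
    hypothesis with the smallest p_i / w_i is tested against alpha.
    Returns a list of booleans: True where the hypothesis is rejected.
    """
    remaining = set(range(len(pvalues)))
    rejected = [False] * len(pvalues)
    while remaining:
        total = sum(weights[i] for i in remaining)
        # weighted Bonferroni test of the intersection of the remaining hypotheses
        best = min(remaining, key=lambda i: pvalues[i] * total / weights[i])
        if pvalues[best] * total / weights[best] > alpha:
            break  # intersection not rejected: stop, nothing further is rejected
        rejected[best] = True
        remaining.remove(best)
    return rejected
```

With equal weights this reduces to the usual Holm procedure: weighted_holm([0.01, 0.04, 0.03, 0.005], [1, 1, 1, 1]) rejects the first and fourth hypotheses and stops at p = 0.03, whose threshold is 0.05/2 = 0.025.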


Tuesday, April 4, 2006, 11 am

Speaker: Ms. Yang Ge, Department of Statistics, Northwestern University

Title: On Consistency of Bayesian Inference with Mixtures of Logistic Regression Models

Abstract: This is a theoretical study of the consistency properties of Bayesian inference using mixtures of logistic regression models. When standard logistic regression models are combined in a ‘mixtures of experts’ set-up, the result is a flexible model for the relationship between a binary (yes-no) response y and a vector of predictors x. Bayesian inference conditional on the observed data can then be used for regression and classification. This study gives conditions on choosing the number of experts (i.e., the number of mixing components) k, or on choosing a prior distribution for k, so that Bayesian inference is ‘consistent’, in the sense of ‘often approximating’ the underlying true relationship between y and x. The resulting classification rule is also ‘consistent’, in the sense of having near-optimal classification performance. We show these desirable consistency properties with a nonstochastic k growing slowly with the sample size n of the observed data, or with a random k that takes large values with nonzero but small probabilities.
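To make the ‘mixtures of experts’ set-up concrete, the sketch below evaluates P(y = 1 | x) for a mixture of k logistic regressions with softmax gating, one common form of this model. That the talk uses softmax gates of exactly this kind is an assumption, the function names are hypothetical, and the sketch says nothing about the consistency results, which are the subject of the talk.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def mixture_of_logistic_experts(x, gate_coefs, expert_coefs):
    """P(y = 1 | x) = sum_k pi_k(x) * sigmoid(beta_k . x) (hypothetical helper).

    gate_coefs   : one coefficient vector alpha_k per expert; pi_k(x) is the
                   softmax of the scores alpha_k . x
    expert_coefs : one logistic regression coefficient vector beta_k per expert
    """
    scores = [dot(a, x) for a in gate_coefs]
    top = max(scores)                          # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w / total * sigmoid(dot(b, x)) for w, b in zip(weights, expert_coefs))
```

With a single expert (k = 1) the gate weight is always 1 and the model reduces to ordinary logistic regression.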


Monday, April 24, 2006 at 3:30 pm

Speaker: Professor Mohsen Pourahmadi, Division of Statistics, Northern Illinois University

Title: Generalized Linear Models for the Covariance Matrix of Longitudinal Data

Abstract: We survey the progress made in modelling covariance matrices from the perspective of generalized linear models (GLM) and show how one can move beyond the use of the identity and logarithmic link functions and prespecified structures. Observing that most time-domain models in time series analysis (ARMA, state-space, etc.) are means to diagonalize a Toeplitz covariance matrix via a unit lower triangular matrix (Cholesky decomposition), we discuss the distinguished role of the Cholesky decomposition in providing a systematic, data-based procedure for formulating and fitting parsimonious models for general covariance matrices while guaranteeing the positive-definiteness of the estimates. Techniques pulled together from regression and time series analysis provide the necessary tools for the procedure, which reduces the unintuitive task of modelling covariance matrices to that of fitting a sequence of regression models. The procedure is illustrated using a real longitudinal dataset. Once a bona fide GLM framework for modelling covariances is found, its Bayesian, nonparametric, generalized additive and other extensions can be developed in direct analogy with the respective extensions of the traditional GLM.
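The Cholesky-based reparameterization described above can be computed directly from a covariance matrix. The sketch below, using NumPy, returns a unit lower triangular T and innovation variances d with T Σ T' = diag(d); the function name modified_cholesky is hypothetical. In the regression reading, the below-diagonal entries of row t of T are minus the coefficients from regressing y_t on its predecessors, and d holds the corresponding residual (innovation) variances.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, d) with T unit lower triangular and T @ sigma @ T.T == diag(d).

    Hypothetical helper: below-diagonal entries of T are minus the autoregressive
    coefficients of each y_t on y_1, ..., y_{t-1}; d are the innovation variances.
    """
    L = np.linalg.cholesky(sigma)        # sigma = L @ L.T, L lower triangular
    c = np.diag(L)
    T = np.diag(c) @ np.linalg.inv(L)    # rescale L^{-1} so its diagonal is 1
    return T, c**2

# Example: stationary AR(1) covariance with coefficient rho and unit innovation variance
rho, p = 0.6, 5
idx = np.arange(p)
sigma = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho**2)
T, d = modified_cholesky(sigma)
```

For this AR(1) covariance, T has -rho on the first subdiagonal and zeros elsewhere below the diagonal, and d is (1/(1 - rho^2), 1, ..., 1), matching the regression interpretation.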


Tuesday, May 2, 2006 at 11 am

Speaker: Professor Hakan Demirtas, Division of Epidemiology and Biostatistics, University of Illinois at Chicago

Title: Multiple imputation under Bayesianly smoothed random-coefficient hierarchical pattern-mixture models for nonignorably missing longitudinal data

Abstract: Conventional pattern-mixture models can be highly sensitive to model misspecification. In many longitudinal studies, where the nature of the drop-out and the form of the population model are unknown, interval estimates from any single pattern-mixture model may suffer from undercoverage, because uncertainty about model misspecification is not taken into account. In this talk, I will introduce a new class of Bayesian random coefficient pattern-mixture models to address potentially non-ignorable drop-out. Instead of imposing hard equality constraints to overcome inherent inestimability problems in pattern-mixture models, I propose to smooth the polynomial coefficient estimates across patterns using a hierarchical Bayesian model that allows random variation across groups. Using real and simulated data, I show that multiple imputation under a three-level linear mixed-effects model which accommodates a random level due to drop-out groups can be an effective method to deal with non-ignorable drop-out by allowing model uncertainty to be incorporated into the imputation process.

Papers that are relevant to this talk:

Demirtas, H. & Schafer, J.L. (2003). On the performance of random-coefficient pattern-mixture models for non-ignorable drop-out. Statistics in Medicine, 22, 2553-2575.

Demirtas, H. (2004). Modeling incomplete longitudinal data. Journal of Modern Applied Statistical Methods, 3(2), 305-321.

Demirtas, H. (2005). Multiple imputation under Bayesianly smoothed pattern-mixture models for non-ignorable drop-out. Statistics in Medicine, 24, 2345-2363.

Demirtas, H. (2005). Bayesian analysis of hierarchical pattern-mixture models for clinical trials data with attrition and comparisons to commonly used ad-hoc and model-based approaches. Journal of Biopharmaceutical Statistics, 15(3), 383-402.
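A multiple imputation analysis such as the one described in this talk ends by pooling the m completed-data results. Below is a minimal sketch of Rubin's combining rules for a scalar parameter. The function name rubin_combine is hypothetical, it assumes the between-imputation variance is positive, and the imputation model itself (the hierarchical pattern-mixture model) is not shown.

```python
import statistics

def rubin_combine(estimates, variances):
    """Pool m completed-data analyses of a scalar parameter (hypothetical helper).

    estimates : point estimates from the m imputed data sets
    variances : their squared standard errors
    Returns (pooled estimate, total variance, degrees of freedom for a t reference).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    u_bar = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b               # total variance
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df
```

For estimates 1, 2, 3 with within-imputation variances of 0.5, the pooled estimate is 2, the total variance is 0.5 + (4/3)(1), about 1.83, and the degrees of freedom are about 3.78.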


Tuesday, May 9, 2006 at 11 am

Speaker: Professor Peter Song, Department of Statistics and Actuarial Science, University of Waterloo

Title: Maximization by Parts in Likelihood Inference
Abstract: In this talk I will present a new algorithm for solving a score equation for the maximum likelihood estimate in certain problems of practical interest. The method circumvents the need to compute second order derivatives of the full likelihood function. It exploits the structure of certain models that yield a natural decomposition of a very complicated likelihood function. In this decomposition, the first part is a log likelihood from a simply analyzed model and the second part is used to update estimates from the first. Convergence properties of this iterative (fixed point) algorithm are examined and asymptotics are derived for estimators obtained by using only a finite number of iterations. I will illustrate several examples in the presentation, including multivariate Gaussian copula models, nonnormal random effects models, generalized linear mixed models, and state space models. Properties of the algorithm and of estimators are discussed in detail via simulation studies on a bivariate copula model and a nonnormal linear random effects model.


Tuesday, May 16, 2006 at 11 am

Speaker: Professor Edward C. Malthouse, Department of Integrated Marketing Communications, Medill School, Northwestern University

Title: Conceptualizing and Measuring Media Engagement and its Effects

Abstract: We propose measuring the latent construct “media engagement” with a third-order confirmatory factor analysis model.  The approach is tested using five large reader surveys of 100 and 50 newspapers, 100 magazines, and 39 and 8 media web sites.  Over 400 qualitative interviews generated samples of items (questions) from the construct domain for the three media platforms.  Consumer surveys measured the items on samples of readers.  We used exploratory factor analysis (EFA) to develop scales measuring different dimensions of engagement and confirmatory factor analysis (CFA) to purify the scales further.  Additional EFA was used to identify higher-order factors, which were then tested with confirmatory models.  We contrast the higher-order factor structure for the three media platforms.  Predictive validity is assessed by relating the engagement factors to outcome measures such as usage with random coefficient models and ridge regression.  Three quasi-experiments evaluate the effect of engagement on advertising effectiveness.

(This is joint research with Bobby Calder, Marketing Department, Kellogg.)


Tuesday, May 30, 2006 at 11 am

Speaker: Professor Torben G. Andersen, Kellogg School of Management, Northwestern University and NBER

Title: Continuous-Time Models, Realized Volatilities, and Testable Distributional Implications for Daily Stock Returns

Abstract: We provide a framework for analyzing and understanding daily return distributions within the context of traditional continuous-time asset price processes. We develop a sequence of simple-to-implement distributional tests from transformed inter-daily returns. They hinge on the availability of intraday data for construction of nonparametric realized variation measures and jump detection statistics. Each step speaks to key features of the process underlying the discretely observed prices and should help in developing empirically more realistic models. For thirty large stocks, we find that time-varying diffusive volatility, jumps and leverage effects are all critical in order to describe the dynamic dependencies in the observed prices.
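Realized variation measures and jump detection statistics are built from intraday returns. As an assumption (the abstract does not name the specific statistics used), the sketch below uses realized variance and bipower variation (Barndorff-Nielsen and Shephard), whose difference gives a simple nonnegative estimate of the daily jump contribution. The function names are hypothetical, and no finite-sample scaling corrections are applied.

```python
import math

def realized_variance(returns):
    """Sum of squared intraday returns for one day (hypothetical helper)."""
    return sum(r * r for r in returns)

def bipower_variation(returns):
    """(pi / 2) * sum_j |r_j| |r_{j-1}|; largely insensitive to a few isolated jumps."""
    return (math.pi / 2) * sum(abs(a) * abs(b) for a, b in zip(returns[1:], returns[:-1]))

def jump_component(returns):
    """Nonnegative part of RV - BV, a simple estimate of the jump contribution."""
    return max(realized_variance(returns) - bipower_variation(returns), 0.0)
```

For returns [0.001, 0.001, 0.05, 0.001, 0.001], realized variance is 0.002504, while bipower variation is only about 0.00016, so nearly all of the day's variation is attributed to the single large move.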


Coauthors:

Tim Bollerslev, Dept. of Economics and Fuqua School, Duke University and NBER

Per H. Frederiksen, Jyske Bank, Denmark

Morten Ø. Nielsen, Dept. of Economics, Cornell University

Fall 2005

Monday, October 17, 2005 at 11 am

Speaker: Professor Denise Scholtens, Department of Preventive Medicine, Northwestern University

Title: Local modeling of global interactome networks

Abstract: Accurate systems biology modeling requires a complete catalog of protein complexes and their constituent proteins. We discuss a graph theoretic/statistical algorithm for local dynamic modeling of protein complexes using data from affinity purification-mass spectrometry experiments. The algorithm readily accommodates multicomplex membership by individual proteins and dynamic complex composition, two biological realities not accounted for in existing topological descriptions of the overall protein network. A penalized likelihood approach guides the protein complex modeling algorithm. With an accurate complex membership catalog in place, systems biology can proceed with greater precision.


Monday, October 31, 2005 at 11 am

Speaker: Professor Hua Yun Chen, Department of Epidemiology & Biostatistics, University of Illinois at Chicago

Title: Approximation to locally semiparametric efficient scores in missing data problems through likelihood robustification

Abstract: In parametric and semiparametric models with missing data, the efficient estimator often cannot be obtained without additional model assumptions, even if it has a simple form when no data are missing. Robins et al. proposed finding a locally efficient estimator as a compromise and showed that the locally efficient estimator has the doubly robust property when the data are missing at random (MAR) in Rubin's sense. In practice, the approach of Robins et al. to finding a locally efficient estimator can be very challenging to implement. We propose an alternative representation of the efficient score through likelihood robustification. The proposed representation is straightforward to obtain, can be applied to missing data with arbitrary missing patterns, and is amenable to computing the locally efficient score. The estimator based on the proposed representation has the doubly robust property when missing data are MAR, and requires only correct specification of the missing data mechanism model for consistency when missing data are nonignorable. Procedures for estimation and inference on the parameters are proposed. Applications of the proposed method are illustrated by examples, and the performance of the approach is examined in a simulation study.
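For context, the doubly robust estimator of Robins et al. can be illustrated in its simplest form, the augmented inverse-probability-weighted (AIPW) estimate of a mean under MAR. This is not the likelihood-robustification representation proposed in the talk; the propensities and outcome predictions are assumed to come from separately fitted models, and the function name aipw_mean is hypothetical.

```python
def aipw_mean(y, observed, propensity, predicted):
    """AIPW estimate of E[Y] under MAR (hypothetical helper).

    y          : outcomes; entries where observed[i] is False are ignored
    observed   : booleans, True when y[i] was observed
    propensity : estimated P(observed | covariates) for each unit, assumed > 0
    predicted  : outcome-model prediction of E[Y | covariates] for each unit
    Consistent if either the propensity model or the outcome model is correct.
    """
    n = len(observed)
    total = 0.0
    for yi, ri, pi, mi in zip(y, observed, propensity, predicted):
        resid = (yi - mi) if ri else 0.0      # residual is only available when observed
        total += mi + ri * resid / pi         # outcome prediction plus weighted correction
    return total / n
```

If the outcome predictions are exact, the correction term vanishes and the estimate equals the full-data mean whatever the propensities; if the predictions are all zero, it reduces to the inverse-probability-weighted estimate.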


Monday, November 14, 2005 at 11 am

Speaker: Dr. Guei-Feng (Cindy) Tsai, Department of Statistics, Northwestern University

Title: Semi-nonparametric Models and Inference for High Dimensional Microarray Data

Abstract: We develop a new approach to analyzing high dimensional cell-cycle microarray data with no replicates. Cell-cycle microarray data exhibit two kinds of correlation: measurements are correlated within a gene, and measurements are also correlated between genes, since some genes may be biologically related. The proposed procedure combines a classification method, the quadratic inference function method, and nonparametric techniques for complex high dimensional data. We first perform a gene classification analysis to group genes into classes with similar cell-cycle patterns, including a class with no cell-cycle phenomena at all. We use genes within the same class as pseudo-replicates to build nonparametric models and inference functions. To incorporate the correlation of longitudinal measurements, the quadratic inference function method is also applied. This approach allows us to perform chi-squared tests of whether the coefficients are time varying. It also allows us to determine whether certain genes regulate cell cycles. The method is illustrated with a real cell-cycle microarray data example as well as simulations.


Friday, December 9, 2005 at 11 am

Speaker: Dr. Alex Dmitrienko, Eli Lilly

Title: Branching tests in clinical trials with multiple objectives

Abstract: This talk discusses branching multiple tests with clinical trial applications. Branching tests arise in clinical trials with hierarchically ordered multiple objectives, for example, in the context of multiple dose-control tests with logical restrictions or analysis of multiple endpoints. The proposed branching approach is based on the principle of closed testing and generalizes the serial and parallel gatekeeping approaches. The branching testing methodology will be illustrated using a clinical trial with multiple endpoints (primary, secondary and tertiary) and multiple objectives (superiority and non-inferiority testing) as well as a dose-finding trial with multiple endpoints.

 

  • Department of Statistics
  • 2006 Sheridan Rd, Evanston, IL 60208
  • (847) 491-3974
  • FAX: (847) 491-4939
  • stats@northwestern.edu