Winter 2019 Seminar Series

Wednesday, February 6, 2019

On the Restricted Mean Survival Time for Survival Data Analysis

Time: 11:00 a.m.

Speaker: Lihui Zhao, Associate Professor, Preventive Medicine; Feinberg School of Medicine, Northwestern University

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: In a longitudinal study with the time to a specific event as the primary endpoint, standard methods of summarizing the treatment difference are based on Kaplan-Meier curves, the logrank test, and the point and interval estimates via Cox's proportional hazards model. However, when the proportional hazards assumption is violated, the logrank test may not have sufficient power to detect the difference between two event time distributions, and the resulting hazard ratio estimate is difficult, if not impossible, to interpret as a treatment contrast. On the other hand, the restricted mean survival time (RMST) is an easily interpretable, clinically meaningful summary of the survival function in the presence of censoring. The RMST is the mean survival time of all subjects in the study population followed up to a time point t and can be estimated consistently by the area under the Kaplan-Meier curve over [0, t]. In this research, we discuss the extension of RMST to the more general setting of multiple time-to-event endpoints, which includes classical competing risks and semi-competing risks. The methods are illustrated with data from two clinical trials.
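
As a rough illustration of the single-endpoint case described in the abstract, the sketch below computes a Kaplan-Meier curve and takes the area under it over [0, tau] as the RMST estimate. It is a minimal sketch, not the speaker's code: the function names, the argument tau (standing in for the time point t), and the toy data are all assumptions made for illustration, and the multiple-endpoint extension discussed in the talk is not covered.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time, in increasing order.

    times  -- observed follow-up times
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function over [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle up to the next drop
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)    # final rectangle up to tau
    return area


# Hypothetical toy data: five subjects, two of them censored.
toy_times = [2, 3, 5, 7, 8]
toy_events = [1, 0, 1, 1, 0]
toy_rmst = rmst(toy_times, toy_events, tau=6)
```

With no censoring and tau beyond the last event, the estimate reduces to the ordinary sample mean of the event times, which is a quick sanity check on any implementation.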

Wednesday, February 13, 2019

Score-Matching Representative Approach for Big Data Analysis with Generalized Linear Models

Time: Due to the extremely cold temperatures predicted for Wednesday, January 30, we are cancelling this talk. It will be rescheduled for a later date. Thank you.

Speaker: Keren Li, Postdoctoral Fellow, NSF-Simons Center for Quantitative Biology, Northwestern University

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: We propose a fast and efficient strategy, called the representative approach, for big data analysis with linear models and generalized linear models. Given a partition of a big dataset, this approach constructs a representative data point for each data block and fits the target model using the representative dataset. In terms of time complexity, it is as fast as the subsampling approaches in the literature. In terms of efficiency, it estimates parameters more accurately than the divide-and-conquer method.

With comprehensive simulation studies and theoretical justifications, we recommend two representative approaches. For linear models or generalized linear models with a flat inverse link function and moderate coefficients of continuous variables, we recommend mean representatives (MR). For other cases, we recommend score-matching representatives (SMR).

In an illustrative application to the Airline on-time performance data, MR and SMR perform as well as the full-data estimate where the latter is available. Furthermore, the proposed representative strategy is ideal for analyzing massive data dispersed over a network of interconnected computers.
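
The mean-representative idea from the abstract can be sketched for the simplest case, a linear model with one predictor. Each block is replaced by the mean of its points, and the line is fitted to those representatives. This is an illustration only: the consecutive-block partition, the weighting of each representative by its block size, and all function names are assumptions, not details taken from the talk, and the score-matching variant (SMR) is not shown.

```python
def mean_representatives(xs, ys, block_size):
    """Replace each consecutive block by (mean x, mean y, block count)."""
    reps = []
    for start in range(0, len(xs), block_size):
        bx = xs[start:start + block_size]
        by = ys[start:start + block_size]
        n = len(bx)
        reps.append((sum(bx) / n, sum(by) / n, n))
    return reps


def weighted_line_fit(reps):
    """Weighted least squares for y = intercept + slope * x.

    Each representative is weighted by its block size (an assumption).
    """
    w_total = sum(w for _, _, w in reps)
    x_bar = sum(w * x for x, _, w in reps) / w_total
    y_bar = sum(w * y for _, y, w in reps) / w_total
    sxy = sum(w * (x - x_bar) * (y - y_bar) for x, y, w in reps)
    sxx = sum(w * (x - x_bar) ** 2 for x, _, w in reps)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical noise-free data on the line y = 1 + 2x, split into 4 blocks.
xs = [float(i) for i in range(12)]
ys = [1.0 + 2.0 * x for x in xs]
reps = mean_representatives(xs, ys, block_size=3)
intercept, slope = weighted_line_fit(reps)
```

Because block means of points on a line stay on that line, the fit on the four representatives recovers the same intercept and slope as a fit on all twelve points. With noisy data the two fits would generally differ, and how closely they agree depends on how the partition is chosen.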

Wednesday, February 20, 2019

PANDA: AdaPtive Noisy Data Augmentation for Regularization of Undirected Graphical Models

Time: 11:00 a.m.

Speaker: Fang Liu, Associate Professor, Department of Applied and Computational Mathematics and Statistics, University of Notre Dame

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: We propose PANDA, an AdaPtive Noise Augmentation technique to regularize the estimation and construction of single and multiple undirected graphical models (UGMs). PANDA iteratively solves for MLEs given noise-augmented data in the regression-based framework until convergence. The noise can be designed to achieve various regularization effects on graph estimation, such as lasso, group lasso, ridge and elastic net, among others. When PANDA is used for constructing multiple graphs simultaneously, two types of noise are augmented. The first type regularizes the estimation of each graph, while the second type promotes either structural similarities (joint group lasso) or numerical similarities (joint fused ridge) among the edges in the same position across multiple graphs. We establish theoretically that the noise-augmented loss function and its minimizer converge almost surely to the expected penalized loss function and its minimizer, respectively. We also derive the asymptotic distributions and inferences for the regularized regression coefficients through PANDA in the setting of GLMs. PANDA can be easily programmed in any standard software without resorting to complicated optimization techniques. We apply PANDA to autism spectrum disorder data to construct a mixed-node graph, and to real-life lung cancer microarray data to simultaneously construct four protein networks.

Wednesday, February 27, 2019

Simultaneous Estimation and Variable Selection for Incomplete Event History Data

Time: 11:00 a.m.

Speaker: Jianguo (Tony) Sun, Professor of Statistics, University of Missouri

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: This talk discusses regression analysis of incomplete event history data with the focus on simultaneous estimation and variable selection. Such data commonly occur in many areas such as medical studies and social sciences, and a great deal of literature exists on their analysis, except on the variable selection problem. To address this, we will present a new method, which will be referred to as a broken adaptive ridge regression approach, and establish its asymptotic properties including the oracle property and clustering effect. Numerical studies suggest that the proposed method performs well in practical situations and better than the existing methods. An application will be presented.

Wednesday, March 13, 2019

Using Gene Expression to Tell Time

Time: 11:00 a.m.

Speaker: Rosemary Braun, Assistant Professor of Preventive Medicine (Biostatistics) and McCormick School of Engineering, Northwestern University

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: Determining the state of an individual’s internal physiological clock has important implications for precision medicine, from diagnosing neurological disorders to optimizing drug delivery.  To be useful, such a test must be accurate, minimally burdensome to the patient, and robust to differences in patient protocols, sample collection, and assay technologies.  In this talk I will present TimeSignature, a novel machine-learning algorithm to estimate circadian variables from gene expression in human blood.  By making use of the high dimensionality of the gene expression measurements and exploiting the periodic nature of the circadian variables we wish to predict, TimeSignature can be applied to samples from disparate studies and yield highly accurate results despite systematic differences between the studies.  This generalizability is unique amongst expression-based predictors and addresses a major challenge in the development of robust biomarker tests.  This talk will detail the method, present several applications, and discuss our recent work to extend it.
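
The abstract does not spell out how the periodic nature of circadian time is exploited, so the following is only a generic sketch of one common way to handle a periodic target, not a description of TimeSignature itself. A time of day is mapped to a point on the unit circle, and an angle is recovered from (cosine, sine) coordinates with atan2. All function names and the 24-hour period are assumptions made for illustration.

```python
import math

PERIOD_HOURS = 24.0  # assumed period for illustration


def encode_hour(hour):
    """Map a time of day in hours to (cos, sin) on the unit circle."""
    angle = 2 * math.pi * hour / PERIOD_HOURS
    return math.cos(angle), math.sin(angle)


def decode_hour(c, s):
    """Recover a time of day in [0, 24) from (cos, sin) coordinates."""
    angle = math.atan2(s, c)
    return (angle * PERIOD_HOURS / (2 * math.pi)) % PERIOD_HOURS


def circular_diff(a, b):
    """Shortest distance in hours between two times of day."""
    d = abs(a - b) % PERIOD_HOURS
    return min(d, PERIOD_HOURS - d)


def circular_mean_hour(hours):
    """Average times of day on the circle rather than on the number line."""
    points = [encode_hour(h) for h in hours]
    c = sum(p[0] for p in points) / len(points)
    s = sum(p[1] for p in points) / len(points)
    return decode_hour(c, s)


# 23:00 and 01:00 average to midnight on the circle, whereas the
# arithmetic mean of 23 and 1 would give 12 (noon).
midnight_estimate = circular_mean_hour([23.0, 1.0])
```

Comparing predictions with circular_diff rather than a plain absolute difference keeps an estimate of 23:30 against a true time of 00:30 at one hour of error instead of twenty-three.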