Fall 2018 Seminar Series

Wednesday, October 10, 2018

Factor-Driven Two-Regime Regression

Time: 11:00 a.m.

Speaker: Yuan Liao, Associate Professor of Economics, Rutgers University

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: We propose a two-regime regression model, where the linear regression coefficient has a structural break, driven by a vector of possibly unobservable factors, and the unobserved factors can be estimated from a large dataset. The model brings new phenomena to both computation and asymptotics. We show that the computation can be reformulated as a mixed integer optimization. We also observe that the rate of convergence and the asymptotic distribution of the estimators are continuously affected, as the unobserved factors are estimated more accurately.
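
For readers unfamiliar with this setup, a two-regime regression of the kind described can be written as follows. The notation is an assumption made here for illustration and is not taken from the abstract: y_t is the response, x_t the regressors, f_t the (possibly estimated) factors, and gamma the parameter that determines which regime an observation falls in.

```latex
y_t = x_t^{\top}\beta + x_t^{\top}\delta \,\mathbf{1}\{ f_t^{\top}\gamma > 0 \} + \varepsilon_t
```

Under this assumed form, the coefficient on x_t is beta in one regime and beta + delta in the other. The indicator makes the least-squares objective discontinuous in gamma, which suggests why a mixed integer formulation is natural: each observation's regime can be treated as a binary decision variable.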

Wednesday, October 17, 2018

Deep mining of genome sequencing data

Time: 11:00 a.m.

Speaker: Xinkun ‘Sequen’ Wang, Director of NUSeq Core Facility; Center for Genetic Medicine faculty member; Research Associate Professor, Dept. of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: High-throughput genome sequencing has rapidly revolutionized biomedical research. Oceans of next-generation sequencing (NGS) data have been produced, and more data is being generated at a faster pace. Extracting information and generating new biological knowledge from NGS data poses unprecedented challenges. Compared to the current capability in data production, our ability in data mining needs a significant overhaul. Solving key biological questions and improving health care will rely on efficient mining of genome sequencing data.

Wednesday, October 24, 2018

A classification method for predicting type 2 diabetes mellitus using sequencing data

Time: 11:00 a.m.

Speaker: Haiyan Wang, Professor, Department of Statistics, Kansas State University

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: Type 2 diabetes mellitus (T2DM) affects the lives of millions of people through its life-altering complications. Current methods of identifying genetic polymorphisms responsible for T2DM are limited by sample size and by low accuracy at the population level (AUC of 0.68 or below). This research presents a method to identify subtle effects of genetic variants using whole genome sequencing data and to improve prediction accuracy of T2DM at the population level. To achieve this, a new feature selection procedure and a classifier were proposed. The method involves (1) first applying sparse principal component analysis (PCA) to genotype data to obtain orthogonal features; (2) using SNP-specific regularization parameters to reduce the false positive rate of feature selection; (3) verifying feature relevance through Lasso penalized logistic regression in conjunction with sparse PCA. Applied to a dataset containing 625,597 SNPs and 23 environmental variables from each of 3,326 individuals, the method identified over 450 genetic variants that each have subtle effects on T2DM prediction. These variants, in conjunction with clinical characteristics, led to greatly improved prediction accuracy (AUC 0.79) for new patients at the population level. The proposed method is also computationally efficient, running 20 times faster than a Random Forest classifier, and thus provides a promising tool for large-scale genome-wide association studies.

Joint work with Luann C Jung at Massachusetts Institute of Technology, Xukun Li and Cen Wu at Kansas State University.

Wednesday, November 14, 2018

Quantile Lost Lifespan

Time: 11:00 a.m.

Speaker: Lauren Balmert, Assistant Professor of Preventive Medicine (Biostatistics), Biostatistics Collaboration Center (BCC), Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: The concept of lost lifespan, or life lost, has recently emerged as a new summary measure for cumulative information inherent in time-to-event data. This summary measure provides several benefits over the traditional methods, including less sensitivity to heavy censoring and more straightforward clinical interpretation. However, there exists no systematic modeling approach on the quantile lost lifespan in the literature and hence no time-to-event data have been analyzed as such. In this paper, we propose a novel quantile regression model on the quantiles of the distribution of the lost lifespan under right censoring. The consistency and asymptotic normality of the regression parameters are established. To avoid estimation of the probability density function of the lost lifespan distribution under censoring, the estimating equation for the quantile lost lifespan is directly used to construct the test statistics for the regression parameters. To test a subset of the regression parameters, a perturbation method is employed from which confidence intervals based on a normal approximation can be constructed. Simulation results are presented to validate the finite sample properties of the proposed estimators and test statistics. The proposed method is illustrated with a real dataset from a clinical trial on cancer.

Authors: Lauren C. Balmert, Ruosha Li, Limin Peng, and Jong-Hyeon Jeong

Wednesday, November 28, 2018

A More Powerful High-Dimensional Mean Test Using Projections

Time: 11:00 a.m.

Speaker: Yuan Huang, Assistant Professor of Biostatistics, University of Iowa

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: Testing the population mean is fundamental in statistical inference. When the dimensionality is high, the traditional Hotelling’s T² test becomes practically infeasible. In this talk, we propose a new testing method for high-dimensional mean vectors. The new method projects the original sample to a lower-dimensional space and carries out a test with the projected sample. We derive the theoretical optimal direction with which the projection test possesses the best power under alternatives. With an estimation procedure using the splitting strategy, the resulting test is an exact t-test under the normality assumption and an asymptotic χ²-test with one degree of freedom without the normality assumption. Monte Carlo simulation studies show that the new test can be much more powerful than the existing methods while maintaining the Type I error rate. The promising performance of the new test is further illustrated by a motivating real data example.
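
The splitting idea in the abstract can be illustrated with a minimal sketch. This is not the speaker's implementation: the function name `projection_t_statistic`, the `ridge` parameter, and the direction estimate used here (per-coordinate mean divided by variance plus a ridge term, a diagonal stand-in for the optimal direction) are all assumptions made for illustration. The first half of the sample estimates a projection direction, and the second half is projected onto it and tested with an ordinary one-sample t-statistic.

```python
import math
import random
from statistics import mean, variance


def projection_t_statistic(sample, ridge=1.0, seed=0):
    """Split-sample projection test of H0: mean vector = 0.

    `sample` is a list of equal-length observation vectors.
    Returns (t_statistic, degrees_of_freedom).
    Hypothetical helper for illustration only.
    """
    rows = list(sample)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)        # random split into two halves
    half = len(rows) // 2
    first, second = rows[:half], rows[half:]

    # Assumed direction estimate from the first half:
    # w_j = mean_j / (var_j + ridge), a diagonal simplification.
    columns = list(zip(*first))
    direction = [mean(col) / (variance(col) + ridge) for col in columns]

    # Project each observation in the second half onto the direction.
    projected = [sum(w * x for w, x in zip(direction, row)) for row in second]

    # One-sample t-statistic on the projected scalars.
    n = len(projected)
    t_stat = mean(projected) / math.sqrt(variance(projected) / n)
    return t_stat, n - 1
```

Because the direction is computed only from the first half, it is independent of the second half, so the projected values are i.i.d. scalars given the direction. Under normality the statistic can then be compared with a t distribution on the returned degrees of freedom, even when the dimension exceeds the sample size.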