Fall 2017 Seminar Series

Wednesday, October 18, 2017

CompareML: Structuring Machine Learning Research in Data Driven Science

Time: 11:00 a.m.

Speaker: Victoria Stodden, Associate Professor, University of Illinois at Urbana-Champaign

Place: 617 Library Place

Abstract: Statistical discovery is increasingly taking place using data not collected by the discoverers and often completely in silico. This calls for new consideration of the methods and computational infrastructure that support statistical pipelines. In this talk I present CompareML, a novel framework for statistical analysis of "organic data" as opposed to "designed data" (Kreuter & Peng 2014), which permits the direct comparison of findings that purport to answer the same statistical question. I will illustrate that such computational frameworks are crucial to reproducible science by way of an example from genomics [acute leukemia (Golub et al. 1999)] where traditional approaches (surprisingly) fail at scale.


Wednesday, October 25, 2017

Robust Simultaneous Inference for the Mean Function of Functional Data

Time: 11:00 a.m.

Speaker: Nedret Billor, Professor, Department of Mathematics and Statistics, Auburn University

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: Functional data analysis has drawn substantial attention because rapid technological advances have led many scientific fields involving applied statistics to measure and record massive continuous data. While the study of probabilistic tools for infinite-dimensional variables started at the beginning of the 20th century, research on statistical methods for functional data started only in the last two decades. Further, these methods mainly require homogeneous data, that is, data free of outliers.

However, functional data present new challenges when the datasets under study are contaminated by outliers. In this talk, we will discuss robust simultaneous inference for the mean function based on polynomial splines, together with robust simultaneous confidence bands and the asymptotic properties of the proposed robust estimator. The robust simultaneous confidence band is also extended to the difference of the mean functions of two populations. The performance and robustness of the proposed methods are demonstrated by an extensive simulation study and real data examples.

Wednesday, November 1, 2017

Data Science Applications in Genomics and Precision Medicine

Time: 11:00 a.m.

Speaker: Ramana V Davuluri, PhD, Professor of Preventive Medicine, Health and Biomedical Informatics, Northwestern University, Feinberg School of Medicine

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: Given that biomedical research is rapidly acquiring the character of BIG DATA, with rapid accumulation of datasets on genes, proteins and other molecules, data science applications are playing an increasingly important role in the analysis and interpretation of these large files, from the discovery phase to clinical applications. Bioinformaticians have successfully applied data science skills since the early days of human genome sequencing, for example in the prediction of genes in assembled genomes or genomic contigs. I will discuss some of the applications our group has successfully used in the prediction of (a) gene promoters in the human genome, (b) gene regulatory signals that are altered in breast cancer, (c) molecular grouping of brain tumor patients, and (d) functional roles of germline single nucleotide variants that are associated with prostate cancer. I will also discuss various data science issues, for example: (a) processing of unstructured data to prepare the data matrices; (b) clustering of samples based on gene expression data; (c) feature selection; and (d) classification algorithms.