Winter 2018 Seminar Series

Wednesday, January 24, 2018

Adversarial Machine Learning - Big Data Meets Cyber Security

Time: 11:00 a.m.

Speaker: Bowei Xi, Associate Professor of Statistics, Department of Statistics, Purdue University

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: As more and more cyber security incident data, ranging from system logs to vulnerability scan results, are collected, machine learning techniques are becoming an essential tool for real-world cyber security applications. One of the most important differences between cyber security and many other applications is the existence of malicious adversaries that actively adapt their behavior to make the existing learning models ineffective. Unfortunately, traditional learning techniques are insufficient to handle such adversarial problems directly. The adversaries adapt to the defender's reactions, and learning algorithms constructed based on the current training dataset degrade quickly. To address these concerns, we develop a game-theoretic framework to model the sequential actions of the adversary and the defender, while both parties try to maximize their utilities. We also develop an adversarial support vector machine method and an adversarial clustering algorithm to defend against active adversaries.

 

Wednesday, February 7, 2018

A Sparse Clustering Algorithm for Identifying Cluster Changes Across Conditions with Applications in Single-cell RNA-sequencing Data

Time: 11:00 a.m.

Speaker: Jun Li, Associate Professor, Department of Applied and Computational Mathematics and Statistics, University of Notre Dame

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: Clustering analysis, in its traditional setting, identifies groupings of samples from a single population/condition. We consider a different setting in which the data available are samples from two different conditions, such as cells before and after drug treatment. Cell types in cell populations change as the condition changes: some cell types die out, new cell types may emerge, and surviving cell types evolve to adapt to the new condition. Using single-cell RNA-sequencing data that measure the gene expression of cells before and after the condition change, we propose an algorithm, SparseDC, which identifies cell types, traces their changes across conditions, and identifies marker genes for these changes. By solving a unified optimization problem, SparseDC completes all three tasks simultaneously. As a general algorithm that detects shared/distinct clusters for two groups of samples, SparseDC can be applied to problems outside the field of biology.

 

Wednesday, February 14, 2018

Statistical Learning for Time Dependent Data

Time: 11:00 a.m.

Speaker: Likai Chen, PhD candidate, Department of Statistics, University of Chicago

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: In statistical learning theory, researchers primarily deal with independent data, for which there is a huge literature. In comparison, the theory has been much less investigated for time-dependent data, which are commonly encountered in economics, engineering, finance, geography, physics and other fields. In this talk, we focus on concentration inequalities for suprema of empirical processes, which play a fundamental role in statistical learning theory. We derive a Gaussian approximation and an upper bound for the tail probability of the suprema under conditions on the size of the function class, the sample size, temporal dependence and the moment conditions of the underlying time series. Due to the dependence and heavy-tailedness, our tail probability bound is substantially different from the classical exponential bounds obtained under the independence assumption in that it involves an extra polynomially decaying term. We allow both short- and long-range dependent processes; the long-range dependence case has not previously been explored. We show that our tail probability inequality is sharp up to a multiplicative constant. These bounds work as theoretical guarantees for statistical learning applications under dependence.

 

Wednesday, February 28, 2018

From Integrative Genomics to Therapeutic Discovery in Cancer Immunotherapies

Time: 11:00 a.m.

Speaker: Riyue Bao, Research Assistant Professor, Center for Research Informatics & Department of Pediatrics, University of Chicago

Place: Basement classroom - B02, Department of Statistics, 2006 Sheridan Road

Abstract: Anti-PD1-based immunotherapy has had a major impact on treatment of multiple cancer histologies. However, only a subset of patients responds to these treatments, and a beneficial outcome is frequently observed in patients with a spontaneous pre-existing T-cell response against their tumor. Therefore, identifying variables that could contribute to the differences in patients’ response and the underlying mechanisms will enable development of therapeutic solutions for patients lacking a beneficial tumor microenvironment. We use genomics approaches including RNAseq, whole exome sequencing, 16S ribosomal RNA amplicon sequencing, and metagenomic shotgun sequencing to discover (1) tumor-intrinsic oncogenic pathways that drive immune exclusion in non-responders, (2) the gut microbiome associated with anti-PD1 efficacy in metastatic melanoma patients, and (3) the intratumor microbiome associated with survival in neuroblastoma patients, towards the ultimate goal of developing immune-potentiating interventions in combination with checkpoint inhibitors for improved clinical outcomes.