Spring 2024 Seminar Series

Department of Statistics and Data Science 2023-2024 Seminar Series - Spring 2024

The 2023-2024 Seminar Series will primarily be in person, but some talks will be offered virtually using Zoom. Virtual talks will be clearly designated, and registration is required to receive the Zoom link for the event. Please email Kisa Kowal at k-kowal@northwestern.edu if you have questions.

Seminar Series talks are free and open to faculty, graduate students, and advanced undergraduate students.

Large Language Models to understand biomedical text

Friday, April 5, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Online - talk will be presented on Zoom, registration is required to receive the link (see below)

Speaker: Yuan Luo, Director, Institute for Artificial Intelligence in Medicine - Center for Collaborative AI in Healthcare; Associate Professor of Preventive Medicine (Health and Biomedical Informatics), McCormick School of Engineering and Pediatrics

Abstract: Large language models, such as transformer-based models, have been wildly successful in setting state-of-the-art benchmarks on a broad range of natural language processing (NLP) tasks, including question answering (QA), document classification, machine translation, text summarization, and others. Recently, the release of OpenAI’s free tool ChatGPT demonstrated the ability of large language models to generate content, prompting both anticipation of its possible uses and concern about potential controversies. The ethical and acceptable boundaries of ChatGPT’s use in scientific writing remain unclear. I will talk about our research on exploring large language models, e.g., long-sequence transformers and GPT-style models, in the clinical and biomedical domains. Our work examines the adaptability of these large language models to a series of clinical NLP tasks, including clinical inferencing, biomedical named entity recognition, EHR-based question answering, and interoperability.

Register here

TBA

Friday, April 12, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: Andrea Montanari, Robert and Barbara Kleist Professor in the School of Engineering and Professor, Department of Electrical Engineering, Department of Statistics, Stanford University  

Abstract: TBA

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

TBA

Friday, April 19, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: TBA

Abstract: TBA

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

t-SNE and Local 1D Structures

Friday, April 26, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: Anna Ma, Assistant Professor, Department of Mathematics, UC Irvine

Abstract: Data visualization is a vital task in data exploration, especially in the presence of large-scale data sets. Rudimentary approaches for data visualization, such as scatter plots, histograms, and pie charts, can only represent a small number of features (typically one or two) at a time. Furthermore, such methods often lack the sophistication to capture higher-dimensional structures in their representations. Fortunately, new approaches to high-dimensional data visualization, such as the t-distributed stochastic neighbor embedding (t-SNE) algorithm, have been proposed in recent years. One of t-SNE’s more interesting properties is its tendency to preserve local linear data structures while successfully representing clusterable data. Despite its wide success, there is limited mathematical understanding of the algorithm. In this talk, we will discuss the t-SNE algorithm and present theoretical guarantees for t-SNE’s output to answer the question: does t-SNE preserve 1-dimensional curves?

The work presented is joint with Kat Dover and Roman Vershynin.

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

https://planitpurple.northwestern.edu/event/612111

TBA

Friday, May 3, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: Andrew Gelman, Professor of Statistics and Political Science, Columbia University

Abstract: TBA

This talk will be given in person on Northwestern's Evanston campus at the location listed above.


T-Stochastic Graphs

Friday, May 10, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: Karl Rohe, Professor of Statistics, University of Wisconsin–Madison

Abstract: Previous statistical approaches to hierarchical clustering for social network analysis all construct an "ultrametric" hierarchy. While the assumption of ultrametricity has been discussed and studied in the phylogenetics literature, it has not yet been acknowledged in the social network literature. We show that "non-ultrametric structure" in the network introduces significant instabilities in the existing top-down recovery algorithms. To address this issue, we introduce an instability diagnostic plot and use it to examine a collection of empirical networks. These networks appear to violate the "ultrametric" assumption. We propose a deceptively simple class of probabilistic models called T-Stochastic Graphs which impose no topological restrictions on the latent hierarchy. Perhaps surprisingly, this model generalizes the previous models. To illustrate this model, we propose six alternative forms of hierarchical network models and then show that all six are equivalent to the T-Stochastic Graph model. These alternative models motivate a novel approach to hierarchical clustering that combines spectral techniques with the well-known Neighbor-Joining algorithm from phylogenetic reconstruction. We prove this spectral approach is statistically consistent.

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

https://planitpurple.northwestern.edu/event/612112

An Automatic Finite-Sample Robustness Check: Can Dropping a Little Data Change Conclusions?

Friday, May 17, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: Tamara Broderick, Associate Professor, Department of Electrical Engineering and Computer Science, MIT

Abstract: Practitioners will often analyze a data sample with the goal of applying any conclusions to a new population. For instance, if economists conclude microcredit is effective at alleviating poverty based on observed data, policymakers might decide to distribute microcredit in other locations or future years. Typically, the original data is not a perfect random sample from the population where policy is applied, but researchers might feel comfortable generalizing anyway so long as deviations from random sampling are small, and the corresponding impact on conclusions is small as well. Conversely, researchers might worry if a very small proportion of the data sample was instrumental to the original conclusion. So we propose a method to assess the sensitivity of statistical conclusions to the removal of a very small fraction of the data set. Manually checking all small data subsets is computationally infeasible, so we propose an approximation based on the classical influence function. Our method is automatically computable for common estimators. We provide finite-sample error bounds on approximation performance and a low-cost exact lower bound on sensitivity. We find that sensitivity is driven by a signal-to-noise ratio in the inference problem, does not disappear asymptotically, and is not decided by misspecification. Empirically, we find that many data analyses are robust, but the conclusions of several influential economics papers can be changed by removing (much) less than 1% of the data.

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

https://planitpurple.northwestern.edu/event/612113

TBA

Friday, May 24, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: TBA

Abstract: TBA

This talk will be given in person on Northwestern's Evanston campus at the location listed above.