Spring 2026 Seminar Series
Department of Statistics and Data Science 2025-2026 Seminar Series - Spring 2026
The 2025-2026 Seminar Series will be held primarily in person, but some talks will be offered virtually via Zoom. Virtual talks will be clearly designated, and registration will be required to receive the Zoom link for the event. Please email Kisa Kowal at k-kowal@northwestern.edu if you have questions.
Seminar Series talks are free and open to faculty, graduate students, and advanced undergraduate students.
Feature learning and "the linear representation hypothesis" for monitoring and steering LLMs
Friday, April 17, 2026
Time: 11:00 a.m. to 12:00 p.m. central time
Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)
Speaker: Mikhail Belkin, HDSI Endowed Chair Professor in AI, Halicioglu Data Science Institute, University of California San Diego
Abstract: A trained Large Language Model (LLM) contains much of human knowledge. Yet, it is difficult to gauge the extent or accuracy of that knowledge, as LLMs do not always "know what they know" and may even be unintentionally or actively misleading. In this talk I will discuss feature learning, introducing Recursive Feature Machines, a powerful method originally designed for extracting relevant features from tabular data. I will demonstrate how this technique enables us to detect and precisely guide LLM behaviors toward almost any desired concept by manipulating a single fixed vector in the LLM activation space.
This talk will be given in person on Northwestern's Evanston campus.
planitpurple.northwestern.edu/event/639970
Representation learning with iteratively reweighted kernel machines
Friday, May 1, 2026
Time: 11:00 a.m. to 12:00 p.m. central time
Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)
Speaker: Dmitriy Drusvyatskiy, Professor and HDSI Faculty Fellow, Halıcıoğlu Data Science Institute (HDSI), University of California San Diego
Abstract: The impressive practical performance of neural networks is often attributed to their ability to learn low-dimensional data representations and hierarchical structure directly from data. In this work, we argue that these two phenomena are not unique to neural networks, and surprisingly can be elicited from classical kernel methods. Namely, we show that the derivative of the kernel predictor can detect the influential coordinates with low sample complexity. Moreover, by iteratively using the derivatives to reweight the data and retrain kernel machines, one is able to efficiently learn hierarchical polynomials in a high-dimensional regime. I will illustrate the developed theory with numerical experiments on both synthetic and real data sets.
This talk will be given in person on Northwestern's Evanston campus.
planitpurple.northwestern.edu/event/640122
Words matter: Multimodal Suicide Risk Prediction from Veterans Health Administration Clinical Notes
Friday, May 8, 2026
Time: 11:00 a.m. to 12:00 p.m. central time
Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)
Speaker: Jiang Gui, Associate Professor, Biomedical Data Science, Dartmouth College
Abstract: In this talk, we demonstrate that integrating unstructured clinical narratives with structured electronic health record (EHR) data enhances suicide risk prediction for U.S. Veterans, outperforming models that rely on structured data alone. By analyzing a retrospective matched case-control cohort of 4,584 Veterans who died by suicide and 22,657 controls, we compared traditional count-based text features against pretrained contextual large language model (LLM) embeddings, such as Clinical Longformer and BioClinicalBERT. We found that while Adaptive Mixture Categorization (AMC) improves the utility of skewed linguistic data, contextual LLM embeddings consistently provide comparable or superior predictive power, particularly within low- and moderate-risk tiers where structured indicators may be less obvious. Our multimodal approach, which integrated 66 structured patient characteristics with text features, yielded substantial performance gains, increasing AUROC by approximately 0.07–0.11 across various risk tiers and time windows. Furthermore, our temporal analysis revealed that while long-term data (270 days) is most informative for low-risk patients, short-term windows (<30 days) are critical for high-risk individuals. Using SHAP-based interpretability and topic modeling, we identified clinically coherent themes that shift semantically as risk increases, providing a context-aware framework for improving suicide prevention efforts within the Veterans Health Administration.
This talk will be given in person on Northwestern's Evanston campus.
planitpurple.northwestern.edu/event/639974
Curing AI Issues at the Source: The Power of Data-Centric Learning
Friday, May 15, 2026
Time: 11:00 a.m. to 12:00 p.m. central time
Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)
Speaker: Yanjie Fu, Associate Professor, School of Computing and Augmented Intelligence, Arizona State University
Abstract: While modern machine learning models have achieved remarkable performance, they remain highly vulnerable to systemic issues such as poor generalization, bias, overfitting, domain shift, and adversarial attacks. Traditionally, the research community has focused heavily on model-centric improvements; however, practical deployment increasingly demands a shift toward a Data-Centric AI paradigm. Drawing inspiration from gene editing, where genetic codes are modified to cure diseases, this talk introduces the concept of "Data Reshaping." By using AI to systematically edit and reconstruct data into an optimal, task-specific data shape, we can cure AI issues at their source and boost downstream predictive accuracy. In this talk, we will navigate the landscape of data-centric learning (problems, methods, emerging opportunities) and explore my journey from reinforcement data reshaping, to generative data reshaping, to LLM and agentic data reshaping.
This talk will be given in person on Northwestern's Evanston campus.