Winter 2026 Seminar Series: Department of Statistics and Data Science


Department of Statistics and Data Science 2025-2026 Seminar Series - Winter 2026

The 2025-2026 Seminar Series will be held primarily in person, but some talks will be offered virtually over Zoom. Virtual talks will be clearly designated, and registration is required to receive the Zoom link for the event. Please email Kisa Kowal at k-kowal@northwestern.edu if you have questions.

Seminar Series talks are free and open to faculty, graduate students, and advanced undergraduate students.

Towards the Last Mile of Artificial General Intelligence: Open-World Long-Tailed Learning in Theory and Practice

Friday, January 23, 2026

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Dawei Zhou, Assistant Professor, Department of Computer Science, Virginia Tech

Abstract: Artificial General Intelligence (AGI) represents the next generation of AI that can match or exceed human intelligence across a wide spectrum of tasks. Despite remarkable advances, today’s AI systems succeed mainly in data-rich, well-structured settings such as identifying common objects, summarizing routine content, or responding to typical queries. They struggle precisely where intelligence matters most: rare, high-stakes, and context-dependent scenarios such as scientific discovery, open-world cybersecurity, and rare disease diagnosis. We argue that this shortfall defines the Last Mile Problem on the path to AGI, which we frame as Open-World Long-Tailed Learning (OpenLT): how can we enable AI systems to reason, adapt, and generalize across underrepresented, evolving, and open-ended domains? In this talk, I will discuss our group’s recent work in three areas: (1) OpenLT Characterization (how can we systematically characterize and uncover novel, complex patterns in open-world data?); (2) OpenLT Adaptation (how can AI models be effectively adapted to open and dynamic environments?); and (3) OpenLT Application and Deployment, where I will use the application of scientific hypothesis generation for 3D metamaterial design to discuss our proposed techniques and theoretical results for open-world long-tailed learning. Finally, I will close with thoughts on how addressing the Last Mile Problem can shape the next decade of AGI research and move us closer to systems that truly understand and operate in the open world.

This talk will be given in person on Northwestern's Evanston campus.

planitpurple.northwestern.edu/event/636717

TBA

Friday, February 13, 2026

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Suqi Liu, Assistant Professor, Department of Statistics, University of California, Riverside

Abstract: TBA

This talk will be given in person on Northwestern's Evanston campus.

Deep Survival Learning for Kidney Transplantation: Knowledge Distillation and Data Integration

Friday, February 20, 2026

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Kevin He, Associate Professor of Biostatistics and Associate Director of the Kidney Epidemiology and Cost Center (KECC), University of Michigan

Abstract: Prognostic prediction using survival analysis faces challenges due to complex relationships between risk factors and time-to-event outcomes. Deep learning methods have shown promise in addressing these challenges, but their effectiveness often relies on large datasets. When applied to small or moderately sized datasets, deep models frequently encounter limitations such as insufficient training data, overfitting, and difficulty in hyperparameter optimization. To mitigate these issues and enhance prognostic performance, this talk presents a flexible deep learning framework that integrates external risk scores with internal time-to-event data through a generalized Kullback–Leibler divergence regularization term. Applied to national kidney transplant data, the proposed method demonstrates improved prediction of short-term mortality and graft failure following kidney transplantation by distilling and transferring prior knowledge from pre-policy-change teacher models to newly arrived post-policy-change cohorts.

This talk will be given in person on Northwestern's Evanston campus.

planitpurple.northwestern.edu/event/636129

What functions does XGBoost learn?

Friday, February 27, 2026

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall, 600 Foster Street)

Speaker: Aditya Guntuboyina, Associate Professor, Department of Statistics, University of California, Berkeley

Abstract: We develop a theoretical framework that explains what kinds of functions XGBoost is able to learn. We introduce an infinite-dimensional function class that extends ensembles of shallow decision trees, along with a natural measure of complexity that generalizes the regularization penalty built into XGBoost. We show that this complexity measure aligns with classical notions of variation—in one dimension it corresponds to total variation, and in higher dimensions it is closely tied to a well-known concept called Hardy–Krause variation. We prove that the best least-squares estimator within this class can always be represented using a finite number of trees, and that it achieves a nearly optimal statistical rate of convergence, avoiding the usual curse of dimensionality. Our work provides the first rigorous description of the function space that underlies XGBoost, clarifies its relationship to classical ideas in nonparametric estimation, and highlights an open question: does the actual XGBoost algorithm itself achieve these optimal guarantees? This is joint work with Dohyeong Ki at UC Berkeley.

This talk will be given in person on Northwestern's Evanston campus.

planitpurple.northwestern.edu/event/636130
