Winter 2024 Seminar Series

Department of Statistics and Data Science 2023-2024 Seminar Series - Winter 2024

The 2023-2024 Seminar Series will primarily be in person, but some talks will be offered virtually using Zoom. Virtual talks will be clearly designated, and registration is required to receive the Zoom link for the event. Please email Kisa Kowal at k-kowal@northwestern.edu if you have questions.

Seminar Series talks are free and open to faculty, graduate students, and advanced undergraduate students.

Sharper Risk Bounds for Statistical Aggregation

Friday, January 12, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: Nikita Zhivotovskiy, Assistant Professor, Department of Statistics, University of California, Berkeley

Abstract: In this talk, we revisit classical results in the theory of statistical aggregation, focusing on the transition from global complexity to a more manageable local one. The goal of aggregation is to combine several base predictors to achieve a prediction nearly as accurate as the best one, without assumptions on the class structure or target. Though aggregation has been studied in both sequential and statistical settings, results in both settings traditionally use the same “global” complexity measure. We highlight the lesser-known PAC-Bayes localization, which enables us to prove a localized bound for the exponential weights estimator and a deviation-optimal localized bound for Q-aggregation. Finally, we demonstrate that our improvements allow us to obtain bounds based on the number of near-optimal functions in the class, and achieve polynomial improvements in sample size in certain nonparametric situations. This is contrary to the common belief that localization doesn’t benefit nonparametric classes. Joint work with Jaouad Mourtada and Tomas Vaškevičius.

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

https://planitpurple.northwestern.edu/event/608737

Phylogenomics: Some Identifiability Results

Friday, January 19, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: Sebastien Roch, Professor, Department of Mathematics, University of Wisconsin-Madison

Abstract: The estimation of species phylogenies from genome-scale data is an important step in modern evolutionary studies. This estimation is complicated by the fact that genes evolve under biological processes that produce discordant trees. Such processes include horizontal gene transfer (HGT), gene duplication and loss (GDL), and incomplete lineage sorting (ILS), all of which can be modeled using random tree distributions. I will discuss recent results on the identifiability of these complex probabilistic models.

I will focus in particular on theoretical results for probabilistic models of HGT. Prior work has suggested the possibility of a “phase transition”, whereby reconstruction of the species tree may become significantly harder when the rate of transfer is high enough. I will report on recent work showing that, in fact, the species tree is identifiable for any rate of transfer, answering an open question in this area. Time permitting, I will also discuss the case of GDL.

No biology background will be assumed. 

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

https://planitpurple.northwestern.edu/event/609376

From 20GB to 100TB: a journey on the metagenomic road

Friday, February 9, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: Zhong Wang, Computational Biologist, Genome Analysis Group Lead, Lawrence Berkeley National Lab

Abstract: Metagenomics has revolutionized our understanding of microbial functions, ecology, and evolution. Unraveling the complexity of environmental microbial communities demands numerous gigabases, or even terabases, of sequence data, thereby posing extraordinary computational challenges associated with data analysis. In this talk, I will chronicle our journey, navigating through the challenges presented by initially modest 20GB datasets to our current capability of handling substantial 100TB experiments. This journey entailed more than just augmenting storage or computational power; it also involved innovative thinking, experimentation with hardware scaling solutions, and the development of scalable software tools designed for immense datasets. By sharing our experiences, successes, and failures, this presentation aims to offer insights and strategies to fellow biologists and bioinformaticians navigating the rapidly expanding sea of metagenomic data.

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

https://planitpurple.northwestern.edu/event/609733

Canceled

Friday, February 16, 2024

BET and BELIEF

Friday, February 23, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: Kai Zhang, Associate Professor, Department of Statistics and Operations Research, University of North Carolina, Chapel Hill

Abstract: We study the problem of distribution-free dependence detection and modeling through the new framework of binary expansion statistics (BEStat). Binary expansion testing (BET) avoids the problem of non-uniform consistency and improves upon a wide class of commonly used methods (a) by achieving the minimax rate in sample size requirement for reliable power and (b) by providing clear interpretations of global relationships upon rejection of independence. The binary expansion approach also connects the symmetry statistics with the current computing system to facilitate efficient bitwise implementation. Modeling with the binary expansion linear effect (BELIEF) is motivated by the fact that two linearly uncorrelated binary variables must also be independent. Inferences from BELIEF are easily interpretable because they describe the association of binary variables in the language of linear models, yielding convenient theoretical insight and striking parallels with the Gaussian world. With BELIEF, one may study generalized linear models (GLM) through transparent linear models, providing insight into how modeling is affected by the choice of link. We explore these phenomena and provide a host of related theoretical results. This is joint work with Benjamin Brown and Xiao-Li Meng.

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

https://planitpurple.northwestern.edu/event/610555

Approximate Co-sufficient Sampling with Regularization

Friday, March 1, 2024

Time: 11:00 a.m. to 12:00 p.m. central time

Location: Ruan Conference Room – lower level (Chambers Hall 600 Foster Street)

Speaker: Wanrong Zhu, Final-year PhD student, Department of Statistics, University of Chicago

Abstract: Goodness-of-fit (GoF) testing is ubiquitous in statistics and is applicable in many areas, for example, conditional independence testing, model selection, multiple testing, etc. We consider the problem of GoF testing for parametric models – testing whether observed data comes from some parametric null model. This testing problem involves a composite null hypothesis, due to the unknown values of the model parameters. In some special cases, co-sufficient sampling (CSS) can remove the influence of these unknown parameters via conditioning on a sufficient statistic—often, the maximum likelihood estimator (MLE) of the unknown parameters. However, many common parametric settings (including logistic regression) do not permit this approach, since conditioning on a sufficient statistic leads to a powerless test. The recent approximate co-sufficient sampling (aCSS) framework offers an alternative, replacing sufficiency with an approximately sufficient statistic (namely, a noisy version of the MLE). This approach recovers power in a range of settings where CSS cannot be applied, but can only be applied in settings where the unconstrained MLE is well-defined and well-behaved, which implicitly assumes a low-dimensional regime. In this talk, we extend aCSS to the setting of constrained and penalized maximum likelihood estimation, so that more complex estimation problems can now be handled within the aCSS framework, including examples such as mixtures-of-Gaussians (where the unconstrained MLE is not well-defined due to degeneracy) and high-dimensional Gaussian linear models (where the MLE can perform well under regularization, such as an ℓ1 penalty or a shape constraint).

This talk will be given in person on Northwestern's Evanston campus at the location listed above.

https://planitpurple.northwestern.edu/event/610556