CIS 6200: A Course at the University of Pennsylvania
Lectures 2024
Lecture 1: Course Overview Slides. We went over what we will cover in this class and tried to make it sound interesting, important, and profound. Video here.
Lecture 2: In this lecture we define the basic notation and settings we will be using for the class, define squared error (Brier Score) and marginal mean consistency error (statistical bias), and show that for any model, uniformly shifting it to remove statistical bias reduces squared error by exactly the square of the magnitude of the shift. Video here.
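The debiasing identity from this lecture is easy to check numerically. A quick illustrative sketch (the toy model and data below are made up, not from the lecture): shifting a model by its statistical bias reduces in-sample squared error by exactly the squared bias.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=10_000)                          # labels
f = 0.7 * y + 0.4 + 0.3 * rng.normal(size=y.size)   # a (biased) model's predictions

bias = (f - y).mean()    # statistical bias: mean of (f - y)
g = f - bias             # uniformly shifted (debiased) model

mse_f = ((f - y) ** 2).mean()
mse_g = ((g - y) ** 2).mean()

# The improvement equals the squared bias exactly -- an algebraic identity,
# since (f-y)^2 - (f-b-y)^2 averages to 2b*b - b^2 = b^2.
print(mse_f - mse_g, bias ** 2)
```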
Lecture 3: In this lecture we define quantiles and pinball loss, and show that if we shift a model to fix marginal quantile consistency error, we reduce its pinball loss. A special case of this is that pinball loss elicits quantiles. We also introduce the Hoeffding bound and use it to upper bound statistical bias (out of sample) for a model that was debiased in sample. Video here.
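The fact that pinball loss elicits quantiles can also be seen empirically. A minimal sketch (toy exponential data, grid search in place of a real optimizer): the constant prediction minimizing average pinball loss at level q lands on the empirical q-quantile.

```python
import numpy as np

def pinball(pred, y, q):
    # pinball (quantile) loss at level q
    diff = y - pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

rng = np.random.default_rng(1)
y = rng.exponential(size=5_000)
q = 0.9

# minimize average pinball loss over a grid of constant predictions
grid = np.linspace(0, 6, 2001)
losses = [pinball(t, y, q) for t in grid]
best = grid[int(np.argmin(losses))]

# the minimizer is (approximately) the empirical 0.9-quantile
print(best, np.quantile(y, q))
```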
Lecture 4: We prove out of sample bounds for correcting marginal quantile consistency error (and in doing so introduce the DKW inequality), and then give a very silly algorithm that gets very low statistical bias in online adversarial label prediction. It's so silly that we all realize that marginal guarantees are not strong enough and demand more. Video here.
Lecture 5: In this lecture we give a simple algorithm that guarantees marginal quantile consistency in the online adversarial setting, and then turn to conformal prediction. We introduce the basic idea of conformal prediction: a one-parameter family of prediction sets defined by a non-conformity score and a quantile threshold, and go through a couple of illustrative examples. Video here.
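One natural algorithm of this flavor is an online gradient step on pinball loss: raise the threshold after each miss, lower it after each cover. A minimal sketch (i.i.d. Gaussian labels here just for illustration; the point of the lecture is that the guarantee holds even against an adversary):

```python
import numpy as np

rng = np.random.default_rng(2)
q, eta, T = 0.9, 0.05, 20_000
tau = 0.0
covered = 0
for _ in range(T):
    y = rng.normal()                 # an adversary could also pick y_t
    cover = float(y <= tau)
    covered += cover
    # online gradient step on pinball loss:
    # move up by eta*q on a miss, down by eta*(1-q) on a cover
    tau += eta * (q - cover)

print(covered / T)   # empirical coverage is close to q
```

The telescoping sum of updates bounds the coverage error by (tau_T - tau_0)/(eta * T), which vanishes as T grows.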
Lecture 6: In this class we prove basic guarantees for split conformal prediction, both in expectation over the calibration set and with high probability over the calibration set. Video here.
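Split conformal is short enough to sketch end to end. An illustrative toy (a fixed "pretrained" linear model and absolute-residual non-conformity score, all made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(3)

def model(x):
    # stands in for a fixed, pretrained model
    return 2.0 * x

# calibration set
n, alpha = 1000, 0.1
x_cal = rng.uniform(0, 1, n)
y_cal = 2.0 * x_cal + rng.normal(0, 0.3, n)
scores = np.abs(y_cal - model(x_cal))        # non-conformity scores

# conformal threshold: the ceil((n+1)(1-alpha))-th smallest score
k = int(np.ceil((n + 1) * (1 - alpha)))
qhat = np.sort(scores)[k - 1]

# prediction set for x is [model(x) - qhat, model(x) + qhat]
x_test = rng.uniform(0, 1, 50_000)
y_test = 2.0 * x_test + rng.normal(0, 0.3, x_test.size)
coverage = np.mean(np.abs(y_test - model(x_test)) <= qhat)
print(coverage)   # at least 1 - alpha in expectation over the calibration set
```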
Lecture 7: We finish marginal conformal prediction by showing how to use our algorithm for marginal quantile consistency in the online adversarial setting to obtain prediction sets with marginal coverage from any non-conformity score in the online adversarial setting. Then we begin calibration, and introduce different ways of measuring calibration error, and relate them. Video here.
Lecture 8: We analyze two calibration algorithms: an iterative one that will generalize well to satisfying other conditional bias bounds we will study in the future, and a very simple calibration procedure called “Histogram Binning” that is tailored to calibration, but that among other things lets us prove that the squared error of a model decomposes into the sum of its calibration error and its refinement score. We show that calibrating a model reduces its squared error by exactly its average squared calibration error. Video here.
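The exact error-reduction identity is easy to verify in code. A toy sketch (a model that predicts one of finitely many values, with a made-up miscalibrated label distribution): calibrating each level set to its empirical label mean reduces squared error by exactly the average squared calibration error.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 100_000, 10
# a model that predicts one of m discrete values (bin midpoints)
f = (rng.integers(0, m, n) + 0.5) / m
y = (rng.uniform(0, 1, n) < f ** 2).astype(float)   # true rate is f^2: miscalibrated

# histogram binning: replace each predicted value by the
# empirical label mean on its level set
g = np.empty_like(f)
for v in np.unique(f):
    mask = f == v
    g[mask] = y[mask].mean()

mse_f = ((f - y) ** 2).mean()
mse_g = ((g - y) ** 2).mean()
cal_sq = ((f - g) ** 2).mean()   # average squared calibration error of f

print(mse_f - mse_g, cal_sq)     # equal, up to floating point
```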
Lecture 9: We study the decision theoretic properties of calibration. In particular, we show that it is a natural interface between prediction and downstream decision making. We prove that simultaneously for every downstream decision maker, choosing the action that maximizes the decision maker’s expected utility under the forecaster’s predicted distribution is optimal amongst all policies mapping forecasts to actions — if the forecasts are calibrated. Video here.
Lecture 10: We give an algorithm to post-process a quantile predictor to be quantile calibrated in a way that only improves its pinball loss. We then move on to online (mean) calibration, and discuss our general approach of deriving an algorithm by viewing it as playing a game against an adversary. We’ll derive the algorithm next lecture. Video here.
Lecture 11: In this class we derive and analyze an algorithm for obtaining diminishing calibration error in a sequential adversarial environment. Video here.
Lecture 12: In this lecture we give an iterative algorithm that is able to take as input an arbitrary model f, and output a more accurate model that has no statistical bias not just overall, but conditional on membership in any group in a prespecified collection of groups, which may intersect. Video here.
Lecture 13: We show how to find models that have no statistical bias, even conditional on group membership in an arbitrary set of groups by solving a linear regression problem over features defined by the indicator functions of the groups. Similarly we show how to find models that predict quantiles covering their target labels at the correct rate, even conditional on membership in an arbitrary set of groups, by solving a quantile regression problem over features defined by the indicator functions of the groups. We wave our hands vigorously at out of sample generalization. Video here.
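The regression trick for the mean case fits in a few lines. An illustrative sketch (toy overlapping groups and a made-up biased model): because the OLS normal equations force the residual to be orthogonal to every feature, regressing on group indicators zeroes out the bias conditional on each group.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
# two overlapping groups, given by binary indicators
g1 = rng.integers(0, 2, n)
g2 = rng.integers(0, 2, n)
f = rng.normal(size=n)                                  # an initial model
y = f + 0.5 * g1 - 0.3 * g2 + rng.normal(0, 0.2, n)     # f is biased on each group

# regress the residual y - f on the group indicators (plus intercept),
# then add the fitted correction back to f
X = np.column_stack([np.ones(n), g1, g2])
coef, *_ = np.linalg.lstsq(X, y - f, rcond=None)
h = f + X @ coef

for g in (g1, g2):
    print((h - y)[g == 1].mean())   # ~0: no statistical bias on either group
```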
Lecture 14: In this lecture we show how to solve any loss minimization problem with groupwise optimality guarantees with respect to some hypothesis class H — i.e. the error of the learned model conditional on membership in group g is competitive with the best model trained on group g alone, simultaneously for all (possibly intersecting) groups in a set G. We solve this problem by reducing it to the standard (non-conditional) learning problem over the benchmark class. Video here.
Lecture 15: We finish up loss minimization with conditional guarantees. Video here.
Lecture 16: In this lecture we define multicalibration, and connect it to downstream decision making. A predictor that is multicalibrated with respect to the level sets of a class of policies mapping features to actions has the property that for any downstream decision maker (no matter their utility function), best responding to the predictor’s forecasts gives them higher utility than any policy in the class. Video here.
Lecture 17: In this lecture we derive an algorithm to post-process any model in the batch/distributional setting into a model that is only more accurate (as measured by squared error), and is multicalibrated with respect to any given grouping. Video here.
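A toy sketch of the patch-style post-processing dynamic (a simplified variant with made-up data, bucketed predictions, and a crude violation search, not the lecture's exact algorithm): repeatedly find a (group, prediction-bucket) cell with large calibration error and shift the predictions on that cell to the cell's empirical label mean. Each patch strictly reduces squared error, so the process terminates.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, alpha = 50_000, 10, 0.05
x = rng.uniform(0, 1, n)
groups = [x < 0.5, x >= 0.3]                      # two overlapping groups
y = (rng.uniform(0, 1, n) < x).astype(float)      # true label rate is x
f = np.full(n, 0.5)                               # start from a trivial predictor

def find_violation(f):
    # search for a (group, prediction-bucket) cell with calibration error > alpha
    buckets = np.minimum((f * m).astype(int), m - 1)
    for g in groups:
        for b in range(m):
            mask = g & (buckets == b)
            if mask.sum() >= n // (4 * m) and abs(y[mask].mean() - f[mask].mean()) > alpha:
                return mask
    return None

for _ in range(5000):    # each patch cuts MSE by at least alpha^2 / (4m), so this suffices
    mask = find_violation(f)
    if mask is None:
        break
    # patch: shift the cell's predictions to match its empirical label mean
    f[mask] += y[mask].mean() - f[mask].mean()
    f = np.clip(f, 0.0, 1.0)

mse = ((f - y) ** 2).mean()
print(mse)   # below the 0.25 of the constant predictor we started from
```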
Lecture 18: Turning to the online adversarial setting, we begin building up machinery that will be useful to us. In this lecture we define and analyze the multiplicative weights algorithm for competing with the best fixed action in adversarial environments. Video here.
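Multiplicative weights is only a few lines. A minimal sketch (random losses stand in for an adversary; the constants are the standard textbook choices, not necessarily the lecture's): with learning rate sqrt(log K / T), regret against the best fixed action grows like sqrt(T log K).

```python
import numpy as np

rng = np.random.default_rng(7)
K, T = 5, 5_000
eta = np.sqrt(np.log(K) / T)        # standard learning-rate choice

weights = np.ones(K)
total_loss, cum_losses = 0.0, np.zeros(K)
for t in range(T):
    p = weights / weights.sum()
    losses = rng.uniform(0, 1, K)   # an adversary could pick these after seeing p
    total_loss += p @ losses        # expected loss of the randomized action
    cum_losses += losses
    weights *= np.exp(-eta * losses)   # multiplicative update

regret = total_loss - cum_losses.min()
print(regret)   # sublinear: on the order of sqrt(T log K), not T
```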
Lecture 19: In this lecture we reduce online convex optimization to online linear optimization, define zero sum games, their minimax and maximin values, and talk up the minimax theorem, which we’ll prove next lecture. Video here.
Lecture 20: In this lecture we prove the minimax theorem by reduction to online convex optimization. We then sketch the minimax proof of a remarkable theorem: There is no empirical test (no matter how sophisticated) that can distinguish an “informed forecaster” (who knows a time varying probabilistic process perfectly, and truthfully forecasts it) from a forecaster running an online adversarial algorithm (with no knowledge of the true generative process) designed only to pass the test. Video here.
Lecture 21: In this lecture we prove that any empirical test aimed at distinguishing an informed forecaster (who knows the outcome distribution at each round) from an uninformed forecaster (who is predicting using an online learning algorithm) must have no statistical power in the worst case. The proof is non-constructive via the minimax theorem. However the result is sweeping: any property of a transcript that holds with high probability for any informed forecaster can also be guaranteed in the worst case in online adversarial settings. Video here.
Lecture 22: In this lecture we show several methods for computing approximate minimax/maximin strategies in zero sum games by combining no-regret learning and best response in different combinations. We then introduce a framework in which to think about multiobjective optimization in sequential adversarial settings. Video here.
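One such method is to pit a no-regret learner against a best-responding opponent. A minimal sketch (matching pennies, whose value is 0.5, with multiplicative weights for the row player; the game and constants are illustrative): the row player's average loss approaches the value of the game.

```python
import numpy as np

# zero-sum game: the row player minimizes, the column player best-responds
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # matching pennies (value 0.5)

K, T = A.shape[0], 5_000
eta = np.sqrt(np.log(K) / T)
weights = np.ones(K)
avg_payoff = 0.0
for t in range(T):
    p = weights / weights.sum()
    j = int(np.argmax(p @ A))       # column player best-responds (maximizes)
    avg_payoff += (p @ A)[j] / T
    weights *= np.exp(-eta * A[:, j])   # MW update against the realized column

print(avg_payoff)   # approaches the game value 0.5 as T grows
```

The no-regret guarantee pins the average loss between the game value and the value plus regret/T, so it converges from above.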
Lecture 23: We reduce online multiobjective optimization to online linear optimization, and show that even though the minimax theorem is false for multiobjective zero-sum games, we can still “approach” the value of the game in which the adversary moves first in a repeated setting in which the learner is obligated to move first. This framework will eventually lead us to efficient algorithms for conditional regret and calibration guarantees in online adversarial settings. Video here.
Lecture 24: In this lecture we show how to obtain conditional regret bounds, subject to an arbitrary set of overlapping conditioning events. For example, we can condition on overlapping groups defined by available context to get groupwise regret bounds, we can condition on time to get dynamic regret bounds, or we can condition on the actions we play to get swap regret bounds. We do this by reducing to the multiobjective optimization framework we solved last class. Video here.
Lecture 25: We give a simple, closed form algorithm for getting regret guarantees that hold in online adversarial settings not just marginally, but conditional on any collection of events that can be defined by available context and past history. This is weaker than the general definition of conditional regret we studied last class (which allowed us to condition also on our own action), but is still general enough to capture things like groupwise regret and adaptive regret. What we get (in contrast to the more general notion) is a simple, closed form algorithm that does not require solving an optimization problem at each round. Video here.
Lecture 26: In this lecture we define the general problem of making vector valued predictions that have small statistical bias conditional on any collection of conditioning events in an online adversarial environment. This generalizes multicalibration. We show how to reduce it to our general online multiobjective optimization framework to derive strong conditional bias bounds. Video here.
Lecture 27: In this lecture we derive a simple efficient algorithm for multicalibration in sequential, adversarial settings, by specializing our generic minimax optimization algorithm and showing that the minimax computation it needs to perform at every round has a simple closed form solution. Video here.