CIS 7000: A Course at the University of Pennsylvania
Lectures
Lecture 1: Course Overview Slides. We gave a high-level overview of the course, and began studying marginal consistency: fixing marginal mean inconsistency reduces squared error, and we proved generalization bounds. Video here: https://www.youtube.com/watch?v=EBZEJO1RebE
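A minimal NumPy sketch of the first claim (the toy data and variable names below are mine, not the lecture's): shifting a predictor by its marginal bias makes it marginally mean consistent, and its squared error drops by exactly the squared bias.

```python
import numpy as np

# Toy data: labels y and a marginally biased predictor f.
rng = np.random.default_rng(0)
n = 100_000
y = rng.binomial(1, 0.3, size=n).astype(float)
f = 0.3 + 0.1 * rng.standard_normal(n) + 0.1   # systematically over-predicts

bias = np.mean(y - f)          # marginal mean inconsistency
f_fixed = f + bias             # now np.mean(f_fixed) == np.mean(y)

mse_before = np.mean((f - y) ** 2)
mse_after = np.mean((f_fixed - y) ** 2)
print(mse_before - mse_after, bias ** 2)   # equal, up to floating point error
```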
Lecture 2: Introduced quantiles and pinball loss. Defined marginal quantile consistency. Fixing marginal quantile consistency reduces pinball loss, by a quantifiable amount if the distribution is Lipschitz. Proved generalization bounds with DKW, and began marginal guarantees in sequential settings. Video here: https://www.youtube.com/watch?v=uuodIrCb4uc
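A small illustration of the pinball loss and marginal quantile consistency (the exponential example is a made-up choice of mine): among constant predictions, the empirical tau-quantile attains the lowest pinball loss.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Average pinball loss of predicting q for labels y at target quantile tau."""
    d = y - q
    return np.mean(np.maximum(tau * d, (tau - 1) * d))

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=100_000)
tau = 0.9
q_arbitrary = 1.0
q_consistent = np.quantile(y, tau)    # marginally quantile consistent prediction

print(pinball_loss(y, q_arbitrary, tau), pinball_loss(y, q_consistent, tau))
```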
Lecture 3: Sequential prediction with marginal quantile consistency guarantees. Offline-to-online reductions for mean and marginal quantile consistency. Began calibration: defined measures of calibration error and related them, and gave (but did not yet analyze) our first algorithm for calibrating a function. Calibrating a predictor reduces its squared error by exactly its original squared calibration error. Video here: https://www.youtube.com/watch?v=_7MRy4OurR0
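The squared-error identity at the end of the lecture can be checked numerically. A sketch under a toy setup of my own (a discretized predictor, recalibrated by conditional label means): the drop in squared error equals the squared calibration error of the original predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.uniform(size=n)
y = rng.binomial(1, x).astype(float)           # true P[y = 1 | x] = x
f = np.round(np.clip(x + 0.15, 0, 1), 1)       # a miscalibrated, discretized predictor

# Squared calibration error on this sample: E[(f(x) - E[y | f(x)])^2].
cond_mean = {v: y[f == v].mean() for v in np.unique(f)}
K2 = np.mean([(v - cond_mean[v]) ** 2 for v in f])

# Calibrate f by replacing each prediction level with its conditional label mean.
f_cal = np.array([cond_mean[v] for v in f])

# The decrease in squared error equals the original squared calibration error
# (exactly on the sample, up to floating point error).
print(np.mean((f - y) ** 2) - np.mean((f_cal - y) ** 2), K2)
```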
Lecture 4: Mean and Quantile calibration in the batch setting: We gave algorithms to post-process arbitrary functions f so that they become mean or quantile calibrated, while reducing their error (as measured by squared loss or pinball loss respectively). Video here: https://www.youtube.com/watch?v=ih8NBK_b-mQ
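For the quantile case, a sketch of the patching idea under assumptions of my own (a discretized predictor, patched with per-level empirical quantiles): replacing each prediction level with the empirical tau-quantile of the labels it receives makes the predictor approximately quantile calibrated on the sample and lowers its pinball loss.

```python
import numpy as np

def pinball(y, q, tau):
    d = y - q
    return np.mean(np.maximum(tau * d, (tau - 1) * d))

tau = 0.9
rng = np.random.default_rng(1)
n = 50_000
x = rng.uniform(size=n)
y = rng.exponential(scale=1.0 + x)          # label scale depends on the feature
f = np.round(1.5 * x, 1)                    # an arbitrary discretized quantile predictor

# Post-process: within each prediction level, replace the prediction with the
# empirical tau-quantile of the labels that received that prediction.
f_cal = f.copy()
for v in np.unique(f):
    mask = f == v
    f_cal[mask] = np.quantile(y[mask], tau)

print(pinball(y, f, tau), pinball(y, f_cal, tau))   # pinball loss drops after patching
```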
Lecture 5: Mean and quantile calibration in the sequential setting against an adversary. We began with an anecdote from calibration hero Rakesh Vohra: originally, reviewers didn’t believe this result! Video here: https://www.youtube.com/watch?v=a6Zz9YpXCx8
Lecture 6: Begin using the features! Two algorithms for obtaining group conditional mean consistency for arbitrary intersecting groups. The first is an iterative algorithm; the second is a direct optimization algorithm that solves a linear regression problem. If you substitute pinball loss for squared error, you get group conditional quantile consistency. Generalization theorems from L1 regularization. Video here: https://www.youtube.com/watch?v=Hf6H2UoupGY
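A sketch of the direct optimization route, with a toy setup of my own: minimize squared error over predictors of the form f + Σ_G λ_G·1[x ∈ G], which is just a least squares regression of the residual onto group indicators; the first-order conditions force the mean residual in every group to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(size=(n, 2))
y = x[:, 0] + 0.5 * x[:, 1] + 0.1 * rng.standard_normal(n)
f = x[:, 0]                                   # initial predictor that ignores x[:, 1]

groups = {                                    # intersecting groups, as boolean masks
    "x1_high": x[:, 1] > 0.5,
    "x0_low": x[:, 0] < 0.3,
}

# Minimize squared error over h = f + sum_G lambda_G * 1[x in G]:
# equivalently, regress the residual y - f onto the group indicators.
G = np.column_stack([g.astype(float) for g in groups.values()])
lam, *_ = np.linalg.lstsq(G, y - f, rcond=None)
h = f + G @ lam

for name, g in groups.items():                # mean residual in each group is ~0
    print(name, np.mean(y[g] - h[g]))
```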
Lecture 7: Batch multicalibration. We analyzed the convergence of a batch multicalibration algorithm, discussed the importance of discretization, and proved out-of-sample generalization bounds. We didn’t cover the very similar case of quantile multicalibration, but the corresponding bounds are worked out in full in the notes. Video here: https://www.youtube.com/watch?v=rgsNJODCLhg
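A sketch in the flavor of the batch algorithm, with parameters of my own (discretization into m + 1 levels, tolerance alpha); convergence rests on the squared-error potential argument covered in the lecture.

```python
import numpy as np

def multicalibrate(f, y, groups, m=10, alpha=0.05, max_rounds=1000):
    """Iterative patching sketch, assuming labels y in [0, 1].
    groups: list of boolean membership arrays (they may intersect)."""
    f = np.clip(np.asarray(f, dtype=float).copy(), 0.0, 1.0)
    for _ in range(max_rounds):
        buckets = np.round(f * m) / m            # discretize predictions to m + 1 levels
        patched = False
        for g in groups:
            for v in np.unique(buckets[g]):
                cell = g & (buckets == v)        # members of g currently predicted ~v
                if abs(y[cell].mean() - v) > alpha:
                    f[cell] = y[cell].mean()     # patch the cell to its label mean
                    patched = True
        if not patched:                          # alpha-multicalibrated on the sample
            return f
    return f
```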
Lecture 8: Sequential adversarial multicalibration. We derived and analyzed an algorithm for obtaining multicalibration against an adversary, and discussed how it gets better sample complexity bounds than the batch algorithm we analyzed. We didn’t cover the very similar case of online quantile multicalibration, but the corresponding bounds are worked out in full in the notes. Video here: https://www.youtube.com/watch?v=7wUjMFeoGA8
Lecture 9: Conformal Prediction. We introduce the problem of conformal prediction, which reduces the problem of producing prediction sets to the problem of estimating quantiles of a one-dimensional “non-conformity score” distribution. We give the standard “expected marginal coverage” guarantee of split conformal prediction, and then proceed to apply the various kinds of quantile estimation techniques we have built up in this class to give a series of stronger guarantees that hold conditional on the calibration dataset, group membership, and predicted thresholds, as well as in the sequential setting. Video here: https://www.youtube.com/watch?v=M3tkM4dcIPA
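A compact sketch of split conformal prediction with absolute-residual non-conformity scores (the function and argument names are mine; any regression model fit on separate training data would do):

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, x_test, alpha=0.1):
    """Use a held-out calibration set to return an interval around model(x_test)
    with >= 1 - alpha marginal coverage over an exchangeable test point."""
    scores = np.abs(y_cal - model(X_cal))        # one-dimensional non-conformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))      # conservative empirical quantile index
    qhat = np.inf if k > n else np.sort(scores)[k - 1]
    pred = model(x_test)
    return pred - qhat, pred + qhat
```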
Lecture 10: Multicalibration with respect to real valued functions as Boosting. We show how to reduce the problem of multicalibrating over a set of real valued functions H to the problem of solving squared error regression problems over H. Then we give a simple “if and only if” characterization determining exactly when multicalibration with respect to H implies Bayes optimal prediction. Video here: https://www.youtube.com/watch?v=fEB7XFiEL4o
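A sketch of the reduction under a toy choice of H (single-feature affine functions) and a simple discretization of my own: the algorithm repeatedly calls a squared error regression oracle on the residual, restricted to each level set of the current predictor, and patches with whatever it returns.

```python
import numpy as np

def regression_oracle(X, r):
    """Squared error regression oracle over a toy class H = {a * x_j + b}:
    returns the best single-feature affine fit of the residual r."""
    best, best_err = np.zeros_like(r), np.inf
    for j in range(X.shape[1]):
        A = np.column_stack([X[:, j], np.ones(len(r))])
        w, *_ = np.linalg.lstsq(A, r, rcond=None)
        pred = A @ w
        err = np.mean((r - pred) ** 2)
        if err < best_err:
            best, best_err = pred, err
    return best

def multicalibrate_as_boosting(X, y, rounds=100, m=10, tol=1e-4):
    """Boosting-style sketch: on each level set of the current predictor f, ask the
    oracle for an h in H correlated with the residual y - f and patch f with it;
    stop once no oracle call finds a useful h. Assumes y in [0, 1]."""
    f = np.full(len(y), y.mean())
    for _ in range(rounds):
        buckets = np.round(np.clip(f, 0, 1) * m) / m
        patched = False
        for v in np.unique(buckets):
            S = buckets == v                          # a level set of f
            h = regression_oracle(X[S], y[S] - f[S])  # squared error regression call
            if np.mean(h * (y[S] - f[S])) > tol:      # h predicts the residual here
                f[S] = f[S] + h
                patched = True
        if not patched:
            return f
    return f
```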
Lecture 11: (Multi)calibration under distribution shift. We think about using models on distributions that differ from the distributions they were trained on. We restrict attention to distributions that have the same conditional label distribution, but differ in their feature distributions. We show how multicalibration guarantees transform under distribution shift, and that if we were multicalibrated with respect to a class of functions that contains the likelihood ratio function mapping the source to the target distribution, then we will also be calibrated on the target distribution. We argue that this is enough to estimate the loss of any policy derived from our model on the target distribution (under any loss function) using only unlabeled samples. For example, it lets us estimate the error rate of a classifier out of distribution with only unlabeled samples. Video here: https://youtu.be/t7qOVkMopuk
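For the last point, a sketch of the unlabeled-sample estimator (the function and argument names are mine), assuming, as the lecture argues, that f stays calibrated on the target distribution and that the policy is a post-processing of f:

```python
import numpy as np

def estimate_target_error_rate(f, policy, X_target):
    """Estimate the 0/1 error of a binary classifier `policy` on an *unlabeled*
    target sample, assuming f(x) remains a calibrated estimate of P[y = 1 | x]
    on the target distribution (e.g. because f was multicalibrated w.r.t. a class
    containing the source-to-target likelihood ratio)."""
    p = f(X_target)               # calibrated probabilities on target features
    a = policy(X_target)          # the classifier's 0/1 decisions
    # E[1{y != a(x)}] = E[ f(x) * 1{a(x) = 0} + (1 - f(x)) * 1{a(x) = 1} ]
    return np.mean(p * (a == 0) + (1 - p) * (a == 1))
```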
Lecture 12: Jess Sorrell tells us about “Omnipredictors”. We have already seen that if a model f is calibrated, then for any loss function, the policy that chooses the action optimizing that loss function, treating f(x) as the true probability that y = 1 conditional on x, is the minimum loss policy. In this lecture we see that if f is additionally multicalibrated with respect to H, then for any convex loss function, the policy that chooses the action optimizing that loss function, treating f(x) as the true conditional probability, also obtains loss no higher than that of any h in H. The proof strategy is to show that if f is multicalibrated with respect to H, then any h in H is dominated by a policy of f, reducing to our previous statement. Video here: https://www.youtube.com/watch?v=9lFgyaxKneE
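A small sketch of the post-processing policy in question (the asymmetric loss below is a made-up example of mine): pick the action minimizing expected loss as if f(x) were the true conditional probability.

```python
import numpy as np

def optimal_action(f_x, loss, actions):
    """Pick the action minimizing expected loss, treating f_x as the true P[y = 1 | x]."""
    expected = [f_x * loss(a, 1) + (1 - f_x) * loss(a, 0) for a in actions]
    return actions[int(np.argmin(expected))]

# Hypothetical usage with a made-up asymmetric loss: missing a positive costs 5,
# a false alarm costs 1.
loss = lambda a, y: 5.0 * (y == 1) * (a == 0) + 1.0 * (y == 0) * (a == 1)
print(optimal_action(0.3, loss, actions=[0, 1]))    # chooses 1 despite f_x < 0.5
```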