Greetings,
A number of us will meet next Thursday 6/26 from 10:00-11:00 Central / 8:00-9:00 Pacific to discuss emerging statistical methods for correctly combining expensive, accurate data with cheap, less accurate data in statistical estimates. For example, an algorithmic classifier or large language model might make predictions about "content" such as text or images. "Validation data" produced by human annotators is often used to quantify the accuracy of these predictions; however, unless the predictions are perfectly accurate, measuring their accuracy alone does not ensure that prediction errors won't invalidate statistical conclusions. These methods use both forms of data to produce more precise estimates that remain consistent with the validation data.
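To make the combination concrete, here is a minimal sketch in base R of the simplest version of the idea, a "prediction-powered" estimate of a proportion. The simulated data and error rates are invented for illustration; this is the generic textbook estimator, not any particular paper's or package's implementation.

set.seed(42)

# Small, expensive human-coded validation sample; large, cheap model-coded sample
n_val <- 200
n_big <- 10000

# Simulate ground truth (true prevalence 0.3) and a classifier that is right 85% of the time
y_val    <- rbinom(n_val, 1, 0.3)
pred_val <- ifelse(runif(n_val) < 0.85, y_val, 1 - y_val)
y_big    <- rbinom(n_big, 1, 0.3)
pred_big <- ifelse(runif(n_big) < 0.85, y_big, 1 - y_big)

# Naive estimate: just average the predictions (biased toward 0.5 here)
naive <- mean(pred_big)

# Prediction-powered estimate: predictions on the big sample plus a bias
# correction ("rectifier") estimated from the validation sample
theta <- mean(pred_big) + mean(y_val - pred_val)

# The standard error combines uncertainty from both samples
se <- sqrt(var(pred_big) / n_big + var(y_val - pred_val) / n_val)
ci <- theta + c(-1, 1) * qnorm(0.975) * se

round(c(naive = naive, corrected = theta, lower = ci[1], upper = ci[2]), 3)

The corrected estimate recovers the true prevalence with an honest confidence interval, while the naive average of the predictions does not.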
I think these methods open up powerful new measurement strategies and study designs. I hope you will join our discussion :)
I think this discussion will partly be an orientation to this methodological literature and partly an occasion to brainstorm how we might use (or already are using) these techniques in our studies.
Here are some links to relevant articles. I don't expect you to read them all deeply before our meeting; most are very technical. Even a skim can help orient you to this literature and prepare you for the discussion.
https://doi.org/10.1080/19312458.2023.2293713
This is my paper, published back in 2023. I showed how to use an error modeling framework to correct the bias that classifier errors introduce into downstream estimates. If I say so myself, I think this is a pretty clear and easy-to-follow explanation of the problem. However, I think the solution I proposed isn't a great fit for "black box" models such as LLMs.
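For a taste of what an error-modeling correction looks like, here is the classic Rogan-Gladen prevalence correction in R. It is a much simpler relative of what the paper develops, not the paper's actual method, and the numbers are invented:

# Error rates estimated from human-coded validation data
sensitivity <- 0.80   # P(classifier says positive | truly positive)
specificity <- 0.90   # P(classifier says negative | truly negative)

observed <- 0.25      # share of items the classifier labels positive

# Invert the error model:
# observed = sensitivity * true + (1 - specificity) * (1 - true)
corrected <- (observed + specificity - 1) / (sensitivity + specificity - 1)
corrected  # ~0.214: the debiased prevalence estimate

Note that this only works if you trust the estimated sensitivity and specificity to generalize, which is exactly the kind of assumption about classifier performance that the next paper avoids.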
https://arxiv.org/abs/2501.18577
This is the latest in the line of "Prediction-Powered Inference" (PPI) papers. It's extremely technical, but I think it's the most generally applicable method currently available. Unlike my paper, this approach does not require any difficult assumptions about classifier performance. I tried out the R implementation just yesterday and it is fairly usable. Here's a tutorial: https://dankluger.github.io/PTDBootTutorial/Tutorial.html
https://naokiegami.com/paper/dsl_ss.pdf
This "designed based supervised learning" approach also claims to be very general and to work with "black box models". It is similar in spirit to PPI, but since it involves creating intermediate predictive models is more complex.
https://journals.sagepub.com/doi/abs/10.1177/00491241251326865
While most of the other papers have in mind something like using a model as an auxiliary coder in a content analysis, this paper suggests something a bit more radical: using language models as a proxy for human study participants.
I'm following up with a more specific plan for our discussion, per our discussion in Matrix. Let's focus on Kluger et al.'s paper (https://arxiv.org/abs/2501.18577) and the accompanying R tutorial for the method (https://dankluger.github.io/PTDBootTutorial/Tutorial.html).
The paper is very technical and written for an audience of statisticians. For the purposes of our discussion, it's totally fine to gloss over the parts you don't understand. I recommend trying to grasp the introduction (sections 1.1-1.3). The assumptions in section 2.1 are useful to understand as well. From there you can reasonably jump down to section 4, which demonstrates the method in empirical examples.
The tutorial has additional examples and will clarify what using the method entails in practice. See you next Thursday, June 26, at 8:00 PT / 10:00 CT / 11:00 ET!
--
Nate
collective-ut@communitydata.science