The limits of synthetic research data — Olson Zaltman

The quant firm Verasight has just published a fascinating piece of research-on-research exploring the limits of synthetic data. I have attached the full report.

In June, they surveyed 1,500 real people, then compared those results with 1,500 responses to the same questions generated by large language models (LLMs). Verasight specializes in political polling, so the three questions were:

  • Donald Trump approval/disapproval

  • 2026 Congressional ballot preference

  • Whether municipalities should enact restrictive zoning laws

Verasight’s conclusion: “LLMs are not accurate replacements for human respondents.”

On the first two questions, where answers tend to fall predictably along party lines, the LLMs got within 5-10% of the topline distribution of the “real” poll. That is not dreadful, though a 5-10% gap on such straightforward polling questions is a fairly sizable variance.
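For readers who want to run this kind of check themselves, here is a minimal sketch of the topline comparison in Python. The two distributions below are hypothetical stand-ins for illustration, not Verasight’s numbers:

```python
# Illustrative sketch of a topline comparison: the per-option gap, in
# percentage points, between a human poll and an LLM "poll".
# These numbers are made up for demonstration; they are not Verasight's data.
human = {"approve": 0.44, "disapprove": 0.52, "don't know": 0.04}
llm = {"approve": 0.49, "disapprove": 0.47, "don't know": 0.04}

for option in human:
    gap = abs(human[option] - llm[option]) * 100
    print(f"{option}: {gap:.0f}-point gap")

# "Within 5-10%" means the largest of these per-option gaps stays in that range.
worst = max(abs(human[k] - llm[k]) for k in human) * 100
print(f"largest gap: {worst:.0f} points")
```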

Looking at the crosstabs for different demographic groups, the LLMs performed worse. And at an individual level, the LLMs were even further off the mark. Only 80 percent of the people the LLM predicted would “strongly disapprove” of Trump actually did, and the same percentage applied to those predicted to “strongly approve.” Again, that might sound respectable, but a poll that is off by 20% in one direction or the other is off by a lot.
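That individual-level figure is just a per-category agreement rate. Here is a minimal sketch of the calculation, with short hypothetical answer lists (paired by respondent) standing in for the real and synthetic data:

```python
# Sketch of the individual-level check: among respondents the LLM predicted
# would give a certain answer, what share actually gave that answer?
# Both lists below are hypothetical stand-ins, not Verasight's data.
predicted = ["strongly disapprove", "strongly approve", "strongly disapprove",
             "strongly disapprove", "strongly approve"]
actual = ["strongly disapprove", "strongly approve", "somewhat disapprove",
          "strongly disapprove", "somewhat approve"]

def agreement_rate(category: str) -> float:
    pairs = [(p, a) for p, a in zip(predicted, actual) if p == category]
    hits = sum(1 for p, a in pairs if a == p)
    return hits / len(pairs)

print(f"strongly disapprove: {agreement_rate('strongly disapprove'):.0%} matched")
print(f"strongly approve: {agreement_rate('strongly approve'):.0%} matched")
```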

And the LLMs were in an alternate universe on the last question, which also fell outside their training data.

  • Actual result: 28% for “Limit local zoning rules to build more homes,” 43% against, 29% don’t know

  • LLM result: 59% for limiting zoning rules, 41% against, 0% don’t know
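One common way to put a single number on how far apart those two distributions sit is total variation distance, half the sum of the per-option differences. A quick sketch using the percentages above:

```python
# Total variation distance between the human and LLM answer distributions
# on the zoning question, using the percentages reported above.
human = {"for": 0.28, "against": 0.43, "don't know": 0.29}
llm = {"for": 0.59, "against": 0.41, "don't know": 0.00}

tvd = 0.5 * sum(abs(human[k] - llm[k]) for k in human)
print(f"total variation distance: {tvd:.2f}")
# Prints 0.31: roughly 31% of respondents would have to change their answers
# for the synthetic distribution to match the real one.
```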

Also of note: Verasight tested multiple LLMs, and the answers to these questions varied substantially across models.

We have to extrapolate what this means for qualitative data. I hypothesize that LLMs would be even less reliable in that context. The issues in qual are (or should be) more nuanced than just “For” or “Against” a particular candidate or topic. And good qual research will spin off into unexpected and unexplored territories, where LLMs appear to be quite unreliable.

My conclusion: for corporate market research, synthetic data might be able to provide “in the ballpark” accuracy when predicting simple answers to straightforward questions. But on issues involving any degree of nuance, uncertainty, or complexity, the preferred method remains real people talking to real people.

What do you think?