

“misrepresent” is a vague term. Actual graph from the study

The main issue is usual… sources. AI is bad at sources without a proper pipeline. They note that Gemini is the worst at 72%.
Note that they’re not testing the models with their own pipeline. They’re testing other people’s products, so the results say more about product design than about the actual models.








Computers be like “this shit true asf” and it’s the number 1, a high voltage