Hi there! This is TITLE-ABS-KEY(“science journalism”), a newsletter about science journalism research. In the previous issue, I learned where mediatization of science has brought us in terms of scientists’ understanding of journalism.
This time, in what is becoming a recurring theme for this newsletter, I am exploring yet another paper from computer scientists eager to help journalists.
Today’s paper: Anders Sundnes Løvlie, Astrid Waagstein & Peter Hyldgård (2023) “How Trustworthy Is This Research?” Designing a Tool to Help Readers Understand Evidence and Uncertainty in Science Journalism, Digital Journalism, 11:3, 431-464, DOI: 10.1080/21670811.2023.2193344.
Why: I’m really interested in (and fairly skeptical of) media literacy tools so, naturally, if a paper promises to Help Readers Understand Evidence and Uncertainty in Science Journalism, I’m reading it.
Abstract: Widespread concerns about the spread of misinformation have gained urgency during the ongoing COVID-19 pandemic and pose challenges for science journalism, in particular in the health area. This article reports on a Research through Design study exploring how to design a tool for helping readers of science journalism understand the strength and uncertainty of scientific evidence in news stories about health science, using both textual and visual information. A central aim has been to teach readers about criteria for assessing scientific evidence, in particular in order to help readers differentiate between science and pseudoscience. Working in a research-in-the-wild collaboration with a website for popular science, the study presents the design and evaluation of the Scientific Evidence Indicator, which uses metadata about scientific publications to present an assessment of evidence strength to the readers. Evaluations of the design demonstrate some success in helping readers recognize whether studies have undergone scientific peer review or not, but also point to challenges in facilitating a more in-depth understanding. Insights from the study point to a potential for developing similar tools aimed at journalists rather than directly at audiences.
Alright, demonstrating some success but pointing to challenges, as the end of that abstract puts it, is basically how I feel about any indicator and most media literacy tools. There are usually methodological issues, of course; any indicator that purports to represent the outcome of a fairly complex judgement call as a scale of colored letters or a set of labels is going to be at least somewhat reductive and opaque to end users.
But mostly I think the indicator approach is fundamentally rooted in an information deficit fallacy: if only these people had a quick and handy way of Knowing for Certain this source is Definitely Not Reliable so they would stop engaging with it and all would be good. Assessing the quality of a product and using it are two different decisions, and the former does not necessarily determine the latter. Otherwise nobody would ever buy junk food again after the invention of the Nutri-Score system.
Yet Nutri-Score has been shown to work, so there is some truth to this approach, just usually not in a way media literacy folks like to think. We are surrounded by a wide variety of really quite complex products that are even more opaque in terms of contents, sourcing, and production processes (I sort of get Nutri-Score rules, but I can’t really tell you how most processed foods are actually made at scale). And for foods, producers have a strong incentive to keep any unfavorable information about the product as unclear as possible.
The mandatory label provides a workaround for public health authorities, at least until producers learn to game this system — and one more mental shortcut for a consumer rushing to get home from the store through rows and rows of 100 different vanilla yoghurts.
But this is not food, it’s science news; if you think people’s relationships with nutrition are complicated, you might need to sit down for media. (The cherry on top is that the project focuses on health claims in media, which is basically the Final Boss of misinfo/disinfo, I think.)
I'm still curious, of course, what exactly that success consisted of and which challenges were pointed to; the intriguing last sentence teasing tools aimed at journalists is promising as well.
This study is in fact a collaboration between scientists and Videnskab.dk, Denmark’s leading popular science outlet, with around 1 million monthly users (according to its English-language About page). The team is trying to design “a tool for helping readers understand the strength and uncertainty of scientific evidence in news stories about health science, using both textual and visual information.”
That’s fairly consistent with what we know about two of the problems with science news and media literacy: (1) just as humans can’t hear absolute silence or remove every last Earth microbe from a spacecraft, there’s always some uncertainty in science, and there are always actors eager to exploit that; (2) if Nutri-Score were a paragraph of text on the front of the packaging, rather than a set of green, yellow, orange and red labels, it would not work.
That is, journalists do need to represent and contextualize uncertainty in their stories, but that usually requires quite a bit of text — in a media environment where Elon Musk is removing headlines from link previews and you can’t add links to Instagram posts because Instagram doesn't want to be a home for news.
Hence: Nutri-Score for news! (Not a new idea by any means.) But then there’s the objective for this project:
The main objective is to teach the website’s readers about criteria for assessing scientific evidence, in particular in order to help the readers differentiate between science and pseudoscience.
To me, this is where we first step off the Nutri-Score path. I honestly don’t think public health authorities genuinely aim to teach shoppers about criteria for assessing nutritional value, in particular in order to help differentiate between food and ‘pseudofood.’ I mean, even if a bunch of policy papers say that’s the objective, it would be incredibly naive to believe that is how these labels ultimately work.
The goal is never to give people more homework, but always to establish credibility for a mental shortcut you’re offering them so that they know they can use it without thinking twice. I don’t need to know which ingredients ruin the score for a yoghurt (it would be so nice if everyone knew more of this stuff, no question, and this info should be readily available). I need a yoghurt, and if you’d like me to pick one that’s better for my health, you’d better not try to teach me stuff when I’m just working through my shopping list.
Of course, whether it’s Nutri-Score for dairy or for news, it needs to be transparent in how it’s assigned for people to trust it. So you do need to identify the criteria to distinguish between trustworthy and untrustworthy scientific claims regarding health, and you need to be able to explain them somewhere in a way nonspecialists would understand.
But arguably what it needs even more is to be helpful: if I feel my life is somehow better after cutting out most foods or stories that scored below C, I can see why I should pay attention to this label. Easier for foods than for news, I’d say, and even for foods it’s not that straightforward.
In the lit review portion of the paper, the authors discuss a bunch of studies further outlining the problem of communicating uncertainty in science journalism, then, somewhat oddly, how science journalists are seemingly too pressed for time and underqualified to gauge uncertainty and the quality of information themselves — and then the digital tools already out there, with this pearl of a quote:
A recent report on efforts to create automated systems for fact-checking (AFC) concluded that fact-checking currently requires human judgment that cannot easily be replaced by an automated system.
Yeah, “easily” feels unnecessary here; AIs just can’t handle the truth.
The authors use a Research through Design methodology that I’ve instantly filed away to learn more about; not too keen on journalism through design, however. They used it in collaboration with the Danish outlet to create something called the Scientific Evidence Indicator (SEI). Again, I’m seeing an attempt to give readers homework: “Thus the goal with this project was not just to create a ‘quality marker’ for news items, but rather to create a learning tool that could raise awareness about scientific standards.”
To do that, researchers mined the experience of staff science journalists to see how they assess the quality of a scientific study within health science. Incidentally, even after admitting human judgment can’t be automated and acknowledging, to a degree, that science journalists largely know what they are doing, the team maps these insights against metadata because they want the indicator to maybe be automated at some point. Sigh.
BUUUT, the Nordic Cochrane Institute came to the rescue! Quote: “They were critical of the idea of basing an assessment purely on metadata, and suggested that a more qualitative assessment of each study was needed.”
Delightful, thank you.
So, the final list of variables:
- BFI Level (a national scientometric indicator to assess the quality of the source),
- Evidence Hierarchy (a 7-point grade),
- H-index from SCOPUS (a measure of the "experience" of the scientific team — love the quotation marks! Later on in the paper we discover that the researchers kinda really hated this metric, which has little to nothing to do with the quality of a paper, but added it anyway because journalists were using it as a heuristic to guide their attention),
- Special Remarks from journalists.
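Just to make that list concrete, here is a minimal sketch of how a metadata-driven indicator could collapse those four variables into a traffic-light band. Everything in it (the field names, the weights, the cut-offs) is my invention for illustration; the paper does not publish a scoring formula, and a purely computed score is exactly what the Cochrane reviewers warned against.

```python
# Hypothetical sketch of a metadata-driven evidence indicator built from the
# four variables listed above. The weights, cut-offs and colour bands are
# invented for illustration only; the paper does not publish a formula.

from dataclasses import dataclass

@dataclass
class StudyMetadata:
    bfi_level: int          # BFI Level of the publication venue (e.g. 0-2)
    evidence_level: int     # position on the 7-point evidence hierarchy (1 = weakest)
    h_index: int            # h-index of the research team, via SCOPUS
    special_remarks: str    # free-text reservations added by the journalists

def evidence_band(meta: StudyMetadata) -> str:
    """Collapse the metadata into a coarse three-colour band."""
    # Crude weighted sum; a real indicator would need calibration and, per the
    # Cochrane feedback above, a qualitative check on top of it.
    score = 2 * meta.evidence_level + 3 * meta.bfi_level + min(meta.h_index, 30) / 10
    if meta.special_remarks:
        score -= 2          # journalists' reservations pull the score down
    if score >= 14:
        return "green"
    if score >= 8:
        return "yellow"
    return "red"

print(evidence_band(StudyMetadata(2, 6, 25, "")))                 # "green"
print(evidence_band(StudyMetadata(0, 2, 3, "preprint, n = 12")))  # "red"
```

Even this toy version shows where the trouble starts: the h-index term barely moves the needle, the thresholds are arbitrary, and the Special Remarks field smuggles human judgement right back in.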
The two university authors on the team were apparently worried the model might be too closely shaped by the journalists’ work practices, so that it might end up simply as a validation of their existing practice, rather than offering a genuine assessment of scientific evidence.
That’s fair, but then they asked the staff to retroactively rate the stories they were already working on around that time, and some promptly scored 0. The team writes this was “eye-opening” for everyone but doesn’t really go into much detail on what the eyes were opened to.
After a bunch of sketches and iterations and prototyping — this is design, after all — here’s what the team came up with:
This is designed to be the public-facing “entry-level element” of the indicator; if the user clicked on it, they would be taken to the expanded modal box going through the details of an assessment for this particular story. Now, the reason why I am not showing it to you in this newsletter is that between four variables and quite a bit of text it ends up being rather clunky, and the authors could not fit it into one neat illustration for the paper (and the Videnskab.dk stories used in A/B testing don’t seem to have it live anymore).
That’s one question I have for this indicator, and a bunch of others concern the little red sector. There’s no point in trying to get rid of all food rated E on the Nutri-Score scale; whether or how journalists should be covering research rated untrustworthy according to this metric, however, is an open question that this indicator sort of avoids.
The paper then describes an A/B test in which readers were randomly assigned to see the same stories with or without the indicator (all five stories were rated either high or moderate on the SEI scale, no red) and were then asked a few questions. All the effects the team was able to track were quite small, but we’ll get to that in a bit.
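For a sense of scale: with two randomized groups and yes/no comprehension questions, “quite small” boils down to comparing two proportions of correct answers on a 2×2 table. Here is a toy version with invented counts (mine, not the paper’s data, and not necessarily the test the authors ran):

```python
# Toy comparison of correct answers on one question between the SEI group
# and the control group. Counts are invented; the paper reports its own data.
from scipy.stats import chi2_contingency

#           correct  incorrect
table = [[62, 138],   # readers who saw the SEI  (31% correct, made up)
         [55, 145]]   # readers who did not      (27.5% correct, made up)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
# A difference this small, on samples this size, is unlikely to reach significance.
```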
I’ll just pick a quote that neatly summarizes what we are dealing with when computer scientists try to help journalists (Q4 asks readers to identify methods used in a study they read about):
…the very small difference in Q4 seems to indicate that readers who saw the SEI were not significantly better able to correctly identify the research methods used in the scientific study, than those who did not see the SEI. From our point of view this was somewhat surprising, as this information would be visible to any reader who clicked on the SEI anchor and subsequently on the symbol for the “Research Method” variable. A reasonable interpretation would be that few readers did that — in other words, most readers did not spend enough time exploring the SEI to encounter the information that was presented at this depth of the architecture, and which would have helped them answer question Q4.
Yeah, no, sorry, like I said — no homework for readers. 🤷🏼‍♀️
In the Discussion section, we finally get to the main question I flagged earlier: is this indicator actually helpful? Well, yes, the paper says, just perhaps not for readers but for journalists, as it can help them perform a quality check on the papers they encounter. I’d challenge this conclusion too; at best, this is a set of training wheels, and a suboptimally designed set at that (as the Limitations section explains at length).
Well, this was less fun than I thought. I often fear that research on media literacy indicators is too isolated from practice and reality for us to ever find out whether it’s worth trying to create new ones. And this paper doesn’t really help with those fears.