The results point to one of the hardest aspects of AI-based hate speech detection today: moderate too little and you fail to solve the problem; moderate too much and you risk censoring the very language that marginalized groups use to empower and defend themselves. "All of a sudden you would be penalizing those very communities that are most often targeted by hate in the first place," says Paul Röttger, a PhD student at the Oxford Internet Institute and co-author of the paper.
Lucy Vasserman, a senior software engineer at Jigsaw, says Perspective overcomes these limitations by relying on human moderators to make the final decision, but that process doesn't scale to larger platforms. Jigsaw is now developing a feature that would reprioritize posts and comments based on Perspective's uncertainty: automatically removing content the model is confident is hateful and surfacing borderline content to human reviewers.
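The article doesn't say how such triage would be implemented. As a rough illustration only, here is a minimal Python sketch of confidence-based routing; the `triage` function and the cutoff values are hypothetical, not Jigsaw's design (Perspective's public API returns a toxicity score between 0 and 1 but exposes no such routing logic).

```python
def triage(score: float, high: float = 0.95, low: float = 0.5) -> str:
    """Route a comment by model confidence.

    `score` is a toxicity probability in [0, 1], as a classifier like
    Perspective might return; `high` and `low` are illustrative cutoffs,
    not values used by Jigsaw.
    """
    if score >= high:
        return "remove"        # model is confident the content is hateful
    if score >= low:
        return "human_review"  # borderline: surface to a human moderator
    return "keep"              # model is confident the content is benign

# Example: three comments with hypothetical model scores
for score in (0.98, 0.7, 0.1):
    print(score, "->", triage(score))
```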
What excites her about the new study is that it offers a fine-grained way to assess the state of the art. "A lot of the issues highlighted in this paper, such as reclaimed slurs being a challenge for these models, are well known in the industry but really hard to quantify," she says. Jigsaw is now using HateCheck to better understand the differences between its models and where they need to improve.
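HateCheck's test cases are publicly released (github.com/paul-rottger/hatecheck-data), so the kind of per-functionality comparison described here can be sketched in a few lines. The snippet below is an assumption-laden sketch: it presumes a CSV with `functionality`, `test_case`, and `label_gold` columns (as in the released dataset) and uses a placeholder `classify` function standing in for any real model.

```python
import csv
from collections import defaultdict

def classify(text: str) -> str:
    """Placeholder for a real model; returns 'hateful' or 'non-hateful'."""
    return "hateful" if "hate" in text.lower() else "non-hateful"

# Tally accuracy per functional test (e.g. reclaimed slurs, negation),
# assuming the column names of the public HateCheck release.
correct, total = defaultdict(int), defaultdict(int)
with open("hatecheck_cases.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        func = row["functionality"]
        total[func] += 1
        if classify(row["test_case"]) == row["label_gold"]:
            correct[func] += 1

# Low accuracy on a functionality pinpoints where a model needs work.
for func in sorted(total, key=lambda k: correct[k] / total[k]):
    print(f"{correct[func] / total[func]:.0%}  {func}")
```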
Academics are enthusiastic about the research as well. "This paper gives us a nice, clean resource for evaluating industry systems," says Maarten Sap, a language AI researcher at the University of Washington, "which allows companies and users to ask for improvements."
Thomas Davidson, an assistant professor of sociology at Rutgers University, agrees. The limitations of language models and the messiness of language mean there will always be trade-offs between under- and over-identifying hate speech, he says. "The HateCheck dataset helps make these trade-offs visible," he says.
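To make that trade-off concrete: for any scored classifier, lowering the decision threshold misses less hate but flags more benign speech, and raising it does the reverse. A toy sketch with invented scores (not real model output):

```python
# Toy illustration of the under- vs over-identification trade-off:
# (score, is_hateful) pairs are invented for demonstration.
examples = [(0.9, True), (0.8, True), (0.6, False),
            (0.55, True), (0.4, False), (0.2, False)]

for threshold in (0.3, 0.5, 0.7):
    missed_hate = sum(1 for s, hateful in examples
                      if hateful and s < threshold)
    benign_flagged = sum(1 for s, hateful in examples
                         if not hateful and s >= threshold)
    print(f"threshold={threshold}: missed hate={missed_hate}, "
          f"benign flagged={benign_flagged}")
```

Sweeping the threshold shows there is no setting that drives both error counts to zero, which is the trade-off HateCheck helps surface.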