Covering Comments Is Instagram’s Newest Anti-Bullying Tool

Harassment takes many forms. The platform’s latest update works to address a broader swath of negative interactions by hiding comments and sending warnings.
Instagram acknowledges that comments don’t have to break its rules to still be harmful, especially for its younger users. Photograph: Peter Cade/Getty Images

For years, Instagram has been on a mission to make itself the nicest place online. It’s a quixotic goal for a social media company, especially one whose core users are teenagers, an age group that has proven particularly adept at making one another miserable. Cyberbullying is hard to define and even harder to measure; even Facebook, Instagram’s parent company, can’t estimate how prevalent the behavior is on its platform, or whether it has worsened as Instagram replaces school cafeterias and shopping malls as the main places where teenagers interact. Still, that hasn’t stopped Instagram from rolling out feature after feature in its effort to stamp out cyberbullying for good.

On Tuesday, Instagram is adding two new tools to its repertoire. First, the platform will automatically hide comments that look like they might constitute bullying, even if they don’t obviously break the rules. Second, it will send a new warning message to users whose comments are repeatedly flagged as toxic, in the hope of changing behavior before it escalates. Both tools will roll out to Instagram users globally, starting with those who speak English, Portuguese, Spanish, French, Russian, Chinese, and Arabic.

Instagram already proactively hides and removes the nastiest comments. Now it’s trying to make borderline cases less visible, too. The new feature uses machine learning to find comments that resemble ones reported to Instagram for bullying in the past. (The tool catches emoji as well, as in “You look like 💩.”) Of course, context matters. The same words might be playful ribbing in one conversation and straight-up meanness in another. Even humans can’t always agree on whether something crosses the line: A 2017 study from the Pew Research Center found Americans split on whether behaviors like name-calling or purposefully embarrassing someone counted as “harassment” at all.
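Instagram hasn’t detailed how the classifier works beyond saying it learns from comments reported in the past. Purely as an illustration of the general idea, here is a minimal sketch that scores a new comment by its similarity to previously reported ones; the TF-IDF approach, the example comments, and the threshold are all stand-ins, far simpler than anything a production system would use:

```python
# A toy sketch of similarity-based flagging, NOT Instagram's actual system:
# compare a new comment against comments previously reported for bullying
# and cover it if it is close enough to any of them.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical examples of comments users previously reported.
reported = [
    "you look like garbage",
    "nobody likes you, just quit",
    "you are so ugly it hurts",
]

HIDE_THRESHOLD = 0.5  # invented cutoff; a real system would tune this carefully

vectorizer = TfidfVectorizer(ngram_range=(1, 2)).fit(reported)
reported_vectors = vectorizer.transform(reported)

def should_cover(comment: str) -> bool:
    """Cover a comment if it closely resembles past reported bullying."""
    similarity = cosine_similarity(
        vectorizer.transform([comment]), reported_vectors
    ).max()
    return similarity >= HIDE_THRESHOLD

print(should_cover("you look like total garbage"))  # True: near a reported comment
print(should_cover("great shot, love the colors"))  # False: nothing similar
```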


Instagram, in this case, has chosen to err on the side of caution. “We’re trying to catch as much bullying and harassment as we can,” says Carolyn Merrell, Instagram’s global head of policy programs, who acknowledged a trade-off between protecting its users and stifling their free speech. While Facebook’s rules against bullying and harassment contain a dizzying array of prohibited language, Merrell says that “some comments may not violate our community guidelines but may still be seen as harmful, harassing, or bullying.”

Comments flagged by the AI will now be placed behind a box of text that says “View hidden comments.” Anyone can tap that box to reveal the offending content, and people can still report those comments to Instagram if they violate the community guidelines. Instagram will also give people the option to remove the content cover from comments they receive on their own posts. So if the crude joke a friend makes on your photo gets buried by the comment cover, you can move it back out into the open.
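The behavior described here amounts to a small state machine per comment: covered by default when flagged, revealable by any viewer, and uncoverable by the post’s owner. A toy sketch of that logic, with all names invented:

```python
# A sketch of the visibility states the article describes, not Instagram's
# real data model: a flagged comment sits behind a cover, any viewer can
# tap to reveal it, and the post's owner can remove the cover entirely.
from dataclasses import dataclass

@dataclass
class Comment:
    text: str
    flagged_as_bullying: bool             # set by the classifier
    cover_removed_by_owner: bool = False  # owner opted to surface it

def render(comment: Comment, viewer_tapped_reveal: bool = False) -> str:
    """Return what a given viewer sees for this comment."""
    covered = comment.flagged_as_bullying and not comment.cover_removed_by_owner
    if covered and not viewer_tapped_reveal:
        return "View hidden comments"
    return comment.text

joke = Comment("You look like 💩", flagged_as_bullying=True)
print(render(joke))                             # "View hidden comments"
print(render(joke, viewer_tapped_reveal=True))  # the comment itself
joke.cover_removed_by_owner = True
print(render(joke))                             # uncovered for everyone
```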

Instagram is also introducing a new warning message for people whose comments are repeatedly flagged as problematic by its algorithms. In the past, Instagram has used gentle nudges to encourage people to rethink a potentially offensive comment before posting it: “Are you sure you want to post that?” Now, people who repeatedly try to post offensive comments will see an additional warning: if they continue to post comments that violate the community guidelines, they risk having their account deleted. (Per its policy, Instagram removes accounts that accrue “a certain number of violations within a window of time.”)
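Instagram doesn’t disclose its thresholds, but “a certain number of violations within a window of time” describes a standard sliding-window counter. Purely as an illustration, here is a minimal sketch; the limits, action names, and record_violation helper are all invented:

```python
# A hypothetical sliding-window strike counter illustrating the kind of
# "N violations within a time window" policy Instagram describes.
# Every limit and action name below is invented for the example.
from collections import defaultdict, deque
from time import time

MAX_VIOLATIONS = 5               # assumed strike limit
WINDOW_SECONDS = 30 * 24 * 3600  # assumed 30-day window

violation_log = defaultdict(deque)  # user_id -> recent violation timestamps

def record_violation(user_id: str, now: float | None = None) -> str:
    """Log a violation and return the enforcement action to take."""
    now = time() if now is None else now
    strikes = violation_log[user_id]
    strikes.append(now)
    # Forget violations that have aged out of the window.
    while strikes and now - strikes[0] > WINDOW_SECONDS:
        strikes.popleft()
    if len(strikes) >= MAX_VIOLATIONS:
        return "disable_account"
    if len(strikes) == MAX_VIOLATIONS - 1:
        return "warn_account_at_risk"  # the new warning described above
    return "hide_comment"
```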


The tool might seem like Instagram’s version of a “scared straight” campaign, but Merrell says the approach is meant to remind people that their words can have an impact on others (and consequences for them). Instagram has already seen some success with its previous warning messages: The social network says one in five people will either edit their comment or delete it after being prompted to rethink what they wrote. Other efforts to study the effects of automated feedback on toxic comments have seen similar results, suggesting that these simple interventions do have an effect. And as Adam Mosseri, Instagram’s head, noted last year, automated tools can be “especially crucial for teens, since they are less likely to report online bullying even when they are the ones who experience it the most.”

Still, combating cyberbullying can often feel like a cat-and-mouse game, with bullies adapting their behavior to get around each new tool and filter that tries to stop them. And while Instagram has seen signs that its automated interventions change behavior, new provocations are always popping up. Then there are the older avenues of harassment, which may also need attention. “We’re looking at DMs,” says Merrell, who acknowledged that Instagram still has work to do on cyberbullying there.

