Reducing Community Forum Spam through Machine Learning and Automation

“Like an avalanche.”

“Like a freight train.”

“Like a tidal wave.”

These may be clichés, but they aptly describe the overwhelmed feeling I had last April.

As community capability manager for most of Intel’s external communities, I’m responsible for many aspects of keeping the communities up and running – and that includes protecting them against spam. Spam has always been a sporadic problem, but in April 2015 Intel’s community forums were inundated with spam – up to 10,000 unwanted posts per day.

The reactive measures we used to control spam, which are fairly standard across the industry, couldn’t scale to that volume. Volunteer moderators and Intel employees were spending unsustainable amounts of time and effort watching for and deleting unwanted posts. Our community forum platform includes some out-of-the-box controls, such as lists of phrases and users to block. The problem with filters is that the number of possible phrases and user accounts is effectively infinite. You can’t block them all. As soon as we blocked one phrase or one user, the spammers would slightly modify the phrase (using symbols instead of letters, putting spaces between letters, or misspelling a word) or create a new account. We’d block and the spammers would adjust… rinse and repeat. We simply couldn’t keep up. There had to be a better way.
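To show why phrase filters lose this race, here is a minimal, hypothetical sketch. It is not how our forum platform's filters work internally. The look-alike table and the blocked term are illustrative assumptions. An exact-match filter misses "c@s1no". A normalization step catches that variant and the spaced-out "c a s i n o". A simple misspelling like "kasino" still slips past both.

```python
import re

# Hypothetical look-alike table; real spammers use many more substitutions.
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                            "5": "s", "$": "s", "@": "a", "!": "i"})

BLOCKED_TERMS = {"casino"}  # hypothetical blocklist entry


def naive_match(text: str) -> bool:
    """Case-insensitive substring match, like a basic phrase filter."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def normalize(text: str) -> str:
    """Undo two common obfuscations: look-alike symbols and spaced-out letters."""
    text = text.lower().translate(LOOKALIKES)
    # Join runs of single characters separated by spaces ("c a s i n o" -> "casino").
    return re.sub(r"\b(?:\w )+\w\b", lambda m: m.group(0).replace(" ", ""), text)


def normalized_match(text: str) -> bool:
    """Substring match after normalization."""
    cleaned = normalize(text)
    return any(term in cleaned for term in BLOCKED_TERMS)
```

Each new rule like this closes one loophole. The "kasino" case shows that another one is always open.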

To complicate matters, we run several separate instances of the community forum platform. Every filter and every user-account block had to be repeated on each instance. Plus, not all rules applied to all instances. For example, in 90 percent of cases we should block posts containing the word “casino,” because advertisements for online or brick-and-mortar casinos are not appropriate content for Intel forums. But wait, we’re Intel. We have silicon and other products embedded in slot machines. So there may be cases where “casino” actually makes sense in an Intel community forum post.
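One way to avoid repeating every rule on every instance is to define the blocklist once and let each instance list only its exceptions. The sketch below is hypothetical. The instance names and terms are made up, and it does not describe our platform's configuration.

```python
from dataclasses import dataclass, field

# Hypothetical global blocklist, defined once instead of once per instance.
GLOBAL_BLOCKED = {"casino", "appliance repair"}


@dataclass
class InstancePolicy:
    name: str
    # Terms this instance exempts from the global blocklist.
    allowed: set[str] = field(default_factory=set)

    def blocked_terms(self) -> set[str]:
        """Global terms minus this instance's exceptions."""
        return GLOBAL_BLOCKED - self.allowed

    def flags(self, post: str) -> set[str]:
        """Return the blocked terms that appear in a post."""
        text = post.lower()
        return {term for term in self.blocked_terms() if term in text}


# Hypothetical instances: one general forum, one for embedded gaming hardware.
POLICIES = {
    "developer-forum": InstancePolicy("developer-forum"),
    "embedded-gaming": InstancePolicy("embedded-gaming", allowed={"casino"}),
}
```

This removes the duplicated work. Someone still has to anticipate each exception, though, and that is the part that doesn't scale.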

In fact, Intel technology is potentially involved in every industry. As the Internet of Things grows more pervasive, it gets harder and harder to define a one-size-fits-all rule that blocks a given word or phrase. For example, “appliance repair” used to be a phrase we would automatically label as spam. But now that most appliances are connected devices, a post about refrigerators or microwave ovens is increasingly likely NOT to be spam.

Again, there had to be a better way: one that applied more intelligence and more proactive methods instead of the endless battle we were waging. Fighting spam is not the primary job for me or for the volunteer moderators. For the spammers, it’s their only job – they’re dedicated to finding ways around whatever we do. It was clear that traditional, reactive manual efforts could not win. We simply didn’t have enough resources.

The answer lay in automation and machine learning.

Intel is already putting the power of automation to work in many areas, such as PC health monitoring and factory processes. Automation has increased efficiency and effectiveness in these areas, and we wanted to gain the same benefits for spam control. The automated solution we developed is described in detail in our recent white paper, “Preventing Spam on Intel Public Community Forums.”


Two aspects of our solution are especially effective in helping block spam on Intel’s community forums:

  • Machine learning. Using machine-learning techniques, the spam-filtering service blocks unwanted and malicious content automatically. It uses a reputation-based system to monitor user profiles and estimate the likelihood that a given source will submit spam.
  • Multilingual analysis. Using text analytics, the spam filter detects profanity and other spam-related content in 75 languages.
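The service's reputation model is not public, so the following is only an illustrative sketch of what "reputation-based" can mean. Each user's past moderation outcomes feed a smoothed estimate of the probability that their next post is spam. That estimate is then blended with a content score. The Beta prior, the 50/50 blend, and the 0.7 threshold are all hypothetical choices, and so are the `ReputationTracker` and `should_hold` names.

```python
class ReputationTracker:
    """Estimate P(next post is spam) per user from past moderation outcomes.

    Uses a Beta(prior_spam, prior_ham) prior, so a brand-new account starts at
    the prior mean (0.5 by default) instead of being fully trusted.
    """

    def __init__(self, prior_spam: float = 1.0, prior_ham: float = 1.0):
        self.prior_spam = prior_spam
        self.prior_ham = prior_ham
        self.spam: dict[str, int] = {}
        self.ham: dict[str, int] = {}

    def record(self, user: str, was_spam: bool) -> None:
        """Record a moderator's verdict on one of the user's posts."""
        counts = self.spam if was_spam else self.ham
        counts[user] = counts.get(user, 0) + 1

    def spam_likelihood(self, user: str) -> float:
        """Posterior mean of the user's spam rate."""
        s = self.spam.get(user, 0) + self.prior_spam
        h = self.ham.get(user, 0) + self.prior_ham
        return s / (s + h)


def should_hold(tracker: ReputationTracker, user: str,
                content_score: float, threshold: float = 0.7) -> bool:
    """Hypothetical policy: hold a post for review if the average of the
    user's reputation and a content classifier score (0 to 1) is high."""
    combined = 0.5 * tracker.spam_likelihood(user) + 0.5 * content_score
    return combined >= threshold
```

A consequence of this design is that a throwaway account never earns the benefit of the doubt that an established member with a long clean history has. Simply creating a new account therefore buys a spammer less than it did against a static blocklist.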

Attacks dropped off immediately after we implemented the spam-filtering solution in June, and spam levels have remained manageable ever since. Spikes have all but disappeared thanks to the spam-filtering service’s ability to learn.

Word has gotten out about how effective the solution is. Several business groups at Intel that maintain their own community forums, separate from those protected by our automated solution, have started conversations about how they might apply what we learned to their own forums.

Our research during the search for a better solution showed that Intel is not alone – many other large companies with community forums face the same problem. Spam has always been a problem and always will be, even with our solution, but I’d be happy to share some details of how we implemented it. I’d also like to hear what other IT professionals are doing to fight spam on community forums. Please join the conversation by leaving a comment below.