Sybil attacks are named after Sybil, the pseudonymous subject of a book about a woman diagnosed with dissociative identity disorder. In online social networks, a Sybil attack undermines the network's trust and reputation by proliferating fake profiles under false identities. Fake profiles have become a persistent and growing menace in online social networks. As businesses and individuals embrace social networks, the line between the physical and online worlds is blurring. Hence it is critical to detect, prevent and contain fake accounts in online communities. This article looks at the specific dangers posed by fake profiles and at solutions for detecting and preventing them.
Fake Accounts & the Problems They Cause
The root cause of fake accounts is the popularity of open systems such as Facebook, Twitter and LinkedIn. Identities have become porous, instant and temporary, which makes fake profiles easy to create. Fake accounts fall into a few types:
- Accounts created using fake identities.
- Accounts created using stolen identities.
- Compromised accounts.
All three are serious issues and can break the trustworthiness of online communities.
Fake accounts break the trust of online communities by:
- manipulating the reputations of businesses, individuals and other entities through paid fake accounts, fake votes and fake reviews;
- distorting trends and news by spreading false information and spam;
- acting as anonymous fronts for harassment and ransom demands.
Of course, fake accounts are not limited to online social networks (OSNs); they also affect other forms of open online identity such as cryptocurrency wallets, email addresses and phone numbers.
The problem can be approached in two ways:
- Prevention, which relies on a closed signup process tied to a robust real-life identity (closed systems).
- Detection of fake profiles after signup (open systems).
The first is harder to implement because many business models depend on ever more people signing up, so ease of signup is the top priority. Privacy also takes precedence over the detection of fake accounts. As a result, many open systems such as Facebook, Twitter and LinkedIn largely do away with identity verification at signup.
The more pragmatic solution is to detect and block fake accounts after signup.
Some networks rely on the wisdom of the crowd, or on the aggrieved party, to flag fake or problematic accounts. While this has some success against standalone fake accounts, it is not effective against clusters of fake accounts or automated Sybil attacks.
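Crowd flagging can be sketched as a simple report counter. This is a minimal illustration under assumed conventions, not any network's actual mechanism: reports arrive as hypothetical `(reporter, reported)` pairs, and an account is queued for human review once a hypothetical threshold of distinct reporters is reached.

```python
from collections import defaultdict


def accounts_to_review(reports, min_distinct_reporters=3):
    """Return accounts reported by at least `min_distinct_reporters` different users.

    `reports` is an iterable of (reporter, reported) pairs; the threshold is an
    illustrative assumption, not a recommended value.
    """
    reporters_of = defaultdict(set)
    for reporter, reported in reports:
        # A set ignores repeat reports from the same user.
        reporters_of[reported].add(reporter)
    return sorted(
        account
        for account, reporters in reporters_of.items()
        if len(reporters) >= min_distinct_reporters
    )


reports = [
    ("u1", "spammer"), ("u2", "spammer"), ("u3", "spammer"),
    ("u1", "spammer"),  # duplicate report, counted once
    ("u4", "quiet_fake"),
]
print(accounts_to_review(reports))  # ['spammer']
```

Counting distinct reporters stops a single user from forcing a review by repetition, but it also shows the weakness described above: a fake account that nobody reports, such as a member of a cluster that only interacts with other fakes, never reaches the threshold, while a coordinated cluster could itself supply the reports needed against a legitimate account.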
Another approach is to use a set of behavioural rules of thumb to decide whom to let in and keep. For example, a person who is a friend of a trusted person is considered trustworthy. Accounts are also monitored for parameters such as the frequency and type of posts, the type and frequency of interactions, the devices and IP addresses used to log in, and the time of activity. But as social circles grow and people add contacts who are not part of their physical circles, these rules become harder to manage and rely upon. These solutions also fail to account for stolen and compromised identities.
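Both kinds of rule of thumb can be sketched in a few lines. The sketch below is an assumption-laden illustration: the friendship graph, the seed accounts, the two-hop limit, and the rule names, fields and thresholds in `RULES` are all hypothetical. Friend-of-a-trusted-person trust is modelled as a breadth-first search outward from known-genuine seed accounts.

```python
from collections import deque


def trusted_accounts(friends, seeds, max_hops=2):
    """Trust the seed accounts and anyone within `max_hops` friendship links of them.

    `friends` maps an account to a list of its friends; `seeds` are accounts
    already known to be genuine. Both are hypothetical inputs.
    """
    trusted = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:  # breadth-first search outward from the seeds
        account, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not extend trust any further along this path
        for friend in friends.get(account, []):
            if friend not in trusted:
                trusted.add(friend)
                queue.append((friend, hops + 1))
    return trusted


# Hypothetical behavioural rules of thumb: (rule name, activity field, threshold).
RULES = [
    ("high posting rate", "posts_per_hour", 20),
    ("many IP addresses", "distinct_ips_per_day", 10),
    ("night-time activity", "night_activity_share", 0.8),
]


def broken_rules(activity):
    """Return the names of the rules whose threshold this activity summary exceeds."""
    return [name for name, field, limit in RULES if activity[field] > limit]


friends = {
    "alice": ["bob"], "bob": ["alice", "carol"],
    "carol": ["bob", "dave"], "dave": ["carol"],
    "x1": ["x2"], "x2": ["x1"],  # an isolated pair with no link to the seed
}
print(sorted(trusted_accounts(friends, ["alice"])))  # ['alice', 'bob', 'carol']
print(broken_rules({"posts_per_hour": 45, "distinct_ips_per_day": 3,
                    "night_activity_share": 0.9}))
# ['high posting rate', 'night-time activity']
```

The hop limit is what keeps friend-of-a-friend trust manageable: if alice befriended x1, the whole x1/x2 pair would become trusted within two hops, which is the failure mode described above as circles grow beyond people's physical contacts. Neither function would notice a stolen or compromised account whose owner was already trusted.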
More advanced solutions therefore use artificial intelligence (machine learning) to recognise the patterns of fake accounts. A typical machine-learning-based solution proceeds as follows:
- Collection of data in which known fake accounts are labelled, manually or otherwise.
- Training of models to learn the complex patterns and rules.
- Automation to enforce the learned rules.
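The three steps can be sketched end to end in plain Python. This is a minimal illustration, not a production design: the features, the tiny labelled sample and the 0.5 decision threshold are all hypothetical, and logistic regression (one of the classifiers listed below) stands in for whatever model a real system would use.

```python
import math

# Step 1: labelled data. Features are hypothetical:
# (friends, posts_per_day, account_age_days); label 1 = known fake, 0 = genuine.
X = [(250, 3, 900), (180, 5, 1500), (320, 2, 700), (210, 4, 1200),
     (8, 60, 5), (15, 45, 12), (4, 80, 3), (11, 55, 8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_scaler(rows):
    """Per-feature mean and standard deviation, so features share one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, scaler):
    means, stds = scaler
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def train(rows, labels, lr=0.1, epochs=500):
    """Step 2: logistic regression fitted by batch gradient descent on log-loss."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, label in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            grad_w = [g + err * xi for g, xi in zip(grad_w, row)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def enforce(model, scaler, accounts, threshold=0.5):
    """Step 3: flag accounts whose predicted probability of being fake exceeds the threshold."""
    w, b = model
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, scale(a, scaler))) + b) > threshold
            for a in accounts]


scaler = fit_scaler(X)
model = train([scale(row, scaler) for row in X], y)
print(enforce(model, scaler, [(9, 65, 4), (230, 3, 1100)]))  # [True, False]
```

In practice a platform might raise the threshold, or route borderline scores to human review, to limit the cost of suspending genuine users.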
Machine Learning Classifiers
Training is the most critical step in any AI-based system. It requires a thorough understanding of the domain, the datasets and the interrelations between them. Based on this understanding, the right type of classifier is chosen and implemented. Some of the classifiers most commonly used for fake profile detection are:
- Naive Bayes Classification
- Decision Tree Classification
- Support Vector Machine
- Logistic Regression
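To make the first classifier in the list concrete, here is a minimal Gaussian Naive Bayes sketch in plain Python. It assumes numeric features that are roughly normally distributed within each class and independent of one another given the class (the "naive" assumption); the features and sample are hypothetical, and `var_smoothing` is an illustrative guard against zero variance.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate each class's prior and per-feature mean and variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / len(c) + var_smoothing
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Pick the class with the highest log posterior, treating features as independent Gaussians."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


# Hypothetical features: (friends, posts_per_day, account_age_days); 1 = fake.
X = [(250, 3, 900), (180, 5, 1500), (320, 2, 700), (210, 4, 1200),
     (8, 60, 5), (15, 45, 12), (4, 80, 3), (11, 55, 8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (200, 4, 1000)))  # 0
print(predict_gaussian_nb(model, (10, 70, 6)))     # 1
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow as the number of features grows.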
These classifiers are only a starting point. To improve accuracy, it is better to try different classifiers, vary their parameters and compare the results against known data.
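One common way to compare models against known data is k-fold cross-validation: split the labelled accounts into k folds, train on all but one, measure accuracy on the held-out fold and average. The sketch below, in plain Python, compares a majority-class baseline with a decision stump (a decision tree of depth one); the features, the synthetic sample and the choice of k=5 are assumptions for illustration.

```python
import random
from statistics import mean


def majority_fit(X, y):
    """Baseline: always predict the most common training label."""
    label = max(set(y), key=y.count)
    return lambda x: label


def stump_fit(X, y):
    """Decision stump: the single feature/threshold split with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):  # label predicted when row[j] > t
                errors = sum((above if row[j] > t else 1 - above) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, above)
    _, j, t, above = best
    return lambda x: above if x[j] > t else 1 - above


def k_fold_accuracy(fit, X, y, k=5, seed=0):
    """Average held-out accuracy of `fit` over k shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(sum(predict(X[i]) == y[i] for i in fold) / len(fold))
    return mean(scores)


# Synthetic, hypothetical sample: (friends, posts_per_day); 1 = fake.
X = ([(150 + 20 * i, 2 + i % 5) for i in range(10)]
     + [(2 + i, 40 + 5 * (i % 5)) for i in range(10)])
y = [0] * 10 + [1] * 10
for name, fit in [("majority", majority_fit), ("stump", stump_fit)]:
    print(name, round(k_fold_accuracy(fit, X, y), 2))
```

Any classifier that can be wrapped as a `fit(X, y)` function returning a predictor can be dropped into `k_fold_accuracy`, and the same loop can be rerun with different parameter values. The baseline matters because on real data, where fakes are often a small minority, a model that labels everyone genuine already scores high accuracy; precision and recall on the fake class are then more informative.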
The availability of known, diverse data is therefore equally important when designing a detection and prevention system. One such dataset is available on Kaggle (https://www.kaggle.com/bitandatom/social-network-fake-account-dataset). To increase accuracy, it is better to obtain data from the targeted geography and demographics.
Further Reading and References
Detecting Clusters of Fake Accounts: http://theory.stanford.edu/~dfreeman/papers/clustering.pdf
Detecting Anomalous Social Behaviour: https://people.mpi-sws.org/~gummadi/papers/anomalous-socialbehavior.pdf
Detecting Compromised Accounts on Social Networks: http://www.cs.ucsb.edu/~vigna/publications/2013_NDSS_compa.pdf
Uncovering Large Groups of Malicious Accounts: https://users.cs.duke.edu/~qiangcao/publications/synchrotrap.pdf
Advogato Trust Metric: http://www.advogato.org/trust-metric.html