Last updated at Mon, 23 May 2022 19:38:40 GMT
A new type of threat that has risen over the past few years is the use of fake social media profiles for cyber scams. These profiles typically impersonate someone or something users trust, such as big-name brands and companies. Fake profiles can be used in a variety of malicious ways, like luring users to phishing sites, posing as customer service, spreading false information, or harvesting personal data for future attacks. No matter your industry or type of customer, these social media scams can be incredibly damaging to your brand, so it’s important to monitor for fake accounts that try to impersonate your organization or employees.
Here is how Rapid7 has automated the process of identifying and taking down fake social media profiles to help companies proactively identify these scams.
Identifying malicious profiles: challenges and solutions
As part of the brand protection services we offer at Rapid7, we search for, identify, and take down fake social media profiles that might be harmful to an organization’s brands and customers. Given the number of new accounts created each day across a multitude of social media platforms, it's impossible to do this work manually.
So, we created an algorithm to automatically search for and identify actually malicious accounts in a way that does not disrupt service for the social platforms. As you may imagine, this is much easier said than done: we need to be able to distinguish between a real account, a fake account, and an account being used for malicious activity against a specific company or brand.
Creating a sufficient test dataset
The Challenge: In order to write an algorithm that accurately detects malicious accounts, we had to create a large enough dataset of profiles to test it on. We wanted an algorithm that didn’t just identify suspicious activity and potentially fake accounts but could actually distinguish a fake profile that is being used to maliciously harm a company or brand. Therefore, the dataset had to contain legitimate profiles, irrelevant profiles, and fake profiles created for a variety of reasons.
The Solution: To build the right dataset, our analyst team started combing various social media sites for fake profiles. However, we couldn’t find enough fake profiles manually to assemble a sufficiently large dataset, so we began generating synthetic data to round it out and used the result to test and optimize our algorithm.
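The post doesn’t describe how the synthetic profiles were generated, but the idea can be sketched roughly as fabricating labeled profile records at scale. The field names, labels, and distributions below are invented for illustration only:

```python
import random

# Hypothetical sketch of synthetic-data generation for testing a fake-profile
# detector. All field names and value ranges are invented assumptions, not
# Rapid7's actual schema.

def make_profile(label: str, rng: random.Random) -> dict:
    if label == "fake-malicious":
        # Typical fake-profile traits: young account, burst of posts,
        # brand-like handle ("acme" is a placeholder brand).
        return {"label": label,
                "handle": f"acme_support_{rng.randint(0, 999)}",
                "age_days": rng.randint(1, 30),
                "posts": rng.randint(100, 500)}
    # Legitimate or irrelevant profiles: older, with ordinary activity.
    return {"label": label,
            "handle": f"user{rng.randint(0, 99999)}",
            "age_days": rng.randint(100, 3000),
            "posts": rng.randint(0, 200)}

def make_dataset(n: int, seed: int = 0) -> list[dict]:
    """Build a reproducible mixed dataset of labeled profiles."""
    rng = random.Random(seed)
    labels = ["legitimate", "irrelevant", "fake-malicious"]
    return [make_profile(rng.choice(labels), rng) for _ in range(n)]
```

Seeding the generator keeps the dataset reproducible, which matters when comparing algorithm versions against the same test set.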
Distinguishing company names from individual names
The Challenge: Some companies have brand names that are identical or similar to people’s names, so searches for those brands return many profiles. Determining which accounts might be malicious and which simply belong to a person who happens to share the brand’s name is difficult. We had to develop an algorithm that can tell whether a profile name is just a person’s name or is masquerading as a brand. This is even harder when the brand name contains more than one word.
The Solution: Using Natural Language Processing (NLP), we analyze the content of the profile’s posts to gauge if they are written by a private person or by an organization. We also use dictionaries of common names in several languages to determine if the name in question is a person’s name or if it is part of a name that is maliciously trying to appear like or associate with a known brand.
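The dictionary-based part of this check can be sketched as follows. This is a toy illustration, not Rapid7’s NLP pipeline: the name list is tiny and invented, and a production system would combine this with content analysis of the profile’s posts.

```python
# Minimal sketch of a name-vs-brand check, assuming a dictionary of common
# given names (a real system would load large multilingual name lists).

COMMON_FIRST_NAMES = {"james", "maria", "wei", "fatima", "ivan", "emma"}

def looks_like_person(display_name: str) -> bool:
    """Return True if the name's first token is a known given name."""
    tokens = display_name.lower().split()
    return bool(tokens) and tokens[0] in COMMON_FIRST_NAMES

def may_impersonate(display_name: str, brand: str) -> bool:
    """Flag names that contain the brand but don't read like a person's name."""
    return brand.lower() in display_name.lower() and not looks_like_person(display_name)
```

For example, a handle like "Acme Support" would be flagged for the brand "Acme", while "Maria Acme" would pass the person-name check and be left for further analysis.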
Spotting abnormal behavior
The Challenge: When reviewing profiles, it is important to look for behavior that would be classified as ‘odd’ for an official brand page. A company’s official social media page behaves in predictable ways, with expected posting tendencies and habits (more or less). Fake profile pages try to imitate those habits to appear trustworthy and legitimate. Spotting these differences is not an exact science, so we needed to develop an algorithm that could distinguish “normal” company social media activity from “abnormal” activity to help determine whether a page is malicious.
The Solution: To check for abnormal behavior, we take the profile in question and run a comprehensive list of tests using certain parameters and patterns, which most users don’t usually check when looking at a profile. For example, we’ll evaluate the number of posts on the page relative to how long the profile has existed. A large number of posts in a short page lifetime is usually a sign they were posted to make the page appear legitimate and mirror one that has been around for a while. Other parameters we evaluate include the page’s creation date, its number of followers, the types of links it shares, the domains those links forward to, and many others. Our algorithm then uses these parameters to determine whether the activity is legitimate or potentially malicious.
Avoiding disruptions to social media sites
The Challenge: Searching across so many profiles in a short amount of time can look like abnormal traffic to a platform’s servers and can even degrade a site’s speed and performance. We don’t want our algorithm to hurt performance for social media sites, so we needed to build it to search effectively without causing issues for the site.
The Solution: We use a session manager to spread requests across different sessions, both to balance the workload and to avoid appearing malicious ourselves. The same manager also supervises the number and frequency of requests so that we don’t slow down or harm the platform we are searching.
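The two responsibilities described here, rotating sessions and throttling request frequency, can be sketched in a few lines. The `SessionManager` class and its parameters are illustrative assumptions; the session objects stand in for whatever HTTP client the crawler actually uses.

```python
import itertools
import time

# Minimal sketch of a session manager: rotate work across several sessions
# and enforce a minimum delay between requests so the target platform is
# never hammered. Not Rapid7's actual implementation.

class SessionManager:
    def __init__(self, sessions, min_interval: float):
        self._pool = itertools.cycle(sessions)  # round-robin over sessions
        self._min_interval = min_interval       # seconds between requests
        self._last_request = 0.0

    def fetch(self, fetch_fn, url):
        # Throttle: sleep until at least min_interval has passed.
        wait = self._min_interval - (time.monotonic() - self._last_request)
        if wait > 0:
            time.sleep(wait)
        self._last_request = time.monotonic()
        # Rotate: each request goes out on the next session in the pool.
        session = next(self._pool)
        return fetch_fn(session, url)
```

Rotating sessions spreads the load so no single session generates a suspicious request volume, while the interval check caps the overall request rate against the platform.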
Distinguishing logos vs. people
The Challenge: After determining that a profile is fake, one of the indicators that it may be used for malicious purposes is the profile picture. If the profile picture is a company logo, it usually means the page is trying to imitate that company’s social media presence. Identifying pictures automatically is not an easy task and had to be incorporated into our algorithm to help us identify malicious profiles.
The Solution: To evaluate profile pictures, we use deep learning algorithms to identify whether the profile picture is a logo or a photograph of a person. These algorithms were developed in-house using frameworks such as Keras and Google’s machine learning utilities. If the profile picture resembles a company’s or brand’s logo, we can say with a high degree of certainty that the profile is indeed malicious.
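The real classifier is a trained deep-learning model, but the intuition behind the logo-vs-photo distinction can be shown with a crude stand-in: logos tend to use far fewer distinct colors than photographs. This toy heuristic is an assumption-laden proxy, not the Keras model the post describes.

```python
# Toy stand-in for the deep-learning logo classifier described above.
# Assumption: `image` is a 2-D list of (r, g, b) pixel tuples, and the
# color-count threshold is invented for illustration.

def looks_like_logo(image: list[list[tuple[int, int, int]]],
                    max_distinct_colors: int = 64) -> bool:
    """Guess 'logo' when the image uses only a small, flat color palette."""
    colors = {pixel for row in image for pixel in row}
    return len(colors) <= max_distinct_colors
```

A flat two-tone graphic passes this check while a photograph, with its thousands of gradient shades, does not; a CNN learns a far richer version of the same separation.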
Conclusions and recommendations
After a lot of fine-tuning through machine learning, testing, and expanding our dataset, we developed a fully automatic algorithm to spot fake profiles across the clear and dark web. But the work doesn’t end there. We are constantly adjusting our algorithm and process to account for new attacker Tactics, Techniques, and Procedures (TTPs) to make sure we are effectively identifying malicious profiles. Currently, our algorithm works on many of the biggest and most popular social media platforms around the world, enabling us to expose and take down thousands of fake profiles each month.
Despite our efforts, many fake profiles still exist and are constantly being created. This emerging threat is becoming more and more relevant, and we advise users to educate themselves and be cautious before interacting with a social media page. Checking details like the number of followers, frequency of posts, creation date, language used, and domains linked from the page is a great way to keep yourself safe. In addition, security teams should have a process in place to monitor for fake social media accounts that impersonate their company brand or employees, to avoid both financial and reputational damage.