How to Tame ‘Data in the Wild’


Misty Carter, CFE
ACFE Research Specialist

Emails, social media posts, blogs, instant messages — what do they all have in common? For one, they are tools used by millions of people each day to communicate with the rest of the world. What else do they have in common? They help detect fraud. You might be wondering, “Why and how is a Facebook update relevant to fraud detection?” Consider how much new data is created every second. Think about how many posts, emails or text messages you personally send each day. Now think about how much of this data is never touched. 

To put it into perspective, a study conducted by International Data Corporation (IDC), a U.S. market research firm, estimated that unstructured data such as text will account for 90 percent of all data created in the next decade. Unstructured data, sometimes referred to as “data in the wild,” is free-form data that has not been organized into a structured format, such as the rows and columns of a database. Because unstructured data is a relatively unexploited resource for fraud examiners, analyzing it can provide insight into fraud-prone areas that might otherwise go unexamined.

Before coming to the ACFE, I spent 10 years working in the audit field. I found that mining text data during fraud investigations was one of the most useful techniques in my auditing toolkit. Today, many fraud examiners use a similar data analysis method to help explain, understand or interpret a situation or a person’s actions or thoughts. This type of non-traditional analysis is referred to as textual analytics. For example, the FBI and Ernst & Young’s Fraud Investigation and Dispute Services Practice have applied textual analysis to email communications from past corporate investigations to determine the words most commonly used by employees engaged in rogue trading and fraud. From that analysis, they identified the top 15 keywords and phrases used by fraud perpetrators. This list of keywords can be used proactively to prevent fraud from occurring or to spot it early in the process.
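
To give a rough sense of what keyword-based screening can look like in practice, here is a minimal Python sketch. It is not the method used in the analysis described above, and the list from that analysis is not reproduced here: the terms in WATCH_TERMS are placeholders invented for illustration, and the function names and sample messages are likewise assumptions.

```python
import re
from collections import Counter

# Placeholder watch list invented for illustration only.
# It is NOT the keyword list from the analysis described in the article.
WATCH_TERMS = ["off the books", "keep this quiet", "special arrangement"]


def build_pattern(terms):
    """Compile one case-insensitive pattern that matches any whole term."""
    # Longer terms first, so a longer phrase wins over a shorter overlapping one.
    escaped = [re.escape(t) for t in sorted(terms, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def flag_messages(messages, terms):
    """Return (message_id, {term: count}) for each message with at least one hit."""
    pattern = build_pattern(terms)
    flagged = []
    for msg_id, text in messages:
        hits = Counter(m.group(0).lower() for m in pattern.finditer(text))
        if hits:
            flagged.append((msg_id, dict(hits)))
    return flagged


# Made-up sample messages standing in for an email export.
sample = [
    ("msg-1", "Lunch at noon?"),
    ("msg-2", "Keep this quiet and off the books."),
    ("msg-3", "It is a special arrangement, a very SPECIAL ARRANGEMENT."),
]

for msg_id, hits in flag_messages(sample, WATCH_TERMS):
    print(msg_id, hits)
```

A hit from a screen like this is only a pointer for further review, not evidence of fraud in itself; ordinary business email can contain the same phrases.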

The use of keywords, however, is only one facet of analyzing textual data. The ACFE’s new online course, Textual Analytics, identifies various techniques that can help fraud examiners, including examples of how data from free-text fields, email, social media sites and other sources can be used to uncover fraud. The course provides an overview of the different types of data and how each should be managed before it is analyzed. It also explains how textual data can be used to assess fraud risk in areas that might not be on management’s radar.

If you are looking for new and innovative ways to add value to your organization, this course will provide you with the tools necessary to effectively reduce fraud risk exposure while enhancing your fraud detection skills.

Read more about the new course.