Adding Structure to the Use of Unstructured Data


Jeremy Clopton, CFE, CPA, ACDA
Senior Managing Consultant, Forensic and Valuation Services

In the age of big data, it should come as no surprise that the ACFE’s 2014 Report to the Nations ranks proactive monitoring and analysis of data as the most effective anti-fraud control, with respect to both duration and median loss. What may be surprising is the type of data being monitored and analyzed.

Unstructured data (things like external emails and social media) is becoming a larger portion of the big data pie every year. In a 2005 report, Gartner Research indicated that unstructured data comprised about 80 percent of all available data in an organization. Fast forward a few years, and that percentage is likely much higher. The challenge we face as investigators is how to best use this unstructured data in our investigations. The solution begins with the collaboration between data analytics and digital forensics, as referenced in my post in February on that topic.

As highlighted in a recent article, the Detroit Crime Commission (DCC) has embraced this collaboration as well. The article and accompanying video cover the general framework of how the DCC is using analytics of both structured and unstructured data for fighting crime. While the article is focused on a specific software solution, it contains valuable information about the DCC’s mindset and reasoning behind the use of what they call big data analytics. This information is applicable regardless of software choice, industry or location. Some key conceptual takeaways from the article include:

  • Network analysis and relationship mapping. Using information gathered from online sources, the DCC identifies criminal enterprises and their members, as well as how the various organizations interact. Applying this to occupational fraud, identifying the network and relationship map for key vendors, employees and customers may help in uncovering corruption and kickback schemes.
  • Analyzing both unstructured and structured data. Rather than relying solely on criminal databases and arrest records, DCC uses information from online posts to supplement their structured information and gather intelligence not otherwise available. Applied to occupational fraud, the analysis of email communications, text messages and chat sessions may provide information regarding unknown relationships or activities not identifiable in the transaction detail.
  • Data visualization. The video accompanying this article shows a great example of using data visualization to uncover relationships and “see the data” more quickly than traditional methods. The old saying that “a picture is worth a thousand words” is truer than ever in data analytics. Using data visualization helps identify trends, patterns and relationships not readily identifiable in reading through large volumes of data. This technology can truly help an investigator see the issues in the data.
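
To make the first two takeaways a bit more concrete, here is a minimal sketch in Python of how a relationship map might be built from message metadata and then compared against structured records. It is an illustration only, not the DCC's method or any particular product's approach. The addresses, the `known_pairs` set and both function names are hypothetical, and the sketch assumes the sender and recipient have already been extracted from the emails.

```python
from collections import Counter

# Hypothetical message metadata (sender, recipient); every address is made up.
messages = [
    ("alice@corp.example", "sales@vendor-a.example"),
    ("alice@corp.example", "sales@vendor-a.example"),
    ("bob@corp.example", "sales@vendor-b.example"),
    ("alice@corp.example", "bob@corp.example"),
    ("carol@corp.example", "sales@vendor-a.example"),
]

# Hypothetical structured data: employee/vendor pairs that purchasing
# records already explain (for example, the assigned buyer for each vendor).
known_pairs = {
    ("alice@corp.example", "sales@vendor-a.example"),
    ("bob@corp.example", "sales@vendor-b.example"),
}


def build_relationship_map(messages):
    """Count messages exchanged between each unordered pair of contacts."""
    edges = Counter()
    for sender, recipient in messages:
        edges[frozenset((sender, recipient))] += 1
    return edges


def unexplained_vendor_links(edges, known_pairs, internal_domain):
    """Return internal-to-external links that the structured data does not explain."""
    known = {frozenset(pair) for pair in known_pairs}
    flagged = []
    for pair, count in edges.items():
        if pair in known:
            continue
        domains = {address.split("@")[1] for address in pair}
        # Keep only links that cross the company boundary.
        if internal_domain in domains and len(domains) > 1:
            flagged.append((tuple(sorted(pair)), count))
    return sorted(flagged)


edges = build_relationship_map(messages)
for pair, count in unexplained_vendor_links(edges, known_pairs, "corp.example"):
    print(pair, count)
```

In this made-up data, the only link flagged is the one between carol@corp.example and the vendor-a contact, because nothing in the structured records accounts for it. A flagged link is a lead for follow-up, not evidence of a scheme on its own.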

The application of data analytics in law enforcement is a great example of leveraging big data. The DCC’s success using these concepts underscores the importance of proactive monitoring and analysis of data for fraud detection and prevention. 

Follow Jeremy on Twitter @j313.