Data Mining for Compliance

Data mining is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules. In a broader context, data mining is also referred to as the knowledge discovery process (KDP), which covers all the steps involved in creating deployable data mining models. Although there are many other definitions of data mining, it essentially means searching for useful patterns in large quantities of data.

The goal of data mining is to leverage meaningful data patterns to improve business practices.  Specifically, in the case of compliance management, this involves discovering “knowledge” that enables an organization to improve target selection and audit resource management.  An environment that is suitable for data mining has these characteristics:

• There are extremely large sources of data
• These sources contain knowledge that is non-obvious
• Once discovered, this non-obvious knowledge has strong potential in decision making and improving the overall business processes
• The task of discovering this potential knowledge cannot be done manually

These data sources contain hundreds of attributes that would have to be analyzed in order to model and predict compliance.  Given the scope of the data and complexity of the attributes, methods and technologies are needed to enable automated extraction of useful knowledge from the data.  Data mining provides the means to discover the strategic and decision-enabling information hidden within these large databases. 

Data mining has a distinctive aspect: it enables data exploration and analysis without any specific hypothesis in mind. This sometimes distinguishes it from traditional statistical analysis, where experiments are designed around a particular hypothesis, and it lends a strong exploratory flavor to any data mining endeavor. However, a data mining project must still follow some structure in order to succeed. A typical project encompasses a set of phases: selecting the right information sources to mine, gathering insights into the data, applying suitable preprocessing and transformations to the raw data, building predictive data mining models, interpreting and validating the results, and finally deploying the models. Building effective data mining solutions requires a systematic design approach that covers many aspects of the overall process.
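
The phases listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records are synthetic, the single attribute `ratio` and the label `noncompliant` are hypothetical, and the one-threshold rule (a decision stump) merely stands in for whatever predictive model a real project would choose.

```python
import random
import statistics


def select_source(n=200, seed=0):
    """Phase 1: choose the data to mine (here: hypothetical synthetic records)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        ratio = rng.uniform(0.0, 1.0)      # hypothetical attribute
        noncompliant = rng.random() < ratio  # assumed toy rule: risk grows with ratio
        records.append({"ratio": ratio, "noncompliant": noncompliant})
    return records


def explore(records):
    """Phase 2: gather basic insight into the data."""
    return {
        "rows": len(records),
        "mean_ratio": statistics.mean(r["ratio"] for r in records),
        "noncompliant_rate": sum(r["noncompliant"] for r in records) / len(records),
    }


def preprocess(records):
    """Phase 3: transform raw records into (feature, label) pairs."""
    return [(r["ratio"], int(r["noncompliant"])) for r in records]


def accuracy_of(threshold, pairs):
    """Share of pairs where 'feature >= threshold' agrees with the label."""
    return sum((x >= threshold) == bool(y) for x, y in pairs) / len(pairs)


def build_model(train):
    """Phase 4: pick the threshold with the best training accuracy."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy_of(t, train))


def validate(threshold, test):
    """Phase 5: check the model on held-out data."""
    return accuracy_of(threshold, test)


def deploy(threshold):
    """Phase 6: wrap the model as a function that can score new cases."""
    def flag_for_audit(ratio):
        return ratio >= threshold
    return flag_for_audit


records = select_source()
summary = explore(records)
data = preprocess(records)
train, test = data[:150], data[150:]   # simple hold-out split
threshold = build_model(train)
accuracy = validate(threshold, test)
flag_for_audit = deploy(threshold)
print(summary["rows"], round(threshold, 2), round(accuracy, 2))
```

Each function maps to one phase, so a step can be replaced (for example, a richer model in `build_model`) without touching the others. The hold-out split in the driver code is one simple way to keep validation separate from model building; a real project would likely use a more careful scheme.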