Data Mining Process

The good life is a process, not a state of being. It is a direction not a destination.

Carl Rogers

I searched for a framework that fits every data mining task and found a good one in an Oracle article [1].

Here is my summary. The data mining process has four steps:

Step 1. Problem Definition

This initial phase of a data mining project focuses on understanding the project objectives and requirements. Once you have specified the project from a business perspective, you can formulate it as a data mining problem and develop a preliminary implementation plan.

Step 2. Data Gathering & Preparation

The data understanding phase involves data collection and exploration. As you take a closer look at the data, you can determine how well it addresses the business problem. You might decide to remove some of the data or add more data. This is also the time to identify data quality problems and to scan for patterns in the data.

  1.  Data Access

  2.  Data Sampling

  3.  Data Transformation
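To make the sampling and transformation sub-steps concrete, here is a minimal sketch in plain Python. It assumes the data has already been accessed and loaded into an in-memory list; the function names sample_rows and min_max_scale are hypothetical, and min-max scaling is just one common transformation among many.

```python
import random


def sample_rows(rows, k, seed=0):
    """Data sampling: draw k rows without replacement, reproducibly via a fixed seed."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


def min_max_scale(values):
    """Data transformation: rescale a numeric column to [0, 1].

    A constant column has no spread, so it is mapped to all zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


rows = list(range(100))
print(len(sample_rows(rows, 10)))  # 10
print(min_max_scale([2, 4, 6]))    # [0.0, 0.5, 1.0]
```

Fixing the seed makes the sample repeatable, which helps when you need to rebuild the same training data later.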

Real-world data is dirty [3]: it is often incomplete (lacking attribute values, lacking certain attributes of interest, or containing only aggregate data), noisy (containing errors or outliers), or inconsistent (containing discrepancies in codes or names).
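Each of the three kinds of dirt can be checked with a few lines of code. The sketch below is one possible approach, assuming records are Python dicts: missing values count as incompleteness, a z-score rule flags possible noise, and codes that collide after trimming and lower-casing flag inconsistency. The function names and the z-score threshold are illustrative choices, not part of the cited source.

```python
from statistics import mean, pstdev


def missing_counts(records, fields):
    """Incompleteness: count records where each field is absent or None."""
    return {f: sum(1 for r in records if r.get(f) is None) for f in fields}


def zscore_outliers(values, threshold=3.0):
    """Noise: return values whose absolute z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def code_variants(values):
    """Inconsistency: group raw codes that collide after trimming and lower-casing."""
    groups = {}
    for v in values:
        groups.setdefault(v.strip().lower(), set()).add(v)
    return {key: raw for key, raw in groups.items() if len(raw) > 1}


records = [{"age": 30, "city": "NY"}, {"age": None, "city": "ny "}, {"city": "LA"}]
print(missing_counts(records, ["age", "city"]))  # {'age': 2, 'city': 0}
```

A z-score rule is crude on small samples, so treat what it flags as candidates for inspection rather than values to delete automatically.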

Step 3. Model Building

In this phase, you select and apply various modeling techniques and calibrate the parameters to optimal values. If the algorithm requires data transformations, you will need to step back to the previous phase to implement them.

  1.  Create Model

  2.  Test Model

  3.  Evaluate & Interpret Model
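The create/test/evaluate loop can be sketched with the simplest possible model. This example assumes a single numeric predictor, uses ordinary least squares as the modeling technique, holds out a quarter of the rows as a test set, and reports root-mean-squared error (RMSE). All function names are hypothetical, and real projects would normally use a library rather than hand-written fitting code.

```python
import random
from math import sqrt


def train_test_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle row indices reproducibly and hold out a fraction for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    x_train = [xs[i] for i in train_idx]
    y_train = [ys[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_test = [ys[i] for i in test_idx]
    return x_train, y_train, x_test, y_test


def fit_simple_ols(xs, ys):
    """Create model: least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def rmse(model, xs, ys):
    """Test model: root-mean-squared error on rows the model has not seen."""
    b0, b1 = model
    sq_err = [(y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(sq_err) / len(sq_err))


xs = list(range(20))
ys = [3 + 2 * x for x in xs]
x_tr, y_tr, x_te, y_te = train_test_split(xs, ys)
model = fit_simple_ols(x_tr, y_tr)
print(model, rmse(model, x_te, y_te))
```

Evaluating on held-out rows, rather than on the training rows, gives a fairer estimate of how the model will behave on new data.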

Some important questions [2]:

  • Is at least one of the predictors useful in predicting the response? (F-statistic)
  • Do all the predictors help to explain Y, or is only a subset of the predictors useful? (all-subsets or best-subset selection)
  • How well does the model fit the data?
  • Given a set of predictor values, what response value should we predict, and how accurate is our prediction?
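For a least-squares regression with an intercept and p predictors, the first, third, and fourth questions map onto standard summary numbers: the overall F-statistic F = ((TSS - RSS) / p) / (RSS / (n - p - 1)), the coefficient of determination R^2 = 1 - RSS / TSS, and the residual standard error RSE = sqrt(RSS / (n - p - 1)), which indicates the typical size of a prediction error. The sketch below computes them from observed responses and fitted values; the function name is hypothetical, and it assumes the fitted values came from such a least-squares fit.

```python
from math import sqrt


def regression_summary(ys, fitted, p):
    """Summarize a least-squares fit with an intercept and p predictors.

    r2:  share of the response variance explained by the model (fit quality)
    f:   overall F-statistic for H0 "all slope coefficients are zero"
    rse: residual standard error (typical size of a prediction error)
    """
    n = len(ys)
    if n <= p + 1:
        raise ValueError("need more observations than p + 1")
    mean_y = sum(ys) / n
    tss = sum((y - mean_y) ** 2 for y in ys)             # total sum of squares
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual sum of squares
    df_resid = n - p - 1
    r2 = 1 - rss / tss
    f_stat = ((tss - rss) / p) / (rss / df_resid) if rss > 0 else float("inf")
    rse = sqrt(rss / df_resid)
    return {"r2": r2, "f": f_stat, "rse": rse}


xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]
fitted = [0.6 + 0.8 * x for x in xs]  # least-squares line for these five points
print(regression_summary(ys, fitted, p=1))
```

Here RSS = 3.6 and TSS = 10, so R^2 is about 0.64, F is about 5.33 on (1, 3) degrees of freedom, and RSE is about 1.10. A large F (judged against the F distribution with p and n - p - 1 degrees of freedom) suggests that at least one predictor is useful.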

Step 4. Knowledge Deployment

Knowledge deployment is the use of data mining within a target environment. In the deployment phase, insight and actionable information can be derived from data.

  1. Model Apply
  2. Custom Reports
  3. External Applications
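"Model Apply" means scoring new records with a model that was built elsewhere. A minimal sketch, assuming a hypothetical linear model stored as an intercept and a slope, serializes the model to JSON in the modeling environment and reloads it in the target environment to score new inputs. The JSON layout and function names are illustrative assumptions, not an Oracle format.

```python
import json


def export_model(model):
    """Serialize a (intercept, slope) model to a JSON string for deployment."""
    intercept, slope = model
    return json.dumps({"intercept": intercept, "slope": slope})


def import_model(text):
    """Rebuild the (intercept, slope) model inside the target environment."""
    d = json.loads(text)
    return d["intercept"], d["slope"]


def apply_model(model, new_xs):
    """Model Apply: score new predictor values with the deployed model."""
    intercept, slope = model
    return [intercept + slope * x for x in new_xs]


payload = export_model((0.6, 0.8))
deployed = import_model(payload)
print(apply_model(deployed, [6, 7]))
```

The scores can then feed custom reports or be passed to external applications.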


  1. Oracle, The Data Mining Process
  2. Trevor Hastie and Rob Tibshirani, Model Selection and Qualitative Predictors, URL:
  3. Nguyen Hung Son, Data cleaning and Data preprocessing, URL:
