The good life is a process, not a state of being. It is a direction not a destination.
I searched for a framework that fits any data mining task and found a good one in an Oracle article.
Here is my summary. The data mining process has four steps:
Step 1. Problem Definition
This initial phase of a data mining project focuses on understanding the project objectives and requirements. Once you have specified the project from a business perspective, you can formulate it as a data mining problem and develop a preliminary implementation plan.
Step 2. Data Gathering & Preparation
The data understanding phase involves data collection and exploration. As you take a closer look at the data, you can determine how well it addresses the business problem. You might decide to remove some of the data or add more data. This is also the time to identify data quality problems and to scan for patterns in the data.
- Data Access
Data in the real world is dirty. It is often incomplete (lacking attribute values, lacking certain attributes of interest, or containing only aggregate data), noisy (containing errors or outliers), or inconsistent (containing discrepancies in codes or names).
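The three kinds of dirty data can be made concrete with a few made-up records. The minimal Python sketch below does that, and it rests on my own assumptions, not on the Oracle article. The field names, the `GENDER_CODES` mapping, median imputation for missing values and the modified z-score rule for outliers (median absolute deviation, threshold 3.5) are all illustrative choices.

```python
from statistics import median

# Hypothetical raw records, one problem of each kind.
RAW = [
    {"id": 1, "gender": "M", "age": 34},
    {"id": 2, "gender": "male", "age": 29},   # inconsistent code
    {"id": 3, "gender": "F", "age": None},    # incomplete: missing age
    {"id": 4, "gender": "f", "age": 41},      # inconsistent code
    {"id": 5, "gender": "F", "age": 430},     # noisy: implausible value
]

# Assumed mapping from the spellings seen in the data to one canonical code.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score, 0.6745 * (x - median) / MAD, exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return set()
    return {v for v in values if abs(0.6745 * (v - med) / mad) > threshold}


def clean(records):
    """Return (cleaned copies of records, ids flagged per problem type)."""
    report = {"inconsistent": [], "noisy": [], "incomplete": []}
    ages = [r["age"] for r in records if r["age"] is not None]
    outliers = mad_outliers(ages)
    fill_age = median(a for a in ages if a not in outliers)  # robust fill value
    cleaned = []
    for r in records:
        r = dict(r)  # copy so the raw data stays untouched
        code = GENDER_CODES.get(str(r["gender"]).strip().lower())
        if code != r["gender"]:
            report["inconsistent"].append(r["id"])
        r["gender"] = code  # unknown spellings become None
        if r["age"] is None:
            report["incomplete"].append(r["id"])
            r["age"] = fill_age
        elif r["age"] in outliers:
            report["noisy"].append(r["id"])
            r["age"] = fill_age  # replace the implausible value
        cleaned.append(r)
    return cleaned, report


cleaned, report = clean(RAW)
print(report)
```

Flagging each record, instead of silently fixing it, keeps a trail of what was changed. That trail is useful when you go back and decide whether to remove or add data.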
Step 3. Model Building
In this phase, you select and apply various modeling techniques and calibrate the parameters to optimal values. If the algorithm requires data transformations, you will need to step back to the previous phase to implement them.
- Create Model
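"Calibrate the parameters to optimal values" usually means trying several settings and keeping the one that does best on data the model was not fitted on. Below is a minimal Python sketch of that idea. It assumes k-nearest-neighbour regression as the modeling technique, its only parameter k as the thing to calibrate, and a single hold-out split. The data (y = x² plus noise), the candidate k values and the helper names are hypothetical.

```python
import random


def knn_predict(train, x, k):
    """k-nearest-neighbour regression: average y of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def validation_mse(train, valid, k):
    """Mean squared error of the k-NN model on held-out points."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in valid) / len(valid)


def calibrate_k(train, valid, candidates):
    """Try each candidate k and keep the one with the lowest validation error."""
    errors = {k: validation_mse(train, valid, k) for k in candidates}
    best = min(errors, key=errors.get)
    return best, errors


# Hypothetical data: y = x**2 plus Gaussian noise, x from -3.0 to 3.0.
rng = random.Random(0)
data = [(i / 10, (i / 10) ** 2 + rng.gauss(0, 1)) for i in range(-30, 31)]
rng.shuffle(data)
train, valid = data[:45], data[45:]  # 45 points to fit, 16 to validate

best_k, errors = calibrate_k(train, valid, [1, 3, 5, 9, 15, 25])
print(best_k, {k: round(e, 3) for k, e in errors.items()})
```

A small k tends to chase the noise and a large k tends to smooth away the curve, so the validation error picks a compromise. With more data, cross-validation over several splits gives a steadier choice than this single split.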
- Evaluate & Interpret Model
Some important questions:
- Is at least one of the predictors useful in predicting the response? (F-statistic)
- Do all the predictors help to explain Y, or is only a subset of the predictors useful? (all-subsets or best-subset selection)
- How well does the model fit the data?
- Given a set of predictor values, what response value should we predict, and how accurate is our prediction?
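For linear regression, the first, third and fourth questions reduce to a few numbers. These are the residual sum of squares RSS, the total sum of squares TSS, R² = 1 - RSS/TSS, the residual standard error RSE = sqrt(RSS/(n - p - 1)) and the F-statistic F = ((TSS - RSS)/p) / (RSS/(n - p - 1)), where n is the number of observations and p the number of predictors. The Python sketch below computes them. It assumes a single predictor (p = 1) so the least-squares fit stays closed-form, and it uses made-up data. Best-subset selection (the second question) and turning F into a p-value are not shown.

```python
from math import sqrt


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def fit_summary(ys, fitted, p):
    """Goodness-of-fit numbers for a linear model with p predictors."""
    n = len(ys)
    my = sum(ys) / n
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    tss = sum((y - my) ** 2 for y in ys)
    return {
        "RSS": rss,
        "R2": 1 - rss / tss,                           # share of variance explained
        "RSE": sqrt(rss / (n - p - 1)),                # typical size of a residual
        "F": ((tss - rss) / p) / (rss / (n - p - 1)),  # tests H0: all slopes are 0
    }


# Hypothetical data with one predictor.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = fit_simple_ols(xs, ys)
summary = fit_summary(ys, [b0 + b1 * x for x in xs], p=1)
prediction = b0 + b1 * 6  # predicted response for a new x = 6
print(b0, b1, summary, prediction)
```

Here the fit is y ≈ 0.09 + 1.97x with R² ≈ 0.998 and a very large F. That answers the first question with a clear "yes" for this toy data. The RSE of about 0.17 gives a rough sense of how far a single prediction such as 11.91 at x = 6 might miss.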
Step 4. Knowledge Deployment
Knowledge deployment is the use of data mining within a target environment. In the deployment phase, insight and actionable information can be derived from data.
- Model Apply
- Custom Reports
- External Applications
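"Model Apply" means scoring new data with a model built earlier, often in a different system from the one where it was trained. The minimal Python sketch below covers that step and a simple custom report. It assumes a single-predictor linear model exported as JSON. The JSON layout, the `ad_spend` and `region` fields and the helper names are hypothetical and are not Oracle Data Mining's API.

```python
import json


def export_model(intercept, coef, feature):
    """Serialize a fitted one-predictor linear model so another system can load it."""
    return json.dumps({"type": "linear", "feature": feature,
                       "intercept": intercept, "coef": coef})


def apply_model(model_json, records):
    """Score new records with a previously exported model ("Model Apply")."""
    model = json.loads(model_json)
    feature = model["feature"]
    return [model["intercept"] + model["coef"] * r[feature] for r in records]


def custom_report(records, scores, key):
    """Render one line per record: its label and its score to two decimals."""
    return "\n".join(f"{r[key]}: {s:.2f}" for r, s in zip(records, scores))


# Hypothetical model and new records arriving in the target environment.
model_json = export_model(0.5, 2.0, "ad_spend")
new_records = [{"region": "north", "ad_spend": 3.0},
               {"region": "south", "ad_spend": 4.5}]
scores = apply_model(model_json, new_records)
print(custom_report(new_records, scores, "region"))
```

Keeping the exported model as plain data means an external application only needs to read the JSON and repeat the arithmetic. It does not have to reuse the training code.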
References:
- The Data Mining Process, Oracle
- Trevor Hastie and Rob Tibshirani, Model Selection and Qualitative Predictors, URL: https://www.youtube.com/watch?v=3T6RXmIHbJ4
- Nguyen Hung Son, Data cleaning and Data preprocessing, URL: http://www.mimuw.edu.pl/~son/datamining/DM/4-preprocess.pdf