Data mining has become a buzzword that covers the whole process of data treatment (data collection, preparation, analysis, and application) as well as pattern-modelling techniques, mostly with the goal of making predictions.
The Gartner Group defines data mining as “the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.”
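The definition above mentions statistical techniques for discovering correlations. As a minimal sketch, not Gartner's or any particular tool's method, the Python snippet below computes the Pearson correlation coefficient between two numeric series, one of the simplest ways to surface a correlation in data. The function name `pearson` and the advertising and sales figures are hypothetical.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient r of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: monthly advertising spend vs. monthly sales
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 55]
r = pearson(ad_spend, sales)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 suggests a strong linear relationship worth investigating; it does not by itself establish causation.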
Data Mining Tools
Data mining uses software to test an entire population and identify the hidden patterns in it. For simple tasks, such as sampling a few thousand records, the work takes a few hours or days and can be done with a desktop tool. For complex tasks, however, data mining becomes a full-scale project: the intuitive steps become formal phases, and server-based tools such as ETL and a data warehouse are needed, along with specialized support for them. The technical requirements for data mining can be summarized as follows:
- Exploratory data analysis (EDA) tools are used by the business specialists to explore the data. They make extensive use of visual analysis.
- Extract, Transform and Load (ETL) tools convert the data into a form usable for data analysis. ETL tools are a standard component of data warehouse products.
- Database (data warehouse) to store the data for analysis.
- Interactive data analysis tool, used by the business specialists to perform the analysis and report the results.
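How these components divide the work can be illustrated at toy scale. The following Python sketch assumes an in-memory SQLite database as a stand-in for the data warehouse and a small hypothetical CSV export as the source system; the `extract`, `transform` and `load` functions, the `sales` table and the data are illustrative names, not part of any ETL product.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent casing, amounts stored as text,
# and one row whose amount cannot be parsed.
RAW_CSV = """customer,region,amount
alice,north,120.50
BOB,South,80
carol,north,n/a
dave,SOUTH,45.25
"""


def extract(text):
    """Extract: read raw rows (as dicts) from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise text fields, convert amounts, skip unusable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not a number
        clean.append((row["customer"].strip().lower(),
                      row["region"].strip().lower(),
                      amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(customer TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the data warehouse
load(conn, transform(extract(RAW_CSV)))

# Analysis: total sales per region, the kind of summary a DA tool would report
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"))
print(totals)
```

In a real deployment each stage would usually be handled by dedicated ETL and warehouse products; the sketch only shows where the responsibilities of each component begin and end.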
Data Mining Process
To be successful, data mining should be implemented as a business process with clearly defined business objectives and deployment plans (i.e., it should answer relevant business questions), and it should be performed by people with extensive business understanding. It consists of the following steps; the first and the last belong to the business side, while the rest are more technical:
- Formulate questions.
- Choose analysis methods.
- Prepare the data to apply the methods.
- Apply the methods to the data.
- Interpret and evaluate the results obtained.
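The five steps can be traced through a small market-basket example. This is a hypothetical sketch: the question, the transactions and the 50% support threshold are invented for illustration, and counting co-occurring product pairs is a deliberately simplified stand-in for a full frequent-itemset algorithm.

```python
from collections import Counter
from itertools import combinations

# Step 1 (business): formulate the question.
QUESTION = "Which pairs of products are often bought together?"

# Step 2: choose a method. Here: the 'support' of each product pair,
# i.e. the fraction of baskets that contain both products.
MIN_SUPPORT = 0.5  # hypothetical threshold: pair appears in >= 50% of baskets

# Hypothetical transactions, including a duplicate item and an empty basket.
raw_baskets = [
    ["bread", "milk", "milk"],
    ["bread", "butter", "milk"],
    [],
    ["bread", "butter"],
    ["milk", "eggs"],
]

# Step 3: prepare the data. Remove duplicates inside each basket, drop empty
# baskets, and sort items so each pair always appears in the same order.
baskets = [sorted(set(b)) for b in raw_baskets if b]

# Step 4: apply the method to the prepared data.
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(basket, 2)
)
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}

# Step 5 (business): interpret the results, keeping only pairs frequent enough to act on.
frequent = {pair: s for pair, s in support.items() if s >= MIN_SUPPORT}
print(QUESTION)
for pair, s in sorted(frequent.items()):
    print(pair, f"{s:.0%}")
```

Whether a 50% threshold is meaningful, and what action a frequent pair should trigger, are business judgements made in steps 1 and 5, not properties of the code.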
Full-scale data mining projects require good organization. The best-known data mining methodology is the Cross-Industry Standard Process for Data Mining (CRISP-DM), developed by a consortium that included DaimlerChrysler and SPSS. According to this methodology, the data mining phases are as follows:
- Business understanding
- Data understanding
- Data preparation
- Modelling
- Evaluation
- Deployment
Data Mining and Big Data
Data mining and big data are interrelated terms. Because the volume, variety, veracity, value, and velocity of data are skyrocketing, data mining has become an integral part of handling enterprise big data.