
Data Mining Techniques

What are the various procedures for obtaining the right data?

Date : 06/11/2015

Author Information

Rabia

Subject : Computing

Data mining uses several procedures to obtain useful information from data, e.g. data pre-processing, frequent pattern mining, association rule mining, classification, regression and clustering.

Data pre-processing: Pre-processing techniques remove noise from data. Real-world data is generally incomplete (lacking attribute values, lacking certain attributes of interest, or containing only aggregate data), noisy (containing errors or outliers) or inconsistent (containing discrepancies in codes or names). Data cleaning fills in missing values, smooths noisy data, identifies or removes outliers and resolves inconsistencies. Data reduction reduces the volume of the data while producing the same, or almost the same, analytical results. Data discretization replaces numeric values with nominal values.

Frequent pattern mining: Pattern mining consists of using or developing data mining algorithms to discover interesting, unexpected and useful patterns in databases. Pattern mining algorithms can be designed to discover various types of patterns: subgraphs, associations, indirect associations, trends, periodic patterns, rules, lattices, sequential patterns, etc. A frequent pattern is a pattern that appears frequently in a database. Other kinds of interesting patterns include rare patterns, patterns with a high confidence, the top patterns, etc.

Association rule mining: Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a given data set. In supermarket transactions, for example, an association rule could be "if the customer is single, then it is 80% likely that the number of units sold is 1". Association rules are created by analyzing data for frequent if/then patterns and using the criteria support and confidence to identify the most important relationships. Support indicates how frequently the items appear in the dataset. Confidence indicates how often the if/then statement has been found to be true, i.e. the fraction of records matching the "if" part that also match the "then" part.

Classification and regression: Both are used to model the relationship between predictors and a target. Regression is typically used when the target variable is continuous, while categorical target variables are modelled in classification.

Clustering: A cluster is a group of objects that are similar to one another. In other words, similar objects are grouped in one cluster and dissimilar objects are placed in different clusters. Clustering can help marketers discover distinct groups in their customer base and characterize those groups based on their purchasing patterns.
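
The cleaning and discretization steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the age values, the function names, the choice of the median as fill value, the 1.5 x IQR outlier rule and the cut points 26 and 30 are all invented for this example and are not prescribed by the article.

```python
from bisect import bisect_right
from statistics import median, quantiles


def fill_missing(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def remove_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


def discretize(values, edges, labels):
    """Map each number to a nominal label; edges are the sorted cut points."""
    return [labels[bisect_right(edges, v)] for v in values]


# Hypothetical customer ages: one missing value and one implausible entry.
ages = [23, 25, None, 31, 29, 120, 27]

filled = fill_missing(ages)        # the None is replaced by the median
cleaned = remove_outliers(filled)  # the value 120 is dropped
groups = discretize(cleaned, edges=[26, 30], labels=["low", "mid", "high"])

print(filled)
print(cleaned)
print(groups)
```

The median is used as the fill value here because, unlike the mean, it is not pulled upwards by the outlier that is still present at that stage.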

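Support and confidence, as described in the association rule section, can be computed directly from a list of transactions. The sketch below is a minimal illustration: the ten transactions, the item names and the function names are made up, and the data was chosen so that the rule "single -> units=1" reaches the 80% confidence used in the supermarket example.

```python
def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    return count_containing(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions matching the 'if' part, the fraction that also
    match the 'then' part."""
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / count_containing(transactions, antecedent)


# Hypothetical supermarket records (invented for illustration only).
transactions = (
    [{"single", "units=1"}] * 4
    + [{"single", "units=2"}] * 1
    + [{"married", "units=1"}] * 2
    + [{"married", "units=3"}] * 3
)

rule_support = support(transactions, {"single", "units=1"})
rule_confidence = confidence(transactions, {"single"}, {"units=1"})

print(f"support    = {rule_support:.2f}")
print(f"confidence = {rule_confidence:.2f}")
```

With this data, 4 of the 10 transactions contain both items and 4 of the 5 "single" transactions also contain "units=1", so the rule has a support of 0.40 and a confidence of 0.80.
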
