DATA MINING: PROCESS OF DISCOVERING INTERESTING PATTERNS AND KNOWLEDGE FROM LARGE AMOUNTS OF DATA
Statistics studies the collection, analysis, interpretation or explanation, and presentation of data.
Machine learning investigates how computers can learn (or improve their performance) based on data.
HIGH PERFORMANCE COMPUTING
PATTERNS CAN BE MINED
DISCRIMINATION: COMPARISON OF THE GENERAL FEATURES OF THE TARGET CLASS DATA OBJECTS AGAINST THE GENERAL FEATURES OF OBJECTS FROM ONE OR MULTIPLE CONTRASTING CLASSES
CHARACTERIZATION: summarizing the data of the class under study (often called the target class) in general terms.
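Summarizing a target class and comparing it against a contrasting class can be illustrated with a minimal sketch. The records, the class names, and the helper functions `characterize` and `discriminate` are hypothetical and chosen only for illustration. Per-attribute means stand in for "general terms" here; a real system might use richer summaries.

```python
from statistics import mean

# Hypothetical toy records: (class label, price, units sold).
records = [
    ("computers", 1200.0, 30),
    ("computers", 900.0, 50),
    ("software", 60.0, 400),
    ("software", 40.0, 600),
]

def characterize(rows, target):
    """Summarize the target class in general terms (here: count and means)."""
    chosen = [r for r in rows if r[0] == target]
    return {
        "count": len(chosen),
        "mean_price": mean(r[1] for r in chosen),
        "mean_units": mean(r[2] for r in chosen),
    }

def discriminate(rows, target, contrasting):
    """Compare the target-class summary against a contrasting-class summary."""
    t = characterize(rows, target)
    c = characterize(rows, contrasting)
    return {key: t[key] - c[key] for key in ("mean_price", "mean_units")}

summary = characterize(records, "computers")
difference = discriminate(records, "computers", "software")
print(summary)
print(difference)
```

In this toy data the target class has a higher mean price and lower mean units sold than the contrasting class, which is the kind of contrast a discrimination description reports.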
There are many kinds of frequent patterns, including frequent itemsets, frequent subsequences (also known as sequential patterns), and frequent substructures.
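As a rough illustration of frequent itemsets, the sketch below counts every item combination in a handful of hypothetical transactions and keeps those that meet a minimum support count. It is a brute-force enumeration for small inputs, not an efficient mining algorithm; the function name `frequent_itemsets` and the parameters `min_support` and `max_size` are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each is the set of items bought together.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        # Enumerate every subset of this transaction up to max_size items.
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}

frequent = frequent_itemsets(transactions, min_support=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With a support threshold of 3, the pair milk and butter is dropped because it appears in only two transactions.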
Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts.
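One simple way to picture "finding a model that distinguishes classes" is a nearest-centroid classifier, sketched below. This is only one of many possible models and is not prescribed by these notes; the training points, the labels, and the function names `fit_centroids` and `predict` are hypothetical.

```python
from collections import defaultdict
from math import dist

def fit_centroids(points, labels):
    """Learn one centroid (mean vector) per class from class-labeled training data."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(column) / len(members) for column in zip(*members))
        for label, members in groups.items()
    }

def predict(centroids, point):
    """Assign the class whose centroid is closest to the point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))

# Hypothetical labeled training data.
train_x = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
train_y = ["low", "low", "high", "high"]

model = fit_centroids(train_x, train_y)
print(predict(model, (2.0, 2.0)))
print(predict(model, (7.0, 9.0)))
```

The learned centroids are the model: they describe each class and are then used to label objects whose class is unknown.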
Regression analysis is a statistical methodology that is most often used for numeric prediction.
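A minimal sketch of numeric prediction, assuming the simplest case of a straight line fitted by ordinary least squares to one input variable. The data values and the function name `fit_line` are hypothetical; regression analysis covers many other model forms.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # numeric prediction for an unseen x
print(slope, intercept, predicted)
```

Unlike classification, the output is a continuous value rather than a class label.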
CLUSTERING ANALYSIS AND
Unlike classification and regression, which analyze class-labeled (training) data sets,
clustering analyzes data objects without consulting class labels.
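To show grouping without class labels, here is a minimal k-means sketch. K-means is one common clustering method and is used here only as an example. The points, the function name `k_means`, and the choice of the first k points as initial centroids are assumptions made to keep the example deterministic.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group unlabeled points into k clusters by repeated assign-and-update steps."""
    # Assumption: use the first k points as initial centroids (deterministic).
    centroids = list(points[:k])
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.9, 1.1)]

assignment, centroids = k_means(points, k=2)
print(assignment)
print(centroids)
```

No labels are supplied at any point; the cluster indices in `assignment` are generated by the algorithm and carry no meaning beyond group membership.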
DATA CAN BE MINED
ISSUES OF DATA MINING RESEARCH
EFFICIENCY AND SCALABILITY
DIVERSITY OF DATA TYPES
DATA MINING AND SOCIETY
A pattern is interesting if it is (1) easily understood by humans, (2) valid on new or test data with some degree of certainty, (3) potentially useful, and (4) novel. A pattern is also interesting if it validates a hypothesis that the user sought to confirm.