Data pre-processing

Data cleaning
1. Missing values
  1. Ignore the tuple
  2. Fill in the missing value manually
  3. Use a global constant to fill in the missing value
  4. Use the attribute mean to fill in the missing value
  5. Use the attribute mean for all samples belonging to the same class as the given tuple
  6. Use the most probable value to fill in the missing value
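A minimal sketch of strategies 1, 3, 4 and 5 in plain Python, assuming a small hypothetical dataset of (class label, income) pairs in which None marks a missing value. The function names are illustrative only and do not come from any library. The "most probable value" strategy normally needs a predictive model (for example regression or decision tree induction), so it is not shown.

```python
from statistics import mean

# Hypothetical records: (class_label, income); None marks a missing value.
records = [
    ("yes", 50.0), ("yes", None), ("yes", 70.0),
    ("no", 20.0), ("no", 30.0), ("no", None),
]
incomes = [v for _, v in records]


def drop_missing(rows):
    """Ignore the tuple: drop every record whose value is missing."""
    return [r for r in rows if r[1] is not None]


def fill_constant(values, constant):
    """Replace every missing value with one global constant."""
    return [constant if v is None else v for v in values]


def fill_mean(values):
    """Replace missing values with the mean of the known values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]


def fill_class_mean(rows):
    """Replace missing values with the mean of the tuples in the same class."""
    class_means = {
        label: mean(v for lab, v in rows if lab == label and v is not None)
        for label in {lab for lab, _ in rows}
    }
    return [class_means[lab] if v is None else v for lab, v in rows]


print(drop_missing(records))
print(fill_constant(incomes, -1.0))
print(fill_mean(incomes))        # [50.0, 42.5, 70.0, 20.0, 30.0, 42.5]
print(fill_class_mean(records))  # [50.0, 60.0, 70.0, 20.0, 30.0, 25.0]
```

The class-conditional mean usually stays closer to the true value than the global mean when classes differ strongly, as the "no" tuple here gets 25.0 instead of 42.5.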
2. Noisy data
  Notes:
  - Noise is a random error or variance in a measured variable.
  1. Binning
  2. Regression
  3. Clustering
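Smoothing by binning can be sketched as follows: sort the values, partition them into equal-frequency bins, then replace each value either by its bin mean or by the nearer bin boundary. The price list and the bin size of 3 are hypothetical example inputs, and the function names are illustrative; regression and clustering are not shown.

```python
def equal_frequency_bins(values, bin_size):
    """Sort the values and split them into bins of bin_size items each."""
    data = sorted(values)
    return [data[i:i + bin_size] for i in range(0, len(data), bin_size)]


def smooth_by_means(bins):
    """Replace every value in a bin by that bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's min and max (ties go to min)."""
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


# Hypothetical sample values (e.g., prices).
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(bins))       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```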
3. Data cleaning as a process
Data integration and transformation
1. Data integration
2. Data transformation
  1. Smoothing
  2. Aggregation
  3. Generalization
  4. Normalization
  5. Attribute construction
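Three common normalization methods (min-max, z-score and decimal scaling) are sketched below in plain Python. The function names and sample values are hypothetical. The min-max sketch assumes max > min, and the z-score sketch uses the population standard deviation and assumes it is non-zero.

```python
import math


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min, max] linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalization: (v - mean) / population standard deviation."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer making every |v'| < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


print(min_max([200, 300, 400, 600, 1000]))  # [0.0, 0.125, 0.25, 0.5, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))    # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(decimal_scaling([-986, 917]))         # [-0.986, 0.917]
```

Min-max preserves the relationships among the original values but is sensitive to outliers; z-score is useful when the actual minimum and maximum are unknown.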
Data reduction
1. Data cube aggregation
2. Attribute subset selection
  1. Stepwise forward selection
  2. Stepwise backward elimination
  3. Combination of forward selection and backward elimination
  4. Decision tree induction
3. Dimensionality reduction
4. Numerosity reduction
5. Data discretization and concept hierarchy generation
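Stepwise forward selection and stepwise backward elimination are greedy searches over attribute subsets. The sketch below assumes a hypothetical evaluation function score, built from made-up relevance weights and a penalty for redundant attribute pairs; in practice the evaluation might be a statistical significance test or the accuracy of a classifier trained on the subset. Attribute names A1 to A5 are placeholders.

```python
from itertools import combinations

# Hypothetical evaluation: made-up relevance weights minus a penalty of 0.3
# for every redundant pair of attributes present in the subset.
RELEVANCE = {"A1": 0.2, "A2": 0.1, "A3": 0.5, "A4": 0.4, "A5": 0.05}
REDUNDANT = {frozenset({"A1", "A2"}), frozenset({"A3", "A5"})}


def score(subset):
    gain = sum(RELEVANCE[a] for a in subset)
    penalty = sum(0.3 for pair in combinations(subset, 2)
                  if frozenset(pair) in REDUNDANT)
    return gain - penalty


def forward_selection(attributes, evaluate):
    """Start empty; greedily add the best attribute while the score improves."""
    selected, best = [], evaluate([])
    while True:
        candidates = [a for a in attributes if a not in selected]
        if not candidates:
            return selected
        choice = max(candidates, key=lambda a: evaluate(selected + [a]))
        new_score = evaluate(selected + [choice])
        if new_score <= best:
            return selected
        selected, best = selected + [choice], new_score


def backward_elimination(attributes, evaluate):
    """Start with all attributes; greedily drop the worst while the score improves."""
    selected = list(attributes)
    best = evaluate(selected)
    while selected:
        choice = max(selected, key=lambda a: evaluate([x for x in selected if x != a]))
        reduced = [x for x in selected if x != choice]
        new_score = evaluate(reduced)
        if new_score <= best:
            break
        selected, best = reduced, new_score
    return selected


attributes = ["A1", "A2", "A3", "A4", "A5"]
print(forward_selection(attributes, score))     # ['A3', 'A4', 'A1']
print(backward_elimination(attributes, score))  # ['A1', 'A3', 'A4']
```

Here both searches reach the same subset, but greedy searches are not guaranteed to agree or to find the best possible subset in general.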
Why Preprocess the Data?
Data Discretization and Concept Hierarchy Generation
1. Discretization and Concept Hierarchy Generation for Numerical Data
2. Concept Hierarchy Generation for Categorical Data
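Two simple techniques are sketched below with hypothetical data. For numerical data, equal-width partitioning maps each value into one of k intervals of the same width. For categorical data, one heuristic builds a schema-level hierarchy by placing the attribute with the fewest distinct values at the top and the one with the most at the bottom. The heuristic can misorder attributes, so the generated order may need manual adjustment. Function names are illustrative.

```python
def equal_width_discretize(values, k):
    """Map each value to an interval index 0..k-1; intervals have equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes hi > lo
    # The maximum value would land at index k, so clamp it into the last interval.
    return [min(int((v - lo) / width), k - 1) for v in values]


def schema_hierarchy(rows, attributes):
    """Order attributes from most general (fewest distinct values) to most specific."""
    distinct = {a: len({row[a] for row in rows}) for a in attributes}
    return sorted(attributes, key=lambda a: distinct[a])


scores = [0, 10, 20, 30, 40, 50, 60]
print(equal_width_discretize(scores, 3))  # [0, 0, 1, 1, 2, 2, 2]

locations = [
    {"country": "Canada", "state": "BC", "city": "Vancouver"},
    {"country": "Canada", "state": "BC", "city": "Victoria"},
    {"country": "Canada", "state": "ON", "city": "Toronto"},
    {"country": "USA", "state": "NY", "city": "New York"},
    {"country": "USA", "state": "NY", "city": "Buffalo"},
]
print(schema_hierarchy(locations, ["city", "country", "state"]))
# ['country', 'state', 'city']
```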
Descriptive Data Summarization
1. Measuring the Central Tendency
2. Measuring the Dispersion of Data
3. Graphic Displays of Basic Descriptive Data Summaries
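The central tendency and dispersion measures can be computed with Python's standard statistics module (Python 3.8 or later is needed for multimode and quantiles). The salary values, in thousands, are hypothetical sample data and summarize is an illustrative name. Quartiles here use linear interpolation (method="inclusive"); other quartile conventions give slightly different values.

```python
import statistics


def summarize(values):
    """Central tendency and dispersion measures for one numeric attribute."""
    data = sorted(values)
    # Quartiles by linear interpolation; other conventions differ slightly.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        # Central tendency
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # every most frequent value
        "midrange": (data[0] + data[-1]) / 2,
        # Dispersion
        "range": data[-1] - data[0],
        "five_number": (data[0], q1, q2, q3, data[-1]),  # min, Q1, median, Q3, max
        "iqr": q3 - q1,
        "variance": statistics.pvariance(data),  # population variance (divides by N)
        "std_dev": statistics.pstdev(data),
    }


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]  # hypothetical, in thousands
summary = summarize(salaries)
for name, value in summary.items():
    print(name, value)
```

The five-number summary is exactly what a boxplot draws, so the same dictionary can feed a graphic display.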

Created by Saravanakumar about 9 years ago