Resource summary
EDA
- Data Granularity - Levels in the data, e.g. time: Years, Months, Weeks, Days, Hours
- Consistency - e.g. the same date written as 01/01/2000 or 1/1/00
- Corruption and Accuracy - System-generated problems / Human errors / Out-of-date data
- Data Duplication
- Missing Data
- SOLUTIONS
- capitalisation (transform all)
- Combination or concatenation of variables
- Careful use of formats
- Removal of unwanted characters
- Exclusion
- consistent units
- Add system checks
- Reduce the variable types
- Data Types / Model Roles
- Categorise Data
- Discrete Data
- Gender
- Make of car
- Number of cars
- Data that can only take certain values.
- Continuous Data
- Data that can take any value (within a range)
- Bank balances
- Measurements
- Dates
- Data Levels
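
Several of the SOLUTIONS items listed earlier (capitalisation, removal of unwanted characters, careful use of formats, exclusion, and handling duplicates) can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the records, the field names `name` and `joined`, and the helper functions are all hypothetical, and the dates are assumed to be day-first.

```python
import re
from datetime import datetime

# Hypothetical raw records, invented for illustration only.
raw_rows = [
    {"name": " alice smith ", "joined": "01/01/2000"},
    {"name": "ALICE SMITH", "joined": "1/1/00"},
    {"name": "Bob#Jones", "joined": "15/03/1999"},
    {"name": "carol white", "joined": ""},
]

def clean_name(value):
    """Replace unwanted characters with spaces, collapse whitespace, upper-case."""
    letters_only = re.sub(r"[^A-Za-z ]", " ", value)
    return " ".join(letters_only.split()).upper()

def parse_date(value):
    """Return an ISO date string, or None if the value is missing or unparseable.

    Assumes day-first dates with either a 4-digit or a 2-digit year.
    """
    for fmt in ("%d/%m/%Y", "%d/%m/%y"):
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_rows(rows):
    """Standardise each row, exclude rows with no usable date, drop duplicates."""
    seen = set()
    cleaned, excluded = [], []
    for row in rows:
        name = clean_name(row["name"])
        joined = parse_date(row["joined"])
        if joined is None:
            excluded.append(row)   # Exclusion: missing or corrupt date
            continue
        if (name, joined) in seen:
            continue               # Duplicate once formats are consistent
        seen.add((name, joined))
        cleaned.append({"name": name, "joined": joined})
    return cleaned, excluded

cleaned, excluded = clean_rows(raw_rows)
```

Note that the two "Alice Smith" rows only show up as duplicates after the names and dates have been brought into one format, which is why the consistency steps run before the duplicate check.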
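
The discrete/continuous split under Categorise Data can be illustrated with a rough heuristic. The sketch below assumes that continuous values are stored as Python floats or dates and everything else (text labels, whole-number counts) is discrete. The column names and sample values are hypothetical, and a real dataset would need a judgment per variable rather than a type check.

```python
from datetime import date

def data_kind(values):
    """Rough heuristic: floats and dates count as continuous, the rest as discrete."""
    if all(isinstance(v, (float, date)) for v in values):
        return "continuous"
    return "discrete"

# Hypothetical sample columns mirroring the examples in the list above.
columns = {
    "gender": ["F", "M", "F"],
    "make_of_car": ["Ford", "Toyota", "Ford"],
    "number_of_cars": [0, 2, 1],
    "bank_balance": [1520.75, -40.10, 0.0],
    "measurement_cm": [172.4, 180.0, 165.2],
    "date": [date(2000, 1, 1), date(2001, 6, 30), date(1999, 3, 15)],
}

kinds = {name: data_kind(vals) for name, vals in columns.items()}
```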