Statistics

2018-02-24  薛定喵喵喵

CH1 Data mining

Major data mining tasks

  1. Classification and regression

    • Classification predicts categorical attribute values;
    • Regression predicts numerical attribute values.
  2. Cluster analysis

Given a set of objects, each having a set of attributes, and a
similarity measure among them, find clusters (i.e., groups) such
that objects in the same cluster are more similar to one another
than to objects in other clusters.

  3. Association analysis

Given a transactional database, find the sets of objects that
frequently appear within the same transactions.
This task is also called frequent pattern mining.
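As a rough illustration of the definition above, the sketch below counts every itemset of up to two items and keeps those that appear in at least `min_support` transactions. It is brute force with no pruning, not a real mining algorithm. The function name, the `min_support` and `max_size` parameters, and the shopping data are all assumptions made up for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (as sorted tuples) found in >= min_support transactions.

    Brute-force counting: every itemset of up to max_size items is counted.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates inside a transaction
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical market-basket data, invented for illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

frequent = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(frequent.items()):
    print(itemset, count)
```

Counting all subsets grows exponentially with transaction size, so this only works for toy data.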

Various data repositories

CH2a Data preprocessing

-noisy
-inconsistent
-redundant

Data preprocessing tasks

  1. Data cleaning
    • Fill in missing values
      e.g., Occupation = " "
    • Smooth out noisy data, i.e., data containing errors or outliers
      e.g., Salary = "-10"
      Common causes: faulty data collection instruments, human or computer error at data entry, errors in data transmission
      Outlier: usually, a value more than 1.5 x IQR below the first quartile or above the third quartile
    • Correct inconsistencies in the data
      e.g., Age = "42", Birthday = "03/07/2010"
      e.g., discrepancy between duplicate records
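The 1.5 x IQR rule can be sketched in a few lines of Python. This is a minimal sketch: the function name and the salary values other than -10 are assumptions for illustration. Quartile conventions also differ between textbooks and tools, so the exact fences may vary slightly.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical salaries; -10 mirrors the noisy example in the notes.
salaries = [-10, 3000, 3200, 3400, 3500, 3600, 3800, 4000, 20000]
outliers = iqr_outliers(salaries)
print(outliers)
```

Flagged values still need inspection. A negative salary is an error, while a very high salary may be a legitimate record.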

Given N tuples, are numerical attributes A and B correlated?
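The figure that followed this question did not survive. A common way to answer it is the Pearson correlation coefficient, so the sketch below assumes that this is the intended measure. The function name and sample values are hypothetical.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical attribute values for N = 5 tuples.
A = [1, 2, 3, 4, 5]
B = [2, 4, 6, 8, 10]
r = pearson_correlation(A, B)
print(r)
```

A value near +1 or -1 suggests a strong linear relationship, which can signal a redundant attribute. A value near 0 suggests no linear relationship.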


  2. Data integration
    Data integration combines data from multiple sources into a coherent data store.

Entity identification problem
Do two objects from different data sources refer to the same entity?
Example: is the record with customer_id = 234 (from one source) equivalent to the record with cust_num = 234 (from the other source)?
Metadata can help: for each attribute, look at the name, meaning, data type, range of permitted values, etc.

Data value conflicts
For the same entity, attribute values from different sources may differ, e.g., weight measured in kilograms in one source and in pounds in another.
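One way to resolve such a conflict is to convert every value to a single unit before comparing or merging records. The sketch below assumes each record carries an explicit unit label. The record layout, field names, and values are hypothetical.

```python
KG_PER_POUND = 0.45359237  # exact definition of the international pound


def weight_in_kg(value, unit):
    """Convert a weight to kilograms; only 'kg' and 'lb' are handled here."""
    if unit == "kg":
        return value
    if unit == "lb":
        return value * KG_PER_POUND
    raise ValueError(f"unknown unit: {unit!r}")


# Hypothetical records describing the same entity in two sources.
source_a = {"customer_id": 234, "weight": 70.0, "unit": "kg"}
source_b = {"cust_num": 234, "weight": 154.3, "unit": "lb"}

kg_a = weight_in_kg(source_a["weight"], source_a["unit"])
kg_b = weight_in_kg(source_b["weight"], source_b["unit"])
print(round(kg_a, 2), round(kg_b, 2))
```

After conversion the two values agree to within rounding, so the apparent conflict was only a difference in representation.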

data redundancy

  3. Data transformation
    (Goal: modify the data in order to improve data mining performance)
  4. Data reduction

attribute/feature construction

normalization: attribute values are scaled to fall within a smaller, specified range

min-max normalization
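Min-max normalization linearly maps the observed range [min, max] of an attribute onto a new range [new_min, new_max]. A minimal sketch follows. The function name, the default target range [0, 1], and the income values are assumptions for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values from [min(values), max(values)] to [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_min == old_max:
        raise ValueError("cannot rescale a constant attribute")
    scale = (new_max - new_min) / (old_max - old_min)
    return [(v - old_min) * scale + new_min for v in values]


# Hypothetical income values.
incomes = [12000, 73600, 98000]
scaled = min_max_normalize(incomes)
print([round(s, 3) for s in scaled])
```

Because the result depends on the observed minimum and maximum, a single outlier can compress all other values into a narrow band.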

z-score normalization
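Z-score normalization replaces each value v with (v - mean) / standard deviation, so the result has mean 0 and standard deviation 1. The sketch below uses the population standard deviation, which is an assumption, since some texts use the sample version. The function name and data are hypothetical.

```python
import statistics


def z_score_normalize(values):
    """Center values on their mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant attribute")
    return [(v - mean) / std for v in values]


# Hypothetical attribute values with mean 5 and population std 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_score_normalize(scores)
print(z)
```

Unlike min-max normalization, this does not need the true minimum and maximum, which helps when they are unknown or dominated by outliers.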

Data reduction
