Knowledge Discovery from Sensor DataMarch 1, 2006 By: Pang-Ninh Tan, Michigan State University Sensors
Extracting useful knowledge from raw sensor data is not a trivial task. Conventional data analysis tools might not be suitable for handling the massive quantity, high dimensionality, and distributed nature of the data. Recent years have seen growing interest in applying data mining techniques to efficiently process large volumes of sensor data .
Data mining, a critical component of the knowledge discovery process (Figure 1), consists of a collection of automated and semi-automated techniques for modeling relationships and uncovering hidden patterns in large data repositories [2, 3]. It draws upon ideas from diverse disciplines such as statistics, machine learning, pattern recognition, database systems, information theory, and artificial intelligence.
Figure 1. The overall process of knowledge discovery from data (KDD) includes data preprocessing, data mining, and postprocessing of the data mining results
Data mining has been successfully used on sensor data in applications as diverse as human activity monitoring , vehicle monitoring , and vibration analysis . This article presents an overview of such techniques and describes some of the technical challenges that must be overcome when mining sensor data.
Before data mining techniques can be applied, the raw data must undergo a series of preprocessing steps to convert them into an appropriate format for subsequent processing. The typical preprocessing steps include:
- 1. Feature extraction—to identify relevant attributes for a data mining task using techniques such as event detection, feature selection, and feature transformation (including normalization and application of Fourier or wavelet transforms)
- 2. Data cleaning—to resolve data quality issues such as noise, outliers, missing values, and miscalibration errors
- 3. Data reduction—to improve the processing time or reduce the variability in data by means of techniques such as statistical sampling and data aggregation
- 4. Dimension reduction—to reduce the number of features presented to a data mining algorithm; principal component analysis (PCA), ISOMAP, and locally linear embedding (LLE) are some examples of linear and nonlinear dimension reduction techniques
Data mining can be broadly classified into four distinct tasks, which will be discussed in the next four sections.
The goal of predictive modeling is to build a model that can be used to predict—based on known examples collected in the past—future values of a target attribute. There are many predictive modeling methods available, including tree-based, rule-based, nearest neighbor, logistic regression, artificial neural networks, graphical methods, and support vector machines . These methods are designed to solve two types of predictive modeling tasks: classification and regression. Classification deals with discrete-valued target attributes; regression, with continuous-valued target attributes. For example, detecting whether a product on the assembly line is good or defective is considered a classification task, while predicting the amount of rainfall in summer is a regression task.