Paper details
Number 2 - June 2016
Volume 26 - 2016
Data mining methods for prediction of air pollution
Krzysztof Siwek, Stanisław Osowski
Abstract
The paper discusses data mining methods for predicting air pollution. Two tasks are important in this problem:
generating and selecting the prognostic features, and building the final system that forecasts pollution for the next day. An
advanced set of features, created on the basis of the atmospheric parameters, is proposed. This set is subject to analysis and
selection of the most important features from the prediction point of view. Two methods of feature selection are compared.
One applies a genetic algorithm (a global approach), and the other a linear stepwise fit (a locally optimized
approach). On the basis of this analysis, two sets of the most predictive features are selected. These sets are then used to
predict the atmospheric pollutants PM10, SO2, NO2 and O3. Two approaches to prediction are compared. In the
first one, the selected features are applied directly to a random forest (RF), which forms an ensemble of decision trees.
In the second case, intermediate predictors built on the basis of neural networks (the multilayer perceptron, the radial basis
function and the support vector machine) are used. They form an ensemble that is integrated into the final prognosis. The paper
shows that preselecting the most important features, combined with an ensemble of predictors, significantly increases the
accuracy of air pollution forecasts.
Keywords
computational intelligence, feature selection, neural networks, random forest, air pollution forecasting
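
Illustrative sketch
The abstract describes a linear stepwise fit as a locally optimized way of choosing features. The following is a minimal sketch of that general idea, not the authors' procedure. It is forward-only, and it assumes a stopping rule based on the relative drop in the residual sum of squares, whereas classical stepwise fitting normally uses statistical significance tests and may also remove features. The data are synthetic, and every name here (solve, rss, forward_stepwise, min_gain) is hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit on `cols` plus an intercept."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(cols) + 1
    # Normal equations: (A^T A) w = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    w = solve(ata, aty)
    return sum((t - sum(wi * ri for wi, ri in zip(w, r))) ** 2
               for r, t in zip(rows, y))


def forward_stepwise(X, y, min_gain=0.01):
    """Greedily add the feature that lowers the RSS most.

    Stops when the best candidate reduces the RSS by less than `min_gain`
    of the total sum of squares (an assumed, simplified stopping rule).
    """
    selected, remaining = [], list(range(len(X[0])))
    total = rss(X, y, [])  # intercept-only model
    current = total
    while remaining:
        best = min(remaining, key=lambda c: rss(X, y, selected + [c]))
        best_rss = rss(X, y, selected + [best])
        if (current - best_rss) / total < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current = best_rss
    return selected


# Synthetic data: only features 0 and 2 influence the target.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [2 * r[0] - 3 * r[2] + random.gauss(0, 0.1) for r in X]
selected = forward_stepwise(X, y)
print(sorted(selected))
```

Each step looks only at the single best feature to add next, which is why such a method is locally optimized. A genetic algorithm, by contrast, searches over whole feature subsets at once.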