Cartographical aspects of data mining to gain knowledge from the Agricultural and National Population and Housing Census

Anna Fiedukowicz
Warsaw University of Technology
Faculty of Geodesy and Cartography
Department of Cartography
Poland

Jędrzej Gąsiorowski
Institute of Geodesy and Cartography, Warsaw
Poland

Abstract

In the face of ubiquitous data availability, the challenge is to process data in a way that yields useful knowledge from the analysis of source information. The authors discuss the application of advanced spatial data mining techniques, and of data enrichment, to data collected by Central Statistical Office interviewers in two censuses: the Agricultural Census and the National Census of Population and Housing. This approach, a modern equivalent of the cartographic research method, makes it possible not only to discover spatial patterns and regularities but, above all, to reveal knowledge contained in the database. Given the scope and level of detail of the data obtained in the two censuses (communes are the lowest level of aggregation made available by the Central Statistical Office), many relationships between the data can be expected: some intuitive, requiring only statistical confirmation and cartographic visualization, and others more complex and "hidden" in the data. Identifying, analyzing and visualizing these dependencies will provide additional knowledge that can be used to develop national spatial planning policy.
The authors propose both statistical analyses and cartographic presentations of their results that may help achieve the objectives of the statistical geoportal. The article describes two examples of such analyses. The first is a multiple regression analysis that takes neighborhood relationships into account; it yields a model describing the relationships between variables gathered for administrative units. The second is a cluster analysis performed with the k-means algorithm. This method was used for the statistical classification of administrative units, extracting groups that are homogeneous with regard to multi-factor similarity determined in a non-metric feature space.
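The abstract does not state which spatial regression specification was used. As a minimal sketch, assuming a row-standardised contiguity matrix W built from lists of neighbouring communes, the following Python code fits an SLX-type model y = b0 + b1·x + b2·(Wx) by ordinary least squares, so that both a commune's own value of x and the mean value among its neighbours enter the regression. The model form and all function names (`row_standardised_lag`, `fit_slx`, and so on) are illustrative assumptions, not the authors' implementation; spatial lag or spatial error models, common in spatial econometrics, would instead need maximum likelihood or instrumental-variable estimation.

```python
"""Sketch: multiple regression with a spatially lagged regressor (SLX form).

Assumptions (not from the article): communes are described by contiguity
(shared border) lists, W is row-standardised, and the model
y = b0 + b1*x + b2*(W x) + e is fitted by ordinary least squares.
"""


def row_standardised_lag(values, neighbours):
    """Return W x: for each unit, the mean of x over its neighbouring units."""
    lagged = []
    for nbrs in neighbours:
        if not nbrs:
            lagged.append(0.0)  # isolated unit: no neighbours contribute
        else:
            lagged.append(sum(values[j] for j in nbrs) / len(nbrs))
    return lagged


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def fit_slx(y, x, neighbours):
    """Fit y = b0 + b1*x + b2*(W x) and return [b0, b1, b2]."""
    wx = row_standardised_lag(x, neighbours)
    design = [[1.0, xi, wxi] for xi, wxi in zip(x, wx)]
    return ols(design, y)
```

For a hypothetical chain of five communes in which only consecutive units share a border, `fit_slx([4, 6, 8.5, 11, 13], [1, 2, 3, 4, 5], [[1], [0, 2], [1, 3], [2, 4], [3]])` recovers coefficients close to `[1, 2, 0.5]`, the values used to construct y. Pure Python keeps the sketch dependency-free; for real census data a linear algebra or spatial econometrics library would be the practical choice.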
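The k-means step can be illustrated with a minimal sketch of Lloyd's iteration, assuming each commune is represented by a vector of numeric census indicators. The features are z-score standardised (an assumption, so that no single variable dominates the squared Euclidean distance), and initial centroids are passed in explicitly to keep the example deterministic. Hartigan and Wong's algorithm, cited in the references, is a refined variant of k-means; the code below uses the simpler Lloyd update and is not the authors' implementation. All function names are hypothetical.

```python
"""Sketch: k-means clustering (Lloyd's iteration) of communes by several indicators.

Assumptions: each commune is a vector of numeric census indicators; features
are z-score standardised before clustering; initial centroids are supplied.
"""
from statistics import mean, pstdev


def standardise(rows):
    """Z-score each column; constant columns are mapped to 0."""
    stats = [(mean(c), pstdev(c)) for c in zip(*rows)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, init_centroids, max_iter=100):
    """Return (labels, centroids) once assignments stop changing or max_iter is hit."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each unit joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [mean(col) for col in zip(*members)]
    return labels, centroids


def within_ss(points, labels, centroids):
    """Total within-cluster sum of squares, the quantity k-means tries to reduce."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
```

For four hypothetical units `[[0, 0], [0, 1], [10, 10], [10, 11]]` and initial centroids `[[0, 0], [10, 10]]`, `kmeans` returns labels `[0, 0, 1, 1]`. The number of clusters k must be chosen by the analyst; `within_ss` can be compared across values of k, and the reference list also cites Tibshirani and Walther's prediction strength as a cluster validation method.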

Keywords:

statistical data; data mining; geostatistics portal

Full Text:

PDF (Polish)

References

Fiedukowicz A., Gąsiorowski J., Kowalski P. J., Olszewski R., Pillich-Kolipińska A., 2012: The statistical geoportal and the cartographic “added value” – creation of the spatial knowledge infrastructure. Geodesy and Cartography, Vol. 61, No. 1, accepted for publication.

Hartigan J. A., Wong M. A., 1979: A K-Means Clustering Algorithm. Applied Statistics Vol. 28, No. 1, 100-108.

Iwaniak A., 2011: Inteligentny geoportal, III Konferencja z cyklu „Wolne oprogramowanie w geoinformatyce”, Wrocław.

Kantardzic M., 2003: Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons, New York.

Kopczewska K., Kopczewski T., Wójcik P., 2009: Metody ilościowe w R. Aplikacje ekonomiczne i finansowe, CeDeWu.pl, Warszawa.

Koronacki J., Ćwik J., 2008: Statystyczne systemy uczące się. Akademicka Oficyna Wydawnicza EXIT, Warszawa.

Tibshirani R., Walther G., 2005: Cluster Validation by Prediction Strength. Journal of Computational and Graphical Statistics, Vol. 14, Issue 3, 511-528.

Witkowski B., 2010: Zastosowanie metod ekonometrii przestrzennej. Prace Instytutu Ekonomii, Szkoła Główna Handlowa, Kolegium Analiz Ekonomicznych.