Aportación a la extracción de conocimiento aplicada a datos mediante agrupamientos y sistemas difusos

  1. Ojeda Magaña, Benjamin
Supervised by:
  1. Rubén Ruelas Lepe Director
  2. Diego Andina de la Fuente Director

Defence university: Universidad Politécnica de Madrid

Defence date: 02 November 2010

Committee:
  1. Francisco Javier Montero de Juan Chair
  2. Antonio Alvarez Vellisco Secretary
  3. Juan Bautista Grau Olivé Committee member
  4. Antonio Vega Corona Committee member
  5. Ascensión Gallardo Antolín Committee member

Type: Thesis

Abstract

In recent years, technological advances have led to the generation and collection of large amounts of mainly numerical data, and there is great interest in processing these data to extract knowledge and information, with the main objective of making the systems from which the data were obtained more efficient. Information in a database is implicit in the values that represent the different states of the system, while knowledge is implicit in the relations between the values of the different attributes or features of the database. These relations are identified by groups (internal structure) that must be discovered and that describe relations between input and output states. Different techniques have been developed for this purpose, one of which is partitional clustering. This thesis proposes a contribution to the extraction of knowledge and information from numerical databases by means of hybrid fuzzy partitional clustering algorithms. Information is extracted by grouping the data and characterizing them as typical, atypical or noise, and by applying this characterization to image sub-segmentation, for which a new approach is proposed that is well suited to detecting atypical pixels. Such pixels could be linked to microcalcifications, for the detection of breast cancer, or to wood knots, for the assessment of wood quality, both cases being treated in this thesis. The approach can also be used in other applications, for example in industry or health, even when the pixels to be found are present in very small quantities. Knowledge is extracted by building two Takagi-Sugeno fuzzy models that allow the automatic characterization and classification of new data. The result is a system able to produce information about the numerical data processed with these models.
In this work we have mainly used the hybrid clustering algorithm PFCM (Possibilistic Fuzzy c-Means), to which we have added an improvement. The resulting algorithm, called GKPFCM (Gustafson-Kessel Possibilistic Fuzzy c-Means), finds groups whose shapes better approximate the natural distributions of the data. This is illustrated by an unsupervised learning application, also presented in this document, for the identification of bananas and of ripe and unripe tomatoes. Among the major achievements of this thesis we can cite:
  1. A new approach to the sub-segmentation of digital images is proposed, based on the PFCM clustering algorithm. The purpose is to identify sub-groups of data of interest, which may be typical or atypical; in many applications, particularly in diagnosis, the atypical data are the more interesting ones. Two applications to real cases are presented in this thesis.
  2. The PFCM algorithm is improved (GKPFCM) by incorporating the Mahalanobis distance, so that the groups found better approximate the data distribution.
  3. A classifier is proposed that obtains information automatically from new data by classifying and characterizing them as typical, atypical or noise. The classifier is based on two Takagi-Sugeno fuzzy models whose parameters are obtained from the results generated by the GKPFCM algorithm.
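To illustrate why a Mahalanobis-type distance lets groups follow the shape of the data, the following is a minimal sketch of the cluster-adapted distance used in the standard Gustafson-Kessel algorithm. It is not taken from the thesis: the function names, the volume parameter rho and the use of fuzzy memberships alone as weights are assumptions made for illustration, and the abstract does not state how GKPFCM combines memberships and typicalities.

```python
import numpy as np

def fuzzy_covariance(X, v, u, m=2.0):
    """Fuzzy covariance matrix of one cluster (standard Gustafson-Kessel form).

    X: (N, n) data, v: (n,) cluster prototype, u: (N,) memberships in [0, 1].
    Assumption: only fuzzy memberships are used as weights here.
    """
    w = u ** m                      # membership weights raised to the fuzzifier m
    diff = X - v                    # deviations from the prototype
    return (w[:, None] * diff).T @ diff / w.sum()

def gk_squared_distance(X, v, F, rho=1.0):
    """Squared Gustafson-Kessel distance of each row of X to prototype v.

    The norm-inducing matrix A = (rho * det(F))^(1/n) * F^{-1} adapts the
    distance to the shape of the cluster while fixing its volume through rho.
    """
    n = X.shape[1]
    A = (rho * np.linalg.det(F)) ** (1.0 / n) * np.linalg.inv(F)
    diff = X - v
    # Quadratic form diff_i^T A diff_i evaluated for every point i.
    return np.einsum("ij,jk,ik->i", diff, A, diff)
```

With a covariance elongated along the first axis, for instance F = diag(4, 1), a point two units away along that axis is at the same adapted distance as a point one unit away along the second axis, which is how elongated groups can be captured where a Euclidean distance would favour spherical ones.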