Supervised classification and network location problems via mathematical optimization (estudio de problemas de clasificación supervisada y de localización en redes mediante optimización matemática)

  1. Baldomero Naranjo, Marta
Supervised by:
  1. Antonio Manuel Rodríguez Chía (Supervisor)

University of defense: Universidad de Cádiz

Date of defense: 16 November 2021

Examination committee:
  1. Alfredo Marín Pérez (Chair)
  2. Inmaculada Espejo Miranda (Secretary)
  3. Joerg Kalcsics (Member)

Type: Thesis

Teseo: 690743

Abstract

This PhD dissertation addresses several problems in Supervised Classification and Location Theory using tools and techniques from Mathematical Optimization. A brief description of these problems and of the methodologies proposed for their analysis and resolution is given below. The first chapter discusses the principles of Supervised Classification and Location Theory in detail, emphasizing the topics studied in this thesis.
The following two chapters address Supervised Classification problems. Chapter 2 proposes exact solution approaches for various models of Support Vector Machines (SVM) with ramp loss, a well-known classification method that limits the influence of outliers. The resulting models are analyzed to obtain initial bounds on the big-M parameters included in the formulation. Solution approaches are then proposed based on three strategies for obtaining tighter values of the big-M parameters. Two of them require solving a sequence of continuous optimization problems, while the third uses Lagrangian relaxation. The derived solution methods are valid for both the ℓ1-norm and ℓ2-norm ramp loss formulations. They are tested and compared with existing solution methods on simulated and real-life datasets, showing the efficiency of the developed methodology.
Chapter 3 presents a new SVM-based classifier that simultaneously limits the influence of outliers and performs feature selection. The influence of outliers is kept under control using the ramp loss margin error criterion, while feature selection is carried out by including a new family of binary variables and several constraints. The resulting model is formulated as a mixed-integer program with big-M parameters. The characteristics of the model are analyzed, and two solution approaches, one exact and one heuristic, are proposed. The performance of the resulting classifier is compared with that of several classical classifiers on different datasets.
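
To illustrate why ramp loss limits the influence of outliers, the following minimal sketch compares it with the ordinary hinge loss. It assumes the definition commonly used in the ramp loss SVM literature, where the hinge loss is truncated at 2; the thesis may use a different but equivalent formulation. The function names and the `cap` parameter are hypothetical and chosen only for this example.

```python
def hinge_loss(y, score):
    """Standard SVM hinge loss for a label y in {-1, +1} and a classifier score."""
    return max(0.0, 1.0 - y * score)


def ramp_loss(y, score, cap=2.0):
    """Hinge loss truncated at `cap` (assumed here to be 2, a common convention).

    A badly misclassified point contributes at most `cap` to the objective,
    so a single outlier cannot dominate the fit.
    """
    return min(cap, hinge_loss(y, score))


if __name__ == "__main__":
    # A positive example scored far on the wrong side of the hyperplane.
    print(hinge_loss(1, -10.0))  # grows linearly with the violation
    print(ramp_loss(1, -10.0))   # bounded by the cap
```

The truncation makes the loss non-convex, which is why exact formulations typically introduce binary variables and big-M constraints of the kind analyzed in Chapters 2 and 3.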
The next two chapters deal with location problems, specifically two variants of the Maximal Covering Location Problem (MCLP) on networks. These variants model two different scenarios, with and without uncertainty in the input data.
Chapter 4 presents the upgrading version of the MCLP with edge length modifications on networks. This problem aims to locate p facilities on the nodes of the network so as to maximize coverage, given that the lengths of the edges can be reduced within a budget. Two decisions must therefore be made: the optimal location of the p facilities and the optimal edge length reductions. To solve the problem, we propose three different mixed-integer formulations and a preprocessing phase for fixing variables and removing some constraints. Moreover, we analyze the characteristics of these formulations in order to strengthen them with valid inequalities. Finally, we compare the three formulations and their corresponding improvements by testing their performance on different datasets.
Chapter 5 also considers an MCLP, but from the perspective of uncertainty. In particular, this chapter addresses a version of the single-facility MCLP on a network where the demand is distributed along the edges and is uncertain, with only an interval estimate known. We propose a minmax regret model in which the service facility can be located anywhere along the network. Furthermore, we present two polynomial-time algorithms for finding the location that minimizes the maximal regret, assuming that the demand realization is an unknown constant or linear function on each edge. We also include two illustrative examples as well as a computational study to show the potential of the proposed methodology.
This PhD dissertation ends with the conclusions of the research carried out and a presentation of future lines of work.
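
To make the Chapter 4 setting concrete, the sketch below evaluates a tiny node-based MCLP by brute-force enumeration and shows how shortening one edge can increase coverage. It is not the thesis's method, which relies on mixed-integer formulations; the toy network, the demand weights, the coverage radius, and all function names are assumptions made only for illustration.

```python
from itertools import combinations


def shortest_paths(n, edges):
    """All-pairs shortest path lengths (Floyd-Warshall) on an undirected network."""
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, length in edges:
        dist[u][v] = min(dist[u][v], length)
        dist[v][u] = min(dist[v][u], length)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def covered_demand(dist, demand, facilities, radius):
    """Total demand of the nodes within `radius` of at least one facility."""
    return sum(
        w for node, w in enumerate(demand)
        if any(dist[f][node] <= radius for f in facilities)
    )


def best_mclp(n, edges, demand, p, radius):
    """Enumerate every choice of p nodes; only viable for very small networks."""
    dist = shortest_paths(n, edges)
    best = max(
        combinations(range(n), p),
        key=lambda s: covered_demand(dist, demand, s, radius),
    )
    return best, covered_demand(dist, demand, best, radius)


if __name__ == "__main__":
    demand = [1, 1, 1, 5]                                # hypothetical node demands
    original = [(0, 1, 2.0), (1, 2, 2.0), (2, 3, 2.0)]   # path network 0-1-2-3
    upgraded = [(0, 1, 0.0), (1, 2, 2.0), (2, 3, 2.0)]   # edge (0, 1) shortened
    print(best_mclp(4, original, demand, p=1, radius=2.0))
    print(best_mclp(4, upgraded, demand, p=1, radius=2.0))
```

In the upgrading problem the edge reductions are themselves decision variables limited by a budget, so enumeration like this quickly becomes impractical, which motivates the mixed-integer formulations and valid inequalities described above.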