Marco Capo joined the Basque Center for Applied Mathematics in 2015 as a PhD student (Predoc Severo Ochoa 2014 grant). He holds a Bachelor’s degree in Mathematics, obtained in 2012 from the Universidad Simón Bolívar (Venezuela), and a Master of Science in Mathematical Modeling in Engineering from the Erasmus Mundus MathMods Programme, an international two-year master’s program that awards a joint diploma from the University of L’Aquila (Italy), the University of Nice (France) and the University of Hamburg (Germany).
His PhD thesis has been supervised by Prof. Jose Antonio Lozano, Scientific Director of BCAM and leader of the center’s Machine Learning research line, and by Dr. Aritz Pérez, a member of the same group.
On behalf of all BCAM members, we would like to wish Marco the best of luck in his upcoming thesis defense.
The K-means algorithm is undoubtedly one of the most popular cluster analysis techniques, owing to its ease of implementation, straightforward parallelizability and competitive computational complexity compared with more sophisticated clustering alternatives. However, the steady growth in data generation in recent years poses a significant challenge for this algorithm, as its time complexity grows linearly with both the number of instances, n, and the dimensionality of the problem, d, which hinders its scalability on massive data sets. Another major drawback of the K-means algorithm is its strong dependence on the initial conditions, which may affect not only the quality of the solution obtained but also its computational load. In this dissertation, we propose different approximations to the K-means problem that tackle all these difficulties.
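For readers unfamiliar with the baseline, the following is a minimal Python sketch of the standard Lloyd iteration for K-means, not the approximations proposed in the dissertation. The helper names `squared_distance` and `lloyd_kmeans`, and the choice of initializing with K instances sampled uniformly at random, are illustrative assumptions. Each assignment step evaluates n·K distances of cost O(d), so one iteration costs O(nKd), which is the linear growth in n and d mentioned above; the random seed fixes the initial conditions on which both the final error and the number of iterations depend.

```python
import random
from typing import List, Tuple

Point = List[float]


def squared_distance(a: Point, b: Point) -> float:
    # O(d): one pass over the d coordinates.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(data: List[Point], k: int, seed: int,
                 max_iter: int = 100) -> Tuple[List[Point], List[int], float, int]:
    """Plain Lloyd iterations (illustrative sketch).

    Returns (centroids, labels, SSE, iterations run).
    """
    rng = random.Random(seed)
    # Initial conditions (assumption): k instances drawn uniformly at random.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = [-1] * len(data)
    d = len(data[0])
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assignment step: n * k distance evaluations of cost O(d) -> O(n k d).
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in data]
        if new_labels == labels:
            break  # no instance changed cluster: converged
        labels = new_labels
        # Update step: O(n d) to recompute each centroid as its cluster mean.
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p, j in zip(data, labels):
            counts[j] += 1
            for t in range(d):
                sums[j][t] += p[t]
        for j in range(k):
            if counts[j] > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = [s / counts[j] for s in sums[j]]
    sse = sum(squared_distance(p, centroids[j]) for p, j in zip(data, labels))
    return centroids, labels, sse, iterations


if __name__ == "__main__":
    gen = random.Random(0)
    # Synthetic, overlapping 2-D data (illustrative only).
    points = [[gen.gauss(cx, 1.0), gen.gauss(cy, 1.0)]
              for cx, cy in [(0, 0), (3, 0), (0, 3), (3, 3)] for _ in range(50)]
    for seed in range(5):
        _, _, sse, iters = lloyd_kmeans(points, k=4, seed=seed)
        print(f"seed={seed}: SSE={sse:.2f}, iterations={iters}")
```

Running the sketch with several seeds on the same data set is a simple way to observe the sensitivity to initial conditions: both the final sum of squared errors and the number of iterations can change from one seed to another.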