The Spanish Ministry of Economy and Competitiveness (MINECO) has published the 2017 call for PhD Students. The call offers up to three PhD Student positions under the Severo Ochoa accreditation (SEV-2013-0323) at the Basque Center for Applied Mathematics (BCAM, Bilbao, Spain). The positions at BCAM are in the following areas:
• Core in Applied Mathematics
• Data Science
• Computational Mathematics
For the Data Science PhD Student position at BCAM, applications are welcome in (but not restricted to) the following projects:
• Project 1: Statistical modelling of high-throughput phenotypic data in plant breeding (in collaboration with Biometris, Wageningen University & Research, the Netherlands).
• Project 2: Rare event prediction with machine learning methods
• Project 3: Estimation of Distribution Algorithms for Combinatorial Optimization Problems with Constraints
• Project 4: Data mining of stream time series
Further information:
• The application period is from 3rd to 18th October 2017 (at 15:00h).
• BCAM website: http://www.bcamath.org/3predoctoralseveroochoa2017/
• MINECO website*: click here
• Official announcement *: click here
• Unified register for candidates *: https://sede.micinn.gob.es/rus/
• Submission service *: https://sede.micinn.gob.es/ayudaspredoctorales/
(*) in Spanish
For further scientific information, please contact the corresponding supervisors. For further administrative information, please do not hesitate to contact Miguel Benítez.
Call for PhD Students 2017 at the Basque Center for Applied Mathematics (BCAM). Severo Ochoa accreditation.
PhD projects in the Data Science Area
Line: Applied Statistics
Project 1: Statistical modelling of high-throughput phenotypic data in plant breeding
Recent developments in phenotyping techniques have created renewed interest in statistical methods for supporting and improving phenotyping strategies. New phenotyping techniques, often high-throughput, use a wide array of equipment to continuously monitor growth and development for large populations of plants in the greenhouse and in the open field. As a consequence, for many plant phenotypes, long series of repeated measures become available between seed emergence and final yield. New statistical methods are thus required that can extract the most relevant information from multiple time series data on all aspects of plant growth and development. At present there is no standard way of analysing this kind of data. The project aims to develop new spatio-temporal statistical methods for the analysis of time series of phenotypic data. To that end, semi-parametric smoothing methods combined with mixed model techniques will be explored. The project has a strong interdisciplinary nature. It involves (1) research in statistical methodology and computation; (2) knowledge of quantitative genetics; and (3) analysis of data from real agricultural field experiments. The thesis project will be developed in the Group of Applied Statistics (AS) of BCAM (http://www.bcamath.org). Research stays at Biometris (www.biometris.nl) are expected to occur throughout the research period.
PhD Advisors: María Xosé Rodríguez Álvarez (http://www.bcamath.org/en/people/mrodriguez) and Fred van Eeuwijk (https://www.wur.nl/es/Persons/Fred-van-van-Eeuwijk.htm)
Line: Machine Learning
Project 2: Rare event prediction with machine learning methods
Rare events are ubiquitous in many areas such as nature, engineering or business. These are events that happen with a very low probability. Typical examples are earthquakes and floods, but also fatal traffic accidents and fatal workplace accidents. Predicting these kinds of events is highly relevant, as they imply a high cost in terms of money or human lives. Unfortunately, predicting these events is very difficult because of their very nature (they happen with very low probability). Statistics has developed many methods to deal with rare events; however, these methods make strong assumptions about the data that do not necessarily hold. The machine learning literature has dealt with rare events from different perspectives: strongly imbalanced classification problems, outlier detection, one-class classification, anomaly/novelty detection, etc. The objective of this project is to develop new methods and algorithms to predict rare events within the field of machine learning. The developed methods will be put into practice on a real rare-event prediction problem using data from an insurance company. This practical application involves working in big data environments. The project will be co-supervised with a researcher of the University of the Basque Country UPV/EHU and carried out in close collaboration with the insurance company.
PhD Advisors: Jose A. Lozano (http://www.sc.ehu.es/isg) and Iñaki Inza (http://www.sc.ehu.es/isg)
Line: Heuristic Optimization
Project 3: Estimation of Distribution Algorithms for Combinatorial Optimization Problems with Constraints
Combinatorial optimization problems (COPs) are optimization problems characterized by having a finite search space. Classical examples are the travelling salesman problem, the quadratic assignment problem and the knapsack problem. All these problems have in common that they are NP-hard: there is no known polynomial-time algorithm able to solve all the instances. Therefore the scientific community has developed algorithms able to find good solutions in bounded computational time; these are called metaheuristic algorithms.
Estimation of Distribution Algorithms (EDAs) are a set of metaheuristic algorithms that belong to the Evolutionary Computation field. In contrast to Genetic Algorithms, which use genetic operators such as crossover and mutation to generate new individuals, in EDAs a probability distribution is learnt from the selected individuals and is later sampled to generate new promising individuals. EDAs have been successfully used to solve different real problems such as protein folding, the flowshop scheduling problem, etc. In spite of that, there are still optimization problems where EDAs have not achieved outstanding results; one such area is COPs with constraints. This is due to the use of inappropriate probability models. Almost all applications of EDAs to COPs with constraints learn probability distributions in unbounded spaces and then modify the individuals at sampling time. In this way the learnt distribution does not account for the information in the selected individuals. In this PhD project we pursue the use of probability distributions for constrained spaces in the area of EDAs. In particular, we plan to use exponential distributions based on distances, which can take bounded spaces into account. The resulting algorithms will be applied to academic problems such as graph partitioning or number partitioning, and finally to a real problem from a company. The project will be co-supervised with a researcher of the University of the Basque Country UPV/EHU.
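To make the learn-then-sample loop of an EDA concrete, here is a minimal sketch of a univariate EDA (UMDA-style) on a toy unconstrained problem. It is only an illustration, not the method proposed in the project: the OneMax objective, the function names (`onemax`, `umda`), and all parameter values are assumptions chosen for brevity.

```python
import random


def onemax(bits):
    """Toy objective (an assumption for illustration): number of ones."""
    return sum(bits)


def umda(n_bits=20, pop_size=60, n_selected=20, generations=40, seed=0):
    """Univariate EDA sketch: sample, select, re-estimate the model."""
    rng = random.Random(seed)
    # Probability model: one independent Bernoulli parameter per bit.
    probs = [0.5] * n_bits
    # Keep probabilities away from 0 and 1 so no bit value is lost forever.
    margin = 1.0 / n_bits
    best = None
    for _ in range(generations):
        # Sample a new population from the current model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        # Truncation selection: keep the fittest individuals.
        population.sort(key=onemax, reverse=True)
        selected = population[:n_selected]
        if best is None or onemax(selected[0]) > onemax(best):
            best = selected[0]
        # Learn the model: marginal frequency of ones among the selected.
        probs = [
            min(1.0 - margin,
                max(margin, sum(ind[i] for ind in selected) / n_selected))
            for i in range(n_bits)
        ]
    return best, probs


best, probs = umda()
print(onemax(best), best)
```

Unlike a Genetic Algorithm, there is no crossover or mutation here: new individuals come only from sampling the learnt model. For a constrained problem, this simple model could generate infeasible individuals, which is the limitation the project description points to.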
PhD Advisors: Jose A. Lozano (http://www.sc.ehu.es/isg) and Josu Ceberio (http://www.sc.ehu.es/isg)
Line: Machine Learning
Project 4: Data mining of stream time series
Time series and stream data have been studied extensively over the last decade; however, this is not the case for stream time series. Time series data streams can be considered as a non-stop data sequence. The most paradigmatic example is energy consumption: current smart meters measure energy consumption at a high frequency. In contrast to most time series analysis, where the objective is to predict the next value in the series, our goal is to carry out data mining tasks on the time series stream. For instance, we are interested in developing new and efficient algorithms to carry out clustering and supervised classification tasks on this type of data. The real applications of these algorithms are varied: from detecting fraud in the use of electricity to anticipating diseases in patients with heart problems. We will apply the developed algorithms to a real problem from a company.
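As a rough illustration of what clustering on a time series stream can look like, the sketch below cuts the stream into consecutive fixed-length windows and clusters them in a single pass with a sequential (online) k-means update. This is a well-known baseline used here only as an example, not the algorithms the project intends to develop; the function name `online_kmeans_windows`, the window length, and the toy readings are all assumptions.

```python
import math


def online_kmeans_windows(stream, window=4, k=2):
    """Cluster consecutive non-overlapping windows of a stream in one pass.

    The first k windows seed the centroids (a simplifying assumption);
    each later window moves its nearest centroid towards itself.
    """
    centroids, counts, labels = [], [], []
    buffer = []
    for value in stream:
        buffer.append(float(value))
        if len(buffer) < window:
            continue
        current, buffer = buffer, []
        if len(centroids) < k:
            # Seed a new cluster with this window.
            centroids.append(current[:])
            counts.append(1)
            labels.append(len(centroids) - 1)
            continue
        # Assign the window to the nearest centroid (Euclidean distance).
        j = min(range(k), key=lambda c: math.dist(current, centroids[c]))
        counts[j] += 1
        step = 1.0 / counts[j]
        # Incremental mean update: only the centroids are kept in memory.
        centroids[j] = [c + step * (v - c)
                        for c, v in zip(centroids[j], current)]
        labels.append(j)
    return labels, centroids


# Hypothetical smart-meter readings alternating low and high consumption.
readings = [1, 1, 1, 1, 9, 9, 9, 9, 1, 2, 1, 1, 8, 9, 9, 8]
labels, centroids = online_kmeans_windows(readings)
print(labels)
```

Because each window is seen once and then discarded, memory use does not grow with the length of the stream. Seeding from the first k windows is fragile when the early data are unrepresentative, which is one reason more careful stream algorithms are needed.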
PhD Advisors: Jose A. Lozano (http://www.sc.ehu.es/isg) and Aritz Perez (http://www.bcamath.org/en/people/aperez)