Keywords: Nonparametric inference, functional data, directional data, modal regression, quantile regression, incomplete data, specification tests, set estimation, spatial processes.

Beyond the technical challenges that the big data era has brought about in terms of data storage and management capacity, the need for new approaches to data analysis and inference has drawn the attention of statistical research groups working at the interface between statistics and machine learning and devoted to the design and implementation of statistical tools for handling high-dimensional data.

However, dealing with huge amounts of data is not the only problem that we, as statisticians, must confront in order to extract information from data effectively. Data are not only big but also complex, and complexity arises not only from their nature but also from the dynamics they follow. Our proposal, Complex dynamics and nonparametric inference (CoDyNP), aims to provide nonparametric tools for understanding data evolution, dependence and relations (dynamics) in non-standard settings (incomplete and aggregated, functional, directional and spatio-temporal data). CoDyNP is structured in four workpackages, whose design enables us to continue exploiting our capacities and know-how in different areas (e.g. functional or directional data analysis) as well as to explore new methodologies and application contexts (e.g. correlation distance methods, modal regression or network structures).

The four workpackages are as follows.

  • The first package, WP1, considers research lines associated with the complexity of the data nature (these lines can obviously be combined and interrelated): incomplete (including biased, truncated and/or censored data, as well as missing data; L1.1) and aggregated data; functional and high-dimensional data (L1.2); directional data (with circular, cylindrical and toroidal data as particular/derived cases; L1.3); and spatio-temporal processes (L1.4).
  • The second package, WP2. Nonparametric inference, comprises three research lines devoted to set estimation (L2.1), methods beyond mean regression (L2.2) and correlation distance (L2.3).
  • Within WP3. Software, we aim to update and improve our R packages related to the research lines in WP1 and WP2, namely alphahull and alphahull3d for set estimation; NPCirc for circular data; DOvalidation for incomplete and aggregated data; multimode for multimodality tests; fda.usc for functional data; and the more recent HDiR for directional high density regions. We also aim to produce at least one package on quantile regression (BwQuant) and another one on ROC comparisons.
  • Regarding WP4. Data application, we will describe some areas of application within ongoing collaborations with other groups (COVID-19, medical imaging, genetics, environmental sciences, ecology, security, linguistics), all of them directly related to areas of intervention in Pillar 2 of Horizon Europe (and, subsequently, to the Estrategia Española de Ciencia, Tecnología e Innovación).

This project is a continuation of MTM2016-76969-P (Innpar2D); more details in