Deep Medicine

I am a machine learning scientist at the George Institute for Global Health. I received my PhD in 2017 from the Department of Automatic Control and Systems Engineering at The University of Sheffield in the United Kingdom, where I focused on machine learning and data mining for environmental systems modelling and analysis. I have also done research in fluid dynamics, simultaneous localization and mapping, wireless sensor networks in cognitive radio regimes, and system identification using NARX models. I received my MSc in Applied Mathematics and Computational Science from the King Abdullah University of Science and Technology (KAUST) in the Kingdom of Saudi Arabia in 2011, and my BSc in Mechatronics Engineering, summa cum laude, from the Instituto Tecnológico y de Estudios Superiores de Monterrey (ITESM) in Mexico City in 2008.
My research interests include:

Predictive modelling
Statistical, machine and deep learning
Nonlinear system identification
Data mining
Data visualisation

Publications

Ayala Solares, J. R., Data Mining and Machine Learning for Environmental Systems Modelling and Analysis, PhD thesis, University of Sheffield, 2017, Available here
Ayala Solares, J. R., H. L. Wei and S. A. Billings, A novel logistic-NARX model as a classifier for dynamic binary classification, Neural Computing and Applications, 2017
Ayala Solares, J. R., H. L. Wei, R. J. Boynton, S. N. Walker and S. A. Billings, Modelling and prediction of global magnetic disturbances in near-Earth space: A case study for Kp index using NARX models, Space Weather, 2016
Ayala Solares, J. R. and H. L. Wei, Nonlinear model structure detection and parameter estimation using a novel bagging method based on distance correlation metric, Nonlinear Dynamics, 2015
Ayala Solares, J. R. and H. L. Wei, A New Distance Correlation Metric and Bagging Method for NARX Model Estimation, The University of Sheffield Engineering Symposium Conference Proceedings Vol. 1, 2014
Ayala Solares, J. R., Z. Rezki and M.-S. Alouini, Optimal power allocation of a single transmitter-multiple receivers channel in a cognitive sensor network, 2012 International Conference on Wireless Communications in Unusual and Confined Areas (ICWCUCA), IEEE, 2012
Ayala Solares, J. R., Z. Rezki and M.-S. Alouini, Optimal power allocation of a sensor node under different rate constraints, IEEE International Conference on Communications (ICC), 2012
Ayala Solares, J. R., Optimal Power Allocation of a Wireless Sensor Node under Different Rate Constraints, Master's thesis, King Abdullah University of Science and Technology, 2011, Available here

Projects

WiFi GraphSLAM in 1-D (2011)

Importance: In mobile robotics, a very active area of research in recent years is the ability of a robot to build a map of its environment and simultaneously localize itself within that map in the absence of external referencing systems such as GPS. This is the so-called Simultaneous Localization And Mapping (SLAM) problem. Solving it consists of estimating the robot's trajectory and the map of the environment as the robot moves through it.
Goals: Wireless sensor networks are now widely deployed, which provides an opportunity for localization and mapping using only signal-strength measurements. The goal is to use indoor WiFi signals to perform SLAM in a one-dimensional environment.
Outcome: This project implemented a WiFi GraphSLAM algorithm for localization and mapping in a one-dimensional simulated environment using real WiFi-intensity measurements.
Methods and techniques: GraphSLAM algorithm
Future work: There are many possible extensions of the current work. Further analysis is required to handle more realistic scenarios (2D, 3D, fading, shadowing, ...).
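As an illustration only, the idea behind a one-dimensional GraphSLAM can be sketched as a linear least-squares problem over robot poses and access-point positions. This is a generic textbook formulation, not the project's implementation: the WiFi measurement model is not described here, so the sketch assumes each signal-strength reading has already been converted to a signed offset between the robot and an access point. The function name `graph_slam_1d`, its parameters, and the variances are hypothetical.

```python
import numpy as np

def graph_slam_1d(odometry, measurements, n_landmarks,
                  odom_var=1.0, meas_var=1.0):
    """Linear 1-D GraphSLAM sketch (illustrative assumptions throughout).

    odometry:     displacements between consecutive poses.
    measurements: tuples (pose_index, landmark_index, offset), where
                  offset is assumed to be landmark position minus pose.
    Every landmark must be observed at least once, otherwise the
    linear system is singular.
    """
    n_poses = len(odometry) + 1
    n = n_poses + n_landmarks
    omega = np.zeros((n, n))   # information matrix
    xi = np.zeros(n)           # information vector

    # Anchor the first pose at the origin to remove the free translation.
    omega[0, 0] += 1.0

    def add_constraint(i, j, d, var):
        # Encodes the soft constraint x_j - x_i = d with the given variance.
        w = 1.0 / var
        omega[i, i] += w
        omega[j, j] += w
        omega[i, j] -= w
        omega[j, i] -= w
        xi[i] -= w * d
        xi[j] += w * d

    for t, d in enumerate(odometry):
        add_constraint(t, t + 1, d, odom_var)
    for t, k, offset in measurements:
        add_constraint(t, n_poses + k, offset, meas_var)

    # The best estimate solves omega @ mu = xi.
    mu = np.linalg.solve(omega, xi)
    return mu[:n_poses], mu[n_poses:]
```

With consistent, noise-free inputs the solver recovers the trajectory and the access-point position exactly; with noisy inputs it returns the weighted least-squares compromise between odometry and measurements.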

Optimal Power Allocation of a Sensor Node under different Rate Constraints (2010-2011)

Importance: The energy consumption, transmission power, memory and computational speed of the devices have a direct influence on the viability and cost of wireless sensor networks (WSNs). Assuming that some sensors are located in an isolated area and that each device is very distant from the others, we would like to keep the sensors operational with as little maintenance as possible. Among other design criteria of sensor devices, the battery life cycle is of crucial interest.
Goals: Considering the challenge of efficient spectrum utilization in order to satisfy the increasing demand for wireless communication systems, the project addresses the problem of minimizing the transmit power of a sensor node while satisfying different transmission rate constraints, i.e. an instantaneous or an average transmission rate constraint, along with an interference constraint.
Outcome: For each constraint, a closed-form solution for the optimal power allocation was derived for a class of fading channels, along with an insightful asymptotic analysis. In all cases, numerical results were provided for either Rayleigh or Nakagami-m fading channels.
Methods and techniques: Optimization, MATLAB simulation.
Future work: Extensions of this work include the study and analysis of other types of channels: single-transmitter multiple-receiver, multiple-transmitter single-receiver, and multiple-transmitter multiple-receiver channels.
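To give a flavour of the instantaneous-rate case, the sketch below uses the standard Shannon-capacity relation to find the smallest power that meets a target rate, and then checks it against an interference cap. It is not the closed-form solution derived in the project, which is not reproduced here; the function names, the `max_interference` parameter, and the rule of not transmitting when the cap would be violated are all assumptions made for illustration.

```python
import numpy as np

def min_power_instantaneous(gain, rate, noise_power=1.0):
    """Smallest power P such that log2(1 + P * gain / noise_power) >= rate."""
    return (2.0 ** rate - 1.0) * noise_power / gain

def allocate_power(gain, interference_gain, rate, max_interference,
                   noise_power=1.0):
    """Illustrative policy: invert the channel when the interference cap
    allows it, otherwise stay silent (an assumed rule, not the paper's)."""
    required = min_power_instantaneous(gain, rate, noise_power)
    # Interference seen by the other receiver must stay below the cap.
    feasible = required * interference_gain <= max_interference
    return np.where(feasible, required, 0.0), feasible
```

For a Rayleigh fading channel the power gains are exponentially distributed, so passing arrays drawn with `numpy.random.default_rng().exponential(...)` as `gain` and `interference_gain` gives a quick Monte Carlo estimate of the average transmit power and of how often the rate target cannot be met.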

Machine Learning and Data Mining for Environmental Systems Modelling and Analysis (2013-2017)

Importance: The analysis of environmental systems is important because it can improve decision making for the development, implementation and maintenance of environmental protection policies, or control design for systems that interact with environmental variables.
Goals: This project focuses mainly on environmental scenarios where limited data are available (as opposed to big data problems with mega- or gigabytes of data). Traditional machine learning algorithms can handle such scenarios, although most of them have difficulty handling time-variant information and cannot provide a good understanding of the inner dynamics of a system, which is usually of great interest in environmental problems. To overcome these issues, the Nonlinear AutoRegressive with eXogenous inputs (NARX) methodology is applied and further extended in this project to provide interpretability of nonlinear dependencies and uncertainty analysis, and to handle both continuous and categorical data, which are common in environmental systems.
Outcome: For the first time, a package in the R programming language was developed as a tool to help in the training of NARX models. Two new major components were added to the NARX methodology. The first combines the distance correlation metric, which can provide interpretability of nonlinear dependencies, with the bagging method, which can provide an uncertainty analysis, to extend the deterministic notion of the Orthogonal Forward Regression algorithm. The second improves the NARX methodology so that it can handle binary outputs. All these improvements were applied in two case studies. The first analyses the Atlantic Meridional Overturning Circulation (AMOC). The second focuses on the modelling of global magnetic disturbances in near-Earth space using the Kp index.
Methods and techniques: NARX models, time series analysis, cross-validation
Future work: Identify how much information from the past should be included in a NARX model. Extend the NARX methodology to handle not only binary outputs but also multi-class outputs. Investigate possible ways to combine the NARX methodology with Bayesian methods. Combine the NARX methodology with deep learning techniques to handle big data problems.
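For readers unfamiliar with NARX models, the following minimal sketch fits a polynomial NARX model in which the output y(t) is regressed on lagged outputs, lagged inputs and their products. It uses ordinary least squares over all candidate terms rather than the Orthogonal Forward Regression, distance correlation or bagging steps described above, and it is written in Python rather than R; the function names, lag orders and polynomial degree are hypothetical choices for illustration.

```python
import itertools
import numpy as np

def build_narx_regressors(y, u, ny=1, nu=1, degree=2):
    """Build the regression matrix of a polynomial NARX model.

    Columns are a constant followed by all monomials up to `degree`
    in y(t-1..t-ny) and u(t-1..t-nu), ordered degree by degree.
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    max_lag = max(ny, nu)
    # Each lagged column is aligned with the targets y[max_lag:].
    lagged = [y[max_lag - k: len(y) - k] for k in range(1, ny + 1)]
    lagged += [u[max_lag - k: len(u) - k] for k in range(1, nu + 1)]
    lagged = np.column_stack(lagged)

    columns = [np.ones(lagged.shape[0])]
    for d in range(1, degree + 1):
        for combo in itertools.combinations_with_replacement(
                range(lagged.shape[1]), d):
            columns.append(np.prod(lagged[:, list(combo)], axis=1))
    return np.column_stack(columns), y[max_lag:]

def fit_narx(y, u, ny=1, nu=1, degree=2):
    """Estimate all candidate term coefficients by least squares
    (no term selection, unlike Orthogonal Forward Regression)."""
    phi, target = build_narx_regressors(y, u, ny, nu, degree)
    theta, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return theta
```

With `ny=1`, `nu=1` and `degree=2` the coefficients correspond, in order, to the terms 1, y(t-1), u(t-1), y(t-1)^2, y(t-1)u(t-1) and u(t-1)^2. A term-selection step would normally prune this candidate set to keep the model parsimonious and interpretable.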

R Package for Supervised Learning with Artificial Hydrocarbon Networks (2015-present)

Importance: Artificial hydrocarbon networks (AHNs) are a supervised learning method inspired by the structure and inner chemical mechanisms of organic compounds. The method aims to package information from a set of instances in a hierarchical fashion, loosely inspired by the way elements and molecules interact to produce organic, carbon-based networks. As a result, it can solve both regression and classification problems. Moreover, it can handle the uncertain and imprecise information typically found in real-world applications such as sales prediction, forecasting, signal processing and control systems.
Goals: To provide an accessible and easy-to-use tool to implement artificial hydrocarbon networks, with functions that facilitate their creation, training and testing.
Outcome: The development of the first R package for AHNs.
Methods and techniques: AHNs, R programming
Future work: Several issues require further research. The first is how to select the number of molecules. This hyperparameter is important because it defines the complexity of the network: too few molecules cannot achieve a good approximation, while too many waste computational resources. In addition, the current implementation of the algorithm is prone to getting stuck in local minima during training, and as the number of input variables increases, the technique suffers from the curse of dimensionality. Furthermore, the package can only train a single network or component at a time, although it has been shown that several networks can be trained at once to obtain a better input-output model.

External Links

Home page: link
Google scholar: link
LinkedIn: link
GitHub: link
Twitter: link

Contact

Email: roberto.ayalasolares@georgeinstitute.ox.ac.uk

Team

Jose Roberto Ayala Solares