Scalable Artificial Intelligence Networks for Data Analysis in Growing Dimensions

Ministry of Education and Science of Russia (Project No. 14.Y26.31.0022), a program for state support of scientific research conducted under the direction of leading scientists in Russian educational institutions of higher education, scientific institutions subordinate to the Federal Agency for Scientific Organizations, and state scientific centers of the Russian Federation.

 

Project goal:

The main problem addressed by the project is to develop advanced methods for data mining in high dimensions, optimised for high (dozens and hundreds) and very high (thousands, tens of thousands and more) dimensions. For this purpose, novel advanced methods and algorithms for fast non-iterative and reversible correction of errors and for knowledge transfer will be developed for AI systems and implemented in open-access software.

 

Project objectives:

In this project we aim to develop advanced methods for data mining in high dimensions, optimised for growing dimensionality, including high dimensionality (dozens and hundreds) and very high dimensionality (thousands, tens of thousands and higher). Such a technology is necessary in a world of fast-growing data, explosive growth in the production of AI systems, and rapid development of non-stationary, large, heterogeneous networks of computers and gadgets. For this purpose, we aim:

 

     To develop and explore new aspects and applications of measure concentration theory (namely, stochastic separation theorems for wide families of data distributions and various classes of separation surfaces) as the foundation of the new technology.

     To develop methods for separation of genuinely high-dimensional problems from reducible problems that have low intrinsic dimension;

     To develop a theory, methodology and tools for creating new, and upgrading existing, Big Data AI systems so that they are capable of learning on the fly from mistakes in real time.

     To develop a theory and methods of dynamic models of optimal complexity based on the idea of a “game against the observer” (worst-case worlds);

     To adapt and apply the developed methods to the analysis of high-dimensional data about biological neural nets (both in vitro and in vivo).

     To generate solutions and open access software for high dimensional data analysis and for correction of large legacy AI systems.

     To generate specific solutions for the analysis of large, high-dimensional live video streams and for complex biophysical, technical and hybrid man-machine systems.

     To create the Laboratory of advanced methods for high-dimensional data analysis and to ensure its sustainable functioning.

 

Project expected results:

     Mathematical theory of data mining methods in high dimensions based on geometry and functional analysis (measure concentration effects);

     Advanced methods for multidimensional data mining in high dimensions;

     Model reduction methods for separation of genuinely high-dimensional problems from reducible problems with low intrinsic dimension;

     Technology of models of optimal complexity based on games against the observer;

     Applied problem-oriented software for a series of specific problems: analysis of in vivo and in vitro neuronal nets, analysis of high-dimensional live video streams, and observation and modelling of complex technical and hybrid man-machine systems (such as exoskeletons) and of large and complex ecosystems.

     Open access software for high dimensional data analysis and for correction of large legacy AI systems.

     Specific results about these systems obtained with the new methods.

     Publication of at least 44 papers in journals indexed in the WoS database, of which no fewer than 11 are in journals from the first quartile (Q1) of the WoS database.

     Submission of at least three patent applications (including one international patent application) for new high-dimensional data mining algorithms.

     Sustainable functioning of the Laboratory of advanced methods for high-dimensional data analysis.

 

Description of the proposed research:

The work will be organised in 7 work packages (WPs):

WP1. Geometrical theory of data mining in high (dozens and hundreds) and very high (thousands, tens of thousands and higher) dimension.

WP2. Theory of models of optimal complexity based on the analysis of games against the observer (worst-case worlds).

WP3. Development and implementation of specific algorithms for mining of high-dimensional data.

WP4. Analysis of large and high dimensional live video streams: theory, algorithms, implementation, and testing

WP5. Applications of advanced high dimensional data analysis methods to biological neural systems (in vivo and in vitro).

WP6. Applications of advanced high dimensional data analysis methods and models of optimal complexity to complex biophysical, technical and hybrid man-machine systems.

WP7. Dissemination of the project results

 

Description of the work for work packages:

WP1. Geometrical theory of data mining in high (dozens and hundreds) and very high (thousands, tens of thousands and higher) dimension.

Measure concentration effects were introduced into mathematics by Lévy in 1922: the volume of a high-dimensional ball is concentrated near its border (the sphere) and, moreover, in a small vicinity of any equator of the sphere. Using this observation, he developed a new area of functional analysis. Physicists had used these effects much earlier: Maxwell, Gibbs, and Einstein created equilibrium statistical mechanics, which is the first application of measure concentration. More recently, several famous mathematicians (Milman, Gromov and Talagrand) developed measure concentration theory much further.

At the end of the 20th century, Hecht-Nielsen drew attention to the observation that in genuinely high-dimensional databases many data vectors are pairwise almost orthogonal (have small inner products), and the number of these vectors may be much larger than the dimension. Very recently, Gorban and Tyukin with co-authors proved that this effect manifests itself with probability close to one for independently chosen data vectors and demonstrated its importance for machine learning and the analysis of video information [A.N. Gorban, I.Y. Tyukin, D.V. Prokhorov, K.I. Sofeikov, Approximation with random bases: Pro et Contra, Information Sciences 364-365 (2016), 129-145. Q1 IN COMPUTER SCIENCE, INFORMATION SYSTEMS].
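
This quasi-orthogonality is easy to observe numerically. The following minimal sketch (an illustration only, not part of the project software) samples many more random unit vectors than the dimension and checks that all pairwise cosines remain small:

    # Illustration of quasi-orthogonality of random vectors in high dimension:
    # the number of pairwise almost orthogonal vectors can be much larger than
    # the dimension itself.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 1000, 5000                      # dimension and number of vectors, n >> d
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # project onto the unit sphere

    G = X @ X.T                            # matrix of pairwise cosines
    np.fill_diagonal(G, 0.0)
    print("max |cos| over all pairs:", np.abs(G).max())   # typically below 0.2 for d = 1000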

It is well known in brain science that small groups of neurons play an important role in pattern recognition. A mathematical explanation of this effect can be found in the multidimensional nature of data. Gorban with co-authors proved that high-dimensional problems can be solved by ensembles of small neural networks (here, “high dimension” means several tens or more; the effect is already important for dim > 50) [A.N. Gorban, I.Y. Tyukin, I. Romanenko, The Blessing of Dimensionality: Separation Theorems in the Thermodynamic Limit, IFAC-PapersOnLine 49-24 (2016), 064–069; a keynote talk at the 2nd IFAC Workshop on Thermodynamic Foundation of Mathematical Systems Theory (TFMST II), Vigo, Spain, 28-30 September 2016]. The proposed project aims to finalise the theory of this phenomenon, to extend it to other multidimensional phenomena, and to develop algorithms for high-dimensional data analysis based on these phenomena.
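
The effect behind these results, stochastic separability, can be illustrated in a few lines. The sketch below (our own toy illustration, assuming a uniform distribution in a cube) checks that a randomly chosen sample point is separated from thousands of other sample points by a single linear functional:

    # Stochastic separation illustration: in dimension d = 100, one sample point
    # is, with probability close to one, separated from all the others by the
    # simple linear functional l(y) = <x, y> - <x, x>.
    import numpy as np

    rng = np.random.default_rng(1)
    d, n = 100, 10000
    X = rng.uniform(-1, 1, size=(n, d))     # i.i.d. sample from the cube [-1, 1]^d

    x = X[0]                                # the point we try to separate
    others = X[1:]
    separated = np.all(others @ x < x @ x)  # l(y) < 0 for every other sample point
    print("point separated from the rest:", separated)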

One more well-known difficulty appears in the analysis of multidimensional data. Most machine learning approaches stem from minimizing the mean squared distance, relying on computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, quadratic error functionals demonstrate many weaknesses, including high sensitivity to contaminating factors and the curse of dimensionality. Therefore, many recent applications in machine learning exploit the properties of non-quadratic error functionals based on the L1 norm or even sub-linear potentials corresponding to fractional norms. The downside of these approaches is a tremendous increase in the computational cost of optimization. So far, no approach has been suggested to deal with arbitrary error functionals in a flexible and computationally efficient framework. We propose to develop the theory and basic universal data approximation algorithms based on piece-wise quadratic error potentials of subquadratic growth (PQSQ potentials) [A.N. Gorban, E.M. Mirkes, A. Zinovyev, Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning, Neural Networks 84 (2016), 28-38. Q1 IN COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE]. This is a new and universal framework for minimizing arbitrary sub-quadratic error potentials using an algorithm with guaranteed fast convergence to a local or global error minimum. It can be applied in most existing machine learning methods, including data approximation and regularized regression, and leads to an improvement in the computational cost/accuracy trade-off.
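
The idea can be conveyed by a deliberately simplified sketch (not the project's library; the potential u(r) = |r| and the thresholds are chosen only for illustration): the potential is covered by quadratic pieces that coincide with it at a few thresholds, and the resulting splitting iteration yields a robust, outlier-trimming estimate of a mean:

    # Minimal PQSQ-style sketch: approximate u(r) = |r| by quadratic pieces
    # b_k * r^2 + c_k between thresholds 0 = r_0 < r_1 < ... < r_p, and use the
    # resulting majorization-minimization iteration to compute a robust mean.
    import numpy as np

    def pqsq_coefficients(u, thresholds):
        """Quadratic coefficients b_k matching u at consecutive thresholds."""
        r = np.asarray(thresholds, dtype=float)
        b = (u(r[1:]) - u(r[:-1])) / (r[1:] ** 2 - r[:-1] ** 2)
        return np.append(b, 0.0)   # beyond the last threshold the potential is flat (trimming)

    def pqsq_mean(x, thresholds, u=np.abs, n_iter=50):
        b = pqsq_coefficients(u, thresholds)
        m = np.median(x)                     # any reasonable starting point
        for _ in range(n_iter):
            k = np.searchsorted(thresholds[1:], np.abs(x - m))  # interval of each residual
            w = b[k]                         # per-point quadratic weight; 0 for trimmed points
            if w.sum() == 0:
                break
            m = np.sum(w * x) / np.sum(w)    # weighted-mean step of the splitting algorithm
        return m

    x = np.concatenate([np.random.default_rng(2).normal(0, 1, 1000), [50.0, 80.0, 120.0]])
    print("ordinary mean:", x.mean())                             # pulled towards the outliers
    print("PQSQ mean    :", pqsq_mean(x, [0.0, 1.0, 2.0, 3.0]))   # close to 0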

 

In particular, the following tasks will be performed in WP1:

     Formulate and prove stochastic separation theorems for more general distributions. We will start from distributions with log-concave densities and bounded moments and then study the problems caused by slow decay (“heavy tails”) and by strong deviations from the unimodality and independence hypotheses.

     Formulate and prove stochastic separation theorems for more general sets of classifiers, including small neural networks.

     Extend the stochastic separation theorems to the separation of subsets (not only of a point from a set).

     Develop algorithms for knowledge transfer between heterogeneous AI systems (mutual corrections) using one-trial correctors (a minimal corrector sketch is given after this list).

     Develop a theory of sparse and robust solutions of high-dimensional data mining problems by collections of independent and weakly dependent small ensembles of neurons;

     Develop and implement in open-access software data approximation algorithms based on piece-wise quadratic error potentials of subquadratic growth (PQSQ potentials);

     Develop theory and algorithms for the “dressing” of approximate data models by small neural ensembles that improve the quality of the solution (measured, for example, by sensitivity and specificity);

     Develop theory and algorithms of big data analysis by hierarchies of receptive fields built from many small independent or weakly dependent neural ensembles.
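
To make the corrector idea referred to above concrete, the following minimal sketch (hypothetical class and variable names, not the project's software) builds a non-iterative Fisher-type linear corrector that separates the feature vectors of a legacy system's known errors from those of its correct decisions and flags inputs whose answer should be overridden:

    # One-trial linear corrector sketch: a Fisher-type discriminant, computed by
    # a single linear solve (non-iterative), separates error features from
    # correct-decision features of a legacy classifier.
    import numpy as np

    class OneTrialCorrector:
        def __init__(self, correct_features, error_features, reg=1e-3):
            X0 = np.asarray(correct_features)   # features of correctly handled inputs
            X1 = np.asarray(error_features)     # features of inputs where the legacy system erred
            cov = np.cov(np.vstack([X0, X1]).T) + reg * np.eye(X0.shape[1])
            # Fisher direction: one linear solve, no iterations
            self.w = np.linalg.solve(cov, X1.mean(axis=0) - X0.mean(axis=0))
            scores0, scores1 = X0 @ self.w, X1 @ self.w
            self.threshold = 0.5 * (scores0.max() + scores1.min())  # simple margin split

        def flags_error(self, features):
            """True if the corrector believes the legacy decision should be overridden."""
            return np.asarray(features) @ self.w > self.threshold

    # usage idea (hypothetical names): wrap a legacy model and replace its answer
    # when the corrector fires:
    #     if corrector.flags_error(phi(x)): answer = fallback_decision(x)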

 

WP2. Theory of models of optimal complexity based on the analysis of games against the observer (worst-case worlds).

We will develop a module of complexity optimization and a game-theoretic approach for evaluating the optimal complexity of models. Every real system (or network of coupled systems) admits an (almost) infinitely deep hierarchy of dynamic models of increasing complexity. Any model in this hierarchy would typically involve a set of parameters. For optimal values of these parameters, the error of the model is minimal, and it decreases as the model complexity increases. Unfortunately, the problem becomes ill-conditioned in higher dimensions. Identifying the optimal values of all the unknown parameters from available measurements is not always possible, and a theory of the optimal complexity of models is required. Developing this theory is amongst our main goals. At the core of any such theory lie the notions and quantitative criteria of achievable accuracy. We construct these notions by combining methods and concepts of nonlinear observers, optimal control [Tyukin I. Adaptation in dynamical systems. Cambridge University Press; 2011] and bifurcations of limit sets [A.N. Gorban, Singularities of transition processes in dynamical systems: Qualitative theory of critical delays, Electron. J. Diff. Eqns., Monograph 05, 2004]. We then proceed by creating technologies for estimating the optimal complexity for classes of systems. The model of optimal complexity is the one whose state and parameters can be estimated reliably, in the worst-case scenario, subject to the accuracy demands. In contrast to the previous works of Sjöberg et al., however, we do not impose any specific assumptions on the unmodelled part of the model. We only demand that it is a function satisfying standard existence conditions for the corresponding system of differential equations. The spurious factors in our project are determined from the point of view of observer design. The decision criteria are not mere observability/identifiability tests but the outcomes of a process termed here a game against an adaptive observer.

We search for a perturbation maximizing the worst-case error under a given estimation time. As a result, every model of a given class in the hierarchy will be assigned a value (e.g., the sup-norm of the perturbation). The smaller the value, the more spurious is the increment in the model’s complexity. The optimal complexity is defined as the point at which an increase in the model’s accuracy is balanced by the increase in the model’s spuriousness. This will allow us to formulate an uncertainty principle of modelling as a relation connecting the achievable accuracy (for a given complexity) with the set of adaptive observers employed to solve the identification/observation problem. A schematic numerical illustration of this worst-case game is sketched below.
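
The following toy computation (our own simplified construction, not the project's theory) conveys the flavour of the game for nested polynomial models fitted by least squares: for each model in the hierarchy it computes the smallest sup-norm perturbation of the observations that can shift the estimate of the highest-order coefficient by a fixed tolerance; a very small value signals that the added complexity is spurious:

    # Toy "game against the observer": how small a sup-norm data perturbation is
    # enough to destroy the estimate of the extra parameter introduced at each
    # level of model complexity?
    import numpy as np

    rng = np.random.default_rng(3)
    t = np.linspace(0, 1, 40)
    y = 1.0 + 2.0 * t + 0.05 * rng.standard_normal(t.size)   # data from a degree-1 model

    tol = 0.1   # shift of the top coefficient that counts as "destroyed"
    for degree in range(1, 6):
        A = np.vander(t, degree + 1, increasing=True)   # design matrix of the model
        pinv = np.linalg.pinv(A)                        # least-squares estimator: theta = pinv @ y
        residual = np.linalg.norm(y - A @ (pinv @ y))
        # Worst-case shift of the top coefficient under ||perturbation||_inf <= eps
        # is eps * ||last row of pinv||_1, so the minimal destroying perturbation is:
        eps_min = tol / np.abs(pinv[-1]).sum()
        print(f"degree {degree}: fit residual {residual:.3f}, "
              f"min sup-norm perturbation to destroy top term {eps_min:.5f}")
    # Expected pattern: the residual stops improving noticeably after degree 1,
    # while eps_min drops sharply, signalling that the extra terms are spurious.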

 

In particular, the following tasks will be performed in WP2:

     Create a theory of games against the observer;

     Develop the game-theory approach to modelling with optimal complexity;

     Theoretical analysis of the maximal achievable accuracy and the uncertainty principle in the modelling/identification/observation problem;

     Testing the approach on the modelling of biological systems and crisis anticipation.

 

WP3. Development and implementation of specific algorithms for mining of high-dimensional data

This WP will build on the theoretical achievements of WP1 and WP2 and transform them into algorithms and software. At the same time, it will mediate between the theoretical WP1-WP2 and the more applied and experimental WP4-WP6, because the algorithms and the software should meet the needs of these applied WPs.

 

In particular, the following tasks will be performed in WP3:

     Implement the one-trial corrector algorithms in open-source software and test them on the datasets prepared in WP5 and WP6 and on open-access benchmarks.

     Development, implementation and testing of “dressing” algorithms for the improvement of approximate data models by collections of independent or very weakly dependent small neural ensembles.

     Implement and test algorithms for knowledge transfer between heterogeneous AI systems (mutual corrections) using the one-trial correctors.

     Development, implementation and testing of universal data approximation algorithms, based on piece-wise quadratic error potentials of subquadratic growth (PQSQ potentials);

     Development, implementation and testing of algorithms for modelling with optimal complexity based on the analysis of games against the observer (worst-case worlds).

     Development, implementation and testing of algorithms, based on the methods of topological grammars, for separation of genuinely high-dimensional problems from reducible problems with low intrinsic dimension;

     Development, implementation and testing of cascade algorithms based on hierarchies of receptive fields built from many small independent or very weakly dependent neural ensembles;

     Employ neural networks with self-esteem [S.E. Gilev, A.N. Gorban, E.M. Mirkes, Small experts and internal conflicts in artificial neural networks, Doklady USSR Academy of Sciences, 1991, V. 320, N. 1, 220-223] to organise semi-supervised, self-training networks of heterogeneous AI systems with mutual corrections. Implement and test these algorithms on the datasets prepared in WP5 and WP6 and on open-access benchmarks.

 

The task on model reduction needs a special comment. There exist many algorithms for linear and nonlinear dimensionality reduction, from classical PCA to various versions of nonlinear PCA, principal graphs and principal cubic complexes [A.N. Gorban, B. Kégl, D.C. Wunsch, A.Y. Zinovyev, Principal manifolds for data visualization and dimension reduction. Berlin-Heidelberg: Springer; 2008]. A.N. Gorban with co-authors developed a universal method of graph grammars for data approximation and dimensionality reduction [Gorban AN, Zinovyev A. Principal manifolds and graphs in practice: from molecular biology to dynamical systems. International Journal of Neural Systems. 2010, 20 (3), 219-32]. For high-dimensional data this method should be adapted, implemented and tested. Of course, in the applied software the novel methods should be combined with the classical approaches; a minimal reducibility check based on classical PCA is sketched below.
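
As a simple point of departure (classical PCA only, not the topological-grammar machinery of the project), the following sketch estimates whether a dataset is reducible by counting how many principal components are needed to explain most of the variance:

    # Reducibility check sketch: if a small number of principal components
    # explains almost all variance, the problem has low intrinsic dimension and
    # can be reduced before applying high-dimensional methods.
    import numpy as np

    def estimated_intrinsic_dimension(X, explained=0.95):
        """Number of principal components needed to explain the given variance fraction."""
        Xc = X - X.mean(axis=0)
        s = np.linalg.svd(Xc, compute_uv=False)
        cum = np.cumsum(s ** 2) / np.sum(s ** 2)
        return int(np.searchsorted(cum, explained) + 1)

    rng = np.random.default_rng(4)
    ambient, latent, n = 200, 5, 1000
    Z = rng.standard_normal((n, latent))                     # hidden low-dimensional factors
    A = rng.standard_normal((latent, ambient))
    X = Z @ A + 0.01 * rng.standard_normal((n, ambient))     # embedded in 200 dimensions + noise
    print("estimated intrinsic dimension:", estimated_intrinsic_dimension(X))   # ~5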

 

WP4. Analysis of large and high dimensional live video streams: theory, algorithms, implementation, and testing

Analysis of large and high-dimensional live video streams is crucially important in many areas, from security to general computer vision applications. The ability of new high-dimensional data mining algorithms to improve the solutions of this classical problem is a compulsory maturity test.

Deep learning now provides good benchmarks and an advanced baseline for comparison. Several other baselines also exist (support vector machines, various versions of decision forests, etc.). We expect that our algorithms will require less computation time and memory for the same accuracy, or will achieve better accuracy under restrictions on computation time and memory. To test this hypothesis, we will compare all the novel approaches with several baselines; a minimal sketch of such a comparison protocol is given below.
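
The sketch below (synthetic data and a trivial stand-in baseline, purely to illustrate the protocol) records accuracy and training time for each method on the same benchmark split, which is the quantity of interest for the accuracy/cost trade-off:

    # Minimal comparison-protocol sketch: measure accuracy and wall-clock
    # training time for each method on a fixed benchmark split.
    import time
    import numpy as np

    def evaluate(name, fit, predict, X_train, y_train, X_test, y_test):
        t0 = time.perf_counter()
        model = fit(X_train, y_train)
        elapsed = time.perf_counter() - t0
        accuracy = np.mean(predict(model, X_test) == y_test)
        print(f"{name:20s} accuracy={accuracy:.3f} train_time={elapsed:.3f}s")

    # A trivial stand-in baseline: nearest class centroid.
    def fit_centroids(X, y):
        return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

    def predict_centroids(model, X):
        classes = np.array(list(model))
        centers = np.stack([model[c] for c in classes])
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return classes[d.argmin(axis=1)]

    rng = np.random.default_rng(5)
    X = rng.standard_normal((2000, 100))
    y = (X[:, 0] + 0.3 * rng.standard_normal(2000) > 0).astype(int)
    evaluate("nearest centroid", fit_centroids, predict_centroids,
             X[:1500], y[:1500], X[1500:], y[1500:])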

 

In particular, the following tasks will be performed in WP4:

     Preparation of the benchmarks for the analysis of large and high-dimensional live video streams; selection and preparation of baselines for comparison. Development of an identity collector from real CCTV footage and from online resources for gathering identities from various phenotypical clusters. A database of anonymised, profiled and pre-processed images of identities will be created.

     Implementation and testing of the correction and knowledge transfer algorithms for large AI systems for analysis of videostreams.

     Comparison of “dressing” algorithms for the improvement of approximate data models by collections of independent or very weakly dependent small neural ensembles with the baseline algorithms on the benchmarks.

     Comparison of cascade algorithms based on hierarchies of receptive fields built from many small independent or very weakly dependent neural ensembles with the baseline algorithms on the benchmarks.

     Comparison of universal data analysis algorithms, based on piece-wise quadratic error potentials of subquadratic growth (PQSQ potentials) with the standard algorithms on the benchmarks;

     Performance analysis of the reduction algorithms, based on the methods of topological grammars, for separation of genuinely high-dimensional problems from reducible problems with low intrinsic dimension, on the benchmarks.

 

WP5. Applications of advanced high dimensional data analysis methods to biological neural systems (in vivo and in vitro).

In this WP we will use data that have been collected in the labs of the host organisation over a long period. These data collections and precise experiments will provide us with an important challenge: the methods developed in WP1-WP3 should prove their usefulness for the analysis of the real brain. The tasks in this WP include data preparation, data analysis, modelling, and comparison of the results with the experiments.

In particular, the following tasks will be performed in WP5:

     Analysis of large-scale neural recordings, development of algorithms for analysis of calcium imaging and electrophysiological data.

     Relating single neuron and network activity to behavioural patterns. Imaging data from retrosplenial and auditory cortices will be analysed while animals are given a relevant stimulus (virtual reality or a complex sound stimulus).

     Analysis of multi-channel electrophysiological data from neuronal cultures. Identification of activity changes under the application of modulatory drugs.

     Identification of neuron types from electrophysiological recordings and morphological reconstruction data. Application of developed techniques to the identification of unknown interneuron types from hippocampal areas.

     Analysis of electromyography recordings and prediction of movement (a minimal decoding sketch follows this list).

     Development of artificial neural network models incorporating biologically relevant details; study of the influence of specific biophysical features on computation and learning in neural networks.
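
As a simple illustration of the movement prediction task mentioned above (synthetic signals and a hypothetical choice of features, not the lab's data or pipeline), windowed RMS features of the EMG channels can feed a closed-form ridge-regression decoder of a continuous movement variable:

    # Movement-decoding sketch: RMS features per EMG channel and window, then a
    # closed-form ridge regression onto a continuous movement variable.
    import numpy as np

    def rms_features(emg, window):
        """Root-mean-square of each channel over non-overlapping windows."""
        n = (emg.shape[0] // window) * window
        blocks = emg[:n].reshape(-1, window, emg.shape[1])
        return np.sqrt((blocks ** 2).mean(axis=1))

    rng = np.random.default_rng(6)
    fs, seconds, channels, window = 1000, 60, 8, 100
    t = np.arange(fs * seconds) / fs
    movement = np.sin(2 * np.pi * 0.5 * t)                  # the variable to decode
    drive = np.clip(movement, 0, None)                      # toy muscle activation
    emg = drive[:, None] * rng.standard_normal((t.size, channels))   # amplitude-modulated noise

    X = rms_features(emg, window)
    y = movement.reshape(-1, window).mean(axis=1)

    # ridge regression decoder, closed form
    lam = 1e-2
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)
    pred = Xb @ w
    print("correlation between predicted and true movement:",
          np.corrcoef(pred, y)[0, 1].round(3))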

 

WP6. Applications of advanced high dimensional data analysis methods and models of optimal complexity to several complex biophysical, technical and hybrid man-machine systems.

The host University has several well-known laboratories and groups that produce high-quality big data about complex biophysical, technical and hybrid man-machine systems. Within the framework of the project, we aim to collaborate with the majority of these groups. For this purpose, a regular working seminar, “Analysis of high-dimensional data”, will be organised on a weekly basis. The datasets and experiments that produce large high-dimensional data streams will be discussed at the seminar, the research programs for selected datasets and streams will be approved by the seminar, and the team will follow these programs. The results will also be presented at the seminar and published.

 

In particular, the following tasks will be performed in WP6:

     Organisation of the regular working seminar “Analysis of high-dimensional data” on a weekly basis.

     Selection of the datasets and streams for detailed analysis.

     Presentation and discussion of the data analysis programs for the selected datasets and streams at the seminar.

     Adaptation of the methods and software for analysis of selected data.

     Data analysis for selected data, presentation and discussion of the results.

Each of these tasks will be performed several times each year. We aim to analyse several of the most promising high-quality datasets and streams produced in the host University.

 

WP7. Dissemination of the project results

This work package includes two directions of work: external (including international) and internal (within the host University). For external dissemination we will use three main channels: (i) publication of scientific papers and books, (ii) organisation of international workshops, and (iii) use of mass media in the form of press releases, a dedicated web page, blog discussions, social media and Twitter. Internal dissemination means the organisation of special master classes, lecture courses and research proseminars for students, PhD students and young scientists.

 

In particular, the following tasks will be performed in WP7:

     Annual master classes for students, PhD students and young scientists given by A.N. Gorban (no fewer than 16 contact hours every year);

     Web-page of the project;

     Three International Research Workshops: Geometry of Big Data, Small Neuronal Ensembles and Brain, Models of Optimal Complexity;

     Publication of at least 44 papers in journals indexed in the WoS database, with no fewer than 22 papers in journals from the first quartile (Q1) of the WoS database;

     Publication of press releases through the main news agencies (Eureca, ScienceDaily, etc.) every year.

 

The scientific and technical results obtained in the framework of the project will make it possible to create technologies that form the basis for the innovative development of the domestic market of products and services and for a stable position of Russia in the external market. They will also ensure, within the framework of the Strategy for Scientific and Technological Development of the Russian Federation (approved by Decree of the President of the Russian Federation of December 1, 2016, No. 642 "On the Strategy for Scientific and Technological Development of the Russian Federation"), the transition to advanced digital, intelligent production technologies, robotic systems, new materials and design methods, and the creation of large data processing systems, machine learning and artificial intelligence (H1).