**Scalable
Artificial Intelligence Networks for Data Analysis in Growing Dimensions**

Ministry of Education and Science of Russia (Project No.
14.Y26.31.0022) a program for state support of scientific research conducted under
the direction of leading scientists in Russian educational institutions of
higher education, scientific institutions subordinate to the Federal Agency for
Scientific Organizations, and state scientific centers
of the Russian Federation

**Project
goal:**

The main problem addressed by the project is: to develop
advanced methods for data mining in high dimension, optimised for high (dozens
and hundreds) and very high (thousands, tens thousands and higher) dimension.
For this purpose, novel advanced methods and algorithms for fast non-iterative
and reversible correction of errors and knowledge transfer will be developed
for AI systems and implemented in open access software.

**Project
objectives:**

In this project we aim to develop advanced methods for data
mining in high dimension, optimised for growing dimensionality, including high
dimensionality (dozens and hundreds) and very high dimensionality (thousands,
tens thousands and higher). Such a technology is necessary in the world of fast
growing data, explosive growth of production of AI systems, and fast
development of non-stationary large heterogeneous networks of computers and
gadgets. For this purpose we aim:

• To develop and explore new aspects and
applications of the measure concentration theory (namely stochastic separation
theorems for a wide families of data distributions and various classes of
separation surfaces) as a background of the new technology.

• To develop methods for separation of
genuinely high-dimensional problems from reducible problems that have low
intrinsic dimension;

• To develop a theory, methodology and tools
for creating new, and upgrading existing Big Data AI systems so that they are
capable of learning on-the-fly from mistakes in real time.

• To develop theory and methods of dynamic
models of optimal complexity based on the idea of “game against observer” (the
worst case worlds);

• To adapt and apply the developed methods to
the analysis of high dimensional data about biological neural nets (both in
vitro and in viva).

• To generate solutions and open access
software for high dimensional data analysis and for correction of large legacy
AI systems.

• To generate specific and particular
solutions for analysis of large and high dimensional live video streams, to
complex biophysical, technical and hybrid man-machine systems.

• To create the Laboratory of advanced
methods for high-dimensional data analysis and to provide its sustainable
functioning.

**Project
expected results:**

• Mathematical theory of data mining methods
in high dimension based on
geometry and functional analysis (measure concentration effects);

• Advanced methods for multidimensional data
mining in high dimension;

• Model reduction methods for separation of
genuinely high-dimensional problems from reducible problems with low intrinsic
dimension;

• Technology of models of optimal complexity
based on games against observer;

• Applied problem-oriented software for a
series of specific problems: analysis of in vivo and in vitro neuronal nets,
analysis of high dimensional live video
streams, observation and modelling of complex technical and hybrid systems
(man-machine) like exoskeleton and of large and complex ecosystems.

• Open access software for high dimensional
data analysis and for correction of large legacy AI systems.

• Specific results about these systems
achieved by new methods.

• Publication of at least 44 papers in the
journals indexed in WoS database of which no less than 11 papers are in the
journals from the first quartile (Q1) in WoS database.

• Submission of at least three patent
applications (including one international patent application) for new
high-dimensional data mining algorithms.

• Sustainable functioning of the Laboratory
of advanced methods for high-dimensional data analysis.

**Description
of the proposed research:**

The work will be organised in 7 working packages (WPs):

WP1. Geometrical theory of data mining in high (dozens and
hundreds) and very high (thousands, tens of thousands and higher) dimension.

WP2. Theory of models of optimal complexity based on the
analysis of games against observer (the worst case worlds).

WP3. Development and implementation of specific algorithms
for mining of high-dimensional data.

WP4. Analysis of large and high dimensional live video
streams: theory, algorithms, implementation, and testing

WP5. Applications of advanced high dimensional data analysis
methods to biological neural systems (in vivo and in vitro).

WP6. Applications of advanced high dimensional data analysis
methods and models of optimal complexity to complex biophysical, technical and
hybrid man-machine systems.

WP7. Dissemination of the project results

**Description
of the work for work packages:**

WP1. Geometrical theory of data mining in high (dozens and
hundreds) and very high (thousands, tens of thousands and higher) dimension.

Measure concentration effects were introduced in mathematics
by Levy in 1922: volume of a ball is concentrated near its border, a sphere,
and, moreover, in a small vicinity of any equator of the sphere. Using this
observation, he developed a new area of functional analysis. Physicists have
used these effects much earlier. Maxwell, Gibbs, and Einstein created
equilibrium statistical mechanics, which is the first application of measure
concentration. More recently, several famous mathematicians (Milman, Gromov and Talgrand) have
developed the measure concentration theory much further.

At the end of 20th century, Hecht-Nielsen has paid attention
to the observation that in genuinely high-dimensional databases many data vectors
are pairwise almost orthogonal (have small inner product) and the number of these vectors may be much large
than dimension. Very recently, Gorban and Tyukin with co-authors have proved
that this effect manifests with probability close to one for independently
chosen data vectors and have demonstrated its importance for machine learning
and analysis of video information [A.N. Gorban, I.Y. Tyukin, D.V. Prokhorov,
K.I. Sofeikov Approximation with random bases: Pro et Contra, Information
Sciences 364-365, (2016), 129-145. Q1 IN COMPUTER SCIENCE, INFORMATION
SYSTEMS].

It is well known in brain science that small groups of
neurons play important role in pattern recognition. Mathematical explanation of
this effect can be found in the multidimensional nature of data. Gorban with
co-authors have proved that the high-dimensional problems can be solved by
ensembles of small neural networks (here, “high dimension” means several tens
or more, the effect is important already for dim>50) [A.N. Gorban, I.Y.
Tyukin, I. Romanenko, The Blessing of Dimensionality: Separation Theorems in
the Thermodynamic Limit, IFAC-PapersOnLine 49-24 (2016), 064–069; A keynote
talk at the 2nd IFAC Workshop on Thermodynamic Foundation of Mathematical
Systems Theory (TFMST II), Vigo, Spain, 28-30 September 2016.]. The proposed
project aims to finalise the theory of this phenomenon, to extend it to other
multidimensional phenomena and to develop algorithms for high-dimensional data
analysis based on these phenomena.

One more well-known difficulty appears in analysis of
multidimensional data. Most of machine learning approaches have stemmed from
the application of minimizing the mean squared distance principle, based on the
computationally efficient quadratic optimization methods. However, when faced
with high-dimensional and noisy data, the quadratic error functionals
demonstrate many weaknesses including high sensitivity to contaminating factors
and dimensionality curse. Therefore, a lot of recent applications in machine
learning exploited the properties of non-quadratic error functionals
based on L1 norm or even sub-linear potentials corresponding to fractional
norms. The back side of these approaches is tremendous increase in
computational cost for optimization. Till so far, no approaches have been
suggested to deal with arbitrary error functionals,
in a flexible and computationally efficient framework. We propose to develop
the theory and basic universal data approximation algorithms, based on
piece-wise quadratic error potentials of subquadratic
growth (PQSQ potentials) [A.N. Gorban, E.M. Mirkes, A. Zinovyev, Piece-wise
quadratic approximations of arbitrary error functions for fast and robust
machine learning, Neural Networks 84 (2016), 28-38. Q1 IN COMPUTER SCIENCE,
ARTIFICIAL INTELLIGENCE]. This is a new and universal framework to minimize
arbitrary sub-quadratic error potentials using an algorithm with guaranteed
fast convergence to the local or global error minimum. It can be applied in
most of existing machine learning methods, including methods of data
approximation and regularized regression, and leads to the improvement in the
computational cost/accuracy trade-off.

In particular, the following tasks will be performed in WP1:

• Formulate and prove stochastic separation
theorems for more general distributions. We will start from the distributions
with log-concave densities and bounded moments and then study the problems
caused by slow decay (“heavy tails”), strong deviations from the unimodality and the independence hypotheses.

• Formulate and prove stochastic separation
theorems for more general sets of classifiers, including small neural networks.

• Extend the stochastic separation theorems
for separation of subsets (not only of a point from a set).

• Develop algorithms for knowledge transfer
between heterogeneous AI systems (mutual corrections) using the one-trial
correctors.

• Develop theory of sparse and robust
solutions of high-dimensional data mining problems by collections of
independent and weekly dependent small ensembles of neurons;

• Develop and implement in the open access
software algorithms of data approximation, based on piece-wise quadratic error
potentials of subquadratic growth (PQSQ
potentials);

• Develop theory and algorithms of “dressing”
of approximate data models by small neural ensembles with improve the quality
of solution (measured, for example, by sensitivity and specificity);

• Develop theory and algorithms of big data
analysis by hierarchies of receptive fields build from many small independent
or weakly dependent neural ensembles.

WP2. Theory of models of optimal complexity based on the
analysis of games against observer (the worst case worlds).

We will develop the module of complexity optimization and the
game-theory approach for evaluation of the optimal complexity of models. Every
real system (or network of coupled systems) admits an (almost) infinitely deep
hierarchy of dynamic models, with increasing complexity. Any model in this
hierarchy would typically involve a set of parameters. For optimal values of
these parameters, the error of the model is minimal, and it decreases as the
model complexity increases. Unfortunately, the problem becomes ill-conditioned
in higher dimensions. Identifying the optimal values of all the unknown
parameters from available measurements is not always possible, and a theory of
optimal complexity of models is required. Developing this theory is amongst our
main goals. In the core of any such theory lie the notions and quantitative
criteria of achievable accuracy. We construct these notions by combining
methods and concepts of nonlinear observers, optimal control [Tyukin I.
Adaptation in dynamical systems. Cambridge University Press; 2011] and
bifurcations of limit sets [A.N. Gorban, Singularities of transition processes
in dynamical systems: Qualitative theory of critical delays, Electron. J. Diff.
Eqns., Monograph 05, 2004]. We then proceed by creating technologies for
estimating the optimal complexity for classes of systems. The model of optimal
complexity is the one where the state and parameters can be estimated reliably,
in the worst-case scenario, subject to the accuracy demands. In contrast to the
previous works of Sjöberg at al,
however, we do not impose any specific assumptions on the unmodelled
part of the model. We only demand that it is a function satisfying standard
existence conditions for the corresponding system of differential equations.
The spurious factors in our project are determined from the point of view of
observer design. The decision criteria are not mere
observability/identifiability tests but are the outcomes of a process termed
here as a game against an adaptive observer.

We search for a perturbation maximizing the worst-case error,
under given estimation time. As a result, for every model of a given class in
the hierarchy, a value (e.g. the sup-norm of the perturbation) will be
assigned. The smaller the value, the more spurious is the increment in the
model’s complexity. The optimal complexity is defined as the point at which an
increase in the model’s accuracy is balanced by the increase in the model’s
spuriousness. This will allow us to formulate an uncertainty principle of
modelling as an identity connecting an achievable accuracy (subject to
complexity) subject to the set of adaptive observers employed to solve the
identification/observation problem.

In particular, the following tasks will be performed in WP2:

• Create theory of games against observer;

• Develop the game-theory approach to
modelling with optimal complexity;

• Theoretical analysis of the maximal
achievable accuracy and the uncertainty principle in the
modelling/identification/observation problem;

• Testing the approach on modelling of
biological systems and crises anticipation.

WP3. Development and implementation of specific algorithms
for mining of high-dimensional data

This WP3 will follow the theoretical achievements of
WP1&2 and transform them into algorithms and software. At the same time, it
will mediate between theoretical WP1&2 and more applied and experimental
WPs 4-6 because the algorithms and the software should meet the needs of these
applied WPs.

In particular, the following tasks will be performed in WP3:

• Implement the one-trial corrector
algorithms in an open source software and test them on the datasets prepared in
WP5 and WP6 and on the open access benchmarks.

• Developing, implementation and testing of
“dressing” algorithms for improvement of approximate data models by collections
of independent or very weakly dependent small neural ensembles.

• Implement and test algorithms for knowledge
transfer between heterogeneous AI systems (mutual corrections) using the
one-trial correctors.

• Development, implementation and testing of
universal data approximation algorithms, based on piece-wise quadratic error
potentials of subquadratic growth (PQSQ potentials);

• Development, implementation and testing of
algorithms for modelling with optimal complexity based on the analysis of games
against observer (the worst case worlds).

• Development, implementation and testing of
algorithms, based on the methods of topological grammars, for separation of
genuinely high-dimensional problems from reducible problems with low intrinsic
dimension;

• Development, implementation and testing of
cascade algorithms based on hierarchies of receptive fields build from many
small independent or very weakly dependent neural ensembles;

• Employ neural networks with self-esteem
[S.E. Gilev, A.N. Gorban, E.M. Mirkes, Small experts and internal conflicts in artificial
neural networks, Doclady USSR Academy of Sciences,
1991. V.320, N.1. 220-223.] for organisation of semisupervising
self-training networks of heterogeneous AI systems with mutual corrections.
Implement and test these algorithms on the datasets prepared in WP5 and WP6 and
on open access benchmarks.

The last task about model reduction needs special comments.
There exist many algorithms for linear and nonlinear dimensionality reduction,
from classical PCA to various version of nonlinear PCA, principal graphs and
principal cubic complexes [AN Gorban, B Kégl, DC
Wunsch, AY Zinovyev, Principal manifolds for data visualization and dimension
reduction. Berlin-Heidelberg: Springer; 2008]. A.N. Gorban with co-authors
developed a universal method of graph grammars for data approximation and
dimensionality reduction [Gorban AN, Zinovyev A. Principal manifolds and graphs
in practice: from molecular biology to dynamical systems. International journal
of neural systems. 2010, 20 (3), 219-32]. For high-dimensional data this method
should be adopted, implemented and tested. Of course, in the applied software
the novel methods should be combined with the classical approaches.

WP4. Analysis of large and high dimensional live video
streams: theory, algorithms, implementation, and testing

Analysis of large and high dimensional live video streams is
crucially important in many areas: from security to any computer vision
applications. Ability of a new high-dimensional data mining algorithms to
improve the solutions of this classical problem is a compulsory maturity test.

Now, deep learning provides us with good benchmarks and the
advanced base of comparison. There exist also several other bases (support
vector machines, various versions of decision forests, etc.). We expect that
our algorithms will require less computer time and memory (for the same
accuracy) or will have better accuracy under restrictions on computer time and
memory. To prove this hypothesis we will compare all the novel approach to
several bases of comparison. The tasks for this WP4 have the following
structure:

In particular, the following tasks will be performed in WP4:

• Preparation of the benchmarks for analysis
of large and high dimensional live video streams, selection and preparing bases
for comparison. Development of an identity collector from real CCTV footage and
from online resources for gathering of identities from various phenotypical
clusters. A database of anonymised, profiled and pre-processed images of
identities will be created.

• Implementation and testing of the
correction and knowledge transfer algorithms for large AI systems for analysis
of videostreams.

• Comparison of “dressing” algorithms for
improvement of approximate data models by collections of independent or very
weakly dependent small neural ensembles with the basic algorithms on the
benchmarks.

• Comparison of cascade algorithms based on
hierarchies of receptive fields build from many small independent or very
weakly dependent neural ensembles with the basic algorithms on the benchmarks.

• Comparison of universal data analysis
algorithms, based on piece-wise quadratic error potentials of subquadratic growth (PQSQ potentials) with the standard
algorithms on the benchmarks;

• Performance analysis of the reduction
algorithms, based on the methods of topological grammars, for separation of
genuinely high-dimensional problems from reducible problems with low intrinsic
dimension, on the benchmarks.

WP5. Applications of advanced high dimensional data analysis
methods to biological neural systems (in vivo and in vitro).

In this WP we will use data which have been collected in the
labs of the host organization for a long time. These collections of data and
precise experiments will provide us by the important challenge. The methods
developed in WP1-WP3 should prove their usefulness for analysis of real brain.
The tasks in this WP include data preparation, data analysis, modelling and
comparison of the results to the experiments.

In particular, the following tasks will be performed in WP5:

• Analysis of large-scale neural recordings,
development of algorithms for analysis of calcium imaging and
electrophysiological data.

• Relating single neuron and network activity
to behavioural patterns. Imaging data from retrosplenial
and auditory cortices will be analysed while animals are given a relevant
stimulus (virtual reality or a complex sound stimulus).

• Analysis of multi-channel
electrophysiological data from neuronal cultures. Identification of activity
changes under the application of modulatory drugs.

• Identification of neuron types from
electrophysiological recordings and morphological reconstruction data.
Application of developed techniques to the identification of unknown
interneuron types from hippocampal areas.

• Analysis of electromyography recordings,
prediction of movement.

• Development of artificial neural network
model incorporating biologically relevant details, influence of specific
biophysical features on computation and learning in neural networks.

WP6. Applications of advanced high dimensional data analysis
methods and models of optimal complexity to several complex biophysical,
technical and hybrid man-machine systems.

The host University has several well-known laboratories and
groups that produce high quality big data about complex biophysical, technical
and hybrid systems (man-machine). In the frame of the project, we aim to
collaborate with majority of these groups. For this purpose, a regular working
seminar “Analysis of high-dimensional data” will be organised on the weekly
basis. The datasets and the experiment which produce large high-dimensional
data streams will be discussed on the seminar, the research programs for
selected data sets and streams will be approved by the seminar and the team
will follow these programs. The results will be also presented on the seminar
and published.

In particular, the following tasks will be performed in WP6:

• Organisation of the regular working seminar
“Analysis of high-dimensional data” on the weekly basis.

• Selection of the datasets and streams for
detailed analysis.

• Presentation and discussion of the data
analysis programs for the selected datasets and streams on the seminar.

• Adaptation of the methods and software for
analysis of selected data.

• Data analysis for selected data,
presentation and discussion of the results.

Each of these tasks will be performed several times during
each year. We aim to analyse several most promising high quality data sets and
streams produced in the host University.

WP7. Dissemination of the project results

This working package WP7 includes two directions of work:
external (including international) and internal (in the host University). For
external dissemination we will use three main processes: (i)
publication of scientific papers and books, (ii) organization of international
workshops, and (ii) use of mass media in the form of press-releases, dedicated
web page, blogs discussion, social media and Twitter
information. Internal dissemination means organization of special master-classes,
lecture courses and research proseminars for
students, PhD students and young scientists.

In particular, the following tasks will be performed in WP7:

• Annual master-classes for students, PhD
students and young scientists given by A.N.Gorban
(not less than 16 contact hours every year);

• Web-page of the project;

• Three International Research Workshops:
Geometry of Big Data, Small Neuronal Ensembles and Brain, Models of Optimal
Complexity;

• Publication at least 44 papers in the
journals indexed in WoS database and not less than 22 paper in the journals
from the first quartile (Q1) in WoS database;

• Publication of press-releases in the main
news agencies (Eureca, ScienceDaily, etc.) every year.

The scientific and technical results obtained in the
framework of scientific research will make it possible to create technologies
that are the basis for the innovative development of the domestic market for
products and services, the stable position of Russia in the external market,
and will ensure, within the framework of the Strategy for Scientific and
Technical Development of the Russian Federation (approved by the Decree of the
President of the Russian Federation December 1, 2016 No. 642 "On the
Strategy for Scientific and Technological Development of the Russian
Federation") the transition to advanced digital, intelligent production
technologies, robotic systems, new materials and design methods, creation of
large data processing systems, machine learning and artificial intelligence
(H1).)