Current research

Forward Orthogonal Least Squares Regression:

Paper submissions to IEEE

I will soon be submitting my research on NARMAX systems to IEEE for peer review. In it, I propose an optimized version of the FOrLSR, in addition to two new algorithmic classes in that framework. In short, the FOrLSR algorithms make it possible to obtain an analytical expression (i.e., a symbolic representation) for any unknown system or function. The abstracts and conclusions of the papers below give an overview of the work. Note that the critical parts of the papers (pseudo-code, theorems, proofs, etc.) have been removed, as the work is currently unpublished and thus not yet registered as my intellectual property.


Paper 1: Arborescent Orthogonal Least Squares Regression (AOrLSR)

The first part (rFOrLSR) presents a common algorithm, but with its nested loops transformed into a single matrix operation (for heavy parallelization and GPU execution) and made recursive to flatten the complexity from quadratic to linear. → linear-algebra-based optimization with numerics.
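The loop-to-matrix idea can be sketched as follows (an illustrative forward-selection step using the classical error-reduction-ratio criterion; function and variable names are my own and this is not the paper's rFOrLSR recursion):

```python
import numpy as np

def forward_ols_step(D, y, selected):
    """One forward-selection step: score ALL candidate regressors at once
    with the error reduction ratio (ERR), using matrix operations instead
    of a Python loop over dictionary columns."""
    if selected:  # orthogonalize candidates against already-chosen regressors
        Q, _ = np.linalg.qr(D[:, selected])
        W = D - Q @ (Q.T @ D)          # residual part of every candidate column
    else:
        W = D
    num = (W.T @ y) ** 2               # squared projections onto the output
    den = np.einsum('ij,ij->j', W, W) * (y @ y)
    err = np.divide(num, den, out=np.zeros_like(num), where=den > 1e-12)
    err[selected] = -np.inf            # never re-select a chosen regressor
    return int(np.argmax(err))
```

Calling this in a loop and appending each returned index to `selected` yields a basic forward orthogonal selection; the paper's contribution is making such steps recursive and GPU-friendly.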

The second part of the paper is the arborescence design, essentially a linear-algebra- and graph-theory-based optimization procedure to tackle the NP-hard problem of finding the solution to a linear system with the largest number of zero entries in the solution vector and the smallest solving error.

Read or download the AOrLSR paper.

Paper 2: Dictionary Morphing Least Squares Regression (DMOrLSR)

The second paper (DMOrLSR) morphs the user-passed regressors (functions) to allow expansions whose terms adapt to the system output (imagine a Fourier transform which finds the exact peaks and then contains only those terms, being sparse). This is based on genetic algorithms, linear algebra and matrix calculus / infinitesimal optimization, with closed-form gradients and Hessians independent of the user-passed function, its number of arguments and the data.
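As a heavily simplified illustration of the morphing idea (a toy example with names of my own choosing; the paper's genetic and Hessian-based machinery is not reproduced here), a single sine regressor's frequency can be adapted by gradient descent on the least-squares residual:

```python
import numpy as np

def morph_frequency(x, y, f0, steps=3000, lr=1e-4):
    """Adapt the frequency f of a single regressor sin(f*x) by gradient
    descent so that the best least-squares fit a*sin(f*x) matches y.
    The amplitude a is re-solved in closed form at every step, so the
    gradient below is the exact total derivative (envelope theorem)."""
    f = f0
    for _ in range(steps):
        r = np.sin(f * x)
        a = (r @ y) / (r @ r)                   # optimal amplitude, closed form
        e = y - a * r                           # residual
        grad = -a * (e @ (x * np.cos(f * x)))   # d/df of 0.5*||e||^2
        f -= lr * grad
    return f
```

In the actual framework the gradients and Hessians stay in closed form for arbitrary user-passed functions; here only one scalar parameter of one fixed regressor is morphed.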

The library performs optimal sparse least-squares fitting of any vectors passed by the user against the given system response. All provided examples use severely non-linear auto-regressive systems; however, the user can pass dictionaries with any type of functions. This is thus a very general mathematical framework supporting, for example, wavelet, RBF or polynomial fitting. The papers also contain an example of using the library to fit a nested expansion in |x|^n inside a fraction, or of nesting non-linearities inside other non-linearities.
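A mixed dictionary of the kind described can be assembled as a plain matrix of column vectors; the specific terms below are arbitrary examples of my own, not the paper's dictionaries:

```python
import numpy as np

def build_dictionary(x):
    """Assemble a mixed dictionary (polynomial, |x|^n, RBF and sinusoidal
    terms) as columns of one matrix, ready for least-squares selection.
    The term choices are illustrative examples only."""
    cols = {
        "x": x,
        "x^2": x ** 2,
        "x^3": x ** 3,
        "|x|^1.5": np.abs(x) ** 1.5,
        "rbf(0,1)": np.exp(-(x - 0.0) ** 2),
        "rbf(1,0.5)": np.exp(-((x - 1.0) ** 2) / 0.5),
        "sin(x)": np.sin(x),
    }
    names = list(cols)
    return np.column_stack([cols[n] for n in names]), names
```

Any least-squares routine can then weight the columns of `D`, e.g. `np.linalg.lstsq(D, y, rcond=None)` for a dense fit; the sparse selection is the job of the FOrLSR algorithms above.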

Read or download the DMOrLSR paper.


Python Package for both Papers:

Those algorithms will also soon be released as a GPU-accelerated open-source Python machine learning package.



Supplementary NARMAX research

Based on the papers described above, further improvements and papers are planned:

Dimensionality selection paper

This machine learning algorithm automatically selects the amount of information worth keeping to represent data or a digital system. Applications include, for example, finding ideal compression ratios (for storage and transmission) or ideal approximations of systems.

I propose a version of the algorithm by Zhu and Ghodsi (2006), where the recursion is unrolled into a much more efficient matrix-form algorithm, which also adds the ability to impose constraints on the dimensionality selection. The derivation and both C++ and Python implementations are finished and examples are documented, but the paper hasn't been written yet.
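For reference, the underlying Zhu–Ghodsi profile-likelihood selection can be sketched as a plain loop (using the maximum-likelihood pooled variance; this is neither the matrix-form unrolling nor the constrained variant proposed above):

```python
import numpy as np

def profile_loglik_dim(d):
    """Scree-plot elbow selection after Zhu & Ghodsi (2006): split the
    sorted values at each index q, model both groups as Gaussians with
    their own means and one pooled variance, and return the split that
    maximizes the profile log-likelihood."""
    d = np.sort(np.asarray(d, dtype=float))[::-1]
    p = d.size
    best_q, best_ll = 1, -np.inf
    for q in range(1, p):
        g1, g2 = d[:q], d[q:]
        # pooled maximum-likelihood variance around the two group means
        var = (((g1 - g1.mean()) ** 2).sum() + ((g2 - g2.mean()) ** 2).sum()) / p
        if var <= 0:
            continue
        ll = -0.5 * p * (np.log(2 * np.pi * var) + 1.0)  # profile log-likelihood
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q
```

On a scree with a clear elbow it returns the split index, e.g. 3 for the values [10, 9.5, 9, 0.5, 0.4, 0.3, 0.2].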

Alternative link for paper.

Eigen contributions

Eigen is an open-source, highly optimized, lazy-evaluation C++ linear algebra library based on expression trees, which is used, among others, in TensorFlow, the framework behind many of Google's AI systems.

→ Accepted Contribution:
→ Contributions I am working on: