Deep Learning project


This project stems from a deep learning module at TU Berlin (April 2019). The goal was to understand the SING architecture (https://github.com/facebookresearch/SING) from both a mathematical and an implementation (Python) perspective, and then to propose improvements. SING is a deep-learning-based musical note synthesizer trained on the NSynth dataset; it pairs an LSTM sequence generator with a convolutional decoder.

The first proposed improvement was to replace the short-time Fourier transform (STFT) representation used in the loss function with audio representations closer to human hearing: the constant-Q transform (CQT) and mel-frequency cepstral coefficients (MFCCs). The second was to replace the ReLU activation functions with exponential linear units (ELUs), whose mathematical properties are expected to improve convergence speed and stability. Unfortunately, last-minute problems arose when porting the code to TU Berlin's computation servers (to which we had only restricted access), so some of these proposals remain empirically unverified.
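
The two ideas above can be illustrated with a small plain-Python sketch. It is not taken from the SING code base or from the project code: all function names are hypothetical, and the parameter values (16 kHz sample rate, 1024-point FFT, 12 bins per octave, the HTK-style mel formula) are assumptions chosen for illustration. The first part contrasts the linearly spaced STFT bins with the geometrically spaced CQT bins and the mel scale; the second part contrasts ReLU with ELU.

```python
import math


def stft_bin_frequencies(sample_rate: float, n_fft: int) -> list[float]:
    """Center frequencies of STFT bins: linearly spaced from 0 to Nyquist."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


def cqt_bin_frequencies(f_min: float, n_bins: int,
                        bins_per_octave: int = 12) -> list[float]:
    """Center frequencies of CQT bins: geometrically spaced, so each octave
    gets the same number of bins (similar to musical pitch)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (one common variant, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def relu(x: float) -> float:
    """ReLU: zero output and zero gradient for all negative inputs."""
    return max(0.0, x)


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise.
    Negative inputs saturate smoothly towards -alpha instead of being cut to 0."""
    return x if x > 0 else alpha * math.expm1(x)


if __name__ == "__main__":
    stft = stft_bin_frequencies(sample_rate=16000, n_fft=1024)
    cqt = cqt_bin_frequencies(f_min=32.70, n_bins=13)
    print("STFT bin spacing (Hz):", stft[1] - stft[0])
    print("CQT first and last bin of one octave (Hz):", cqt[0], cqt[-1])
    print("1000 Hz in mel:", hz_to_mel(1000.0))
    for x in (-2.0, -0.5, 0.0, 1.5):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  elu={elu(x):.4f}")
```

The STFT spends the same number of bins on every 1 kHz band, whereas the CQT and the mel scale allocate more resolution to low frequencies, which is the sense in which they are closer to human hearing. For the activations, ELU keeps a non-zero gradient for negative inputs, which is the property usually cited when it is proposed as a replacement for ReLU.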


Read or download the paper.