Probabilistic models for melodic sequences
Structure is one of the fundamentals of music, yet the complexity arising from the vast number of possible variations of musical elements such as rhythm, melody, harmony, key, texture and form, along with their combinations, makes music modelling a particularly challenging task for machine learning. The research presented in this thesis focuses on the problem of learning a generative model for melody directly from musical sequences belonging to the same genre. Our goal is to develop probabilistic models that can automatically capture the complex statistical dependencies evident in music without the need to incorporate significant domain-specific knowledge. At all stages we avoid making assumptions explicit to music and consider models that can be readily applied in different music genres and can easily be adapted for other sequential data domains. We develop the Dirichlet Variable-Length Markov Model (Dirichlet-VMM), a Bayesian formulation of the Variable-Length Markov Model (VMM), where smoothing is performed in a systematic probabilistic manner. The model is a general-purpose, dictionary-based predictor with a formal smoothing technique and is shown to perform significantly better than the standard VMM in melody modelling. Motivated by the ability of the Restricted Boltzmann Machine (RBM) to extract high-quality latent features in an unsupervised manner, we next develop the Time-Convolutional Restricted Boltzmann Machine (TC-RBM), a novel adaptation of the Convolutional RBM for modelling sequential data. We show that the TC-RBM learns descriptive musical features such as chords, octaves and typical melody movement patterns. To deal with the non-stationarity of music, we develop the Variable-gram Topic model, which employs the Dirichlet-VMM for the parametrisation of the topic distributions. The Dirichlet-VMM models the local temporal structure, while the latent topics represent different music regimes. The model does not make any assumptions explicit to music, but it is particularly suitable in this context, as it couples the latent topic formalism with an expressive model of contextual information.