PhD Defense by Kumar Ashis Pati


Kumar Ashis Pati

Ph.D. in Music Technology

School of Music

Georgia Institute of Technology




Learning to Manipulate Latent Representations of Deep Generative Models

-- towards improving interactivity and controllability in automatic music creation



Date: 9th December 2020

Time: 12:00 to 14:00 (EST)

Location: https://bluejeans.com/1783153167

Note: This defense is remote-only




Deep generative models have emerged as a tool of choice for the design of automatic music composition systems. While these models can learn complex representations from data, many of them allow little to no control over the generated music. Latent representation-based models, such as Variational Auto-Encoders (VAEs), have the potential to alleviate this limitation, since they encode hidden attributes of the data in a low-dimensional latent space. However, the encoded attributes are often not interpretable and cannot be explicitly controlled.
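To make the latent-space idea concrete: a VAE encoder maps each input to a mean and (log-)variance per latent dimension, and a latent point z is drawn via the reparameterization trick. The following is a minimal NumPy sketch for illustration only; the function name and dimensions are placeholders, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# A 4-dimensional latent space; in a real VAE, mu and log_var
# would be produced by the encoder network for a given input.
mu = np.zeros(4)
log_var = np.zeros(4)  # log_var = 0  =>  sigma = 1
z = reparameterize(mu, log_var, rng)
```

Sampling z this way keeps the model differentiable with respect to mu and log_var, which is what makes gradient-based training of the encoder possible.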


The work presented in this thesis seeks to address these challenges by learning to manipulate and design latent spaces so as to allow control over musically meaningful attributes that humans can understand. This, in turn, enables explicit control of such attributes during the generation process and helps users realize their compositional goals. Specifically, three approaches are proposed to investigate this problem. The first shows that we can learn to traverse the latent spaces of generative models to perform complex interactive music composition tasks. The second uses a novel latent space regularization technique that encodes individual musical attributes along specific dimensions of the latent space. The third applies attribute-informed non-linear transformations to an existing latent space such that the transformed space allows controllable generation of data. In addition, the problem of disentanglement learning in the context of symbolic music is investigated systematically by proposing a tailor-made dataset for the task and evaluating several methods for unsupervised and supervised disentanglement learning. Together, the proposed methods address critical shortcomings of deep generative models for music and pave the way toward intuitive interfaces that can be used by humans in real compositional settings.
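As an illustration of the second approach, a regularization term can encourage a chosen latent dimension to vary monotonically with a musical attribute (e.g., note density), so that moving along that dimension controls the attribute. The NumPy sketch below is a simplified, hypothetical version of such a loss; the exact formulation used in the thesis may differ:

```python
import numpy as np

def attribute_reg_loss(z_d, attr):
    """Penalize disagreement between the ordering of one latent
    dimension z_d and the ordering of an attribute across a batch.

    For each pair of examples (i, j), tanh(z_d[i] - z_d[j]) is pushed
    toward sign(attr[i] - attr[j]), encouraging a monotonic relation
    between the latent dimension and the attribute.
    """
    dz = z_d[:, None] - z_d[None, :]    # pairwise latent differences
    da = attr[:, None] - attr[None, :]  # pairwise attribute differences
    return np.mean(np.abs(np.tanh(dz) - np.sign(da)))

attr = np.array([0.0, 1.0, 2.0])            # attribute values for a batch
aligned = attribute_reg_loss(attr, attr)     # latent dim tracks the attribute
reversed_ = attribute_reg_loss(-attr, attr)  # latent dim anti-correlated
```

Here `aligned` is smaller than `reversed_`: the loss rewards latent codes whose ordering matches the attribute ordering, which is what allows the trained dimension to act as a control knob.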




Committee:

Prof. Alexander Lerch (Advisor, School of Music, Georgia Institute of Technology)

Prof. Devi Parikh (School of Interactive Computing, Georgia Institute of Technology)

Prof. Gil Weinberg (School of Music, Georgia Institute of Technology)

Prof. Jason Freeman (School of Music, Georgia Institute of Technology)

Prof. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)

Rudolph van der Merwe (Senior Engineering Manager, Advanced Computations Group, Apple Inc.)




