Harnessing the power of AI through collaborative research
We believe research is the key to creating innovative and effective solutions for our customers.
We welcome collaboration opportunities to continuously advance the field of AI. Reach out to us at research@emulationai.com for the latest on our research.
Recent Publications
A Preliminary Study on Augmenting Speech Emotion Recognition Using a Diffusion Model
In this paper, we propose to utilise diffusion models for data augmentation in speech emotion recognition (SER). In particular, we present an effective approach to utilise improved denoising diffusion probabilistic models (IDDPM) to generate synthetic emotional data. We condition the IDDPM with the textual embedding from bidirectional encoder representations from transformers (BERT) to generate high-quality synthetic emotional samples in different speakers’ voices. We implement a series of experiments and show that better quality synthetic data helps improve SER performance. We compare results with generative adversarial networks (GANs) and show that the proposed model generates better-quality synthetic samples that can considerably improve the performance of SER when augmented with synthetic data. Read More
Multi-task Learning From Unlabelled Data to Improve Cross Language Speech Emotion Recognition
This paper proposes a novel multi-task learning (MTL) approach that effectively utilises unlabelled data to improve the generalisation as well as the performance of cross-language SER systems. In particular, it uses language and domain identification as auxiliary tasks, which enables the proposed framework to learn from abundantly available language identification data. The proposed model is evaluated on publicly available datasets in four languages and achieves state-of-the-art performance. Read More
Emotions Beyond Words: Non-Speech Audio Emotion Recognition With Edge Computing
Non-speech emotion recognition has a wide range of applications, including healthcare, crime control and rescue, and entertainment, to name a few. Providing these applications using edge computing has great potential; however, recent studies have focused on speech emotion recognition using complex architectures. In this paper, a non-speech-based emotion recognition system is proposed, which can rely on edge computing to analyse emotions conveyed through non-speech expressions like screaming and crying. Read More
AI-Based Emotion Recognition: Promise, Peril, and Prescriptions for Prosocial Path
In this paper, we present the promises and perils of automated emotion recognition (AER) applications. We discuss the ethical challenges related to the data and AER systems and highlight the prescriptions for prosocial perspectives for future AER applications. We hope this work will help AI researchers and developers design prosocial AER applications. Read More
Privacy Enhanced Speech Emotion Communication Using Deep Learning Aided Edge Computing
An adversarial learning framework can be deployed at the edge to unlearn users’ private information in speech representations. These privacy-enhanced representations can be transmitted to the central server for decision-making. The proposed model is evaluated on multiple speech emotion datasets, and the results show that it can hide users’ specific demographic information and improve the robustness of emotion identification without significantly impacting performance. Read More
A Survey On Deep Reinforcement Learning For Audio-Based Applications
Deep reinforcement learning (DRL) is poised to revolutionise the field of artificial intelligence (AI) by endowing autonomous systems with high levels of understanding of the real world. Currently, deep learning (DL) is enabling DRL to effectively solve various intractable problems in various fields including computer vision, natural language processing, healthcare, robotics, to name a few. Most importantly, DRL algorithms are also being employed in audio signal processing to learn directly from speech, music and other sound signals in order to create audio-based autonomous systems that have many promising applications in the real world. In this article, we conduct a comprehensive survey on the progress of DRL in the audio domain by bringing together research studies across different but related areas in speech and music. Read More
Transformers in Speech Processing: A Survey
In this paper, we present a comprehensive survey that aims to bridge research studies from diverse subfields within speech technology. By consolidating findings from across the speech technology landscape, we provide a valuable resource for researchers interested in harnessing the power of transformers to advance the field. We identify the challenges encountered by transformers in speech processing while also offering insights into potential solutions to address these issues. Read More