Building on previous major advances, self-supervised learning (SSL) has recently emerged as one of the most promising artificial intelligence (AI) methods. With this technique, it becomes feasible to take advantage of the colossal amounts of existing unlabeled data to significantly improve the results of various AI systems. In particular, the field of speech processing (SP) is being rapidly transformed by the rise of SSL, driven by massive industrial investment and an explosion of available data, both provided by a few companies. Although SSL models are incredibly powerful, their complexity requires researchers and industry to acquire extraordinary computing capacity, which drastically limits both access to fundamental research in this field and deployment in real products. For instance, existing works based on SSL models for speech in fact rely on a system maintained and made available by a single company (wav2vec 2.0). The entire life cycle of the technology, from its theoretical foundations to its practical deployment, including the analysis of its societal aspects, therefore depends solely on institutions with the physical and financial means to sustain such intensive development.

The E-SSL project aims to re-empower the scientific community and the speech industry with the control over self-supervised learning needed to ensure its fair evolution and deployment, by facilitating both academic research and its transfer to industry. In practice, E-SSL holistically integrates three key issues of self-supervised learning for speech representations: its computational efficiency, its societal impacts, and the feasibility of extending it to future products.