Researchers at Sony Computer Science Laboratories (CSL) have developed a new Deep Learning method to enhance and restore the quality of heavily compressed songs and audio recordings

Image source: https://arxiv.org/pdf/2207.01667.pdf

Today, many sophisticated tools and technologies allow us to store vast amounts of music and audio recordings on our electronic devices. Codecs, each consisting of an encoder and a decoder, are used to encode, compress, and decode media files.

Codecs fall into two categories: lossless and lossy. Lossless formats, such as PKZIP and PNG, reproduce the original data exactly after decompression. Lossy codecs, on the other hand, produce a copy that looks or sounds very close to the original but takes up less space on your electronic device.
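The lossless case can be illustrated with a short Python sketch. It uses the standard library's zlib module, which implements DEFLATE, the compression method used by both PKZIP and PNG. The payload is a made-up byte string standing in for a media file, not real audio.

```python
import zlib

# Hypothetical payload: a repetitive byte string standing in for a media file.
original = bytes(range(256)) * 64

# Compress with DEFLATE at the highest compression level, then decompress.
compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # the compressed copy is smaller
print(restored == original)            # lossless: every byte is recovered
```

A lossy codec gives up the second property in exchange for much smaller files.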

A lossy audio codec works by discarding some of the data in a digital audio stream when compressing it, so the decompressed signal is only an approximation of the original. At moderate compression ratios, it is usually difficult or impossible for listeners to distinguish between the original and the decompressed file.
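As a rough illustration of what "removing some data" means, the sketch below rounds audio samples to a coarse grid. This is a deliberate simplification: real codecs such as MP3 decide what to discard using psychoacoustic models, not plain rounding. The test tone, the sample rate, and the `quantize` helper are all assumptions made for the example.

```python
import math

def quantize(samples, bits):
    """Round samples in [-1, 1] to a grid with step 1 / 2**(bits - 1).

    The rounding error is discarded for good, which is what makes this lossy.
    """
    levels = 2 ** (bits - 1)
    return [round(s * levels) / levels for s in samples]

# Hypothetical input: a 440 Hz sine tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]

coarse = quantize(signal, bits=4)

# The decoded signal only approximates the original.
max_error = max(abs(a - b) for a, b in zip(signal, coarse))
distinct_values = len(set(coarse))
print(max_error, distinct_values)
```

Fewer distinct sample values need fewer bits to store, but the original samples can no longer be recovered from the coarse ones.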

However, at high compression ratios, lossy codecs can introduce audible artifacts or otherwise alter the audio signal. Deep learning techniques have recently been employed to compensate for these shortcomings and improve compressed files.

A new deep learning technique created by researchers at Sony Computer Science Laboratories (CSL) improves and restores the quality of heavily compressed music and audio recordings. Their approach is built on generative adversarial networks (GANs), a class of machine learning models in which two neural networks "compete" against each other, pushing the model toward more accurate outputs.

The proposed model consists of two networks, a generator (G) and a critic (D). The generator receives an excerpt of an MP3-compressed music signal in the form of a spectrogram, a visual representation of the signal's frequency content over time.
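To make the spectrogram idea concrete, here is a minimal Python sketch that computes a magnitude spectrogram with a naive short-time Fourier transform. The frame size, hop length, Hann window, and test tone are assumptions chosen for readability. They are not the settings used in the paper, and a real pipeline would use an FFT library.

```python
import cmath
import math

def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_size//2) per frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces spectral leakage at the frame edges.
        windowed = [
            x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, x in enumerate(frame)
        ]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for frequency bin k (quadratic cost, fine for a sketch).
            total = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames

# Hypothetical input: a 1 kHz sine tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]

spec = magnitude_spectrogram(signal)
# Find the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), len(spec[0]), peak_hz)
```

Each row of `spec` is one time frame and each column one frequency band, which is the kind of two-dimensional, image-like input that a neural network can process.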

During training, the generator gradually improves in its ability to produce restored versions of the original signal. Meanwhile, the critic learns to tell the original high-quality files from the restored ones. The feedback from the critic is in turn used to improve the generator, so that the music or audio content of the restored file is as faithful as possible to the original.
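The interplay between the two networks can be written down as a pair of training objectives. The form below is only a sketch under stated assumptions: it uses a Wasserstein-style loss, which the term "critic" suggests but this article does not confirm, and it assumes the generator also receives a random noise input z, as hinted by the word "stochastic" in the paper's title. The symbols are introduced here for illustration: x is the spectrogram of the compressed excerpt, y is the spectrogram of the original, and D is assumed to see the compressed input alongside its candidate.

```latex
\begin{aligned}
\mathcal{L}_D &= \mathbb{E}_{x,z}\big[D(G(x, z), x)\big] - \mathbb{E}_{x,y}\big[D(y, x)\big], \\
\mathcal{L}_G &= -\,\mathbb{E}_{x,z}\big[D(G(x, z), x)\big].
\end{aligned}
```

Minimizing the first loss teaches the critic to score originals higher than restorations, while minimizing the second pushes the generator toward outputs the critic scores highly. Practical Wasserstein GANs also add a constraint on the critic, such as a gradient penalty, which is omitted here.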

In a series of tests, the researchers evaluated the performance of their GAN-based architecture. The primary goal was to determine whether it could improve the quality of MP3 inputs and produce restored samples that resemble the original file more closely than those of baseline models. Their findings indicate that, to experienced human listeners, the restored versions of heavily compressed (16 kbit/s and 32 kbit/s) MP3 music often sound better than the compressed MP3 files themselves. At a lower compression ratio (64 kbit/s mono), on the other hand, the team found that their model produced slightly worse results.
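To put these bitrates in perspective, the following Python sketch works out the file size of a song at each of them. The three-minute duration and the uncompressed reference (mono PCM at 44.1 kHz with 16 bits per sample) are assumptions for the arithmetic, not figures from the paper.

```python
def size_megabytes(bitrate_kbps: float, seconds: float) -> float:
    """File size in megabytes (10**6 bytes) of a constant-bitrate stream."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000

# Assumed reference: uncompressed mono PCM, 44.1 kHz, 16 bits per sample.
PCM_MONO_KBPS = 44_100 * 16 / 1000  # 705.6 kbit/s
DURATION_S = 180  # hypothetical three-minute song

for kbps in (16, 32, 64):
    size = size_megabytes(kbps, DURATION_S)
    ratio = PCM_MONO_KBPS / kbps
    print(f"{kbps} kbit/s: {size:.2f} MB, about {ratio:.1f}x smaller than PCM")
```

Under these assumptions, a 16 kbit/s file is roughly 44 times smaller than the uncompressed audio, which helps explain why artifacts are most audible, and restoration most useful, at the lowest bitrates.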

According to the paper, the architecture can generate and add realistic high-frequency content that improves the audio quality of compressed songs. The generated material included percussive elements, guitar sounds, and vocal sibilants.

The team believes the approach could make it possible to significantly reduce the size of MP3 audio files without noticeably affecting quality or introducing imperfections that are obvious to the human ear.

This article was written by Marktechpost staff as a research summary based on the research paper 'Stochastic Restoration of Heavily Compressed Musical Audio using Generative Adversarial Networks'. All credit for this research goes to the researchers on this project.

Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the scope of artificial intelligence applications in various fields. Her passion lies in exploring new advancements in technology and their practical applications.
