Audio Coding Techniques (I)

• Introduction
  - How is audio different from speech?
  - Human auditory system
• Lossless audio coding
  - Reversibility of closed-loop DPCM
  - Inter-channel decorrelation
• Perceptual audio coding
  - Psychoacoustics
  - Perceptual entropy

EE493Q: Digital Speech Processing

Introduction to Audio

• What is audio?
  - “Of or relating to high-fidelity sound reproduction”
• How is audio different from speech?
  - Higher sampling rate (wider bandwidth)
      CD-quality music: 20 kHz bandwidth
      Wideband speech in video conferencing: 7 kHz bandwidth
  - Higher accuracy: 12-16 bits per sample
  - Multi-channel: mono, stereo, 6-channel
  - Requires much more bandwidth: raw data rate ~700 kbps per channel (44.1 kHz × 16 bits = 705.6 kbps)
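The raw data rate above is simply sampling rate × bits per sample × number of channels. Here is a minimal Python sketch of that arithmetic. The function name raw_bitrate_bps is hypothetical, and the 16 kHz / 16-bit setting for wideband speech is an assumed example.

```python
def raw_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate: samples/s x bits/sample x channels."""
    return sample_rate_hz * bits_per_sample * channels

per_channel = raw_bitrate_bps(44_100, 16)        # CD audio, one channel (~700 kbps)
stereo = raw_bitrate_bps(44_100, 16, 2)          # CD audio, two channels
wideband_speech = raw_bitrate_bps(16_000, 16)    # assumed 16 kHz / 16-bit speech
print(per_channel, stereo, wideband_speech)      # 705600 1411200 256000
```

Stereo CD audio therefore needs about 1.4 Mbps before any compression.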

Stereo-Audio

[Figure: left (L) and right (R) channels]

Audio Compression

• How is it different from speech compression?
• Requirements
  - Lossless compression is important in some applications (e.g., archiving and mixing of high-quality recordings in professional environments)
  - Perceptually lossless compression (e.g., MP3 music)
• Principles
  - No physical model exists for audio production
  - Instead, more emphasis is put on the human auditory system, in particular the psychoacoustic masking effect

Sound Quality Requirements


Review Question (I)

Q: Given an audio signal with 16 kHz bandwidth (sampled at 32 kHz, so each of the 32 subbands is 500 Hz wide), its 25th subband (12-12.5 kHz) has an SPL of 10 dB. Can the human ear hear it?
A: No.
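The answer can be checked numerically with a widely used analytic approximation of the threshold in quiet (Terhardt's formula). Treat it as an approximation: it is not necessarily the exact curve behind this question. The function name threshold_in_quiet_db is hypothetical.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula)."""
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

for f in (12_000, 12_500):
    tq = threshold_in_quiet_db(f)
    print(f"{f} Hz: threshold {tq:.1f} dB SPL, 10 dB audible? {10 > tq}")
```

Both band edges come out above 20 dB SPL. A 10 dB component between 12 and 12.5 kHz is therefore below the threshold in quiet and inaudible.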

Review Question (II)

Q: Given a 2 kHz masker tone at 60 dB, if a test tone is played in the 15th critical band (CB) with an SPL of 50 dB, is it masked?
A: No.

Review Question (III)

Q: Consider the echo-hiding scheme for audio watermarking. Do we want to insert echoes before or after the masker?
A: After (post-masking lasts much longer than pre-masking).


Overview

Hans and Schafer, “Lossless compression of digital audio,” IEEE Signal Processing Magazine, July 2001

Note: such an approach does not take inter-channel correlation into account, so it is unlikely to be optimal.

Intra-Channel Decorrelation

[Figure: prediction structure with a rounding step]

Notes:
• The prediction residues e(n) are integers because of the rounding
• A(z) is an autoregressive (AR) model; B(z) is a moving-average (MA) model

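Justification of Reversibility

Recall: quantization is not invertible. How can we achieve lossless compression despite the rounding operation?

Encoder:
$$\hat{x}(n) = \sum_{k=1}^{K} a_k\, x(n-k), \qquad e(n) = x(n) - Q[\hat{x}(n)]$$

Decoder:
$$\hat{x}(n) = \sum_{k=1}^{K} a_k\, x(n-k), \qquad x(n) = e(n) + Q[\hat{x}(n)]$$

Answer: closed-loop DPCM guarantees reversibility. The prediction uses only past samples x(n−k), which the decoder has already reconstructed exactly. The decoder therefore computes the identical rounded prediction Q[x̂(n)] and adds it back to e(n).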
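Here is a minimal Python sketch of the encoder/decoder pair above. It makes these assumptions:
• integer input samples
• round-half-up as Q[·]
• hypothetical predictor coefficients a_k
• zero-valued samples before n = 0

The decoder rebuilds exactly the same Q[x̂(n)] because it only ever predicts from samples it has already decoded.

```python
import math

def quantize(v):
    """Q[.]: round to the nearest integer (halves round up). Any deterministic
    rule works, as long as encoder and decoder use the same one."""
    return math.floor(v + 0.5)

def predict(past, n, a):
    """xhat(n) = sum_{k=1..K} a_k x(n-k); samples before n = 0 are taken as 0."""
    return sum(a[k - 1] * past[n - k] for k in range(1, len(a) + 1) if n - k >= 0)

def encode(x, a):
    """Encoder: e(n) = x(n) - Q[xhat(n)], predicting from the original samples."""
    return [x[n] - quantize(predict(x, n, a)) for n in range(len(x))]

def decode(e, a):
    """Decoder: x(n) = e(n) + Q[xhat(n)], predicting from already-decoded samples."""
    x = []
    for n in range(len(e)):
        x.append(e[n] + quantize(predict(x, n, a)))
    return x

a = [1.8, -0.9]   # hypothetical 2nd-order predictor coefficients a_1, a_2
x = [0, 12, 30, 41, 40, 28, 9, -11, -27, -35]
e = encode(x, a)
assert decode(e, a) == x   # bit-exact reconstruction
print(e)                   # small integer residues to be entropy coded
```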

Inter-Channel Decorrelation

[Figure: left (L) and right (R) channels mapped to s and d]

• Average: s = (L + R)/2
• Difference: d = R − L
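For integer samples, (L + R)/2 is generally not an integer, so a lossless coder needs an integer variant. The sketch below shows one common approach: store s = floor((L + R)/2). Because L + R and R − L always have the same parity, the bit dropped by the floor can be recovered from the lowest bit of d. The function names are hypothetical.

```python
def ms_encode(L, R):
    """Integer mid/side: s = floor((L + R) / 2), d = R - L."""
    return (L + R) >> 1, R - L          # >> 1 floors, also for negative values

def ms_decode(s, d):
    """Exact inverse of ms_encode."""
    total = 2 * s + (d & 1)             # restores L + R (same parity as d)
    R = (total + d) >> 1                # (L + R + R - L) / 2
    L = total - R
    return L, R

print(ms_encode(1000, 1003), ms_decode(*ms_encode(1000, 1003)))
```

When L and R are highly correlated, d is small and cheap to code, while s carries the common content.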

Stereo Recording Techniques*

• X-Y technique: two directional microphones are placed coincidentally, typically at an angle of 90 degrees or more to each other
  - Mono-compatible
• A-B technique: two omnidirectional microphones are spaced apart (from 20 centimeters up to several meters)
  - Adding another microphone at the center turns it into a “Decca Tree”

Audio Coding Techniques (II)

• MP3 Audio Compression
  - Filter bank/Modified DCT
  - Psychoacoustic Models
  - Bit Allocation
• Advanced Audio Coding (AAC) Techniques
  - MPEG-1, 2, 4
  - Sony ATRAC
  - Lucent PAC
  - Dolby AC-3

Introduction

• What does ISO MPEG-1 Audio provide?
  - A transparently lossy audio compression system based on the weaknesses of the human ear
  - Can provide compression by a factor of 6 and retain sound quality
  - One part of a three-part standard that includes audio, video, and audio/video synchronization
• MPEG-2 and MPEG-4 have advanced audio coding (AAC) options
• ITU-T has its own standardized algorithm for wideband speech (audio)

MPEG-1 Audio Features

• PCM sampling rate of 32, 44.1, or 48 kHz
• Four channel modes:
  - Monophonic and dual-monophonic
  - Stereo and joint-stereo
• Three modes (layers, in MPEG-1 parlance):
  - Layer I: computationally cheapest, bit rates > 128 kbps
  - Layer II: bit rate ~128 kbps, used in VCD
  - Layer III: most complicated encoding/decoding, bit rates ~64 kbps, originally intended for streaming audio

MPEG-1 Encoder Architecture

MPEG-1 Encoder Architecture

• Polyphase Filter Bank: transforms PCM samples into frequency-domain signals in 32 subbands
• Psychoacoustic Model: determines which parts of the signal are acoustically irrelevant (inaudible)
• Bit Allocation: allots bits to subbands according to the input from the psychoacoustic calculation
• Frame Creation: generates an MPEG-1 compliant bit stream

What Is a Filter Bank?

[Figure: analysis filter bank and synthesis filter bank]
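MPEG-1 itself uses a 32-band polyphase filter bank, which is much more elaborate. As a minimal stand-in, the two-band Haar filter bank below shows both halves of the picture:
• Analysis: filter into a low band and a high band, then downsample by 2.
• Synthesis: upsample and recombine, with perfect reconstruction.

The function names are illustrative only.

```python
import math

C = 1 / math.sqrt(2)  # normalisation that makes the two-band bank orthonormal

def analysis(x):
    """Split x (even length) into a low band and a high band, each downsampled by 2."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x), 2)]
    low = [C * (a + b) for a, b in pairs]    # local average: lowpass
    high = [C * (a - b) for a, b in pairs]   # local difference: highpass
    return low, high

def synthesis(low, high):
    """Upsample and recombine the two bands; inverts analysis()."""
    y = []
    for l, h in zip(low, high):
        y.extend([C * (l + h), C * (l - h)])
    return y

x = [1.0, 3.0, -2.0, 5.0, 0.5, 0.5]
low, high = analysis(x)
y = synthesis(low, high)
print(max(abs(a - b) for a, b in zip(x, y)))  # floating-point rounding only
```

A constant signal lands entirely in the low band (the high band is zero), which hints at why coding per subband pays off.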

Filter Bank Illustration


Modified Discrete Cosine Transform

[Figure: forward transform and inverse transform equations]
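The MDCT maps 2N windowed samples to N coefficients, X[k] = Σ_{n=0}^{2N−1} x[n] cos[(π/N)(n + 1/2 + N/2)(k + 1/2)]. Frames overlap by 50%, so the transform is critically sampled. The inverse alone is not exact, because each frame's output contains time-domain aliasing. That aliasing cancels when neighbouring frames are overlap-added (TDAC).

The sketch below makes these assumptions:
• a sine window, which satisfies the Princen-Bradley condition w[n]² + w[n+N]² = 1
• a 2/N scaling on the inverse to match that window
• direct O(N²) sums rather than a fast algorithm

The function names are hypothetical.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block):
    """2N time samples -> N coefficients."""
    N = len(block) // 2
    n0 = 0.5 + N / 2
    return [sum(block[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples (2/N scaling matches the sine window)."""
    N = len(coeffs)
    n0 = 0.5 + N / 2
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def mdct_roundtrip(x, N):
    """Sine-windowed MDCT frames with hop N (50% overlap), then windowed IMDCT
    and overlap-add; the aliasing of neighbouring frames cancels (TDAC)."""
    w = sine_window(N)
    pad_end = N + (-len(x)) % N              # zeros so the last frame is complete
    padded = [0.0] * N + list(x) + [0.0] * pad_end
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - N, N):
        frame = [w[n] * padded[start + n] for n in range(2 * N)]
        y = imdct(mdct(frame))               # N coefficients per hop of N samples
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]
    return out[N:N + len(x)]

x = [math.sin(0.3 * i) + 0.05 * i for i in range(21)]
y = mdct_roundtrip(x, N=4)
print(max(abs(a - b) for a, b in zip(x, y)))  # floating-point rounding only
```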

Pre-Echo Distortion


MPEG-1 Psychoacoustic Models

The MPEG-1 standard defines two models:
• Psychoacoustic Model 1:
  - Less computationally expensive
  - Makes some serious compromises in what it assumes a listener cannot hear
• Psychoacoustic Model 2:
  - Provides more features suited for Layer III coding, assuming, of course, increased processing power

Step 1: Spectral Analysis and SPL Normalization

• Convert samples to the frequency domain
  - Use a Hann weighting (window) and then a DFT
  - The Hann window reduces edge artifacts (spectral leakage caused by the finite window size) in the frequency-domain representation
• Model 1 uses a 512-sample (Layer I) or 1024-sample (Layers II and III) window. Model 2 uses a 1024-sample window and two calculations per frame.
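Here is a minimal sketch of Step 1: Hann window, DFT, then levels in dB. The normalization is an assumption. A full-scale (amplitude 1.0) sinusoid centred on a bin is made to read 96 dB, a common convention for 16-bit audio (about 6.02 dB × 16 bits). The standard's exact normalization constants may differ. The function names are hypothetical, and a plain O(N²) DFT is used for clarity instead of an FFT.

```python
import cmath
import math

def hann(N):
    """Periodic Hann window: w(n) = 0.5 - 0.5 cos(2 pi n / N)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def spl_spectrum_db(frame, full_scale_db=96.0):
    """Hann-windowed DFT of one frame (bins 0..N/2) in dB, scaled so that a
    full-scale sinusoid centred on a bin reads full_scale_db."""
    N = len(frame)
    xw = [w * s for w, s in zip(hann(N), frame)]
    ref = N / 4  # |X(k0)| of an amplitude-1 cosine on bin k0 under this window
    spectrum = []
    for k in range(N // 2 + 1):
        X = sum(xw[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        mag = max(abs(X), 1e-12)  # floor avoids log10(0) on empty bins
        spectrum.append(full_scale_db + 20 * math.log10(mag / ref))
    return spectrum

N = 512
tone = [math.cos(2 * math.pi * 10 * n / N) for n in range(N)]  # full-scale tone on bin 10
spl = spl_spectrum_db(tone)
print(round(spl[10], 3))  # 96.0
```

Halving the amplitude lowers that bin by about 6 dB. The window's leakage into bins 9 and 11 sits about 6 dB below the peak, and other bins are essentially empty.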


Step 2: Identification of Tonal and Noise Maskers

• Need to separate the sound into “tone” and “noise” components
• Model 1:
  - Local peaks are tones; the remaining spectrum in each critical band is lumped into a single noise masker at a representative frequency
  - Example: [figure]
• Model 2:
  - Calculate a “tonality” index to determine the likelihood of each spectral point being a tone
  - The index is based on the previous two analysis windows
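Model 1's peak picking is commonly described as follows. A bin is tonal if it is a local maximum and stands at least 7 dB above its neighbours a few bins away, with the neighbourhood widening at higher frequencies. The sketch below follows that description with a fixed, hypothetical neighbourhood (bins k ± 2 and k ± 3) and treats the 7 dB margin as an assumption. The lumping of the remaining spectrum into noise maskers is omitted. The input is a spectrum in dB, such as the output of Step 1.

```python
def find_tonal_maskers(P, neighbours=(2, 3), margin_db=7.0):
    """Return bins k where P[k] is a local maximum and exceeds P[k - d] and
    P[k + d] by at least margin_db for every d in neighbours."""
    reach = max(neighbours)
    tonal = []
    for k in range(reach, len(P) - reach):
        if not (P[k] > P[k - 1] and P[k] > P[k + 1]):
            continue  # not a local peak
        if all(P[k] - P[k - d] >= margin_db and P[k] - P[k + d] >= margin_db
               for d in neighbours):
            tonal.append(k)
    return tonal

P = [40.0] * 32
P[9], P[10], P[11] = 60.0, 70.0, 60.0                               # narrow peak
P[18], P[19], P[20], P[21], P[22] = 45.0, 47.0, 50.0, 47.0, 45.0   # broad bump
print(find_tonal_maskers(P))  # [10]
```

The broad bump at bin 20 is a local maximum but rises only 5 dB above bins 18 and 22. It is therefore left to the noise masker of its critical band.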


Graphic Illustration

[Figure legend: X = tonal masker, O = noise masker]

Three Types of Frequency Masking

• Noise-Masking-Tone (NMT): SMR = 4 dB
• Tone-Masking-Noise (TMN): SMR = 24 dB
• Noise-Masking-Noise (NMN): SMR = 26 dB

[Figure: NMT and TMN masking curves, illustrating the asymmetry]

Step 3: Decimation and Reorganization of Maskers

• “Smear” each signal within its critical band
  - Use either a masking function (Model 1) or a spreading function (Model 2)
• Adjust the calculated threshold by incorporating a “quiet” mask: the masking threshold for each frequency when no other frequencies are present (the threshold in quiet)
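The slide does not spell out the spreading function. As an illustration, the sketch below uses Zwicker's Hz-to-Bark approximation and the Schroeder spreading function, a widely used choice in perceptual coders. It is not necessarily the exact function in either MPEG model.

The hypothetical helper smear() adds the spread contributions of several maskers in the power domain. It yields the spread excitation only. No masking offset (such as the TMN/NMT SMRs above) or threshold in quiet has been applied yet.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker's approximation of the critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def spreading_db(dz):
    """Schroeder spreading function in dB; dz = maskee Bark - masker Bark."""
    t = dz + 0.474
    return 15.81 + 7.5 * t - 17.5 * math.sqrt(1.0 + t * t)

def smear(maskers, f_hz):
    """Spread level (dB) at f_hz from a list of (frequency_hz, level_db) maskers,
    summed in the power domain."""
    z = hz_to_bark(f_hz)
    power = sum(10 ** ((level + spreading_db(z - hz_to_bark(fm))) / 10)
                for fm, level in maskers)
    return 10 * math.log10(power) if power > 0 else float("-inf")

print(round(hz_to_bark(1000.0), 1))                                   # 8.5
print(smear([(1000.0, 60.0)], 1100.0), smear([(1000.0, 60.0)], 900.0))
```

The function is asymmetric: it decays more slowly towards higher frequencies (dz > 0). A 1 kHz masker therefore spreads more energy to 1.1 kHz than to 900 Hz, reflecting the upward spread of masking.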


Step 4: Calculation of Individual Masking Thresholds

• Calculate a masking threshold for each subband in the polyphase filter bank
• Model 1:
  - Selects the minimum of the masking threshold values within each subband
  - Inaccurate at higher frequencies: recall that subbands are linearly distributed, but critical bands are NOT!
• Model 2:
  - If the subband is wider than the critical band: use the minimum masking threshold in the subband
  - If the critical band is wider than the subband: use the average masking threshold in the subband
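Here is a sketch of the Model 1 reduction rule. It assumes the masking threshold is already known in dB on a grid of FFT bins whose count is a multiple of the 32 subbands. The function subband_min_threshold and the example threshold curve are hypothetical. Because the subbands have equal width (750 Hz each at 48 kHz), a single low-frequency subband can span several critical bands, which are only about 100 Hz wide there.

```python
def subband_min_threshold(threshold_db, n_subbands=32):
    """Model-1-style rule: for each of n_subbands equal-width subbands, keep the
    minimum masking threshold (dB) over the FFT bins that fall inside it."""
    if len(threshold_db) % n_subbands:
        raise ValueError("number of bins must be a multiple of n_subbands")
    width = len(threshold_db) // n_subbands
    return [min(threshold_db[i * width:(i + 1) * width]) for i in range(n_subbands)]

# 256 hypothetical bins (e.g. half of a 512-point FFT) -> 8 bins per subband
thr = [30.0 - 0.1 * k if k < 128 else 20.0 + 0.05 * k for k in range(256)]
per_subband = subband_min_threshold(thr)
print(len(per_subband))  # 32
```

Taking the minimum is conservative: the quantization noise is then kept below the lowest threshold anywhere in the subband.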


Graphic Illustration

[Figure: individual masking thresholds of the tonal components and the noise components]

Step 5: Calculating Global Masking Thresholds

• The hard work is done; now we just calculate the signal-to-mask ratio (SMR) per subband:
  - SMR = signal energy / masking threshold
• The calculated SMR values can be used by the audio codec to determine how many bits to spend on each subband
• This is where most compression occurs: if a coefficient is below the masking threshold, it does not need any bits!
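In dB, SMR = 10·log10(signal energy / masking threshold). The minimal sketch below assumes, as in common descriptions of Model 1, that the individual masking thresholds and the threshold in quiet are combined by adding them in the power domain. The helper names and the example levels are hypothetical.

```python
import math

def power_sum_db(levels_db):
    """Combine levels given in dB by adding their powers: 10*log10(sum 10^(L/10))."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))

def smr_db(signal_energy, masking_threshold):
    """SMR = signal energy / masking threshold, expressed in dB."""
    return 10 * math.log10(signal_energy / masking_threshold)

# Hypothetical values at one frequency: threshold in quiet plus two maskers' thresholds
global_thr = power_sum_db([5.0, 30.0, 30.0])   # a little above 33 dB
print(round(global_thr, 2))
print(smr_db(1000.0, 10.0))   # 20.0 dB: this subband needs bits
print(smr_db(1.0, 2.0) < 0)   # negative SMR: masked, no bits needed
```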

Graphic Illustration


Psychoacoustic Model Summary

input audio frame
→ Spectral Analysis and SPL Normalization
→ Identification of Tonal and Noise Maskers
→ Decimation and Reorganization of Maskers
→ Calculation of Individual Masking Thresholds
→ Calculating Global Masking Thresholds
→ Signal-to-Masking Ratios (SMR)

Example: Calculating Signal Energy


Calculating Masking Thresholds


SMR Results


How Is Perceptually Lossless Compression Achieved?

[Figure: coefficients A, B, C and D relative to the masking threshold]

Coefficient A requires bits, but coefficient B does not (it is masked). Question: what about coefficients C and D?
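One way to turn per-subband SMRs into bits is sketched below. It rests on these simplifying assumptions:
• Each extra bit of quantizer resolution buys about 6.02 dB of SNR.
• The budget counts quantizer bits only, not the full frame cost.
• The next bit always goes to the subband with the lowest mask-to-noise ratio, MNR = SNR − SMR.

This greedy loop is similar in spirit to the iterative allocation commonly described for MPEG-1 Layers I/II, but it does not use the standard's tables. The function name allocate_bits and the example SMR values are hypothetical.

```python
def allocate_bits(smr_db, budget, db_per_bit=6.02, max_bits=15):
    """Greedy allocation: give one more bit to the subband with the lowest
    MNR = SNR - SMR until every subband's noise is below its mask (MNR >= 0)
    or the bit budget runs out."""
    bits = [0] * len(smr_db)

    def mnr(i):
        return bits[i] * db_per_bit - smr_db[i]

    while budget > 0:
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=mnr)
        if mnr(worst) >= 0:
            break  # all subbands already below their masking thresholds
        bits[worst] += 1
        budget -= 1
    return bits

smr = [20.0, -5.0, 10.0]                 # hypothetical SMRs (dB) for three subbands
print(allocate_bits(smr, budget=100))    # [4, 0, 2]
print(allocate_bits(smr, budget=3))      # [2, 0, 1]
```

Different subbands are treated differently:
• The masked subband (negative SMR, like coefficient B) never receives bits.
• A subband above its mask, like coefficient A, receives bits until its quantization noise drops below the mask or the budget runs out.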


Summary of Perceptual Audio Coding

• Psychoacoustics
  - Frequency dependency: human ears are most sensitive to 2-4 kHz
  - Masking: a tone can be inaudible because of the presence of another one (close in frequency or time)
  - Asymmetry: noise-masking-tone is easier than tone-masking-noise
• MP3
  - Time-to-frequency transformation by a filter bank or the modified discrete cosine transform
  - Psychoacoustic Model 1 or 2 produces signal-to-masking ratios (SMR) that guide the bit allocation process for each subband
  - Perceptually lossless at bit rates of 64-128 kbps

Headphone Technology

http://www.technologyreview.com/read_article.aspx?id=17642&ch=infotech

Audio Coding Techniques (II)

• MP3 Audio Compression
  - Filter bank/Modified DCT
  - Psychoacoustic Models
  - Bit Allocation
• Beyond technical issues
  - Legal, practical and ethical issues
  - Open discussions

Legal Issues Surrounding MP3

• It's a civil offense, punishable by a fine, to distribute music that you don't own the rights to.
• It's a criminal offense to copy music illegally and then redistribute it for financial gain.
• There is a great deal of uncertainty about how copyright laws should function in the digital world, but the laws themselves are clear.

The Story of the Home-Taping Nightmare

• In the 1970s, tapes became easy to duplicate at home. Nobody was caught as a copyright violator, right?
• The economics of the entire system actually collapsed, and were only revived by the forced introduction of an entirely new audio format, the compact disc (CD).
• A levy on digital audio recording devices and blank digital recording media was created by the Audio Home Recording Act of 1992 to offset lost revenues.

Now Comes the MP3 Nightmare

• The Internet makes downloading and sharing MP3s easy.
• Those mammoth companies are still going to sue every college student they find with MP3s on their site.
• On 9 October 1998, the RIAA filed for a temporary restraining order to prevent San Jose-based Diamond Multimedia from selling its new MP3 player. Called the “Rio,” this player retailed for $199 and was essentially a Walkman for MP3 files.

What Is Legal?

• Most MP3 files on the Internet are illegal, except:
  - Recorded works to which you personally own the copyrights
  - Recorded works in the public domain
• As long as you keep your MP3s in the privacy of your own hard drive and not on the Web, you are very hard to catch and relatively harmless.

MP3: The Transformation of the Recording Industry

• Why didn't the ISO/IEC address the copyright issue when developing MPEG-1?
  - Its members weren't necessarily thinking about the legal ramifications; instead, they focused on creating an effective technology.
• Unsuccessful fight-back strategy by the RIAA: search and destroy
  - Only by systematic, consistent, massive legal action can the record companies possibly hope to win this war.

Watermarking: a Technical Savior?

• The RIAA's Secure Digital Music Initiative (SDMI) came to nothing around 2000
  - Your midterm project might have shown this: none of the existing watermarking techniques is good enough to win this war against piracy
• Ultimately, it's all about a workable revenue model. Once that's been established, perhaps the quality and convenience of the MP3 format can be seen as a boon to the industry instead of a threat.

Ethical Issues

• Copying music for your own private use is cool, but posting that music to a Web site or distributing it in any way is not. By doing this you are robbing people who worked very hard to create the music you like.
• Think about it: is there any difference between downloading an album from the Internet for free and stealing an album from the local store?

Open Discussions

• Who should get paid?
• Is there a better business model?
• Is there any better technical solution than watermarking to fight against piracy?
• Which side will you take? A defender of the RIAA or a hacker?
