Overview & Sound And Audio

  • June 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Overview & Sound And Audio as PDF for free.

More details

  • Words: 10,417
  • Pages: 32

MULTIMEDIA OVERVIEW

Introduction

The word "multimedia" comes from the Latin words multus, which means numerous, and media (singular medium), which means middle or center. More recently the word media started to convey the sense "intermediary". Multimedia therefore means "multiple intermediaries" or "multiple means". The word "multimedia" may be used as a noun (e.g. multimedia is a new technological field) and as an adjective (e.g. a multimedia document). Multimedia in the general sense therefore means "multiple intermediaries" between the source and sink of information, or "multiple means" by which information is stored, transmitted, presented or perceived. The multiple means by which we can perceive information are the following:

• text (e.g. books and newspapers)
• images and graphics (e.g. photographs, magazines, sketches)
• sound (e.g. radio, gramophone records and audio cassettes)
• video and animation (TV, video cassettes and motion pictures)

Multimedia Presentation and Production

A multimedia presentation is basically a "show" whose content is expressed through various media types like text, image, sound, video etc. There can be various objectives of the presentation, for example, to deliver some information about a company's performance (corporate presentation), to enhance the knowledge of students (computer based training), to present the facilities offered by a travel company to tourists (tourist kiosk) and so on. In fact any subject matter where information may be expressed through various visual and audio elements is a potential application area for a multimedia presentation. The end users who execute and watch the presentations are called the target audience. Different types of presentations may have different categories of audience like company employees, students, professionals, factory workers, tourists etc. The presentation is usually played back on a personal computer either from the hard disk or a CD-ROM. Sometimes when the audience consists of a large number of people, the presentation may be projected on a big screen using a projection system.

Before a presentation can be viewed, it has to be created. This is known as multimedia production. The production work is carried out by a team of professionals equipped with the required skill and knowledge. These professionals are called developers or authors and the development work is called authoring. Authoring involves a number of steps which we shall discuss later in detail.

Characteristics of a Multimedia Presentation

There is no general consensus on the exact definition of multimedia. Several definitions have been proposed by authors with varying scopes. According to Fetterman and Gupta, digital multimedia is defined as the integration of up to six media types in an interactive, color computing environment. In contrast, Vaughan proposes a more generic definition: multimedia is any combination of text, graphic art, sound, animation and video delivered by any electronic means. Thus non-interactive and non-digital devices are also included within the purview of multimedia. As per Minoli and Keinath, multimedia is an interdisciplinary, application-oriented technology that capitalizes on the multi-sensory nature of humans. (Humans are multi-sensory as they can communicate with sight, hearing, touch, smell, and taste.)

From these varying definitions, let us try to isolate a fixed set of characteristics that a multimedia presentation should possess. The first characteristic of a multimedia presentation is of course the use of multiple media. Text was the main mode of communication for many years during the pre-multimedia era, and still continues to be one, but it is now more and more supplemented by other media, which often prove more effective. As computer technology progressed, pictures also started being used in addition to text. Displaying pictures required improved display devices and powerful processing capabilities. Pictures are sub-divided into two types: the real-world pictures captured by a camera, called images, and the hand-drawn pictures like sketches and portraits, called graphics. Text, images and graphics are together referred to as static elements, because they do not change over time. With further improvement in technology, time-varying elements like sound and movies started to become widely used. Movies are again divided into two classes: those which depict real-world incidents are called motion pictures (recorded on film) or motion video (recorded on magnetic media), and those which depict artificial or imaginary scenarios are called animation.

The question that can arise in one's mind is: can any combination of media types form a multimedia presentation? The general convention that is followed goes like this: a legitimate multimedia presentation should contain at least one static medium, like text, image or graphics, and at least one time-varying medium, like audio, video or animation.

We would regard another important characteristic to be part of the definition, that of interactivity. Unlike TV and motion pictures, which we view passively in a sequence the program director has meant us to see, the feature that distinguishes multimedia from the rest is that the viewer can directly interact with the media and change the order or sequence of playback. This is a powerful capability which allows us to view only the portions of a show which we are interested in, or navigate from one portion of the show to another as per our preference. Though this capability was already present in a rudimentary form, e.g. we could change channels on a TV, or rewind or forward an audio or video sequence in an audio-cassette or video-cassette player, multimedia takes us a step further. The earlier devices were sequential in nature, meaning that we had to skip a portion of the show to move ahead, but using multimedia capabilities we can move instantly to any specific point within a show. This is also called random access, to distinguish it from sequential access. Secondly, multimedia also allows us to interact with the individual media components of a show, e.g. start a video clip, switch off the background music or select options from an on-screen menu. This was never possible with the earlier devices.

To emphasize this capability of not viewing a presentation in a pre-defined sequence, but instead "jumping" to any scene of the presentation, we use the phrase non-linear presentation to describe a multimedia show, in contrast to a linear presentation such as TV shows and motion pictures. Sequential devices like an audio- or video-cassette player cannot provide us with the capability of random access. Thus multimedia production and playback need to be done on a digital platform like a personal computer. This brings us to the next important characteristic of multimedia: all the media needed to create a multimedia show, like text, image, sound etc., have to be present in a digital form. We shall see later what the digital form actually means and how the media are transformed into this form.

One would tend to believe that any number of media files played on a computer would comprise a multimedia presentation. For example, we can edit a text document in a word processor, play a video clip in the Media Player and listen to an Audio-CD using the CD Player. All these applications collectively, however, would not constitute a multimedia presentation. To clarify that, we introduce another important characteristic of a multimedia presentation called integrity. This means that although there may be several media types present and playing simultaneously, they have to be integrated or be part of a single entity which is the presentation. We should not be able to separate out the various media and control them independently; rather they should be controlled from within the framework of the presentation.

Elements of a multimedia presentation

The first thing that comes to our minds when we hear the phrase "multimedia presentation" is something made out of a collection of text, pictures, sound, animation etc. However multimedia means more than that. Let us first try to understand what a "multimedia presentation" essentially means. Every one of us has heard the phrase "A picture is worth a thousand words". So in the electronic arena too, a picture is now used more often to communicate ideas. The term picture can be used to mean two things. First, a photographic snapshot, called an image. For example you might want to show your friend what your new car looks like. Instead of describing every aspect of the car like shape, size, color, the seats, the lights, the driving mechanism and so on in so many words, you could instead send a snapshot of the car which will speak for itself. A number of snapshots covering the external and internal features would of course be better.

The other type of picture is one that is drawn by hand and is called a graphic. This can be a simple sketch of black lines, like a portrait or an architectural drawing, or as complex as a painting with many shades of color like the Mona Lisa. Intermediate between these lie drawings with a limited number of colors like charts, graphs, logos, maps etc. Nowadays graphics also include pictures drawn using computer software like Paintbrush, Photoshop or CorelDraw. Usually the number of colors in a graphic is relatively smaller than that in a photograph. For example, you might want to explain to a class what the internal combustion engine of your car looks like. In this case, instead of taking your car apart and trying to take a snapshot of the engine, it would be much simpler if you could draw the engine inside out on a piece of paper. You also have the additional flexibility of showing the internal features in as much detail as you like and in a size that suits your context, like greatly magnifying parts for pointing out details.

Other than just showing what the internals and externals of your car look like, you might want others to know what kind of sound the car makes while starting or how good the car's music system sounds. Enter the third type of communication media called audio. People have been communicating for years using audio through telephones, radios or walkie-talkies, but when you need to incorporate audio into your multimedia presentation you have to look at it from a whole new perspective.

Now suppose you are not satisfied just by showing others how your car looks or letting them hear how it sounds. You would like them to get the actual feel of the car by showing them how it picks up speed from 0 to 60 kmph in a few seconds, and perhaps the 100 kmph dash. The best way to do this will be to use a movie camera, get a video clip of your car and let your friends have a look at it. This communication medium is, as you have guessed, the video. Television and motion pictures are some of the well known examples of communicating ideas through this medium.

Again, suppose you are in front of your class with the sketch of the IC engine in your hand, trying to make them understand how the engine actually works, but in vain. From your sketches alone the class fails to comprehend how the piston actually moves or how the fuel ignites. You would wish at that time that the sketches would actually move to illustrate your points. This is in fact a reality nowadays and this medium goes by the name of animation. The concept of animation started with cartoon films, of making your drawings walk, talk and behave like living objects, and nowadays it has become an important medium to convey those ideas where you cannot get video clips. Other instances when animation becomes immensely helpful are when the objects are too small to be photographed (e.g. motions of subatomic particles), or too large (e.g. planetary bodies), or an abstract entity (e.g. electromagnetic waves) – the list goes on.

Advantages of Multimedia Systems

The benefits of multi-sensory systems are manifold. First, the interaction with the computer system may appear more natural and friendly. Second, useful redundancy of information may be possible. For example user attention may be better caught when both visual and audio modes are mixed. Third, complementary information presented together may improve memorization of knowledge. Also the fact that the user interacts with the presentation leads to greater retention of information. It has been remarked that we remember 20% of what we read, 40% of what we see and hear, but more than 80% of what we do. It is therefore clear that if the interactive element of multimedia presentations could be utilized to its fullest potential, they can form important tools for imparting training and education.


Fourth, emotional information is easier to convey. For example a video image of a correspondent added to his voice message may better express his views and sound more convincing. Lastly, multi-sensory systems may be of invaluable benefit to users with special needs.

Hardware and Software Requirements

Most PCs are IBM compatible, but what makes them multimedia enabled? To answer this question the Multimedia Marketing Council (MMC) has come out with specifications for multimedia PCs which can assure quality playback of multimedia productions. These are known as MPC (multimedia PC) specifications. The MMC includes a number of companies like Creative Labs Inc, Fujitsu, Media Vision, Microsoft Corp, NEC Technologies, Olivetti, Philips Consumer Electronics Co. and Zenith Data Systems.

Hardware and software requirements of a multimedia PC can be categorized into two classes. Multimedia playback usually requires a smaller amount of resources, those which are sufficient for viewing an existing multimedia presentation. Multimedia production generally requires greater and more powerful resources and should fulfill all requirements for designing and developing a multimedia presentation. In both cases, however, storage and processing requirements are much greater than for non-multimedia PCs, because media components like images, audio and video occupy large file sizes and require powerful processors for manipulation and presentation, as compared to plain text files.

Multimedia Playback

The processor should be at least Pentium class and a minimum of 8 MB RAM is required, although 32 MB is recommended for smoother playback. The hard disk should be at least 540 MB in capacity, have 15 ms access time and should be able to provide 1.5 MB/sec sustained throughput. The monitor and video display adapter should conform to SVGA standards and should be able to support 800 x 600 display mode with true color (16.7 million colors). The VRAM amount should be at least 4 MB. The PC should be equipped with a CD-ROM drive having a speed of at least 4X, but higher speeds like 36X are recommended. Most multimedia presentations are delivered and executed from a CD-ROM disc. To be able to hear sounds, the PC should have a sound card with attached speakers. Input devices like a standard 101-key keyboard and a two-button mouse should be present.


Multimedia PC system software should be compatible with Windows 95 or higher, with standard software for playback of media files in standard formats (e.g. Windows Media Player). In some cases application programs like web browsers (e.g. Internet Explorer), media players (e.g. QuickTime Movie Player, Real Media Player) and document readers (e.g. PowerPoint Viewer, Adobe Acrobat Reader) might be required to display additional content.

Multimedia Production

For production work the processor should be at least Pentium II class or higher. Memory should be at least 128 MB, with 256 MB recommended. Multimedia production work requires a huge amount of disk space, so typical requirements would be around 10 GB with 40 GB recommended. Like playback systems, the monitor and video display adapter should conform to SVGA standards and should be able to support 800 x 600 display mode with true color. The VRAM amount should be at least 4 MB with 8 MB recommended. The PC should be equipped with a CD-ROM drive having a speed of at least 4X, but higher speeds like 36X are recommended. In addition the PC should also have a CD-Writer, as most multimedia presentations are delivered on a CD-ROM disc.

For playback of audio components the PC should have a sound card and attached speakers. In addition, for recording of sound like human speech or environmental sounds, one or more microphones would be required, capable of being connected to the sound card. Standard input devices like the keyboard and mouse would be required. Additional input devices and accessories would be required for digitizing media components. A scanner would be required for converting paper images into electronic form, while a video capture card would be required for converting analog video from a video cassette to digital form.

Multimedia PC system software should be compatible with Windows 95 or higher, with standard software for playback of media files in standard formats. Additional software would be required for editing the media components and authoring the presentation. Editing software is used to manipulate media components to suit the developer's requirements. Examples are Adobe Photoshop and CorelDraw (for image editing), CoolEdit and SoundForge (for audio editing), Adobe Premiere and Adobe AfterEffects (for video editing), and Macromedia Flash and Kinetix 3D Studio Max (for creating animation). Authoring software is used to integrate all the edited media into a single presentation and build navigational pathways for accessing the media. Examples include Macromedia Director and Asymetrix ToolBook.

To display web content, web browsers would be required. Examples are Microsoft Internet Explorer and Netscape Navigator. To create web content, HTML and JavaScript editors might be required. Examples include Microsoft FrontPage and Macromedia DreamWeaver.

Uses of Multimedia

Some of the important uses of multimedia are mentioned below:

For home : Games for kids, interactive encyclopedias, story-telling, cartoons etc.

For school : Learning packages, simulation of lab experiments (especially experiments that cannot be done in a school lab).

Industrial training : Computer based training (CBT) packages for employees, both technical and marketing.

Kiosks : Information accessed through a touch screen and viewed on a monitor. Can be a multi-lingual product catalog with provisions for placing orders. Can also be used for dispensing important information, e.g. train timings at a railway station.

Corporate presentations : For presenting salient features of a company and its products/services to clients, prospective customers, suppliers, retailers etc.

Business : Items difficult to stock, like glass utensils, industrial equipment etc., can be displayed to prospects by company sales people. Real estate agents can display interiors and exteriors of buildings along with necessary information like dimensions and price. Architects and engineers can transform blueprints into buildings and products.

Tourism : Tour companies can market packaged tours by showing prospects glimpses of the places they would like to visit, details on lodging, food, special attractions etc.

Video-conferencing : Allows real-time interaction between people who need to work together but cannot be in the same place at the same time. This helps organizations to cut travel costs and allows working groups to span wider geographical areas.

Distance learning : Expertise of the best instructors can be delivered to thousands of students over the entire globe in less time. Also allows the students to learn at their own pace.


Digitization Concepts

It has already been mentioned that multimedia involves digital representations of the media components and requires digital computers for playback. Let us try to understand what this really means.

Analog Representations

An analog quantity is a physical value that varies continuously over space and/or time. It can be described by mathematical functions of the type s = f(t), s = f(x, y, z) or s = f(x, y, z, t). Physical phenomena that stimulate human senses, like light and sound, can be thought of as continuous waves of energy in space. Continuity implies that there is no gap in the energy stream at any point. These phenomena can be measured by instruments which transform the captured physical variable into another space/time dependent quantity called a signal. If the signal is also continuous we say that it is analogous to the measured variable. The instruments are called sensors and the signals usually take the form of electrical signals. For example, a microphone converts the environmental sound energy into electrical signals and a solar cell converts the radiant energy (light and heat) from the sun into electrical signals.

Analog signals have two essential properties:

• The signal delivered by the capturing instrument may take any possible value within the limits of the instrument. Thus the value can be expressed by any real number in the available range. Analog signals are thus said to be amplitude continuous.
• The value of the analog signal can be determined for any possible value of the time or space variable. Analog signals are therefore also said to be time or space continuous.

Digital Representations

In contrast to analog signals, digital signals are not continuous over space or time. They are discrete in nature, which means that they exist or have values only at certain points in space or instants in time, but not at other points or instants. To use a personal computer to create multimedia presentations, all media components have to be converted to the digital form, because that is the form the computer recognizes and can work with.

Analog to Digital Conversions

The transformation from analog to digital form requires three successive steps: sampling, quantization and code-word generation.

Sampling

Sampling involves examining the values of the continuous analog signal at certain points in time, thereby isolating a discrete set of values from the continuous range of values. Sampling is usually done at periodic time or space intervals. For time-dependent quantities like sound, sampling is done at specific intervals of time and is said to create time-discretization of the signal. For time-independent quantities like a static image, sampling is done at regular space intervals (i.e. along the length and breadth of the image) and is said to create space-discretization of the signal.

The figure illustrates the sampling process. For every clock pulse the instantaneous value of the analog waveform is read, thus yielding a series of sampled values. The sampling clock frequency is referred to as the sampling rate. For a static image, the sampling rate would be measured in the spatial domain, i.e. along the length and width of the image area, and would actually denote the pixel resolution, while for a time-varying medium like sound it denotes how many times per second the analog wave is sampled, and is measured in Hertz. Since the input analog signal is continuous, its value changes over space or time. The A/D conversion process takes a finite time to complete, hence the input analog signal must be held constant during the conversion process to avoid conversion problems. This is done by a sample-and-hold circuit.
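
As a rough illustration of sampling (a sketch, not part of the original text; the 1 kHz tone and 8 kHz sampling rate are arbitrary assumed values), the following Python fragment reads the instantaneous value of a continuous waveform at every clock pulse:

```python
import math

f_signal = 1000      # frequency of the analog tone in Hz (assumed for illustration)
f_sample = 8000      # sampling rate in Hz, i.e. clock pulses per second

def analog(t):
    """Stand-in for the continuous analog waveform s = f(t)."""
    return math.sin(2 * math.pi * f_signal * t)

# One sample is taken at every clock pulse, i.e. at t = n / f_sample
samples = [analog(n / f_sample) for n in range(8)]   # the first 8 sampled values
print(samples)
```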


Quantization

This process consists of converting a sampled signal into a signal which can take only a limited number of values. Quantization is also called amplitude-discretization. To illustrate this process, consider an analog electrical signal whose value varies in a continuous way between 0 mV and +255 mV. Sampling of the signal creates a set of discrete values, which can have any value within the specified range, say a thousand different values. For quantizing the signal we need to fix the total number of values permissible. Suppose we decide that we will consider only 256 values that adequately represent the total range of sampled values, i.e. from the minimum to the maximum. This enables us to create a binary representation of each of the considered values. We can now assign a fixed number of bits to represent the 256 values considered. Since we know that n binary digits can give rise to 2^n numbers, a total of 8 bits would be sufficient to represent the 256 values. The number of bits is referred to as the bit-depth of the quantized signal. (Incidentally, we could have considered all the thousand values, but that would have required a larger number of bits and correspondingly more computing resources; more on that later.)

Code-word Generation

This process consists of associating a group of binary digits, called a code-word, with every quantized value. In the above example, the 256 permissible values will be allocated codes from 00000000 for the minimum value to 11111111 for the maximum value. Each binary value actually represents the amplitude of the original analog signal at a particular point or instant, but between two such points the amplitude value is lost. This explains how a continuous signal is converted into a discrete signal. The whole process of sampling followed by quantization and code-word generation is called digitization. The result is a sequence of values coded in binary format. Physically, an analog signal is digitized by passing it through an electronic chip called an Analog-to-Digital Converter (ADC).

Digital to Analog Conversion

The digital form of representation is useful inside a computer for storage and manipulation. Since humans only react to physical sensory stimuli, playback of the stored media requires a conversion back to the analog form. Our eyes and ears can only sense the physical light and sound energies, which are analog in nature, not the digital quantities stored inside a computer. Hence the discrete set of binary values needs to be converted back to the analog form during playback. For example, a digital audio file needs to be converted to the analog form and played back using a speaker for it to be perceived by the human ear. A reverse process to that explained above is followed for this conversion. Physically this is done by passing the digital signal through another electronic chip called a Digital-to-Analog Converter (DAC).
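
The three steps (sampling, quantization, code-word generation) and the reverse D/A conversion can be sketched in Python as follows. The 8-bit depth and the 0 to 255 mV range follow the example above; everything else is an illustrative assumption, not a real converter design:

```python
def adc(value_mv, bits=8, v_min=0.0, v_max=255.0):
    """Quantize one sampled value (in mV) and return its n-bit code-word."""
    levels = 2 ** bits                        # 256 permissible values for 8 bits
    step = (v_max - v_min) / levels           # height of one quantization step
    code = int((value_mv - v_min) / step)     # interval the sampled value falls in
    code = max(0, min(levels - 1, code))      # clamp to the available range
    return format(code, f'0{bits}b')          # code-word, e.g. '10100111'

def dac(code_word, bits=8, v_min=0.0, v_max=255.0):
    """Convert a code-word back to an (approximate) analog value in mV."""
    step = (v_max - v_min) / (2 ** bits)
    return v_min + int(code_word, 2) * step

sample = 166.4                                # one sampled value in mV
word = adc(sample)
print(word, dac(word))    # the reconstruction differs by at most the quantization error
```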

Relation between Sampling Rate and Bit Depth

As we increase the sampling rate we get more information about the analog wave, so the resultant digital wave is a more accurate representation of the analog wave. However, increasing the sampling rate also implies that we have more data to store and thus require more space. In terms of resources this implies more disk space and RAM, and hence greater cost. Increasing the number of samples per second also means we require more numbers to represent them; hence we require a greater bit depth. If we use a lower bit depth than is required we will not be able to represent all the sample values, and the advantage of using a higher sampling rate will be lost. On the other hand, if we use a lower sampling rate we will get a smaller amount of information regarding the analog wave, so the digital sound will not be an accurate representation of the analog wave. Now if we use a high bit depth, we will have provision for representing a large number of samples per second. Because of the larger number of bits the size of the sound file will be quite large, but because of the low number of samples the quality will be degraded as compared to the original analog wave.

Quantization Error

No matter what the choice of bit depth, digitization can never perfectly encode a continuous analog signal. An analog waveform has an infinite number of amplitude values, but a quantizer has a finite number of intervals. All the analog values between two intervals can only be represented by the single number assigned to that interval. Thus the quantized value is only an approximation of the actual one. For example, suppose the binary number 101000 corresponds to the analog value of 1.4 V, 101001 corresponds to 1.5 V, and the analog value at sample time is 1.45 V. Because no intermediate code between 101000 and 101001 is available, the quantizer must round up to 101001 or down to 101000. Either way there will be an error with a magnitude of one-half of an interval. Quantization error (e) is the difference between the actual analog value at sample time and the quantized value, as shown below. Let us consider an analog waveform which is sampled at points a, b and c, the corresponding sample values being A, B and C. Considering the portion between A and B, the actual value of the signal at some point x after a is xX, but the value of the digital output is fixed at xm. Thus there is an error equal to the length mX. Similarly at point y, the actual value of the analog signal is yY but the digital output is fixed at yn; thus the error increases to nY. This continues for all points between a and b until, just before b, for an actual value of almost bB we still get a quantized value fixed at bp. The error is maximum at this point and equals pB, which is almost equal to the height of one step. This maximum error is the quantization error, denoted by e, and is equal to one step size of the digital output.

Because of quantization error there is always a distortion of the wave when it is represented digitally. This distortion effect is physically manifested as noise. Noise is any unwanted signal that creeps in along with the required signal. To eliminate noise fully during digitization we would have to sample at an infinite rate, which is practically impossible. Hence we must find other ways to reduce the effects of noise. Other than quantization error, noise may also percolate in from the environment, as well as from the electrical equipment used for digitization. In characterizing digital hardware performance we can determine the ratio of the maximum expressible signal amplitude to the maximum noise amplitude. This determines the S/N (signal to noise) ratio of the system. It can be shown that the S/N ratio expressed in decibels varies as 6 times the bit-depth. Thus increasing the bit-depth during sampling leads to a reduction of noise. To remove environmental noise we need to use good quality microphones and sound-proof recording studios. Noise generated from electrical wires may be reduced by proper shielding and earthing of the cables. After digitization, noise can also be removed by using sound editing software. These employ noise filters to identify and selectively remove noise from the digital audio file.

Each sample value needs to be held constant by a hold circuit until the next sample value is obtained. Thus the maximum difference between the sample value and the actual value of the analog wave is equal to the height of one step. If V is the peak-to-peak height of the wave and n is the bit depth, then the number of steps is 2^n. The height of each step is V / 2^n, which is equal to the quantization error e. Thus we have the relation:

e = V / 2^n

Signal to Noise Ratio

Expressed in decibels, the SNR is seen to be directly proportional to the bit-depth:

SNR (in dB) ≈ 6 x n, where n is the bit-depth.

This implies that if the bit-depth is increased by 1 during digitization, the signal to noise ratio increases by about 6 dB.
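
A small worked example of the two relations above (a sketch; the 2 V peak-to-peak figure is an assumed value, and 6.02 dB per bit is used as the usual, slightly more precise constant):

```python
V = 2.0                      # assumed peak-to-peak height of the wave, in volts
for n in (8, 16, 24):        # bit depths
    steps = 2 ** n           # number of quantization steps
    e = V / steps            # quantization error = height of one step
    snr = 6.02 * n           # signal-to-noise ratio in dB (about 6 dB per bit)
    print(f"{n}-bit: {steps} steps, e = {e:.8f} V, SNR = {snr:.1f} dB")
```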

Importance of Digital Representation

The key advantage of the digital representation lies in the universality of representation. Since any medium, be it text or image or sound, is coded in a unique form which ultimately results in a sequence of bits, all kinds of information can be handled in the same way. The following advantages are also evident:

Storage : The same digital storage devices, like memory chips, hard disks, floppies and CD-ROMs, can be used for all media.

Transmission : Any single communication network capable of supporting digital transmission has the potential to transmit any multimedia information. Digital signals are less sensitive to noise than analog signals. Attenuation of digital signals is lower. Error detection and correction can be implemented. Encryption of the information is possible to maintain confidentiality.

Processing : Powerful software programs can be used to analyze, modify, alter and manipulate multimedia data in a variety of ways. This is probably where the potential is the highest. The quality of the information may also be improved by removal of noise and errors. This capability enables us to digitally restore old photographs or noisy audio recordings.

Drawbacks of Digital Representation

The major drawback lies in coding distortion. The process of first sampling and then quantizing and coding the sampled values introduces distortions. Also, since a continuous signal is broken into a discrete form, a part of the signal is actually lost and cannot be recovered. As a result, the signal generated after digital-to-analog conversion and presented to the end user has little chance of being completely identical to the original signal. Another consequence is the large digital storage capacity required for storing image, sound and video. Each minute of CD-quality stereo sound requires about 10 MB of storage and each minute of full-screen digital video fills up over 1 GB of storage space. Fortunately, compression algorithms have been developed to alleviate the problem to a certain extent.
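
The storage figure quoted above for CD-quality sound can be checked with a quick back-of-the-envelope calculation using the usual CD-audio parameters (44.1 kHz, 16 bits, stereo); this is only an illustrative sketch:

```python
sample_rate = 44_100          # samples per second, per channel
bit_depth   = 16              # bits per sample
channels    = 2               # stereo

bit_rate = sample_rate * bit_depth * channels      # bits per second
per_minute = bit_rate * 60 / 8                     # bytes per minute

print(bit_rate / 1_000_000)        # ~1.41 Mbit/s of uncompressed audio
print(per_minute / (1024 * 1024))  # ~10.1 MB for each minute of CD-quality stereo
```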

SOUND

PRINCIPLES OF ACOUSTICS


Nature of Sound

Sound is a form of energy, similar to heat and light. Sound is generated by vibrating objects and can flow through a material medium from one place to another. During generation, the kinetic energy of the vibrating body is converted to sound energy. Acoustic energy flowing outwards from its point of generation can be compared to a wave spreading over the surface of water. When an object starts vibrating or oscillating rapidly, a part of its kinetic energy is imparted to the layer of the medium in contact with the object, e.g. the air surrounding a bell. The particles of the medium, on receiving the energy, start vibrating on their own and in turn impart a portion of their energy to the next layer of air particles, which also starts vibrating. This process continues, thereby propagating the acoustic energy throughout the medium. When it reaches our ears it sets the ear-drums into a similar kind of vibration and our brain recognizes this as sound.

Sound Waves

As the sound energy flows through the material medium, it sets the layers of the medium into oscillatory motion. This creates alternate regions of compression and expansion. We represent this pictorially in the form of a wave with alternate positive and negative peaks on either side of a horizontal axis.

Spatial and Temporal waves

Waves can be of two types: spatial and temporal waves.

Spatial Waves represent the vibrating states of all particles in the path of a wave at an instant of time. The horizontal axis represents the distance of the particles. The particles at points O and D have the same state of motion at that instant and are said to be in the Same Phase. The distance of separation between points in the same phase, i.e. the length of the wave between O and D, is called the Wavelength.

Temporal Waves represent the state of a single particle in the path of a wave over a period of time. The horizontal axis represents the time over which the wave flows. The state of the particle is the same at instants O and D, and the particle is said to have undergone one Complete Cycle or Oscillation. The time elapsed between two instants at which the particle is in the same phase, i.e. the interval between instants O and D, is called the Time Period of the wave.

Fundamental Characteristics

A sound wave has three fundamental characteristics.

The Amplitude of a wave is the maximum displacement of a particle in the path of a wave and is the peak height of the wave. The physical manifestation of amplitude is the intensity of energy of the wave. For sound waves this corresponds to the loudness of sound.


The second characteristic is Frequency. This measures the number of vibrations of a particle in the path of a wave in one second. The physical manifestation of the frequency of a sound wave is the pitch of sound. A high-pitched sound has a higher frequency than a dull sound. Frequency is measured in a unit called Hertz, denoted by Hz. A sound of 1 Hz is produced by an object vibrating at the rate of 1 vibration per second. The total range of human hearing lies between 20 Hz at the lower end and 20,000 Hz (or 20 KHz) at the higher end.

The third characteristic is the Waveform. This is the actual shape of the wave when represented pictorially. The physical manifestation is the quality or timbre of sound. This helps us to distinguish between sounds coming from different instruments like guitar and violin.

Musical Sound and Noise

Sounds pleasant to hear are called Musical and those unpleasant to our ears are called Noise. Though the distinction is quite subjective, musical sounds normally originate from periodic or regular vibrations while noise generally originates from irregular or non-periodic vibrations. Musical sounds most commonly originate from vibrating strings, as in guitars and violins, vibrating plates, as in drums and tabla, and vibrating air columns, as in pipes and horns. In all these cases periodic vibration is responsible for the musical sensation.

Tone and Note

A Tone is a sound having a single frequency. A tone can be represented pictorially by a wavy curve called a Sinusoidal wave. A tone is produced when a tuning fork is struck with a padded hammer. The kind of vibration associated with the generation of a tone is called Simple Harmonic Motion. This is the motion executed, for example, by a spring-loaded weight, or by the projection of a point moving around the circumference of a circle at constant speed.

In daily life we do not hear single-frequency tones. The sounds we normally hear are a composite mixture of various tones of varying amplitudes and frequencies. Such a composite sound is called a Note. The amplitudes and frequencies of the constituent tones determine the resultant waveform of the note.
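
The difference between a tone and a note can be sketched numerically: a tone is a single sinusoid, while a note is a mixture of several. The frequencies and amplitudes below are arbitrary illustrative values, not taken from the text:

```python
import math

def tone(freq, amp, t):
    """A single-frequency tone: a sinusoid of the given amplitude and frequency."""
    return amp * math.sin(2 * math.pi * freq * t)

def note(t):
    """A composite note: a mixture of tones of varying amplitudes and frequencies."""
    partials = [(440, 1.0), (880, 0.5), (1320, 0.25)]   # (frequency in Hz, amplitude)
    return sum(tone(f, a, t) for f, a in partials)

print([round(note(n / 8000), 3) for n in range(5)])     # a few values of the resulting waveform
```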

Psycho-Acoustics and Decibels

In harnessing sound for various musical instruments as well as for multimedia applications, the effects of sound on human hearing and the various factors involved need to be analyzed. Psycho-acoustics is the branch of acoustics which deals with these effects.

A unit for measuring the loudness of sound as perceived by the human ear is the Decibel. It involves comparing the intensity of a sound with the faintest sound audible to the human ear and expressing the ratio as a logarithmic value. The full range of human hearing is 120 decibels. Logarithms are designed for talking about numbers of greatly different magnitude, such as 56 vs. 7.2 billion. The most difficult problem is getting the number of zeros right. We can use scientific notation like 5.6 x 10^1 and 7.2 x 10^9, but these are awkward to deal with. For convenience we find the ratio between the two numbers and convert it to a logarithm. This gives us a number like 8.1. To avoid the decimal we multiply the number by 10. If we measured one value as 56 HP (horsepower, a measure of power) and another as 7.2 billion HP, we say that one is 81 dB greater than the other.

Power in dB = 10 log10 (power A / power B)
or
Power in dB = 20 log10 (amplitude A / amplitude B)

The usefulness of this becomes apparent when we think how our ear perceives loudness. The softest audible sound has a power of about 10^-12 watt/sq. meter and the threshold of pain is about 1 watt/sq. meter, giving a total range of 120 dB. Thus when we speak of a 60 dB sound we actually mean:

60 dB = 10 * log10 (Energy content of the measured sound / Energy content of the softest audible sound)

Thus, Energy content of measured sound = 10^6 * (Energy content of softest audible sound)

Secondly, our judgement of relative levels of loudness is somewhat logarithmic. If a sound has 10 times more power than another, we hear it twice as loud. (The logarithm of 10 is equal to 1.)

Most studies of psycho-acoustics deal with the sensitivity and accuracy of human hearing. The human ear can respond to a wide range of amplitudes. People's ability to judge pitch is quite variable; most subjects studied could match pitches to within 3%. Recognition in terms of timbre is not very well studied, but once we have learned to identify a particular timbre, recognition is possible even if loudness and pitch are varied. We are able to perceive the direction of a sound source with some accuracy. Left, right and height information is determined by the difference of the sound received in each ear. We can also understand whether the sound source is moving away from or towards us.

Masking

One of the most important findings from the study of psycho-acoustics is a phenomenon called Masking, which has wielded profound influence in later years of digital processing of sound. Masking occurs due to the limitations of the human ear in perceiving multiple sources of sound simultaneously. When a large number of sound waves of similar frequencies are present in the air at the same time, higher volume or higher intensity sounds apparently predominate over lower intensity sounds and 'mask' the latter out, i.e. make them inaudible. Thus even though the masked-out sound actually exists, we are unable to perceive it as a separate source of sound. The higher intensity sound is called the Masker and the lower intensity sound is called the Masked. This phenomenon is effective over a limited range of frequencies beyond which masking may not be perceptible. This range of frequencies is called the Critical Band. Though masking occurs as a result of limitations of the human ear, modern sound engineers have turned it to advantage in designing digital sound compressors. Software for compressing sound files utilizes the masking phenomenon to throw away irrelevant information from sound files in order to reduce their size and storage space.
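
The decibel arithmetic above (the 56 HP versus 7.2 billion HP comparison, the 120 dB hearing range, and the 60 dB sound) can be checked with a few lines of Python; this is only an illustration of the formulas, not part of the original text:

```python
import math

def db_from_power(p_a, p_b):
    """Power ratio expressed in decibels: 10 * log10(power A / power B)."""
    return 10 * math.log10(p_a / p_b)

print(round(db_from_power(7.2e9, 56)))   # ~81 dB, as in the horsepower example
print(db_from_power(1.0, 1e-12))         # 120 dB: threshold of pain vs. softest audible sound
print(db_from_power(10 ** 6, 1))         # 60 dB: a sound with 10^6 times the softest energy
```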

BASIC SOUND SYSTEMS

An elementary sound system consists of 3 main components: a microphone, an amplifier and a loudspeaker. A microphone is a device for converting sound energy to electrical energy. An amplifier is a device which boosts the electrical signals leaving the microphone in order to drive the loudspeakers. A loudspeaker is a device which converts electrical energy back into sound energy.

Microphones

Sound exists as patterns of air pressure. The microphone changes this information into patterns of electrical current. There are several characteristics that classify microphones. One classification is based on how the microphone responds to the physical properties of a sound wave (like pressure, gradient etc.). Another classification is based on the directional properties of the microphone. A third classification is based on the mechanism by which the microphone creates an electrical signal.

Moving Coil Microphones

In a moving-coil or dynamic microphone, sound waves cause movement of a thin metallic diaphragm and an attached coil of wire. A magnet produces a magnetic field which surrounds the coil. As sound impinges on the diaphragm attached to the coil, it causes movement of the coil within the magnetic field. A current is therefore produced, proportional to the intensity of the sound hitting the diaphragm.

Condenser Microphones

Often called the capacitor microphone, here the diaphragm is actually one plate of a capacitor. The incident sound on the diaphragm moves the plate, thereby changing the capacitance and generating a voltage. In a condenser microphone the diaphragm is mounted close to, but not touching, a rigid backplate. A battery is connected to both pieces of metal, which produces an electrical potential or charge between them. The amount of charge is determined by the voltage of the battery, the area of the diaphragm and backplate, and the distance between the two. This distance changes as the diaphragm moves in response to sound. When the distance changes, current flows in the wire as the battery maintains the correct charge. The amount of current is proportional to the displacement of the diaphragm. A common variant of this design uses a material, usually a kind of plastic, with a permanent charge on it. This is called an electret microphone.

Pressure Microphones

A pressure microphone consists of a pressure-sensitive element contained in an enclosure open to the air on one side. Sound waves create a pressure at the opening regardless of their direction of origin, which moves the diaphragm to generate current. The polar plot of a microphone graphs the output of the microphone when equal sound levels are input at various angles around the microphone. The polar plot for a pressure microphone is a circle, so desired sound and noise are picked up equally from all directions. Thus it is also called an omni-directional microphone. Example: Audio-Technica omnidirectional microphone.


Gradient Microphones

Here the diaphragm is open to the air on both sides, so that the net force on it is proportional to the pressure difference. A sound impinging upon the front of the microphone creates a pressure at the front opening; a short time later it enters the back of the microphone and creates a differential pressure across the diaphragm. Sounds from the side create identical pressures on both sides of the diaphragm and produce no resultant displacement. The polar response resembles a figure 8: it has maximum response for sound from the openings and minimum response for sound incident from the sides. Also known as a bi-directional microphone.

Cardioid Microphones

Structurally similar to a bi-directional microphone except for a layer of resistive material (cloth or foam) on one side. The resistive material slows down the sound pressure travelling from the back opening to the diaphragm and is optimized so that, for sound arriving from the rear, the time taken to reach the diaphragm is the same from the rear and the front. The pressures being equal at the front and rear, there is no movement of the diaphragm, resulting in cancellation of sound from the rear and partial cancellation at the sides, producing a heart-shaped polar plot. Also known as a uni-directional microphone.


Amplifiers and Loudspeakers

An amplifier is a device in which a varying input signal controls a flow of energy to produce an output signal that varies in the same way but has a larger amplitude. The input signal may be a current, a voltage, a mechanical motion, or any other signal, and the output signal is usually of the same nature. The ratio of the output voltage to the input voltage is called the voltage gain. The most common types of amplifiers are electronic and use a series of transistors as their principal components. In most cases, the transistors are incorporated into integrated circuit chips.

A loudspeaker converts electrical energy back to acoustic energy. The electrical current is made to flow in a coil attached to a paper cone that is free to oscillate in a magnetic field. The attractive and repulsive forces generated vibrate the loudspeaker cone, thereby producing sound. An important criterion is an even response for all frequencies. However, the requirements for good high-frequency and good low-frequency response conflict with each other. Thus there are separate units called the woofer, midrange and tweeter for reproducing sound of different frequencies:

Woofer : 20 Hz to 400 Hz
Midrange : 400 Hz to 4 KHz
Tweeter : 4 KHz to 20 KHz

DIGITAL REPRESENTATION OF SOUND

Introduction

Since the early days of sound recording, it was customary to process sound by converting it into electrical signals. Thus the earliest playback systems were either gramophone records or magnetic tapes. In gramophone records the movement of the stylus over the grooves of the record was converted into electrical signals which were used to drive a loudspeaker. In the case of magnetic tapes, the magnetic pulses on the ferrous coating of the tape were used to generate signals in the playback head, which were then amplified for driving a speaker system. To use sound in multimedia applications, it has to be recorded and processed in a personal computer. For that, sound needs to be converted to digital format. The process of conversion is called sampling and is done in Analog-to-Digital (AD) and Digital-to-Analog (DA) converters.

Analog vs. Digital

The term Analog refers to a continuously time-varying signal. When a physical quantity like sound is represented as a time-varying electrical signal, we say that the sound is recorded in analog format. The electrical signal can also be represented as a wave, similar to the way we represent a sound wave. Each point on the electrical wave will be proportional to the amplitude of the sound wave at that point, but measured in units like volts or millivolts instead of decibels or displacement of particles.

When data is stored on a computer it must be in the form of discrete numbers converted to binary format, i.e. as bits. This is called digital format. For text to be stored on the hard disk, each character is first converted to a number using a table called the ASCII table (American Standard Code for Information Interchange), e.g. A is converted to 65, B to 66, C to 67 etc. These numbers are then converted to digital format using 7 bits as per the ASCII specs, e.g. 65 is converted to 1000001, 66 to 1000010, 67 to 1000011 etc. These bits are then stored as magnetic pulses on the hard disk of the computer, i.e. 1 signifies presence of a magnetic pulse and 0 signifies absence of the pulse.

Sampling

For converting sound into digital format, the sound wave is sampled or examined at specific points and the values of the wave - either air pressure or electrical signals - are stored as a set of numbers in an appropriate scale. This is called sampling. Since sampling provides us with values of the signal at specific points, the more we sample the more values we get and the better we can represent the original signal. The sampling rate is the number of times the signal is sampled each second. The second phase of the sampling process is the conversion of the sampled values into binary numbers. Each decimal number is converted to a string of bits, i.e. 1s and 0s. The number of bits used to represent these numbers is called the sampling resolution.

Pulse Code Modulation (PCM)

The process of converting an analog signal into a digital signal in this way is called Pulse Code Modulation or PCM and involves sampling. As the sampling rate is increased we obtain more data about the input signal and the output signal becomes a closer approximation of the input. To accommodate the larger number of values due to the increased rate, the resolution must be increased by increasing the number of bits. Generally speaking, if we use n bits we can represent 2^n distinct values, i.e. using 3 bits we can represent 8 different values (0 to 7). Thus increasing the sampling resolution gives us more steps with which to represent larger values.
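
The character-to-bits mapping described under "Analog vs. Digital" above can be reproduced directly with Python's built-in functions (shown only as an aside):

```python
for ch in "ABC":
    code = ord(ch)                            # ASCII code: A -> 65, B -> 66, C -> 67
    print(ch, code, format(code, '07b'))      # 7-bit pattern, e.g. A -> 1000001
```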


Nyquist's Sampling Theory

The question of how often to sample is answered by Nyquist's sampling theory, which states that if f is the highest frequency component of an analog wave, then the sampling frequency should be at least 2f in order to reconstruct the signal properly. Aliasing is a defect which creeps into the digital output signal when we sample an analog wave at rates lower than that dictated by Nyquist's theory. Entire negative cycles may be missed, giving an output with long positive cycles and thereby introducing low-pitch distortions.
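
Aliasing can be demonstrated with a short numerical sketch (the 9 kHz tone and 8 kHz sampling rate are assumed values): sampled at 8 kHz, which is well below the 18 kHz the Nyquist rule would demand, a 9 kHz tone produces exactly the same sample values as a 1 kHz tone, so it would be reproduced as a spurious low-pitched sound.

```python
import math

fs = 8000     # sampling rate in Hz, too low for a 9 kHz component
high = [math.sin(2 * math.pi * 9000 * n / fs) for n in range(8)]
low  = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(8)]

# The two sample sequences are indistinguishable: the 9 kHz tone "aliases" to 1 kHz.
print(all(abs(h - l) < 1e-9 for h, l in zip(high, low)))   # True
```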

AUDIO COMPRESSION CODECs

A simple calculation shows that a 1-minute CD-quality sound clip takes up approximately 10.5 MB of disk space and must be processed at about 1.4 Mbits per second. These real-world requirements may make it impractical to process the full bit stream of audio. Hence arises the need for compression. Compression techniques help to reduce the size of digital audio files and consequently demand a lower bit rate for processing. Compression involves specialized software called CODECs (short for Compression / Decompression) which takes digital audio files as input and produces compressed versions of the files at the output, for storage on the hard disk. During playback, the compressed files are converted back to their original form (decompression) by the same software and sent to the speakers for conversion into sound energy.

Lossless vs. Lossy Compression

Compression can be of two types. In lossless compression the original audio data is not changed permanently in any way during the process of compression, so after decompression we get back the original audio data. Since no portion of the original data is changed, audio quality is maintained exactly at its original level. Lossy compression, on the other hand, permanently discards some of the original audio data to achieve compression; the discarded data is not recovered after decompression. Since some of the original data is discarded, the quality of the compressed files degrades with respect to the original file. The advantage of lossy compression is that it achieves a much greater amount of compression, of the order of 10 to 15 times, compared to only 2 to 3 times for lossless compression.

DPCM : A Lossless CODEC

One of the simplest lossless CODECs is Differential Pulse Code Modulation (DPCM). Here, instead of storing each of the sampled digital values separately, the differences between subsequent samples are stored instead. If the original sample values are 5, 8, 10 etc., then to achieve compression the first sample value, i.e. 5, is stored as it is, but the subsequent values stored are the differences between the second and first values, i.e. 3, between the third and second values, i.e. 2, and so on. Since storing differences usually requires much less space than storing the original values, the final file size is much smaller than the original. During decompression the first stored value, i.e. 5, is added to the first difference value, i.e. 3, to get back 8; the second difference value, i.e. 2, is added to 8 to get back 10; and so on. Since the original values are retrieved exactly, the compression type is "lossless" and quality is maintained the same as the original.
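
A minimal sketch of the DPCM idea described above (illustrative only, not a production codec; the bit-level packing of the differences is omitted):

```python
def dpcm_encode(samples):
    """Store the first sample as-is, then only the differences between neighbours."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def dpcm_decode(encoded):
    """Rebuild the original samples by accumulating the stored differences."""
    out = [encoded[0]]
    for diff in encoded[1:]:
        out.append(out[-1] + diff)
    return out

samples = [5, 8, 10, 9, 12]
encoded = dpcm_encode(samples)             # [5, 3, 2, -1, 3]
assert dpcm_decode(encoded) == samples     # lossless: the original values come back exactly
```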

SBC : A Lossy CODEC

Sub Band Coding (SBC) is an example of a lossy CODEC. The digitized audio input is first divided into frequency bands or ranges by the mapping block and then sent to a psycho-acoustic block. This block analyses the audio data in each frequency band and determines which portions are inaudible to the human ear as a result of masking. It filters out the irrelevant portions and sends the remaining portion for quantization. The frame packing block does the final assembling of the data and adds additional error-checking codes before sending the data to the output as an encoded data stream. The popular MPEG format is an implementation of the SBC CODEC. Since irrelevant portions of the audio data are permanently discarded, the CODEC is lossy in nature. (http://www.cselt.it/mpeg)


SYNTHESIZERS AND MIDI

Synthesizers are electronic instruments which allow us to generate digital samples of the sounds of various instruments synthetically, i.e. without the actual instrument being present. The sound samples can be modulated through appropriate hardware to change their loudness, pitch etc. The core of a synthesizer is a special purpose chip or IC for producing sound.

Synthesizer Basics

Synthesizers can be broadly classified into two categories. FM Synthesizers generate sound by combining elementary sinusoidal tones to build up a note having the desired waveform. Earlier generation synthesizers were generally of the FM type, the sounds of which lacked the depth of real-world sounds. Wavetable Synthesizers, created later on, produce sound by retrieving high-quality digital recordings of actual instruments from memory and playing them on demand. Modern synthesizers are generally of the wavetable type. The sounds associated with synthesizers are called patches, and the collection of all patches is called the Patch Map. Each sound in a patch map must have a unique ID number to identify it during playback. The audio output of a synthesizer is divided into 16 logical channels, each of which is capable of playing a separate instrument.

Musical Instrument Digital Interface (MIDI)

The Musical Instrument Digital Interface (MIDI) is a protocol or set of rules for connecting digital synthesizers to personal computers. It has three portions: hardware, a file format and messages.


Hardware

The hardware portion lays down the specification of the round 5-pin connector used for connecting synthesizers to PCs. Since most PCs do not have such a connector built into their hardware, an interface adapter is generally used for this purpose. The adapter has on one side the familiar 25-pin PC serial connector and on the other end two round 5-pin MIDI connectors.

File Format

The MIDI specifications made provisions to save synthesizer audio in a separate file format called MIDI files. MIDI files are totally different from normal digital audio files (like WAV files) in that they do not contain the audio data at all, but rather the instructions on how to play the sound. These instructions act on the synthesizer chips to produce the actual sound. Because of this, MIDI files are extremely compact as compared to WAV files. They also have another advantage: the music in a MIDI file can easily be changed by modifying the instructions using appropriate software.

Messages

MIDI-based instructions are called messages. These messages carry the information on what instruments to play in which channel and how to play them. Each message consists of three bytes: the first is the Status Byte, which contains the function or operation to be performed and the channel number which is to be affected. The other two bytes are called Data Bytes and they provide additional parameters on how to perform the indicated operation.


Channel Messages

Channel messages are specific to each channel and contain data for the actual notes to be played. The status byte contains the channel number and function, and the data bytes contain additional parameters like note number, velocity etc. For example, when a key is pressed on a MIDI keyboard it sends a Note On message on the MIDI OUT port. The status byte indicates the channel number and the two following data bytes indicate the key number and velocity. When a key is released, the keyboard sends a Note Off message along with the channel number, key number and velocity.
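
The three-byte structure of a channel message can be illustrated with a small Python helper (a sketch based on the standard MIDI byte layout; the key number and velocity values below are arbitrary):

```python
def note_on(channel, key, velocity):
    """Build a 3-byte MIDI Note On message: status byte, then two data bytes."""
    status = 0x90 | (channel & 0x0F)      # 0x90-0x9F: Note On for channels 0-15
    return bytes([status, key & 0x7F, velocity & 0x7F])

def note_off(channel, key, velocity=0):
    """Build the corresponding Note Off message (status bytes 0x80-0x8F)."""
    return bytes([0x80 | (channel & 0x0F), key & 0x7F, velocity & 0x7F])

msg = note_on(0, 60, 100)     # key 60 (middle C) pressed with velocity 100 on channel 0
print(msg.hex())              # '903c64'
```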

System Messages

System messages are instructions for the entire system as a whole. They include messages for starting, stopping, resetting etc. as well as trouble-shooting codes and vendor-specific information. A system message usually contains only the status byte.


GM Specifications

The General MIDI (GM) specifications provide for standardized patch maps, so that a MIDI file created on one system will sound exactly the same on another system conforming to the GM specifications. This is because each of the sounds in the patch map has exactly the same ID or patch number on both systems, and it is the patch numbers which are mentioned in the messages that produce the sound from the synthesizer chip. The administrative and technological issues are controlled by a body called the MIDI Manufacturers Association (www.midi.org).

SOUND CARD ARCHITECTURE

The sound card is an expansion board in your multimedia PC which interfaces with the CPU via slots on the motherboard. Externally it is connected to speakers for playback of sound. Other than playback, the sound card is also responsible for digitizing, recording and compressing sound files.

Basic Components

The basic internal components of the sound card include:

SIMM Banks : Local memory of the sound card for storing audio data during digitization and playback of sound files.

DSP : The digital signal processor, which is the main processor of the sound card and coordinates the activities of all other components. It also compresses the data so that it takes up less space.

DAC/ADC : The digital-to-analog and analog-to-digital converters for digitizing analog sound and reconverting digital sound files to analog form for playback.

WaveTable/FM Synthesizers : For generating sound on instructions from MIDI messages. The wavetable chip has a set of pre-recorded digital sounds, while the FM chip generates the sound by combining elementary tones.

CD Interface : Internal connection between the CD drive of the PC and the sound card.

16-bit ISA connector : Interface for exchanging audio data between the CPU and the sound card.

Amplifier : For amplification of the analog signals from the DAC before they are sent to the speakers for playback.

The external ports of the sound card include:

Line Out : Output port for connecting to external recording devices like a cassette player or an external amplifier.

MIC : Input port for feeding audio data to the sound card through a microphone connected to it.

Line In : Input port for feeding audio data from external CD/cassette players for recording or playback.

Speaker Out : Output port for attaching speakers for playback of sound files.

MIDI : Input port for interfacing with an external synthesizer.

Source : www.pctechguide.com

Processing Audio Files

WAV files

From the microphone or audio CD player a sound card receives sound as an analog signal. The signal goes to an ADC chip which converts the analog signal to digital data. The ADC sends the binary data to the DSP, which typically compresses the data so that it takes up less space. The DSP then sends the data to the PC's main processor, which in turn sends the data to the hard drive to be stored. To play a recorded sound the CPU fetches the file containing the compressed data and sends the data to the DSP. The DSP decompresses the data and sends it to the DAC chip, which converts the data to a time-varying electrical signal. The analog signal is amplified and fed to the speakers for playback.

MIDI files

The MIDI instruments connected to the sound card via the external MIDI port, or the MIDI files on the hard disk retrieved by the CPU, instruct the DSP which sounds to play and how to play them, using the standard MIDI instruction set. The DSP then either fetches the actual sound from a wavetable synthesizer chip or instructs an FM synthesizer chip to generate the sound by combining elementary sinusoidal tones. The digital sound is then sent to the DAC to be converted to analog form and routed to the speakers for playback.

File Formats

Wave (Microsoft) File (.WAV) : This is the format for sampled sounds defined by Microsoft for use with Windows. It is an expandable format which supports multiple data formats and compression schemes.

Macintosh AIFF (.AIF/.SND) : This format is used on the Apple Macintosh to save sound data files. An .AIFF file is best when transferring files between the PC and the Mac using a network.

RealMedia (.RM/.RA) : These are compressed formats designed for real-time audio and video streaming over the Internet.

MIDI (.MID) : Files containing instructions on how to generate music. The actual music is generated by digital synthesizer chips.

Sun Java Audio (.AU) : The only audio format supported by Java applets on the Internet.

MPEG-1 Layer 3 (.MP3) : Highly compressed audio files providing almost CD-quality sound.
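
As a small illustration of the .WAV format listed above, Python's standard wave module can write a short test tone as an uncompressed PCM file; the sampling rate, tone frequency and duration below are arbitrary choices for the sketch:

```python
import math, struct, wave

RATE, FREQ, SECONDS = 44100, 440.0, 1.0          # sampling rate, tone frequency, duration

frames = bytearray()
for n in range(int(RATE * SECONDS)):
    value = int(0.5 * 32767 * math.sin(2 * math.pi * FREQ * n / RATE))
    frames += struct.pack('<h', value)           # one 16-bit little-endian PCM sample

with wave.open('tone.wav', 'wb') as wav:
    wav.setnchannels(1)                          # mono
    wav.setsampwidth(2)                          # 2 bytes = 16 bits per sample
    wav.setframerate(RATE)
    wav.writeframes(bytes(frames))
```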


