Internet Systems
Chapter 21. Multimedia: Audio, Video, Speech Synthesis and Recognition
The multimedia revolution began on the desktop, with the widespread availability of CD-ROMs. Because Web delivery depends on bandwidth, we expect desktop technology to lead Web technology.
Multimedia files can be big, but streaming audio and video technologies allow clips to begin playing while the files are still downloading.
Creating audio and video clips to incorporate into Web pages often requires powerful software. We focus on using existing audio and video clips.
The BGSOUND Element
The simplest way to add sound to a page is with the BGSOUND element (in the header).
Key properties:
SRC
the URL of the audio clip to play
LOOP
-1 (default): the clip loops indefinitely
> 0: the number of times to play the clip
0 or < -1: the clip plays exactly once
BALANCE
between –10000 (only left speaker) and 10000 (only right speaker); the default is 0 (balanced)
VOLUME
between –10000 (minimum) and 0 (maximum, the default)
These properties can be set by scripting.
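The LOOP and BALANCE rules above can be sketched in script. This is only an illustration: bgsoundTag, playCount, and the option names are hypothetical helpers, and jazz.wav is a made-up file name.

```javascript
// Clamp n into the inclusive range [lo, hi].
function clamp(n, lo, hi) {
  return Math.min(hi, Math.max(lo, n));
}

// How many times a clip plays for a given LOOP value, per the rules above.
function playCount(loop) {
  if (loop === -1) return Infinity; // loop indefinitely
  return loop > 0 ? loop : 1;       // 0 or < -1: exactly once
}

// Build a BGSOUND tag (hypothetical helper). LOOP defaults to -1 and
// BALANCE to 0; BALANCE is clamped to the range -10000..10000.
function bgsoundTag(options) {
  const loop = options.loop === undefined ? -1 : options.loop;
  const balance = clamp(
    options.balance === undefined ? 0 : options.balance, -10000, 10000);
  return '<BGSOUND SRC="' + options.src + '" LOOP="' + loop +
         '" BALANCE="' + balance + '">';
}

console.log(bgsoundTag({ src: "jazz.wav", loop: 2, balance: 20000 }));
// prints <BGSOUND SRC="jazz.wav" LOOP="2" BALANCE="10000">
```

The out-of-range balance of 20000 is clamped to 10000 (right speaker only) rather than rejected; a stricter helper could throw instead.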
The DYNSRC Property of IMG
In an IMG tag, use the DYNSRC property instead of SRC when the value is the URL of a video clip.
Other properties to use with DYNSRC:
LOOP
as for BGSOUND
START
the event that starts playback: either fileopen or mouseover
You should also use the ALT property, whose value is text displayed if the browser can’t handle the clip.
The EMBED Element
The EMBED element embeds a media clip (audio or video) into a page. It lets us display a GUI that gives the user direct control over the media clip.
Key properties:
SRC
the URL of the media file
LOOP
true to loop indefinitely; false to play just once
HIDDEN
true to hide the GUI; default is false
When the browser encounters an EMBED tag, it plays the specified clip with the player registered to handle the media type on the client computer. If the media clip is a .wav (Windows Wave) file, Internet Explorer typically uses the Windows Media ActiveX control.
Windows Media Player ActiveX Control
Microsoft ActiveX controls are embedded in Web pages displayed in Internet Explorer. Embedding the Windows Media Player ActiveX control in a Web page gives access to the media formats supported by the Windows Media Player.
The GUI lets the user
• play, pause, and stop a media clip,
• move quickly forward or backward through the clip, and
• control the volume of audio.
Key parameters in the OBJECT element (each given as a NAME and a VALUE):
FileName
the URL of the media clip
AutoStart
true if the clip plays when loaded
Loop
true if the clip plays indefinitely
ShowControls
true if the controls are displayed
The values of parameters can be set by scripting.
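As a sketch of how the parameters listed above turn into markup, the hypothetical helper below builds one PARAM element per name/value pair. paramTags and the file name clip.avi are assumptions for illustration; the enclosing OBJECT element, which also needs the control's CLASSID, is not shown.

```javascript
// Build one PARAM element per parameter (hypothetical helper).
// Keys are emitted in the order they were written in the object literal.
function paramTags(params) {
  return Object.keys(params).map(function (name) {
    return '<PARAM NAME="' + name + '" VALUE="' + params[name] + '">';
  }).join("\n");
}

// "clip.avi" is a made-up file name.
const markup = paramTags({
  FileName: "clip.avi",
  AutoStart: true,
  Loop: false,
  ShowControls: true
});
console.log(markup);
```

The output is four PARAM lines, starting with the FileName parameter, that would be placed between the opening and closing OBJECT tags.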
Microsoft Agent
Microsoft Agent is a technology for interactive animated characters in a Windows application or Web page.
The Microsoft Agent ActiveX control gives access to four predefined characters:
• Peedy the Parrot
• Genie
• Merlin
• Robby the Robot
These characters allow users to interact with a page in natural ways (including speech). The control accepts both mouse and keyboard interaction.
• It generates speech if a compatible text-to-speech engine is installed.
• It recognizes speech if a compatible speech recognition engine is installed.
You can create your own characters with
• Microsoft Agent Character Editor and
• Microsoft Linguistic Sound Editing Tool.
Both are downloadable from the Microsoft Agent Web site.
We’ll also look at the following ActiveX controls:
• Lernout and Hauspie TruVoice text-to-speech (TTS) engine
• Microsoft Speech Recognition Engine
See the references in the text to Microsoft’s downloads and documentation.
The OBJECT elements for all three of these ActiveX controls are in the page header. The CODEBASE property of each OBJECT element specifies the version of the control to download. Typically, no parameters are given in the OBJECT element.
The Characters collection of an Agent object is accessed with
    agent_name.Characters
where agent_name is the ID of the Microsoft Agent object.
To load the character information for one of the characters from the Microsoft Web site, use the Load method of the collection:
    agent_name.Characters.Load( character_name, url );
where character_name is the name of a character (e.g., "peedy") and url is the URL for the character information.
The Character method of the Characters collection takes as its argument the name of a character (e.g., "peedy") and returns a reference (a “character”) to the agent object that the Load method associated with this character.
For example, if the Agent ID is agent, then
    parrot = agent.Characters.Character( "peedy" );
assigns to parrot a reference to the agent object that represents the Peedy the Parrot character.
Where agent_ref is a reference to an agent object (a character),
    agent_ref.Get( behavior_type, behavior_element );
downloads specific behavior information (behavior_element) for a type of behavior (behavior_type).
For example, for type "state", some of the elements are
"Showing": the behavior when the character is first displayed
"Speaking": the behavior when the character is speaking
"Hiding": the behavior when the character disappears
For type "animation", some of the elements are
"Greet"
"MoveUp"
"GetAttention"
Animation behavior elements are activated with the Play method.
Example:
    parrot.Get( "state", "Showing" );
    parrot.Get( "state", "Speaking" );
    parrot.Get( "animation", "Greet" );
    // Display the Showing behavior:
    parrot.Show();
    parrot.Play( "Greet" );
    // Display the Speaking behavior:
    parrot.Speak( "Hello!" );
    parrot.Play( "GreetReturn" );
The GreetReturn behavior is downloaded with the Greet behavior.
The Speak() method makes use of the TTS object.
There is also a MoveTo( x, y ) method for a character.
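The sequence above can be wrapped in a reusable function. The sketch below is an illustration only: greet and makeFakeCharacter are hypothetical names, and the fake character merely records calls so that the script can run outside Internet Explorer. In a real page the character would come from agent.Characters.Character( "peedy" ).

```javascript
// Stand-in for an Agent character (an assumption for testing only):
// each method just records its name and arguments in the log array.
function makeFakeCharacter(log) {
  const record = function (method) {
    return function () {
      const args = Array.prototype.slice.call(arguments);
      log.push(method + "(" + args.join(", ") + ")");
    };
  };
  return {
    Get: record("Get"),
    Show: record("Show"),
    Play: record("Play"),
    Speak: record("Speak"),
    MoveTo: record("MoveTo")
  };
}

// Hypothetical helper: download the behaviors, then show the character,
// move it to (x, y), greet, and speak the given text.
function greet(character, text, x, y) {
  character.Get("state", "Showing");
  character.Get("state", "Speaking");
  character.Get("animation", "Greet");
  character.Show();
  character.MoveTo(x, y);
  character.Play("Greet");
  character.Speak(text);
  character.Play("GreetReturn");
}

const calls = [];
greet(makeFakeCharacter(calls), "Hello!", 100, 50);
console.log(calls.join("\n"));
```

Passing the character in as an argument keeps the helper independent of any one character, so the same function could greet with Genie or Merlin.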
Some of the tags that can be inserted into the text string that’s spoken:
\Pau=n\
pause for n milliseconds
\Pit=n\
set the pitch to n hertz, 50 ≤ n ≤ 400
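A small sketch of building such a string in script follows. It assumes the tags are written without spaces around the equals sign; pause and pitch are hypothetical helper names, and the spoken sentence is made up. Note that a literal backslash must be written as two backslashes inside a JavaScript string.

```javascript
// Build a pause tag; "\\" in the source is a single backslash in the string.
function pause(ms) {
  return "\\Pau=" + ms + "\\";
}

// Build a pitch tag, rejecting values outside the 50..400 hertz range.
function pitch(hz) {
  if (hz < 50 || hz > 400) {
    throw new RangeError("pitch must be between 50 and 400 hertz");
  }
  return "\\Pit=" + hz + "\\";
}

const text = pitch(150) + "Hello!" + pause(500) + "Would you like a widget?";
console.log(text);
// prints \Pit=150\Hello!\Pau=500\Would you like a widget?
```

The resulting string would then be passed to the character's Speak method.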
For speech recognition, the voice commands that can be used to interact with the Agent object must be registered in the character’s Commands collection:
    agent_ref.Commands.Add( cmd_name, display_string, recognition_string, enabled_flag, display_flag );
where
cmd_name
is the name used in scripting for the command,
display_string
is displayed in a pop-up menu when the character or the Agent taskbar is right-clicked,
recognition_string
is the string of words recognized as the command,
enabled_flag
is true if this string of words is currently a candidate for recognition, and
display_flag
is true if the command’s name is listed in the character’s pop-up menu.
In the recognition string, optional words are placed in square brackets.
Example:
    parrot.Commands.Add( "order", "Order a widget", "Order [a widget]", true, true );
When the Scroll Lock key is pressed, a small rectangular area appears below the character, eventually announcing that it is listening for a command.
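When a page has several voice commands, one way to keep them manageable is to list them in a table and register them in a loop. The sketch below assumes this approach; registerCommands, the table layout, and the "cancel" command are hypothetical, and the fake character only records the arguments passed to Commands.Add so that the script can run without the Agent control.

```javascript
// Hypothetical command table: one row per voice command.
const commands = [
  { name: "order",  display: "Order a widget",   recognize: "Order [a widget]" },
  { name: "cancel", display: "Cancel the order", recognize: "Cancel [the order]" }
];

// Register every row as enabled and listed in the pop-up menu.
function registerCommands(character, table) {
  table.forEach(function (c) {
    character.Commands.Add(c.name, c.display, c.recognize, true, true);
  });
}

// Stand-in character (an assumption for testing only) that records
// the arguments passed to Commands.Add.
const added = [];
const fakeCharacter = {
  Commands: {
    Add: function (name, display, recognize, enabled, visible) {
      added.push([name, display, recognize, enabled, visible]);
    }
  }
};

registerCommands(fakeCharacter, commands);
console.log(added.length + " commands registered");
// prints 2 commands registered
```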
Some properties of the Commands collection:
Caption
the text appearing below the character
Voice
the text appearing with the list of commands when the taskbar is right-clicked
Visible
if true, the commands appear in the pop-up menu
Example: Suppose we have
    parrot.Commands.Caption = "Ordering information";
When the Scroll Lock key is pressed, the area below Peedy contains
    -- Peedy is preparing to listen --
    Please wait to speak.
This changes to
    -- Peedy is listening --
    for "Ordering information" commands
After the user says (for example) "Order a widget", the following (hopefully) appears:
    -- Peedy is not listening --
    Heard "Ordering information"
When a voice command is received, the Agent control’s Command event fires with the name of the command as a parameter. For example, the command above fires the event Command( order ).
Some other methods for a character (i.e., agent reference):
Activate: Make this the currently active character when multiple characters appear.
Interrupt: Interrupt the current animation, and display the next animation in the queue of animations for this character.
StopAll: Stop all animations of a specified element for this character.
Example:
    parrot.Activate();
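One way to respond to the Command event is to look up the command name in a table of handlers. The sketch below assumes the event handler has already extracted the name and passes it to onCommand; how the handler is attached to the control in the page is not shown here. The names handlers, onCommand, and fakeParrot, and the spoken replies, are all hypothetical.

```javascript
// Hypothetical handlers, keyed by the command name used in Commands.Add.
const handlers = {
  order: function (character) {
    character.Speak("One widget, coming up!");
  }
};

// Dispatch a command name to its handler.
// Returns true if a handler ran, false if the name was unknown.
function onCommand(name, character) {
  if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
    character.Speak("Sorry, I did not understand.");
    return false;
  }
  handlers[name](character);
  return true;
}

// Stand-in character (an assumption for testing only) that records
// what it is asked to say.
const spoken = [];
const fakeParrot = {
  Speak: function (text) { spoken.push(text); }
};

const known = onCommand("order", fakeParrot);
const unknown = onCommand("dance", fakeParrot);
console.log(spoken.join(" | "));
// prints One widget, coming up! | Sorry, I did not understand.
```

The hasOwnProperty check keeps names such as "toString" from being mistaken for registered commands.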
RealPlayer ActiveX Control
RealPlayer supports streaming audio (e.g., radio stations) and video.
To embed a RealPlayer object in a Web page, use an EMBED element with attributes
ID
the identifier used to refer to the control in scripts
SRC
the URL of the source
WIDTH, HEIGHT
the dimensions of the control
AUTOSTART
true or false, as before
CONTROLS
which controls are available; Default gives the standard set
TYPE
the MIME type of the embedded file; for audio, this is audio/x-pn-realaudio-plugin
Some of these parameters can be set by scripting. Where rp_object is a RealPlayer object and url is an appropriate URL:
rp_object.SetSource( url )
sets the source URL of the audio or video stream
rp_object.DoPlayPause()
toggles between pausing and playing the stream; after the source has been set by scripting, it starts playing the stream
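The two methods are typically used together, for example to switch stations. The sketch below is an illustration under stated assumptions: playStation and makeFakePlayer are hypothetical, station.ram is a made-up file name, and the fake player models only the behavior described above (setting the source leaves the stream not playing, and DoPlayPause toggles).

```javascript
// Hypothetical helper: point the player at a new stream and start it.
function playStation(rpObject, url) {
  rpObject.SetSource(url);
  rpObject.DoPlayPause(); // starts playing, since the source was just set
}

// Stand-in for a RealPlayer object (an assumption for testing only).
function makeFakePlayer() {
  return {
    source: null,
    playing: false,
    SetSource: function (url) {
      this.source = url;
      this.playing = false; // assumed: a newly set source is not yet playing
    },
    DoPlayPause: function () {
      this.playing = !this.playing;
    }
  };
}

const player = makeFakePlayer();
playStation(player, "station.ram"); // made-up file name
console.log(player.source, player.playing);
// prints station.ram true
```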
Embedding VRML in a Web Page
VRML (Virtual Reality Modeling Language) is a markup language for specifying objects and scenes. It’s purely text and (like HTML) can be created with a text editor (e.g., Notepad). Many 3D modeling programs can save 3D designs in VRML format.
A world is a VRML file (extension .wrl). Both Netscape and Internet Explorer have free, downloadable plug-ins for viewing worlds. In a Web page, use an EMBED element with attributes SRC, WIDTH, HEIGHT. The object rendering has controls that allow you (using the mouse) to change your perspective and to walk around a scene.