Chapter 1,2,3.docx

  • Uploaded by: Normay Bartolo
  • December 2019
Narrator Ebook to Audiobook Converter

A Research paper presented to the faculty of CAVITE STATE UNIVERSITY Carmona Campus

In partial fulfillment of the requirements for the degree Bachelor of Science in Information Technology

By

Ayna A. Bast

November 2018

Chapter I THE PROBLEM AND THE SETTING

Introduction

Have you ever thought of listening to your books, articles, and other documents instead of reading them? Text Speaker reads text documents aloud on a PC and converts them to audio files in MP3 or WAV format. Users can listen to the audio files on an MP3 player, iPod, iPhone, or mobile phone while doing other tasks at home or at work. Text Speaker offers a large selection of high-quality, human-sounding voices.

The continuous growth of people's music libraries requires more advanced ways of computing playlists through algorithms that match tracks to the user's preferences. Several approaches have been made to enhance the user's listening experience. Playing background music while reading may open up new learning possibilities. For centuries, educators have used music as a learning tool that connects the concept to be acquired with a catchy song or rhythm (Beentjes et al., 1996).

An electronic book (also referred to as an "E-book") is an electronic version of a traditional print book (or other printed material such as a magazine or newspaper) that can be read by using a personal computer or an E-book reader. Unlike PCs or handheld computers, E-book readers deliver a reading experience comparable to traditional paper books, while adding powerful electronic features for note taking, fast navigation, and keyword searches. However, such actions, whether they are performed on a PC, handheld computer, or E-book reader, generally require the user to read the text from a display. Thus, the use of an E-book generally requires the user to focus his or her visual attention on a display to read the text content (e.g., book, magazine, or newspaper) of the E-book. Moreover, reading of an E-book is generally performed without any music playing in the background, particularly without any music playing from the E-book itself. The same is true for other types of hand-held devices such as personal digital assistants (PDAs).

Speech is the most used and natural way for people to communicate. From the beginning of research on the man-machine interface, speech has been one of the most desired mediums for interacting with computers. Therefore, speech recognition and text-to-speech capability have been studied to make communication with machines more human-like. In order to increase the naturalness of oral communications between humans and machines, all speech aspects must be involved. Speech does not only transmit ideas and concepts, but also carries information about the attitude, emotion, and individuality of the speaker (Chen et al., 2003). Speaker identity, the sound of a person's voice, is a key factor in oral communications.

Background of the Study

Audiobooks have been in use since e-books were first released. Parents and their children have used audiobooks as an aid in learning to read. This study focuses on converting e-books in PDF, TXT, Docs, and ZIP files into audiobooks for users who would rather listen than read. It would be desirable and highly advantageous to have a hand-held device that allows a user to assimilate content without having to look at a display.

Objectives

The study aims to provide a new smartphone application that offers easier reading and a pleasant listening experience at the same time, helping users study while doing other tasks at home or in school. The study relates generally to hand-held devices and, more particularly, to mixing music and text-to-speech (TTS).

Significance of the Study

Students will be the main beneficiaries of this application. They will be able to learn the proper intonation of sentences by listening to the converted audiobook, especially in pronunciation exercises. The application can also increase the usability and productivity of Google Drive, because users do not have to download a video, PDF, TXT, Docs, or ZIP file in order to access it. The application also improves the listening experience: users can listen to long articles with a soft background track. When converting a document to MP3 format, the user can combine speech with music. The file formats supported for the background music are MP3, WAV, AIFF, WMA, MPA, ASF, MPEG, MPG, and M1V. The application may give users an easy and convenient reading experience. Lastly, the study will also benefit future researchers, who might extend this system in ways that result in the development of another system.
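Combining speech with background music, as described in the Significance of the Study, can be illustrated with a minimal Java sketch. It assumes that both the synthesized speech and the music have already been decoded to mono 16-bit PCM at the same sample rate; decoding MP3 or WAV files is outside the sketch. The class name AudioMixer, the method mix, and the musicGain parameter are hypothetical names chosen for illustration and are not taken from the application.

```java
/** Hypothetical helper: mixes 16-bit PCM speech with quieter background music. */
final class AudioMixer {
    private AudioMixer() {}

    /**
     * Mixes two mono 16-bit PCM buffers sample by sample.
     * musicGain is assumed to be a slider value in [0.0, 1.0]; speech keeps full volume.
     * The music loops when it is shorter than the speech.
     */
    static short[] mix(short[] speech, short[] music, double musicGain) {
        if (musicGain < 0.0 || musicGain > 1.0) {
            throw new IllegalArgumentException("musicGain must be in [0, 1]");
        }
        short[] out = new short[speech.length];
        for (int i = 0; i < speech.length; i++) {
            int m = music.length == 0 ? 0 : music[i % music.length];
            long sum = Math.round(speech[i] + m * musicGain);
            // Clamp to the 16-bit range so loud passages do not wrap around.
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
        }
        return out;
    }
}
```

Clamping is the simplest way to avoid overflow when two signals are added; a production mixer might instead lower both gains before summing.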

Time and place

The study was conducted from October 2018 to December 2018 at My Value Max Inc., located at Cavite State University, Carmona Campus.

Scope and Limitations

The application lets the user view and read the document while listening to the voice that reads the text file. Playback continues while the device is in Sleep Mode. The player can also modify the way a voice speaks by speeding up or slowing down the speech, changing the pitch, and changing the volume. The user can choose background music to play while the application reads the document, including free classical music by composers such as Mozart, Beethoven, Bach, and Chopin. The user can also enable the option Add background music to the output file. The Test button previews how the audio file sounds, and a slider adjusts the volume of the background music.

Definitions of Terms

The following terms as used by the researchers are operationally defined:

Audio Files refers to computer files that contain digitized audio either in the Compact Disc (CDDA) format or in MP3, AAC, or another compressed format.

E-Book Reader refers to handheld computer devices like Amazon's Kindle, Barnes and Noble's NOOK, and Apple's iPad that make it possible for books in digital form to be viewed and read by users.

Human Sounding Voices refers to synthesized voices that resemble the natural human voice. Voice (or vocalization) is the sound produced by humans and other vertebrates using the lungs and the vocal folds in the larynx, or voice box. Voice is not always produced as speech, however. Infants babble and coo; animals bark, moo, whinny, growl, and meow; and adult humans laugh, sing, and cry.

iPod refers to a portable music player developed by Apple that supports a wide variety of audio formats, including MP3, AAC, WAV, and AIFF.

PDA, short for personal digital assistant, refers to a hand-held device that combines computing, telephone/fax, Internet, and networking features. A typical PDA can function as a cellular phone, fax sender, Web browser, and personal organizer. PDAs may also be referred to as palmtops, hand-held computers, or pocket computers.

WAV refers to an audio file format, created by Microsoft and IBM, that has become a standard PC audio file format for everything from system and game sounds to CD-quality audio. A Wave file is identified by the file name extension .wav; the format is rarely also called Audio for Windows.

Text Speaker refers to a text-to-speech program that reads text documents aloud and converts them to audio files in MP3 or WAV format, using a selection of languages and voices.

Text to Speech, abbreviated as TTS, is a form of speech synthesis that converts text into spoken voice output. Text-to-speech systems were first developed to aid the visually impaired by offering a computer-generated spoken voice that would "read" text to the user. TTS should not be confused with voice response systems.
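The adjustable speed, pitch, and volume described under Scope and Limitations could be modeled as a small settings object whose values are kept within safe limits. The following Java sketch is illustrative only: the class name SpeechSettings and the numeric ranges are assumptions, since the paper does not state the limits, and a real TTS engine defines its own.

```java
/** Hypothetical value object for the adjustable voice settings. */
final class SpeechSettings {
    // Assumed ranges for illustration; a real TTS engine defines its own limits.
    static final double MIN_RATE = 0.5, MAX_RATE = 2.0;
    static final double MIN_PITCH = 0.5, MAX_PITCH = 2.0;

    final double rate;    // 1.0 = normal speed
    final double pitch;   // 1.0 = normal pitch
    final double volume;  // 0.0 = silent, 1.0 = full volume

    SpeechSettings(double rate, double pitch, double volume) {
        // Out-of-range slider values are clamped instead of rejected.
        this.rate = clamp(rate, MIN_RATE, MAX_RATE);
        this.pitch = clamp(pitch, MIN_PITCH, MAX_PITCH);
        this.volume = clamp(volume, 0.0, 1.0);
    }

    static double clamp(double value, double min, double max) {
        return Math.max(min, Math.min(max, value));
    }
}
```

For example, `new SpeechSettings(3.0, 1.2, -0.4)` would store a rate of 2.0, a pitch of 1.2, and a volume of 0.0 under these assumed ranges.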

Chapter II REVIEW OF RELATED LITERATURE

Jianlei Xie et al. (2002) describe an E-book that comprises a memory device, a text-to-speech (TTS) module, a music module, and at least one speaker. The memory device stores files, which include text and music. The TTS module synthesizes speech corresponding to the text. The music module plays back the music. The speaker outputs the speech and the music.

Clark Quinn, professor, author, and expert in computer-based education, defined mobile learning as the intersection of mobile computing (the application of small, portable, and wireless computing and communications devices) and e-learning (learning facilitated and supported through the use of information and communications technology). He predicted that mobile learning would one day provide learning that was truly independent of time and place and facilitated by portable computers capable of providing rich interactivity, total connectivity, and powerful processing. In May 2005, Ellen Wagner, senior director of Global Education Solutions at Macromedia, proclaimed that the mobile revolution had finally arrived. Wherever one looks, evidence of mobile penetration is irrefutable: cellphones, PDAs, MP3 players, portable game devices, handhelds, tablets, and laptops abound. No demographic is immune from this phenomenon. From toddlers to seniors, people are increasingly connected and are digitally communicating with each other in ways that would have been impossible only a few years ago.

Music capabilities allow an E-book user to enjoy digital music output from the E-book. TTS capabilities allow an E-book user to listen to synthesized text output from the E-book. The combination of music and TTS allows an E-book user to listen to the text along with background music.

The majority of the evidence tends to support background music because of its positive effects. Cool, Yarbrough, Patton, Runde, and Keith (1994) conducted a study which found that radio noise was generally considered somewhat helpful to students while studying; it kept them focused and on task. Howard Gardner, a Harvard graduate, wrote Frames of Mind in the early 1980s. It has since become one of the most influential books in education. Gardner believes that music creates a positive and relaxing environment that allows sensory integration to take place and improves concentration. Sensory integration is essential for establishing long-term memory. He has also seen background music successfully used to mask outside traffic sounds, to release stress before an exam, and to reinforce subject matter (Campbell, 1997). Jensen (1998) reported that music can deliver as much as sixty percent more content in five percent of the time usually taken to deliver the same materials.

Based on the article written by Bossard, L. (2008), several solutions already use intelligent playlists embedded in music players installed on computers. There are also online solutions, the most popular of which is last.fm, which acts as a personalized radio station that plays preferred music. On the other hand, it does not allow playback of a certain track. There are also other solutions, like the Genius function of iTunes or the Music Explorer; both use the user's music collection to generate playlists. The biggest disadvantage of the latter solution is that the user can use only tracks that he/she already has on his/her PC to generate playlists. Of course this limits the power of the algorithm very much.


Lorenzi (2007) proposes a way of representing the similarity between tracks in a 10-dimensional Euclidean space (further called music space), where the closeness of tracks is approximately proportional to their similarity. Seven million songs currently appear in the database, but only 500,000 of them have enough user statistics to be mapped in the graph. Using this simplified and computationally efficient way of finding similar tracks, several applications can explore new ways of computing playlists. Most of them offer support in playlist generation, but none also provides the tracks to be played. This could be seen as a disadvantage because not all people possess all the tracks that are suggested by the space.
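The idea of a music space, in which closer tracks are more similar, can be illustrated with a short Java sketch that finds the nearest track by Euclidean distance. The class name MusicSpace, the method names, and the track coordinates in the usage note are made up for illustration; the sketch is a plain linear search and is not the method used by Lorenzi (2007).

```java
import java.util.Map;

/** Hypothetical sketch of similarity lookup in a Euclidean music space. */
final class MusicSpace {
    private MusicSpace() {}

    /** Euclidean distance between two points with the same number of dimensions. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the name of the track closest to the query point, or null if there are no tracks. */
    static String nearest(double[] query, Map<String, double[]> tracks) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : tracks.entrySet()) {
            double d = distance(query, entry.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

A playlist generator could call `nearest` repeatedly, removing each chosen track from the map, to build a sequence of similar tracks. With hundreds of thousands of mapped tracks, a spatial index would likely be needed instead of a linear scan.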


Klusacek [59] proposed a conditional pronunciation modeling method. It uses time-aligned streams of phones and phonemes to model a speaker's specific pronunciation. The system generates the phoneme stream from a lexicon of pronunciations of the words recognized by an automatic speech recognition system, and it generates the phone stream with an open-loop phone recognizer. The phoneme and phone streams are aligned at the frame level, and the conditional probabilities of a phone, given a phoneme, are estimated using co-occurrence counts. A likelihood detector is then applied to these probabilities for the speaker detection task. This approach achieves a relatively high accuracy in comparison with other phonetic methods in the SuperSID project at the Johns Hopkins 2002 Workshop [114] [90].

According to Gish et al. (1986), a majority of speaker models, including Gaussian mixture models, are based on modeling the underlying distribution of feature vectors from a speaker. When the speech is corrupted, the spectral-based features are also corrupted, and so their distributions are modified. Thus, a speaker model trained using speech from one type of corrupt environment will generally perform poorly in recognizing the same speaker using speech collected under different conditions, since the feature distributions are now different. Various studies of speaker recognition systems using degraded or distorted speech have shown a dramatic decrease in performance [47] [38]. Current speaker recognition research mainly focuses on recognition under controlled conditions such as Switchboard telephone speech, which is close-talking speech. A large amount of effort is still needed in research on the robustness of speaker recognition under unconstrained conditions in open environments with distant microphones.
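The co-occurrence estimate described for the conditional pronunciation model above can be sketched in Java as a relative-frequency count over frame-aligned streams. This is a simplified illustration under stated assumptions: the class name PronunciationModel and its methods are hypothetical, no smoothing is applied, and it is not the SuperSID implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: estimates P(phone | phoneme) from frame-aligned streams. */
final class PronunciationModel {
    // counts.get(phoneme).get(phone) = number of frames where both occurred together.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final Map<String, Integer> phonemeTotals = new HashMap<>();

    /** Adds one pair of frame-aligned streams; both arrays must have the same length. */
    void addAligned(String[] phonemes, String[] phones) {
        if (phonemes.length != phones.length) {
            throw new IllegalArgumentException("streams must be frame-aligned");
        }
        for (int i = 0; i < phonemes.length; i++) {
            counts.computeIfAbsent(phonemes[i], k -> new HashMap<>())
                  .merge(phones[i], 1, Integer::sum);
            phonemeTotals.merge(phonemes[i], 1, Integer::sum);
        }
    }

    /** Relative-frequency estimate of P(phone | phoneme); 0.0 for an unseen phoneme. */
    double probability(String phone, String phoneme) {
        int total = phonemeTotals.getOrDefault(phoneme, 0);
        if (total == 0) {
            return 0.0;
        }
        int joint = counts.get(phoneme).getOrDefault(phone, 0);
        return (double) joint / total;
    }
}
```

A speaker whose frames for the phoneme "t" are recognized as the phone "d" unusually often would show a higher estimate for that pair, which is the kind of speaker-specific pattern the method exploits.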

Chapter III RESEARCH METHODOLOGY

This chapter discusses the research design, the selection of the participants, the instrumentation and validation, the data gathering procedures, and the treatment and analysis of data.

Materials

Various hardware and software were used for the study. A Windows-based personal computer, a printer, and an 8 GB flash drive were the hardware utilized for the development of the study. For the software requirements, the following were used: Adobe Photoshop CC and Adobe Illustrator CS6 for the graphical user interface of the application; Java as the programming language; MySQL for the database; Sublime Text and Notepad++ for coding; Google Chrome, Torch r20, and Mozilla Firefox as the browsers for the study; and Microsoft Office 2010 to create the documentation.

Methods

The application design covers the development of the Narrator E-book to Audiobook Converter application, with which the user can do the following:

  • Read documents just by listening.
  • Convert E-book files to audiobook files.
  • Change the GUI color scheme.
  • Change the background music.
  • Change the reader's voice personality.
  • Change the mode (Day/Night Mode) in which the page is displayed.
  • Search for content in the document using keywords.
  • Automatically flag document pages and sections.
  • Read .PDF, .DOCX, and .TXT files from Google Drive.
  • Share the content of a book on a Facebook wall.
  • Set an alarm as a reminder to read a particular book in the future.
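Among the features listed above, the keyword search lends itself to a short illustration. The Java sketch below assumes that the document text has already been extracted into one string per page; extracting text from PDF or DOCX files is not shown. The class name KeywordSearch and the method pagesContaining are hypothetical and are not taken from the application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the keyword search feature. */
final class KeywordSearch {
    private KeywordSearch() {}

    /** Returns the 1-based numbers of the pages that contain the keyword, ignoring case. */
    static List<Integer> pagesContaining(List<String> pages, String keyword) {
        List<Integer> hits = new ArrayList<>();
        if (keyword == null || keyword.trim().isEmpty()) {
            return hits; // An empty keyword matches nothing.
        }
        String needle = keyword.trim().toLowerCase(Locale.ROOT);
        for (int i = 0; i < pages.size(); i++) {
            if (pages.get(i).toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

The returned page numbers could be used to jump the reader, or the audio playback, to the first matching page.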

SOFTWARE DEVELOPMENT MODEL: (WATERFALL MODEL)

The waterfall model is a popular version of the systems development life cycle model for software engineering. Often considered the classic approach to the systems development life cycle, the waterfall model describes a development method that is linear and sequential.

Waterfall development has distinct goals for each phase of development. Imagine a waterfall on the cliff of a steep mountain. Once the water has flowed over the edge of the cliff, gravity is in control, and water cannot run uphill. It is the same with waterfall development. Once a phase of development is completed, the development proceeds to the next phase, and there is little or no interplay between phases [12, 24] (Figure 1).

Requirements

This is the first phase of the software development life cycle. Here we gather all the requirements that have to be fulfilled by the developed software application [12].

Figure 1. Definitions of the different phases of the waterfall model. Source: CrackMBA, Waterfall Model, 2011. http://crackmba.com/waterfall-model/, accessed Nov. 2018.

Design

After gathering the requirements, we design the system according to the requirements gathered in the first phase. We use UML to document aspects of the design of the system [12].

Construction

This is the phase where we implement the actual system according to the design. This phase is also called the coding phase [12].

Testing

Testing begins after the coding is finished. In this phase, we test the code using different testing methods and execute it with a variety of tests until there are no errors. Once integration is done, we test the system again for proper functionality [12].

Installation

After testing, we deploy or install the application in the real-time environment so that it can be used. The customer is involved in this deployment process and reviews the coding, testing, and execution. If the customer wants any changes, the application is modified again [12].

Maintenance

Any issues that arise while the application is in use are handled in the maintenance phase. If the customers are not satisfied with the project after deployment, it is modified again. The project team therefore maintains all these phases in consultation with the customers [12].
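The linear, sequential nature of the waterfall phases described above can be sketched as a Java enum in which each phase leads only to the next one. The enum name WaterfallPhase and the method next are hypothetical names used for illustration; the phase names follow the headings in this chapter.

```java
/** Hypothetical sketch: the waterfall phases in the order used in this chapter. */
enum WaterfallPhase {
    REQUIREMENTS, DESIGN, CONSTRUCTION, TESTING, INSTALLATION, MAINTENANCE;

    /** Returns the phase that follows this one; MAINTENANCE is the final phase. */
    WaterfallPhase next() {
        WaterfallPhase[] all = values();
        int index = ordinal() + 1;
        return index < all.length ? all[index] : this;
    }
}
```

There is deliberately no method for moving backward, which mirrors the point that water cannot run uphill once it has flowed over the edge.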
