Chapter 1,2,3.docx

Uploaded by: Normay Bartolo

December 2019

Narrator Ebook to Audiobook Converter

A Research paper presented to the faculty of CAVITE STATE UNIVERSITY Carmona Campus

In partial fulfillment of the requirements for the degree Bachelor of Science in Information Technology

By

Ayna A. Bast

November 2018

Chapter I THE PROBLEM AND THE SETTING

Introduction

Have you ever thought of listening to your books, articles, and other documents instead of reading them? Text Speaker reads your text documents aloud on your PC and converts them to audio files in MP3 or WAV format. You can listen to the audio files on your MP3 player, iPod, iPhone, or mobile phone while you do other tasks at home or at work. Text Speaker offers a wide selection of high-quality, human-sounding voices.

The continuous growth of people’s music libraries requires more advanced ways of computing playlists, using algorithms that match tracks to the user’s preferences. Several approaches have been proposed to enhance the user’s listening experience. Applying background music to reading may open up new learning possibilities. For centuries, educators have used music as a learning tool that connects the concept to be learned with a catchy song or rhythm (Beentjes, J.W.J. et al., 1996).

An electronic book (also referred to as an “E-book”) is an electronic version of a traditional print book (or other printed material such as a magazine or newspaper) that can be read by using a personal computer or an E-book reader. Unlike PCs or handheld computers, E-book readers deliver a reading experience comparable to traditional paper books, while adding powerful electronic features for note taking, fast navigation, and keyword searches. However, such actions, whether they are performed on a PC, handheld computer, or E-book reader, generally require the user to read the text from a display. Thus, the use of an E-book generally requires the user to focus his or her visual attention on a display to read its text content (e.g., book, magazine, or newspaper). Moreover, an E-book is generally read without any music playing in the background, and in particular without any music playing from the E-book itself. The same is true for other types of hand-held devices such as personal digital assistants (PDAs).

Speech is the most used and most natural way for people to communicate. From the beginning of man-machine interface research, speech has been one of the most desired mediums for interacting with computers. Therefore, speech recognition and text-to-speech capability have been studied to make communication with machines more human-like. In order to increase the naturalness of oral communication between humans and machines, all aspects of speech must be involved. Speech not only transmits ideas and concepts, but also carries information about the attitude, emotion, and individuality of the speaker (Y. Chen et al., 2003). Speaker identity, the sound of a person’s voice, is a key factor in oral communication.

Background of the Study

Audiobooks have been in use since e-books were first released. Parents and their children have used audiobooks as an aid to reading. This study focuses on converting e-books in PDF, TXT, Docs, and ZIP files into audiobooks that users can listen to rather than read. It would be desirable and highly advantageous to have a hand-held device that allows a user to assimilate content without having to look at a display.

Objectives

Our intention is to provide a new smartphone application that offers easier reading and a pleasant listening experience at the same time, helping users study while doing other tasks at home or in school. The study relates generally to hand-held devices and, more particularly, to mixing music and text-to-speech (TTS).

Significance of the Study

Students will be the beneficiaries of this application. They will be able to learn the proper intonation of sentences by listening to the converted audiobook, especially in pronunciation exercises. The application can also increase the usability and productivity of Google Drive. It improves the listening experience: users do not have to download a video, PDF, TXT, Docs, or ZIP file in order to access it, and they can listen to long articles with a soft background track. When converting a document to MP3 format, the user can combine speech with music. The file formats supported for the background music are MP3, WAV, AIFF, WMA, MPA, ASF, MPEG, MPG, and M1V. The application may give users an easy and convenient reading experience. Lastly, this study will also benefit future researchers, who may extend the system and thereby develop another system.
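The idea of combining speech with a soft background track can be illustrated with a minimal sketch in Java, the programming language used for this study. It is not the application's actual code. It assumes that both the synthesized speech and the music are already decoded to mono 16-bit PCM samples at the same sample rate; the class name `SpeechMusicMixer` and the `musicGain` parameter (standing in for a background-music volume setting) are hypothetical.

```java
/** Minimal sketch: mix synthesized speech with quieter background music. */
final class SpeechMusicMixer {

    /**
     * Mixes two mono 16-bit PCM buffers sample by sample.
     * musicGain is a hypothetical volume setting in [0.0, 1.0].
     * The music loops if it is shorter than the speech.
     */
    static short[] mix(short[] speech, short[] music, double musicGain) {
        if (musicGain < 0.0 || musicGain > 1.0) {
            throw new IllegalArgumentException("musicGain must be in [0, 1]");
        }
        short[] out = new short[speech.length];
        for (int i = 0; i < speech.length; i++) {
            int m = music.length == 0 ? 0 : music[i % music.length];
            long sum = Math.round(speech[i] + musicGain * m);
            // Clamp to the 16-bit range to avoid wrap-around distortion.
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
        }
        return out;
    }

    public static void main(String[] args) {
        short[] speech = {1000, -2000, 30000};
        short[] music = {4000, 8000};
        System.out.println(java.util.Arrays.toString(mix(speech, music, 0.5)));
    }
}
```

Scaling only the music keeps the speech at full level, so the narration stays intelligible while the track plays underneath. Decoding compressed formats such as MP3 and encoding the result are outside the scope of this sketch.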

Time and place

The study was conducted from October 2018 to December 2018 at My Value Max Inc., located at Cavite State University Carmona Campus.

Scope and Limitations

With this application, the user can view and read the document while listening to the voice that reads the text file. Playback continues while the device is in Sleep Mode. The player can also modify the way a voice speaks by speeding up or slowing down the speech, changing the pitch, and changing the volume. The user can also choose background music to play while the application reads the document, including free classical music by composers such as Mozart, Beethoven, Bach, and Chopin. The user can also enable the option Add background music to the output file. With the Test Button, the user can listen to how the audio file sounds. The volume of the background music can be adjusted with the slider.

Definitions of Terms

The following terms as used by the researchers are operationally defined:

Audio Files refers to computer files that contain digitized audio either in the Compact Disc (CDDA) format or in MP3, AAC, or another compressed format.

E-Book Reader refers to handheld computer devices such as Amazon's Kindle, Barnes and Noble's NOOK, and Apple's iPad that make it possible for books in digital form to be viewed and read by users.

Human Sounding Voices refers to voices that resemble the human voice. Voice (or vocalization) is the sound produced by humans and other vertebrates using the lungs and the vocal folds in the larynx, or voice box. Voice is not always produced as speech, however. Infants babble and coo; animals bark, moo, whinny, growl, and meow; and adult humans laugh, sing, and cry.

iPod refers to a portable music player developed by Apple Computer that supports a wide variety of audio formats, including MP3, AAC, WAV, and AIFF.

PDA, short for personal digital assistant, refers to a hand-held device that combines computing, telephone/fax, Internet, and networking features. A typical PDA can function as a cellular phone, fax sender, Web browser, and personal organizer. PDAs may also be referred to as palmtops, hand-held computers, or pocket computers.

WAV refers to an audio file format (rarely called Audio for Windows), created by Microsoft, that has become a standard PC audio file format for everything from system and game sounds to CD-quality audio. A Wave file is identified by the file name extension WAV.

Text Speaker refers to software that reads the user's own text aloud in a choice of languages and voices. It is used for speech-enabling websites, giving a voice to online documents and mobile apps, and making online and offline content more accessible with text to speech.

Text to Speech, abbreviated as TTS, refers to a form of speech synthesis that converts text into spoken voice output. Text to speech systems were first developed to aid the visually impaired by offering a computer-generated spoken voice that would "read" text to the user. TTS should not be confused with voice response systems.

Chapter II REVIEW OF RELATED LITERATURE

Jianlei Xie et al. (2002) describe an E-book that comprises a memory device, a text-to-speech (TTS) module, a music module, and at least one speaker. The memory device stores files, which include text and music. The TTS module synthesizes speech corresponding to the text. The music module plays back the music. The speaker outputs the speech and the music.

Clark Quinn, professor, author, and expert in computer-based education, defined mobile learning as the intersection of mobile computing (the application of small, portable, and wireless computing and communications devices) and e-learning (learning facilitated and supported through the use of information and communications technology). He predicted that mobile learning would one day provide learning that was truly independent of time and place and facilitated by portable computers capable of providing rich interactivity, total connectivity, and powerful processing. In May 2005, Ellen Wagner, senior director of Global Education Solutions at Macromedia, proclaimed that the mobile revolution had finally arrived. Wherever one looks, evidence of mobile penetration is irrefutable: cellphones, PDAs, MP3 players, portable game devices, handhelds, tablets, and laptops abound. No demographic is immune from this phenomenon. From toddlers to seniors, people are increasingly connected and are digitally communicating with each other in ways that would have been impossible only a few years ago.

Music capabilities allow an E-book user to enjoy digital music output from the E-book. TTS capabilities allow an E-book user to listen to synthesized text output from the E-book. The combination of music and TTS allows an E-book user to listen to the text along with background music.

The majority of the evidence tends to support background music because of its positive effects. Cool, Yarbrough, Patton, Runde, and Keith (1994) conducted a study which found that radio noise was generally considered somewhat helpful to students while studying. It kept them focused and on task. Howard Gardner, a Harvard graduate, wrote Frames of Mind in the early 1980s. It has since become one of the most influential books in education. Gardner believes that music creates a positive and relaxing environment that allows sensory integration to take place and improves concentration. Sensory integration is essential for establishing long-term memory. He has also seen background music successfully used to mask outside traffic sounds, to release stress before an exam, and to reinforce subject matter (Campbell, 1997). Jensen (1998) reported that music can deliver as much as sixty percent more content in five percent of the time usually taken to deliver the same materials.

According to an article by Bossard, L. (2008), several solutions already use intelligent playlists embedded in music players installed on computers. There are also online solutions, the most popular of which is last.fm, which acts as a personalized radio station that plays preferred music. On the other hand, it does not allow playback of a specific track. There are also other solutions, such as the Genius function of iTunes or the Music Explorer; both use the user’s music collection to generate playlists. The biggest disadvantage of the latter solutions is that the user can use only tracks that he or she already has on his or her PC to generate playlists. This greatly limits the power of the algorithm.

Lorenzi (2007) proposes a way of representing the similarity between tracks in a 10-dimensional Euclidean space (hereafter called the music space), where the closeness of tracks is approximately proportional to their similarity. 7M songs currently appear in the database, but only 500K of them have enough user statistics to be mapped in the graph. Using this simplified and computationally efficient way of finding similar tracks, several applications can explore new ways of computing playlists. Most of them offer support in playlist generation, but none also provides the tracks to be played. This could be seen as a disadvantage because not everyone possesses all the tracks that are suggested by the space.
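The notion of finding similar tracks by closeness in a Euclidean music space can be illustrated with a short Java sketch. It is not Lorenzi's implementation: the class name `MusicSpace`, the method names, and the brute-force search are assumptions made for illustration, and the coordinates of each track are taken as given.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Minimal sketch: rank tracks by Euclidean distance to a seed track. */
final class MusicSpace {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the indices of the k tracks closest to the seed track, nearest first. */
    static int[] nearest(double[][] tracks, int seed, int k) {
        Integer[] order = new Integer[tracks.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort all track indices by their distance to the seed track.
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> distance(tracks[seed], tracks[i])));
        return Arrays.stream(order)
                .filter(i -> i != seed)      // the seed itself is not a suggestion
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

A playlist could then be built by starting from a seed track and repeatedly taking its nearest neighbours. The sketch works for any number of dimensions, including the ten described above; sorting every track is adequate for small collections, while a database of millions of songs would call for a spatial index instead.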

Klusacek [59] proposed a conditional pronunciation modeling method. It uses time-aligned streams of phones and phonemes to model a speaker’s specific pronunciation. The system uses phonemes drawn from a lexicon of pronunciations of words recognized by an automatic speech recognition system to generate the phoneme stream, and an open-loop phone recognizer to generate the phone stream. The phoneme and phone streams are aligned at the frame level, and the conditional probabilities of a phone, given a phoneme, are estimated using co-occurrence counts. A likelihood detector is then applied to these probabilities for the speaker detection task. This approach achieves relatively high accuracy in comparison with other phonetic methods in the SuperSID project at the Johns Hopkins 2002 Workshop [114] [90].

According to H. Gish et al. (1986), a majority of speaker models, including Gaussian mixture models, are based on modeling the underlying distribution of feature vectors from a speaker. When the speech is corrupted, the spectral-based features are also corrupted, and so their distributions are modified. Thus, a speaker model trained using speech from one type of corrupted environment will generally perform poorly in recognizing the same speaker using speech collected under different conditions, since the feature distributions are now different. Various studies of speaker recognition systems using degraded or distorted speech have shown a dramatic decrease in performance [47] [38]. Current speaker recognition research mainly focuses on recognition under controlled conditions, such as Switchboard telephone speech, which is close-talking speech. A large amount of effort is still needed in research on the robustness of speaker recognition under unconstrained conditions in open environments with distant microphones.
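The counting step of the conditional pronunciation model described above, estimating the probability of a phone given a phoneme from co-occurrence counts, can be sketched in Java as follows. This is only an illustration under simplifying assumptions, not Klusacek's system: the two streams are assumed to be already aligned frame by frame, no smoothing is applied, the likelihood detector is omitted, and the class name `PronunciationModel` and the label strings are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch: estimate P(phone | phoneme) from frame-aligned streams. */
final class PronunciationModel {

    /**
     * Returns relative-frequency estimates of P(phone | phoneme).
     * Keys of the result have the form "phoneme->phone".
     */
    static Map<String, Double> estimate(String[] phonemes, String[] phones) {
        if (phonemes.length != phones.length) {
            throw new IllegalArgumentException("streams must be aligned frame by frame");
        }
        Map<String, Integer> pairCounts = new HashMap<>();
        Map<String, Integer> phonemeCounts = new HashMap<>();
        for (int t = 0; t < phonemes.length; t++) {
            // Count how often each phone co-occurs with each phoneme.
            pairCounts.merge(phonemes[t] + "->" + phones[t], 1, Integer::sum);
            phonemeCounts.merge(phonemes[t], 1, Integer::sum);
        }
        Map<String, Double> probs = new HashMap<>();
        for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
            String phoneme = e.getKey().substring(0, e.getKey().indexOf("->"));
            probs.put(e.getKey(), e.getValue() / (double) phonemeCounts.get(phoneme));
        }
        return probs;
    }
}
```

For example, if the phoneme "t" is realized as the phone "t" in two frames and as "d" in one frame, the estimates are 2/3 and 1/3. A table of this kind, built per speaker, captures that speaker's pronunciation habits.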

Chapter III RESEARCH METHODOLOGY

This chapter discusses the research design, the selection of the participants, as well as the instrumentation and validation, data gathering procedures, and the treatment and analysis of data.

Materials

Various hardware and software were used for the study. A Windows-operated personal computer, a printer, and an 8 GB flash drive were the hardware utilized for the development of the study. For the software requirements, the following were used: Adobe Photoshop CC and Adobe Illustrator CS6 for the graphical user interface of the application; Java as the programming language; MySQL for the database; Sublime Text and Notepad++ for coding; Google Chrome, Torch r20, and Mozilla Firefox as the browsers for the study; and Microsoft Office 2010 to create the documentation.

Methods

The application design concerns the development of the Narrator E-book to Audiobook Converter application, with which the user can do the following:


Read documents by just listening.
Convert e-book files to audiobook files.
Change the GUI color scheme.
Change the background music.
Change the reader voice personality.
Change the mode (Day/Night Mode) in which the page is displayed.
Search for content in the document using keywords.
Auto-flag document pages and sections.
Read .PDF, .DOCX, and .TXT files from Google Drive.
Share the content of a book on a Facebook wall.
Set an alarm as a reminder to read a particular book in the future.
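As an illustration of one of the functions listed above, searching a document for content by keyword, the following minimal Java sketch returns the pages on which a keyword occurs. It is not the application's actual code: it assumes the document text has already been extracted into one string per page, and the class name `KeywordSearch` and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Minimal sketch: find the pages of a document that mention a keyword. */
final class KeywordSearch {

    /** Returns the 1-based numbers of pages whose text contains the keyword, ignoring case. */
    static List<Integer> pagesContaining(String[] pages, String keyword) {
        List<Integer> hits = new ArrayList<>();
        if (keyword == null || keyword.isEmpty()) {
            return hits; // nothing to search for
        }
        String needle = keyword.toLowerCase(Locale.ROOT);
        for (int i = 0; i < pages.length; i++) {
            if (pages[i].toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

The returned page numbers could be used to jump to a match or to start reading aloud from that page.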

SOFTWARE DEVELOPMENT MODEL: (WATERFALL MODEL)

The waterfall model is a popular version of the systems development life cycle model for software engineering. Often considered the classic approach to the systems development life cycle, the waterfall model describes a development method that is linear and sequential.

Waterfall development has distinct goals for each phase of development. Imagine a waterfall on the cliff of a steep mountain. Once the water has flowed over the edge of the cliff, gravity is in control, and water cannot run uphill. It is the same with waterfall development. Once a phase of development is completed, the development proceeds to the next phase, and there is little or no interplay between phases [12, 24] (Figure 1).

Figure 1. Definitions of different phases of the waterfall model. Source: CrackMBA. Waterfall Model, 2011. http://crackmba.com/waterfall-model/, accessed Nov. 2018.

Requirements

This is the first phase of the software development life cycle. Here we gather all the requirements that have to be fulfilled by the developed software application [12].

Design

After gathering the requirements, we will design the system according to the requirements gathered in the first phase. We use UML to document aspects of the design of the system [12].

Construction

This is the phase where we implement the actual system according to the design. It is also called the coding phase [12].

Testing

After the coding is finished, we will test the code using different testing methods. We will execute the code with a variety of tests until there are no errors. Once integration is done, we have to test the system again for proper functionality [12].

Installation

After testing, we have to deploy or install the application in the real-time environment to make use of it. The customer is involved in this deployment process and reviews the coding, testing, and execution. If the customer wants any changes, the application will be modified again [12].

Maintenance

Any issues that arise while the application is in use are handled in the maintenance phase. If the customers are not satisfied with the project after deployment, it will be modified again. The project team therefore maintains all these phases in consultation with the customers [12].
