LIST OF CONTENTS
S.No.   CONTENTS
01      Abstract
02      Introduction
03      History
04      Requirements of Hiding Digital Information
05      Steganographic Techniques
06      Related Work
07      Problem Statement
08      Scope Statement
09      Proposed Solution
10      Proposed Model
11      References
Ph.D. Thesis Proposal
By: KHAN FARHAN RAFAT
Abstract

“We can scarcely imagine a time when there did not exist a necessity, or at least a desire, of transmitting information from one individual to another in such a manner as to elude general comprehension.” [1]
With every passing day, more and more people are switching to on-line, “always on” communication to carry out their routine business and personal tasks. This rapid shift away from time-consuming and complex manual procedures has forced governments and businesses to offer services such as home shopping, banking, billing and taxation to the public, openly and on a 24/7 basis. The gigantic global network of interconnected computer systems, commonly known as the Internet and built from expensive hardware and software services, is a vital driver of this shift. However, these instantly available on-line facilities are closely tied to concerns about the availability, integrity, confidentiality and authentication of the information exchanged over communication media, which has led to the evolution of information hiding techniques such as cryptography and steganography for secure on-line and off-line communication. Steganography, which dates back to the time of the ancient Greeks, has also found its way into computer science and is used effectively either alone or together with cryptography.
1. Introduction

It is difficult to say how people communicated in prehistoric times; however, one may reasonably assume that the earliest forms were sketches, and that narrating those sketches led to an association between the sketches and their sounds. That narration might have paved the way for the evolution of natural languages (NL) [2]. The evolution of NL opened the door to the technological revolution, which has brought dramatic changes to the lives of people all over the world. One such change is the opening of the Internet to the public: originally developed for military use, it has grown into a gigantic global system of interconnected computer networks. Its military origin is only one indication of the security concerns attached to communication, which have been, and remain, a serious issue for the public and private sectors alike. Over the ages, various data hiding techniques have evolved to keep confidential information from falling into hostile hands. They can be classified into two broad categories, namely cryptography and steganography.

2. History

The word steganography is derived from Greek and means covered (or hidden) writing. While cryptography makes intelligible information unintelligible, steganography hides the very existence of that information. Before giving a brief history of the related work on text-based steganography, it is appropriate to define the terms frequently used in this context. The secret message to be hidden is referred to as the embedded data, and the innocuous text, audio or image used for embedding is called the cover. The output object that results from embedding is referred to as the stego-object. The key used in embedding the secret message is known as the stego-key. The sending and receiving ends must have agreed beforehand on a mutual key exchange protocol or mechanism. Figure 1 depicts the model of a secure steganographic system.

Figure 1 - Model of a Steganographic system

The recent interest in steganography should not be attributed to the news report published in USA Today in 2000, which stated that terrorists might be using steganography to conceal their secrets from law enforcement agencies. The history of steganography dates back to the fifth century B.C., when it was practised in Greece by the prisoners of King Darius. Another frequently quoted technique is the tattooing of a secret message on the shaved scalp of a slave. Once the slave's hair had grown back, he was sent to the destination, where his head was shaved again so that the concealed message could be read. The Germans showed great expertise during World War II: with the ‘microdot’ technique, messages were photographed and reduced to the size of a period (full stop) [3][4].

Digital technology has given steganography far broader scope to flourish than traditional ways of secret writing, such as writing messages with invisible ink made of onion juice, alum, ammonia salts and similar materials that darken when held over a flame, a technique used extensively by the British and the Americans during the American Revolution [4]. Today a variety of digital media such as audio, image, video and text provide a convenient way of hiding valuable information [5][6][7]. The fascinating attribute of digital text documents is that they are written, saved and retrieved by the personal computer exactly as they are seen by the naked eye. This is contrary to other digital file formats such as image, video and audio, where the data stored in the computer differs from what is presented to the viewer or listener. Various techniques exist for hiding data in a text file.
It is worth mentioning, however, that text steganography is considered the most challenging of all, since a text file offers almost none of the redundant data that other media provide for hiding information [7].

3. Requirements of Hiding Digital Information

A number of protocols and data embedding techniques exist that enable us to hide information in a given object. However, all of them must satisfy the following requirements for steganography to be applied correctly:
3.1 The integrity of the hidden information must remain intact after it has been embedded inside the stego-object.
3.2 The stego-object must remain unchanged, or almost unchanged, to the naked eye.
3.3 It is assumed that the attacker knows that secret data is hidden inside the stego-object.

4. STEGANOGRAPHIC TECHNIQUES

A number of digital media, including text, image, audio and video files together with other types, are being used for hiding secret information.

5. RELATED WORK

5.1. Acronym [8]

Table 1
Acronym    Translation
l8         Too late
ASAP       As Soon As Possible
C          See
CM         Call Me
F2F        Face to face
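As a minimal sketch of how pairs like those in Table 1 can carry bits, the Python below treats every phrase/acronym pair found in the cover text as one embedding slot. The bit assignment (full phrase = 0, acronym = 1), the lower-case spelling of the phrases and the function names `embed` and `extract` are assumptions made for illustration; they are not taken from [8].

```python
import re

# Illustrative (full phrase, acronym) pairs in the spirit of Table 1.
PAIRS = [
    ("as soon as possible", "ASAP"),
    ("call me", "CM"),
    ("face to face", "F2F"),
]

# Either form of a pair marks a slot that can carry one bit.
FORM_TO_PAIR = {form: pair for pair in PAIRS for form in pair}
SLOT = re.compile(r"\b(" + "|".join(map(re.escape, FORM_TO_PAIR)) + r")\b")


def embed(cover: str, bits: str) -> str:
    """Rewrite each slot as the full phrase (bit 0) or the acronym (bit 1)."""
    remaining = iter(bits)

    def choose(match):
        bit = next(remaining, None)
        if bit is None:  # message exhausted: leave later slots untouched
            return match.group(0)
        full, acronym = FORM_TO_PAIR[match.group(1)]
        return acronym if bit == "1" else full

    stego = SLOT.sub(choose, cover)
    if next(remaining, None) is not None:
        raise ValueError("cover text has too few slots for the message")
    return stego


def extract(stego: str) -> str:
    """Read one bit per slot: acronym -> 1, full phrase -> 0."""
    acronyms = {acronym for _, acronym in PAIRS}
    return "".join(
        "1" if m.group(1) in acronyms else "0" for m in SLOT.finditer(stego)
    )


cover = "Please call me as soon as possible so we can talk face to face."
stego = embed(cover, "101")
print(stego)           # Please CM as soon as possible so we can talk F2F.
print(extract(stego))  # 101
```

Single-letter entries such as "C" for "See" are left out of the sketch because they would collide with ordinary words; a practical implementation would need context to tell them apart.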
In this method, words are either replaced with their abbreviations or left in full to represent a binary zero or one corresponding to the bits of the secret information.

5.2. Change of Spelling [9]

Table 2
American Spelling    British Spelling
Favorite             Favourite
Criticize            Criticise
Fulfill              Fulfil
This method exploits the differences between the British and American spellings of words to hide the bits of secret information.

5.3. Semantic Method [14]

Table 3
Big       Large
Small     Little
Chilly    Cool
Smart     Clever
Spaced    Stretched
Synonym substitution is used to hide the binary bits of secret information. A substitution may represent a single bit or a combination of several bits.

5.4. HTML Tags [11][17]

HTML tags can be used in varying combinations, or with gaps and horizontal tabs inside them, to represent a pattern of secret information bits.

5.4.1 Using white space in tags
Stego key:
<tag>, </tag>, or <tag/> -> 0
<tag >, </tag >, or <tag /> -> 1
Stego data: <user ><name>Alice</name ><id >01</id></user>
Hidden Bit String: 101100
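The white-space-in-tags idea can be sketched in a few lines of Python. The sketch assumes that a tag written with white space just before its closing ">" carries a 1 and a tag without it carries a 0, and that the receiver knows the message length; the regular expression, the function names and the sample markup are illustrative assumptions rather than the scheme of [11] or [17].

```python
import re

# Matches simple start, end and self-closing tags. Group 2 is any white space
# sitting just before the closing ">" or "/>".
TAG = re.compile(r"<(/?[A-Za-z][^<>]*?)(\s*)(/?)>")


def embed(markup: str, bits: str) -> str:
    """Add (bit 1) or remove (bit 0) the space before each tag's closing bracket."""
    remaining = iter(bits)

    def rewrite(match):
        bit = next(remaining, None)
        if bit is None:  # message exhausted: leave the tag as it is
            return match.group(0)
        gap = " " if bit == "1" else ""
        return f"<{match.group(1)}{gap}{match.group(3)}>"

    stego = TAG.sub(rewrite, markup)
    if next(remaining, None) is not None:
        raise ValueError("not enough tags to carry the message")
    return stego


def extract(markup: str, length: int) -> str:
    """Read one bit per tag and keep the first `length` bits."""
    bits = ("1" if m.group(2) else "0" for m in TAG.finditer(markup))
    return "".join(bits)[:length]


cover = "<user><name>Alice</name><id>01</id></user>"
stego = embed(cover, "101100")
print(stego)              # <user ><name>Alice</name ><id >01</id></user>
print(extract(stego, 6))  # 101100
```

Because browsers ignore white space in that position, the rendered page is unchanged, but each 1 bit adds one character to the file.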
5.5. XML Document [19]

XML is a preferred format for exchanging data between web-based applications. User-defined tags are used to hide the actual message, or the placement of tags represents the corresponding bits of secret information. For example, to hide the bit string 01110, a stego key can map one tag form to 0 and another tag form to 1; the stego data is then the sequence of tags written in the forms selected by the bits.
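One way to picture such a key is to use the two equivalent spellings of an empty XML element. The Python sketch below assumes the mapping `<img ...></img>` = 0 and `<img .../>` = 1; the element name `img`, the file names and the helper `same_elements` are hypothetical choices made for this example and are not confirmed by [19].

```python
import re
import xml.etree.ElementTree as ET

# An empty <img> element in either spelling; group 2 tells the two apart.
EMPTY_IMG = re.compile(r"<img\b([^<>]*?)\s*(/>|></img>)")


def embed(xml_text: str, bits: str) -> str:
    """Spell each empty <img> as start/end pair (bit 0) or self-closing (bit 1)."""
    remaining = iter(bits)

    def rewrite(match):
        bit = next(remaining, None)
        if bit is None:  # message exhausted: keep the element unchanged
            return match.group(0)
        attrs = match.group(1)
        return f"<img{attrs}/>" if bit == "1" else f"<img{attrs}></img>"

    stego = EMPTY_IMG.sub(rewrite, xml_text)
    if next(remaining, None) is not None:
        raise ValueError("not enough empty elements to carry the message")
    return stego


def extract(xml_text: str) -> str:
    """Read one bit per empty <img> element."""
    return "".join(
        "1" if m.group(2) == "/>" else "0" for m in EMPTY_IMG.finditer(xml_text)
    )


def same_elements(a: str, b: str) -> bool:
    """True if both documents parse to the same tags and attributes."""
    def flat(doc):
        return [(e.tag, e.attrib) for e in ET.fromstring(doc).iter()]
    return flat(a) == flat(b)


cover = "<album>" + "".join(f'<img src="foo{i}.jpg"/>' for i in range(1, 6)) + "</album>"
stego = embed(cover, "01110")
print(extract(stego))               # 01110
print(same_elements(cover, stego))  # True
```

The final check illustrates why the change is inconspicuous: an XML parser sees the same elements and attributes in the cover and in the stego document.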
5.6. IPv4 [20]

Figure 2

Figure 2 shows how the IP (version 4) header is organized. Three unused bits have been marked (shaded) as places to hide secret information. One lies immediately before the DF and MF bits, and the other two are the least significant bits of the Type of Service field.

5.7. The Transport Layer [20]

Figure 3

Every TCP segment begins with a fixed-format 20-byte header, the 13th and 14th bytes of which are shown in Figure 3. The unused 6-bit field, shown shaded, can be used to hide secret information.

5.8. White Spaces [12][17]

Table 4
Original Text
Table 5
Encoded Text
In this technique, the spaces between words, sentences or paragraphs are used to represent bits of secret information.

5.9. Line Shifting [12][13][17]

Figure 4

This method hides information by shifting lines of text by a small amount to represent binary bits of secret information.

5.10. Word Shifting [13][17]

Figure 5

Here, the distance between words is altered to hide bits of secret information.
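In plain ASCII text, the white-space method of Section 5.8 and the word-shifting idea both come down to varying the gaps between words. The Python sketch below assumes one space encodes 0 and two spaces encode 1, and that the receiver knows the message length; these conventions and the function names are illustrative assumptions, not the exact schemes of [12], [13] or [17].

```python
import re


def embed(cover: str, bits: str) -> str:
    """Write one bit per inter-word gap: single space = 0, double space = 1."""
    words = cover.split()
    if len(bits) > len(words) - 1:
        raise ValueError("cover text has too few gaps for the message")
    pieces = [words[0]]
    for index, word in enumerate(words[1:]):
        wide = index < len(bits) and bits[index] == "1"
        pieces.append(("  " if wide else " ") + word)
    return "".join(pieces)


def extract(stego: str, length: int) -> str:
    """Read the first `length` gaps back as bits."""
    gaps = re.findall(r" +", stego)
    return "".join("1" if len(gap) >= 2 else "0" for gap in gaps[:length])


cover = "the quick brown fox jumps"
stego = embed(cover, "1011")
print(repr(stego))                # 'the  quick brown  fox  jumps'
print(extract(stego, 4))          # 1011
print(len(stego) - len(cover))    # 3
```

The last line makes the cost visible: the stego text grows by one character for every 1 bit embedded.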
5.11. Feature Coding [13]

This method hides the secret information bits by associating them with certain attributes of the text characters.

5.12. Miscellaneous techniques [10]

Information can also be hidden in a number of idiosyncratic ways, by modifying the text or by injecting intentional grammatical, word or sentence errors into it. The following techniques can be employed in this context:

5.12.1 Typographical errors - “tehre” rather than “there”.
5.12.2 Using abbreviations / acronyms - “yr” for “your” / “TC” in place of “Take Care”.
5.12.3 Transliterations - “gr8” rather than “great”.
5.12.4 Free form formatting - redundant carriage returns, irregular separation of text into paragraphs, or adjusted line sizes.
5.12.5 Use of emoticons for annotating text with feelings - “:)” to annotate a pun.
5.12.6 Colloquial words or phrases - “how are you and family” as “how r u n family”.
5.12.7 Use of mixed language - “We always commit the same mistakes again, and ’je ne regrette rien’!”.

5.13. MS Word Document [15]

This method uses the change tracking feature of MS Word to hide information, so that the stego-object appears to be a product of collaborative writing. The bits to be hidden are first embedded in degenerated segments of the cover document. The degenerated text is then revised, so that the document looks like an edited piece of work.
Figure 6
6. PROBLEM STATEMENT

6.1 All of the existing text-based encoding methods require either the original file or knowledge of the original file's formatting in order to decode the secret message.
6.2 Adding spaces between words and lines, adding HTML tags, or inserting data past the end-of-file mark increases the file length and is equally eye-catching.
7. SCOPE STATEMENT

7.1 To evolve a steganographic technique that results in a zero-overhead stego file; to date, no known method of hiding binary data in an ASCII text document produces a stego file whose length equals that of the cover text file.
7.2 To suggest enhancements to existing text-based steganographic techniques.
8. PROPOSED SOLUTION

The proposed solution will consist of:

8.1 Methods based on generating ASCII cover text corresponding to a given message.
8.2 A method based on altering a given ASCII text cover in order to encode the message in it (see Figure 7).
Figure 7
References

[1]. Code Wars: Steganography, Signals Intelligence, and Terrorism. Maura Conway. Knowledge, Technology and Policy (special issue entitled ‘Technology and Terrorism’), Vol. 16, No. 2 (Summer 2003): 45-62; reprinted in David Clarke (Ed.), Technology and Terrorism. New Jersey: Transaction Publishers (2004): 171-191.
[2]. Elements of Cartography, 6th Edition (Student edition). Arthur H. Robinson, Joel L. Morrison, Phillip C. Muehrcke, A. Jon Kimerling, Stephen C. Guptill. ISBN 9-81412638-1.
[3]. Steganography: is it becoming a double-edged sword in computer security? Miss K.I. Munro, University of the Witwatersrand.
[4]. Steganography. 2nd Lt. James Caldwell, U.S. Air Force, 2003, www.stsc.hill.af.mil, last accessed November 14, 2008.
[5]. Algorithms for Audio Watermarking and Steganography. Nedeljko Cvejic, University of Oulu, 2004.
[6]. Image Steganography: Concepts and Practice. M. Kharrazi, H. T. Sencar, N. Memon, National University of Singapore (2004).
[7]. Techniques for data hiding. W. Bender, D. Gruhl, N. Morimoto, and A. Lu. IBM Systems Journal, Vol. 35, Issues 3&4, pp. 313-336, 1996.
[8]. Text Steganography in SMS. Mohammad Shirali-Shahreza, M. Hassan Shirali-Shahreza. 0-7695-3038-9/07 © 2007 IEEE, DOI 10.1109/ICCIT.2007.100.
[9]. Text Steganography by Changing Words Spelling. Mohammad Shirali-Shahreza. ISBN 978-89-5519-136-3, Feb. 17-20, 2008, ICACT 2008.
[10]. Information Hiding Through Errors: A Confusing Approach. Mercan Topkara, Umut Topkara, Mikhail J. Atallah, Purdue University.
[11]. Adaptation of Text Steganographic Algorithms for HTML. Stanislav S. Barilnik, Igor V. Minin, Oleg V. Minin. 8th International Siberian Workshop and Tutorials EDM'2007, Session IV, July 1-5, Erlagol.
[12]. Document Marking and Identification using Both Line and Word Shifting. S. H. Low, N. F. Maxemchuk, J. T. Brassil, L. O'Gorman, AT&T Bell Laboratories, Murray Hill NJ 07974. 0743-166W95-1995 IEEE.
[13]. Research on Steganalysis for Text Steganography Based on Font Format. Lingyun Xiang, Xingming Sun, Gang Luo, Can Gan, School of Computer & Communication, Hunan University, Changsha, Hunan, P.R. China, 410082.
[14]. A New Synonym Text Steganography. M. Hassan Shirali-Shahreza, Mohammad Shirali-Shahreza. International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 978-0-7695-3278-3/08 © 2008 IEEE.
[15]. A New Steganographic Method for Data Hiding in Microsoft Word Documents by a Change Tracking Technique. Tsung-Yuan Liu and Wen-Hsiang Tsai, Senior Member, IEEE. 1556-6013 © 2007 IEEE.
[16]. Applied Cryptography, Second Edition: Protocols, Algorithms, and Source Code in C. Bruce Schneier. Wiley Computer Publishing, John Wiley & Sons, Inc. ISBN 0471128457, Pub Date: 01/01/96.
[17]. Steganography and Digital Watermarking. Jonathan Cummins, Patrick Diskin, Samuel Lau and Robert Parlett, School of Computer Science, The University of Birmingham, 2004.
[18]. Digital Watermarking and Steganography, Second Edition. Ingemar J. Cox, Matthew L. Miller, Jeffrey A. Bloom, Jessica Fridrich, Ton Kalker. Morgan Kaufmann Publishers, an imprint of Elsevier, 30 Corporate Drive, Suite 400, Burlington, MA 01803, USA. ISBN 978-0-12-372585-1.
[19]. Steganography: A New Horizon for Safe Communication through XML. Aasma Ghani Memon, Sumbul Khawaja and Asadullah Shah, Isra University, Hyderabad, Pakistan. Journal of Theoretical and Applied Information Technology, © 2005 – 2008.
[20]. An Analysis of Steganographic Techniques. Richard Popa.