Final year project
Objects in the Cloud
Revision: 185
Author: Geerd-Dietger Hoffmann
Supervisor: Ruth Pitman
May 21, 2009
Abstract
Cloud computing is rapidly gaining the interest of service providers, programmers and the public, as no one wants to miss the new hype. While there are many theories on how the cloud will evolve, no real discussion of its programmability has taken place yet. This paper describes a programming language named objic, which enables programs to run in a distributed manner in the cloud. This is done by creating an object orientated syntax and interpretation environment that can create objects at various distributed locations throughout a network and address them in a scalable, fault tolerant and transparent way. This is followed by a discussion of the problems faced and an outlook into the future.
Legal
Copyright The copyright is held by Hoffmann Geerd-Dietger (May 21, 2009). This paper is licensed under the Creative Commons “Attribution-Share Alike 2.0 UK: England & Wales” License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/uk/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. The code is published under the P-BSD license, which can be found in Appendix B on page 70.
Clarification This document reflects solely the opinion and views of the author stated above and does not represent the views, opinions or standpoint of the University of Bournemouth in any way or form. For simplicity this paper always uses the masculine form. This has nothing to do with the gender of the people being discussed. Apologies if this offends the reader.
University rights This report is submitted in partial fulfilment of the requirements for an honours degree at the University of Bournemouth. The author declares that this report is their own work and that it does not contravene any academic offence as specified in the university regulations. Permission is hereby granted to the University to reproduce and to distribute copies of this report in whole or in part. Signature: Hoffmann Geerd-Dietger, Bournemouth May 21, 2009
Word count: 16481
Acknowledgments
I would like to acknowledge and extend my gratitude to Ruth Pitman for her support, her dedication towards the students and her ongoing advice. I would also like to thank my family; without their understanding, patience and encouragement I would not be where I am now in life. I am also grateful to my girlfriend for her unconditional love, even in rough times. Finally, my appreciation goes to Dan, Tom, Edd, Dave, Elliot, Ivan, Cornelius, Laurie, David and everyone who hoped his name would be here.
Contents
Abstract . . . i
Legal . . . ii
Acknowledgments . . . iii
1 Introduction . . . 1
1.1 Statement of Problem . . . 1
1.2 Objectives . . . 2
1.2.1 Project Objectives . . . 2
1.2.2 Personal Objectives . . . 2
1.3 Methods . . . 2
1.3.1 Project Aim . . . 2
1.3.2 Tools Used . . . 3
1.3.3 Programming Language . . . 4
1.4 Constraints . . . 5
1.4.1 Time . . . 5
1.4.2 Change . . . 5
1.4.3 Knowledge . . . 5
1.4.4 Choice of Language . . . 5
1.4.5 Operating System . . . 5
1.5 Good Practise . . . 6
1.5.1 Licensing . . . 6
1.5.2 Backup Strategy . . . 6
1.5.3 Documentation . . . 7
1.6 Development Process . . . 7
1.7 Layout . . . 8
2 Literature review . . . 10
2.1 Definition: “What is the Cloud?” . . . 10
2.2 Definition: “What is an Object” . . . 12
2.3 Objects in the Cloud . . . 13
2.4 Compilers . . . 13
2.5 Similar Technology . . . 15
2.5.1 SOAP . . . 15
2.5.2 CORBA . . . 15
2.5.3 dSelf . . . 17
2.6 Solution Discussion . . . 18
3 Requirements . . . 19
4 Design of System . . . 21
4.1 Discussion on Antlr . . . 21
4.2 Grammar Description . . . 22
4.3 Standard Variables . . . 23
4.3.1 ME keyword . . . 23
4.3.2 ARGS Keyword . . . 24
4.4 objic Class Structure . . . 25
4.5 Object Location Specification . . . 25
4.6 ObjectServer . . . 27
4.7 Security . . . 27
4.8 Object Communication . . . 28
4.9 Object State . . . 30
4.10 Garbage Collection . . . 30
4.11 Known Problems . . . 31
4.11.1 Object Version Problem . . . 31
4.11.2 Inheritance . . . 31
4.11.3 Local / Public variables, methods, classes . . . 31
5 Implementation . . . 33
5.1 Discussion of Programs . . . 33
5.1.1 objectServer . . . 33
5.1.2 objicc . . . 33
5.1.3 orun . . . 34
5.2 Object Server Class Structure . . . 34
5.3 Class descriptions . . . 36
5.3.1 Initializer . . . 36
5.3.2 Interpreter . . . 37
5.3.3 MethObj . . . 37
5.3.4 ObjConnection . . . 37
5.3.5 ObjManager . . . 38
5.3.6 oClass . . . 38
5.3.7 oObject . . . 39
5.3.8 oInt . . . 40
5.3.9 oString . . . 40
5.3.10 oVm . . . 41
5.3.11 RequestHandler . . . 41
5.4 Logging . . . 42
5.5 Error Management . . . 42
5.6 VOID . . . 43
5.7 Duck-Typing . . . 43
5.8 Abstract Syntax Tree . . . 43
5.9 Keywords . . . 44
5.10 Implementation Problems . . . 45
5.11 Finished Artifact . . . 46
5.11.1 Documentation . . . 46
5.11.2 Installation . . . 46
6 Testing . . . 47
6.1 Strategy . . . 47
6.2 Plan . . . 47
6.3 Code Analysis . . . 48
6.4 Code Coverage . . . 50
6.5 Recording Faults . . . 50
7 Critical Evaluation / The objic Language . . . 52
7.1 Evaluation of Language . . . 52
7.1.1 Language Syntax . . . 53
7.1.2 Language Behaviour . . . 54
7.2 Research . . . 54
7.3 Development Methodology . . . 55
7.4 Implementation . . . 55
7.5 Testing . . . 55
7.6 Project Plan . . . 56
7.7 Personal Performance . . . 56
8 Future Work . . . 57
8.1 Short Term . . . 57
8.2 Long Term . . . 58
9 Conclusion . . . 61
10 List of Abbreviations . . . 63
A Appendix . . . 69
B License . . . 70
C Antlr Syntax . . . 71
D Gantt Chart . . . 76
E INSTALL . . . 77
F CD Content . . . 79
G Backup Script . . . 80
H For Loop . . . 81
I SOAP/HTTP Comparison . . . 82
J Programming the Cloud . . . 83
K Grammar Description . . . 87
K.0.1 block . . . 87
K.0.2 call . . . 87
K.0.3 classdef and methdef . . . 88
K.0.4 whileloop . . . 88
K.0.5 forloop . . . 89
K.0.6 newvar . . . 89
K.0.7 paramlist . . . 90
K.0.8 stat . . . 91
K.0.9 NAME . . . 91
K.0.10 Comments . . . 92
L Man Pages . . . 93
M Code Example . . . 97
N Design Diagrams . . . 99
“The World Wide Computer” Nicholas Carr
1 Introduction
This chapter introduces the reader to the central problem discussed in this paper, followed by the approach and methods used to solve it.
1.1 Statement of Problem
Cloud computing is seen to bring together many services that are provided through the “world wide computer” [Carr, 2009]. A trend towards multifunctional environments is currently taking place at the operating system kernel level, encouraged by new virtualization techniques (see Xen, VMware, OpenBox). At the other end, on the highest level of abstraction, object orientated notations and ideas are mostly used [Hayes, 2008]. Generally, once a cloud provider is chosen, a lock-in to their techniques and libraries occurs. Service compatibility is then achieved by adding specific output filters to the program (see SOAP, REST in section 2.5.1 on page 15), which emulate object usage. As a result, every Software as a Service (SaaS) provider creates its own format. Any program that wants to communicate with such a service then has to retrieve this information, parse it and create local object representations. This creates many difficulties, especially when the format has to change [Emmerich, 2000]. Through these methods both ends of a cloud service stack have become scalable, or in a nutshell “cloud enabled” [Beard, 2008]. The important layer of compilers and interpreters, and with it the program constructs, has however been neglected in the past few years. As a consequence, a programmer who wants to use other services of a cloud provider still has to include a provider-specific library or write the interface himself [Haggholm, 2007]. Efforts to make compilers and/or interpreters more “cloud friendly” have only resulted in incomplete products (see dSelf in section 2.5.3 on page 17), which are not generally used. The success of SOAP and of the object orientated paradigm shows that an object oriented distribution approach offers many advantages for the cloud, but it has not yet been implemented in the layer of programming languages.
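The following minimal Python sketch illustrates the format problem described above. It assumes two hypothetical providers that return the same contact record, one as JSON and one as XML with different field names. Before the client can work with a local object, it has to write and maintain one parser per provider. The provider payloads and the `Contact` class are invented for illustration and do not describe any real service.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Contact:
    """Local object representation the client actually wants to work with."""
    name: str
    email: str


# Hypothetical payloads: the same information in two provider-specific formats.
PROVIDER_A_PAYLOAD = '{"contact": {"fullName": "Ada", "mail": "ada@example.org"}}'
PROVIDER_B_PAYLOAD = "<person><name>Ada</name><email>ada@example.org</email></person>"


def contact_from_provider_a(payload: str) -> Contact:
    # Parser tied to provider A's JSON layout and field names.
    data = json.loads(payload)["contact"]
    return Contact(name=data["fullName"], email=data["mail"])


def contact_from_provider_b(payload: str) -> Contact:
    # A second parser, tied to provider B's XML layout and tag names.
    root = ET.fromstring(payload)
    return Contact(name=root.findtext("name", default=""),
                   email=root.findtext("email", default=""))


if __name__ == "__main__":
    print(contact_from_provider_a(PROVIDER_A_PAYLOAD))
    print(contact_from_provider_b(PROVIDER_B_PAYLOAD))
```

Both parsers build an equal `Contact`. Suppose provider B later renamed its `email` tag. `contact_from_provider_b` would then quietly return an empty address. This is the kind of breakage on format change referred to above.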
CHAPTER 1. INTRODUCTION
1.2 Objectives
1.2.1 Project Objectives
• The project will create an innovative, Turing complete [Abelson, 2001], object orientated programming language that enables and promotes the distribution of objects throughout a network. The core principle of the language is that the code looks the same whether an object is initialised locally or on an unknown resource indicated by a URL (Uniform Resource Locator). The syntax of the language should seem familiar to any C, Python or Java programmer.
• Provide the basis for a discussion of whether and how distributed objects can be used for cloud programming purposes.
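The core principle of location transparency can be sketched in Python with a proxy object that forwards method calls. This is a minimal sketch under stated assumptions and not how objic is implemented. The in-process dictionary `REMOTE_OBJECTS` stands in for the network. `RemoteProxy`, `instantiate`, `Counter` and the example URL are all hypothetical names.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical stand-in for the network: maps an object URL to the object that
# "lives" on another machine. A real system would send a request over the
# network instead of looking the object up in this dictionary.
REMOTE_OBJECTS: Dict[str, Any] = {}


class RemoteProxy:
    """Forwards every method call to the object registered under a URL."""

    def __init__(self, url: str) -> None:
        self._url = url

    def __getattr__(self, name: str) -> Callable[..., Any]:
        # Only called for attributes not found on the proxy itself.
        def forward(*args: Any, **kwargs: Any) -> Any:
            target = REMOTE_OBJECTS[self._url]  # would be a network round trip
            return getattr(target, name)(*args, **kwargs)
        return forward


class Counter:
    """Example class whose instances may live locally or remotely."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self, step: int = 1) -> int:
        self.value += step
        return self.value


def instantiate(cls: type, location: Optional[str] = None) -> Any:
    """Create `cls` locally, or at `location` (a URL) behind a proxy."""
    if location is None:
        return cls()
    REMOTE_OBJECTS[location] = cls()
    return RemoteProxy(location)


if __name__ == "__main__":
    local = instantiate(Counter)
    remote = instantiate(Counter, "http://node1.example.org/objects/counter")
    # Identical call syntax, regardless of where the object lives.
    print(local.increment(2), remote.increment(2))
```

The only place where the location appears is the instantiation call. Every later method call is written the same way for both objects, which is the property the objective asks of the language syntax.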
1.2.2 Personal Objectives
• Gain a sound understanding of compilers, interpreters and the technology involved.
• Understand the issues and problems associated with distributed computing and try to find solutions.
• Define cloud computing and gain knowledge about the general topic.
• Become familiar with Python and the tools linked to it.
1.3 Methods
1.3.1 Project Aim
The project tries to create a novel object orientated programming language that acts as a layer of glue between the hardware offered by cloud providers and the presentation of the user interface, where objects are already emulated and used. It should be possible to use an array of services provided in the cloud, through published objects, in an independent and transparent way. The language should further encourage people to offer services to other users by letting them instantiate the objects they have written. In the current situation, if someone has written a good encryption library, for example, he is forced to use non-standard methods to write a web service that makes this library usable. The language created in this project should enable and encourage publishing such a library through a well-defined interface, while securing the intellectual property by keeping all execution on the server. A further aim is to make it easy to incorporate services from different providers in a scalable, fault tolerant and traceable way. To the author's knowledge, no attempt has been made so far to implement anything in this way. A discussion of similar techniques is therefore needed to enable an objective perspective. This evaluation will be followed by an outlook into the future.
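The publishing scenario above can be approximated today with XML-RPC from the Python standard library. XML-RPC is an existing RPC technique related to the SOAP approach of section 2.5.1 and is not part of objic. In the minimal sketch below, a toy Caesar cipher stands in for the hypothetical encryption library. It shows the property the project aims for: the client only sees the interface, while all execution stays on the server. Unlike the goal of this project, the client still has to know that it is talking to a remote service.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class CaesarCipher:
    """Toy stand-in for an 'encryption library'; its code stays on the server."""

    def encrypt(self, text: str, shift: int) -> str:
        # Shift lowercase ASCII letters, leave everything else unchanged.
        return "".join(
            chr((ord(c) - ord("a") + shift) % 26 + ord("a")) if "a" <= c <= "z" else c
            for c in text
        )


def publish(obj: object, host: str = "127.0.0.1", port: int = 0) -> SimpleXMLRPCServer:
    """Expose the public methods of `obj` over XML-RPC in a background thread."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(obj)  # methods starting with "_" are not exposed
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = publish(CaesarCipher())
    port = server.server_address[1]  # port 0 lets the OS choose a free port
    client = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}/")
    print(client.encrypt("hello", 3))  # khoor
    server.shutdown()
    server.server_close()
```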
1.3.2 Tools Used
(All links were accessed on May 21, 2009)
• ANTLR (http://www.antlr.org)
ANTLR (ANother Tool for Language Recognition) is a parser and lexical analyzer generator. It will be used to generate an AST (Abstract Syntax Tree) from the source code. ANTLR uses LL(*) parsing and has proven highly reliable in many industry projects. The syntax is specified in an EBNF-like (Extended Backus-Naur Form) notation. Different tree walking algorithms will then be used to optimize and execute the code statements.
• Subversion (http://subversion.tigris.org)
Subversion (SVN) is a version control system. It can easily manage modifications, recovery and versioning of files, and it is considered one of the industry standards next to the Concurrent Versions System (CVS). It will be used to keep track of project changes and to synchronise everything across different computer systems. Everything produced in the course of the final year project will be imported into this system. Once the report is finished, this will also allow other contributors to add their code and ideas to the project.
• Trac (http://trac.edgewall.org)
Trac is a project management tool that supplies a wiki, issue tracking, a roadmap and an SVN front-end. It will be used to record the project milestones and to create a wiki of pages that have influenced decisions, and its timeline will be used to estimate the completion of tasks. It will also be used to record faults and errors that need to be fixed in the written report and in the source code.
• Eclipse (http://www.eclipse.org)
Eclipse is an integrated development environment (IDE), which will be used with the PyDev plug-in to write the source code. The main features that influenced the decision to use this tool were automatic code completion, error checking, test coverage checking and platform independence.
• GNU/Linux, CentOS (http://www.kernel.org, http://www.centos.org)
Linux is an Open Source Unix-like operating system that uses the GNU libraries and programs to create a fully functional operating environment. There are many Linux flavours; one of them is CentOS, which is derived from the Red Hat Enterprise Linux distribution. Linux offers a full development environment and runs all the programs needed to develop this project.
• LaTeX (http://www.latex-project.org)
LaTeX is a typesetting environment based on TeX. The idea behind TeX is that the author should concentrate on the content and not on the mark-up. It can also automatically generate indexes, bibliographical references and content pages.
• pylint (http://www.logilab.org/857)
Pylint is a static source code analysis tool. It examines the program source and tries to find errors and coding standard violations. It also looks for code smells [Fowler et al., 1999a]. A very useful feature is that it rates the code with a mark out of 10. This will be used to analyse the code quality.
• PyChecker (http://pychecker.sourceforge.net)
PyChecker is a dynamic runtime checker. It executes the code and looks for errors that may occur at run time but are not caught beforehand, because Python is a very dynamic language. It is very useful during development but will not help in evaluating the code.
• Tools also used
Vi, Emacs
All benchmarks obtained for this paper were created by running the program in question 25 times and taking the average execution time (using the atime program). The system used was an AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ with 2 GB of main memory running CentOS 5.3 GNU/Linux.
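The ANTLR entry in the tool list above describes a two-step pipeline. First a parser turns the source text into an AST, and then a tree walker executes it. The minimal Python sketch below illustrates the same idea by hand for a tiny arithmetic grammar, which is given in EBNF-like notation in the comments. It is a hand-written LL(1) recursive-descent parser, not code generated by ANTLR. The grammar and the `Num`/`BinOp` node classes are hypothetical and unrelated to objic's own grammar.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional, Union

# Grammar in EBNF-like notation (hypothetical, for illustration):
#   expr   : term (('+' | '-') term)* ;
#   term   : factor (('*' | '/') factor)* ;
#   factor : NUMBER | '(' expr ')' ;


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: Node
    right: Node


Node = Union[Num, BinOp]


class Parser:
    """Hand-written LL(1) recursive-descent parser producing an AST."""

    def __init__(self, src: str) -> None:
        self.tokens = re.findall(r"\d+|\S", src)  # numbers or single symbols
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self) -> Node:
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def expr(self) -> Node:
        node = self.term()
        while self.peek() in ("+", "-"):  # the loop gives left associativity
            op = self.take()
            node = BinOp(op, node, self.term())
        return node

    def term(self) -> Node:
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self) -> Node:
        tok = self.take()
        if tok.isdecimal():
            return Num(int(tok))
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(node: Node) -> int:
    """Tree walker: evaluate both children first, then apply the operator."""
    if isinstance(node, Num):
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    if node.op == "+":
        return left + right
    if node.op == "-":
        return left - right
    if node.op == "*":
        return left * right
    return left // right  # integer division keeps the sketch integer-only


if __name__ == "__main__":
    tree = Parser("2 + 3 * (4 - 1)").parse()
    print(tree)
    print(evaluate(tree))  # 11
```

Keeping parsing and evaluation separate means that further tree walkers can be added over the same AST without touching the parser, for example an optimiser that folds constants before execution.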
1.3.3 Programming Language
Compilers and interpreters have traditionally been written in C [Aho et al., 2006], and the implementations of all major programming languages still are (Java, C, Python, etc.). However, in recent years alternative languages have been eroding this monopoly, partly because they are taught at universities [Appel, 2002]. Because of the time constraints, the decision was made to write the project in Python. Python is a multipurpose, multi-paradigm, high level, object oriented language. By enabling rapid application development and providing many built-in modules (see http://www.Python.org), Python increases the output a programmer can produce [Ousterhout, 1997].
1.4 Constraints
1.4.1 Time
Given the complexity of the problem and the amount of research needed, time can be considered one of the toughest constraints. Because of this, the application will merely be a proof of concept and will not include any run time optimizations. It can be assumed that the decision to use Python as the implementation language will also drastically influence the run time of the compiler and interpreter. As the time frame is so narrow, only a minimal Turing complete language will be created, with few libraries at its disposal. A Gantt chart was used to document progress and to account for slack and the tasks ahead (see Appendix D on page 76).
1.4.2 Change
As the topic is still an active field of research and not well understood, the background reading and design are likely to change while the report and artefact are being developed. It can be expected that new techniques will be published while work on the project progresses. For this reason, the project will be developed using an iterative approach (see “Development Process” in section 1.6 on page 7).
1.4.3 Knowledge
As many of the areas the compiler relies on are not yet fully researched, some of the hurdles encountered will be difficult to clear. Distributed memory management, for example, is not well understood but has to be performed. This may constrain the project, as some solutions may have to be implemented in an incomplete form.
1.4.4 Choice of Language
Based on previous research, Python was chosen as the implementation language, which might cause some problems. Python is known to be less memory efficient than C, and program execution time might increase because certain constructs cannot be optimized. As execution time and memory size are not vital for the success of the project, this is only a small constraint.
1.4.5 Operating System
As all development is done on UNIX/Linux machines, it cannot be assumed that the software will run on any other operating system.
1.5 Good Practise
It is an aim to comply with the BCS code of good practice (see http://www.bcs.org/upload/pdf/cop.pdf) throughout the whole project.
1.5.1 Licensing
Great care has been taken throughout this project to use only Open Source Software. No “non-free” software has been used to create any part of the project. This is also reflected in the fact that the software produced is licensed under the 5 paragraph P-BSD (Pacifist Berkeley Software Distribution) license, a derivative of the original BSD (Berkeley Software Distribution) license. The BSD license is one of the oldest licences around (released 1990) and is considered to be quite close to placing the software in the public domain: it allows proprietary use, and derived source code does not have to be made public. The licence used in this project has the following main points:
• Copyright is held by Geerd-Dietger Hoffmann.
• The copyright notice must not be removed from the code.
• The binary form must include the copyright notice.
• Advertisements mentioning the software must contain an acknowledgment.
• The author holds no responsibility for any damages caused by the software.
• The software must not be used to harm any other human being.
The full license is printed in Appendix B on page 70.
1.5.2 Backup Strategy
As mentioned above, all work produced was held in an SVN repository. With this approach a backup is made with every repository update, as all the data is copied to, and updated in, the working copies held on all computers the work was done on. To further strengthen the backup strategy, the script found in Appendix G on page 80 was used. It first checks that everything is up to date and exits with a warning if not; otherwise it copies all the data to the European Organization for Nuclear Research (CERN) and to another off-site location. This strategy makes it possible to hold seven copies of the data in geographically and system independent locations.
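The backup procedure above can be outlined as follows. This is a minimal Python sketch under stated assumptions and not the script from Appendix G. It treats empty `svn status` output as "no uncommitted local changes" and does not query the repository for newer revisions. It also copies to local destination directories, whereas the real script copies to CERN and another off-site location. The function names and paths are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Iterable


def is_clean(svn_status_output: str) -> bool:
    """True if `svn status` listed no modified, added, deleted or unversioned items."""
    return not any(line.strip() for line in svn_status_output.splitlines())


def copy_to_destinations(source: Path, destinations: Iterable[Path]) -> None:
    """Copy the whole working copy into each destination directory."""
    for dest in destinations:
        shutil.copytree(source, dest / source.name, dirs_exist_ok=True)


def backup(working_copy: Path, destinations: Iterable[Path]) -> None:
    status = subprocess.run(
        ["svn", "status", str(working_copy)],
        capture_output=True, text=True, check=True,
    ).stdout
    if not is_clean(status):
        # Mirror the described behaviour: warn and stop instead of copying.
        sys.exit("warning: uncommitted changes found, commit before backing up")
    copy_to_destinations(working_copy, destinations)


if __name__ == "__main__":
    # Hypothetical paths; the real destinations are remote machines.
    backup(Path("~/fyp").expanduser(), [Path("/mnt/offsite1"), Path("/mnt/offsite2")])
```

Refusing to copy a dirty working copy ensures that every off-site copy corresponds to a committed revision, so all backups can be matched against the repository history.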
1.5.3 Documentation
Documentation has two roles in this project: Firstly the code should be well documented and secondly the external project or program documentation should be concise. In both cases it is important to have precise descriptive comments and all written material should be easy to understand.
1.6 Development Process
As the topic is of a highly complex nature, an iterative development process was chosen. This approach [Bittner and Spence, 2006] consists of four main stages and builds upon reworking, refactoring [Fowler et al., 1999b] and extending the existing source code. The Waterfall and V-Model methodologies were considered but not used, as they were seen to be too inflexible. The four main stages can be repeated as many times as needed and build upon each other, so that the output of one iteration is the input of the next. The SVN tools were very useful here, as they made it possible to log the output of every iteration and follow the continuous change of the project. The four stages can be defined as:
1. Requirement gathering
In this stage it has to be clarified which part or artefact should be produced in this iteration. The general guideline was that no iteration should take longer than one day. After defining the piece of code to be written, a short description of its functionality was written as source code comments.
2. Design + Implementation
In this stage the modules that needed changing were identified and comments were placed in the appropriate locations in the code. Further, new files were created and filled with comments describing the functionality. After reviewing the changes and their implications for the system, the comments were replaced with code.
3. Testing
As the iterative steps were well defined, testing was done with a small input file that was extended to include the functionality added.
4. Reworking
After verifying the correctness of execution, Pylint and PyChecker (see section 1.3.2 on page 4) were used to evaluate the quality of the code, and optimization possibilities were explored. Further, the code was reviewed before committing it to the source tree. This was done with a shell script that showed all changes made to the code since the previous iteration.
The iterations can be grouped into the following consecutive main steps:
1. Parsing
2. Interpreting
3. Object instantiation
4. Object Server
5. Distributed object instantiation
6. Base object
7. Classes
8. Stacks
9. Conditionals
10. Memory management
11. While
12. Break
13. For
14. User functions
15. Return
16. Everything running in Object server
17. User classes
1.7 Layout
• Chapter 1 on page 1
This chapter gives a brief introduction to the problem domain and the methods used to solve it.
• Chapter 2 on page 10
This chapter explores the literature associated with the project and explains the major terminology used.
• Chapter 3 on page 19
Some very high level requirements are discussed in this chapter.
• Chapter 4 on page 21
This chapter introduces the design decisions taken and discusses the issues involved.
• Chapter 5 on page 33
This chapter introduces the implementation details and describes the class structure in more detail.
• Chapter 6 on page 47
In this chapter the testing strategy is introduced and the methods used are described.
• Chapter 7 on page 52
A critical evaluation of the program and of the personal performance is carried out in this chapter.
• Chapter 8 on page 57
This chapter provides an outlook into the near and far future.
“It does not matter how many books you have, but how good the books are which you have” Seneca
2 Literature review
This chapter starts by explaining the terms that make up the paper title, followed by an introduction to similar technologies and schools of thought, and concludes with a discussion of the proposed solution.
2.1 Definition: “What is the Cloud?”
Cloud computing is said to be one of the biggest shifts ever seen in the way computers are used [Carr, 2009], but first it has to be clarified what “the cloud” stands for and how a cloud can compute. The term “cloud” stems from the cloud symbol used for the internet in network diagrams, which represents a large number of anonymous, interlinked computers [Miller, 2008] (Figure 2.1).
Figure 2.1: A typical network diagram using a cloud In essence this means that a “cloud” of computers and/or servers acts and reacts as a single computer [Breitter and Behrendt, 2008]. These computers can be owned by a big company and as such be housed in big server farms, can be personally owned home machines or virtualized resources [Buyya et al., 2008]. The important thing is that this conglomerate of machines can be accessed via the internet. Lots of synonyms have been associated with the cloud like 10
2.1. DEFINITION: “WHAT IS THE CLOUD?” Utility Computing (UC), Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) [Armbrust et al., 2009]. To discuss the topic in more detail, the ambiguous term “cloud computing” has to be divided into two categories: • Storage Data storage forms the base of all computing, this is one of the main requirements to be able to process anything. In terms of cloud computing “cloud storage” can be defined as data being saved on multiple third party servers [Beard, 2008]. The storage appears to the user as one coherent block of space that he has for his use. One of the most used storR service, which charges the user age providers is the Amazon S3 (Simple Storage Service)
dynamically based on usage by a metric consisting of upload/download and data held. For the user, the storage seems unlimited and is only bound by the amount of money he can pay. The user does not know where the data is housed, and it is known that Amazon holds redundant copies in different countries (http://aws.amazon.com/). This, of course, holds some risks for companies as laws and regulations might change from country to country, but this issue will not be discussed here. Further, as currently seen with the Google document error, by uploading data into the cloud it might become involuntarily accessible to the whole world (http://googledocs.blogspot.com/2009/03/on-yesterdaysemail.html). However the feature of being able to share all documents that “live” in the cloud is seen as one of the great advantages [Hayes, 2009]. By uploading data into a cloud storage service, data security (loss, corruption, access) is outsourced to the storage provider. There are many such storage providers but they all conform in that they offer online accessible storage with the actual implementation hidden from the user [Hayes, 2008]. • Software Cloud programs are very similar to Software as a Service in that they are hosted online and mostly accessible through a web browser. However, they are different in the respect that the underlying hardware is not always provisioned by the creator of the service. Software as a service is a well researched area [Menken, 2008], but Utility Computing is just at the beginning. By having the administration of the services outsourced maintenance and software installation are greatly simplified. There are two main areas of thought here: The first one is the way Amazon is taking. It is possible to buy time on a virtual machine which can then be installed and configured as needed. If the service needs more calculation power more time can be bought on that virtual machine. 
Further horizontal scaling, with only money posing as a boundary, is possible by adding machines through a web interface. The other school of thought is the approach Google® and Microsoft® are proposing (see http://googleenterprise.blogspot.com/2009/04/what-we-talk-about-when-we-talk-about.html): here the developer has to write his program in a special programming environment using vendor-specific libraries. This makes maintenance, scalability and installation very easy, as they are independent of the users (see http://www.ibm.com/developerworks/linux/library/l-cloud-computing).
So cloud computing can be defined by accessibility on the internet, mostly through a browser; a nearly infinite pool of resources; horizontal scalability; and dynamic payment [Foster et al., 2008]. Finally, it has to be stated that cloud computing is not the same as grid computing. Clouds may have properties similar to the grid and can internally use grid software to manage the underlying architecture, but the cloud consists of a stack of services, whereas grid computing is one layer of this stack [Delic and Walker, 2008].
2.2 Definition: “What is an Object”
In “traditional languages” like C or FORTRAN, a data structure is designed and then functions are created to modify this structure in some predefined way. There is a clear separation between a data structure and a function [Holmes, 1994]. Procedural programming can be seen as a list of statements to execute on a data structure. This methodology is changed in the object oriented way of thinking [Parnas, 1972]. Here the data structure and the functions are shielded or “encapsulated” by an interface from any other part of the program [Parsons, 1997]; the internal state and structure are “hidden” [Kuechlin and Weber, 2000]. The object is modified by its publicly invokable methods and remembers the state it is in after those methods are executed. Objects normally correspond to real-life artefacts and try to model their behaviour [Jacobson, 1992]. A “car” object, for example, might have a “drive” method, but adding another wheel would not be possible, as there is no method provided to do so. An object oriented program can be imagined as many objects calling methods on each other and having more or less “intimate” relationships; an object may also consist of a range of other objects [Aho et al., 2006]. Many objects have the same “blueprint” or internal structure, so they can be grouped into a “class”. A class is the definition of an object, which is then instantiated to be able to hold the state. A class might be called “Car”, where the instances are “Fred’s VW”, “Bob’s Ford” and “Didi’s Aston Martin”. Polymorphism and inheritance are not discussed here, as this would extend the topic too much. Good literature about this is [Holmes, 1994] [Abelson, 2001] [Arlow and Neustadt, 2005] [Pilone and Pitman, 2005].
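The class/instance distinction can be made concrete with a short sketch. Python is used here only for illustration (it is not objic), and the `Car` class, its `drive` and `odometer` methods and the distances are hypothetical. There is deliberately no method for adding a wheel, so the interface alone decides what can be done to the object. Note that Python only hides state by convention (the leading underscore); many other languages enforce encapsulation in the compiler.

```python
class Car:
    """The class: one blueprint shared by every car object."""

    def __init__(self, owner: str, make: str) -> None:
        # Internal state, hidden behind the public methods by convention.
        self._owner = owner
        self._make = make
        self._distance_km = 0.0

    def drive(self, km: float) -> None:
        """Public method: the only way the odometer state can change."""
        if km < 0:
            raise ValueError("cannot drive a negative distance")
        self._distance_km += km

    def odometer(self) -> float:
        return self._distance_km

    def __str__(self) -> str:
        return f"{self._owner}'s {self._make}"


# Three instances of the same class, each remembering its own state.
freds_vw = Car("Fred", "VW")
bobs_ford = Car("Bob", "Ford")
didis_aston = Car("Didi", "Aston Martin")

freds_vw.drive(12.5)
freds_vw.drive(7.5)
print(freds_vw, freds_vw.odometer())    # Fred's VW 20.0
print(bobs_ford, bobs_ford.odometer())  # Bob's Ford 0.0
```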
2.3 Objects in the Cloud
The object oriented view is one of the most used programming paradigms today; further, it is very suitable for distribution. As the components of a system can be viewed as objects, which are the smallest entities of data and functionality that possess a strictly defined interface, the communication between them can easily be modified. This idea is nothing new [Emmerich, 2000], but moving objects into the cloud is a new idea about which, to date, nothing has been published. In the history of distributed objects, the location of the objects was always important, as the research area was strongly shaped by big corporations. Further, execution time was seen as a crucial evaluation point. This focus resulted in limited adoption of distribution, or adoption only in very specific cases. Distribution ideas were further misused as output filters for already existing programs, and two such schools of thought were mixed, resulting in the CORBA technology.
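As a minimal sketch of why a strictly defined interface makes the communication easy to change, the following Python code (all names hypothetical) places a proxy in front of an object. The proxy turns every method call into a message and hands it to a transport; here the transport simply delivers the message locally, but it could equally serialise it and send it across a network without the calling code changing.

```python
from typing import Any, Callable, List, Tuple


class Counter:
    """An ordinary object with a small, strictly defined interface."""

    def __init__(self) -> None:
        self._value = 0

    def increment(self, step: int = 1) -> int:
        self._value += step
        return self._value


class LoopbackTransport:
    """Hypothetical transport: records each message, then delivers it locally.

    A real system would serialise the message and send it over the network.
    """

    def __init__(self, target: Any) -> None:
        self.target = target
        self.messages: List[Tuple[str, tuple]] = []

    def __call__(self, method: str, args: tuple) -> Any:
        self.messages.append((method, args))
        return getattr(self.target, method)(*args)


class RemoteProxy:
    """Stands in for an object that lives elsewhere."""

    def __init__(self, transport: Callable[[str, tuple], Any]) -> None:
        self._transport = transport

    def __getattr__(self, name: str) -> Callable[..., Any]:
        # Only called for attributes not found normally, i.e. interface methods:
        # each call becomes a (method name, arguments) message.
        def forward(*args: Any) -> Any:
            return self._transport(name, args)
        return forward


local = Counter()
transport = LoopbackTransport(Counter())
remote = RemoteProxy(transport)

# The calling code is identical for the local object and the proxy.
for obj in (local, remote):
    obj.increment()
    obj.increment(5)
print(local.increment(0), remote.increment(0))  # 6 6
print(transport.messages[1])                    # ('increment', (5,))
```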
2.4 Compilers
A compiler is a program that reads a well-defined source language and outputs a related target language. This target language can be an executable that runs directly on an architecture, or a byte code, an Abstract Syntax Tree or similar, which can be interpreted. The translation consists of four distinct steps, as seen in Fig. 2.2.
Figure 2.2: Translation of a print statement
1. Lexical Analysis: In this step, the lexical analyzer or “scanner” reads the source program and splits the stream of characters into lexemes, from which it creates meaningful tokens.
2. Syntax Analysis: The parser uses the tokens to create a tree-like structure, normally called a parse tree. This is created based on a set of rules which describe how the syntax is recognized and how the tree should be created. Typically the operator forms the root of such a tree and its operands form the children, as can be seen in Figure 2.2 after the step named “PARSER”.
3. Semantic Analysis: The semantic analyzer checks whether the syntax tree has the correct semantic form, i.e. whether the input obeys the semantic rules and can be understood by the following steps, and might perform some optimizations. Some compilers also do type checking and other changes to the tree, like type conversions. The output is then called an Abstract Syntax Tree (AST).
4. Code Generation: This is the final step, where the intermediate representation is converted into actual output code. If this output is some form of assembly language, the registers are allocated and the output is generated. This step varies from implementation to implementation: some compilers split it into three sub-steps (intermediate code generation, code optimization and code generation), whereas other compilers optimize based on the syntax tree.
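The four steps can be illustrated with a toy front end for a single print statement, written in Python. This is an illustration under stated assumptions, not the objic compiler: the token kinds, the tuple-based tree, the arity check standing in for semantic analysis and the stack-machine instructions (`PUSH`, `BINOP`, `PRINT`) are all made up for the example.

```python
import re
from typing import List, Tuple, Union

Token = Tuple[str, str]
Tree = Union[int, tuple]

# 1. Lexical analysis: split characters into lexemes and classify them as tokens.
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[+*()])|(?P<SKIP>\s+)")


def lex(src: str) -> List[Token]:
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# 2. Syntax analysis: recursive descent; each operator becomes a subtree root.
def parse(tokens: List[Token]) -> Tree:
    pos = 0

    def peek():
        return tokens[pos][1] if pos < len(tokens) else None

    def take(expected=None) -> Token:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, text = tokens[pos]
        if expected is not None and text != expected:
            raise SyntaxError(f"expected {expected!r}, got {text!r}")
        pos += 1
        return kind, text

    def atom() -> Tree:
        kind, text = take()
        if kind != "NUM":
            raise SyntaxError(f"unexpected token {text!r}")
        return int(text)

    def term() -> Tree:  # '*' binds tighter than '+'
        node = atom()
        while peek() == "*":
            take("*")
            node = ("*", node, atom())
        return node

    def expr() -> Tree:
        node = term()
        while peek() == "+":
            take("+")
            node = ("+", node, term())
        return node

    take("print")
    take("(")
    tree = ("print", expr())
    take(")")
    if pos != len(tokens):
        raise SyntaxError("trailing input after statement")
    return tree


# 3. Semantic analysis: reject trees the back end cannot handle.
ARITY = {"print": 1, "+": 2, "*": 2}


def check(tree: Tree) -> None:
    if isinstance(tree, int):
        return
    op, *args = tree
    if ARITY.get(op) != len(args):
        raise TypeError(f"invalid use of {op!r}")
    for arg in args:
        check(arg)


# 4. Code generation: post-order walk emitting stack-machine instructions.
def generate(tree: Tree, code=None) -> list:
    code = [] if code is None else code
    if isinstance(tree, int):
        code.append(("PUSH", tree))
        return code
    op, *args = tree
    for arg in args:
        generate(arg, code)
    code.append(("PRINT",) if op == "print" else ("BINOP", op))
    return code


tree = parse(lex("print(1 + 2 * 3)"))
check(tree)
print(tree)  # ('print', ('+', 1, ('*', 2, 3)))
print(generate(tree))
```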
In this project an Abstract Syntax Tree interpreter will be used. The output of the lexical and syntax analysis will be optimized through a tree-walking algorithm [Appel, 2002], and the optimized tree will be saved in a file. A virtual machine (VM) then loads the file and executes every statement [Shi et al., 2008]. This approach was chosen because of the highly distributed nature of the environment in which the code will execute, in which a single native binary file would not have been usable [Rowledge, 2001].
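A minimal sketch of this pipeline, assuming a JSON-encoded tree as the file format (the actual objic byte-code format is not shown here): a tree-walking pass performs constant folding as one example of an optimization, the result is written to a file, and a small VM loads the file and evaluates each statement. The node layout and the function names are hypothetical.

```python
import json
import os
import tempfile

# Tree nodes are JSON lists [op, arg, ...]; ints are constants and bare
# strings are variable names. This encoding is an assumption for the sketch.


def fold(node):
    """Tree-walking optimizer: constant folding of '+' and '*'."""
    if not isinstance(node, list):
        return node
    op, *args = node
    args = [fold(arg) for arg in args]
    if op in ("+", "*") and all(isinstance(arg, int) for arg in args):
        return args[0] + args[1] if op == "+" else args[0] * args[1]
    return [op, *args]


def compile_to_file(tree, path):
    """Save the optimized tree; the file plays the role of the byte-code."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(fold(tree), f)


def run(node, env, out):
    """The VM: evaluate the tree one node at a time."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    op, *args = node
    if op == "block":
        for stmt in args:
            run(stmt, env, out)
    elif op == "set":
        env[args[0]] = run(args[1], env, out)
    elif op == "print":
        out.append(run(args[0], env, out))
    elif op == "+":
        return run(args[0], env, out) + run(args[1], env, out)
    elif op == "*":
        return run(args[0], env, out) * run(args[1], env, out)
    else:
        raise ValueError(f"unknown node {op!r}")
    return None


def load_and_run(path):
    with open(path, encoding="utf-8") as f:
        tree = json.load(f)
    out = []
    run(tree, {}, out)
    return out


# a = 2 * 3; print(a + 4)
program = ["block", ["set", "a", ["*", 2, 3]], ["print", ["+", "a", 4]]]
path = os.path.join(tempfile.mkdtemp(), "program.ast")
compile_to_file(program, path)
print(load_and_run(path))  # [10]
```

Because the file holds a tree rather than machine code, any host running the VM can execute it, which is the property a distributed environment needs.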
2.5 Similar Technology
2.5.1 SOAP
SOAP stands for Simple Object Access Protocol. It was initially based on HTTP and developed at Microsoft®, with the two main targets of “providing a standard object invocation protocol built on internet standards, using HTTP as the transport and XML for data encoding. And creating an extensible protocol and payload format that can evolve.” [Scribner and Stiver, 2000]. In summary, its main purpose is to provide a structured packaging protocol for messages that have to be shared between applications [Snell et al., 2001]. It defines a set of rules by which data can be encapsulated in XML and transferred over a network, and it has a fault reporting mechanism and a routing protocol built in. By using XML as an envelope for all the data, SOAP is operating system and programming language independent, which is of great value in the heterogeneous environment that the internet is at present. For completeness, it has to be stated that SOAP can be used for two main applications, RPC (Remote Procedure Call) and EDI (Electronic Data Interchange); however, only the former usage is discussed in this paper. SOAP messages have to obey very strict formatting rules so that the type, encoding and procedure of the information can be understood (see Code 1). To make the example easier to understand, the SOAP header has been left out. By convention every SOAP message should have a header, but it is not required to. The header stores information for processing the message; this includes keywords like “mustUnderstand”, which tells the recipient that a header entry must be understood and processed, or “transactionID”, which can be used to keep track of multiple transactions. There are too many keywords to explain here; a good source is the SOAP specification, which can be found at “http://www.w3.org/TR/2007/REC-soap12-part0-20070427/”. SOAP does not include processing instructions, memory management features, pipelining, objects by reference or remote object invocation [Scribner and Stiver, 2000]. It is further often criticized for using port 80, which is normally reserved for HTTP servers. However, it may be argued that this use is a valid choice [Mueller, 2001]. Because it uses plain text and XML, SOAP is quite bloated in comparison with binary formats, and parsing is slow. A discussion of SOAP as a protocol for this project can be found in section 4.8 on page 28.
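To show how much structure even a one-parameter call carries, the sketch below builds and parses the body of the GetLastTradePrice request from Code 1 with Python's standard `xml.etree.ElementTree` module. It is an illustration only: the SOAP header and the HTTP framing are omitted, and `build_request` and `parse_request` are hypothetical helpers, not part of any SOAP toolkit.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
METHOD_NS = "Some-URI"  # placeholder namespace taken from the example message

ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("m", METHOD_NS)


def build_request(method: str, **params: str) -> bytes:
    """Wrap a method call and its parameters in a SOAP envelope (no header)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope",
                          {f"{{{SOAP_ENV}}}encodingStyle": SOAP_ENC})
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{METHOD_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = value  # unqualified, like <symbol>
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(data: bytes):
    """Recover (method name, parameters) from a SOAP envelope."""
    envelope = ET.fromstring(data)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("not a SOAP request")
    call = body[0]
    method = call.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
    return method, {child.tag: child.text for child in call}


request = build_request("GetLastTradePrice", symbol="DIS")
print(request.decode("utf-8"))
print(parse_request(request))  # ('GetLastTradePrice', {'symbol': 'DIS'})
```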
Code 1 Standard SOAP message call
```
POST /StockQuote HTTP/1.1
Host: www.stockquoteserver.com
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn
SOAPAction: "Some-URI"

<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="Some-URI">
      <symbol>DIS</symbol>
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```
2.5.2 CORBA
In this section, the discussion centres on CORBA (Common Object Request Broker Architecture), which serves only as an example of object oriented middleware. Microsoft’s COM and Sun’s Java RMI would have been equally suitable, but the author is more knowledgeable about CORBA. CORBA was released in 1991 by the OMG (Object Management Group) and is intended to enable software components written in an array of different languages to interchange data with each other [Henning, 2008]. This is done by using an IDL (Interface Definition Language) to define externally visible interfaces and their mapping to the underlying source code. It is often seen as a replacement for or extension of RPCs (see RFC707). Each application initialises an Object Request Broker (ORB), which takes care of communication details like reference resolution, access policies and so on. CORBA uses stub code on the client side and skeleton code on the server side to present remote objects to the source code as if they were local, and then handles the calls made to them. It is a defined goal to hide the distribution from the programmer as far as possible [Emmerich, 2000]. Every object has a unique reference and is statically typed as defined by the IDL. CORBA has many benefits: it is fairly language independent, and all major languages have bindings, even though nearly none implement the full specification. Error handling is implemented in the form of 25 system exceptions. Because the IDL is compiled into the source language, CORBA can be used on all operating systems on which that language can run. It tries to be high level and masks as much distribution from the programmer as possible. Since it is an open standard, many companies have adopted it; most notably, the GNOME project has used it for inter-process communication [Orfali et al., 1995].
When discussing CORBA, a problem often mentioned is that distributed objects are handled differently from local instances. Because so many companies were involved in creating the standard, their different biases produced an ambiguous specification [Emmerich, 2000], which led to incomplete and sometimes error-prone implementations. As a result, the documentation is sometimes quite confusing, and writing CORBA-enabled applications can be very tedious, as the author has experienced multiple times. SOAP is often criticized for using port 80, whereas CORBA is criticized for not using it, as many of its requests get filtered out by firewalls. This is still a hot topic and, in the author’s opinion, is unlikely ever to be resolved [de Jong and others].
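The stub and skeleton pattern can be sketched in a few lines of Python. This is a simplified illustration, not CORBA: an abstract base class stands in for the IDL interface, JSON stands in for CORBA's binary on-the-wire encoding (CDR), and a plain function call replaces the ORB. The class names and the price are hypothetical; `BAD_OPERATION` mirrors the name of one of CORBA's standard system exceptions.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable, Dict


class StockQuote(ABC):
    """Plays the role of the IDL interface shared by client and server."""

    @abstractmethod
    def last_trade_price(self, symbol: str) -> float: ...


class StockQuoteServant(StockQuote):
    """Server-side implementation of the interface."""

    def __init__(self, prices: Dict[str, float]) -> None:
        self._prices = prices

    def last_trade_price(self, symbol: str) -> float:
        return self._prices[symbol]


class StockQuoteSkeleton:
    """Server side: unmarshals a request, calls the servant, marshals the reply."""

    OPERATIONS = {"last_trade_price"}

    def __init__(self, servant: StockQuote) -> None:
        self._servant = servant

    def handle(self, request: bytes) -> bytes:
        msg = json.loads(request)
        if msg.get("op") not in self.OPERATIONS:
            return json.dumps({"exception": "BAD_OPERATION"}).encode()
        result = getattr(self._servant, msg["op"])(*msg["args"])
        return json.dumps({"result": result}).encode()


class StockQuoteStub(StockQuote):
    """Client side: looks like a local StockQuote but forwards every call."""

    def __init__(self, send: Callable[[bytes], bytes]) -> None:
        self._send = send  # in CORBA the ORB would carry the bytes

    def last_trade_price(self, symbol: str) -> float:
        request = json.dumps({"op": "last_trade_price", "args": [symbol]}).encode()
        reply = json.loads(self._send(request))
        if "exception" in reply:
            raise RuntimeError(reply["exception"])
        return reply["result"]


skeleton = StockQuoteSkeleton(StockQuoteServant({"DIS": 34.5}))
quote: StockQuote = StockQuoteStub(skeleton.handle)
print(quote.last_trade_price("DIS"))  # 34.5
```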
2.5.3 dSelf
dSelf is an object oriented programming language proposed by Kai Knubben in his “Diplomarbeit” (diploma thesis) in December 2000 [Tolksdorf and Knubben, 2002]. dSelf is the distributed variant of the language Self, a classless (delegation-based or prototype-based) language that was developed at Stanford University and Sun in the 1980s. The main difference from a traditional object oriented language like Smalltalk is that, instead of classes and their instantiated objects, everything is a Self prototype object that consists of specific “slots”; a new instance is created by cloning an existing object. A slot is a pointer to a data or method object, and slots can be added and removed dynamically. Through this no class hierarchy is produced, and Self objects enable a flexibility that could not be achieved with class-based syntax. An interesting fact about Self is that the programming is done graphically; this has been continued in dSelf (see http://www.smalltalk.org.br/movies/). “Distributed” in dSelf means that slots can point to objects that are located in another dSelf virtual machine connected to the network (see “Verteilte Implementierung der objektorientierten Programmiersprache SELF”, i.e. “Distributed Implementation of the Object-Oriented Programming Language SELF”). Connections are always made to virtual machines, so all objects contained in them become accessible. This enables distributed inheritance and distributed instantiation. Accessing a remote slot is in no way different from accessing a local one. Primitive objects like strings and integers are copied to the host VM, whereas complex objects are referenced by a pointer; this is done to increase speed but also causes problems with race conditions and updates not propagating properly. It is worth mentioning that dSelf is not 100 % compatible with Self because of some syntax extensions that needed to be made (http://www.ag-nbi.de/research/dself/dSelf-Diplomarbeit.ps.gz). Unfortunately, Self and therefore dSelf are no longer maintained, and dSelf has never left the prototype stage. Classless languages have never made it “mainstream”, and therefore both projects can be considered “completed research”.
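Slots, cloning and delegation can be sketched as follows. This Python model is hypothetical and keeps everything in one process; in dSelf a slot could additionally point to an object held by another virtual machine. Methods are modelled as functions stored in slots that receive the receiving object, and lookup falls back to a slot named `parent`, a simplification of Self's parent slots.

```python
from typing import Any, Dict, Optional


class SelfObject:
    """A classless object: nothing but a table of named slots."""

    def __init__(self, slots: Optional[Dict[str, Any]] = None) -> None:
        self.slots: Dict[str, Any] = dict(slots or {})

    def clone(self) -> "SelfObject":
        # New objects come from copying an existing one, not from a class.
        return SelfObject(self.slots)

    def add_slot(self, name: str, value: Any) -> None:
        self.slots[name] = value

    def remove_slot(self, name: str) -> None:
        del self.slots[name]

    def send(self, name: str, *args: Any) -> Any:
        """Look the slot up locally, then delegate along 'parent' slots."""
        holder = self
        while holder is not None:
            if name in holder.slots:
                value = holder.slots[name]
                # A slot holding a function acts as a method on the receiver.
                return value(self, *args) if callable(value) else value
            holder = holder.slots.get("parent")
        raise AttributeError(name)


traits = SelfObject({"describe": lambda me: f"({me.send('x')}, {me.send('y')})"})
point = SelfObject({"x": 0, "y": 0, "parent": traits})
moved = point.clone()
moved.add_slot("x", 3)  # changes only the clone
print(point.send("describe"), moved.send("describe"))  # (0, 0) (3, 0)
```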
2.6 Solution Discussion
The idea behind cloud computing, i.e. not caring about locality and having horizontal scalability, makes it possible to look at the topic of distributed objects and computation from a different angle. Location and execution times can now be neglected, as they can be offloaded “into the cloud”. Software as a Service is further gaining more and more importance with the surge in internet and network speeds. It is well understood that the current bottleneck is the so-called last “copper mile” [Li et al., 2005]. By calculating everything online and only transferring the output actually needed to the end-user, this link can be utilised more efficiently, as the main “cloud servers” are connected to the internet backbone, which enables greater transfer speeds. It also enables a service to offer numerous output filters, understood by an array of devices, while keeping the underlying calculation the same. Object oriented programming has probably become the most used technique to express procedures (see http://www.tiobe.com/index.php/content/paperinfo/tpci) and is well understood and researched. Because objects are the smallest confined entity in a language, it is desirable to distribute at this level of abstraction [Ostrowski et al., 2008]. These factors enable a completely new discussion of object based distribution, which this paper is trying to start.
“Engineers are all basically high-functioning autistics who have no idea how normal people do stuff.” Cory Doctorow
3 Requirements
It would be possible to name numerous requirements for this project, but as it is a current research topic and not intended for production, only very high-level requirements are listed and discussed. The ones chosen are the terms most commonly used in connection with distributed computing and distributed objects [Emmerich, 2000].
• Scalability: Scalability has been a hot topic for a long time, as serving all the data from one computer has not been possible for many years. As horizontal scalability is one of the cornerstones of cloud computing, it is a major requirement for the project. However, this scalability should be hidden from the programmer. A further idea is that when an object provider notices that a service is nearing its capacity, he should be able to easily add new hosts.
• Openness: Having an open interface definition is vital for the success of the project. It is important that anyone can see the definitions and understand the workings. This also applies to the protocol used for communication, which has to be well documented and easily implementable.
• Heterogeneity: The project should enable many systems with different setups to communicate with each other in a standardised way. The underlying architecture or implementation should not affect higher levels like the object interface. A further idea is that the language created should be able to embed and parse other source code formats, like Python for example.
• Resource Sharing: It should be possible for one object to be used by many clients (other objects). This incorporates the requirement of scalability and enables sharing of data in an easy, efficient way.
• Fault-Tolerance: One major requirement is the graceful handling of faults and errors. This is especially important as the project is highly distributed and the transport medium is not reliable.
Obvious requirements like security, usability and similar have been intentionally left out, as they are not discussed in this paper. A complete discussion would require them, but this is not achievable in the time frame given.
“Computer language design is just like a stroll in the park. Jurassic Park, that is.” Larry Wall
4 Design of System
This chapter gives a high-level view of the design decisions taken and discusses the grammar and known problems. The development name objic (objects in the cloud) was chosen for the project and is used throughout this paper. A rough overall design, in relation to the byte-code, can be seen in Figure 4.1.
Figure 4.1: The environment in relation to the byte-code (diagram elements: Source-Code, Compiler, Optimizer, Byte-Code, Object Server, VM, Interpreter)
4.1 Discussion on Antlr
As described in the methods (see section 1.3.2 on page 3), Antlr was chosen as the parser generator. Other programs were evaluated, such as Coco/R, Yacc and the Python parser framework “spark”, which was used in the first prototype. While all are powerful in their own domain, Antlr had some major advantages. First of all, the completeness of the development stack made a rapid prototyping approach possible. The toolset, especially AntlrWorks, a GUI for writing grammars, integrated very nicely (see Fig. 4.2). Another main point was its wide adoption in industry and universities, which secures future development as well as complete references and manuals [Parr, 2007]. The ability to generate numerous output languages means that many implementations of the objic compiler can be created, which will be helpful in further development. One of the main points, though, was the ability to create and modify an AST in the parsing stage in a standard and predefined way. This avoids the error-prone and inflexible in-lining technique.
Figure 4.2: The AntlrWorks editor
4.2 Grammar Description
The syntax of the designed language should be easily understandable for anyone with previous programming knowledge. The grammar was influenced by C, Python and Java, which the author considers to be the most commonly known languages. The first draft of the language was presented in the paper “Programming the Cloud” (see Appendix J on page 83) and was then extended and modified to meet the project requirements. A full description of the grammar can be found in Appendix K on page 87, and the listing of the Antlr file used to create the tokenizer and parser can be found in Appendix C on page 71. The following is a summary of the most important syntactical rules, in Antlr syntax:
| Construct | Description |
|---|---|
| `block ⇒ “{” <stat>* “}”` | Everything between “{” and “}” is defined as a block and has its own stack frame |
| `class <expr>` | Defines a new class |
| `<stat>` | Everything that can be contained in a block; a complete list can be found in Appendix K.0.8 on page 91 |
| `return “(” <expr> “)”` | Returns from a block; this can be a method or a program |
| `def <expr>` | Defines a new method in a class |
| `call ⇒ <expr> “.” <expr> <parameters>` | Calls a method with the given parameters |
| `print “(” <expr> “)”` | Prints the expression given |
| `while “(” “)”` | Executes the block while the boolean is true |
| `break` | Exits from a recursion |
| `if “(” “)” (else )?` | If the boolean expression is true, the first block is executed, otherwise the optional else block |
| `<expr> “=” new <expr> <parameters> (@ server)?` | Creates a new object and assigns a variable name to it, with an optional server parameter |
| `<expr> “=” <expr>` | Assigns one variable to another |
| `<expr> “=”` | Creates a new method object and assigns it to the variable name |
| `for “(” ; ; “)”` | Creates a for loop as known from C |
4.3 Standard Variables
Two variables are defined when an object is created; this is done before the constructor of the object is called.
4.3.1 ME Keyword
The “ME” variable is a pointer to the object itself, similar to “this” in Java or “self” in Python. It is used when calling methods that reside in the object. As soon as the VM loads the byte-code for a new object, it also creates a unique name and path. Then it loads the base object constructor (see section 5.3 on page 36), which creates the “ME” variable on the lowest level of the stack; this could be called the class level. Through this, ME is always a valid pointer in the object block, which resides one stack level higher.
Code 2 ME usage example
```
class PrintMe{
    def doSomething{
        print("Doing something")
    }
    ME.doSomething()
}
```
Figure 4.3: Diagram showing the relationship between object and class stack frames
In Figure 4.3 it can be clearly seen that ME points to the object itself and that, one stack level above, the variable “A” points to some other object in the cloud.
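The frame layout described above can be modelled as a list of dictionaries searched from the top down. This is a hypothetical sketch, not the objic VM: the `ObjectStack` class is invented for illustration, the object URL reuses the example format from section 4.8, and the value of `A` is a made-up pointer.

```python
from typing import Dict, List


class ObjectStack:
    """Hypothetical model of one objic object's stack of frames.

    Frame 0 is the class level created by the base object constructor;
    the object block runs one level higher.
    """

    def __init__(self, object_url: str) -> None:
        self.frames: List[Dict[str, str]] = [{"ME": object_url}, {}]

    def push(self) -> None:
        self.frames.append({})

    def pop(self) -> None:
        if len(self.frames) <= 2:
            raise RuntimeError("cannot pop the class level or the object block")
        self.frames.pop()

    def set(self, name: str, value: str) -> None:
        self.frames[-1][name] = value

    def lookup(self, name: str) -> str:
        # Search from the innermost frame down to the class level,
        # so ME is visible from every block of the object.
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        raise NameError(name)


stack = ObjectStack("Bigi.home/osdfo7w3r46yoewhfdjpf9384y6rfh")
stack.set("A", "someserver.org/0f3a9c")  # hypothetical pointer to another object
stack.push()                             # e.g. entering a method
print(stack.lookup("ME"), stack.lookup("A"))
```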
4.3.2 ARGS Keyword
The second variable that is defined by the VM is the “ARGS” pointer. It holds the value of the parameters passed to a class or method. As a class does not have a specific constructor but executes everything that is not in a method at creation (see section 4.4), it is passed parameters too, as shown in Code 4. When a method in this class is then invoked with parameters, a new “ARGS” pointer is created in a higher stack frame, so it will be found first. This enables every method to have its own “private” parameters while maintaining the global class parameters. Further, as the variable is initialised on the stack level of the method, the pointer is lost when the method exits and pops the stack, which enables efficient memory management. Even when no parameter is specified, as in the call to “value” on line 2 of Code 4, the empty space is filled with a default “VOID” token to specify that an empty parameter list was created. This is the default behaviour for all methods and classes.
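A minimal sketch of the ARGS behaviour, assuming each class or method body runs in its own frame: every call pushes a frame with its own `ARGS` entry (or a `VOID` placeholder), lookups search from the innermost frame outwards, and popping the frame discards the pointer. `Frames`, `VOID` and the body functions are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict, List

VOID = "VOID"  # stand-in for the VM's default "empty parameter list" token


class Frames:
    """Hypothetical stack where every class or method body gets its own frame."""

    def __init__(self) -> None:
        self.stack: List[Dict[str, Any]] = []

    def call(self, body: Callable[["Frames"], Any], *params: Any) -> Any:
        # The new frame gets its own ARGS, shadowing any ARGS further down.
        self.stack.append({"ARGS": list(params) if params else [VOID]})
        try:
            return body(self)
        finally:
            self.stack.pop()  # the frame and its ARGS pointer are dropped together

    def lookup(self, name: str) -> Any:
        for frame in reversed(self.stack):
            if name in frame:
                return frame[name]
        raise NameError(name)


seen = []


def method_body(f: Frames) -> int:
    seen.append(f.lookup("ARGS"))  # the method's private ARGS
    return len(f.stack)


def class_body(f: Frames) -> int:
    seen.append(f.lookup("ARGS"))  # class-level ARGS
    depth = f.call(method_body)    # method invoked without parameters
    seen.append(f.lookup("ARGS"))  # class-level ARGS visible again
    return depth


frames = Frames()
depth = frames.call(class_body, 42)
print(seen, depth)  # [[42], ['VOID'], [42]] 2
```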
4.4 objic Class Structure
When a class is initialised, all statements that are not encapsulated in a method definition are executed. This bears the risk of someone creating a new instance of the class itself somewhere in this code, which results in an endless loop (see Code 3), but the gain in flexibility and the similarity to Java and Python syntax style outweigh the disadvantage.
Code 3 Endless loop example
```
class LoopMe{
    a = new LoopMe()
}
```
Further, all methods that are defined in the class are registered with the VM so that calls are possible. The methods are only registered, not executed; it is also not possible to nest method definitions. At creation the class also creates the ARGS variable with a pointer to the passed-in parameters, as discussed in section 4.3.2.
Code 4 Class ARGS example
```
class PrintArgs{
    print(ARGS.value())
}
```
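The creation order described above can be sketched in Python as follows. The representation of a class body as a list of `def` and `stmt` entries and the `instantiate` function are hypothetical, and registering every method before any statement runs is an assumption, not something stated for the objic VM. The last lines reproduce the LoopMe problem of Code 3, which Python reports as a `RecursionError`.

```python
from typing import Any, Dict, List


def instantiate(body: List[tuple], args: List[Any]) -> Dict[str, Any]:
    """Hypothetical objic-style creation of an object from a class body.

    Body entries are ("def", name, function) or ("stmt", function).
    """
    obj: Dict[str, Any] = {"ARGS": args or ["VOID"], "methods": {}}
    # Register every method first; registering does not run it.
    for entry in body:
        if entry[0] == "def":
            _, name, fn = entry
            obj["methods"][name] = fn
    # Then execute, in order, every statement outside a method definition.
    for entry in body:
        if entry[0] == "stmt":
            entry[1](obj)
    return obj


# Code 2: PrintMe calls its own method at creation time via ME.
output: List[str] = []
print_me_body = [
    ("def", "doSomething", lambda me: output.append("Doing something")),
    ("stmt", lambda me: me["methods"]["doSomething"](me)),  # ME.doSomething()
]
instance = instantiate(print_me_body, [])
print(output)  # ['Doing something']

# Code 3: a body statement that creates another LoopMe never terminates.
loop_me_body: List[tuple] = []
loop_me_body.append(("stmt", lambda me: instantiate(loop_me_body, [])))
try:
    instantiate(loop_me_body, [])
except RecursionError:
    print("LoopMe never finishes initialising")
```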
4.5 Object Location Specification
There are three different ways of specifying where an object should be created: in the code, where a specific syntax tells the interpreter the location; in a configuration file, which is valid for all objects of that type; and the default value, which will normally be the local server. The ordering is cascading, so a location specified in the code takes precedence over the configuration file, which in turn takes precedence over the default. This is done to enable greater flexibility and with maintainability in mind. The ideal case would be to have all objects specified in the configuration file, but this might not always be possible. Sometimes it might also be required to have different objects of the same type reside on different hosts. In the following, the object location specifiers are described in more detail:
• In the code: This is the least dynamic way of specifying the server. Here the object location is appended to the “new” construct by adding an “at” symbol (@) followed by the server. Only the object created by the preceding “new” is placed on that server. This is a good choice if it is known exactly on which server the object should reside and this is not likely to change. Once the source code is compiled, this cannot be modified anymore.
Code 5 Object location in code
```
a = new Int() @ someserver.org
```
Future features might include the possibility to specify multiple servers that can be run as backup mirrors; see the discussion in section 8.1 on page 57.
• In the server configuration file: The object server has an “ObjLocFile” which specifies the object locations globally for all VMs running on that server. The syntax is very similar to the definition in the code. This is very useful, as changing one line influences all objects of that type, especially when deploying the finished program.
Code 6 ObjLocFile
```
Int@localhost
String@someserver.net
```
The first value is the type of object for which the entry is valid, followed by an “@” and a server name. Different object types can be assigned to different servers, as can be seen in Code 6, where the object type Int is assigned to localhost and String to someserver.net. Future features might include the possibility to specify the object locations within the scope of a class.
• Default: If neither of the above is specified, the object server assumes “localhost” and looks for the specified object in its class-path. This can be seen as the default behaviour. This is quite useful, as user classes will be located in the class-path, so the user does not have to specify his own computer. Further, as no path is hard-coded but relative to the running server, a set of classes can be deployed very easily, and distributing the objects later only involves adding one line to the ObjLocFile.
In the implementation phase this will be managed by one central object location module that will read and parse the ObjLocFile and create objects in the appropriate places.
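The cascading precedence can be expressed in two small functions. This is a sketch under the assumption that the ObjLocFile contains exactly one `Type@server` entry per line, as in Code 6; `parse_obj_loc_file` and `resolve_location` are hypothetical names, not the object location module itself.

```python
from typing import Dict, Optional


def parse_obj_loc_file(text: str) -> Dict[str, str]:
    """Parse ObjLocFile lines of the form 'Type@server' (as in Code 6)."""
    locations: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        obj_type, sep, server = line.partition("@")
        if not sep or not obj_type or not server:
            raise ValueError(f"malformed ObjLocFile line: {line!r}")
        locations[obj_type] = server
    return locations


def resolve_location(obj_type: str, in_code: Optional[str],
                     obj_loc: Dict[str, str], default: str = "localhost") -> str:
    """Cascade: location given in the code > ObjLocFile entry > default."""
    if in_code is not None:
        return in_code
    return obj_loc.get(obj_type, default)


obj_loc = parse_obj_loc_file("Int@localhost\nString@someserver.net\n")
print(resolve_location("Int", "someserver.org", obj_loc))  # someserver.org
print(resolve_location("String", None, obj_loc))           # someserver.net
print(resolve_location("PrintMe", None, obj_loc))          # localhost
```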
4.6 ObjectServer
The object server is the main runtime environment; a very high-level view can be found in Fig. 4.4. The object server waits on a specific port for connection requests. There are two main requests it can receive: “CREATE”, which creates a new instance of the requested object and returns the URL to the VM, or “CONNECT”, in which case the object server connects the socket to the existing VM for that object.
Figure 4.4: A high level diagram of the server
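How the object server might dispatch the two requests is sketched below. The one-line request format (`CREATE <type>`, `CONNECT <hash>`), the status codes and the `ObjectServer` class are assumptions for illustration only; the actual protocol is described in Appendix N.

```python
import secrets
from typing import Dict, Tuple


class ObjectServer:
    """Hypothetical sketch of how an object server could answer its two requests."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.vms: Dict[str, dict] = {}  # object hash -> state of that object's VM

    def handle(self, request_line: str) -> Tuple[int, str]:
        verb, _, arg = request_line.strip().partition(" ")
        if verb == "CREATE":
            obj_hash = secrets.token_hex(16)    # unique name on this server
            self.vms[obj_hash] = {"type": arg}  # stand-in for starting a new VM
            return 201, f"{self.host}/{obj_hash}"
        if verb == "CONNECT":
            if arg not in self.vms:
                return 404, "Not Found"
            return 200, f"{self.host}/{arg}"
        return 400, "Bad Request"


server = ObjectServer("Bigi.home")
status, url = server.handle("CREATE Int")
obj_hash = url.split("/", 1)[1]
print(status, server.handle(f"CONNECT {obj_hash}"))
print(server.handle("CONNECT nosuchhash"))  # (404, 'Not Found')
```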
4.7 Security
As the server handles all the requests, security is a major issue. It is important that every process is decoupled from all other processes, as otherwise one process could read the data of other users or even corrupt it. Therefore the object server should only handle a minimal amount of communication, as any communication it handles could be exploited. Unlike RPC, no port mapping is involved, so a server has to listen on a specific port. The solution is to have a separate VM for every connection, with one object residing in this VM. If this object wants to connect to another object on the same server, it has to connect through the main server port like all other objects in the cloud. So as soon as there is a connection to the server, a new network thread is started that only handles this one connection and implements the main functions, notably:
• CREATE: This creates a new object in its own virtual machine, which has a connection handler.
• CONNECT: This creates a new VM instance but points it to an already existing object that was created with CREATE.
Based on this mechanism, it should be nearly impossible to access any other object that resides on the server.
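The thread-per-connection design can be sketched with Python's standard `socketserver` module. In this hypothetical example each accepted connection gets its own handler thread, standing in for the separate VM, and all handlers share one object table protected by a lock. The line-based request format and the status codes are assumptions, and nothing here provides the isolation that a real separate VM process would.

```python
import secrets
import socket
import socketserver
import threading

OBJECTS = {}                     # hash -> object state, shared by all connections
OBJECTS_LOCK = threading.Lock()  # connections run in parallel threads


class ConnectionHandler(socketserver.StreamRequestHandler):
    """One handler (standing in for one VM) per accepted connection."""

    def handle(self) -> None:
        for raw in self.rfile:  # one request per line until the client closes
            verb, _, arg = raw.decode("utf-8").strip().partition(" ")
            with OBJECTS_LOCK:
                if verb == "CREATE":
                    obj_hash = secrets.token_hex(16)
                    OBJECTS[obj_hash] = {"type": arg}
                    reply = f"201 {obj_hash}"
                elif verb == "CONNECT":
                    reply = "200 OK" if arg in OBJECTS else "404 Not Found"
                else:
                    reply = "400 Bad Request"
            self.wfile.write((reply + "\n").encode("utf-8"))


class ThreadedObjectServer(socketserver.ThreadingTCPServer):
    daemon_threads = True  # connection threads do not keep the process alive


server = ThreadedObjectServer(("127.0.0.1", 0), ConnectionHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection(server.server_address) as sock, \
        sock.makefile("rwb") as stream:
    stream.write(b"CREATE Int\n")
    stream.flush()
    created = stream.readline().decode("utf-8").split()
    stream.write(f"CONNECT {created[1]}\n".encode("utf-8"))
    stream.flush()
    connected = stream.readline().decode("utf-8").strip()

server.shutdown()
server.server_close()
print(created[0], connected)  # 201 200 OK
```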
4.8 Object Communication
Note: All work was done on TCP/IP. UDP was not examined, as reliability and ordering are vital. In the future, UDP connections could be incorporated for streaming binary data from objects. A first attempt was made to extend SOAP to incorporate the object management features needed to implement the language, like object instantiation and memory management. Extensive testing showed that this approach resulted in too much parsing overhead. The time required to dynamically generate a message that the other side would understand did not meet the performance criteria. Further, parsing a message was very tedious and required an extensive amount of memory and CPU cycles. Parsing a simple SOAP method invocation request (see Appendix I on page 82) requires 2334 system calls and takes about 0.133 seconds. This already involves heavy optimizations, including pre-caching and not trying to read and understand the whole message. Extending this would have increased these problems and resulted in a slow and bloated system. More research is needed in this area, but initial testing shows that SOAP does not perform adequately for the requirements of distributing objects throughout a big network in a time-critical environment, which program execution is.
Figure 4.5: Comparison between SOAP and objic protocol
Figure 4.6: Parse time comparison
Instead, the decision was made to design a new protocol (see Appendix N on page 99). Initial benchmarking showed that parsing HTTP (Hypertext Transfer Protocol) is very quick and more memory efficient. HTTP is widely used throughout the web for serving websites and has proven to be very reliable. This led to the extension of HTTP for object communication. As the syntax is very simple, linear parsing time can be achieved with nearly no overhead, unlike SOAP. To further improve speed and memory utilization, HTTP persistent connections were used, which were introduced in HTTP/1.1 and formalise a keep-alive mechanism. As an object will normally communicate with a set of other objects by calling methods, latency is reduced by not having to reconnect to these objects repeatedly. This also enabled the design to incorporate a “ConnectionObject” in the objic VM, which keeps a session to the other object alive. To explain the extension, the CONNECT request is discussed in the following. To be able to communicate with an object, the VM first has to connect to it. The object resides somewhere specified through its URL, which has the form
```
Host/hash
=>
Bigi.home/osdfo7w3r46yoewhfdjpf9384y6rfh
```
The first part is the host on which the object server holding the object resides. The “/” notation is borrowed from HTTP. After the slash follows a hash that is unique on that server; together with the host, this forms a globally unique pointer to this specific object. Sending a connect request to a server will map the initialised connection to the requested object or fail with the appropriate HTTP error code. If, for example, no object can be found for the specified hash, a “404 Not Found” error code is returned. Through this approach it is very easy to write object server clients. As all commands and data are in clear text, it is possible to connect to the server via telnet or similar and invoke methods. Further, existing HTTP libraries can be extended to be used with objic. There is a current effort to complete and standardise this format.
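A small sketch of splitting an object URL and framing an HTTP-style CONNECT request over a persistent connection. The exact request layout is an assumption, since the real format is defined in Appendix N; note also that standard HTTP already uses the CONNECT method for proxy tunnelling, so this objic usage is an extension. The helper names are hypothetical.

```python
from typing import NamedTuple, Tuple


class ObjectURL(NamedTuple):
    host: str
    obj_hash: str


def parse_object_url(url: str) -> ObjectURL:
    """Split 'Host/hash' into its two parts."""
    host, sep, obj_hash = url.partition("/")
    if not sep or not host or not obj_hash or "/" in obj_hash:
        raise ValueError(f"not an object URL: {url!r}")
    return ObjectURL(host, obj_hash)


def connect_request(url: str) -> bytes:
    """Hypothetical HTTP-style CONNECT request that keeps the connection open."""
    target = parse_object_url(url)
    return (f"CONNECT /{target.obj_hash} HTTP/1.1\r\n"
            f"Host: {target.host}\r\n"
            "Connection: keep-alive\r\n\r\n").encode("ascii")


def parse_status(response: bytes) -> Tuple[int, str]:
    """Read the status code and reason phrase from the first line of a reply."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason


url = "Bigi.home/osdfo7w3r46yoewhfdjpf9384y6rfh"
print(parse_object_url(url))
print(connect_request(url).decode("ascii"))
print(parse_status(b"HTTP/1.1 404 Not Found\r\n\r\n"))  # (404, 'Not Found')
```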
4.9 Object State
As thread safety is a major issue in distributed systems, the decision was made to make all base objects immutable. This means that once initialised, their internal state cannot be changed. An object can, however, return a pointer to a new object, and the referring pointer can be changed if needed; this is done with the NEWPTR request. For example, when the “add” method of an Int object is invoked, it returns a pointer to a new object containing the correct result. This also allows further optimizations: as a value always stays the same, it makes no difference whether there is only one instance of it or many. The Int object with the value “1”, for example, only has to exist once in the system, as it will always contain one.
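The immutability and sharing described above can be sketched in a few lines of Python. The class `ImmutableInt` is hypothetical and is not objic's oInt: it refuses attribute assignment after creation, keeps one shared instance per value (a technique usually called interning or the flyweight pattern), and its `add` method returns a new object instead of modifying itself, which is the role the NEWPTR request plays across the network. The sketch does not free cached values and does not guard the cache against concurrent creation.

```python
class ImmutableInt(object):
    """Hypothetical immutable integer; equal values share one instance."""

    __slots__ = ("_value",)
    _instances = {}  # value -> the single shared instance for that value

    def __new__(cls, value):
        value = int(value)
        existing = cls._instances.get(value)
        if existing is not None:
            return existing                       # reuse the shared instance
        obj = super(ImmutableInt, cls).__new__(cls)
        object.__setattr__(obj, "_value", value)  # the only write ever made
        cls._instances[value] = obj
        return obj

    def __setattr__(self, name, val):
        raise AttributeError("ImmutableInt objects cannot be modified")

    @property
    def value(self):
        return self._value

    def add(self, other):
        """Return a (possibly shared) object holding the sum; self is unchanged."""
        return ImmutableInt(self._value + other.value)


if __name__ == "__main__":
    one = ImmutableInt(1)
    two = one.add(ImmutableInt(1))
    print(one is ImmutableInt(1))  # True
    print(one.value, two.value)    # 1 2
```

Since an instance can never change, handing the same object to many callers is safe, which is why a single “1” suffices for the whole system.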
4.10 Garbage Collection
Distributed garbage collection is very hard to perform [Plainfossé and Shapiro, 1995], and there is little literature on the topic. For this project a reference-counting technique was chosen. The VM holds a reference to all connections currently associated with each object, and the count is updated dynamically whenever an object requests a connection to be closed or a connection fails. When the connection count reaches zero, the object is moved to a temporary pool called the “old-pool”, where all objects without current references are collected. If a connection to such an object is requested, it can be reborn into the “active-pool”. When an object is moved to the old-pool, a timestamp is associated with it, which can then be used to delete the object after a certain time. This reduces search times for active objects, as the “active-pool” is searched first for a hash. Furthermore, unused objects can be moved to the server's swap space so that they do not occupy memory. This approach also has drawbacks: garbage consisting of circular references is never collected, and objects may live longer than necessary, as they first have to time out.
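A minimal sketch of the active-pool/old-pool scheme, assuming that each object is identified by its hash and that the clock can be injected for testing. The class and method names (`ObjectPools`, `connect`, `disconnect`, `sweep`) and the 60-second default timeout are hypothetical; swapping objects to disk is not modelled, and, exactly as noted above, two objects that only reference each other would never reach a count of zero.

```python
import time


class ObjectPools(object):
    """Hypothetical sketch of connection counting with active/old pools."""

    def __init__(self, timeout=60.0, clock=time.time):
        self.active = {}        # hash -> [object, open connection count]
        self.old = {}           # hash -> (object, time it entered old-pool)
        self.timeout = timeout  # seconds an unreferenced object may linger
        self.clock = clock

    def add(self, obj_hash, obj):
        """Register a newly created object in the active-pool."""
        self.active[obj_hash] = [obj, 0]

    def connect(self, obj_hash):
        """Open a connection, reviving the object from the old-pool if needed."""
        entry = self.active.get(obj_hash)       # active-pool is searched first
        if entry is None:
            if obj_hash not in self.old:
                raise KeyError(obj_hash)        # would map to "404 Not Found"
            obj, _ = self.old.pop(obj_hash)     # reborn into the active-pool
            entry = self.active[obj_hash] = [obj, 0]
        entry[1] += 1
        return entry[0]

    def disconnect(self, obj_hash):
        """Close a connection (explicit request or connection failure)."""
        entry = self.active[obj_hash]
        entry[1] -= 1
        if entry[1] == 0:
            del self.active[obj_hash]
            self.old[obj_hash] = (entry[0], self.clock())

    def sweep(self):
        """Delete objects that have been unreferenced for at least `timeout`."""
        now = self.clock()
        expired = [h for h, (_, since) in self.old.items()
                   if now - since >= self.timeout]
        for obj_hash in expired:
            del self.old[obj_hash]
        return expired
```

Keeping the two pools in separate dictionaries is what makes the “active-pool first” lookup cheap: live objects are never mixed with candidates for deletion.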
4.11 Known Problems
This section lists problems that were noticed while designing the system. Some attempts were made at solving them, but because of the tight schedule the solutions could not be implemented.
4.11.1 Object Version Problem
As an object's behaviour can change on the server, there must be some way of keeping the version of the object for which a program was written. This is not an issue if the interface and the return values do not change, i.e. if the code has only been refactored; but if the external view of the object changes, some programs can be expected to fail, as they depend on certain conditions holding. Solving this is not as easy as it might seem. The first major question is how to handle this case: should only a warning be displayed, or should the server be able to invoke old, possibly error-prone versions of the object? The general principle should be that programs always continue running and are not affected by updates to an object.
4.11.2 Inheritance
Inheritance is one of the most popular concepts of object orientation: a derived class inherits functionality and data from a base class. It is not clear whether there will be enough time to implement this feature, as it would have to be distributed, meaning that an object could inherit functionality from an object residing on another server.
4.11.3 Local / Public variables, methods, classes
Locality (visibility) of data, classes and methods is defined differently by every language, and there are many schools of thought. Whereas Java has clear rules with public and private, Python does not enforce locality and relies on naming conventions instead. In a distributed environment, clear guidance is especially important, as making data globally visible could be a security risk; companies might also worry that personal data could leak. In objic, every object and method is public; security is gained through the hashed name, as an object can only be reached by someone who knows its hash. Variables are always local to the object and can never be accessed from outside the object's scope. This keeps the object in a consistent and verified state at all times, although it requires more “getters” and “setters”, which may incur a slight performance penalty.
“There is no programming language, no matter how structured, that will prevent programmers from making bad programs.” Larry Flon
5 Implementation
This chapter presents the three main programs that were implemented and the major features of the language.
5.1 Discussion of Programs
5.1.1 objectServer
The object server is the program that instantiates all classes, meaning that it loads objic byte-code and executes it. It can also act as a client to other servers that “live” in the cloud. This enables the instantiated objects to communicate with objects on other servers and thus creates a distributed environment at the object level. It also provides memory management and bookkeeping of object states. The definitions of the base types, such as String and Int, are built in, so by default they are the same on all servers; this might change in future versions.
5.1.2 objicc
Objicc is the objic compiler: it takes in source code and generates byte-code. It also checks whether the code is semantically correct and performs some simple optimizations. If an error is found in the code, it tries to generate a useful error message including the line number and the reason compilation stopped. This is done using many functions from the Antlr libraries and Python's cPickle library. It would be possible to run the compiler as a pre-compiler inside the object server, thus turning it into a source code interpreter; this would make runtime errors more understandable and allow dynamic changes to the code at runtime. This was done in the first iterations but was dropped in favour of execution speed after measuring that a single translation into an AST took about 0.5 seconds for a small file, which is not acceptable for repeated object instantiations.
5.1.3 orun
It is important to understand that everything is executed in the object server environment. No instance of the interpreter is initialised outside of a server; while this is possible for debugging purposes, it should never be done otherwise. This behaviour can be demonstrated by the orun.py script, which is used to run the binary files. The script performs three main tasks:
• Create an object of the requested type: the script connects to an object server, which could be anywhere in the cloud. This makes it possible to run a main program located anywhere in the distributed system and still retrieve its output.
• Connect to the created object: to be able to communicate with the object, a connection has to be established.
• Call the main method with parameters: in principle, every object has a main execution method, which has to be called. Parameters can be passed, and no restrictions are placed on the name of this method. The method can also return a value, which is output to the user's shell.
This demonstrates that the script running on the client can consist of only four lines of Python, which is intentional. The main reason is that such a client can be implemented in any language, so different output devices can be created easily. For testing purposes, a small JavaScript application was created, which demonstrated that an object could output to a browser window. Furthermore, all processing can be offloaded to an object server running in the cloud or done on a local server. This creates a very dynamic execution environment.
5.2 Object Server Class Structure
The class structure of the object server is quite simple (see Fig. 5.1 on page 36). The Initializer class starts a server (“start_server”) listening on a specific port. During this development cycle this is port 8080, but it can be changed dynamically; whether port 80 should be used instead needs further discussion in the future [Somogyi and Schneier, 2001]. Every time a connection is received, a new “RequestHandler” thread is started. This is a “proper” operating system thread with its own stack, which enhances security; its creation may take a little longer, but as a stateful connection model is used, the thread keeps running for a long time, so this cost is acceptable. When the request handler is set up, its “setup” method is called, which performs general setup and error checking for the connection. When this is completed, the “handle” method is invoked. It starts an endless loop that waits for data and then processes it accordingly. The data could, for example, be the “CREATE” keyword, which would create a new VM instance by creating a new oVm object. The constructor of oVm takes the type of object to load into itself and, if provided, parameters (see section 5.6 on page 43). The VM then tries to load the object into its memory. This can be done in two ways. If the object is a base class such as Int or String, an instance of this class is created; as these classes are written in Python, no byte-code has to be interpreted. Otherwise the requested class is a user class, meaning that its byte-code has to be loaded and interpreted. This is done by the Interpreter class: when an interpreter object is created, it is pointed at a binary class file, which it starts parsing. Once the object server and VM have finished loading, the constructor of the class is called. This sets up the “ME” variable, which points to the class itself, adds the methods to the method lookup table and initialises the object stack. Every Interpreter instance has an object manager, which takes care of creating new objects. All data of the running object is stored in the oClass instance; this is what makes the difference between a class and an object. By serialising the data in the oClass instance, it is possible to create an exact replica of the object's current state somewhere else.
Figure 5.1: The simplified class structure of the object server
5.3 Class Descriptions
5.3.1 Initializer
The Initializer class (see Fig. 5.1), on creation, generates a threaded TCP server that listens on port 8080. When the start_server method is invoked, the server enters an infinite loop that creates one handler thread for each request made to this port. It is implemented with the Python socket, SocketServer and threading modules. Such a server has to run on every system in the cloud to enable distribution. The class also provides a “getServer” method that is meant for debugging purposes.
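The Initializer and RequestHandler pair corresponds closely to Python's standard threaded TCP server. The following is a sketch for Python 3, where the module is called socketserver (SocketServer in Python 2). The reply strings and the one-line protocol are assumptions, and unlike the endless loop described in section 5.2, this handler answers a single request and then closes. It also sets allow_reuse_address, which enables SO_REUSEADDR and is the usual remedy for a port staying reserved after a server crash.

```python
import socket
import socketserver
import threading


class RequestHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: the first line decides the connection type."""

    def handle(self):
        first = self.rfile.readline().decode("ascii", "replace").strip()
        command = first.split(" ", 1)[0]
        if command == "CREATE":
            reply = "200 OK created\r\n"
        elif command == "CONNECT":
            reply = "200 OK connected\r\n"
        else:
            reply = "400 Bad Request\r\n"
        # A real handler would keep looping on the persistent connection.
        self.wfile.write(reply.encode("ascii"))


class ObjectServer(socketserver.ThreadingTCPServer):
    # One thread per connection; let a restarted server rebind the port
    # immediately instead of waiting for the operating system to free it.
    allow_reuse_address = True
    daemon_threads = True


def start_server(port=8080):
    """Start the server in a background thread and return it."""
    server = ObjectServer(("127.0.0.1", port), RequestHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


if __name__ == "__main__":
    server = start_server(0)                 # port 0: let the OS pick one
    port = server.server_address[1]
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(b"CREATE Int\r\n")
        print(conn.makefile("rb").readline())  # b'200 OK created\r\n'
    server.shutdown()
    server.server_close()
```

Because each connection gets its own operating system thread, a slow client only blocks its own handler, which matches the stateful connection model described in section 5.2.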
5.3.2 Interpreter
The Interpreter class is responsible for parsing the byte-code and executing the corresponding operations. It has many helper methods, but the main one is “parseBlock”. It is invoked with the root node of a branch of the AST and executes that branch. Every time “parseBlock” is called, the stack level is raised, so a block inside an “if”, for example, has its own stack frame, which is popped once its execution is over. Because the method calls itself recursively, nested statements are supported by design.
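The recursive, frame-per-block idea behind parseBlock can be illustrated with a miniature interpreter over nested tuples. This is a hypothetical sketch, not the objic Interpreter: only BLOCK, PRINTSTM and OSTRING appear in Fig. 5.12, while the node types ASSIGN, VAR and IF, the tuple encoding and the class name `MiniInterpreter` are invented for illustration. Assignments always go to the innermost frame to keep it short.

```python
class MiniInterpreter(object):
    """Hypothetical miniature of parseBlock: one stack frame per block."""

    def __init__(self):
        self.frames = [{}]   # stack of variable scopes, innermost last
        self.output = []

    def lookup(self, name):
        for frame in reversed(self.frames):   # innermost scope wins
            if name in frame:
                return frame[name]
        raise NameError(name)

    def eval_expr(self, node):
        kind = node[0]
        if kind == "OSTRING":
            return node[1]
        if kind == "VAR":
            return self.lookup(node[1])
        raise ValueError("unknown expression %r" % (kind,))

    def parse_block(self, block):
        """Execute a ("BLOCK", stmt, ...) node in a fresh stack frame."""
        self.frames.append({})                # raise the stack level
        try:
            for stmt in block[1:]:
                kind = stmt[0]
                if kind == "PRINTSTM":
                    self.output.append(self.eval_expr(stmt[1]))
                elif kind == "ASSIGN":
                    self.frames[-1][stmt[1]] = self.eval_expr(stmt[2])
                elif kind == "IF":
                    if self.eval_expr(stmt[1]):
                        self.parse_block(stmt[2])  # recursion gives nesting
                else:
                    raise ValueError("unknown statement %r" % (kind,))
        finally:
            self.frames.pop()                 # block's variables drop off
```

Using try/finally guarantees the frame is popped even when a statement raises, so an error inside a nested block cannot leave stale variables on the stack.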
5.3.3 MethObj
MethObj is a wrapper object for method calls. When a MethObj is created, an object hash, a method and parameters are specified; when the MethObj is used, the value is retrieved automatically. This emulates the behaviour of method objects in Python and enables a Lisp-like programming style. Furthermore, values can be associated dynamically, which reduces network load and adds flexibility. MethObj inherits some functionality from the ObjConnection class.
Figure 5.2: The class diagram for the MethObj class
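The deferred behaviour of a method object can be sketched as follows. `DeferredCall` is a hypothetical class, not the project's MethObj, and its `call` argument stands in for the network round trip that ObjConnection would make. Caching the result after the first use is an assumption about how retrieving the value lazily could reduce network load; the text does not say how MethObj achieves this.

```python
class DeferredCall(object):
    """Hypothetical sketch of a deferred method call on a (remote) object.

    `call` stands in for the network round trip; here it is any callable
    taking (obj_hash, method, params) and returning the result.
    """

    _UNSET = object()  # sentinel: no value retrieved yet

    def __init__(self, call, obj_hash, method, *params):
        self._call = call
        self.obj_hash = obj_hash
        self.method = method
        self.params = params
        self._value = self._UNSET

    def value(self):
        # Nothing is sent at creation time; the call happens on first use
        # and the result is cached afterwards (an assumption of this sketch).
        if self._value is self._UNSET:
            self._value = self._call(self.obj_hash, self.method, self.params)
        return self._value


if __name__ == "__main__":
    def fake_remote_call(obj_hash, method, params):
        print("calling %s on %s" % (method, obj_hash))
        return sum(params)

    pending = DeferredCall(fake_remote_call, "abc", "add", 1, 2)
    print(pending.value())  # prints "calling add on abc", then 3
    print(pending.value())  # prints 3 only; no second call
```

A sentinel object is used instead of None so that a remote method that legitimately returns None is still cached.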
5.3.4 ObjConnection
The ObjConnection class handles all low-level networking for the connection to an object. When it is created, the hash of the object it associates itself with is passed in. Messages can then be sent (“sendMsg”) and answers retrieved; this is done using the Python socket module. It also provides a wrapper for calling methods.
Figure 5.3: The class diagram for the ObjConnection class
5.3.5 ObjManager
The ObjManager class has two main functions: creating an object on a server and connecting to it. When initialised, it parses the “ObjLocFile”, which is described in code 6 on page 26, and then knows where to create objects unless specified otherwise. By providing wrappers for these complicated tasks, it simplifies the management of objects. It also returns ObjConnection objects, which adds a further layer of abstraction.
Figure 5.4: The class diagram for the ObjManager class
5.3.6 oClass
The oClass holds the data that makes a class an object. This includes all variables and pointers to the methods an object possesses. For the variables, a stack-based approach was chosen: every time a new block is parsed, the stack level is raised, and when the block has finished executing, all its variables “drop off” the stack, so the objects they point to can be garbage collected. The class also offers methods to cleanly remove all variables on the stack, which closes all their connections properly. It is designed to be serialisable so that the object state can be transferred from machine to machine; this feature is accounted for, but not implemented.
Figure 5.5: The class diagram for the oClass class
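The variable stack and its clean-up duty can be sketched like this. `VariableStack` and `DemoConnection` are hypothetical names; the sketch assumes that any stored value with a `close()` method represents a connection, and that closing it is what lets the remote server decrement its connection count (section 4.10).

```python
class DemoConnection(object):
    """Stand-in for an ObjConnection: only records whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class VariableStack(object):
    """Hypothetical sketch of oClass-style variable storage."""

    def __init__(self):
        self.levels = [{}]   # one dict per block level, innermost last

    def push_level(self):
        self.levels.append({})

    def pop_level(self):
        """Drop the innermost level, closing any connections it held."""
        for value in self.levels.pop().values():
            close = getattr(value, "close", None)
            if callable(close):
                close()      # lets the remote side lower its count

    def set(self, name, value):
        self.levels[-1][name] = value

    def get(self, name):
        for level in reversed(self.levels):
            if name in level:
                return level[name]
        raise KeyError(name)

    def clear(self):
        """Cleanly remove every variable, innermost level first."""
        while self.levels:
            self.pop_level()
        self.levels.append({})
```

Closing connections as levels are popped ties the language's scoping rules directly to the server-side reference counts, so leaving a block is what allows remote objects to move to the old-pool.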
5.3.7 oObject
oObject is the base class for all VM objects. Through it, shared behaviour can be collected in one place and VM-internal type checking becomes possible, which in turn enhances error checking. This is a common approach in many languages such as Java and Python. It also defines methods that all objects have to override, such as the “frep” method, which stands for representation and returns a nicely formatted string of the object. It furthermore has its own object manager instance, which all extending objects inherit, so that every object can create new objects. As all objects exposed to the language are immutable, calling min on an Int creates a new object and returns a pointer to it (see Appendix K.0.6 on page 89).
Figure 5.6: The class diagram for the oObject class
Figure 5.7: The oObject base class (diagram shows oObject together with oInt, oString and oClass)
5.3.8 oInt
This is the internal representation of an integer. It is a base type, meaning that it is implemented in Python and not in objic. Parameters can be passed in at creation; they should look like an integer (see section 5.7 on page 43) and are then set as the value. By convention in the VM, all functions that are callable from the language begin with an “f” followed by the actual name; “fmin”, for example, is called as min in objic. All objects extend oObject and thus inherit its behaviour and methods.
Figure 5.8: The class diagram for the oInt class
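The “f” prefix convention maps naturally onto Python's getattr. The sketch below is hypothetical (`OIntSketch` and `invoke` are not part of objic) and assumes that an objic call `x.min(y)` arrives at the VM as a method name and a list of arguments. In this sketch, a side effect of the convention is that Python helper methods without the prefix cannot be reached from objic code.

```python
class OIntSketch(object):
    """Hypothetical stand-in for oInt: objic-callable methods start with f."""

    def __init__(self, value):
        self.value = int(value)

    def fmin(self, other):
        # Immutable: return a new object instead of changing self.
        return OIntSketch(min(self.value, other.value))

    def frep(self):
        return str(self.value)

    def helper(self):
        return "internal only"   # no "f" prefix, so not callable from objic


def invoke(obj, name, *args):
    """Dispatch the objic method `name` to the Python method f<name>."""
    method = getattr(obj, "f" + name, None)
    if method is None or not callable(method):
        raise AttributeError("%s has no objic method %r"
                             % (type(obj).__name__, name))
    return method(*args)


if __name__ == "__main__":
    print(invoke(OIntSketch(3), "min", OIntSketch(5)).value)  # 3
    print(invoke(OIntSketch(7), "rep"))                        # 7
```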
5.3.9 oString
oString implements strings in objic. There are no limitations on length or complexity. Like oInt, it extends oObject and inherits most of its behaviour.
Figure 5.9: The class diagram for the oString class
5.3.10 oVm
As discussed above, this is the actual virtual machine that encapsulates the object and defines the interface to the object server. At creation, an object type has to be specified, which the VM then instantiates. If it is a base type, the VM creates a new oString or oInt object; if it is a user class, it starts a byte-code interpreter. A unique name is also generated at creation, consisting of digits and lowercase and uppercase letters. No other symbols are allowed in names, as they may conflict with the underlying operating system; “\” for example has a special meaning in file paths.
Figure 5.10: The class diagram for the oVm class
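A name generator following the rule above can be sketched with Python's standard library. The 32-character length and the use of the secrets module (Python 3.6 or later) are assumptions, not details from the text; a cryptographically strong generator is chosen here because, as described in section 4.11.3, security rests on the name being hard to guess.

```python
import secrets
import string

# Digits plus lowercase and uppercase ASCII letters only: no symbols such
# as "\" that could clash with file paths on the host operating system.
NAME_ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def new_object_name(length=32, taken=()):
    """Return a random alphanumeric name that is not already in `taken`."""
    while True:
        name = "".join(secrets.choice(NAME_ALPHABET) for _ in range(length))
        if name not in taken:
            return name
```

With 62 possible characters per position, a 32-character name has 62^32 (roughly 2^190) possible values, so collisions are practically impossible and the `taken` check is only a safety net.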
5.3.11 RequestHandler
The request handler is the thread that is created when the object server receives a new request. It is an operating system level thread and deals with server-side networking. It mediates between the VM and the networking layer and decides what type of connection it will become, which is defined by the request sent in the first packet. If this is “CREATE”, it assists in creating a new object; if it is “CONNECT”, it keeps the connection alive and delegates all responsibilities to the VM. It also ensures that the connection is closed gracefully if requested. It can be seen as the server's implementation of the protocol.
Figure 5.11: The class diagram for the RequestHandler class
5.4 Logging
A logging facility is used throughout the whole application. It is provided by the log class, which offers an array of methods for logging events. In the header of every class, the class name is passed to the logging facility, and a global log level is defined in the globalConf file. This is best illustrated by a source code example:
log.debug("This is a debug message")
log.error("This is an error message")
When the log level is set to DEBUG, this generates:
DEBUG:LogTestClass:This is a debug message
ERROR:LogTestClass:This is an error message
Both messages are printed because the log levels build upon each other: a level also outputs all messages of higher severity. The output consists of three parts: the first is the log level, the second is the class name, which is very useful when debugging large projects, and the last is the message supplied. The main log levels, in decreasing order of severity, are FATAL, CRITICAL, ERROR, WARN, INFO and DEBUG. This approach made it possible to produce different kinds of output depending on the log level: during development DEBUG was enabled, whereas in production only ERROR and more severe messages are output.
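The output format above happens to match the default format of Python's standard logging module, "%(levelname)s:%(name)s:%(message)s". Assuming the log class is a thin wrapper around that module (the text does not say so), an equivalent setup could look like the following sketch; `get_log` and `LOG_LEVEL` are hypothetical names. Note that the standard module has no separate FATAL level: FATAL is an alias of CRITICAL, and WARN an alias of WARNING.

```python
import io
import logging
import sys

# Global log level, as the globalConf file would define it.
LOG_LEVEL = logging.DEBUG


def get_log(class_name, stream=sys.stderr):
    """Return a logger that prints LEVEL:ClassName:message lines."""
    log = logging.getLogger(class_name)
    if not log.handlers:                       # avoid duplicate handlers
        handler = logging.StreamHandler(stream)
        handler.setFormatter(
            logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
        log.addHandler(handler)
    log.setLevel(LOG_LEVEL)
    log.propagate = False                      # don't print twice via root
    return log


if __name__ == "__main__":
    captured = io.StringIO()                   # capture output to show it
    log = get_log("LogTestClass", captured)
    log.debug("This is a debug message")
    log.error("This is an error message")
    print(captured.getvalue(), end="")
    # DEBUG:LogTestClass:This is a debug message
    # ERROR:LogTestClass:This is an error message
```

Raising the level to logging.ERROR on the logger is then enough to get the production behaviour, in which debug and info messages are silently dropped.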
5.5 Error Management
During program execution, errors or faults on the user's side are bound to happen. Every interpreter or compiler should aim to handle these faults as gracefully as possible. In objic, two layers handle faults: the compiler checks for syntactical errors, whereas at runtime the Interpreter handles errors. When an error is found, program execution stops and the program outputs a message describing what has happened (see section 5.4). The aim is always to provide as much information as possible.
5.6 VOID
The “VOID” keyword is quite important. In networking, sending nothing over a line keeps the other side waiting for data and can thus lead to a deadlock, so a standard term was needed to indicate that there is no data to send. The VOID keyword was chosen to represent nothing. This can be seen when a method is called without parameters: “print()”, for example, is translated into “print(VOID)” to indicate that nothing was specified in the brackets. This is a convention throughout the project: a method always returns something, and if no return value is specified in the code, it returns the keyword VOID.
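The VOID convention can be sketched in two small functions. The textual call encoding and the use of Python's None to mean “no return value” are assumptions made for illustration; the point shown is that neither a call nor a reply is ever empty on the wire.

```python
VOID = "VOID"


def encode_call(method, *params):
    """Render a call, substituting VOID for an empty parameter list."""
    args = ", ".join(str(p) for p in params) if params else VOID
    return "%s(%s)" % (method, args)


def method_result(value=None):
    """A method always answers; a missing return value becomes VOID."""
    return VOID if value is None else value


if __name__ == "__main__":
    print(encode_call("print"))      # print(VOID)
    print(encode_call("add", 1, 2))  # add(1, 2)
    print(method_result())           # VOID
```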
5.7 Duck-Typing
In duck typing, the type of an object is determined by its properties, i.e. the methods it provides, and is best described by the quote “If it walks like a duck and quacks like a duck, I would call it a duck.” This means that instead of the type of an object being declared from the beginning, as in Java, the type stays unknown until it is needed. This allows code to be written in a far more dynamic way, and in a distributed system in particular, the type of an object might not be known in advance. However, duck typing also has disadvantages, the loss of type checking at compile time being one of the biggest. Because of the distributed nature of the objic language, duck typing was unfortunately the only way the language could be implemented, as it can never be guaranteed that an Int object on a remote server has the same properties as the local implementation.
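Python itself is duck-typed, so the idea can be shown directly. In this sketch the classes `LocalInt` and `RemoteIntProxy` are hypothetical: the proxy's `fetch` callable stands in for a network call. `add_values` accepts any object with a `value()` method, and the last line of the demo shows the disadvantage mentioned above: a wrong argument is only detected at runtime.

```python
class LocalInt(object):
    def __init__(self, value):
        self._value = value

    def value(self):
        return self._value


class RemoteIntProxy(object):
    """Hypothetical stand-in for an Int living on another server."""

    def __init__(self, fetch):
        self._fetch = fetch   # e.g. a network call; here any callable

    def value(self):
        return self._fetch()


def add_values(a, b):
    # No isinstance() checks: anything with a value() method is accepted.
    return a.value() + b.value()


if __name__ == "__main__":
    print(add_values(LocalInt(2), RemoteIntProxy(lambda: 3)))  # 5
    try:
        add_values(LocalInt(1), "not an int")
    except AttributeError as exc:
        print("runtime error:", exc)
```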
5.8 Abstract Syntax Tree
The compiler generates an abstract syntax tree (AST), which is serialised into a byte-code form that can be loaded into the interpreter when needed. This approach preserves valuable data about the structure of the program, which enables the interpreter to understand more about the structure of the modules. Simple optimizations are made on the AST at compile time, but nothing that would destroy the integrity of the tree (see Appendix K.0.5 on page 89 and Appendix H on page 81); all further optimization can be done at interpretation time. The AST is saved in a binary form that can be loaded very quickly and is endian independent, so run environments spanning multiple architectures are possible. The binary form also keeps file sizes small, which enables fast transfers if the class code needs to be distributed. A simple hello world program, as in code 7 on the next page, is translated into the AST displayed in Fig. 5.12 on the following page.
Code 7 A simple “Hello World” program
class SimpleHello{
    print("Hello World")
}
(CLASS SimpleHello (BLOCK (PRINTSTM (OSTRING "Hello World"))))
[Tree diagram: CLASS has the children SimpleHello and BLOCK; BLOCK contains PRINTSTM, which contains OSTRING with the value "Hello World"]
Figure 5.12: Representation of a simple AST
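To illustrate the relationship between the tree, its parenthesised form and the binary byte-code, the following sketch encodes the AST of code 7 as nested Python tuples, prints it in the notation of Fig. 5.12 and round-trips it through pickle (cPickle in Python 2). The tuple encoding and the function `to_sexpr` are assumptions: the real compiler works on an Antlr tree. Pickle protocol 2 is a binary format that does not depend on the machine's byte order, which is consistent with the endian independence described above.

```python
import pickle

# The AST of code 7 as nested tuples: (node type, children...).
HELLO_AST = ("CLASS", "SimpleHello",
             ("BLOCK",
              ("PRINTSTM",
               ("OSTRING", "Hello World"))))


def to_sexpr(node):
    """Render a tuple AST in the parenthesised form of Fig. 5.12."""
    head, children = node[0], node[1:]
    parts = [head]
    for child in children:
        if isinstance(child, tuple):
            parts.append(to_sexpr(child))
        elif head == "OSTRING":
            parts.append('"%s"' % child)   # string literals are quoted
        else:
            parts.append(child)            # names such as SimpleHello
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    print(to_sexpr(HELLO_AST))
    # (CLASS SimpleHello (BLOCK (PRINTSTM (OSTRING "Hello World"))))
    byte_code = pickle.dumps(HELLO_AST, protocol=2)  # binary "byte-code"
    print(pickle.loads(byte_code) == HELLO_AST)      # True
```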
5.9 Keywords
Because of restrictions of the chosen parsing method, methods cannot be given the same names as keywords. For example, the syntax in code 8 on the next page is not valid: as print is a built-in command, the parser recognises print and expects a parameter string to follow. Instead it sees a block, which produces error nodes that cannot be processed. The following tokens are keywords and cannot be used as variable, method or class names: “class, >=, ==, new, >, ;, return, =, print, @, for, ., ), }, else, break, {, def, <=, !=, <, if, (, while”.
Code 8 Invalid method declaration
def print{
    print ("Hello world")
}
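One way a compiler could turn the situation above into a clear error message, rather than error nodes, is to check every declared name against the keyword list before parsing further. The function `check_identifier` is a hypothetical sketch and not part of objicc; the token set is copied from the list above.

```python
# Keywords and reserved symbols listed in section 5.9.
RESERVED = frozenset([
    "class", ">=", "==", "new", ">", ";", "return", "=", "print", "@",
    "for", ".", ")", "}", "else", "break", "{", "def", "<=", "!=", "<",
    "if", "(", "while",
])


def check_identifier(name):
    """Raise ValueError if `name` cannot be used as a variable/method/class."""
    if name in RESERVED:
        raise ValueError("%r is a keyword and cannot be used as a name" % name)
    return name


if __name__ == "__main__":
    print(check_identifier("greet"))   # greet
    try:
        check_identifier("print")
    except ValueError as exc:
        print(exc)  # 'print' is a keyword and cannot be used as a name
```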
5.10 Implementation Problems
• Language choice: By choosing Python as the language for the runtime environment, some of its restrictions propagated into objic. As Python is garbage collected, the memory footprint of the server could not be optimized as much as desired. Runtime performance is also affected, as objic can never run faster than its host language. This is clearly visible when trying to calculate a large Fibonacci number: Python runs out of stack frames (its recursion limit is exceeded), and as a consequence objic is unable to calculate this number either.
• Python networking support: Some problems were encountered with the Python socket library, which offers a higher level of abstraction over the hardware. While this enables portability, it also means that the network socket could not be configured exactly as needed. For example, when the server crashed, the port remained reserved until it was freed by the operating system. Some improvement in this area is seen when Python 3 is used.
• Using an AST: An AST can be quite large and contain many blocks. As the implementation parses blocks, the tree has to be held in memory. Furthermore, many iterations over the tree are needed, some of which could have been done at compile time if a register-based byte-code had been chosen.
• pylint: Some code and comments were added to conform to the rating system that pylint implements. In most cases this is not a problem, but in some special cases highly complex code had to be expanded into more statements, as otherwise warnings were raised. This also includes some lines of documentation that would not normally have been added, although they do no harm.
5.11 Finished Artifact
5.11.1 Documentation
Most of the documentation is built into the programs: calling an executable with the “-h” parameter displays a help message and exits the program. Further parameters are:
• “-v”: prints the version number of the executable and exits. This is very useful information to include in a bug report, as the exact state of the SVN tree can then be reproduced.
• “-d”: enables debug mode, meaning that all messages that are output through the log facility and flagged as debug messages are printed. This is helpful for development and testing.
Further documentation is generated automatically from the comments in the code and published online. This is useful, as it lets programmers gain a deeper understanding of the inner workings of the compiler and avoid errors.
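A command-line interface with these three flags can be sketched with Python's argparse module, which adds “-h” automatically. This is not the project's actual parser: the program name and the version string below are placeholders, and in the real tools the version would reflect the SVN revision.

```python
import argparse

VERSION = "objicc (version placeholder)"  # hypothetical; real value from SVN


def build_parser(prog="objicc"):
    """Build a parser with -h (automatic), -v and -d."""
    parser = argparse.ArgumentParser(prog=prog, description="objic tool sketch")
    parser.add_argument("-v", action="version", version=VERSION,
                        help="print the version number and exit")
    parser.add_argument("-d", dest="debug", action="store_true",
                        help="print log messages flagged as debug")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("debug mode" if args.debug else "normal mode")
```

With action="version", argparse prints the version string and exits by itself, so no extra handling code is needed for “-v”.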
5.11.2 Installation
There is a complete package called objic that can be downloaded from the project website and installed (see http://objic.ribalba.de). The setup is quite simple and only requires Python and the Antlr libraries to be installed. A public SVN development snapshot can also be downloaded by people who want to contribute to the project. As the software was developed on *NIX, no graphical installer is available, but general familiarity with Python and a *NIX operating system is enough to set up a fully functional environment in a few steps (see Appendix E on page 77 for further information).
“A class is a lot like an iceberg: 7/8 is under water, and you can see only the 1/8 that’s above the surface” Steve McConnell
6 Testing
Testing is vital, especially in an iterative approach [Myers, 1979], as further development relies on the correctness of the code that changes build upon [Hunt and Thomas, 1999]. Because of Python's dynamic nature, a lot of testing can be done while coding.
6.1 Strategy
Two major testing techniques were used to measure software quality in this project. Firstly, static and dynamic code analysis was used to check for programming errors, duplicate code and other code smells; this can be called white-box testing. The tools used were also configured to check for compliance with the Python coding convention (see http://www.python.org/dev/peps/pep-0008/). This allows the code to be published and easily understood by other programmers. Secondly, every module has a specific test section in which unit tests are performed (black-box testing); this assumes that the programs are syntactically correct. These tests check the different methods through their return values for specific inputs. This section might be removed in production code, as it slows down execution and programming-related error messages might confuse the user, as the author experienced when giving the program to a friend and an assert failed.
6.2 Plan
As an iterative approach was used, running the test cases was the last stage of every iteration. While code analysis was performed throughout the whole development life cycle, test cases were written and executed at the end of each iteration. The code analysis was performed automatically on every file save: the Eclipse editor was configured to run pylint (see section 1.3.2 on page 4) as soon as the source code changed. This enabled rapid error checking and ensured that the coding convention was followed at all times. Before every SVN commit, the test cases were run and checked for failures. Through this approach, only tested, convention-conforming and documented code was submitted to the main tree, where it could be replicated and reused [McConnell, 1993].
Figure 6.1: Picture of the development environment
Furthermore, keywords were defined to mark notes for the programmer in the code. The editor recognises the tokens “TODO”, “FIXME” and “XXX”, highlights them in the code and lists them at the bottom of the editor window to indicate what still has to be completed (see Fig. 6.1, area “2”). For completeness, “3” is the source code editor window and “1” is an interactive Python shell used to test small segments of code by copying them into this window and executing only that fragment. This does not cover regression testing, but this was considered less important for a project of this size [Brooks, 1995].
6.3 Code Analysis
Code analysis is the process of automatically checking code for program correctness [Beizer, 1990]. It can be done with an array of tools and techniques, but the two main areas are dynamic and static analysis. In dynamic analysis the code is executed, whereas in static analysis the code is only examined without being interpreted or compiled, a process called linting [Spolsky, 2004].
Two programs were used to implement the code analysis:
• pylint: As described in section 1.3.2 on page 4, pylint is a static code analysis tool. It was mainly used to check that the code conformed to the Python coding convention and did not contain code smells such as duplicate code. This was automated and done on every file save. A further feature used was the code rating facility, which takes the errors and warnings found and calculates a “mark” for the code quality as
$\text{score} = 10.0 - \frac{5 \cdot \mathit{error} + \mathit{warning} + \mathit{refactor} + \mathit{convention}}{\mathit{statement}} \times 10$
All code had to achieve a mark of over 9.5 out of 10; otherwise it was refactored and the errors were corrected. The remaining 0.5 points are slack for specific cases in which making the code comply would cause more confusion than keeping it simple (KISS). Whether lines of code (here, statements) are a suitable basis for such a metric is open to discussion [Fenton and Pfleeger, 1998].
Code 9 The pylint results for the project files
File               Score
globalConf.py      10.00/10
Interpreter.py      9.87/10
MethObject.py      10.00/10
ObjConnection.py   10.00/10
objicc.py          10.00/10
ObjManager.py       9.62/10
ObjServer.py        9.57/10
oClass.py           9.78/10
oInt.py            10.00/10
oObject.py          9.67/10
orun.py            10.00/10
oString.py         10.00/10
oVm.py             10.00/10
• pychecker: Pychecker differs from pylint in that it executes the code, although it looks for similar problems. This has advantages, for example it can understand more dynamic programs, but it also means that some modules cannot be checked, as they are not meant to be executed and only provide helper methods for other functions. Pychecker was only used when committing code to the SVN tree, to ensure that pylint had not missed anything. Another problem is that pychecker follows all library imports, which is a problem because the Antlr Python libraries have to be set up in a specific way and would otherwise produce many irrelevant errors.
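The pylint rating formula from the pylint item above is easy to evaluate by hand; this small function is a sketch of it (the name `pylint_score` is hypothetical, and rounding to two decimals mirrors how the marks are displayed in code 9). As a worked example, a file with 200 statements, 3 warnings and 2 convention messages scores 10 − (5/200)·10 = 9.75 and passes the 9.5 threshold, while a single error in a 50-statement file already costs a full point (9.00), because errors are weighted five times.

```python
def pylint_score(error, warning, refactor, convention, statement):
    """Code rating: 10 - ((5*error + warning + refactor + convention)
    / statement) * 10, rounded to two decimals."""
    if statement <= 0:
        raise ValueError("statement count must be positive")
    penalty = (5 * error + warning + refactor + convention) / float(statement)
    return round(10.0 - penalty * 10, 2)


if __name__ == "__main__":
    print(pylint_score(0, 3, 0, 2, 200))  # 9.75
    print(pylint_score(1, 0, 0, 0, 50))   # 9.0
```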
6.4 Code Coverage
The pydev code coverage tool (see http://pydev.sourceforge.net/codecoverage.html) was used during development to check whether the test cases executed every line of code. This was not always possible, as some error conditions, especially network errors, could not be simulated. Therefore, a dynamic approach was chosen in which the tool was run manually. Each run generated a report such as the one in code 10.
Code 10 A code coverage test result
Name       Stmts   Exec   Cover   Missing
------------------------------------------------------------------------------
oInt.py       68     46   67.6%   71-80,92-93,105-106,118-119,132-133,140
------------------------------------------------------------------------------
TOTAL         68     46   67.6%
At first glance, a coverage of 67.6% does not seem high, but it should be noted that many of the missing lines belong to “if” conditions that check the code for errors and are therefore never entered when no error occurs. Another factor not taken into account is the relatively trivial checks of the object's internal state which, if incorrect, would have shown up when running the object; ideally these would have been tested separately as well, but because of the limited time this was not done. By manually analysing the reports, coding errors could be spotted quite efficiently, as it could be seen whether the program execution path was as expected [Kernighan and Pike, 1999].
6.5 Recording Faults
As Trac (see section 1.3.2 on page 3) was used as an online development tool, faults could be recorded in an online form (see Fig. 6.2 on the facing page). Raising a ticket required filling in a summary, a detailed description, a priority, a component and the ticket type. Through this approach, every error found was recorded and not forgotten. At the end of every coding iteration, it was checked that all known errors had been fixed. Colour coding made high-priority faults stand out, so they could be fixed as soon as possible. This also enabled other people using the code to record errors they had found.
Figure 6.2: Trac showing the tickets on Tue 05 May 2009
“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.” Antoine de Saint-Exupéry
7 Critical Evaluation / The objic Language
This chapter evaluates the major steps of this project and highlights, in the author's opinion, what was done satisfactorily and what could have been done better.
7.1 Evaluation of Language
As one of the aims was to make the language very similar to Java, Python and C, three different syntaxes and ideologies were mixed. This was an advantage, because the best features could be picked from each, but it also had some drawbacks. They can be seen very clearly in the method object. Because Python is very dynamic, method objects are a useful addition there, but they can be confusing. Mixing this already confusing principle with a Java-like class syntax and C-like constructs produces syntax that is very hard to read. This will require a major change in the objic syntax, and clear guidelines have to be created. During the design stage the idea was to give the programmer as much freedom as possible, and because of this no clear rules on naming conventions were established. As a result, every time the value of an object is needed, the corresponding “value()” method has to be invoked or a method object has to be created. This makes the language very verbose and makes programs look “bloated”. After writing many objic programs, the author realised that defining a standard method name for the value of an object would have improved the readability and clarity of the language. The same applies to the representation method “rep()”, which is used by print. Objic is a Turing-complete language that, in terms of expressive power, can compete with existing solutions. One of the objectives was to create a language that can be understood by Java, C(++) and Python programmers. For this, two conditions have to hold: the syntax has to be familiar, and the language has to behave as the programmer expects. Runtime and memory footprint are not analysed, as they were not a primary concern and no real optimisation has been performed. The great advantage of distribution is also not discussed, because in the ideal case the programmer does not notice that he is writing a distributed application.
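The naming argument can be made concrete with a Python sketch. Python is used here only as an analogy, not as objic syntax. The sketch assumes a hypothetical convention under which every object exposes `value()` and `rep()`. A generic print helper can then rely on `rep()` without knowing the object's class. The names `ObjicObject`, `String`, `Int` and `objic_print` are illustrative only. The last line shows a method object: a bound method stored in a variable and called later.

```python
from typing import Any, Protocol


class ObjicObject(Protocol):
    """Hypothetical standard interface that every object would implement."""

    def value(self) -> Any: ...

    def rep(self) -> str: ...


class String:
    def __init__(self, text: str) -> None:
        self._text = text

    def value(self) -> str:
        return self._text

    def rep(self) -> str:
        # Representation used when the object is printed.
        return self._text


class Int:
    def __init__(self, number: int) -> None:
        self._number = number

    def value(self) -> int:
        return self._number

    def rep(self) -> str:
        return str(self._number)


def objic_print(obj: ObjicObject) -> str:
    """Print any object through the agreed rep() name and return the text."""
    text = obj.rep()
    print(text)
    return text


greeting = String("Hello World")
get_text = greeting.value  # a method object bound to `greeting`
```

With such a fixed convention, `objic_print(Int(42))` and `objic_print(greeting)` work without any class-specific code. `get_text()` returns the string later, wherever the method object has been passed.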
7.1.1 Language Syntax
A full comparison would require a syntactic discussion of all three influential languages. As there is not enough space for this, only Python is compared, since it is the language in which the whole implementation was written. This does not mean that C and Java had no influence: the syntax of the “for” loop, for example, is copied directly from C. In Python, code does not have to be in the scope of a class, because Python is a scripting language that was extended to support classes and methods. Objic, in contrast, was designed to be object orientated from the start, based on the need to distribute objects. One of the biggest differences is that Python uses indentation to delimit blocks. This forces the programmer to indent his code and so produces more readable code, but it has been a major point of criticism. In objic the decision was made to use “{” and “}” as block delimiters, as in C-based languages. Another noticeable difference is the “main” method, which is copied from Java. By convention it is the first method to be called in an object, but this is not enforced by any part of the language. Furthermore, objic does not let the compiler decide which data type a literal should have, which differs from all the languages discussed. When the Python interpreter sees a statement such as text = "Hello World" in Code 11, it creates a new object of type str and points the variable to it. This behaviour is built into the compiler or interpreter. In objic this was not possible, as it is not known where these objects will reside or whether they exist at all. Therefore, the “new” keyword has to be used in this context, similar to Java but without built-in primitive types such as “int” [Flanagan, 2005]. A similarity to Python and C++ [Meyers, 2005] is that a method can be referenced like an object, through a method object in Python or a function pointer in C++. At the time of writing, this is not available in Java.
Code 11 Python example
class HelloWorld:
    text = "Hello World"

    def printHello(self):
        print(self.text)

objClass = HelloWorld()
objClass.printHello()
Code 12 Objic example
class HelloWorld {
    text = new String("Hello World")

    def printHello {
        print(text)
    }

    def main {
        ME.printHello()
    }
}
As Code 11 and Code 12 show, the syntaxes are very similar, and a Python, C or Java programmer should be able to understand and learn objic fairly easily.
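The Python sketch below illustrates why objic needs `new` even for literals. It assumes a hypothetical placement service, `ObjectPlacer`, which decides on which host a new object is created and returns a reference (`ObjRef`) instead of the object itself. The round-robin placement, the `Host` class and the use of a random hex identifier are illustrative choices, not the objic implementation. A Python literal such as `"Hello World"` is always created locally. Here, by contrast, the location is only known once `new` has been evaluated.

```python
import itertools
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjRef:
    """Handle to an object that may live on another host (hypothetical)."""
    host: str
    obj_id: str


@dataclass
class Host:
    """A hypothetical object server holding object state by identifier."""
    name: str
    objects: dict = field(default_factory=dict)


class ObjectPlacer:
    """Hypothetical service that decides where each `new` object is created."""

    def __init__(self, hosts):
        hosts = list(hosts)
        self._hosts = {host.name: host for host in hosts}
        self._cycle = itertools.cycle(hosts)  # simple round-robin placement

    def new(self, class_name, *args):
        """Evaluate `new ClassName(args)`: pick a host, store state, return a ref."""
        host = next(self._cycle)
        obj_id = secrets.token_hex(16)  # long, hard-to-guess identifier
        host.objects[obj_id] = (class_name, args)
        return ObjRef(host.name, obj_id)

    def resolve(self, ref):
        """Return the stored (class name, arguments) for a reference."""
        return self._hosts[ref.host].objects[ref.obj_id]


placer = ObjectPlacer([Host("server-a"), Host("server-b")])
text_ref = placer.new("String", "Hello World")  # lands on server-a
count_ref = placer.new("Int", 42)               # lands on server-b
```

Because the caller only ever holds an `ObjRef`, the interpreter can later move or replicate the object without changing the program text.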
7.1.2 Language Behaviour
The objic syntax assigns no meaning to the operator characters “+”, “-”, “*”, etc. As objic is a purely object orientated language, this functionality was not implemented. This is a major difference from its reference languages. It was a conscious decision, however, as it cannot be guaranteed that the object on which the operation is performed will understand it. Instead, the guideline is to name addition methods “add” and subtraction methods “min”, as can be seen in the Int and String objects. This might change in future versions once overloading is implemented (see section 8.1 on page 57). Otherwise, the language can be expected to behave very similarly to its reference languages. The main question was whether it is possible to write a usable, familiar and distributed cloud programming language while hiding the underlying networking. It can be answered with a yes. There are still many hurdles to overcome, but this project has demonstrated the general theory.
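If operator support is added later, one possible approach is to translate each operator into the conventional method name. The interpreter would then check at run time whether the target object actually understands that method. The following Python sketch assumes such a translation table. `OPERATOR_METHODS` and `apply_operator` are hypothetical names. Only the method names `add` and `min` come from the guideline above.

```python
class Int:
    """Objic-style integer following the add/min naming guideline."""

    def __init__(self, number: int) -> None:
        self._number = number

    def value(self) -> int:
        return self._number

    def add(self, other: "Int") -> "Int":
        return Int(self._number + other.value())

    def min(self, other: "Int") -> "Int":
        # "min" is the guideline name for subtraction, not minimum.
        return Int(self._number - other.value())


# Hypothetical table mapping operator symbols to conventional method names.
OPERATOR_METHODS = {"+": "add", "-": "min"}


def apply_operator(symbol, left, right):
    """Translate `left <symbol> right` into a method call, if supported."""
    name = OPERATOR_METHODS.get(symbol)
    method = getattr(left, name, None) if name else None
    if not callable(method):
        # The object cannot be guaranteed to understand the operation.
        raise TypeError(f"{type(left).__name__} does not support {symbol!r}")
    return method(right)
```

With this approach `apply_operator("-", Int(10), Int(4)).value()` gives 6. An unsupported operator such as `"*"` fails with a `TypeError` instead of silently misbehaving on an object that does not implement it.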
7.2 Research
Turning the initial idea into a theory required a great deal of research. Working on the topic of cloud computing turned out to be very difficult. New papers with different definitions of the “cloud” were published throughout the project, and each time specific sections had to be rewritten [Weiss, 2007]. In retrospect, development was made harder by choosing a cutting-edge research topic that was not properly defined when the project started. Because of this, the area of abstract syntax tree interpreters was not researched to its full extent, and some pitfalls in the implementation phase could have been avoided had more time been spent on it. This is also linked to the realisation that a “normal” compiler would not be able to achieve the flexibility needed. This was not clear from the beginning, but it was accounted for.
7.3 Development Methodology
Choosing an iterative approach seemed the most logical choice, given the clearly defined iterative steps (see section 1.6 on page 7). However, the amount of refactoring and rewriting needed to achieve good code quality was not anticipated. An analysis of the SVN commits shows that about 37 percent of the work in every iteration consisted of replacing existing code lines. Some of this was error fixing, but it raises the question of whether a traditional waterfall-based approach would have reduced this.
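A figure such as the share of replaced lines can be derived by comparing two revisions of a file. The sketch below is one possible way to compute it with Python's standard `difflib`. It is an assumption for illustration, not the method used to obtain the 37 percent quoted above. It counts the lines in `replace` blocks of `SequenceMatcher.get_opcodes()` as a fraction of all lines written in the new revision (replaced plus newly inserted). Deleted lines are ignored. `replaced_share` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def replaced_share(old_lines, new_lines):
    """Fraction of newly written lines that replace existing lines.

    Lines in 'replace' blocks count as replacements and lines in 'insert'
    blocks count as new code. Deleted lines are ignored.
    """
    replaced = added = 0
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            replaced += j2 - j1
        elif tag == "insert":
            added += j2 - j1
    changed = replaced + added
    return replaced / changed if changed else 0.0
```

For example, changing one line of a three-line file and appending one new line gives `replaced_share(["a", "b", "c"], ["a", "B", "c", "d"]) == 0.5`.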
7.4 Implementation
The choice of implementation language is a critical decision for the success of a project. Python provided the high-level language features needed for the tight time frame, while still allowing access to low-level operating system functions. This made it possible to write scalable code quickly that can still be maintained easily. The slight increase in execution time and memory footprint can be neglected, as performance was never an aim of the project. However, the Python socket and TCP server libraries did not allow the underlying network to be configured exactly as needed, and some stability issues arose that could not be fixed. These could probably be resolved by writing this part of the system in a lower-level language.
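For reference, Python's standard `socketserver` module does expose some configuration hooks as class attributes. The sketch below sets a few of them on a hypothetical echo service, not the objic object server. The settings are address reuse, the listen backlog, daemon handler threads, disabling Nagle's algorithm and a per-connection timeout. `ObjectServer`, `EchoHandler` and `echo_once` are illustrative names. Whether these settings would have resolved the stability issues described above is not known.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: echoes one line back to the client."""

    disable_nagle_algorithm = True  # setup() sets TCP_NODELAY on the connection
    timeout = 5                     # setup() applies this socket timeout (seconds)

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line)


class ObjectServer(socketserver.ThreadingTCPServer):
    """Hypothetical server with a few socketserver settings overridden."""

    allow_reuse_address = True  # SO_REUSEADDR, allows quick restarts on a port
    daemon_threads = True       # handler threads do not block interpreter exit
    request_queue_size = 64     # backlog passed to listen()


def echo_once(payload: bytes) -> bytes:
    """Start the server on a free local port, send one line, return the reply."""
    with ObjectServer(("127.0.0.1", 0), EchoHandler) as server:
        worker = threading.Thread(target=server.serve_forever, daemon=True)
        worker.start()
        try:
            with socket.create_connection(server.server_address, timeout=5) as conn:
                conn.sendall(payload)
                with conn.makefile("rb") as reader:
                    return reader.readline()
        finally:
            server.shutdown()  # stop serve_forever before the socket is closed
```

Settings not covered by these attributes would have to be applied by overriding methods such as `server_bind()` or the handler's `setup()`.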
7.5 Testing
As discussed, testing was carried out as a step in the iterative approach. Because of the time frame, testing was cut short and only a few properly documented test cases are in the code. As the main aim of the project was to provide a proof of concept, detailed and thorough testing was not considered a high-priority iteration. This would have to be improved in future versions to achieve industry-grade quality. Nevertheless, tests were implemented to show that the concept works and that the main theory holds.
7.6 Project Plan
The initial Gantt chart (see Appendix D on page 76) was mostly followed. The one exception was a four-day overrun in the implementation phase, caused by the lack of a clearly defined end milestone. This was not a problem, as 14 days of slack had been planned at the end of the project for exactly this reason. Updating the chart with an estimated percentage of completion for each stage turned out to be very valuable. It gave a clearer picture of where the project risked running late, so that countermeasures could be taken.
7.7 Personal Performance
Experience with other tasks throughout the author’s career has made it clear that the design and the tools used are vital to the completion of a project. Using source control, a fault database and an appropriate language influences the outcome more than the initial idea. It is also important to plan for human error, with regular backups, automatic saves and recurring checks. Because a research-based project was chosen, there was no clear end point to be reached. This was recognised and accounted for by following the Gantt chart rigorously. As the chapter on future work shows (see chapter 8 on the next page), the project could be extended to a PhD-scale thesis, as indicated by St Andrews. It would have been entirely sufficient to choose only the object communication protocol as the topic and to write a proof of concept around it. This would have allowed a more in-depth discussion of one particular aspect instead of only scratching the surface of many areas. It would also have provided a fixed end milestone at which coding could have stopped. More time should have been spent on low-level specifications. Development started as soon as a fairly high-level view of the system had been formalised, out of fear of not finishing on time. As a result, some problems surfaced during the implementation phase, especially at the protocol level. The benchmarks against XML, HTTP and others were performed at the beginning, and the decision to use a self-developed protocol was incorporated into the design, but the protocol itself should also have been specified in full detail. Finally, the lack of an appropriate design notation for distributed systems made it very difficult to design the system as a whole.
“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” Brian Kernighan
8 Future Work
There are many more ideas that could extend this project. Some could be realised in the near future, while others have not yet been properly articulated.
8.1 Short Term
This is a list of the features that will be included in the next official release of the language.
• Fail-over: If a server becomes unreachable, a fail-over system has to be in place. This means that an object is never instantiated in only one location. This feature should be easy to enable and should be used in production environments. “Intelligent” network error recovery should also be in place, so that the object is always in a consistent state. If an object cannot be reached, the interpreter should determine when the object is next used, continue execution until that point and then retry. In this way, short network errors can be overcome. To enable all of this functionality, the networking layer has to be rewritten.
• Standardised protocol: The protocol derived for this project is fit for its purpose, but a standard has to be found to enable communication between projects. Work on this has already started in a proposal that the author is currently writing. It will hopefully provide an alternative to SOAP and other existing, incomplete implementations.
• Inheritance / Polymorphism: An important cornerstone of object oriented programming is the ability to inherit from base objects. It has not yet been fully examined how this can be done in the cloud, and further research is needed to clarify the interface between the objects. Once this is implemented, polymorphism and overloading will be the natural next step.
• Local caching: To speed up execution, it has to be researched whether local caching can be used to reduce network traffic. This can cause problems such as race conditions and changes not propagating correctly, but at the current stage the advantages seem to prevail.
• Effective memory management: The current memory management implementation is intentionally kept very simple and has not been optimised. To achieve comparable execution times and memory footprint, this has to be improved. A new memory model has to be derived in which the garbage collector is aware of how objects are used and can communicate with the objects concerned.
• Object versioning: As already mentioned in section 4.11.1 on page 31, object versioning is a problem in a distributed environment. A solution has already been found by which objects can communicate their versions and check for compatibility. At compile time, every object checks all the objects it should connect to and saves their versions in the byte-code. When connecting to such an object at run time, its version is requested. The reply includes a “backwards compatible” flag, which has to be set for the connection to the newer version to be allowed. Otherwise, a new request is sent for the exact version needed, which the server should be able to provide.
• Modelling notation: There is no notation to model distributed cloud services. This was a problem in the design stage, because UML (Unified Modeling Language) does not provide such a notation. The requirements for such a language or notation are currently being investigated by the author and Cornelius Ncube.
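The versioning handshake described in the list above can be sketched as follows. This is a minimal Python illustration under stated assumptions. Versions are plain integers, and the server keeps every published version of an object, marking each with a `backwards_compatible` flag. The names `VersionInfo`, `ObjectHost` and `connect` are hypothetical. The client passes the version recorded at compile time. If that version is current, or the current version is flagged as backwards compatible, it connects to the current version. Otherwise it explicitly asks for the exact version it was compiled against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionInfo:
    number: int
    backwards_compatible: bool  # True if it still serves older clients


class ObjectHost:
    """Hypothetical server keeping every published version of one object."""

    def __init__(self, versions):
        self._versions = {v.number: v for v in versions}

    def current_version(self):
        return self._versions[max(self._versions)]

    def has_version(self, number):
        return number in self._versions


def connect(host, compiled_version):
    """Return the version number the client should talk to."""
    current = host.current_version()
    if current.number == compiled_version:
        return current.number
    if current.number > compiled_version and current.backwards_compatible:
        return current.number
    # Otherwise send a second request for the exact version in the byte-code.
    if host.has_version(compiled_version):
        return compiled_version
    raise LookupError(f"version {compiled_version} is not available")
```

A client compiled against version 1 is thus upgraded transparently when version 2 is flagged as compatible. It stays on version 1 when the flag is not set, and fails explicitly if the server no longer provides that version.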
8.2 Long Term
The following points are ideas that might be researched and implemented in the long term.
• Enable private clouds: Companies might not want to offload all their internal data to a service provider in the cloud. It might therefore be desirable to build a private cloud behind the corporate firewall. Such a cloud might still want to use some well-defined services over the internet, so a clear interface to the internet and to other clouds should be defined.
• Payment: As the language should enable programs to be distributed as SaaS (Software as a Service), a payment system would have to be implemented. It should be able to charge registered customers per method invocation, making it possible to “sell” a program in the cloud. There are many issues involved in this, and no research has been carried out yet.
• Object authentication: Since security becomes a major issue with distribution, there has to be some way of allowing certain objects access to other objects in the cloud. In the current implementation this is done by “security through obscurity” [Anderson, 2001]. A very long hash is used that is very difficult, but not impossible, to guess. A protocol is needed through which an object can grant access to its partner objects.
• Service search database: To exploit the full potential of distributed services, a platform has to be created on which providers can publish their objects and define their interfaces and functionality. Ideally this could be an automated process: the programmer only specifies what is needed, and the interpreter finds an object host together with alternatives and backup services. In the short term, a website listing all the services offered is imaginable. On it, a few major providers would probably become the best known and most widely adopted.
• Different output run environments: A major aim of cloud computing is to enable applications to run in a device-independent environment. In objic this is enabled by implementing different “run environments” (see section 5.1.3 on page 34). Some tests have been made with a JavaScript implementation, but a whole range of different clients has to be implemented to achieve true device independence.
• Encryption: At present it is not possible to encrypt the communication between objects. As more and more personal data moves to the cloud, this is a very important feature, and the requirement for strong encryption will grow. A certificate-based approach seems the most feasible. Some associated problems still have to be evaluated, such as how to bind certificates to objects and how to handle changes. A lot of research has been done in this area, and hopefully objic can build upon it [Schneier, 1995].
• Verification: In a distributed environment it is very important for an object to verify that the service it is talking to is really the one intended. If the service is not verified, spoofing becomes easy, which would make the security strategy worthless. This goes hand in hand with the point on encryption. It should not be too hard to implement, as a lot of research has been done and libraries are available [Needham and Schroeder, 1978].
• Object / Server migration: When running an object server in a production environment, it has to be possible to migrate the running objects to another server without losing the associated information and connections. It is possible to serialise an object and instantiate it in another server instance, but a “hot move” is not possible. This is a difficult topic, as the case in which an object receives a request while it is being moved has to be handled to keep it in a consistent state.
The relevance of research can always be debated, but many lessons have been learnt. The interest from various companies, individuals and universities has shown that the topic needs more discussion. Proving that it is possible to create a distributed cloud programming language, even an incomplete one, has laid the foundation for further research. The project has also shown the great need for defined cloud computing standards, to enable provider-independent computation, and for appropriate notations. Based on the particular interest of a few individuals, the project will be continued as an open source development effort.
“There is no greater mistake than the hasty conclusion that opinions are worthless because they are badly argued” Thomas Henry Huxley
9 Conclusion
In conclusion, the project was a success. It has shown that objects can be distributed over numerous machines in a cloud-like setup by using a Turing-complete, Java-like object orientated language. In doing so, it has demonstrated that distributed objects can be applied to a cloud setup while the distribution is hidden from the programmer. Through this project the author has also deepened his knowledge of compilers and interpreters and learned about distributed systems. As a high code standard was required, Python was adopted very quickly, as were the tools supporting its development, which was also an objective. This has helped the author to develop his professional portfolio. It also gave him experience of the issues involved in going through all the steps of creating a project with about ten thousand lines of code. Choosing a topic that had not yet been properly defined turned out to be a problem, as the boundaries of the project kept changing whenever new papers were published. However, this led the author to gain a deeper knowledge of the domain than would otherwise have been required. Because the scope was misjudged and the word count of the report was limited, many areas could not be covered in the detail anticipated. Nevertheless, a high-level view was gained, and many areas for further research were opened up and documented for future work. Based on the feedback and interest from various sources, a discussion is now starting about how this idea can help make “the cloud” more programmable. The author is very proud that two papers about this prototype have already been accepted at international conferences (see Appendix J on page 83) and that it may become a PhD topic. This confirms the author’s view of the relevance of this topic. Therefore, the project objectives appear to have been fully met and even exceeded.
List of Figures
2.1 A typical network diagram using a cloud (p. 10)
2.2 Translation of a print statement (p. 13)
4.1 The environment in relation to the byte-code (p. 21)
4.2 The antlrWorks editor (p. 22)
4.3 Diagram showing the relationship between object and class stack frames (p. 24)
4.4 A high level diagram of the server (p. 27)
4.5 Comparison between SOAP and objic protocol (p. 29)
4.6 Parse time comparison (p. 29)
5.1 The simplified class structure of the object server (p. 36)
5.2 The class diagram for the methobj class (p. 37)
5.3 The class diagram for the ObjConnection class (p. 38)
5.4 The class diagram for the ObjManager class (p. 38)
5.5 The class diagram for the oClass class (p. 39)
5.6 The class diagram for the oObject class (p. 39)
5.7 The oObject base class (p. 40)
5.8 The class diagram for the oInt class (p. 40)
5.9 The class diagram for the oString class (p. 41)
5.10 The class diagram for the oVm class (p. 41)
5.11 The class diagram for the RequestHandler class (p. 42)
5.12 Representation of a simple AST (p. 44)
6.1 Picture of the development environment (p. 48)
6.2 Trac showing the tickets on Tue 05 May 2009 (p. 51)
K.1 A diagram of the syntax for a new block of code (p. 87)
K.2 A diagram of the syntax for a call statement (p. 88)
K.3 A diagram of the syntax for a new class definition (p. 88)
K.4 A diagram of the syntax for a new method definition (p. 88)
K.5 A diagram of the syntax for a while loop (p. 89)
K.6 A diagram of the syntax for a new for loop (p. 89)
K.7 A diagram of the syntax for a new variable declaration (p. 90)
K.8 A diagram of the syntax for the parameters passed in to methods (p. 91)
K.9 A diagram of what can be included in a block (p. 91)
K.10 A diagram of the syntax for the NAME token (p. 92)
10 List of Abbreviations
SVN Subversion
CVS Concurrent Versions System
CERN European Organization for Nuclear Research
IDE Integrated development environment
RSS Really Simple Syndication
XML Extensible Markup Language
URL Uniform Resource Locator
VM Virtual machine
BSD Berkeley Software Distribution
CORBA Common Object Request Broker Architecture
ORB Object Request Broker
EBNF Extended Backus-Naur Form
HTTP Hypertext Transfer Protocol
KISS Keep it Short and Simple
UML Unified Modeling Language
AST Abstract syntax tree
Bibliography
Harold Abelson. Struktur und Interpretation von Computerprogrammen (Springer-Lehrbuch). Springer-Verlag Berlin and Heidelberg GmbH & Co. K, 2001.
Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison Wesley, 2006.
Ross J. Anderson. Security Engineering: A Guide to Building Dependable Distributed Systems (Wiley Computer Publishing). John Wiley & Sons, 2001.
Andrew W. Appel. Modern Compiler Implementation in Java. Cambridge University Press, 2002.
Jim Arlow and Ila Neustadt. UML 2 and the Unified Process: Practical Object-Oriented Analysis and Design (Addison-Wesley Object Technology Series). Addison Wesley, 2005.
Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. Above the clouds: A Berkeley view of cloud computing. Technical Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley, Feb 2009. URL http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html.
Haley Beard. Cloud Computing Best Practices for Managing and Measuring Processes for On-demand Computing, Applications and Data Centers in the Cloud with SLAs. Emereo Pty Limited, 2008.
Boris Beizer. Software Testing Techniques. Itp - Media, 1990.
Kurt Bittner and Ian Spence. Managing Iterative Software Development Projects (Addison-Wesley Object Technology). Addison Wesley, 2006.
Joshua Bloch. Effective Java (Java Series). Prentice Hall, 2001.
Gerd Breitter and Michael Behrendt. Cloud computing concepts. Informatik Spektrum, 31(6), 2008.
Frederick P. Brooks. The Mythical Man Month and Other Essays on Software Engineering. Addison Wesley, 1995.
Rajkumar Buyya, Chee Shin Yeo, and Srikumar Venugopal. Market-oriented cloud computing: Vision, hype, and reality for delivering IT services as computing utilities. CoRR, abs/0808.3558, 2008.
N Carr. The Big Switch: Rewiring the World from Edison to Google. W. W. Norton & Co., 2009.
Thomas M. Connolly and Carolyn E. Begg. Database Systems: A Practical Approach to Design, Implementation and Management (4th Edition). Addison Wesley, 2004.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001.
Irmen de Jong (and others). Web services/SOAP and CORBA. OMG Whitepapers, 2002. URL http://www.omg.net/news/whitepapers/CORBA_vs_SOAP1.pdf.
Kemal A. Delic and Martin Anthony Walker. Emergence of the academic computing clouds. Ubiquity, 9(31):1–1, 2008. doi: http://doi.acm.org/10.1145/1414663.1414664.
Jean Dollimore, Tim Kindberg, and George Coulouris. Distributed Systems: Concepts and Design (4th Edition). Addison Wesley, 2005.
Wolfgang Emmerich. Engineering Distributed Objects. John Wiley & Sons, 2000.
Norman E. Fenton and Shari Lawrence Pfleeger. Software Metrics: A Rigorous Approach. PWS, 1998.
David Flanagan. Java in a Nutshell (In a Nutshell (O’Reilly)). O’Reilly Media, Inc., 2005.
I. Foster, Yong Zhao, I. Raicu, and S. Lu. Cloud computing and grid computing 360-degree compared. Grid Computing Environments Workshop, 2008. GCE ’08, pages 1–10, Nov. 2008. doi: 10.1109/GCE.2008.4738445.
Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts. Refactoring: Improving the Design of Existing Code (Object Technology Series). Addison Wesley, 1999a.
Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts. Refactoring: Improving the Design of Existing Code (Object Technology Series). Addison Wesley, 1999b.
Petter Haggholm. Pyremote: Object mobility in the Python programming language. OMG Whitepapers, 2007. URL http://www.cs.ubc.ca/grads/resources/thesis/Nov07/Haggholm_Petter.pdf.
Brian Hayes. Cloud computing. Commun. ACM, 51(7):9–11, 2008. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/1364782.1364786.
James Hayes. Clout of the cloud. Engineering and Technology, 2009.
Michi Henning. The rise and fall of CORBA. Commun. ACM, 51(8):52–57, 2008. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/1378704.1378718.
Jim Holmes. Object-Oriented Compiler Construction. Pearson US Imports & PHIPEs, 1994.
Andrew Hunt and David Thomas. The Pragmatic Programmer. Addison Wesley, 1999.
Ivar Jacobson. Object-oriented Software Engineering: A Use CASE Approach (ACM Press). Addison Wesley, 1992.
Brian W. Kernighan and Rob Pike. The Practice of Programming (Addison-Wesley Professional Computing Series). Addison Wesley, 1999.
Wolfgang Kuechlin and Andreas Weber. Einfuehrung in die Informatik. Objektorientiert mit Java (Springer-Lehrbuch). Springer-Verlag Berlin Heidelberg, 2000.
Ningning Hu Li, Li Erran Li, Zhuoqing Morley Mao, Peter Steenkiste, and Jia Wang. A measurement study of internet bottlenecks. In Proc. IEEE INFOCOM, pages 1689–1700. IEEE Press, 2005.
Steven C. McConnell. Code Complete: A Practical Handbook of Software Construction. Microsoft Press, U.S., 1993.
Ivanka Menken. SaaS - The Complete Cornerstone Guide to Software as a Service Best Practices Concepts, Terms, and Techniques for Successfully Planning, Implementing and Managing SaaS Solutions. Emereo Pty Limited, 2008.
Scott Meyers. Effective C++: 55 Specific Ways to Improve Your Programs and Designs (Addison-Wesley Professional Computing Series). Addison Wesley, 2005.
Michael Miller. Cloud Computing: Web-Based Applications That Change the Way You Work and Collaborate Online. QUE, 2008.
John Paul Mueller. Special Edition Using SOAP. QUE, 2001.
Glenford J. Myers. The Art of Software Testing (Business Data Processing). John Wiley & Sons, 1979.
Roger M. Needham and Michael D. Schroeder. Using encryption for authentication in large networks of computers. Commun. ACM, 21(12):993–999, 1978. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/359657.359659.
Robert Orfali, Dan Harkey, and Jeri Edwards. The Essential Distributed Objects Survival Guide. John Wiley & Sons, 1995.
Krzysztof Ostrowski, Ken Birman, Danny Dolev, and Jong Hoon Ahnn. Programming with live distributed objects. In ECOOP ’08: Proceedings of the 22nd European conference on Object-Oriented Programming, pages 463–489, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 978-3-540-70591-8. doi: http://dx.doi.org/10.1007/978-3-540-70592-5_20.
John K. Ousterhout. Scripting: Higher level programming for the 21st century. IEEE Computer, 31:23–30, 1997.
D. L. Parnas. A technique for software module specification with examples. Commun. ACM, 15(5):330–336, 1972. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/355602.361309.
Terence Parr. The Definitive ANTLR Reference: Building Domain-Specific Languages (Pragmatic Programmers). Pragmatic Bookshelf, 2007.
David Parsons. Object Oriented Programming (Computing Programming Textbooks). Thomson Learning, 1997.
Dan Pilone and Neil Pitman. UML 2.0 in a Nutshell (In a Nutshell (O’Reilly)). O’Reilly Media, Inc., 2005.
David Plainfossé and Marc Shapiro. A survey of distributed garbage collection techniques. In IWMM ’95: Proceedings of the International Workshop on Memory Management, pages 211–249, London, UK, 1995. Springer-Verlag. ISBN 3-540-60368-9.
Tim Rowledge. A tour of the Squeak object engine. 2001. URL http://stephane.ducasse.free.fr/FreeBooks/CollectiveNBlueBook/oe-tour-sept19.pdf.
Bruce Schneier. Applied Cryptography: Protocols, Algorithms and Source Code in C. John Wiley & Sons, 1995.
Kenn Scribner and Mark Stiver. Understanding SOAP: Simple Object Access Protocol (Sams professional). Sams, 2000.
Yunhe Shi, Kevin Casey, M. Anton Ertl, and David Gregg. Virtual machine showdown: Stack versus registers. ACM Trans. Archit. Code Optim., 4(4):1–36, 2008. ISSN 1544-3566. doi: http://doi.acm.org/10.1145/1328195.1328197.
James Snell, Doug Tidwell, and Pavel Kulchenko. Programming Web Services with SOAP. O’Reilly Media, Inc., 2001.
Stephan Somogyi and Bruce Schneier. Inside risks: The perils of port 80. Commun. ACM, 44(10):168, 2001. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/383845.383875.
Joel Spolsky. Joel on Software: And on Diverse and Occasionally Related Matters That Will Prove of Interest to Software Developers, Designers, and Managers, and to Those ... or Ill-Luck, Work with Them in Some Capacity. APRESS, 2004.
Robert Tolksdorf and Kai Knubben. Programming distributed systems with the delegation-based object-oriented language dself. In SAC ’02: Proceedings of the 2002 ACM symposium on Applied computing, pages 927–931, New York, NY, USA, 2002. ACM. ISBN 1-58113-445-2. doi: http://doi.acm.org/10.1145/508791.508971.
Aaron Weiss. Computing in the clouds. netWorker, 11(4):16–25, 2007. ISSN 1091-3556. doi: http://doi.acm.org/10.1145/1327512.1327513.
A Appendix
B License
Copyright (c) 2009, Hoffmann Geerd-Dietger
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice,
  this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
  notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
- Neither the name of the copyright owner nor the
  names of its contributors may be used to endorse or promote products
  derived from this software without specific prior written permission.
- All advertising materials mentioning features or use of this software
  must display the following acknowledgement:
  This product includes software developed by Hoffmann Geerd-Dietger
  and contributors.
- The Program and its derivative work will neither be modified or
  executed to harm any human being nor through inaction permit
  any human being to be harmed.

THIS SOFTWARE IS PROVIDED BY Geerd-Dietger Hoffmann ''AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
C Antlr Syntax
This is a listing of the syntax file that ANTLR uses to create the tokenizer and parser. It is essentially a list of rules describing how the language created in this project is structured. The syntax is very similar to EBNF (Extended Backus-Naur Form).
grammar Expr;

//Some options for the generator
options {
    language=Java;
    output=AST;
    ASTLabelType=CommonTree;
}

//Tokens for the AST (see bottom for more)
tokens {
    BLOCK;
    EQ;
    PARAMS;
    NEWOBJ;
    CALL;
    IFTEST;
    CLASS;
    PTRASS;
    BOOL;
    WHILE;
    BREAK;
    DOWHILE;
    ELSE;
    PRINTSTM;
    OSTRING;
    SERVERNAME;
    METHDEF;
    DEBUGFLAG;
    RETURNSTM;
}

//Every program has one class
prog
    :   classdef
    ;

// A class is defined through the class keyword, a name and a block
// Ex : class test { print ("hello") }
classdef
    :   'class' NAME block -> ^(CLASS NAME block)
    ;

//A block is always enclosed between { and } and has n statements
block
    :   '{' ( stat )* '}' -> ^(BLOCK stat*)
    ;

//A list of all the different statements there can be in a block
stat
    :   iftest
    |   methdef
    |   forloop
    |   newvar
    |   call
    |   block
    |   whileloop
    |   loopbreak
    |   printstm
    |   returnstm
    |   NEWLINE
    ;

//Handles a return in the code
//Ex : return(a)
returnstm
    :   'return' '(' NAME ')' -> ^(RETURNSTM NAME)
    ;

//Defines a new method in a class
//Ex : def printHelp { print ("HELP") }
methdef
    :   'def' NAME block -> ^(METHDEF NAME block)
    ;

//A simple method call
//Ex : object.add(otherobject)
call
    :   a=NAME '.' b=NAME paramlist -> ^(CALL $a $b paramlist)
    ;

//The print statement
//Ex : print("This is printed")
printstm
    :   'print' '(' printparams ')' -> ^(PRINTSTM printparams)
    ;

//A list of things that can be printed
printparams
    :   call
    |   NAME
    |   STRINGTPL -> ^(OSTRING STRINGTPL)
    ;

//The definition of a while loop
//Ex : while (a.value() == a.value()){print ("looping")}
whileloop
    :   'while' '(' boolexp ')' a=block -> ^(WHILE boolexp $a)
    ;

//The break statement to stop executing a loop
loopbreak
    :   'break' -> ^(BREAK)
    ;

//The if definition
//Ex : if (a.value() == b.value()){
//         print("equal")
//     }else{
//         print("not equal")}
iftest
    :   'if' '(' boolexp ')' a=block ('else' b=block)?
        -> ^(IFTEST boolexp $a (ELSE $b)?)
    ;

//Defines a new variable pointer; there are 3 different possibilities
//Ex : a = new Int()
newvar
    :   a=NAME '=' 'new' b=NAME paramlist ('@' servername)?
        -> ^(EQ $a NEWOBJ (servername)? $b paramlist)
    |   a=NAME '=' b=NAME -> ^(EQ $a PTRASS $b)
    |   NAME '=' call -> ^(EQ NAME call)
    ;

//Defines a for loop by creating the AST of the equivalent while loop,
//as the byte code doesn't know what a for loop is
forloop
    :   'for' '(' a=newvar ';' boolexp ';' b=newvar ')' block
        -> ^(BLOCK $a ^(WHILE boolexp ^(BLOCK block $b)))
    ;

//A boolean expression used in while and if
boolexp
    :   /* empty */
    |   a=booltype cmp_operator b=booltype -> ^(BOOL cmp_operator $a $b)
    ;

//A list of the valid comparison operators
cmp_operator
    :   '=='
    |   '!='
    |   '<='
    |   '>='
    |   '<'
    |   '>'
    ;

//A boolean operand can be a call or a NAME
booltype
    :   call
    |   NAME
    ;

//Defines what a parameter list looks like
paramlist
    :   '(' ( atom (',' atom)* )? ')' -> ^(PARAMS atom*)
    ;

//Defines the syntax for a server
//Ex : ptrtovoid.net
servername
    :   a=NAME '.' b=NAME -> ^(SERVERNAME $a '.' $b)
    ;

//The smallest entity
atom
    :   INT
    |   NAME
    |   STRINGTPL
    ;

// MORE TOKENS

INT
    :   ('+' | '-')? '0'..'9'+
    ;

NAME
    :   ('a'..'z'|'A'..'Z'|'_'|INT)+
    ;

STRINGTPL
    :   ('"' (~'"')* '"')
    ;

NEWLINE
    :   '\r'? '\n'
    ;

COMMENT
    :   '/*' ( options {greedy=false;} : . )* '*/' {skip();}
    ;

LINE_COMMENT
    :   '\\' ~('\n'|'\r')* '\r'? '\n' {skip();}
    ;

WS
    :   (' '|'\t'|'\n'|'\r')+ {skip();}
    ;
D Gantt Chart
E INSTALL
This is the INSTALL file that is provided with the objic distribution.

This is the install document for the objic language.

1) First of all, check that Python newer than 2.4 is installed. This is best done by running:

   $ python -V

   This has to report a version higher than Python 2.4.0.

   Then check for the ANTLR Python libraries:

   $ python -c 'import antlr3'

   If either of these commands fails, you have to install the missing packages. Most distributions supply them as packages; consult man apt-get or man yum for more information. Otherwise

   http://www.python.org/
   http://www.antlr.org/

   will help.

2) After checking for the libraries, you can extract the code and install it.

   The source code can be found under objic.ribalba.de.

   After downloading the latest version,

   $ tar -xvzf objic.tar.gz

   will unpack the archive with everything you need.

3) It is advised to edit the configuration file. It should be relatively self-explanatory, and the default settings are normally fine for testing.

4) Then set up your execution path to include the executable files. This is done by extending your PATH to include the src folder:

   $ export PATH=/path/to/source:$PATH

   will normally do the trick.

Please email me with any further questions at [email protected]
F CD Content
Directory Layout

/              The CD root directory
code           The directory where all the code resides
JavaTreeGen    A program to generate parse trees
LexParse       The objic program
excipsecode    The Eclipse project holding all source files
output         Output from ANTLR
design         High-level design documents
diagrams       Diagrams used in the paper
documents      License and paper
examples       Example source code
man            Manuals like INSTALL and the man pages
managment      The Gantt charts
objic          The global configuration file
paper          The actual paper source
images         Images used in the report
includes       Source examples used in the report
syntaximg      Images of the syntax
proposal       The proposal
scripts        Management scripts, like backup and word count
dump           SVN dump and other unhelpful files

Further information

Operating system: Linux, with Python 2.4; see INSTALL for further details
Documentation: plain ASCII text, which can be viewed with any text viewer
Libraries: listed in the INSTALL file
G Backup Script
#!/bin/sh

#Set some colour
export GREP_OPTIONS='--color=auto' GREP_COLOR='1;32'

#Check for "bad words"
grep -i ' me ' ./paper/*.tex
grep -i ' I ' ./paper/*.tex
grep -i ' we ' ./paper/*.tex
grep -i ' you ' ./paper/*.tex

#Check that everything is committed (svn st prints nothing for a clean tree)
if [ -n "$(svn st)" ]; then
    echo "Please commit all changes"; exit 1
fi

#Go to CERN and update svn
ssh [email protected] 'svn up /afs/cern.ch/user/r/ribalba/fyp/'

#Compute the archive name once, so every step uses the same file
ARCHIVE="fyp$(date '+%e%b' | sed -e 's/\s*//g').tar.bz2"

#Tar up everything
tar -cjf "$ARCHIVE" *

#Upload to uni
scp "$ARCHIVE" \
    [email protected]:/home/ghoffman/fyp/

#And delete
rm "$ARCHIVE"
H For Loop
This code example will be translated into the following parse tree.

class ForLoop{
    a = new Int(3)

    for (c = new Int(); c.value() <= a.value(); c = c.add(1)){
        print(c.value())
    }
}

The output of the program is

0
1
2
3
I SOAP/HTTP Comparison
SOAP message

<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Body xmlns:m="http://www.example.org/stock">
  <m:GetStockPrice>
    <m:StockName>IBM</m:StockName>
  </m:GetStockPrice>
</soap:Body>
</soap:Envelope>

Time needed to parse: 0.137 seconds
System calls: 2334

objic message

CALL:GetStockPrice
PARAM:IBM

Time needed to parse: 0.031 seconds
System calls: 891
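The gap between the two measurements comes mostly from the amount of work each format demands: SOAP needs a full, namespace-aware XML parse, while the objic message needs one split per line. The sketch below parses both messages with the Python standard library so the comparison can be reproduced on any machine; it assumes the objic wire format is exactly one KEY:value pair per line, as the listing suggests, and the function names are illustrative rather than taken from the objic code.

```python
import xml.etree.ElementTree as ET

# The SOAP request from the listing above.
SOAP_MESSAGE = """<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Body xmlns:m="http://www.example.org/stock">
  <m:GetStockPrice>
    <m:StockName>IBM</m:StockName>
  </m:GetStockPrice>
</soap:Body>
</soap:Envelope>"""

# Assumed objic wire format: one KEY:value pair per line.
OBJIC_MESSAGE = "CALL:GetStockPrice\nPARAM:IBM"

SOAP_ENV = "{http://www.w3.org/2001/12/soap-envelope}"


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def parse_soap(text):
    """Return (method, params): full XML parse plus namespace lookups."""
    body = ET.fromstring(text).find(SOAP_ENV + "Body")
    call = body[0]  # the first child of Body is the operation element
    return local_name(call.tag), [child.text for child in call]


def parse_objic(text):
    """Return (method, params): a single split per line."""
    method, params = None, []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "CALL":
            method = value
        elif key == "PARAM":
            params.append(value)
    return method, params
```

Both functions return the same method name and parameter list, so wrapping each in `timeit.timeit` gives a like-for-like timing; absolute numbers will of course depend on the machine and interpreter. Note that `partition(":")` splits at the first colon only, so parameter values may contain colons but not newlines.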
J Programming the Cloud
This paper was written as a research proposal before the project started. It is meant to clarify the vision of what a cloud-enabled language should look like.

Author: Hoffmann Geerd-Dietger
Contact: [email protected]
Date: 2009-04-20
Web site: http://www.ribalba.de
Version: 0.2
Dave is sitting at home in front of his TV, trying to remember all the people he should send a Christmas card to. Annoyed that he has this problem every year, and always forgets someone, he decides to write a little application he can use throughout the year to record these people. Thinking about the problem, he identifies two scenarios: either he adds people from his address book, or he enters the data manually. So he grabs the keyboard next to the couch, switches the TV over to his desktop and creates a new application. First he defines a class he calls Model, which defines the data the program should hold. Experienced readers can tell that Dave knows the Model-View-Controller concept.

people = new List

On a very high level, the only thing he needs is a list of people with their names and addresses. So he searches for a “person” object hosting service that suits his needs¹ and finds one offered by Bluesphere.org. This service is free for private users, so he specifies that all Person objects should be hosted on Bluesphere.org. The guys² at Bluesphere are so nice that they also offer a backup server, so that if the first one is not reachable, objects can still be instantiated and used:

Person @ bluesphere.org, backup.bluesphere.org

If this were an important application, he would also specify that all objects are mirrored, to be extra safe. This would only involve specifying where the mirrors are; the language would take care of all the underlying work. Because his desktop machine is quite low spec, he decides to host the list object on one of his rented object servers:

List @ Flipcube.net

He has a contract with Flipcube that lets him create 1 million list objects a day with fewer than 10,000 entries each. If he goes over this limit, they charge him according to the methods he invokes: creating a new list costs one credit, but a complex ordering costs 5. His application easily fits within this limit, so now he has to set up the data structure he needs. As he wants to use the application throughout the year, he has to save the data. He does this by defining a save() and a retrieve() method in his model, which he can pretty much copy and paste:

myDataStore = new ObjectStore("peopleToRememberToSendCardsTo")

def save(){
    myDataStore.save(people)
}

def retrieve(){
    for person in myDataStore.getItems(){
        people.add(person)
    }
}

He has rented 2 TB of storage from the company InodeBird, so he tells the program to initialize all ObjectStore objects there:

ObjectStore @ InodeBird.com, dave.homeserver.net

He has also set up a private backup server at dave.homeserver.net, which mirrors InodeBird. Dave has done this so that he still has physical access to his data; all his friends call him paranoid and old-fashioned because of this. In their opinion he can never reach the 99.9% availability of the InodeBird server farm, which is located in an old nuclear bunker. All data transmission is encrypted and signed; when signing up to the services, users have to transmit their personal “profile” file, which includes keys and other relevant data. Now Dave is satisfied that the model can handle all the data it needs to, so he starts programming the actual functionality in a class named Controller. The first thing the controller must do is retrieve the saved data, so he defines the constructor:

def Controller(stdinParam, stdoutParam){
    stdin = stdinParam
    stdout = stdoutParam
    Model.retrieve()
}

First he adds the method to add a person from his address book.
The variables stdin and stdout are defined by the View object and are passed in every time a Controller is created. Dave does not really care how he is going to access the application, as there are predefined views for his phone, TV and computer.

def addFromAddressBook(){
    var myAddressBook = new AddressBook() @ addressbookserv.net
    #ToDo check if successful
    myAddressBook.login(stdin, stdout)
    stdout.print("Enter Name ... : ")
    personToFind = stdin.read()
    personToAdd = myAddressBook.find(personToFind)
    Model.people.add(personToAdd)
    Model.save()
}

The method to add a person by hand is just as easy to write:

def addByHand(){
    personToAdd = new Person()
    personToAdd.propagate(stdin, stdout)
    Model.people.add(personToAdd)
    Model.save()
}

Remember that the Person object is hosted on bluesphere.org; it has a propagate method that asks for all the data it needs through stdin and stdout. Finally, Dave saves the updated list of persons. He then adds a similar method for deleting people from the list. He could make the program more compact by writing the addByHand() method in a short form:

def addByHand(){
    Model.people.add(new Person().propagate(stdin, stdout))
}

But Dave prefers nicely structured, readable code. Furthermore, to write it this way he would have to update the model specification to auto-save whenever the data is modified. What is not directly visible from the code is that the list is actually only a list of pointers, so if he updates an address in his address book, the list will have the up-to-date data. Thread safety is an issue here, but Dave can assume that the programmer of the interface has taken care of it. Instead of going through the process of logging in to the AddressBook server, he could have added a certificate to the program, so that he has read rights as soon as it runs. He is also relying on bluesphere to keep the objects accessible for as long as he needs them. In the long run he will pay them for hosting the objects, but for the development stage this risk is acceptable. Now the only thing left is to create a front end for his service. For this he can use some predefined objects, so that he can access his application through all the devices he has, like phone, computer, netbook, TV, etc.
So he defines a class View that extends from viewport.org and registers the methods he has created in the controller. He doesn't need to care about displaying the Person objects, as there is a method display(stdout) that knows how to display the object correctly depending on the type of stdout. Stdout always has a type field that tells the render function what to output. For example, if Dave accesses his application through a web browser the type will be XHTML, but he can easily get an XML feed by using a stdout object with the type XML. As all the objects are hosted on servers distributed around the world, Dave never really knows, or needs to know, where the data is. While adding the List object to his application, Dave chuckles. He is reminded of the funny bug the sort function once had: when 100 references were the same, the ordering was incorrect. It was a really weird error and only a few people noticed it, but because all the computation was done by the servers, the fix was instant and some people still don't know about it. He also remembers how it used to be, when you had to download updated versions of every software application all the time, and what a pain that was. Dave uploads his application to his application hosting service and can now use it from wherever he is in the world. After some time, Dave's friends find out about his application and want to use it too. All he has to do is add some user verification; he also decides to add some error catching, and then lets his friends use the app. Because every part of the system is hosted somewhere, it doesn't matter that 20 people now use it, as the servers he uses are all part of a scalable system. Dave is now thinking about selling access to his application. This is also quite easy: the only thing he has to do is include a payment object from the company payfriend.com, which he already uses to pay for his storage, and then he can start making money.
¹ He is not bound to one service; he can use anyone he thinks will host the objects to his satisfaction.
² Of course there are girls working at Bluesphere too.
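The “Person @ bluesphere.org, backup.bluesphere.org” line in the scenario implies that the runtime tries the listed servers in order and falls back to the next one when a server cannot be reached. A minimal Python sketch of that behaviour follows; `instantiate` and the `connect` transport callable are hypothetical names, not part of objic, and real mirroring would need far more than a retry loop (consistency between mirrors, for a start).

```python
def instantiate(class_name, hosts, connect):
    """Try each host from an '@ primary, backup, ...' clause in order.

    connect(host, class_name) is a hypothetical transport function that
    returns a remote reference, or raises OSError if the host is unreachable.
    """
    errors = []
    for host in hosts:
        try:
            return connect(host, class_name)  # first reachable host wins
        except OSError as exc:
            errors.append(f"{host}: {exc}")  # remember why this host failed
    raise ConnectionError(f"no host reachable for {class_name}: " + "; ".join(errors))
```

Keeping the failure reasons for every host makes the final error useful when all servers are down, which is exactly the case a backup list is meant to make rare.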
K Grammar Description
In this section the main grammar elements are described and visualized. Small examples are also provided to clarify the descriptions. There are more elements but these are not vital to the understanding of the language and the design of the system.
K.0.1
block
A block is everything between “{” and “}”. It can contain any number of “stat” expressions (see K.9 on page 91), and empty blocks are allowed. Blocks can be nested. A commonly used block is a method definition (see Code 13), which might have a loop embedded in it, giving a block within a block. In the implementation, entering a block also raises the stack level, so all variables defined in a block are local to that block.
Figure K.1: A diagram of the syntax for a new block of code
Code 13 Method declaration
def printMe{
    print ("Hello world")
}
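Raising the stack level on entry to a block can be modelled as a stack of dictionaries, one per nesting level. The Python sketch below is an illustration under that assumption, not the objic VM's actual data structure; the class and method names are hypothetical.

```python
class ScopeStack:
    """Hypothetical sketch of block scoping: one dictionary per stack level."""

    def __init__(self):
        self.levels = [{}]  # level 0: the outermost scope

    def enter_block(self):
        self.levels.append({})  # '{' raises the stack level

    def leave_block(self):
        self.levels.pop()  # '}' discards every name defined inside the block

    def assign(self, name, value):
        # Rebind an existing name in the level where it lives; otherwise
        # create it in the current (innermost) block.
        for level in reversed(self.levels):
            if name in level:
                level[name] = value
                return
        self.levels[-1][name] = value

    def lookup(self, name):
        # Search from the innermost block outwards.
        for level in reversed(self.levels):
            if name in level:
                return level[name]
        raise NameError(f"undefined name: {name}")
```

Rebinding an outer name in place, rather than shadowing it, matters in this sketch for loops: the update statement of a desugared for loop runs inside an inner block but must change the counter defined outside it.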
K.0.2
call
A call is a method invocation on an object with some parameters. The syntax is copied from Java. The “NAME” token in front of the dot refers to a name in the symbol table, which is linked to a pointer to an object. The “NAME” after the dot is the method to invoke. The syntax of “paramlist” can be found under K.8 on page 91.
Figure K.2: A diagram of the syntax for a call statement
Code 14 Simple method call
ME.printMe("Hello world")
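The two-step resolution of a call (symbol table lookup for the first NAME, method lookup for the second) can be sketched in Python as follows. This is a local-only illustration: the `Printer` class and the `call` helper are hypothetical, Python object references stand in for objic pointers, and the network transport to a remote object is omitted entirely.

```python
class Printer:
    """Stand-in for an objic object; in objic it could live on a remote server."""

    def printMe(self, *params):
        return " ".join(params)


def call(symbol_table, obj_name, method_name, params):
    """Resolve NAME '.' NAME paramlist (hypothetical, local-only sketch)."""
    target = symbol_table[obj_name]  # first NAME: pointer looked up in the symbol table
    if method_name.startswith("_"):
        raise AttributeError(f"{method_name} is not a published method")
    method = getattr(target, method_name, None)  # second NAME: the method to invoke
    if method is None:
        raise AttributeError(f"{obj_name} has no method {method_name}")
    return method(*params)
```

For Code 14, `call({"ME": Printer()}, "ME", "printMe", ["Hello world"])` returns "Hello world". Rejecting names that start with an underscore is one simple way to keep internal helpers out of the published scope.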
K.0.3
classdef and methdef
Class and method declarations share the same structure: a keyword (“class” or “def”), a “NAME” token which identifies the class or method, and a block that is executed. The only restriction is that a method definition has to be inside the block of a class definition, otherwise it cannot be addressed. Free-standing functions are not possible in objic: the aim is to make every object accessible from remote objects, and a free function would not be in the published scope. The syntax is very similar to the way Python declares such elements.
Figure K.3: A diagram of the syntax for a new class definition
Figure K.4: A diagram of the syntax for a new method definition
Code 15 Method and Class declaration
class PrintMeClass{
    def printMe{
        print ("Hello world")
    }
}
K.0.4
whileloop
A while loop is one of the constructs a programming language needs in order to be Turing complete (see objectives). The construct contains a Boolean expression and a block. While the Boolean expression evaluates to true, the block is executed; the condition is checked again before every iteration. The Boolean expression is the same as in an if statement. An important companion of the while loop is the “break” keyword, which stops the execution of the loop immediately when invoked. It is often used together with an “if” statement to check for some condition and exit the loop if it holds.
Figure K.5: A diagram of the syntax for a while loop
Code 16 While loop
while (a.value() <= b.value()){
    print("In while")
}
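One common way for a tree-walking interpreter to implement “break” is to raise a signal from the statement and catch it in the innermost enclosing loop. The Python sketch below assumes that approach; whether the objic VM does it this way is not stated here, and `BreakSignal` and `run_while` are hypothetical names. The condition and body are passed in as callables to keep the sketch self-contained.

```python
class BreakSignal(Exception):
    """Raised by a 'break' statement and caught by the innermost enclosing loop."""


def run_while(condition, body):
    """Evaluate a WHILE node: re-check the condition before every iteration.

    condition: zero-argument callable returning a bool (the boolexp)
    body: zero-argument callable executing the block; it may raise BreakSignal
    """
    while condition():
        try:
            body()
        except BreakSignal:
            break  # 'break' leaves the loop immediately
```

Because the signal propagates through nested blocks, a “break” inside an “if” within the loop body still reaches the loop, which matches the typical usage described above.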
K.0.5
forloop
A “for” loop is a short syntax for a while loop: every “for” loop can be expressed with the while grammar. This is done by creating the while AST when parsing a “for” statement.

for (i = new Int(); i.value() == b.value(); i = i.add(1)){}

is equal to

i = new Int()
while (i.value() == b.value()){
    i = i.add(1)
}
So the actual interpreter does not understand the “for” keyword. It is implemented to provide a familiar syntax for Java and C programmers, and its syntax is therefore copied from C. The first statement initialises the loop counter, the second is the Boolean expression that has to evaluate to true for the loop body to execute, and the third is the update statement, which is executed at the end of every iteration, after the body.

Figure K.6: A diagram of the syntax for a new for loop

The actual transformation can be seen in Appendix H on page 81.
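The rewrite performed by the forloop rule in Appendix C, ^(BLOCK init ^(WHILE condition ^(BLOCK body step))), can be mirrored in a few lines of Python. In this sketch AST nodes are plain tuples and leaf statements are Python callables, purely for illustration; the tiny `execute` walker is hypothetical and deliberately knows nothing about FOR, just like the byte code.

```python
def desugar_for(init, condition, step, body):
    """Mirror the forloop rewrite: ^(BLOCK init ^(WHILE condition ^(BLOCK body step)))."""
    return ("BLOCK", init, ("WHILE", condition, ("BLOCK", body, step)))


def evaluate(node, env):
    """Evaluate a BOOL leaf, modelled as a Python predicate over the environment."""
    if node[0] != "BOOL":
        raise ValueError(f"expected BOOL, got {node[0]}")
    return node[1](env)


def execute(node, env):
    """Tiny tree walker that only knows BLOCK, WHILE and STMT nodes (no FOR)."""
    kind = node[0]
    if kind == "BLOCK":
        for child in node[1:]:
            execute(child, env)
    elif kind == "WHILE":
        _, condition, body = node
        while evaluate(condition, env):
            execute(body, env)  # body block: loop body first, then the update
    elif kind == "STMT":
        node[1](env)  # leaf statement, modelled as a Python callable
    else:
        raise ValueError(f"unknown node {kind}")
```

Feeding it the loop from Appendix H (a = 3, counter starting at 0, condition c <= a, update c + 1) collects 0, 1, 2, 3, the same output the appendix lists, and shows that the update runs after the body.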
K.0.6
newvar
The “newvar” rule could also be named “equal”, as every alternative is an assignment built around the “=” token. In objic it is not possible to create a new object without assigning it to a variable name on the stack, so anonymous classes [Bloch, 2001] are not possible. Furthermore, all three cases create a new instance of the object to the right of the “=” token. This is done so that other references to the object are not modified, which reduces the risk of race conditions. In some situations the object does this by itself: by returning the NEWPTR message, it causes the variable pointer on the local stack to be updated to the new object reference (see 4.8 on page 28). This happens when adding a number to an Int object by calling its “add” method: the method creates a new object with the new value and returns an updated pointer, and the calling object then updates its reference to the new object, which thus contains the correct value.

Code 17 Pointer update
1  a = new Int() @ bigi.home
2  a = a.add(1)
In line 1, “a” points to an instance of an Int object with the value “0”. When the add method is invoked on line 2 with the parameter 1, the original Int takes its value 0, adds the parameter 1, creates a new object with the value 1 and returns the updated pointer, which is then assigned to the variable “a”. There are three ways of specifying where a new object is instantiated; a detailed discussion can be found under the heading “Object location specification” (4.5 on page 25). One of them can be seen in line 1, where the programmer tells the VM to initialize the Int object on the server found under bigi.home.
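The effect of the NEWPTR update in Code 17 can be illustrated with an immutable Int whose add method returns a new object instead of changing itself. In the Python sketch below, a dictionary stands in for the local stack and Python references stand in for objic pointers; the remote location (@ bigi.home) and the NEWPTR message itself are not modelled, and the class is an assumption for illustration only.

```python
class Int:
    """Sketch of an immutable Int: add() returns a new object (the 'new pointer')."""

    def __init__(self, value=0):
        self._value = int(value)

    def value(self):
        return self._value

    def add(self, param):
        # Parameters arrive serialised as strings in objic, so convert first.
        return Int(self._value + int(param))


# The local stack maps variable names to object references.
stack = {"a": Int()}               # a = new Int()
alias = stack["a"]                 # another reference to the original object
stack["a"] = stack["a"].add("1")   # a = a.add(1): rebind "a" to the returned object
```

After the update, “a” refers to an Int holding 1 while `alias` still sees 0, which is the property the text relies on to keep other references unaffected.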
Figure K.7: A diagram of the syntax for a new variable declaration
K.0.7
paramlist
Parameter transfer is always a problem in distributed systems [Orfali et al., 1995]. There is no way of knowing whether the side where the method is invoked will understand the parameters given, and it cannot be assumed that its implementation is identical to the local system. This causes a lot of confusion and is a regular source of errors. In objic this problem is solved by serialising all data into strings: the object receiving a parameter deals with its typing, so the same data can be used in multiple ways. By convention, parameters are separated by commas, and 0..n parameters are allowed. As objic is dynamically typed, serialisation can be performed very easily and efficiently. It also offloads the error checking onto the called object, which increases the chances of catching a type error, as the client does not know or care about the implementation on the server side. However, this also creates some difficulties; for example, type safety cannot be checked at compile time as in Java or C. Languages like Python, though, use duck typing very successfully and do not seem to have too many problems with this approach. Finally, having to check the type of every parameter has a minor speed impact, which is acceptable given the gained security and the lower communication overhead.
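The division of labour described above (the client only stringifies, the called object interprets and type-checks) is sketched below in Python. `serialise_params` and `RemoteInt` are hypothetical names; the actual objic message encoding and transport are not modelled.

```python
def serialise_params(params):
    """Client side: every parameter is sent as a string; no type checking here."""
    return [str(p) for p in params]


class RemoteInt:
    """Receiver side (hypothetical): the called object interprets the strings itself."""

    def __init__(self, value=0):
        self.value = value

    def mul(self, *params):
        result = self.value
        for raw in params:
            try:
                result *= int(raw)  # the type check happens here, on the called object
            except ValueError:
                raise TypeError(f"mul expects integers, got {raw!r}") from None
        return str(result)  # results travel back as strings too
```

`RemoteInt(2).mul(*serialise_params([3, 4]))` returns "24", while `mul("abc")` fails on the receiving side with a TypeError, illustrating that type errors surface at the called object rather than at compile time.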
Figure K.8: A diagram of the syntax for the parameters passed in to methods
K.0.8
stat
There are numerous statements that can be included in a block, as a block is an arbitrary sequence of “stat” elements (see K.0.1 on page 87). This list is a collection of everything allowed.
Figure K.9: A diagram of what can be included in a block
K.0.9
NAME
The NAME token is the main identifier for variables, and thus for pointers, as well as for method names. A name can be constructed out of uppercase and lowercase letters, underscores and digits, as well as “+” and “-”. The digits and the “+” and “-” signs are taken from the INT token (see Appendix C), which is not described here; a sign is therefore only valid when it is directly followed by a digit.
Figure K.10: A diagram of the syntax for the NAME token
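The NAME rule from Appendix C can be checked with a regular expression. The sketch below flattens ('a'..'z'|'A'..'Z'|'_'|INT)+ into an equivalent pattern in which each piece is a letter, an underscore, a digit, or a sign immediately followed by a digit; this avoids nested repetition and accepts exactly the same strings. It only tests single strings and does not reproduce ANTLR's token priorities (for example, a plain "10" is lexed as INT because INT is defined first).

```python
import re

# Appendix C:  INT  : ('+' | '-')? '0'..'9'+ ;
#              NAME : ('a'..'z'|'A'..'Z'|'_'|INT)+ ;
# Flattened: each piece is a letter, '_', a digit, or a sign followed by a digit.
NAME_RE = re.compile(r"(?:[A-Za-z_0-9]|[+-][0-9])+")


def is_name(text):
    """Return True if the whole text matches the NAME token."""
    return NAME_RE.fullmatch(text) is not None
```

For example, "x+1" is a valid NAME (the letter x followed by the INT +1), whereas "a+" and "my-var" are not, because their signs are not followed by a digit.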
K.0.10
Comments
In objic there are two types of comments: line comments, which end at the newline, and comment blocks, which can enclose whole regions of the code. A line comment can be placed anywhere and causes the parser to ignore everything until the end of the line, regardless of the line's length. Both Unix/Linux and Windows newline characters are understood, which should make porting easier. A block comment has a defined start tag “/*”, which causes the parser to ignore everything until the close tag “*/” is seen. This representation and behaviour is copied from C and Java, which should make the code easier to understand for people who already know these languages. As an optimization, comments are not represented in the byte code at all. This is based on the decision that comments might be helpful for debugging but are not vital, whereas the size of the byte code is an important factor for a language, especially if it can be assumed that the binary is transferred over the network.
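Dropping comments before they reach the byte code can be approximated with two regular expressions. This Python sketch assumes that line comments start with a backslash, as the LINE_COMMENT rule in Appendix C and the listing in Appendix M suggest, and that block comments run from “/*” to the first “*/” (non-greedy, like greedy=false in the grammar). It ignores string literals and the interaction between the two comment forms, which a real lexer resolves in a single left-to-right pass.

```python
import re

# Block comments: '/*' up to the first '*/', across lines (non-greedy).
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# Line comments: assumed to start with a backslash; the newline itself is kept.
LINE_COMMENT = re.compile(r"\\[^\r\n]*")


def strip_comments(source):
    """Remove comments before tokenising (string literals are not handled here)."""
    without_blocks = BLOCK_COMMENT.sub("", source)
    return LINE_COMMENT.sub("", without_blocks)
```

Keeping the newline after a line comment preserves line boundaries for later error messages; the ANTLR rule instead consumes it, which makes no difference to the statements that are parsed.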
L Man Pages
A man page (short for manual page) is the documentation text for a program, mostly used in the *NIX world. The command to view such a page is “man” followed by the program name. It displays information such as NAME, SYNOPSIS, DESCRIPTION and EXAMPLES. The man page has become the de facto documentation standard in the *NIX environment.
ORUN(1)                    GNU/LINUX                    ORUN(1)

NAME
    orun – objic runner

SYNOPSIS
    orun [-d] [-h] [-v] class params server

OPTIONS
    -d    Debug mode
    -h    Print help and exit
    -v    Echo version and exit

DESCRIPTION
    This program initializes the class on the server and executes the main method.

EXAMPLES
    Example 1: ./orun fib 10
    This will run the fib class and pass in the parameter 10.

VERSION
    This documentation describes orun version 1.

SEE ALSO
    globalConf.py, objicc, ObjServer, objic.ribalba.de site

AUTHOR
    Hoffmann Geerd-Dietger [email protected]

Mon, May 18, 2009                 1                 v1
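The documented synopsis of orun maps naturally onto Python's argparse module, the standard library's command-line parser. The sketch below is hypothetical and only mirrors the man page: the example “./orun fib 10” omits the server, so params and server are treated as optional here, and argparse supplies -h automatically. It is not the actual orun source.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring: orun [-d] [-h] [-v] class params server"""
    parser = argparse.ArgumentParser(prog="orun", description="objic runner")
    parser.add_argument("-d", action="store_true", help="debug mode")
    parser.add_argument("-v", action="version", version="orun version 1")
    parser.add_argument("cls", metavar="class", help="class to initialise")
    parser.add_argument("params", nargs="?", help="parameters passed to main")
    parser.add_argument("server", nargs="?", help="server to run the class on")
    return parser
```

With this sketch, `build_parser().parse_args(["fib", "10"])` yields cls "fib", params "10" and no server, matching Example 1 of the man page.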
OBJSERVER(1)                 LINUX                 OBJSERVER(1)

NAME
    ObjServer – Initializes the main server

SYNOPSIS
    ObjServer [-h] [-v] [-d]

OPTIONS
    -h    Prints a help message and exits
    -d    Enables debug output, also called verbose mode
    -v    Prints the version number

DESCRIPTION
    Starts the main server loop, which waits for connections and executes the objects.

VERSION
    This documentation describes ObjServer version 1.

SEE ALSO
    orun, objicc, http://objic.ribalba.de site

AUTHOR
    Hoffmann Geerd-Dietger [email protected]

Mon, May 18, 2009                 1                 v1
OBJICC(1)              DARWIN – MAC OS X              OBJICC(1)

NAME
    objicc – the objic compiler

SYNOPSIS
    objicc [-d] [-h] [-v]

OPTIONS
    -d    Enables debug mode
    -h    Prints a help message
    -v    Prints the version of the compiler

DESCRIPTION
    The compiler for the objic language.

VERSION
    This documentation describes objicc version 1.

SEE ALSO
    ObjServer, orun, http://objic.ribalba.de site

AUTHOR
    Hoffmann Geerd-Dietger [email protected]

Mon, May 18, 2009                 1                 v1
M Code Example
class multi {

    \\The multiplier method
    def mulitply{
        argv = ARGS.value()

        argsint = new Int(argv)

        retval = argsint.mul(argv)

        retMeth = retval.value()

        return(retMeth)
    }

    \\The main method
    def main {
        argint = ARGS.value()

        a = new Int(argint)

        returnBuffer = new String()

        for (c = new Int(); c.value() < a.value(); c = c.add(1)){
            cval = c.value()

            val = ME.mulitply(cval)

            returnBuffer = returnBuffer.add(val)

            returnBuffer = returnBuffer.add(" ")
        }

        returnBudderMeht = returnBuffer.value()

        return (returnBudderMeht)
    }
}
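For readers new to objic, the following rough Python transliteration of the multi class makes its result explicit. It assumes that ARGS.value() simply yields the single parameter passed to the method, which is modelled here as a plain function argument; the objects, servers and NEWPTR updates of the real program are not modelled.

```python
def multiply(arg):
    # Mirrors multi.mulitply: an Int built from the argument is multiplied
    # by that same argument, so the method returns the square of its input.
    argsint = int(arg)
    return argsint * int(arg)


def main(arg):
    # Mirrors multi.main: for c = 0 .. arg-1, append multiply(c) and a space.
    return_buffer = ""
    for c in range(int(arg)):
        return_buffer += str(multiply(c)) + " "
    return return_buffer
```

So `main("4")` returns "0 1 4 9 " (including the trailing space): despite its name, the multiplier method squares each loop counter.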
N Design Diagrams
These are early sketches of how the object communication should work; the implementation mostly followed them. Unfortunately, the program they were created with corrupted the file, so these printouts are the last surviving versions.
An early class diagram of the server; it was later modified so that each VM has its own connection: