A REPORT ON
DEVELOPMENT OF AUTOMATED SOFTWARE APPLICATIONS

BY
KARUMURU SASANKA
2006A7PS125G
B.E. (Hons.) Computer Science

Prepared in partial fulfillment of the Practice School-II Course
(BITS C412 / BITS C413 / BITS G639)

AT
HP GLOBALSOFT, CHENNAI
A PRACTICE SCHOOL-II STATION OF
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
DECEMBER 2009
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI (RAJASTHAN)
Practice School Division

Station: HP GLOBALSOFT
Center: Chennai
Duration: 13th July 2009 to 18th December 2009
Date of Submission: 4th December 2009
Title of the Project: DEVELOPMENT OF AUTOMATED SOFTWARE APPLICATIONS
ID No.: 2006A7PS125G
Name of the Student: K. Sasanka
Discipline: B.E. (Hons.) Computer Science
Name of the PS Faculty: Mr. V. R. Rajan
Key Words: Scripting, JavaScript, Error checking, Java function calls
Project Area: Software development

Abstract: This project aims to reduce manual effort and save time in the preparation of required documents. It also provides a clear user interface so that details can be filled in more easily, and it covers the detection of coding errors that would otherwise lead to rejection. The main points discussed are the code behind the SharePoint automation and aspects of shell scripting, including the generation of an Excel sheet listing errors in sandbox paths, parameters and directories. The main objectives, and the means of attaining them, are discussed in the report that follows.

Signature of the Student
Signature of the Expert
Date: 15/09/2009
Signature of PS Faculty
Acknowledgements

I express my deep gratitude to Mr. Seshadri Rengaswamy, Manager, Citigroup, for giving me the golden opportunity to undertake this project. I would like to thank Prof. V. R. Rajan, PS-II Instructor-in-charge, for his constant support during the entire Practice School period. I am also thankful to my project guide, Mr. Himansu Sekhar Tripathy, for his guidance through the course of the project, and without whom this project would not have been possible.
Table of Contents

1. SCRIPTING
   1.1. WRAPPER SCRIPTS
        1.1.1. INTRODUCTION
        1.1.2. RUNNING GRAPHS USING SHELL SCRIPT
        1.1.3. JOB SCHEDULING USING AUTOSYS
        1.1.4. CONCLUSION
1. SCRIPTING

1.1 Wrapper scripts

1.1.1 Introduction
Wrapper scripts are common scripts in which all the functions required for scripting are predefined. These functions can be reused across all the scripting needs and development requirements at Citigroup. This first assignment can be divided into two main stages:
A) Running graphs using scripts
B) Job scheduling using Autosys
1.1.2 Running Graphs Using Shell Scripts

i) Introduction
An ETL tool called Ab Initio is used to maintain the huge volumes of data, as it proves to be a good data warehousing tool. The tool runs in a UNIX (Linux) environment, so all the scripts that run the graphs, display errors or create log files have to be written in a language supported by UNIX. Shell scripting is widely used here because it offers excellent string and data handling functions.

In all the scripts, a single common wrapper script is used to run the graphs. This wrapper contains all the basic functions needed to run a graph, for example rollback, current step, next step, chkerr, etc. Its variables, more specifically its environment variables, are kept in .env (environment) files specific to each profile or project, and these files are imported according to the needs of the graph. Each script therefore has its own set of environment files to import and other variables to hard code. A representative sketch of such a wrapper preamble is given below; it demonstrates the use of the env files and the export of standard variables by running the project setup script.
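A minimal sketch of the preamble, assuming illustrative file names, paths and graph names (the original excerpt is not reproduced here):

```ksh
#!/bin/ksh
# Wrapper preamble: source the project setup and the graph-specific env file,
# then export the standard variables used by the common wrapper functions.
# (All names and paths are illustrative, not the actual project values.)

. /proj/bpm/run/project_setup.ksh        # exports AI_BIN, AI_MP, AI_SERIAL, ...
. $AI_ENV/bpm_101_position.env           # graph-specific environment variables

export GRAPH_NAME=bpm_101_position
export LOG_FILE=$AI_SERIAL_LOG/${GRAPH_NAME}_$(date +%Y%m%d%H%M%S).log
```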
ii) Necessities to run the scripts
Each script has a number of variables, and many scripts share some basic common variables. Exporting or reassigning all the variable values over and over for every script would be tedious, so each project has a file called project_setup.ksh that exports the variables required by its scripts. Some variables are passed to a script as parameters; these are unique to that script alone. The paths of each file are also defined in the setup file. Sample paths or directories:

1) BIN – contains all the wrapper scripts
2) SERIAL_LOG – log files generated by the wrapper scripts
3) MP – executable files, i.e. the ones run by the wrapper script
4) SERIAL_ERROR – error log files
5) SERIAL – data or .dat files
6) REJECT – .err or .rej files
7) .ssh – profile executables such as project_setup

Each of these paths is set in the project setup file, which is called at the beginning of each script and assigns the paths of the respective directories and variables.

iii) Execution of project setup
Apart from the standard project setup file there is also a regular profile file in which the variables and paths common to all projects are entered. The variable AI_DB or AI_RUN is stored here, but based on an override file; this project used an override file called bpm_ovrrides. The file execution is, in essence:

AI_RUN=<path>
. $AI_RUN/project_setup.ksh

This sources the project setup file, which lives in the run directory, and executes it in that folder so that all the variables are fixed accordingly. A sketch of such a setup file follows.
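A minimal sketch of a project setup file, using the directory names listed above (the project root and file names are illustrative):

```ksh
#!/bin/ksh
# project_setup.ksh - exports the standard paths for one project.
# Sourced (not executed) at the top of every wrapper script.

PROJ_ROOT=/proj/bpm                               # illustrative project root

export AI_BIN=$PROJ_ROOT/bin                      # wrapper scripts
export AI_SERIAL_LOG=$PROJ_ROOT/serial_log        # logs from the wrappers
export AI_MP=$PROJ_ROOT/mp                        # executable graphs
export AI_SERIAL_ERROR=$PROJ_ROOT/serial_error    # error logs
export AI_SERIAL=$PROJ_ROOT/serial                # .dat data files
export AI_REJECT=$PROJ_ROOT/reject                # .err / .rej files

# Project-wide overrides (e.g. bpm_ovrrides) are sourced last so they win.
[ -f $PROJ_ROOT/bpm_ovrrides ] && . $PROJ_ROOT/bpm_ovrrides
```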
iv) Execution part of the script
Running the graph after assigning all the paths and variables is the ultimate task of the wrapper script. The command line execution of the wrapper script is of the form:

cmd=$AI_RUN/<graph name>.ksh

AI_RUN is a path, or directory, just like all the other paths, telling the shell that the path has been defined in the setup file. The shell then extracts all the variables and runs the graph, giving the necessary output.

v) Generation of output
The output of the script is generated with clear demarcation and lines separating each step. This output is produced by the wrapper script in an almost identical fashion for all scripts.

vi) Concept of return code
The return code is the number generated once the entire graph has been called and run. It determines whether the graph executed successfully or whether there were any untimely or unwanted interruptions. The return code takes the following values:

0 – successful execution of the entire graph
1 – failed
2 – import of the graph failed
any other number – error at a particular step, or an unknown kind of error
vii) Actions on the return code
The number of scripts in use is very large, and keeping track of all of them, or knowing that every script is executing smoothly, is a big task. Hence a mailing list is maintained (in the main profile, the overrides or the project setup file), and when the return code of a wrapper script is 1, or any other non-zero value for that matter, an e-mail is sent to every id on the list stating that the given graph under the project has failed, along with the relevant reasons for the failure. This error report is customisable: it can be generated in any manner we like and can even pinpoint the origin of the error. A sketch of this check follows.
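A minimal sketch of the return-code handling, with an illustrative graph name and mailing-list variable:

```ksh
#!/bin/ksh
# Run a graph and act on its return code. Names are illustrative.

$AI_MP/bpm_101_position.ksh
rc=$?

case $rc in
    0) exit 0 ;;                                   # successful execution
    1) reason="graph failed" ;;
    2) reason="import of graph failed" ;;
    *) reason="unknown error (return code $rc)" ;;
esac

# Notify everyone on the mailing list kept in the profile/overrides file.
echo "bpm_101_position failed: $reason" |
    mailx -s "GRAPH FAILURE: bpm_101_position" $MAIL_LIST
exit $rc
```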
1.1.3 Job Scheduling Using Autosys

i) Introduction
After the wrapper scripts have been properly set up in the right profile, with all variables assigned, they have to be run on a regular basis to update, retrieve or change the data. This is achieved through a UNIX tool called Autosys.

What is Autosys? Autosys, a product from Platinum, is job scheduling software. The most basic job scheduling is done using UNIX cron (or Windows AT), which is time based: it fires a job according to the clock. In real business, however, you need to fire jobs based not only on a scheduled time but also on events (such as the arrival of a file) and on the success or failure of other jobs. Autosys is a complete job scheduling solution that lets you define these events, dependencies, time schedules, alerts and so on, making it a full data centre automation tool.

In other words: Autosys is job scheduling software, like Control-M and cron. With Autosys we can define the run time, day, date and week, and which script or program needs to be run. Its main advantage over crontab is that it also has a Java front end, so one does not need to be a UNIX professional to create or change a job. The main attributes of an Autosys job are:
1) Job name
2) Script
3) Owner
4) Machine it has to run on
5) Day
6) Date
7) Week
8) Error log
9) Output log
10) Alarm

Inserting a job into Autosys is easy, and one need not be a UNIX professional to do it: it is merely the execution of jobs, in other words the wrapper scripts deployed earlier. Scheduling is easy, as stating the date and day is enough, and a job can be programmed to run on the success or failure condition of another, to retrieve information or otherwise.

Each attribute in detail:

1) Job name: The job name should not exceed 30 characters and should follow a convention for better readability. The Citigroup convention follows a unique numbering system for easier remembrance and ease of automation.
2) Script: The script that the job calls and executes.
3) Owner: The profile from which this job can be called and executed, i.e. the one in which all parameters get initialised.
4) Machine it has to run on: A number of servers are available; this field determines the server on which the given script should run.
5) Day: The days on which the given script should run, for example mo, tu, we, th, fr except US holidays.
6) Date: The job can also be programmed to run on a specific date only, e.g. 09 Sep 2009.
7) Week: Weeks to skip or execute can be given.
8) Error log: The path where an error file will be created if the script fails; all the reasons for the failure are saved in an error file in this directory.
9) Output log: Whether the job succeeds or fails, a log file is generated with the full output of the script's execution; it is created in the path given in this attribute.
10) Alarm: An alarm or mail alert can be assigned so that we know every time a script has failed.

A representative job definition using these attributes is sketched after this list.
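A representative Autosys JIL definition covering these attributes (all job, script, machine and path names are illustrative, not actual Citigroup values):

```
/* Sketch of a JIL job definition; names and times are illustrative. */
insert_job: bpm_101_position        job_type: c
command: /proj/bpm/bin/bpm_101_position.ksh                 /* script */
machine: unixprod01                                         /* machine */
owner: etluser@unixprod01                                   /* owner */
days_of_week: mo,tu,we,th,fr                                /* day */
start_times: "02:00"
std_out_file: /proj/bpm/serial_log/bpm_101_position.out    /* output log */
std_err_file: /proj/bpm/serial_error/bpm_101_position.err  /* error log */
alarm_if_fail: 1                                            /* alarm */
```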
ii) Uploading the jobs
After creating the six wrappers that call their respective graphs, these scripts are inserted as jobs into Autosys. For each script a job name has to be created, with all the details described above. These six wrapper scripts are inter-linked: the 2nd runs only if the first succeeds, the 3rd on the success of the 2nd, and so on until the last wrapper script. Hence we make use of another attribute in Autosys, the 'condition' attribute:

Ex: condition: s(<job name of previous script>)

This takes care of the required success condition on the previous script. The jobs are uploaded into Autosys using the 'jil' command: all the attributes can be written in a text file and then pasted into the jil command prompt.

jil>>
Then type 'exit' at the jil prompt to update the Autosys database. A confirmation appears if the job is well formed, i.e. the name does not already exist and no infinite loops of scripts exist.

After putting all the jobs into jil in the way mentioned above, they can all be tied into a small box, since they are jobs of the same kind and depend upon each other for proper functioning. By putting them all in one box we do not have to give the date, time and week for each and every job; we can give these for the main box alone and leave it to run. We put them all in a box by adding an extra tag to each:

box_name: <box name>

This addition is done by entering the jil prompt and typing

update_job: <job name>
box_name: <box name>

Now that all the jobs have been successfully entered into the Autosys database, we need to check whether each one of them runs successfully; this is the next phase.

iii) Running the jobs and verification
After insertion into the database the jobs lie there, waiting for their date and time to arrive or for their success condition to occur. They can be force started with the following command:

sendevent -E FORCE_STARTJOB -J <job name>

This force starts the job no matter what its conditions of execution may be. To check the success or failure of the job there is another Autosys command:
autorep -J <job name>

This command displays the last time the job was started and what the result was at that time. Issuing it repeatedly, we see the job start running, i.e. go into 'RU' mode; sometimes a job may go into 'RE' mode, which is the restart mode. The different modes of Autosys jobs are:

ST – start
RU – running
RE – restart
FA – failure
SU – success

If the display finally shows 'SU', it is time to move on to the next job; but if the end result is 'FA', we need to examine all the files further, including the profile file used by Autosys.
iv) Checking and Debugging
The best way to start the cross-examination, or debugging, is to view the failure reports generated by Autosys in the error log path we created when defining the jobs. This error file clearly states why the job failed, so that we can correct it. The common errors here are values for various parameters not being assigned. This is because the common scripts are written in ksh, with the shebang at the top of every script (#!/bin/ksh), whereas Autosys evaluates its variables, or execution parameters, under sh. The solution is to hard code the unresolved parameters directly in the profile file, as sketched below. E-mail alerts on failure can also be implemented, making it easier to track the failed jobs.
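A minimal sketch of the workaround, with illustrative parameter names and values:

```ksh
# In the profile file: hard code the parameters that fail to resolve when
# Autosys evaluates them under /bin/sh (names and values illustrative).
export PREV_RUN_TIMESTAMP=20091204020000
export AI_RUN=/proj/bpm/run
```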
1.1.4 CONCLUSION
Shell scripting is made easier by the use of these wrapper scripts, since they call common functions that can be used in many ways. All scripts can be coded from these common functions; only the graph to call and the profile path change, so a script can be executed under any profile by mentioning just one or two paths. Wrapper scripting speeds up the movement of projects, as a smaller and simpler code base multiplies the output of resources. Since all the paths are properly placed in the respective overrides files, they are easily assigned and quickly picked up by the shell.

Autosys job scheduling is an important tool for reducing the manual running of scripts on a daily basis. Periodic running of the jobs ensures smooth data warehousing and data retrieval, and hence smooth productivity. Because jobs can be linked to one another, it also guarantees the success of many jobs: in our case, the success of the sixth and last job is enough to deduce that all the other graphs have run successfully and produced the required output according to user needs.

The combination of wrapper scripts and Autosys job scheduling is a boon to software developers and programmers, as the code is broken down into basic functions from which the main logic can be easily programmed, while necessary precautions such as log and error files are also taken. The Autosys scheduler then further lowers the effort put in by any system programmer by repeatedly calling the required scripts (jobs) in their designated order. Related outputs can also be grouped together by giving them similar extensions, so a very useful and productive program can be created.
1.2 CODE REVIEW AUTOMATION

1.2.1 Introduction
The ultimate goal of any program or script is to complete the desired operations successfully. But even after that goal is reached, glitches remain in the coding habits of every programmer. Put simply, a programmer might give away some information that is essential to the running of the script; if a rival company got its hands on it, the information could be misused, which is dangerous in many ways. Hence, to make the code completely secure, a standardised procedure is followed to ascertain the security of the scripts. This procedure is known as 'code review'. At present it is carried out manually at Citigroup. The project described here, though still in development, has progressed quickly and will be completed in the near future.

There are many good examples of code review errors. One is hard coding the user into a script, which gives away a unique Citigroup id and should never be done. Another is that files of the same kind should be grouped in their respective directories.

This project aims to automate the entire code review process: check all the scripts and directories under a sandbox path and output all the errors associated with them. The output should be in a very readable format, ideally an Excel file (the manual review is recorded in an Excel sheet). The final shell script should output all the errors (errors according to previous code review standards) without manual intervention and compile them neatly into an Excel-readable file (.csv, comma delimited).
1.2.2 OBJECTIVES

The review items, by serial number:

1. All graphs should follow the naming convention <Project>_<Sequence>_<Subsystem Name>.mp, where <Project> is the shortened name of the project, e.g. gtpl for Global Trading Profit and Loss. The sequence number falls into the following ranges:
   • 000-099 Create Lookups and Reference Data
   • 100-199 Extract and Validate
   • 200-299 Apply Business Rules and Transformations
   • 300-399 Create Load Ready Files
   • 400-499 Database Loads
   • 500-599 Data Archives
   The subsystem name indicates the nature, description or functionality of the data, source and target of the graph. Using this convention we can have:
   • gtpl_101_position.mp – a graph to extract and validate positions
   • gtpl_201_trade.mp – a graph that applies business rules and transformations to trades
2. All table and file dmls must be stored in the ${AI_DML} directory, with file extension .dml.
3. xfr files, if the graph requires parameterisation, must be stored in the ${AI_XFR} directory, with file extension .xfr.
4. All connection files must be stored in the COMMON_DB directory, with file extension .dbc.
5. All sql files must be stored in the ${AI_SQL} directory, with file extension .sql.
6. All subgraphs should follow the naming convention <Project>_<Shortened Component Name>_<Sequence>_<Timestamp>.<Extension>. The sequence and timestamp are optional; their function is to uniquely identify the files (the sequence can be used, for example, when a Reformat has multiple ports). The extension for log files is .log and that for error files is .err.
7. All data files should have the extension .dat and must be stored in ${AI_SERIAL}.
8. All lookup files should have the extension .lkp and must be stored in ${AI_SERIAL_LOOKUP}.
9. All components must follow the naming convention <Component>_<Descriptor>. For example, a Join component can have the names of the key fields and the sort order as the descriptor; a Reformat doing a lookup to validate the "Department" would have "Validate Department" as the descriptor; the Employee table used as an input source is denoted "Input Table_Employee" and as an output destination "Output Table_Employee" ("Employee" being the descriptor); a Daily Transactions file used as an input source is denoted "Input File_DailyTransactions".
10. The graph level comments in the upper left corner should contain the following, in red: Author; Graph Name; Description (a brief description of what the graph does, including the run time parameters to be supplied); Date Created; and a Change Log giving the date of change, developer name, reason for change and description of change. For example:
    Author: John Doe
    Graph Name: gtpl_101_position.mp
    Description: This graph extracts and validates position data.
    Date created: 01-sep-2005
    Change Log:
      Date of change: 31-dec-2006
      Developer Name: Jane Doe
      Reason for change: New validation added for positions.
      Description of change: Validation using a new lookup file was added.
    In addition, comments should identify key functional areas in the graph.
11. Component level comments should be in blue and placed at the top of the component, e.g. "Reformat to validate trade type".
12. Nesting of subgraphs should be avoided.
13. Sources and targets should not be placed inside a subgraph.
14. Reject-threshold should be set to "abort on first reject"; "never abort" should never be used without a valid reason.
15. Upper case should be used for SQL reserved words.
16. All lookup files should be at the top of the graph.
17. Output files should be set to "don't rollback if the job fails".
18. Make sure that the graph maintains the rejected records and log files.
19. All sandbox and graph level parameters should be defined with ${} interpretation and used with ${}, e.g. ${AI_DML}, ${PREV_RUN_TIMESTAMP}.
20. For all Output Table components it is advisable to connect the log port to a file, and to call it ${OUTPUT_TABLE_NAME}_load_${CURR_RUN_TIMESTAMP}.log.
21. When any field needs to be assigned a NULL value, do not hard code NULL; let it take the default NULL value in the dml wherever possible.
22. Use api mode while using tables in SQL.
23. While working with SQL, ensure that the layouts selected for tables are Database:serial.
24. Ensure that staging tables are in truncate-and-load mode while core tables are in append mode during insertion.
25. Ensure that all file dmls have a newline character as their last field.
These are the main objectives of code review.
1.2.3 Simple objectives
So far, the script checks all the directories for appropriate file extensions: every file is checked for a "." separator and for whether its final letters form a valid extension. An almost identical check covers the first eight objectives, by replicating the same algorithm for each; a sketch is given below.

The next objective is to search for the start script and end script markers, which is done directly by searching for the marker text:

grep -n "Script Start" $VARIABLE

where the variable is set in a loop over all the files in a specified directory. Replicating this for the end script, the command becomes:

grep -n "End Script" $VARIABLE

These two commands give us only the line numbers of the markers, but once the line numbers are captured, the lines in between can be written to a file and checked for content. If there is any content, it should be flagged and displayed in the output file (a sketch of this check follows the discussion of the output format below).
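A minimal sketch of the extension check, emitting the comma-delimited rows discussed next (directory/extension pairs and the report name are illustrative, not the actual review values):

```ksh
#!/bin/ksh
# Verify that every file in a sandbox directory carries the expected
# extension and emit one CSV row per violation.

SANDBOX=$1                              # sandbox path passed as a parameter
REPORT=code_review_errors.csv
echo "Directory,File,Expected extension" > $REPORT

check_ext() {                           # $1 = directory, $2 = expected extension
    for f in $SANDBOX/$1/*; do
        [ -f "$f" ] || continue
        case $f in
            *.$2) ;;                    # extension matches: nothing to report
            *)    echo "$1,$(basename $f),$2" >> $REPORT ;;
        esac
    done
}

check_ext dml dml                       # objective 2
check_ext xfr xfr                       # objective 3
check_ext sql sql                       # objective 5
check_ext serial dat                    # objective 7
```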
The display should be comma delimited, i.e. each word or column separation marked by a comma; only then can a .csv file be created, which displays as an Excel sheet. This can be attained, for example, with an awk one-liner that rewrites the whitespace-separated fields with comma separators, such as:

awk 'BEGIN { OFS="," } { $1=$1; print }'

The output can be pipelined using "|", and the free-flowing output can be redirected directly into the Excel-readable file.
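A sketch combining the marker search with the CSV output (marker strings follow the description above; paths and names are illustrative):

```ksh
# Flag wrapper scripts whose start/end script sections contain code,
# appending one comma-delimited row per hit to the report.
for f in $SANDBOX/bin/*.ksh; do
    start=$(grep -n "Script Start" "$f" | head -1 | cut -d: -f1)
    end=$(grep -n "End Script" "$f" | head -1 | cut -d: -f1)
    [ -z "$start" ] || [ -z "$end" ] && continue
    [ $((end - start)) -le 1 ] && continue      # nothing between the markers
    # Capture the lines between the two markers and test for content.
    body=$(sed -n "$((start + 1)),$((end - 1))p" "$f")
    if [ -n "$body" ]; then
        echo "bin,$(basename $f),content between start and end script" >> $REPORT
    fi
done
```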
1.2.4 Advanced objectives
The main challenges in this code review are objectives 11 to 24 in the table above. They pose a difficulty because of their closer association with human perception than with mechanical code checks, and procedural error checking is the most difficult of them all. These objectives were therefore attained with the help of an expert here.

The configuration of the input and output tables had to be compared with the appropriate profile settings, which lie outside the whole procedural call; the program would take a long time if this call were made inside it. It is easier to schedule this call in an external Autosys environment, where it is taken care of while the code review script runs.

The reject threshold is also a validation step in this program. A script running under Autosys should have a limited number of tries and should not be forced to access resources after a certain limit. This is handled in the script itself by checking the outputs of the wait or success files; by checking the outputs we mean the internal data of the log or error files created by the jobs on failure, success or otherwise. Jobs "on ice" (an Autosys term) are dealt with in a brute force manner, by iterating through each and every call, output file, error log, etc.
1.2.5 CONCLUSION
Automation is central to the IT industry, since the main objective of software programming has always been to ease human work. The programs and scripts being written are themselves meant to automate some useful purpose for a client, but the number of scripts and graphs is now so massive that checking each one for security violations manually would be tedious work. That is why this code review automation has been introduced: to simplify the security check-ups.

The advantages of this script are numerous. To state a few: it saves hours of manual labour, as it simply fishes out what the reviewer was originally looking for; and it maintains an in-depth search of the code, i.e. a person may miss many scripting errors but a well organised search script will not, which actually tightens security. With the necessary sandbox path passed as a parameter, it gives a thoroughly effective search of all the scripts and file extensions.

The only concern about this automation is that the code review also performs a detailed scan of the Ab Initio graphs. This may be difficult for a shell script, as it requires searching multiple files in the mp, run and db directories and cross-referencing the values for coding errors. This part of the script is difficult but not impossible; once it is done, the script will be easy to run, with very readable output in .csv format, readable in Excel. A file valid in format and submittable to code review management will then be produced at the completion of this script, from which all error correction can be done. Perhaps the ultimate stage of this application is not merely to display the errors but to actually correct them.
2. SHAREPOINT AUTOMATION

2.1 What is SharePoint?
Microsoft Office SharePoint Server 2007 is an integrated suite of server capabilities that can help improve organizational effectiveness by providing comprehensive content management and enterprise search, accelerating shared business processes, and facilitating information-sharing across boundaries for better business insight. Additionally, this collaboration and content management server provides IT professionals and developers with the platform and tools they need for server administration, application extensibility, and interoperability.
2.2 Why is automation necessary?
The project leads at Citigroup had to submit a Word document containing the status details of all the projects they were currently working on. This document had to be prepared manually by someone using MS Word. A sample of this document follows.
PROJECT STATUS

For each module and application, the report tracks the number of graphs and scripts that are total, completed, in progress and not started; each cell below shows the This Week / Last Week counts recovered from the sample, and every application block also carries a "Detailed Project status" field. Blank cells were empty in the sample.

Module | Application        | Total | Completed | In Progress | Not Started
ISG    | Production Support |       |           |             |
RECON  | BPM                | 7 / 8 | 7 / 8     |             |
RECON  | DVP_RVP            | 2 / 4 | 2 / 4     | 0 / 0       | 0 / 0
DDI    | GSR ITERATION 2    | 2 / 0 | 0 / 0     | 2 / 0       | 0 / 0
GSM    | LUX                |       |           |             |
DDI    | COLT               | 1 / 0 | 0 / 0     | 1 / 0       | 0 / 0
DDI    | Trade Mart         | 2 / 0 | 2 / 0     | 0 / 0       | 0 / 0

MILESTONE STATUS

The milestone table carries Original, Re-Planned and Actual completion dates (SIT Release) plus a Remarks column; the dates recovered from the sample are placed as follows.

Sl. No | Module | Milestone          | Completion (%) | Original Date | Re-Planned Date | Actual Date
1      | ISG    | Production Support | N/A            | N/A           |                 |
2      | RECON  | BPM                | 100%           | 8/28/2009     |                 |
3      | RECON  | DVP_RVP            | 100%           | 9/18/2009     |                 |
4      | DDI    | GSR ITERATION 2    | 50%            | 9/16/2009     |                 |
5      | GSM    | LUX                | 100%           | N/A           |                 |
6      | DDI    | COLT               | 80%            | 9/11/2009     |                 |
7      | DDI    | Trade Mart         | 100%           | 9/4/2009      |                 | 9/4/2009

Actions Planned for Next Week

Module | Project            | Activities Planned for Next Week
ISG    | Production Support | Detailed Project status
RECON  | BPM                | Detailed Project status
RECON  | DVP_RVP            | Detailed Project status
DDI    | GSR ITERATION 2    | Detailed Project status
GSM    | LUX                | Detailed Project status
DDI    | COLT               | Detailed Project status
DDI    | Trade Mart         | Detailed Project status
The document had to be created this way and sent to the respective clients, and the process of creating it was manual: every employee generated a mail with their information, i.e. their project status, milestones and actions planned for the next week, and someone copy-pasted all of it into the Word document and submitted it. This proved to be tedious work, and hence some sort of automation was required to ease the work of the document maker, since all the mails were coming in anyway.
2.3 Web template
A web template for the entry of all the details had to be created; this customised web template was built by me.
The template was coded as an HTML form; the code is lengthy but straightforward, and hence is not attached to the report. It was hosted on the centralized SharePoint server, where the entries of all the employees are recorded and then turned into the sample document shown above. Collecting the data is an easy task, because SharePoint gathers it all and displays it as an HTML table with a unique table id. This table id can then be used by client side scripting (JavaScript in my case) to prepare the document easily. But even though JavaScript boasts a wide range of functions, it has no adequate facility for handling Word documents: a Word document must be handled through an ActiveX object, which has to be instantiated and, with repeated use, takes up a lot of space and slows the process down.
2.4 HTML TABLE
The HTML table created by the SharePoint server is a regular, vertically arranged column layout, not the format of the required document. A button at the very end of the page exports it into a Word document of the desired format. This button, however, should be available only to site administrators and not to all users. That is a difficult requirement, because Microsoft applications tend to hide the page source, making it difficult to find the e-mail id of the logged-on person. An alternative solution is a built-in password shared only among project heads and administrators of the site. This can be done with a snippet along the following lines (the actual password value is not reproduced here):
function passchk() {
    // the shared password value is not reproduced here
    if (prompt("Please enter your password", "") == "<password>")
        openWin();          // export the report
    else
        history.back();     // return to the original page
}
2.5 JAVASCRIPT
openWin() is the JavaScript function that does all the exporting work. It operates on a foolproof mechanism: it takes the designated SharePoint table and rearranges it into the reformatted, customised report, running a check over the table and splitting it into various smaller tables. A further prerequisite is grouping rows of the same module application together in separate tables, such as DDI, RECON, etc.; separate documents are created, and the respective project leads have their reports ready to be submitted to the client group.
The password check is an integral part of the program. It protects the application from other, potentially harmful, programs that might trigger it: only the project leads and the managers have the password that allows them to run the application and generate their reports as they see fit.
The mechanism is as follows: the code creates a button that reads "Export FINAL REPORT". When clicked, it calls the function passchk(), described above, which checks the password. If the password is right, the table is exported into the final Word document; otherwise the user is returned directly to the original page. The TABLEID used here is the id of the HTML table generated by SharePoint. A sketch of this mechanism follows.
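A minimal sketch of the button and export, assuming a hypothetical table id (the real one is generated by SharePoint) and reducing the export to a single text copy; the real script reformats the table row by row, as described above:

```html
<input type="button" value="Export FINAL REPORT" onclick="passchk()">

<script type="text/javascript">
// Hypothetical table id; the real one is generated by SharePoint.
var TABLEID = "statusTable";

// passchk() from section 2.4 guards the call to this function.
function openWin() {
    // ActiveX works only in Internet Explorer with MS Word installed.
    var word = new ActiveXObject("Word.Application");
    var doc  = word.Documents.Add();
    var table = document.getElementById(TABLEID);
    doc.Content.Text = table.innerText;   // the real script reformats row by row
    word.Visible = true;                  // hand the finished document to the user
}
</script>
```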
2.6 Conclusion
SharePoint provides an easy and simple tool for gathering information and stacking it neatly into a compact, readable format from which it can be reused or reformatted to fit user needs. This automation saves the time not only of the document maker, who otherwise had to spend at least 3-4 hours preparing the document, but also of all the employees who used to format their status reports into mails and send them. Now all they have to do is enter a few words and click the save button; the SharePoint server takes care of warehousing all the data entered, and the JavaScript then takes over, automating the whole process of creating the document. Frames of the desired border, colour and size can easily be created, so the application also provides an enhanced user interface while saving time and effort.

Automation of the weekly status report is an important phase, as the report is used every weekend by all the employees and the project leads. Even the colours in the second and third columns can be added in JavaScript, and user choice can easily be extended to new projects by importing the project database directly. All the other functionality is easily upgradeable: the application is open to all types of changes and will remain a useful tool for as long as report submission is maintained in the given format, and format changes can also be made easily by a JavaScript coder. Hence this tool will be a useful instrument for a long time, being easily upgradeable and less tedious in handling ever more data.
3. ANALYSIS OF XML LOADER

3.1 Introduction
"Markit" is an XML loader tool that extracts information from XML files and orders the data neatly into a database. It is an automated tool that keeps track of the data in the database and regularly refreshes it with the latest information from the XML files.
3.2 Workflow – Access
Markit's servers are accessed using HTTP over conventional internet ports. This method was adopted to fit in with the firewall and proxy security arrangements within financial organizations. The security and permissioning associated with an account is embedded in the URL string. The XML supplied by Markit is all-relational: there are keys that allow navigation between the information stored on different pages.
[Diagram: the Markit server supplies XML pages with the latest prices, which are mapped for display on the client server.]
3.3 Workflow – Index Data
It is anticipated that the customer will poll the Markit Equities servers every fifteen to thirty minutes for the index data. The server is updated in real time to reflect the indices released throughout the day, and it is advocated that this process is not batched, as the information would potentially be out of date during the trading day. The initial URL queried should be as follows:

http://xml.dadd.co.uk/update24/daddindex.aspx?customer=1111&password=2222&version=5&full=0

This URL also contains the permissioning data. It allows the customer to obtain the DaddId, delta and names associated with each cluster. The delta is a date-and-time stamp indicating when the cluster was last updated; therefore, if the delta has incremented since the value stored at the last request, the composition of the cluster should be requested. The composition of an index cluster can be requested via:

http://xml.dadd.co.uk/update24/daddindex.aspx?customer=1111&password=2222&version=5&full=1&indexid=1

The codes associated with each index can be obtained from:

http://xml.dadd.co.uk/update24/daddindexcode.aspx?customer=1111&password=2222&version=5&full=1&indexid=1

A sketch of this polling loop is given below.
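A minimal sketch of the polling loop in shell, using the sample credentials above (paths and file names are illustrative; the delta comparison is elided):

```ksh
#!/bin/ksh
# Poll the Markit index summary and fetch full compositions when a
# cluster's delta has moved.

BASE="http://xml.dadd.co.uk/update24/daddindex.aspx"
CREDS="customer=1111&password=2222&version=5"

while true; do
    # full=0 returns the summary: DaddId, delta and name per cluster.
    curl -s "$BASE?$CREDS&full=0" -o summary.xml

    # For each cluster whose delta differs from the stored value,
    # request the full composition (full=1).
    # ... delta comparison elided ...
    # curl -s "$BASE?$CREDS&full=1&indexid=$DADDID" -o cluster_$DADDID.xml

    sleep 900        # poll every fifteen minutes
done
```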
3.4 DATABASE SCHEMA
Overview: The database design naturally follows the data structure and format of the XML schema. The database can be thought of as two parts, indices and dividends, representing the two main strands of the Markit Equities business, i.e. index management and dividend forecasting. A schema diagram follows.
[Schema diagram: an entity-relationship layout of the Markit tables (Providers, Clusters, Companies, Calculations, Instruments, InstrumentCodes, Exchanges, ExchangeRates, Stocks, StockCodes, StockSectors, Sectors, SectorProviders, CapitalChanges, Positions, Constituents, Points, Dividends, IBESMeasures, ForecastDetails and the corresponding history tables), showing their primary and foreign keys.]

3.5 Overall Markit
It contains a total of 25 tables at the database level. They are:

B1 Table: Parameters
B2 Table: Clusters
B3 Table: Instruments
B4 Table: Positions
B5 Table: Constituents
B6 Table: InstrumentCodes
B7 Table: ExchangeRates
B8 Table: Dividends
B9 Table: Companies
B10 Table: Stocks
B11 Table: CapitalChanges
B12 Table: StockCodes
B13 Table: ForecastDetails
B14 Table: IBESMeasures
B15 Table: Points
B16 Table: Calculations
B17 Table: Exchanges
B18 Table: Providers
B19 Table: Sectors
B20 Table: StockSectors
B21 Table: SectorProvider
B22 Table: Positions_History
B23 Table: Points_History
B24 Table: Constituents_History
B25 Table: Dividends_History
These tables are updated in the following way:
Clusters Job
URL_INDEXSUMMARY=http://xml.dadd.co.uk/update24/daddindex.aspx?customer=%CUSTID%&password=%PASSWORD%&full=0&version=5

Get the DaddId [URL_PARAM_INDEXID] from the above xml. If bdCluster does not exist, it must be a new cluster, so we update the database.

1. Calls the ClusterUpdateJob. The DaddId is inserted/updated in the Clusters table [ClusterID].
URL_INDEXDETAILS=http://xml.dadd.co.uk/update24/daddindex.aspx?customer=%CUSTID%&password=%PASSWORD%&full=1&version=5&IndexID=%DADDID%

For instrument and constituent data the DADDID is used to get the Instrument ID and Security ID.

URL_INDEXSUMMARY_DETAIL=http://xml.dadd.co.uk/update24/daddindex.aspx?customer=%CUSTID%&password=%PASSWORD%&full=0&version=5&IndexID=%DADDID%

Check whether the provider for the cluster exists in the Providers table, and insert the new Provider ID if not. Get the IndexProviderID [URL_PARAM_PROVIDERID] from the above INDEXDETAILS xml.
A. Calls the ProvidersUpdateJob with IndexProviderID.
URL_PROVIDERS=http://xml.dadd.co.uk/update24/daddprovider.aspx?customer=%CUSTID%&password=%PASSWORD%&full=0&version=5
Inserts the new Provider ID [all provider information] into the Providers table. Then check whether the Calculations for each of the cluster's Instruments exist in the database, and insert them if needed: get the CalculationId for every Instrument ID in the Instrument attributes, check whether it exists in the Calculations table, and if it does not, start the CalculationsUpdateJob.

B. Calls CalculationsUpdateJob.
URL_CALCULATIONS=http://xml.dadd.co.uk/update24/daddcalc.aspx?customer=%CUSTID%&password=%PASSWORD%&full=0&version=5
Inserts the new DaddId and calculation information into the Calculations table.

C. Calls StockUpdateJob with the SecurityID from the constituents table.
URL_STOCKDETAILS=http://xml.dadd.co.uk/update24/daddstock.aspx?customer=%CUSTID%&password=%PASSWORD%&full=1&version=5&SecurityID=%DADDID%
Check whether the Company ID exists; if not, update the Companies table with the Security ID [URL_PARAM_INDEXID].
It then updates the Stocks table from the above xml. For stock code details this URL is also used:
URL_STOCK_STOCKCODES=http://xml.dadd.co.uk/update24/daddstockcode.aspx?customer=%CUSTID%&password=%PASSWORD%&full=0&version=5&SecurityID=%DADDID%

1. Calls StockCodesJob. This checks for the Exchange ID in the Exchanges table; if it does not exist, it
   i) Calls ExchangesJob, which updates the Exchanges table from
   http://xml.dadd.co.uk/update24/daddexchange.aspx?customer=28291&password=550264&full=0&version=5&ExchangeID=%DADDID%
   Then it deletes/inserts rows in the StockCodes table from URL_STOCK_STOCKCODES (above).
2. Calls CapitalChangesJob. Using the same Security ID [URL_PARAM_INDEXID] from the xml at URL_STOCKDETAILS (above), the CapitalChanges table is deleted/inserted.
3. Calls IBESMeasureJob. Using the same Security ID [URL_PARAM_INDEXID] from the xml at URL_STOCKDETAILS (above),
it first updates the IBESMeasures table and then
   i) Calls ForecastDetailsJob, which updates the ForecastDetails table with the corresponding Measure ID.
4. Calls DividendUpdateJob. Using the same Security ID [URL_PARAM_INDEXID] from the xml at URL_STOCKDETAILS (above), it first updates the Dividends_History table and then updates the Dividends table.
D. Calls ExchangesJob with [URL_PARAM_EXCHANGEID] = %EXCHANGEID% from the URL
URL_EXCHANGE=http://xml.dadd.co.uk/update24/daddexchange.aspx?customer=%CUSTID%&password=%PASSWORD%&full=0&version=5&ExchangeId=%EXCHANGEID%
It updates the Exchanges table.

The A, B, C and D jobs above come under the isvalidforsave function, which updates the above tables on specific conditions. The workflow then continues with:

E. Calls InstrumentUpdateJob. Updates the Instruments table from the following xml:
URL_INDEXSUMMARY_DETAIL=http://xml.dadd.co.uk/update24/daddindex.aspx?customer=28291&password=550264&full=1&version=5&IndexID=%DADDID%
Using URL_PARAM_INDEXID=%DADDID%, it:

1. Calls PositionsUpdateJob, which updates the Positions table from the same xml. It also updates the Positions_History table by copying the previous contents of the Positions table into it.

2. Calls ConstituentUpdateJob, which updates the Constituents_History table by copying the previous contents of the Constituents table into it, and then updates the Constituents table from the following xml:
http://xml.dadd.co.uk/update24/daddindex.aspx?customer=28291&password=550264&full=1&version=5&IndexID=%DADDID%
using URL_PARAM_INDEXID=%DADDID% to iterate through the clusters.

3. Calls InstrumentCodeJob, which updates all the instrument codes for each instrument [InstrumentID] from
URL_INDEXCODEDETAILS=http://xml.dadd.co.uk/update24/daddindexcode.aspx?customer=%CUSTID%&password=%PASSWORD%&full=1&version=5&IndexID=%DADDID%

F. Calls ExchangeRatesJob, which updates the ExchangeRates table from the following xml, using the DaddId as URL_PARAM_INDEXID:
http://xml.dadd.co.uk/update24/daddindex.aspx?customer=28291&password=550264&full=1&version=5&IndexID=%DADDID%

2. Calls PointsJob with URL_PARAM_POSITIONID=%POSITIONID%, from the URL
URL_POINTS=http://xml.dadd.co.uk/update24/daddpoints.aspx?customer=%CUSTID%&password=%PASSWORD%&full=1&version=5&PositionID=%POSITIONID%
The Points table contains five different IDs, or keys, for the respective parent ids: DividendID, ExchangeID, InstrumentID, PositionID and StockID.

[Diagram: the procedural call hierarchy of the update jobs, as summarised above.]
3.6 CONCLUSION
XML loaders such as Markit prove to be very useful tools in the world today, as they simplify the job of an accountant or a stock exchange updater. Markit is coded in Java, and by scheduling all the jobs on a cron scheduler it can easily be made to serve the needs of a custom program; it is used by a wide range of subscribers to update the database or extract data from it. The tool gives its users a wide perspective, since data can be enveloped into a single model and broadcast or relayed at any level. With such huge advantages and provisions, Markit proves to be an extensively used tool in the industry today. A sketch of such a cron schedule is given below.
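A minimal sketch of such a schedule (the script path and times are illustrative):

```
# Hypothetical crontab entry: run the cluster update every fifteen minutes
# on weekdays during trading hours.
*/15 7-18 * * 1-5  /opt/markit/bin/run_cluster_update.ksh >> /var/log/markit.log 2>&1
```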
4. LEARNINGS
The learnings gained from these projects are:

1) Shell scripting
2) Validation and profile handling
3) Autosys job scheduling
4) Wrapper scripting
5) Extensive usage of HTML and JavaScript
6) Designer capabilities in a Microsoft SharePoint server
7) Advantages of automation
8) In-depth debugging of all wrapper and shell scripts
9) Java coding and parameterization
10) Concepts of DBMS and their usage
5. REFERENCES & BIBLIOGRAPHY

1) http://www.freeos.com/guides/lsst/index.html
2) http://www.w3schools.com/html/html_intro.asp
3) http://learnlinux.tsf.org.za/courses/build/shell-scripting/index.html
4) http://www.unix.com/
5) UNIX for Advanced Programmers, R. L. Stevens