Hierarchical Data Back Up

MAIN PROJECT REPORT

HIERARCHICAL DATA BACKUP

Submitted in partial fulfilment of the degree of Bachelor of Technology

by

AJITH.K (Y2058)
SRIJITH.P.K (Y2107)
Group number: 22

Under the guidance of Sri. Vinod Pathari

Department of Computer Engineering
National Institute of Technology, Calicut

Acknowledgement

We thank Sri. Vinod Pathari, lecturer in the Department of Computer Science and Engineering, for his guidance and advice towards the completion of this project. We also thank our friends for their help in making this software more useful.

AJITH.K SRIJITH.P.K


Abstract

The project entitled ‘Hierarchical data backup’ aims to design and develop a system for the reliable backup of critical data. Current backup strategies are based on direct attached storage, in which data is backed up to a hard disk attached directly to a dedicated centralized server. Our project aims to develop a more reliable, secure and dependable backup scheme based on distributed and hierarchical storage of data. The project has three parts. First, the data to be backed up is stored on a set of nodes in a LAN rather than on a single node or server, which provides a distributed backup mechanism. Second, this backup is extended with an external storage device to provide more reliable storage. Third, a protocol is developed for the hierarchical backup of the data over the intranet, building on the external storage device.


Contents

1. Problem specification
2. Literature survey
3. Motivation
4. Design
   4.1 Backup procedure
   4.2 Deletion procedure
   4.3 Retrieval procedure
5. Implementation
6. Testing and verification
   6.1 Results
7. Conclusion
8. References

1. Problem specification

To develop a distributed and hierarchical data backup scheme with which critical data can be stored reliably and securely. First, we develop a LAN based protocol for storing critical data securely over a set of nodes on a LAN. Next, we introduce an external storage device and a protocol that uses it to back up more critical data within the LAN in a secure and reliable manner. Finally, we develop an intranet sharing protocol to back up the most critical data in another LAN.

2. Literature Survey

The dependence on computers for data backup is increasing rapidly. The data stored in a computer may be lost because of user error, data corruption, hardware failure or disaster [2]. To protect against such losses, industries and educational institutions have adopted many methods. The most common methods are disk to tape backups, such as direct attached backups, centralized LAN backups and LAN free backups, and disk to disk backups [3].

In the direct attached backup strategy, each server is backed up by attaching a tape backup unit directly to it. In the centralized backup server strategy, a designated server known as the backup server manages the backup of the data on all servers to a tape unit attached directly to the backup server. LAN free backups in most cases make use of storage area networks (SAN) [4]. Storage area networks are dedicated high speed networks containing a large number of storage elements, and a number of servers manage the backup of data to these storage elements.

NAS servers are dedicated file servers that store and retrieve files for other general purpose servers and computers. Whereas general purpose production servers are loaded with applications that consume storage, NAS servers are stripped of unnecessary hardware (there is no monitor, keyboard or mouse) and software applications, and use only those components of the operating system required for file serving, thus maximizing the disk space available for storage [5].

3. Motivation

The relevance of data is felt most when it is lost irrecoverably. The heavy dependence on computers as data storage mechanisms can be justified only if proper backup facilities are available to cope with critical failures. The usual backup strategies depend heavily on backing up data to a single disk or tape attached directly to a server. This approach has significant drawbacks. It scales poorly, since increased demand for storage capacity must be met by adding servers. The file processing functions (such as data storage and retrieval) also compete directly with applications for system resources.

The proposed work is an attempt to derive a distributed and hierarchical backup mechanism which is both cost effective and reliable [1]. The distributed backup protocol securely backs up the critical data over a set of nodes that are willing to share space for storing backup data, thereby increasing reliability. The hierarchical data backup protocol stores the critical data in an external storage device, a NAS, and then over the intranet, adding further reliability.

4. Design

We first deal with the design of the first phase of the project, a distributed file backup system which allows us to back up critical files over the LAN. The system is designed to be scalable and adaptable to the succeeding phases. The architectural model for the system is based on the client server model. We maintain a server to monitor all transactions. It maintains the necessary atomicity of transactions and provides data consistency and isolation. We assume a lightly loaded situation in which data backup does not result in heavy traffic on the network. Even though the system looks like a client server paradigm, the data to be backed up is stored on arbitrary nodes and not on the server. Hence the failure of the server does not affect the backed up data. Each node on the network knows on which node it has backed up its data.

[Figure: architectural model showing clients connected to the server]

We maintain a server to manage data backup over the nodes in a LAN. Any authorized user in the LAN who is ready to share a fixed amount of storage space for backing up other users' data can back up the same amount of data on remote nodes in the LAN in a distributed manner, that is, his different files will be backed up on different nodes. If a user wants to save a file on a remote node, he sends a request to the server. The server decides to which node the file has to be backed up and backs up the file to that node. The server maintains all the information regarding the data backup, that is, on which node and under which filename each backup request from a user has been saved. The deletion of files backed up on remote nodes and the retrieval of backed up files are also handled by the server.

The design for the distributed data backup is based on an interaction model. Computation occurs within the processes, and the processes interact by passing messages, resulting in communication (information flow) and coordination (synchronization and ordering of activities) between processes. The interaction model reflects the fact that communication takes place with delays of considerable duration, and that the accuracy with which independent processes can be coordinated is limited by these delays. The model we have designed is a synchronous system in which we keep a timeout mechanism and buffering of data to deal with process omission failures caused by system crashes and with communication omission failures.

The figures below give a sequential model of our system design. The sequence diagrams cover three scenarios: backing up a file, deleting a backed up file and retrieving a backed up file.
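The timeout mechanism mentioned above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's code. The class name `LivenessTable`, its methods and the timeout value are hypothetical. The sketch only shows the idea of treating a node as failed when it has not replied within a fixed interval.

```python
import time


class LivenessTable:
    """Tracks which nodes replied recently (hypothetical helper, not from the report)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node address -> time of its most recent reply

    def record_reply(self, node, now=None):
        # `now` can be injected for testing; otherwise a monotonic clock is used.
        self.last_seen[node] = time.monotonic() if now is None else now

    def up_nodes(self, now=None):
        # A node counts as up only if it replied within the timeout window.
        now = time.monotonic() if now is None else now
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen <= self.timeout_seconds
        )
```

A node that crashes simply stops replying, so it drops out of `up_nodes()` once the timeout has elapsed, without any explicit failure message.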


4.1 Backup procedure

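[Sequence diagram. Participants: ClientA (requesting node), Server, ClientB (target node).]

1. ClientA to Server: Backup req
2. Server: search quota; if OK, search for a lightly loaded node
3. Server to ClientA: Backup ack
4. File send (the file is transferred to the selected node, ClientB)
5. ClientB to Server: File saved
6. Server: update tables
7. Server to ClientA: File saved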
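The server's decision in the backup procedure (check the quota, then pick a lightly loaded node) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the project's implementation. The function name, its parameters and the use of a single `quota` value for both checks are hypothetical, and excluding the requesting node from the candidates is an assumption based on the files being stored on remote nodes.

```python
def choose_target_node(stored_on, backed_up_by, up_nodes, source, file_size, quota):
    """Return the least loaded eligible node, or None if the request must be refused.

    stored_on:    node -> bytes of other users' files currently stored on it
    backed_up_by: node -> bytes that node has already backed up elsewhere
    All names here are hypothetical; sizes are in bytes.
    """
    # Quota check: the requester may back up at most `quota` bytes in total.
    if backed_up_by.get(source, 0) + file_size > quota:
        return None

    # Candidates: nodes that are up, are not the requester, and have room left.
    candidates = [
        (stored_on.get(node, 0), node)
        for node in up_nodes
        if node != source and stored_on.get(node, 0) + file_size <= quota
    ]
    if not candidates:
        return None

    # Least stored bytes wins; ties are broken by node name.
    return min(candidates)[1]
```

Here load is measured in stored bytes. Counting stored files instead would only change the values kept in `stored_on`.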

4.2 Deletion procedure

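[Sequence diagram. Participants: ClientA (requesting node), Server, ClientB (target node).]

1. ClientA to Server: deletion req
2. Server: search dest_node and file
3. Server to ClientB: delete file
4. ClientB to Server: File deleted
5. Server to ClientA: File deleted
6. Server: update tables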

4.3 Retrieval procedure

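[Sequence diagram. Participants: ClientA (requesting node), Server, ClientB (target node).]

1. ClientA to Server: Retrieve req
2. Server: search dest_node and file
3. Server to ClientB: retrieve file
4. ClientB to Server: file retrieve
5. Server to ClientA: file retrieve
6. Server: update tables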
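All three procedures rely on the server's tables that record where each file was backed up. A minimal sketch of such a table is given below in Python. The class and field names are hypothetical. The fields follow the information the server is described as keeping: source node, source filename, target node, target filename, size and time of backup.

```python
from dataclasses import dataclass


@dataclass
class BackupRecord:
    source_node: str
    source_file: str
    target_node: str
    target_file: str
    size: int
    saved_at: str  # server time of the backup, kept as text in this sketch


class BackupTable:
    """In-memory stand-in for the server's bookkeeping (illustrative only)."""

    def __init__(self):
        self._records = {}  # (source_node, source_file) -> BackupRecord

    def add(self, record):
        self._records[(record.source_node, record.source_file)] = record

    def find(self, source_node, source_file):
        # Used by deletion and retrieval to locate the target node and filename.
        return self._records.get((source_node, source_file))

    def remove(self, source_node, source_file):
        # Returns the removed record, or None if there was no such entry.
        return self._records.pop((source_node, source_file), None)
```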

The following design shows the architectural model for the LAN based protocol that uses an external storage device. In this case we plan to store more critical data on an external storage device, a NAS, in addition to storing it on a remote node. This increases the reliability and availability of the backed up data. The design shown below uses a server to control the backing up of data to the NAS.

[Figure: architectural model showing clients, the server and the NAS]

5. Implementation

The project was envisioned in two phases. The first phase implemented the data backup mechanism within a LAN, in a Linux environment, with data backed up on different nodes of the LAN. A central server decides where each file is backed up among the clients belonging to the group, choosing a lightly loaded node. To use the facilities provided by the software, a node in the LAN must first join the group by sending a join message to the server. Only nodes with sufficient storage space are allowed to join the group. Once a node has joined, its user can save files on a remote node, retrieve saved files from the remote node, delete backed up files, view information about backed up files and unsubscribe from the group. The client performs these functions through a graphical user interface developed in Qt.

The client side requires two processes to run continuously once the client has joined the group. The processes were implemented in C on UNIX and use UNIX sockets to communicate over the network. The first process, implemented as a daemon, always listens on a predefined port for messages from the server. Whenever it receives the broadcast message “ARE_YOU_UP” from the server, it sends back the message “AYU_Answer”. The server uses these replies to determine which nodes in the group are currently up.

The second process always listens on a predefined port using TCP for user initiated requests forwarded by the server. It may receive three kinds of messages from the server. The first is a request to save a remote file on this node. On receiving it, the process generates a unique filename in the folder ‘remotefile’, saves the remote file under that name and sends the filename back to the server. The second is a request to retrieve a file. On receiving it, the process opens the file and sends its content back to the server through the open socket. The third is a request to delete a file saved on this node, upon which the process tries to delete the file and sends a status message back to the server.

The client side checks on a regular basis whether the client still meets the storage quota required to be in the group. If the quota is exceeded, it sends an unjoin message to the server, which removes the client from the group, and the user is notified through a popup. To save files again, the user has to delete some unwanted files and rejoin the group. The client side uses the file ‘c_searchfile’ to obtain information about the remote nodes on which its backed up files are stored.

The client side can perform the following operations.

1. JOIN
When the user chooses to join a group, the client sends a join message to the server. Once the client has joined the group it can back up files on remote nodes.
Message format, client to server:

JOIN:END

2. SAVE A FILE
The user specifies the name of the file to be saved. The client performs some initial error checks, such as whether the file can be opened, whether the same file was already saved and whether the maximum allowable storage space would be exceeded. A message is then sent to the server to save the file, after which the file is opened and sent to the server for backing up. The server sends back the target node and the target filename under which the file is backed up.
Message format, client to server:

SAVE:FILENAME:FILESIZE:END

3. DELETE A FILE
When the user specifies the filename to be deleted, the client performs some error checking, such as whether the file was saved or not, and then sends a deletion message to the server specifying the filename to be deleted.
Message format, client to server:

DELETE:FILENAME:END

4. RETRIEVE A FILE
When the user specifies the filename to be retrieved, the client performs some initial error checking and then sends a retrieval message to the server. The retrieved file is saved in the folder ‘retrieved’.
Message format, client to server:

RETRIEVE:FILENAME:END

5. VIEW SAVED FILES
The client shows the user the files saved on remote nodes. It obtains this information from ‘c_searchfile’. The fields of the file are source filename, target node, target filename, file size and the server time at which the file was saved.

6. UNJOIN
This option lets the client unsubscribe from the group. When a client no longer wants to be in the group, it sends an unjoin message to the server. When a client unsubscribes from the group, all files backed up by this node on remote nodes are deleted, and the files that other nodes have backed up on this node are restored to other members of the group.
Message format, client to server:

UNJOIN:END
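The client message formats listed above share one shape: a command, zero or more fields, and a closing END, all separated by colons. A minimal sketch of building and parsing such messages is given below in Python. The function names are hypothetical and this is not the project's C code. The sketch assumes that no field contains a colon, so a filename with a colon in it would need escaping that is not shown here.

```python
def build_message(command, *fields):
    """Join a command and its fields into COMMAND:FIELD...:END form."""
    parts = [command, *(str(field) for field in fields), "END"]
    return ":".join(parts)


def parse_message(text):
    """Split a message into (command, fields); reject messages without END."""
    parts = text.strip().split(":")
    if len(parts) < 2 or parts[-1] != "END":
        raise ValueError("message is not terminated by END: %r" % text)
    return parts[0], parts[1:-1]
```

For example, `build_message("SAVE", "notes.txt", 2048)` gives `SAVE:notes.txt:2048:END`, and `parse_message("JOIN:END")` gives the command `JOIN` with an empty field list.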

The communication between client and server uses the reliable TCP/IP protocol.

The server side requires two main processes to run uninterrupted. The first broadcasts a message over the LAN at regular time intervals. It broadcasts the “ARE_YOU_UP” message through a broadcast socket configured to use UDP and waits for responses from the clients. On receiving a response, the process infers the IP address of the client from the message header. In this way the process finds out which client nodes in the group are currently up and updates the list of currently up client nodes accordingly.

The second process always listens on a predefined port configured to use TCP/IP for connections from clients. The server acts as a concurrent server, servicing clients while listening for new client connections. The server may receive five kinds of messages from the client.

1. On receiving a save message from the client, the server first finds a suitable candidate among the currently up nodes of the group, namely the one on which the least amount of files has been backed up by the server. It then sets up a connection with the target node and sends it the file to be backed up. Next it listens for a message from the target node, updates the necessary files on the server side and sends a status message back to the client. This is either an error message or a success message. On success, the message carries the target node name, the filename under which the file is backed up and the server time at which this was done. The server stores all information regarding the backup, that is, source node, source filename, target node, target filename, size and time of backup, in the file ‘searchfile’. It also updates the file ‘nodestorage’, which stores the amount of remote files stored on each node. The server finds candidate target nodes by searching the ‘nodestorage’ file, whose entries are always kept sorted in ascending order of stored size. Hence the most lightly loaded node is always at the top.
Message format, server to target node: SAVE:SOURCENODE:SOURCEFILE:FILESIZE:END
Target node to server: SAVED:TARGETFILENAME:END
Server to client: SAVED:TARGETNODE:TARGETFILENAME:TIME:END

2. On receiving a retrieve message from the client, the server finds the target node and the target filename under which the file is saved by searching the ‘searchfile’. It sets up a connection with the target node, retrieves the contents of the file by sending a retrieve message to it, and sends them back to the client.
Message format, server to target node: RETRIEVE:TARGETFILENAME:FILESIZE:END

3. On receiving a delete message from the client, the server finds the target node and target filename, sends a delete message to the target node and sends a status message back to the client. The status is an error message if the target node on which the file has been backed up is not currently up. If the deletion is successful, the server removes the corresponding entry from the ‘searchfile’ and updates the file ‘nodestorage’.
Message format, client to server: DELETE:FILENAME:END
Server to target node: DELETE:TARGETFILENAME:END

4. On receiving a join message from the client, the server adds the client to the group and sends a success message back to the client. The server creates an entry in the ‘nodestorage’ file with the storage amount initialized to zero.
Message format, server to client: SUCCESS

5. On receiving an unjoin message from a client, the server tries to unsubscribe the client from the group. The server tries to delete all the files stored by the client on the other remote nodes in the group. Then it tries to move the files stored by other members on this client to other members of the group, and sends the restore information back to each source node. The server also handles the situation in which not all members of the group are up. This enables a client to unsubscribe from the group whenever he wants.
Message format, server to client: SUCCESS
Server to source node: RESTORE:SOURCEFILE:TARGETNODE:TARGETFILE:TIME:END

The server side is also provided with an easy to use graphical user interface through which an administrator of the server can view all the backup information stored on the server. The graphical user interface is developed in Qt.

In the second phase of the project, an external storage medium, an Iomega NAS, was purchased and configured for use in the software lab. We configured the NAS as a network hard drive with the static IP 192.168.3.90 and the machine name csed-lab. The NAS can be administered at http://192.168.3.90/. It was configured so that user logins can be created and quotas assigned. Individual users can enter their logins through a Samba client (smb://csed-lab/). The NAS is accessible only to users who are in the same workgroup as the NAS. The NAS can also be mounted remotely on a machine with the command 'mount -t smbfs -o username=(user),password=(pass) //csed-lab/ /path/to/mountpoint'. All sharing, user creation, maintenance, quota setting and so on can be configured at http://192.168.3.90/ by the administrator.

The second phase of the project is done by mounting the NAS on a local folder of the server. The server can then freely write to or read from the NAS. A program was developed through which one can write, retrieve and delete files on the NAS. This gives the user the additional option of saving a copy of his critical data on the NAS as well. The only access to the NAS is through the server.
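Because the NAS is mounted on a local folder of the server, writing, retrieving and deleting files on it reduces to ordinary file operations under the mount point. A minimal sketch in Python follows. The function names and the flat layout (one file per name directly under the mount point) are assumptions for illustration, not the project's program.

```python
import shutil
from pathlib import Path


def nas_save(mount_point, source_path):
    """Copy a local file onto the mounted NAS folder and return its new path."""
    dest = Path(mount_point) / Path(source_path).name
    shutil.copy2(source_path, dest)
    return dest


def nas_retrieve(mount_point, filename, dest_dir):
    """Copy a file from the mounted NAS folder into dest_dir."""
    dest = Path(dest_dir) / filename
    shutil.copy2(Path(mount_point) / filename, dest)
    return dest


def nas_delete(mount_point, filename):
    """Delete a file from the mounted NAS folder; return False if it was absent."""
    target = Path(mount_point) / filename
    if target.exists():
        target.unlink()
        return True
    return False
```

Since the functions only take a folder path, they can be exercised against any local directory before pointing them at the real mount point.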


6. Testing and verification

The testing and verification of the project was done on an incremental basis. The programs for performing broadcast in the LAN were developed and tested first, on the Linux machines of the software lab. Initially we could not obtain the expected results because of the firewall configuration of those machines. After disabling the firewall settings on some machines, we obtained the expected results. Next, the server and client programs were developed and deployed on the Linux machines of the software lab. Each program was tested and verified separately. All the programs were then integrated and tested together. The following cases were analyzed for testing.

1. JOIN
The following cases were tested and verified when a user tries to join a group.
a. The user has already joined the group.
b. The user does not meet the storage requirements to join the group.
c. The user meets the storage requirements to join the group.

2. SAVE FILE
When the user specifies a filename to save, the following cases are first tested on the client side.
a. Whether the file can be opened.
b. Whether the file was already saved.
c. Whether the size of the file plus the total size of the files he has already saved exceeds the maximum allowable storage limit.
The following cases, which may arise during the backing up of files, were tested and verified.
a. No other nodes in the group are currently up.
b. The size of the file exceeds the free allocated space on the currently up nodes.
c. The file could not be saved on the remote node.
d. The file is saved successfully on the remote node.

3. DELETE FILE
When the user specifies a filename to be deleted, an initial check verifies that such a file has been saved. The following cases were tested and verified.
a. The node on which the file has been saved is not up.
b. The file could not be deleted from the remote node.
c. The file could be successfully deleted from the remote node.

4. RETRIEVE FILE
When the user specifies the filename to be retrieved, an initial check verifies that the file has been saved. The following cases were then tested and verified.
a. The node on which the file has been saved is not up.
b. The file could not be retrieved from the remote node.
c. The file could be successfully retrieved from the remote node.

5. UNJOIN
When the user chooses to unsubscribe from the group, an initial check verifies that he has joined the group. The following cases were tested and verified.
a. A remote node on which he has backed up a file is not up.
b. All remote nodes on which he has saved files are up.
c. No other nodes are currently up, so that the files stored by other nodes on this node cannot be restored immediately.
d. The free allowable space on the currently up nodes is not enough to perform the restoration.
e. A source node which has saved a file on this node is not up.
f. All nodes are currently up.

The graphical user interface developed in Qt was also tested and verified completely for running in the software lab. The project as a whole was run using the graphical user interface, and tested and verified.

6.1 Results

The project has been successfully tested and verified, and the software was successfully deployed in the software lab. The tests did not reveal any anomalous behavior, and the software behaved as expected. The software showed the following results when tested against the various use cases.

1. JOIN
a. If the user has already joined the group, the user is notified about it.
b. If the user does not have enough space, the user is notified about it.
c. Otherwise, the user is notified that he has successfully joined.

2. SAVE FILE
a. Proper error messages are shown for all initial error checks.
b. Proper error messages are shown for each of the following cases: no other nodes in the group are currently up, the storage size is exceeded among the currently up nodes, and the file could not be saved on the remote node.
c. If the file was successfully saved, the user is notified about it along with the remote node and the filename under which it is stored.

3. DELETE FILE
a. Proper error messages are shown for the initial error checks.
b. If the remote node is not up, the user is notified about it.
c. If the file was successfully deleted, the user is notified about it.

4. RETRIEVE FILE
a. Proper error messages are shown for the initial error checks.
b. If the remote node is not up, the user is notified about it.
c. If the file could be retrieved, the user is notified about it along with the name under which the retrieved file is stored on the client side.

5. UNJOIN
a. The user is notified about the successful unsubscription from the group.

7. Conclusion

Software for the distributed backing up of data in a LAN was developed. The software can back up files on remote nodes, delete files from remote nodes, retrieve files from remote nodes, and let a node join and unsubscribe from the group. The software is also provided with an easy to use graphical user interface. A program was developed through which one can write, delete and retrieve files on the external storage medium, a NAS, which provides additional reliability for data storage.


8. References

[1] Jerome H. Saltzer, Needed: A systematic structuring paradigm for distributed data, ACM SIGOPS 5th European Workshop, 2000.
[2] Disaster Recovery: Best Practices. White paper. Cisco network solutions, August 2003, http://www.cisco.com/warp/public/63/disrec.html
[3] Disaster Recovery Planning. White paper. Cisco network solutions, December 2003, http://whitepapers.techrepublic.com.com/whitepaper.aspx
[4] Using Network Attached Storage for Reliable Backup and Recovery. White paper. Microsoft Corporation and Dell, http://www.dell.com/downloads/global/products/pvaul/en/nasReliableBackups.pdf
[5] Business case for network attached storage. White paper. June 2005, http://whitepapers.techrepublic.com.com/abstract.aspx
