BEST PRACTICES GUIDE
Remote Data Management & Backup with Snap EDR

CONTENTS
Abstract
Understanding the Challenges of Remote Data
Key Considerations for Managing Remote Data
Additional Requirements for Remote Data Backup
The Case for Archiving
The Central Policy/Consolidated Approach to Managing Remote Data
Disk-to-Disk Consolidated Backup
Consolidated Archive
Managing Remote Data with Adaptec Snap EDR
    Snap EDR Remote Manager
    Snap EDR Remote Agents
    Snap EDR Remote Data Solutions
A Best Practices Guide to Managing Remote Data
Summary
Abstract
The increasing risk from unprotected user files and remote data (data stored outside the data center) is causing companies to re-evaluate their current remote backup processes. Managing remote data poses unique challenges given the variability of networks and computing platforms, the lack of trained IT staff at remote locations, and other issues. Further, traditional methods of managing remote data tend to be high cost, unreliable, and manually intensive, and often require redundant equipment and effort. Advanced remote data management and movement technology, such as that incorporated into Adaptec Snap EDR, now makes it possible to cost-effectively solve the challenges of managing data at remote offices. This paper discusses the issues, requirements, and approaches to effective remote data management, with specific emphasis on remote data protection and backup. Also included is a Best Practices Guide to help assess your remote data management requirements.
Understanding the Challenges of Remote Data
Protecting remote data and managing the exchange of data between corporate locations and remote offices is neither trivial nor cheap.
IT administrators in companies with remote offices often spend significant amounts of their time managing backup, data management, and data transfer requirements for those offices. Even so, critical processes such as backups may not be adequately covered. Often, central IT staff must rely on non-technical staff in remote locations to change backup tapes, initiate processes, and take other actions they are neither trained nor compensated to perform. As a result, companies report that as much as 60% of their remote backup procedures may fail on a nightly basis. This represents risk that few companies can afford. When problems do occur, recovery can be tedious and take days, assuming the data was adequately backed up. Central IT personnel may need to have tapes shipped from the remote site, catalog the volume and search for the files needed, and then reship the files for restore. Online alternatives such as consolidated backup, disk-based backup, and centralized archive have proven to increase reliability and overall data protection, speed recovery, and significantly lower costs. These methods are discussed further in this paper; however, there are a number of issues that must be considered as you evaluate these new approaches and the technologies to implement them.
Key Considerations for Managing Remote Data
To effectively manage remote data, you must consider and address a number of specific functional, environmental, and technology factors that do not necessarily come into play in the data center. These factors include:

Central Policy-Based Control: To efficiently control data at remote sites, enterprises must have the ability to set up and implement central policies. That means setting a rule once and having that directive implemented throughout the enterprise, rather than managing activities individually at different sites with multiple separate platform-specific policies and tools. While many products claim “central control” capability, they in fact require administrators to establish a unique connection to each remote node to set policy. This
approach eats up hours of administration time, not only during initial setup, but each time a business requirement changes. Some technologies, such as Adaptec Snap EDR, provide a “set it and forget it” approach that automates the communication of policy to the remote node, and have integrated notification if something does not proceed according to policy.

Wide Area Network (WAN) Bandwidth Utilization: Any solution that addresses remote data must take into account bandwidth restrictions, as well as a range of network conditions. Remote locations frequently have varying bandwidth availability that needs to be shared among multiple applications and users at particular times. For this reason, remote data management and movement solutions should have features that enable efficient use of available bandwidth, such as byte-level differential data transfer, bandwidth throttling, multi-streaming, and compression. The amount of network overhead (information sent in addition to the data being transferred) that a product puts on the network is an important consideration. Obviously, less is better than more. Finally, since some remote connections will likely be impaired during some processes, the ability to restart at the point of failure is critical, as is the ability to re-route information flow to alternate networks.

Security and Data Integrity: When moving data over networks, data security is always a major concern. Networks are always susceptible to intrusion, but particularly in remote locations where there are fewer IT controls. As a result, any remote data solution should authenticate all sending and receiving nodes prior to any data transfer, and encrypt data during transmission. Moreover, it should utilize a single firewall port and minimize firewall rules. The ability to ensure that data is received with 100% integrity is also an important consideration.
Where tape is used for remote backup, one of the biggest points of recovery failure is that data is corrupted on the tape. With some disk-to-disk backup technologies, data accuracy can be 100% guaranteed.

Remote Process Automation and Application Interfacing: To minimize or eliminate the need for manual effort at remote locations, the management solution must be able to automate processes and interface with remote applications to access data. For example, when backing up applications like Exchange or SQL Server, it is preferable to use native backup routines. Therefore, the remote data solution must be able to integrate with the application and invoke the native backup package as part of the backup process.
Similarly, for applications such as SAP, data must be accessed through the application to ensure integrity, instead of accessing it directly at the database, filesystem, or disk levels. In addition, other custom or script-based processes may be required before, during, or after data transmission. The remote management solution should automate these as part of the overall remote backup process.

Heterogeneous System Support: It is common that a company with multiple remote locations will have a variety of computing platforms and applications at those locations. It is therefore important to choose a solution that can work within a heterogeneous environment. While this seems simplistic, many products today only work within homogeneous environments.

Point-in-Time vs. Continuous Replication: Continuous replication products continuously monitor a filesystem, capture changes as they happen, and either replicate them immediately or cache the information for bulk transfer at a later time. While these products are ideal for continuous replication between a small number of systems for business continuity purposes, they are not ideal for periodic processes such as backup and archive. Point-in-time replication products are more appropriate, and in general will be far more network efficient for periodic processes such as backup and archive.
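The bandwidth considerations above (differential transfer, compression, and throttling) lend themselves to a rough back-of-the-envelope calculation. The Python sketch below is an illustration only, not part of Snap EDR: the function names and all input figures are hypothetical, and the estimate ignores protocol overhead, latency, and retries.

```python
def transfer_hours(changed_gb, compression_ratio, link_mbps, throttle_fraction):
    """Estimate hours needed to move one run's changed data over a WAN link.

    changed_gb        -- data changed since the last run, in gigabytes (10**9 bytes)
    compression_ratio -- compressed size / original size (0.5 halves the payload)
    link_mbps         -- nominal link speed in megabits per second
    throttle_fraction -- share of the link the transfer may use (0 < f <= 1)
    """
    if not 0 < throttle_fraction <= 1:
        raise ValueError("throttle_fraction must be in (0, 1]")
    payload_bits = changed_gb * 1e9 * 8 * compression_ratio
    usable_bits_per_second = link_mbps * 1e6 * throttle_fraction
    return payload_bits / usable_bits_per_second / 3600


def fits_window(hours_needed, window_hours):
    """Return True when the estimated transfer fits inside the backup window."""
    return hours_needed <= window_hours
```

With the hypothetical figures of 2 GB changed per night, 2:1 compression, a 1.5 Mbps link, and a 50% throttle, `transfer_hours(2, 0.5, 1.5, 0.5)` comes to just under three hours, which would fit an eight-hour overnight window.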
Additional Requirements for Remote Data Backup
Beyond the basic remote data considerations listed in the previous section, there are some specific requirements for remote backup that become important as Consolidated and Disk-to-Disk Backup methods are considered. Backup at remote offices requires more than just writing the data to tape. Backup solutions must address data integrity and accuracy, automatic operation, offsite storage, and of course, restoration. When storing user files for backup, it is important that the integrity of the original data be preserved; the most important characteristic of a backup is that it can be restored with full integrity. A backup must represent the true data status at the time of the backup. While tape backup software often has the capability to handle files left in an open state at the time of backup, it is important that your disk-backup mechanism have options (skip, open file transfer, or create error log) for handling open files. Backup processes for remote offices ideally should require little or no local manual intervention, but instead be a completely automated, “lights out”
operation. At the same time, since offsite data storage is a must, a local backup-to-tape process at remote sites must involve some manual intervention to move the backup media to offsite storage. Online backup, which can transmit data to another location to be backed up either to disk or to tape, can eliminate the need for any redundant manual effort at the remote sites.
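The requirement that a backup be restorable with full integrity can be checked mechanically by comparing a cryptographic digest of the original with a digest of the copy. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, it uses a plain local file copy in place of a network transfer, and it does not describe how any particular product verifies data.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy source to destination and report whether both digests match."""
    shutil.copyfile(source, destination)
    return sha256_of(source) == sha256_of(destination)
```

A mismatch would indicate corruption in transit or a source file that changed during the copy, which is one reason an explicit option for handling open files matters.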
The Case for Archiving
An often-overlooked but critical component of remote data management and protection is archiving. Let’s face it, few of us have the time or interest to clean out our electronic files. Emails building up in Outlook inboxes and other files building up in private and shared directories are contributing to the huge volume of data growing on remote storage. In a recent survey by Storage Magazine, users indicated the single biggest reason for backup failure was that the quantity of data was too large to be backed up within the backup window. The fact is that most user files and email data are seldom re-opened after the first three days of creation/receipt. Statistics show that if a file hasn’t been accessed in 90 days, there’s a 90%+ probability that it will never be accessed again. Meanwhile, it consumes valuable storage resources. The problem is that since we can’t predict which 10% we will need, we hold on to all of it. The cost-effective solution is to move unused data to lower-cost secondary storage (archive), with reasonably easy retrieval capabilities, for long-term retention.

A second key factor driving the need for archiving is federal document regulations, such as SEC Rule 17a-4, that require many companies to retain all communication and documentation for specific time periods. For a distributed enterprise with many remote offices, ensuring compliance with these regulations can be a challenge. So, cost and legal requirements are compelling companies to ensure that employee messages are archived or, at the very least, moved to lower-cost, longer-term media. As with remote backup, data integrity is essential to any archival solution. Consolidated archival, in which older or unused data is automatically moved from remote disks to a central repository (often consisting of lower-cost ATA disks) while leaving transparent access capability for the remote user, is rapidly gaining acceptance as the most viable approach to archive for remote data.
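The 90-day rule of thumb above translates directly into a selection rule based on last access time. The following Python sketch is an illustration only: the function name and default threshold are assumptions, and it merely lists candidate files rather than moving them or leaving any retrieval marker behind.

```python
import os
import time

SECONDS_PER_DAY = 86400


def archive_candidates(root, max_idle_days=90, now=None):
    """Return sorted paths under root not accessed within max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    candidates = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            try:
                last_access = os.stat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if last_access < cutoff:
                candidates.append(path)
    return sorted(candidates)
```

Access times are only as reliable as the filesystem makes them; volumes mounted with access-time updates disabled or relaxed may need a different criterion, such as modification time.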
The Central Policy/Consolidated Approach to Managing Remote Data
Rather than relying on individual backups, separate point processes, and the staffing required for each remote site, a more effective enterprise approach is to allow central IT staff to control remote data management and backup. This requires understanding the changing properties and characteristics of remote data. Solutions should be able to set policies pertaining to the data, automate processes to execute those policies on remote servers, and move data between remote or “edge” servers and central or “core” systems. In this model, individual backup and archiving processes at the remote sites are replaced with a consolidated process that moves remote data to a hub site for backup or archive. This requires moving the pertinent data over the available networks in an efficient, secure, and timely fashion, and therefore requires technology that can deal with the many issues associated with controlling and moving data among many sites and network connections. These issues are identified in more detail in the next section. Centrally controlled, automated processes have been shown to decrease backup costs by as much as 75% due to the elimination of tapes, tape drives, offsite tape storage, and the redundant staffing efforts at each location.
Disk-to-Disk Consolidated Backup
Disk-to-disk backup is gaining popularity due to factors including the rapidly falling cost of disk storage, the elimination of the physical limits and relative unreliability of manual tape storage programs, and the need for more ready access to data for restore. Implemented in a best practices model, disk-to-disk backup for remote data involves moving the data to be backed up over a network to a different location. The reason for this is that disk-to-disk backup, if performed at the same site, does not provide the protection needed to recover from a site disaster event (fire, flood, etc.). For any business with multiple remote locations, consolidating disk-to-disk backup brings operational and cost efficiencies and enhanced data security. There are two primary consolidated backup architectures:
• Moving differential data to a central disk
• Consolidating backup images
Both offer central control and automation and the elimination of individual tapes, tape drives, and offsite tape storage processes at each site.
In the first, more common, approach to consolidated backup, data at remote sites is periodically analyzed to determine differential data (i.e. data that has changed) since the last backup process. A copy of this differential data is then moved to a central site, where it is stored on disk. Some technologies have the ability to discern just the byte-level modifications of files to minimize the amount of data needing to be transferred. Data can be stored as incremental packets of data or reconstructed on the central site to provide full, up-to-date copies of remote files. This latter alternative provides the advantage of offering instant access to individual files in the case that a remote file is accidentally deleted.

In the second approach, backups are run on remote servers with the output stored to the local disk. The resulting backup image is then transferred to disk at the central site. This works well for applications that have native backup or snapshot features that can be utilized in the consolidated backup process.

These approaches can also be used together. For example, backing up user files may be best performed with differential data transfer, while backing up Exchange data may be best performed using the consolidated backup image approach. In both approaches, the backup data on disk at the core location can be further sent to tape if desired. Companies often choose to keep one or two days of backup data on disk for instantaneous access, with older data written to tape.
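The differential approach described above can be illustrated with a simple fixed-block comparison: the remote site hashes each block of a file, compares the hashes with those from the previous run, and sends only the blocks that differ, and the central site rebuilds the current version from its old copy plus those blocks. The Python sketch below is a deliberately simplified stand-in, not the byte-level algorithm of Snap EDR or any other product. The block size and function names are assumptions, and unlike rolling-checksum schemes it does not cope efficiently with insertions that shift later data.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old_digests, new_data, block_size=BLOCK_SIZE):
    """Return {block_index: bytes} for blocks that are new or differ."""
    delta = {}
    for index, digest in enumerate(block_digests(new_data, block_size)):
        if index >= len(old_digests) or old_digests[index] != digest:
            start = index * block_size
            delta[index] = new_data[start:start + block_size]
    return delta


def apply_delta(old_data, delta, new_length, block_size=BLOCK_SIZE):
    """Rebuild the new version from the old copy plus the changed blocks."""
    # Truncate or zero-pad the old copy to the new length, then patch it.
    rebuilt = bytearray(old_data[:new_length].ljust(new_length, b"\0"))
    for index, block in delta.items():
        start = index * block_size
        rebuilt[start:start + len(block)] = block
    return bytes(rebuilt)
```

Only the contents of `delta` and the new file length would need to cross the network, which is why differential transfer suits low-bandwidth remote links.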
Consolidated Archive
Consolidated archive involves identifying remote data that meets corporate archival policy and then moving that data from remote systems and archiving it to a central disk. Archival policy determines what data should be archived and when, and parameters often include last date accessed, type of file, content, location, ownership, and size of file, among others. A consolidated archival process has many benefits, such as:
• Reducing the amount of data to be backed up (reducing the backup window required)
• Optimizing use of remote disk for better performance and cost
• Ensuring compliance with data retention policies and regulations

An essential part of consolidated archive is some mechanism by which data can be retrieved from the archive by end users, preferably without the involvement of IT. Central policy-based consolidated processes, such as consolidated backup and archive, provide an approach to managing data at remote offices that can significantly lower costs, eliminate risk, improve data consistency, and also ensure better compliance with corporate backup and retention policies. It is an approach that all businesses with remote offices should actively consider.

Managing Remote Data with Adaptec Snap EDR
Adaptec Snap EDR is an enterprise-class solution that enables organizations to centrally control and securely move data among remote and core locations. Highly scalable, Snap EDR can handle up to thousands of locations across heterogeneous Snap Server, Windows, UNIX, and Linux systems. The Snap EDR technology provides critical remote data capabilities such as policy-based central control, remote process automation, transport and data-level security, guaranteed data integrity, and the ability to deal effectively with many types of networks, including high-latency networks. Snap EDR technology is in use by leading companies worldwide to manage, control, and move remote data.
Snap EDR Manager
The Snap EDR Manager is the central control center for enterprise-wide remote data processes. Remote data processes are easily set up, scheduled, deployed, and monitored through the Manager’s graphical user interface.
Snap EDR Agents
Snap EDR Agents are installed on all Snap Server, Windows, UNIX, Linux or other systems involved in the remote data processes. They are remotely installed through the Snap EDR Manager and execute processes and data transfer based on instructions and rules sent from the Manager. They handle multiple tasks, such as data consolidation, distribution, or synchronization tasks.
Snap EDR Remote Data Solutions
To make it easier for companies to apply Snap EDR technology to solve specific remote data problems, Adaptec has developed a number of solution packages for the most prevalent remote data problems. These solutions include:
• Remote Data Discovery: To properly manage remote data, a good understanding of data at remote sites is essential. The Remote Data Inventory solution automatically collects and reports on data characteristics at remote locations such as file types and sizes, ownership, file create/modified/access dates, file system size, utilization, and much more.
• Consolidated Archive: This solution provides for rules-based archiving of remote data to central systems while providing easy retrieval of the archived file. Snap EDR will automatically archive files based on flexible policies, such as file type, size or access date. A unique file marker technology allows users to retrieve archived files simply by clicking on them.
• Consolidated Backup: This solution efficiently consolidates data from multiple locations for a unified backup process. Snap EDR identifies changes made to remote files since the last backup and, on a scheduled or event-driven basis, moves only the bytes of those files that have changed to the central site.
A Best Practices Guide to Managing Remote Data
Best practices for managing and protecting remote data involve both understanding and implementing technology that supports the remote automated processes. There are five primary steps toward implementing an enterprise-wide remote data management solution:
1. Identify and understand remote data and the network environment
2. Select a remote data management solution
3. Create policy for how remote data should be managed
4. Deploy centrally controlled automated processes to implement the policy
5. Monitor and adjust as business conditions change

The questions below are designed as a guideline towards implementing the first two steps of the five-step remote data management process.

Assess the current system:
• Which types of applications are you using for data transfers? (e.g. FTP, xcopy, robocopy, tftp, DFS/FRS, public folder replication)
• How much manual intervention is currently required?
• What are the failure rates for these systems? How much does it cost when they fail?
• How easy is it to adjust to new business requirements?

Determine your data movement goals:
Knowing your goals for data movement will assist in developing the cost-recovery models to justify any purchases needed. Example goals include:
• Reduce backup failure rates/increase data protection
• Reduce mean time to restore for remote offices
• Ensure regulatory compliance for data archiving
• Automate data transfers to and from remote sites

Determine the types of data that need to be moved:
How much data is at the remote locations and what are the characteristics of the data? (size, file types, disk utilization, etc.)
• Which characteristics need to be maintained:
   i. Ownership preservation?
   ii. File system attributes?
   iii. Physical disk layout?
What applications are running in the remote locations?
• Does my data transfer system need to integrate with particular vendor applications in ‘real time’?
What data are users currently not backing up? (What is my current exposure?)
What types of data need to be sent to the remote office?
• How compressible is this data?

Determine the data movement volume:
Neither the time available nor the network is infinite. You’ll need to crunch the numbers and come up with:
• What is the rate of change of the data on a day-to-day basis?
• How many sites need to be supported?

Assess your current network:
• What is the available bandwidth to each remote location?
• What other applications are currently using this bandwidth? How much bandwidth do these applications require? (e.g. Active Directory replication, electronic mail, terminal services)
• Can traffic be segregated using quality of service (QoS) applications? (i.e. Will you be able to dedicate bandwidth to certain applications?)
• Is the network traffic prone to bursts?
• How secure is the network? (e.g. Are encrypted VPNs in place to support confidential data transfers?)

Choose your solution:
• Evaluate potential vendors against required remote data capabilities:

Capability                                 Vendor 1     Vendor 2     Vendor 3
                                           Yes   No     Yes   No     Yes   No
Central Policy-Based Control
Network Efficiency
Remote Process Automation
Security
Support for Heterogeneous Environments
Point-in-Time Replication

• Can the vendor’s solution solve multiple remote data problems, such as backup, archive, and distribution?
• Does the vendor have expertise with remote data application and integration?
   i. Will the vendor assist in assessing your requirements?
   ii. Will the vendor provide the tools to assess your data change and growth rates?

Summary
Many companies are re-evaluating their current backup processes, not only to ensure the proper backup of critical data, but also with the goal of lowering overall IT costs and safeguarding themselves from the penalties of regulatory non-compliance. Managing remote data effectively requires that you deal with network variability, dissimilar computing platforms, security needs, and data integrity, and implement process automation to overcome the lack of trained IT staff at remote locations. The good news is that all this does not have to be hard or complex. Advanced remote data management and movement technology such as Adaptec Snap EDR™ now makes it possible to cost-effectively solve the challenges of managing data at remote offices with a single unified approach. Understanding the issues, requirements, and approaches to effective data protection for remote data, with specific emphasis on remote data backup, is the first step to helping your company assess its remote data requirements. For a 3-minute tour of Snap EDR, go to http://www.adaptec.com/go/edr_flash/index.html or for more information on Adaptec Snap EDR, visit www.adaptec.com or call us at 1-800-442-7274.
Literature Requests: US and Canada: 1 (800) 442-7274 or (408) 957-7274 World Wide Web: http://www.adaptec.com Pre-Sales Support: US and Canada: 1 (800) 442-7274 or (408) 957-7274 Pre-Sales Support: Europe: Tel: (32) 2-352-34-11 or Fax: (32) 2-352-34-00
Adaptec, Inc. 691 South Milpitas Boulevard Milpitas, California 95035 Tel: (408) 945-8600 Fax: (408) 262-2533
Copyright 2005 Adaptec Inc. All rights reserved. Adaptec and the Adaptec logo are trademarks of Adaptec, Inc., which may be registered in some jurisdictions. All other trademarks used are owned by their respective owners. Information supplied by Adaptec Inc., is believed to be accurate and reliable at the time of printing, but Adaptec Inc., assumes no responsibility for any errors that may appear in this document. Adaptec, Inc., reserves the right, without notice, to make changes in product design or specifications. Information is subject to change without notice. P/N: 666730-012 Printed in USA
2/05 3704_1.3