Intelligent Storage ™
Data Consolidation: Benefits and Implementation Guidelines
A White Paper from Advanced Digital Information Corporation
www.adic.com
ADIC WHITE PAPER
Table of Contents
Introduction
Why the Push for Consolidation?
Consolidated Storage
Data Consolidation: It’s All about the Data
NAS Devices
SAN-Based Data Consolidation
Guidelines for Implementing Enterprise Data Consolidation
Conclusion
DATA CONSOLIDATION
Enterprise Data Consolidation: Guidelines for Leveraging the Benefits of Consolidated Data
This white paper discusses the differences between storage consolidation and data consolidation. By consolidating data, organizations can realize significant benefits in storage management and data availability while making better use of storage capacity. This paper explains different approaches to data consolidation, including NAS devices and SAN file systems, and offers guidelines for planning and implementing this new approach to data management.
Introduction
The last two decades have seen processing and data management distributed throughout the enterprise. Now we’re reversing that trend, particularly for storage. Consolidation is the new IT mandate, and storage consolidation is a big part of it. Storage consolidation is driven by the cost and complexity of managing growing amounts of critical storage, and by the recognition of data as a corporate asset that must be available and accessible.
There is an important, often overlooked distinction between consolidating storage and consolidating the data itself. Storage consolidation addresses some of the data management cost problems facing IT organizations today. But actual data consolidation is essential to meet service level requirements for increasingly data-dependent organizations without creating debilitating administrative overhead.
Until recently, data consolidation was difficult to achieve and was limited to file servers and NAS devices. Now, a new generation of distributed, heterogeneous SAN file systems is making data consolidation both possible and strategically important. For example, ADIC’s StorNext Management Suite offers a SAN file system that provides scalable data consolidation in heterogeneous environments, reducing the cost and effort of managing enterprise data.
Why the Push for Consolidation?
In the mainframe computing model, all computing and storage is consolidated. The advent of powerful, smaller processors helped fuel the move toward the client/server computing model. This approach, in which multiple interconnected programs are distributed across different CPUs, proved in many ways more flexible and scalable than mainframes.
Today, 20 years after the general introduction of client/server computing, businesses rely on computing in new and strategic ways. As the need for data and processing capacity grew, this model generated islands of data that were critical to business operations. Managing the distributed data has been especially difficult in the areas of data availability and data protection.
From a financial perspective, IT departments are spending an increasing share of their budgets on storage, even when the overall budget is flat. Managing distributed storage is costly and inefficient. Consolidation is being driven by escalating demands for storage, high service level requirements, and the administrative costs of meeting those demands.
Most data still resides in a traditional, direct-attached storage model, distributed throughout the enterprise. This is often true even for applications in which multiple computers serve the same purpose with the same data. Although the direct-attached model works well for smaller environments, it has serious limitations as an enterprise strategy for managing data:
• Multiple management points make it difficult to administer storage and apply consistent policies.
• Different storage platforms and configurations are served by different management utilities, further adding to the cost of managing the data.
• Data is duplicated throughout the infrastructure, leading to versioning, transfer, and synchronization problems.
Figure 1. A typical direct-attached storage environment, creating and maintaining multiple copies of data (clients on a local area network served by a file server farm)
In short, the direct-attached storage model distributes administration as well as data throughout the enterprise. This makes it difficult and labor-intensive for organizations to ensure adequate protection for their critical data. Under this model, administrative capacity is often the bottleneck for growth.
Reducing the cost of storage administration is the big payoff for consolidating storage. According to IDC, storage administrators can manage up to nine times more data in a consolidated SAN environment than in a direct-attached model. Even after accounting for the potentially higher cost of network administrators in this environment, management costs are reduced by 33%. [Source: IDC: Leveraging Networks for Storage Consolidation, October 2001.]
The cost of storage itself is a secondary but still significant factor. The direct-attached model is extremely wasteful. Because running out of storage can stop a server cold, most administrators over-provision storage, trading unused capacity for the promise of availability. Multiply this over-provisioning across hundreds or thousands of servers and the waste is enormous. In IT departments facing fixed budgets and growing service-level demands, this waste drains money from other parts of the IT budget.
Although the factors driving consolidation are strong, hard experience has taught us that new technology paradigms don’t always deliver a quick return on investment. So, rather than taking an ad hoc, opportunistic approach to consolidating storage and data, it pays to understand the basis for these technologies and analyze what you hope to achieve. First, we’ll compare the general concept of storage consolidation with data consolidation. Then, we’ll review and contrast two different strategies for consolidating data: NAS devices and SAN file systems.
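A small arithmetic sketch makes the over-provisioning point concrete. It assumes made-up usage figures for three servers and a flat 20% safety margin; none of these numbers come from IDC or from this paper. Sized server by server, each machine must cover its own peak. Sized as one shared pool, capacity only has to cover the peak of the combined demand, which can never exceed the sum of the individual peaks and is smaller whenever servers peak at different times.

```python
# Hypothetical usage samples (GB) for three servers over four periods; illustrative only.
usage = {
    "server_a": [40, 70, 50, 45],
    "server_b": [60, 35, 55, 80],
    "server_c": [30, 45, 75, 40],
}


def per_server_capacity(usage, headroom_pct=20):
    """Dedicated storage: size each server for its own peak, then add headroom."""
    total_peaks = sum(max(samples) for samples in usage.values())
    return total_peaks * (100 + headroom_pct) / 100


def pooled_capacity(usage, headroom_pct=20):
    """Shared pool: size once for the peak of the combined demand, then add headroom."""
    combined = [sum(period) for period in zip(*usage.values())]
    return max(combined) * (100 + headroom_pct) / 100


if __name__ == "__main__":
    dedicated = per_server_capacity(usage)
    pooled = pooled_capacity(usage)
    print(f"dedicated: {dedicated} GB, pooled: {pooled} GB, saved: {dedicated - pooled} GB")
```

With these illustrative figures, dedicated provisioning comes to 270 GB and the shared pool to 216 GB. This kind of pooling only becomes fully available once the data itself is consolidated; partitioned SAN storage still sizes each server's slice separately.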
Consolidated Storage
Storage consolidation is a first step toward reducing the cost and complexity of storage management, and it’s as far as many organizations have gotten to date. Organizations have been cautiously adopting storage area networks (SANs) in hopes of achieving the benefits of consolidated storage.
Most SANs primarily serve to consolidate access by multiple servers to storage devices, such as arrays or high-end tape devices. Network administrators then use storage virtualization software to make the consolidated storage behave as a single virtual disk. The drives are addressed through Logical Unit Numbers (LUNs). Several companies provide LUN virtualization software to ease management of the host-to-device connections, but these techniques can be cumbersome to implement. To prevent different servers from overwriting the same data, LUN masking and zoning are used to help ensure that only one server can access any specific storage resource. Although the storage is consolidated physically, it is still logically tied to a specific server.
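The bookkeeping behind LUN masking can be sketched as a lookup table, assuming a simplified model in which each LUN lists the hosts allowed to see it. The table contents and host names are hypothetical; real arrays and switches enforce masking and zoning in their own firmware and management tools, and identify hosts by Fibre Channel World Wide Names rather than plain strings. The sketch shows why storage stays tied to a server: the safety rule is one owner per LUN.

```python
# Hypothetical masking table: LUN number -> hosts allowed to see it.
# Plain strings stand in for the Fibre Channel WWNs a real array would use.
lun_mask = {
    0: {"host_a"},
    1: {"host_b"},
    2: {"host_c"},
}


def can_access(host, lun, mask=lun_mask):
    """True if the masking table exposes this LUN to this host."""
    return host in mask.get(lun, set())


def visible_luns(host, mask=lun_mask):
    """LUNs a given host would discover on the SAN."""
    return sorted(lun for lun, hosts in mask.items() if host in hosts)


def is_exclusive(mask=lun_mask):
    """Check the one-owner rule: each LUN is exposed to exactly one host."""
    return all(len(hosts) == 1 for hosts in mask.values())


if __name__ == "__main__":
    print(visible_luns("host_b"))   # [1]
    print(can_access("host_a", 1))  # False: host_a cannot see host_b's LUN
    print(is_exclusive())           # True
```

Relaxing the table so that two hosts share a LUN makes `is_exclusive()` fail, and without a shared file system coordinating access, two hosts writing the same LUN risk overwriting each other's data.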
The physically consolidated storage model, shown in Figure 2, has several advantages over traditional, direct-attached storage:
• You have fewer management points for managing storage resources. There are fewer independent devices to back up, and storage devices sit in a centralized location.
• You can allocate storage more efficiently, hosting multiple servers on one large array instead of an array for each. Or, you can use JBOD storage on the SAN and add new disks to the virtual volumes as storage needs increase.
Figure 2. The storage is consolidated, but the data is not; storage is partitioned between servers (SGI, Windows NT or 2000, Sun, and Linux servers each own a separate partition, A through D, of the shared SAN storage)
However, the physically consolidated storage model still has some inefficiencies. As Figure 2 shows, the link between server and data has not been broken. The disk devices on the SAN are still sliced into individual partitions for each server, and data cannot be shared between them; each operating system still “owns” its own slice of disk. If Server B needs data from Server A, Server A must transfer the data over the LAN to Server B. In workflow or other data-sharing environments, this generates a significant amount of network traffic.
Consolidating only physical storage has several downsides. First, consider the common but essential task of backup. Although the storage is consolidated, regular backups still occur on a server-by-server basis, with backup data typically traveling through the LAN to the backup server.
Second, administrators still face data synchronization and versioning issues. Capacity planning still occurs server by server, and because administrators must provision adequate storage for each server, storage remains underutilized.
Figure 3. Backups from a physically shared array still travel through each host server (Servers A, B, and C each own a slice of the SAN storage; backup data crosses the LAN to a backup server attached to a tape library)
Finally, this model does nothing to address the problem of data proliferation. Consider the direct-attached example again, and assume that each server uses mirrored storage to ensure data availability: for three servers, this results in six copies of the same files. In larger server farms, the redundancy grows with every server added.
This is the state many organizations find themselves in today. Having achieved the initial benefits of storage consolidation in early SANs, they struggle with storage management tasks such as backup and capacity planning, and still allocate a significant (and increasing) part of the IT budget to storage. Although SANs have proven valuable, most have failed to deliver the dramatic, paradigm-shifting benefits promised by a complete overhaul of the system architecture. What’s required is a full shift to data consolidation.
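The data proliferation arithmetic above generalizes simply. This sketch assumes two-way mirroring (`mirror_factor=2`) and, for the consolidated case, that the single shared copy also sits on mirrored storage as one way of making it highly available; both are assumptions for illustration, and the function names are hypothetical.

```python
def stored_copies(servers, mirror_factor=2, consolidated=False):
    """Count physical copies of one shared data set.

    Direct-attached: every server keeps its own copy, and mirroring multiplies it.
    Consolidated: one shared copy, kept on mirrored (highly available) storage.
    """
    logical_copies = 1 if consolidated else servers
    return logical_copies * mirror_factor


def stored_gb(dataset_gb, servers, mirror_factor=2, consolidated=False):
    """Raw capacity consumed by those copies."""
    return dataset_gb * stored_copies(servers, mirror_factor, consolidated)


if __name__ == "__main__":
    for n in (3, 10, 100):
        print(n, "servers:", stored_copies(n), "copies direct-attached,",
              stored_copies(n, consolidated=True), "consolidated")
```

For three servers it reproduces the six copies described above, against two for a consolidated, mirrored copy; at 100 servers the gap is 200 copies versus 2.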
Data Consolidation: It’s All about the Data
Once you’ve consolidated storage in a SAN, the groundwork is in place to consolidate the data itself. Logical consolidation is primarily a matter of software. The file system intelligence moves from the individual server level to the SAN level, enabling multiple computers to use shared storage as a single logical entity.
This model differs from the physically consolidated model in several important ways. First and foremost, no single server “owns” the data. Instead, a distributed, independent file system owns the disk and manages data access among the various systems. Multiple clients, which may run on different operating system (OS) platforms, can share concurrent access to files without compromising data integrity.
The benefits of data consolidation are many:
• Data availability improves; if one server is unavailable, its data remains available to other servers.
• Productivity improves in workflow environments; no time is spent transferring data between servers.
• Network bottlenecks caused by data transfers are eliminated.
• Redundant storage is eliminated; instead of four copies of a web page, for example, you store a single copy on highly available storage.
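One way to picture a file system, rather than any single server, arbitrating concurrent access is a shared-read/exclusive-write table keyed by file. The class below is a hypothetical, single-process toy; a real SAN file system coordinates this across hosts over the network and must also handle failures and caching. It is not a description of how StorNext does it.

```python
class FileAccessManager:
    """Toy arbiter owned by the file system: many readers or one writer per file."""

    def __init__(self):
        self._readers = {}  # path -> set of client ids currently reading
        self._writer = {}   # path -> client id currently writing

    def open_read(self, client, path):
        # Refuse only if a different client holds the file for writing.
        if path in self._writer and self._writer[path] != client:
            return False
        self._readers.setdefault(path, set()).add(client)
        return True

    def open_write(self, client, path):
        # Grant only if no other client is reading or writing this file.
        others_reading = self._readers.get(path, set()) - {client}
        current = self._writer.get(path)
        if others_reading or (current is not None and current != client):
            return False
        self._writer[path] = client
        return True

    def close(self, client, path):
        self._readers.get(path, set()).discard(client)
        if self._writer.get(path) == client:
            del self._writer[path]


if __name__ == "__main__":
    fs = FileAccessManager()
    print(fs.open_read("sun_1", "/proj/site.dwg"))    # True
    print(fs.open_read("linux_1", "/proj/site.dwg"))  # True: readers share the file
    print(fs.open_write("win_1", "/proj/site.dwg"))   # False: readers are active
```

Readers on different platforms can hold the same file at once, while a writer is refused until they close it; no server owns the file, the arbiter does.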
Figure 4. Using data consolidation, all servers share the same pool of data (SGI, Sun, Windows NT or 2000, and Linux servers access a single shared pool in which their files are intermixed)
But the big payoff, again, is in the cost of management. You no longer have to keep separate disk partitions for each server. Provisioning storage is much easier: you can grow and resize file systems according to their actual usage, rather than allocating capacity among multiple machines. Capacity planning is greatly simplified, and there is less data to back up or replicate. With simplified management, each storage administrator can manage a much larger pool of storage, providing administrative scalability.
There are two general approaches to consolidating data in file systems: network file servers or NAS, and distributed file systems on a SAN. The two approaches have significant architectural and functional differences, described next.
NAS Devices
Network file servers, using protocols such as NFS and CIFS, have been around for years, providing consolidated file services over local networks. Network Attached Storage (NAS) devices offer the same functionality in specialized systems optimized for supporting network file systems. When a computer requests data from the NAS, the NAS accesses its storage array, obtains the necessary disk blocks, and converts the data into a generic file format. It then sends the data across the LAN to the requesting computer, where it is converted to the local file format.
Figure 5. Using NAS, all data travels through the NAS and over the local network (servers reach the NAS appliance and its disk over the LAN)
A NAS appliance makes it possible to add file storage to a network almost instantly, and it works well in many workgroup environments. But as your needs grow, NAS storage has several limitations:
• Managing large files: The NAS approach works well when serving small (under 1 MB) files in high-transaction environments. But with larger files, the conversion to file format introduces significant latency. The local network may introduce latency as well, depending on traffic. Sometimes users keep data that requires high performance locally instead of on the NAS, compromising data consolidation.
• Scaling capacity: Some NAS vendors suggest that files larger than 10 MB be distributed across multiple NAS devices. And often the best way to add significantly more storage is to put yet another NAS device on the network. This only serves to further distribute, not consolidate, data. Before you know it, you’re managing multiple NAS servers in a model that looks very similar to the original direct-attached storage model.
• Protecting NAS data: Few NAS devices offer robust back-end processes for moving data onto tape or other devices. Most often, backups must be processed through the NAS box itself and then written to a backup server, creating a serious performance problem for around-the-clock environments. It is also often difficult to integrate NAS backups into enterprise-wide data protection policies: because of the “black box appliance” nature of the NAS, the device’s own island of data is isolated from enterprise policies.
SAN-Based Data Consolidation
The other approach to consolidating data is to create a file system that works on the SAN, enabling multiple computers to share access to the same set of files. This approach provides the performance of direct-attached storage for all file types while enabling data sharing among different platforms. Figure 6 illustrates how SAN file systems work:
Figure 6. Meta data travels the LAN, while data access occurs over the SAN (servers and a meta data server share the LAN; servers reach disk through a Fibre Channel fabric forming a high-speed, low-latency SAN)
Data access requests go through the meta data server over the local area network. When an application requests read access to a shared file, it sends the request to the meta data server, which:
• Determines if the file is available for the request;
• Determines if the application or user is authorized to access the file;
• Directs the client to the file location on the SAN.
The client then reads the data directly from the SAN, without further interaction with the meta data server. The meta data exchange that precedes the data transfer is short and fast. This “out-of-band” approach avoids potential performance bottlenecks, particularly for large files. It contrasts sharply with the NAS model, in which all requests, meta data and data alike, go through the NAS device.
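The three meta data checks and the direct read that follows can be sketched as below. Everything here is a hypothetical model: SAN volumes are plain bytes objects with one byte per block, the extent layout and user lists are made up, and `Extent`, `FileUnavailable`, `MetadataServer`, and `read_file` are invented names (only `FileNotFoundError` and `PermissionError` are Python built-ins). The point is the split of duties: the meta data server returns only locations, and the client pulls the data itself.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Where one piece of a file lives on shared SAN storage (hypothetical layout)."""
    lun: int     # shared volume that holds the blocks
    offset: int  # first block on that volume
    length: int  # number of blocks


class FileUnavailable(Exception):
    """Raised when the file exists but cannot be opened right now."""


class MetadataServer:
    """Answers 'where is the file?' over the LAN; never touches file contents."""

    def __init__(self, layout, readers, busy=()):
        self.layout = layout    # path -> list of Extent
        self.readers = readers  # path -> set of users allowed to read
        self.busy = set(busy)   # paths currently locked by another client

    def request_read(self, user, path):
        if path not in self.layout:
            raise FileNotFoundError(path)
        if path in self.busy:                          # 1. is the file available?
            raise FileUnavailable(path)
        if user not in self.readers.get(path, set()):  # 2. is the user authorized?
            raise PermissionError(path)
        return list(self.layout[path])                 # 3. direct the client to the data


def read_file(mds, san, user, path):
    """Client side: one short meta data request, then direct reads from the SAN."""
    extents = mds.request_read(user, path)  # small exchange over the LAN
    data = bytearray()
    for e in extents:                       # bulk data straight from SAN volumes
        data += san[e.lun][e.offset:e.offset + e.length]
    return bytes(data)


if __name__ == "__main__":
    san = {0: b"HELLO-SAN-WORLD", 1: b"0123456789"}
    mds = MetadataServer(
        layout={"/proj/model.dat": [Extent(0, 0, 5), Extent(1, 2, 3)]},
        readers={"/proj/model.dat": {"alice"}},
    )
    print(read_file(mds, san, "alice", "/proj/model.dat"))  # b'HELLO234'
```

Because the meta data server only hands back extents, the volume of data it handles stays small no matter how large the file is, which is the property the out-of-band design relies on.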
A SAN-based file system provides the following services:
• It supports clients on different OS platforms, enabling file sharing among a wide range of systems. This lets organizations choose the right system for the right task, without worrying about data format.
• It potentially allows any SAN client to read or write data without data integrity issues. Often this is managed by a meta data server, which gives clients access to the file meta data. Clients then access the file data directly from the SAN, at Fibre Channel speed.
• File sizes and bandwidth scale independently. Each SAN file system client has direct access to the data at Fibre Channel speeds, and bandwidth can be added through additional Fibre Channel connections.
Simplified data administration is a major benefit of data consolidation, and active storage management capabilities can extend this benefit. Consolidating data into functional file systems makes it easier to assign specific storage characteristics, such as bandwidth and protection, to specific classes of data.
The shared file system has detailed knowledge about the data it controls: creation date, last access, creating application, permissions, and so on. Active storage management systems can leverage this knowledge to track data from inception to archival. For example, you can ensure that critical data is replicated to another facility and stored on highly available storage, while migrating little-used or older data to high-capacity tape devices.
ADIC refers to this concept as Total Data Life Management™, which is the foundation of the StorNext Management Suite software. ADIC went beyond its SAN file system to create integrated software that includes policy-driven, automated storage management, offering administrative efficiencies beyond those inherent in the consolidated data model. This automated data management and protection magnifies the administrative cost savings of consolidating data while ensuring consistent data protection processes.
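A policy engine of the kind described above can be reduced to rules evaluated over file metadata. This sketch assumes just two illustrative rules: a critical flag that triggers off-site replication, and a 90-day idle threshold that triggers migration to tape. The `FileInfo` fields, the action names, and the threshold are hypothetical and are not the StorNext policy language.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FileInfo:
    """The kind of metadata a shared file system already tracks (simplified)."""
    path: str
    last_access: date
    critical: bool


def plan_actions(files, today, archive_after_days=90):
    """Return (path, action) pairs from two illustrative policies.

    - Critical data is replicated to another site and stays on disk.
    - Non-critical data idle longer than archive_after_days moves to tape.
    """
    cutoff = timedelta(days=archive_after_days)
    actions = []
    for f in files:
        if f.critical:
            actions.append((f.path, "replicate_offsite"))
        elif today - f.last_access > cutoff:
            actions.append((f.path, "migrate_to_tape"))
    return actions


if __name__ == "__main__":
    files = [
        FileInfo("/db/ledger", date(2003, 1, 30), critical=True),
        FileInfo("/render/frame_0001", date(2002, 9, 1), critical=False),
        FileInfo("/render/frame_0950", date(2003, 1, 29), critical=False),
    ]
    for path, action in plan_actions(files, today=date(2003, 2, 1)):
        print(path, action)
```

Run against the sample list as of February 1, 2003, it schedules the ledger for replication and the five-month-old frame for tape, and leaves the recent frame on disk.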
Figure 7. Integrated solution: clients have shared access to policy-managed, consolidated SAN storage (SGI, Linux, Sun, and Windows NT or 2000 clients share RAID, JBOD, and an automated tape library over the storage area network under policy management)
Guidelines for Implementing Enterprise Data Consolidation
The maxim “think globally, act locally” applies well to consolidating enterprise data. Few organizations are in a position to make wholesale infrastructure changes; most will phase in innovations. However, in selecting solutions for data consolidation, you need to plan for the long-term data storage needs of your organization. With that in mind, here are some general guidelines for data consolidation:
• Take an integrated approach to data consolidation. Data consolidation requires servers, network infrastructure, disk storage, tape storage, and enabling software. A SAN alone is not enough; you need the right software to leverage it fully. Although plug-and-play SANs are the wave of the future, today you should look for vendors that can help you build a proven, integrated SAN file system solution.
• Choose platform-independent solutions. Avoid locking into one hardware vendor. Throughout your enterprise, computer platforms are chosen for specific and important reasons. One may have better floating-point performance, while another may handle large-scale parallel processing particularly well. Cost is another significant consideration in selecting platforms. The distributed file system must support access from a wide range of platforms if it is to offer true enterprise data consolidation.
• Integrate data protection into the solution. Your goal is to reduce the cost of managing storage without compromising data protection. Make sure that data protection processes such as backup, vaulting, and replication are integrated with your SAN file system.
• Leverage opportunities for high availability. The shared file system itself enhances data availability by providing multiple paths to data. With many paths to shared storage, clustering and failover become much easier to implement.
• Build for growth. Be sure that your consolidated solution can grow in both bandwidth and capacity to handle present and future needs.
• Start automating data management. Leverage the file system’s knowledge about the data and its usage in active storage management systems that track data from inception through archival.
In selecting projects for SAN file systems, you’ll want to start with those that deliver the greatest and most visible ROI. You may find that these projects address problems other than storage administration costs, such as escalating hardware costs, productivity, or network bottlenecks. Addressing these problems with a SAN file system will give you the metrics you need to quantify and justify the administrative cost reduction case.
Here are some suggestions for choosing initial data consolidation projects:
• Replace multiple file servers with a SAN file system. Users are already accustomed to sharing files on file servers; with a SAN file system you can consolidate several servers into a shared pool of storage managed by a single SAN file system. Users will see an immediate performance improvement in shared storage, while administrative overhead is reduced. Available network bandwidth should also improve as file transfers are removed from the LAN.
• Identify workflow applications, particularly data-intensive ones. In many cases, the shared file system pays for itself quickly by speeding workflow processes and reducing redundant storage. It can also free up a significant amount of local area network bandwidth, as large file transfers are removed from the network while servers access SAN data directly. Good candidates for data consolidation include CAD/CAM applications and geospatial processing, where multiple people work with large data files.
• Identify applications in which multiple servers provide the same basic functionality with the same data. These are great candidates for early data consolidation. Moving to a shared file system for data that is primarily read-only provides immediate and significant storage savings, while reducing content distribution and synchronization efforts. The storage cost savings alone can recoup the software investment rapidly.
Conclusion
According to IDC, “The trend toward networked storage will become the disk data storage paradigm of the future precisely because of the business advantages.” [Source: IDC: Leveraging Networks for Storage Consolidation, October 2001.] No one changes their IT infrastructure without good reason to believe the effort will pay off, either in immediate savings or by enabling future growth and technology.
There are different ways to implement consolidation. If you focus on storage alone, you’ll limit the return on your networked storage investment. To achieve all the potential benefits of consolidation, you must consolidate the data itself. While NAS devices offer some consolidation benefits, SAN-based file systems offer the most scalable approach to enterprise data consolidation. They also support multiple operating systems and can be extended to include automated data protection.
ADIC’s StorNext Management Suite includes a SAN-based file system that enables concurrent file sharing among heterogeneous systems on a SAN. Integrated storage management capabilities track data and apply data protection and management policies throughout its life. By consolidating data and automating data management, you can realize the promise of consolidation with significant reductions in cost and administration, and improve service levels for data.
ADIC Global Headquarters
11431 Willows Road NE
P.O. Box 97057
Redmond, WA 98073-9757 USA
Toll-Free: 800.336.1233
Phone: 425.881.8004
Fax: 425.881.2296
ADIC and StorNext are registered trademarks, and Total Data Life Management is a trademark of Advanced Digital Information Corporation. © 2003 Advanced Digital Information Corporation.
WPEDC 0203