Server Consolidation with Solaris Containers and ZFS
Davide Restivo
Creative Commons Attribution ShareAlike
Agenda
● Server consolidation overview
● Traditional filesystems and RAID vs. ZFS
● ZFS overview
● ZFS usage examples
● Containers overview
● A simple zone configuration
● An advanced example of zone configuration
● How to add a dataset to a zone
● Migrating a zone from one server to another
● Q&A
Server Consolidation (SC) Overview
● Reduce hardware costs by running multiple services on the same system
  – Better hardware utilization
  – Reduced infrastructure cost
  – Lower administration cost
● Security requirements
  – Resource control
  – Security isolation
  – Failure containment
Possible Approaches to SC
● Technologies enabling consolidation:
  – Hard partitions
  – Virtual machines
  – OS virtualization
  – Resource management
Hard Partitions
Examples: Sun Dynamic System Domains, IBM Scalable Systems Manager
[Diagram: two fully independent stacks side by side, each with its own hardware, operating system, and application environment: HW 0 / OS 0 / ENV 0 / APP 0 and HW 1 / OS 1 / ENV 1 / APP 1]
Very expensive!!!
Virtual Machines
Examples: Xen, VMware
[Diagram: a single hardware layer hosts two operating systems (OS 0, OS 1), each with its own application environment and application: ENV 0 / APP 0 and ENV 1 / APP 1]
OS Virtualization
Examples: Solaris Zones, Unix chroots
[Diagram: a single hardware and operating system layer hosts two application environments, each with its own application: ENV 0 / APP 0 and ENV 1 / APP 1]
Resource Management
Examples: SRM (Projects)
[Diagram: a single hardware, operating system, and environment stack runs two applications side by side: APP 0 and APP 1]
ZFS Overview
● ZFS is a new filesystem from Sun, available since Solaris 10 (06/06)
● ZFS was designed by Jeff Bonwick
● ZFS is an open source project (CDDL)
● ZFS eliminates the need for a volume manager
● ZFS is a 128-bit filesystem (practically unlimited capacity)
● 2^48 — number of snapshots
● 2^48 — number of files in any individual filesystem
● ZFS automatically repairs corrupted data, in redundant configurations only (see the scrub sketch below)
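To exercise the self-healing behaviour by hand, a scrub forces ZFS to read and verify every block, repairing bad copies from redundant ones where possible. A minimal sketch, assuming a redundant pool named pool1 already exists (the pool name is illustrative):

# zpool scrub pool1
# zpool status pool1

The "scrub:" line of the status output reports progress and any errors that were repaired.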
ZFS Overview
● 16 exabytes (2^64 bytes) — maximum size of a filesystem
● 16 exabytes (2^64 bytes) — maximum size of a single file
● Copy-on-write model
● Transactional object model
● Portable between platforms (x86 from/to SPARC)
● Simple administration
● All data and metadata are checksummed
P.S.: The name originally stood for "Zettabyte File System" but is now a pseudo-acronym.
ZFS vs. Traditional Filesystems
● Traditional filesystem
  – Resides on a single device
  – Requires a volume manager or RAID controller to use more than one device
  – Not portable between platforms
  – Offline fsck utility
  – Difficult to grow/shrink
● ZFS
  – No partitions to manage
  – Built on top of virtual storage pools called zpools
  – Filesystems grow/shrink automatically
  – No need to edit /etc/vfstab (filesystems are mounted automatically)
  – Online scrubbing (data stays available to users)
  – Portable between platforms (e.g. x86 to/from SPARC)
ZFS vs. RAID
● RAID 5
  – Suffers from the "write hole" (loss of power between the data and parity writes)
  – Workaround for the write hole: NVRAM (...expensive!!!)
  – Expensive controllers
● ZFS
  – Everything is copy-on-write (live data is never modified in place)
  – Everything is transactional (related changes succeed or fail as a whole; in other words, atomic)
  – No extra hardware!!!
ZFS Pools
● Eliminates the notion of a volume
● A pool is constructed from virtual devices (vdevs)
● A vdev describes a single device or a collection of devices organized according to certain performance and fault characteristics
● A whole disk or a slice can be used as a vdev in a storage pool (slices are not portable)
● The storage capacity of all vdevs is available to all of the file systems in the zpool
● The zpool command configures ZFS storage pools
● All datasets in a storage pool share the same space
Vdevs (Virtual DEVices)
● Vdevs cannot be nested (no mirror of mirrors, no raidz of mirrors, etc.)
● The following vdevs are supported:
  – disk (/dev/dsk/c0t0d0, /dev/dsk/c0t0d0s2, c0t0d0)
  – slice
  – file (for testing purposes only; see the sketch below)
  – mirror (minimum 2 disks)
  – raidz (alias raidz1), raidz2
  – spare
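Since file vdevs exist for testing, a whole pool can be sketched without any spare disks. A hedged example, assuming /var/tmp has roughly 200 MB free (pool and file names are illustrative):

# mkfile 64m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3
# zpool create testpool raidz /var/tmp/d1 /var/tmp/d2 /var/tmp/d3
# zpool status testpool
# zpool destroy testpool

This builds a raidz vdev out of three files, which is handy for experimenting with the commands on the next slides before touching real disks.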
Manage ZFS Pools
Create a pool with a single root vdev (a mirror)
# zpool create pool1 mirror c0t0d0 c1t0d0

Create a pool with two root vdevs (each a mirror)
# zpool create pool1 mirror c0t0d0 c1t0d0 mirror c0t0d1 c1t0d1

Create a pool with a single root vdev (raidz)
# zpool create pool1 raidz c0t0d0 c1t0d0 c2t0d0

Add more space to an existing pool
# zpool add pool1 mirror c0t0d1 c1t0d2
Manage ZFS Pools
Create a single root vdev (a mirror) with one hot spare disk
# zpool create pool1 mirror c0t0d0 c1t0d0 spare c2t0d0

Create a single root vdev (a mirror) and then add a hot spare disk
# zpool create pool1 mirror c0t0d0 c1t0d0
# zpool add pool1 spare c2t0d0
(to remove it again: # zpool remove pool1 c2t0d0)

Multiple pools can share hot spare devices
# zpool create pool1 mirror c1t1d0 c2t1d0 spare c1t2d0 c2t2d0
# zpool create pool2 mirror c3t1d0 c4t1d0 spare c1t2d0 c2t2d0

Destroy a pool
# zpool destroy pool1
List all available zpools and show info for a particular zpool
# zpool list
NAME    SIZE   USED   AVAIL   CAP  HEALTH  ALTROOT
zones   68G    665M   67.4G   0%   ONLINE  -

# zpool status zones
  pool: zones
 state: ONLINE
 scrub: scrub completed with 0 errors on Wed Dec 27 15:13:36 2006
config:

        NAME        STATE   READ WRITE CKSUM
        zones       ONLINE     0     0     0
          mirror    ONLINE     0     0     0
            c1t2d0  ONLINE     0     0     0
            c1t3d0  ONLINE     0     0     0

errors: No known data errors
IOStat
# zpool iostat zones 1 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zones        665M  67.4G      0      0     84  3.56K
zones        665M  67.4G      0      0      0      0
zones        665M  67.4G      0      0      0      0
zones        665M  67.4G      0      0      0      0
zones        665M  67.4G      0      0      0      0
zones        665M  67.4G      0      0      0      0
^C

The -v option gives information on each root vdev.
Migrating a ZFS Storage Pool
In certain situations you might need to move a storage pool between two machines. ZFS enables you to export the pool from one machine and import it on the destination machine. Migration consists of two steps.

1 - Export a ZFS storage pool. The export:
● Attempts to unmount any mounted file systems
● Flushes any unwritten data to disk
● Writes metadata indicating that the export completed successfully
● Removes knowledge of the pool from the system
● Note: pay attention to volumes

# zpool export -f pool1
2 - Import a ZFS Storage Pool
# zpool import
  pool: pool1
    id: 3778921145927357706
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        pool1       ONLINE
          mirror    ONLINE
            c1t0d0  ONLINE
            c1t1d0  ONLINE

# zpool import 3778921145927357706 new_pool
...if something goes wrong :(
1 - Recovering Destroyed ZFS Storage Pools
# zpool create pool1 mirror c0t0d0 c1t0d0
# zpool destroy pool1
# zpool import -D
  pool: pool1
    id: 3778921145927357706
 state: ONLINE (DESTROYED)
action: The pool can be imported using its name or numeric identifier.
        The pool was destroyed, but can be imported using the '-Df' flags.
config:

        pool1       ONLINE
          mirror    ONLINE
            c1t0d0  ONLINE
            c1t1d0  ONLINE

# zpool import -Df pool1
< output omitted >
2 - Repairing a Missing Device
If a device cannot be opened, it displays as UNAVAIL in the zpool status output. Let's find pools with problems:
# zpool status -x
  pool: pool1
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Thu Aug 31 11:45:59 MDT 2006
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c0t1d0  UNAVAIL      0     0     0  cannot open
            c1t1d0  ONLINE       0     0     0
Replace the broken disk, then bring the new disk online:
# zpool online pool1 c0t1d0

Confirm that the pool with the replaced device is healthy:
# zpool status -x pool1
pool 'pool1' is healthy
3 - Replacing a Broken Device with Another (on a different channel)
# zpool replace pool1 c1t0d0 c2t0d0
# zpool status pool1
  pool: pool1
 state: DEGRADED
reason: One or more devices is being resilvered.
action: Wait for the resilvering process to complete.
   see: http://www.sun.com/msg/ZFS-XXXX-08
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        pool1          DEGRADED     0     0     0
          mirror       DEGRADED     0     0     0
            replacing  DEGRADED     0     0     0  52% resilvered
              c1t0d0   ONLINE       0     0     0
              c2t0d0   ONLINE       0     0     0
            c1t1d0     ONLINE       0     0     0

zpool replace replaces old_device with new_device. This is equivalent to attaching new_device, waiting for it to resilver, and then detaching old_device.
4 - Activating and Deactivating Hot Spares
A hot spare can be activated in the following ways:
● Manually, using the zpool replace command
● Automatically: when a fault is received, an FMA agent examines the pool to see if it has any available hot spares. If so, it replaces the faulted device with an available spare.
Manual spare activation
# zpool create pool1 mirror c1t2d0 c2t1d0 spare c1t3d0 c2t3d0
# zpool status pool1
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
        spares
          c1t3d0    AVAIL
          c2t3d0    AVAIL

errors: No known data errors
# zpool replace pool1 c2t1d0 c2t3d0
# zpool status pool1
  pool: pool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Fri Jun 2 13:44:40 2006
config:

        NAME          STATE     READ WRITE CKSUM
        pool1         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     0
            spare     ONLINE       0     0     0
              c2t1d0  ONLINE       0     0     0
              c2t3d0  ONLINE       0     0     0
        spares
          c1t3d0      AVAIL
          c2t3d0      INUSE     currently in use

errors: No known data errors
After the faulted device is replaced and the resilver is complete, use the zpool detach command to return the hot spare to the spare set.
# zpool detach pool1 c2t3d0
# zpool status pool1
  pool: pool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Fri Jun 2 13:44:40 2006
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
        spares
          c1t3d0    AVAIL
          c2t3d0    AVAIL

errors: No known data errors

An in-progress spare replacement can be cancelled by detaching the hot spare. If the original faulted device is detached instead, the hot spare assumes its place in the configuration and is removed from the spare list of all active pools.
...for all other problems, start to pray and search for a valid backup!!!
Managing ZFS Datasets
The zfs command configures ZFS datasets. A storage pool is also the root of the ZFS file system hierarchy; it can be accessed as a file system, including mounting, unmounting, taking snapshots, etc. A dataset can be one of the following (see the sketch after this list):
– file system
– volume (simulates a block device)
– snapshot (read-only)
– clone
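A quick sketch of creating each dataset type, assuming a pool named pool1 exists (all names are illustrative):

# zfs create pool1/fs1                            (file system)
# zfs create -V 1g pool1/vol1                     (volume; appears under /dev/zvol/dsk/pool1/vol1)
# zfs snapshot pool1/fs1@demo                     (read-only snapshot)
# zfs clone pool1/fs1@demo pool1/fs1_clone        (writable clone of the snapshot)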
Managing ZFS Datasets
Create a dataset
# zfs create pool1/data1

● Directories are created and destroyed as needed.
● The default mountpoint for pool1/data1 is /pool1/data1.
● The mountpoint property is inherited.
● A file system mountpoint property of "none" prevents the file system from being mounted.
● If a file system's mountpoint is set to "legacy", the ZFS file system can be managed with traditional tools (mount, umount, /etc/vfstab).

Change the default mount point
# zfs set mountpoint=/export/data1 pool1/data1
Get the list of ZFS datasets
# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
pool1                       77K   354M  24.5K  /pool1
zones                     24.0G  42.9G  28.5K  /zones
zones/iacc1prd            8.00G  42.9G  24.5K  /zones/iacc1prd
zones/iacc1prd/opt_iona    614M  4.40G   614M  /opt/iona

Automatically NFS-export a dataset
# zfs set sharenfs=rw pool1/data1

Set compression
# zfs set compression=on pool1/data1

Set a quota
# zfs set quota=10g pool1/data1
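To confirm that the settings took effect, zfs get also accepts a comma-separated list of properties. A small sketch, reusing the dataset from above:

# zfs get sharenfs,compression,quota pool1/data1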
Managing ZFS Datasets
Set a reservation
# zfs set reservation=2g pool1/data1

Get ZFS properties
# zfs get all pool1
NAME   PROPERTY       VALUE                  SOURCE
pool1  type           filesystem             -
pool1  creation       Wed Jan 10 11:21 2007  -
pool1  used           77K                    -
pool1  available      354M                   -
pool1  referenced     24.5K                  -
pool1  compressratio  1.00x                  -
pool1  mounted        yes                    -
pool1  quota          none                   default
pool1  reservation    none                   default
pool1  recordsize     128K                   default
pool1  mountpoint     /pool1                 default
pool1  sharenfs       off                    default
pool1  checksum       on                     default
pool1  compression    off                    default
pool1  atime          on                     default
pool1  devices        on                     default
pool1  exec           on                     default
pool1  setuid         on                     default
pool1  readonly       off                    default
pool1  zoned          off                    default
pool1  snapdir        hidden                 default
pool1  aclmode        groupmask              default
pool1  aclinherit     secure                 default
ZFS Snapshots
Note: snapshots are read-only!

Take a snapshot of pool1/data1
# zfs snapshot pool1/data1@first

Roll back to a snapshot
# zfs rollback pool1/data1@first
Note: this command deletes all intermediate snapshots.

Generate a full backup of a snapshot
# zfs backup pool1/data1@first > /backup/data1_first

Generate an incremental backup between two snapshots
# zfs backup -i pool1/data1@first pool1/data1@second > /backup/second-first
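A minimal end-to-end sketch of the snapshot/rollback cycle, assuming pool1/data1 is mounted at /pool1/data1 (paths are illustrative):

# touch /pool1/data1/important.txt
# zfs snapshot pool1/data1@before
# rm /pool1/data1/important.txt
# zfs rollback pool1/data1@before
# ls /pool1/data1
important.txt

Note: on later ZFS releases the backup commands above are named zfs send and zfs receive.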
ZFS Snapshots
Take a look at file foo.txt in a snapshot
# cat /export/data1/.zfs/snapshot/first/foo.txt
Note: the visibility of the ".zfs" directory can be controlled with the "snapdir" property.

Destroy a ZFS dataset
# zfs destroy pool1/data1

Display all ZFS file systems currently mounted
# zfs mount

Mount/unmount a specific ZFS file system
# zfs mount pool1/data1
# zfs unmount pool1/data1

Mount all available ZFS file systems
# zfs mount -a
Unmount all currently mounted file systems
# zfs unmount -a
ZFS Clones
Note: clones are read-write!

Create a clone from a snapshot
# zfs clone pool1/data1@first pool1/data1first
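A short usage sketch (dataset names are illustrative): the clone is writable immediately, but the snapshot it was created from must outlive it.

# zfs clone pool1/data1@first pool1/data1_test
# touch /pool1/data1_test/scratch.txt
# zfs destroy pool1/data1_test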
Container Overview
A Solaris Container consists of several technologies that work together, in particular:
– ZFS
– Dynamic Resource Pools
– FSS (Fair Share Scheduler)
– Solaris Zones
Dynamic Resource Pools
Dynamic Resource Pools enable CPU resources to be allocated to specific applications.
[Diagram: one application server is bound to Resource Pool 1 (1 CPU); two web servers are bound to Resource Pool 2 (3 CPUs). The two web servers share the 3 CPUs of their pool.]
Fair Share Scheduler (FSS)
The Fair Share Scheduler enables CPU resources to be allocated proportionally to applications: each application gets its own share.
[Diagram: shares allocated to zones: WebServer = 1, Mail Server = 2, Database = 3, AS = 4. For example, the AS zone gets 4 / (1 + 2 + 3 + 4) = 0.4 = 40% of the CPU.]
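Shares can be inspected and changed on a running zone with prctl. A hedged sketch (the zone name is illustrative; the change lasts until the zone reboots):

globalzone# prctl -n zone.cpu-shares -i zone zone1
globalzone# prctl -n zone.cpu-shares -r -v 4 -i zone zone1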
Configure Resource Pools
1 - Enable the resource pools feature
globalzone# pooladm -e

2 - Save the current configuration and activate it
globalzone# pooladm -c

3 - See the default pool configuration
globalzone# pooladm
system my_system
        string  system.comment
        int     system.version 1
        boolean system.bind-default true
        int     system.poold.pid 638

        pool pool_default
                int     pool.sys_id 0
                boolean pool.active true
                boolean pool.default true
                int     pool.importance 1
                string  pool.comment
                pset    pset_default

        pset pset_default
                int     pset.sys_id -1
                boolean pset.default true
                uint    pset.min 1
                uint    pset.max 65536
                string  pset.units population
                uint    pset.load 7
                uint    pset.size 8
                string  pset.comment

                cpu
                        int     cpu.sys_id 1
                        string  cpu.comment
                        string  cpu.status on-line
4 - Create a processor set containing one CPU
globalzone# poolcfg -c 'create pset as-pset (uint pset.min=1; uint pset.max=1)'

5 - Create a resource pool for the processor set
globalzone# poolcfg -c 'create pool as-pool'

6 - Link the pool to the processor set
globalzone# poolcfg -c 'associate pool as-pool (pset as-pset)'

7 - Activate the configuration
globalzone# pooladm -c

8 - Enable the Fair Share Scheduler
globalzone# poolcfg -c 'modify pool pool_default (string pool.scheduler="FSS")'
globalzone# pooladm -c
globalzone# reboot
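To double-check the result, the configuration can be inspected and a process's pool binding queried. A sketch using commands from the same pool administration suite:

globalzone# poolcfg -c info
globalzone# poolbind -q $$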
Solaris Zones
● Solaris Zones provide separate environments on a machine and logically isolate applications.
● Two types of zones: the global zone and non-global zones
● A standard zone automatically shares the following directories (read-only) with the global zone:
  – /usr (ro)
  – /lib (ro)
  – /platform (ro)
  – /sbin (ro)
Create a simple zone
1 - Zone definition
globalzone# zonecfg -z zone1
zone1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:zone1> create
zonecfg:zone1> set zonepath=/export/home/zones/zone1
zonecfg:zone1> add net
zonecfg:zone1:net> set address=10.0.0.2
zonecfg:zone1:net> set physical=eri0
zonecfg:zone1:net> end
zonecfg:zone1> set pool=pool_default
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> exit
2 - Install the zone
globalzone# zoneadm -z zone1 install

3 - Boot the zone
globalzone# zoneadm -z zone1 boot

4 - Log in to the new zone and configure it
globalzone# zlogin -C zone1
Configure it as you would a fresh Solaris installation.
An advanced example
1 - Zone definition
globalzone# zonecfg -z zone2
zone2: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:zone2> create
zonecfg:zone2> set zonepath=/export/home/zones/zone2
zonecfg:zone2> add net
zonecfg:zone2:net> set address=10.0.0.3
zonecfg:zone2:net> set physical=eri0
zonecfg:zone2:net> end
zonecfg:zone2> set pool=pool_default
zonecfg:zone2> set autoboot=true
zonecfg:zone2> add rctl
zonecfg:zone2:rctl> set name=zone.cpu-shares
zonecfg:zone2:rctl> add value (priv=privileged,limit=2,action=none)
zonecfg:zone2:rctl> end
zonecfg:zone2> add fs
zonecfg:zone2:fs> set dir=/cdrom
zonecfg:zone2:fs> set special=/cdrom
zonecfg:zone2:fs> set type=lofs
zonecfg:zone2:fs> set options=[nodevices]
zonecfg:zone2:fs> end
zonecfg:zone2> add device
zonecfg:zone2:device> set match=/dev/dsk/c0t0d0s6      (block device)
zonecfg:zone2:device> end
zonecfg:zone2> add device
zonecfg:zone2:device> set match=/dev/rdsk/c0t0d0s6     (character device)
zonecfg:zone2:device> end
zonecfg:zone2> verify
zonecfg:zone2> commit
zonecfg:zone2> exit
2 - Install the zone
globalzone# zoneadm -z zone2 install

3 - Boot the zone
globalzone# zoneadm -z zone2 boot

4 - Log in to the new zone and configure it
globalzone# zlogin -C zone2
Configure it as you would a fresh Solaris installation.
How to add a dataset to a zone
1 - Create a dataset
globalzone# zfs create mypool/myzonefs
globalzone# zfs unmount mypool/myzonefs

2 - Change the default mount point
globalzone# zfs set mountpoint=/export/myzonefs mypool/myzonefs

3 - Add the dataset to the zone
globalzone# zonecfg -z myzone
zonecfg:myzone> add dataset
zonecfg:myzone:dataset> set name=mypool/myzonefs
zonecfg:myzone:dataset> end
zonecfg:myzone> commit
zonecfg:myzone> exit

4 - Reboot the zone :(
globalzone# zoneadm -z myzone reboot
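Once delegated, the dataset is administered from inside the zone rather than from the global zone. A sketch, reusing the names above:

myzone# zfs create mypool/myzonefs/logs
myzone# zfs set quota=1g mypool/myzonefs/logs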
Migrate a zone from one server to another
[Diagram: Server 1 hosts zone1, zone2, zone3, and zone4; Server 2 hosts zoneA and zoneB. zone4 is moved from Server 1 to Server 2.]
1 - Halt the zone
globalzones1# zlogin zone4 shutdown -y -i 0

2 - Detach the zone
globalzones1# zoneadm -z zone4 detach     (use -p)

3 - Create the archive
globalzones1# cd /export/zones
globalzones1# tar cvpf /tmp/zone4.tar ./zone4

4 - Move the archive to the other server and configure the zone
globalzones1# scp /tmp/zone4.tar server2:/tmp/zone4.tar
globalzones2# cd /export/home/zone ; tar xvpf /tmp/zone4.tar
globalzones2# zonecfg -z zone4
zonecfg:zone4> create -a /export/home/zone/zone4     (means: attach the zone)
zonecfg:zone4> exit
5 - Attach the zone
globalzones2# zoneadm -z zone4 attach

6 - Boot the zone
globalzones2# zoneadm -z zone4 boot
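To verify the migration, list the zones on the destination server; zone4 should appear as installed before the boot and running afterwards:

globalzones2# zoneadm list -cv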
Reference Documentation
● "Solaris ZFS Administration Guide"
● Sun BluePrints: "Guide to Solaris Containers"
● Solaris 10 How To Guides: "How to Move a Solaris Container"
● Solaris 10 How To Guides: "Consolidating Servers and Applications"
● Solaris 10 How To Guides: "ZFS in Solaris Containers"
● man pages
"Time for questions"