Why backup one tablespace at a time?
Backup and Recovery Tips
All Oracle documentation (and mine, too!) strongly recommends that when performing Operating System hot backups you should put one tablespace into hot backup mode at a time, copy its files, and take it out of hot backup mode before moving on to the next tablespace and repeating the procedure. The reason for this recommendation is simple: the 'begin backup' command has a profound effect on the amount of redo generated by the database (in plain English, it sends the amount of redo generated by even the smallest transaction through the roof).

The trouble starts when you consider that an Oracle block is typically made up of several Operating System blocks. Commonly, O/S blocks are 512 bytes in size, so your typical Oracle block of 8K is made up of 16 O/S blocks. When you perform an Operating System-based backup, the disk head is copying Operating System blocks, oblivious of the fact that each O/S block it encounters has a relationship with (say) 15 others. Given that we are talking about hot backups, those 15 others could be having changes made to them whilst we are copying the first O/S block. The Operating System doesn't care about this, but Oracle most certainly does, because we now have part of an Oracle block copied with its contents at time X, whilst the remainder of the block has moved on to time Y. By the time the O/S has finished copying all 16 O/S blocks, it's clear that the copies are likely to be all at different times from each other.

Imagine now trying to put all 16 back together again, and making a sensible Oracle block out of them. Put simply, it can't be done... unless you snapshot the entire Oracle block every time a part of the block changes. That way, we'd end up with a number of different snapshots, the last one of which could be used as a picture of what the entire set of O/S blocks, when put together to create an Oracle block, should look like.
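The recommended one-tablespace-at-a-time procedure can be sketched as follows. The tablespace name (USERS) and file paths here are hypothetical examples, and the host copy command will vary with your platform:

```sql
-- Put ONE tablespace into hot backup mode at a time, copy its files,
-- then take it out again before moving on to the next tablespace.
ALTER TABLESPACE users BEGIN BACKUP;

-- Copy only this tablespace's datafiles at the O/S level; from
-- SQL*Plus you can shell out with HOST (paths are hypothetical):
HOST cp /u01/oradata/users01.dbf /backup/users01.dbf

ALTER TABLESPACE users END BACKUP;

-- ...then repeat the BEGIN BACKUP / copy / END BACKUP cycle for
-- the next tablespace.
```

The point of the pattern is that only one tablespace's blocks are generating the expensive snapshot-style redo at any moment.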
This problem is known as 'block fracturing', and the way to get around it is to generate complete Oracle block snapshots in the redo stream every time any part of the Oracle block changes. This is precisely what the 'alter tablespace ... begin backup' command does for us. It instructs Oracle to generate a block-level snapshot of any block that is altered by a piece of DML during the Operating System backup; that snapshot is stored in the Redo Log Buffer, and from thence in the Online Redo Logs, and eventually, of course, in the Archived Redo Logs.

This is in stark contrast to what the Redo system ordinarily stores. If you update a record, the Redo system usually only records the before and after version of the actual change you make. If you change Bob's salary, for example, the Redo would comprise the rowid for
Bob's record together with his original salary, swiftly followed by the same rowid and the new salary. There is some additional housekeeping information stored, too, naturally, but at the end of the update you'd have generated around 40 or 50 bytes of redo information. In hot backup mode, however, the very same transaction is required to generate an entire block snapshot: on a typical Unix box, that would be 8192 bytes of redo, not a mere 40 or 50.

Now, I'm simplifying a bit here: on a recent test, it took me 8 updates of a person's salary to induce a log switch on a 4K-block system whilst in hot backup mode (the logs were 16K in size, so that suggests around 2K per transaction, or half a block). The same update when NOT in hot backup mode had to be performed 14 times before the log switched (which suggests around 1K per update). However: that's still double the amount of redo for the identical transaction when in hot backup mode compared to when not.

Now, if you accept that the smallest transactions generate at least double the amount of redo, consider making that happen across the entire database! (And if you were to place all tablespaces into hot backup mode simultaneously, that's exactly what you'd be doing.)

What does doubling the amount of redo produced by the database mean in practice? Well, for a start, LGWR is going to be flushing the Redo Log Buffer more frequently (because it will hit the 'one third full' or '1MB of redo' limits more rapidly). That means redo logs get filled up more quickly, which means log switches happen more frequently. Given that a full database checkpoint is issued at every log switch, that means the rate of checkpointing goes up, and since a checkpoint is one big I/O-fest, you can confidently expect general database performance to go down in about equal proportion. What's worse, you will be cycling through the available Online Redo Log groups at a rapid rate of knots.
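You can watch this effect for yourself. A sketch using the standard dynamic performance views (run as a suitably privileged user): compare the figures with the database in and out of hot backup mode.

```sql
-- Total redo generated since instance startup, in bytes: expect
-- this to climb far faster while tablespaces are in backup mode.
SELECT value AS redo_bytes
FROM   v$sysstat
WHERE  name = 'redo size';

-- Log switches in the last hour: more redo means the online logs
-- fill (and switch, and checkpoint) more frequently.
SELECT COUNT(*) AS switches_last_hour
FROM   v$log_history
WHERE  first_time > SYSDATE - 1/24;
```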
You'd better pray that ARCH can keep up with its archiving activities, otherwise you will soon end up looping back to a log which has not yet been archived, at which point the entire database will hang until ARCH catches up. Some people get away with it: their Redo sub-system is geared up to cope with a doubling of the amount of redo generated by the entire database, and the flood of redo induced by a string of successive 'begin backups' is therefore not overwhelming enough for them to worry about. You'd have to weigh that up in the light of your own circumstances before signing up to that approach, though.

There's one final factor to consider. The 'begin backup' command has one other effect on the database: it locks the SCN stored in the header of each of the affected Data Files. That's so we can tell the earliest point from which to start applying redo in the event of a recovery being required. If you take the recommended approach of one tablespace at a time, then each Data File backup copy will have a different SCN in its header. That's not a problem for performing
recovery, provided you've retained all archives generated since the time the entire backup started. But it does mean that a restore of a file which was copied late in the backup cycle (say, 3 hours after the very first one was backed up) will require significantly less redo to bring it up-to-date than one copied early on in the piece.

Now consider what it would look like if you issued a string of consecutive 'begin backup' commands, and then embarked on a mass copy of all Data Files which took (say) 3 hours to complete. Every file would have more or less the same SCN stored in its header, and hence all files would require more or less the same amount of redo to be applied in the event of a recovery. In practice, Oracle would have to run through 3 hours' worth of Archives, regardless of which file was being recovered. With some of the sizes of databases I've seen lately, the time taken to do the physical copying of Data Files during a backup is not insignificant, and hence this business of being forced to trawl through masses of unnecessary Archived Redo Logs is not insignificant either. It's therefore an awfully inefficient way of doing things, and if you follow the recommended approach, there's zero need for it in the first place.

For a small-ish database with a moderate amount of activity, the simplicity of throwing the entire database into hot backup mode and then just doing a 'copy *.*' may well outweigh the potential disadvantages I've explained here (always provided your Redo sub-system has been configured to cope). But when the process of actually copying your Data Files starts taking a reasonable length of time, and/or when the DML activity on the database is significant, it's probably going to be best to stick with the standard advice: one tablespace at a time!
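The frozen SCNs discussed above are visible in the data dictionary. A sketch: V$BACKUP shows which files are currently in hot backup mode and the SCN at which each was frozen, while V$DATAFILE_HEADER shows the checkpoint SCN recorded in each file's header.

```sql
-- Files currently in hot backup mode, and the SCN frozen for each
-- by 'begin backup'.
SELECT file#, status, change#, time
FROM   v$backup
WHERE  status = 'ACTIVE';

-- The checkpoint SCN in each datafile header: with the recommended
-- one-tablespace-at-a-time approach, these will differ from file
-- to file across the course of the backup.
SELECT file#, checkpoint_change#, checkpoint_time
FROM   v$datafile_header;
```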
Copyright © Howard Rogers 2001
20/10/2001