Notes on the
Oracle Archive Redo Log System and its
Backup Application
1 - How it works
The Oracle on-line backup system uses copies of all transactions executed against the database stored in what are called Archive Redo Logs. These archive logs are copies of the Online Redo Logs used by Oracle to maintain consistency during shutdown and startup of the database. As each transaction is executed on the system the details of the transaction are stored in the redo log files. There are a fixed number of redo log files attached to each database and Oracle cycles through these filling one and moving onto the next until the last file available is reached, when it will return to the first on the list again.
When Archiving is switched on Oracle will copy each filled Redo Log file to a specified archiving directory for storage. These Archive Redo Logs can then be used to replay all transactions on the database from the point of the last backup when recovery is required.
2 - Securing the backup
As may be obvious from the description above any backup method using Oracle Archive Redo Logs requires that the Archive Redo Logs are stored and secured properly. In order to recover to a certain point in time all the archived redo logs from the point of the last backup 'til the time to be recovered to must be available in order for Oracle to replay all transactions. Any missing files will cause Oracle to stop the recovery process and will either invalidate the backup or mean that recovery can only be made to a certain point rather than up until the moment of loss.
Any storage method must take account of the number and size of Archive Logs generated at peak processing periods in the database.
3 - A timely recovery
As was mentioned before Oracle must apply all transactions from all Archive Redo Logs until the required point of recovery is reached. As you can imagine applying all transactions to a large OLTP database from the point of the last backup until even one day further on can take some time.
In order to ensure a fast recovery time from a loss it is essential that all Archive Redo Logs from the point of the last backup are quickly available on site. If stored on tape the time to recover from tape back to disk will impact the process greatly. If stored on another machine the network time to copy the files across will be the deciding factor.
At best all Archive Redo Logs from the point of the last backup until the current point in time should be available on local storage. Backups on another machine or on tape should certainly exist in case total machine failure is encountered, but these should not be relied on to provide fast recovery.
Also the frequency of the database backups taken will have a high impact on the time to recover. If backups are taken once per week, say on a Sunday, and a disaster hits on Friday afternoon (why do they always hit on Friday afternoon?) then all transactions for the week will have to be applied. This can take a long time on a high transaction database. In addition the storage of a weeks Archive Redo Logs will cost a lot of disk space, never mind the added opportunity for an Archive Log File to be lost during that time.
4 - Stuck in the mud
Oracle is very picky about how it handles redo logs whilst in Archive Redo Log mode. It must ensure that all Archive Redo Logs are written cleanly to their designated storage area before the Online Redo Logs can be reused. If Oracle reaches the last Online Redo Log before it manages to copy the first log to the Archive area it will simply hang all transactions and pause the database pending a fix for the problem. The most common cause of this is where the file system being used to store the Archive Redo Logs is full.
This is a real flash point for disaster. Normally in such situations the DBA or System Administrator will attempt to free up some space on the file system by deleting old Archive Redo Logs that are no longer required online. If all the logs currently stored are required to recover from the last backup then alternative storage will have to be found and these logs copied to this storage before being deleted from the full file system.
Unfortunately this task is normally completed as 300 users are calling the helpdesk, management is screaming and the cat is hiding under the couch in sheer panic. At this point is best to have a tried and tested set of shell scripts to run that will ensure only Archive Redo Logs that are already copied correctly to alternative storage are deleted.
5 - Hot Databases in Hot Backup Mode
Oracle implements a hot backup that allows the database to be backed up without loss of service to the users. The database remains online for all user activity as the database files are being backed up. Integrity is maintained by writing transactions to the database Online Redo Logs rather that to the database files themselves, giving the database the appearance of being in read/write mode whilst actually placing the database data files in an essentially read only stance. This is slightly misleading, though. Oracle actually contiues to write to the database files as normal, it just freezes the header block of the file. This is the block that determines the changes that have to be made when restoring from Archive Redo Logs. Thus when the backup files are restored they contain the first value of the transaction logging number available at the time the backup was taken and all changes after that can be written to the files.
This would be all fine and lovely except for one thing. It is possible using this method that the process being used to backup the database might not read entire blocks at a time. It may gather a partial block and then return to get the next part later. During this time Oracle may have made changes to the block, in which case the backup process will copy the block from two different images. The first part of the block before the change and the second part afterwards. This would confuse the recovery
process since the data stored in the Redo Logs only replays the actual bytes changed in the block, not the whole block information. To counter this Oracle during hot backup mode forces the write to the Redo Logs of the entire block each time a block is changed. This is a dramatic overhead for updates.
In this kind of situation backing up a large database to tape, even to Raid DLT can be impractical. On top of the performance drop involved in copying database files to another device whilst the database is online there is also the major performance problem of writing kilobytes of extra data to the Redo Logs and thus onto the Archive Redo Logs. A faster method has to be found. Disk to disk backup, then to tape is a better option but may still take a long time to complete.
On most high-end sites a three-way mirror system is used, with one mirror being kept solely for backup purposes. The third mirror is normally off but is synched in
with the other two just before a backup is required. Once synched the database is put into backup mode, the mirror is split and the database can be put back into normal operation quickly. The third mirror can then be backed off to tape for storage.
6 - Recovering up to the last minute
Oracle only writes an Archive Log when an Online Redo Log is full and complete. In a total disaster then, where even the Online Redo Logs have been lost, recovery would only be possible up until the point of the last available Archive Log.
Normally this is not a large problem, it is expected that Online Redo Logs will be filled at the rate of at least one per hour. This makes for a maximum loss of 1 hour
of transactions. At slow transactional period however the rate of fill of such logs slows down. The normal expectation is that overnight batch jobs will cause high Redo Log activity and that daily processing will be less intensive.
Tuning Oracle Redo Logs for performance is usually done against the peak transaction rate, meaning that at slower times of the day there can often be a large time gap between log completion and thus log archiving. In such an instance the total time exposure can increase to beyond what the business will accept.
Oracle provide a facility for this by which the switch of Online Redo Logs can be forced causing the current log, even if not filled, to be archived. On systems where such periods of low activity exist it is recommended that the Redo Log switch is forced at regular intervals, most sites switching at least once per hour, depending on business needs.
7 - So how do I know what storage I'll need
Oracle stores a history of redo logs used, the time they were first written to and the transaction count at the time they were started. This history, held in the database view V$LOGHIST, can be used to estimate the number of logs typically output in a day.
A typical query for this might be:
Select to_char(first_time,'DD-MON-YYYY'),
Count(*)
From v$loghist
Where first_time > '01-JAN-2001' /* Or any other date
that may be relevant */
Group by to_char(first_time,'DD-MON-YYYY')
Order by count(*) asc;
Running this query on a sample database shows a peak value of 8 redo logs generated in one day.
The size of the redo logs can be determined either by examining the file system size or by reading the BYTES column on the V$LOG view. On our database at the moment, for example, there are four Online Redo Logs (each with a mirror copy) of 64M each.
Combining the two figures we have a total of 512M (8*64=512, trust me I used a calculator) redo log data at peak in one day. This is the amount of data that any backup facility or storage of Archive Redo Logs would have to take into account. This figure should be mitigated against the fact that manual switching of redo logs means that not all archived redo logs will be the full size.