I received an email today noting that today (March 31st) is World Backup Day. Who knew there was such a thing? An interesting comment then followed: “shouldn’t every day be world backup day?” Good point!
Whilst everyone is double-checking that their iPhone is set to back up those precious holiday photos every night, in case the phone is dropped or left in the back of a taxi, I thought it would be a good day to share some basic guidance on backups in an eDiscovery / eDisclosure context.
The aim of this post is to explain backup tapes (and other backup formats) at a high level. Backups are often a source of electronic documents that must be considered in a legal or regulatory dispute. For matters with a long and historic date range, or where custodians of interest have left the company, a backup may be the only source of data, and accordingly may be a crucial source of evidence.
However, and perhaps wrongly, backups have a reputation for being a notoriously challenging source in an eDiscovery context. For some companies and some backup formats, dealing with them may indeed be difficult (maybe impossible). It would not be accurate, however, to say that just because data exists on a backup tape, the source should always be disregarded as disproportionate in the context of a legal dispute.
Let us help you navigate your way through this data source.
What Data Exists on a Backup?
Backups are best described as a snapshot of data at a specific point in time. Almost anything can be backed up, and what a company chooses to back up depends entirely on its internal policies and needs.
Importantly, backups are often taken at various time points, commonly each day, week, month or year. The purpose of a daily backup is often to allow a company to restore its data to that point in time in the event of a catastrophic IT failure. The more often a backup is taken the less data is lost if the IT systems fail.
Backups are also often kept for retention and historic storage purposes. In addition to daily backups, organisations may choose, or be required, to keep periodic backups of certain categories of data. To facilitate this, periodic backups such as month-end or year-end backups will be kept for a period of time.
Depending on the nature and frequency of backups, the data held on each tape can be highly duplicative of the data held on other tapes.
The fictional lifecycle of three emails below illustrates this:
Email X is received on the 23rd of March, then deleted on the 27th of March. Because the backup is not taken until the 31st of March, after the email has been deleted, Email X is never stored on the backup tapes.
Email Y is received on the 25th of March, then deleted on the 2nd of April. When the backup is taken on the 31st of March the email is still on the system, so a copy is preserved on the backup tape. Because it is deleted on the 2nd of April, before the next backup is taken, no other backup tape will contain a copy of this file. The only copy of this file in existence is on the 31 March backup.
Email Z is received on the 27th of March, but it is never deleted. When the backup is taken on the 31st of March the email is on the system, so a copy is preserved on the backup tape. Because the file is never deleted, another copy is captured every time a backup is taken. The backups therefore potentially contain hundreds of copies of this email.
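For those who prefer to see the logic written down, the lifecycle above can be sketched in a few lines of Python. This is only an illustration, not a description of how any backup product works: the function name `backups_containing`, the year 2017 and the additional month-end backup dates are all assumptions made for the example, and a deletion is assumed to take effect before any backup run on the same day.

```python
from datetime import date
from typing import List, Optional


def backups_containing(received: date, deleted: Optional[date],
                       backup_dates: List[date]) -> List[date]:
    """Return the backup dates whose snapshot would include the email.

    An email is captured if it arrived on or before the backup date and
    had not yet been deleted (deleted=None means it was never deleted).
    """
    return [
        b for b in backup_dates
        if received <= b and (deleted is None or b < deleted)
    ]


# Hypothetical month-end backups; the year is arbitrary.
backup_dates = [date(2017, 3, 31), date(2017, 4, 30), date(2017, 5, 31)]

email_x = backups_containing(date(2017, 3, 23), date(2017, 3, 27), backup_dates)
email_y = backups_containing(date(2017, 3, 25), date(2017, 4, 2), backup_dates)
email_z = backups_containing(date(2017, 3, 27), None, backup_dates)

for name, hits in [("X", email_x), ("Y", email_y), ("Z", email_z)]:
    print(f"Email {name} appears on {len(hits)} backup(s)")
```

With these assumed dates, Email X appears on no backup, Email Y only on the 31 March backup, and Email Z on all three.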
With this in mind, and with proportionality in view, it may be advantageous when retrieving data for eDiscovery to select tapes to restore strategically, rather than restoring every tape. As the fictional example shows, doing so knowingly accepts the risk that certain documents may not be retrieved / restored.
Options may include:
- Restore every backup
- Restore only the year end backups
- Restore tapes from selected years only (e.g. the previous 7 years only, or another selection of years based on key events related to your case)
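The trade-off between these options can be sketched with the same three fictional emails. The snippet below is a rough illustration under stated assumptions: it assumes twelve month-end backups in an arbitrary year (2017), and the names `MONTH_END_BACKUPS`, `EMAILS` and `recovered` are invented for the example. It compares restoring every backup with restoring only the year-end backup.

```python
import calendar
from datetime import date
from typing import Dict, List, Optional, Set, Tuple

# (received, deleted) where deleted=None means the email was never deleted.
Lifetime = Tuple[date, Optional[date]]

YEAR = 2017  # arbitrary year chosen for illustration
MONTH_END_BACKUPS: List[date] = [
    date(YEAR, month, calendar.monthrange(YEAR, month)[1])
    for month in range(1, 13)
]

EMAILS: Dict[str, Lifetime] = {
    "X": (date(YEAR, 3, 23), date(YEAR, 3, 27)),
    "Y": (date(YEAR, 3, 25), date(YEAR, 4, 2)),
    "Z": (date(YEAR, 3, 27), None),
}


def recovered(emails: Dict[str, Lifetime], restored: List[date]) -> Set[str]:
    """Return the emails present on at least one of the restored backups."""
    found = set()
    for name, (received, deleted) in emails.items():
        if any(received <= b and (deleted is None or b < deleted)
               for b in restored):
            found.add(name)
    return found


strategies = {
    "every backup": MONTH_END_BACKUPS,
    "year-end only": [MONTH_END_BACKUPS[-1]],
}

for label, backups in strategies.items():
    print(label, "-", len(backups), "backup(s) restored, emails found:",
          sorted(recovered(EMAILS, backups)))
```

Under these assumptions, restoring all twelve backups finds Emails Y and Z, while restoring only the year-end backup finds Email Z alone. Email Y is the price of the cheaper strategy, and Email X is never recoverable under either.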
Backup Formats (Tapes)
At this stage it is worth pointing out that, despite the title of this document, backups have so far been discussed without reference to the word “tape.” Historically backups were normally saved onto tape media. Backup tapes come in a variety of forms, but look very similar to the cassette tapes that everyone’s favourite Beach Boys songs came on in the 80s (yes, I’m showing my age).
As a fairly general statement, backing up to tape was chosen historically because it was a very inexpensive means of storing a very large amount of data. For example, an LTO-6 tape stores 2.5TB of data natively (up to around 6.25TB with compression) and can be purchased for around £20 (or $25USD). However, although backup tapes are designed to store data for a very long time, they are a fairly fragile storage medium. Tape can degrade, disintegrate or become corrupt without notice during storage. Factors that increase the risk of damage when “accessing” a tape include its age, how well it has been stored over time (i.e. the environment in which it was kept) and how many times the tape has been used in the past.
Whilst tapes are still used, it is increasingly common for firms to back up to a hard drive or cloud storage solution, rather than relying on tape as a medium. The reason for focusing on tapes in this document is that they present specific challenges that need to be considered before the data on them can be transformed into a usable format. These challenges can include:
- The large volume contained on the tape.
- The fact that the tape will contain an entire network of data or an entire company’s email at a point in time, therefore confidentiality and privacy concerns often need to be addressed before providing tapes to an organisation to restore.
- Due to the age of some tapes, companies often no longer hold the tape drive that is used to read the tape.
- The software that was used to back up the data to the tape also needs to be considered. Whilst the backup software is not always needed to restore from tapes, it is something that will need to be considered in relation to the approach to retrieving data from the tapes.
Millnet’s forensic team have substantial experience in working with and restoring data from numerous formats of tapes and hold the drives that allow us to read from many common tape formats. Therefore, if you have a case that requires tapes to be considered, our experts should be called on to discuss your options before excluding this source just based on hearing that “there are tapes involved.”
How Many Tapes?
It is important to point out at this stage that it would be incorrect to assume that ALL data will be stored on ONE tape. Often, because of the size of the system being backed up, the data will be written to many tapes. For example, we have known companies that back up their file servers to a series of 20+ tapes because their live system is that large. Conversely, a tape will not always contain data from a single source: it may hold a backup of many systems.
Below are 3 x fictional examples of how companies may store data on tapes. In these examples each company has a very simple IT infrastructure (an email server and a file server). Please note that it is common for companies to have far more complex infrastructure; these fictional examples simply aim to illustrate the point:
Company A
- Has 2 x servers: 1 x email server and 1 x file server
- Uses 3 x tapes to back up its systems
- Email server is backed up to 1 x single tape
- File server is backed up to 2 x tapes (presumably because it is too large to fit on one tape)
Company B
- Backs everything up onto one single tape
Company C
- Uses 6 x tapes to back up its systems
- Tapes 1-4 back up the email server (presumably because the email server is so large that 4 x tapes are needed to fit the data)
- Tapes 4-6 are used to back up the file server
- NOTE that tape 4 has data from BOTH the email server and the file server (presumably because the last piece of the email server backup used only part of the tape, and so the first part of the file server backup was started on this same tape)
So in this fictional scenario if we are looking for data stored on the file server that we have determined is not on the live system but only on backup tapes, our approach for each company would be different:
- Company A – we would restore 2 x tapes and have all of the file server data.
- Company B – we would restore 1 x tape but then have to isolate the file server backup from the email backup to find what we want.
- Company C – we would restore 3 x tapes (tapes 4-6), noting that in doing this if we were to restore everything on tape 4 we would also have the partial backup of the email server (which we would ultimately disregard).
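The same reasoning can be written as a simple lookup against a catalogue of what each tape holds. This is a hedged sketch only: the dictionary `TAPE_CONTENTS`, the function `tapes_to_restore` and the tape numbers assigned to Company A and Company B are assumptions for illustration (the post only gives tape numbers for Company C), and a real tape catalogue would come from the backup software or the company's IT records.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical catalogue: company -> tape number -> servers backed up on it.
TAPE_CONTENTS: Dict[str, Dict[int, Set[str]]] = {
    "Company A": {1: {"email"}, 2: {"file"}, 3: {"file"}},
    "Company B": {1: {"email", "file"}},
    "Company C": {
        1: {"email"}, 2: {"email"}, 3: {"email"},
        4: {"email", "file"},  # tape shared by both backups
        5: {"file"}, 6: {"file"},
    },
}


def tapes_to_restore(company: str, server: str) -> Tuple[List[int], List[int]]:
    """Return (tapes holding the server's data, those that also hold other data)."""
    catalogue = TAPE_CONTENTS[company]
    needed = sorted(t for t, servers in catalogue.items() if server in servers)
    mixed = [t for t in needed if len(catalogue[t]) > 1]
    return needed, mixed


for company in TAPE_CONTENTS:
    needed, mixed = tapes_to_restore(company, "file")
    print(f"{company}: restore tapes {needed}; "
          f"tapes needing file/email separation: {mixed}")
```

For the file server this gives two tapes for Company A with nothing to separate, one mixed tape for Company B, and tapes 4-6 for Company C, of which only tape 4 also carries email data.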
The key point here is that when referring to tape backups we should consider:
- How many tapes are the backups written to?
- What is on each tape (which server or multiple servers)?
- What are you looking for?
I note at this point that this was meant to be a “basic guide” to working with backup tapes in an eDiscovery / eDisclosure context. For the IT and Data nerds (like me) out there, I acknowledge the many other items that we haven’t discussed such as incremental or differential backups, encryption, and I’m sure a host of other topics we could talk for days about.
Hopefully the topics covered here start to demystify this very commonly available source of data. If the nature of your case and your client mean that data exists on tapes and you have to consider this source to retrieve evidence, remember these key points:
- Data can be restored from tape.
- Tape restoration is not always difficult and certainly not always impossible (but often needs specialist expertise).
- Data on tapes can be highly duplicative, so a strategy is needed that weighs costs, what you are looking for, and the best approach for your case.
- Tapes will often contain high volumes of data that is not on your live systems, but similarly, tapes will never contain every document ever written, sent or received.
- It is not correct to say that just because data is held on tape this source should be disregarded.
If you have a case with historic timelines, ex-employees or custodians whose data no longer resides on the live systems, or where other needed data resides on tape, Millnet’s consultancy team can help you consider the best approach in relation to this source.
Director – eDiscovery Project Management and Consultancy
Millnet, an Advanced Discovery Company
Emma is a Director at Millnet and is responsible for eDiscovery operations. She leads a team of the most qualified Relativity project managers and consultants in the UK. Emma has over 14 years’ experience in eDiscovery with previous roles at Latham & Watkins and Allens Arthur Robinson, gaining a wealth of experience in complex litigation cases.
Having more than four specialist Relativity certifications concurrently, Emma was recently recognized as a Relativity Master. The Master designation recognizes proficiency of technical skills.
Emma has been involved in hundreds of eDiscovery projects involving many millions of documents. The largest matter that she has dealt with to date involved over 42 million documents.