Discover the Different Criteria to Choose Your Backup
For now, let's focus on the first criterion, which is to analyze your backup needs. Throughout the chapter, we will relate these criteria and decisions to PiraSTEM’s needs.
What do we need to back up?
Backing up everything is ideal, but we may need to compromise for cost and time reasons and focus on what is crucial for the company. For example, why back up the operating system files on a computer when they can just be reinstalled if needed?
In the case of PiraSTEM, the business model is strongly based on intellectual property, making this data a valuable business asset. For this reason, the non-negotiable backups for PiraSTEM are those covering all design and intellectual property data on the workstations and local servers.
How often should we back up?
How much data can you afford to lose, and how quickly can you complete a backup? This question relates to acceptable risk. For example, some businesses make a daily backup; others might have a near-real-time service because they can’t afford any downtime that could result in some data loss, even if minor.
One of PiraSTEM’s unique selling points is the creative effort it puts into its product designs and support material. Hence, the data related to those activities is very valuable to the business and will need to be covered by a reliable backup, probably at least once a day.
How long do we keep backups?
The retention period is the amount of time you keep a backup. This period can be expressed either as an age or as a number of backups: do we back up daily and keep the backups for a month, or hourly and keep them for a week? Is everything kept for a year? Do we just keep the last “n” backups? This is related to acceptable risk, cost of storage, and compliance. There’s no solid rule for retention, although it’s common to keep most general work for a year or two.
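The two retention styles described above (by age or by count) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a feature of any particular backup product: the function names `prune_by_age` and `prune_by_count` and the 40-day sample data are hypothetical.

```python
from datetime import datetime, timedelta


def prune_by_age(backups, max_age, now):
    """Return the backup timestamps that are no older than max_age."""
    return [ts for ts in backups if now - ts <= max_age]


def prune_by_count(backups, keep_last):
    """Return the keep_last most recent backup timestamps, newest first."""
    return sorted(backups, reverse=True)[:keep_last]


# Hypothetical sample data: one backup per day for the last 40 days.
now = datetime(2024, 3, 1)
daily = [now - timedelta(days=d) for d in range(40)]

kept_by_age = prune_by_age(daily, timedelta(days=30), now)  # keep one month
kept_by_count = prune_by_count(daily, keep_last=7)          # keep the last 7
print(len(kept_by_age), len(kept_by_count))
```

Real backup tools usually combine both ideas, so treat this only as a way to reason about how many backups a given policy leaves on your media.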
PiraSTEM’s HR and office systems are hosted in the cloud and backed up by their service provider, so the business only has to decide how long to retain backups of their in-house data, according to their own risk analysis and contract requirements.
Where do we store backups?
When deciding where to store our backups, there are questions to consider:
Location: In a fire safe in the same room as the servers? Somewhere else on-site? Offsite in another site belonging to us? Cloud storage?
Media: What media type (or types) can hold our backups? Should we use USB flash drives or hard disks? Storage media (and their management systems) are usually the most expensive part of a backup solution, so a bad choice here can be costly.
Luckily, there is a rule that can help with such decisions: the 3-2-1 rule. It states: There should be three copies of your important data, on two different media, with one copy offsite.
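As a rough illustration, the 3-2-1 rule can be written as a simple check over a list of copies. The `BackupCopy` class, the `satisfies_3_2_1` function, and the sample plan are all hypothetical, and the sketch assumes the live data counts as one of the three copies.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    name: str
    media: str      # for example "disk", "tape", or "cloud"
    offsite: bool


def satisfies_3_2_1(copies):
    """Check for at least 3 copies, 2 media types, and 1 offsite copy."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical plan: live data counts as the first copy (an assumption).
plan = [
    BackupCopy("live data on file server", "disk", offsite=False),
    BackupCopy("nightly backup in fire safe", "tape", offsite=False),
    BackupCopy("monthly backup in remote storage", "tape", offsite=True),
]
print(satisfies_3_2_1(plan))
```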
A hybrid or hierarchical backup system, storing some backups locally (and securely) while moving older copies elsewhere, can be a good compromise. As a bonus, storing older backups on lower-performance, remote media can be relatively cheap. Such a practice is called hierarchical storage management (HSM).
Check Your Data Sensitivity
Data may be sensitive because it is valuable to a person or business, or because it relates to a person’s social, employment, or medical history. Protecting backups from unauthorized access is a best practice and often a legal requirement, as is the case with the General Data Protection Regulation (GDPR) in the European Union and the U.K. Data Protection Act 2018. Security may include a mixture of physical security (locked safes, secure buildings) and data security (encryption).
Organizations, such as PiraSTEM, have most of their value tied up in intellectual property. Can you imagine the risk if their design for an innovative computing kit got into the hands of a competitor? Therefore, it’s in PiraSTEM’s best interest to protect such data, and they should encrypt their backups, making sure they can only be electronically and physically accessed by authorized personnel.
Analyze Your Own Company Constraints and Resources
Having covered some of the principles and rules of good backups, we need to consider whether we have any constraints that will affect the design of our solution; for example, the 3-2-1 rule tells us that having a copy of our data offsite is a good idea, but how do we do this? One possible option is to back up to the cloud, but is this going to work for us?
Understanding data transfer speed
Getting your precious data somewhere safe involves a journey across a local or remote network, which will take time.
Network data transfer speeds are quoted in decimal megabits per second (Mbit/s, Mb/s, or Mbps), while file sizes are given in binary megabytes (MB). When estimating the time to complete a backup across network links, it is common to treat all values as decimal units to simplify the calculations. Verify estimates by running test backups.
We can calculate approximate data transfer times using the formula:
Time (seconds) = (data size in MB * 8) / (link speed in Mbps)
The “8” accounts for one value being in bytes and the other in bits.
Example: The time to transfer a 500MB file across a 10Mbps data link is about (500 * 8) / 10 = 400 seconds, or 6.67 minutes.
The backup solution needs to meet requirements; for example, if a full backup takes longer than 48 hours over a weekend, it's going to clash with the weekday incremental or differential backups and complicate things.
Network protocol overheads and network quality mean that data transfers are not 100% efficient. As a rough guide, assume your transfers are 90% efficient; that means a 400-second transfer may take closer to 400 / 0.9 = 444 seconds. Factor this in when estimating the duration of large backups.
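The formula and the efficiency adjustment above can be combined into one small helper. This is a sketch only: the function name `transfer_time_seconds` is an assumption, and the 90% figure is the rough guide from the text rather than a measured value.

```python
def transfer_time_seconds(size_mb, link_mbps, efficiency=1.0):
    """Estimate transfer time, treating MB and Mbps as decimal units."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    # Multiply by 8 to convert megabytes to megabits.
    return (size_mb * 8) / (link_mbps * efficiency)


# The 500MB file over a 10Mbps link from the example above.
ideal = transfer_time_seconds(500, 10)
realistic = transfer_time_seconds(500, 10, efficiency=0.9)
print(f"{ideal:.0f} s ideal, {realistic:.0f} s at 90% efficiency")
```

Measuring a real test backup and dividing the ideal time by the observed time gives a better efficiency value for your own link than the 0.9 default used here.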
Now that you know how to calculate data transfer time, let’s perform some calculations to see whether PiraSTEM’s broadband data link makes immediate offsite backups and restores viable:
PiraSTEM’s rural location means they can only obtain an internet service with a maximum uplink speed of 10Mbps and a maximum downlink speed of 38Mbps.
From earlier, PiraSTEM’s typical daily backup size is 20GB, and they have about 15TB of data on-site in total. Therefore, using the transfer rate calculation, we can determine:
Daily backup (typically 20GB) over 10Mbps uplink = (20000 * 8) / 10 = 16,000 seconds, or about 4.4 hours.
Full backup (or worst-case daily backup, 15TB) over 10Mbps uplink = (15000000 * 8) / 10 = 12,000,000 seconds, or nearly 139 days.
Full restore (15TB) over 38Mbps downlink = (15000000 * 8) / 38 = 3,157,895 seconds, or about 36.5 days.
These calculations suggest that a typical daily backup might take just over four hours, which is not an issue. However, if the backup size grew for some reason and took over 24 hours to complete, a daily backup schedule would be disrupted. The full backup and restore time is much too long to be practical.
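The PiraSTEM figures above can be reproduced with a short script, which also makes it easy to rerun the estimate if the data size or link speed changes. The sizes and link speeds come from the text; the function name and the `fits_daily_window` check are illustrative assumptions, and the calculation ignores protocol overhead.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def transfer_seconds(size_mb, link_mbps):
    """Ideal transfer time in seconds (decimal units, 100% efficiency)."""
    return (size_mb * 8) / link_mbps


# Figures from the text: 20GB daily, 15TB total, 10Mbps up, 38Mbps down.
daily_s = transfer_seconds(20_000, 10)
full_backup_s = transfer_seconds(15_000_000, 10)
full_restore_s = transfer_seconds(15_000_000, 38)

print(f"Daily backup: {daily_s / SECONDS_PER_HOUR:.1f} hours")
print(f"Full backup:  {full_backup_s / SECONDS_PER_DAY:.1f} days")
print(f"Full restore: {full_restore_s / SECONDS_PER_DAY:.1f} days")

# Illustrative check: does the daily job finish before the next one starts?
fits_daily_window = daily_s < SECONDS_PER_DAY
print("Daily backup fits in 24 hours:", fits_daily_window)
```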
Considering the time to process full backups and concerns about their rural location and what would happen if the internet link was out of action, PiraSTEM decided that a fully remote backup and restore solution is not workable. Instead, they decided that a solution involving temporary local storage would be better.
Mitigate the Costs of Your Backup Plan
Backup plans are a compromise between budget and acceptable risk. At the extreme end, an organization could arrange for all their working data to be backed up in real time to a remote location across an ultra-fast data link, but the setup and running costs would be astronomical and need exceptional justification! The 3-2-1 rule paves the way for a good compromise by encouraging backup operators to retain a secure local copy while shifting others elsewhere. The main decisions then become what media to use and how to get the data off and back on site (for restores), either by a data link or ground transportation, both of which will need sizing (link speed or transport pickup frequency).
PiraSTEM’s backup plan seems likely to be based on storing recent backups on-site and arranging for older ones to be taken away to secure storage. This means choosing a physical medium for the backups, and there are a few options.
Backup media
Before introducing you to the different types of backup media, there are two important things to know:
All backup media can fail or degrade over time (so-called bit rot), so backup solutions must implement data verification and media replacement.
The costs shown for each medium do not include the capital cost of the equipment needed; for example, an 8-tape autoloader costs around $5,400, and a simple, 8-bay disk storage unit, plus controller, is about $700.
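The data verification mentioned above can be as simple as recording a checksum when a backup is written and comparing it again later. The sketch below uses Python's standard `hashlib` module; the function names and the temporary demo file are hypothetical, and real backup software normally has its own verification mechanism.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path, expected_digest):
    """Compare the current hash with the one recorded at backup time."""
    return sha256_of_file(path) == expected_digest


# Demo with a small temporary file standing in for a backup archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"design files")
    demo_path = tmp.name

recorded = sha256_of_file(demo_path)             # store this with the backup
still_ok = backup_is_intact(demo_path, recorded)
os.remove(demo_path)
print("Backup verified:", still_ok)
```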
Media Type | Capacity | Transfer Speed per Hour | Cost per TB | Good to know |
Magnetic tape | Up to 50 TB | 14000 GB/hour | $4.00 | Popular for small to large businesses. |
Hard disk | ≥ 20 TB | 450 GB/hour (on local network) | $24.00 | Allows for high-capacity backup solutions to be built. |
Optical storage (e.g., CD, DVD, Blu-ray) | Up to 50 GB | 180 GB/hour | $120.00 | Relatively slow throughput. Some optical media are write-once, so stored files cannot be altered. |
Flash memory (USB sticks & SSDs) | Up to 2 TB per USB stick | 540 GB/hour | $238.00 (for USB sticks) | For small-capacity, personal (non-corporate) backups. Data retention time is unpredictable and varies with storage temperature. |
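To compare the options for a given amount of data, the cost-per-TB column above can be multiplied out. The sketch below uses the table's prices and PiraSTEM's 15TB as the example size; the names `COST_PER_TB` and `media_cost` are assumptions, and the result covers media only, not drives, autoloaders, or enclosures.

```python
# Cost per TB in dollars, copied from the table above.
COST_PER_TB = {
    "magnetic tape": 4.00,
    "hard disk": 24.00,
    "optical": 120.00,
    "USB flash": 238.00,
}


def media_cost(data_tb, copies=1):
    """Media-only cost in dollars for each type, for the given data size."""
    return {name: data_tb * copies * price for name, price in COST_PER_TB.items()}


# One copy of PiraSTEM's 15TB on each media type.
for name, cost in media_cost(15).items():
    print(f"{name:14s} ${cost:,.2f}")
```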
Hybrid backup solutions
The 3-2-1 rule tells us that having backups stored offsite reduces risk. If your internet connection is fast enough, you could send older backups to cloud storage. In general, cloud storage is divided into two types:
Online or hot storage behaves just like regular disk storage with access speed limited by your internet connection speed (monthly cost around $60 per TB).
Cold storage can be quick to accept data, but retrieval can take longer (even hours) because, behind the scenes, your data is perhaps stored on lower-performance systems or even moved to offline tape. The benefit of cold storage for backup archival is it’s cheap (monthly cost around $8 per TB).
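A quick calculation shows why archiving older backups to cold storage is attractive. The prices are the approximate monthly figures quoted above; the function name and the 1TB/14TB split are hypothetical, and the sketch ignores any retrieval or transfer charges a provider might add.

```python
HOT_PER_TB_MONTH = 60.0   # approximate figure from the text, in dollars
COLD_PER_TB_MONTH = 8.0   # approximate figure from the text, in dollars


def monthly_cloud_cost(hot_tb, cold_tb):
    """Monthly storage cost in dollars for a hot/cold split."""
    return hot_tb * HOT_PER_TB_MONTH + cold_tb * COLD_PER_TB_MONTH


# Hypothetical split of 15TB: everything hot versus 1TB hot and 14TB cold.
all_hot = monthly_cloud_cost(15, 0)
tiered = monthly_cloud_cost(1, 14)
print(f"All hot: ${all_hot:,.0f}/month, tiered: ${tiered:,.0f}/month")
```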
Location and its impact on internet connection speed are important to consider when thinking of cloud storage. As previously seen, PiraSTEM’s rural location makes using cloud storage as an immediate backup destination unworkable.
Managing your backup media
There is a rule to help manage backups in a way that balances risk with sensible media management: The GFS (grandfather-father-son) rule.
The GFS rule is based on a backup rotation scheme that is a workable compromise between the time to complete a set of backups and the amount of storage they need. There are many ways to implement such a scheme, with different retention times. Here’s one example:
Grandfather: Perform a full backup once a month and keep the backups for one year (or longer if needed for compliance or contract).
Father: Perform a full backup once a week. Keep the backups for four months.
Son: On all other weekdays, back up the changes since either the previous day (an incremental backup) or since the last full backup (a differential backup). Keep these backups for a week.
The backups may be kept securely on-site, with some being sent elsewhere for security, as per the 3-2-1 rule.
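One way to picture the rotation is a function that says which kind of backup runs on a given date. The schedule below is only one possible interpretation and rests on assumptions not stated in the text: full backups run on Saturdays, the last Saturday of the month is the grandfather, and sons run Monday to Friday. The function name `gfs_backup_type` is hypothetical.

```python
from datetime import date, timedelta


def gfs_backup_type(day):
    """Classify a calendar day under one possible GFS schedule.

    Assumed schedule: full backups on Saturdays (the last Saturday of the
    month is the grandfather, the others are fathers), sons on Monday to
    Friday, and no backup on Sundays.
    """
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday == 5:
        # If next Saturday falls in another month, this is the last one.
        is_last_saturday = (day + timedelta(days=7)).month != day.month
        return "grandfather" if is_last_saturday else "father"
    if weekday < 5:
        return "son"
    return None


for d in (date(2024, 3, 23), date(2024, 3, 25), date(2024, 3, 30)):
    print(d.isoformat(), gfs_backup_type(d))
```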
If you completed a full backup every day, and kept each backup for two years, you would have consumed around (365 x 2) = 730 sets of media at the end of the period! Let’s see how the GFS rule helps us to better manage our backup media.
Suppose a full weekly backup is about 20TB, month-end full backups are retained for two years, and you complete daily differentials. Assuming a full backup fits onto one unit of media, the total amount of media needed is:
Month 1: Four units of media (four weekly backups in the month) + five more for the differentials (Mon-Fri).
Month 2: One new unit to replace the full backup that’s now in retention.
Month 3: One new unit to replace the full backup that’s now in retention.
And so on for each following month.
Over two years, that’s 4 + 5 + 23 (one new unit for each of the remaining 23 months), or only 32 units of media!
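The media count above generalizes to any retention length. The sketch below follows the same assumptions as the worked example (one backup per unit of media, four weekly fulls and five differentials reused in rotation, one new unit per month after the first); the function names are hypothetical, and the daily-full comparison assumes 365 days per year.

```python
def gfs_media_units(months, weekly_fulls=4, daily_differentials=5):
    """Units of media for the GFS example, assuming one backup per unit."""
    if months < 1:
        raise ValueError("months must be at least 1")
    # Weekly and daily units are reused; each later month adds one unit
    # to replace the full backup that has moved into retention.
    return weekly_fulls + daily_differentials + (months - 1)


def daily_full_media_units(years, days_per_year=365):
    """Units of media if every daily full backup is kept."""
    return years * days_per_year


gfs_units = gfs_media_units(24)            # two years of monthly retention
naive_units = daily_full_media_units(2)    # a full backup kept every day
print(f"GFS: {gfs_units} units, daily fulls: {naive_units} units")
```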
Going back to our case study company, PiraSTEM will need to choose either a tape or disk solution that meets their budget and capacity needs, based on what they consider a suitable GFS pattern for acceptable risk. If they do not have a secure, detached but nearby location to store their backups, they will probably consider a fire safe in the building for recent backups and arrange for older ones to be taken off-site.
Choose the Solution That Suits Your Company Best
As we’ve seen, there are many things to consider when choosing a backup solution. Here’s a decision workflow that highlights the main points that require technical, logistical, or financial input.
Let’s Recap!
When designing a backup solution, start by considering the what, how, and where questions.
What needs to be backed up?
How often does the data need to be backed up?
How long are the backups being retained?
Where will the backups be stored?
How have you secured the backups from risk (encryption, secure storage, etc.)?
Understand any limiting factors, such as data link transfer speeds and budget.
Remember that storage needs usually grow over time.
There are a number of different types of backup media available.
Congratulations on completing the first part of this course! Don’t hesitate to go back and revisit what you’ve learned so far, and when you are ready, try the Part 1 quiz before moving on to the next part.