Which RAID level should I use?
Some guidelines for choosing the right RAID level for your aerial mapping environment.
In Part 1 – Theory behind RAID, we talked about the theory behind RAID. Now we compare the different RAID levels.
RAID 0 – stripe
- Use it when you need full performance but the data is not important.
- In a RAID 0, the data is divided into blocks, which are then written to all disks at the same time.
- RAID 0 provides the most speed improvement, especially for write speed, because read and write requests are evenly distributed across all of the disks in the array.
- It provides no fault tolerance at all. Should any of the disks in the array fail, the entire array fails and all the data is lost. The solution is cheap because the full capacity of every disk is usable.
- If the RAID controller fails, recovery is relatively easy with RAID recovery software. Keep in mind, however, that if a disk fails, the data is lost irreversibly.
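The block distribution described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular controller works: the function names `stripe` and `unstripe`, the tiny block size, and the round-robin layout are chosen for the example.

```python
def stripe(data: bytes, num_disks: int, block_size: int) -> list[list[bytes]]:
    """Cut data into fixed-size blocks and deal them round-robin onto the disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading one block per disk, row by row.

    Every disk is needed: with one disk missing, every stripe has a hole.
    """
    blocks = []
    rows = max(len(disk) for disk in disks)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)


disks = stripe(b"ABCDEFGHIJKL", num_disks=3, block_size=2)
print(disks)            # each disk holds every third block
print(unstripe(disks))  # the original data, as long as all disks are present
```

Because consecutive blocks sit on different disks, all disks can work on one large file at the same time. For the same reason, a single missing disk leaves a gap in every stripe.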
RAID 5 – stripe + parity
- Use as large, reliable, relatively cheap storage.
- It writes the data blocks evenly to all the disks, similar to RAID 0, but an additional „parity“ block is written in each round. This parity provides redundancy: if one drive fails, the contents of its blocks can be reconstructed from the parity data together with all the remaining data blocks. Read speed is similar to RAID 0, about (N-1) times that of a single drive. Write speed is limited by the parity calculation and updates: for each block written, the corresponding parity block has to be read, updated, and then written back. Therefore there is no write speed improvement.
- If the RAID controller fails, you can still recover data from the array with RAID 5 recovery software.
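The parity mechanism can be illustrated with a byte-wise XOR, the usual way single parity is computed. The following is a minimal sketch with made-up two-byte blocks; the helper name `xor_blocks` and the sample values are assumptions for the example, and real arrays also rotate the parity block across the disks.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block (a 4-disk RAID 5).
data_blocks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data_blocks)

# The disk holding data_blocks[1] fails: rebuild its block from the survivors.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
assert rebuilt == data_blocks[1]

# A small write: the old data and old parity must be read before the new
# parity can be written, which is the read-update-write cycle described above.
new_block = b"\x0f\x0f"
new_parity = xor_blocks([parity, data_blocks[0], new_block])
assert new_parity == xor_blocks([new_block, data_blocks[1], data_blocks[2]])
```

The second half shows why writes gain nothing: changing a single data block costs two reads and two writes instead of one write.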
RAID 6 – stripe + dual parity
- Use as large, extremely reliable, relatively expensive storage.
- RAID 6 uses a block pattern similar to RAID 5, but provides 2 different parities, so 2 drives can fail. Read speed of an N-disk RAID 6 is about (N-2) times that of a single drive.
- There is no write speed improvement; the parity updates require even more processing than in RAID 5.
- The recovery of a RAID 6 from a controller failure is fairly complicated.
RAID 10 – mirror + stripe
- Use as a large, fast, reliable, but very expensive storage.
- RAID 10 stripes the data across mirrored pairs of drives, so the array always holds 2 identical copies of the content. Read speed is up to N times that of a single drive, because each drive can read its block of data independently, the same as in a RAID 0 of N disks.
- Writes are 2 times slower than reads because both mirror copies have to be updated, so the write speed improvement is about N/2. Half of the array capacity is used for fault tolerance, so the overhead grows with the number of disks. In RAID 5 and 6, by contrast, the overhead stays at 1 or 2 disks regardless of the array size. This makes RAID 10 the most expensive RAID level discussed here.
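To compare the four levels side by side, the rules of thumb from the sections above can be put into a small calculator. This is a rough sketch, not a benchmark: the function name `raid_summary`, the 8-disk array, and the 4 TB disk size are example assumptions, the multipliers are the idealised figures from this article, and RAID 10 is assumed to use an even number of disks.

```python
def raid_summary(level: int, n: int, disk_tb: float) -> dict:
    """Usable capacity and idealised speed multipliers for an n-disk array.

    Multipliers are relative to a single disk and follow the rules of
    thumb in the text; real arrays will differ.
    """
    # level: (disks holding data, read multiplier, write multiplier)
    rules = {
        0: (n, n, n),             # pure striping, no redundancy
        5: (n - 1, n - 1, 1),     # one disk worth of parity
        6: (n - 2, n - 2, 1),     # two disks worth of parity
        10: (n // 2, n, n // 2),  # every block is stored twice
    }
    data_disks, read_x, write_x = rules[level]
    return {
        "usable_tb": data_disks * disk_tb,
        "overhead_disks": n - data_disks,
        "read_x": read_x,
        "write_x": write_x,
    }


for level in (0, 5, 6, 10):
    print(f"RAID {level}:", raid_summary(level, n=8, disk_tb=4.0))
```

With 8 disks, RAID 10 gives up 4 disks to redundancy where RAID 5 gives up 1 and RAID 6 gives up 2, which is where the cost difference comes from.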
Conclusion:
Now, if this was important data, the obvious answer would be „RAID 10“, for performance and reliability.
You could also argue for RAID 5 or RAID 6 on grounds of reliability, but never on grounds of speed, especially write speed.
If you didn’t care about this data at all, the obvious answer would be „RAID 0“, especially when talking about processing speed and temporary / intermediate data.
RAID 0 is sometimes jokingly called „AIDS“: „Array of Inexpensive Disks that Suck“.
Once one drive dies, you can say goodbye to everything.
Final summary:
As we learned, there are different RAID levels available, each offering a different trade-off between reliability and performance.
So in a typical aerial workflow, whether image-based or LiDAR processing, a combination of different RAID levels will be the key to success.
In my opinion, RAW data from the camera or LiDAR should be stored in a relatively safe way. RAID 5 or RAID 6 is the way to integrate this data into the workflow.
Temporary processing files, such as intermediate images waiting for RGBI processing or used to produce point clouds or orthos, should be stored on a RAID 0 system. Only that setup delivers the full performance needed to keep all the cores of your high-performance workstations busy and to saturate your 10 Gb Ethernet environment. Of course, a large SSD RAID 0 array will be the best choice for that purpose.
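As a back-of-the-envelope check on what "saturating" a 10 Gb Ethernet link means, the sketch below estimates how many striped disks are needed. The per-disk rates (200 MB/s for a hard disk, 500 MB/s for a SATA SSD) are assumptions for illustration, not measurements from this article. Protocol overhead is ignored, and the function name `disks_to_saturate` is made up for the example.

```python
import math


def disks_to_saturate(link_gbit_s: float, per_disk_mb_s: float) -> int:
    """Number of RAID 0 disks whose combined rate reaches the link rate.

    Assumes ideal scaling (N disks deliver N times one disk) and ignores
    network protocol overhead.
    """
    link_mb_s = link_gbit_s * 1000 / 8  # gigabit per second to megabyte per second
    return math.ceil(link_mb_s / per_disk_mb_s)


print(disks_to_saturate(10, 200))  # assumed hard disk rate
print(disks_to_saturate(10, 500))  # assumed SATA SSD rate
```

Under these assumptions, a handful of striped disks is enough to fill the link, while a RAID 5 or 6 writing at the speed of a single disk is not.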
What we recommend, e.g. for Vexcel Ultramap
So in a typical Vexcel Ultramap processing environment, LVL-0/00 data should be stored on a RAID 5/6. That gives us reliability and some redundancy at moderate cost, but it reduces the write speed considerably (to the speed of a single disk).
Lvl-2 intermediate images should be processed to a RAID 0. That provides the fastest access to the data for processing, the largest usable disk capacity and the lowest cost, and it acts as a turbo for the processing. The missing data protection is a factor, but since we have the original RAW data on our RAID 5 solution (and, of course, at least 2 backups of this data somewhere safe), we are able to reprocess these intermediate products relatively easily in case of…
Additional steps of the workflow, such as tie-point matching, dense matching, orthos or point clouds, also benefit considerably from running on a RAID 0 system.
Especially when working with point clouds (nFrames Sure Aerial), the increased read speed and, most importantly, the increased write speed will help to lower the processing times of your projects a lot. When you watch the Task Manager, e.g. while Sure Aerial is running, you will notice that the CPUs are not sweating much: most of your cores are waiting to be fed with data.
In another article we will talk about the Myth of Redundancy of a RAID 5, so stay tuned.
Continue with Part 3 – Lack of write speed gain, where we will talk about the performance of the different RAID levels.