Exploring the key features of centralised data storage management systems
By Paul Rummery, Securenet Consulting
Using the Cloud as a Storage Tier
With 'cloud' as a part of a storage virtualisation strategy, there is no longer a need to deploy dedicated off-site infrastructure to move data backups off-premise for disaster recovery.
Petabytes of thin-provisioned cloud storage can be added to the infrastructure, with a cloud storage gateway feature set, including dynamic caching, data reduction, at-rest local key encryption and bandwidth optimisation.
For example, data can be replicated to Amazon AWS, Microsoft Azure and other public cloud providers.
Augmenting or even replacing an off-site tape strategy is simpler than ever.
Intelligent Caching
Disk caching provides local performance for volume access. Data can be partially or fully cached locally, depending on application needs. Cache is easy to re-size as application needs change and can consist of local disk, solid-state, network-attached or SAN storage.
Security – The storage management software encrypts data blocks prior to transporting them to public cloud storage. The gateway leverages the Advanced Encryption Standard (AES-256).
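To make the encryption step concrete, here is a minimal Python sketch of encrypting data blocks with AES-256 before they leave the site, using the widely available cryptography package. The block and key handling shown here is illustrative only and is not the gateway's actual implementation.

```python
# Minimal sketch of at-rest encryption of data blocks before cloud upload.
# Assumes the 'cryptography' package; key management details are omitted and
# the block/nonce handling here is illustrative only.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_block(key: bytes, block: bytes) -> bytes:
    """Encrypt a single data block with AES-256-GCM, prepending the nonce."""
    nonce = os.urandom(12)                 # unique nonce per block
    ciphertext = AESGCM(key).encrypt(nonce, block, None)
    return nonce + ciphertext              # store nonce alongside ciphertext

def decrypt_block(key: bytes, payload: bytes) -> bytes:
    """Reverse of encrypt_block: split off the nonce and decrypt."""
    nonce, ciphertext = payload[:12], payload[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

key = AESGCM.generate_key(bit_length=256)  # 256-bit key kept locally
sealed = encrypt_block(key, b"raw block data destined for the cloud")
assert decrypt_block(key, sealed) == b"raw block data destined for the cloud"
```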
High Speed Caching
Accelerates disk I/O and business-critical application performance by making fuller use of existing storage assets.
Caching recognises I/O patterns, helping the system anticipate which blocks to read next into RAM from the back-end disks. That way, the next request can be fulfilled quickly from memory, without mechanical disk delays.
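As an illustration of the idea, the following Python sketch shows a simple read-ahead cache that detects sequential access and prefetches the next few blocks into memory. The class, its parameters and the read_from_disk hook are hypothetical stand-ins, not any product's real caching engine.

```python
# Illustrative read-ahead cache: if recent requests look sequential, the next
# few blocks are prefetched into RAM so subsequent reads are served from memory.
# 'read_from_disk' is a stand-in for the real back-end I/O path.
from collections import OrderedDict

class ReadAheadCache:
    def __init__(self, read_from_disk, capacity=1024, window=4):
        self.read_from_disk = read_from_disk
        self.cache = OrderedDict()          # block number -> data (LRU order)
        self.capacity = capacity
        self.window = window                # how many blocks to prefetch
        self.last_block = None

    def _store(self, block_no, data):
        self.cache[block_no] = data
        self.cache.move_to_end(block_no)
        if len(self.cache) > self.capacity: # evict least recently used block
            self.cache.popitem(last=False)

    def read(self, block_no):
        if block_no in self.cache:          # cache hit: no disk latency
            self.cache.move_to_end(block_no)
            data = self.cache[block_no]
        else:                               # cache miss: go to the back end
            data = self.read_from_disk(block_no)
            self._store(block_no, data)
        if self.last_block is not None and block_no == self.last_block + 1:
            for nxt in range(block_no + 1, block_no + 1 + self.window):
                if nxt not in self.cache:   # sequential pattern: prefetch ahead
                    self._store(nxt, self.read_from_disk(nxt))
        self.last_block = block_no
        return data
```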
Accelerate Random Writes
Significantly increases performance for workloads, such as databases and ERP systems, that often generate random write operations. These operations are among the most expensive a storage system can perform and always carry a performance penalty. The impact affects not only magnetic storage devices (HDDs) but flash-based (SSD) devices as well.
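One common way to soften the cost of random writes is to absorb them in a write-back buffer and flush them in block-address order. The sketch below illustrates that general technique in Python; the WriteCoalescer class and its flush_threshold are invented for illustration and do not describe a specific product's write path.

```python
# Illustrative write-back buffer: random writes are absorbed in memory and
# periodically flushed in block-address order, so the back end sees a far more
# sequential stream. 'write_to_disk' stands in for the real device interface.
class WriteCoalescer:
    def __init__(self, write_to_disk, flush_threshold=256):
        self.write_to_disk = write_to_disk
        self.pending = {}                    # block number -> latest data
        self.flush_threshold = flush_threshold

    def write(self, block_no, data):
        # Later writes to the same block simply overwrite the pending copy,
        # so the device only ever receives the final version of each block.
        self.pending[block_no] = data
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        for block_no in sorted(self.pending):   # sorted = near-sequential I/O
            self.write_to_disk(block_no, self.pending[block_no])
        self.pending.clear()
```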
QoS (Quality of Service)
Ensures high-priority workloads competing for access to storage can meet their service level agreements (SLAs) with predictable I/O performance. QoS Controls regulate the resources consumed by workloads of lower priority.
Without QoS Controls, I/O traffic generated by less important applications could monopolize I/O ports and bandwidth, adversely affecting the response and throughput experienced by more critical applications. To minimize contention in multi-tenant environments, the data transfer rate and IOPS for less important applications are capped to limits set by the system administrator.
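A typical way to implement such caps is a token bucket per workload. The following Python sketch shows the general idea; the IopsLimiter class and the 500 IOPS figure are illustrative assumptions, not settings from any particular system.

```python
# Illustrative token-bucket limiter: each lower-priority workload is capped at
# an administrator-defined IOPS rate, leaving headroom for critical workloads.
import time

class IopsLimiter:
    def __init__(self, max_iops):
        self.max_iops = max_iops
        self.tokens = float(max_iops)
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.max_iops,
                          self.tokens + (now - self.last_refill) * self.max_iops)
        self.last_refill = now

    def admit(self):
        """Block until the workload is allowed to issue one more I/O."""
        self._refill()
        while self.tokens < 1:
            time.sleep(1.0 / self.max_iops)  # wait for the bucket to refill
            self._refill()
        self.tokens -= 1

# Example: a reporting workload capped at 500 IOPS by the administrator.
reporting_limit = IopsLimiter(max_iops=500)
```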
Load Balancing
- Improves response and throughput
- Overcomes typical storage-related bottlenecks
- Spreads load on physical devices using different channels for different virtual disks
- Detects disk “hot spots” and transparently redistributes blocks across the pool
- Automatically bypasses failed or offline channels
Load balancing across the back-end channels into the physical storage pool complements caching to improve response and throughput. Load balancing helps to overcome short-term bottlenecks that may develop when the queue to a given disk channel is overly taxed, or when one channel fails or is taken offline.
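The sketch below illustrates the basic mechanism in Python: route each I/O to the online channel with the shortest queue and skip failed channels. The ChannelBalancer class and its hooks are hypothetical simplifications of what a real SAN path selector does.

```python
# Illustrative channel selector: each I/O is routed to the online back-end
# channel with the shortest queue, and failed channels are bypassed entirely.
class ChannelBalancer:
    def __init__(self, channel_ids):
        self.queue_depth = {ch: 0 for ch in channel_ids}
        self.online = set(channel_ids)

    def mark_offline(self, ch):
        self.online.discard(ch)              # failed/offline channels are skipped

    def mark_online(self, ch):
        self.online.add(ch)

    def pick_channel(self):
        # Choose the least-busy channel that is still online ("hot spot" avoidance).
        candidates = [ch for ch in self.queue_depth if ch in self.online]
        if not candidates:
            raise RuntimeError("no back-end channels available")
        return min(candidates, key=lambda ch: self.queue_depth[ch])

    def submit(self, io_request, issue):
        ch = self.pick_channel()
        self.queue_depth[ch] += 1
        try:
            return issue(ch, io_request)     # 'issue' is the real transport hook
        finally:
            self.queue_depth[ch] -= 1
```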
SSD (Solid-State Drive) & HDD (Hard Disk Drive)
For random disk read patterns, SSDs are said to be 25 to 100 times faster than SAS hard disk drives (HDDs), at roughly 15 to 20 times higher cost per gigabyte. In practice, SSDs substantially reduce the number of HDDs required for heavy random I/O patterns. Just one SSD PCI I/O card may yield the equivalent of 320 hard disk drives, which means over 300 times less hardware to house, maintain, cool and watch over.
In today’s storage market, solid-state drives (SSDs) are emerging as an attractive alternative to hard disk drives (HDDs). Because of their low response times, high throughput and energy efficiency per IOPS, SSDs can deliver significant savings in operational costs. However, the acquisition cost per GB for SSDs is currently much higher than for HDDs, and SSD performance depends heavily on workload characteristics, so SSDs are best deployed alongside HDDs. It is critical to choose the right mix of drives and the right data placement to achieve optimal performance at low cost. Maximum value is derived by placing “hot” data with high I/O density and low response time requirements on SSDs, while targeting HDDs for “cooler” data that is accessed more sequentially and at lower rates.
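As a rough illustration of that placement logic, the short Python sketch below routes "hot", random extents to SSD and cooler, sequential extents to HDD; the thresholds are arbitrary examples rather than vendor guidance.

```python
# Illustrative placement rule: extents with high I/O density and mostly random
# access go to the SSD tier, cooler or sequential extents go to HDD. The
# thresholds are arbitrary examples, not vendor recommendations.
def choose_tier(iops_per_gb: float, random_ratio: float) -> str:
    if iops_per_gb > 5.0 and random_ratio > 0.5:
        return "ssd"     # "hot", latency-sensitive data
    return "hdd"         # "cooler", mostly sequential data

assert choose_tier(iops_per_gb=12.0, random_ratio=0.9) == "ssd"
assert choose_tier(iops_per_gb=0.3, random_ratio=0.2) == "hdd"
```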
Thin Provisioning to optimise efficiency
Dynamic allocation of space
Using thin provisioning, applications consume only the space they are actually using, not the total space that has been allocated to them.
Designed to keep business overhead low, thin provisioning optimises efficiency by allocating disk storage space flexibly among multiple users, based on the minimum space required by each user at any given time. This not only reduces the amount of storage hardware in use but also saves electrical energy, lowers heat generation and reduces hardware space requirements.
For example, a database might be expected to grow to 100 TB but is only 10 TB today. Using thin provisioning, a storage administrator can allocate 100 TB of virtual capacity to meet expected future requirements while consuming only 10 TB of physical capacity.
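The following Python sketch illustrates the allocate-on-first-write behaviour behind that example; the PhysicalPool and ThinVolume classes, and the extent sizes, are simplified assumptions rather than a real array's metadata layout.

```python
# Illustrative thin-provisioned volume: a large virtual capacity is exposed,
# but physical extents are only drawn from the shared pool on first write.
class PhysicalPool:
    def __init__(self, total_extents):
        self.free = total_extents

    def take_extent(self):
        if self.free == 0:
            raise RuntimeError("physical pool exhausted")
        self.free -= 1
        return {}                            # dict stands in for a real extent

class ThinVolume:
    def __init__(self, virtual_extents, pool):
        self.virtual_extents = virtual_extents   # what the host sees
        self.pool = pool
        self.map = {}                        # virtual extent -> physical extent

    def write(self, extent_no, offset, data):
        if extent_no not in self.map:        # allocate only on first write
            self.map[extent_no] = self.pool.take_extent()
        self.map[extent_no][offset] = data

    def used_extents(self):
        return len(self.map)                 # physical consumption, not virtual

# 100 TB virtual volume backed by a 10 TB physical pool (1 TB extents here).
pool = PhysicalPool(total_extents=10)
volume = ThinVolume(virtual_extents=100, pool=pool)
volume.write(extent_no=3, offset=0, data=b"payload")
assert volume.used_extents() == 1            # only written extents consume space
```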
Snapshot Backup
The ability to take an image of the storage index table at a point in time, for recovery, without needing to make a full copy of the actual data.
- Capture point-in-time images quickly
- Recover quickly at disk speeds to a known good state
- Eliminate the backup window
- Provide “live” copy of environment for analysis, development and testing
- Save snapshots in lower tier, thin-provisioned storage without taking up space on premium storage devices
- Synchronise snapshots across groups of virtual disks
- Trigger from Microsoft VSS-compatible applications and VMware vCenter
Snapshots are invaluable for cloning working system images to provision identical new servers or new virtual desktops. Although snapshot utilities are commonplace in operating systems, server hypervisors, backup software and disk arrays, capturing them at the SAN level affords some major advantages. For one, there is no dependency on host software, nor do they consume host resources (the shared memory and processor pool serving your applications in VMware). And you don’t need mutually compatible disk arrays: you can snap the contents of disks on a tier 1 array and place them on a tier 2 or tier 3 device rather than tie up expensive space on top-of-the-line equipment.
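To show why a snapshot needs no full copy of the data, here is a minimal copy-on-write sketch in Python; the SnapshotVolume class is an illustrative model, not the SAN's actual snapshot engine.

```python
# Illustrative copy-on-write snapshot: the snapshot captures only the block
# map at a point in time; original block contents are preserved lazily when
# they are first overwritten, so no full copy of the data is made.
class SnapshotVolume:
    def __init__(self):
        self.blocks = {}                     # live block map: block no -> data
        self.snapshots = []                  # list of {block no -> old data}

    def take_snapshot(self):
        self.snapshots.append({})            # empty map, filled on demand
        return len(self.snapshots) - 1       # snapshot id

    def write(self, block_no, data):
        for snap in self.snapshots:
            if block_no not in snap:         # first overwrite since the snapshot
                snap[block_no] = self.blocks.get(block_no)   # may be None
        self.blocks[block_no] = data

    def read_snapshot(self, snap_id, block_no):
        snap = self.snapshots[snap_id]
        if block_no in snap:                 # block changed since the snapshot
            return snap[block_no]
        return self.blocks.get(block_no)     # unchanged: share the live copy

vol = SnapshotVolume()
vol.write(1, b"v1")
snap_id = vol.take_snapshot()
vol.write(1, b"v2")                          # triggers copy-on-write of block 1
assert vol.read_snapshot(snap_id, 1) == b"v1"
```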
Replication
- Keep distant copies up to date without impacting local performance
- Perfect for disaster recovery, business continuity or inter-site migrations
- Compressed, multi-stream transfers for the fastest performance and optimum use of bandwidth
- Test disaster recovery readiness without impacting production
- Asynchronous; bidirectional transfers are available on some systems (see the sketch below)
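The sketch below illustrates, in Python, the asynchronous and compressed transfer model described above: local writes are acknowledged immediately while changed blocks are compressed and shipped in the background. The AsyncReplicator class and its send_to_remote hook are assumptions for illustration only.

```python
# Illustrative asynchronous replication: changed blocks are queued locally,
# compressed, and shipped to the remote site on a background thread so local
# writes never wait on the WAN. 'send_to_remote' stands in for the transport.
import queue
import threading
import zlib

class AsyncReplicator:
    def __init__(self, send_to_remote):
        self.send_to_remote = send_to_remote
        self.changes = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def record_write(self, block_no, data):
        # Called on the local write path; returns immediately.
        self.changes.put((block_no, data))

    def _drain(self):
        while True:
            block_no, data = self.changes.get()
            payload = zlib.compress(data)            # reduce WAN bandwidth
            self.send_to_remote(block_no, payload)
```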
Auto-Tiering
Makes intelligent, automatic placement decisions based on cost and performance across different types of storage.
This is particularly useful when disk shelves differ, e.g. a mix of flash, solid-state, SAS or SATA drives: the most frequently accessed data is moved onto the fastest disk type.
Of course, there will be exceptions, especially when you need to assign high-performance storage to an infrequently used volume, as in special end-of-quarter general ledger processing. In these cases, you can pin specific volumes (virtual disks) to a tier of your choosing, or define an “affinity” to a particular tier. Only if that tier is completely exhausted will a lower tier be chosen.
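A minimal Python sketch of that decision logic, including pinning and affinity, is shown below; the tier names, thresholds and capacity checks are illustrative assumptions rather than any product's tiering policy.

```python
# Illustrative tier selection honouring pinning and affinity: access frequency
# normally decides placement, a pinned volume always stays on its tier, and an
# affinity is honoured unless that tier has no free capacity left.
TIERS = ["flash", "sas", "sata"]             # fastest to slowest

def select_tier(access_rate, pinned=None, affinity=None, free_capacity=None):
    free_capacity = free_capacity or {t: True for t in TIERS}
    if pinned:                               # e.g. quarter-end ledger volume
        return pinned
    if affinity:
        preferred = affinity                 # honoured unless exhausted
    elif access_rate > 1000:                 # example thresholds only
        preferred = "flash"
    elif access_rate > 100:
        preferred = "sas"
    else:
        preferred = "sata"
    # Only if the preferred tier is exhausted is a lower tier chosen.
    for tier in TIERS[TIERS.index(preferred):]:
        if free_capacity.get(tier):
            return tier
    return TIERS[-1]

assert select_tier(access_rate=5, pinned="flash") == "flash"
assert select_tier(access_rate=5000,
                   free_capacity={"flash": False, "sas": True, "sata": True}) == "sas"
```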
Volume Mirroring
Provides a single volume image to the attached host systems while maintaining pointers to two copies of the data in separate storage pools. The copies can reside on separate disk storage systems that are being virtualised. If one copy fails, the system provides continuous data access by redirecting I/O to the remaining copy. When the failed copy becomes available again, it is automatically re-synchronised.
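The following Python sketch models that behaviour: writes go to both copies, reads fail over to the surviving copy, and changed blocks are tracked so the returning copy can be re-synchronised. The MirroredVolume class is a simplified illustration, not the virtualisation layer's real mirroring code.

```python
# Illustrative mirrored volume: every write goes to both copies, reads are
# redirected if one copy fails, and writes made during the outage are tracked
# so the returning copy can be re-synchronised automatically.
class MirroredVolume:
    def __init__(self, copy_a, copy_b):
        self.copies = [copy_a, copy_b]       # dicts stand in for storage pools
        self.healthy = [True, True]
        self.dirty = set()                   # blocks written while degraded

    def mark_failed(self, index):
        self.healthy[index] = False

    def write(self, block_no, data):
        for i, copy in enumerate(self.copies):
            if self.healthy[i]:
                copy[block_no] = data
            else:
                self.dirty.add(block_no)     # remember for resynchronisation

    def read(self, block_no):
        for i, copy in enumerate(self.copies):
            if self.healthy[i]:              # redirect I/O to a surviving copy
                return copy.get(block_no)
        raise RuntimeError("both copies unavailable")

    def restore_copy(self, index):
        # The copy is back: replay only the blocks that changed while it was out.
        source = self.copies[1 - index]
        for block_no in self.dirty:
            self.copies[index][block_no] = source.get(block_no)
        self.dirty.clear()
        self.healthy[index] = True
```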
VMware Virtual Volumes (VVols)
Simplify storage management in VMware environments.
Traditionally, vSphere administrators requested assistance from the storage administrator to provision storage for new VMs. This leads to an increasingly complicated workflow that requires multiple convoluted steps and slows down provisioning. Furthermore, storage administrators have no visibility of the VMs residing on the storage they provision. As a result, they either manually record the VM-to-LUN mapping in a spreadsheet or create one datastore per VM. This makes the management of storage infrastructure for virtualised environments complex, costly and inflexible.
Storage and infrastructure administrators can gain the benefits of VVols on their legacy EMC, IBM, HDS, NetApp and other popular storage systems - including all-flash arrays.
Now enabled with VVols, vSphere administrators can ‘self-provision’ virtual volumes quickly from virtual storage pools without having to contact the storage administrator.
Continuous Data Protection & Recovery
Roll back to a point in time prior to a disaster, virus attack or other disruptive event, without having taken an explicit backup.
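Conceptually, continuous data protection keeps a time-stamped journal of every write so any earlier state can be reconstructed. The Python sketch below illustrates that journal-and-replay idea; the CdpJournal class is a hypothetical simplification.

```python
# Illustrative continuous data protection journal: every write is logged with a
# timestamp, so the volume can be rebuilt as it existed at any prior instant
# without an explicit backup having been taken.
import time

class CdpJournal:
    def __init__(self):
        self.log = []                        # (timestamp, block no, data)

    def record_write(self, block_no, data):
        self.log.append((time.time(), block_no, data))

    def rebuild_at(self, point_in_time):
        # Replay only writes made up to the chosen instant (e.g. just before
        # a virus attack) to reconstruct the volume's block map at that point.
        image = {}
        for ts, block_no, data in self.log:
            if ts > point_in_time:
                break
            image[block_no] = data
        return image
```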
Deduplication and Compression
Users and applications will sometimes hold duplicate copies of the same data, wasting storage capacity. The deduplication service analyses blocks of data looking for repetition, and replaces multiple copies of data with references to a single, compressed copy.
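A minimal Python sketch of block-level deduplication is shown below: blocks are fingerprinted with SHA-256 and duplicates become references to one compressed copy. The DedupStore class is illustrative only and glosses over hash-collision handling and persistence.

```python
# Illustrative block-level deduplication: each block is fingerprinted with
# SHA-256; repeated blocks are replaced by a reference to a single compressed
# copy already held in the store.
import hashlib
import zlib

class DedupStore:
    def __init__(self):
        self.store = {}                      # fingerprint -> compressed block
        self.refs = {}                       # logical block no -> fingerprint

    def write(self, block_no, data):
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint not in self.store:    # first time this content is seen
            self.store[fingerprint] = zlib.compress(data)
        self.refs[block_no] = fingerprint    # duplicates become references

    def read(self, block_no):
        return zlib.decompress(self.store[self.refs[block_no]])

store = DedupStore()
store.write(1, b"same payload")
store.write(2, b"same payload")              # second copy costs only a reference
assert len(store.store) == 1 and store.read(2) == b"same payload"
```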
Data Migration Tools
Storage equipment ages or fails, and it must be replaced. Data migration tools simplify the process of moving data to new equipment, avoiding many of the difficulties, delays and stresses usually associated with these moves. Often the transition can take place during normal working hours without disrupting applications.
Storage Controllers
If you are planning to utilise advanced capabilities, it is essential to get the most powerful controller available; if you go mid-range, make sure you factor in future load.
Storage arrays all have some form of a processor embedded into a controller. As a result, a storage array's controller is essentially a server that’s responsible for performing a wide range of functions for the storage system. Think of it as a storage computer.
The performance of the storage array is directly impacted (and in many cases determined) by the speed and capabilities of the storage controller, and the controller’s processing capabilities are increasingly important for two reasons. The first is the high-speed storage infrastructure: the network can now easily carry data at 10 Gigabits per second (using 10 GbE) or even 16 Gbps on Fibre Channel, so the controller needs to be able to process and perform actions on this inbound data at even higher speeds, generating RAID parity for example.
The second is drive count. The storage system may have many disk drives attached to it, and the storage controller has to be able to communicate with each of them; the more drives, the more performance the controller has to sustain. Thanks to solid-state drives (SSDs), even a very small number of drives may generate more I/O than the controller can support. The controller used to have time between drive I/Os to perform certain functions; with high quantities of drives or high-performance SSDs, that spare time is almost gone.
Automated tiering moves data between types or classes of storage so that the most active data is on the fastest type of storage and the least active is on the most cost-effective class of storage. This moving of data back and forth is a lot of work for the controller. Automated tiering also requires that the controller analyse access patterns and other statistics to determine which data should be where. You don’t want to promote every accessed file, only files that have reached a certain level of consistent access. The combined functions represent a significant load on the storage processors.
There are, of course, future capabilities that will also require some of the processing power of the controller. An excellent example is de-duplication, which is becoming an increasingly popular feature on primary storage. As we expect more from our storage systems the storage controller has an increasingly important role to play, and will have to get faster or be able to be clustered in order to keep up.
---------------------------------------------------------------------------------------------------------------
WANT TO LEARN MORE? Contact Securenet Consulting
