Data Domain Founder, Kai Li, on EMC Acquisition and the Future of Data Storage
to data center customers in remote office situations, where you reduce the amount of data you have to move from your source to a data center.
Symantec and CommVault recently introduced deduplication technology in their backup software products. Several players build deduplication storage into virtual tape library systems; the idea is to make a disk system look like tape, so you can roll a new storage system in easily. Diligent [acquired by IBM last year] is one. Another player is Quantum, whose deduplication storage systems started shipping two or three years ago. Data Domain started selling products in 2003; we were the first company to sell a deduplication storage system. Meanwhile, HP has developed its own deduplication product for low-end, remote offices. And Dell is also planning to sell deduplication storage systems.
X: What is Data Domain’s—and now EMC’s—main competitive advantage in this space?
KL: Data Domain has been very customer focused; we make a product that’s very easy to use. And Data Domain’s technology has been superior to competitors’. One reason our technology is better is that our software architecture was designed with parallelism built in from Day One. We were betting on multicore CPUs, rather than on many disks, many spindles, to achieve scalable throughput. As long as Intel and others keep making progress on CPUs, our technology can translate the increasing CPU power into increasing deduplication throughput. You can see that in Data Domain’s product line: the current product runs at 700 megabytes per second, while the previous one, from the year before, ran at 350 megabytes per second, and so on. It’s essentially on a Moore’s Law curve.
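The core loop Li alludes to is fingerprint-based deduplication: split incoming data into chunks, hash each chunk, and store only chunks never seen before. Because the hashing is CPU-bound, throughput tracks processor progress. Below is a minimal sketch under simplifying assumptions; it uses fixed-size chunks for brevity, whereas commercial systems typically use variable-size, content-defined chunking.

```python
import hashlib

# Illustrative sketch of fingerprint-based deduplication (not Data Domain's
# actual implementation). Fixed-size 4 KB chunks are an assumption made to
# keep the example short.
CHUNK_SIZE = 4096

def deduplicate(data: bytes, store: dict) -> int:
    """Add `data` to `store`; return bytes of new (unique) chunks stored."""
    new_bytes = 0
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).digest()  # CPU-bound step: scales with CPU speed/cores
        if fp not in store:
            store[fp] = chunk
            new_bytes += len(chunk)
    return new_bytes

store = {}
backup = b"".join(bytes([i]) * CHUNK_SIZE for i in range(3))  # three distinct chunks
first = deduplicate(backup, store)   # first backup: every chunk is new
second = deduplicate(backup, store)  # identical second backup: nothing new stored
```

Running the same backup twice stores 12,288 bytes the first time and zero the second, which is why repeated backups of slowly changing data compress so dramatically.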
X: So, faster, cheaper, and more efficient storage and backup. How will this affect the data storage industry more broadly?
KL: Deduplication is going to reshape the storage industry. If you look at storage media, we currently see a hierarchy, where the bottom is probably tape. With deduplication, tape use will be substantially reduced; maybe in time it will disappear. The next level is high-density disks, such as SATA disks, which are 1.5 terabytes for $200. Then come fiber channel disks: high rotations per minute, not a lot of density, but you can use them to run a higher number of transactions per second with a database system. Then, solid state disks. Across those four kinds of storage media, the cost differs by a factor of 3-5 between each level.
High-density magnetic disks will stay because they’re inexpensive, and we have a lot of data to put there. But if deduplication storage technology can be applied to solid state disks and compress the data by a factor of 3-5, we can reduce their cost to that of fiber channel disks. When that happens, fiber channel disks may disappear. So deduplication is impacting the storage community in multiple directions. It may be going into primary storage systems. But a lot of work needs to be done.
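The cost argument Li makes can be checked with rough arithmetic. The prices below are illustrative, normalized numbers, not quotes; the point is only that a 3-5x tier gap cancels against a 3-5x deduplication ratio.

```python
# Rough arithmetic behind the tier argument: each storage tier costs
# roughly 3-5x the one below it, so deduplication that shrinks data 3-5x
# on solid state disks brings their effective cost per usable byte down
# to fiber channel levels. All figures are illustrative assumptions.
sata_cost = 1.0               # normalized $/GB for high-density SATA
fc_cost = sata_cost * 4       # fiber channel: ~3-5x SATA
ssd_cost = fc_cost * 4        # solid state: ~3-5x fiber channel

dedup_ratio = 4               # deduplication compresses data by ~3-5x
ssd_effective = ssd_cost / dedup_ratio

print(ssd_effective, fc_cost)  # 4.0 4.0 -> deduped SSD ~= raw fiber channel
```

With a dedup ratio matching the tier gap, the deduplicated SSD tier lands at the same cost per usable gigabyte as raw fiber channel disks, which is the scenario in which Li suggests fiber channel may disappear.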
X: What is Data Domain’s biggest challenge going forward?
KL: The general question is how to apply deduplication to other use cases. So far, the primary use case has been backup and disaster recovery. Data Domain has moved into nearline and archival storage use cases. There’s still a lot of work to be done, though; how to attack those markets, and others, is the general question. The value proposition is very clear: it’s essentially translating computing power into storage and bandwidth reduction. Computing power is getting cheaper and better. Can we translate that into less storage, and less network bandwidth needed? We are on a roadmap to do better, year after year.