Vertica: Getting Its Ducks in a Column
At Vertica, the company blog has a plain, seemingly straightforward name: The Database Column. But it’s actually an inside joke—and one that hints at the fundamental innovation that the Andover, MA, startup hopes will catapult it into the ranks of established database makers like Oracle, IBM, and Sybase.
Most traditional databases store data in rows; Vertica’s software stores it in columns. Exactly why that’s important, we’ll get to in a minute. But it happens to make the software well suited for “read-intensive” applications where data is frequently consulted but infrequently updated. That could mean analyzing historical stock price data, mining cell-phone usage records, or understanding the clickstreams of Internet surfers as they browse e-commerce sites. In short, these are the kinds of data operations that people these days are calling “business analytics.”
In these situations, Vertica’s system can store data using much less disk space, and respond to queries much more quickly, than traditional row-store databases. When querying clickstream data, for example, Vertica’s system is 40 to 215 times faster than traditional database management systems—indeed, “faster than anything on the face of the planet,” boasts company co-founder Andy Palmer. In other words, Vertica believes that the traditional one-size-fits-all, row-store database architecture no longer fits all.
“You might ask why we need yet another database program in what is seemingly a mature market,” says Vertica CEO Ralph Breslauer. “Well, about two-thirds of all growth in the database market is coming from the query-intensive, business-analytics side. In those areas, even a small improvement in your ability to make decisions quickly can make you more competitive.”
Palmer, a former Infinity Pharmaceuticals executive, started Vertica in 2005 with MIT computer scientist Michael Stonebraker, the main architect behind the famous INGRES and POSTGRES relational database management systems. Breslauer, who just joined the company two weeks ago, says the company has spent most of this year introducing the Vertica system to “early adopters” such as database administrators at several dozen large companies, and is busy assembling business case studies that will help sell the product to CIOs and other higher-ups starting in 2008.
Okay, I promised I’d explain more about that rows vs. columns issue. All databases are essentially tables consisting of rows and columns. Traditionally, when a database program writes data onto disks for storage, it arranges the individual records in text files according to rows. Take the following table:
1    Buderi    Bob
2    Zacks     Rebecca
3    Roush     Wade
In a row-store database, these records would be stored in a file as follows:
1, Buderi, Bob; 2, Zacks, Rebecca; 3, Roush, Wade.
But in a column-store database like Vertica’s, the data would be stored like this:
1,2,3; Buderi, Zacks, Roush; Bob, Rebecca, Wade.
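To make the two layouts concrete, here is a minimal Python sketch that writes out the three example records both ways. It is only an illustration of the idea, not Vertica’s actual on-disk format; the function names and the reading of the fields as ID, last name, and first name are assumptions made for the example.

```python
# The three example records, read here as (id, last name, first name).
# The field meanings are an assumption; the article gives only the values.
records = [
    (1, "Buderi", "Bob"),
    (2, "Zacks", "Rebecca"),
    (3, "Roush", "Wade"),
]


def row_store(rows):
    """Serialize record by record: all fields of one row sit together."""
    return "; ".join(", ".join(str(v) for v in row) for row in rows)


def column_store(rows):
    """Serialize column by column: all values of one field sit together."""
    columns = zip(*rows)  # transpose the rows into columns
    return "; ".join(", ".join(str(v) for v in col) for col in columns)


row_layout = row_store(records)
column_layout = column_store(records)

print(row_layout)
print(column_layout)
```

The data is identical in both cases; only the order in which the values land in the file changes.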
Row-store databases are considered “write-optimized” because it’s easy to add records simply by adding rows, and it’s easy to spit all the data out onto a disk in one big operation. That makes them good for transaction processing environments characterized by frequent updates; think electronic banking and airline reservation systems.
But for applications where data is queried regularly but doesn’t change as fast, such as customer-relationship management databases and electronic library card catalogs, a “read-optimized” approach is more efficient. That’s what Vertica does. Saving data in columns means that the data in each column is of a uniform type, so it can be compressed more easily and takes only about one-tenth as much space on disk. With less data to read from disk, queries can find what they need faster.
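One way to see why uniform columns compress well is run-length encoding, which replaces a run of identical neighboring values with a single value and a count. The sketch below uses it purely as an illustration: the article does not say which compression schemes Vertica uses, and the clickstream rows and field names are invented for the example.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical clickstream rows: (user_id, page, country).
rows = [
    (101, "/home", "US"),
    (102, "/home", "US"),
    (103, "/cart", "US"),
    (104, "/cart", "US"),
    (105, "/cart", "CA"),
    (106, "/home", "CA"),
]

# Column layout: each field's values sit side by side.
user_ids, pages, countries = (list(col) for col in zip(*rows))

encoded_countries = run_length_encode(countries)
encoded_pages = run_length_encode(pages)
print(encoded_countries)
print(encoded_pages)

# Row layout interleaves the fields, so neighboring values never repeat
# in this data and run-length encoding saves nothing.
interleaved = [value for row in rows for value in row]
encoded_rows = run_length_encode(interleaved)
print(len(interleaved), len(encoded_rows))
```

In this toy data the six country values shrink to two pairs when stored as a column, while the interleaved row layout has no repeated neighbors to collapse. Real savings depend on the data and on the encoding chosen.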
And in contrast to data warehousing systems from local competitors such as EMC and Netezza (which went public in July), Vertica’s software runs on any industry-standard piece of server hardware, such as machines from Dell or Hewlett-Packard. “That’s a huge differentiator against the companies with these massive, proprietary offerings,” says Breslauer. “For companies that want to bundle Vertica with a specific server running Red Hat Linux, we can do that, but we can also do the software-only piece, which our competitors cannot do.”
Ironically, while Vertica’s software is optimized to deal with loads of queries, the company itself is getting more queries from potential customers than it can handle. “The smallest problem we have right now is lead generation,” says Breslauer. “The biggest problem is growing our technical and support capabilities internally as fast as possible.”
The company also “needs to move from only being understood by database administrators and architects to being understood by CIOs and business leaders,” says Breslauer. In each vertical industry it targets, such as telecommunications and investment analysis, the company will do that by “demonstrating the business value, creating a case study, and then taking it up the business chain—showing that if you can make a decision that’s not just based on a gut feeling, the result is an improvement in the bottom line by x percent.”
If Vertica can start selling stodgy Fortune 500 executives on its unconventional approach to storing data, that will really be something to columnize about.
Update: Go Sox!