Greenplum Purchase Gets EMC into the Big Data Game
[Corrected July 8, 2010, 10:20 a.m.; see below] Boston is already a powerhouse in “big data.” It’s home to companies like Netezza, Dataupia, Vertica, and Lightwolf Technologies, which all help enterprises manage and mine the huge databases used in business intelligence applications. It was the site of the first “Boston Big Data Summit” last fall. And now, with the acquisition of San Mateo, CA-based Greenplum by Hopkinton, MA-based EMC, the region will be even bigger into big data.
Greenplum is probably best known as the provider of the multi-petabyte data warehouse that auction site eBay formerly used to analyze the behavior of site visitors. EBay users generate a reported 150 billion individual event records per day as they skim the site and place bids. That’s information eBay can use to optimize the site’s performance and serve customers better—but doing so requires sifting through trillions of records overall. This huge task requires a massively parallel processing approach, which is what Greenplum’s database software, built on top of the open-source Postgres object-relational database system, is optimized to do. [Update and correction: Oliver Ratzesberger, who is in charge of the analytics platform at eBay, wrote to say that the company now uses a different technology for analytics.]
The main difference between Greenplum’s technology and other database software has to do with how data is accessed. In traditional database management systems built by companies like Oracle and Microsoft, different query processing jobs generally share access to the same disks, which can slow down individual queries. But Greenplum’s so-called “shared-nothing” system divides data across multiple servers, or segments, each of which has its own connection to a disk drive. That means a single database query can be run against many segments of data simultaneously, which suits the analytics applications run by Greenplum customers like eBay, Fox Interactive Media, NASDAQ, the New York Stock Exchange, Skype, and T-Mobile.
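To make the shared-nothing idea concrete, here is a minimal Python sketch of the general pattern. It is not Greenplum's implementation. Rows are assigned to segments by hashing a key, each segment counts events using only its own slice of the data, and a coordinator merges the partial results. The segment count, the function names, the sample events, and the use of threads as stand-ins for separate servers are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_SEGMENTS = 4  # hypothetical number of segments, chosen for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Pick the segment that owns a row by hashing its distribution key."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments=NUM_SEGMENTS):
    """Split rows into per-segment lists; each row lives on exactly one segment."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row["user"], num_segments)].append(row)
    return segments


def local_count(segment_rows):
    """Work done by one segment, touching only its own slice of the data."""
    return Counter(row["user"] for row in segment_rows)


def parallel_event_counts(rows, num_segments=NUM_SEGMENTS):
    """Fan the query out to every segment, then merge the partial results."""
    segments = distribute(rows, num_segments)
    # Threads stand in for independent servers in this sketch.
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_count, segments))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Made-up sample data, not real site events.
events = [
    {"user": "alice", "action": "view"},
    {"user": "bob", "action": "bid"},
    {"user": "alice", "action": "bid"},
    {"user": "carol", "action": "view"},
    {"user": "bob", "action": "view"},
    {"user": "alice", "action": "view"},
]

counts = parallel_event_counts(events)
```

Because every row for a given user hashes to the same segment, no segment needs to read another segment's data to produce its partial count. That independence is what lets the per-segment work run at the same time.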
Announced Tuesday, the all-cash acquisition of Greenplum (terms weren’t given) means that EMC will now have a data computing product division that allows it to compete directly with suppliers of large-scale data warehousing systems like Netezza and Vertica, not to mention database giants like Oracle and Teradata. It’s perhaps surprising that EMC would reach all the way to California for an acquisition in the big-data sector, given that there were several options within Route 128. But it’s easy to understand why EMC would want to be a player in this area, considering that …