MongoDB Wizards Work to Make 10gen the Red Hat of Databases

9/27/11

Database designers are the secret wizards of the Web revolution, and they’ve been busy writing new spells. Deep inside the castle at each of today’s leading Web companies there’s at least one custom “NoSQL” database keeping things running: Google has Big Table, Amazon has Dynamo, Facebook built Cassandra, and LinkedIn has Project Voldemort. Seriously.

In fact, the Harry Potter allusion is apropos: just as Harry’s wizarding world was riven by conflict, there’s a revolution underway in database architecture. For decades, the relational database systems developed by stalwarts like IBM served as the foundation of business computing, but today they’re being displaced by non-relational databases, like Big Table and Dynamo, which are better suited for large-scale computations of the sort that big consumer-facing Web services carry out all day.

We’ve seen a movie like this before. In the 1990s and early 2000s Linux hackers created an operating system to rival Windows and other established players, eventually claiming enough of the corporate computing market to turn Linux support startup Red Hat into a juggernaut with $1 billion in annual revenues. Today several of the companies building NoSQL databases—including New York- and Redwood Shores, CA-based 10gen, Mountain View, CA-based Couchbase, Burlingame, CA-based DataStax, and Menlo Park, CA-based Neo Technology—see the chance to steal some of the $20 billion corporate database market away from Oracle, IBM, and Microsoft.

Meanwhile, venture backers are jumping into the sector with gusto. Couchbase raised $14 million in August, and this month DataStax and Neo received $11 million apiece. On September 12, 10gen, which had previously raised about $11 million from Flybridge Capital and Union Square Ventures, announced that it had obtained $20 million more in a round led by Sequoia Capital.


This is clearly a company to watch—so I was pleased to host Dwight Merriman, 10gen’s co-founder and CEO, and Erik Frieberg, its new vice president of marketing and alliances, here at Xconomy San Francisco when they were in town a few weeks ago.

At the moment, the NoSQL market is still small—fewer than 1,000 companies subscribe to support and monitoring services for MongoDB, the free, open-source database that 10gen created and shepherds. But MongoDB itself is downloaded 100,000 times a month, and it’s already used inside Craigslist, Disney, Foursquare, IGN, Intuit, MTV, Shutterfly, and thousands of smaller organizations. (Merriman says 10gen doesn’t have an exact count of MongoDB users, since they don’t have to give their names to download the software: “We only know a fraction of them, which is normal in open source.”)

“There is an opportunity for somebody to create the Red Hat of databases in this space,” Merriman says. The NoSQL market, he says, “will be 100 times bigger than it is today.”

Merriman has an illustrious history as a serial tech entrepreneur. He’s best known as co-founder and chief technology officer at DoubleClick, the New York-based Web banner advertising company acquired by Google in 2007 for $3.1 billion. With business partner Kevin Ryan, who was CEO at DoubleClick, Merriman started AlleyCorp, a network of Silicon Alley companies that you’ve probably heard of: Gilt Groupe, The Business Insider, and ShopWiki. In 2007 Merriman co-founded 10gen—another AlleyCorp company—together with Eliot Horowitz, who is the database company’s chief technology officer.

Their first project was writing MongoDB, which departs in significant ways from the relational databases that Web application developers had always used in the past. “In a way, MongoDB is the kind of database I always wanted to have at DoubleClick,” Merriman says. “There were certain kinds of problems that kept coming up. There were scaling problems with the data, and also it was a pain to write apps using relational databases. It seemed like it could be easier.”

The basic problem with old-fashioned relational databases is that they weren’t designed for the huge datasets, unpredictable loads, or rapid software development cycles commonplace in the Web world. (If you’re a programmer, you might want to skip the next few paragraphs, as I’m about to give a painfully dumbed-down explanation.) For years, the Web world was centered around a set of free, open-source components known as the LAMP stack, for Linux (the underlying operating system running most Web servers), Apache (the open-source Web server program), MySQL (the free relational database management system developed by Swedish startup MySQL AB, now owned by Oracle), and Perl/PHP/Python (the most common Web programming languages). Relational databases can be fast and powerful, and MySQL is still the database of choice for many common Web applications, such as the Drupal, Joomla, and WordPress publishing systems. But more and more developers over the last half-decade have been coming up against MySQL’s limitations.

One of them is scaling. Relational databases can consist of thousands of related or “joined” tables where the data in one column of Table A might be defined by the rows and columns of Table B, which might contain columns that only make sense in relation to Table C, et cetera. Unless all of these tables live on one machine, it’s hard to keep them joined, and it’s hard to make sure that each transaction (each query or read-write operation) is completed consistently. That makes it difficult to add server and storage capacity quickly if your Web application happens to become popular. And if you’re rewriting your application and you discover that you need to add a new category of data to the existing tables, well, good luck. Before Craigslist adopted MongoDB, adding one new column to its MySQL database would take three months of continuous computing time, according to 10gen’s Frieberg.

“It was the imperative to scale that started this space,” says Merriman. “At the app server level, it’s easy to have 100 servers with load balancers distributing the work. But it’s very hard to have a single database running across all 100 servers. Distributed joins becomes a very hard problem, and distributed transactions becomes a very hard problem. The way Big Table and MongoDB solve this problem is by saying we are not going to do those two things. That means some stuff is left out, but you can still cover a very large set of use cases.”
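
For programmers who would like to see the idea rather than read about it, here is a minimal Python sketch of why self-contained records are easy to spread across machines. It is a generic hash-partitioning toy, not MongoDB's actual sharding mechanism: the `servers` list, `NUM_SERVERS`, and the `put` and `get` helpers are all invented for illustration, and each "server" is just an in-memory dictionary.

```python
import hashlib

NUM_SERVERS = 4  # hypothetical cluster size, chosen only for this sketch
servers = [dict() for _ in range(NUM_SERVERS)]  # each dict stands in for one server


def server_for(key):
    """Pick a server index from a stable hash of the record's key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SERVERS


def put(key, record):
    """Store a whole record on the one server that owns its key."""
    servers[server_for(key)][key] = record


def get(key):
    """Read a record back; only one server is consulted, so no distributed join."""
    return servers[server_for(key)].get(key)


put("order:1", {"customer": "Ann", "items": [{"sku": "A-100", "qty": 2}]})
put("order:2", {"customer": "Bob", "items": [{"sku": "B-200", "qty": 1}]})
```

Because every read and write here touches a single key on a single server, nothing in the sketch needs a join or a transaction that spans machines, which is the trade-off Merriman describes.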

In MongoDB, as Merriman explains it, there are no tables, rows, or columns. It’s a document-oriented database, meaning it consists of collections of documents, with each document containing an arbitrary number of fields. Programmers can add fields to documents as needed, without affecting other documents in the collection. With this kind of data storage, chunks of the database can be distributed safely across different servers, because related data is always pre-joined inside individual documents—just as it is in the real world.
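
To make that concrete, the following Python sketch models a collection as a plain list and each document as a dictionary. It does not use MongoDB or any of its drivers; the `listings` data, the field names, and the `find` helper are assumptions made up for this example.

```python
# A "collection" modeled as a plain list of dicts (not a real MongoDB collection).
listings = [
    {"title": "Road bike", "price": 250},
    {"title": "Sofa", "price": 80},
]

# Add a new field to one document only. There is no table-wide schema to alter,
# and the other document is left exactly as it was.
listings[0]["neighborhood"] = "Mission"


def find(collection, field, value):
    """Return documents whose field equals value, skipping documents that lack it."""
    return [doc for doc in collection if doc.get(field) == value]
```

The one cost visible even in this toy version is that code reading the collection has to tolerate documents that do not carry a given field, which is why `find` uses `doc.get` rather than indexing directly.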

“A classic example from the relational database world is, say we want to store customer orders,” says Merriman. “So you’d have an order table, and a line-item table, and for a given order there might be 10 items in the line-item table. But if you’re doing something more complex, like adding a ‘ship to’ and ‘bill to’ address, all of a sudden you might have 20 or 30 tables representing a single order. When you do a query, you have to go find 30 pieces of information, and if they’re not on a single machine, bringing them together is very hard. But in MongoDB, you would have order documents, with one document per order, and that’s the end. It’s all in one place already.”
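
Merriman's order example can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not code from 10gen: the three "tables" are plain lists of rows, the names (`orders`, `line_items`, `addresses`, `load_order_relational`) are hypothetical, and a real order would involve far more than three tables.

```python
# Relational style: one order is spread over several tables (lists of rows here).
orders = [{"order_id": 1, "customer": "Ann", "ship_to_id": 10, "bill_to_id": 11}]
line_items = [
    {"order_id": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "sku": "B-200", "qty": 1},
]
addresses = [
    {"address_id": 10, "city": "Boston"},
    {"address_id": 11, "city": "New York"},
]


def load_order_relational(order_id):
    """Reassemble one order by matching rows across all three tables (a join)."""
    order = next(o for o in orders if o["order_id"] == order_id)
    by_id = {a["address_id"]: a for a in addresses}
    return {
        "order_id": order["order_id"],
        "customer": order["customer"],
        "ship_to": {"city": by_id[order["ship_to_id"]]["city"]},
        "bill_to": {"city": by_id[order["bill_to_id"]]["city"]},
        "items": [
            {"sku": li["sku"], "qty": li["qty"]}
            for li in line_items
            if li["order_id"] == order_id
        ],
    }


# Document style: the same order stored already joined, as one nested document.
order_document = {
    "order_id": 1,
    "customer": "Ann",
    "ship_to": {"city": "Boston"},
    "bill_to": {"city": "New York"},
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
}
```

Both approaches end up with the same order, but the relational version has to visit every table to build it, while the document version is a single lookup that can live on a single machine.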

MongoDB and its cousins are called NoSQL databases because they don’t necessarily depend on the old structured query language used to retrieve data from MySQL and similar databases. But a better term would be “non-relational” databases, Merriman says, since it’s the self-contained nature of each document that makes them so scalable.

Interestingly, however, Merriman and Horowitz didn’t originally think of 10gen as “the MongoDB company.” When they set out in 2007, they wanted to build a service that hosted other Web companies’ software and provided a complete infrastructure stack; MongoDB was just the database layer in that stack. Today that kind of offering is called a “Platform as a Service” (PaaS), and it’s a niche being exploited successfully by companies like Heroku, Cloud Foundry, Microsoft (with Azure) and Red Hat (with OpenShift). But back in 2007, it was too early—companies weren’t ready to hand over their business-critical functions in this way. “It’s been a slow curve to maturity in PaaS,” says Merriman. “Five years from now, you will see a lot more of that.”

After less than a year, 10gen shifted gears to focus on the data layer of its platform. “We had beta users who were getting really excited about MongoDB,” says Merriman. “They were asking us ‘Can I use this elsewhere, outside the stack?’ We said, ‘Gee, that could make sense—it’s half of what we’re building anyway. So let’s make something that can run anywhere.’”

10gen released the first open-source version of MongoDB at a 2009 NoSQL meetup in San Francisco, organized by hosting provider Rackspace and streaming music startup Last.fm. For Merriman and Horowitz, life morphed into a series of speaking engagements at Ruby, Java, PHP, and Python conferences, where they’d tout MongoDB’s scalability and 10gen’s services. Following a script established by Red Hat, Acquia, and other companies in the open source world, Merriman and Horowitz decided that 10gen’s best strategy was to keep MongoDB free, continue to improve its root code, and try to make money selling services like deployment support, application development consulting, and training.

Merriman says his big worries during this period were the typical ones shared by all tech entrepreneurs: “Will anyone ever use it? Will they like it? Will it be big companies or just small companies? Will they pay us for services? Is our market share good?”

But once a few big outfits like Shutterfly and Craigslist started using MongoDB, word about 10gen began to spread organically among developers. At first, the database mainly appealed to startups where developers “didn’t want to buy these $100,000 Sun boxes” to run their Web applications, but preferred to spread them across lots of cheap servers, Merriman says. And that’s where 10gen’s core community still lies: “You have developers voting with their time and downloading the product and advocating its use at meetups around the world,” says Luis Robles, a Sequoia Capital partner who handled the firm’s recent investment in 10gen. But now larger enterprises like Intuit, which is rolling out a variety of mobile and cloud-based personal financial services, are also dipping their toes in the NoSQL water. “The use cases for big data span across many different verticals: Web companies, media companies, gaming companies, medical, telcos,” says Robles.

Today, MongoDB is the best-known NoSQL database, at least judging from Google’s statistics—searches for MongoDB are about twice as common as searches for the closest competitor, CouchDB. To meet the demand for support and ongoing development of MongoDB, the company has expanded bicoastally; it now has 80 employees, half in New York and half in Redwood Shores. (Merriman says it’s interesting doing business from both places: “As a pure technology company, we’re an outlier in New York, but in Redwood Shores it’s completely normal.”)

I asked Merriman when he thought 10gen would hit its inflection point—when the gradually rising adoption curve would begin to look more like a hockey stick. “In theory, that is happening right now,” he says. “Adoption is great. Customers are super happy. Enterprises are using it. People are willing to pay.” The NoSQL market is still small compared to the overall database market, but within three years, Merriman says, it will be clear whether 10gen was a “home run” for its investors. “There are a whole bunch of risk gates that we have already gone through. The thing I focus on now is execution—doing a good job for our customers and users and not messing that up.”

What would “messing up” mean for 10gen? “A bad bug would be a perfect example of bad operational execution,” Merriman says. “Also, bad marketing, or bad sales, or bad hiring. The next 200 people that we hire—are they engaged? Can we get good people?”

Fortunately for 10gen, there aren’t many other places yet where young NoSQL wizards can go for a Hogwarts experience. “The good thing is that it’s a cool product to work on,” says Merriman. “If you want to work at a pure tech company, this should be one of the more interesting ones.”

Wade Roush is a contributing editor at Xconomy.
