What’s After Big Data? Niche Analytics, Data Wrangling, Smart Storage

8/21/14

Big data is a “hackneyed term,” said Michael Stonebraker. “I try hard not to use it.”

It was wintertime when I sat down with a few database experts in Boston to talk shop. Stonebraker, an MIT professor and entrepreneur, is one of those graybeards who was working in big data long before it was called big data—and will likely be doing so long after the term has faded.

In hindsight, his remark was a clear sign that the marketing hype around “big data” had peaked. Everyone was using the term, and no one seemed to know what it really meant—or how it could benefit mainstream businesses and reward data-savvy entrepreneurs.

The premise of big data, at least, is easy to grasp: more and more information is being collected, stored, and analyzed, from click streams to sales records to mobile-device locations. What hasn’t been easy is translating all that data into insights that help organizations make better decisions. That goes for retail, finance, healthcare, marketing, wireless, Internet commerce—name the industry and you’ll hear the lament that corporations aren’t fully capitalizing on their digital assets.

The underlying reason is that “big data” as a technology area has been a mirage. There’s no magic button, only myriad software techniques that may or may not work for problems specific to particular industries.

But a recent wave of startups has identified new classes of problems, showing where big-data capabilities are heading in the next few years. “It’s really not about big data. It’s about the most useful data,” says Andy Palmer, a co-founder (with Stonebraker) of Vertica Systems and Tamr, both data-related companies. He’s focused on giving companies the ability to access the information that’s most relevant, often hidden, and is “high-quality enough to answer compelling questions.”

Tamr, where Palmer is currently CEO, is working on “data curation”: software that helps organizations understand and connect their many different data sources and formats. The idea is to use a combination of statistics and human experts to show customers how their records are interrelated, identify redundancies and errors, and scrub the data so it can be used effectively. The Cambridge, MA-based startup has done pilot tests with Novartis, Thomson Reuters, and other enterprises.
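To make the “statistics plus human experts” idea concrete, here is a minimal sketch in Python. It is not Tamr's actual method, which the article does not describe in detail. It assumes records are plain name strings, uses a simple string-similarity score from the standard library, and the function names and both thresholds are hypothetical choices made for illustration. High-scoring pairs are treated as likely duplicates, and borderline pairs are set aside for a person to review.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def triage_pairs(records, auto_threshold=0.9, review_threshold=0.6):
    """Compare every pair of records and sort the pairs into two buckets.

    Both thresholds are illustrative assumptions, not tuned values.
    Returns (auto_matches, needs_review); pairs scoring below
    review_threshold are treated as unrelated and dropped.
    """
    auto_matches, needs_review = [], []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = similarity(records[i], records[j])
            if score >= auto_threshold:
                # Confident enough to flag as a redundancy automatically.
                auto_matches.append((records[i], records[j]))
            elif score >= review_threshold:
                # Ambiguous: hand the pair to a human expert.
                needs_review.append((records[i], records[j]))
    return auto_matches, needs_review


if __name__ == "__main__":
    names = ["Novartis", "NOVARTIS ", "Novartis AG", "Thomson Reuters"]
    auto, review = triage_pairs(names)
    print("auto-merge candidates:", auto)
    print("send to human review:", review)
```

Comparing every pair is fine for a handful of records but grows quadratically, so a real system would need some way to narrow the candidate pairs first.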

There are broader terms for this sort of unsexy software—data wrangling, plumbing, “munging,” or janitor work—but the goal is a real one: to help businesses make better decisions faster, and save money. And a market for such services seems to be emerging: other startups vying for a piece of the pie include Trifacta, Paxata, and ClearStory in data preparation, and Attivio and Bedrock Data in data integration.

Bedrock Data, for example, has developed software that “synchronizes” data across different business systems, such as customer relationship management, e-mail, marketing, and finance; the idea is to break down barriers between departments and make sure different teams’ records are consistent with each other. Meanwhile, the data-prep companies, including Tamr, are making tools meant to automate the traditional, labor-intensive “extract, transform, and load” (ETL) process used to prepare data for data warehouses.
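For readers unfamiliar with ETL, the three steps can be sketched in a few lines of Python. This is only an illustration of the general pattern, not any vendor's product: the sample CSV data, the `sales` table, and the function names are all made up for the example, and the cleanup rules (trim and lowercase e-mail addresses, drop rows with an unparseable amount) are assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from one business system.
RAW_CSV = """email,amount
 Alice@Example.com ,10.50
bob@example.com,not-a-number
alice@example.com,4.5
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize e-mail addresses and skip rows with bad amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # In this sketch, unparseable amounts are simply dropped.
        clean.append((row["email"].strip().lower(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (email TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order."""
    load(transform(extract(text)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(RAW_CSV, conn)
    query = "SELECT email, SUM(amount) FROM sales GROUP BY email"
    for row in conn.execute(query):
        print(row)
```

The labor-intensive part in practice is the transform step, since every source system tends to need its own cleanup rules, which is the work the data-prep tools aim to automate.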

But once the data is cleaned up and shared, how do companies actually make sense of it all? That’s a separate story, and it lies in the domain of analytics.

The field has seen a lot of consolidation and investment in recent months, with big players such as Intel, Hewlett-Packard, and Teradata buying into companies including Cloudera, Hortonworks, and Hadapt. A particularly hot sector has matured around Hadoop, an open-source analytics software platform. Many tech companies are writing software to make Hadoop industrial strength and integrate it with new and existing types of databases.

As Palmer sees it, analytics is increasingly moving into vertical industries and niche applications. RStudio, led by JJ Allaire and based in Boston, is one of the emerging leaders, though it’s hard to understand what the company does if you don’t use R, an open-source language for data scientists. Suffice it to say, RStudio makes tools for large-scale statistical analysis, and the kinds of companies that use R include Bank of America, Facebook, Ford, Google, Uber, and Zillow.

With more targeted analytics tools, big businesses can collect data from new sources, such as sensors or social media, and start to squeeze useful insights from them. “Enterprise companies need to take a page from Internet companies,” Palmer says. “They need to get more analytical.”

Some examples of niche approaches in analytics: Vast, based in Austin, TX, is tackling Web search and analysis in the automotive, real estate, and travel markets. In the Seattle area, Algorithmia, which just raised a $2.4 million venture round, runs an online marketplace for number-crunching algorithms, while Context Relevant makes predictive analytics software for the financial sector. And FarmLink, based in Kansas City, MO, has just raised a $40 million round to advance analytics for farmers.

Meanwhile, back in Boston, the startup Quant5 specializes in analysis tools for marketing purposes, and Recorded Future tries to predict world events—things like civil unrest, terrorist attacks, and other security threats—by analyzing social media and Web documents for companies and government agencies.

Indeed, Doug Levin, the CEO of Quant5 and founder of Black Duck Software, says “enabling data-driven decisions in corporations is one of today’s most significant technology trends.” He points out that what companies can do with data has moved far beyond the notions of big-data analytics from the past few years; analytics certainly isn’t new, but the kinds of analysis that can be done and the types of data that can be accessed are changing.

Which leads us to one more big trend in data, and perhaps an unexpected one: storage is hot again. Not the commodity storage systems—disks, flash drives, appliances—though those are still a huge business. Rather, a number of well-funded startups are pursuing new kinds of storage software that give corporate users more intelligence about their data.

Take Actifio, a Boston-area company that has raised more than $200 million to try to win the “copy data” storage market—systems that companies use to manage multiple versions of their data that exist for different purposes. The firm started out with the idea of separating data backup and protection from the storage layer. But once customers use Actifio’s software for backup, they find they can use the same software to unify their stored data so there’s effectively one golden copy of everything. That’s the idea, anyway.

What’s interesting is that Actifio is trying to save companies money on traditional storage and software, which takes away business from giants like EMC and IBM. But Actifio is solely about data management; it doesn’t really touch analytics or business intelligence.

For that, you have to consider DataGravity, which represents another interesting evolution of data storage. The Nashua, NH-based startup has raised some $42 million from venture investors to create a new storage architecture that could give businesses new visibility and insights into their data.

DataGravity is trying to “extract information from storage,” says CEO and co-founder Paula Long. The company’s product, just announced this week, looks like a regular storage system to the user. But the software that goes with it can “see” into an organization’s files and track all interactions with the data—who accessed or contributed to a particular file and when, what they did with it, whom they worked with, and so on. The software provides charts and visualizations to help users drill down into the data and keep tabs on it.

“Before, you could just see the file name. Now you can do an MRI on it,” says Long, who previously co-founded EqualLogic, which was acquired by Dell in 2007.

The goal—a familiar one by now—is to give IT and business users a deeper understanding of corporate data that can help them make better decisions. DataGravity’s beta customers include companies in the tech, legal, retail, and healthcare industries. “We believe storage should strategically participate in your business, not just support it. It’s not just a container,” Long says. She adds that storage is going through a “transformative moment” as it enters the information and analytics age.

And she seems to agree that big-data technologies have moved beyond the realm of geeks and into helping mainstream users solve real business problems. For DataGravity, that means getting a better grip on all the information that resides in a company’s network and files—without trotting out an old buzzword.

“You didn’t have to understand big data, you didn’t have to program anything,” Long says about her firm’s customers. “We’re not going after the big-data space.”

Gregory T. Huang is Xconomy's Deputy Editor, National IT Editor, and the Editor of Xconomy Boston. You can e-mail him at gthuang@xconomy.com.

  • Ronnie Corvid

    “Useful data” – that would be “information” then, which anyone who’s read pretty much any Computer Science / Information Science 101 book in the last 30 years would be able to tell you. See this for details: http://en.wikipedia.org/wiki/DIKW_Pyramid and then, er, perhaps read a book.