Microsoft Rolls Out Tools to Help Scientists (and Eventually Companies) Manage Data Deluge
an oceanographer’s workbench. So we said, ‘Let’s get involved in this. Let’s do a proof of concept.’”
Barga got a couple of interns from the University of Washington to work on the project, and by the summer of 2007, they had a working demo. The idea of the software is to help people manage the workflow between data collection and analysis—coordinating a sequence of steps to be taken with the data. It’s not a revolutionary algorithm, but it’s a way to break the process into manageable chunks that can be reused and recombined, so you don’t have to start from scratch or hire a programmer every time you want to manipulate your data in a new way. The tools are built on top of Microsoft’s Windows Workflow Foundation (making use of Microsoft SQL Server and Windows HPC Server cluster technologies), and they include advanced gaming graphics tools to display what’s going on in your data.
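To make the "reusable chunks" idea concrete, here is a minimal sketch in Python. It is not Trident's actual interface, which is built on Windows Workflow Foundation. The names `make_workflow`, `drop_missing`, `to_celsius`, and `mean_of` are hypothetical, invented for this illustration. Each step is a small function, and a workflow is an ordered chain of steps.

```python
from typing import Callable, List, Optional

# A step takes a list of readings and returns a new list of readings.
Step = Callable[[list], list]


def make_workflow(*steps: Step) -> Step:
    """Chain steps into one reusable workflow that runs them in order."""
    def run(data: list) -> list:
        for step in steps:
            data = step(data)
        return data
    return run


def drop_missing(readings: List[Optional[float]]) -> List[float]:
    """Remove gaps (None) left by a sensor that failed to report."""
    return [r for r in readings if r is not None]


def to_celsius(readings: List[float]) -> List[float]:
    """Convert Fahrenheit readings to Celsius."""
    return [(r - 32.0) * 5.0 / 9.0 for r in readings]


def mean_of(readings: List[float]) -> List[float]:
    """Collapse the readings to a single average value."""
    return [sum(readings) / len(readings)]


# The same steps recombined into two different workflows.
clean = make_workflow(drop_missing, to_celsius)
summarize = make_workflow(drop_missing, to_celsius, mean_of)
```

In this sketch, a new analysis means writing one more small step or reordering existing ones, rather than rewriting the whole pipeline.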
The reception in academia has been very positive, says Barga, a 13-year Microsoft veteran whose group now totals seven people. All told, Microsoft has put well over $1 million into Project Trident, counting the researchers’ time. The next step is to get more scientists to use the tools, and to share their work. Currently, ocean researchers from the UW, Monterey Bay Aquarium Research Institute, and other institutions are using Trident tools as part of the Neptune oceanographic project for networking the seafloor, funded by the National Science Foundation. And astronomers at Johns Hopkins University are using the software as part of their Pan-STARRS project to detect objects in the solar system that could pose a threat to Earth.
There are plenty of other efforts to build scientific data management tools, of course. Some examples are Taverna, a UK-based project specialized for bioinformatics, and California-based workflow software projects Kepler and Pegasus, developed by academics. “We collaborate with them,” says Barga. “We wanted to show you don’t have to build these systems from the ground up.”
Barga says other organizations are getting interested in all this too. “We have medical research groups and financial analyst groups talking to us about it,” he says. But many challenges remain when it comes to dealing with data. For example, Barga says, some tasks are just too big for the computer on your desk to handle. You might need to harness a whole data center, say. That’s why Microsoft is also announcing a new programming framework, called Dryad, which is specialized for doing high-performance computing across parallel and distributed systems. It could come in handy for large-scale studies that involve searching, filtering, and aggregating data on topics like social networks or broad economic trends.
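The filter-and-aggregate pattern described above can be sketched in a few lines of Python. This is an illustration of the general idea only and does not use Dryad or its API. A local thread pool stands in for a cluster, and the function names and the `topic` field are assumptions made for the example. The data is split into partitions, each partition is filtered and counted independently, and the partial counts are then merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]


def partition(records: List[Record], n_parts: int) -> List[List[Record]]:
    """Split the records into n_parts roughly equal slices."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_matching(part: List[Record],
                   predicate: Callable[[Record], bool]) -> Counter:
    """Filter one partition and count the surviving records by topic."""
    return Counter(rec["topic"] for rec in part if predicate(rec))


def parallel_topic_counts(records: List[Record],
                          predicate: Callable[[Record], bool],
                          n_parts: int = 4) -> Counter:
    """Filter and count each partition concurrently, then merge the results."""
    parts = partition(records, n_parts)
    total: Counter = Counter()
    # The thread pool is a stand-in for many machines (an assumption).
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for partial in pool.map(lambda p: count_matching(p, predicate), parts):
            total.update(partial)
    return total
```

Because each partition is processed without reference to the others, the same structure could in principle be spread across many machines, which is the kind of job the article says is too big for a desktop.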
As researchers and businesses think increasingly globally when it comes to data, you can bet there will be a big role for companies like Microsoft in providing the key tools of the trade. “Our ability to collect data will outpace our ability to analyze it,” Barga says. “Computer science is going to be driven and challenged to visualize and analyze [more] data. It’s a big enabler.”