Growing Need for “Big Data” Analysis Spurs Growth in Today’s SD Conference
More than 1,100 experts who analyze “Big Data” are gathered in San Diego today for the industry’s major conference on “Knowledge Discovery and Data Mining” (KDD). The registered attendance is a 22 percent increase over the 900 who attended last year’s KDD conference in Washington, D.C.
“In this field, at least, we’re experiencing a recovery,” says Joe Milana, the global head of analytics at Opera Solutions, which was founded as a consulting firm in 2004 and has developed a specialized focus on helping big companies use predictive analytics. When the economy crashed in 2008-09, Milana says a lot of companies eliminated their data analytics services as part of their overall cost-cutting efforts. By 2010, however, he says the analytics industry was experiencing a recovery as financial institutions and corporations realized the growing importance of identifying customer trends and user patterns in the deluge of data that gets generated daily.
Opera Solutions, based in New York and Jersey City, NJ, has been a beneficiary of this resurgence, Milana says. The company has about 550 employees worldwide. Milana heads its San Diego-based analytics business, which had five employees three years ago and has doubled its headcount to more than 70 in the past eight months.
“There’s this niche in machine learning, where San Diego is the place to be, and our company explicitly moved to San Diego to tap into it,” Milana says. Among the U.S. cities with technology clusters focused on software analytics, he estimates that San Diego ranks third—behind Silicon Valley, with its regional expertise in search engine applications, and the New York area, where Wall Street’s financial sector has a seemingly insatiable appetite for data mining and analytic technologies.
San Diego’s analytics and machine learning community seems to extend beyond search applications, and is oriented more toward Main Street than Wall Street. Many local companies help big businesses take advantage of the terabytes of data their customers generate each month. More than 100 companies in the San Diego area are working on predictive analytics, Milana says, making it one of the few cohesive communities in San Diego’s fragmented software industry.
Opera Solutions sees new opportunities, Milana says, for advanced analytics with banks, in basic retail, and within healthcare—wherever companies have a lot of information about their customers. The most advanced technologies tend to be used in banks, where “they’re all about balancing risks,” Milana says.
For example, banks use Opera Solutions’ predictive analytics technology to help them decide whether to issue loans to customers with no credit history, by using “unstructured” Web data, i.e., data that has not been formatted in a database. Milana says the company usually customizes its technology to meet customers’ needs. A nationwide auto reseller, for instance, uses Opera Solutions’ analytics to determine the price that a used car of a particular make and model is likely to fetch in different cities throughout the United States. “We also have our fingers in fraud detection, in which accounts get compromised,” as well as in cyber security, “where again it’s searching for outliers and anomalous activity,” Milana says.
“A lot of time, the need drives the application,” Milana says. “Looking at healthcare, for example, there is a lot of data that’s not fully labeled [in contrast to ‘structured’ data, in which each field has a specific meaning and value]. So you need to make a system that’s sort of focused on anomaly detection, that compares and contrasts. In the case of banks, you’re trying to make a prediction on whether someone will or won’t pay back their loan.”
A substantial part of the KDD 2011 conference is focusing on expanding the use of unstructured data, Milana says, including natural language processing, which analyzes data available in text messages, Twitter feeds, and blogs. The conference, which ends Wednesday, also has sessions focused on social media analytics, Internet ad systems, and bioinformatics. Local speakers include Paul Rejto, Pfizer’s San Diego-based director of computational biology in oncology research, and Dan Steinberg of San Diego-based Salford Systems, who will discuss how predictive modeling and marketing optimization can be used to improve retail sales.
The conference, as Milana puts it, “sits between academia and industry,” and this year includes an industrial practice expo that is intended to highlight the latest issues and implications of data mining and predictive modeling. He adds, “What people are talking about the most is one of the reasons people go to the conference.”