How Metatomix is Bringing the Semantic Web to Life in Law Enforcement

3/26/08

In 2005, nine-year-old Jessica Lunsford was raped and murdered by a known sex offender who did not inform the police that he had moved to her neighborhood in Homosassa, FL, and who had been released earlier by a state court that did not have access to parts of his criminal record. Florida lawmakers responded to the tragedy with a new law, since copied by many other states, that requires permanent electronic monitoring for many sex offenders and stiffens penalties for those who don’t report their locations. The law also created a task force to recommend better ways for courts and law enforcement agencies to share data on criminal suspects, convicts, and parolees. And now new information-integration technologies being adopted by the Florida court system to meet that task force’s mandate are putting the spotlight on a Dedham, MA, company called Metatomix.

The company’s system, which helps judges and law enforcement officers piece together records drawn from disparate databases managed by competing agencies, is one of the first successful implementations of so-called “semantic Web” technology that World Wide Web inventor Tim Berners-Lee and other computer scientists have been advocating for years. And it’s working, say engineers at Metatomix, because it doesn’t try to transform the entire Web, but is narrowly focused on the specific problem of managing criminal records.

The big idea behind the semantic Web is to tag raw data with detailed descriptions or “metadata” that explain what the data is about and how it should be used; in theory, automated software can then recognize the data and reuse it in more intelligent ways. The project is widely considered laudable but impractical and unlikely to be realized in the near term, in part because it’s so hard to come up with a single, standard framework of descriptions (called an “ontology”) that could be applied to the entire Web.
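To make the tagging idea concrete, here is a minimal sketch in Python using the open-source rdflib library. The namespace, vocabulary, and record identifier are invented for illustration and are not drawn from any real justice system.

    # A minimal sketch of semantic "tagging" with rdflib; the EX namespace
    # and its property names are hypothetical, not a real ontology.
    from rdflib import Graph, Literal, Namespace, RDF

    EX = Namespace("http://example.org/justice/")  # invented vocabulary

    g = Graph()
    record = EX.record_4417  # a URI identifying one record

    # The raw string "123 Palm St..." means nothing to software on its own;
    # the triples below state what it is and whom it describes.
    g.add((record, RDF.type, EX.Person))
    g.add((record, EX.homeAddress, Literal("123 Palm St, Homosassa, FL")))

    # Any RDF-aware program can now ask for this person's home address
    # without knowing how the original database laid the field out.
    for addr in g.objects(record, EX.homeAddress):
        print(addr)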

But programmers are achieving greater success by implementing semantic Web concepts in restricted fields where ontologies are easier to build. And it turns out that tracking criminals is one of those fields.

As criminal defendants, prisoners, and parolees move through the justice system, they leave an electronic trail in the databases of dozens of agencies. What became clear after the Jessica Lunsford murder, and what turned into the biggest challenge for Metatomix, was that these databases often store the data in different ways. Something as simple as a home address, for example, might be represented using three different varieties of metadata (or no metadata at all) by the information systems at agencies such as county sheriff’s departments, drug rehabilitation centers, and state corrections offices.
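To picture the mismatch, the same address might surface in three layouts like the ones below; every field name is invented purely to illustrate the problem.

    # Hypothetical examples of how three agencies might store one address;
    # the field names are made up, not drawn from any real agency schema.
    sheriff_row = {"ADDR_LINE1": "123 Palm St", "CITY": "Homosassa", "ST": "FL"}
    rehab_row = {"residence": "123 Palm St, Homosassa, FL"}  # one free-text field
    corrections_row = {
        "street": "123 Palm St",
        "locality": "Homosassa",
        "region": "FL",
    }  # tagged with metadata, but differently

    # All three mean the same thing, yet no naive join on column names will
    # line them up; that gap is exactly what an ontology has to bridge.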

Metatomix’s trick is to study the legacy databases operated by all the various agencies that hold pieces of the puzzle and create an ontology that matches up the data contained in each according to its meaning. When an officer of the court needs to see the data all in one place, Metatomix’s system can pull out the individual pieces and assemble them in a temporary database. As soon as the case is finished, the data is purged; if it’s needed again, it is reassembled from scratch.
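A rough sketch of that match-and-assemble step, continuing the invented layouts above, might look like the following; the normalization rules and the case vocabulary are assumptions for illustration, not Metatomix’s actual mappings.

    # Match-and-assemble sketch over the same three invented layouts.
    from rdflib import Graph, Literal, Namespace, RDF

    CASE = Namespace("http://example.org/case/")  # hypothetical vocabulary

    rows = [
        {"ADDR_LINE1": "123 Palm St", "CITY": "Homosassa", "ST": "FL"},
        {"residence": "123 Palm St, Homosassa, FL"},
        {"street": "123 Palm St", "locality": "Homosassa", "region": "FL"},
    ]

    def to_common_form(row):
        """Normalize any of the three agency layouts to one address string."""
        if "ADDR_LINE1" in row:  # sheriff layout
            return f'{row["ADDR_LINE1"]}, {row["CITY"]}, {row["ST"]}'
        if "residence" in row:   # rehab layout
            return row["residence"]
        return f'{row["street"]}, {row["locality"]}, {row["region"]}'

    def assemble_case(rows):
        """Build the temporary, per-case graph from the legacy rows."""
        g = Graph()
        subject = CASE.defendant
        g.add((subject, RDF.type, CASE.Person))
        for row in rows:
            g.add((subject, CASE.homeAddress, Literal(to_common_form(row))))
        return g

    case_graph = assemble_case(rows)
    # ... the officer of the court reviews the assembled record ...
    del case_graph  # purged when the case is done; rebuilt from scratch if needed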

It’s an elegant solution to an ugly reality—government bureaucracies often have an active disincentive to make their information systems communicate. “At the state and local government level, there is very little information-sharing, because owning information means you have a budget, and with a budget comes power,” explains John Pilkington, Metatomix’s vice president of marketing. “In Florida, they had this guy in custody the day before, and they had all this information but they couldn’t put it together. So they let him go, and he ended up abducting and killing Jessica Lunsford. The law that the State of Florida put into motion forces these organizations to share information. But they still don’t want to give it up, and they don’t even want someone else storing it in a data warehouse. So we pull it out and store it on a case-by-case basis.”

The key, according to Metatomix’s CTO, Howard Greenblatt, is that the ontology ties together the data according to its meaning. For each specific subsystem that the system has to deal with—say, the legacy Oracle or IBM DB2 database in a county agency’s back office—Metatomix builds an ontology representing the way that subsystem stores information. Through the Resource Description Framework, or RDF, a data-description standard commonly written in XML, those lower-level ontologies can feed information upward into higher-level ontologies that “understand” the general concepts embodied in different ways in the subsystems. “In general, we have a concept of a person and an address and a perpetrator and a booking event,” says Greenblatt.
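One plausible way to express such an upward mapping in code is a SPARQL CONSTRUCT query run through rdflib. Both namespaces and every term name below are invented, since Metatomix’s actual ontologies are not public.

    # Sketch: re-describe a county subsystem's private terms using the
    # general concepts Greenblatt names (a person, a booking event).
    from rdflib import Graph

    county_data = """
    @prefix county: <http://example.org/county-db/> .
    county:row_881 county:arrestee_name "J. Doe" ;
                   county:book_dt "2008-03-20" .
    """

    mapping_query = """
    PREFIX county: <http://example.org/county-db/>
    PREFIX upper:  <http://example.org/upper/>
    CONSTRUCT {
        ?rec a upper:BookingEvent ;
             upper:involvesPerson [ a upper:Person ; upper:name ?name ] ;
             upper:bookingDate ?date .
    }
    WHERE {
        ?rec county:arrestee_name ?name ;
             county:book_dt ?date .
    }
    """

    g = Graph()
    g.parse(data=county_data, format="turtle")
    upper_graph = g.query(mapping_query).graph  # the re-described triples
    print(upper_graph.serialize(format="turtle"))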

As soon as a detainee is booked in Florida, a server programmed by Metatomix starts querying local, state, and federal databases for information corresponding to those categories and others. The data is pushed to a secure Web page accessible only by court staff and law enforcement officers; there, a single “dashboard” aggregates information such as outstanding warrants, sexual offenses, restraining orders, photographs, and biometric information like fingerprints. From the dashboard, users can drill down into individual databases for supporting data. To comply with privacy laws and with local agencies’ demands for control over the information, the pages are deleted within 24 hours after the official who requested them finishes the review.
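In outline, that query-aggregate-purge cycle might be caricatured as follows; the source names, the fetch stub, and the retention logic are stand-ins for demonstration, not the production system.

    # Illustrative booking-triggered aggregation with a 24-hour purge.
    import time

    SOURCES = ["county_warrants", "state_offenders", "federal_records"]  # hypothetical
    RETENTION_SECONDS = 24 * 60 * 60  # pages deleted within 24 hours

    dashboards = {}  # booking id -> aggregated record

    def fetch(source, booking_id):
        """Stand-in for a per-agency query; a real system would go through
        each agency's own database adapter."""
        return {source: "records for " + booking_id}

    def on_booking(booking_id):
        """Triggered as soon as a detainee is booked."""
        record = {"created": time.time()}
        for source in SOURCES:  # query local, state, and federal sources
            record.update(fetch(source, booking_id))
        dashboards[booking_id] = record  # backs the single dashboard page

    def purge_expired(now):
        """Drop any dashboard page older than the retention window."""
        expired = [b for b, r in dashboards.items()
                   if now - r["created"] > RETENTION_SECONDS]
        for b in expired:
            del dashboards[b]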

The whole system, called the Judicial Information Exchange Model, was installed in just nine months for $2.8 million, according to Metatomix. It makes Florida the first state with a fully integrated criminal justice information system. But plenty of other states are watching Florida’s effort, according to Pilkington; similar projects are now underway in Georgia, Illinois, Oklahoma, Ohio, Texas, and Vermont. And this week Metatomix—which was founded in 2000 and has raised two rounds of venture capital, including a $7 million round last May led by Apex Venture Partners, Battelle Ventures, Dunrath Capital, and North Hill Ventures—announced that it’s an official IBM business partner in the public safety area. That means Big Blue will market the company’s semantic platform to existing customers and include Metatomix’s technology in the big, multi-vendor proposals it submits when state and federal public-safety agencies put projects out for bid.

But law enforcement is far from the only field where restricted ontologies and RDF can help officials “connect the dots,” to use the post-9/11 catch phrase. “It turns out that you can solve a number of problems with the approach that we’ve taken of leveraging both semantics and rules about how to use the data,” says Greenblatt. The company is building custom semantic-filtering systems both for financial services companies, which can use them to screen borrowers and guard against fraud and money-laundering, and for manufacturers, who can use them to make sense of disparate parts databases. Airbus, for example, turned to Metatomix for help constructing a tracking system for the millions of components that go into its passenger airplanes.

Rather than starting from scratch on that project, says Greenblatt, Metatomix is adapting an ontology already developed by NASA. And the fact that ontologies can be reused, adapted, and merged into even more general ontologies is one of the saving graces that may eventually make it possible to achieve a grand semantic Web on the scale Berners-Lee has proposed. “The issue with the semantic Web right now is that people have to agree on what the definitions are,” says Greenblatt. “There are some ontologies, like the Dublin Core for publishing and libraries, that people have already agreed to accept as fundamental. Many more will have to be worked out before we can ultimately implement ‘Web 3.0’ or the ‘Web of meaning,’ as people call it.”
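Reusing an agreed-upon vocabulary can be as simple as importing it; the short sketch below applies the Dublin Core terms that ship with rdflib to a hypothetical document.

    # Reusing the agreed Dublin Core vocabulary instead of inventing one.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCTERMS

    g = Graph()
    doc = URIRef("http://example.org/report/42")  # hypothetical document
    g.add((doc, DCTERMS.title, Literal("Quarterly case summary")))
    g.add((doc, DCTERMS.creator, Literal("Clerk of Court")))
    print(g.serialize(format="turtle"))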

But Greenblatt emphasizes that “we are not trying to build the semantic Web here—we are just trying to solve a few specific industrial, enterprise, and government problems.” And in the end, it’s this specificity that seems to be steering Metatomix toward success.

Wade Roush is Chief Correspondent and Editor At Large at Xconomy. You can subscribe to his Google Group or e-mail him at wroush@xconomy.com.

  • Tampera (http://www.livejournal.com/)

    I will present one of my articles, which I prepared after finding http://www.omfica.org.

    If we are able to cage the lions, why should we allow them to be on the loose?

    The significance of the Web in various aspects of our lives is unquestionable. Less well known is the impact of Web search engines on what information we find and consume. Given that 75% of Web users rely on search engines as their primary means of navigating the Web (Bruemmer, 2001), the significance of search engines in an information society should not be underestimated. The sheer size of the Web makes it impossible for a single search engine to cover all the Web pages that exist. Different search engines cover different parts of the Web. Because websites that are covered by search engines are much more visible and accessible to Web users, the selective coverage of the Web by search engines has great social and political implications.
    Given this political and social significance of search engines, it is important to explore the following questions:
    1. Can search engines provide an equal representation of websites?
    2. Can the concept of “equality” be measured in relation to websites?
    By “equal” representation, we mean that the same proportion of websites from different sources is covered, so as to ensure that websites from different sources have an equal chance of being presented to Web users. In the case of “biased” representation, the implication may be that this inequality is the result of commercial intentions. This claim seems well grounded, since the search engine market is dominated by a few oligopolists who take the biggest share of the market. As long as such “market failures” exist, the unequal representation will persist.
    To conclude, first things must come first: basic democratic principles must precede any substantive achievement made along the way of search engine development. That is, any by-product (search results) presupposes the machinery, the method, that produces it. And Web users are not required to love or hate the “machine”; they should aim to control it.
    Hopefully, several companies (such as http://www.omfica.org) consider that the interests of Internet users must come first, and they have explored an alternative way to answer the questions above in the affirmative.

    References:
    Bruemmer, P. (2001). Search engine marketing: Luxury or necessity? Retrieved March 1, 2007 from http://www.searchengineguide.com/wi/2001/1120_wi1.html