Sift Science Uses Machine Learning to Weed Out Credit Card Fraud
Google has a team of 120 engineers dedicated solely to fighting fraud and webspam. Its leader, Matt Cutts, is probably the most familiar face at the company after founders Larry Page and Sergey Brin and chairman Eric Schmidt.
Up in Seattle, Amazon has more than twice as many full-time fraud-fighters. And when you consider that online fraud costs businesses an estimated $3.4 billion in credit-card chargebacks every year, it’s easy to see why these two Internet giants would invest so much in detecting and thwarting bad guys. But what about the “little guys”—the online businesses like Uber or Airbnb that process millions of dollars in online payments every year, but aren’t big enough to hire whole teams of engineers to fight fraud? Shouldn’t they have access to state-of-the-art fraud detection technology too?
That’s exactly the market Sift Science wants to serve. The San Francisco startup, which emerged from the Y Combinator accelerator in 2011, boasts a team of eight ex-Google engineers building a cloud-based system that monitors other companies’ e-commerce systems in real time. If Sift’s machine-learning algorithms spot a credit-card purchase that looks suspicious, it gets flagged for review by a human. The idea is to disallow the highest-risk transactions and reduce costly chargebacks. (Merchants, not consumers or credit card companies, are usually the ones on the hook when fraudsters make purchases using stolen credit cards.)
“It’s amazing, the similarities between Matt Cutts’ job and what we are trying to accomplish,” says Jason Tan, Sift Science’s co-founder and CEO (pictured above). “Mom-and-pop and mid-tier e-commerce businesses don’t have the resources to hire 120 engineers, and they don’t have the cutting-edge technology that Google does. Our intuition was, why can’t they hire a third party to abstract all of that away [in the cloud] and make it easy for them to sign up and get protected.”
Tan and his co-founder, Brandon Ballinger, met as roommates and fellow computer science majors at the University of Washington in Seattle, where they graduated in 2006. Ballinger went on to Google, while Tan worked for a series of Seattle tech startups, including Zillow, Optify, and BuzzLabs. In 2011, Ballinger got into Y Combinator with an idea for a mobile-social-local app; he was able to talk Tan into joining him, but only on the condition that they try something more challenging.
Tan and Ballinger (who has since left the company) got interested in the fact that most merchants and credit-card companies still rely on crude rules-based systems to screen transactions. To minimize chargebacks, a company might, for example, look at past episodes of fraud and set up a rule saying “If the transaction is for more than $10,000 and the IP address of the purchaser is in Nigeria, flag it as suspicious.”
But the next fraudulent transaction might originate in China, necessitating a new rule. And the anti-Nigeria rule might mistakenly filter out legitimate orders. “So you have this big mess of hundreds of rules, but it’s very static,” not to mention porous and unreliable, Tan says.
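The brittleness Tan describes is easy to see in code. Here is a minimal sketch of that style of static, hand-written rule screening; the rule set, field names, and `screen` function are invented for illustration, not taken from any real merchant's system:

```python
# A hypothetical static rule list. Every new fraud pattern (say, one
# originating in a new country) requires hand-writing another entry,
# and overly broad rules also catch legitimate orders.
RULES = [
    ("high value from flagged region",
     lambda tx: tx["amount"] > 10_000 and tx["ip_country"] == "NG"),
    ("shipping country differs from billing country",
     lambda tx: tx["ship_country"] != tx["bill_country"]),
]

def screen(tx):
    """Return the description of every rule this transaction trips."""
    return [name for name, predicate in RULES if predicate(tx)]

order = {"amount": 12_500, "ip_country": "NG",
         "ship_country": "US", "bill_country": "US"}
print(screen(order))  # ['high value from flagged region']
```

A legitimate $12,000 order from Lagos trips the same rule, which is exactly the false-positive problem Tan points to.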
The obvious way to improve fraud detection, Tan and Ballinger thought, would be to ditch all the manually constructed rules and use machine-learning algorithms to identify the real patterns that foreshadow fraud at specific e-commerce sites. “With machine learning you can teach the computers to build the rules themselves using the statistics as data,” Tan says.
The procedure wouldn’t be all that different from the speech-recognition work Ballinger had been doing using machine learning at Google, or the sentiment analysis work Tan had been doing at BuzzLabs; what had changed was that it was getting cheaper to analyze large numbers of transactions in the cloud, using distributed database technology. “Only now, with HBase and Hadoop, has the technology evolved to the point that we can build this kind of infrastructure outside of Google,” says Tan.
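To make the contrast with static rules concrete, here is a toy version of the idea, learning a fraud boundary from labeled examples instead of hand-writing one. This is a plain logistic-regression classifier trained by gradient descent; the features and data are invented, and this is not a claim about Sift's actual models:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, labels, epochs=2000, lr=0.5):
    """Fit weights with simple per-example gradient descent on logistic loss."""
    w = [0.0] * len(examples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # prediction error drives the weight update
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Invented features: [normalized order amount, billing/shipping mismatch flag]
X = [[0.1, 0], [0.2, 0], [0.9, 1], [0.8, 1]]
y = [0, 0, 1, 1]  # 1 = confirmed fraud (chargeback), 0 = legitimate
w, b = train(X, y)

# A new transaction resembling the fraudulent examples scores near 1.
score = sigmoid(sum(wi * xi for wi, xi in zip(w, [0.85, 1])) + b)
print(round(score, 2))
```

The "rules" here are just learned weights; retraining on new chargeback data updates them automatically, which is the flexibility a hand-maintained rule list lacks.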
A variety of signals influence the risk score, and they’re different for each Sift customer, Tan says. At Uber, for example, the score might depend in part on where a driver picks up a customer; some neighborhoods generate more fraud than others. At Airbnb, the more nights a customer books, the higher the likelihood that the transaction is fraudulent. (Both companies are actual Sift customers.)
At an e-commerce site, a user who goes straight to the checkout page several times in a row, without stopping to browse, might be flagged. So would one who asks for next-day shipping, or gives a shipping address that’s far away from the billing address (like Vietnam and Kansas).
Sift can also use certain clues to identify individual computers involved in past chargebacks and flag all transactions coming from those machines. And because Sift saves what it learns in a central database, there’s a network effect for the company’s clients; an individual customer who’s flagged as suspicious on Uber might also be flagged at Airbnb, if he were to use the same device, e-mail address, or credit card number.
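The cross-customer "network effect" described above can be sketched as a shared lookup of identifiers tied to past chargebacks; the data structure and names here are illustrative, not Sift's actual schema:

```python
# Central set of identifiers (device IDs, e-mail addresses, card hashes)
# seen on transactions that later produced confirmed chargebacks.
flagged = set()

def record_chargeback(tx):
    """Remember every identifier attached to a confirmed-fraud transaction."""
    flagged.update([tx["device_id"], tx["email"], tx["card_hash"]])

def shares_fraud_history(tx):
    """True if any identifier on this transaction appeared in prior fraud."""
    return bool({tx["device_id"], tx["email"], tx["card_hash"]} & flagged)

# Fraud confirmed at one merchant...
record_chargeback({"device_id": "d-42", "email": "x@example.com", "card_hash": "c1"})

# ...flags the same device when it shows up at a different merchant,
# even with a new e-mail address and card.
print(shares_fraud_history(
    {"device_id": "d-42", "email": "other@example.com", "card_hash": "c2"}))  # True
```

Because every client writes into and reads from the same pool, each new customer makes the signal stronger for all the others.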
If a risk score is high enough—say, above 90—Sift will automatically recommend that a transaction be canceled and the user banned. But in the gray area between 70 and 90, Sift doesn’t reject suspicious transactions, instead flagging them so a human reviewer can investigate more closely.
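The routing logic those thresholds imply is simple; the thresholds (90 and 70) come from the article, while the function and label names here are our own:

```python
def route(score):
    """Map a 0-100 risk score to a recommended action."""
    if score > 90:
        return "cancel and ban"   # highest risk: reject automatically
    if score >= 70:
        return "manual review"    # gray area: flag for a human reviewer
    return "allow"                # low risk: let the transaction through

print(route(95), route(80), route(30))
```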
In theory, the scores get more accurate over time as reviewers confirm or contradict the software’s suspicions. “Humans are actually really good” at sensing a telltale combination of fraud signals, Tan says. “It creates this feedback loop between human intelligence and machine intelligence.”
One limitation of rules-based fraud reduction systems is that there’s no gray zone, meaning a certain number of good transactions will be canceled. In e-commerce, Tan says, these false positives “are really bad,” leading not just to lost revenue but also to angry customers. The big benefit of a system like Sift Science’s, he says, is that it can reduce false positives, while allowing human reviewers to ignore the low-risk transactions and focus on the toughest cases.
When I interviewed Tan in late April, the company had already signed up several hundred companies to use its system. The service is free for up to 10,000 transactions per month. Between the 10,000- and 20,000-per-month levels, the company charges 2 cents per risk score; above that, it’s 1 cent per score.
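The tiered pricing works out as follows; the prices are from the article, but the function is our own sketch of the arithmetic:

```python
def monthly_cost_cents(scores):
    """Cost in cents: free to 10,000 scores, 2¢ each to 20,000, 1¢ beyond."""
    cost = 2 * max(0, min(scores, 20_000) - 10_000)  # 2-cent tier
    cost += 1 * max(0, scores - 20_000)              # 1-cent tier
    return cost

# 25,000 scores: 10,000 at 2¢ plus 5,000 at 1¢ = 25,000¢, i.e. $250.
print(monthly_cost_cents(25_000))
```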
RSA’s 2012 acquisition of fraud detection startup Silver Tail Systems for $230 million is a “promising sign” that Sift Science chose a hot market to enter, Tan says. “Fraud is the type of problem that can kill a business,” he says. “If you are about to get killed, you will pay whatever it takes to stop that threat.”