Sunday, December 20, 2009

Snap and Search (No Words Needed)

By MIGUEL HELFT
Copyright The New York Times
Published: December 19, 2009
http://www.nytimes.com/2009/12/20/business/20ping.html?_r=1&th&emc=th


THE world, like the World Wide Web before it, is about to be hyperlinked. Soon, you may be able to find information about almost any physical object with the click of a smartphone.

Vic Gundotra, a Google vice president, says the goal is “to recognize every image.”

This vision, once the stuff of science fiction, took a significant step forward this month when Google unveiled a smartphone application called Goggles. It allows users to search the Web, not by typing or by speaking keywords, but by snapping an image with a cellphone and feeding it into Google’s search engine.

How tall is that mountain on the horizon? Snap and get the answer. Who is the artist behind this painting? Snap and find out. What about that stadium in front of you? Snap and see a schedule of future games there.

Goggles, in essence, offers the promise to bridge the gap between the physical world and the Web.

Computer scientists have been trying to equip machines with virtual eyes for decades, with varying degrees of success. The field, known as computer vision, has produced a smattering of applications and successes in the lab. But recognizing images at what techies call “scale,” meaning thousands or even millions of images, is hugely difficult, partly because it requires enormous computing power. It turns out that Google, with its collection of massive data centers, has just that.

“The technology exists and was developed by other people,” said Gary Bradski, a computer vision expert and a consulting professor of computer science at Stanford. “The breakthrough is doing this at scale. There are not many entities that could do that.”

Goggles is not the first application to try to create a link between the physical and virtual worlds via cellphones. A variety of so-called augmented-reality applications like World Surfer and Wikitude allow you to point your cellphone or its camera and find information about landmarks, restaurants and shops in front of you. Yet those applications typically rely on location data, matching information from maps with a cellphone’s GPS and compass data. Another class of applications reads bar codes to link objects or businesses with online information about them.

Goggles also uses location information to help identify objects, but its ability to recognize millions of images opens up new possibilities. “This is a big step forward in terms of making it work in all these different kinds of situations,” said Jason Hong, a professor at the Human Computer Interaction Institute at Carnegie Mellon University.

When you snap a picture with Goggles, the image is sent up to Google’s vast “cloud” of computers, which spend a few seconds analyzing it and trying to match it against an index of more than a billion images. Google’s data centers distribute the image-matching problem among hundreds or even thousands of computers to return an answer quickly.
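The scatter-gather idea described above — splitting one big matching problem across many machines and merging their best answers — can be sketched in a few lines. This is an illustrative toy, not Google’s code: the image names, the three-number “fingerprints,” and the similarity function are all invented stand-ins for real image descriptors and a real distributed index.

```python
# Toy sketch of scatter-gather image matching (all names and data are
# hypothetical). The index is split into shards; each "worker" scans only
# its shard, and a gather step picks the single best match overall.

from concurrent.futures import ThreadPoolExecutor

# Hypothetical index: image name -> toy three-number feature fingerprint.
INDEX = {
    "bay_bridge": (0.9, 0.1, 0.4),
    "eiffel_tower": (0.2, 0.8, 0.5),
    "yahoo_billboard": (0.7, 0.7, 0.1),
}

def similarity(a, b):
    """Toy similarity score: negative squared distance (higher is better)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def best_in_shard(shard, query):
    """Each worker returns the best (name, fingerprint) pair in its shard."""
    return max(shard.items(), key=lambda kv: similarity(kv[1], query))

def search(query, num_shards=2):
    items = list(INDEX.items())
    shards = [dict(items[i::num_shards]) for i in range(num_shards)]
    with ThreadPoolExecutor() as pool:
        candidates = pool.map(best_in_shard, shards, [query] * num_shards)
    # Gather step: merge the per-shard winners into one final answer.
    name, _ = max(candidates, key=lambda kv: similarity(kv[1], query))
    return name

print(search((0.85, 0.15, 0.35)))  # closest fingerprint is "bay_bridge"
```

A production system would use high-dimensional visual descriptors and an index far too large for any one machine, but the shape of the computation — shard, search in parallel, merge — is the same.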

Google says Goggles works best with certain categories of objects, including CDs, movie posters, products, wine labels, artwork, buildings and landmarks. It can read business cards and book covers. It doesn’t do so well with trees, cars or objects whose shape can change, like a towel. And it has trouble recognizing objects in less than ideal lighting conditions.

“Today, Google Goggles is limited because it recognizes certain objects in certain categories,” said Vic Gundotra, a vice president at Google in charge of its mobile phone applications. “But our goal is for Goggles to recognize every image. This is really the beginning.”

For now, Goggles is part of the “labs” section of Google’s Web site, which indicates that the product remains experimental. So it is not surprising that it has quirks and flaws.

Goggles had trouble recognizing the San Francisco-Oakland Bay Bridge, for example, when the image was shot with several trees in the way of its suspension span. But it did recognize it when the picture was snapped with fewer obstacles in the way. Faced with a picture of a Yahoo billboard shot in San Francisco, the search results showed Times Square, presumably because of the huge Yahoo billboard there.

But the service can also delight and amaze. It had no trouble recognizing an Ansel Adams photograph of Bridalveil Fall in Yosemite, returning search results for both the image and a book that used that image on its cover. It also correctly identified a BlackBerry handset, a Panasonic cordless phone and a Holmes air purifier. It stumbled with an Apple mouse, perhaps because there was a bright reflection on its plastic surface.

It’s not hard to imagine a slew of commercial applications for this technology. You could compare prices of a product online, learn how to operate that old water heater whose manual you have lost or find out about the environmental record of a certain brand of tuna. But Goggles and similar products could also tell the history of a building, help travelers get around in a foreign country or even help blind people navigate their surroundings.

It is also easy to think of scarier possibilities down the line. Google’s goal to recognize every image, of course, includes identifying people. Computer scientists say that it is much harder to identify faces than objects, but with the technology and computing power improving rapidly, reliable facial recognition may not be far off.

Mr. Gundotra says that Google already has some facial-recognition capabilities, but that it has decided to turn them off in Goggles until privacy issues can be resolved. “We want to move with great discretion and thoughtfulness,” he said.
