Google engineers have developed a machine vision technique that brings high-powered visual recognition to ordinary desktop and even mobile computers. The system is claimed to recognize 100,000 object types within a picture in a matter of minutes.
The tech giant seems to have improved on the fairly standard method of applying convolutional filters to a photo to pick out objects of interest. The difficulty is that you need at least one filter per object type: if you are scanning Facebook for cats, you need a filter that identifies cats. In other words, the technique is either limited to a small number of categories or requires a huge bank of filters, as the sketch below illustrates.
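To make the scaling problem concrete, here is a minimal sketch of the baseline approach: one linear filter per category, each scored against every image patch with a dot product. The names and sizes are illustrative, not Google's actual code.

```python
# Baseline detection sketch: score every patch against every filter.
# Cost grows linearly with the number of filters, which is why a
# million-plus filters (100,000 categories) is impractical directly.
import numpy as np

def score_patches(patches, filters):
    """patches: (num_patches, patch_dim); filters: (num_filters, patch_dim).
    Returns a (num_patches, num_filters) matrix of filter responses."""
    return patches @ filters.T

# Toy usage: 10,000 patches scored against 1,000 filters of dimension 256.
rng = np.random.default_rng(0)
patches = rng.standard_normal((10_000, 256))
filters = rng.standard_normal((1_000, 256))
responses = score_patches(patches, filters)
best_filter_per_patch = responses.argmax(axis=1)
```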
A report, penned by Mark Segal, Tom Dean, Mark Ruzon, Jonathon Shlens, Sudheendra Vijayanarasimhan and Jay Yagnik, describes a method that speeds things up by using hashing. The key idea is locality-sensitive hashing: instead of applying a mask to a patch of pixels and summing the result, the system hashes the patch and uses the hash as a lookup into a table of precomputed results. On top of this, a rank-ordering step indicates which filters are the best matches and therefore worth evaluating further.
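The following sketch illustrates the general idea of replacing dot products with hash-table lookups, using a simple winner-take-all style locality-sensitive hash. The table layout, band sizes and collision-count scoring here are illustrative assumptions, not the exact scheme in the Google paper.

```python
# Hash a patch per band, look up which filters share that hash code, and
# rank filters by collision count; only the top-ranked few get an exact
# dot-product evaluation.  All parameters are illustrative.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
PATCH_DIM, NUM_BANDS, K = 256, 32, 4  # K coordinates compared per hash band

# Each band looks at K random coordinates; the code is the argmax position.
band_indices = [rng.choice(PATCH_DIM, size=K, replace=False)
                for _ in range(NUM_BANDS)]

def wta_hash(vec):
    """One small ordinal code per band (roughly rank-preserving)."""
    return tuple(int(np.argmax(vec[idx])) for idx in band_indices)

def build_tables(filters):
    """Precompute hash tables mapping (band, code) -> list of filter ids."""
    tables = [defaultdict(list) for _ in range(NUM_BANDS)]
    for fid, f in enumerate(filters):
        for band, code in enumerate(wta_hash(f)):
            tables[band][code].append(fid)
    return tables

def rank_filters(patch, tables):
    """Approximate scores by counting hash-band collisions; no dot products."""
    votes = defaultdict(int)
    for band, code in enumerate(wta_hash(patch)):
        for fid in tables[band][code]:
            votes[fid] += 1
    return sorted(votes, key=votes.get, reverse=True)

filters = rng.standard_normal((1_000, PATCH_DIM))
tables = build_tables(filters)
patch = rng.standard_normal(PATCH_DIM)
candidates = rank_filters(patch, tables)[:10]                      # rank-ordering step
exact = {fid: float(patch @ filters[fid]) for fid in candidates}   # few exact evaluations
```

The payoff is that the per-patch work depends on the number of hash bands and collisions rather than on the total number of filters, which is what makes 100,000 categories feasible on one machine.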
The experts point out that this change to the basic algorithm yields a speed-up of almost 20,000 times. With 100,000 object detectors requiring more than a million filters to be applied to multiple resolution scalings of the target image, the setup could recognize an object in less than twenty seconds. It should be noted that the hardware used was a single multi-core machine with 20GB of RAM.