Xerox Reveals Breakthrough Software that Categorizes Text and Images at the Same Time
* The News: Xerox scientists develop a new smart technology that can simultaneously categorize both text and visuals.
* The Context: The proliferation of digital content means that many more documents today contain both visual and textual information. But until now categorization technology has focused principally on either one or the other.
* Why It Matters: Smarter ways to categorize information will improve search, help businesses operate more efficiently and create more effective documents.
GRENOBLE, France, Oct. 9, 2007 -- Researchers from Xerox Corporation (NYSE: XRX) have demonstrated a software technology that can link text and general images together, a breakthrough in how online and paper-based information is categorized.
Current tools classify or "tag" either text or images so they can be processed, but until now no one has combined the two effectively, according to Marco Bressan, a computer scientist who led the research team at Xerox Research Centre Europe (XRCE). By linking image and text-based content, Xerox's new software technology significantly improves fundamental document management tasks like retrieving information from a database or automatically routing documents. The result? More complete searches and streamlined business processes.
For example, if a brochure from an isolated hotel in the French Alps describes the hotel's features and includes maps and pictures of mountainous surroundings, the categorizer will automatically discover the content and link the text and the images together. Then someone searching for an isolated mountain lodge within a certain price range would retrieve the brochure even if "isolated lodge in the mountains" were never mentioned in the actual text.
The research aligns with Xerox's goal of developing smarter documents to make information-based work easier, more efficient and more effective. Bressan believes there are many uses for the new categorization software.
"Suppose a traveler wants to combine vacation photos with a journal to produce an annotated photo album or photoblog recapping vacation highlights," said Bressan. "Because the Xerox categorizer handles both text and visuals, it can identify the photos, automatically match them to the written text and then enrich the visuals with additional information via hyperlinks to a knowledge base such as Wikipedia."
A second application, according to Bressan, could be at Xerox's imaging centers, where the company scans and digitizes documents to create secure, accessible and searchable online information archives for its customers. Currently the process of scanning, labeling and indexing documents is partially supervised by operators. Hybrid categorization can streamline document management in this application, improving accuracy and eliminating manual operations.
Xerox's hybrid categorizer builds on recent advances in machine learning, pattern recognition and computer vision, and on the large body of hybrid content now available. XRCE has extensive experience with text categorization and, in 2005, demonstrated the industry's first generic image categorizer. The new categorizer combines earlier text and image categorizers to handle hybrid content, with powerful results.
"Xerox's hybrid categorizer creates a shared knowledge space between text and images," said Bressan. "The textual information enriches the visual, and the visual information enriches the textual. The whole is ultimately greater than the sum of the parts."
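The announcement does not say how the text and image categorizers are combined. As a purely illustrative sketch, and not a description of Xerox's method, the idea of one modality enriching the other can be shown with simple "late fusion": each categorizer scores a document per category, and the scores are averaged. The function names, categories, scores, weight and threshold below are all hypothetical, chosen to mirror the hotel brochure example above.

```python
from typing import Dict, List

Scores = Dict[str, float]


def fuse_scores(text_scores: Scores, image_scores: Scores,
                text_weight: float = 0.5) -> Scores:
    """Weighted average of per-category scores from two categorizers.

    Illustrative late fusion only; the source does not describe
    how Xerox's hybrid categorizer actually combines modalities.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    categories = set(text_scores) | set(image_scores)
    return {
        c: text_weight * text_scores.get(c, 0.0)
        + (1.0 - text_weight) * image_scores.get(c, 0.0)
        for c in categories
    }


def tags_above(scores: Scores, threshold: float) -> List[str]:
    """Return the categories whose score reaches the threshold, sorted by name."""
    return sorted(c for c, s in scores.items() if s >= threshold)


# Hypothetical scores for a hotel brochure: the text talks about the
# hotel, while the pictures show mostly mountain scenery.
text_scores = {"hotel": 0.7, "mountain": 0.1, "city": 0.2}
image_scores = {"hotel": 0.2, "mountain": 0.7, "city": 0.1}

fused = fuse_scores(text_scores, image_scores)
text_only_tags = tags_above(text_scores, 0.3)
fused_tags = tags_above(fused, 0.3)
```

With these made-up numbers, the text alone yields only the "hotel" tag, while the fused scores also yield "mountain", so a search for a mountain lodge could find the brochure even though its text never mentions mountains.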
The software remains under development. Xerox has filed a number of patents on the technology.
The Xerox Innovation Group conducts work in color science, computing, digital imaging, electromechanical systems, novel materials, linguistics, work practice analysis and nanotechnology connected to Xerox's expertise in printing and document management. For more information, visit www.xerox.com/innovation.
Source: Xerox