
Digging for Nuggets of Wisdom

Today's New York Times "Circuits" column describes how researchers, by using text-mining programs, are able to process quantities of textual material far beyond what a human reader could manage. Text-mining programs go further than Web search engines: they categorize information by making links between otherwise unconnected documents, and they provide maps that lead readers down pathways they might not have been aware of. The story gives the example of a medical researcher using this approach to sift through the 7,000-8,000 articles per week that are indexed in Medline, a database that already houses more than 10 million abstracts, looking for the small fraction that are related to breast cancer research.

http://www.nytimes.com/2003/10/16/technology/circuits/16mine.html
(New York Times, October 16, 2003)

Text mining is built upon the foundations of data mining, which uses statistical analysis to pull information out of structured databases like product inventories and customer demographics. But text mining starts with information that doesn't come in neat rows and columns. It works on unstructured data - e-mail messages, news articles, internal reports, transcripts of phone calls and the like.

To make sense of what it is reading, the software uses algorithms to examine the context behind words. If someone is doing research on computer modeling, for example, it not only knows to discard documents about fashion models but can also extract important phrases, terms, names and locations. It can then categorize them and draw connections among the categories.
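To make the "computer modeling versus fashion models" idea concrete, here is a toy sketch of context-based filtering. It is not how any product mentioned in the article works. The cue-word lists, the function names, and the five-word window are all assumptions made up for illustration, and a real system would learn such cues statistically rather than hard-code them.

```python
import re

# Hypothetical cue words, chosen by hand for this illustration only.
COMPUTING_CUES = {"computer", "simulation", "algorithm", "software", "data"}
FASHION_CUES = {"fashion", "runway", "designer", "catwalk", "photo"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def context_score(text, target_prefix="model", window=5):
    """Score a document by the words that surround each 'model...' token.

    Computing cues near the target add one point each; fashion cues
    subtract one point each. A positive total suggests the computing sense.
    """
    tokens = tokenize(text)
    score = 0
    for i, tok in enumerate(tokens):
        if tok.startswith(target_prefix):
            # Look at up to `window` words on either side of the target.
            nearby = tokens[max(0, i - window): i + window + 1]
            score += sum(w in COMPUTING_CUES for w in nearby)
            score -= sum(w in FASHION_CUES for w in nearby)
    return score


def keep_relevant(docs):
    """Keep only the documents whose context score is positive."""
    return [d for d in docs if context_score(d) > 0]


if __name__ == "__main__":
    docs = [
        "Researchers built a computer model using simulation software.",
        "The fashion model walked the runway for a famous designer.",
    ]
    for doc in keep_relevant(docs):
        print(doc)
```

Run on the two sample sentences, the sketch keeps the first and discards the second, because the words around "model" point in opposite directions. Extracting names, places and phrases, and drawing connections among categories, would take considerably more machinery than this.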

Posted by Tom on October 16, 2003