The Never-Ending Language Learner
By Elyse [2011-05-20]
Andrew Carlson, along with Prof. Tom Mitchell and other researchers at Carnegie Mellon University, has developed an artificial intelligence language-learning program that never stops. It simply continues to run, learning more of the English language every day. The idea is that the Web contains so much extractable information, and gains so much new information each day, that an AI program can mine it continuously without its knowledge ever reaching a plateau.
It is true that other AI programs run indefinitely; for example, if one considers Google's PageRank algorithm, which ranks web pages, to be AI, then it could be considered an AI program that lives forever. Carlson's idea is unique in that it uses a never-ending learner to develop an understanding of actual language. While the PageRank algorithm develops a wide breadth of knowledge, Carlson's project is arguably about depth. It is interesting to consider the question: could an AI program become conversant just by extracting information from the Web?
NELL (the Never-Ending Language Learner) is only a prototype so far. It is designed to operate 24 hours a day, 7 days a week, performing two tasks each day: (1) reading and (2) learning. When NELL is reading, it extracts knowledge from web text and adds it to its internal knowledge base. When it enters the learning phase, it applies machine learning algorithms to its newly enlarged knowledge base, thereby enhancing its understanding of language. The researchers believe that NELL holds the potential to yield major steps forward in the state of the art of natural language understanding.
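To make the read/learn cycle concrete, here is a minimal Python sketch of the daily loop. It is an illustration under stated assumptions: the KnowledgeBase class and the toy reading and learning steps are stand-ins I made up, not NELL's actual architecture or code.

class KnowledgeBase:
    def __init__(self):
        self.candidate_facts = []   # extracted from text, but not yet trusted
        self.promoted_facts = []    # facts the learners have come to believe

def reading_phase(kb, web_text):
    """Reading: extract candidate facts from web text into the knowledge base."""
    for sentence in web_text:
        if "such as" in sentence:   # toy stand-in for real pattern extraction
            kb.candidate_facts.append(sentence)

def learning_phase(kb):
    """Learning: re-assess candidates against the enlarged knowledge base.
    Here every candidate is simply trusted, a toy stand-in for real learning."""
    kb.promoted_facts.extend(kb.candidate_facts)
    kb.candidate_facts.clear()

kb = KnowledgeBase()
for day in range(3):                # three iterations stand in for the endless 24/7 loop
    reading_phase(kb, ["universities such as Carnegie Mellon attract students"])
    learning_phase(kb)
print(len(kb.promoted_facts))       # 3 facts after three simulated "days"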
For now, they have focused on a specific language-learning task: discovering noun phrases that belong to different classes. For example, Carnegie Mellon belongs to the class University, and General Electric belongs to the class Company. It gets more complicated, though, because Organization is a superclass to which both Carnegie Mellon and General Electric belong. The task is highly appropriate for something like NELL because there is a nearly limitless number of object classes in the English language. There is simply no existing database to tell computers that cups are kinds of dishware and that calculators are types of electronics. NELL could create a massive database like this, which would be extremely valuable to other AI researchers.
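One way to picture that class structure is as a small ontology in which each class records its superclass and the classes it is mutually exclusive with. The representation below is purely an assumption for illustration; the article does not describe NELL's internal format.

ontology = {
    "Organization": {"superclass": None,           "mutex": []},
    "University":   {"superclass": "Organization", "mutex": ["Company"]},
    "Company":      {"superclass": "Organization", "mutex": ["University"]},
}

def all_classes_of(cls):
    """Walk up the superclass chain, e.g. University -> Organization."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = ontology[cls]["superclass"]
    return chain

print(all_classes_of("University"))  # ['University', 'Organization']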
NELL must be initialized with the classes to be learned, as well as a few trusted instances of those classes (e.g., Carnegie Mellon and MIT are of class University) and a few trusted contextual patterns that are known to identify instances of classes (e.g., "colleges such as _"). The user can also specify relationships between classes, such as some classes being mutually exclusive or others being subsets. As NELL extracts text from the web, it identifies new instances that fit its given contextual patterns. But an instance does not become a member of the class just yet; this is where learning comes in. A learning algorithm, specifically a Naive Bayes classifier using pointwise mutual information together with a log-probability multinomial bag-of-words classifier, is applied to determine whether the instance is likely to actually be a member of the class, in light of the rest of NELL's available knowledge. If the instance also appears in a contextual pattern associated with a mutually exclusive class, that counts against it. At some point, NELL simply decides whether or not to include the instance. Furthermore, as it learns new instances, it applies similar learning techniques to identify new, highly predictive contextual patterns for each class.
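As a rough illustration of that promotion decision, the sketch below scores a candidate noun phrase by how often it matches the target class's contextual patterns versus patterns of mutually exclusive classes. The count-based scoring and the margin parameter are simplified stand-ins for the actual Naive Bayes and PMI classifiers, and the patterns and corpus are invented for the example.

patterns = {
    "University": ["colleges such as {}", "{} campus"],
    "Company":    ["firms such as {}", "{} stock price"],
}
mutex = {"University": ["Company"], "Company": ["University"]}

def pattern_hits(phrase, cls, corpus):
    """Count sentences matching any of the class's contextual patterns."""
    return sum(
        pat.format(phrase) in sentence
        for pat in patterns[cls]
        for sentence in corpus
    )

def should_promote(phrase, cls, corpus, margin=1):
    """Promote the instance only if evidence for the class outweighs
    evidence for mutually exclusive classes by at least the margin."""
    evidence_for = pattern_hits(phrase, cls, corpus)
    evidence_against = sum(pattern_hits(phrase, m, corpus) for m in mutex[cls])
    return evidence_for - evidence_against >= margin

corpus = [
    "colleges such as Stanford admit thousands",
    "the Stanford campus is large",
]
print(should_promote("Stanford", "University", corpus))  # True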
Carlson and the rest of the team seem set on expanding and experimenting with NELL to see how far a never-ending language learner can go. They report that they've recently given NELL the ability to identify semantic relations such as "is in". With NELL's tireless 24/7 learning capability, it may soon have a better vocabulary than any of us.