Nenov and Dyer (1994) developed a system in which the
learning of words was "grounded" in visual input. A
series of figures moved around a 2-dimensional space
at the same time that a particular input (word) node
was activated. The neural network's task was to learn the
correct associations between the moving figures and the
activated words. Word categories included nouns (such
as circle, triangle), adjectives (red, blue), verbs
(moves, bounce), and adverbs (fast, slow). Learning
success was assessed by inspecting the internal nodes
used to construct a "mental image" of the scene
being presented. The network was trained with individual
words first. It learned nouns within 4-5 cycles and
verbs within 48 cycles. Once the individual words had been
learned, the network automatically processed them correctly
as part of a sentence. Visual-to-verbal mappings took
10-40% longer to learn than verbal-to-internal mappings.
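
The general idea of grounding words in co-occurring visual
input can be illustrated with a toy sketch. This is not
Nenov and Dyer's architecture: the feature names, the
delta-rule update, the learning rate, and the number of
cycles are all assumptions chosen for illustration, and
only static noun and adjective features are modelled (no
motion, verbs, or adverbs).

```python
import random

# Hypothetical vocabulary and visual features (illustrative only).
WORDS = ["circle", "triangle", "red", "blue"]
FEATURES = ["is_circle", "is_triangle", "is_red", "is_blue"]


def make_scene(rng):
    """Pick a random shape and color and return them with a feature vector."""
    shape = rng.choice(["circle", "triangle"])
    color = rng.choice(["red", "blue"])
    features = [
        float(shape == "circle"),
        float(shape == "triangle"),
        float(color == "red"),
        float(color == "blue"),
    ]
    return shape, color, features


def train(cycles=200, lr=0.1, seed=0):
    """Associate each active word node with the visual features seen with it."""
    rng = random.Random(seed)
    weights = {word: [0.0] * len(FEATURES) for word in WORDS}
    for _ in range(cycles):
        shape, color, features = make_scene(rng)
        # The word nodes for this scene are active while the scene is shown.
        for word in (shape, color):
            row = weights[word]
            for i, value in enumerate(features):
                # Delta rule: move the weight toward the observed feature value.
                row[i] += lr * (value - row[i])
    return weights


def strongest_feature(weights, word):
    """Return the visual feature most strongly associated with a word."""
    row = weights[word]
    return FEATURES[row.index(max(row))]


if __name__ == "__main__":
    learned = train()
    for word in WORDS:
        print(word, "->", strongest_feature(learned, word))
```

In this sketch a word's weight to its own feature grows
toward 1 because the two always co-occur, while its weights
to unrelated features hover near their base rate, which is
one simple way such word-to-percept associations can emerge.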