Parsing is a process that happens through time

As seen in the preceding section, one of the attributes that makes NNs attractive for the problem of language learning is that they can learn from a sequence of training data. Several researchers have used the type of NN outlined above to perform different kinds of language processing. Input nodes are used to represent words in one of three ways: one input node per word in the language; a group of input nodes representing a group of words through binary coding; or a single input node representing a group of words by assigning a different analog value to each word (for examples of how word coding can be performed, see Gasser & Lee (1991) or Munro, Cosic, & Tabasko (1991)). A series of sentences can then be generated from a particular grammar that we would like the NN to learn. Positive and negative examples of the grammar are presented, and the network is trained to respond "Yes" if the words presented so far constitute a valid sentence and "No" otherwise.

With some analysis, we can see that parsing a sentence in real time (that is, with one word presented at a time) requires the use of states. We would like a NN to react differently when "reading" the word John after it has already seen I am happy to see than when it has seen only I am happy to. The network needs to remember events in the recent past. In some cases, such as center embedding, knowledge of the beginning of the sentence becomes important at its end. For example, after seeing the sequence The dog who chases the cat who chases mice, the network should react differently to the words runs and run, and this difference is determined by the word dog near the beginning of the sentence. Consequently, the NN must be able to perform some sort of temporal processing.
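To make these coding options concrete, the following is a minimal sketch in Python with NumPy; the toy vocabulary, function names, and bit width are illustrative assumptions, not taken from Gasser & Lee (1991) or Munro, Cosic, & Tabasko (1991):

```python
import numpy as np

VOCAB = ["I", "am", "happy", "to", "see", "John"]

def one_hot(word):
    """One input node per word: a 1 at the word's index, 0 elsewhere."""
    v = np.zeros(len(VOCAB))
    v[VOCAB.index(word)] = 1.0
    return v

def binary_code(word, n_bits=3):
    """A small group of input nodes encodes the word's index in binary
    (least-significant bit first), so 3 nodes can distinguish 8 words."""
    idx = VOCAB.index(word)
    return np.array([(idx >> b) & 1 for b in range(n_bits)], dtype=float)

def analog_code(word):
    """A single input node represents all words by assigning each word
    a different analog value in (0, 1]."""
    return np.array([(VOCAB.index(word) + 1) / len(VOCAB)])

print(one_hot("John"))      # [0. 0. 0. 0. 0. 1.]
print(binary_code("John"))  # [1. 0. 1.]  (index 5 is 101 in binary)
print(analog_code("John"))  # [1.]
```

The trade-off is input width versus discriminability: one-hot coding needs as many nodes as there are words but keeps every pair of words equally distinct, while binary and analog codings compress the input at the cost of making some words' representations more similar than others.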

How can a NN perform temporal processing? Any architecture capable of processing time-varying sequences must have elements that act as memory, as well as elements that process both the information stored in memory and the current input. To allow for this type of behavior, the NN architecture is expanded to permit recurrence in the connections between nodes. We call this type of network a Recurrent Neural Network (RNN). Different researchers have configured recurrent (feedback) connections into different architectures. Among the most commonly used are Simple Recurrent Networks (SRNs), Jordan nets, and fully connected networks. SRNs have feedback connections only between two different layers of hidden nodes; Jordan nets have feedback connections coming only out of the output nodes; and fully connected networks allow connections between any two nodes.

Figure 5: Three different types of RNN. (A) SRN. (B) Jordan net. (C) Fully connected network.
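As a concrete illustration of the SRN idea, here is a minimal sketch of an Elman-style forward pass in Python; the layer sizes, weight scales, and the single "valid sentence so far" output unit are illustrative assumptions, not a particular published model:

```python
import numpy as np

rng = np.random.default_rng(0)

class SRN:
    """Minimal Elman-style SRN: the hidden activations are copied into a
    context layer and fed back in alongside the next word's input."""
    def __init__(self, n_in, n_hidden, n_out):
        self.W_ih = rng.normal(0.0, 0.1, (n_hidden, n_in + n_hidden))
        self.W_ho = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.context = np.zeros(n_hidden)  # the network's memory of the past

    def step(self, x):
        # The hidden state depends on the current word AND the context, so
        # the same word can produce different reactions at different points
        # in a sentence.
        h = np.tanh(self.W_ih @ np.concatenate([x, self.context]))
        self.context = h                   # carry state forward to the next word
        return 1.0 / (1.0 + np.exp(-self.W_ho @ h))  # "Yes"/"No" as a value near 1/0

net = SRN(n_in=6, n_hidden=8, n_out=1)
for word_vector in np.eye(6):              # present one one-hot word at a time
    y = net.step(word_vector)
print(y)                                   # judgment after the whole sequence
```

Because the context vector is carried from one word to the next, the network's response to John after I am happy to see can differ from its response after only I am happy to, which is exactly the state dependence that real-time parsing requires; a net like this one would still need the positive/negative training described above before its "Yes"/"No" output meant anything.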