Mercurial > hg > jgplsrc
view docs/ioj/iojSent.htm @ 0:e0bbaa717f41 draft default tip
lol J
author | Jordi Gutiérrez Hermoso <jordigh@octave.org> |
---|---|
date | Mon, 25 Nov 2013 11:56:30 -0500 |
parents | |
children |
line wrap: on
line source
<html> <head> <title>An Implementation of J -- Interpreting a Sentence</title> </head> <body> <p align=center><font size="6"><b>Interpreting a Sentence</b></font><br> <font size="4"><b><a href="ioj.htm">An Implementation of J</a></b></font></p> <a href="#Word Formation" >Word Formation</a><br> <a href="#Parsing" >Parsing</a><br> <a href="#Trains" >Trains</a><br> <a href="#Name Resolution">Name Resolution</a><br><br> <hr> <br> <a name="Word Formation"></a><font size="5"><b>Word Formation</b></font><br><br> Words are expressed in the standard ASCII alphabet. Primitive words are spelled with one or two letters; two letter words end with a period or a colon. The entire spelling scheme is shown in the <a href="iojSumm.htm">system summary</a>. The verb<tt> ;: </tt>facilitates exploration of the rhematic rules. Thus:<br><br> <pre> ;: 'sum =:+/_6.95*i.3 4'<font size=2 face="ISIJ"> �������������������������Ŀ �sum�=:�+�/�_6.95�*�i.�3 4� ���������������������������</font></pre> The source code for word formation is in the files w*.c. The process is controlled by the function<tt> wordil </tt>(word index and length) and the table<tt> state</tt>.<tt> </tt>Rows of<tt> state </tt>correspond to 10 states; columns to 9 character classes. Each table entry is a (new state, function) pair. Starting at state<tt> S</tt>,<tt> </tt>a sentence is scanned from left to right one character at a time; the table entry corresponding to the current state and character class is applied.<br><br> New State/Function States Character Classes <pre> XN S AN NN AN 9N XN XN QN S Space X Other XI SI AI NI AI 9I X X QI X Other S Space XI SI A A A A X X QI A Alphanumeric A Letters excl. NB XI SI A A NB A X X QI N N N The letter N XI SI A A A A NZ X QI NB NB B The letter B Z Z Z Z Z Z X X Z NZ NB. 9 Digits and _ XI SI 9 9 9 9 9 X QI 9 Numeric D . Period Q Q Q Q Q Q Q Q QQ Q Quote C : Colon XI SI AI NI AI 9I XI XI Q QQ Even Quotes Q ' Quote Z Z Z Z Z Z Z Z Z Z Trailing Comment X S A N B 9 D C Q <font size=3 face="Times New Roman">Function</font> I j=:i [ Emit(j,i-1) N j=:i </pre> <tt>Emit(j,i-1) </tt>produces a pair of indices delimiting a word in the string.<tt> i </tt>is the current index, and<tt> j </tt>is an internal register; if the current word is a number immediately following a numeric list (one or more numbers),<tt> Emit </tt>combines their indices to form a single word. At the end of the string,<tt> Emit(j,i-1) </tt>is executed.<br><br> As an example, this process is applied to<tt> sum =:+/_6.95*i.3 4</tt>,<tt> </tt>the sentence used above. In the following table, the columns are: index, character in the string, the (current state, character class) pair, the (new state, function) pair, and the action. For example, the first step is step 0, the letter is<tt> s</tt>,<tt> </tt>the current (and initial) state is<tt> S</tt>,<tt> </tt>and the character class is<tt> A</tt>.<tt> </tt>From the table, the entry in row<tt> S </tt>and column<tt> A </tt>is<tt> AN</tt>,<tt> </tt>meaing the new state is<tt> A </tt>and the function code is<tt> N</tt>.<tt> </tt>The action assigns 0 to<tt> j</tt>.<br><br> <table> <tr> <td> </td> <td> </td> <td align=center>State /</td> <td align=center>New State /</td> <td> </td> </tr> <tr> <td>i </td> <td align=center>Char</td> <td align=center>Char Class</td> <td align=center>Function</td> <td align=center>Action</td> </tr> <tr> <td>0 </td> <td align=center><tt>s</tt></td> <td align=center><tt>S A</tt></td> <td align=center><tt>AN </tt></td> <td><tt>j=:0</tt> </td> </tr> <tr> <td>1 </td> <td align=center><tt>u</tt></td> <td align=center><tt>A A</tt></td> <td align=center><tt>A </tt></td> <td> </td> </tr> <tr> <td>2 </td> <td align=center><tt>m</tt></td> <td align=center><tt>A A</tt></td> <td align=center><tt>A </tt></td> <td> </td> </tr> <tr> <td>3 </td> <td align=center><tt> </tt></td> <td align=center><tt>A S</tt></td> <td align=center><tt>SI </tt></td> <td><tt>j=:3 [ Emit(0,2) </tt></td> <td><tt> sum</tt></td> </tr> <tr> <td>4 </td> <td align=center><tt>=</tt></td> <td align=center><tt>S X</tt></td> <td align=center><tt>XN </tt></td> <td><tt>j=:4 </tt></td> </tr> <tr> <td>5 </td> <td align=center><tt>:</tt></td> <td align=center><tt>X C</tt></td> <td align=center><tt>X </tt></td> <td> </td> </tr> <tr> <td>6 </td> <td align=center><tt>+</tt></td> <td align=center><tt>X X</tt></td> <td align=center><tt>XI </tt></td> <td><tt>j=:6 [ Emit(4,5) </tt></td> <td><tt> =:</tt></td> </tr> <tr> <td>7 </td> <td align=center><tt>/</tt></td> <td align=center><tt>X X</tt></td> <td align=center><tt>XI </tt></td> <td><tt>j=:7 [ Emit(6,6) </tt></td> <td><tt> +</tt></td> </tr> <tr> <td>8 </td> <td align=center><tt>_</tt></td> <td align=center><tt>X 9</tt></td> <td align=center><tt>9I </tt></td> <td><tt>j=:8 [ Emit(7,7) </tt></td> <td><tt> /</tt></td> </tr> <tr> <td>9 </td> <td align=center><tt>6</tt></td> <td align=center><tt>9 9</tt></td> <td align=center><tt>9 </tt></td> <td> </td> </tr> <tr> <td>10</td> <td align=center><tt>.</tt></td> <td align=center><tt>9 D</tt></td> <td align=center><tt>9 </tt></td> <td> </td> </tr> <tr> <td>11</td> <td align=center><tt>9</tt></td> <td align=center><tt>9 9</tt></td> <td align=center><tt>9 </tt></td> <td> </td> </tr> <tr> <td>12</td> <td align=center><tt>5</tt></td> <td align=center><tt>9 9</tt></td> <td align=center><tt>9 </tt></td> <td> </td> </tr> <tr> <td>13</td> <td align=center><tt>*</tt></td> <td align=center><tt>9 X</tt></td> <td align=center><tt>XI </tt></td> <td><tt>j=:13 [ Emit(8,12) </tt></td> <td><tt> _6.95</tt></td> </tr> <tr> <td>14</td> <td align=center><tt>i</tt></td> <td align=center><tt>X A</tt></td> <td align=center><tt>AI </tt></td> <td><tt>j=:14 [ Emit(13,13)</tt></td> <td><tt> *</tt></td> </tr> <tr> <td>15</td> <td align=center><tt>.</tt></td> <td align=center><tt>A D</tt></td> <td align=center><tt>X </tt></td> <td> </td> </tr> <tr> <td>16</td> <td align=center><tt>3</tt></td> <td align=center><tt>X 9</tt></td> <td align=center><tt>9I </tt></td> <td><tt>j=:16 [ Emit(14,15)</tt></td> <td><tt> i.</tt></td> </tr> <tr> <td>17</td> <td align=center><tt> </tt></td> <td align=center><tt>9 S</tt></td> <td align=center><tt>SI </tt></td> <td><tt>j=:17 [ Emit(16,16)</tt></td> <td><tt> 3</tt></td> </tr> <tr> <td>18</td> <td align=center><tt>4</tt></td> <td align=center><tt>S 9</tt></td> <td align=center><tt>9N </tt></td> <td><tt>j=:18 </tt></td> </tr> <tr> <td>19</td> <td align=center><tt> </tt></td> <td align=center><tt> </tt></td> <td align=center><tt> </tt></td> <td><tt> Emit(18,18)</tt></td> <td><tt> 4</tt></td> </tr> </table><br> <a name="ID"></a>Every primitive word has an ID (a unique byte value) defined in file jc.h. The ID for the first 128 ASCII characters are simply the byte values 0 to 127; other IDs are arbitrary assignments in "dictionary order". <pre> ... #define CLPAR '(' /* 40 050 28 */ #define CRPAR ')' /* 41 051 29 */ #define CSTAR '*' /* 42 052 2a */ #define CPLUS '+' /* 43 053 2b */ ... #define CASGN '\200' /* 128 200 80 =. */ #define CGASGN '\201' /* 129 201 81 =: */ #define CFLOOR '\202' /* 130 202 82 <. */ #define CMIN '\202' /* 130 202 82 <. */ #define CLE '\203' /* 131 203 83 <: */ #define CCEIL '\204' /* 132 204 84 >. */ #define CMAX '\204' /* 132 204 84 >. */ #define CGE '\205' /* 133 205 85 >: */ ... </pre> Using mnemonics such as<tt> CPLUS </tt>and<tt> CASGN </tt>instead of<tt> '+' </tt>and<tt> '\200' </tt>makes the source code more readable and more amenable to automatic manipulation.<br><br> The 3-row table<tt> <a name="spell">spell</a> </tt>in file ws.c associates letter sequences with IDs. The rows correspond to letters in the range ASCII 32 to 127, those letters <a name="inflect">inflected</a> by a period, and those letters inflected by a colon; table entries are IDs. Thus:<br><br> <pre> static C spell[3][68]={ '=', '<', '>', '_', '+', '*', ..., CASGN, CFLOOR, CCEIL, 1, CPLUSDOT,CSTARDOT, ..., CGASGN, CLE, CGE, CFCONS, CPLUSCO, CSTARCO, ..., }; </pre> For example, the first column specifies that<tt> =. </tt>has the ID<tt> CASGN </tt>(assignment) and<tt> =: </tt>the ID<tt> CGASGN </tt> (global assignment).<br><br> <tt>spell </tt>is used by functions<tt> <a name="spellin">spellin</a> </tt> and<tt> <a name="spellout">spellout</a></tt>:<tt> </tt> given a string (e.g.<tt> =:</tt>),<tt> spellin </tt>computes the ID (<tt>CASGN</tt>); given the ID,<tt> spellout </tt>computes the corresponding string.<br><br> Using the information computed by<tt> wordil</tt>,<tt> </tt> functions<a name="tokens"></a><tt> tokens </tt>and<a name="enqueue"></a><tt> enqueue </tt>transform a string into a list of nouns, verbs, adverbs, conjunctions, etc. The next step is to parse this "tokenized" form of the sentence.<br><br> <br> <a name="Parsing"></a><p><font size="5"><b>Parsing</b></font><br><br> Parsing occurs after word formation and is controlled by function<tt> parse </tt>and table<tt> cases </tt> in file p.c.<tt> cases </tt> is a direct translation of the parse table in Section II E of the dictionary: <pre> #define AVN ( ADV+VERB+NOUN) #define CAVN (CONJ+ADV+VERB+NOUN) #define EDGE (MARK+ASGN+LPAR) PT cases[] = { EDGE, VERB, NOUN, ANY, monad, ..., 1,2, ..., EDGE+AVN, VERB, VERB, NOUN, monad, ..., 2,3, ..., EDGE+AVN, NOUN, VERB, NOUN, dyad, ..., 1,3, ..., EDGE+AVN, VERB+NOUN, ADV, ANY, adv, ..., 1,2, ..., EDGE+AVN, VERB+NOUN, CONJ, VERB+NOUN, conj, ..., 1,3, ..., EDGE+AVN, VERB, VERB, VERB, trident, ..., 1,3, ..., EDGE, CAVN, CAVN, CAVN, trident, ..., 1,3, ..., EDGE, CAVN, CAVN, ANY, bident, ..., 1,2, ..., NAME+NOUN, ASGN, CAVN, ANY, is, ..., 0,2, ..., LPAR, CAVN, RPAR, ANY, punc, ..., 0,2, ..., }; </pre> The sentence to be parsed is prefaced with a marker and placed on a queue, and as parsing proceeds words are moved from the right end of the queue onto a stack. The classes of the first four words on the stack are compared to the patterns in columns 0 to 3 of<tt> cases</tt>.<tt> </tt> The first row matching in all four columns is selected; the action in column 4 is applied to the words on the stack indicated by the inclusive indices in columns 8 and 9, with the result replacing those words. If none of the rows match, the word at the end of the queue is moved onto the stack by the function<tt> move</tt>.<tt> </tt> Scanning for a matching pattern then begins anew. The process terminates when the queue is empty and none of the rules are applicable. At that time, the stack should have exactly two words: the marker and a noun, verb, adverb, or conjunction; anything else signals syntax error.<br><br> This parsing method was first described in <a href="iojBib.htm#Iverson1983">Iverson</a> [1983]. The parse table is a compact representation of a large amount of information; it has guided both the evolution of the language and its implementation. The following example illustrates parsing on the sentence<tt> ((i.#y)=i.~y)#y </tt>where<tt> y=:'abc'</tt>. (<a name="Marker"></a><tt>§ </tt>denotes the marker.)<br><br> Rule/<br> Queue Stack Action Comment <pre> §((i.#y)=i.~y)#y original sentence §((i.#y)=i.~y)# 'aba' 13 move §((i.#y)=i.~y) #'aba' 13 move §((i.#y)=i.~y )#'aba' 13 move §((i.#y)=i.~ 'aba')#'aba' 13 move §((i.#y)=i. ~'aba')#'aba' 13 move §((i.#y)= i.~'aba')#'aba' 13 move §((i.#y) =i.~'aba')#'aba' 13 move §((i.#y) =v0 'aba')#'aba' 3 adv v0=: i.~ §((i.#y )=v0 'aba')#'aba' 13 move §((i.# 'aba')=v0 'aba')#'aba' 13 move §((i. #'aba')=v0 'aba')#'aba' 13 move §(( i.#'aba')=v0 'aba')#'aba' 13 move §( (i.#'aba')=v0 'aba')#'aba' 13 move §( (i.3)=v0 'aba')#'aba' 1 monad 3 -: #'aba' §( (0 1 2)=v0 'aba')#'aba' 0 monad 0 1 2 -: i.3 §( 0 1 2=v0 'aba')#'aba' 12 punc §( 0 1 2=0 1 0)#'aba' 1 monad 0 1 0 -: v0 'aba' § (0 1 2=0 1 0)#'aba' 13 move § (1 1 0)#'aba' 2 dyad 1 1 0 -: 0 1 2=0 1 0 § 1 1 0#'aba' 12 punc §1 1 0#'aba' 13 move §'ab' 2 dyad </pre> <br> <a name="Trains"></a><p><font size="5"><b>Trains</b></font><br><br> A <i>train</i> is an isolated phrase not interpreted by the parsing rules pertaining to verbs, adverbs, and conjunctions, and (as a matter of language design) may be assigned any meaning whatsoever. <a href="iojBib.htm#Iverson1989">Iverson and McDonnell</a> [1989] defined a train of three verbs as a <a name="fork"></a><i>fork</i> and a train of two verbs as a <a name="hook"></a><i>hook</i>. That is, if<tt> f</tt>,<tt> g</tt>,<tt> </tt>and<tt> h </tt>are verbs, then so are<tt> (f g h) </tt>and<tt> (g h)</tt>,<tt> </tt>and:<br> <pre> <font size=3 face="Times New Roman">Fork</font> <font size=3 face="Times New Roman">Hook</font> g g g g / \ / \ / \ / \ f h f h y h x h | | /\ /\ | | y y x y x y y y </pre> Parsing rules 5, 6, and 7 deal with trains. (See <a href="#Parsing">Parsing</a>.) A consequence of the rules is that a train of verbs is resolved by repeated forming a fork from the <i>rightmost</i> three verbs, with a final hook if the train is of even length. Likewise, a train of adverbs and conjunctions is assigned a meaning, and is resolved by repeatedly forming a group from the <i>leftmost</i> three adverbs or conjunctions, with a final group of two if the train is of even length. Trains are implemented by functions and variables of file cf.c. The main functions are<tt> <a name="folk"></a>folk </tt> and<tt> hook</tt>.<tt> </tt>(<tt>fork </tt>conflicts with UNIX usage.)<br><br> <br> <a name="Name Resolution"></a><p><font size="5"><b>Name Resolution</b></font><br><br> During parsing, words are moved from the queue to the stack. Suppose a name<tt> xyz </tt>is being moved. If<tt> xyz </tt>is immediately to the left of a copula, it (as a name) is put on the stack. Otherwise, if<tt> xyz </tt>denotes a noun, that noun is put on the stack; if<tt> xyz </tt>denotes a verb, adverb, or conjunction,<tt> 'xyz'~ </tt>is put on the stack, to be evaluated when the verb, adverb, or conjunction is applied.<br><br> Names and their assigned values are stored in symbol tables. A symbol table is an <a href="iojNoun.htm#Arrays">array</a> of type<tt> SYMB </tt>whose atoms are pairs (name,value). Functions and variables in the files s*.c work with symbols tables. In particular,<tt> symbis(a,w,symb) </tt>assigns the name<tt> a </tt>to<tt> w </tt>in the symbol table<tt> symb</tt>,<tt> </tt> and<tt> symbrd(w) </tt>"reads" the value of the name<tt> w</tt>.<br><br> <br> <hr> <a href="iojNoun.htm">Next</a> • <a href="iojIntro.htm">Previous</a> • <a href="iojIndex.htm">Index</a> • <a href="ioj.htm#TOC">Table of Contents</a> <br> </body> </html>