Lesson 3

\       Lesson 3 - How Forth Works
\       The Forth Course
\       by Richard E. Haskell
\          Dept. of Computer Science and Engineering
\          Oakland University, Rochester, MI 48309

comment:



                                Lesson 3

                            HOW FORTH WORKS


                3.1  VARIABLES                                  3-2

                3.2  MORE ABOUT VARIABLES -- FETCH AND STORE    3-3

                3.3  CONSTANTS                                  3-5

                3.4  THE FORTH COLON WORD :                     3-6

                3.5  ARRAYS                                     3-7

                3.6  THE RETURN STACK                           3-8

                3.7  CODE WORDS                                 3-9

                3.8  THE FORTH DICTIONARY                       3-10

                3.9  TABLES                                     3-11

                3.10  CHARACTER OR BYTE DATA                    3-12

                3-11  FINDING DICTIONARY ADDRESSES              3-13

                3-12  THE NAME FIELD                            3-14

                3-13  OPERATION OF THE F-PC INNER INTERPRETER   3-15

                      EXERCISES                                 3-17














3.1  VARIABLES

        The Forth word VARIABLE is a defining word used to define
        variable names.  If you type

                VARIABLE my.name

        then Forth will create a new dictionary entry called my.name.
        All dictionary entries have the same general form, consisting of
        a header (made up of a view field, a name field and a link field)
        and a body (made up of a code field and a parameter field).  In
        F-PC the header and the body are physically stored in separate
        segments.

        (In the 8086/8088/80286 the 1 Mbyte address space is divided into
        64 Kbyte segments.  A physical address is specified by a 16-bit
        segment address, seg, and a 16-bit offset address, off.  The total
        address is written as seg:off.  The segment address can begin on
        any 16-byte boundary, called a paragraph.  Therefore, to address
        any byte in memory, one has to specify both a segment address and
        an offset address.)

        For the word MY.NAME the dictionary entry will look like this

                            HEADER           ________
                view field address --->  VFA | VIEW |
                                             |------|
                link field address --->  LFA | LINK |
                                             |------|
                name field address --->  NFA | 7 M  |
                                             | Y .  |
                                             | N A  |
                                             | M E  |
                                             |------|
                pointer to CFA ------->      | ^CFA | -------|
                                             |------|        |
                                         Head Segment YSEG   |
                                                             |
                            BODY             ________        |
                code field address --->  CFA | CODE | <------|
                                             |------|
           parameter field address --->  PFA | 0 0  |
                                             |------|
                                         Code Segment ?CS:










        The view field contains a character count equal to the offset
        address of the beginning of the colon definition within the file.
        It is used to locate the colon definition when you ask to VIEW
        the definition of a word.

        The link field contains a pointer to the LFA of a previously
        defined word.

        The name field contains the name, beginning with the character
        count and followed by 1-31 characters.

        The pointer to CFA contains the offset address of the CFA in the
        code segment.  This segment address is given by the F-PC word
        ?CS:.

        The code field contains code that will be executed when the name
        in the name field is interpreted.  This is called direct threaded
        code and is used by F-PC.  Many Forths use indirect threaded code
        in which the code field contains a pointer to the code that is
        executed.  For a VARIABLE the code field contains the three bytes
        corresponding to the instruction CALL >NEXT where >NEXT is the F-PC
        inner interpreter to be described later in this lesson.  The CALL
        instruction automatically puts the address of the next instruction
        on the stack.  In this case it is not the address of an instruction
        at all but rather the parameter field address.

        The parameter field contains different things for different kinds
        of words.  For a VARIABLE word, the parameter field contains the
        16-bit value of the variable.  An initial value of zero is stored
        in the parameter field when a VARIABLE is defined.

        When you type the name of a VARIABLE the CALL >NEXT instruction in
        the code field will be exectuted which has the effect of leaving
        the parameter field address on the stack.  If you type
                my.name .
        the PFA of my.name will be printed.  Try it.


3.2  MORE ABOUT VARIABLES -- FETCH AND STORE

Forth words:

        !       ( n addr -- )   ( "store" )
                stores the value of n at addr

                6 my.name !

                will store the value 6 at the PFA of my.name.

        @       ( addr -- n )   ( "fetch" )
                reads the value at addr and puts it on the stack.

                my.name @ .

                will print the value of my.name.

Stack variables

        The system variable SP0 contains the value of the stack pointer
        for an empty stack.  Thus,

                SP0 @

        will return the address of the stack pointer when nothing is on
        the stack.

        The Forth word SP@ returns the address of the last item pushed on
        the stack.  That is, it is the current value of the stack pointer.

        The Forth word DEPTH returns the number of items on the stack.
        It is defined as follows:

        : DEPTH         ( -- n )
                        SP@ SP0 @
                        SWAP - 2/ ;

        Note that since each element on the stack is a word that contains
        two bytes, the number of bytes on the stack must be divided by 2
        to give the number of words on the stack.
































3.3  CONSTANTS

        The Forth word CONSTANT is a defining word that lets you define
        constants.  For example, if you type

                25 CONSTANT quarter

        the name quarter will be entered into the dictionary as follows:

                      ________
                  VFA | VIEW |
                      |------|
                  LFA | LINK |
                      |------|
                  NFA | 7 Q  |
                      | U A  |
                      | R T  |
                      | E R  |
                      |------|
                      | ^CFA | -------|
                      |------|        |
                  Head Segment YSEG   |
                                      |
                      ________        |
                  CFA | CODE | <------|
                      |------|
                  PFA |  25  |
                      |------|
                  Code Segment ?CS:

        The code field contains the three bytes corresponding to the
        instruction CALL DOCONSTANT.  The code at DOCONSTANT pops the
        PFA from the stack (where it was put by the CALL instruction)
        and then pushes on the stack the value stored at the PFA.  Thus,
        if you type

                25 CONSTANT quarter

        and then type

                quarter .

        the value 25 will be printed.

        NOTE:  The value stored in CONSTANTs are 16-bit signed numbers.









3.4  THE FORTH COLON WORD :

        The Forth word : ("colon") is also a defining word that lets you
        define new Forth words.  When you type

                : squared  DUP * ;

        the colon word : is executed.  It creates your Forth word
        squared as a dictionary entry that looks like this:


                      ________                          ________
                  VFA | VIEW |               |------->  | DUP  | ES:0
                      |------|               |          |------|
                  LFA | LINK |               |          |  *   |
                      |------|        ES = LSO + XSEG   |------|
                  NFA | 7 S  |               |          |UNNEST|
                      | Q U  |               |          |------|
                      | A R  |               |      List Segment XSEG
                      | E D  |               |
                      |------|               |
                      | ^CFA | -------|      |
                      |------|        |      |
                  Head Segment YSEG   |      |
                                      |      |
                      ________        |      |
                  CFA | CODE | <------|      |
                      |------|               |
                  PFA | LSO  | --------------|
                      |------|
                  Code Segment ?CS:


        The code field contains the three bytes corresponding to the
        instruction JMP NEST.  The code at NEST is part of the inner
        interpreter whose operation will be described later in this lesson.

        The parameter field contains a list segment offset, LSO, whose value
        is added to the base list segment address contained in the variable
        XSEG.  The resulting segment address is stored in the register ES
        and the list of code field addresses making up the definition of
        squared is stored in memory starting at address ES:0.

        UNNEST is the address of another routine that is part of the Forth
        inner interpreter.










3.5  ARRAYS

        Suppose you want to create an array that contains five 16-bit values.
        If you type

                VARIABLE my.array

        Forth will create the dictionary entry my.array that contains a
        single 16-bit value in its parameter field.

                      ________        |
                  CFA | CODE | <------|
                      |------|
                  PFA |  00  |
                      |------|
                  Code Segment ?CS:

        Here we have not bothered to show the head segment associated with
        the definition of my.array.  Note that the parameter field contains
        2 bytes to store the 16-bit value.

        The Forth word ALLOT will add n bytes to the dictionary in the code
        segment where n is the value on the stack when ALLOT is executed.
        Thus,
                8 ALLOT

        will add 8 bytes, or 4 words to the dictionary.  The code segment
        part of the dictionary entry my.array will then look like this.

                      _______________        |
                  CFA |    CODE     | <------|
                      |-------------|
                  PFA | my.array(0) |
                      |-------------|
              PFA + 2 | my.array(1) |
                      |-------------|
              PFA + 4 | my.array(2) |
                      |-------------|
              PFA + 6 | my.array(3) |
                      |-------------|
              PFA + 8 | my.array(4) |
                      |-------------|
                     Code Segment ?CS:


        To print the value of my.array(3) you could type

                my.array 3 2* + @ .








3.6  THE RETURN STACK

        When you type a number it is put on the parameter stack.  All of
        the arithmetic operations and words such as DUP, ROT, DROP, SWAP,
        and OVER operate on numbers that are on the parameter stack.

        Forth also has a second stack called the return stack.  The return
        stack is used by the Forth inner interpreter to store the return
        address of the next word in a colon definition.  It is also used
        by certain Forth words such as DO.

        You can also use the return stack IF YOU ARE CAREFUL.  You can move
        a number from the parameter stack to the return stack temporarily
        if you are sure to move it back before the end of a colon definition.
        Otherwise, the proper return address will not be on top of the return
        stack, and the inner interpreter will not be able to find its way
        back to the next word to execute.

        The following Forth words use the return stack, R:

        >R      ( n -- )        ( "to-R" )
                pops the top element of the parameter stack
                and pushes it onto the return stack.

                Ex.  3 >R
                     moves 3 to the top of the return stack and
                     leaves the parameter stack empty.

        R>      ( -- n )        ( "from-R" )
                pops the top element of the return stack
                and pushes it onto the parameter stack.

        R@      ( -- n )        ( "R-fetch" )
                copies the top element of the return stack
                to the top of the parameter stack.


        Here is a possible definition of ROT:

        : ROT   ( n1 n2 n3 -- n2 n3 n1 )
                >R              \ n1 n2
                SWAP            \ n2 n1
                R>              \ n2 n1 n3
                SWAP ;          \ n2 n3 n1










3.7  CODE WORDS

        Code words are Forth words that are defined in terms of 8086
        machine code rather than in terms of other Forth words.  Of
        course, eventually some real 8086 machine code must be executed!
        The inner interpreter is written in machine code.  In addition,
        many of the F-PC Forth words are written in machine code to
        maximize their speed of operation.  In Lesson 7 we will show you
        how you can write your own Forth code words.

        Since F-PC uses direct-threading, the machine code in a code word
        is stored directly at the CFA in the code segment.  Here are
        examples of how some F-PC primatives are defined.  In each case
        the header of the word is stored in the head segment in the same
        way as described previously for VARIABLES, CONSTANTS, and colon
        definitions.
                                       DUP
                      _______________        |
                  CFA |  POP  AX    | <------|
                      |-------------|
                      |  PUSH AX    |
                      |-------------|
                      |  PUSH AX    |
                      |-------------|
                      |  JMP >NEXT  |
                      |-------------|
                     Code Segment ?CS:


                                       SWAP
                      _______________        |
                  CFA |  POP  DX    | <------|
                      |-------------|
                      |  POP  AX    |
                      |-------------|
                      |  PUSH DX    |
                      |-------------|
                      |  PUSH AX    |
                      |-------------|
                      |  JMP >NEXT  |
                      |-------------|
                     Code Segment ?CS:


                                       @ ("Fetch")
                      _______________        |
                  CFA |  POP  BX    | <------|
                      |-------------|
                      |  PUSH [BX]  |
                      |-------------|
                      |  JMP >NEXT  |
                      |-------------|
                     Code Segment ?CS:



3.8  THE FORTH DICTIONARY

        The Forth dictionary consists of a linked list of all words that
        have been defined.  These words can be variables, constants,
        colon definitions or code words.  The names of all of these words
        are stored in the head segment and linked by means of the link
        field pointers.  The code field of each word is pointed to by the
        code field pointer in the header.  The code field always contains
        real executable code and therefore must be in the 8086 code segment.
        In a colon definition the list of CFA's for each word in the
        definition is stored in a separate list segment that is pointed to
        by a pointer at the PFA in the code segment.

        When you define a new word using the colon definition, the
        compilation process consists of storing the word in the dictionary.
        F-PC links the name of your word into one of 64 threads using a
        hashing scheme to increase the speed at which words can be found when
        searching the dictionary.

        The next available address in the code segment of the dictionary
        is given by HERE.  That is, HERE is a Forth word that returns the
        next available address on the stack.  The variable DP, the dictionary
        pointer, contains this next available dictionary address.  The word
        HERE is just
                        : HERE  ( -- n )
                                DP @ ;


        The outer interpreter is what is being executed when you boot up
        a Forth system and the ok prompt is displayed.  When you type a
        word and press <Enter> the outer interpreter searches the dictionary
        for the word you typed.  If it finds the word, it executes it by
        executing the code located at the code field address.  If it doesn't
        find the word, it executes a word called NUMBER that tries to convert
        your word to a valid number.  If it succeeds, it pushes the value
        of the number on the stack; otherwise, it displays the <- What?
        message that tells you that is doesn't understand your word.

        We will take a closer look at the operation of the inner interpreter
        in Section 3.13.














3.9  TABLES

        A table is like a constant array.  You could create an array and
        then fill the values of the array using the ! store word.

        Another way to create a table is to use the Forth word CREATE
        which works the same way as VARIABLE except that no space is
        reserved in the parameter field.  For example, if you type

                CREATE table

        you will create the following dictionary entry.

                                table
                      ________        |
                  CFA | CODE | <------|
                      |------|
                  PFA          <---- HERE

                  Code Segment ?CS:

        As with the case of a VARIABLE the code field contains the three bytes
        corresponding to the instruction CALL >NEXT where >NEXT is the F-PC
        inner interpreter.  The CALL instruction will leave the parameter
        field address on the stack when the word table is called.

        At this point the dictionary pointer, DP, contains the value of the
        PFA of table.  The forth word comma (,) will store the value on the
        stack at the location pointed to by the dictionary pointer; that is,
        at the next available location in the dictionary.  Therefore, if you
        type
                CREATE table 5 , 8 , 23 ,

        the following dictionary entry will be created:

                                table
                      ________        |
                  CFA | CODE | <------|
                      |------|
                  PFA |   5  | 0
                      |------|
                      |   8  | 1
                      |------|
                      |  23  | 2
                      |------|
                  Code Segment ?CS:

        You could now define a new word called @table as follows:

                : @table        ( ix -- n )
                                2* table        \ 2*ix pfa
                                + @ ;           \ @(pfa + 2*ix)

        For example,  2 @table
        will return 23 to the top of the stack.

3.10    CHARACTER OR BYTE DATA

        Character (ASCII) data can be stored in one byte.  Data can be
        stored and retieved from single bytes using the following Forth
        words:

        C,      ( c -- )        ("C-comma")
                Store least significant byte (LSB) of value on top of
                the stack at HERE (next available location in the
                dictionary).

        C!      ( c addr -- )   ("C-store")
                Store LSB of value on top of the stack at addr.

        C@      ( addr -- c )   ("C-fetch")
                Fetch the byte at addr.  Leave as LSB on stack.


        You can create a table containing byte values rather than word
        values by typing

                CREATE table 5 C, 8 C, 23 C,

        You can then define a word called C@table as follows:

                : C@table       ( ix -- c )
                                table + C@ ;

        2 C@table will then return 23 to the top of the stack.

        Note the difference between C@table and @table defined in
        Section 3.9.






















3.11  FINDING DICTIONARY ADDRESSES

        The following words can be used to locate and examine Forth
        dictionary entries:

        '       ( -- cfa )      ("tick")
                The statement  ' table
                will leave the CFA of table on the stack.

        >NAME   ( cfa -- nfa )  ("to-name")
                Converts the code field address, CFA, (in the code segment)
                to the name field address, NFA, (in the head segment).

        >LINK   ( cfa -- lfa )  ("to-link")
                Converts the code field address, CFA, (in the code segment)
                to the link field address, LFA, (in the head segment).

        >BODY   ( cfa -- pfa )  ("to-body")
                Converts the code field address, CFA, (in the code segment)
                to the parameter field address, PFA, (in the code segment).

        You can go to the code field address by using

        BODY>   ( pfa -- cfa )  ("from-body")
        NAME>   ( nfa -- cfa )  ("from-name")
        LINK>   ( lfa -- cfa )  ("from-link")

        You can also go from NAME to LINK and from LINK to NAME using

        N>LINK  ( nfa -- lfa )  ("name-to-link")
        L>NAME  ( lfa -- nfa )  ("link-to-name")

        The Forth word HEX will change the base used to print out values to
        hexadecimal.  The word DECIMAL will change the base back to decimal.
        You can change to any base by storing the value of the base in the
        variable BASE.  For example, the definition of HEX is

                : HEX  16 BASE ! ;

        Note that the base must be DECIMAL when this definition of HEX is
        loaded.

        The Forth word U. prints the top value on the stack as an unsigned
        number between 0 and 65535, or if in the HEX mode, between 0000 and
        FFFF.  As an example, to print the hex address of the name field of
        the word OVER type

                HEX ' OVER >NAME U. DECIMAL

        The Forth word LDUMP ( seg off #bytes -- ) can be used to get a
        hex dump of #bytes of memory starting at address seg:off.
        For example, type
                          YSEG @ ' OVER >NAME 20 LDUMP

        and see if you can see the name field of OVER.

3.12  THE NAME FIELD

        When you define a new word such as TEST1 using a colon definition
        the following name field is created:

        Precedence bit ----|   |--- Smudge bit         Hex value
                           |   |
                     ___________________________________
                NFA  | 1 | 0 | 0 | Name char count = 5 |   85
                     |---------------------------------|
                     | 0 | Character #1  T = 54H (hex) |   54
                     |---------------------------------|
                     | 0 | Character #2  E = 45H       |   45
                     |---------------------------------|
                     | 0 | Character #3  S = 53H       |   53
                     |---------------------------------|
                     | 0 | Character #4  T = 54H       |   54
                     |---------------------------------|
                     | 1 | Character #5  1 = 31H       |   B1
                     |---------------------------------|


        If the precedence bit is set, the word will be executed immediately.
        Immediate words will be discussed in Lesson 9.

        If the smudge bit is set, the word will not be able to be found in
        a dictionary search.  This bit is set while a colon definition is
        being compiled.

        Type in the following dummy colon definition:

                : TEST1 ;

        Then examine the name field by typing

                YSEG @ ' TEST1 >NAME 10 LDUMP

        The six hex values shown in the figure above should be displayed.

        Note that the most significant bit is set to 1 in the first and last
        byte in the name field.  The maximum number of characters actually
        stored in the name field is the value stored in the variable WIDTH.
        For example,
                        10 WIDTH !

        will cause a maximum of 10 characters to be stored in the name field.
        F-PC sets the default value of WIDTH to 31 -- its maximum possible
        value.







3.13  OPERATION OF THE F-PC INNER INTERPRETER

        The following diagram illustates the operation of the F-PC inner
        interpreter.

                                 cubed
                      ________        |
                  CFA | CODE | <------|
                      |------|                          _________
                  PFA | LSO  | -------- +XSEG ------->  | DUP   | ES:0
                      |------|                          |-------|
                  Code Segment ?CS:          |<-------  |squared| IP = ES:SI
                                             |          |-------|
                                             |          |   *   |
                                             |          |-------|
         |<- W = AX = current word pointer <-|          |UNNEST |
         |                                              |-------|
         |                                          List Segment XSEG
         |                     squared
         |            ________        |
         |----->  CFA | CODE | <------|
                      |------|                          ________
                  PFA | LSO  | -------- +XSEG ------->  | DUP  | ES:0
                      |------|                          |------|
                  Code Segment ?CS:                     |  *   |
                                                        |------|
                                                        |UNNEST|
                                                        |------|
                                                    List Segment XSEG



        >NEXT   LODSW   ES:             \ Load AX with CFA at ES:SI & inc SI
                JMP     AX              \ Execute the code at the CFA in AX

        NEST    XCHG    BP,SP           \ Push IP = ES:SI
                PUSH    ES              \ on the return stack
                PUSH    SI
                XCHG    BP,SP
                MOV     DI,AX           \ AX = CFA of word to execute
                MOV     AX,3[DI]        \ Get LSO at PFA
                ADD     AX,XSEG         \ and add to XSEG
                MOV     ES,AX           \ Put this sum in ES
                SUB     SI,SI           \ Make new IP = ES:SI = ES:0
                JMP     >NEXT           \ Go to >NEST

        UNNEST  XCHG    BP,SP           \ Pop IP = ES:SI
                POP     SI              \ from the return stack
                POP     ES
                XCHG    BP,SP
                JMP     >NEXT           \ Go to >NEXT




        The inner interpreter contains the three routines >NEXT, NEST, and
        UNNEST.  An interpreter, or instruction pointer, IP, points to the
        memory location in the list segment containing the code field address
        of the next word to be executed.  In F-PC this instruction pointer
        is made up of the two registers ES:SI.  Suppose this is pointing to
        the CFA of squared in the definition of cubed as shown in the
        previous figure.  The routine >NEXT puts this CFA into a word
        address register, W, (which is AX in F-PC) increments IP (SI) by 2
        so that it will point to the next word (*) in the current definition,
        and then executes the code at the CFA in W.

        If this is a colon definition as it is in this case then the code
        at the CFA is a jump to the routine NEST.  As shown above NEST will
        push IP (both ES and SI) on the return stack so that the program can
        later find its way back to the next word in cubed when UNNEST is
        executed.  NEST then gets the list segment offset, LSO, for the word
        squared, adds it to the base list segment address in XSEG and stores
        this value in ES.  It then sets SI to zero so that the new IP is
        given by ES:0, which points to the first word in the definition of
        squared.  It then jumps to >NEXT which will repeat the process, this
        time executing the first word in squared, namely, DUP.

        Since DUP is a code word, its actual machine code is located at its
        CFA.  This code will therefore be executed when >NEXT is executed.
        The last instruction in the definition of DUP is another jump to
        >NEXT.  But now the IP will have been incremented to point to the
        CFA of *.  Again this is a code word which will be executed and then
        jump to >NEXT again.

        The last word in a colon definition is UNNEST.  The CFA of UNNEST is
        added to the list segment in the dictionary when the semi-colon in a
        colon definition is executed.  The code field of UNNEST contains the
        machine code shown above which pops IP (SI and ES) from the return
        stack and then jumps to >NEXT.  Since this is the IP pushed on the
        stack by NEST when squared was executed, it will point to the word
        following squared in the definition of cubed.  This is the word *
        which is the next word to be executed.

        This is how Forth works.  Colon definitions are stored as lists of
        CFA's in the list segment.  When the CFA to be executed is that of
        another colon definition, the IP is incremented and pushed on the
        stack and then changed to point to the first CFA in the definition
        of the new word to be executed.  When the CFA to be executed is that
        of a code word, then the actual machine code located at the CFA is
        executed.  This process continues with a jump to >NEXT following the
        execution of each word.









EXERCISE 3.1

        Define the colon words

                : squared  DUP * ;

        and
                : cubed  DUP squared * ;

        Use the F-PC words ' ("tick"), >LINK ("to-link"), and
        LDUMP (seg off #bytes -- ) to answer the following questions:

        1)  What is code segment address ?CS:?

        2)  What is the head segment address stored in YSEG?

        3)  What is the list segment address stored in XSEG?

        4)  What is the CFA of squared?

        5)  What is the LFA of squared?

        6)  What is the NFA of squared?

        7)  What is the PFA of squared?

        8)  What is the CFA of cubed?

        9)  What is the LFA of cubed?

        10)  What is the NFA of cubed?

        11)  What is the PFA of cubed?

        12)  Draw a picture of the header of squared showing the
             hex values in all locations.  What is the value stored
             in the ^CFA location?  Draw a picture of CFA and PFA fields
             as well as the list segment dictionary entries for the word
             squared.  Show all addresses and values stored in the
             dictionary.

        13)  What is the LFA of the dictionary entry defined just prior to
             cubed?  What is the name of that word?

        14)  What is the address of NEST?

        15)  What is the CFA of DUP?

        16)  What is the CFA of *?

        17)  What is the address of UNNEST?

comment;