Forth programming language

HomePage | Recent changes | View source | Discuss this page | Page history | Log in |

Printable version | Disclaimers | Privacy policy

Forth is a computer programming environment developed by Chuck Moore during the 1960s, and later formalized as a programming language in 1977, and standardized by ANSI in 1989. It features both interactive execution of commands (making it suitable as a shell for systems that lacked a more formal operating system), as well as the ability to compile sequences of commands into threaded code for later execution. Its name is derived from Mr. Moore's belief that it was a "fourth generation computer language" on a computer whose Fortran compiler allowed only five-letter identifiers. (No way. Fortran allowed 6 letters. Must have been an operating system limit for files or commands.)

(There were about a zillion dialects of FORTRAN in the 1960s that differed in every conceivable respect including fundamental stuff like IF statement syntax and how exponents in scientific notation were represented. A five character identifier length would have been mild compared to some of the other portability issues).

Forth relies heavily on explicit use of the stack data structure and Reverse Polish notation (or RPN, also used on advanced calculators from Hewlett-Packard). For example, one could get the result of a mathematical expression this way:

25 10 * 50 + .

This command line first puts the numbers 25 and 10 on the implied stack; the "*" command multiplies the two numbers on the top of the stack and replaces them with their product; then the number 50 is placed on the stack, and the "+" command adds it to the previous product; finally, the "." command prints the result to the user's terminal. Even the language's structural features are stack-based. For example:


This code defines a new command (something like a function) called "FLOOR5" using the following commands: "DUP" simply duplicates the number on the stack; "<" compares the two numbers on the stack and replaces them with a true-or-false value; "IF" takes a true-or-false value and chooses to execute commands immediately after it or to skip to the "ELSE"; "DROP" discards the value on the stack; and "THEN" ends the conditional. The net result is a function that performs similarly to this function (written in the C programming language):

int floor5(int v) { if (v < 5) return 5; else return v - 1; }

Forth became very popular in the 1980s because it was well suited to the small microcomputers of that time, being very efficient in its use of memory and easy to implement on a new machine. It is still used in many small computerized devices (called embedded systems) today for the same reasons.

Forth is also infamous as being one of the first, and simplest extensible languages. That is, programmers can easily adapt the features of the language to the problem. This apparent advantage helps poor programmers to write incomprehensible code. The language never achieved wide commercial use because it acquired a reputation as a "write only" language, after several companies had product failures caused when a crucial programmer left.

Responsible companies using the language, such as FORTH Inc, had become aware of the problem and addressed it with internal cultures that stressed code reviews.

Structure of the language

The basic data structure is a "dictionary," that maps "words" to executable code or other named data structures.

The general structure of the dictionary entry consists of a head and tail. The head contains the name, the indexing data, a flag byte, a pointer to the code associated with the word, and sometimes another, optional pointer to the data associated with the word.

The tail consists of the data area.

When a word is purely executable, the code pointer simply points at the code. When a word is a variable, or other data structure, the code pointer points at code shared with other variables of that type, and a data pointer points at the data area for that specific type of variable.

Words written in Forth usually compile to be lists of addresses of other words, which saves very large amounts of space. The code executed by these words is an "address interpreter." The address interpreter does just enough work to be able to execute the lowest level of words, which are written in assembly language.

Most Forth systems include a specialized assembler that produces executable words. Assembly language words usually end in a macro called "NEXT" which indexes the address interpreter to the next word, and executes it.

The flag byte in the head of the dictionary entry distinguishes words with "compile time" behavior. Most simple words execute the same code whether they are typed on a command line, or embedded in code. Compile-time words have special meanings inside Forth code. The classic example of a compile time word is the control-structures. All of Forth's control structures, and almost all of its compiler are implemented as compile-time words.

Classic Forth systems use no operating system. Instead of storing code in files, they store it as source-code in disk blocks written to physical disk addresses. This is more convenient than it sounds, because the numbers come to be familiar. Also, Forth programmers come to be intimately familiar with their disks' data structures, just by editing the disk. Forth systems use a single word "BLOCK" to translate a the number of a 1K block of disk space into the address of a buffer containing the data. The Forth system automatically manages the buffers.

Classic Forth systems are also multitasking. They use a special word, "PAUSE" to save all the important registers to the current stack, locate the next task, and restore all the registers. Tasks are organized as a scratchpad, an area for variables used by the task, and the stacks for that task. The customary way to search for an executable task is to jump to the schedule, which is a linked list consisting of jump instructions. When a software interrupt instruction is placed replaces the jump, the task begins to run. This system is remarkably efficient. In a Forth programming class, ten users have been supported on an 8MHz PDP-11, with each user operating out of less than 4K of RAM and sharing a single floppy disk. In a telephone system, a thousand tasks (one per phone) were supported on a small NOVA minicomputer.

Forth uses two stacks for each executing task. The stacks are the same width as the index register of the computer, so that they can be used to fetch and store addresses. One stack is the parameter stack, used to pass data to words. The other stack is the linkage stack, used to nest words, and store local variables. There are standard words to move data between the stacks, and access variables.

A Forth interpreter looks up words one at a time in the dictionary, and executes their code. The basic algorithm is to search a line of characters for a non-blank, non-control-character string. If this string is in the dictionary, and it is not a compile-time word (marked in the flag byte), the code is executed. If it is not in the dictionary, it may be a number. If it converts to a number, the number is pushed onto the parameter stack. If it does not convert, then the interpreter prints the string, followed by a question mark, and throws away the rest of the line of text.

A Forth compiler produces dictionary entries. Other than that, it tries to simulate the same effect that would be produced by typing the text into the interpreter.

The great secret to implementing Forth is natively compiling it, so that it compiles itself. The basic scheme is to have the compiler defined in terms of a few words that access a code area. Then, one definition of the words compiles to the normal area of memory. Another definition compiles to disk, or to some special memory area. How does one adapt the compiler? One recompiles it with the new definiitons. Some systems have even defined the low-level words to communicate with a debugger on a different computer, building up the Forth system in a different computer.