Programming language

From Wikipedia


A programming language is a standardized method for expressing instructions to a computer. The language allows a programmer to precisely specify what kinds of data a computer will act upon, and precisely what actions to take under various circumstances.

Programming languages serve two primary purposes. First, the internal representation a computer uses for its data and operations (at the lowest level, just on and off switches) is not easily understood by humans, so translating a human-readable language into those internal representations makes programming easier. Second, they make programs portable between different computers: internal representations differ from one computer to the next, but if each computer can translate the human-readable language into its own internal structures, then the same program will run on all of them.

If the translation mechanism translates the program text as a whole and then runs the internal format, the mechanism is called compilation. A compiler is therefore a program that takes the human-readable program text (called source code) as input and produces object code as output. This object code may be machine code directly usable by the processor, or it may be code matching the specification of a virtual machine and run under that environment.

If the program text is translated step by step at runtime, with each translated step being executed immediately, the mechanism is called interpretation, and the translator is called an interpreter. Interpreted programs usually run more slowly than compiled programs, but have more flexibility because they are able to interact with the execution environment, instead of all interactions being planned beforehand by the programmer.

Many languages can be either compiled or interpreted, but most are better suited for one than the other.
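The difference between the two mechanisms can be sketched in a few lines of Python, used here only as a convenient notation. The three-line program in SOURCE and the function names run_compiled and run_stepwise are hypothetical and exist only for this illustration. The first function translates the whole text before running anything; the second translates and executes one line at a time.

```python
# Hypothetical three-line program used only for this illustration.
SOURCE = """\
x = 2
y = x * 21
result = y
"""


def run_compiled(source):
    """Translate the whole program text first, then run the translated form."""
    code = compile(source, "<demo>", "exec")  # whole-program translation
    namespace = {}
    exec(code, namespace)                     # run the translated code object
    return namespace["result"]


def run_stepwise(source):
    """Translate and execute one line at a time, as an interpreter would."""
    namespace = {}
    for line in source.splitlines():
        step = compile(line, "<demo>", "exec")  # translate a single step
        exec(step, namespace)                   # execute it immediately
    return namespace["result"]


print(run_compiled(SOURCE))
print(run_stepwise(SOURCE))
```

Both functions compute the same result. One practical difference is that a syntax error on the last line would stop run_compiled before anything runs, while run_stepwise would already have executed the earlier lines.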

Features of a Programming Language

Each programming language can be thought of as a set of formal agreements concerning syntax, vocabulary, and meaning, between the programmers who use the language and the implementers or vendors who create the programming system. Many languages that are widely used, or have been used for a considerable period of time, have standardization bodies that meet regularly to create and publish formal definitions of the language, and to discuss extending or supplementing the existing definitions.

These agreements usually include:

  • Data and Data Structures
  • Instruction and Control Flow
  • Reference Mechanisms and Re-use
  • Design Philosophy

Data and Data Structures

Internally, all data in a computer consists simply of on-off states, but humans use these states to represent real-world information such as names, bank accounts, measurements, and so on. Programming languages therefore allow users to specify data in ways that better suit these uses.

Languages can be classified as statically-typed languages (e.g. C++ or Java) and dynamically-typed languages (e.g. Lisp, JavaScript, Tcl or Prolog). Statically-typed languages can be further subdivided into languages with manifest types, where each variable and function declaration has its type explicitly declared, and type-inferred languages (e.g. ML). In statically-typed languages there are usually pre-defined types for individual pieces of data (such as numbers within a certain range, strings of letters, etc.), and each programmatically named value (variable) has one fixed type and allows only certain operations: numbers cannot change into names and vice versa. Dynamically-typed languages treat all data locations interchangeably, so inappropriate operations (like adding names, or sorting numbers alphabetically) do not cause errors until run-time. Type-inferred languages superficially appear to treat data as untyped, but the implementation analyzes how the program uses each value to determine which elementary operations are performed on it, and from that deduces the type of every variable at compile-time. Type-inferred languages can be more flexible to use while still producing efficient programs; however, type inference is difficult to build into a language implementation, so it is relatively rare.
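As a small illustration of the dynamically-typed case, the following Python sketch defines a function without declaring any types. The function name total is hypothetical. The inappropriate operation (adding a name to a number) is not rejected when the program is loaded; it fails only when the offending call actually runs.

```python
def total(a, b):
    # No types are declared; any two values may be passed in.
    return a + b


print(total(2, 3))               # two numbers: arithmetic addition
print(total("Ada", "Lovelace"))  # two strings: concatenation

try:
    total("Ada", 3)              # mixing a name and a number
except TypeError as err:
    # The mistake is detected only when this call is executed.
    print("run-time type error:", err)
```

In a statically-typed language with manifest or inferred types, the third call would normally be rejected before the program starts.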

It is possible to perform type inference on programs written in a dynamically-typed language, but it is legal to write programs in these languages that make type inference infeasible.

Sometimes statically-typed languages are called "type-safe" or "strongly typed", and dynamically-typed languages are called "untyped" or "weakly typed". Confusingly, these same terms are also used for a different distinction: between languages like Eiffel, Oberon, Lisp, Scheme, or OCaml, in which it is impossible to use a value as a value of another type (and thereby possibly corrupt data from an unrelated part of the program or cause the program to crash), and languages like FORTH, C, assembly language, C++, and most implementations of Pascal, in which this is possible.

Sometimes type-inferred and dynamically-typed languages are called "latently typed."

Most languages also provide ways to assemble complex data structures (such as arrays, lists, stacks, and files) from built-in types and to associate names with these new combined types. Object-oriented languages allow the programmer to assemble complex structures along with behaviors specific to those data structures.
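A minimal sketch of both ideas in Python: a named type is assembled from built-in types (a string, an integer, and a list), and a behavior specific to that structure is attached to it. The Account type and its fields are hypothetical examples, not taken from any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """A named structure built from a string, an integer, and a list."""
    owner: str
    balance: int = 0
    history: list = field(default_factory=list)

    def deposit(self, amount):
        # Behavior that belongs to the data structure itself.
        self.balance += amount
        self.history.append(amount)


acct = Account("Ada")
acct.deposit(25)
acct.deposit(17)
print(acct)
```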

Aside from when and how the correspondence between expressions and types is determined, there's also the crucial question of what types the language defines at all, and what types it allows as the values of expressions (expressed values) and as named values (denoted values). Low-level languages like C typically allow programs to name memory locations, regions of memory, and compile-time constants, while allowing expressions to return values that fit into machine registers; ANSI C extended this by allowing expressions to return struct values as well. Functional languages often allow variables to name run-time computed values directly instead of naming memory locations where values may be stored. Languages that use garbage collection are free to allow arbitrarily complex data structures as both expressed and denoted values. Finally, in some languages, procedures are allowed only as denoted values (they cannot be returned by expressions or bound to new names); in others, they can be passed as parameters to routines, but cannot otherwise be bound to new names; in others, they are as freely usable as any expressed value, but new ones cannot be created at run-time; and in still others, they are first-class values that can be created at run-time.
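The last case, procedures as first-class values that can be created at run-time, can be illustrated with a short Python sketch. The names make_scaler and apply_twice are hypothetical. Each call to make_scaler creates a new procedure, which can then be bound to a name or passed as a parameter like any other value.

```python
def make_scaler(factor):
    """Create and return a new procedure at run-time."""
    def scale(x):
        return x * factor
    return scale


def apply_twice(procedure, value):
    """Receive a procedure as an ordinary parameter and call it twice."""
    return procedure(procedure(value))


double = make_scaler(2)   # bind a freshly created procedure to a new name
triple = make_scaler(3)

print(double(5))
print(apply_twice(triple, 2))
```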

Instruction and Control Flow

Once data is specified, the machine must be instructed how to perform operations on the data. Elementary statements may be specified using keywords or may be indicated using some well-defined grammatical structure. Each language combines these elementary statements into larger units according to its own rules for ordering and grouping them. This allows one to write programs that handle a wide variety of input instead of a small number of fixed cases. Beyond the instructions that manipulate data, a language typically provides instructions that control processing (branches, definitions by cases, loops, backtracking, functional composition).
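As an illustration, the Python sketch below combines a definition by cases with a loop, so that one short program covers input of any length. The function names classify and count_kinds are hypothetical.

```python
def classify(n):
    """Definition by cases: exactly one branch is taken for each input."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


def count_kinds(values):
    """Loop: repeat the same statements for every element of the input."""
    counts = {}
    for value in values:
        kind = classify(value)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


print(count_kinds([-2, 0, 5, 7]))
```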

Reference Mechanisms and Re-use

The core idea of reference is that there must be a method of indirectly designating storage space. The most common method is through named variables. Depending on the language, further indirection may be possible through references: pointers to other storage space, which are themselves stored in variables or groups of variables. Similar to naming storage is naming groups of instructions. Most programming languages use macro calls, procedure calls, or function calls as the statements that use these names. Using symbolic names in this way gives a program significant flexibility, as well as a high degree of reusability. Indirect references to available programs or predefined data divisions allow many application-oriented languages to integrate typical operations as if the language included them as higher-level instructions.
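A brief Python sketch of both kinds of naming follows. All names in it (readings, alias, average) are hypothetical. Two variable names designate the same storage, so a change made through one is visible through the other, and a group of instructions is given a name so that it can be reused with different data.

```python
readings = [3, 5, 8]
alias = readings          # a second name designating the same storage
alias.append(13)          # the change is visible through both names
print(readings)


def average(values):
    """A named group of instructions that can be reused with any list."""
    return sum(values) / len(values)


print(average(readings))
print(average([10, 20]))
```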

Design Philosophy

Each language has been developed with a particular design philosophy in mind. Usually some aspect is particularly stressed, either by the way the language uses data structures or by a notation that encourages certain ways of solving problems or expressing their structure.

Since programming languages are artificial languages, they require a high degree of discipline to specify accurately which operations are desired. Programming languages are not error tolerant; however, the burden of recognising and using the special vocabulary is reduced by the diagnostic messages that the language implementation generates. A few languages offer a high degree of freedom by allowing self-modification, in which a program rewrites parts of itself to handle new cases. Typically, only machine language, members of the Lisp family (Common Lisp, Scheme), and a few other languages such as MUMPS provide this capability. Languages that support dynamic linking and loading, such as C, C++, and Java, can simulate self-modification by either embedding a small compiler or calling a full compiler and linking in the resulting object code. Interpreting code by recompiling it in real time is called dynamic recompilation; emulators and other virtual machines exploit this technique for greater performance.
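The idea of a program extending itself at run-time can be sketched in Python, which is used here only as a convenient notation and is not one of the languages named above. The helper add_handler and the table handlers are hypothetical. The program builds the source text of a new procedure, translates it while running, and then calls the result.

```python
def add_handler(table, name, expression):
    """Generate source text for a new one-argument function and install it."""
    source = f"def {name}(x):\n    return {expression}\n"
    namespace = {}
    exec(source, namespace)      # translate and load the new code at run-time
    table[name] = namespace[name]


handlers = {}
add_handler(handlers, "square", "x * x")
add_handler(handlers, "negate", "-x")

print(handlers["square"](7))
print(handlers["negate"](3))
```

Executing generated text in this way is only safe when the program itself controls what the text contains.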

There are a variety of ways to classify programming languages. The distinctions are not clear-cut; a particular language may fall into more than one class. For example, a language may have both compiled and interpreted implementations.




Links to specific languages


APL -- awk

BeFunge -- BLISS -- Blue -- Brainfuck

COBOL -- CORAL66 -- CPL

Dylan

ECMAScript -- Erlang -- Euphoria

Forth -- FORTRAN

GENIE -- Godiva

Haskell

Icon -- INTERCAL

Kvikkalkul

Limbo -- LOGO -- Lua

m4 -- Miranda -- Mercury -- Mesa -- ML -- Modula -- MOO -- MUMPS -- Mary

Nial

Oberon -- Occam

Perl -- PHP -- PL/I -- Poplog -- PostScript -- Prolog -- Python

REBOL -- REXX -- RPG -- Ruby

sed -- SETL -- Simula -- Smalltalk -- SNOBOL -- SPITBOL -- SQL

Tcl -- teco -- tpu -- Trac -- Turing

Unicon -- UnLambda

VarAq


Timeline of the history of programming languages.


A long and detailed article on programming languages has been written for the German Wikipedia: Programmiersprache.

Part of that article has been machine-translated and incorporated into this document. Other parts of it are below, partially processed.


Current developments in new programming languages

Newer integrated, visual development environments have brought clear progress, reducing the time, money, and frustration that development costs. The regions of the screen through which a user controls a program can often be laid out interactively, and code fragments can be invoked just by clicking on a control. Work is also eased by prefabricated components and software libraries of reusable code. Object-oriented methodology can substantially reduce the complexity of programs. Together, these techniques mark the transition of programming from a craft to an industrial process.

Specialized classes of programming languages

Machine language: The code is directly executable on a processor, and is therefore specific to that processor's architecture. It is typically written as numbers expressed in octal or hexadecimal. Each group of numbers corresponds to a particular fundamental operation of the hardware; ultimately, the numbers activate specific wires and logic circuits, which carry out the computation.

Assemblers: An assembly language is almost always tied directly to one machine language. It allows machine instructions to be written in a form readable by humans, and allows a program to use symbolic addresses, which the assembler converts into absolute addresses. Most assemblers also support macros and symbolic constants.
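How an assembler turns symbolic addresses into absolute ones can be sketched with a toy two-pass assembler in Python. The instruction set in OPCODES, the two-cell instruction format, and the sample program are all invented for this illustration and do not correspond to any real processor. The first pass records the address of each label; the second pass emits numeric code with every label replaced by its address.

```python
# Hypothetical opcode table; no real machine uses these numbers.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JUMP": 0x03, "HALT": 0xFF}


def assemble(lines):
    """Translate mnemonics and labels into a flat list of numbers."""
    labels = {}
    instructions = []
    # Pass 1: each instruction occupies two cells (opcode, operand),
    # so a label's address is twice the number of instructions before it.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions) * 2
        else:
            instructions.append(line.split())
    # Pass 2: replace mnemonics by opcodes and labels by absolute addresses.
    code = []
    for parts in instructions:
        operand = parts[1] if len(parts) > 1 else "0"
        value = labels[operand] if operand in labels else int(operand, 0)
        code.extend([OPCODES[parts[0]], value])
    return code


program = ["LOAD 7", "loop:", "ADD 1", "JUMP loop", "HALT"]
machine_code = assemble(program)
print(" ".join(f"{cell:02X}" for cell in machine_code))
```

The printed line shows the program in the hexadecimal form described under machine language, with the symbolic name loop replaced by the address of the instruction that follows the label.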

Data-structured languages: LISP uses lists as its organizing principle. Even programs themselves are formulated as a list of instructions, which change other lists. FORTH and Poplog are conceptually based on an open stack model and use stack operations as fundamental building blocks.
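The open stack model can be illustrated with a very small evaluator written in Python. It is only loosely modelled on FORTH: the word set (+, *, dup, swap) is a tiny hypothetical subset, and the function name run is an assumption. Every word either pushes a number onto the stack or operates on the values already there.

```python
def run(program):
    """Evaluate a whitespace-separated sequence of words on an open stack."""
    stack = []
    for word in program.split():
        if word == "+":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif word == "*":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif word == "dup":
            stack.append(stack[-1])                      # copy the top value
        elif word == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]  # exchange top two
        else:
            stack.append(int(word))                      # a number: push it
    return stack


print(run("3 4 + dup *"))
```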

Logical languages: Prolog expresses both data and the program as a restricted form of mathematical logic known as Horn clauses, and evaluates programs with a general proof mechanism called resolution.

Procedural languages: Ada, BASIC, C, COBOL, FORTRAN, Pascal, and PL/I represent the procedural family, in which the computer performs imperative statements consecutively.

Object-oriented languages: Smalltalk, Eiffel, Modula-3, C++, Java, Sather, and (arguably) Oberon are object-oriented languages. The data structures are defined in object classes, which also include code (methods). Thus the effects of a change to the code remain very localized. Object classes can be extended by inheritance.
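A minimal Python sketch of classes extended by inheritance follows. The class names Shape, Rectangle, and Square are hypothetical. Each class bundles data with the methods that use it, and a subclass reuses or replaces the inherited behavior, so a change to how an area is computed stays inside one class.

```python
class Shape:
    """Base class: data (a name) together with code (methods)."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses define their own area")

    def describe(self):
        return f"{self.name} with area {self.area()}"


class Rectangle(Shape):
    """Extends Shape by inheritance and supplies the area calculation."""

    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """Extends Rectangle; only the construction differs."""

    def __init__(self, side):
        super().__init__(side, side)
        self.name = "square"


for shape in (Rectangle(2, 5), Square(3)):
    print(shape.describe())
```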

Functional languages: APL, LISP, ML, OCaml, Scheme, and Haskell define programs and subroutines as mathematical functions. Many so-called functional languages are "impure" and also contain imperative features.
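The contrast between a program built from mathematical functions and an impure one can be sketched in Python, which is itself not a functional language and is used here only for illustration. All names are hypothetical. The first part composes and folds pure functions; the second shows a function with a side effect, the kind of imperative feature that makes a language impure.

```python
from functools import reduce


def compose(f, g):
    """Return the mathematical composition f after g."""
    return lambda x: f(g(x))


def increment(x):
    return x + 1


def square(x):
    return x * x


# Pure style: results depend only on the arguments.
increment_then_square = compose(square, increment)
sum_of_squares = reduce(lambda acc, x: acc + x, map(square, [1, 2, 3]), 0)

# Impure style: the function also changes state outside itself.
call_log = []


def logged_square(x):
    call_log.append(x)   # side effect
    return x * x


print(increment_then_square(3), sum_of_squares, logged_square(2), call_log)
```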

Rule-based languages: Rule-based languages such as OPS5, Prolog, CLIPS, and Jess instantiate rules when they are activated by conditions in a set of data. Of all possible activations, some set will be selected and the statements belonging to those rules will be executed.
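A much simplified sketch of this mechanism in Python: each rule pairs a set of conditions with a conclusion, and a rule is activated when all of its conditions appear in the set of facts. The rules, the facts, and the function name forward_chain are hypothetical, and the selection step is reduced to firing every activated rule in order, which real rule-based systems refine considerably.

```python
# Hypothetical rules: (set of conditions, conclusion added when they hold).
RULES = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "cannot_fly"}, "is_flightless_bird"),
]


def forward_chain(facts, rules):
    """Fire activated rules repeatedly until no new fact can be added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule is activated when all its conditions are known facts.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


print(sorted(forward_chain({"has_feathers", "cannot_fly"}, RULES)))
```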

Application-oriented languages and systems: Database languages and systems such as dBase and SQL provide powerful ways of searching and manipulating mathematical relations that have been described as tables, mapping one set of things onto other sets.
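As an illustration of a relation described as a table, the following Python sketch uses the sqlite3 module from the standard library to run a few SQL statements against an in-memory database. The table name employee, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ada", "R&D"), ("Grace", "Navy"), ("Alan", "R&D")],
)

# The query describes which rows are wanted, not how to search for them.
rows = conn.execute(
    "SELECT name FROM employee WHERE dept = ? ORDER BY name", ("R&D",)
).fetchall()
names = [row[0] for row in rows]
conn.close()

print(names)
```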