Nexus is a general-purpose, high-level programming language. Its design emphasizes simplicity, dynamism, and readability. The language is object-oriented in nature but supports multiple programming paradigms, including procedural and functional programming. Its features include a dynamic type system, automatic memory management, exception handling, extensible semantics, extension libraries, and multi-threaded programming.

Nexus is implemented as an embeddable interpreter with a small footprint. The implementation consists of a relatively small number of ANSI C files, containing roughly 30,000 lines of code, that compile on multiple platforms. The source files are organized into logical modules that couple together to form comprehensive subsystems.

This document, "Architecture of the Nexus Programming Language", identifies the features and design principles that influence the construction of Nexus. This is followed by a decomposition of the modules and their relationships. Finally, an overview of the subsystems describes how the modules are coupled together.

Design Principles

There are a number of design principles fundamental to the architecture of Nexus.

Simplicity

Simplicity is considered the most important design principle applied to Nexus. It aims for a comprehensive, compact, and flexible system that performs well. The balance between simplicity and complexity is not always clear, though: oversimplifying the Nexus interpreter would inevitably push complexity up to the application level. Great care is taken to maintain this balance.

Extensibility

Because Nexus is a general-purpose language, it must provide a means to be extended and built upon. The ability to extend Nexus allows the core to remain simple; in this respect, the principles of simplicity and extensibility reinforce each other. Nexus achieves extensibility through several mechanisms, such as user-defined data types, dynamic libraries, and metaprogramming facilities.

Performance

The Nexus interpreter improves performance by executing bytecode on a virtual machine. The Nexus VM shortens execution time because precompiled code can be passed directly to it, bypassing lexical and syntactic analysis.

Portability

The Nexus interpreter achieves portability by adhering to the ANSI C standard. However, Nexus implements some features that the ANSI standard does not support. The interpreter therefore implements an operating system abstraction layer that provides a standard interface for external modules. This layered approach isolates the portability issues inherent in platform-specific implementations.

Language Features

Nexus provides a wide variety of programming functionality, such as literals, variables, methods, loops, dynamic types, exception handling, garbage collection, threads, a debugging interface, and object-oriented programming.

Dynamic Type System

The power of Nexus comes from its dynamic type system. Nexus is a dynamically typed language: types are attached to values rather than to variables. Dynamic type checking gives Nexus more expressive power, because the programmer is less restricted by the type system and freer to write code.

Standard Types

Nexus provides a standard set of built-in data types.

Type        Description
Number      Implements a 64-bit numeric value. This allows the Number type to represent integer (signed or unsigned) and floating-point values.
String      Represents an array of characters.
Boolean     Represents a true or false value.
Byte        Represents an 8-bit value.
Array       Container type that holds a collection of objects indexed by an integer.
HashMap     An unordered collection of key-value pairs, indexed by objects of arbitrary type.
TreeMap     An ordered collection of key-value pairs, indexed by objects of arbitrary type.
Iterator    An iterator over a collection. Iterator is the abstract base class of all iterators.
Range       Represents an interval: a set of values with a start and an end.
Object      Parent class of all types.
Class       Represents a construct used to create object instances.
Field       Represents a variable of any type that is defined in an object or class.
Method      Represents a subroutine that is defined in an object or class.
Module      Namespace that defines a collection of fields, methods, and classes.
Exception   Communicates the occurrence of a condition that changes the normal flow of execution.

First-Class Values

Nexus implements instances of a type as first-class values, which can be used in programs without restriction. They can be stored in variables and object collections, passed as arguments to methods, returned from methods, etc.

Nexus values are represented as typed (tagged) unions, in which an integer identifies the type and a union holds the value. Types such as Number, Boolean, and Nil store their values directly in the union; other types store their values by reference (pointers to structures).
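The representation described above can be sketched in C as follows. This is a minimal illustration, not the interpreter's actual declarations: the names nx_type, nx_value, nx_string and the constructor functions are hypothetical, and storing Number as a C double is an assumption (the text only says the value is 64 bits wide).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type tags; the real names are not given in the text. */
typedef enum { NX_NIL, NX_BOOLEAN, NX_NUMBER, NX_STRING } nx_type;

/* Illustrative payload for a type that is stored by reference. */
typedef struct {
    const char *chars;
    unsigned length;
} nx_string;

typedef struct {
    nx_type type;             /* integer tag identifying the type */
    union {
        bool boolean;         /* stored directly in the union */
        double number;        /* stored directly (assumed 64-bit double) */
        nx_string *string;    /* stored by reference */
    } as;
} nx_value;

nx_value nx_nil(void) {
    nx_value v;
    v.type = NX_NIL;
    v.as.number = 0;
    return v;
}

nx_value nx_number(double n) {
    nx_value v;
    v.type = NX_NUMBER;
    v.as.number = n;
    return v;
}

nx_value nx_boolean(bool b) {
    nx_value v;
    v.type = NX_BOOLEAN;
    v.as.boolean = b;
    return v;
}

nx_value nx_string_ref(nx_string *s) {
    nx_value v;
    v.type = NX_STRING;
    v.as.string = s;
    return v;
}

/* Read a number back out, checking the tag before touching the union. */
double nx_to_number(nx_value v) {
    assert(v.type == NX_NUMBER);
    return v.as.number;
}
```

Because the tag travels with the value, a value of any type can be copied, stored, or passed by plain struct assignment, which is what makes every value usable as a first-class value.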

Exception Handling

Nexus exceptions are designed to alleviate the problems caused by traditional error handling approaches. When an error occurs within a method, the method creates an exception object and hands it off to the interpreter. The exception object contains information about the error, including its type and the state of the program when the error occurred.

Garbage Collection

Nexus performs automatic memory management, which relieves the programmer of the burden of allocating and freeing memory for objects. The Nexus garbage collector automatically collects dead (unreachable) objects.

Threads

In Nexus, the execution of a method by the interpreter is defined as a thread of execution. At any given moment, multiple threads may be running in parallel within a single instance of the interpreter.

Nexus threads are implemented as native threads, meaning that the underlying thread implementation is platform-specific. ANSI C offers no standard facility for managing multiple threads of execution, so threads and the synchronization objects used to control them are provided by the underlying operating system. This implementation choice has obvious portability costs, but the performance advantage gained by pre-emptive threads will become increasingly important as the number of CPU cores (and with it the available concurrency) increases.

Debugging Interface

Nexus provides a debugging interface for introspection and the tracing of running programs. The debugging interface implements a foundation for building higher level debugging facilities.

Object-Oriented Programming

As in other object-oriented languages, the central concept in Nexus is that of an object. Nexus is a "pure" object-oriented programming language, meaning that all values are objects. An object is always an instance of a class. Classes are "blueprints" that describe the properties (fields) and behavior (methods) of their instances. Nexus supports single inheritance: a subclass can inherit from only one base class.

Embedded Interpreter

Many applications incorporate dynamic languages in order to integrate a runtime component that static languages do not provide. Although static languages (e.g. C/C++) provide a strong foundation, they lack the flexibility found in dynamic languages. The Nexus interpreter is designed to be embedded within an application for these purposes. Applications written in C/C++ can host the Nexus interpreter and extend themselves with dynamic capabilities. In fact, the standard console and Windows interpreters included in the Nexus distribution are simple host applications, written in C, that host the Nexus interpreter.

The Nexus interpreter is implemented as a library that can be either statically or dynamically linked to an application. The interpreter exposes an API that provides extensive control over its operation. The Nexus API provides the ability to load and execute programs, handle exceptions, interact with Nexus objects, implement extension libraries, and manage multiple virtual machines.

Interpreter

The Nexus programming language is implemented as an interpreter. The interpreter translates source code into an intermediate representation and executes it. More precisely, it compiles source code to bytecode, which is then executed by a virtual machine. The interpreter is implemented as a C library that provides an API for integration with its host application.

Executing Code

The steps necessary for executing code within the Nexus interpreter are relatively simple: the interpreter library is linked with the host application, and function calls then initialize an instance of the interpreter, load the code, and execute it.

The first step in executing Nexus code is creating an instance of the interpreter. This is accomplished by calling a parameterless function that returns an interpreter state pointer. From then on, the state pointer provides the context for interaction between the application and the interpreter. The Nexus API is fully reentrant and requires a state parameter for all reentrant function calls.
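The state-pointer convention can be illustrated with a small sketch. The function names (nx_open, nx_close, nx_note_block_loaded, nx_blocks_loaded) and the contents of the state structure are hypothetical; the actual Nexus API names are not given here. The point is only that all per-instance data hangs off the pointer, so two interpreter instances never share anything.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical interpreter state; the real fields are not described. */
typedef struct nx_state {
    int blocks_loaded;    /* stand-in for per-instance interpreter data */
} nx_state;

/* Parameterless constructor returning a state pointer (NULL on failure). */
nx_state *nx_open(void) {
    nx_state *s = malloc(sizeof *s);
    if (s != NULL)
        s->blocks_loaded = 0;
    return s;
}

/* Every other call takes the state explicitly; no global data is used. */
void nx_note_block_loaded(nx_state *s) {
    assert(s != NULL);
    s->blocks_loaded++;
}

int nx_blocks_loaded(const nx_state *s) {
    assert(s != NULL);
    return s->blocks_loaded;
}

void nx_close(nx_state *s) {
    free(s);
}
```

A host application would call the constructor once per interpreter instance and pass the returned pointer to every later call, which is also what allows several virtual machines to coexist in one process.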

After establishing a context for program execution, the application loads code in one of two forms, precompiled or source. When loading source code, Nexus employs lexer, parser, and compiler modules to generate bytecode. The resulting compiled code is executed by a virtual machine that implements a simple switch dispatch inside of a loop.

Modular Architecture

The Nexus interpreter is implemented using a modular architecture: it is composed of separate components that are coupled together to form larger subsystems. The modules enforce logical boundaries between components, allowing one component to be replaced or added without affecting the rest of the interpreter. As much as possible, modules are autonomous and small in size and scope.

A module is physically represented by a single C implementation file, and its corresponding header file. The implementation file contains internal (static) functions that are only referenced within the module itself, and external functions that define the module interface. The header file declares a data structure for storing module attributes, and exposes the interface to other modules.
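As an illustration of this layout, the sketch below shows what a byte buffer module might look like, with the header and implementation parts shown in one listing for brevity. The file names, the nx_bytebuf structure, and the function names are assumptions made for the example; only the split between static internal functions and an external interface comes from the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- bytebuf.h (hypothetical): module data structure and interface ---- */
typedef struct {
    unsigned char *data;
    size_t length;
    size_t capacity;
} nx_bytebuf;

void nx_bytebuf_init(nx_bytebuf *b);
int  nx_bytebuf_append(nx_bytebuf *b, const void *src, size_t n);
void nx_bytebuf_free(nx_bytebuf *b);

/* ---- bytebuf.c (hypothetical): implementation ---- */

/* Internal helper: static, so it is invisible to other modules. */
static int bytebuf_reserve(nx_bytebuf *b, size_t needed) {
    size_t cap = b->capacity ? b->capacity : 16;
    unsigned char *p;
    if (needed <= b->capacity)
        return 1;
    while (cap < needed)
        cap *= 2;
    p = realloc(b->data, cap);
    if (p == NULL)
        return 0;
    b->data = p;
    b->capacity = cap;
    return 1;
}

void nx_bytebuf_init(nx_bytebuf *b) {
    assert(b != NULL);
    b->data = NULL;
    b->length = 0;
    b->capacity = 0;
}

/* Returns 1 on success, 0 if memory could not be obtained. */
int nx_bytebuf_append(nx_bytebuf *b, const void *src, size_t n) {
    if (n == 0)
        return 1;
    if (!bytebuf_reserve(b, b->length + n))
        return 0;
    memcpy(b->data + b->length, src, n);
    b->length += n;
    return 1;
}

void nx_bytebuf_free(nx_bytebuf *b) {
    free(b->data);
    nx_bytebuf_init(b);
}
```

Other modules would include only the header part, so the growth strategy in bytebuf_reserve could change without touching any caller.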

Module decomposition of the Nexus interpreter.

Subsystem         Module              Description
API               API                 Exported functions.
Standard Library  Object              Object system foundation.
                  Class               Class definition facilities.
                  ...                 Standard library class implementations.
Dynamic Library   Dynamic Library     Dynamic library mechanism.
Input/Output      Loader              Loads source code blocks.
                  Export              Exports precompiled bytecode blocks.
                  Import              Imports precompiled bytecode blocks.
Compiler          Lexer               Lexical scanner that produces a token stream.
                  Parser              Performs syntactic analysis of the token stream produced by the Lexer.
                  Code Generator      Generates bytecode as instructed by the Parser.
                  Symbol Table        Hash table mapping identifiers to internal descriptors.
Debugging         Debugging           Introspection and tracing functionality.
Virtual Machine   State               Manages process-level and thread-level states.
                  Register Allocator  Manages the set of value registers.
                  Kernel              Instruction dispatcher.
                  Call Interface      Method invocation interface.
                  Stack               Manages call and data stacks.
                  Cache               Method caching mechanism.
                  Error Handler       Exception handling mechanism.
Memory Manager    Heap                Pool of managed memory.
                  Card Table          Write barrier that keeps track of inter-generational pointer references.
                  Garbage Collector   Reclaims memory used by inaccessible objects.
Platform          OSAL                Operating system abstraction layer.
                  TAL                 Pre-emptive thread abstraction layer.
                  Memory Allocator    Low-level memory allocator based on dlmalloc.
                  Byte Buffer         Dynamic array of bytes.
                  Vector              Dynamic array of heterogeneous objects.
                  Hash Table          Associative array of heterogeneous objects.
                  Tree                Red-black self-balancing binary tree.

Subsystems

API

The Nexus API provides a C/C++ interface to applications at a variety of levels. The API can be used for embedding the interpreter, extending its functionality, manipulating its data stack, and interacting with live objects.

The API subsystem interacts with the other subsystems; in this sense, it implements an abstraction layer between the interpreter and the application. This abstraction layer allows the interpreter internals to change without requiring application-level changes.

Standard Library

The standard library contains built-in object classes that provide a wide range of facilities.

Dynamic Library

The dynamic library subsystem provides a mechanism for extending Nexus functionality.

The module interface provides a means for applications to load external libraries. The libraries themselves can be statically or dynamically linked to the interpreter. Each library exposes an initialization function that registers the classes within it. The library loader automatically invokes this function after it has located the library.
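The registration handshake might look roughly like the sketch below. Everything named here (nx_registry, nx_lib_init_fn, nx_register_class, nx_load_library, and the example mathlib_init with its class names) is hypothetical and chosen for illustration. The sketch covers the statically linked case, where the initialization function is passed in directly; for a dynamically linked library the pointer would instead come from the platform's symbol lookup.

```c
#include <assert.h>
#include <string.h>

#define NX_MAX_CLASSES 16

/* Hypothetical class registry owned by the interpreter. */
typedef struct {
    const char *names[NX_MAX_CLASSES];
    int count;
} nx_registry;

/* Signature each library's initialization function is assumed to have. */
typedef int (*nx_lib_init_fn)(nx_registry *reg);

/* Called by a library to register one class; returns 0 if the table is full. */
int nx_register_class(nx_registry *reg, const char *name) {
    assert(reg != NULL && name != NULL);
    if (reg->count >= NX_MAX_CLASSES)
        return 0;
    reg->names[reg->count++] = name;
    return 1;
}

/* An example extension library exposing its initialization function. */
int mathlib_init(nx_registry *reg) {
    return nx_register_class(reg, "Vector3")
        && nx_register_class(reg, "Matrix4");
}

/* The loader invokes the init function once the library has been located. */
int nx_load_library(nx_registry *reg, nx_lib_init_fn init) {
    return init != NULL && init(reg);
}

/* Lookup helper used to check what a library registered. */
int nx_has_class(const nx_registry *reg, const char *name) {
    int i;
    for (i = 0; i < reg->count; i++) {
        if (strcmp(reg->names[i], name) == 0)
            return 1;
    }
    return 0;
}
```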

Input/Output

The input/output subsystem provides facilities for loading source code and for importing and exporting precompiled code. The interface for all operations implements an extensible reader/writer model that allows the source or destination to be virtually anything (e.g. a file or a memory buffer).
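One common way to build such a reader model in C is a callback plus an opaque context pointer, sketched below for the reading side only. The callback signature and the names nx_reader_fn, nx_mem_source, nx_mem_reader, and nx_load_block are assumptions, not the actual Nexus interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader callback: copy up to max bytes into buf and return
   the number of bytes produced; 0 signals end of input. */
typedef size_t (*nx_reader_fn)(void *ctx, char *buf, size_t max);

/* A memory-backed source. A file-backed one would wrap a FILE* and fread(). */
typedef struct {
    const char *data;
    size_t size;
    size_t pos;
} nx_mem_source;

size_t nx_mem_reader(void *ctx, char *buf, size_t max) {
    nx_mem_source *src = ctx;
    size_t left = src->size - src->pos;
    size_t n = left < max ? left : max;
    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return n;
}

/* The loader sees only the callback, never the concrete source.
   Here it simply drains the reader and counts the bytes in the block. */
size_t nx_load_block(nx_reader_fn reader, void *ctx) {
    char chunk[8];
    size_t total = 0;
    size_t n;
    assert(reader != NULL);
    while ((n = reader(ctx, chunk, sizeof chunk)) > 0)
        total += n;
    return total;
}
```

Supporting a new kind of source then means writing one more callback; the loader itself does not change.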

Operations are applied to blocks of code. A block is defined as a unit of execution and contains one or more expressions. Source code blocks are passed to the compiler subsystem in order to produce bytecode. Precompiled code is passed directly to the VM for execution.

Compiler

The compiler converts Nexus source code into bytecode that the virtual machine will execute. The compiler subsystem consists of lexer, parser, and code generator modules.

The compiler subsystem implements a pipeline design pattern. The modules are arranged so that the output of each is the input of the next. The lexer is responsible for obtaining tokens. The parser is responsible for analyzing the token stream in order to identify language constructs. The code generator produces bytecode as instructed by the parser.
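The pipeline can be made concrete with a deliberately tiny example: a compiler for sums of integers such as "1 + 2 + 39". The grammar, the token and opcode names, and the fixed-size code buffer are all invented for this sketch and are unrelated to the real Nexus grammar or instruction set. What it does show is the division of labor: the lexer yields tokens, the parser recognizes constructs, and the code generator emits bytecode when the parser tells it to.

```c
#include <assert.h>
#include <ctype.h>

typedef enum { TOK_NUMBER, TOK_PLUS, TOK_END, TOK_ERROR } token_type;
typedef struct { token_type type; int value; } token;

/* Lexer: turns characters into tokens, one call per token. */
typedef struct { const char *p; } lexer;

token lexer_next(lexer *lx) {
    token t = { TOK_END, 0 };
    while (*lx->p == ' ')
        lx->p++;
    if (*lx->p == '\0')
        return t;
    if (isdigit((unsigned char)*lx->p)) {
        t.type = TOK_NUMBER;
        while (isdigit((unsigned char)*lx->p))
            t.value = t.value * 10 + (*lx->p++ - '0');
        return t;
    }
    t.type = (*lx->p == '+') ? TOK_PLUS : TOK_ERROR;
    lx->p++;
    return t;
}

/* Code generator: appends bytecode words on request. */
enum { OP_PUSH, OP_ADD, OP_HALT };
typedef struct { int code[64]; int count; } codegen;

void emit(codegen *cg, int word) {
    assert(cg->count < 64);    /* the sketch uses a fixed-size buffer */
    cg->code[cg->count++] = word;
}

/* Parser: recognizes  number ('+' number)*  and instructs the generator.
   Returns 1 on success, 0 on a syntax error. */
int parse(lexer *lx, codegen *cg) {
    token t = lexer_next(lx);
    if (t.type != TOK_NUMBER)
        return 0;
    emit(cg, OP_PUSH);
    emit(cg, t.value);
    for (t = lexer_next(lx); t.type == TOK_PLUS; t = lexer_next(lx)) {
        t = lexer_next(lx);
        if (t.type != TOK_NUMBER)
            return 0;
        emit(cg, OP_PUSH);
        emit(cg, t.value);
        emit(cg, OP_ADD);
    }
    if (t.type != TOK_END)
        return 0;
    emit(cg, OP_HALT);
    return 1;
}
```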

Debugging

Virtual Machine

The virtual machine is responsible for executing bytecode generated by the compiler subsystem. It is a critical component of the Nexus interpreter: the large amount of processor time spent in this subsystem mandates a design that performs as well as possible.

The virtual machine kernel interprets an array of bytecodes that have been loaded from a code block. It then dispatches instructions using a simple loop-embedded switch statement. The execution of a virtual machine instruction consists of three parts:

  • Accessing the arguments of the instruction
  • Performing the function of the instruction
  • Dispatching (fetching, decoding, and starting) the next instruction

This approach has performance drawbacks, because dispatching instructions is expensive. A typical compilation of the dispatch loop requires a minimum of three control-transfer machine instructions per iteration. Faster dispatch techniques (e.g. direct threading, inline threading) have been considered; however, none of them is particularly portable. This design is expected to evolve over time.
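The loop-embedded switch dispatch described above can be sketched as follows. The instruction set (OP_PUSH, OP_ADD, OP_MUL, OP_HALT), the use of int-sized bytecode words, and the simple operand stack are assumptions made to keep the example short; they are not the Nexus instruction set.

```c
#include <assert.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Runs code until OP_HALT and returns the value on top of the stack. */
int vm_run(const int *code) {
    int stack[32];
    int sp = 0;               /* index of the next free stack slot */
    const int *pc = code;     /* program counter */

    for (;;) {
        int op = *pc++;       /* fetch the next instruction */
        switch (op) {         /* decode and dispatch */
        case OP_PUSH:
            assert(sp < 32);
            stack[sp++] = *pc++;          /* access the inline argument */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];   /* perform the instruction */
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

Each iteration pays for the jump into the switch, the break out of it, and the jump back to the top of the loop. That per-instruction overhead is what threaded dispatch techniques try to remove, at the cost of relying on compiler extensions.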

Memory Manager

In Nexus, memory is allocated only to objects. There is no explicit allocation of memory; there is only the creation of new objects. The interpreter employs a garbage collector that reclaims the memory occupied by an object once it determines that the object is no longer accessible.

The managed heap is where the objects of a Nexus program live. It is a repository for live objects, dead objects, and free memory. In the managed heap, memory is arranged as a contiguous block, and a pointer tracks the boundary between allocated and free memory. Allocating memory simply increments the pointer, which makes allocation very fast.
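This style of allocation is often called bump-pointer allocation. The sketch below shows the idea under stated assumptions: the nx_heap structure and function names are hypothetical, the caller supplies a suitably aligned block, and pointer-size alignment (assumed to be a power of two) is used for every allocation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char *base;    /* start of the contiguous block */
    size_t size;            /* total bytes in the block */
    size_t top;             /* boundary between allocated and free memory */
} nx_heap;

void nx_heap_init(nx_heap *h, void *block, size_t size) {
    h->base = block;
    h->size = size;
    h->top = 0;
}

/* Returns NULL when the block is exhausted; in a managed heap this is
   the point at which a garbage collection would be triggered. */
void *nx_heap_alloc(nx_heap *h, size_t n) {
    const size_t align = sizeof(void *);
    void *p;
    assert(n <= (size_t)-1 - align);
    n = (n + align - 1) & ~(align - 1);   /* round up to pointer alignment */
    if (n > h->size - h->top)
        return NULL;
    p = h->base + h->top;
    h->top += n;                          /* allocation is just an increment */
    return p;
}
```

There is no free list to search and no per-object bookkeeping on allocation. The price is that individual objects cannot be freed, which is acceptable here because reclamation is the garbage collector's job.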

The heap module splits the heap into two physical areas, which are referred to as generations: one young and the other old. Most newly allocated objects are allocated in the young (ephemeral) generation, which is typically small and garbage collected frequently. Since most objects in it are expected to die quickly, the number of objects that survive a young generation collection (also referred to as a minor collection) is expected to be low. In general, minor collections are very efficient because they concentrate on a space that is usually small and is likely to contain a lot of garbage objects. Objects that are longer-lived are eventually promoted, or tenured, to the old generation. This generation is typically larger than the young generation, and its occupancy grows more slowly. As a result, old generation collections (also referred to as major collections) are infrequent, but when they do occur they take longer.

When objects are allocated, they are initially placed in the young generation. When the young generation nears its maximum size, a minor collection is initiated. This is a very fast GC pass, due to the small size of the young generation. A minor collection always results in the young generation being completely flushed. Any objects that are discovered to be garbage are freed, and any objects actually in use are tenured.

To keep minor collections short, the GC must be able to identify live objects in the young generation without having to scan the entire old generation. It achieves this by employing a data structure called a card table. The old generation is split into chunks called cards, and the card table is an array with a one-byte entry per card. Every update to a reference field of an object also marks the card containing that field as dirty by setting its entry in the card table to the appropriate value. During a minor collection, only the areas that correspond to dirty cards are scanned to discover potential old-to-young (inter-generational) references.
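A card-marking write barrier of this kind can be sketched as below. The card size (512 bytes), the old generation size, and all names are assumptions for the example, and the sketch only handles fields that live in the old generation. The scan function stands in for the part of a minor collection that visits dirty cards; here it merely counts and cleans them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CARD_SHIFT 9                        /* assumed card size: 512 bytes */
#define OLD_GEN_SIZE (64 * 1024)            /* assumed old generation size */
#define NUM_CARDS (OLD_GEN_SIZE >> CARD_SHIFT)

typedef struct {
    unsigned char *old_base;                /* start of the old generation */
    unsigned char cards[NUM_CARDS];         /* one byte per card: 0 clean, 1 dirty */
} card_table;

void card_table_init(card_table *ct, void *old_base) {
    ct->old_base = old_base;
    memset(ct->cards, 0, sizeof ct->cards);
}

/* Write barrier: store the reference, then dirty the card holding the field. */
void write_ref(card_table *ct, void **field, void *new_ref) {
    size_t offset = (size_t)((unsigned char *)field - ct->old_base);
    assert(offset < OLD_GEN_SIZE);          /* field must be in the old generation */
    *field = new_ref;
    ct->cards[offset >> CARD_SHIFT] = 1;
}

/* A minor collection visits only dirty cards; this stand-in counts them
   and resets each one to clean. */
size_t scan_dirty_cards(card_table *ct) {
    size_t i;
    size_t dirty = 0;
    for (i = 0; i < NUM_CARDS; i++) {
        if (ct->cards[i]) {
            dirty++;
            ct->cards[i] = 0;
        }
    }
    return dirty;
}
```

The barrier costs one shift and one byte store per reference update, while the collector scans 512-byte regions instead of the whole old generation.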

As the old generation nears its maximum size due to the number of tenured objects, a major collection is initiated. The major collection looks at all objects in both the young and old generations. In addition to copying surviving objects from the young to the old generation, it performs a mark/sweep pass over all objects. If a major collection is not able to free enough memory for an allocation, the heap is expanded.

Platform

The platform subsystem implements a wide range of low-level interfaces that are used throughout the interpreter. These low-level interfaces abstract the underlying implementation and insulate the interpreter from low-level variations.

The operating system abstraction layer (OSAL) module shields the interpreter from low-level operating system interfaces by providing a library of interfaces to specific platform resources, such as I/O or memory. This layer of abstraction is what makes a platform-independent interpreter possible.
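One way such a layer is often structured in C is a table of function pointers with one backend per platform. The sketch below is only an illustration of that idea: the nx_osal structure, its three entries, and the helper names are hypothetical, and the single backend shown is built on the standard C library rather than on any particular operating system.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical OSAL interface: the interpreter calls through this table only. */
typedef struct {
    void *(*mem_alloc)(size_t size);
    void  (*mem_free)(void *ptr);
    int   (*write_text)(const char *text);   /* returns 0 on success */
} nx_osal;

/* Default backend built on the standard C library. */
static int ansi_write_text(const char *text) {
    return fputs(text, stdout) < 0 ? -1 : 0;
}

const nx_osal nx_osal_ansi = { malloc, free, ansi_write_text };

/* Interpreter-side code depends on the interface, not on the platform. */
int nx_print_banner(const nx_osal *os) {
    assert(os != NULL && os->write_text != NULL);
    return os->write_text("Nexus\n");
}
```

Porting to a new platform would then mean supplying another filled-in table, leaving the interpreter-side callers untouched.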

The memory allocator module procures memory from the underlying operating system. Its implementation is based on dlmalloc, which provides implementations of the standard C routines malloc(), free(), and realloc(), as well as a few auxiliary utility routines.

Work in progress... stay tuned.