A Scheme Interpreter for ARM Microcontrollers: Language (00.0160)

Overview:

The Armpit Scheme language implementation is based on the description of Scheme in the Revised^5 Report on the Algorithmic Language Scheme (r5rs), with a few extensions and omissions. The language elements are separated into 6 categories: Level 1 Core, Level 1 Library, Level 1 Addendum, Level 2 Core, Level 2 Library and Level 2 Addendum. The Core (Level 1 and 2) represents the functions, syntax and variables of r5rs that do not have a "library" or "optional" qualifier (see r5rs section 1.3.3). The Library (Level 1 and 2) consists of the procedures in the "library" category of r5rs. The Addendum (Level 1 and 2) consists of additional useful procedures. Core and Library are presented together in the next sections, which are organized by level.

Level 1 provides a working Scheme system that omits the numerical operations from exp to expt, as well as cond, case, do, delay, force and caaar to cddddr. It is meant for the most memory-constrained MCUs (LPC-2103, LPC-2131). It also excludes the i2c subsystem and its associated functions (pack, packed?, unpack), and direct top-level access to the READBUFFER, the uart input port vector and the uart output port vector.

The conformance of the language (at Level 2) to the r5rs standard is presented on the conformance page.

If the system hangs, try ctrl-c.

Armpit Scheme Core and Library, Level 1:

The r5rs elements incorporated into the Armpit Scheme Level 1 Core and Library are listed below with the section number in which they appear within r5rs:

  4. Expressions
     4.1. Primitive expression types
  	4.1.1 variable reference:	[internal bndenv]
	4.1.2 literal expressions:	quote
	4.1.4 procedures:		lambda
	4.1.5 conditionals:		if	
	4.1.6 assignments:		set!
     4.2. Derived expression types
        4.2.1 conditionals	        and, or
        4.2.2 binding constructs	let, let*, letrec
        4.2.3 sequencing		begin
	4.2.6 quasiquotation:		quasiquote
     4.3 Macros
       4.3.2 pattern language:		syntax-rules
  5. Program Structure
     5.2 definitions:			define
     5.3 syntax definitions:		define-syntax
  6. Standard Procedures
     6.1 equivalence predicates:	eqv?, eq?, equal?
     6.2 numbers
       6.2.5 numerical operations:	number?, integer?, =, <, >, <=, >=, +, *, -, /, quotient, remainder,
					modulo, floor, ceiling, truncate, round,
					zero?, positive?, negative?, odd?, even?, max, min, abs, gcd, lcm
       6.2.6 numerical input/output:	number->string, string->number
     6.3 Other Data Types
       6.3.1 booleans	                not, boolean?
       6.3.2 pairs and list:		pair?, cons, car, cdr, set-car!, set-cdr!,
			                caar, cadr, cdar, cddr,	null?, list?, list, length, append, reverse,
					list-tail, list-ref, memv, memq, member, assq, assv, assoc
       6.3.3 symbols:			symbol?, symbol->string, string->symbol
       6.3.4 characters:		char?, char=?, char<?, char>?, char<=?, char>=?, char->integer,
					integer->char
       6.3.5 strings:			string?, make-string, string, string-length, string-ref, string-set!,
					string=?, substring, string-append, string->list, list->string,
					string-copy, string-fill!
       6.3.6 vectors:			vector?, make-vector, vector, vector-length, vector-ref, vector-set!,
					vector->list, list->vector, vector-fill!
     6.4 control features:		procedure?, apply, call/cc, values, call-with-values,
                                        map, for-each
     6.5 eval:				eval, interaction-environment
     6.6 Input and Output
       6.6.1 ports:			input-port?, output-port?, current-input-port, current-output-port,
                                        open-input-file, open-output-file, close-input-port, close-output-port
       6.6.2 input:			read-char, peek-char, eof-object?, char-ready?, read
       6.6.3 output:			write-char, write, display, newline
       6.6.4 system interface:		load

The remainder of this section discusses aspects of the Level 1 Armpit Scheme Core and Library language that may deviate from the standard or are not specified there (and that might also not be discussed in the conformance and implementation pages). One such deviation is that Armpit Scheme functions can be called with more or fewer arguments than specified without reporting an error. The results produced in such cases are not necessarily correct. Missing input arguments are replaced with nulls and extra arguments are disregarded. For example:

      (list->vector) ; -> #(-402856647 . -402856647) -- missing input, replaced with '(), odd output
      (list->vector '(1 2) 3 4) ; -> #(1 2) -- correct output with disregard of extra args

The system is also not designed to report type errors in this version. The user has to be a bit careful to avoid those, for example:

      (list->vector 5) ; -> (data  error:  5805) -- and the system hangs, do be a bit careful

On the numeric side, applying a numeric function to a non-number is designed to return nan as discussed also in Addendum Level 1 and in Core and Library Level 2, below. Also, neither fractions nor complex numbers are implemented.

      (/ #t) ; -> nan   = fine,  not a number
      1/3    ; -> 1.e3  = wrong, no fractions
      5+1i   ; -> 5.e19 = wrong, no complex numbers

In Armpit Scheme, the Input and Output functions defined in section 6.6 of r5rs are designed to work with memory locations (such as peripheral registers) in addition to the more common uart, usb and file ports. A single byte can be read from or written to memory as a scheme character, and the lower 30 bits of a word can be read from or written to memory as an integer. When reading, the upper 2 bits (bits 31 and 30) are discarded to build the scheme integer. When writing, they become copies of the 3rd most significant bit (bit 29), which corresponds to bit 31 of the scheme integer. Reading from and writing to memory (e.g. peripheral registers) extends the syntax of the standard input and output functions to specify a source/destination base address as follows:

      (read-char  shifted-base-address  offset) ; -> char
      (read       shifted-base-address  offset) ; -> integer
      (write-char character  shifted-base-address  offset)
      (write      integer    shifted-base-address  offset)

The shifted-base-address is the main part of the source/destination address shifted to the right by 4 bits (one hexadecimal digit) and the full address of the memory location that is read-from or written-to is obtained as 16*shifted-base-address + offset. The offset must be specified to differentiate these input/output operations from standard character port or file writing operations (an offset of 0 is acceptable). One could think of memory input/output ports as consisting of two integers (shifted base address and offset) but, internally, they are represented in a slightly more involved manner, consistent with other ports.
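
The address arithmetic and the 30-bit truncation described above can be modeled in portable r5rs Scheme. The sketch below is only an illustration, meant for a host Scheme whose integers are wider than 30 bits (a full address such as #x50000004 does not fit in a 30-bit integer). The helper names are hypothetical and not part of Armpit Scheme, and the signed interpretation of the 30-bit value is an assumption:

      ; hypothetical helpers, not part of Armpit Scheme
      (define (shifted-base address) (quotient  address 16)) ; drop the lowest hex digit
      (define (offset-of    address) (remainder address 16)) ; keep the lowest hex digit
      (define (full-address shifted-base-address offset)
        (+ (* 16 shifted-base-address) offset))

      ; assumed model of reading a 32-bit word as a signed 30-bit integer:
      ; bits 31 and 30 are dropped and bit 29 acts as the sign bit
      (define (word->integer word)
        (let ((low30 (modulo word 1073741824)))        ; 2^30
          (if (>= low30 536870912)                     ; 2^29
              (- low30 1073741824)
              low30)))

      ; assumed model of writing: bits 31 and 30 become copies of bit 29
      (define (integer->word n) (modulo n 4294967296)) ; 2^32

      (shifted-base #x50000004)    ; -> 83886080   (#x05000000)
      (offset-of    #x50000004)    ; -> 4
      (full-address #x05000000 4)  ; -> 1342177284 (#x50000004)
      (word->integer #xFFFFFFFF)   ; -> -1
      (integer->word -1)           ; -> 4294967295 (#xFFFFFFFF)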

Somewhat like memory ports, the (open-input-file ...) and (open-output-file ...) functions return integer IDs that uniquely identify the opened file (for as long as it is open). These IDs can be used as port specifiers in the read/write functions and in the file closing functions:

      (open-output-file "zig") ; -> 1                     -- file ID 1 is acquired
      (write 'humm 1)          ; -> (non-printing object)
      (open-output-file "zag") ; -> 2                     -- file ID 2 is acquired
      (write 'what? 2)         ; -> (non-printing object)
      (close-output-port 1)    ; -> (non-printing object) -- file ID 1 is relinquished
      (open-input-file "zig")  ; -> 1                     -- file ID 1 is acquired
      (read 1)                 ; -> humm
      (read 1)                 ; -> eof
      (close-output-port 2)    ; -> (non-printing object) -- file ID 2 is relinquished
      (close-input-port 1)     ; -> (non-printing object) -- file ID 1 is relinquished

Hence, file ports may be thought of as single integers although internally their representation is more complex. One more word about files: Armpit Scheme considers a user file named "boot" to be a file that contains user-specified startup Scheme code, which may be used, for example, to automatically start a data acquisition process. If the code in such a file is erroneous, the system may hang somewhat irrecoverably (the user file space would have to be cleared using an external program). To prevent such a situation, one general purpose input/output pin on each MCU is used to override the loading of the "boot" file (if needed). It is often the P0.3 pin on LPC2000 MCUs, but a different pin may be used there and is used on other MCUs (please consult the FlashInitCheck: function in the src/MCU-Family/MCU-Family_init_io.s source code file to identify the proper pin if needed; the pin should be connected to ground for override). It is a good idea to try ctrl-c first to see if it can break the system out of an improper "boot" file.

For built-in character ports (uart, usb, i2c) the port can also be specified by a single integer that represents the shifted-base-address of the peripheral in the MCU. For example, on the Samsung S3C2410 used in the TinCan Tools Hammer board, the base address of the uart0 peripheral (the default input/output port) is #x50000000 and one can write through it in any of the following ways (note the right shift by one hexadecimal digit):

    (write "hello")
    (write "hello" #x05000000)
    (define (current-output-port) #x05000000)
    (write "hello")

As mentioned earlier, a full port is slightly more involved and is designed to allow a user to define input and output ports in addition to those that are built-in. A full port includes (potentially) the shifted base address of the port peripheral and (mandatorily) a vector of port input or output pseudo-primitives written in assembly. At reset, the values returned by the (current-input-port) and (current-output-port) functions are such full ports, for example (on the Hammer):

      (current-input-port) ; -> ((83886080) . #(1  #<primitive> #<primitive> #<primitive> #<primitive>
                           ;                    #t #<primitive> #<primitive> #<primitive>))

If the base address is known and a proper port vector has been built, then a full port can be constructed. For example, for the uart0 output port on the Hammer (_OPR is pre-defined as the uart output port vector):

      (define fuop (cons (list #x05000000) _OPR))
      fuop
      (define (current-output-port) fuop)
      (write "hello")
      (write "hello" fuop)

User-defined ports built in this way work seamlessly with scheme input/output functions and can be specified as current-input/output-ports. They can be built while the system is running rather than pre-assembled as part of the Armpit Scheme source code.

Armpit Scheme Addendum, Level 1:

The Level 1 Addendum adds 13 functions and 7 variables to the Core and Library as listed below:

   Addendum: pattern language	       match, substitute
   Addendum: reader                    parse
   Addendum: definitions  	       defined?
   Addendum: numbers                   inf, nan
   Addendum: bitwise logical ops       logior, logxor, logand, lognot, ash
   Addendum: file system	       erase, files, unlock
   Addendum: System/ArmSchembly        install, _maloc, _cons, _list, _save, _ISR

The syntax functions match and substitute are used by the Core pattern language and exposed at top-level. The function match is used to match a form to a pattern. The function substitute is used to substitute bindings into a template. The syntax of these functions is:

    (match form pattern initial-bindings literals)
    (substitute bindings template)

An example of the usage of match is:

    (define form         '(plus 2 3 4))
    (define pattern      '(_ x ...))
    (define old-bindings '((z . 1)))
    (define literals     '(else =>))
    (define new-bindings (eval `(match ,form ,pattern ,old-bindings ,literals)))
    new-bindings         ; -> ((x 4 3 2) (_ . plus) (z . 1)) == an a-list of bindings

The use of substitute is illustrated by continuing the above example:

    (define template '(+ z x ...))
    (define new-expr (eval `(substitute ,new-bindings ,template)))
    new-expr         ; -> (+ 1 2 3 4)
    (eval new-expr)  ; -> 10
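
For comparison, the same pattern and template can be written as an ordinary syntax-rules macro. This is only an illustrative sketch: the macro name plus-z is hypothetical, and z is a top-level variable here rather than an entry in a bindings a-list as in the example above.

    (define z 1)
    (define-syntax plus-z
      (syntax-rules ()
        ((_ x ...) (+ z x ...)))) ; pattern (_ x ...), template (+ z x ...)
    (plus-z 2 3 4) ; -> 10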

The function parse is used by the reader and exposed at top-level. It takes a string as input and converts it to its internal representation. For example:

    (parse "(+ 2 3 4)")        ; -> (+ 2 3 4)
    (eval (parse "(+ 2 3 4)")) ; -> 9

The function defined? identifies whether a symbol is part of the currently accessible environment or not (without producing an eval error):

    (defined? 'xyz)    ; -> #f
    (define xyz 10)
    (defined? 'xyz)    ; -> #t

The 2 variables inf and nan are special floating point numbers that represent positive infinity and indefinite values according to the 30-bit adjusted version of IEEE-754 floating point numbers used in Armpit Scheme. For example:

    inf     ;  -> inf
    nan     ;  -> nan
    (- inf) ;  -> -inf
    (/ 0)   ;  -> inf
    (/ 0 0) ;  -> nan
    (+ "hello") ; -> nan

The bitwise logical operation functions are similar to those found in GNU guile scheme: logior, logxor, logand, lognot and ash (arithmetic shift):

    (number->string (logior #b1100 #b1010) 2) ; -> "00000000000000000000000000001110"
    (number->string (logxor #b1100 #b1010) 2) ; -> "00000000000000000000000000000110"
    (number->string (logand #b1100 #b1010) 2) ; -> "00000000000000000000000000001000"
    (number->string (lognot #b1100) 2)        ; -> "11111111111111111111111111110011"
    (number->string (ash #b1100  4) 2)        ; -> "00000000000000000000000011000000"
    (number->string (ash #b1100 -2) 2)        ; -> "00000000000000000000000000000011"
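
These functions combine naturally into single-bit helpers, for example when preparing a value to be written to a peripheral register. The sketch below uses hypothetical helper names (bit-set?, set-bit, clear-bit) that are not part of Armpit Scheme, and it assumes that bit positions stay within the range of a Scheme integer:

    ; hypothetical helpers built only from logand, logior, lognot and ash
    (define (bit-set?  n bit) (not (zero? (logand n (ash 1 bit)))))
    (define (set-bit   n bit) (logior n (ash 1 bit)))
    (define (clear-bit n bit) (logand n (lognot (ash 1 bit))))

    (bit-set?  #b1100 3) ; -> #t
    (bit-set?  #b1100 0) ; -> #f
    (set-bit   #b1100 0) ; -> 13 (#b1101)
    (clear-bit #b1100 2) ; -> 8  (#b1000)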

The file system functions erase, files and unlock are used to manipulate the file system. The function erase erases the user-file FLASH space (erases all user files -- may take a minute). The function files returns a list of the names of all user files currently in FLASH. The function unlock unlocks the file system (used in the case where a prior file operation terminated abnormally, leaving the file system locked, which causes future file operations to hang):

     (files)  ; -> ("zag" "zag2" "zig2" "wawo" "wawi")
     (erase)  ; -> ()
     (files)  ; -> ()
     (unlock) ; -> ()

The 5 functions and 1 variable of System/ArmSchembly type are used to install objects above the heap, to link user-defined assembly code to Armpit Scheme primitives or to access the machine code Interrupt Service Routine Vector (ISR VECTOR). Examples of the use of these 5 functions and 1 variable (_ISR) are to be found in the Program Examples section of this web site. It is important to note that _maloc, _list, _cons and _save operate below normal top-level functions such as cons or list and should not, in themselves, be used to allocate memory at top-level (they return abnormally and change the memory reservation status). Rather, they are to be used in machine code assembled at the top level (e.g. with an ARMSchembler). An assembler running in the Armpit Scheme top-level can find the address in FLASH of these pseudo-primitives (and other Scheme objects) using statements such as:

      (define (object-address obj)
        (display (string-append "#x" (number->string (ash obj 2) 16))))

      (object-address _cons) ; -> #x00000A08  
      (object-address cons)  ; -> #x00002B10

Once such addresses are obtained, it becomes possible to craft machine code (to be installed above heap RAM) that uses built-in symbols or branches to built-in functions. The function install takes a specially crafted scheme vector as input, adjusts the heap's top address down to make room for the vector's content, copies this content to the freed space above the heap and returns the address at which the object has been installed. The address returned by install is suitable for storage into a Scheme variable, using define or set! for example, and it evaluates to the stored object:

      (install '#(0 0 1363 0 25960 27756 111)) ; -> "hello"
      (define x (install '#(0 0 1363 0 25960 27756 111))) ; -> ()
      x  ; -> "hello"

The input vector to install contains a 16-bit-based representation of the item to be installed, where each 16-bit halfword of the item is stored in a Scheme integer. It is essentially the internal representation (see implementation) of the object to install, stored 16 bits at a time (sticking these 16-bit chunks into Scheme integers makes the vector gc-safe). It is notable that, because install returns an address rather than an immediate, the object in the input vector needs to be a multi-word object. For common multi-word Scheme objects, the input vector is structured as follows (imm indicates an immediate object, i.e. a single-word object):

       Index Halfword    Primitive        Vector     String               List
       ----- -------- --------------- ------------- ---------- -----------------------------
         0    Lower   type (synt/var)       0           0                   0
         1    Upper   type (synt/var)       0           0                   0
         2    Lower    primitive tag    vector tag  string tag  car of list or offset to car
         3    Upper    primitive tag    vector tag  string tag  car of list or offset to car
         4    Lower    machine code   imm or offset  char 1,2    imm, null or offset to cdr
         5    Upper    machine code   imm or offset  char 3,4    imm, null or offset to cdr
        ...    ...         ...              ...         ...                ...

For example, the representation of the string "hello" in an install vector (as shown in the example above) is:

      Index   Content  Meaning
      ----- ---------- -------
        0   #x00000000 this object is not a primitive
        1   #x00000000 this object is not a primitive
        2   #x00000553 lower 16-bits of string tag for string with 5 characters
        3   #x00000000 upper 16-bits of string tag for string with 5 characters
        4   #x00006568 lower 16-bits of 1st word of string content (ASCII e and h)
        5   #x00006C6C upper 16-bits of 1st word of string content (ASCII l and l)
        6   #x0000006F lower 16-bits of 2nd word of string content (ASCII o)
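
The character halfwords in this table can be computed rather than written by hand. The sketch below is a minimal illustration in portable r5rs Scheme with hypothetical helper names (chars->halfwords, string->halfwords). It packs two characters per halfword, first character in the low byte, and copies the four leading entries (0 0 1363 0) verbatim from the "hello" example instead of deriving the string tag. Whether install expects any extra padding for other string lengths is not stated here, so the result is only checked against that one example.

      ; hypothetical helpers, not part of Armpit Scheme
      (define (chars->halfwords chars)
        (if (null? chars)
            '()
            (if (null? (cdr chars))
                (list (char->integer (car chars)))            ; odd trailing character
                (cons (+ (char->integer (car chars))          ; low byte:  1st char
                         (* 256 (char->integer (cadr chars)))) ; high byte: 2nd char
                      (chars->halfwords (cddr chars))))))

      (define (string->halfwords str)
        (chars->halfwords (string->list str)))

      (string->halfwords "hello") ; -> (25960 27756 111)
      (list->vector (append '(0 0 1363 0) (string->halfwords "hello")))
                                  ; -> #(0 0 1363 0 25960 27756 111)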

Such install vectors can contain the address of objects in FLASH or non-heap RAM by splitting these addresses into 2 16-bit chunks. This simple strategy does not work where objects to install contain pointers to some other part of themselves (e.g. where they represent machine code with an internal branch, or even just a list of scheme items). Since the address of an item that is in the object itself is unknown until the object is installed, it is not possible to specify such an internal address directly. For this situation, the offset of the item within the object is specified instead and its actual address is computed by the install function when the object is installed. To indicate that a pair of halfwords represents an offset, the lower halfword is tagged with ones in its upper 1 to 6 bits (just one bit is needed there really). For example, the list '(9 8 7) is represented in an install vector as:

      Index   Content  Meaning
      ----- ---------- -------
        0   #x00000000 this object is not a primitive
        1   #x00000000 this object is not a primitive
        2   #x00000025 lower 16-bits of 9 tagged as Scheme integer, eg. (ash #x25 -2)
        3   #x00000000 upper 16-bits of 9 tagged as Scheme integer
        4   #xE0000002 lower 16-bits of offset (2) to next item in list (note most significant bits)
        5   #x00000000 upper 16-bits of offset (2) to next item in list
        6   #x00000021 lower 16-bits of 8 tagged as Scheme integer, eg. (ash #x21 -2)
        7   #x00000000 upper 16-bits of 8 tagged as Scheme integer
        8   #xE0000004 lower 16-bits of offset (4) to next item in list (note most significant bits)
        9   #x00000000 upper 16-bits of offset (4) to next item in list
       10   #x0000001D lower 16-bits of 7 tagged as Scheme integer, eg. (ash #x1D -2)
       11   #x00000000 upper 16-bits of 7 tagged as Scheme integer
       12   #x0000000F lower 16-bits of internal representation of null
       13   #x00000000 upper 16-bits of internal representation of null

Check:

    (install '#(#x00 #x00 #x25 #x00 #xE0000002 #x00 #x21 #x00 #xE0000004 #x00 #x1D #x00 #x0F #x00))
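
The tagged integer entries in the table above follow a visible pattern (9 -> #x25, 8 -> #x21, 7 -> #x1D), which matches the (ash ... -2) hints in the Meaning column. The sketch below reproduces them in portable r5rs Scheme. The helper names are hypothetical, the tagging rule is inferred from these three values only, and the halfword split is meant for small non-negative integers:

    ; hypothetical helpers, not part of Armpit Scheme
    (define (tagged-integer n) (+ (* 4 n) 1))         ; inferred: value shifted left by 2, plus 1
    (define (lower-halfword w) (remainder w 65536))   ; bits 0 to 15
    (define (upper-halfword w) (quotient  w 65536))   ; bits 16 to 31

    (tagged-integer 9)                   ; -> 37 (#x25)
    (tagged-integer 8)                   ; -> 33 (#x21)
    (tagged-integer 7)                   ; -> 29 (#x1D)
    (lower-halfword (tagged-integer 9))  ; -> 37
    (upper-halfword (tagged-integer 9))  ; -> 0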

Armpit Scheme Core and Library, Level 2:

The Level 2 Armpit Scheme Core and Library includes most of the remainder of r5rs that would not fit in memory-constrained MCUs (LPC-2103, LPC-2131):

  4. Expressions
     4.2. Derived expression types
        4.2.1 conditionals	        cond, case
	4.2.4 iteration                 do
	4.2.5 delayed evaluation        delay
  6. Standard Procedures
     6.2 numbers
       6.2.5 numerical operations:	exp, log, sin, cos, tan, asin, acos, atan, sqrt, expt
     6.3 Other Data Types
       6.3.2 pairs and list:		caaar,	caadr,	cadar,	caddr,	cdaar,	cdadr,	cddar,	cdddr,
                                	caaaar,	caaadr,	caadar,	caaddr,	cadaar,	cadadr,	caddar,	cadddr,
					cdaaar,	cdaadr,	cdadar,	cdaddr,	cddaar,	cddadr,	cdddar,	cddddr
     6.4 control features:		force

There is nothing particular about these functions except that the numerical operations are designed to return nan (rather than produce an error) when applied to non-numbers:

      (exp "hello")   ; -> nan 
      (cos '(1 2 3))  ; -> nan

Armpit Scheme Addendum, Level 2:

The Level 2 Addendum adds 3 functions and 3 variables to the system as listed below:

   Addendum: packing objects	       pack, unpack, packed?
   Addendum: System/ARMSchembly        _RBF, _IPR, _OPR

The pack, unpack and packed? functions are designed to process (build, restore, check) position-independent objects, mainly for i2c-based multiprocessing, and are not functional in this version of Armpit Scheme. They will be re-instated in a future version.

The variables _RBF, _IPR and _OPR provide top-level access to the READ BUFFER, the uart input port vector and the uart output port vector, respectively. They are meant to support the development of machine code drivers for input or output peripherals that may replicate some of the functionality found in the uart. The drivers would be installed above the heap via the install function of the Level 1 Addendum, while the system is running.

Last updated February 2, 2009

bioe-hubert-at-sourceforge.net