A Scheme Interpreter for ARM Microcontrollers:
LSD Bootloader Implementation, Version 080

LSD Overview:

The Live-SD (LSD) bootloader for ArmPit Scheme (APS) is written in ARM assembly using a unified syntax that bridges Thumb2, 32-bit ARM and 64-bit Aarch64 instruction sets (ARMv7M, ARMv7A and ARMv8A architectures), via macros where necessary. The LSD performs the MCU initialization required for the device to run the APS and then launches it. It is the hardware-dependent part of the system and defines, among other things, the input/output ports, and related functions, that the running APS will use (including Basic Input/Output Functions: bio_funcs). The LSD is designed to work with ArmPit Scheme but may also be used to launch an arbitrary machine code program of the user's choice (just name that program apsT2.bin, aspT2co2.bin, aps32.bin or aps64.bin, as appropriate).

The LSD bootloader source code is organized into 1 main configuration file, 11 common files and up to 9 mcu-specific files:

    Main Configuration File:    lsd_080.s

    Common Files (in lsd/):     lsd_constants.s
                                lsd_macros.s
                                lsd_core.s

    Common Files from aps/:     aps_constants.s
                                aps_constants_64.s
                                aps_macros.s

    Ports (in lsd/io_ports):    aps_ports.s
                                memory_port.s
                                uart_port.s
                                usb_port.s
                                file_sd_port.s

    MCU-Specific Files:         board.h             (in manufacturer/mcu/board/)
                                device_family.h     (in manufacturer/mcu/)
                                startup.s           (in manufacturer/mcu/)
                                hw_local_init.s     (in manufacturer/mcu/)
                                system_0.s          (in manufacturer/mcu/)
                                file_sd_hw.s        (in manufacturer/)
                                usb_hw.s            (in manufacturer/)

    MCU-Optional files:         file_sd_hw_local.s  (in manufacturer/mcu/)
                                usb_hw_local.s      (in manufacturer/mcu/)

The main configuration file, lsd_080.s, uses ".include" statements to load the source code for constants, ports and mcu-specific elements for assembly. The script lsd_build (at top-level in the source code) assembles the result into a machine code LSD for a specific board. It is typically called by build_all (also at top-level), which assembles all LSDs and APSs.

The configuration file code uses macros defined in lsd_macros.s to construct a group of temporary scheme objects that the LSD will pass to the APS when it transfers execution there. These objects include an ISR vector, one or more Core Buffers, a multicore startup vector, a Main Buffer (MBF), a Global Vector (_GLV), and a scheme environment vector and an obarray that are empty except for the mcu-specific ports and system_0 sub-environments. The _GLV and MBF (root) are accessed differently on 32-bit and 64-bit systems. In 32-bit, their addresses are stored in FPU registers s17 and s18, respectively, while in 64-bit they are stored in registers x26 and x27, respectively. Accessor macros for these objects (eg. getGLV, getMBFr) provide code uniformity across 32-bit and 64-bit systems, and are defined in the aps_macros.s file.

LSD Execution:

LSD execution begins in the startup.s file. On Cortex-M, the file starts with temporary exception vectors used during the boot process. On other MCUs, the file may start with a header needed for booting from SD-card, or for being launched by u-boot (Xilinx Zynq only). On some systems, the code may copy the LSD to a non-default memory location (eg. other than 0x00) and perform memory remapping. On Cortex-M MCUs, the TCM, L1 cache and MPU may be configured at this stage. On those Cortex-A53 MCUs that start in Aarch32 mode, the startup code performs a warm reset to Aarch64. Also, if the MCU started in EL3, the code drops it to EL1 secure state, and continues with initialization.

After the above preliminary initialization, startup.s typically configures PLLs, interrupts, GPIOs and the main uart using code that varies little between MCUs, for example:

	bl	regcfg			// set values for fre, sv1-sv5, env
	bl	hw_cfg			// configure clocks
	bl	piocfg			// configure gpio
	bl	uarcfg			// configure uart
	report	greet_name		// display boot program name/version

The functions regcfg, hw_cfg, piocfg and uarcfg are mcu-specific and are defined in the file hw_local_init.s. The macro "report" is defined in the file lsd_macros.s and prints out, to the uart, the content of the "greet_name" string (an ArmPit Scheme symbol encoded as utf-8) which is defined in the main configuration file lsd_080.s.

The startup code continues with configuration of SDRAM (if any), FPU enabling (so that s17 and s18 can be used in 32-bit systems) and configuration of the SD-card (SD/MMC) subsystem, and then initializes communication with the SD-card. This initialization is performed by the sd-init scheme function (available also at the REP) defined in file_sd_port.s. The function and its sub-functions call various mcu-specific functions from file file_sd_hw.s to achieve their tasks. The sd-init function also writes into the temporary _GLV the type of SD-card (SD or SDHC) that has been found in the card socket (this is why, on 32-bit systems at least, the FPU has to be enabled prior to sd-card initialization).

After this, the startup code loads the APS code from SD-card, and copies it to SRAM or SDRAM, using the function ld_aps of file lsd_core.s. The function ld_aps calls _bsgb from file file_sd_port.s to load the needed 512-byte blocks from the SD-card (FAT directory and then file content), and that function in turn calls the mcu-specific function _sgbrs from source code file file_sd_hw.s to perform the actual data transfer.

Startup continues with configuration of the USB-device subsystem, configuration of the MMU and TTB (for Cortex-A), storing the target address (in SDRAM or SRAM) of the MBF, in the MBF, and jumping to the APS with the _GLV in s17 or x26, the MBF in s18 or x27, and the cpu-id in sv1 (cpu-id is 0 for single cpu systems).

LSD Scheme Objects:

In the configuration file, lsd_080.s, the LSD sets up and uses 7 scheme objects that are later re-used and/or modified by the APS: an ISR vector, one or more Core Buffers, a multicore startup vector, a Main Buffer (MBF), a Global Vector (_GLV), a scheme environment vector and obarray, and a pseudo-vector of basic i/o (bio) functions. The ISR vector is named ISR_vector and built by the macro ISRVECTOR in aps_macros.s. It contains the addresses of Interrupt Service Routines (ISRs) for specific peripherals, located at the index that corresponds to the peripheral's IRQ number (eg. uart0_int_num) found in file device_family.h. The length of ISR_vector is given by num_interrupts in device_family.h and each ISR is a scheme primitive, for example: uarisr in uart_port.s or tmrisr in lsd_core.s. A value of 0 (scheme integer) is used for peripherals with no ISR:

        ISR_vector = #(0 0 0 uarisr 0 0 tmrisr 0 0 0 0 usbisr 0 0 ...)
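
The indexing rule described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the ISRVECTOR macro itself: the function name make_isr_vector, the value of NUM_INTERRUPTS and the IRQ numbers 3, 6 and 11 are assumptions chosen to match the example vector shown above, and real values come from device_family.h.

```python
# Hypothetical model of ISR_vector: a list indexed by IRQ number.
# NUM_INTERRUPTS and the IRQ numbers used below are illustrative only;
# real values are defined per MCU in device_family.h.
NUM_INTERRUPTS = 16


def make_isr_vector(num_interrupts, handlers):
    """Return a list holding each ISR at its IRQ index and 0 elsewhere."""
    vector = [0] * num_interrupts          # 0 = peripheral has no ISR
    for irq, isr in handlers.items():
        if not 0 <= irq < num_interrupts:
            raise ValueError(f"IRQ {irq} is outside 0..{num_interrupts - 1}")
        vector[irq] = isr
    return vector


# ISR names are taken from the text; their IRQ numbers are assumed.
isr_vector = make_isr_vector(
    NUM_INTERRUPTS,
    {3: "uarisr", 6: "tmrisr", 11: "usbisr"},
)
print(isr_vector)
```

Because the vector is indexed directly by IRQ number, dispatching an interrupt needs no search: the handler is simply the entry at the asserted interrupt's index.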

Core Buffers are bytevectors used to define the location of the stack, heap bottom and tops, mcu ID (for variables), default i/o port and read/write buffers for each CPU core in the MCU. Several of the values used in these buffers (memory partitioning and layout) are defined in the lsd_constants.s file. Each bytevector entry is 4 bytes long in 32-bit systems and 8 bytes long in 64-bit systems. There is one such buffer per core; the buffers are named core_buffer_0 to core_buffer_7 (as needed) and are built by the CORE_BUFFERS and make_core_buffer macros of lsd_macros.s:

     core_buffer_0 = #u8(main_stack      above-heap-address top-of-bottom-heap
                         top-of-top-heap heap-bottom        mcu-var-id
                         default-io-port readbuffer-address writebuffer-address)
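
Since every entry has the same size, the byte offset of an entry is its index multiplied by the word size. The Python sketch below illustrates this using the field order of the listing above. The names CORE_BUFFER_FIELDS and field_offset are hypothetical, and the offsets are measured from the first entry, which assumes that any bytevector header is not counted.

```python
# Field order copied from the core_buffer_0 listing above.
CORE_BUFFER_FIELDS = [
    "main_stack", "above-heap-address", "top-of-bottom-heap",
    "top-of-top-heap", "heap-bottom", "mcu-var-id",
    "default-io-port", "readbuffer-address", "writebuffer-address",
]


def field_offset(name, word_bytes):
    """Byte offset of a field, relative to the first entry of the buffer.

    word_bytes is 4 on 32-bit systems and 8 on 64-bit systems.
    """
    if word_bytes not in (4, 8):
        raise ValueError("word_bytes must be 4 or 8")
    return CORE_BUFFER_FIELDS.index(name) * word_bytes


print(field_offset("heap-bottom", 4))   # 32-bit system
print(field_offset("heap-bottom", 8))   # 64-bit system
```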

The multicore startup vector is a bytevector named mp_cores that is built by the MPCORE_VECTOR macro in lsd_macros.s. It contains two entries, each 4 bytes long in a 32-bit system or 8 bytes long in a 64-bit system: the index of the main (startup) core and a bitfield of which other cores it should start. This vector is defined but not used in the current implementation where only the main core runs. It is included for future developments:

        mp_cores = #u8(main-core-index   bitfield-of-cores-to-start)
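
As a rough illustration of the two-entry format, the Python sketch below packs a main-core index and a core bitfield into bytes. Several details are assumptions, since the vector is not yet used: little-endian byte order, bit n of the bitfield standing for core n, and the function name pack_mp_cores itself.

```python
import struct


def pack_mp_cores(main_core, cores_to_start, word_bytes):
    """Pack (main-core-index, bitfield-of-cores-to-start) as two words.

    Assumptions: little-endian words, and bit n set means "start core n".
    """
    if word_bytes == 4:
        fmt = "<II"      # two unsigned 32-bit words
    elif word_bytes == 8:
        fmt = "<QQ"      # two unsigned 64-bit words
    else:
        raise ValueError("word_bytes must be 4 or 8")
    bitfield = 0
    for core in cores_to_start:
        bitfield |= 1 << core
    return struct.pack(fmt, main_core, bitfield)


# Main core 0 starting cores 1, 2 and 3 on a 32-bit system.
print(pack_mp_cores(0, [1, 2, 3], 4).hex())
```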

The Main Buffer (or MBF/MBFr) is a bytevector initially named mainbuffer, located within the data space of the LSD boot RAM, and later copied by APS to RAM that can be over-written, at address MAINBUFFER defined in lsd_constants.s. The MBF/MBFr is built by the macro MAIN_BUFFER in lsd_macros.s and entry size is the same as in the two bytevectors above. Entries at indices 0 to 9, and 15, depend on whether the system is a Cortex-M, a 32-bit ARM, or a 64-bit device and entries 31-38 are not included in 64-bit:

                    Cortex-M version of Main Buffer
        -------------------------------------------------------------
        index     item name          description
        -----  -----------------  -----------------------------------
            0  core_buffer_0      cpu0 specific param buffer address
            1  gen_fre_bottom     address of bottom of nursery
            2  gen_fre_top        address of top    of nursery
            3  nursery_size       size of nursery
            4  scminten__0__31    enabled ints   0-31
            5  scminten_32__63    enabled ints  32-63
            6  scminten_64__95    enabled ints  64-95
            7  scminten_96_127    enabled ints  96-127
            8  scminten128_159    enabled ints 128-159
            9  scminten160_191    enabled ints 160-191
        -----  -----------------  -----------------------------------
           10  mp_cores           bytevector of main and startup cores
           11  bio_funcs          basic i/o funcs
           12  env_scheme         built-in environment
           13  oba_scheme         built-in obarray
           14  ISR_vector         initial ISR vector
        -----  -----------------  -----------------------------------
           15  useL1cache         i0=no cache, i1=cache
        -----  -----------------  -----------------------------------
           16  mainbuffer         mainbuffer address (sd-init), MAINBUFFER (APS)
           17  vfile              default i/o file port
           18   0 (raw)           file lock
           19  i0                 sd-card type: i0=SD, i1=SDHC
           20   0 (raw)           space for APS's genise
           21   0 (raw)           space for APS's adr__alo
           22   0 (raw)           space for APS's adr__err
        23-24   0 (raw)           USB data, 64 bytes to idx 38 (32-bit) or 34
        25-26   0 (raw)           USB setup buffer
        27-28   0 (raw)           USB chunk
        29-30   0 (raw)           USB address
        31-38   0 (raw)           may be used by USB data (32-bit only)
        -----  -----------------  -----------------------------------


         32-bit ARM version of Main Buffer (specific components only)
        -------------------------------------------------------------
        index     item name          description
        -----  -----------------  -----------------------------------
            0  core_buffer_0      cpu0 specific param buffer address
            1  core_buffer_1      cpu1 specific param buffer address
            2  core_buffer_2      cpu2 specific param buffer address
            3  core_buffer_3      cpu3 specific param buffer address
            4  core_buffer_4      cpu4 specific param buffer address
            5  core_buffer_5      cpu5 specific param buffer address
            6  core_buffer_6      cpu6 specific param buffer address
            7  core_buffer_7      cpu7 specific param buffer address
            8  getint             function that returns asserted interrupt
            9  clrint             function that clears  asserted interrupt
        -----  -----------------  -----------------------------------
           15  TTB0_address       address of TTB0
        -----  -----------------  -----------------------------------


           64-bit version of Main Buffer (specific components only)
        -------------------------------------------------------------
        index     item name          description
        -----  -----------------  -----------------------------------
            0  core_buffer_0      cpu0 specific param buffer address
            1  core_buffer_1      cpu1 specific param buffer address
            2  core_buffer_2      cpu2 specific param buffer address
            3  core_buffer_3      cpu3 specific param buffer address
            4  core_buffer_4      cpu4 specific param buffer address
            5  core_buffer_5      cpu5 specific param buffer address
            6  core_buffer_6      cpu6 specific param buffer address
            7  core_buffer_7      cpu7 specific param buffer address
            8  GICC_base          address of GICC
            9  GICD_base          address of GICD
        -----  -----------------  -----------------------------------
           15  TTB0_address       address of TTB0
        -----  -----------------  -----------------------------------

The initial Global Vector (_GLV) defined in the LSD is a vector of 12 raw zeros and 12 tagged zeros named ini_glv and built by the INITIAL_GLV macro of lsd_macros.s. It is used only by the SD-card initialization code (scheme function sd-init in file_sd_port.s) but serves multiple purposes there. The zeros it contains direct sd-init, if it fails, to re-try SD-card initialization, rather than returning #f. The zeros also make it possible to wait a smaller number of cpu cycles between initialization steps (as compared to running sd-init from APS) to account for the lower system performance resulting from caches being disabled during most of the LSD execution (in most cases, caches are enabled after the MMU remaps the memory where the APS was loaded to 0x00). The initial GLV is also where sd-init stores card type, to differentiate between SD and SDHC cards when specifying the address of the 512-byte block to be read:

        ini_glv = #(0 (raw) ... 0 (raw) i0 ... i0)
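
The reason the card type matters when addressing a block is that standard-capacity SD cards take a byte address in their read command, whereas SDHC cards take a block number. The Python sketch below shows that conversion. The function name read_block_argument is hypothetical and the code is a model of the rule, not a transcription of the LSD's sd-card routines.

```python
BLOCK_SIZE = 512  # bytes per block, as used throughout the LSD


def read_block_argument(block_index, is_sdhc):
    """Return the address argument for reading one 512-byte block.

    is_sdhc mirrors the stored card type (i0 = SD, i1 = SDHC).
    Standard-capacity SD cards are byte-addressed; SDHC cards are
    block-addressed.
    """
    if block_index < 0:
        raise ValueError("block_index must be non-negative")
    return block_index if is_sdhc else block_index * BLOCK_SIZE


print(read_block_argument(3, is_sdhc=False))  # byte address for an SD card
print(read_block_argument(3, is_sdhc=True))   # block number for an SDHC card
```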

In lsd_080.s, the LSD uses the macros STARTOBAENV, OBAENVWORD and ENDOBAENV, of aps_macros.s, to build a scheme environment and obarray with bindings for mcu-specific functions, including i/o ports and system-0:

        env_scheme = #(empty empty empty empty env_ports empty env_system_0)

        oba_scheme = #(empty empty empty empty oba_ports empty oba_system_0)

The ports and system-0 sub-environments are built with STARTSUBOBAENV, BNDREG, BNDVAR, PRIMIT (for named functions, eg. pfun; not ufun) and ENDSUBOBAENV, defined in aps_macros.s. The files defining i/o ports and functions are loaded into aps_ports.s with ".include" statements. The memory port, MEM, comes from memory_port.s; the uart variables uar0 and uar1 (register base addresses), and the uart ports UAR0 and UAR1, come from uart_port.s. The SD-card file port, FILE, aliased to SDFT, and the sd-init function come from file_sd_port.s; the USB-device variable usb (base address) and the USB-device port, USB, come from usb_port.s.

        env_ports = #(MEM uar0 UAR0 uar1 UAR1 FILE SDFT sd-init usb USB)

Input/Output ports are vectors with 3 components: 1) a base address (or 0); 2) ipr: a vector of input parameters and (unnamed) functions, and; 3) opr: a vector of output parameters and (unnamed) functions:

       i/o-port = #(base-address-or-0 ipr opr)

The format of binary (MEM) and textual (UAR0, USB, FILE) input and output port vectors is shown below. The format is quite similar to that used in previous versions (eg. 050). The LSD port-vector functions are sub-functions or sub-sub-functions called by higher level read/write functions defined in APS.

              binary input port                    binary output port
     ------------------------------------   ------------------------------------
     index   item                           index   item
     -----   ----------------------------   -----   ----------------------------
       0     1 (identifies input port)        0     2 (identifies output port)
       1     port-close      function         1     port-close function
       2     read-u8/peek-u8 function         2     write-u8   function
       3     ready? (#t)     function         3     write      function
       4     read            function 
     -----   ----------------------------   -----   ----------------------------


              textual input port                  textual output port
     ------------------------------------   ------------------------------------
     index   item                           index   item
     -----   ----------------------------   -----   ----------------------------
       0     1 (identifies input port)        0     2 (identifies output port)
       1     port-close          function     1     port-close      function
       2     read-u8/peek-u8     function     2     write-u8        function
       3     char-ready?         function     3     11 (identifies text port)
       4     10 (identifies text port)        4     write           function
       5     #t/#f wait-for-cr                5     ISR echo        function
       6     read-helper-0 init  function     6     optional: info  function
       7     read-helper-1 getc  function     7     optional: erase function
       8     read-helper-2 done  function     8     optional: block size
       9     optional: info      function
      10     optional: file list function
      11     optional: block size
     -----   ----------------------------   -----   ----------------------------
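
The marker values in the tables above are enough to tell the four vector types apart. The Python sketch below applies them to list models of an ipr or opr vector, with function slots represented by placeholder strings. The function describe_port_vector is hypothetical and only illustrates the table layout; it does not claim to reproduce how APS inspects ports.

```python
def describe_port_vector(vec):
    """Classify an ipr/opr vector as (direction, kind) from its markers.

    Index 0 holds 1 for input or 2 for output. A textual input vector
    holds 10 at index 4; a textual output vector holds 11 at index 3.
    """
    if vec[0] == 1:
        direction, text_index, text_marker = "input", 4, 10
    elif vec[0] == 2:
        direction, text_index, text_marker = "output", 3, 11
    else:
        raise ValueError("not a port parameter vector")
    is_text = len(vec) > text_index and vec[text_index] == text_marker
    return direction, "textual" if is_text else "binary"


# Placeholder strings stand in for the function entries.
mem_ipr = [1, "port-close", "read-u8/peek-u8", "ready?", "read"]
uar_ipr = [1, "port-close", "read-u8/peek-u8", "char-ready?", 10, True,
           "init", "getc", "done"]
uar_opr = [2, "port-close", "write-u8", 11, "write", "ISR echo"]

for vec in (mem_ipr, uar_ipr, uar_opr):
    print(describe_port_vector(vec))
```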

The LSD also builds a pseudo-vector (a bytevector) of low-level, basic, input/output functions that it exports to APS via the Main Buffer. The pseudo-vector is named bio_funcs and is constructed in lsd_core.s using the STARTBVU8, VCTR_item and ENDSized macros of aps_macros.s. It contains the starting addresses of 9 hardware-specific low-level functions, which turn LEDs on/off or write to the uart, and 1 register address. These may be useful for signalling error conditions or debugging:

        bio_funcs = #u8(ua0wrt ua0whx ua0wst
                        gldon gldoff yldon yldoff rldon rldoff
                        uart0_base+uart_thr)

The first 3 functions write a byte, a hexadecimal value or a string to uart0, the next 6 functions turn either the green, yellow or red LED (if any) on or off, and the last item is the address of uart0's transmit register.

Memory Layout Prepared by LSD:

LSD runs from on-chip MCU RAM, typically the boot RAM, and sets up the memory layout for the APS. It passes this layout to the APS via the Core Buffers (described earlier), whose addresses are stored in the LSD Main Buffer. The memory layout defines where the APS code is loaded, where the two semi-heaps (for stop-and-copy garbage collection) and their attendant memory barriers (that trigger gc) are, where the above-heap space is located (used to store the user obarray, libraries, and executable binaries), where the read and write buffers are and where the LSD Main Buffer is copied to. The layout can differ significantly between Cortex-M and Cortex-A chips, but is quite similar between 32-bit and 64-bit Cortex-A MCUs.

Cortex-M MCUs are typically single-core devices and their memory map may include several separate areas of on-chip RAM (OCRAM), some of which may be tightly coupled to the MCU for either instruction (ITCM) or data (DTCM) access. Additionally, the chips (or boards) that use Cortex-M MCUs may, or may not, include SDRAM (external RAM). The Cortex-M implementation of LSD and APS further differs from the Cortex-A implementation by using an MPU (rather than MMU) to specify memory access permissions and locate memory barriers, and the implementation includes a memory allocation nursery to speed up heap management operations (garbage collection). To address the diversity of MCUs and boards, the memory layout for Cortex-M systems is defined individually, for each board, in the board.h files. The resulting layout, for an MCU with DTCM and SDRAM, may look like the example below:

                       ======================================
                        EXAMPLE of MEMORY LAYOUT for Cortex-M
                       ======================================
        -----------    --------------------------    -------  --------------------------
          Tag              Memory Area                Size     Location
        -----------    --------------------------    -------  --------------------------
        SDRAMTOP       ..........................                top of SDRAM
                           Frame Buffer                          (optional)
                           Write Buffer               64 KB    
                           Read  Buffer               64 KB    
                           above-heap space           grows downwards    
                           memory barrier             32  B
        heaptop1       ..........................
                           HEAP 1                     shrinks as above-heap space grows
                           memory barrier             32  B
        heaptop0       ..........................
                           HEAP 0                     shrinks as above-heap space grows
        heapbottom     ..........................                bottom of SDRAM
        OCRAMTOP       ..........................                top of OCRAM
                           main system stack                     OCRAM
                           Main Buffer                           OCRAM, below stack
                           memory allocation nursery             DTCM
                           LSD Code                    16 KB     0x20000000
                           APS Code                    48 KB     0x00000000
        OCRAMBOTTOM    ..........................                bottom of OCRAM
        -----------    --------------------------    -------  --------------------------

The 32-bit Cortex-A MCUs typically include a small amount of boot RAM, where the LSD is loaded, and the boards include substantial external SDRAM. The LSD stores the APS code at the bottom of SDRAM and uses the MMU to remap it to address 0x00000000. System buffers and TTBs are stored near the top of SDRAM, at their native (non-remapped) addresses. The system may have multiple cores and a memory area is set aside for each to store its stack, read and write buffers, heap (two semi-heaps and memory barriers) and above-heap space. A page size of 1 MB is used in the MMU and, consequently, the memory barriers at the top of each semi-heap are set (and aligned) to that same size of 1 MB. The locations of individual components of this layout are defined in lsd_constants.s, based on values of OCRAMBOTTOM, SDRAMBOTTOM, SDRAMTOP, framebuffer_MB and num_cores (defaulting to 1) in the MCU's board.h file. The first table below shows the overall memory layout set up by the LSD for these MCUs, and the second table shows the memory layout for the space assigned to individual cores:

                       =======================================
                        MAIN MEMORY LAYOUT for 32-bit Cortex-A
                       =======================================
        -----------    --------------------------    -------  --------------------------
          Tag              Memory Area                Size     Location
        -----------    --------------------------    -------  --------------------------
        SDRAMTOP       ..........................                top of SDRAM
                           Frame Buffer               12 MB     12 MB below SDRAMTOP
                           TTB0                                 16 KB above TTB1
                           TTB1                       16 KB     80 KB below Frame Buffer
                           Main Buffer                         512 KB below Frame Buffer
        RAMTOP         ..........................                1 MB below Frame Buffer
                           Core (num_cores - 1) memory
                           ...
                           Core 1 memory
                           Core 0 memory
        RAMBOTTOM      ..........................                1 MB above SDRAMBOTTOM
                           APS Code                    1 MB      0x00000000 (remapped)
        SDRAMBOTTOM    ..........................                bottom of SDRAM
                           LSD Code                  16-64 KB    boot RAM
        -----------    --------------------------    -------  --------------------------

                       ========================================
                       CORE n MEMORY LAYOUT for 32-bit Cortex-A
                       ========================================
        -----------    --------------------------    -------  --------------------------
          Tag              Memory Area                Size     Location
        -----------    --------------------------    -------  --------------------------
        (no name)      ..........................                top of Core n memory
                           main system stack                   512  B below top
                           Write Buffer               64 KB     65 KB above above-heap space
                           Read  Buffer               64 KB    512  B above above-heap space
                           above-heap space           grows    256 KB below top
        (no name)      ..........................                2 MB below top
                           memory barrier              1 MB
        heaptop1       ..........................
                           HEAP 1                     shrinks as above-heap space grows
                           memory barrier              1 MB
        heaptop0       ..........................
                           HEAP 0                     shrinks as above-heap space grows
        heapbottom     ..........................                bottom of Core n memory
        -----------    --------------------------    -------  --------------------------

The memory layout set up by LSD for 64-bit Cortex-A chips is essentially the same as that for 32-bit MCUs, except that the MMU is configured with a 2 MB page size and therefore the main memory regions, and barriers, are aligned to that 2 MB size (rather than 1 MB for 32-bit systems). The two tables below illustrate that layout:

                       =======================================
                        MAIN MEMORY LAYOUT for 64-bit Cortex-A
                       =======================================
        -----------    --------------------------    -------  --------------------------
          Tag              Memory Area                Size     Location
        -----------    --------------------------    -------  --------------------------
        SDRAMTOP       ..........................                top of SDRAM
                           Frame Buffer               12 MB     12 MB below SDRAMTOP
                           TTB0                                  8 KB above TTB1
                           TTB1                        8 KB     80 KB below Frame Buffer
                           Main Buffer                           1 MB below Frame Buffer
        RAMTOP         ..........................                2 MB below Frame Buffer
                           Core (num_cores - 1) memory
                           ...
                           Core 1 memory
                           Core 0 memory
        RAMBOTTOM      ..........................                2 MB above SDRAMBOTTOM
                           APS Code                    2 MB      0x00000000 (remapped)
        SDRAMBOTTOM    ..........................                bottom of SDRAM
                           LSD Code                  16-64 KB    boot RAM
        -----------    --------------------------    -------  --------------------------

                       ========================================
                       CORE n MEMORY LAYOUT for 64-bit Cortex-A
                       ========================================
        -----------    --------------------------    -------  --------------------------
          Tag              Memory Area                Size     Location
        -----------    --------------------------    -------  --------------------------
        (no name)      ..........................                top of Core n memory
                           main system stack                   512  B below top
                           Write Buffer               64 KB     65 KB above above-heap space
                           Read  Buffer               64 KB    512  B above above-heap space
                           above-heap space           grows    256 KB below top
        (no name)      ..........................                4 MB below top
                           memory barrier              2 MB
        heaptop1       ..........................
                           HEAP 1                     shrinks as above-heap space grows
                           memory barrier              2 MB
        heaptop0       ..........................
                           HEAP 0                     shrinks as above-heap space grows
        heapbottom     ..........................                bottom of Core n memory
        -----------    --------------------------    -------  --------------------------
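
The offsets listed in the two MAIN MEMORY LAYOUT tables (32-bit and 64-bit Cortex-A) can be turned into addresses with simple arithmetic. The Python sketch below does this. It is not a transcription of lsd_constants.s: the function name main_layout, the treatment of SDRAMTOP as the first address past the end of SDRAM, the assumption that the frame buffer size equals framebuffer_MB, and the example SDRAM range are all illustrative assumptions.

```python
KB = 1024
MB = 1024 * KB


def main_layout(sdram_bottom, sdram_top, framebuffer_mb=12, bits=32):
    """Compute main-layout addresses from the offsets in the tables above."""
    if bits == 32:
        page, ttb_size, mbf_below = 1 * MB, 16 * KB, 512 * KB
    elif bits == 64:
        page, ttb_size, mbf_below = 2 * MB, 8 * KB, 1 * MB
    else:
        raise ValueError("bits must be 32 or 64")
    frame_buffer = sdram_top - framebuffer_mb * MB
    ttb1 = frame_buffer - 80 * KB          # TTB1: 80 KB below frame buffer
    return {
        "frame_buffer": frame_buffer,
        "TTB0": ttb1 + ttb_size,           # TTB0 sits directly above TTB1
        "TTB1": ttb1,
        "main_buffer": frame_buffer - mbf_below,
        "RAMTOP": frame_buffer - page,     # one MMU page below frame buffer
        "RAMBOTTOM": sdram_bottom + page,  # one MMU page reserved for APS code
    }


# Hypothetical board with 512 MB of SDRAM starting at 0x80000000.
for name, address in main_layout(0x80000000, 0xA0000000, bits=32).items():
    print(f"{name:12s} 0x{address:08X}")
```

Keeping the page size as a single parameter reflects the point made in the text: the 32-bit and 64-bit layouts share one structure and differ mainly in MMU page size and table sizes.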

Extending the System with New LSD Functions:

Adding new functions, accessible from the APS REP, can be done relatively simply by adding code to the system_0.s file and re-assembling the LSD. The steps are exemplified below for the addition of the "revenu" function described for similar purposes in version 050 (it is a "backwards", or reverse, enumeration function, used for simplicity of code, and hopefully unrelated to the concept of revenue).

The target new function, revenu, takes as input values either a positive integer count and an ending integer, or a positive integer count, an integer step and an ending integer. It returns a list of "count" integers that ends with the ending integer, in which each item is smaller than the one before it by the step (or by 1 as default). Once the function has been added to the code and the code has been reassembled and uploaded to the MCU, it will be possible to use this new function at top-level and perform operations such as:

ap> revenu
#proc

ap> (revenu 3 7)
(9 8 7)

ap> (revenu 5 3 2)
(14 11 8 5 2)

ap> (revenu 4 -10 100)
(70 80 90 100)
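
The intended behaviour can be stated compactly as a reference model. The Python sketch below is such a model, written only to make the specification concrete; it is not part of the LSD, and the argument checks it performs are assumptions that the assembly version may not make.

```python
def revenu(count, *rest):
    """Reference model of (revenu count end) and (revenu count step end)."""
    if len(rest) == 1:
        step, end = 1, rest[0]           # default step is 1
    elif len(rest) == 2:
        step, end = rest
    else:
        raise TypeError("expected (count end) or (count step end)")
    if count < 1:
        raise ValueError("count must be a positive integer")
    # The last item is end; each earlier item is larger by one step.
    return [end + step * i for i in range(count - 1, -1, -1)]


print(revenu(3, 7))
print(revenu(5, 3, 2))
print(revenu(4, -10, 100))
```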

To build this function, we use the PRIMIT macro from aps_macros.s to define the function's name ("revenu"), its label (frevenu), its type (named primitive function: pfun), and its number of non-optional input arguments (2). The PRIMIT macro automatically adds the function to the scheme environment and obarray. Here, as the function is added to the system_0.s file (eg. at the end of that file), its code becomes part of the env_system_0 sub-environment, and its name gets integrated into the oba_system_0 sub-obarray (cf. lsd_080.s for which sub-environment system-0 is a part of). Then, under PRIMIT, we write the code of the function:


	/* (revenu count end) or (revenu count step . end) */
	PRIMIT "revenu",frevenu,pfun,2
	// in:	sv1 = count
	// in:	sv2 = end  or step
	// in:	sv3 = null or (end)
	// out:	sv1 = result (list of numbers)
	sub	sv5, sv1, #4		// sv5 = count - 1		(sch int)   1
        set	sv4, i1			// sv4 = 1, default step	(sch int)   2
	nullp   sv3                     // use default step?		            3
	beq	rvnu01			// 	if so, jump to continue             4
	set	sv4, sv2		// sv4 = step			(sch int)   5
	car	sv2, sv3	        // sv2 = end			(sch int)   6
rvnu01:	// continue
	set     sv1, sv2                // sv1 = end, 1st val for result (sch int)  7
	list    sv2, sv1                // sv2 = (end) = initial result	(sch list)  8
rvnulp:	// loop over values to cons
        eq	sv5, #i0		// is count = 0 (done) ?		    9
        beq     rvnuxt		        //       if so,  jump to exit		   10
	int2raw	rva, sv1		// rva = latest val consed to result (raw) 11
	int2raw	rvb, sv4		// rvb = step			     (raw) 12
	add	rva, rva, rvb	        // rva = next val to cons to result  (raw) 13
	raw2int	sv1, rva        	// sv1 = next val to cons to res (sch int) 14
	cons    sv2, sv1, sv2           // sv2 = (... end), updated res (sch list) 15
        sub	sv5, sv5, #4		// sv5 = updated count		 (sch int) 16
	b       rvnulp                  // jump to add next item	           17
rvnuxt: // exit
        set     sv1, sv2                // sv1 = result			(sch list) 18
        br	con			// return with result in sv1	           19

The function receives its input arguments (scheme objects) in scheme value registers sv1 to sv3. Its environment is in the register env (used, for example, if the function calls eval or bndchk) and its continuation (return address) is in the register con. The function code can use registers sv1 to sv5 (gc-ed) to manipulate scheme values and rva to rvc (not gc-ed) to manipulate raw values. Scheme values can be temporarily saved on the scheme stack, dts (gc-ed), if needed, in which case that stack must be popped back to its entry state before the function returns. When ready to return, the function stores its return value in sv1 and then jumps to its continuation (con). If the function needs to call another scheme function to perform its work (e.g. eval or apply), it can save its return address (con) on the dts prior to that call, use the call macro (aps_macros.s) to call that other scheme function (the macro sets con for the appropriate return), and then restore its own continuation from the dts upon return from the call. The revenu function is a simple code example that does not use the env and dts registers and does not call other scheme functions.

In Line 1 of the revenu code, the number of items to cons onto the result list, in addition to the end value, is computed from count (scheme integer in sv1) and stored in sv5 for later use. Scheme ints are shifted left by 2 bits relative to raw ints, and therefore adding 4 to them is equivalent to adding 1 to a raw int (the same holds for subtraction if numbers are and remain positive or 0). Line 2 sets the default step, stored in sv4, to 1 (scheme integer). Line 3 tests whether 2 or 3 input arguments were provided to the function, using the nullp macro that checks if the content of sv3 is null. If it is null, the code jumps to continue (Line 4). Lines 5 and 6 are executed only if a step was given by the user of the function. Line 5 sets the step register, sv4, to the given value, currently in sv2. Line 6 extracts the end value from the car of the list of optional input arguments, currently in sv3. Line 7 copies the end value to sv1 and Line 8 builds the initial result list, using the list assembler macro, and stores it in sv2. The statement: list sv2, sv1, builds a cons between the contents of sv1 and null, and stores the result in sv2 (Note: the list macro side-effects raw value registers rva to rvc but does not modify sv1-sv5 except the destination register for the list).

The main code loop starts on Line 9 by using eq (alias to ARM's teq) to test whether more integers should be consed to the front of the result list (i.e. whether count, stored in sv5 as a scheme integer, is scheme zero = #0x01 = #i0). If no more numbers need to be consed, the code jumps to rvnuxt: for function exit (Line 10). If more numbers are to be consed, the last number added (in sv1) is converted to a raw integer by the macro int2raw and stored in raw value register a (rva) on Line 11. Similarly, the step (in sv4) is converted to a raw integer and stored in raw value register b (rvb) on Line 12. The sum of the last number (rva) and step (rvb) is then stored in rva (raw integer) on Line 13, and converted back to a scheme integer with the raw2int macro and stored in scheme value register 1 (sv1) on Line 14. The sum (in sv1) is consed to the front of the result list (in sv2) using the cons macro and the resulting list is stored back in sv2 on Line 15. Note: the cons macro side-effects rva-rvc but preserves sv1-sv5, except the destination register for the cons, which is updated. This is why raw values in rva and rvb need to be re-computed from sv1 and sv4 at each pass through the loop. Also, one could potentially replace Lines 11-14 with just 2 lines: (1) bic rva, sv4, #3 (2) add sv1, sv1, rva. The count of numbers remaining to be consed (in sv5) is decreased by 1 (as scheme int) on Line 16 and the code jumps back to repeat the loop on Line 17.
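
The loop, and the 2-line alternative to Lines 11-14, can be checked with a desktop model. The Python sketch below is an illustration under assumptions, not APS code: scheme integers are taken to be the raw value shifted left by 2 bits with a 0x01 tag, scheme lists are Python lists, Python's unbounded integers stand in for machine registers (so overflow is not modeled), and all function names are hypothetical.

```python
INT_TAG = 0x01                          # assumed integer tag (scheme zero = 0x01)

def raw2int(raw):
    return (raw << 2) | INT_TAG         # raw integer -> assumed scheme integer

def int2raw(sch):
    return sch >> 2                     # assumed scheme integer -> raw integer

def step_four_lines(sv1, sv4):
    """Model of Lines 11-14 as listed."""
    rva = int2raw(sv1)                  # Line 11
    rvb = int2raw(sv4)                  # Line 12
    rva = rva + rvb                     # Line 13
    return raw2int(rva)                 # Line 14

def step_two_lines(sv1, sv4):
    """Model of the alternative: bic rva, sv4, #3 / add sv1, sv1, rva."""
    rva = sv4 & ~3                      # clear the tag bits of the step
    return sv1 + rva                    # the tag of sv1 is carried through

def cons_loop(sv1, sv2, sv4, sv5, step_fn):
    """Model of Lines 9-18."""
    while sv5 != raw2int(0):            # Lines 9-10: exit when count is scheme zero
        sv1 = step_fn(sv1, sv4)         # Lines 11-14: next value to cons
        sv2 = [sv1] + sv2               # Line 15: cons to the front of the result
        sv5 = sv5 - 4                   # Line 16: one fewer item remaining
    return sv2                          # Line 18: result list

def run(count, step, end, step_fn):
    """Set registers as Lines 1-8 would leave them, then decode the result."""
    sv1 = raw2int(end)
    result = cons_loop(sv1, [sv1], raw2int(step), raw2int(count) - 4, step_fn)
    return [int2raw(v) for v in result]

print(run(5, 3, 2, step_four_lines))      # [14, 11, 8, 5, 2]
print(run(4, -10, 100, step_two_lines))   # [70, 80, 90, 100]
```

In this model both variants give the same lists, including for a negative step, because clearing the two tag bits of the step leaves a multiple of 4 that can be added directly to a tagged value.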

When the cons-loop is complete, the function's result list is in sv2 and the code jumps to rvnuxt:. There, the result is moved to sv1, which is where it needs to be for proper return (Line 18). The function then returns by branching to its continuation (con) on Line 19. The "br" used to do this is a native instruction in Aarch64 and is defined as a macro (in aps_macros.s) for 32-bit systems. Also note that the name of the continuation register was changed from cnt, in previous versions of ArmPit Scheme, to con in this version, because there is a cnt instruction in Aarch64 that conflicts with that former register name.

Last updated July 22, 2018

bioe-hubert-at-sourceforge.net