A Scheme Interpreter for ARM Microcontrollers:
Performance of Version 060

Performance Details:

A limited performance assessment of Armpit Scheme was carried out using test code from the Gambit-C Scheme Benchmarks, which are discussed (among others) by Pierard and Feeley (2007) in the context of Mobit (a Portable and Mobile Scheme Interpreter).

Three benchmark programs were used in the present assessment to estimate Armpit Scheme's performance relative to other interpreters and compilers: tak.scm, ctak.scm and mazefun.scm. The first test computes the Takeuchi function recursively: (tak 18 12 6), the second test computes the same function in continuation-passing style: (ctak 18 12 6), and the third test generates an 11x11 cell maze in a purely functional way: (make-maze 11 11). An additional test was performed on a few MCUs (Cortex-M4 and A8), most with a Floating Point Unit (FPU), using the fft benchmark code, which computes the Fourier transform of a vector of 1024 zeros.

On Armpit Scheme, all tests were started with a clear heap (system reset). For the tak test, two evaluations were performed: 1) gtak, an interpreted version with generic arithmetic, and 2) rtak, an interpreted version with fixnum-specific arithmetic (fx-, fx>=?). The code for gtak is:

  (define gtak
    (lambda (x y z)
      (if (>= y x)
          z
          (gtak (gtak (- x 1) y z)
                (gtak (- y 1) z x)
                (gtak (- z 1) x y)))))
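
The code for rtak is not listed here. A minimal sketch, assuming it differs from gtak only in replacing the generic operators - and >= with the fixnum-specific fx- and fx>=? named above, might look like this:

```scheme
;; Sketch of rtak (assumed form, not copied from the benchmark sources):
;; gtak with fixnum-only operators in place of the generic ones.
;; On an R6RS system, fx- and fx>=? are provided by
;; (rnrs arithmetic fixnums); other systems may name them differently.
(define rtak
  (lambda (x y z)
    (if (fx>=? y x)
        z
        (rtak (rtak (fx- x 1) y z)
              (rtak (fx- y 1) z x)
              (rtak (fx- z 1) x y)))))
```

Under that assumption, the benchmark call would be (rtak 18 12 6), with the same arguments as the tak test.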

Reference results were obtained on Sparc IIe (500 MHz, circa 2000) and Intel Core i3 (2.3 GHz, 2012) computers. The reference Scheme implementations used were Guile 1.7.1 to 2.0.5, Scheme48 1.9 and Gambit Scheme 4.6.6 in interactive (gsc) and compiled mode. To obtain a common basis for comparison, results obtained for multiple benchmark iterations and at clock speeds other than 60 MHz were converted to the time, t1/60, that it would take to perform 1 iteration of the benchmark if the CPU clock were 60 MHz and system performance scaled linearly with clock speed:

     t1/60 = (time-for-n-iterations / n) * (CPU-clock-speed / 60MHz).
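
As an illustration of this normalization, the sketch below writes the formula as a Scheme procedure. The procedure name and the sample figures (10 iterations taking 3.0 s at 120 MHz) are hypothetical and are not measurements from this assessment.

```scheme
;; t1/60: time for one iteration, normalized to a 60 MHz clock,
;; assuming performance scales linearly with clock speed.
;;   total-time : seconds measured for n iterations
;;   n          : number of benchmark iterations
;;   clock-mhz  : CPU clock speed in MHz
(define t1/60
  (lambda (total-time n clock-mhz)
    (* (/ total-time n) (/ clock-mhz 60))))

;; Hypothetical run: 10 iterations in 3.0 s on a 120 MHz part.
(t1/60 3.0 10 120)
```

This evaluates to about 0.6 s: each iteration took 0.3 s, and a part clocked at twice 60 MHz is charged twice its measured per-iteration time.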

The results (t1/60) for the reference implementations were as follows (all values in seconds):

               -------------------  -------------------------------------------------------
               Sparc IIe SunOS5.11         Intel Core i3, Fedora Linux 3.4.4, 64-bit
                      (2000)                              (2012)
               -------------------  -------------------------------------------------------
                  Guile     Guile     Guile     Guile    scheme48    gsc 4.6.6    gsc 4.6.6
                  1.7.1     1.8.8     1.8.8     2.0.5       1.9     interactive    compiled
    benchmark    (2004)    (2006)    (2006)    (2012)     (2013)       (2012)       (2012)
    ---------  ---------  --------  --------  --------  ----------  -----------  ----------
      gtak         5.         4.5       1.8      0.15       0.43        0.88        0.019  
    mazefun        6.         5.8       1.7      0.27       0.96        1.3         0.031   
      ctak        36.       325.      130.      18.         3.3         2.6         0.34   
      fft                                                               0.46
    ---------  ---------  --------  --------  --------  ----------  -----------  ----------
    maze/gtak      1.2        1.3       0.94     1.8        2.2         1.5         1.6
    ctak/gtak      7.2       72.       72.     120.         7.7         3.0        18. 
    ---------  ---------  --------  --------  --------  ----------  -----------  ----------

Guile performance improved considerably from 2004 to 2012, and running it on a contemporary processor (Core i3) is also quite beneficial. Guile 2.0.5 is faster than scheme48 1.9 and Gambit Scheme 4.6.6 interactive on gtak and mazefun, but is roughly 5 to 7 times slower than either on the continuation-intensive ctak. Scheme 48 is faster than Gambit Scheme (interactive) on gtak and mazefun (by factors of roughly 2 and 1.4) but somewhat slower on ctak. The use of Gambit Scheme in a compiled framework (rightmost column) results in the fastest run times. The two bottom rows of the table show the ratios of t1/60 for mazefun and ctak to that obtained for gtak. The maze/gtak ratio is fairly consistent across implementations (0.94 to 2.2), but ctak/gtak varies substantially (3.0 to 120), suggesting different implementation decisions on which type of code to optimize more fully. The Gambit Scheme system used interactively (gsc) has the lowest ctak/gtak ratio and is selected here as the reference for evaluating Armpit Scheme's performance.
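
The two ratio rows can be recomputed directly from the t1/60 values in the table. The sketch below does so for the gsc interactive column; the procedure and variable names are hypothetical.

```scheme
;; Ratio of a benchmark's t1/60 to the gtak t1/60 for the same system.
(define ratio-to-gtak
  (lambda (t gtak-t)
    (/ t gtak-t)))

;; t1/60 values (seconds) for gsc 4.6.6 interactive, from the table.
(define gsc-gtak    0.88)
(define gsc-mazefun 1.3)
(define gsc-ctak    2.6)

(ratio-to-gtak gsc-mazefun gsc-gtak)  ; about 1.5 (maze/gtak row)
(ratio-to-gtak gsc-ctak    gsc-gtak)  ; about 3.0 (ctak/gtak row)
```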

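The t1/60 results obtained with Armpit Scheme 060 are listed below, sorted by speed on the gtak test. Times are in seconds, ctak/gtak is a dimensionless ratio, and xxx marks an entry for which no result is reported: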

    -------------------------------------------------------
                armpit t1/60 ordered by gtak
    ------------------------------------------------------- ---------
        MCU     typ   mem     gtak  mazefun   ctak    rtak  ctak/gtak   fft
    ----------  ---  -----  ------- ------- ------- ------- --------- ------
    LPC4330_Xp  M4F  128KB    1.4     3.4     3.1     1.1      2.2     0.82
    LM3S1968     M3   64KB    1.5     5.5     3.5     1.2      2.3
    Beagle-XM    A8  512MB    1.8     2.5     3.7     1.3      2.0     0.88
    TCT-Hammer  920   32MB    1.8     2.7     3.7     1.3      2.0
    STM32F4_Di  M4F  112KB    2.1     4.3     4.5     1.5      2.1     1.1
    TINY-2106     7   64KB    2.1     6.6     4.6     1.6      2.2
    LM4F210     M4F   32KB    2.3     xxx     6.5     1.9      2.8     2.5
    SAM4S_Xp     M4  128KB    2.4     5.8     5.0     2.0      2.1     1.6
    STM32-LCD    M3   64KB    2.5     9.1     5.8     2.0      2.3
    LPC-H2214     7    1MB    2.5     4.2     5.2     2.1      2.1
    ------------------------------------------------------- --------- ------

These times are better than those for version 050 except for mazefun on 3 MCUs: LM3S1968, TINY-2106 and STM32-LCD, where times are from 2% to 10% higher. For these 3 MCUs, RAM (64 KB) became a limiting factor in running mazefun with the fast_lambda_lkp option enabled. Disabling this option led to times that were uniformly better than in version 050 for these MCUs, but the gtak and ctak times were no longer as good as reported above.

The performance of Armpit Scheme relative to Gambit Scheme (interactive mode) is shown in the table below, where each entry is the ratio of Gambit Scheme t1/60 to Armpit Scheme t1/60:

    -----------------------------------------------
                gsc interactive over armpit t1/60
    ----------------------------------------------- -------
        MCU     typ   mem     gtak  mazefun   ctak    avg    fft
    ----------  ---  -----  ------- ------- ------- ------- -----
    LPC4330_Xp  M4F  128KB    0.6     0.4     0.8     0.6    0.6
    Beagle-XM    A8  512MB    0.5     0.5     0.7     0.6    0.5
    TCT-Hammer  920   32MB    0.5     0.5     0.7     0.6
    LM3S1968     M3   64KB    0.6     0.2     0.7     0.5
    ----------  ---  -----  ------- ------- ------- ------- -----
    STM32F4_Di  M4F  112KB    0.4     0.3     0.6     0.4    0.4
    TINY-2106     7   64KB    0.4     0.2     0.6     0.4
    LPC-H2214     7    1MB    0.4     0.3     0.5     0.4
    SAM4S_Xp     M4  128KB    0.4     0.2     0.5     0.4    0.3
    STM32-LCD    M3   64KB    0.4     0.1     0.4     0.3
    LM4F210     M4F   32KB    0.4     xxx     0.4     xxx    0.2
    ----------------------------------------------- ------- -----

Higher numbers in this table indicate a higher performance of Armpit Scheme. The results are sorted by the avg (average) column and indicate that, for the three test programs and the fft benchmark, Armpit Scheme reaches 40% to 60% of the performance of Gambit Scheme (in interactive mode) on most of the boards tested, when both are normalized to the same clock speed. This is an encouraging result given the architectural differences (e.g., cache size) between the Core i3 on which the reference Scheme was run and the ARM cores.
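
Each entry in the table above is a simple quotient of two t1/60 values. The averaging method is not stated; the sketch below assumes the avg column is the arithmetic mean of the gtak, mazefun and ctak ratios, and works through the LPC4330_Xp row with hypothetical procedure names.

```scheme
;; Ratio of Gambit (interactive) t1/60 to Armpit Scheme t1/60.
;; A value above 1 would mean Armpit Scheme is faster at equal clock speed.
(define relative-performance
  (lambda (gsc-t armpit-t)
    (/ gsc-t armpit-t)))

;; Arithmetic mean of a non-empty list of numbers
;; (assumed averaging method for the avg column).
(define mean
  (lambda (xs)
    (/ (apply + xs) (length xs))))

;; LPC4330_Xp row: t1/60 values in seconds, from the two result tables.
(define lpc4330-ratios
  (list (relative-performance 0.88 1.4)    ; gtak,    about 0.63
        (relative-performance 1.3  3.4)    ; mazefun, about 0.38
        (relative-performance 2.6  3.1)))  ; ctak,    about 0.84

(mean lpc4330-ratios)  ; about 0.62, listed as 0.6 in the table
```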

Last updated February 3, 2013

bioe-hubert-at-sourceforge.net
