Table of Contents

Marsaglia's XORshift routine on the ARM processor

Raspberry Pi 3b+ with wabiForth

The Forth version of the randomisation routines is the same on any processor as only standard Forth words are used. But the ARM-processor can do do a neat trick: it can do 1 cycle (dup, shift and xor) in 1 opcode!! And as most Forths include an assembler it is an interesting exercise to see how much faster the routine is when coded in assembly. This example is coded using wabiForth on a Raspberry 3b+, but the principle is the same for any ARMv8 Aarch32 processor.

The routine uses three registers named top, v and w. Top contains the top of the stack, v and w are scratch registers.

XORshift in ARM Aarch32 assembly

variable seed
2345 seed !

code: ASMRANDOM ( address_seed -- rndm_val )
  [ w, top, ldr,       \ get value in seed in w
  
  w, w, w, 13 lsl#, eor,
  w, w, w, 17 lsr#, eor,
  w, w, w,  5 lsl#, eor,
  
  v, v, w, eor,	       \ xor old seed value with generated random number
  v, top, str,         \ save xor'd value in seed
  top, w, mov,
  
  ] ; 7 inlinable

Comparison of Forth vs assembly

Tested with wabiForth on Raspberry 3b+ @ 1.5 GHz
Here some simple benchmarks which compare the 1 and 2 seed versions coded in Forth and the 1 seed version in assembly. Just to get an idea about execution-speeds.

    ---------------------------
    1 seed 32bit Forth:     40c
    2 seed 32bit Forth:     60c
    1 seed 32bit assembly:  13c
    ---------------------------

Time measured is the number of CPU-cycles required to put a random number on the stack with a given method. The routine in assembly is 3 times as fast as the corresponding routine in Forth. Which is a decent speed-up of the routine.