Marsaglia's XORshift routine on the ARM processor
Raspberry Pi 3b+ with wabiForth
The Forth version of the randomisation routines is the same on any processor, as only standard Forth words are used. But the ARM processor can do a neat trick: it can do one cycle (dup, shift and xor) in a single opcode. And as most Forths include an assembler, it is an interesting exercise to see how much faster the routine is when coded in assembly. This example is coded using wabiForth on a Raspberry Pi 3b+, but the principle is the same for any ARMv8 AArch32 processor.
The routine uses three registers named top, v and w. Top contains the top of the stack; v and w are scratch registers.
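For reference, the "dup, shift and xor" cycle mentioned above can be sketched in standard Forth as follows. This is a minimal sketch of Marsaglia's 32-bit XORshift with the 13/17/5 shift triple, not necessarily identical to the Forth routines benchmarked on this page. The names FORTHRANDOM and mask32 are hypothetical and chosen only for this illustration. The mask is an assumption added so the word also behaves as a 32-bit generator on Forths with 64-bit cells; on a 32-bit Forth it changes nothing.

```forth
variable seed
2345 seed !

\ mask32 is a hypothetical helper: keeps results within 32 bits on 64-bit Forths
hex FFFFFFFF constant mask32 decimal

: FORTHRANDOM ( address_seed -- rndm_val )   \ hypothetical name
  dup >r @                      \ keep the address, fetch the current seed
  dup 13 lshift xor mask32 and  \ cycle 1: dup, shift left 13, xor
  dup 17 rshift xor             \ cycle 2: dup, shift right 17, xor
  dup  5 lshift xor mask32 and  \ cycle 3: dup, shift left 5, xor
  dup r> ! ;                    \ store the new value in seed, leave a copy on the stack
```

With the seed set to 2345, `seed FORTHRANDOM u.` prints 629153499. Each of the three middle lines corresponds to one cycle that the ARM can express as a single eor with a shifted operand.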
XORshift in ARM AArch32 assembly
variable seed
2345 seed !

code: ASMRANDOM ( address_seed -- rndm_val )
[   w, top, ldr,             \ get value in seed in w
    w, w, w, 13 lsl#, eor,
    w, w, w, 17 lsr#, eor,
    w, w, w, 5 lsl#, eor,
    v, v, w, eor,            \ xor old seed value with generated random number
    v, top, str,             \ save xor'd value in seed
    top, w, mov, ] ; 7 inlinable
Comparison of Forth vs assembly
Tested with wabiForth on Raspberry 3b+ @ 1.5 GHz
Here are some simple benchmarks which compare the 1 and 2 seed
versions coded in Forth with the 1 seed version in assembly, just
to get an idea of the execution speeds.
---------------------------
1 seed 32bit Forth:     40c
2 seed 32bit Forth:     60c
1 seed 32bit assembly:  13c
---------------------------
The time measured is the number of CPU cycles required to put a
random number on the stack with a given method. The routine in assembly
is about 3 times as fast as the corresponding 1 seed routine in Forth
(13 versus 40 cycles), which is a decent speed-up.