Last modified: 2024-04-17 15:59 by willem
{{pfw:banner.png}}
====== String handling ======

===== The idea =====

Character strings are mostly associated with dynamic memory and garbage collection.
That is overkill for the kind of string handling used in most Forth programs.
In particular, we can get by with buffers that are statically allocated using CREATE.
Still, it is useful to lift the manipulation of single characters to the manipulation of strings as a whole.

We define a few words that make string manipulation in Forth a little smoother.\\
The original idea is by Albert Nijhof & [[https://home.hccnet.nl/a.w.m.van.der.horst/index.html|Albert van der Horst]]. Typical uses are:

  * Manipulating files
  * Starting programs
  * Adding, deleting and using folders/directories
  * Etc.

===== Construction of strings =====

On the stack, a string in Forth is represented by its address and length. In memory, the length is stored in front of the characters.
Two layouts are possible. The classic one stores the length in a single byte; these are the so-called counted strings, as shown in the picture.
The alternative stores the length in a full cell, which allows longer strings.

{{https://user-images.githubusercontent.com/11397265/142727480-4cb13037-c118-4d05-9eec-529aeaf23cad.jpg| string usage example}}
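
As an illustration, the sketch below lays out the string ''abc'' by hand in both layouts, using only standard words. The names ''%%demo-byte%%'' and ''%%demo-cell%%'' are made up for this example.

<code forth>
\ Classic counted string: one count byte, then the characters
create demo-byte  3 c,  char a c,  char b c,  char c c,
demo-byte c@ .                      \ prints the count: 3
demo-byte count type                \ COUNT gives address & length: prints abc

\ Alternative layout: the count is stored in a full cell
create demo-cell  3 ,  char a c,  char b c,  char c c,
demo-cell @ .                       \ prints the count: 3
demo-cell cell+  demo-cell @  type  \ skip the count cell: prints abc
</code>

''%%COUNT%%'' does the conversion for the byte layout; for the cell layout the two steps are spelled out.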

===== Pseudo code =====

<code>
Function: $VARIABLE
    reserve a buffer for the count-byte + 'maxlen' characters
    Alternatively: reserve a buffer for the count-cell + 'maxlen' characters
    Define: ( maxlen "name" -- )
          Save maxlen & buffer-address
    Action: ( -- s )
          Leave address of string variable

Function: $@   ( s -- c )
  Fetch the string stored in variable s as address & length
Function: $+!  ( c s -- )
  Append string c to the string in variable s
Function: $!   ( c s -- )
  Store string c in variable s
Function: $.   ( c -- )
  Print string c
Function: $C+! ( char s -- )
  Append one character to the string in variable s
</code>

The original idea also includes the words ''%%$^%%'', ''%%$?%%'', ''%%$/%%'' and ''%%$\%%'';
see the reference in the introduction.

Two tools, an idea of Albert Nijhof:

<code>
Function: -HEAD ( adr len i -- adr' len' ) cut the first 'i' characters from the string
Function: -TAIL ( adr len i -- adr len' )  cut the last  'i' characters from the string
</code>

However, these tools fly in the face of the goals stated in the introduction:
we promised to stop dealing with single characters, never count characters, and only concern ourselves with whole strings.

A better example in this context is:
<code>
Function: -TRAILING ( c -- c' ) remove trailing blank spaces from the string
Function: -LEADING  ( c -- c' ) remove leading  blank spaces from the string
</code>
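
''%%-TRAILING%%'' is a standard word (String word set), but ''%%-LEADING%%'' is not. A minimal sketch of ''%%-LEADING%%'' follows, assuming that only the space character (''%%BL%%'') counts as blank; it uses the standard ''%%/STRING%%'' to step past one character at a time. The word ''%%trim%%'' is a made-up name showing how the two combine.

<code forth>
\ Sketch: remove leading spaces (tabs and other white space are not handled)
: -LEADING  ( c-addr u -- c-addr' u' )
    begin
        dup                 \ any characters left?
    while
        over c@ bl =        \ is the first one a space?
    while
        1 /string           \ then step past it
    repeat then ;

\ Usage: trim a string on both sides, no character counting involved
: trim  ( c-addr u -- c-addr' u' )  -leading -trailing ;
s"   two spaces  " trim type     \ prints: two spaces
</code>

Both exits of the double ''%%WHILE%%'' loop land after ''%%THEN%%'' with the same stack picture, so an all-blank string simply ends up with length zero.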

===== Generic Forth =====

In this word set, a string variable ''s'' (c-addr) is the address of a stored counted string: a count byte followed by the characters. A string ''c'' (c-addr u) is an address & length pair, as produced by ''%%COUNT%%'' or ''%%S"%%''. ''%%$@%%'' turns a string variable into such a pair.

<code forth>
: $VARIABLE     ( maxlen "name" -- )  \ Reserve space for a string buffer
    here  swap 1+ allot  align      \ Reserve RAM buffer: count byte + maxlen chars
    0 over c!                       \ Start with an empty string
    create  ( here) ,               \ Remember the buffer address
    does>  @ ;                      ( -- s )

: C+!   ( n a -- )      >r  r@ c@ +  r> c! ;    \ Incr. byte with n at a
: $@    ( s -- c )      count ;                 \ Fetch string
: $+!   ( c s -- )      >r  tuck  r@ $@ +  swap cmove  r> c+! ; \ Extend string (no check against maxlen)
: $!    ( c s -- )      0 over c!  $+! ;        \ Store string
: $.    ( c -- )        type ;                  \ Print string
: $C+!  ( char s -- )   dup >r  $@ + c!  1 r> c+! ; \ Add char to string
</code>

Here is a version where the count is stored in a cell; it is hardly different.
Note that it uses ''%%@+%%'', which is not a Generic Forth word; you can find an implementation example in
the [[https://project-forth-works.github.io/well-known-words.txt|well known words]] list.
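
If your system lacks ''%%@+%%'', a common definition that matches how the cell-count ''%%$@%%'' uses it is sketched here. This is an assumption; compare it with the version in the well known words list. The name ''%%counted%%'' in the usage lines is made up.

<code forth>
\ Fetch the cell at a and step past it
: @+  ( a -- a+cell x )  dup cell+ swap @ ;

\ Usage: a cell-count string laid out by hand
create counted  3 ,  char x c,  char y c,  char z c,
counted @+ type          \ prints xyz
</code>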

<code forth>
: $VARIABLE     ( maxlen "name" -- )  \ Reserve space for a string buffer
    align  here  swap CELL+ allot  align  \ Aligned RAM buffer: count cell + maxlen chars
    0 over !                        \ Start with an empty string
    create  ( here) ,               \ Remember the buffer address
    does>  @ ;                      ( -- s )

: $@    ( s -- c )      @+  ;                  \ Fetch string
: $+!   ( c s -- )      >r  tuck  r@ $@ +  swap cmove  r> +! ; \ Extend string (no check against maxlen)
: $!    ( c s -- )      0 over !  $+! ;        \ Store string
: $.    ( c -- )        type ;                 \ Print string
: $C+!  ( char s -- )   dup >r  $@ + c!  1 r> +! ; \ Add char to string
</code>

===== Implementations =====

Have a look at the subdirectories for implementations for different systems.

  * String word sets
    * [[https://github.com/project-forth-works/project-forth-works/blob/main/Algorithms/String%20handling/Primitive-string-word-set.f|Primitive string word set]], simple string word set, e.g. for file and OS interfacing
    * [[https://github.com/project-forth-works/project-forth-works/blob/main/Algorithms/String%20handling/Safe-string-word-set-pr.f|Safe primitive string word set]], version with a string overflow warning
    * [[https://github.com/project-forth-works/project-forth-works/blob/main/Algorithms/String%20handling/Safe-string-word-set.f|Safe string word set v1]], version with string limiting
    * [[https://github.com/project-forth-works/project-forth-works/blob/main/Algorithms/String%20handling/building-strings-an.f|Building strings]], a different approach, author Albert Nijhof
    * Etc.
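
Neither Generic Forth version uses ''maxlen'' after reserving the buffer, although the pseudo code says to save it, so ''%%$+!%%'' can write past the end of the buffer. The sketch below is a hypothetical take on the "string limiting" idea listed above; it is not the code behind those links. Each ''%%$VARIABLE%%'' keeps its buffer address and ''maxlen'' in its body, and ''%%$+!%%'' and ''%%$C+!%%'' silently drop whatever does not fit. As a consequence, a variable leaves the address of this structure rather than of the string. The helper words ''%%$BUF%%'', ''%%$MAX%%'' and ''%%$ROOM%%'' and the variable ''%%TITLE%%'' are made up for this example.

<code forth>
\ Hypothetical sketch, not the repository code: byte-count strings that
\ never grow past maxlen. A variable leaves its structure address s.
: C+!   ( n a -- )  >r  r@ c@ +  r> c! ;        \ Incr. byte with n at a

: $VARIABLE  ( maxlen "name" -- )
    255 min                      \ A count byte cannot hold more
    here  over 1+ allot  align   \ ( maxlen buf ) count byte + maxlen chars
    0 over c!                    \ Start with an empty string
    create  , , ;                \ Body: buffer address, then maxlen

: $BUF   ( s -- a )  @ ;                         \ Address of the count byte
: $MAX   ( s -- n )  cell+ @ ;                   \ Capacity in characters
: $ROOM  ( s -- n )  dup $max  swap $buf c@ - ;  \ Characters still free
: $@     ( s -- c )  $buf count ;                \ Fetch string
: $+!    ( c s -- )                              \ Extend string, clipped
    >r  r@ $room min             \ Keep only what fits
    tuck  r@ $@ +  swap cmove    \ Append those characters
    r> $buf c+! ;                \ Raise the count accordingly
: $!     ( c s -- )  0 over $buf c!  $+! ;       \ Store string
: $.     ( c -- )    type ;                      \ Print string
: $C+!   ( char s -- )                           \ Add char if there is room
    dup $room 0> if  dup >r  $@ + c!  1 r> $buf c+!  else  2drop  then ;

\ Usage: only 3 of the 5 characters of "Works" still fit
8 $VARIABLE TITLE
s" Forth" TITLE $!   s" Works" TITLE $+!
TITLE $@ $.                      \ prints ForthWor
</code>

Clipping silently is only one policy; the safe primitive word set listed above warns about overflow instead.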

Note that in Albert Nijhof's version, a string variable puts the address of its ''%%$VARIABLE%%'' data structure on the stack, whereas the Generic Forth example puts the address of the string itself on the stack. Functionally they are equivalent.

^ Name          ^ Alt-name       ^ Function                        ^
| ''%%$@%%''    | ''%%GET$%%''   | Read string variable            |
| ''%%$+!%%''   | ''%%ADD$%%''   | Add string to string variable   |
| ''%%$!%%''    | ''%%SET$%%''   | Store string in string variable |
| ''%%$.%%''    | ''%%TYPE%%''   | Type string                     |
| ''%%$C+!%%''  | ''%%INC$%%''   | Add char to string variable     |

===== String tools =====

Two string tools as implemented by Albert Nijhof:

  * ''%%-HEAD%%'' cuts the first 'i' characters from the given string.
  * ''%%-TAIL%%'' cuts the last 'i' characters from the given string.

<code forth>
\ Extra: cut i characters from a string, with underflow protection
\ (i is clamped to the range 0..len, so the length never goes negative)
: -TAIL ( adr len i -- adr len' )   0 max  over min - ;
: -HEAD ( adr len i -- adr' len' )  0 max  over min  tuck - >r + r> ;
\ -HEAD and -TAIL do not store anything.
</code>