Efficiently building strings #765
-
Hi, this is a newbie question. I wonder how to efficiently build strings in Koka. In particular: What is the cost of successivly concatenating a small string to another string that grows. I.e. would the following have quadratic runtime? fun main()
var acc : string := ""
for (1,100) fn(_) acc := acc ++ "X"
println(acc) Am I supposed to build a I tried to write a small function (as an exercise) that converts a This is my (rather unwieldy) code: pub fun csv( entries : list<a>, ?show : a -> string ) : string
fun build ( acc : ctx<list<string>>, l : list<_> ) : ctx<list<string>>
match l
Cons(x, xx) -> (acc ++ ctx Cons(",",Cons(x.show,hole))).build(xx)
Nil -> acc
match entries
Nil -> ""
Cons(x, xx) -> (build(ctx Cons(x.show,hole), xx) ++. Nil).join
pub fun main()
println([1,2,3].csv) Some questions:
|
Beta Was this translation helpful? Give feedback.
Replies: 6 comments 17 replies
-
I found this post on the web, where Tim Cuthbertson writes regarding Koka:
I wonder if there is some way to work around this, maybe with an explicit I noticed I can write: pub fun csv<element>( entries : list<element>, ?show : element -> string ) : string but this also doesn't bind fun build( acc : ctx<list<string>>, l : list<element> ) : ctx<list<string>> and gives me:
I assume this is just not possible? Or is there a way? |
Beta Was this translation helpful? Give feedback.
-
To answer my questions, I did some benchmarks: pub fun bm0()
var buf := ""
for(1,10000) fn(_)
buf := buf ++ "PARIS " // concatenating plain strings
buf.keep.ignore
pub fun bm1()
var buf := empty1
for(1,10000) fn(_)
buf := buf.append1("PARIS ") // using `strbuf1` based on `ctx<list<char>>`
buf.finalize1.keep.ignore
pub fun bm2()
var buf := empty2
for(1,10000) fn(_)
buf := buf.append2("PARIS ") // using `strbuf2` based on `ctx<list<string>>`
buf.finalize2.keep.ignore Without really using any sophisticated benchmarking framework, I did three measurements on my machine and calculated the median.
This looked like I should just work with
At least I tried adding the alias strbuf2 = ctx<list<string>>
pub fip fun append2( buf : strbuf2, s : string ) : strbuf2
buf ++ ctx Cons(s,hole) And I got:
I guess that's because a new Finally, I was able to declare pub fip fun list-char/append1( buf : strbuf1, cs : list<char> ) : strbuf1
match cs
Nil -> buf
Cons(c,cc) -> (buf ++ ctx Cons(c,hole)).append1(cc) But this reduced execution time from 27.78s to 26.85s only (if that's even a significant change). |
Beta Was this translation helpful? Give feedback.
-
I found a solution that works pretty fast: alias strbuf3 = list<string>
val empty3 : strbuf3 = Nil
pub fun append3( buf : strbuf3, s : string ) : strbuf3
Cons(s,buf)
pub fun finalize3( buf : strbuf3 ) : string
buf.reverse-join But what I don't understand is why this is so much faster than the first-class constructor contexts used in What am I missing (or misunderstanding)? |
Beta Was this translation helpful? Give feedback.
-
There is a lot to unpack here, I'm going to also mention PR #607, which delays the |
Beta Was this translation helpful? Give feedback.
-
Thanks @TimWhiting for your feedback, information, and referring to the open issues. I'll answer in a single post here.
Okay, then it's an issue of Koka. I will be looking out for it and want to practice functional style programming anyway for now. 😉 But as you can see further below, even in functional context, constructor contexts seem to impose some penalty as of now.
I think that depends on usage. For example plain string concatenation using I added a alias strbuf0 = string
val empty0 : strbuf0 = ""
pub fun append0( buf : strbuf0, s : string ) : strbuf0
buf ++ s
pub fun finalize0( buf : strbuf0 ) : string
buf And put together a program to benchmark all variants, while commenting out the combinations that trigger the issues with local variables and thus are catastrophically slow: Codeimport std/time
// Working directly with strings by appending:
alias strbuf0 = string
val empty0 : strbuf0 = ""
pub fun append0( buf : strbuf0, s : string ) : strbuf0
buf ++ s
pub fun finalize0( buf : strbuf0 ) : string
buf
// List of chars, using constructor contexts:
alias strbuf1 = ctx<list<char>>
val empty1 : strbuf1 = ctx hole
pub fun char/append1( buf : strbuf1, c : char ) : strbuf1
buf ++ ctx Cons(c,hole)
pub fun list-char/append1( buf : strbuf1, cs : list<char> ) : strbuf1
match cs
Nil -> buf
Cons(c,cc) -> buf.append1(c).append1(cc)
pub fun string/append1( buf : strbuf1, s : string ) : strbuf1
buf.append1(list(s))
pub fun finalize1( buf : strbuf1 ) : string
(buf ++. Nil).string
// List of strings, using constructor contexts
alias strbuf2 = ctx<list<string>>
val empty2 : strbuf2 = ctx hole
pub fun append2( buf : strbuf2, s : string ) : strbuf2
buf ++ ctx Cons(s,hole)
pub fun finalize2( buf : strbuf2 ) : string
(buf ++. Nil).join
// List of strings, using final reversing:
alias strbuf3 = list<string>
val empty3 : strbuf3 = Nil
pub fun append3( buf : strbuf3, s : string ) : strbuf3
Cons(s,buf)
pub fun finalize3( buf : strbuf3 ) : string
buf.reverse-join
// Using local variables:
pub fun bm-localvar(count, empty, append, finalize)
var buf := empty
for(1,count) fn(_)
buf := buf.append("PARIS ")
buf.finalize.keep.ignore
pub fun bm-localvar0(count) bm-localvar(count, empty0, append0, finalize0)
pub fun bm-localvar1(count) bm-localvar(count, empty1, append1, finalize1)
pub fun bm-localvar2(count) bm-localvar(count, empty2, append2, finalize2)
pub fun bm-localvar3(count) bm-localvar(count, empty3, append3, finalize3)
// Without local variables, but functional:
pub fun bm-functional(count, empty, append, finalize)
fun func(remaining, buf)
if ( remaining <= 0 ) then return buf
func(remaining - 1, buf.append("PARIS "))
func(count, empty).finalize.keep.ignore
pub fun bm-functional0(count) bm-functional(count, empty0, append0, finalize0)
pub fun bm-functional1(count) bm-functional(count, empty1, append1, finalize1)
pub fun bm-functional2(count) bm-functional(count, empty2, append2, finalize2)
pub fun bm-functional3(count) bm-functional(count, empty3, append3, finalize3)
// Execute single benchmark multiple times:
pub fun benchmark(bm-func)
val start = now-in()
bm-func()
val end = now-in()
end - start
// Execute benchmarks:
pub fun benchmarks( count : int )
println("functional0: " ++ show(benchmark{bm-functional0(count)}))
println("functional1: " ++ show(benchmark{bm-functional1(count)}))
println("functional2: " ++ show(benchmark{bm-functional2(count)}))
println("functional3: " ++ show(benchmark{bm-functional3(count)}))
println("localvar0: " ++ show(benchmark{bm-localvar0(count)}))
//println("localvar1: " ++ show(benchmark{bm-localvar1(count)}))
//println("localvar2: " ++ show(benchmark{bm-localvar2(count)}))
println("localvar3: " ++ show(benchmark{bm-localvar3(count)}))
pub fun main()
benchmarks(100000)
println("Double:")
benchmarks(200000) These are the results on my machine (three runs, manually taking the median, each):
As you can see, Whether that's possible or not may depend on the internal representation of Also interesting is that even in the functional case, there seem to be some slowdown when using constructor contexts (compare
After having loved Unicode in the beginning, I'm meanwhile more fond of treating strings as "opaque" sequences of (8 bit) bytes (like done in Lua or done in Zig). That is because the notion of a Unicode codepoint isn't much useful anyway, considering the complex rules of Unicode text segmentation, specifically regarding grapheme cluster boundaries. (Also consider Rust's weird |
Beta Was this translation helpful? Give feedback.
-
For anyone who has the same problem and just wants a quick solution: This is what I now do "from scratch" when needed: pub fun main() : string
var strbuf := Nil
fun put( s : string ) strbuf := Cons( s, strbuf )
put("Hello ")
put("World!")
// and so on
strbuf.reverse-join() Of course one could define a type alias and a free standing function, but the above code snippet seems to be reasonable in most cases. |
Beta Was this translation helpful? Give feedback.
There is not something currently in the standard library that acts like a string buffer, but I think the regular string functions are pretty performant for most use cases currently. I believe they try to realloc before mallocing new memory for most operations, which saves time for copying and allocation.
join
for example iterates over the list, calculating the total size before allocating space for the new string.As a general rule working with
string
will be much faster than converting between strings andlist<char>
, at least when you are working with large strings.With string building I would expect that
list<string>
would be fastest, with a finaljoin()
.Yes, the
list<char>
will be ta…