Problem
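Converting a string to code points currently involves a lot of transient string allocation, because CharacterAt() hands back a one-character string for every position before Asc() turns it into a number. Given _codePoints as an empty Array(Of Integer):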
For i As Integer = 0 To someString.Length - 1
  _codePoints.Append(Asc(someString.CharacterAt(i)))
Next
We already have String.FromCodePoints() for going from code points to a string. What's missing is the other direction: having the VM return an Array(Of Integer) directly from a string.
Proposed Solution
Either or both of:
Shared ToCodePoints(s As String) As Array(Of Integer)
Public ToCodePoints() As Array(Of Integer) # operates on the current instance
Example Workflow
Instead of the loop above:
_codePoints = someString.ToCodePoints()
Alternatives Considered
Status quo
Who Would This Help?
Anyone who needs to move from strings to code points ("chars") and back again as efficiently as possible.
I have two use cases to begin with:
- A class called MutableString, which holds a string in the code point domain and can fill the role of a StringBuilder while also doing much more: trimming, upper/lower/title casing and other string operations, with support for method chaining.
- A class called TextReader, which reads text files either line by line or, when needed, as code points so that fields with embedded line endings can be handled. It will probably use MutableString at times as well.
Most of the string allocation overhead comes from moving between the string and code point ("char") domains, and this proposal would eliminate the last major source of it. The VM should be able to simply iterate over the string with an indexer and cast each char value to an Int64, so the only heap allocation would be the returned Array(Of Integer).
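To pin down the intended behavior, here is a minimal sketch in Python rather than Objo. It models only the shape of the operation described above (one pass over the string, one result array sized up front) plus a round trip through a counterpart of String.FromCodePoints(); it says nothing about how the Objo VM actually allocates. The names to_code_points and from_code_points are hypothetical. Python's str is indexed by code point, which matches the proposal's intent. If the host's "char" is instead a UTF-16 code unit (as .NET's is), casting each char to Int64 would yield surrogate halves rather than true code points for characters above U+FFFF, so the native routine would have to combine surrogate pairs.

```python
def to_code_points(s: str) -> list[int]:
    """Hypothetical stand-in for the proposed String.ToCodePoints()."""
    # Size the result once: this list is the only object that outlives the call.
    result = [0] * len(s)
    # Python strings index by code point, so ord() gives the scalar value directly,
    # including characters outside the Basic Multilingual Plane.
    for i, ch in enumerate(s):
        result[i] = ord(ch)
    return result


def from_code_points(code_points: list[int]) -> str:
    """Hypothetical counterpart of the existing String.FromCodePoints()."""
    return "".join(map(chr, code_points))


if __name__ == "__main__":
    text = "Objo \u00e9 \U0001F600"  # ASCII, a Latin-1 letter and an emoji
    cps = to_code_points(text)
    print(cps)
    # The round trip must reproduce the original string exactly.
    assert from_code_points(cps) == text
```

The emoji in the sample is the case worth testing in Objo too: it should come back as a single value (128512), not two.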
TBH, once .NET interop is available I will likely just call the existing C# version of TextReader. Still, it's an interesting problem to see how close I can get with Objo in both performance and features, and it gives me an excuse to learn Objo and give it a good shakedown.