Thanks both. I think this is a gap, but I’m leaning away from adding a Char type for now.
The awkward part is that “character” is not one clear thing. It could mean a UTF-16 code unit, a Unicode code point, a Unicode scalar value, or a user-perceived grapheme. I’d rather not bake that ambiguity into the language before v1.
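To make the ambiguity concrete, here is a small illustration. It uses Python purely as a stand-in, since Python strings happen to index by code point; the helper names are made up for this example and are not part of any proposal.

```python
# Illustration only: Python stands in for the language under discussion.

def utf16_code_units(s: str) -> int:
    """Number of UTF-16 code units needed to encode s."""
    return len(s.encode("utf-16-le")) // 2


def code_points(s: str) -> int:
    """Number of Unicode code points in s (Python's len counts these)."""
    return len(s)


def is_scalar_value(cp: int) -> bool:
    """A scalar value is any code point that is not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


emoji = "\U0001F600"   # one code point outside the Basic Multilingual Plane
accented = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Each string is displayed as a single user-perceived character,
# yet the lower-level counts disagree with each other.
counts = {
    "emoji": (utf16_code_units(emoji), code_points(emoji)),
    "accented": (utf16_code_units(accented), code_points(accented)),
}
```

Counting graphemes is left out on purpose: it needs the segmentation rules from the Unicode standard, which Python's standard library does not provide.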
Instead, I’m considering adding a text-oriented stream API that sits above the byte-level BinaryStream.
Proposed shape:
    Class TextStream
        Constructor(file As FileSystemItem)
        Constructor(file As FileSystemItem, encoding As TextEncoding)

        Property Encoding As TextEncoding (readonly)
        Property EndOfFile As Boolean (readonly)

        Function ReadCodePoint() As Integer
        Function PeekCodePoint() As Integer
        Function ReadCharacter() As String
        Function PeekCharacter() As String
        Sub WriteCodePoint(codePoint As Integer)
        Sub WriteCharacter(value As String)

        Function ReadLine() As String
        Function ReadAll() As String
        Sub Write(value As String)
        Sub WriteLine(value As String)

        Sub Close()
    End Class
Default encoding would be TextEncoding.UTF8.
ReadCodePoint() would return the next Unicode code point as an Integer, matching the existing Chr(), Asc(), and String.FromCodePoints() APIs. ReadCharacter() would be a convenience wrapper returning Chr(ReadCodePoint()).
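As a rough sketch of what ReadCodePoint() could do on top of a byte stream, here is a Python stand-in. The class and method names are hypothetical, it only claims to handle UTF-8, and returning None at end of input is a placeholder because the EOF behaviour is still an open question.

```python
import codecs
import io


class CodePointReader:
    """Hypothetical read-only stand-in for the proposed TextStream."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw  # any binary file-like object
        self._decoder = codecs.getincrementaldecoder(encoding)(errors="strict")

    def read_code_point(self):
        """Return the next code point as an int, or None at end of input."""
        while True:
            b = self._raw.read(1)
            if not b:
                # Flush the decoder; a truncated sequence raises UnicodeDecodeError.
                self._decoder.decode(b"", final=True)
                return None
            ch = self._decoder.decode(b)
            if ch:  # a complete UTF-8 sequence has been consumed
                return ord(ch)

    def read_character(self):
        """Convenience wrapper, analogous to Chr(ReadCodePoint())."""
        cp = self.read_code_point()
        return None if cp is None else chr(cp)


reader = CodePointReader(io.BytesIO("a\u20ac\U0001F600".encode("utf-8")))
decoded = [reader.read_code_point() for _ in range(4)]
```

The point of the sketch is that the caller never sees partial byte sequences: one call consumes however many bytes the next code point needs.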
For parsers and scanners:
    Var stream As New TextStream(file, TextEncoding.UTF8)
    Var inQuotes As Boolean = False
    While Not stream.EndOfFile
        Var cp As Integer = stream.ReadCodePoint()
        Select Case cp
        Case Asc("""")
            inQuotes = Not inQuotes
        Case Asc(",")
            If Not inQuotes Then
                # End of field
            End If
        Case 10, 13
            If Not inQuotes Then
                # End of record
            End If
        End Select
    End While
    stream.Close()
I’m proposing PeekCodePoint() rather than UngetCodePoint() initially because it covers common one-character lookahead without introducing a pushback buffer. If people feel strongly that scanner-style pushback is needed, maybe this should include one of:
Sub PushBackCodePoint(codePoint As Integer)
or:
Sub UnreadCodePoint()
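For comparison, here is a minimal Python sketch of peek-only lookahead. The names are hypothetical and None again stands in for end of input. Peek needs just one cached slot, and that is already enough for a scanner to treat CR LF as a single line terminator.

```python
class PeekableCodePoints:
    """Hypothetical wrapper adding one-code-point lookahead to an iterable."""

    def __init__(self, code_points):
        self._it = iter(code_points)
        self._lookahead = None       # cached result of the last peek
        self._has_lookahead = False  # True while the cache is valid

    def peek(self):
        """Return the next code point without consuming it (None at end)."""
        if not self._has_lookahead:
            self._lookahead = next(self._it, None)
            self._has_lookahead = True
        return self._lookahead

    def read(self):
        """Consume and return the next code point (None at end)."""
        cp = self.peek()
        self._has_lookahead = False
        return cp


def count_line_breaks(code_points):
    """Count line terminators, treating CR LF as one."""
    stream = PeekableCodePoints(code_points)
    breaks = 0
    while (cp := stream.read()) is not None:
        if cp == 13:                 # CR
            if stream.peek() == 10:  # followed by LF: consume it as well
                stream.read()
            breaks += 1
        elif cp == 10:               # bare LF
            breaks += 1
    return breaks
```

A pushback API would let the caller put back an arbitrary code point, which means deciding how deep the buffer goes and what happens if the pushed-back value differs from what was read. The single-slot peek avoids both questions.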
Questions I’d like opinions on before implementing:
- Is TextStream the right name, or would TextReader / TextWriter be clearer?
- Should this be a new class, or should these methods live directly on BinaryStream?
- Is PeekCodePoint() enough, or do scanner authors really want pushback?
- Should ReadCodePoint() return -1 at EOF, or should users rely on EndOfFile and let reading past EOF throw an IOException like BinaryStream?
- Are ReadCharacter() / WriteCharacter() useful, or is code point I/O enough?
My current preference is a separate TextStream class with code point methods, EOF checked via EndOfFile, and no Char type for the initial release.