While adding support for editing and viewing text encoded in UTF-8 to HxD’s hex editor control itself, it turns out I have to query Unicode property tables, that go beyond the basic ones included with Delphi (and most other languages / default libraries).
Parsing the structured text files, provided by the Unicode consortium, at each startup is too inefficient, and merely storing the parsed text into a simple integer array wastes too much memory.
A more efficient storage uses a dictionary-like approach, to compress the needed data using a few layers of indirections, while still giving array-like performance with constant (and negligible) overhead.
In the following, I’ll briefly present the solution I found.
All data in a computer, including files, is a sequence of numbers. But almost no program shows data in such a raw format, except for hex editors, which can make this concept pretty confusing and abstract. (This actually was one of the motivations for me to write such a program: to understand data and representations better.)
Data encoding, decoding and representation is a big topic, but for many applications of hex editors a few concepts are enough. We’ll start with a brief answer to this question: how do I make sense of (hexadecimal) numbers in a hex editor?
These or similar formulations seem to be popular variations of the above question:
How do you translate hex to English?
Can I change the text to English (or another language)?
What are those ‘random’ numbers on the left in the hex editor?
How do you know if the text representation on the right in a hex editor is valid?