All data in a computer, including files, is a sequence of numbers. But almost no program other than a hex editor shows data in such a raw form, which can make this concept seem confusing and abstract. (This was actually one of my motivations for writing such a program: to understand data and representations better.)
Data encoding, decoding and representation is a big topic, but for many applications of hex editors a few concepts are enough. We’ll start with a brief answer to this question: How do I make sense of (hexadecimal) numbers in a hex editor?
These or similar formulations seem to be popular variations of the above question:
How do you translate hex to English?
Can I change the text to English (or another language)?
What are those ‘random’ numbers on the left in the hex editor?
How do you know if the text representation on the right in a hex editor is valid?
The general answer
The essential information a hex editor shows is divided into two columns that show the same data in different ways: the left column shows the raw numbers a file is made of, and the right column next to it shows a tentative textual representation. It would be helpful if the textual representation were smart enough to turn any data into something meaningful, but in truth it is just an attempt to show the data as text, assuming a common encoding.
In other words: there is no general translation program for numbers to meaningful representations, such as text. Hexadecimal numbers first and foremost are just numbers. You have to know the data format to translate it into something that makes more sense to humans.
Sometimes the numbers translate directly into intelligible text because the text is encoded in a typical way. That’s what the text column on the right of a hex editor is for. When nothing you read there makes sense, either the data is not text or it uses an unusual encoding.
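To make the two-column idea concrete, here is a minimal sketch of how such a view can be produced. It assumes plain ASCII for the text column and replaces every non-printable byte with a dot. The function name `hex_dump` and the sample bytes are made up for illustration and are not taken from HxD.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as an offset, a hex column, and a text column."""
    pad = width * 3 - 1  # room for 'width' two-digit numbers plus separators
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_col = " ".join(f"{b:02X}" for b in chunk)
        # Assumption: printable ASCII is shown as-is, anything else as a dot.
        text_col = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_col:<{pad}}  {text_col}")
    return "\n".join(lines)


sample = b"Hello, hex!\x00\x01\xff"
dump = hex_dump(sample)
print(dump)
```

Both columns are computed from the same bytes. The text column is only a guess that the bytes are ASCII, which is why the last three bytes of the sample show up as dots.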
Even if you know the file format, that doesn’t mean there is a text representation. Imagine an image file such as a JPG: it can never become English text, since it encodes a picture. You will either see a rendering of the picture or a sequence of color values.
Another reason: files are often a combination of text and other data types like images, sound, or program code. So you have to know the position in the file where the text is stored and go to it. The rest of the file, however, will make little sense when viewed as text.
Since you don’t always know where the text is located in a file, you can try the text search function. Some hex editors also scan the entire file for text passages and list all matches in a result window. Another option is to use a structure editor or a data inspector.
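A scan for text passages can be sketched in a few lines. The version below is a simplified assumption of how such a scan might work, not HxD's implementation: it only looks for runs of printable ASCII bytes of a minimum length. The name `find_text_runs` and the sample bytes are hypothetical.

```python
import re


def find_text_runs(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII bytes."""
    # Bytes 0x20..0x7E are the printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


# Synthetic bytes that loosely imitate the start of an executable.
sample = (b"\x00\x01MZ\x90\x00"
          b"This program must be run under Win32\r\n$\x00\x00PE\x00\x00")
runs = list(find_text_runs(sample))
for offset, text in runs:
    print(f"{offset:08X}  {text}")
```

Short runs such as "MZ" or "PE" are skipped here because they fall below `min_len`. Lowering the threshold finds them, but also reports many more accidental matches.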
In the following, we will have a quick look at the data inspector of HxD 2.0, to interpret data in a meaningful way.
An example of translating data
Below, in the screenshot of HxD, we have loaded an executable file that adheres to the PE file format (the executable file format of Windows).
You will notice that there is intelligible text: “This program must be run under Win32”. This is the message that you see when you try to run this executable under DOS (yes, Windows executables still start with a DOS program).
Closer to the bottom of the window, you see the characters “PE” that denote the beginning of the actual Windows executable. The caret (dark vertical line) is at the position where the date and time of file creation are stored. The position of this date/time stamp varies from one executable to another, since the preceding DOS executable can vary in size. I won’t go into details on how to compute this position; suffice it to say that the stamp always starts 8 bytes after the “PE” characters that mark the start of the actual PE file. (For details, refer to the PE file format specification.)
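For readers who do want to compute the position, the following sketch shows one way to do it. It relies on two facts from the PE format: the DOS header stores the offset of the “PE” signature as a 32-bit little-endian value at offset 0x3C, and the time stamp follows the 4-byte signature and two 2-byte fields. The function name `pe_timestamp_offset` and the synthetic buffer are assumptions made for this example, and the code does not validate anything beyond the two signatures.

```python
import struct


def pe_timestamp_offset(data: bytes) -> int:
    """Return the file offset of the date/time stamp in a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # The DOS header holds the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing 'PE' signature")
    # Signature (4) + Machine (2) + NumberOfSections (2) = 8 bytes.
    return pe_offset + 8


# Synthetic stand-in for a real executable: just enough bytes for the lookup.
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)   # PE header starts at 0x80
fake[0x80:0x84] = b"PE\x00\x00"
fake[0x88:0x8C] = bytes.fromhex("EF 7B 83 54")

offset = pe_timestamp_offset(bytes(fake))
print(hex(offset), fake[offset:offset + 4].hex(" ").upper())
```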
At the caret you see the hexadecimal numbers “EF 7B 83 54” that are supposed to be the PE file creation date and time stamp. We still have to know how that information is encoded before we can turn those numbers into a meaningful representation of date/time. If you know the expected date or time, you can just try out one of the formats listed in the data inspector (DOS time/date, time64_t, FILETIME, OLETIME, and time_t) and see if the interpretation it returns matches your expectations.
Alternatively, you can check the PE file format specification and find out that the date/time datatype is time_t. Now we check the translation the data inspector returns for “EF 7B 83 54” in the row time_t: the date 2014-12-06 and the time 21:58:07.
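The same translation can be reproduced by hand. The sketch below assumes the four bytes form a little-endian unsigned 32-bit count of seconds since 1970-01-01 00:00:00 UTC, which is how a 32-bit time_t is commonly stored on Windows. The variable names are only for illustration.

```python
import struct
from datetime import datetime, timezone

raw = bytes.fromhex("EF 7B 83 54")

# Little-endian: the least significant byte (EF) comes first in the file.
(seconds,) = struct.unpack("<I", raw)
stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(hex(seconds))                         # 0x54837bef
print(stamp.strftime("%Y-%m-%d %H:%M:%S"))  # 2014-12-06 21:58:07
```

Reading the same bytes as big-endian (`">I"`) would give a completely different number, so knowing the byte order is part of knowing the encoding.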
As you can see, there are many other rows that show possible interpretations. Some of them are not valid encodings for the corresponding datatypes (see time64_t). But without knowing anything about the file format or file structure, the data inspector can only guess and try its best to make sense of the data at the current position.
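A toy version of this guessing game shows why the same bytes yield so many rows. The sketch below is not HxD's data inspector. It is a hypothetical `inspect_bytes` helper that reads the bytes at one position under several little-endian number types.

```python
import struct


def inspect_bytes(raw: bytes) -> dict:
    """Interpret the first bytes of 'raw' under several little-endian types."""
    return {
        "uint8": struct.unpack_from("<B", raw)[0],
        "int8": struct.unpack_from("<b", raw)[0],
        "uint16": struct.unpack_from("<H", raw)[0],
        "int16": struct.unpack_from("<h", raw)[0],
        "uint32": struct.unpack_from("<I", raw)[0],
        "int32": struct.unpack_from("<i", raw)[0],
        "float32": struct.unpack_from("<f", raw)[0],
    }


rows = inspect_bytes(bytes.fromhex("EF 7B 83 54"))
for name, value in rows.items():
    print(f"{name:<8} {value}")
```

Every row is a technically correct reading of the bytes. Only knowledge of the file format tells you which one the file's author intended.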
There is no general solution to interpret “random” data. You have to know the data format to make sense of it. A data format can be identified using so-called magic numbers in files, file extensions such as .jpg or .wav, statistical information, or patterns visible to the trained eye.
If you do not know the data format, you have to use your knowledge of common data formats and common data types, and try out which interpretations make sense. Making these kinds of tasks easier is a long-term goal of HxD.
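Checking for magic numbers is the easiest of these techniques to automate. The sketch below compares the first bytes of a file against a handful of well-known signatures. The table is deliberately tiny, and the names `MAGIC_NUMBERS` and `guess_format` are invented for this example.

```python
# A few well-known signatures found at the very start of a file.
MAGIC_NUMBERS = {
    b"\xFF\xD8\xFF": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "ZIP archive",
    b"%PDF-": "PDF document",
    b"MZ": "DOS/Windows executable",
}


def guess_format(data: bytes) -> str:
    """Guess a file format from its leading bytes; 'unknown' if no match."""
    # WAV files are RIFF containers: 'RIFF', a 4-byte size, then 'WAVE'.
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "WAV audio"
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "unknown"


print(guess_format(b"MZ\x90\x00\x03\x00"))   # DOS/Windows executable
print(guess_format(b"just some text"))       # unknown
```

A match is only a strong hint. Short signatures such as "MZ" can also occur by chance at the start of unrelated data.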
Good resources for gaining insight into some data formats are sites like Wotsit.org, modding forums (for games), or programmer forums in general.