How to understand raw data in a hex editor?

All data in a computer, including files, is a sequence of numbers. But almost no program other than a hex editor shows data in such a raw format, which can make this concept seem pretty confusing and abstract. (This was actually one of my motivations for writing such a program: to understand data and representations better.)

Data encoding, decoding and representation is a big topic, but for many applications of hex editors a few concepts are enough. We’ll start with a brief answer to this question: How do I make sense of (hexadecimal) numbers in a hex editor?

These or similar formulations seem to be popular variations of the above question:

How do you translate hex to English?

Can I change the text to English (or another language)?

What are those ‘random’ numbers on the left in the hex editor?

How do you know if the text representation on the right in a hex editor is valid?

The general answer

The essential information a hex editor shows is divided into two columns, which both show the same data but represent it in different ways: the left column shows the raw numbers a file is made of, and next to it, the right column shows a tentative textual representation. It would be helpful if the textual representation were smart enough to interpret any data as something meaningful, but in truth it is just an attempt to show the data as text, assuming a common encoding.

In other words: there is no general translation program for numbers to meaningful representations, such as text. Hexadecimal numbers first and foremost are just numbers. You have to know the data format to translate it into something that makes more sense to humans.

Sometimes the numbers translate directly to intelligible text because the text is encoded in a common way. That’s what the text column on the right in a hex editor is for. When you can’t read anything there that makes sense, either the data is not text or its encoding is uncommon.
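As a rough illustration of the two columns, here is a minimal Python sketch of a hex dump. It assumes plain ASCII for the text column and prints a dot for every byte that is not a printable ASCII character. The function name `hex_dump` is made up for this example, and real hex editors usually offer several encodings.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex column, and tentative text column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Left column: every byte as a two-digit hexadecimal number.
        hex_column = " ".join(f"{b:02X}" for b in chunk)
        # Right column: printable ASCII as-is, everything else as a dot.
        text_column = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_column.ljust(width * 3 - 1)}  {text_column}")
    return "\n".join(lines)


print(hex_dump(b"Hello, hex!\x00\x01\xff"))
```

The first eleven bytes show up as readable text on the right, while the last three bytes only make sense as numbers.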

Even if you know the file format, that doesn’t mean there is a text representation. Imagine an image file such as a JPG: it can never become English text, since it represents a grid of pixels. You will see either a rendered picture or a sequence of color values.

Another reason: files are often a combination of text and other data types like images, sound, or program code. So you have to know the position in the file where this text is stored and go to it. The rest of the file however will make little sense when viewed as text.

Since you don’t always know where the text is located in a file, you can try the text search function. Some hex editors also scan the entire file for text passages and list all matches in a result window. Another option is to use a structure editor or a data inspector.
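To give an idea of how such a scan for text passages might work, here is a small Python sketch. It assumes that "text" means a run of at least four printable ASCII bytes, which is a simplification. The function name `find_text_runs` and the sample bytes are made up for this example.

```python
import re


def find_text_runs(data: bytes, min_length: int = 4):
    """Yield (offset, text) for each run of printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


# Hypothetical sample: a few binary bytes surrounding a readable message.
sample = b"MZ\x90\x00This program must be run under Win32\r\n$\x00\x00PE\x00\x00"
for offset, text in find_text_runs(sample):
    print(f"{offset:08X}  {text}")
```

Short runs such as "MZ" or "PE" are skipped here because they fall below the minimum length; lowering `min_length` finds them too, at the cost of more false matches.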

In the following, we will have a quick look at the data inspector of HxD 2.0, to interpret data in a meaningful way.

An example of translating data

Below, in the screenshot of HxD, we have loaded an executable file that adheres to the PE file format (the executable file format of Windows).

You will notice that there is intelligible text: “This program must be run under Win32”. This is the message that you see when you try to run this executable under DOS (yes, Windows executables still start with a DOS program).

[Screenshot: HxD-Translate-Hex-To-Meaning]

Closer to the bottom of the window, you see the characters “PE” that denote the beginning of the actual Windows executable. The caret (dark vertical line) is at the position where the date and time of file creation are stored. Depending on the executable, the position of the date/time stamp varies, since the preceding DOS executable can vary in size. I won’t go into details on how to compute this position; suffice it to say that it always starts 8 bytes after the start of the actual PE file. (For details, refer to the PE file format specification.)
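For the curious, the position can be found with a few lines of Python. This sketch relies on two facts from the PE format: the DOS header stores the offset of the PE header as a 4-byte little-endian value at position 0x3C, and the date/time stamp follows 8 bytes after the "PE" signature. The function name `pe_timestamp_offset` and the synthetic test buffer are made up for this example; it is not a full PE parser.

```python
import struct


def pe_timestamp_offset(data: bytes) -> int:
    """Return the file offset of the date/time stamp of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # The DOS header stores the offset of the PE header at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing 'PE' signature")
    # Signature (4 bytes) + Machine (2) + NumberOfSections (2) = 8 bytes.
    return pe_offset + 8


# Hypothetical, synthetic buffer that only mimics the fields used above.
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\x00\x00"
print(hex(pe_timestamp_offset(bytes(fake))))  # 0x88
```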

At the caret you see the hexadecimal numbers “EF 7B 83 54” that are supposed to be the PE file creation date and time stamp. We still have to know how that information is encoded before we can turn those numbers into a meaningful representation of date/time. If you know the expected date or time, you can just try out one of the formats listed in the data inspector (DOS time/date, time64_t, FILETIME, OLETIME, and time_t) and see if the interpretation it returns matches your expectations.

Alternatively, you can check the PE file format specification and find that the date/time data type is a 32-bit time_t. Now we check the translation the data inspector returns for “EF 7B 83 54” in the row time_t, which is the date 2014-12-06 and the time 21:58:07.
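The same translation can be reproduced by hand. The following Python sketch reads the four bytes as a little-endian unsigned 32-bit integer and interprets the result as seconds since 1970-01-01, assuming UTC; under that assumption it yields the same date and time as above.

```python
import struct
from datetime import datetime, timezone

raw = bytes.fromhex("EF 7B 83 54")

# Little endian: the least significant byte (EF) comes first.
(seconds,) = struct.unpack("<I", raw)
timestamp = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(seconds)                                  # 1417903087
print(timestamp.strftime("%Y-%m-%d %H:%M:%S"))  # 2014-12-06 21:58:07
```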

As you can see, there are many other rows that show possible interpretations; some of them are not valid encodings for the corresponding data types (see time64_t). But without knowing anything about the file format or file structure, the data inspector can only guess and try its best to make sense of the data at the current position.
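To see why guessing is all that is possible, here is a small Python sketch that applies several interpretations to the same four bytes. It is only loosely modeled on the idea of a data inspector; the function name `interpretations` and the selection of types are made up for this example and do not reflect how HxD works internally.

```python
import struct


def interpretations(raw: bytes) -> dict:
    """Interpret the same four bytes as several common data types."""
    return {
        "uint32 (little endian)": struct.unpack("<I", raw)[0],
        "uint32 (big endian)": struct.unpack(">I", raw)[0],
        "int32 (little endian)": struct.unpack("<i", raw)[0],
        "float32 (little endian)": struct.unpack("<f", raw)[0],
    }


for name, value in interpretations(bytes.fromhex("EF 7B 83 54")).items():
    print(f"{name:<24} {value}")
```

Every row is a technically possible reading of the bytes; only knowledge of the file format tells you which one was intended.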

Conclusion

There is no general solution to interpret “random” data. You have to know the data format to make sense of it. Identifying a data format can be done using so-called magic numbers in files, file extensions such as .jpg or .wav, statistical information, or patterns visible to the trained eye.
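As an example of magic numbers, this Python sketch checks the first bytes of a file against a few well-known signatures. The function name `guess_format` and the short list of formats are chosen for illustration only; real identification tools know far more signatures and also look beyond the start of the file.

```python
def guess_format(data: bytes) -> str:
    """Guess a file format from well-known magic numbers at the start."""
    if data.startswith(b"\xFF\xD8\xFF"):
        return "JPEG image"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG image"
    # WAV files are RIFF containers: "RIFF", a 4-byte size, then "WAVE".
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "WAV audio"
    if data.startswith(b"MZ"):
        return "DOS/Windows executable"
    return "unknown"


print(guess_format(b"MZ\x90\x00"))                  # DOS/Windows executable
print(guess_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))  # WAV audio
```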

If you do not know the data format, you have to use your knowledge of common data formats and common data types, and try out which interpretations make sense. Easing these kinds of tasks is a long-term goal of HxD.

Good resources to gain insight on some data formats are sites like Wotsit.org, modding forums (for games), or programmer forums in general.

10 responses on “How to understand raw data in a hex editor?”

  1. Stefan

    Interesting article. I opened a file and managed to convert the 4 bytes to an integer with Windows Calculator (note: little endian) and then converted the integer to a Unix date (via an online tool). It works.

  2. pandy

    “Another option is to use a structure editor or a data inspector. This is not available currently in HxD, but planned.”
    – no beta or release date? 🙂 It looks exciting.

    1. mael Post author

      There is an alpha version of HxD (in German). [Removed link, see below.]

      [Edit: 2016/06/01] Currently Beta 2.0 is in testing and includes an English version. See the main website to know when 2.0 final is announced or contact me by mail if you want to be part of beta testing.

  3. Frank Rago

    Hello, could you tell me how I can find out the amount of data on a drive using a hex editor? I mean, how can I tell whether the drive is completely empty or has just a little data saved on it? Is there a way to know how much data is saved using just a hex editor? Thank you

    1. mael Post author

      That would depend on the file system on the disk. If you know the format of the file system it is certainly possible.
      To simplify your task, open a logical disk (instead of a physical disk), then you don’t need to deal with partition tables and can focus on understanding just the file system.
      Of course some template editor, structure viewer or similar would come in handy. But it is possible to solve manually as well.

      For example, if your logical disk is formatted with FAT32, you can use the following description of its format to help you navigate in the disk and decode the data:
      https://www.pjrc.com/tech/8051/ide/fat32.html

      For FAT32 file systems, the FAT (file allocation table) lists all clusters on the logical disk. Clusters that are free have a 0 entry in this table. So to compute the free space, you count the 0 entries in this table and multiply that count by SizeOfCluster, where SizeOfCluster = BytesPerSector * SectorsPerCluster. In summary: FreeSpace (in bytes) = CountOfClustersMarkedAsZeroInFAT * BytesPerSector * SectorsPerCluster.

      Doing this by hand is a little cumbersome — and also a reason why FAT is not well suited for big disks, since even a computer takes some time completing this task.
      FAT32 introduced the FS Information Sector to speed up free space querying, by caching the free space value in this sector:
      https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system#FS_Information_Sector
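The free-space formula for FAT32 described above can be sketched in Python as follows. This is a simplified illustration, not a complete FAT32 reader: it assumes the FAT has already been read into memory and that the number of data clusters is known. The function name `fat32_free_space` and its parameters are made up for this example. It relies on two format details: entries are 4-byte little-endian values of which only the low 28 bits count, and the first two entries are reserved.

```python
import struct


def fat32_free_space(fat: bytes, cluster_count: int,
                     bytes_per_sector: int, sectors_per_cluster: int) -> int:
    """Count free clusters in a FAT32 table and convert the count to bytes."""
    free_clusters = 0
    # Entries 0 and 1 are reserved; data clusters start at index 2.
    for index in range(2, cluster_count + 2):
        (entry,) = struct.unpack_from("<I", fat, index * 4)
        # Only the low 28 bits of a FAT32 entry hold the cluster value.
        if (entry & 0x0FFFFFFF) == 0:
            free_clusters += 1
    return free_clusters * bytes_per_sector * sectors_per_cluster


# Hypothetical tiny FAT: 2 reserved entries, then 4 data clusters (2 free).
entries = [0x0FFFFFF8, 0x0FFFFFFF, 0, 0x0FFFFFFF, 0, 0x0FFFFFFF]
fat = struct.pack("<6I", *entries)
print(fat32_free_space(fat, 4, 512, 8))  # 8192
```

Here two of the four data clusters are free, and with 512 bytes per sector and 8 sectors per cluster that gives 2 * 4096 = 8192 bytes.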
