Principles of Computer Systems (14): Data formats, accessing information, and operand specifiers


The content of this article can serve as a foundation for assembly language. Most of the time, assembly instructions operate on things we rarely touch in everyday development, so the purpose of this article is to figure out what assembly language operates on. Or, to be more precise, what kinds of objects the various assembly instructions operate on.

Assembly-level objects

In everyday development, the state of the processor is hidden from developers, and we cannot see the state of each object in the CPU. In assembly language, however, the state of these objects is clearly visible. The CPU mainly contains the following objects.

Program counter (PC): holds the address of the next instruction to execute.

Integer register file: eight registers in total, which can hold addresses or integer data.

Condition code register: holds status information about the most recent arithmetic or logic instruction, which is used to implement control flow in the program.

Floating-point registers: hold floating-point numbers.
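
To make the list above a little more concrete, here is a minimal C sketch of the state an assembly programmer can see. The names cpu_state, pc, regs, cond_codes, fp_regs and cpu_reset are made up for illustration, and modelling the floating-point registers as eight long double slots is a simplifying assumption rather than a description of the real hardware layout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the assembly-visible CPU state; all names are illustrative. */
enum { NUM_INT_REGS = 8, NUM_FP_REGS = 8 };   /* the FP register count is an assumption */

struct cpu_state {
    uint32_t    pc;                    /* program counter: address of the next instruction */
    uint32_t    regs[NUM_INT_REGS];    /* integer register file: addresses or integer data */
    uint32_t    cond_codes;            /* status bits left by arithmetic/logic instructions */
    long double fp_regs[NUM_FP_REGS];  /* floating-point registers (simplified) */
};

/* Return a state in which every field is zero. */
struct cpu_state cpu_reset(void)
{
    struct cpu_state s;
    memset(&s, 0, sizeof s);
    return s;
}
```

Every assembly instruction can be read as a small update to a structure like this one, plus main memory.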

All of these are objects inside the processor. In the previous chapter we wrote a simple C program. Without reading its assembly code, you would not be able to see what operations these objects perform while the program runs, or what they hold.

Data format

In the previous chapter, almost every assembly instruction ended with the letter l, such as movl, addl, subl, and pushl. This l suffix is the data format: it means the instruction operates on a 32-bit value.

As computers expanded from 16-bit to 32-bit, and then to today's 64-bit, data formats kept changing. But history always shapes what comes after, so we still call 16 bits a “word”, 32 bits a “double word”, and, correspondingly, 64 bits a “quad word”.

It should be mentioned that the IA32 architecture has no native support for long long int as a data format, so no suffix is listed for it. In addition, long double is an extended-precision type, usually stored in 12 bytes.
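
The relationship between suffix and operand size can be sketched as a small lookup. This is only an illustration: the function name suffix_size is hypothetical, and the b and w suffixes come from standard AT&T syntax for IA32 rather than from the code in the previous chapter, which only used l.

```c
#include <stddef.h>

/* Size in bytes implied by an integer instruction suffix (IA32, AT&T syntax).
   Returns 0 for any suffix this sketch does not cover. */
size_t suffix_size(char suffix)
{
    switch (suffix) {
    case 'b': return 1;   /* byte,        e.g. movb */
    case 'w': return 2;   /* word,        e.g. movw */
    case 'l': return 4;   /* double word, e.g. movl */
    default:  return 0;   /* not an integer suffix handled here */
    }
}
```

So suffix_size('l') gives 4, which matches the 32-bit values that movl, addl and friends operate on.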

Accessing information

Registers are very important objects in the CPU. Under normal circumstances many temporary variables live here, like the temporary variable t in the previous chapter: after optimization, t never goes to main memory and stays in a register. This speeds up the program, because registers are much faster than main memory, and moving data between registers and main memory is itself time-consuming.

The book gives a diagram of the registers, based on the IA32 architecture.

In that diagram, %esp and %ebp are marked as the stack pointer and the frame pointer, respectively. The other six registers are interchangeable most of the time, but there are still some differences among them.

For example, the %eax register is often used to hold the return value of a function. For %eax, %ecx, %edx, and %ebx, each of the two low-order bytes can be accessed individually. In addition, the low-order 16 bits (two bytes) of all eight registers can be accessed on their own.
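
Here is a minimal C sketch of what these partial accesses see, using masks and shifts on a 32-bit value. The function names low_word, low_byte and high_byte are hypothetical. The comments use %ax, %al and %ah, the standard IA32 names for the partial views of %eax, which this article has not introduced.

```c
#include <stdint.h>

/* Low-order 16 bits of a 32-bit register value (what %ax sees of %eax). */
uint16_t low_word(uint32_t reg)
{
    return (uint16_t)(reg & 0xFFFFu);
}

/* Lowest byte (what %al sees of %eax). */
uint8_t low_byte(uint32_t reg)
{
    return (uint8_t)(reg & 0xFFu);
}

/* Second-lowest byte (what %ah sees of %eax). */
uint8_t high_byte(uint32_t reg)
{
    return (uint8_t)((reg >> 8) & 0xFFu);
}
```

If the register holds 0x12345678, low_word returns 0x5678, low_byte returns 0x78, and high_byte returns 0x56. The upper 16 bits have no name of their own and can only be reached through the full 32-bit register.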

Beyond these differences, the usage conventions for %eax, %ecx, and %edx are slightly different from those for %ebx, %esi, and %edi, which we will discuss in depth later. For now, you only need to get acquainted with these eight big shots.

Operand specifiers

The book uses the title "operand specifiers", but I feel the concept is not easy to grasp from the name. An operand specifier is simply a notation that identifies how to obtain the value of an operand taking part in an operation.

There are three kinds of notation. The first is an immediate: a $ sign followed by an integer written in standard C notation, such as $100 or $0x11. The second is a register: used as an operand, it stands for the value held in the register. A register operand does not have to use all 4 bytes; the low-order 2 bytes or 1 byte can be selected instead. The last is the one we are most familiar with, memory: used as an operand, it first computes a memory address and then fetches the value stored at that address.
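
The three kinds of operand can be sketched as a tagged structure plus a function that resolves each kind to a value. This is a toy model under loud assumptions: the names operand, machine and operand_value are hypothetical, and the "memory" is just a 16-entry array indexed by word rather than real byte-addressed memory.

```c
#include <stdint.h>

enum operand_kind { OP_IMMEDIATE, OP_REGISTER, OP_MEMORY };

/* Hypothetical operand: 'value' is the constant for an immediate,
   the register number (0..7) for a register, or the index into the
   toy memory for a memory operand. */
struct operand {
    enum operand_kind kind;
    uint32_t value;
};

/* Toy machine: eight integer registers and a tiny word-indexed memory. */
struct machine {
    uint32_t regs[8];
    uint32_t mem[16];
};

/* Resolve an operand to the value an instruction would work with. */
uint32_t operand_value(const struct machine *m, struct operand op)
{
    switch (op.kind) {
    case OP_IMMEDIATE: return op.value;               /* the constant itself */
    case OP_REGISTER:  return m->regs[op.value % 8];  /* contents of the register */
    case OP_MEMORY:    return m->mem[op.value % 16];  /* contents at the address */
    }
    return 0;  /* unreachable for valid kinds */
}
```

The point of the sketch is the difference in indirection: an immediate is its own value, a register needs one lookup, and a memory operand needs an address first and then a lookup at that address.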

Memory operands are comparatively harder to understand, so here is a simple example: the operand 4(%esp,%eax,4) stands for the value stored in memory at the address 4 + %esp + 4*%eax.
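
The address arithmetic in that example can be written as a one-line C function. The name effective_address and its parameter names are hypothetical. The sketch only computes the address and does not read memory.

```c
#include <stdint.h>

/* Address denoted by a memory operand of the form imm(base,index,scale):
   imm + base + index * scale. On IA32 the scale is 1, 2, 4 or 8;
   this sketch does not check that. */
uint32_t effective_address(uint32_t imm, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    return imm + base + index * scale;
}
```

For 4(%esp,%eax,4), assuming %esp holds 0x100 and %eax holds 3, effective_address(4, 0x100, 3, 4) gives 4 + 0x100 + 12 = 0x110, and the operand's value is whatever is stored at that address.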

Operands appear in most instructions, so we will see these notations often in later articles, and they will become good friends of all fellow programmers.

Article summary

This chapter only introduces some basics of assembly. Relatively speaking, this material is not particularly difficult, but it is the key that opens the mysterious door to what lies behind it. So if anything in this chapter is still unclear, I hope you will start from practice, working through it together with the previous chapter. The assembly code given there is full of data formats, operands, and registers, so it should be easy to find each of the three.

Author: Yoyokuo