Representing and Manipulation information

Practice problems

2.1

  • A. Ox39A7F8 In binary: 0011 1001 1010 0111 1111 1000
  • B. 1100 1001 0111 1011 In hex: 0xC97B
  • C. OxD5E4C In binary: 1101 0101 1110 0100 1100
  • D. 10 0110 1110 0111 1011 0101 In hex: 0x26E7B5

2.2

Fill in the blank entries in the following table, giving the decimal and hexadecimal representations of different powers of 2:

n2^n (decimal)2^n (hexadecimal)
95120x200
195242880x80000
14163840x4000
16655360x10000
171310720x20000
4320x20
71280x80

2.3

A single byte can be represented by 2 hexadecimal digits. Fill in the missing entries in the following table, giving the decimal, binary, and hexadecimal values of different byte patterns:

DecimalBinaryHexadecimal
00000 00000x00
167101000100xA7
621111100x3E
188101111000xBC
55001101110x37
1361000 10000x88
243111100110xF3
820101 00100x52
1721010 11000xAC
2311110 01110xE7

2.4

Without converting the numbers to decimal or binary, try to solve the following arithmetic problems, giving the answers in hexadecimal. Hint: Just modify the methods you use for performing decimal addition and subtraction to use base 16.

A. 0x503c + 0x8 = 0x5044

B. 0x503c - 0x40 = 0x4ffc

C. 0x503c + 64 = 0x503c + 0x40 = 507c

D. 0x50ea - 0x503c = 0xAE

2.5

Consider the following three calls to show_bytes:

int val = Ox87654321;
byte_pointer valp = (byte_pointer) &val;
show_bytes(valp , 1);
show_bytes(valp , 2);
show_bytes(valp , 3);

Indicate the values that will be printed by each call on a little-endian machine and on a big-endian machine:

Little endiandBig endian
A2187
B21438765
C214365876543

2.7

What would be printed as a result of the following call to show_bytes?

const char *s = "abcdef";
show_bytes((byte_pointer) s, strlen(s));
  

Note that letters ‘a ’ through ‘z’ have ASCII codes 0x61 through 0x7A.

61 62 63 64 65 66.

2.8

a: [01101001] b: [01010101]

~a: [10010110] ~b: [10101010]

a & b: [01000001] a | b: [01111101] a ^ b: [00111100]

2.18

A: 0x2e0 = 0000001011100000 = 736.

B: 0x58 = 0000000001011000 = 88.

C: 0x28 = 0000000000101000 = 40.

D: 0x30 = 0000000000110000 = 48.

E: 0x78 = 0000000001111000 = 120.

F: 0x88 = 0000000010001000 = 136

G: 0x1f8 = 0000000111111000 = 504.

H: 0xC0 = 0000000011000000 = 192.

I: 0x48 = 0000000001001000 = 72.

2.19

Machine-Level Representation of Programs

long mult2(long , long);

void multstore(long x, long y, long *dest) {
  long t = mult2(x, y);
  *dest = t;
}

We can run gcc -Og -S mstore.c to generate the assembly code. The file mstore.s with the following content is generated:

        .file	"mstore.c"
        .text
        .globl	multstore
        .type	multstore, @function
multstore:
        .LFB0:
        .cfi_startproc
        pushq	%rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movq	%rdx, %rbx
        call	mult2@PLT
        movq	%rax, (%rbx)
        popq	%rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
        .LFE0:
        .size	multstore, .-multstore
        .ident	"GCC: (GNU) 13.2.1 20230801"
        .section	.note.GNU-stack,"",@progbits

We can use the -c option to compile and assemble: gcc -Og -c mstore.c. Now the file mstore.o is generate too. This file is 1,368 bytes and contains the 14-byt sequence with the following hexadecimal representation:

53 48 89 d3 e8 00 00 00 00 48 89 03 5b c3

We can use objdump to disassemble: objdump -d mstore.o. This is the output:

mstore.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <multstore>:
   0:	53                   	push   %rbx
   1:	48 89 d3             	mov    %rdx,%rbx
   4:	e8 00 00 00 00       	call   9 <multstore+0x9>
   9:	48 89 03             	mov    %rax,(%rbx)
   c:	5b                   	pop    %rbx
   d:	c3                   	ret

The last thing to do to get the actual executable code is running a linker. The linker must be run on the set of object-code files, one of which must contain a main function.

Let’s then create a main.c file:

#include <stdio.h>

void multstore(long, long, long *);

int main() {
  long d;
  multstore(2, 3, &d);
  printf("2 * 3 --> %ld\n", d);
  return 0;
}

long mult2(long a, long b) {
  long s = a * b;
  return s;
}

We can generate an executable named prog with: gcc -Og -o prog main.c mstore.c.

Let’s also disassemble the file prog: objdump -d prog

Practice problems

3.4

Assume variables sp and dp are declared with types

src_t *sp;
dest_t *dp;

where src_t and dest_t are data types declared with typedef. We wish to use the appropriate pair of data mvoement instructions to implement the operation

*dp = (dest_t) *sp;

Assume that the values of sp and dp are stored in registers %rdi and %rsi, respectively. For each entry in the table, show the two instructions that implement the specified data movement. The first instruction in the sequence should read from memory, do the appropriate conversion, and set the appropriate portion of register %rax. The second instruction should then write the appropriate portion of %rax to memory. In both cases, the portions may be %rax, %eax, %ax, or %al, and they may differ from one another.

Recall that when performing a cast that involves both a size change and a change in “signedness” in C, the operation should change the size first (Section 2.2.6).

src_tdest_tInstruction
longlongmovq (%rdi), %rax
movq %rax, (%rsi)
  • Explanation: the C statement is this
    *dp = *sp;
        

    where dp and sp are defined as follows:

    typedef long src_t;
    typedef long dest_t;
    
    src_t *sp;
    dest_t *dp;
        

    The operation to perform (*dp = *sp;) is that of moving the data in memory at the address stored in sp into another location in memory; the location at the address store in dp.

    The get the data in memory at the address stored in sp we can use (%rsi). %rsi, by definition, is the register in which sp is stored. Wrapping into parenthesis means: the location in memory with the address stored in %rsi.

    We cannot simply move the data from a location in memory into another location in memory. In fact we are told to use the register %rax as an intermediary step.

    The data we are moving is 8 bytes (a “quad word”) so we can use movq:

    movq (%rdi), %rax
        

    With that, we have the moved 8 bytes from the source into one register. The next step consists in moving those bytes into another location in memory; that location by definition is at the address stored in %rsi.

    So, the second instruction is:

    movq %rax, (%rsi)
        
src_tdest_tInstruction
charintmovsbl (%rdi), %eax
movl %eax, (%rsi)

Comment: in this case, we first need to convert the char into an int. This means converting 1 byte (see the b in movsbl) to 4 bytes (see the l in movsbl). Given that the value is signed, this operation is performed by sign extension (see the s in movsbl), not by zero extension (z).

src_tdest_tInstruction
charunsignedmovsbl (%rdi), %eax
movl %eax, (%rsi)

Comment: the comment about the previous case applies here too.

src_tdest_tInstruction
unsigned charlongmovzbl (%rdi), %eax
movq %rax, (%rsi)

Comment: here we want to extend a unsigned char (one byte) to a long (eight bytes). Given that it is an unsigned value, we use zero extension, not sign extension. Then why movzbl (%rdi), %eax instead of movzbq (%rdi), %rax?

Here is what Authors say in Errata (https://web.archive.org/web/20230813064349/http://csapp.cs.cmu.edu/3e/errata.html):

p. 184, (Clarification, not an erratum) Figure 3.5. Although there is an instruction movzbq, the GCC compiler typically generates the instruction movzbl for this purpose, relying on the property that an instruction generating a 4-byte with a register as destination will fill the upper 4 bytes of the register with zeros. Posted 04/27/2018. Randal Bryant

(Clarification, not an erratum) p. 326, Solution to Problem 3.4, seventh line of code. The GCC compiler generates the instruction movzbl for this case, even though the goal is to extend the 1-byte value to 8 bytes. See the note on Figure 3.5 (p. 184). Posted 04/27/2018. Randal Bryant

src_tdest_tInstruction
intcharmovl (%rdi), %eax
movb %al, (%rsi)

Comment: here we first move the whole 4 bytes into the 4-byte portion of %rax. Then we just store the the content of %al (the first byte of %rax).

src_tdest_tInstruction
unsignedunsigned charmovl (%rdi), %eax
movb %al, (%rsi)

Comment: none. Notice though that these are the same instructions as before.

src_tdest_tInstruction
charshortmovsbw (%rdi), %ax
movw %ax, (%rsi)

Comment: we want to upgrade a char (one byte) (b) to a short (two bytes) (w). Since it’s a signed value we use sign extension (s).

3.6

Suppose register %rax holds value x and %rcx holds value y. Fill in the table below with formulas indicating the value that will be stored in %rdx for each of the given assembly-code instructions:

InstructionResult
leaq 6(%rax), %rdx6 + x
leaq (%rax,%rcx), %rdxx + y
leaq (%rax,%rcx,4), %rdxx + 4y
leaq 7(%rax,%rax,8), %rdx7 + 9x
leaq 0xA(,%rcx,4), %rdx10 + 4y
leaq 9(%rax,%rcx,2), %rdx9 + x + 2y

3.16

Problem:

When given the C code

void cond (long a, long *p) {
  if (p && a > *p) {
    *p = a;
  }
}
  

GCC generates the following assembly code:

cond:
        testq   %rsi, %rsi
        je      .L1
        cmpq    %rdi, (%rsi)
        jge     .L1
        movq    %rdi, (%rsi)
.L1:
        ret
  

A. Write a goto version in C that performs the same computation and mimics the control flow of the assembly code, in the style shown in Figure 3.16(b). You might find it helpful to first annotate the assembly code as we have done in our examples.

B. Explain why the assembly code contains two conditional branches, even though the C code has only one if statement.

Answer:

void cond(long a, long *p) {

  if (p == 0) {
    goto done;
  }

  if (*p >= a) {
    goto done;
  }

  *p = a;

  done:
    return;
}

3.17

Problem:

An alternate rule for translating if statements into goto code is as follows:

t = test-expr;
if (t)
  goto true/;
else-statement;
goto done;
true:
    then-statement
done:
  

A. Rewrite the goto version of absdiff_se based on this alternate rule.

B. Can you think of any reasons for choosing one rule over the other?

Answer:

long gotodiff_se_alt(long x, long y) {
  long result;
  if (x < y)
    goto true;
  ge_cnt++;
  result = x - y;
  return result;
  true:
    lt_cnt++;
    result = y - x;
    return result;
}

Linking

What is linking?

It consists in collecting/combining code and data into a single file.

The single file can be loaded in memory and executed.

Linking can happen at different times:

  • compile time (static linking);
  • load time (dynamic linking);
  • run time (dynamic linking);

Why are linkers important? Because they enable separate compilation.

Most compilation systems provide a compiler driver which takes care of preprocessing, compiling, assembling, and linking. GCC is one of such drivers.

$ gcc -Og prog main.c sum.c

You can run gcc using the -v options to see the steps gcc takes.

Static linking

  • A static linker takes a collection of relocatable object files and command line arguments.
  • The output of the a static linker is a “fully linked executable object file that can be loaded an run”.
  • What does a relocatable file consist of? It consists of “code and data sections, where each section is a contiguous sequence of bytes.” There is a section, for example, for instructions, and another section for initialized global variables, and another section for uninitialized variables.
  • What main tasks does a linker perform?
    • Symbol resolution: association of each symbol reference with exactly one symbole definition.
    • Relocation: “compilers and assemblers generate code and data sections that start at address 0. The linker relocates…”. That is it assign a memory location to each symbol definition and changes all the refereces to the symbol so that they point to that memory location. In order to relocate, the linker uses instructions (called relocation entries) that have been generated by the assembler.
  • Remember that object files are merely collections of blocks of bytes and that the compiler and the assembler have done most of the work…

Object files

  • There are three types of object files:
    Relocatable object file
    Executable object file
    Shared object file
    a special type of relocatable object file that can be loaded into memory and linked dynamically (either at load time or at run time).
  • Object files are organized according to specific object file formats.
    • Windows: Portable Executable (PE);
    • Mac OS-X: Mach-O;
    • x86-64 Linux: Executable and Linkable Format (ELF).

Relocatable Object Files

             +-- +------------------------------+
             |   |      ELF header              |
             |   +------------------------------+
             |   |      .text                   |
             |   +------------------------------+
             |   |      .rodata                 |
             |   +------------------------------+
             |   |      .data                   |
             |   +------------------------------+
             |   |      .bss                    |
             |   +------------------------------+
   Sections <|   |      .symtab                 |
             |   +------------------------------+
             |   |      .rel.text               |
             |   +------------------------------+
             |   |      .rel.data               |
             |   +------------------------------+
             |   |      .debug                  |
             |   +------------------------------+
             |   |      .line                   |
             |   +------------------------------+
             |   |      .strtab                 |
Describes    +-- +------------------------------+
object file <|   |      Section header table    |
sections     +-- +------------------------------+
  • ELF header: …
  • Section header table: …
  • sections: …

Symbols and Symbol Tables

  • Each relocatable module, m/[fn::Modules? Authors say: “Technically, an /object module is a sequence of bytes, and an object file is an object module stored on a disk in a file. However, we will use these terms interchangeably”.], has a symbol table.
  • The symbole table contains information about the symbols that are defined and referenced by m.
  • There are three kinds of symbols (in the context of a linker):
    Global linker symbols
    they are defined by m and can be referenced by other modules. They correspond to nonstatic C functions and global variables.
    Externals
    they are referenced by m but defined in some other module. They correspond to nonstatic C functions and global variables defined in other modules.
    Local linker symbols
    they are defined and referenced exclusively by m. They correspond to static C function and global variable that are defined with the static attribute. They are visible anywhere within m, but cannot be referenced by other modules.
  • Local linker symbols are not the same as local program variables. Local nonstatic program variables do not appear in the symbol table in .symtab. Local nonstatic program variables are managed at run time on the stack. They are business of the linker.
  • Local procedures variables that are defined with static are not managed on the stack. The compiler allocates spaec in .data or .bss for each definition and dreates a local linker symbol in the table with a unique name.

    For example:

    int f()
    {
      static int x = 0;
      return x;
    }
    
    int g()
    {
      static int x = 1;
      return x;
    }
        

Symbol Resolution

  • strong vs weak symbols

Virtual memory

Practice problems

9.1

| Number of virtual | Number of virtual | Largest possible |
|  address bits (n) | addresses (N)     | virtual address  |
|-------------------+-------------------+------------------|
|                 4 | 2^4 = 16          | 2^(4)-1 = 15     |
|-------------------+-------------------+------------------|
|                14 | 2^14 = 16K        | 2^(14)-1 = 16K-1 |
|-------------------+-------------------+------------------|
|                24 | 2^24 = 16M        | 2^(24) = 16M-1   |
|-------------------+-------------------+------------------|
|                46 | 2^46 = 64T        | 2^(46)-1 = 64T-1 |
|-------------------+-------------------+------------------|
|                54 | 2^54 = 16P        | 2^(54)-1 = 16P-1 |

System-Level I/O

Opening and closing files

  • open

Reading and Writing files

  • read
  • write
  • Authors also show several reading and writing function that they consider “robust”. Some perform “buffered” operations.

Reading File Metadata

  • stat
  • fstat

Network Programming

Practice Problem 11.1

Complete the following table:

Hex addressDotted-decimal address
0x00.0.0.0
0xffffffff255.255.255.255
0x7f000001127.0.0.1
0xcdbca079205.188.160.121
0x400c950d64.12.149.13
0xcdbc9217205.188.146.23

Practice Problem 11.2

Write a program hex2dd.c that converts its hex argument to a dotted-decimal string and prints the result. For example:

linux> ./hex2dd 0x8002c2f2
128.2.194.242
  
/*
  Covert hex argument to a dotted-decimal string and print it.

  (Practice problem 11.2)
*/

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <arpa/inet.h>

#define MAXBUF 100

int main(int argc, char* argv[]) {

    struct in_addr inaddr; /* Address in network byte order (big endian)*/
    uint32_t addr;         /* Address in host byter order */
    char buf[MAXBUF];      /* Buffer for dotted-decimal string */

    if (argc != 2) {
        fprintf(stderr, "usage: %s <hex number>\n", argv[0]);
        exit(0);
    }
    sscanf(argv[1], "%x", &addr); // scan input

    inaddr.s_addr = htonl(addr); // convert input into network order
                                 // and store it in to in_addr structure

    if (!inet_ntop(AF_INET, &inaddr, buf, MAXBUF)) { // convert to dotted-decimal
        printf("error: inet_ntop\n");
        exit(1);
    }
    printf("%s\n", buf);

    return 0;
}

Practice Problem 11.3

Write a program dd2hex.c that converts its dotted-decimal argument to a hex number and prints the result. For example,

linux> ./dd2hex 128.2.194.242
0x8002c2f2
  
/*
  Convert dotted-decimal argument to a hex number and print it.

  (Practice problem 11.3)
*/

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAXBUF 100

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <dotted-decimal value>\n", argv[0]);
        exit(0);
    }

    struct in_addr inaddr; /* Address in network bite order (big endian) */

    if (!inet_pton(AF_INET, argv[1], &inaddr)) {
        printf("error: inet_pton\n");
        exit(1);

    }

    uint32_t addr = ntohl(inaddr.s_addr); // address in host order

    printf("0x%x\n", addr);

    return 0;
}