Exercise 3.1:

```
LD   A, (LOC1)       LOAD CONTENTS OF LOC1 INTO A
LD   (LOC3), A       STORE ACCUMULATOR INTO LOC3
```

Comparison: Conceptually, they are exactly the same.

Exercise 3.2:

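```
LD   A, (ADR1)       LOAD LOW HALF OF OP1
LD   HL, ADR2        ADDRESS OF LOW HALF OF OP2
ADD  A, (HL)         (OP1 + OP2) LOW
LD   (ADR3), A       STORE RESULT, LOW
LD   A, (ADR1+1)     LOAD HIGH HALF OF OP1
INC  HL              ADDRESS OF HIGH HALF OF OP2
ADC  A, (HL)         (OP1 + OP2) HIGH + CARRY
LD   (ADR3+1), A     STORE RESULT, HIGH
```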
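The byte-wise pattern used in this answer (add the low halves, then add the high halves plus the carry) can be checked with a small Python model. This is only an illustrative sketch: the function name `add16_bytewise` is an assumption and does not come from the text.

```python
def add16_bytewise(op1, op2):
    """Add two 16-bit values one byte at a time, the way ADD then ADC would.

    Returns (16-bit result, carry out of the high byte).
    """
    low_sum = (op1 & 0xFF) + (op2 & 0xFF)                  # ADD: low halves
    carry = low_sum >> 8                                   # carry out of bit 7
    high_sum = ((op1 >> 8) & 0xFF) + ((op2 >> 8) & 0xFF) + carry  # ADC
    result = ((high_sum & 0xFF) << 8) | (low_sum & 0xFF)
    return result, high_sum >> 8


print(hex(add16_bytewise(0x12FF, 0x0001)[0]))  # carry ripples into the high byte
```

The model relies on the carry surviving between the two additions, which holds on the Z80 because the intervening LD and INC HL instructions do not change the carry flag.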

Exercise 3.3:

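```
LD   A, (ADR1-1)     LOAD LOW HALF OF OP1
LD   HL, ADR2-1      ADDRESS OF LOW HALF OF OP2
ADD  A, (HL)         (OP1 + OP2) LOW
LD   (ADR3-1), A     STORE RESULT, LOW
LD   A, (ADR1)       LOAD HIGH HALF OF OP1
INC  HL              ADDRESS OF HIGH HALF OF OP2
ADC  A, (HL)         (OP1 + OP2) HIGH + CARRY
LD   (ADR3), A       STORE RESULT, HIGH
```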

Exercise 3.6:

```
LD   A, (ADR1)       LOAD LOWER HALF OF OP1
LD   HL, ADR2        ADDRESS OF LOWER HALF OF OP2
SUB  A, (HL)         (OP1 - OP2) LOW
LD   (ADR3), A       STORE RES, LOW
LD   A, (ADR1+1)     LOAD HIGHER HALF OF OP1
INC  HL              ADDRESS OF HIGHER HALF OF OP2
SBC  A, (HL)         (OP1 - OP2) HIGH - CARRY
LD   (ADR3+1), A     STORE RES, HIGH
```
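The same idea works for subtraction: the low halves are subtracted first, and the borrow is then taken out of the high halves. A minimal Python sketch, assuming the hypothetical helper name `sub16_bytewise`:

```python
def sub16_bytewise(op1, op2):
    """Subtract op2 from op1 one byte at a time, the way SUB then SBC would.

    Returns (16-bit result, borrow out of the high byte).
    """
    low_diff = (op1 & 0xFF) - (op2 & 0xFF)                 # SUB: low halves
    borrow = 1 if low_diff < 0 else 0                      # carry flag acts as borrow
    high_diff = ((op1 >> 8) & 0xFF) - ((op2 >> 8) & 0xFF) - borrow  # SBC
    result = ((high_diff & 0xFF) << 8) | (low_diff & 0xFF)
    return result, 1 if high_diff < 0 else 0


print(hex(sub16_bytewise(0x1300, 0x0001)[0]))  # borrow taken from the high byte
```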

Exercise 3.7:

```
LD   A, (ADR1)       LOAD OP1
LD   HL, ADR2        ADDRESS OF OP2
SUB  A, (HL)         (OP1 - OP2)
LD   (ADR3), A       STORE RESULT
```

Exercise 3.9:

In general, the result stored in (ADR) would not be a valid BCD value, because DAA would correct the accumulator only after the uncorrected sum had already been stored in memory. So it could be done, but it would give a wrong result.

In this special case, however, the binary addition of "11" and "22" already yields a valid BCD result ("33"), so leaving out DAA would do no harm here.
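The difference between the special case and the general case can be seen in a small Python model of an 8-bit addition followed by a decimal adjust. This is a simplified sketch of DAA after an addition only; the function names are assumptions made for illustration.

```python
def add8(a, b):
    """Binary ADD of two bytes: returns (result, half_carry, carry)."""
    total = a + b
    half_carry = (a & 0x0F) + (b & 0x0F) > 0x0F
    return total & 0xFF, half_carry, total > 0xFF


def daa_after_add(a, half_carry, carry):
    """Decimal-adjust the accumulator after an addition (simplified model)."""
    correction = 0
    if half_carry or (a & 0x0F) > 9:
        correction += 0x06          # fix the low BCD digit
    if carry or a > 0x99:
        correction += 0x60          # fix the high BCD digit
    return (a + correction) & 0xFF


def bcd_add(a, b):
    """Return (uncorrected binary sum, sum after the decimal adjust)."""
    raw, half_carry, carry = add8(a, b)
    return raw, daa_after_add(raw, half_carry, carry)


print([hex(x) for x in bcd_add(0x11, 0x22)])  # both values agree
print([hex(x) for x in bcd_add(0x19, 0x28)])  # the raw sum needs correction
```

Storing the first element of the pair corresponds to storing before DAA: harmless for 11 + 22, wrong for 19 + 28.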

Exercise 3.11: No, because there is no SBC A, (DE).

Exercise 3.12:

```
LD   A, (ADR1)       LOAD LOWER HALF OF OP1
LD   HL, ADR2        ADDRESS OF LOWER HALF OF OP2
SUB  A, (HL)         (OP1 - OP2) LOW
LD   (ADR3), A       STORE (RESULT) LOW
LD   A, (ADR1 + 1)   LOAD HIGHER HALF OF OP1
INC  HL              POINT TO HIGHER HALF OF OP2
SBC  A, (HL)         (OP1 - OP2) HIGH - CARRY
LD   (ADR3 + 1), A   STORE (RESULT) HIGH
```

Exercise 3.14: Very long routine:

```
MPY88      LD   BC, (MPRAD)
           LD   D, 0
           LD   HL, 0
           BIT  0, C
           RL   D
           BIT  1, C
           RL   D
           BIT  2, C
           RL   D
           BIT  3, C
           RL   D
           BIT  4, C
           RL   D
           BIT  5, C
           RL   D
           BIT  6, C
           RL   D
           BIT  7, C
```

Exercise 3.15: Yes, the routine would be 1 byte shorter, but it would take 11 T states longer to execute:

```
DEC B 1 BYTE,  8 X 4 T        = 32 T } 123 T
JR    2 BYTES, 7 X 12 T + 7 T = 91 T }
```

```
DEC B 1 BYTE,  8 X 4 T        = 32 T } 112 T
JP    3 BYTES, 8 X 10 T       = 80 T }
```

Exercise 3.16: Yes, it would be 2 bytes shorter (2 bytes for DJNZ instead of 4 bytes for DEC B plus JP), and it would take 13 T states less to execute:

```
DEC B 1 BYTE,  8 X 4 T        = 32 T } 112 T
JP    3 BYTES, 8 X 10 T       = 80 T }
```

`DJNZ  2 BYTES, 7 X 13 T + 8 T = 99 T`
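The loop-overhead figures quoted in Exercises 3.15 and 3.16 can be recomputed in a few lines of Python. This sketch assumes 8 loop iterations and the per-instruction timings shown above (a conditional JR or DJNZ is cheaper on the final, not-taken pass); the function names are illustrative only.

```python
ITERATIONS = 8


def dec_b_jp():
    """DEC B (4 T) plus JP NZ (10 T, taken or not) on every pass."""
    return ITERATIONS * 4 + ITERATIONS * 10


def dec_b_jr():
    """DEC B (4 T) plus JR NZ: 12 T when taken, 7 T on the last pass."""
    return ITERATIONS * 4 + (ITERATIONS - 1) * 12 + 7


def djnz():
    """DJNZ: 13 T when taken, 8 T on the last pass."""
    return (ITERATIONS - 1) * 13 + 8


print(dec_b_jp(), dec_b_jr(), djnz())
print("JR costs", dec_b_jr() - dec_b_jp(), "T more than JP")
print("DJNZ saves", dec_b_jp() - djnz(), "T compared with DEC B / JP")
```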

Exercise 3.17: Yes, it is 1 byte shorter, and 1 T state less to execute.

Exercise 3.19: Speed makes no difference, because SLA E, RL D takes exactly as many clock cycles as SRL L, RR H.

Original program:

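```
MPY88      LD   BC, (MPRAD)     LOAD MULTIPLIER INTO C
           LD   B, 8            B IS BIT COUNTER
           LD   DE, (MPDAD)     LOAD MULTIPLICAND INTO E
           LD   D, 0            CLEAR D
           LD   HL, 0           SET RESULT TO 0
MULT       SRL  C               SHIFT MULTIPLIER INTO CARRY
           JR   NC, NOADD       SKIP ADD IF CARRY = 0
           ADD  HL, DE          ADD MPD TO RESULT
NOADD      SLA  E               SHIFT MPD LEFT
           RL   D               SAVE BIT IN D
           DEC  B               DECREMENT SHIFT COUNTER
           JP   NZ, MULT        DO IT AGAIN IF COUNTER <> 0
           LD   (RESAD), HL     STORE RESULT
```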

Alternative program:

```
MPY88A     LD   BC, (MPRAD)     LOAD MULTIPLIER INTO C
           LD   B, 8            B IS BIT COUNTER
           LD   DE, (MPDAD)     LOAD MULTIPLICAND INTO E
           LD   D, 0            CLEAR D
           LD   HL, 0           SET RESULT TO 0
MULT       SRL  C               SHIFT MULTIPLIER INTO CARRY
           JR   NC, NOADD       SKIP ADD IF CARRY = 0
           ADD  HL, DE          ADD MPD TO RESULT
NOADD      SRL  L               SHIFT PARTIAL RES RIGHT
           RR   H               SAVE BIT IN H
           DEC  B               DECREMENT SHIFT COUNTER
           JP   NZ, MULT        DO IT AGAIN IF COUNTER <> 0
           LD   (RESAD), HL     STORE RESULT
```

Exercise 3.20: Original program used 504 T states, 252 us. The new program uses 384 T states, 192 us.

Original program:

```
MPY88      LD   BC, (MPRAD)      20 T
           LD   B, 8              7 T
           LD   DE, (MPDAD)      20 T
           LD   D, 0              7 T
           LD   HL, 0            10 T
                                ----- +
                                 64 T
MULT       SRL  C               --  8 T
           JR   NC, NOADD       --  7 T / 12 T
           ADD  HL, DE          -- 11 T
                                ----   ----
                                26 T   20 T
NOADD      SLA  E               --  8 T
           RL   D               --  8 T
           DEC  B               --  4 T
           JP   NZ, MULT        -- 10 T
                                ----   ----
                                56 T   50 T
                                x 4    x 4
                                -----  -----
                                224 T  200 T
                                ------------ +
                                424 T
           LD   (RESAD), HL      16 T
                                ----- +
                                504 T
```

New program:

```
MUL88C     LD   HL, (MPRAD-1)    20 T
           LD   L, 0              7 T
           LD   DE, (MPDAD)      20 T
           LD   D, 0              7 T
           LD   B, 8              7 T
                                ----- +
                                 61 T
MULT       ADD  HL, HL          -- 11 T
           JR   NC, NOADD       --  7 T / 12 T
           ADD  HL, DE          -- 11 T
                                ----   ----
                                29 T   23 T
                                x 4    x 4
                                -----  -----
                                116 T   92 T
                                ------------ +
                                208 T
NOADD      DJNZ MULT             99 T = 7 x 13 + 8
           LD   (RESAD), HL      16 T
                                ----- +
                                384 T
```
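The two totals can be re-derived with a short Python sketch. It rests on several assumptions that should be read as such: half of the 8 multiplier bits are 1 (the "x 4" columns in the listings), the setup subtotals of 64 T and 61 T are taken as given, the result is stored with one 16 T instruction (which is what the totals of 504 T and 384 T imply), and the clock runs at 2 MHz (implied by 504 T = 252 us). The function names are illustrative.

```python
T_STATES_PER_US = 2  # assumption: 2 MHz clock, i.e. 0.5 us per T state


def original_program(ones=4, bits=8):
    setup = 64                    # subtotal from the listing
    with_add = 8 + 7 + 11         # SRL C, JR NC (not taken), ADD HL,DE
    without_add = 8 + 12          # SRL C, JR NC (taken)
    tail = 8 + 8 + 4 + 10         # SLA E, RL D, DEC B, JP NZ
    store = 16                    # assumption: one 16 T store of the result
    return (setup + ones * (with_add + tail)
            + (bits - ones) * (without_add + tail) + store)


def new_program(ones=4, bits=8):
    setup = 61                    # subtotal from the listing
    with_add = 11 + 7 + 11        # ADD HL,HL, JR NC (not taken), ADD HL,DE
    without_add = 11 + 12         # ADD HL,HL, JR NC (taken)
    djnz = (bits - 1) * 13 + 8    # 13 T when taken, 8 T on the last pass
    store = 16                    # assumption: one 16 T store of the result
    return setup + ones * with_add + (bits - ones) * without_add + djnz + store


for name, t in (("original", original_program()), ("new", new_program())):
    print(name, t, "T states,", t / T_STATES_PER_US, "us")
```

Changing `ones` shows how the totals move with the number of 1 bits in the multiplier; the figures in the answer correspond to the average case.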

Exercise 3.21:

```
MUL88D     LD   HL, (MPRAD-1)   (same)
           LD   L, 0            (same)
           LD   B, 0            (different)
           LD   D, 8            (different)
           JP   NZ, MULT        (different)
           RET                  (same)
```

Exercise 3.22: It could destroy the multiplier MPR held in register H, because ADD HL, DE would then add the nonzero contents of register D to H.
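A small Python model of the MUL88C loop shows the effect. It is a sketch under stated assumptions: H starts with the multiplier, L with zero, E with the multiplicand, and the loop runs 8 times as in the listing of Exercise 3.20; the function name and the `d` parameter are hypothetical and exist only to let D be set to a nonzero value.

```python
def mul88c(mpr, mpd, d=0):
    """Model of the 8-pass loop: ADD HL,HL / JR NC / ADD HL,DE / DJNZ."""
    hl = (mpr & 0xFF) << 8                    # H = multiplier, L = 0
    de = ((d & 0xFF) << 8) | (mpd & 0xFF)     # E = multiplicand, D normally 0
    for _ in range(8):
        hl <<= 1                              # ADD HL,HL
        carry = hl >> 16                      # bit shifted out of H
        hl &= 0xFFFF
        if carry:                             # JR NC,NOADD not taken
            hl = (hl + de) & 0xFFFF           # ADD HL,DE
    return hl


print(mul88c(200, 123))        # D = 0: the product of the two bytes
print(mul88c(3, 5, d=1))       # D = 1: the multiplier bits in H are corrupted
```

With D cleared, the partial product never reaches the bits of H that still hold unprocessed multiplier bits; with D nonzero, each addition writes into exactly those bits.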

Exercise 3.23: Advantage: All 16-bit numbers can be loaded in one instruction. Disadvantage: DJNZ is not possible, so the overall routine will be longer and slower.

```
MULT16A    LD   A, 16
           LD   HL, 0
MULT       SRL  C
           RL   B
           EX   DE, HL
           DEC  A
           JP   NZ, MULT
           RET
```

Exercise 3.24: The new code snippet is faster (by 3 T states), but results in longer code (1 byte longer).

New code snippet:

```
SLA  E       2 BYTES, 8 T STATES } 16 T STATES
RL   D       2 BYTES, 8 T STATES }
             ------- +
             4 BYTES
```

Original code snippet:

```
EX   DE, HL  1 BYTE, 4 T STATES  }
ADD  HL, HL  1 BYTE, 11 T STATES } 19 T STATES
EX   DE, HL  1 BYTE, 4 T STATES  }
             ------- +
             3 BYTES
```

Exercise 3.25: The last carry indicates an overflow. However, if we test for a carry at the time RET is reached, the carry will be lost by the ADD HL, HL instruction. We have to save the carry before we test it with JR NC, NOADD, and then retrieve it before the loop is closed. Luckily DJNZ does not change the carry bit. To save the flag, we use PUSH AF, and to retrieve it, we use POP AF. The calling routine can now test for a set carry bit, which indicates an overflow error.

```
MUL16C     LD   A, (MPRAD + 1)
           LD   C, A
           LD   B, 16D
           LD   HL, 0
MULT       SRL  C
           RRA
           PUSH AF              SAVE CARRY FOR LATER
           EX   DE, HL
           POP  AF              RETRIEVE CARRY
           DJNZ MULT
           RET                  IF CARRY IS SET AT THIS POINT,
                                AN OVERFLOW HAS OCCURRED. THE
                                CALLING ROUTINE HAS TO DEAL
                                WITH THAT
```

Exercise 3.26: The registers are used as follows (see Figure A3.1):

Fig. A3.1: Registers Used In Exercise 3.26

We want to use register pair DE to contain the high part of the 32-bit result. For this, we use the following diagram (see Figure A3.2):

Fig. A3.2: Data Flow Between Registers in Exercise 3.26

The multiplication loop MULT can be described as follows:

1. First, we want to shift left MPR (register pair DE) into the carry C ("1-A" in Figure A3.2). If the carry equals "1", the contents of MPD (register pair BC) is added to the contents of HL ("1-B" in Figure A3.2); if the carry equals "0", this addition is skipped.
2. Second, we add HL to itself, or, in other words, shift HL one bit position to the left ("2-A" in Figure A3.2). The value of the carry that results from this shift operation is used in the next loop cycle, i.e., left shift it into register pair DE ("2-B" in Figure A3.2).
3. Third, we decrement the counter A. If this did not result in a zero-value, the loop is continued at step 1.

After step 3 we return to step 1, thus creating a program loop. The loop ends when the value of register A reaches zero.

By rotating register E and then register D, the value of the carry is shifted into the right-most bit of register E, while the left-most bit of register D is shifted into the carry. This way, bit 16 of the result (coming out of register pair HL, and temporarily stored in carry C) "shifts into" register pair DE on the right, while the multiplier MPR "shifts out of" register pair DE on the left, into the carry bit.

Note that we cannot use SLA E as in the answer to Exercise 3.24. We need to use a rotate instruction to shift the carry into the right-most bit of register E. SLA E would replace bit 16 of the result by a zero value. Remember, from the second iteration of the program loop on, the value of the carry bit at the start of the iteration originates from the ADD HL,HL instruction in the previous iteration of the program loop.

We have to combine the RL E instruction with a consecutive RL D instruction to shift the left-most bit of register D into the carry. We use this carry for testing purposes (to decide whether or not we should add MPD to RES), just as before.

Also note that, when the program loop is exited, the carry bit that resulted from the final ADD HL,HL operation has not yet been shifted into register pair DE, as it should be. This means that, for a correct 32-bit result, we must perform this shift operation one more time.

The first time the two instructions "RL E/RL D" are executed (when the program loop is entered), the carry bit is undetermined. RL E rotates this undetermined carry bit into bit 0 of register E, and the combination "RL E/RL D" keeps left shifting this unwanted bit. After the program loop is exited, this bit is located in bit 7 of register D. However, the final "RL E/RL D" code sequence removes it from register D. (By the way, as a result, the original value of the C bit is preserved by this routine.)
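The bookkeeping of the rotations described above can be followed in a small Python model. It models only the "RL E/RL D" pair and treats the carry left by ADD HL,HL as an input list, so it is a sketch of the bit movement rather than of the whole multiplication; all names are assumptions made for illustration.

```python
def rl_e_rl_d(de, carry):
    """RL E followed by RL D: rotate the 17-bit ring (carry, DE) left once."""
    wide = (de << 1) | carry
    return wide & 0xFFFF, wide >> 16          # new DE, new carry (old bit 15)


def run_rotations(mpr, initial_carry, carries_from_add):
    """16 in-loop rotations plus the final one after the loop."""
    de, carry = mpr, initial_carry
    shifted_out = []
    for add_carry in carries_from_add:        # one entry per loop iteration
        de, out = rl_e_rl_d(de, carry)
        shifted_out.append(out)               # multiplier bit tested by JR NC
        carry = add_carry                     # carry left by ADD HL,HL
    de, carry = rl_e_rl_d(de, carry)          # final RL E / RL D
    return de, carry, shifted_out


def bits_msb_first(value):
    return [(value >> i) & 1 for i in range(15, -1, -1)]


de, carry, out = run_rotations(0xA5C3, 1, bits_msb_first(0x1234))
print(hex(de), carry, out == bits_msb_first(0xA5C3))
```

The 16 carries fed in end up in DE in order, the multiplier leaves DE most significant bit first, and the carry that was present on entry is back in the carry flag after the seventeenth rotation.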

```
MUL32  LD   BC,(MPDAD)    LOAD MPD FROM THE MEMORY
       LD   HL,0          INITIALIZE RES
       LD   A,16D         COUNT 16 BITS
MULT   RL   E             SHIFT IN CARRY FROM ADD HL,HL
       RL   D             SHIFT OUT LEFT-MOST BIT OF MPR
       JR   NC,NOADD      CHECK LEFT-MOST BIT OF MPR
       ADD  HL,BC         ADD MPD TO RES
NOADD  ADD  HL,HL         BIT 16 OF THE RESULT (PREVIOUSLY
                          BIT 7 OF REGISTER H) IS NOW
                          CONTAINED IN CARRY BIT C,
                          AND WILL BE SHIFTED INTO BIT 0
                          OF REGISTER E IN THE NEXT
                          ITERATION OF THE LOOP
       DEC  A             DECREMENT COUNTER
       JP   NZ,MULT       CONTINUE UNTIL COUNTER = 0
       RL   E             SHIFT IN CARRY FROM ADD HL,HL
       RL   D
       LD   (RESAD),HL    STORE INTO MEMORY LOWER PART
       LD   (RESAD+2),DE  AND UPPER PART OF 32-BIT RES
```

Exercise 3.27:

The program suggested does not work. In the last iteration, both the quotient and remainder are doubled. This can't be right!

```
DIV168     LD   A,(DVSAD)       LOAD DIVISOR
           LD   D,A             INTO D
           LD   E,0
           LD   B,8             INITIALIZE COUNTER
DIV        XOR  A               CLEAR C BIT
           SBC  HL,DE           DIVIDEND - DIVISOR
           INC  HL              QUOTIENT = QUOTIENT + 1
           JP   P,NOADD         DON'T ADD IF REMAINDER IS
                                POSITIVE
           ADD  HL,DE           RESTORE DIVIDEND
           DEC  HL              QUOTIENT = QUOTIENT - 1
NOADD      ADD  HL,HL           SHIFT DIVIDEND LEFT
           DJNZ DIV             LOOP UNTIL B = 0
```

In fact, there has to be a little piece of code added to this routine (between DJNZ DIV and RET) to make it right:

```
           XOR  A               CLEAR CARRY C
           SBC  HL,DE           FINAL TRIAL-SUBTRACT
           INC  HL              INCREMENT QUOTIENT
           JP   P,EXIT          DON'T ADD IF POSITIVE
           ADD  HL,DE           CORRECT REMAINDER IN H
           DEC  HL              DECREMENT QUOTIENT IN L
EXIT       RET
```

To test the validity of this program, let us divide 320 (0140H) by 7 (07H), and fill out the form below (contents (DE) = 0700H). If you check the table in Figure A3.3, you will see that H and L contain the correct result--H contains the remainder "05H" (5 in decimal), and L the quotient "2DH" (45 in decimal). You may confirm that if we had not performed the additional steps (exclude the light-blue colored rows in Figure A3.3), the result would have been: remainder (H) = 0CH (=12 decimal), quotient (L) = 2CH (=44 decimal).

Note that if the dividend is greater than (16383 + divisor), i.e., if bit 15 of register pair HL is set, the sign flag (S, tested by the condition M) will always be set if the divisor is less than 128 (which means that bit 15 of register pair DE is cleared), and the result will be wrong. Something similar applies when the divisor is larger than 127 and the dividend is less than 16384.

This means that both the divisor and dividend should be positive numbers in the two's complement notation. In fact, this is a 15/7 division, and not a 16/8 division program.
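The worked example (320 divided by 7) can be reproduced with a small Python model of the loop plus the extra final step. The loop structure used here (trial subtract, restore when the result is negative, then shift HL left) is taken from the final-step listing and the trace table; the function names and the `final_step` switch are assumptions made for illustration, and the sketch is only checked against this one example.

```python
def trial_subtract(hl, de):
    """XOR A / SBC HL,DE / INC HL, then ADD HL,DE / DEC HL if negative."""
    hl = (hl - de) & 0xFFFF               # SBC HL,DE with carry cleared
    negative = bool(hl & 0x8000)          # sign flag as left by SBC HL,DE
    hl = (hl + 1) & 0xFFFF                # INC HL (flags unchanged)
    if negative:                          # JP P not taken
        hl = (hl + de) & 0xFFFF           # ADD HL,DE
        hl = (hl - 1) & 0xFFFF            # DEC HL
    return hl


def div168(dividend, divisor, final_step=True):
    """Return (quotient in L, remainder in H)."""
    hl = dividend & 0xFFFF
    de = (divisor & 0xFF) << 8            # divisor in D, E = 0
    for _ in range(8):                    # B counts 8 passes
        hl = trial_subtract(hl, de)
        hl = (hl << 1) & 0xFFFF           # ADD HL,HL
    if final_step:
        hl = trial_subtract(hl, de)       # extra code before RET, no shift
    return hl & 0xFF, hl >> 8


print(div168(320, 7))                     # with the added final step
print(div168(320, 7, final_step=False))   # routine as originally suggested
```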

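| LABEL | INSTRUCTION | B | H | L |
|---|---|---|---|---|
| DIV168 | LD A,(DVSAD) | -- | -- | -- |
| | LD D,A | -- | -- | -- |
| | LD E,0 | -- | -- | -- |
| | LD B,8 | 08 | 01 | 40 |
| DIV | XOR A | 08 | 01 | 40 |
| | SBC HL,DE | 08 | FA | 40 |
| | INC HL | 08 | FA | 41 |
| | ADD HL,DE | 08 | 01 | 41 |
| | DEC HL | 08 | 01 | 40 |
| NOADD | ADD HL,HL | 08 | 02 | 80 |
| | DJNZ DIV | 07 | 02 | 80 |
| DIV | XOR A | 07 | 02 | 80 |
| | SBC HL,DE | 07 | FB | 80 |
| | INC HL | 07 | FB | 81 |
| | ADD HL,DE | 07 | 02 | 81 |
| | DEC HL | 07 | 02 | 80 |
| NOADD | ADD HL,HL | 07 | 05 | 00 |
| | DJNZ DIV | 06 | 05 | 00 |
| DIV | XOR A | 06 | 05 | 00 |
| | SBC HL,DE | 06 | FE | 00 |
| | INC HL | 06 | FE | 01 |
| | ADD HL,DE | 06 | 05 | 01 |
| | DEC HL | 06 | 05 | 00 |
| NOADD | ADD HL,HL | 06 | 0A | 00 |
| | DJNZ DIV | 05 | 0A | 00 |
| DIV | XOR A | 05 | 0A | 00 |
| | SBC HL,DE | 05 | 03 | 00 |
| | INC HL | 05 | 03 | 01 |
| NOADD | ADD HL,HL | 05 | 06 | 02 |
| | DJNZ DIV | 04 | 06 | 02 |
| DIV | XOR A | 04 | 06 | 02 |
| | SBC HL,DE | 04 | FF | 02 |
| | INC HL | 04 | FF | 03 |
| | ADD HL,DE | 04 | 06 | 03 |
| | DEC HL | 04 | 06 | 02 |
| NOADD | ADD HL,HL | 04 | 0C | 04 |
| | DJNZ DIV | 03 | 0C | 04 |
| DIV | XOR A | 03 | 0C | 04 |
| | SBC HL,DE | 03 | 05 | 04 |
| | INC HL | 03 | 05 | 05 |
| NOADD | ADD HL,HL | 03 | 0A | 0A |
| | DJNZ DIV | 02 | 0A | 0A |
| DIV | XOR A | 02 | 0A | 0A |
| | SBC HL,DE | 02 | 03 | 0A |
| | INC HL | 02 | 03 | 0B |
| NOADD | ADD HL,HL | 02 | 06 | 16 |
| | DJNZ DIV | 01 | 06 | 16 |
| DIV | XOR A | 01 | 06 | 16 |
| | SBC HL,DE | 01 | FF | 16 |
| | INC HL | 01 | FF | 17 |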