Answers Chapter 3

Exercise 3.1:

LD   A, (LOC1)       LOAD CONTENTS OF LOC1 INTO A
LD   HL,LOC2         LOAD ADDRESS OF LOC2 INTO HL
ADD  A, (HL)         ADD CONTENTS OF LOC2 TO CONTENTS OF LOC1
LD   (LOC3), A       STORE ACCUMULATOR INTO LOC3

Comparison: Conceptually, they are exactly the same.

Exercise 3.2:

LD   A, (ADR1)       LOAD LOW HALF OF OP1
LD   HL, ADR2        ADDRESS OF LOW HALF OF OP2
ADD  A, (HL)         ADD OP1 AND OP2 LOW
LD   (ADR3), A       STORE RESULT, LOW
LD   A, (ADR1+1)     LOAD HIGH HALF OF OP1
INC  HL              ADDRESS OF HIGH HALF OF OP2
ADC  A, (HL)         (OP1 + OP2) HIGH + CARRY
LD   (ADR3+1), A     STORE RESULT, HIGH
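
The byte-wise addition above can be modelled in a few lines of Python. This is only a sketch for checking the carry handling; the function name add16_bytewise is made up for the illustration and is not part of the exercise. It relies on the fact that the LD and INC HL instructions between ADD and ADC leave the carry flag untouched.

```python
def add16_bytewise(op1, op2):
    """Add two 16-bit values one byte at a time, as ADD followed by ADC would."""
    low = (op1 & 0xFF) + (op2 & 0xFF)          # ADD A,(HL) on the low halves
    carry = low >> 8                           # carry out of the low byte (0 or 1)
    high = (op1 >> 8) + (op2 >> 8) + carry     # ADC A,(HL) on the high halves
    return ((high & 0xFF) << 8) | (low & 0xFF)


if __name__ == "__main__":
    # 12FFH + 0001H: the low byte overflows and the carry bumps the high byte.
    print(hex(add16_bytewise(0x12FF, 0x0001)))
```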

Exercise 3.3:

LD   A, (ADR1-1)     LOAD LOW HALF OF OP1
LD   HL, ADR2-1      ADDRESS OF LOW HALF OF OP2
ADD  A, (HL)         ADD OP1 AND OP2 LOW
LD   (ADR3-1), A     STORE RESULT, LOW
LD   A, (ADR1)       LOAD HIGH HALF OF OP1
INC  HL              ADDRESS OF HIGH HALF OF OP2
ADC  A, (HL)         (OP1 + OP2) HIGH + CARRY
LD   (ADR3), A       STORE RESULT, HIGH

Exercise 3.6:

LD   A, (ADR1)       LOAD LOWER HALF OF OP1
LD   HL, ADR2        ADDRESS OF LOWER HALF OF OP2
SUB  A, (HL)         (OP1 - OP2) LOW
LD   (ADR3), A       STORE RES, LOW
LD   A, (ADR1+1)     LOAD HIGHER HALF OF OP1
INC  HL              ADDRESS OF HIGHER HALF OF OP2
SBC  A, (HL)         (OP1 - OP2) HIGH - CARRY
LD   (ADR3+1), A     STORE RES, HIGH
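
As a rough check of the borrow handling, the same subtraction can be sketched in Python. The name sub16_bytewise is an assumption made for this illustration; the borrow out of the low byte plays the role of the carry flag consumed by SBC.

```python
def sub16_bytewise(op1, op2):
    """Subtract two 16-bit values one byte at a time, as SUB followed by SBC would."""
    low = (op1 & 0xFF) - (op2 & 0xFF)          # SUB on the low halves
    borrow = 1 if low < 0 else 0               # carry flag doubles as borrow
    high = (op1 >> 8) - (op2 >> 8) - borrow    # SBC on the high halves
    return ((high & 0xFF) << 8) | (low & 0xFF)


if __name__ == "__main__":
    # 1300H - 0001H: the low byte borrows from the high byte.
    print(hex(sub16_bytewise(0x1300, 0x0001)))
```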

Exercise 3.7:

LD   A, (ADR1)       LOAD OP1
LD   HL, ADR2        ADDRESS OF OP2
SUB  A, (HL)         (OP1 - OP2)
LD   (ADR3), A       STORE RES

Exercise 3.9:

In general, the result stored in (ADR) would not be a valid BCD value, because the DAA correction would be applied only after the sum had already been stored in memory. So it could be done, but it would give a wrong result.

In this special case, however, the binary addition of "11" and "22" gives "33", which is already a valid BCD value, so leaving out DAA would not produce a wrong result.
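
The difference between the two cases can be illustrated with a simplified Python model of DAA after an addition. This is a sketch under assumptions: bcd_add is a made-up helper, and it models only the adjustment that follows ADD (not the subtraction case).

```python
def bcd_add(a, b):
    """Return (raw binary sum, sum after a DAA-style adjustment) for two packed BCD bytes."""
    raw = a + b
    half_carry = (a & 0x0F) + (b & 0x0F) > 0x0F   # carry out of the low nibble
    carry = raw > 0xFF                            # carry out of the byte
    result = raw & 0xFF
    adjust = 0
    if half_carry or (result & 0x0F) > 9:
        adjust |= 0x06                            # fix the low digit
    if carry or result > 0x99:
        adjust |= 0x60                            # fix the high digit
    return result, (result + adjust) & 0xFF


if __name__ == "__main__":
    for a, b in ((0x11, 0x22), (0x18, 0x25)):
        raw, adjusted = bcd_add(a, b)
        print(f"{a:02X} + {b:02X}: raw {raw:02X}, adjusted {adjusted:02X}")
```

For "11" + "22" the raw and adjusted values are the same, while for "18" + "25" the raw sum 3DH is not valid BCD until it is adjusted to 43H.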

Exercise 3.10: LD A, (DE) instead of LD A,(ADR1).

Exercise 3.11: No, because there is no SBC A, (DE).

Exercise 3.12:

LD   A, (ADR1)       LOAD LOWER HALF OF OP1
LD   HL, ADR2        ADDRESS OF LOWER HALF OF OP2
SUB  A, (HL)         (OP1 - OP2) LOW
DAA                  DECIMAL ADJUST
LD   (ADR3), A       STORE (RESULT) LOW
LD   A, (ADR1 + 1)   LOAD HIGHER HALF OF OP1
INC  HL              POINT TO HIGHER HALF OF OP2
SBC  A, (HL)         (OP1 - OP2) HIGH - CARRY
DAA                  DECIMAL ADJUST
LD   (ADR3 + 1), A   STORE (RESULT) HIGH

Exercise 3.14: Very long routine:

MPY88    LD BC, (MPRAD)     LOAD MULTIPLIER INTO C
         LD DE, (MPDAD)     LOAD MULTIPLICAND INTO E
         LD D, 0            CLEAR D
         LD HL, 0           SET RESULT TO 0
         BIT 0, C           Z IS SET IF THE TESTED BIT IS 0
         JR Z, NOADD0       SKIP THE ADD IF THE BIT IS 0
         ADD HL, DE         ADD MPD TO RESULT
NOADD0   SLA E              SHIFT MPD LEFT
         RL D               SAVE BIT IN D
         BIT 1, C
         JR Z, NOADD1
         ADD HL, DE
NOADD1   SLA E
         RL D
         BIT 2, C
         JR Z, NOADD2
         ADD HL, DE
NOADD2   SLA E
         RL D
         BIT 3, C
         JR Z, NOADD3
         ADD HL, DE
NOADD3   SLA E
         RL D
         BIT 4, C
         JR Z, NOADD4
         ADD HL, DE
NOADD4   SLA E
         RL D
         BIT 5, C
         JR Z, NOADD5
         ADD HL, DE
NOADD5   SLA E
         RL D
         BIT 6, C
         JR Z, NOADD6
         ADD HL, DE
NOADD6   SLA E
         RL D
         BIT 7, C
         JR Z, NOADD7
         ADD HL, DE
NOADD7   LD (RESAD), HL     STORE RESULT
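
The unrolled routine is a shift-and-add multiplication: for every multiplier bit that is 1, the (shifted) multiplicand is added to the result. A minimal Python sketch of that idea follows; the function name mpy88 and the use of a loop instead of eight unrolled copies are choices made for the illustration only.

```python
def mpy88(multiplier, multiplicand):
    """8 x 8 -> 16-bit shift-and-add multiplication, one multiplier bit per step."""
    hl = 0                          # result, as in LD HL,0
    de = multiplicand & 0xFF        # D = 0, E = multiplicand
    for bit in range(8):            # bits 0..7 of the multiplier, lowest first
        if (multiplier >> bit) & 1:
            hl = (hl + de) & 0xFFFF     # ADD HL,DE only when the bit is 1
        de = (de << 1) & 0xFFFF         # SLA E / RL D
    return hl


if __name__ == "__main__":
    print(mpy88(13, 11))
```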

Exercise 3.15: Yes, the routine would be 1 byte shorter, but it would take 11 more T states to execute:

DEC B 1 BYTE,  8 X 4 T        = 32 T } 123 T
JR    2 BYTES, 7 X 12 T + 7 T = 91 T }

DEC B 1 BYTE,  8 X 4 T        = 32 T } 112 T
JP    3 BYTES, 8 X 10 T       = 80 T }

Exercise 3.16: Yes, it would be 2 bytes shorter (DEC B plus JP take 4 bytes, DJNZ takes 2), and it would take 13 fewer T states to execute:

DEC B 1 BYTE,  8 X 4 T        = 32 T } 112 T
JP    3 BYTES, 8 X 10 T       = 80 T }

DJNZ  2 BYTES, 7 X 13 T + 8 T = 99 T
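
The totals in Exercises 3.15 and 3.16 can be recomputed with a small Python sketch. The helper loop_cost is hypothetical; the T-state figures are the ones used in the tallies above, for a loop that runs 8 times (the branch is taken 7 times and falls through once).

```python
ITERATIONS = 8


def loop_cost(taken, not_taken, dec_b=4, iterations=ITERATIONS):
    """T states spent on loop control: the decrement plus the closing branch."""
    branch = (iterations - 1) * taken + not_taken
    return iterations * dec_b + branch


dec_b_jr = loop_cost(taken=12, not_taken=7)         # DEC B + JR NZ
dec_b_jp = loop_cost(taken=10, not_taken=10)        # DEC B + JP NZ
djnz = loop_cost(taken=13, not_taken=8, dec_b=0)    # DJNZ decrements B itself

if __name__ == "__main__":
    print(dec_b_jr, dec_b_jp, djnz)
    print(dec_b_jr - dec_b_jp, dec_b_jp - djnz)
```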

Exercise 3.17: Yes, it is 1 byte shorter, and 1 T state less to execute.

Exercise 3.19: There is no difference in speed, because SLA E followed by RL D takes exactly as many clock cycles as SRL L followed by RR H.

Original program:

MPY88      LD   BC, (MPRAD)     LOAD MULTIPLIER INTO C
           LD   B, 8            B IS BIT COUNTER
           LD   DE, (MPDAD)     LOAD MULTIPLICAND INTO E
           LD   D, 0            CLEAR D
           LD   HL, 0           SET RESULT TO 0
MULT       SRL  C               SHIFT MULTIPLIER INTO CARRY
           JR   NC, NOADD       TEST CARRY
           ADD  HL, DE          ADD MPD TO RESULT
NOADD      SLA  E               SHIFT MPD LEFT
           RL   D               SAVE BIT IN D
           DEC  B               DECREMENT SHIFT COUNTER
           JP   NZ, MULT        DO IT AGAIN IF COUNTER <> 0
           LD   (RESAD), HL     STORE RESULT

Alternative program:

MPY88A     LD   BC, (MPRAD)     LOAD MULTIPLIER INTO C
           LD   B, 8            B IS BIT COUNTER
           LD   DE, (MPDAD)     LOAD MULTIPLICAND INTO E
           LD   D, 0            CLEAR D
           LD   HL, 0           SET RESULT TO 0
MULT       SRL  C               SHIFT MULTIPLIER INTO CARRY
           JR   NC, NOADD       TEST CARRY
           ADD  HL, DE          ADD MPD TO RESULT
NOADD      SRL  L               SHIFT PARTIAL RES RIGHT
           RR   H               SAVE BIT IN H
           DEC  B               DECREMENT SHIFT COUNTER
           JP   NZ, MULT        DO IT AGAIN IF COUNTER <> 0
           LD   (RESAD), HL     STORE RESULT

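Exercise 3.20: The original program uses 504 T states (252 us); the new program uses 384 T states (192 us). Both figures assume a multiplier with four 1 bits and four 0 bits, and a T state of 0.5 us, as in the tallies below.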

Original program:

MPY88      LD   BC, (MPRAD)      20 T
           LD   B, 8              7 T
           LD   DE, (MPDAD)      20 T
           LD   D, 0              7 T
           LD   HL, 0            10 T
                                ----- +
                                 64 T
MULT       SRL  C               --  8 T
           JR   NC, NOADD       --  7 T / 12 T
           ADD  HL, DE          -- 11 T
                                   ----   ----
                                   26 T   20 T
NOADD      SLA  E               --  8 T
           RL   D               --  8 T
           DEC  B               --  4 T
           JP   NZ, MULT        -- 10 T
                                   ----   ----
                                   56 T   50 T
                                  x 4    x 4
                                  -----  -----
                                  224 T  200 T
                                  ------------ +
                                424 T
           LD   (RESAD), HL      16 T
                                ----- +
                                504 T

New program:

MUL88C     LD   HL, (MPRAD-1)    20 T
           LD   L, 0              7 T
           LD   DE, (MPDAD)      20 T
           LD   D, 0              7 T
           LD   B, 8              7 T
                                ----- +
                                 61 T
MULT       ADD  HL, HL          -- 11 T
           JR   NC, NOADD       --  7 T / 12 T
           ADD  HL, DE          -- 11 T
                                   ----   ----
                                   29 T   23 T
                                  x 4    x 4
                                  -----  -----
                                  116 T   92 T
                                  ------------ +
                                208 T
NOADD      DJNZ MULT             99 T = 7 x 13 + 8
           LD   (RESAD), HL      16 T
                                ----- +
                                384 T
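
The two tallies can be cross-checked with a short Python sketch. The function names are made up for the illustration; the per-instruction T states are copied from the listings above, and the split into four 1 bits and four 0 bits, as well as the 0.5 us T state, are the assumptions behind the 252 us and 192 us figures.

```python
T_STATE_US = 0.5    # assumed clock period: 504 T states correspond to 252 us


def original_total(ones=4, zeros=4):
    setup = 20 + 7 + 20 + 7 + 10        # five LD instructions
    tail = 8 + 8 + 4 + 10               # SLA E, RL D, DEC B, JP NZ
    with_add = 8 + 7 + 11 + tail        # SRL C, JR not taken, ADD HL,DE
    without_add = 8 + 12 + tail         # SRL C, JR taken
    return setup + ones * with_add + zeros * without_add + 16


def new_total(ones=4, zeros=4):
    setup = 20 + 7 + 20 + 7 + 7         # five LD instructions
    with_add = 11 + 7 + 11              # ADD HL,HL, JR not taken, ADD HL,DE
    without_add = 11 + 12               # ADD HL,HL, JR taken
    djnz = 7 * 13 + 8                   # taken seven times, falls through once
    return setup + ones * with_add + zeros * without_add + djnz + 16


if __name__ == "__main__":
    for name, total in (("original", original_total()), ("new", new_total())):
        print(name, total, "T states,", total * T_STATE_US, "us")
```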

Exercise 3.21:

MUL88D     LD   HL, (MPRAD-1)   (same)
           LD   L, 0            (same)
           LD   BC, (MPDAD)     (different)
           LD   B, 0            (different)
           LD   D, 8            (different)
MULT       ADD  HL, HL          (same)
           JR   NC, NOADD       (same)
           ADD  HL, BC          (different)
NOADD      DEC  D               (different)
           JP   NZ, MULT        (different)
           LD   (RESAD), HL     (same)
           RET                  (same)

Exercise 3.22: It could destroy the multiplier MPR in register H: if register D held a value other than zero, ADD HL, DE would add that value to H.
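
The effect can be seen in a small Python model of the MUL88C loop from Exercise 3.20. This is a sketch only: the function name mul88c and the extra parameter d (the value left in register D) are assumptions made for the illustration.

```python
def mul88c(mpr, mpd, d=0):
    """Model of the MUL88C loop: H holds the multiplier, L starts at 0, DE = d:mpd."""
    hl = (mpr & 0xFF) << 8                      # H = MPR, L = 0
    de = ((d & 0xFF) << 8) | (mpd & 0xFF)
    for _ in range(8):
        hl <<= 1                                # ADD HL,HL
        carry = hl > 0xFFFF                     # multiplier bit shifted out of H
        hl &= 0xFFFF
        if carry:
            hl = (hl + de) & 0xFFFF             # ADD HL,DE
    return hl


if __name__ == "__main__":
    print(mul88c(3, 5))         # D cleared
    print(mul88c(3, 5, d=1))    # D left non-zero
```

With D cleared, the additions only touch the bits that the shifted-out multiplier has already vacated; with a non-zero D the value added reaches into H as well, and the result is no longer MPR times MPD.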

Exercise 3.23: Advantage: All 16-bit numbers can be loaded in one instruction. Disadvantage: DJNZ is not possible, so the overall routine will be longer and slower.

MULT16A    LD   A, 16
           LD   BC, (MPRAD)
           LD   DE, (MPDAD)
           LD   HL, 0
MULT       SRL  B
           RR   C
           JR   NC, NOADD
           ADD  HL, DE
NOADD      EX   DE, HL
           ADD  HL, HL
           EX   DE, HL
           DEC  A
           JP   NZ, MULT
           LD   (RESAD), HL
           RET
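
A Python sketch of the same 16 x 16 loop is given below, mainly to show that the result kept in HL is only the low 16 bits of the product. The function name mult16 is hypothetical, and the model assumes the multiplier in BC is shifted right as one 16-bit value, lowest bit first.

```python
def mult16(mpr, mpd):
    """16 x 16 shift-and-add multiplication with a 16-bit result register."""
    bc = mpr & 0xFFFF               # multiplier
    de = mpd & 0xFFFF               # multiplicand
    hl = 0                          # result
    for _ in range(16):             # counter in A
        carry = bc & 1              # bit shifted out of the multiplier
        bc >>= 1
        if carry:
            hl = (hl + de) & 0xFFFF     # ADD HL,DE
        de = (de << 1) & 0xFFFF         # EX DE,HL / ADD HL,HL / EX DE,HL
    return hl


if __name__ == "__main__":
    print(mult16(300, 200))
    print(hex(mult16(0x1234, 0x0010)))  # product does not fit in 16 bits
```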

Exercise 3.24: The new code snippet is faster (by 3 T states), but results in longer code (1 byte longer).

New code snippet:

SLA  E       2 BYTES, 8 T STATES } 16 T STATES
RL   D       2 BYTES, 8 T STATES }
             ------- +
             4 BYTES

Original code snippet:

EX   DE, HL  1 BYTE, 4 T STATES  }
ADD  HL, HL  1 BYTE, 11 T STATES } 19 T STATES
EX   DE, HL  1 BYTE, 4 T STATES  }
             ------- +
             3 BYTES

Exercise 3.25: The last carry indicates an overflow. However, if we test for a carry at the time RET is reached, the carry will be lost by the ADD HL, HL instruction. We have to save the carry before we test it with JR NC, NOADD, and then retrieve it before the loop is closed. Luckily DJNZ does not change the carry bit. To save the flag, we use PUSH AF, and to retrieve it, we use POP AF. The calling routine can now test for a set carry bit, which indicates an overflow error.

MUL16C     LD   A, (MPRAD + 1)
           LD   C, A
           LD   A, (MPRAD)
           LD   B, 16D
           LD   DE, (MPDAD)
           LD   HL, 0
MULT       SRL  C
           RRA
           PUSH AF              SAVE CARRY FOR LATER
           JR   NC, NOADD       TEST CARRY
           ADD  HL, DE
NOADD      EX   DE, HL
           ADD  HL, HL
           EX   DE, HL
           POP  AF              RETRIEVE CARRY
           DJNZ MULT
           LD   (RESAD), HL
           RET                  IF CARRY IS SET AT THIS POINT,
                                AN OVERFLOW HAS OCCURRED. THE
                                CALLING ROUTINE HAS TO DEAL
                                WITH THAT

Exercise 3.26: The registers are used as follows (see Figure A3.1):

Fig. A3.1: Registers Used In Exercise 3.26

We want to use register pair DE to contain the high part of the 32-bit result. For this, we use the following diagram (see Figure A3.2):

Fig. A3.2: Data Flow Between Registers in Exercise 3.26

The multiplication loop MULT can be described as follows:

  1. First, we want to shift left MPR (register pair DE) into the carry C ("1-A" in Figure A3.2). If the carry equals "1", the contents of MPD (register pair BC) is added to the contents of HL ("1-B" in Figure A3.2); if the carry equals "0", this addition is skipped.
  2. Second, we add HL to itself, or, in other words, shift HL one bit position to the left ("2-A" in Figure A3.2). The value of the carry that results from this shift operation is used in the next loop cycle, i.e., left shift it into register pair DE ("2-B" in Figure A3.2).
  3. Third, we decrement the counter A. If this did not result in a zero-value, the loop is continued at step 1.

After step 3 we return to step 1, thus creating a program loop. The loop ends when the value of register A reaches zero.

By rotating register E and then register D, the value of the carry is shifted into the right-most bit of register E, while the left-most bit of register D is shifted into the carry. This way, bit 16 of the result (coming out of register pair HL, and temporarily stored in carry C) "shifts in" to register pair DE on the right, while the multiplier MPR "shifts out" of register pair DE on the left, into the carry bit.

Note that we cannot use SLA E as in the answer to Exercise 3.24. We need a rotate instruction to shift the carry into the right-most bit of register E; SLA E would replace bit 16 of the result by a zero. Remember that, from the second iteration of the program loop on, the value of the carry bit at the start of the iteration originates from the ADD HL,HL instruction in the previous iteration.

We have to combine the RL E instruction with a consecutive RL D instruction to shift the left-most bit of register D into the carry. We use this carry for testing purposes (to decide whether or not we should add MPD to RES), just as before.

Also note that, when the program loop is exited, the carry bit that resulted from the final ADD HL,HL operation has not yet been shifted into register pair DE, as it should be. This means that, for a correct 32-bit result, we must perform this shift operation one more time.

The first time the two instructions "RL E/RL D" are executed (when the program loop is entered), the carry bit is undetermined. RL E rotates this undetermined carry bit into bit 0 of register E, and the combination "RL E/RL D" keeps left shifting this unwanted bit. After the program loop is exited, this bit is located in bit 7 of register D. However, the final "RL E/RL D" code sequence removes it from register D. (By the way, as a result, the original value of the C bit is preserved by this routine.)
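
The bookkeeping of the "RL E/RL D" pair described above can be checked with a small Python sketch. It models only the rotations, not the additions: the list result_bits stands for the carries left behind by ADD HL,HL on each pass, and the names rl_pair and trace_rotations are made up for the illustration.

```python
def rl_pair(de, carry):
    """RL E followed by RL D: rotate the 17-bit quantity carry:DE one place left."""
    value = (de << 1) | carry
    return value & 0xFFFF, (value >> 16) & 1


def trace_rotations(mpr, initial_carry, result_bits):
    """Return (final DE, bits shifted out inside the loop, carry after the final pair)."""
    de, carry = mpr, initial_carry
    shifted_out = []
    for bit in result_bits:             # one entry per loop pass (16 in total)
        de, out = rl_pair(de, carry)
        shifted_out.append(out)         # multiplier bit tested by JR NC
        carry = bit                     # carry left by ADD HL,HL
    de, out = rl_pair(de, carry)        # the extra RL E / RL D after the loop
    return de, shifted_out, out


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1]
    de, out_bits, final_carry = trace_rotations(0x8001, 1, bits)
    print(hex(de), out_bits, final_carry)
```

In this model the 16 bits shifted out during the loop are the multiplier bits, left-most first; DE ends up holding the 16 incoming bits; and the carry after the final pair equals the carry that was present on entry.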

MUL32  LD   BC,(MPDAD)    LOAD MPD FROM THE MEMORY
       LD   DE,(MPRAD)    LOAD MPR FROM THE MEMORY
       LD   HL,0          INITIALIZE RES
       LD   A,16D         COUNT 16 BITS
MULT   RL   E             SHIFT IN CARRY FROM ADD HL,HL
       RL   D             SHIFT OUT LEFT-MOST BIT OF MPR
       JR   NC,NOADD      CHECK LEFT-MOST BIT OF MPR
       ADD  HL,BC         ADD MPD TO RES
NOADD  ADD  HL,HL         SHIFT LEFT RES
                          BIT 16 OF THE RESULT (PREVIOUSLY
                          BIT 7 OF REGISTER H) IS NOW
                          CONTAINED IN CARRY BIT C,
                          AND WILL BE SHIFTED INTO BIT 0
                          OF REGISTER E IN THE NEXT
                          ITERATION OF THE LOOP
       DEC  A             DECREMENT COUNTER
       JP   NZ,MULT       CONTINUE UNTIL COUNTER = 0
       RL   E             SHIFT IN CARRY FROM ADD HL,HL
       RL   D
       LD   (RESAD),HL    STORE INTO MEMORY LOWER PART
       LD   (RESAD+2),DE  AND UPPER PART OF 32-BIT RES

Exercise 3.27:

The program suggested does not work. In the last iteration, both the quotient and remainder are doubled. This can't be right!

DIV168     LD   A,(DVSAD)       LOAD DIVISOR
           LD   D,A             INTO D
           LD   E,0
           LD   HL,(DVDAD)      LOAD 16-BIT DIVIDEND
           LD   B,8             INITIALIZE COUNTER
DIV        XOR  A               CLEAR C BIT
           SBC  HL,DE           DIVIDEND - DIVISOR
           INC  HL              QUOTIENT = QUOTIENT + 1
           JP   P,NOADD         TEST IF REMAINDER
                                POSITIVE
           ADD  HL,DE           RESTORE IF NECESSARY
           DEC  HL              QUOTIENT = QUOTIENT - 1
NOADD      ADD  HL,HL           SHIFT DIVIDEND LEFT
           DJNZ DIV             LOOP UNTIL B = 0

In fact, there has to be a little piece of code added to this routine (between DJNZ DIV and RET) to make it right:

           XOR  A               CLEAR CARRY C
           SBC  HL,DE           FINAL TRIAL-SUBTRACT
           INC  HL              INCREMENT QUOTIENT
           JP   P,EXIT          DON'T ADD IF POSITIVE
           ADD  HL,DE           CORRECT REMAINDER IN H
           DEC  HL              DECREMENT QUOTIENT IN L
EXIT       RET

To test the validity of this program, let us divide 320 (0140H) by 7 (07H), and fill out the form below (contents (DE) = 0700H). If you check the table in Figure A3.3, you will see that H and L contain the correct result--H contains the remainder "05H" (5 in decimal), and L the quotient "2DH" (45 in decimal). You may confirm that if we had not performed the additional steps (exclude the light-blue colored rows in Figure A3.3), the result would have been: remainder (H) = 0CH (=12 decimal), quotient (L) = 2CH (=44 decimal).

Note that if the dividend is greater than (16383 + divisor), i.e., if bit 15 of register pair HL is set, the sign flag M will always be set if the divisor is less than 128 (which means that bit 15 of register pair DE is cleared), and the result will be wrong. Something similar applies when the divisor is larger than 127 and the dividend is less than 16384.

This means that both the divisor and dividend should be positive numbers in the two's complement notation. In fact, this is a 15/7 division, and not a 16/8 division program.

LABEL   INSTRUCTION    B   H   L
DIV168  LD A,(DVSAD)   --  --  --
        LD D,A         --  --  --
        LD E,0         --  --  --
        LD HL,(DVDAD)  --  01  40
        LD B,8         08  01  40
DIV     XOR A          08  01  40
        SBC HL,DE      08  FA  40
        INC HL         08  FA  41
        JP P,NOADD     08  FA  41
        ADD HL,DE      08  01  41
        DEC HL         08  01  40
NOADD   ADD HL,HL      08  02  80
        DJNZ DIV       07  02  80
DIV     XOR A          07  02  80
        SBC HL,DE      07  FB  80
        INC HL         07  FB  81
        JP P,NOADD     07  FB  81
        ADD HL,DE      07  02  81
        DEC HL         07  02  80
NOADD   ADD HL,HL      07  05  00
        DJNZ DIV       06  05  00
DIV     XOR A          06  05  00
        SBC HL,DE      06  FE  00
        INC HL         06  FE  01
        JP P,NOADD     06  FE  01
        ADD HL,DE      06  05  01
        DEC HL         06  05  00
NOADD   ADD HL,HL      06  0A  00
        DJNZ DIV       05  0A  00
DIV     XOR A          05  0A  00
        SBC HL,DE      05  03  00
        INC HL         05  03  01
        JP P,NOADD     05  03  01
NOADD   ADD HL,HL      05  06  02
        DJNZ DIV       04  06  02
DIV     XOR A          04  06  02
        SBC HL,DE      04  FF  02
        INC HL         04  FF  03
        JP P,NOADD     04  FF  03
        ADD HL,DE      04  06  03
        DEC HL         04  06  02
NOADD   ADD HL,HL      04  0C  04
        DJNZ DIV       03  0C  04
DIV     XOR A          03  0C  04
        SBC HL,DE      03  05  04
        INC HL         03  05  05
        JP P,NOADD     03  05  05
NOADD   ADD HL,HL      03  0A  0A
        DJNZ DIV       02  0A  0A
DIV     XOR A          02  0A  0A
        SBC HL,DE      02  03  0A
        INC HL         02  03  0B
        JP P,NOADD     02  03  0B
NOADD   ADD HL,HL      02  06  16
        DJNZ DIV       01  06  16
DIV     XOR A          01  06  16
        SBC HL,DE      01  FF  16
        INC HL         01  FF  17
        JP P,NOADD     01  FF  17
        ADD HL,DE      01  06  17
        DEC HL         01  06  16
NOADD   ADD HL,HL      01  0C  2C
        DJNZ DIV       00  0C  2C
        XOR A          00  0C  2C
        SBC HL,DE      00  05  2C
        INC HL         00  05  2D
        JP P,EXIT      00  05  2D
EXIT    RET            00  05  2D

Fig. A3.3: Complete Trace of 16/8 Division Program
(the additional steps, colored light-blue, are the final five rows, from XOR A through EXIT RET)
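
The trace can also be reproduced with a short Python model of the routine. This is a sketch under assumptions: the function name div168 and the final_step switch are made up for the illustration, and the sign test uses bit 15 of HL right after the subtraction, since INC HL does not change the flags.

```python
def div168(dividend, divisor, final_step=True):
    """Model of DIV168; returns (H, L) = (remainder, quotient)."""
    de = (divisor & 0xFF) << 8          # D = divisor, E = 0
    hl = dividend & 0xFFFF

    def trial_subtract(hl):
        hl = (hl - de) & 0xFFFF         # XOR A / SBC HL,DE
        negative = bool(hl & 0x8000)    # sign flag as left by SBC
        hl = (hl + 1) & 0xFFFF          # INC HL (quotient + 1)
        if negative:                    # JP P not taken: restore
            hl = (hl + de) & 0xFFFF     # ADD HL,DE
            hl = (hl - 1) & 0xFFFF      # DEC HL (quotient - 1)
        return hl

    for _ in range(8):                  # B counts 8 passes
        hl = trial_subtract(hl)
        hl = (hl << 1) & 0xFFFF         # ADD HL,HL
    if final_step:                      # the extra code before RET
        hl = trial_subtract(hl)
    return hl >> 8, hl & 0xFF


if __name__ == "__main__":
    print([f"{v:02X}" for v in div168(0x0140, 0x07)])
    print([f"{v:02X}" for v in div168(0x0140, 0x07, final_step=False)])
```

With the additional steps the model gives H = 05H and L = 2DH for 320 divided by 7; without them it gives H = 0CH and L = 2CH, matching the values quoted above.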