CPCWiki forum
General Category => Programming => Topic started by: AMSDOS on 12:11, 03 May 16

In the "Get the Cash" game I'm making, I've got:
FUNCTION row(xpos : integer) : integer;
BEGIN
row:=xpos*(4*8)
END;
which can be simplified to row:=xpos*32 to get the next row position. I was hoping SRA or SRL could be used instead, but I don't know if the bits are moved along automatically, so if the value was zero, would the result be zero?
The other problem I've got is that my number range exceeds 255, so I'm wondering if SRL (HL) can be used with some sort of loop to check the Carry? :'(
I just felt this would be the most appropriate approach, given I'm multiplying by 32 to get the next cursor position in Mode 0.

Maybe you could mix SRL (hl) and RL (hl) for 16-bit numbers....
Who knows ? ;D

The easiest way to do x * 32, I think, is:
LD HL,x ; x value
ADD HL,HL ; *2
ADD HL,HL ; *4
ADD HL,HL ; *8
ADD HL,HL ; *16
ADD HL,HL ; HL = x*32
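A rough Python model (not Z80 code) of the five ADD HL,HL above, treating HL as a plain int: each ADD HL,HL doubles HL, keeps the low 16 bits and sets the carry flag from bit 15. The names `add_hl_hl` and `times32` are hypothetical, just for illustration. It also answers the zero question: doubling 0 stays 0, because a left shift only ever brings zeros into bit 0.

```python
def add_hl_hl(hl: int) -> tuple[int, int]:
    """Model of ADD HL,HL: returns (new HL, carry flag)."""
    doubled = hl + hl
    return doubled & 0xFFFF, doubled >> 16  # low 16 bits, carry out of bit 15


def times32(x: int) -> int:
    """Five ADD HL,HL in a row: HL = x*32 (mod 65536)."""
    hl = x & 0xFFFF
    for _ in range(5):
        hl, _carry = add_hl_hl(hl)
    return hl


if __name__ == "__main__":
    for xpos in (0, 1, 19):
        print(xpos, times32(xpos))
```

Run as a script, it prints 0 0, 1 32 and 19 608, i.e. the 0 to 608 range of the "Get the Cash" row function.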

You are right Demo ^^

In this case I would put the lower 8 bits in A and the upper 8 bits in another register (B, C, D, E, H or L, not the slow ones). Then use RRCA three times and RRC r / SRL r (depending!). That's way quicker. :)

Demoniak's version in Orgams:
; HL = x
5 ** add hl,hl ; hl=32*x
TFM's idea pumped:
; A = x bits 76543210
; We want HL = ___76543'210_____ (_ is plain 0)
3 ** rrca ; Now bits are: 21076543
ld l,a ; backup
and &1f ; bits 76543
ld h,a
xor l ; bits 210
ld l,a
; HL = x
; Compute x/8*256 with previous trick
; Left as exercise.
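To check madram's 8-bit sequence, here is a small Python model of it (not Z80 code): `rrca` models the RRCA rotate on an 8-bit value, and `times32_8bit` is a hypothetical name that mirrors each instruction line by line and returns the (H, L) pair. It assumes x is an unsigned value from 0 to 255, as the sequence requires.

```python
def rrca(a: int) -> int:
    """Model of RRCA: rotate the 8-bit A right by one, bit 0 wraps to bit 7."""
    return ((a >> 1) | ((a & 1) << 7)) & 0xFF


def times32_8bit(x: int) -> tuple[int, int]:
    """Model of madram's sequence for 0 <= x <= 255; returns (H, L)."""
    a = x & 0xFF
    for _ in range(3):  # 3 ** rrca -> bits 21076543
        a = rrca(a)
    l = a               # ld l,a  (backup)
    a &= 0x1F           # and &1f -> bits ___76543
    h = a               # ld h,a
    a ^= l              # xor l   -> bits 210_____
    l = a               # ld l,a
    return h, l
```

For x = 19 this gives H = &02 and L = &60, i.e. HL = &0260 = 608.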
Better solution yet, if x only increments/decrements: maintain both the x and 32*x values. Then you simply add/subtract 32 to/from the second.
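A minimal sketch of the "maintain both values" idea, assuming x only ever moves by one step at a time. `Cursor` is a hypothetical helper, not something from the thread; Python ints don't wrap like a 16-bit register, so the 0 to 19 range from the thread is assumed.

```python
class Cursor:
    """Hypothetical helper: keeps x and its scaled value x32 == 32 * x in step."""

    STEP = 32  # the scale factor used in the thread

    def __init__(self) -> None:
        self.x = 0
        self.x32 = 0

    def inc(self) -> None:
        self.x += 1
        self.x32 += self.STEP  # a single add instead of a multiply

    def dec(self) -> None:
        self.x -= 1
        self.x32 -= self.STEP
```

On the Z80 the same idea is just an ADD/SUB of 32 on the stored 16-bit value whenever x changes.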
SRA/SRL are right shifts. They can be used for the exercise (SRL inserts 0 into bit 7, while SRA leaves bit 7 as is, which is the correct behavior for signed values).
Besides, (hl) would modify your value in place. Is that what you want?
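A small Python model (not Z80 code) of the two right shifts on an 8-bit register, returning (new value, carry). Both halve the value and move bit 0 into carry, which is why on their own they divide rather than multiply; they only differ in what goes into bit 7. The function names are hypothetical.

```python
def srl(r: int) -> tuple[int, int]:
    """Model of SRL r: shift right, 0 into bit 7, bit 0 into carry."""
    return (r >> 1) & 0x7F, r & 1


def sra(r: int) -> tuple[int, int]:
    """Model of SRA r: shift right, bit 7 kept as is, bit 0 into carry."""
    return ((r >> 1) | (r & 0x80)) & 0xFF, r & 1
```

For &F0 (240 unsigned, -16 signed), SRL gives &78 (120) while SRA gives &F8 (-8), so each is correct for its own interpretation.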

It's worth noting that TFM's method only works for unsigned values between 0 and 255. It's not clear when you say "my number range exceeds 255" whether that's your initial value or the value after multiplication.
Demoniak's version should work for signed or unsigned values, although it's quite limited in how big they can be (since the result has to fit in a 16-bit value): up to 2047 for unsigned values and from -1024 to +1023 for signed values. Anything bigger than that and you wander into the realm of 24/32-bit arithmetic.
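The limits follow from 2047 * 32 = 65504 (the next value, 65536, no longer fits in 16 bits) and -1024 * 32 = -32768, 1023 * 32 = 32736 (the signed 16-bit range is -32768 to 32767). A quick Python check of that arithmetic; the helper names are hypothetical.

```python
def fits_unsigned16(v: int) -> bool:
    return 0 <= v <= 0xFFFF


def fits_signed16(v: int) -> bool:
    return -0x8000 <= v <= 0x7FFF


def largest_unsigned_input(factor: int = 32) -> int:
    """Largest x >= 0 whose product x * factor still fits in 16 bits unsigned."""
    x = 0
    while fits_unsigned16((x + 1) * factor):
        x += 1
    return x


def signed_input_range(factor: int = 32) -> tuple[int, int]:
    """Smallest and largest x whose product fits in 16 bits signed."""
    lo = 0
    while fits_signed16((lo - 1) * factor):
        lo -= 1
    hi = 0
    while fits_signed16((hi + 1) * factor):
        hi += 1
    return lo, hi
```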

How does an ADD HL,HL work for signed please? Assuming that the sign is in the high bit. Or do you talk about something else?
Anyway, for the purpose of a game, negative numbers are not needed.

You could also try
ld hl, x
sla l; rl h
sla l; rl h
sla l; rl h
sla l; rl h
sla l; rl h
On CPC this will take 80 tstates, slightly more time than 5 times add hl, hl (60 tstates).
But you can easily extend it to 32bits by adding rl e; rl d after each sla l; rl h.
ld hl, x
ld de, 0 ; high word of x (0 for a 16-bit x)
sla l; rl h ; rl e; rl d
sla l; rl h ; rl e; rl d
sla l; rl h ; rl e; rl d
sla l; rl h ; rl e; rl d
sla l; rl h ; rl e; rl d
It will take 160 tstates, and the 32-bit value will be in the DEHL pair.
Better speed can be achieved by mixing both approaches:
add hl, hl; rl e; rl d
add hl, hl; rl e; rl d
add hl, hl; rl e; rl d
add hl, hl; rl e; rl d
add hl, hl; rl e; rl d
will take only 140 tstates...
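A rough Python model (not Z80 code) of the mixed "add hl, hl; rl e; rl d" step on the 32-bit DE:HL pair, assuming DE starts with the high word of x (0 for a 16-bit x). ADD HL,HL's carry out of bit 15 feeds RL E, whose carry out of bit 7 feeds RL D; the final carry out of D is dropped, so the result is x*32 mod 2^32. Function names are hypothetical.

```python
def shl_dehl(de: int, hl: int) -> tuple[int, int]:
    """One 'add hl,hl; rl e; rl d' step; returns the new (DE, HL)."""
    total = hl + hl
    hl, carry = total & 0xFFFF, total >> 16  # add hl,hl: carry from bit 15
    e = ((de & 0xFF) << 1) | carry           # rl e: carry into bit 0
    carry, e = e >> 8, e & 0xFF              # bit 7 of E -> carry
    d = (((de >> 8) << 1) | carry) & 0xFF    # rl d (final carry dropped)
    return (d << 8) | e, hl


def times32_dehl(x: int) -> int:
    """Five steps: DE:HL = x*32 (mod 2**32)."""
    de, hl = (x >> 16) & 0xFFFF, x & 0xFFFF
    for _ in range(5):
        de, hl = shl_dehl(de, hl)
    return (de << 16) | hl
```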

How does an ADD HL,HL work for signed please? Assuming that the sign is in the high bit. Or do you talk about something else?
Just the same. It's a matter of interpreting the results.
If you have a number &4001 and you multiply it by 2, you get &8002. Interpreted as an unsigned number, that's fine, but in signed mode the change of the sign bit from 0 to 1 shows you that an overflow has occurred: the result (32770) cannot be represented in 15 bits + sign bit.
Likewise, if you have &BFFF (-16385) and multiply it by 2, you get &7FFE plus a carry. The sign bit has changed again, this time from 1 to 0, so an underflow occurred because -32770 is too small to fit in 15 bits + sign bit.
As 32-bit numbers both examples would work: &FFFFBFFF * 2 = &FFFF7FFE (-32770).
In the first example, &00004001 * 2 = &00008002 (32770) is also correct, because in 32-bit numbers the sign bit is bit 31 and it did not change.
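The worked examples can be checked with a small Python helper that reads a bit pattern as a two's complement number of a given width; `as_signed` is a hypothetical name.

```python
def as_signed(value: int, bits: int) -> int:
    """Read the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value
```

In 16 bits, as_signed(0x4001 * 2, 16) is -32766 and as_signed(0xBFFF * 2, 16) is 32766, both wrong signs, while the 32-bit versions give the correct 32770 and -32770.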

Nope, it only works in a few circumstances. But let's stop nitpicking. ;)

MaV is right. ADD HL,HL and SLA r work perfectly fine whether the value is seen as signed or unsigned.
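Overflow is given by Carry in an unsigned context, and by a change of the sign bit in a signed context. Note that ADD HL,HL leaves P/V untouched and SLA r sets P/V as parity, so if you want P/V to flag signed overflow, use ADC HL,HL with the carry cleared.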
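A Python model (not Z80 code) of doubling a 16-bit value, showing both overflow conditions: unsigned overflow is the bit shifted out of bit 15 (the carry), and signed overflow is the sign bit changing, i.e. carry XOR the new bit 15. `double16` and `to_signed16` are hypothetical names.

```python
def to_signed16(v: int) -> int:
    """Read a 16-bit pattern as two's complement."""
    return v - 0x10000 if v & 0x8000 else v


def double16(hl: int) -> tuple[int, int, int]:
    """Double a 16-bit value; return (result, carry, signed_overflow)."""
    result = (hl << 1) & 0xFFFF
    carry = hl >> 15                          # bit 15 shifted out
    signed_overflow = carry ^ (result >> 15)  # did the sign bit change?
    return result, carry, signed_overflow
```

For MaV's examples, double16(0x4001) gives (&8002, 0, 1) and double16(0xBFFF) gives (&7FFE, 1, 1): no unsigned overflow in the first case, but signed overflow in both.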

Nope, it only works in a few circumstances. But let's stop nitpicking. ;)
By signed I meant two's complement, which always works.

Thanks for replies folks.
So to clear up any confusion, the formula I posted in that function works, I just thought it could be improved with assembly.
The function I made takes a value between 0 and 19 for xpos; when it's multiplied by 32, I get the graphical X coordinate, which is anything from 0 to 608.
I just thought that if SRA or SRL shifted the bits up, they would have been faster than ADD HL,HL, though I'm happy to use whatever is better if I can get a 16-bit number as a result.

TFM's solution is probably the optimal approach under those constraints.

OK, "signed" was always a different thing from two's complement back in the day. Of course with the latter it works just fine. But again, who needs negative numbers in a game? Negative numbers are reserved for bankers and mathematicians ;) :)

TFM's solution is probably the optimal approach under those constraints.
I've tried Demoniak's 1st example which works, but had problems while constructing something from @TFM (http://www.cpcwiki.eu/forum/index.php?action=profile;u=179) comments. :(
This is what I tried:
org &5000
ld a,(ix+00)
ld h,(ix+01)
rrca
rrca
rrca
rrc h
ld (result),a
ld a,h
ld (result1),a
ret
.result defb 0
.result1 defb 0
I tried variations between rrc h and srl h & both, but no luck! :(

Have you tried madram's code in reply #5 under "TFM's idea pumped:"?

Sadly no, it looked different from TFM's comments. :'(

You should. ;)
Rotating the bits alone isn't quite enough, you also need to break them out into different bytes which that code does with some nifty bitwise logic manipulation.

Try this:
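xor a ; A = 0 and carry cleared
ld hl, var ; HL -> 16-bit variable (low byte first)
rld ; low byte *16, its old top nibble -> A
sla (hl) ; low byte *32, its bit 7 -> carry
inc hl ; HL -> high byte (INC HL leaves the flags alone)
rld ; high byte *16, low nibble filled from A
rl (hl) ; high byte *2, carry -> bit 0
var: dw 0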
If you have your variable in ix, replace ld hl, var with push ix; pop hl

Try this:
or a
ld hl, var
rld
sla (hl)
inc hl
rld
rl (hl)
var: dw 0
If you have your variable in ix, replace ld hl, var with push ix; pop hl
I don't see how that can possibly work. Even if the OR A at the start was replaced with an XOR A to ensure it was 0, the result would still be wrong.
EDIT: thinking about it, it should work with an XOR at the start, since the carry flag should still work. Although I suspect it's slower than TFM's approach for initial values up to 255.

Yes, it should be xor a of course; the x got lost while I copied and pasted from Notepad.
It works; you can test it in any emulator. xor a sets A to 0 and clears the carry flag. Then the first rld does *16 on the LSB of the variable at (hl), moving its top nibble into A, and sla does the remaining shift to get to *32, with bit 7 of the LSB going into carry. Then the operation is repeated for the MSB: rld does *16 while bringing in the nibble saved in A, and rl shifts the MSB, putting the carry into bit 0.
The code above is for 16 bit values  for 8bit values you'll only need
rld
sla (hl)
which takes 33 tstates
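A Python model (not Z80 code) of the xor a / rld / sla (hl) / inc hl / rld / rl (hl) sequence, assuming the variable is stored low byte first. `rld` models RLD: the low nibble of (HL) moves to its high nibble, the high nibble of (HL) goes to A's low nibble, and A's low nibble goes to (HL)'s low nibble; RLD and INC HL leave the carry alone. The optional `initial_a` parameter (hypothetical) shows why `or a` fails: it leaves A's old low nibble to be shifted into the result.

```python
def rld(a: int, m: int) -> tuple[int, int]:
    """Model of RLD on A and the byte at (HL); returns (new A, new (HL))."""
    new_m = ((m << 4) | (a & 0x0F)) & 0xFF
    new_a = (a & 0xF0) | (m >> 4)
    return new_a, new_m


def times32_rld(var: bytearray, initial_a: int = 0) -> None:
    """Model of the sequence; var is the little-endian word, updated in place."""
    a = initial_a                           # xor a gives 0 here
    a, var[0] = rld(a, var[0])              # low byte *16, top nibble -> A
    carry = var[0] >> 7                     # sla (hl): bit 7 -> carry
    var[0] = (var[0] << 1) & 0xFF           #           low byte *32
    a, var[1] = rld(a, var[1])              # inc hl / rld: high byte *16 + A nibble
    var[1] = ((var[1] << 1) | carry) & 0xFF  # rl (hl): carry -> bit 0


def times32_value(x: int, initial_a: int = 0) -> int:
    var = bytearray((x & 0xFFFF).to_bytes(2, "little"))
    times32_rld(var, initial_a)
    return int.from_bytes(var, "little")
```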

I've tried Demoniak's 1st example which works, but had problems while constructing something from @TFM (http://www.cpcwiki.eu/forum/index.php?action=profile;u=179) comments. :(
This is what I tried:
org &5000
ld a,(ix+00)
ld h,(ix+01)
rrca
rrca
rrca
rrc h
ld (result),a
ld a,h
ld (result1),a
ret
.result defb 0
.result1 defb 0
I tried variations between rrc h and srl h & both, but no luck! :(
It won't work: you did 3 rotations for the LSB and 1 rotation for the MSB in H, and did not mask the unnecessary bits.
The main idea of TFM's code is to rotate the bits in a byte instead of shifting them. With shifting, you do sla and rl 5 times to get the result, like in the example I gave earlier.
With rotation, you rotate the content by 3 bits in the opposite direction. Madram's example code does 8-bit multiplication by 32 with a 16-bit result.
If you are looking for 16bit*32 multiplication with rotation, the code would probably look as follows:
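ld a, (ix) ; A = low byte of x (bits 76543210)
rrca
rrca
rrca ; A = 21076543
ld h,a
and 0xe0 ; A = 210_____
ld l,a ; L = low byte of x*32
xor h ; A = ___76543
ld h,a
ld a,(ix+1) ; A = high byte of x
rrca
rrca
rrca
and 0xe0 ; keep its bits 2..0, now in bits 7..5
or h ; merge with ___76543
ld h,a ; H = high byte of x*32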
It takes about 100 tstates, which is similar to the rld/sla code for 16-bit values I posted earlier, and uses about twice as much memory...
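A Python model (not Z80 code) of the 16-bit rotate-and-mask version, mirroring it line by line. The low and high bytes stand in for (ix) and (ix+1); `times32_rotate` and `times32_rotate_value` are hypothetical names.

```python
def rrca(a: int) -> int:
    """Model of RRCA on an 8-bit value: bit 0 wraps round to bit 7."""
    return ((a >> 1) | ((a & 1) << 7)) & 0xFF


def times32_rotate(lo: int, hi: int) -> tuple[int, int]:
    """Model of the rotate-and-mask code; returns (H, L)."""
    a = rrca(rrca(rrca(lo)))  # ld a,(ix) / 3 x rrca -> 21076543
    h = a                     # ld h,a
    a &= 0xE0                 # and 0xe0             -> 210_____
    l = a                     # ld l,a
    a ^= h                    # xor h                -> ___76543
    h = a                     # ld h,a
    a = rrca(rrca(rrca(hi)))  # ld a,(ix+1) / 3 x rrca
    a &= 0xE0                 # and 0xe0: hi bits 2..0, now at 7..5
    a |= h                    # or h
    h = a                     # ld h,a
    return h, l


def times32_rotate_value(x: int) -> int:
    h, l = times32_rotate(x & 0xFF, (x >> 8) & 0xFF)
    return (h << 8) | l
```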

Oh well, that's a bit slower than the add hl,hl.
I was just trying something based on the comments, but missed the masking part. But as you can probably tell, I've had little to do with rotation. ;D
Incidentally, I was poking around on CPCRULEZ and had forgotten about the Math Programming pages they have there. This page (http://cpcrulez.fr/coding_srcmathmul.htm) looks interesting; I was just wondering if "Doc n°3" could be useful? It uses RRA, ADD HL,DE, SLA E, RL D and a couple of loops.

Try this:
xor a
ld hl, var
rld
sla (hl)
inc hl
rld
rl (hl)
var: dw 0
If you have your variable in ix, replace ld hl, var with push ix; pop hl
Sorry, I only just noticed this. Hmm, well, the code will be going into my game, which is being written in Hisoft Pascal. I managed to get it to work with the ADD HL,HL routine earlier; in HP a function is defined:
FUNCTION row(xpos : integer) : integer;
The function name and the parameters defined are accessible through the index register like this:
ld l,(ix+02)
ld h,(ix+03)
for xpos, and the value can be put back into row like this:
ld (ix+04),l
ld (ix+05),h
But I'm interested in whether another way is possible, though I'll probably get confused! ???
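A rough Python model of the routine described above, taking the post's offsets at face value (xpos at ix+2/ix+3, the function result at ix+4/ix+5, low byte first). The `frame` bytearray standing in for memory at IX is an assumption for illustration, not a description of Hisoft Pascal's actual stack layout, and `row_via_frame` is a hypothetical name.

```python
def row_via_frame(frame: bytearray) -> None:
    """Model: read xpos from frame[2:4], store xpos*32 into frame[4:6]."""
    hl = frame[2] | (frame[3] << 8)  # ld l,(ix+02) / ld h,(ix+03)
    for _ in range(5):               # 5 x add hl,hl
        hl = (hl + hl) & 0xFFFF
    frame[4] = hl & 0xFF             # ld (ix+04),l
    frame[5] = hl >> 8               # ld (ix+05),h
```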

Oh well, that's a bit slower than the add hl,hl.
Was just trying something based on ones comments, but missed the masking components. But as you can probably tell, I've had little to do with rotation. ;D
Incidentally I was poking around on CPCRULEZ and forgot about Math Programming they had there. This page (http://cpcrulez.fr/coding_srcmathmul.htm) looks interesting, was just wondering if "Doc n°3" could be useful? It uses RRA, ADD HL,DE SLA E RL D and a couple of loops in play.
I had a quick look at the page you linked, and the code there is incorrect. It claims to multiply DE by A, but it only does a partial calculation: look at ld b,6; it should be ld b,8. Here's the correct one:
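ld hl,#0000 ; HL = running product
ld b,#08 ; examine all 8 bits of A
mul8loop:
rrca ; next bit of A -> carry
jr nc,mul8skip
add hl,de ; bit set: add the shifted DE
mul8skip:
sla e
rl d ; DE = DE*2
djnz mul8loop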
This does generic 16-bit*8-bit multiplication, but it will be much slower than the dedicated solution for *32 provided earlier, because the code needs to iterate through all 8 bits of the multiplier. Only the mandatory part (rrca, sla e, rl d, djnz) will take more than 250 tstates, so in your case, if you don't need generic multiplication, it is better to stick to the add hl,hl trick.
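A Python model (not Z80 code) of the corrected shift-and-add loop: each pass rotates the next bit of A into carry, adds DE to HL when it is set, then doubles DE. The `bits` parameter (hypothetical) stands for the count loaded into B, so passing 6 reproduces the bug described above, where the top two bits of A are never examined.

```python
def mul16x8(de: int, a: int, bits: int = 8) -> int:
    """Model of the loop: HL = DE * A (mod 65536) when bits == 8."""
    hl = 0
    for _ in range(bits):
        carry = a & 1                # rrca: bit 0 -> carry ...
        a = (a >> 1) | (carry << 7)  # ... and back into bit 7
        if carry:                    # jr nc,mul8skip
            hl = (hl + de) & 0xFFFF  # add hl,de
        de = (de << 1) & 0xFFFF      # sla e / rl d
    return hl
```

With bits = 8, A ends up rotated back to its original value, as on the Z80.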

Thanks for explaining that. I'm unsure if the code on CPCRULEZ came from a magazine; normally they give the magazine as the source in those cases.