optimisation for 16 bits system

veb veb

Hello everybody,

I've written a Lua interpreter for a 16-bit system, and I found a big optimisation for reading opcodes.

Using obj2asm, I looked at the code that compilers generate to read the opcodes (I tested three: Digital Mars, Borland C++ and Open Watcom), and it wasn't optimised at all for a 16-bit system.

With my code, the opcodes are read more than 10 times faster on a 16-bit system (unless your compilers are better than mine).
My Lua interpreter runs 2 times faster!


To get optimised code output:
In lopcodes.h


// Assumes a 16-bit unsigned int and little-endian storage (word1 is the low half of the opcode).
struct f16
 {
         unsigned int word1; // low 16 bits: reading the opcode as one long (a 32-bit value emulated with two 16-bit registers) is not optimised at all, because the shift uses a loop that moves one bit at a time
         unsigned int word2; // high 16 bits: splitting the opcode into two 16-bit registers needs only one shift per register (no loops)
 };

#define GETARG_A(i)    (cast(int, ((((struct f16 *)&(i))->word1)>>POS_A) & MASK1(SIZE_A,0)))

#define GETARG_B(i)    (cast(int, ((((struct f16 *)&(i))->word2)>>(POS_B-16)) & MASK1(SIZE_B,0)))

#define GETARG_C(i)    (cast(int, (((((struct f16 *)&(i))->word2)<<(16-POS_C)) | ((((struct f16 *)&(i))->word1)>>POS_C)) & MASK1(SIZE_C,0)))

#define GETARG_Bx(i)   (cast(int, (((((struct f16 *)&(i))->word2)<<(16-POS_C)) | ((((struct f16 *)&(i))->word1)>>POS_C)) & MASK1(SIZE_Bx,0)))


#define GETARG_sBx(i)    (GETARG_Bx(i)-MAXARG_sBx)

I'm posting this code so that you can improve the performance of eLua on 16-bit systems.

If you use the code, I would like to be informed.

vebveb
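
For anyone who wants to check the idea on a desktop machine first, here is a minimal sketch (not the code from the post). It compares field extraction on the whole 32-bit instruction with extraction from two 16-bit halves. It assumes the stock Lua 5.1 layout (6-bit opcode, 8-bit A, 9-bit C, 9-bit B); the names ref_field, split_a, split_b, split_c and fields_agree are hypothetical, and the halves are taken with shifts instead of a struct overlay so that the check does not depend on the host's int size or byte order.

```c
#include <stdint.h>

/* Assumed Lua 5.1 field layout; adjust if your lopcodes.h differs. */
enum { SIZE_OP = 6, SIZE_A = 8, SIZE_C = 9, SIZE_B = 9 };
enum { POS_A = SIZE_OP, POS_C = POS_A + SIZE_A, POS_B = POS_C + SIZE_C };

/* Mask with the n lowest bits set (n < 16). */
#define MASK16(n) ((uint16_t)((1u << (n)) - 1u))

/* Reference: shift and mask on the whole 32-bit instruction. */
static unsigned ref_field(uint32_t i, int pos, int size) {
    return (unsigned)((i >> pos) & ((UINT32_C(1) << size) - 1));
}

/* Split versions: every shift operates on a 16-bit value. */
static unsigned split_a(uint32_t i) {
    uint16_t lo = (uint16_t)i;                 /* A lies entirely in the low half */
    return (unsigned)((lo >> POS_A) & MASK16(SIZE_A));
}

static unsigned split_b(uint32_t i) {
    uint16_t hi = (uint16_t)(i >> 16);         /* B lies entirely in the high half */
    return (unsigned)((hi >> (POS_B - 16)) & MASK16(SIZE_B));
}

static unsigned split_c(uint32_t i) {
    uint16_t lo = (uint16_t)i;
    uint16_t hi = (uint16_t)(i >> 16);
    /* C straddles both halves: top bits of lo, bottom bits of hi. */
    uint16_t v = (uint16_t)((uint16_t)(hi << (16 - POS_C)) | (lo >> POS_C));
    return (unsigned)(v & MASK16(SIZE_C));
}

/* Returns 1 if both methods agree on A, B and C for instruction i. */
static int fields_agree(uint32_t i) {
    return split_a(i) == ref_field(i, POS_A, SIZE_A)
        && split_b(i) == ref_field(i, POS_B, SIZE_B)
        && split_c(i) == ref_field(i, POS_C, SIZE_C);
}
```

Calling fields_agree() over a range of instruction words is a cheap way to confirm that a split-word macro set returns the same fields as the stock macros before measuring any speed-up.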


_______________________________________________
eLua-dev mailing list
[hidden email]
https://lists.berlios.de/mailman/listinfo/elua-dev
Bittencourt

Re: optimisation for 16 bits system

Hi vebveb!

Can you please explain this "cast" function that you are using?

--Pedro Bittencourt


On Mon, Aug 23, 2010 at 7:38 AM, veb veb <[hidden email]> wrote:
> [...]

BogdanM

Re: optimisation for 16 bits system

I'm guessing it's just this:

#define cast(type, arg) ((type)(arg))

Thanks for this. I'll try to run your patch on some of our Thumb(2)
ports and see how it goes. Unfortunately we don't have a benchmark
system in place right now, but we can improvise something.

Best,
Bogdan
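
One thing worth checking before trying the patch on a 32-bit port: the struct f16 overlay only lines up with the instruction word when unsigned int is 16 bits wide and the low-order half is stored first in memory. On 32-bit ARM targets unsigned int is normally 32 bits, so the struct would not match a 32-bit instruction there. The sketch below is one possible guard, not part of the patch; the function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* 1 if unsigned int is exactly 16 bits wide, as struct f16 requires. */
static int uint_is_16_bits(void) {
    return UINT_MAX == 0xFFFFu;
}

/* 1 if the low-order 16 bits of a 32-bit value are stored first in memory
   (little-endian order), so that word1 would be the low half. */
static int low_word_first(void) {
    uint32_t probe = UINT32_C(0x11223344);
    uint16_t first;
    memcpy(&first, &probe, sizeof first);   /* read the first two bytes */
    return first == 0x3344;
}

/* 1 only when both assumptions behind the overlay hold on this target. */
static int f16_overlay_is_safe(void) {
    return uint_is_16_bits() && low_word_first();
}
```

If f16_overlay_is_safe() returns 0, a port could simply keep the stock macros.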

On Mon, Aug 23, 2010 at 2:50 PM, Pedro Bittencourt
<[hidden email]> wrote:

> Hi vebveb!
>
> Can you please explain this "cast" function that you are using?
>
> --Pedro Bittencourt
veb veb

Re: optimisation for 16 bits system

In reply to this post by Bittencourt
Hello,

To show how an opcode is read, I will use an example.

On a 16-bit system:

Normal GETARG_A:
#define GETARG_A(i)    (cast(int, ((i)>>POS_A) & MASK1(SIZE_A,0)))
-> #define GETARG_A(i)   ((int) (((i)>>8) & 0xFF))

So, if we have (in binary):
11100011 11011011 10011001 10001000
it is shifted bit by bit to:       (this part is slow, because it uses a loop that shifts one bit at a time: 8*2 shifts here)
00000000 11100011 11011011 10011001   ((i)>>8)
and it becomes:
00000000 00000000 00000000 10011001   (& 0xFF)
and it is cast to an int (16 bits):
00000000 10011001

Optimised GETARG_A:
#define GETARG_A(i)    (cast(int, ((((struct f16 *)&(i))->word1)>>POS_A) & MASK1(SIZE_A,0)))
-> #define GETARG_A(i)  ((int) (((register_right)>>8) & 0xFF))

So, if we have (in binary):
11100011 11011011 10011001 10001000
it is shifted with a single shift: (register_right is 16 bits and holds 10011001 10001000)
00000000 10011001   ((register_right)>>8)
and it becomes:
00000000 10011001   (& 0xFF) (not needed for GETARG_A here, because only 8 bits remain after the shift)
and it is cast to an int if it wasn't one already (16 bits):
00000000 10011001 (not needed for GETARG_A either)

Instead of shifting everything the slow way, we shift only what we need, in a single step.
The other macros (GETARG_B, ...) are fast in the same way (but a bit more complex, because they use both register_left and register_right).
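
The difference between the two routes can be modelled in a few lines of portable C. This is only a sketch under the assumptions of the example above (shift by 8, mask 0xFF): it counts 16-bit register shifts in a simple model and is not a cycle-accurate measurement of any real compiler output. The names pair16, shift_right_bitwise, get_a_slow and get_a_fast are hypothetical.

```c
#include <stdint.h>

/* A 32-bit value held in two 16-bit registers, as on a 16-bit CPU. */
struct pair16 { uint16_t hi, lo; };

/* Shift the pair right by n bits, one bit per pass. Returns the number
   of 16-bit register shifts performed (two per pass). */
static unsigned shift_right_bitwise(struct pair16 *r, unsigned n) {
    unsigned ops = 0;
    while (n-- > 0) {
        unsigned carry = r->hi & 1u;               /* bit moving from hi into lo */
        r->hi = (uint16_t)(r->hi >> 1);
        r->lo = (uint16_t)((r->lo >> 1) | (carry << 15));
        ops += 2;
    }
    return ops;
}

/* Slow route: shift the whole 32-bit pair by 8, then mask. */
static unsigned get_a_slow(uint32_t i, unsigned *ops) {
    struct pair16 r = { (uint16_t)(i >> 16), (uint16_t)i };
    *ops = shift_right_bitwise(&r, 8);
    return r.lo & 0xFFu;
}

/* Fast route: shift only the low register (register_right in the text). */
static unsigned get_a_fast(uint32_t i) {
    uint16_t lo = (uint16_t)i;
    return (unsigned)(lo >> 8) & 0xFFu;
}
```

For the example word 0xE3DB9988, both functions return 0x99; the slow route reports 16 register shifts (8*2), while the fast route performs a single shift on one register.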




From: [hidden email]
Date: Mon, 23 Aug 2010 08:50:35 -0300
To: [hidden email]
Subject: Re: [eLua-dev] optimisation for 16 bits system

Hi vebveb!

Can you please explain this "cast" function that you are using?

--Pedro Bittencourt


Bittencourt

Re: optimisation for 16 bits system

Ok Veb,

Thanks for the explanation =)
But I was only asking about the cast: whether it was just the define Bogdan guessed, or something more elaborate.

thanks,
--Pedro Bittencourt


On Mon, Aug 23, 2010 at 9:24 AM, veb veb <[hidden email]> wrote:
> [...]

veb veb

Re: optimisation for 16 bits system

No, nothing more elaborate: cast(int, ...) is just a cast to an int (it returns an int).
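
As a small illustration, assuming the macro has the same shape as the one in Lua 5.1's llimits.h, it can be tried on its own like this. The helper low_byte_as_int is hypothetical and only there to show a call.

```c
/* Same shape as the definition in Lua 5.1's llimits.h: a plain C cast,
   fully parenthesised so it can be used inside larger expressions. */
#define cast(t, exp) ((t)(exp))

/* Hypothetical helper: mask a wider value first, then cast the result to int. */
static int low_byte_as_int(unsigned long v) {
    return cast(int, v & 0xFFul);
}
```

Note that there must be no space between the macro name and its opening parenthesis in the #define; with a space, the preprocessor would treat it as an object-like macro instead of a function-like one.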



From: [hidden email]
Date: Mon, 23 Aug 2010 09:33:18 -0300
To: [hidden email]
Subject: Re: [eLua-dev] optimisation for 16 bits system

Ok Veb,

Thanks for the explanation =)
But I was only asking about the cast: whether it was just the define Bogdan guessed, or something more elaborate.

thanks,
--Pedro Bittencourt

