Author Topic: TC and memory (Read 11780 times)

Paulo · « **Reply #30 on:** April 02, 2013, 02:38:37 AM »

Hi genec
You are quite right about microcode and especially the MBR, where every byte is precious as one only has 507 bytes to play with (512, minus 3 for the jump, minus 2 for the end marker).
I don't use gcc for asm but I'm sure there is an option to set the optimization level (-O ?).
When I write bootloaders and kernels for embedded x86 based boards where the absolute position of certain data structures and calls is crucial, I prefer to do my own optimization and not have to wrestle with the compiler as to what code goes where.
That is why I use Fasm.
Sometimes relying on the compiler to automatically do all the optimizations does not yield the expected results.

genec · « **Reply #31 on:** April 02, 2013, 08:59:45 PM »

Quote from: Paulo on April 02, 2013, 02:38:37 AM

Hi genec
You are quite right about microcode and especially the MBR, where every byte is precious as one only has 507 bytes to play with (512, minus 3 for the jump, minus 2 for the end marker).
I don't use gcc for asm but I'm sure there is an option to set the optimization level (-O ?).
When I write bootloaders and kernels for embedded x86 based boards where the absolute position of certain data structures and calls is crucial, I prefer to do my own optimization and not have to wrestle with the compiler as to what code goes where.
That is why I use Fasm.
Sometimes relying on the compiler to automatically do all the optimizations does not yield the expected results.

An MBR has 440 bytes for code; a non-FAT VBR has 507. Floppies only have a VBR.
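
To make the byte arithmetic concrete, here is a rough sketch of the two sector layouts as compile-time checked structs. The MBR fields follow the commonly documented layout (code area, disk signature, reserved bytes, partition table, end marker), and the VBR follows the 3-byte jump plus 2-byte end marker arithmetic from the post above. The struct and field names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative names only; the sizes are what matter here.
struct MbrSector {
    std::uint8_t boot_code[440];       // room for loader code
    std::uint8_t disk_signature[4];
    std::uint8_t reserved[2];
    std::uint8_t partition_table[64];  // 4 entries of 16 bytes each
    std::uint8_t end_marker[2];        // 0x55, 0xAA
};

// Non-FAT VBR as counted in the thread: jump, code, end marker.
struct VbrSector {
    std::uint8_t jump[3];
    std::uint8_t boot_code[507];
    std::uint8_t end_marker[2];        // 0x55, 0xAA
};

// Both layouts must fill exactly one 512-byte sector.
static_assert(sizeof(MbrSector) == 512, "MBR must be 512 bytes");
static_assert(sizeof(VbrSector) == 512, "VBR must be 512 bytes");
static_assert(offsetof(MbrSector, partition_table) == 446,
              "partition table starts at offset 446");
```

A FAT VBR has less room than this because the BIOS parameter block sits between the jump and the code.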

I've used NASM and GAS (the GNU Assembler) and often prefer NASM, but GAS has some nice portability.

Rich · « **Reply #32 on:** April 13, 2013, 04:32:26 PM »

Hi Paulo
Here is something else to consider: gcc also keeps the instruction mix in mind when it generates code. You might
want to check out chapters 12, 13, 20, 21, and 22 of this link:
http://www.phatcode.net/res/224/files/html/index.html
If you want more, check out:
http://www.agner.org/optimize/
Optimizing in assembly should usually be the last choice. Focusing on what you are asking the compiler to do and
how you implement algorithms will yield far higher returns. Case in point: I was sorting ~600,000 records using qsort.
The compare function that I provided to qsort performs a string comparison on the first field and, if those are equal,
does the same for the second field. After some thought, I realized I could compare the pointers for the first field to
determine if they were equal, and if so skip straight to the second string compare. That one tiny change reduced execution
time from 1.54 seconds to 1.05 seconds. gcc compiled that compare function into 102 bytes of code. I wrote the
function in assembly and got it down to 67 bytes. The "big" payoff for the time spent hand-optimizing the code: run
time dropped to 0.95 seconds. qsort calls this function 7,258,316 times and I saved a whole 0.1 seconds. Not the
best use of my time. If I really decide I want to make this run faster, I will find a way to change how I deal with the data.
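
A minimal sketch of that pointer shortcut, not the actual code from the program. It assumes a hypothetical `Record` with two string fields held as pointers, and that records with the same first field often share the same pointer. Identical pointers guarantee identical strings; different pointers prove nothing, so the full string compare is still needed in that case.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical record layout, for illustration only.
struct Record {
    const char *first;
    const char *second;
};

// Comparator in the form qsort expects.
int compare_records(const void *a, const void *b)
{
    const Record *ra = static_cast<const Record *>(a);
    const Record *rb = static_cast<const Record *>(b);

    // Same pointer means same string, so the first strcmp can be skipped.
    if (ra->first != rb->first) {
        int c = std::strcmp(ra->first, rb->first);
        if (c != 0)
            return c;
    }
    // First fields are equal: the second field decides the order.
    return std::strcmp(ra->second, rb->second);
}
```

It would be called as `std::qsort(recs, n, sizeof recs[0], compare_records);` on an array `recs` of `n` records.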

Paulo · « **Reply #33 on:** April 14, 2013, 03:35:30 AM »

Hi Rich

Quote

and I saved a whole 0.1 seconds. Not the best use of my time.

Sounds like corporate meetings, spending hours to take down minutes.

On a serious note, you are of course correct that one has to make a trade-off between time spent
on optimizing the code versus how much it will shave off the execution times.

However there are times where one has to "hand make" the code where a specific function or block of code
has to take a certain amount of time.
An example of this was when I wrote an ATAPI driver for a small microcontroller project several years back.
Some CD-ROMs were very fussy about how much time could elapse between sending the ATAPI packets
and reading the resulting returned data.

Another example is when code is position/offset dependent. I find that the compiler tends to re-organize things
the way it wants to and thus "breaks" the code.
I know there are ways to tell it in a make file but it's just too much bother, hence I prefer the manual method.

So yes, manual optimization is sometimes required but for most cases, it's simply not worth it.

curaga · « **Reply #34 on:** April 14, 2013, 01:50:15 PM »

@Rich

For kicks, try writing it in C++ using std::sort. It inlines the comparison function, and your bottleneck seems to be function calls.
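
A rough sketch of what that could look like, assuming a hypothetical `Record` with two C-string fields (the real record layout is not shown in the thread). Because the comparator is a lambda whose type is known at compile time, the compiler is usually able to inline it, whereas qsort has to call through a function pointer.

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Hypothetical record layout, for illustration only.
struct Record {
    const char *first;
    const char *second;
};

// Sort by first field, then by second field.
void sort_records(std::vector<Record> &records)
{
    std::sort(records.begin(), records.end(),
              [](const Record &a, const Record &b) -> bool {
                  int c = std::strcmp(a.first, b.first);
                  if (c != 0)
                      return c < 0;
                  return std::strcmp(a.second, b.second) < 0;
              });
}
```

Note that std::sort wants a "less than" predicate returning bool, not the negative/zero/positive result that qsort expects.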

Paulo · « **Reply #35 on:** April 14, 2013, 02:25:10 PM »

I'm wondering how the Perl version will stack up.
http://perldoc.perl.org/sort.html

tinypoodle · « **Reply #36 on:** April 14, 2013, 09:11:48 PM »

Quote from: Paulo on April 14, 2013, 02:25:10 PM

http://perldoc.perl.org/sort.html

Way to go to discredit perl...

Paulo · « **Reply #37 on:** April 15, 2013, 09:04:45 AM »

Hi tinypoodle
Don't know what happened there.
The site is obviously rubbish but strangely I have not
had trouble with it before.
Anyway, the point I was trying to make is that perhaps the
sort function of perl is slightly quicker than the C version
once the perl interpreter loads the script and runs.
It would be an interesting exercise to run strace on both
the C and perl versions using the same data and compare results.
