Hi Paulo
Here is something else to consider: gcc also keeps the instruction mix in mind when it generates code. You might
want to check out chapters 12, 13, 20, 21, and 22 of this link:
http://www.phatcode.net/res/224/files/html/index.html
If you want more, check out:
http://www.agner.org/optimize/
Optimizing in assembly should usually be the last choice. Focusing on what you are asking the compiler to do and
how you implement algorithms will yield far higher returns. Case in point: I was sorting ~600,000 records using qsort.
The compare function that I provided to qsort performs a string comparison on the first field and, if those are equal,
does the same for the second field. After some thought, I realized I could first compare the pointers for the first field:
if the pointers are equal, the strings are equal, so I can skip straight to the second string compare. That one tiny
change reduced execution time from 1.54 seconds to 1.05 seconds. Gcc compiled that compare function into 102 bytes of
code. I rewrote the function in assembly and got it down to 67 bytes. The "big" payoff for the time spent
hand-optimizing the code: run time dropped to 0.95 seconds. Qsort calls this function 7,258,316 times and I saved a
whole 0.1 seconds. Not the best use of my time. If I really decide I want to make this run faster, I will find a way to
change how I deal with the data.
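
For anyone who wants to see the pointer shortcut spelled out, here is a minimal sketch in C. It is not the actual code from that program: the record layout and the names (struct record, first, second, compare_records) are made up for illustration, and it assumes that records with the same first field often share the same string pointer (interned strings, for example).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout: two string fields used as sort keys. */
struct record {
    const char *first;
    const char *second;
};

/* qsort comparator: order by first field, then by second field. */
static int compare_records(const void *a, const void *b)
{
    const struct record *ra = a;
    const struct record *rb = b;

    /* Identical pointers mean identical strings, so the first strcmp
     * can be skipped. Different pointers may still point at equal
     * strings, so fall back to a full comparison in that case. */
    if (ra->first != rb->first) {
        int c = strcmp(ra->first, rb->first);
        if (c != 0)
            return c;
    }

    /* First fields are equal: decide on the second field. */
    return strcmp(ra->second, rb->second);
}
```

It would be handed to qsort in the usual way, e.g. qsort(recs, n, sizeof recs[0], compare_records), where recs and n are whatever array and count you have. The shortcut only pays off when equal first fields really do share storage; otherwise it costs one extra pointer compare per call.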