A long time ago I set up a real-time kernel for TC 2.x and tested it on a mobile P4 platform with a similar parallel-port square-wave program. I verified the results with an oscilloscope connected to the parallel port pins.
I remember getting successful results for a period of 100 us.
Did you try the cyclictest program to see if everything is OK?
It is available in the rt-tests.tcz extension:
sudo cyclictest -i 1000 -n -t -p 80
That means: the base interval of each thread is 1000 us (-i 1000), use clock_nanosleep (-n), start one thread per CPU (-t), and the highest thread priority is 80 (-p 80).
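In case it helps to see what such a test measures, here is a rough sketch in C of the same idea: sleep until an absolute deadline with clock_nanosleep, then check how late the wake-up was. This is not cyclictest's actual code. The function names (timespec_add_ns, timespec_diff_ns, max_wakeup_latency_ns) are made up for this example, and it does not set a real-time priority, so its numbers will not be directly comparable to a run with -p 80. On older glibc versions you may need to link with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute time by ns nanoseconds (assumes ns < 1 second),
 * keeping tv_nsec inside [0, 1e9). */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Signed difference a - b in nanoseconds. */
static int64_t timespec_diff_ns(const struct timespec *a,
                                const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
           + (a->tv_nsec - b->tv_nsec);
}

/* Sleep to absolute deadlines spaced interval_us apart, `loops` times.
 * Returns the worst wake-up lateness in ns, or -1 on error. */
static int64_t max_wakeup_latency_ns(long interval_us, int loops)
{
    struct timespec next, now;
    int64_t worst = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < loops; i++) {
        /* Absolute deadlines avoid accumulating drift between cycles. */
        timespec_add_ns(&next, interval_us * 1000L);
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        int64_t late = timespec_diff_ns(&now, &next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Calling max_wakeup_latency_ns(1000, 10000) from a small main and printing the result gives a worst-case lateness for a 1000 us interval. For a 100 us square wave the worst case matters more than the average.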
One more thing: a high-resolution timer is also available in the standard TC kernel.
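One way to check this on a running system, as a minimal sketch: ask the kernel for the resolution of CLOCK_MONOTONIC. On Linux a reported resolution of 1 ns normally means high-resolution timers are active, while a value in the millisecond range (one tick) suggests they are not. The function name monotonic_resolution_ns is just for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Return the resolution the kernel reports for CLOCK_MONOTONIC,
 * in nanoseconds, or -1 on error. */
static long monotonic_resolution_ns(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return (long)res.tv_sec * 1000000000L + res.tv_nsec;
}
```

Printing the return value from a small main is enough to see which case applies.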