Most modern machines have some form of clock-tick counter. It is trivial to implement (one register, plus an adder in a corner of the die that does nothing but increment it) and very useful for timing inner loops and other things that take less time than one tick of most real-time clocks. An Athlon 1100 can do half a dozen operations (using MMX instructions) in a nanosecond, which makes one-microsecond timing errors seem embarrassingly large.
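To make the idea concrete, here is a minimal sketch of reading such a counter and timing a short loop with it. It is not taken from prectime.h: the names read_ticks and time_sum_loop are made up for illustration, the x86 branch assumes GCC or Clang (which provide __rdtsc in <x86intrin.h>), and the fallback branch assumes a POSIX system and is far coarser than a real cycle counter.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime in the fallback branch */
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>
/* x86 with GCC or Clang: read the processor's time-stamp counter. */
static inline uint64_t read_ticks(void)
{
    return __rdtsc();
}
#else
/* Fallback (assumption): a monotonic clock expressed in nanoseconds. */
static inline uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical example: sum n ints and return the ticks the loop took. */
static inline uint64_t time_sum_loop(const int *data, int n, long *sum_out)
{
    assert(data && sum_out);

    uint64_t start = read_ticks();
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += data[i];
    uint64_t stop = read_ticks();

    *sum_out = sum;          /* store the result so the loop is not optimized away */
    return stop - start;     /* unsigned subtraction survives counter wraparound */
}
```

The returned figure is in ticks, not seconds, and on an out-of-order CPU the counter reads are not strictly ordered against the loop, so very short measurements are best repeated and the smallest value taken.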
One file: prectime.h. It's not a .h file in the sense of being an interface to code elsewhere; everything is inlined. You have to define a symbol to indicate which CPU and compiler you're using. The present options are FOO BAR BAZ