There are still texture sampling issues that I don't understand. Until
I properly understand this, using (bitmap) fonts that have
power-of-two dimensions seem to produce "acceptable but incorrect"
results.
This still needs to be cleaned up, particularly to properly pass the
texture size around--there are a few unnecessary '128x256' magic
numbers scattered in the code.
I think the original version is more readable, but the newer version
is better overall because it doesn't require reading from dst, and is
able to directly write to a 32-bit dst.