The <uint8_t> template instantiation was causing 8-bit writes to the
command buffer, when they were intended to be 32-bit writes. This
garbled and truncated the data ultimately sent to the VMU LCD.
There were two notable bugs:
- the maple transfer/data sizes were not being set correctly
- align_32byte always realigned the address of `_scene`, and not the
`mem` parameter as expected. This had the effect of the maple-DMA
send and receive buffers being the same buffer. On real hardware,
this causes unpredicable behavior.