The y_size - 3 value was cargo-culted from Darkness.
The correct value appears to be y_size - 1, which also matches what is
documented in DCDBSysArc990907E.
This adds a new texture memory allocation header,
texture_memory_alloc2.hpp, with two of each memory area.
This also adds two new examples, "cube_textured" and "cube_vq", that
demonstrate using the new texture_memory_alloc2 to perform CORE
rendering, geometry transformation, and tile acceleration
concurrently.
The previous texture_memory_alloc.hpp was written based on an
incorrect understanding of the "32-bit" and "64-bit" texture memory
address mapping.
The primary motivation is to rearrange the texture memory address map
so that "textures" (64-bit access) do not overlap with 32-bit
accesses, such as REGION_BASE or PARAM_BASE.
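
As a rough illustration of why the two views can overlap, here is a sketch of
the interleaving as I understand it: texture memory is two 4 MiB banks, the
32-bit view presents the banks back to back, and the 64-bit view alternates
between the banks every 32 bits. The helper name and constants are
hypothetical and are not taken from either texture_memory_alloc header.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8 MiB of texture memory organized as two 4 MiB banks.
constexpr uint32_t bank_size = 4 * 1024 * 1024;

// Hypothetical helper: convert a byte offset in the 32-bit view to the
// byte offset of the same storage as seen through the 64-bit view.
constexpr uint32_t offset32_to_offset64(uint32_t offset32)
{
  const uint32_t bank = (offset32 / bank_size) & 1; // which 4 MiB bank
  const uint32_t word = (offset32 % bank_size) / 4; // 32-bit word index inside the bank
  const uint32_t byte = offset32 % 4;               // byte inside that word
  // the 64-bit view holds one word from each bank in every 8 bytes
  return (word * 8) + (bank * 4) + byte;
}

// The start of the second bank in the 32-bit view shares the first
// 8 bytes of the 64-bit view with the start of the first bank.
static_assert(offset32_to_offset64(0x000000) == 0x0);
static_assert(offset32_to_offset64(0x400000) == 0x4);
```

Under this model, a 32-bit structure placed at the start of either bank
occupies storage that a texture at the start of the 64-bit view would also
use, which is the overlap the new address map is meant to avoid.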
I spent a lot of time thinking about this, but my idea was not correct.
The "tearing" and "previous frame is being shown while it is being drawn"
artifacts occurred simply because that is exactly what the logic in
holly/core.cpp did.
This is no longer the case--by the time the newly-created core_flip function is
called, the core render is complete, and we should switch FB_R_SOF1 to the
current framebuffer, not the one that is going to be written on the next frame.
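
The intended ordering can be sketched roughly as follows. This is not the
actual core_flip from holly/core.cpp; the struct names, the plain-variable
stand-in for the FB_R_SOF1 register, and the two-framebuffer bookkeeping are
all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the hardware register block; on real hardware FB_R_SOF1
// would be a volatile memory-mapped register, not a plain member.
struct fake_holly {
  uint32_t FB_R_SOF1;
};

// Hypothetical state: two framebuffer start addresses and the index of
// the framebuffer that CORE has just finished rendering into.
struct flip_state {
  uint32_t framebuffer_start[2];
  int rendered;
};

// Called only after the core render is complete.
void core_flip_sketch(fake_holly& holly, flip_state& state)
{
  // display the framebuffer whose render just completed
  holly.FB_R_SOF1 = state.framebuffer_start[state.rendered];
  // the other framebuffer becomes the target of the next render
  state.rendered ^= 1;
}
```

The point of the ordering is that the address written to FB_R_SOF1 is never
the framebuffer that the next render will write into.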
This also modifies alt.lds so that (non-startup) code now runs in the P1 area,
with operand/instruction/copyback caches enabled. This caused a 10x speed
increase in my testing.
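
For reference, a small sketch of how the SH-4 P1 and P2 areas alias the same
physical memory: P1 (0x80000000-0x9fffffff) is cacheable, and P2
(0xa0000000-0xbfffffff) is not. The helper names are hypothetical; the actual
change is made in the linker script, not in code like this.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t physical_mask = 0x1fffffff; // low 29 bits select the physical address
constexpr uint32_t area_mask     = 0xe0000000; // top 3 bits select the area
constexpr uint32_t p1_base       = 0x80000000; // cacheable, not translated
constexpr uint32_t p2_base       = 0xa0000000; // non-cacheable, not translated

// Hypothetical helpers: view the same physical address through P1 or P2.
constexpr uint32_t to_p1(uint32_t address)
{
  return (address & physical_mask) | p1_base;
}

constexpr uint32_t to_p2(uint32_t address)
{
  return (address & physical_mask) | p2_base;
}

constexpr bool in_p1(uint32_t address)
{
  return (address & area_mask) == p1_base;
}

// System memory at physical 0x0c000000 is visible at both addresses.
static_assert(to_p1(0xac010000) == 0x8c010000);
static_assert(to_p2(0x8c010000) == 0xac010000);
```

Code linked at a P1 address only benefits from the caches if they are also
enabled in the cache control register, which is why the two changes go
together.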
I think this was only relevant when END_OF_RENDER_VIDEO was in use;
this doesn't seem to affect flycast's END_OF_RENDER_TSP
generation. The former is definitely a flycast bug.