This adds a new texture memory allocation header,
texture_memory_alloc2.hpp, with two of each memory area.
This also adds two new examples, "cube_textured" and "cube_vq" that
demonstrate using the new texture_memory_alloc2 to perform CORE
rendering, geometry transformation, and tile acceleration
concurrently.
The previous texture_memory_alloc.hpp was written based on an
incorrect understanding of the "32-bit" and "64-bit" texture memory
address mapping.
The primary motivation is to rearrange the texture memory address map
so that "textures" (64-bit access) do not overlap with 32-bit
accesses, such as REGION_BASE or PARAM_BASE.
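
A sketch of why the two kinds of access can collide. It assumes the
commonly described layout of two 4 MiB banks interleaved every 32 bits
when accessed through the 64-bit path; that layout, the constant, and
the function name are assumptions for illustration and are not taken
from texture_memory_alloc2.hpp.

```cpp
#include <cstdint>

// Assumption: 8 MiB of texture memory made of two 4 MiB banks.
constexpr uint32_t bank_bit = 0x400000;

// Translate a "32-bit" texture memory address to the "64-bit" address
// that refers to the same physical byte, under the assumed interleaving:
// the bank select bit moves down to bit 2, and the word index within
// the bank is shifted up by one bit.
constexpr uint32_t map32_to_64(uint32_t a32)
{
  uint32_t bank = (a32 & bank_bit) ? 1 : 0;
  uint32_t byte_in_word = a32 & 3;
  uint32_t word_in_bank = (a32 & (bank_bit - 1)) >> 2;
  return (word_in_bank << 3) | (bank << 2) | byte_in_word;
}

// Consecutive 32-bit words in bank 0 land 8 bytes apart in the 64-bit view,
// and bank 1 fills the gaps between them.
static_assert(map32_to_64(0x000000) == 0x0, "bank 0, word 0");
static_assert(map32_to_64(0x000004) == 0x8, "bank 0, word 1");
static_assert(map32_to_64(0x400000) == 0x4, "bank 1, word 0");
```

Under this assumption, a 32-bit area of N bytes starting at offset X in
bank 0 touches the 64-bit range from 2X to 2X + 2N, which is why the
two maps need to be laid out together.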
Though I did spend much time thinking about this, my idea was not correct.
The "tearing" and "previous frame is being shown while it is being drawn"
artifacts occurred simply because that is exactly what the logic in
holly/core.cpp did.
This is no longer the case--by the time the newly-created core_flip function is
called, the core render is complete, and we should switch the FB_R_SOF1 to the
current framebuffer, not the one that is going to be written on next frame.
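
A minimal sketch of the flip order described above. This is not the
actual core_flip from holly/core.cpp; the struct, its fields, and the
function name are hypothetical stand-ins, and the FB_R_SOF1 register is
modeled as a plain variable.

```cpp
#include <cstdint>

// Hypothetical state for illustration only.
struct display_state {
  uint32_t fb_r_sof1;       // stand-in for the FB_R_SOF1 register
  uint32_t framebuffer[2];  // start addresses of the two framebuffers
  int render_ix;            // framebuffer that CORE has just finished rendering
};

// Called after the core render is complete: display the framebuffer that
// was just rendered, then pick the other one for the next render.
inline void core_flip_sketch(display_state & s)
{
  s.fb_r_sof1 = s.framebuffer[s.render_ix];
  s.render_ix ^= 1;
}
```

The point is only the ordering: the display address is taken from the
index that was just rendered, before that index is advanced.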
This also modifies alt.lds so that (non-startup) code now runs in the P1 area,
with operand/instruction/copyback caches enabled. This caused a 10x speed
increase in my testing.
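
For reference, a small sketch of how the cached P1 and uncached P2
views of the same physical address relate on SH4. The helper names are
made up for this example; alt.lds itself is a linker script and does
not contain this code.

```cpp
#include <cstdint>

// SH4 privileged-mode address areas: the low 29 bits select the
// physical address, the top 3 bits select the area.
constexpr uint32_t physical_mask = 0x1fffffff;
constexpr uint32_t p1_base = 0x80000000;  // cacheable
constexpr uint32_t p2_base = 0xa0000000;  // non-cacheable

constexpr uint32_t to_p1(uint32_t addr) { return (addr & physical_mask) | p1_base; }
constexpr uint32_t to_p2(uint32_t addr) { return (addr & physical_mask) | p2_base; }

// System RAM starts at physical 0x0c000000 on the Dreamcast.
static_assert(to_p1(0xac010000) == 0x8c010000, "P2 -> P1");
static_assert(to_p2(0x8c010000) == 0xac010000, "P1 -> P2");
```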
I think this was only relevant when END_OF_RENDER_VIDEO was in use;
this doesn't seem to affect flycast's END_OF_RENDER_TSP
generation. The former is definitely a flycast bug.
The main issue with the previous code:
    constexpr uint32_t tiles = (640 / 32) * (320 / 32);
Should have been:
    constexpr uint32_t tiles = (640 / 32) * (480 / 32);
The consequence was that some OPBs were being overwritten by
TA_NEXT_OPB, causing corruption (missing triangles, incomplete
drawings) in some tiles.
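
The arithmetic, spelled out as a sketch. The helper name tile_count is
hypothetical; only the two expressions come from the code above.

```cpp
#include <cstdint>

constexpr uint32_t tile_size = 32;  // tiles are 32x32 pixels

// Number of tiles covering a framebuffer of the given dimensions.
constexpr uint32_t tile_count(uint32_t width, uint32_t height)
{
  return (width / tile_size) * (height / tile_size);
}

constexpr uint32_t tiles = tile_count(640, 480);

// 20 columns * 15 rows
static_assert(tiles == 300, "640x480 is 300 tiles");
// The typo (320 instead of 480) only accounted for 20 * 10 tiles,
// leaving 100 tiles' worth of OPBs outside the reserved space.
static_assert(tile_count(640, 320) == 200, "the old value was 200 tiles");
```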
0x15 is 21, which is larger than the OPB size (16).
The 0x15 value directly causes CORE to hang given a sufficiently
"large" object list (more than ~2 triangles per tile).
After changing the pointer burst size to the intended value, 15, CORE
no longer hangs while drawing "large" object lists.
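
A compile-time check along these lines would have caught the hex/decimal
mix-up. The constant and function names are hypothetical; the values
are the ones mentioned above.

```cpp
#include <cstdint>

constexpr uint32_t opb_size = 16;            // OPB size, as stated above
constexpr uint32_t ptr_burst_wrong = 0x15;   // hexadecimal literal: 21
constexpr uint32_t ptr_burst_intended = 15;  // decimal literal: 15

// True when the pointer burst size does not exceed the OPB size.
constexpr bool fits_in_opb(uint32_t ptr_burst_size)
{
  return ptr_burst_size <= opb_size;
}

static_assert(ptr_burst_wrong == 21, "0x15 is 21, not 15");
static_assert(!fits_in_opb(ptr_burst_wrong), "21 is larger than the OPB size");
static_assert(fits_in_opb(ptr_burst_intended), "15 fits");
```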
This enables alpha blending for both font_outline and
font_outline_punch_through.
I have also experimented more with 16-gray vs 256-gray--I have not
decided which of monochrome, 16-gray, or 256-gray I like the
most.
Perhaps a better test would be to render hanzi.
This fully threads both the real minimum size of the texture and the
dimensions of the texture through to the TA parameters.
This also removes spurious zero-area drawing commands (space
characters).
This also adds ta_parameter_writer; I am not certain if I like this,
but it felt necessary to deal with ta_parameters being either 32 or 64
bytes long--for this reason, a reinterpret_cast to a union is less
attractive (the union members are not a fixed size).
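
A rough sketch of the idea, not the actual ta_parameter_writer: a
writer that tracks an offset into a buffer and advances it by the size
of whichever parameter type is appended. The type names, the append
interface, and the absence of bounds checking are all assumptions made
for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-ins for 32-byte and 64-byte TA parameters.
struct param_32 { uint32_t words[8]; };
struct param_64 { uint32_t words[16]; };
static_assert(sizeof(param_32) == 32, "expected a 32-byte parameter");
static_assert(sizeof(param_64) == 64, "expected a 64-byte parameter");

struct parameter_writer_sketch {
  uint8_t * buf;  // caller-provided buffer, assumed suitably aligned and large enough
  size_t offset;  // number of bytes written so far

  explicit parameter_writer_sketch(uint8_t * b) : buf(b), offset(0) {}

  // Construct a zero-initialized T at the current offset and advance by sizeof(T).
  template <typename T>
  T & append()
  {
    static_assert(sizeof(T) == 32 || sizeof(T) == 64,
                  "parameters are either 32 or 64 bytes");
    T * p = new (buf + offset) T();
    offset += sizeof(T);
    return *p;
  }
};
```

Because each append advances by sizeof(T), mixed 32-byte and 64-byte
parameters pack back to back without a fixed-size union.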
This also includes an example for generating a quad primitive. In
flycast, this is very obviously rendered as two triangles. On real
hardware, this appears to be a "native" quad.
This draws a nice macaw texture in a square-shaped triangle
strip. The square is then rotated around the y-axis.
I dealt with myriad bugs while experimenting with this, all of them
entirely my fault:
- macaw texture colors were incorrect because GIMP was exporting raw
RGB data in gamma-corrected sRGB space, whereas the Dreamcast is in
linear color space.
- macaw texture colors were incorrect because I truncated color values
to the least significant rather than most significant bits.
- macaw rotation around the Y axis caused the macaw texture to
distort, stretch and recurse in interesting and unexpected ways. This
was caused by sending Z values in the wrong coordinate space (z), in
contrast to what is expected by the Dreamcast (1/z). Reordering
z-coordinate operations so that the reciprocal is computed last
resolved this.
- macaw rotation around the Y axis caused the macaw texture to warp
unexpectedly, but only on real hardware. This was caused by
unnecessarily negating Z coordinate values.
Behavior for each of the Z-coordinate issues differed between Flycast
and real Dreamcast hardware.
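
Two of the fixes above, sketched under assumptions. The RGB565 texture
format, the rotation sign convention, the z_offset translation, and all
names here are illustrative choices, not the code from this example;
perspective division of x and y is omitted.

```cpp
#include <cmath>
#include <cstdint>

// 8-bit channels to RGB565, keeping the MOST significant bits of each channel.
constexpr uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
  return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// The mistaken variant: keeps the LEAST significant bits instead.
constexpr uint16_t rgb565_lsb_wrong(uint8_t r, uint8_t g, uint8_t b)
{
  return static_cast<uint16_t>(((r & 0x1f) << 11) | ((g & 0x3f) << 5) | (b & 0x1f));
}

struct vertex { float x, y, z; };

// Rotate around the Y axis, translate along Z, and only then take the
// reciprocal, so the value sent as "z" is 1/z of the final position.
inline vertex rotate_y_then_reciprocal(vertex v, float theta, float z_offset)
{
  float x = v.x * std::cos(theta) - v.z * std::sin(theta);
  float z = v.x * std::sin(theta) + v.z * std::cos(theta);
  z += z_offset;
  return { x, v.y, 1.0f / z };
}
```

With the wrong truncation, a mid-intensity red such as 0x80 becomes 0,
which matches the kind of color error described in the second bullet.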
I also did several tests related to SH4 cache behavior, particularly
related to the "copy-back" mode. I verified copy-back behavior on a
real Dreamcast, and experimented with the operand cache write-back
instruction, "ocbwb".
In particular, when the `scene` buffer is accessed from cacheable
memory (e.g. the P1 area) and CCR__CB is enabled, DMA from physical
memory to the TA FIFO polygon converter will fail because the scene
data has not yet been written to physical memory. `ocbwb` can be
used to "write back" `scene` from the SH4 operand cache to physical
memory--only the latter is visible from the CH2-DMA perspective.
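
A sketch of writing back a buffer one cache line at a time before
starting the DMA. The function names are hypothetical, and the
`__sh__` check assumes the SH toolchain predefines that macro; on any
other target the write-back is a no-op so the loop logic can still be
compiled and checked.

```cpp
#include <cstddef>
#include <cstdint>

// The SH4 operand cache uses 32-byte lines.
constexpr uintptr_t cache_line_size = 32;

// Write back one operand cache line (no-op when not compiled for SH).
inline void write_back_line(uintptr_t line)
{
#if defined(__sh__)
  asm volatile ("ocbwb @%0" : : "r" (line) : "memory");
#else
  (void)line;
#endif
}

// Write back every cache line that overlaps [buf, buf + size).
// Returns the number of lines visited.
inline size_t write_back_range(const void * buf, size_t size)
{
  if (size == 0) return 0;
  uintptr_t start = reinterpret_cast<uintptr_t>(buf) & ~(cache_line_size - 1);
  uintptr_t end = reinterpret_cast<uintptr_t>(buf) + size;
  size_t lines = 0;
  for (uintptr_t line = start; line < end; line += cache_line_size) {
    write_back_line(line);
    lines++;
  }
  return lines;
}
```

Something like write_back_range(scene, sizeof (scene)) would be called
after the last write to `scene` and before the CH2-DMA transfer is
started.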