This enables alpha blending for both font_outline and
font_outline_punch_through.
I have also experimented more with 16-gray vs 256-gray--I have not
decided which of monochrome, 16-gray, or 256-gray I like the most.
A better test might be to render hanzi.
Previously, due to the ordering of .text.p2ram and .bss, the linker
was forced to allocate .bss in the output file, increasing the size
of the final binary unnecessarily.
The linker scripts now preserve debugging symbols during linking.
This fully threads both the real minimum size of the texture and the
dimensions of the texture through to the TA parameters.
This also removes spurious zero-area drawing commands (space
characters).
This still needs to be cleaned up, particularly to properly pass the
texture size around--there are a few unnecessary '128x256' magic
numbers scattered in the code.
I think the original version is more readable, but the newer version
is better overall because it doesn't require reading from dst, and is
able to directly write to a 32-bit dst.
This is very barebones, and uses the serial interface to communicate
the status of the "a" controller button being pressed.
I'd like to make this a more interactive/graphical demo.
The <uint8_t> template instantiation was causing 8-bit writes to the
command buffer, when they were intended to be 32-bit writes. This
garbled and truncated the data ultimately sent to the VMU LCD.
This also adds ta_parameter_writer; I am not certain if I like this,
but it felt necessary to deal with ta_parameters being either 32 or 64
bytes long--for this reason, a reinterpret_cast to a union is less
attractive (the union members are not a fixed size).
There were two notable bugs:
- the maple transfer/data sizes were not being set correctly
- align_32byte always realigned the address of `_scene`, and not the
`mem` parameter as expected. This had the effect of the maple-DMA
send and receive buffers being the same buffer. On real hardware,
this causes unpredictable behavior.
On an emulator, the receive buffer is filled with the correct/expected
data for 'device status'.
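The corrected alignment behavior can be sketched as follows. This is a
hypothetical reconstruction of `align_32byte` (the real signature may
differ); the point is only that the result is derived from the `mem`
argument, not from a fixed buffer such as `_scene`.

```cpp
#include <cstdint>

// Hypothetical reconstruction: rounds `mem` up to the next 32-byte
// boundary. An already-aligned pointer is returned unchanged.
template <typename T>
inline T * align_32byte(T * mem)
{
  const std::uintptr_t mask = 31;
  const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(mem);
  return reinterpret_cast<T *>((address + mask) & ~mask);
}
```

With this version, two distinct buffers passed in for send and receive
yield two distinct aligned addresses. The caller must reserve up to 31
bytes of slack in each buffer for the round-up.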
I found this experiment useful:
- it revealed a bug in my register struct generator code (the
maple_if-related registers were not at the correct offsets)
- it validates my understanding about endianness-swapping between the
maple bus and the SH4
This also includes an example for generating a quad primitive. In
flycast, this is very obviously rendered as two triangles. On real
hardware, this appears to be a "native" quad.
This draws a nice macaw texture in a square-shaped triangle
strip. The square is then rotated around the y-axis.
I dealt with myriad bugs while experimenting with this, all of them
entirely my fault:
- macaw texture colors were incorrect because GIMP was exporting raw
RGB data in gamma-corrected sRGB space, whereas the Dreamcast is in
linear color space.
- macaw texture colors were incorrect because I truncated color values
to the least significant rather than most significant bits.
- macaw rotation around the Y axis caused the macaw texture to
distort, stretch and recurse in interesting and unexpected ways. This
was caused by sending Z values in the wrong coordinate space (z), in
contrast to what is expected by the Dreamcast (1/z). Reordering
z-coordinate operations so that the reciprocal is computed last
resolved this.
- macaw rotation around the Y axis caused the macaw texture to warp
unexpectedly, but only on real hardware. This was caused by
unnecessarily negating Z coordinate values.
Behavior for each of the Z-coordinate issues differed between Flycast
and real Dreamcast hardware.
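The fixes above can be sketched as follows. Everything here is
illustrative: the function names, the RGB565 texture format, the `vec3`
struct, and the rotation sign convention are assumptions, not the code
in this commit. The sketch shows the standard sRGB-to-linear transfer
function, packing that keeps the most significant bits of each channel,
and a y-axis rotation where the reciprocal of z is taken last.

```cpp
#include <cmath>
#include <cstdint>

// Inverse sRGB transfer function; input and output are in [0, 1].
inline float srgb_to_linear(float c)
{
  return (c <= 0.04045f) ? c / 12.92f
                         : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Packs 8-bit channels into RGB565 (assumed format) by keeping the
// MOST significant 5/6/5 bits of each channel.
inline std::uint16_t rgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
  return static_cast<std::uint16_t>(((r >> 3) << 11)
                                  | ((g >> 2) << 5)
                                  |  (b >> 3));
}

struct vec3 { float x, y, z; };

// Rotates about the y-axis, translates along z, and only then converts
// z to 1/z. `z_offset` is a hypothetical parameter that moves the model
// in front of the camera so that z stays positive.
inline vec3 rotate_y_then_invert_z(vec3 v, float theta, float z_offset)
{
  const float x =  v.x * std::cos(theta) + v.z * std::sin(theta);
  float z       = -v.x * std::sin(theta) + v.z * std::cos(theta);
  z += z_offset;
  return {x, v.y, 1.0f / z}; // reciprocal is the final z operation
}
```

Taking the reciprocal before the rotation or translation would mix 1/z
and z values in the same expression, which matches the distortion
described above.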
I also did several tests related to SH4 cache behavior, particularly
related to the "copy-back" mode. I verified copy-back behavior on a
real Dreamcast, and experimented with the operand cache write-back
instruction, "ocbwb".
In particular, when the `scene` buffer is accessed from cacheable
memory, e.g. the P1 area, and CCR__CB is enabled, DMA from physical
memory to the TA FIFO polygon converter will fail because the scene
data has not yet been written to physical memory. `ocbwb` can be
used to "write back" scene from the SH4 operand cache to physical
memory--only the latter is visible from the CH2-DMA perspective.
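A minimal sketch of writing back a buffer before starting CH2-DMA,
assuming hypothetical helper names (`ocbwb`, `write_back_range`) that
are not taken from this repository. `ocbwb` operates on one 32-byte
operand cache line at a time, so every line that the buffer touches
needs its own instruction. On non-SH4 hosts the instruction is compiled
out, which leaves only the line-counting logic.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t cache_line_size = 32; // SH4 operand cache line

// Writes back the operand cache line containing `address`.
inline void ocbwb(std::uintptr_t address)
{
#if defined(__SH4__)
  __asm__ volatile ("ocbwb @%0" : : "r" (address) : "memory");
#else
  (void)address; // no-op when built for a non-SH4 host
#endif
}

// Writes back every cache line overlapped by [start, start + size).
// Returns the number of lines written back.
inline std::size_t write_back_range(std::uintptr_t start, std::size_t size)
{
  if (size == 0) return 0;
  const std::uintptr_t first = start & ~(cache_line_size - 1);
  const std::uintptr_t last = (start + size - 1) & ~(cache_line_size - 1);
  std::size_t lines = 0;
  for (std::uintptr_t line = first; ; line += cache_line_size) {
    ocbwb(line);
    lines++;
    if (line == last) break;
  }
  return lines;
}
```

A caller would pass something like
`reinterpret_cast<std::uintptr_t>(scene)` and the scene size just before
starting the DMA transfer. Accessing the buffer through an uncached
mirror such as the P2 area is an alternative that avoids the write-back
entirely.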
Successfully tested on real hardware at multiple optimization levels.
I knew in the previous commit that __attribute__((aligned(32))) did
not actually align to 32 bytes. However, at -Os specifically, and only
with that exact code, GCC was coincidentally generating a 32-byte
alignment. When the code or optimization level changed, this changed
the alignment of the "scene" buffer, which caused CH2-DMA to perform
incomplete copies of the TA parameters, which in turn variously caused
the TA to generate incomplete/nonsensical/nonexistent object lists.
This also fixes an unrelated issue with the background ISP/TSP
parameters. This "worked" in flycast but not on real hardware by
complete accident (a coincidence of the specific timing that the
ISP/TSP parameters are read in each Dreamcast implementation). The
issue is that the TSP parameters are 60 bytes long, which is greater
than the 32 bytes that were previously allocated. After changing the
allocation to 64 bytes, the background color is now drawn on real
hardware as expected.
In addition, though this did not cause issues yet, I corrected the
length of p1ram/p2ram in the linker script, to prevent future issues
where GCC's memory allocations wrap around past the end of the system
memory address space.
This rearranges scene.cpp to a file organization that more closely
follows which code is responsible for what area of (hardware)
initialization.
All TA and CORE register accesses now use the new ta_bits.h and
core_bits.h, respectively.