This adds a new texture memory allocation header,
texture_memory_alloc2.hpp, with two of each memory area.
This also adds two new examples, "cube_textured" and "cube_vq", that
demonstrate using the new texture_memory_alloc2 to perform CORE
rendering, geometry transformation, and tile acceleration
concurrently.
This adopts a "writer" concept, loosely inspired by the ta parameter
writer. This might turn out to be a poor idea if the
response/offsets for heterogeneous commands are too inconvenient to
keep track of.
This breaks every example that uses maple--only
example/maple_controller is updated to use the new interface.
The previous texture_memory_alloc.hpp was written based on an
incorrect understanding of the "32-bit" and "64-bit" texture memory
address mapping.
The primary motivation is to rearrange the texture memory address map
so that "textures" (64-bit access) do not overlap with 32-bit
accesses, such as REGION_BASE or PARAM_BASE.
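A minimal sketch of why the overlap matters, assuming the commonly
described layout: 8 MiB of texture memory built from two 4 MiB banks,
where the 64-bit area alternates consecutive 32-bit words between the
banks and the 32-bit area exposes each bank linearly. The function name
and the interleave itself are assumptions for illustration, not taken
from texture_memory_alloc2.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: two 4 MiB banks, 8 MiB total.
constexpr uint32_t bank_size = 0x400000;

// Convert a byte offset in the 64-bit access area to the byte offset of
// the same storage as seen through the 32-bit access area (assumed
// interleave; hypothetical helper).
inline uint32_t offset64_to_offset32(uint32_t offset64)
{
  assert(offset64 < 2 * bank_size);
  uint32_t bank = (offset64 >> 2) & 1; // 32-bit words alternate between banks
  uint32_t word = offset64 >> 3;       // 32-bit word index within the bank
  uint32_t byte = offset64 & 3;        // byte within the 32-bit word
  return bank * bank_size + (word << 2) + byte;
}
```

Under this assumption, a texture that is contiguous in the 64-bit area
occupies the low part of both banks at once, so it can collide with
32-bit data placed at the start of either bank.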
This makes it possible to change the serial baud rate without
uploading a new serial transfer program. I'm not sure how useful this
will be, but it is simple enough to add.
The client program is also substantially improved. Honestly, I do not
understand how/why this works. Experimentally, I found that feeding
the ft232h data in chunks of up to roughly 384 bytes works reliably,
both for reads and writes. Larger chunk sizes are (as expected)
faster, but the transfers do not appear to be consistently correct in
this case.
I have no logical explanation for this. The ft232h has a 1K FIFO for
each of the transmit and receive buffers.
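The chunking described above can be sketched roughly as follows. This
is not the client program's actual code: transfer_chunked and the
write_chunk callback are hypothetical names, and the callback stands in
for whatever call actually moves bytes to or from the ft232h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Found experimentally to be reliable; not derived from the FIFO size.
constexpr size_t max_chunk_size = 384;

// Hypothetical callback: transfers up to `length` bytes and returns the
// number of bytes actually transferred.
using chunk_fn = std::function<size_t(const uint8_t *, size_t)>;

// Feed `buf` to `write_chunk` in pieces of at most max_chunk_size bytes.
// Returns the total number of bytes transferred.
inline size_t transfer_chunked(const uint8_t * buf, size_t length,
                               const chunk_fn& write_chunk)
{
  size_t offset = 0;
  while (offset < length) {
    size_t chunk = std::min(max_chunk_size, length - offset);
    size_t written = write_chunk(&buf[offset], chunk);
    assert(written <= chunk);
    if (written == 0) break; // give up rather than spin forever
    offset += written;
  }
  return offset;
}
```

With a 1000-byte buffer this issues three transfers of 384, 384, and
232 bytes.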
This also enables RTS/CTS hardware flow control. Surprisingly, this
doesn't appear to affect reliability significantly.
As long as the target program voluntarily terminates itself at some
point, the serial_transfer loader is able to load multiple programs
consecutively without requiring a physical power cycle to reload the
transfer program from CD.
The current example.mk juggles between two different "memory layouts",
one for "burn to a physical CD" and another for "load via serial
cable". Because the serial_transfer program now relocates itself to
the end of system memory, the 0x8c010000 area is now usable by
programs that are loaded by serial_transfer.
This still has issues, notably:
Despite the first 16kbytes of audio being loaded prior to starting the
AICA ARM7 CPU, the GDROM drive returns "busy" for the following
~48kbytes. This in turn causes the AICA to play audio from
uninitialized memory.
There is also a separate issue where the timing of changing the start
address of the audio channel causes a faint popping sound throughout
the audio playback.
I should do more timing experiments with the GDROM drive, and improve
this example to play the audio with fewer artifacts.
This combines my iso9660 parsing code with all of the prior gdrom packet
interface / command code.
The example, on real Dreamcast hardware, displays the first 2048 bytes [1] of every
file in the root directory on the serial console.
[1] or the size of the file, whichever is smaller
After thinking about this more, I realized it is probably never useful, and
certainly completely incorrect in all of the cases it was still being used in
the examples.
Necessarily, this means that dma_start must now know what the size of the
response is, so that it can issue the appropriate number of ocbp instructions.
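As a rough illustration of why the size is needed: the SH4 operand
cache uses 32-byte lines, and ocbp acts on one line at a time, so the
number of ocbp instructions depends on how many lines the response
buffer spans. The helper names below are hypothetical, and purge_line
is a stand-in for the ocbp instruction so the sketch can run on a host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uintptr_t cache_line_size = 32; // SH4 operand cache line, in bytes

// Number of cache lines touched by the range [address, address + size).
inline size_t cache_lines_spanned(uintptr_t address, size_t size)
{
  if (size == 0) return 0;
  uintptr_t first = address & ~(cache_line_size - 1);
  uintptr_t last = (address + size - 1) & ~(cache_line_size - 1);
  return (last - first) / cache_line_size + 1;
}

// Calls purge_line once for each cache line in the range. On the SH4,
// purge_line would be the ocbp instruction (hypothetical wrapper).
template <typename F>
inline void purge_range(uintptr_t address, size_t size, F purge_line)
{
  assert(address + size >= address); // the range must not wrap around
  uintptr_t line = address & ~(cache_line_size - 1);
  size_t count = cache_lines_spanned(address, size);
  for (size_t i = 0; i < count; i++) {
    purge_line(line);
    line += cache_line_size;
  }
}
```

Note that a buffer that is not 32-byte aligned can span one more line
than size / 32 would suggest.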
This also cleans up the inconsistent _command_buf and _recieve_buf declarations.
From the GCC manual:
> GCC permits a C structure to have no members:
>     struct empty {
>     };
> The structure has size zero. In C++, empty structures are part of the
> language. G++ treats empty structures as if they had a single member of type
> char.
I was not aware of the different behavior in C++.
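A small check of the C++ behavior, as a sketch. The struct
header_with_empty is a hypothetical example of a layout that ends in an
empty member; it is not one of the actual maple structures.

```cpp
#include <cassert>
#include <cstddef>

struct empty {
};

// Hypothetical layout: a fixed header followed by an empty payload.
struct header_with_empty {
  unsigned char header[4];
  empty payload;
};

// In C++ every complete object type has nonzero size; G++ uses 1 here.
static_assert(sizeof (empty) >= 1, "empty struct is not zero-sized in C++");

// The empty member therefore makes the containing struct larger than
// the header alone, unlike the GNU C behavior quoted above.
static_assert(sizeof (header_with_empty) > 4, "empty member occupies storage");
```

Any code that uses sizeof on such a struct to compute a transfer length
will count the extra storage in C++.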
This fixes every maple example--most were broken for multiple reasons, including
this one.
This also enables SH4 caching. This includes linking code/data into the P1
area (previously this was not the case).
The maple examples (which make heavy use of DMA) need considerable work to
function correctly with the operand and copyback caches. The vibration example
is currently the most complete, though I should think more about how I want to
structure maple response operand cache invalidation in general.
Though I did spend much time thinking about this, my idea was not correct.
The "tearing" and "previous frame is being shown while it is being drawn"
artifacts occurred simply because that's exactly what the logic in
holly/core.cpp did.
This is no longer the case--by the time the newly-created core_flip function is
called, the core render is complete, and we should switch the FB_R_SOF1 to the
current framebuffer, not the one that is going to be written on the next frame.
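A minimal host-side sketch of that ordering. The signature,
display_state, and the framebuffer offsets are hypothetical and do not
reflect the real holly/core.cpp interface; only the rule "display the
framebuffer that was just rendered" comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical framebuffer start offsets; the real values depend on the
// texture memory layout in use.
constexpr uint32_t framebuffer_start[2] = {0x000000, 0x100000};

// Stand-in for the FB_R_SOF1 register so the sketch can run on a host.
struct display_state {
  uint32_t fb_r_sof1;
};

// Call only after the core render into framebuffer `rendered` is
// complete. Displays the framebuffer that was just rendered, and returns
// the index of the framebuffer the next frame should render into.
inline int core_flip(display_state& display, int rendered)
{
  assert(rendered == 0 || rendered == 1);
  display.fb_r_sof1 = framebuffer_start[rendered];
  return rendered ^ 1;
}
```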
This also modifies alt.lds so that (non-startup) code now runs in the P1 area,
with operand/instruction/copyback caches enabled. This caused a 10x speed
increase in my testing.
There are still texture sampling issues that I don't understand. Until
I properly understand this, using (bitmap) fonts that have
power-of-two dimensions seems to produce "acceptable but incorrect"
results.