Super Xilinx Kart

So here’s an attempt at documenting an old project from last semester. In my Computer Systems Design class we worked with Spartan 3E 500 FPGAs, and we could do basically anything we wanted for the final project, so a friend and I decided to make a Mario Kart clone. We used the Xilinx ISE, XPS version 10.4, and Verilog. The game logic and hardware controls are written in software. Everything else, including all of the graphics, is pure hardware. Essentially we implemented a rudimentary GPU that performs graphics transformations in hardware to overcome the speed limitations of sequential processing on a slow CPU. Lastly, note that most of this is lifted from the written report, so please excuse the formal tone and any grammar mismatches.

We drew a lot of our inspiration from the game Super Mario Kart on the Super Nintendo. This game
utilizes a hardware rendering mode called “mode 7.” In “mode 7” a perspective transform is applied to a background bitmap which gives the illusion of a 3D plane. Our foremost goal in this project was to get this mode working on the FPGA in hardware.

Super Mario Kart Screenshot
Notice how you can clearly see the difference between the character sprites and transformed 2D background

In addition to “mode 7” we also wanted to implement a general purpose hardware framework.
Processors with low clock speeds and small memories often have a hard time rendering large, complex
graphical scenes in real time. We set out to design a coprocessor that could render any graphical scene
you could throw at it and make it all addressable from software. To do this we once again took inspiration from video game consoles of the past. We wanted to build a sprite system where sprites could be loaded from software into special-purpose buffers that would allow for fast rendering. We would additionally have buffers for background bitmaps that we could apply the perspective transforms to.


Proof of concept tests for Rotation, Translation, and Perspective transform equations

The main image processing section of our hardware uses a three-stage pipeline to create the 3D effect. In a typical 2D VGA game, the current pixel coordinates are fed into various components, which output an RGB value that is given to the display. In our design, the same pixel coordinates (X and Y on the screen) are fed into a pipeline which applies three separate transforms to produce a resulting X,Y location. This location is fed into a component which contains a hardware-synthesized course map. The course map is what finally outputs the RGB color that is fed into the VGA module. By mapping screen locations to an internal course map through these transforms, the illusion of three dimensions can be created.
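
To make the data flow concrete, here is a minimal Python sketch of the idea: a screen coordinate passes through a list of transform stages and the result is looked up in a course map. This is only an illustration, not the project's Verilog. The checkerboard `course_map`, the `MAP_SIZE` constant, and the `pixel_color` function are all hypothetical names invented for the example.

```python
MAP_SIZE = 8  # hypothetical course map size, chosen only for this example


def course_map(x, y):
    """Return an RGB tuple for a map coordinate (made-up checkerboard course)."""
    if not (0 <= x < MAP_SIZE and 0 <= y < MAP_SIZE):
        return (0, 0, 0)  # outside the course: black
    return (0, 255, 0) if (x + y) % 2 == 0 else (128, 128, 128)


def pixel_color(screen_x, screen_y, stages):
    """Run a screen coordinate through each transform stage, then look up its color."""
    x, y = screen_x, screen_y
    for stage in stages:
        x, y = stage(x, y)
    return course_map(x, y)


# With no stages the screen simply shows the map unchanged.
print(pixel_color(0, 0, []))
```

Each stage is just a function from one (x, y) pair to another, which is why the three transforms can be chained one after the other.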

The first stage of the pipeline handles image translation. Our goal was to keep the player at the center
of the screen at all times. Using the position of the player, we shifted the course map to achieve this.
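
A minimal sketch of this stage in Python, assuming a 640x480 screen (the resolution is an assumption, as are the function and parameter names): the screen coordinate is re-centered and then offset by the player's position on the map.

```python
SCREEN_W, SCREEN_H = 640, 480  # assumed resolution, not taken from the project


def translate(screen_x, screen_y, player_x, player_y):
    """Map a screen pixel to a course-map coordinate so the player sits at screen center."""
    map_x = screen_x - SCREEN_W // 2 + player_x
    map_y = screen_y - SCREEN_H // 2 + player_y
    return map_x, map_y


# The center pixel always lands on the player's map position.
print(translate(320, 240, 100, 50))
```

Moving the player by one map unit shifts every looked-up coordinate by the same amount, which is what makes the map appear to slide underneath a stationary kart.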

The next stage was rotation. Since sprites implemented in software are very costly in terms of memory, we chose to keep their number low. Rotating the map to match the player’s orientation allowed us to extend the 3D effect as well as use a single kart sprite for every orientation.
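
As a floating-point reference for what this stage computes, here is a small Python sketch of a standard 2D rotation about the player. The `rotate` name, the use of degrees, and the sign convention are assumptions for illustration. The hardware cannot call `math.sin` directly, so treat this only as the math being approximated.

```python
import math


def rotate(dx, dy, heading_deg):
    """Rotate an offset from the player (dx, dy) by the player's heading."""
    theta = math.radians(heading_deg)
    c, s = math.cos(theta), math.sin(theta)
    return round(dx * c - dy * s), round(dx * s + dy * c)


# A point 10 units along +x ends up 10 units along +y after a 90 degree turn.
print(rotate(10, 0, 90))
```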

The last stage was the perspective transform, which we never got fully working. The intent of this transform was to expand the course map as screen_y increased, giving the appearance of depth and a horizon. While initially successful, the perspective output became warped after a bug fix in the translation transform. Translation was a higher priority than perspective, and due to lack of time a solution was never found.

It is worth noting that the perspective transform is the most mathematically complicated of the three, and a perfect version would have taken more time on the mathematics than we had. In the Java mock-up of the different transforms, the transform matrix above was used to generate proportionately scaled perspective. However, we found that implementing exponentiation with variable operands was incredibly difficult on an FPGA, so we used a workaround that resulted in the disproportionate perspectives seen above.
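
For comparison, here is a sketch of the textbook ground-plane projection often used to explain "mode 7" style rendering. It is not the matrix from the Java mock-up and not the hardware workaround, and the `cam_height`, `focal`, and `horizon_y` parameters are hypothetical values picked for the example. It does show where the non-linearity comes from: every scanline needs a division by its distance from the horizon.

```python
def perspective(dx, screen_y, horizon_y=0, cam_height=64, focal=128):
    """Project a screen pixel onto a flat ground plane.

    dx is the horizontal offset from screen center. Returns (lateral, depth)
    on the ground, or None for rows at or above the horizon.
    """
    row = screen_y - horizon_y
    if row <= 0:
        return None  # sky: no ground is visible here
    depth = cam_height * focal / row    # rows near the horizon are far away
    lateral = dx * cam_height / row     # far rows cover more ground per pixel
    return lateral, depth


print(perspective(10, 64))
print(perspective(10, 128))
```

Because the divisor depends only on the scanline, one possible hardware-friendly approach would be a per-row reciprocal table, so the pipeline only multiplies. That is a suggestion, not something the project implemented.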

Sprites are graphical objects that are handled separately from the rest of a scene in graphical overlays.
They are a method of integrating unrelated bitmaps in order to create a full scene. The main benefit of
sprites is that they use a very small amount of memory and processing power compared to rendering a
full scene. They consist of a series of bitmaps that can be overlaid on the rest of the scene that the graphics processor is rendering. This means that the processor only has to keep track of the sprite’s location and its current frame (its pose or animation progress). Below you can see an example of a sprite sheet from Super Mario Kart.
As you can see each of the squares contains Mario riding his kart in a different orientation. Game
programmers can use this sheet to make Mario rotate without using any advanced graphics processing
features (such as the transforms discussed in the previous section).
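
The lookup a sprite sheet makes possible can be sketched in a few lines of Python. The frame count, the 32x32 frame size, and the left-to-right grid layout are assumptions made for the example rather than properties of the actual Super Mario Kart sheet.

```python
def frame_index(angle_deg, num_frames):
    """Pick the frame whose orientation is nearest to angle_deg."""
    step = 360 / num_frames
    return int((angle_deg % 360 + step / 2) // step) % num_frames


def frame_rect(index, frame_w, frame_h, columns):
    """Return (x, y, width, height) of a frame in a sheet laid out in rows."""
    col, row = index % columns, index // columns
    return (col * frame_w, row * frame_h, frame_w, frame_h)


# With 8 evenly spaced frames, a 44 degree heading rounds to frame 1.
idx = frame_index(44, 8)
print(idx, frame_rect(idx, 32, 32, 4))
```

Rotating the kart is then just a matter of copying a different rectangle out of the sheet, with no per-pixel math.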
High Level Design
Our project is based on the Xilinx Spartan-3E FPGA chip. We used MicroBlaze, a softcore CPU provided by Xilinx in their Platform Studio, as the primary processor in our game and complemented it with a custom graphics coprocessor. We’re using onboard BRAM to store our program, which allows for optimal performance. We also have two additional hardware blocks, a button controller and an RS232 module, which provide input and debugging respectively. This is all outlined below in the system level block diagram.
Graphics Co-Processor
The graphics coprocessor is probably the most interesting part of the project. Our goals were to
implement “mode 7” as well as a software-addressable sprite system. We partially met those goals. In the end our coprocessor supported two of the three transforms required for “mode 7”: translation and rotation. While this does not provide emulated 3D, it still makes the game much more interesting. We also got the sprite system partially functional. Due to hardware limitations we were only able to implement two sprite blocks, which limited us to two non-background objects on the screen at any one time. Below is a block diagram of how these modules interact with one another.
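
One way to picture how sprite blocks sit on top of the background is a per-pixel priority check: each sprite block is asked for a color at the current screen position, and the first non-transparent answer wins. The Python below is a sketch of that idea only. The `SpriteBlock` class, the use of `None` for transparency, and the priority order are assumptions, not a description of the actual hardware interface.

```python
class SpriteBlock:
    """One sprite slot: a screen position plus a small bitmap (None = transparent)."""

    def __init__(self, x, y, bitmap):
        self.x, self.y, self.bitmap = x, y, bitmap

    def pixel(self, sx, sy):
        """Return this sprite's color at screen (sx, sy), or None if it has none."""
        row, col = sy - self.y, sx - self.x
        if 0 <= row < len(self.bitmap) and 0 <= col < len(self.bitmap[0]):
            return self.bitmap[row][col]
        return None


def compose(sx, sy, sprites, background):
    """Earlier sprites take priority; fall back to the transformed background."""
    for sprite in sprites:
        color = sprite.pixel(sx, sy)
        if color is not None:
            return color
    return background


RED, BLUE, GREEN = (255, 0, 0), (0, 0, 255), (0, 255, 0)
kart = SpriteBlock(10, 10, [[RED, None], [RED, RED]])
rival = SpriteBlock(11, 10, [[BLUE, BLUE], [BLUE, BLUE]])
print(compose(11, 10, [kart, rival], GREEN))
```

In this model, a limit of two sprite blocks corresponds to a list of length two.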
Game play
The gameplay of Super Xilinx Kart exists primarily in software, in accordance with our goals. We created two player objects, one for the computer and one for the player. The MicroBlaze processor reads the states of three buttons on the FPGA board and decides how to proceed based on that input.
  1. Left button rotates the map clockwise
  2. Right button rotates the map counter-clockwise
  3. Down button moves the map in the opposite direction of the player, making it appear as if the kart is moving forward
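
The button handling above can be sketched as a small state update. This is Python for illustration, not the code that ran on the MicroBlaze. The bit masks, the turn step, the speed, and the sign convention for the heading are all hypothetical.

```python
import math

# Hypothetical bit assignments for the three buttons.
BTN_LEFT, BTN_RIGHT, BTN_DOWN = 0x1, 0x2, 0x4


def update_player(state, buttons, turn_step=5, speed=2):
    """Return the new (x, y, heading_deg) after one frame of button input."""
    x, y, heading = state
    if buttons & BTN_LEFT:
        heading = (heading - turn_step) % 360   # map appears to rotate clockwise
    if buttons & BTN_RIGHT:
        heading = (heading + turn_step) % 360   # map appears to rotate counter-clockwise
    if buttons & BTN_DOWN:
        # Advance the player; the hardware then shifts the map the opposite way.
        x += speed * math.cos(math.radians(heading))
        y += speed * math.sin(math.radians(heading))
    return (x, y, heading)


print(update_player((0, 0, 0), BTN_DOWN))
```

The software only updates a position and a heading. The translation and rotation stages turn those two values into the moving, spinning map.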
Some very basic boundary avoidance groundwork has been created. The center of the default course
has a blue square which represents a pool of water. If the player enters the boundary of the water, they are placed back at the start. This groundwork provides an example of the additional boundary and obstacle handling that could be implemented in the future to improve the game. For example, the same check could be used to count laps when the player crosses the finish line, detect collisions with other players, and change speed over different terrain.
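
A boundary check like this boils down to a rectangle test. In the Python sketch below, the coordinates of the water and of the start position are invented for the example, as are the function names.

```python
WATER = (96, 96, 160, 160)  # hypothetical rectangle: x0, y0, x1, y1
START = (16, 128)           # hypothetical start position


def in_rect(x, y, rect):
    """True if (x, y) lies inside rect, with the right and bottom edges excluded."""
    x0, y0, x1, y1 = rect
    return x0 <= x < x1 and y0 <= y < y1


def apply_boundaries(x, y):
    """Send the player back to the start if they drove into the water."""
    return START if in_rect(x, y, WATER) else (x, y)


print(apply_boundaries(100, 100))
print(apply_boundaries(50, 50))
```

The same `in_rect` test could be reused for a finish line or a patch of slow terrain by changing what happens when it returns true.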
The last piece is the computer player. To demonstrate the competitive nature of racing games, we’ve created a second sprite which stays close to the player throughout their run. It randomly swerves from side to side but slowly advances on the player’s position. The goal of the player is to finish the race before the computer player passes them.
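
A behavior like this can be modeled with two numbers: the gap between the computer and the player, and the computer's side-to-side offset. The Python sketch below is one possible model under those assumptions. The closing speed, swerve range, and function name are hypothetical, and the actual game may have tracked the opponent differently.

```python
import random


def step_opponent(gap, offset, rng, closing=1, max_swerve=3, max_offset=20):
    """Advance the computer player by one frame.

    gap: how far the computer is behind the player (<= 0 means it has passed).
    offset: the computer's sideways position relative to the racing line.
    """
    gap -= closing  # slowly advance on the player
    offset += rng.randint(-max_swerve, max_swerve)  # random swerve
    offset = max(-max_offset, min(max_offset, offset))  # stay on the track
    return gap, offset


rng = random.Random(0)  # seeded so a run is repeatable
gap, offset = 50, 0
for _ in range(50):
    gap, offset = step_opponent(gap, offset, rng)
print(gap)
```

With these numbers the player has 50 frames to finish before the gap reaches zero.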
Design Challenges
The primary constraint on all aspects of this design was time. Finding time to focus and plan out the
design for this project proved more difficult than anticipated.
Another large challenge was the mathematics involved in the transformation pipeline. Implementing
the three transforms required study and derivation of several complex equations based on concepts in
linear algebra and computer graphics. The limitation of the hardware in this regard also proved rather
difficult. FPGAs have no built-in sine and cosine functions, so large lookup table modules had to be created. Additionally, to provide a smooth perspective transform with convincing depth perception, a second order differential equation was required, but hardware constraints limited the implementation to linear systems. Because of this, the accuracy of the depth was sacrificed for ease of implementation.
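
To give a feel for what a sine lookup table involves, here is a Python sketch that builds a fixed-point table and uses it for an integer-only rotation. The table size (256 entries per full turn) and the 8 fractional bits are assumptions chosen for the example, not the dimensions of the project's tables.

```python
import math

LUT_SIZE = 256   # assumed: 256 angle steps per full turn
FRAC_BITS = 8    # assumed: values scaled by 2**8

# Precompute sin() once. In hardware this would be a ROM or block RAM.
SIN_LUT = [round(math.sin(2 * math.pi * i / LUT_SIZE) * (1 << FRAC_BITS))
           for i in range(LUT_SIZE)]


def fx_sin(angle):
    return SIN_LUT[angle % LUT_SIZE]


def fx_cos(angle):
    # cos(a) = sin(a + quarter turn), so one table serves both functions.
    return SIN_LUT[(angle + LUT_SIZE // 4) % LUT_SIZE]


def rotate_fx(dx, dy, angle):
    """Integer-only rotation: multiply by table values, then shift the scale out."""
    c, s = fx_cos(angle), fx_sin(angle)
    return (dx * c - dy * s) >> FRAC_BITS, (dx * s + dy * c) >> FRAC_BITS


print(rotate_fx(10, 0, 64))  # 64 steps is a quarter turn
```

Note that `>>` rounds toward negative infinity in Python. A hardware arithmetic shift behaves the same way, which introduces a small bias for negative results.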
Timing also proved to be an issue in the hardware design. The transformation pipeline introduced large propagation delays into the system, limiting the maximum clock frequency to around 35 MHz. The MicroBlaze and system clocks were brought down to 30 MHz to compensate.
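
In terms of clock periods, those two frequencies work out as follows. This is a quick Python calculation using only the approximate figures quoted above.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


critical_path_ns = period_ns(35)  # longest delay the design could tolerate
clock_period_ns = period_ns(30)   # period of the clock we actually ran
slack_ns = clock_period_ns - critical_path_ns

print(round(critical_path_ns, 1), round(clock_period_ns, 1), round(slack_ns, 1))
```

So a maximum of roughly 35 MHz corresponds to a critical path of about 28.6 ns, and running at 30 MHz leaves a few nanoseconds of margin.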
A smaller challenge was that the complexity of our project went beyond conventional use of the Xilinx tools. We had to learn our way around some of the more advanced Xilinx tools in order to achieve the level of hardware control required for this project.
And lastly, I’ve included a video of the final working game.  While I really wish there had been time to get the 3D perspective working, I’m still very happy with the result.