This article takes a closer look at Mali's architecture: it compares the basic pipeline of a traditional GPU with that of Mali and sums up the advantages and disadvantages of each approach. Original article: https://developer.arm.com/graphics/developer-guides/tile-based-rendering
The architecture of a traditional GPU is generally called an immediate-mode GPU. It processes each primitive in turn, running the vertex shader and then the fragment shader in sequence. The pseudocode is:
for draw in renderPass:
    for primitive in draw:
        for vertex in primitive:
            execute_vertex_shader(vertex)
        for fragment in primitive:
            execute_fragment_shader(fragment)
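The data flow looks like this: the vertex shader output is handed directly to the fragment shader, and the fragment shader reads and writes a framebuffer working set held in DDR.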
The main advantage of this approach is that the output of the vertex shader stays on chip and can be read directly and quickly by the next stage.
The disadvantage shows up when large primitives (mostly triangles) have to be rendered: the framebuffer working set becomes very large. A full-screen color buffer or depth buffer takes far more storage than is available on chip, so it is kept in DDR and accessed frequently. Many per-fragment operations on the current frame (such as blending, depth testing, and stencil testing) need to read this working set, so the required bandwidth is huge and energy consumption is high. This makes the approach a poor fit for mobile devices.
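To put a rough number on the size of that working set, here is a back-of-the-envelope sketch. The resolution and the bytes per pixel are assumptions chosen for illustration; they are not figures from the original article.

```python
# Assumed values, for illustration only.
WIDTH, HEIGHT = 1920, 1080   # a 1080p screen
COLOR_BYTES = 4              # assumed RGBA8 color
DEPTH_STENCIL_BYTES = 4      # assumed 24-bit depth + 8-bit stencil


def working_set_bytes(width, height, bytes_per_pixel):
    """Size of a full-screen buffer with the given per-pixel footprint."""
    return width * height * bytes_per_pixel


full_screen = working_set_bytes(WIDTH, HEIGHT, COLOR_BYTES + DEPTH_STENCIL_BYTES)
print(f"full-screen working set: {full_screen / 2**20:.1f} MiB")
```

Under these assumptions the working set is in the range of megabytes, which is why it ends up in DDR rather than in on-chip memory.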
Mali GPUs therefore take a tile-based approach: the screen is divided into tiles of 16x16 pixels, and each tile is rendered in full on chip before being written to DDR. This reduces how often DDR is read and written. Rendering tile by tile, however, requires knowing which geometry falls into each tile, so the work is split into two steps:
- The first step performs all geometry-related operations and generates the tile lists.
- The second step runs the fragment operations for each tile and writes the tile to memory once it is complete.
The pseudocode is as follows:
# Pass one
for draw in renderPass:
    for primitive in draw:
        for vertex in primitive:
            execute_vertex_shader(vertex)
        append_tile_list(primitive)

# Pass two
for tile in renderPass:
    for primitive in tile:
        for fragment in primitive:
            execute_fragment_shader(fragment)
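The two passes can be made concrete with a small runnable sketch. It is only an illustration of the idea, not how Mali's tiler is implemented: the names `bin_primitives` and `render_tiles` are hypothetical, vertices are assumed to be integer pixel coordinates, and a triangle is assigned to every tile its bounding box touches, which is a conservative simplification.

```python
from collections import defaultdict

TILE = 16  # tile edge in pixels, matching the 16x16 tiles in the text


def bin_primitives(triangles, width, height):
    """Pass one: build a tile list mapping (tx, ty) to triangle indices."""
    tile_lists = defaultdict(list)
    max_tx = (width - 1) // TILE
    max_ty = (height - 1) // TILE
    for index, triangle in enumerate(triangles):
        xs = [x for x, _ in triangle]
        ys = [y for _, y in triangle]
        # Bounding box in tile coordinates, clamped to the screen.
        tx0 = max(min(xs) // TILE, 0)
        tx1 = min(max(xs) // TILE, max_tx)
        ty0 = max(min(ys) // TILE, 0)
        ty1 = min(max(ys) // TILE, max_ty)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tile_lists[(tx, ty)].append(index)
    return tile_lists


def render_tiles(tile_lists, shade_tile):
    """Pass two: visit each tile once, with only the primitives binned to it."""
    for tile in sorted(tile_lists):
        shade_tile(tile, tile_lists[tile])


# Two example triangles on an assumed 64x64 pixel screen.
triangles = [
    [(2, 2), (10, 3), (5, 12)],
    [(10, 10), (40, 12), (20, 30)],
]
tile_lists = bin_primitives(triangles, width=64, height=64)
render_tiles(tile_lists, lambda tile, prims: print(tile, prims))
```

The point of the sketch is that pass two never looks at a primitive outside the current tile, so everything it needs for that tile can stay in a small local buffer.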
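The data flow is as follows: the geometry pass writes its output and the tile lists to DDR; the fragment pass reads them back, renders each tile in on-chip memory, and writes the finished tile to DDR.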
This clearly solves the bandwidth problem of the traditional model: the fragment shader works on one small tile at a time, and the tile is small enough to stay in on-chip memory. There is no need to access external memory for blending, depth, or stencil operations; the tile is written to memory only once, after its last operation has finished. Compressing tiles reduces memory reads and writes even further. In addition, when parts of the image do not change between frames, a check determines whether a tile is identical to the one already in memory, so that unchanged tiles do not have to be written again.
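The "skip unchanged tiles" idea can be sketched in a few lines. This is a simplified model, not the hardware mechanism: `write_changed_tiles` is a hypothetical helper, tiles are modelled as byte strings, and `zlib.crc32` stands in for whatever signature the GPU computes.

```python
import zlib


def write_changed_tiles(tiles, previous_crcs, write_tile):
    """Write only tiles whose checksum differs from the previous frame.

    tiles: dict mapping tile coordinate -> bytes of the rendered tile.
    previous_crcs: dict mapping tile coordinate -> checksum from the last frame.
    write_tile: callback standing in for the write to external memory.
    Returns the new checksum table and the number of tiles written.
    """
    new_crcs = {}
    written = 0
    for coord, pixels in tiles.items():
        crc = zlib.crc32(pixels)
        new_crcs[coord] = crc
        if previous_crcs.get(coord) != crc:
            write_tile(coord, pixels)
            written += 1
    return new_crcs, written


memory = {}  # stand-in for the framebuffer in DDR

# Frame 1: both tiles are new, so both are written.
frame1 = {(0, 0): bytes(1024), (1, 0): bytes(1024)}
crcs, written1 = write_changed_tiles(frame1, {}, memory.__setitem__)

# Frame 2: only tile (1, 0) changed, so only that tile is written.
frame2 = dict(frame1)
frame2[(1, 0)] = b"\xff" * 1024
crcs, written2 = write_changed_tiles(frame2, crcs, memory.__setitem__)
print(written1, written2)
```

For content with large static regions, such as a user interface, most tiles would take the cheap path in this model.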
The main disadvantage is that the geometry output must be written to DDR after the vertex phase and then read back by the fragment shader. There is a balance to strike between this extra bandwidth for geometry data and the framebuffer bandwidth saved by rendering tiles on chip. In addition, some operations, such as tessellation, are not well suited to tile-based GPUs.
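A very crude model makes this balance easier to reason about. Every number below (vertex count, bytes per vertex, framebuffer accesses per pixel) is an assumption made up for illustration, and the model ignores caches and compression entirely.

```python
def extra_geometry_traffic(num_vertices, bytes_per_vertex):
    """Geometry output is written once after the vertex phase and read once
    by the fragment phase, so it crosses the DDR bus twice."""
    return 2 * num_vertices * bytes_per_vertex


def framebuffer_traffic_saved(width, height, bytes_per_pixel, accesses_per_pixel):
    """Immediate mode: every framebuffer access is assumed to go to DDR.
    Tile-based: only one final write per pixel is assumed to reach DDR."""
    immediate = width * height * bytes_per_pixel * accesses_per_pixel
    tiled = width * height * bytes_per_pixel
    return immediate - tiled


# Assumed scene: 500,000 shaded vertices of 32 bytes each, rendered at 1080p
# with 4 bytes per pixel and 6 framebuffer accesses per pixel on average.
cost = extra_geometry_traffic(500_000, 32)
saving = framebuffer_traffic_saved(1920, 1080, 4, 6)
print(f"extra geometry traffic: {cost / 1e6:.1f} MB")
print(f"framebuffer traffic saved: {saving / 1e6:.1f} MB")
print("tile-based wins" if saving > cost else "immediate mode wins")
```

In this model the result flips as the geometry gets denser or the per-pixel work gets lighter, which is one way to see why geometry-heavy features are a concern on tile-based hardware.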
Screen resolutions keep growing, from 1080p to 1440p to 4K, and framebuffer bandwidth grows with them, so it is reasonable to expect architectures like Mali's to be used widely in the future.
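The following sketch shows how the numbers scale with resolution. The 4 bytes per pixel is an assumption; the 16x16 tile size is the one given earlier in the text.

```python
TILE = 16  # tile edge in pixels

RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K": (3840, 2160),
}


def tile_count(width, height, tile=TILE):
    """Number of tiles needed to cover the screen (partial tiles round up)."""
    tiles_x = -(-width // tile)  # ceiling division
    tiles_y = -(-height // tile)
    return tiles_x * tiles_y


def color_buffer_mib(width, height, bytes_per_pixel=4):
    """Size of a full-screen color buffer, assuming 4 bytes per pixel."""
    return width * height * bytes_per_pixel / 2**20


for name, (width, height) in RESOLUTIONS.items():
    print(f"{name}: {tile_count(width, height)} tiles, "
          f"{color_buffer_mib(width, height):.1f} MiB color buffer")
```

The full-screen buffer grows with the pixel count, while the on-chip storage needed for a single tile stays the same; only the number of tiles goes up.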
There are, however, some pitfalls that developers need to avoid. The first is to set up render passes properly so that the features of the architecture can be exploited; the second is to understand the benefits of dividing the geometry into tiles in this way.