Optimization Question - Drawing many meshes/sprites

edited May 23 in Questions Posts: 185

Hey everyone,

I've been struggling with drawing an abundance of sprites. To be precise, my game needs to draw a lot of grass. Basically, I use a mesh with a texture that consists of a few blades of grass, which is animated using a wind shader. For a nicer effect, I draw a couple of different versions of it with different wind strengths and use setContext(...) to render them into images on the spot. This way, I don't have to create hundreds of meshes that each run the shader on their own; instead I draw the rendered images using sprite(...).

Before the game even starts to draw anything, I use a custom algorithm to distribute the grass across a 2d-plane. At the end, I have a 2d-array with probably thousands of entries, which contains the positions where a grass-sprite should be drawn.

In the actual draw-method, the grass-sprites are drawn. I loop through the 2d-array and draw one of the few variations of the sprite. This amounts to sometimes hundreds of grass-sprites being drawn at the same time, which (obviously) brings down the performance considerably. Only a fraction of all possible grass-positions is drawn at the same time as the 2d-array contains all positions across the entire 2d-plane, of which only a part can be visible since there is a camera that follows the player.

As a result, I thought using a mesh would be beneficial. I created an image grassAtlas = image(...), used setContext(grassAtlas) and then drew all the different grass versions next to each other. This way, I thought, I would have one image, the grassAtlas, that could be used as the texture, and with mesh:setRectTex(...) I could select the version for each rect that I add with mesh:addRect(...). As a result, I would have one single mesh, but could use the same variations of the grass sprite as before.

As long as I don't draw the mesh, everything's fine. I was worried that adding thousands of rects to the mesh could be an issue, but it doesn't seem to be. However, drawing the mesh instead of the sprites brings down the performance considerably more, even to a point where the game's basically unplayable. This seems odd to me as drawing the same amount of sprites was heavy on performance, but the game was still running at a decent and stable framerate.

Now I have two questions:
- Should my approach theoretically work? By that I mean: Is my assumption correct that multiple rects of one mesh, which uses a single texture atlas but assigns a different part of said texture to different rects using mesh:setRectTex(...), only require one draw call instead of one per rect?
- Is there any efficient way to draw that many rects of a mesh/add that many rects to a mesh? I'm fully aware that both of my approaches are terrible as far as memory consumption and performance are concerned, but I also don't know of a better one.

Thanks in advance and sorry for the long text!

Comments

  • dave1707 Mod
    Posts: 10,055

    @Elias Not sure what you’re doing, but here’s a stripped-down version of a mesh program I use to check the FPS of the different Codea versions as they come out. I’ve used it since version 2.1. This is set up to run 3000 meshes of size 30. It runs at 59.99 FPS on my iPad Air 3. I run the original program at different sizes and numbers and keep track of the FPS of the different versions. Don’t know if this helps or not.

    viewer.mode=FULLSCREEN
    
    function setup()
    
        size=30
        nbr=3000
    
        count=0
        fill(0)
        tab={}
        for z=1,nbr do --# of meshes
            table.insert(tab,m(math.random(50,WIDTH-50),
                math.random(50,HEIGHT-50),math.random(360)))
        end
    end
    
    function draw()
        background(132, 224, 217, 255)
        for a,b in pairs(tab) do
            b:draw()
        end
        count=count+1
        text(1/DeltaTime,WIDTH/2,HEIGHT-25)
    end
    
    m=class()   -- mesh
    
    function m:init(x,y,r)
        self.x=x
        self.y=y
        self.ms=mesh()
        self.rot=r
    end
    
    function m:draw()
        pushMatrix()
        translate(self.x,self.y)
        rotate(self.rot+count)   
        self.ms.vertices={vec2(0,0),vec2(size,size),vec2(size,0)}
        self.ms:setColors(255,0,0)
        self.ms:draw()  
        popMatrix()
    end
    
  • moechofe2 Posts: 41

    I think you can avoid calling setRectTex() by moving the animation part into the shader. You can pass a counter as a uniform and use it to offset the UVs.
    For that, you'll need to think carefully about where you put your images in the atlas.
    For instance on the X axis: spread the variation, and on the Y axis: spread the animation frames.
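    To make that layout concrete, here is a minimal sketch in plain Lua (not code from anyone's project) of the atlas arithmetic, assuming equally sized cells with variations running along X and animation frames along Y. The function name `atlasCell` and the 10 x 4 grid are made up for illustration. The values are normalized texture coordinates, so `t` is the part a frame counter uniform could offset in the shader.

```lua
-- Hypothetical helper: normalized texture rect for one atlas cell.
-- variation: 1..nVariations (column on X), frame: 1..nFrames (row on Y).
function atlasCell(variation, frame, nVariations, nFrames)
    local w = 1 / nVariations        -- width of one cell in UV space
    local h = 1 / nFrames            -- height of one cell in UV space
    local s = (variation - 1) * w    -- left edge of the cell
    local t = (frame - 1) * h        -- bottom edge of the cell
    return s, t, w, h
end

-- Example: third variation, second animation frame, 10 x 4 atlas.
print(atlasCell(3, 2, 10, 4))
```

    If I remember the Codea API correctly, the four values are in the same order as the arguments of mesh:setRectTex(i, s, t, w, h), so the variation could be fixed once per rect while the shader moves through the frames.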

    Also, but not related: having one big 1D table should be better, performance wise, than a 2D table.
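    As a rough illustration of the 1D idea (a sketch only, with a made-up 4 x 3 grid and hypothetical helper names `toIndex` and `fromIndex`), a column/row pair can be mapped to a single index so that all positions live in one flat table:

```lua
-- Hypothetical grid size, for illustration only.
local cols, rows = 4, 3

-- Map a 1-based (col, row) pair to a single 1-based index.
function toIndex(col, row, cols)
    return (row - 1) * cols + col
end

-- Inverse mapping: flat index back to (col, row).
function fromIndex(i, cols)
    local col = (i - 1) % cols + 1
    local row = (i - 1) // cols + 1
    return col, row
end

-- One flat table instead of a table of tables.
local grass = {}
for row = 1, rows do
    for col = 1, cols do
        grass[toIndex(col, row, cols)] = { x = col * 10, y = row * 10 }
    end
end

print(#grass)               -- number of entries, cols * rows
print(fromIndex(7, cols))   -- column and row for flat index 7
```

    A single numeric for loop over the flat table then replaces the nested loops, and each lookup touches one table instead of two.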

  • Steppers edited May 23 Posts: 421

    @Elias From what you’re describing you’re calling mesh:addRect(…) every frame for every grass image to be drawn and regenerating the mesh every frame. Is that right?

    If so, and assuming the grass positions don’t change for the vast majority of the time, then rather than generating a mesh for the on-screen grass every frame, split the 2D plane into ‘chunks’. For a 128x128 world, say, divide it into 16x16 chunks for a total of 64 chunks. Generate a grass mesh for each chunk (not during rendering), then while rendering draw any chunk that is at least partially on screen. The off-screen grass within those chunks should be culled efficiently by the GPU with minimal overhead. This allows far quicker ‘grass in view’ culling and also avoids regenerating the mesh every frame (which is sub-optimal at best due to bandwidth costs).
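    A minimal sketch of the chunk lookup, in plain Lua and under some assumptions: the world origin is at (0, 0), the camera view is an axis-aligned rectangle given by its lower-left corner and size, and the helper name `visibleChunks` is made up. It only works out which chunk columns and rows overlap the view; drawing the per-chunk meshes is left out.

```lua
-- Return the inclusive range of chunk columns/rows that overlap the view.
-- All arguments are in world units; chunk indices are 1-based.
function visibleChunks(viewX, viewY, viewW, viewH, chunkSize, worldSize)
    local last = worldSize // chunkSize   -- chunks per axis (8 for 128 / 16)
    local function clamp(v)
        return math.max(1, math.min(last, v))
    end
    -- Note: a view entirely outside the world still clamps to the edge chunks.
    local c0 = clamp(math.floor(viewX / chunkSize) + 1)
    local r0 = clamp(math.floor(viewY / chunkSize) + 1)
    local c1 = clamp(math.floor((viewX + viewW) / chunkSize) + 1)
    local r1 = clamp(math.floor((viewY + viewH) / chunkSize) + 1)
    return c0, r0, c1, r1
end

-- Example: a 30 x 20 view at (20, 40) in the 128 x 128 world from above.
local c0, r0, c1, r1 = visibleChunks(20, 40, 30, 20, 16, 128)
for r = r0, r1 do
    for c = c0, c1 do
        -- a real project would draw the prebuilt mesh of chunk (c, r) here
        print("chunk", c, r)
    end
end
```

    Only the handful of chunks in that range get drawn each frame; everything else is skipped without looking at individual grass positions.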

    Combined with @moechofe2’s suggestion I’d expect you to see some better results.

    When it comes to mobile GPU performance modern iOS devices are far more capable than people actually realise, it just requires a little optimisation (e.g. https://apps.apple.com/gb/app/alien-isolation/id1573029040).

    It’s also worth noting that unless there’s caching going on in Codea’s runtime, every call to mesh:setRectTex(…) and mesh:addRect(…) may be introducing an OpenGL call too, which can really add up to poor performance.

  • Elias Posts: 185

    @dave1707 Thanks for the program, I'll check it out to see if it is similar to what I'm looking for!

    @moechofe2 Thanks, I'll consider this later on! However, I don't think that this is the reason it performs so badly. Didn't know that about tables, thanks!

    @Steppers Almost. I use mesh:addRect(...) when generating the grass positions and mesh:setRectTex(...) to assign one of the ten grass sprites in the texture atlas, also when generating the positions. I only occasionally need to change the color of certain rects, but once the project is initialized, no new rects are added to the mesh.

    As for your suggestions about using chunks: I've considered this as well, though shouldn't one mesh already equal one draw call? I assumed that using one mesh with one texture wouldn't end up causing multiple draw calls.

  • Steppers Posts: 421

    @Elias You’re right in that having one mesh and one texture should result in one draw call, the chunk thing was mainly to have your culling logic (if any) be more efficient.

    Also, if you are modifying the mesh with setRectTex every frame then that’s also unlikely to be efficient.

  • Elias Posts: 185

    @Steppers Hm, but I'm also only using setRectTex at the start. I've also noticed that just adding the rects and drawing the mesh already tanks the performance. Maybe adding all those rects is already too much...

  • dave1707 Mod
    edited May 23 Posts: 10,055

    @Elias Here's another version. I'm using addRect and I'm not creating the mesh every draw cycle like in the previous example. I display the FPS and the number of meshes. I don't know how many rects you were creating, so you can alter the size variable to change how many rects are created.

    A size of 10 creates 9,213 rects and runs at 59.99 FPS on my air 3.

    viewer.mode=FULLSCREEN
    
    function setup()
        size=15
        xs=WIDTH//size
        ys=HEIGHT//size
        fill(255)
        tab={}
        for x=0,xs-1 do   -- xs columns, so the xs*ys count shown in draw() is accurate
            for y=0,ys-1 do
                table.insert(tab,m(x*size,y*size,math.random(360)))
            end
        end
    end
    
    function draw()
        background(132, 224, 217, 255)
        for a,b in pairs(tab) do
            b:draw()
        end
        text(1/DeltaTime,WIDTH/2,HEIGHT-25)
        text(xs*ys,WIDTH/2,HEIGHT-50)
    end
    
    m=class()   -- mesh
    
    function m:init(x,y,r)
        self.x=x
        self.y=y
        self.ms=mesh()    
        self.ms:addRect(self.x+size/2,self.y+size/2,size,size)  
        self.ms:setColors(math.random(255),math.random(255),math.random(255))
    end
    
    function m:draw()
        self.ms:draw()  
    end
    
  • Elias Posts: 185

    @dave1707 Thank you, this is extremely helpful! I've increased the number of rects to be added and saw a significant drop in performance. I also added a way to change between drawing the mesh and drawing the same amount of sprites. For some reason, the performance when using sprites was better...

  • dave1707 Mod
    Posts: 10,055

    @Elias Here’s another version that uses a shader to move the Grass in a random x,y direction each frame. I don’t have a wind shader or know what grass you’re using for the texture, so I just used what I had.

    On my iPad Air 3

    At a size of 10, there are 9,213 rects and it runs at 59 FPS.
    At a size of 5, there are 36,852 rects and it runs at 59 FPS.
    At a size of 2, there are 231,852 rects and it runs at 59 FPS.
    At a size of 1, there are 927,408 rects and it runs at 14 FPS.

    viewer.mode=FULLSCREEN
    
    function setup() 
        img=readImage(asset.builtin.Blocks.Grass4)
        size=10
        xs=WIDTH//size
        ys=HEIGHT//size
        fill(255)
        m=mesh()
        for x=1,xs do
            for y=1,ys do
                m:addRect(x*size,y*size,size,size)
            end
        end
        m.texture=img 
        m.shader=shader(vShader,fShader)
    end
    
    function draw() 
        background(40, 40, 50) 
        m.shader.xset=math.random(-1,1)*.2
        m.shader.yset=math.random(-1,1)*.2
        m:draw()
        text("FPS "..1//DeltaTime,WIDTH/2,HEIGHT-30)
        text("# Rects "..xs*ys,WIDTH/2,HEIGHT-60)   
    end
    
    vShader = [[
    uniform mat4 modelViewProjection;
    attribute vec4 position; 
    attribute vec4 color; 
    attribute vec2 texCoord;
    varying lowp vec4 vColor; 
    varying highp vec2 vTexCoord;
    void main() 
    {   vColor=color;
    vTexCoord = texCoord;
    gl_Position = modelViewProjection * position;
    }    ]]
    
    fShader = [[
    uniform lowp sampler2D texture;
    varying lowp vec4 vColor; 
    varying highp vec2 vTexCoord;
    uniform lowp float xset;
    uniform lowp float yset;
    void main() 
    { gl_FragColor=texture2D( texture,vec2(vTexCoord.x+xset,vTexCoord.y+yset))*vColor;
    }    ]]
    
  • skar edited May 25 Posts: 324

    here’s a fun one to play around with - meshes.zip

    what i’ve learned is that shaders will impact performance the most, followed by the size of assets, and overall of course everything is dragged down by number of assets

    (one interesting note is that i’ve noticed it doesn’t take as much to drop from 120 to below as it does to drop below 60 (120 and 60 being relevant to your device max and half refresh rate))

    i swear it used to perform better, but i dont have any actual data,

    if memory serves right i used to be able to see 5-6 thousand meshes at full performance (120 for my ipad pro) but now it drops below 120 around 2.2K+

  • Elias Posts: 185

    @dave1707 Thank you, for some reason this runs much better than with the changes I made to your previous code. I'll need to check what the difference is between the two.

    @skar Thanks, I'll check your project out as well! That's interesting, I'd be curious what the reason for that could be.

  • dave1707 Mod
    edited May 25 Posts: 10,055

    @Elias The difference between the two is that the first one created multiple meshes and the last one created just 1 mesh and added the rects to it.

  • Elias Posts: 185

    @dave1707 Exactly, but I changed the first example so that it only uses 1 mesh and adds the rects to it. So I must have messed something up there, and that might also be the reason why it performed so badly with the grass in my game.

  • dave1707 Mod
    Posts: 10,055

    @Elias Is there any way you can post just your mesh grass code? If we can see what it’s doing, it might be more helpful. Is there a difference in speed if you show the mesh with or without the wind shader?

  • skar Posts: 324

    keep in mind the size of your texture will impact performance, i had an issue previously where i thought i was using a 300x300 png but i was actually using a 1600x1600 png where only the middle 300x300 had pixels. this was hard to figure out but all those extra pixels even though they were alpha 0 were still in the fragment shader
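    one way to catch this kind of thing is to scan the image for the smallest box that contains every non-transparent pixel. here’s a rough sketch in plain lua, not the code i actually used: `opaqueBounds` and the `alphaAt` callback are made-up names, and the 8x8 "image" is fake data just to show the idea

```lua
-- Find the smallest rectangle containing every pixel with alpha > 0.
-- alphaAt(x, y) is a hypothetical callback returning the alpha at 1-based (x, y).
function opaqueBounds(width, height, alphaAt)
    local minX, minY, maxX, maxY = width + 1, height + 1, 0, 0
    for y = 1, height do
        for x = 1, width do
            if alphaAt(x, y) > 0 then
                if x < minX then minX = x end
                if x > maxX then maxX = x end
                if y < minY then minY = y end
                if y > maxY then maxY = y end
            end
        end
    end
    if maxX == 0 then return nil end   -- fully transparent image
    return minX, minY, maxX - minX + 1, maxY - minY + 1
end

-- Fake 8x8 "image" whose only opaque pixels form a 3x2 block.
local function fakeAlpha(x, y)
    return (x >= 3 and x <= 5 and y >= 4 and y <= 5) and 255 or 0
end

print(opaqueBounds(8, 8, fakeAlpha))   -- x, y, width, height of the opaque area
```

    in codea the callback could wrap image:get(x, y), which as far as i know returns r, g, b, a. if the box comes back much smaller than the image, the texture is carrying a lot of empty pixels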

  • dave1707 Mod
    Posts: 10,055

    @Elias Without knowing exactly what you’re doing with the grass, I’m just throwing out random code. Here’s another example where I’m drawing 6,348 grass rects. Tap the screen and I redraw about 3,200 rects where I wiggle the grass. All of this runs at 59 FPS. Even if I change the size to 5 and draw 36,852 rects and wiggle about half of them, the FPS is still 59. Again, it would help if you could show some of the code you’re using for the mesh.

    viewer.mode=FULLSCREEN
    
    function setup() 
        size=12
        xs=WIDTH//size
        ys=HEIGHT//size
        fill(255)
        tab={}
        m=mesh()
        m.texture=readImage(asset.builtin.Blocks.Grass4)
        for x=1,xs do
            for y=1,ys do
                m:addRect(x*size,y*size,size,size)
                table.insert(tab,vec3(x*size,y*size,0))
            end
        end
        tot,cnt=0,0
    end
    
    function draw() 
        background(40, 40, 50) 
        cnt=cnt+1
        if cnt>20 then  -- wiggle every 1/3 second
            cnt=0
            for a,b in pairs(tab) do
                if tab[a].z==1 then -- if 1 then wiggle
                    m:setRect(a,tab[a].x,tab[a].y,size,size,math.random(-1,1)*.4)
                end
            end
        end
        m:draw()
        text("FPS "..1//DeltaTime,WIDTH/2,HEIGHT-30)
        text("# Rects "..xs*ys,WIDTH/2,HEIGHT-60)
        text("# Wiggles "..tot,WIDTH/2,HEIGHT-90)
    end
    
    function touched(t)
        if t.state==BEGAN then
            tot=0   -- reset so "# Wiggles" reflects only the latest tap
            for r=1,xs*ys do
                tab[r].z=math.random(2) -- set to 1 or 2
                if tab[r].z==1 then
                    tot=tot+1
                end            
            end
        end
    end
    
  • Elias Posts: 185

    @skar You're right, I hadn't considered this. I basically use ten meshes that have a shader applied to them, use setContext(img) to draw all ten of them next to each other, which is treated as a texture atlas for spritesheet-animations, and then supply img as the texture to the actual mesh with all the rects. This way I only apply a shader to those ten meshes. The actual mesh with all the rects then uses m:setRectTex(...) for individual rects to select one of the ten different versions of the grass. I don't think that the image generated this way is too large, but I should take a look at that.

    @dave1707 Thanks for another example! I haven't had the time yet to check whether I was doing something wrong compared to your code. But I'll probably find some time for it today. If I can't spot a difference, I'll post my code here (unfortunately, it's more complicated than just a quick copy-paste as there are some references to other parts in my code).

  • dave1707 Mod
    Posts: 10,055

    @Elias If you can’t post any code, maybe you could give a good description of what you’re actually trying to do with the mesh. What mesh commands you’re using, etc. Reading your above posts, it’s hard to get an idea of what’s going on.

  • Elias Posts: 185

    So, I've spent some time today trying to figure out what the issue was. It's actually kind of embarrassing... I had previously used a nested for-loop to draw the sprites:

    for i = 0, value, 1 do
        for j = 0, value, 1 do
             local pos = table_that_contains_positions[i][j]
             -- some code
             sprite(...)
        end
    end

    When replacing this version with drawing a mesh, I couldn't just delete the entire thing as the loop contains some code that changes the alpha-value of the grass if you step onto it. Long story short, I didn't put mesh:draw() behind the last end, but within the outer loop, which caused Codea to draw the mesh around 300 times. No wonder the performance was so bad... Now it runs at almost 60 FPS.
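    For anyone running into the same thing, here is a minimal sketch of the corrected structure. It is plain Lua with a stand-in object instead of a real Codea mesh, and `updateAndDraw` is a made-up name; the point is only where the draw call sits relative to the loops.

```lua
-- Stand-in for a Codea mesh so the sketch runs anywhere; it only counts draws.
local fakeMesh = { draws = 0 }
function fakeMesh:draw()
    self.draws = self.draws + 1
end

-- Hypothetical per-frame routine: per-rect work in the loops, one draw after.
function updateAndDraw(mesh, value)
    local updates = 0
    for i = 0, value do
        for j = 0, value do
            updates = updates + 1   -- e.g. adjust alpha for rect (i, j)
        end
        -- mesh:draw() placed here would run value + 1 times per frame (the bug)
    end
    mesh:draw()                     -- behind the last `end`: once per frame
    return updates
end

print(updateAndDraw(fakeMesh, 3), fakeMesh.draws)
```

    The per-rect work still happens for every entry, but the mesh is submitted once per frame regardless of how large the loop bounds are.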

    Thanks to everybody for their help!

  • dave1707 Mod
    Posts: 10,055

    @Elias Glad you got it figured out. Don’t worry about the embarrassing part, I’ve done things like that many times. No matter how many times I go over the code, I can’t see the problem. I have to just give up, wait a day or 2, then look at the code again before I see what’s wrong. It’s like your mind gets stuck in the error and you just don’t see it. Waiting seems to break the cycle.

  • skar Posts: 324

    nice job figuring it out, like dave said don’t worry about it, there’s been times when i got stuck on an issue and thought i dont even know how to code or i should just give up on trying to make games

    you say it’s “almost” 60 fps, one thing i do is keep track of how many objects im drawing on the screen, do you know how many you have on the screen at once?

  • Elias Posts: 185

    @dave1707 Thanks! You're right, waiting and sometimes rewriting a piece of code often fixes these issues.

    @skar Thanks! Yes and no. I do count how many objects and of what type are on the island. However, players can basically customise it however they want. Some types of objects are more expensive to draw than others (shaders, particle effects, sounds, ...). I've created a test environment with some areas that are worst case scenarios and others that should be closer to what players typically experience. Hitting 60 FPS is the goal. However, I'll probably keep the game at 30 FPS anyway for most devices as battery consumption is much higher otherwise.
