Codea app performance on iPad Pro 3rd Generation worse than on older model

in Bugs Posts: 5

I’ve just configured a new 3rd generation iPad Pro 11" (1TB storage, 16GB RAM) with Codea and am running an app I previously developed that calculates and renders a full screen’s worth of small Voronoi polygons (heavy on the floating point math). It runs SO MUCH SLOWER than the exact same code did on a 1st generation iPad Pro 11" (256GB storage, 4GB RAM). Does anyone have any insights into what’s going on here?

Comments

  • Posts: 128

    Wow, @rvmott, you’ve got a super new iPad! I’ve got the lower-end version of your pair (the original 9.7 iPad Pro and, more recently, the 8GB 11 iPad Pro), and 3D models consistently seem to render much faster on my new iPad Pro. If you post the code you are testing, I’d be happy to compare the rendering speeds on my pair if that helps.

  • Posts: 2,689

    @rvmott - not sure which generation uses Metal, but if yours does, perhaps the conversion from the old graphics system to the new one is slowing things down.

  • Posts: 5

    Voronoi code posted here. Curious if anyone else experiences the same problem.

  • edited April 30 Posts: 308

    how much slower is slower? it renders for me in about 1-2 minutes

    one thing to keep in mind is the pixel count is much higher in newer models so the use of HEIGHT and WIDTH will be different, probably making your loops much longer; actually never mind, the 1st gen ipad pro has the same resolution

  • Posts: 5

    Thanks everyone for your feedback on this so far. The code runs to completion in about 17 seconds on my 1st generation device, but takes around 47 seconds to complete on the newest, 3rd generation device.

  • dave1707 Mod
    edited April 30 Posts: 9,977

    I tried it on my iPads. Is there any way to set the size so it can be the same on different devices? That way it would be a better speed comparison.

    PS. It also cancelled on the Air 4 when it should have completed.

    iPad Pro 1: 54 sec.
    iPad Air 3: 23 sec.
    iPad Air 4: 69 sec.

  • Posts: 128

    @rvmott, similar outcome for me:
    iPad Pro (new 3rd gen 11 inch 8GB, iOS 15.4): 48 sec
    iPad Pro (original 9.7 inch, iOS 15): 24 sec
    (The original iPad has a slightly smaller resolution, but I wouldn’t think that should make that much difference.)
    That is strange. I’m not sure if everyone is comparing the same iOS version on both devices (in case that is a confounding variable).

  • Posts: 5

    Thanks @SugarRay for running the same tests on your devices!

    @John - are there any settings that I am overlooking for optimizing performance with my app?

  • John Admin Mod
    Posts: 762

    @rvmott I'm not exactly sure why the performance drops so much on a newer device. But I tried commenting out line 362 (an unnecessary call to setContext()) in the code you posted and ran it on my M1 Max (a very new device), and got significant performance gains.

    With setContext(): 105.4s
    Without setContext(): 8s

    That's a > 13x speed up! So what's happening here? I suspect you thought that image:set(x, y, c) needs setContext() to work (it doesn't). Calling it actually causes significant performance issues because of how image data is stored and transferred between the CPU and GPU.

    In order to enable direct access to pixel data from Lua we have to keep a local copy of the image in CPU memory. We do this by caching the data locally and using flags to keep track of the last time the image was manipulated on either the CPU (via setting pixels) or GPU (via setContext). If Lua has modified the image, we send it to the GPU; if the GPU has modified it, we read it back.

    Reading from the GPU is generally slower, and because we want the data right now we also have to stall the GPU (tell it to drop what it's doing and retrieve the image data right away). By setting a single pixel and calling setContext one after the other you get image data being copied back and forth multiple times without actually needing to. If you only ever want to directly set pixels you should use image:set() OR draw a pixel-sized rectangle via setContext, never both.

    As for the large difference in performance on newer devices, it could be a change in how things like glReadPixels and glFlush are handled on newer chip designs that gives worse performance for atypical usage (i.e. stalling and reading/writing the same image constantly). Apple doesn't support OpenGL officially anymore so I wouldn't be surprised if they broke something without realising it.
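
    The flag-based caching John describes above can be illustrated with a toy model in plain Lua. This is only a sketch for counting copies, not Codea's actual implementation: the CachedImage table, its fields, and the two pattern functions are all hypothetical names made up for this example, and the model assumes that any setContext call marks the GPU copy as the most recent one.

```lua
-- Toy model (NOT Codea's real code) of an image that has a CPU copy and a
-- GPU copy, plus a flag recording which side wrote to it last.
CachedImage = {}
CachedImage.__index = CachedImage

function CachedImage.new()
    return setmetatable({
        lastWriter = "none", -- "cpu", "gpu", or "none" when both copies agree
        uploads = 0,         -- CPU -> GPU copies
        readbacks = 0,       -- GPU -> CPU copies (the slow, stalling direction)
    }, CachedImage)
end

-- CPU-side access, standing in for image:get (writes == false)
-- or image:set (writes == true).
function CachedImage:cpuAccess(writes)
    if self.lastWriter == "gpu" then
        self.readbacks = self.readbacks + 1 -- must fetch the GPU's version first
        self.lastWriter = "none"
    end
    if writes then self.lastWriter = "cpu" end
end

-- GPU-side access, standing in for sprite(img) (writes == false)
-- or setContext(img) ... setContext() (writes == true).
function CachedImage:gpuAccess(writes)
    if self.lastWriter == "cpu" then
        self.uploads = self.uploads + 1 -- must send the CPU's version first
        self.lastWriter = "none"
    end
    if writes then self.lastWriter = "gpu" end
end

-- The pattern discussed in the thread: set one pixel, then call setContext.
function mixedPattern(n)
    local img = CachedImage.new()
    for _ = 1, n do
        img:cpuAccess(true)
        img:gpuAccess(true)
    end
    return img
end

-- The suggested alternative: only set pixels, then draw the image once.
function cpuOnlyPattern(n)
    local img = CachedImage.new()
    for _ = 1, n do
        img:cpuAccess(true)
    end
    img:gpuAccess(false)
    return img
end

local bad, good = mixedPattern(1000), cpuOnlyPattern(1000)
print("mixed:    uploads", bad.uploads, "readbacks", bad.readbacks)
print("cpu only: uploads", good.uploads, "readbacks", good.readbacks)
```

    In this model, 1000 iterations of the mixed pattern cost 1000 uploads and 999 readbacks, while the CPU-only pattern needs a single upload when the image is finally drawn. Real costs depend on image size and the driver, which the model does not attempt to capture.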

  • dave1707 Mod
    Posts: 9,977

    Here’s a Voronoi diagram I wrote that uses setContext. See how this compares on the different devices. On my iPad Air 3 it takes about 67 seconds. It shows a count at the top that stops at 250 then shows the seconds.

    -- Voronoi diagram
    
    viewer.mode=FULLSCREEN
    
    function setup()
        s=require("socket")
        img=image(WIDTH,HEIGHT)
        limit=30
        ms,mc,mr=math.sin,math.cos,math.rad
        tab,col={},{}
        fill(255)
        for z=1,limit do
            r,g,b=math.random(255),math.random(255),math.random(255)
            table.insert(col,color(r,g,b,255))
            x=math.random(50,WIDTH-50)
            y=math.random(50,HEIGHT-50)
            table.insert(tab,vec2(x,y))
            setContext(img)
            ellipse(x,y,16)
            setContext()
        end
        rad=8
        st=s:gettime()
    end
    
    function draw()
        background(0)
        sprite(img,WIDTH/2,HEIGHT/2)
        if rad<250 then
            incRad()
            en=s:gettime()
            fill(255)
            text(rad,WIDTH/2,HEIGHT-40)
        else
            fill(255)
            text(en-st,WIDTH/2,HEIGHT-40)
        end
    end
    
    function touched(t)
        if t.state==BEGAN and #tab<limit then
            table.insert(tab,vec2(t.x,t.y))
            setContext(img)
            ellipse(t.x,t.y,16)
            setContext()
        end
    end
    
    function incRad()
        rad=rad+1
        for a,b in pairs(tab)do
            drawCirc(b.x,b.y,col[a])
        end
    end
    
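    -- Draw one ring of radius rad around the seed at (xx,yy),
    -- colouring only the pixels that are still black.
    function drawCirc(xx,yy,c)
        setContext(img)
        for z=1,360 do
            -- integer offset of the ring point at z degrees
            local x=(mc(mr(z))*rad)//1
            local y=(ms(mr(z))*rad)//1
            if x+xx>0 and x+xx<WIDTH and y+yy>0 and y+yy<HEIGHT then
                -- pixel read on the CPU while img is also the setContext target
                local r,g,b,a=img:get((x+xx+1)//1,(y+yy+1)//1)
                if r+g+b==0 then -- black means no seed has claimed it yet
                    fill(c)
                    ellipse(x+xx,y+yy,6)
                end
            end
        end
        setContext()
    end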
    
  • Posts: 5

    Thanks @John for taking the time to diagnose the issue and for the insights into the unnecessary and costly setContext() calls, including the background on the CPU-GPU interactions. Codea has been a great platform for me to explore my hobby of coding fractal-based landscapes and simulations, which sometimes require manipulating pixels in multiple image maps simultaneously, and the background you have provided will guide me towards more efficient development moving forward.

  • John Admin Mod
    edited May 3 Posts: 762

    @dave1707 cool, I'll have to check that one out

    @rvmott No worries, I think it's pretty interesting when we run into some quirky hardware differences like this.

    I've done a bunch of tests with Codea 4 using compute shaders, which would let you run something like this in realtime. I've got an implementation of the Jump flood algorithm (JFA) for calculating signed distance fields which I use for drawing outlines in the scene editor

    Some info: https://blog.demofox.org/2016/02/29/fast-voronoi-diagrams-and-distance-dield-textures-on-the-gpu-with-the-jump-flooding-algorithm/

    It can also be done with regular shaders in Codea 3.x, it's just using the rasterisation pipeline instead
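
    For readers unfamiliar with the jump flooding idea mentioned above, here is a CPU-side sketch in plain Lua. It is not John's compute-shader implementation (which isn't shown in this thread). The function name jumpFlood and the grid layout are assumptions made for this example. Each pass lets every cell look at the 8 neighbours a fixed step away and adopt the closest seed any of them knows about, with the step halving on each pass.

```lua
-- Jump flooding on a w x h grid (CPU sketch, hypothetical helper name).
-- seeds is a list of {x = ..., y = ...} with 1-based integer coordinates.
-- Returns owner[y][x] = index of the (approximately) nearest seed.
function jumpFlood(w, h, seeds)
    local owner = {}
    for y = 1, h do
        owner[y] = {}
        for x = 1, w do owner[y][x] = 0 end -- 0 means "no seed known yet"
    end
    for i, s in ipairs(seeds) do owner[s.y][s.x] = i end

    -- squared distance from cell (x, y) to seed number i
    local function dist2(x, y, i)
        local s = seeds[i]
        return (x - s.x) ^ 2 + (y - s.y) ^ 2
    end

    -- start with the largest power of two below max(w, h)
    local step = 1
    while step * 2 < math.max(w, h) do step = step * 2 end

    while step >= 1 do
        local nxt = {} -- write into a fresh grid so a pass only reads old data
        for y = 1, h do
            nxt[y] = {}
            for x = 1, w do
                local best = owner[y][x]
                local bestD = best ~= 0 and dist2(x, y, best) or math.huge
                for dy = -step, step, step do
                    for dx = -step, step, step do
                        local nx, ny = x + dx, y + dy
                        if nx >= 1 and nx <= w and ny >= 1 and ny <= h then
                            local cand = owner[ny][nx]
                            if cand ~= 0 then
                                local d = dist2(x, y, cand)
                                if d < bestD then best, bestD = cand, d end
                            end
                        end
                    end
                end
                nxt[y][x] = best
            end
        end
        owner = nxt
        step = step // 2 -- integer halving; the loop ends after the step-1 pass
    end
    return owner
end

-- Example: three seeds on a 16 x 16 grid
local demoSeeds = { {x = 3, y = 4}, {x = 12, y = 9}, {x = 7, y = 15} }
local demoOwner = jumpFlood(16, 16, demoSeeds)
print("owner of cell (1,1):", demoOwner[1][1])
```

    The number of passes grows with the logarithm of the grid size rather than with the number of seeds, which is why the approach suits the GPU. The result is an approximation: a small fraction of cells can end up with a seed that is not quite the nearest one.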

  • Posts: 1,350

    @john M1 Max? How are you doing that?
