An e-mail to Apple’s cocoa-dev mailing list provoked a bit of discussion. The thread starts with this e-mail to cocoa-dev from Trygve.

Basically, Trygve wanted better performance from his code for taking frame grabs from movies and drawing those grabs as thumbnails to what I call a cover sheet. He was using NSImage to do the drawing, and profiling showed that his code was spending 90% of its time in a method called drawInRect.

There was some discussion as to whether using CoreGraphics and/or CoreImage, plus some sort of multi-threading, would help get the work done, so I decided to put together an Xcode project that builds a command line tool to try out the different possibilities. I used Apple’s avframegrabber sample project as a starting point.

My project and its branches

The command line tool I set up the Xcode project to build is called makecoversheet.

My various attempts to improve performance are on different branches of my git repository, which I’ve pushed up to GitHub.

Master branch: Uses CoreImage to scale down the images. I’ve not created any GCD queues to distribute work. The results for scaling frame grabs with CoreImage are reported here.
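To give a feel for the CoreImage approach, here is a minimal sketch of scaling a frame grab with the CILanczosScaleTransform filter. The helper name and parameters are my own, for illustration only; the code in the repository differs in its details.

    #import <QuartzCore/QuartzCore.h> // CoreImage lives in QuartzCore on OS X

    // Illustrative helper: scale a frame grab by 'scale' using CoreImage.
    static CGImageRef CreateScaledImageCI(CGImageRef frame, CGFloat scale,
                                          CIContext *context)
    {
        CIImage *input = [CIImage imageWithCGImage:frame];
        CIFilter *filter = [CIFilter filterWithName:@"CILanczosScaleTransform"];
        [filter setValue:input forKey:kCIInputImageKey];
        [filter setValue:@(scale) forKey:kCIInputScaleKey];
        [filter setValue:@1.0 forKey:kCIInputAspectRatioKey];
        CIImage *output = [filter valueForKey:kCIOutputImageKey];
        // The actual rendering happens here, typically on the GPU.
        return [context createCGImage:output fromRect:[output extent]];
    }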

cgandgcd branch: Uses CoreGraphics to scale the images and GCD queues to distribute the scaling work. There is a #define in YVSMakeCoverSheet.m called “USE_COREIMAGE” which is set to 0 but can be set to 1 to see the performance difference. The results for scaling frame grabs with CoreGraphics and GCD queues are reported here.
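For comparison, the CoreGraphics route draws the frame into a smaller bitmap context. A sketch of the idea, with illustrative names and thumbnail sizes, together with the dispatch_async call that pushes each scaling job onto a global concurrent queue:

    #import <ApplicationServices/ApplicationServices.h>

    // Illustrative helper: scale a frame by drawing it into a bitmap context.
    static CGImageRef CreateScaledImageCG(CGImageRef frame,
                                          size_t width, size_t height)
    {
        CGColorSpaceRef space = CGColorSpaceCreateDeviceRGB();
        CGContextRef ctx = CGBitmapContextCreate(NULL, width, height, 8, 0, space,
                               (CGBitmapInfo)kCGImageAlphaPremultipliedFirst);
        CGColorSpaceRelease(space);
        CGContextSetInterpolationQuality(ctx, kCGInterpolationHigh);
        CGContextDrawImage(ctx, CGRectMake(0.0, 0.0, width, height), frame);
        CGImageRef scaled = CGBitmapContextCreateImage(ctx);
        CGContextRelease(ctx);
        return scaled; // caller is responsible for releasing
    }

    // Each grabbed frame can then be scaled on a concurrent queue:
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        CGImageRef thumb = CreateScaledImageCG(frame, 160, 90);
        // ... draw thumb onto the cover sheet, then CGImageRelease(thumb).
    });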

generateimagesasynch branch: Uses GCD queues to create multiple AVAssetImageGenerator objects, which are used to obtain the movie frames, and GCD queues to distribute the CoreGraphics scaling of the images. The “USE_COREIMAGE” #define is still available for trying out options. The results for these tests, with a few comments, are reported here.
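The heart of this branch is AVAssetImageGenerator’s generateCGImagesAsynchronouslyForTimes:completionHandler: method. A cut-down sketch, with illustrative variable names, of how one generator is set up and asked for its share of the frames:

    #import <AVFoundation/AVFoundation.h>

    AVURLAsset *asset = [AVURLAsset URLAssetWithURL:movieURL options:nil];
    AVAssetImageGenerator *generator =
        [AVAssetImageGenerator assetImageGeneratorWithAsset:asset];
    generator.requestedTimeToleranceBefore = kCMTimeZero;
    generator.requestedTimeToleranceAfter = kCMTimeZero;

    // times is an NSArray of NSValue-wrapped CMTimes for the wanted frames.
    [generator generateCGImagesAsynchronouslyForTimes:times
        completionHandler:^(CMTime requestedTime, CGImageRef image,
                            CMTime actualTime,
                            AVAssetImageGeneratorResult result, NSError *error) {
            if (result == AVAssetImageGeneratorSucceeded) {
                // Hand image off to the scaling code on another queue.
            }
        }];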

usensoperation branch: Replaces the GCD queue used to create multiple AVAssetImageGenerator objects with an NSOperationQueue whose maximum number of concurrent operations is set to the number of processors. Otherwise this is the same as the generateimagesasynch branch. The results for the tests using NSOperationQueue are here.
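Setting the queue up is straightforward; the one detail worth showing is pinning the maximum number of concurrent operations to the processor count:

    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    queue.maxConcurrentOperationCount =
        (NSInteger)[[NSProcessInfo processInfo] processorCount];

    [queue addOperationWithBlock:^{
        // Create an AVAssetImageGenerator and request its share of the frames.
    }];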

nsoperationlimited branch: Similar to the usensoperation branch, except that the number of block operations running on the NSOperationQueue is limited to a maximum of 4 programmatically, rather than creating all the operations up front and only limiting how many can run at once. This produced good results, getting the benefits of the random-access SSD without making the performance of the hdd any worse. The results for the tests using NSOperationQueue with a limited number of operations are here.
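One way to impose such a limit, not necessarily how the branch implements it, is to gate submission of the block operations with a counting semaphore so that no more than four are ever in flight:

    dispatch_semaphore_t gate = dispatch_semaphore_create(4);

    for (NSArray *frameTimes in timesPerGenerator) { // illustrative partitioning
        dispatch_semaphore_wait(gate, DISPATCH_TIME_FOREVER);
        [queue addOperationWithBlock:^{
            // Generate and scale the frames for this batch...
            dispatch_semaphore_signal(gate);
        }];
    }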

I pretty much started with the master branch and then moved forward from one branch to the next as listed above.

Observations

Starting with a build from the last commit on the master branch, I got quite different behaviour to Trygve: CPU usage was low. Looking in Activity Monitor, every time the makecoversheet command line tool was run a service called VTDecodeXPCService would start and take up about 2 to 3 times more CPU than the command line tool itself, but CPU usage was never high.

In my testing I got caught out a few times. First, when sitting in the comfy chair with my laptop, running the makecoversheet tool with CoreImage produced a brief lag while the discrete graphics card was brought into use. At my desk, with the laptop plugged into an external monitor, the discrete graphics is always in operation and there is no delay, but I think the GPU then has fewer free resources, so it can handle fewer requests. Secondly, when testing against the movie file on the external hdd, the second and third runs of the same test were slightly faster; perhaps some of the movie file data had been moved to a disk cache during the first run. Keeping the testing environment consistent is tricky.

Forgetting to set the requested time tolerances on the AVAssetImageGenerator object for movie frame access produced stupidly fast results, but with many duplicated frame grabs.
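The fix is the two tolerance lines already shown in the generator sketch above. With the default (infinite) tolerances, AVAssetImageGenerator is free to return the nearest convenient frame, usually a keyframe, which is fast but means several requested times can resolve to the same image:

    // Without these, nearby requested times can all snap to the same keyframe.
    generator.requestedTimeToleranceBefore = kCMTimeZero;
    generator.requestedTimeToleranceAfter = kCMTimeZero;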

Conclusions

Trygve’s profiling suggested to him that his issue was with scaling the images whilst drawing. My first approach was to use CoreImage to do the image scaling, which resulted in a large amount of idle time (master branch). My next approach was to use GCD queues to distribute the scaling work with dispatch_async, which didn’t improve the situation. If I’d looked more closely at the profile data from running the tool built from the master branch, I would have seen that most of the time was spent seeking within and reading the movie file. No amount of improving drawing performance was going to help with that.

The next approach was to find a way to speed up getting the frame data. Jim Crate suggested creating multiple AVAssetImageGenerator objects and then requesting frame grabs with generateCGImagesAsynchronouslyForTimes. This helped when reading data from my internal SSD, but performance plummeted when reading a movie file on my external hdd; in some cases the makecoversheet command line tool never finished and became stuck. Many threads were also being created to distribute the work, which is not an efficient use of resources. By trial and error, limiting the number of created block operations let me more than double scaling performance when the movie file was on my SSD, and gave a 60% increase in speed when the movie file was on the hdd. In the case of the SSD this also means my computer is running flat out, with sub-1% idle time. I am more than happy with this result, and happy for others to take advantage of my experiments.