Archive for category Cocoa

The Pleasure and Pain of AVAssetWriterInputPixelBufferAdaptor

For a class with such a small interface it seems remarkable that it deserves a blog post all of its own. But a mixture of appreciation for what it does and the pain I have had working with it demands some catharsis.

AVAssetWriterInputPixelBufferAdaptor objects take in pixel data and provide it to an AVAssetWriterInput in a format suitable for writing that pixel data into a movie file.
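For context, the basic shape of using the class looks something like this. This is a minimal sketch of my own rather than code from the post: writerInput is assumed to be an AVAssetWriterInput already added to an AVAssetWriter that has started a session, and frameTime is the presentation time for the frame being appended.

// Pixel buffer attributes describing the frames we will supply to the adaptor.
NSDictionary *attributes = @{
    (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA),
    (id)kCVPixelBufferWidthKey : @1280,
    (id)kCVPixelBufferHeightKey : @720 };
AVAssetWriterInputPixelBufferAdaptor *adaptor = [AVAssetWriterInputPixelBufferAdaptor
    assetWriterInputPixelBufferAdaptorWithAssetWriterInput:writerInput
                               sourcePixelBufferAttributes:attributes];

// Once the writer session has started the adaptor vends a pixel buffer pool.
CVPixelBufferRef pixelBuffer = NULL;
CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault, adaptor.pixelBufferPool, &pixelBuffer);
// ...draw or copy the frame's pixels into pixelBuffer here...
if (writerInput.readyForMoreMediaData) {
    [adaptor appendPixelBuffer:pixelBuffer withPresentationTime:frameTime];
}
CVPixelBufferRelease(pixelBuffer);

Getting buffers from the adaptor's pixelBufferPool, rather than creating CVPixelBuffers yourself, is the route Apple recommends.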



Drawing rotated text with CoreText is broken

Here is a video where the text is being rotated at different angles. You can see from the flickering that at certain angles the text just doesn’t get drawn.

 

And here is a video where I’ve created a bitmap of the text drawn unrotated and then I draw the bitmap rotated at different angles. Which works:

 

I am drawing the Core Text within a path rather than from a point and I think that is the issue. I have seen in other situations that when drawing multi-line wrapping text in a path that is a column, if the drawing happens while the context is rotated then at certain angles the line wrapping for the first line of text behaves oddly.

I get exactly the same behaviour on iOS as I do on OS X.

This is definitely an issue of drawing the text within a path, drawing the text from a point works as expected.


AV Foundation editing movie file content

First, a link to a useful new Technote for AV Foundation released 1 December 2014.

TechNote 2404 – a short note on AV Foundation API added in Yosemite; see in particular the sections on AVSampleCursor and AVSampleBufferGenerator.

Now on to WWDC 2013 session 612, Advanced Editing with AV Foundation, on this page.

Talk Overview

  • Custom video compositing
    • Existing architecture
    • New custom video compositing
    • Choosing pixel formats
    • Tweening
    • Performance
  • Debugging compositions
    • Common pitfalls

Existing Architecture

AV Foundation editing today

  • Available since iOS 4.0 and OS X Lion
  • Used in video editing apps from Apple and in the store
  • Video editing
    • Temporal composition
    • Video composition
    • Audio mixing

Custom Video Compositor

What is a Video Compositor?

  • A unit of video mixing code
  • Receives multiple source frames
  • Blends or transforms pixels
  • Delivers single output frame
  • Part of the composition architecture

What is the composition model?

[Image: AVCompositionModel]

Instruction objects in an AVVideoComposition:

[Image: AVVideoInstructions]

The Video compositor takes multiple source frames in and produces a single frame out.

For example, we can encode a dissolve as a property of an instruction, such as an opacity ramp from 1 down to 0.

New Custom Video Compositing

As of Mavericks there is a new custom compositing API: you can replace the built-in compositor with your own. Instructions with mixing parameters are bundled up together with the source frames into a request object. You implement the AVVideoCompositing protocol, whose main method receives the new AVAsynchronousVideoCompositionRequest object, and you also implement the AVVideoCompositionInstruction protocol.

func startVideoCompositionRequest(_ asyncVideoCompositionRequest: AVAsynchronousVideoCompositionRequest!)

Once you have rendered the frame you deliver it with

func finishWithComposedVideoFrame(_ composedVideoFrame: CVPixelBuffer!)

But you can also finish with one of:

func finishCancelledRequest()
func finishWithError(_ error: NSError!)

Choosing Pixel Formats

  • Source Pixel Formats – small subset
    • YUV 8-bit 4:2:0
    • YUV 8-bit 4:4:4
    • YUV 10-bit 4:2:2
    • YUV 10-bit 4:4:4
    • RGB 24-bit
    • BGRA 32-bit
    • ARGB 32-bit
    • ABGR 32-bit

When decoding H.264 video your source pixel format is typically YUV 8-bit 4:2:0.

You may not be able to deal with that format, or with whatever the native format of the source pixels happens to be. You can specify the format your custom video compositor requires using the method sourcePixelBufferAttributes, which should return a dictionary. The key kCVPixelBufferPixelFormatTypeKey should be specified, and it takes an array of possible pixel formats; if you want the compositor to work with a Core Animation video layer then you should provide a single entry in the array with the value kCVPixelFormatType_32BGRA.

This will cause the source frames to be converted into the format required by your custom compositor.
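For example, a compositor that prefers the decoder's native 4:2:0 format but can also cope with BGRA might return something like the following. This is my own sketch rather than code from the session:

- (NSDictionary *)sourcePixelBufferAttributes {
    // List every pixel format the compositor can handle; frames are only
    // converted when the source format is not in this list.
    return @{ (id)kCVPixelBufferPixelFormatTypeKey :
                @[ @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange),
                   @(kCVPixelFormatType_32BGRA) ] };
}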

Output Pixel Formats

For the output pixel formats there is also a method requiredPixelBufferAttributesForRenderContext where you specify the formats your custom renderer can provide.

To get hold of a new empty frame to render into, we go back to the request object and ask it for the render context, which contains information about the aspect ratio and size we are rendering to as well as the required pixel format. We ask the render context for a new pixel buffer, which comes from a managed pool, and we can then render into it to produce our dissolve.

The Hello World equivalent for a custom compositor

@interface MyCompositor1 : NSObject <AVVideoCompositing>
@end

@implementation MyCompositor1

// Sources as BGRA please
- (NSDictionary *)sourcePixelBufferAttributes {
    return @{ (id)kCVPixelBufferPixelFormatTypeKey :
            @[ @(kCVPixelFormatType_32BGRA) ] };
}

// We'll output BGRA
- (NSDictionary *)requiredPixelBufferAttributesForRenderContext {
    return @{ (id)kCVPixelBufferPixelFormatTypeKey :
            @[ @(kCVPixelFormatType_32BGRA) ] };
}

// Required by the protocol; nothing needs doing for this simple compositor.
- (void)renderContextChanged:(AVVideoCompositionRenderContext *)newRenderContext {
}

// Render a frame - the action happens here on receiving a request object.
- (void)startVideoCompositionRequest:(AVAsynchronousVideoCompositionRequest *)request {
    if ([request.sourceTrackIDs count] != 2) {
        // A real compositor would finish the request with an error here.
        return;
    }
    // There'll be an attempt to back the pixel buffers with an IOSurface which means
    // that they may be in GPU memory.
    CVPixelBufferRef srcPixelsBackground = [request sourceFrameByTrackID:[request.sourceTrackIDs[0] intValue]];
    CVPixelBufferRef srcPixelsForeground = [request sourceFrameByTrackID:[request.sourceTrackIDs[1] intValue]];
    CVPixelBufferRef outPixels = [[request renderContext] newPixelBuffer];

    // render - this is really only in its own scope so that code folding is possible in the demo.
    {
        // Because we want to manipulate pixels ourselves we lock the pixel buffer
        // base addresses, which makes sure the pixel data is in main memory so
        // that we can access it.
        CVPixelBufferLockBaseAddress(srcPixelsForeground, kCVPixelBufferLock_ReadOnly);
        CVPixelBufferLockBaseAddress(srcPixelsBackground, kCVPixelBufferLock_ReadOnly);
        CVPixelBufferLockBaseAddress(outPixels, 0);

        // Calculate the tween value from how far we are through the instruction's time range.
        CMTime renderTime = request.compositionTime;
        CMTimeRange range = request.videoCompositionInstruction.timeRange;
        CMTime elapsed = CMTimeSubtract(renderTime, range.start);
        float tween = CMTimeGetSeconds(elapsed) / CMTimeGetSeconds(range.duration);

        size_t height = CVPixelBufferGetHeight(srcPixelsBackground);
        size_t foregroundBytesPerRow = CVPixelBufferGetBytesPerRow(srcPixelsForeground);
        size_t backgroundBytesPerRow = CVPixelBufferGetBytesPerRow(srcPixelsBackground);
        size_t outBytesPerRow = CVPixelBufferGetBytesPerRow(outPixels);
        const char *foregroundRow = CVPixelBufferGetBaseAddress(srcPixelsForeground);
        const char *backgroundRow = CVPixelBufferGetBaseAddress(srcPixelsBackground);
        char *outRow = CVPixelBufferGetBaseAddress(outPixels);
        for (size_t y = 0; y < height; ++y)
        {
            // Blend a row of foreground into background using tween as the mix value,
            // then advance each row pointer by its bytes-per-row (details omitted).
        }
        CVPixelBufferUnlockBaseAddress(srcPixelsForeground, kCVPixelBufferLock_ReadOnly);
        CVPixelBufferUnlockBaseAddress(srcPixelsBackground, kCVPixelBufferLock_ReadOnly);
        CVPixelBufferUnlockBaseAddress(outPixels, 0);
    }

    // deliver output
    [request finishWithComposedVideoFrame:outPixels];
    CFRelease(outPixels);
}

@end

Tweening

Tweening is the parameterisation of the transition from one state to another. For a dissolve transition, where the generated output starts with images from one video track and ends with images from the other, the tween is an opacity ramp whose input is time.

[Image: TweeningAVVideoCompositionOpacityRamp]

The image above shows the two input video tracks and the opacity ramp. The image below shows the calculation of the tween value once you are 10% of the way through the transition. In this case the output video frame will display the first video at 90% opacity and the second video at 10%.

[Image: TweeningPart2]
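To make the arithmetic concrete, this is the kind of per-pixel blend the elided loop in the compositor above would perform for BGRA data. It is my own illustration; the session leaves the loop body out:

// out = (1 - tween) * background + tween * foreground, per channel, for one row of
// 32-bit BGRA pixels. At tween = 0.1 the output is 90% background and 10% foreground.
static void MYBlendRowBGRA(const uint8_t *backgroundRow, const uint8_t *foregroundRow,
                           uint8_t *outRow, size_t width, float tween)
{
    for (size_t i = 0; i < width * 4; ++i) {
        outRow[i] = (uint8_t)((1.0f - tween) * backgroundRow[i] + tween * foregroundRow[i]);
    }
}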

Performance

Properties of the AVVideoCompositionInstruction protocol help the compositor optimise performance.

@protocol AVVideoCompositionInstruction <NSObject>

@property (nonatomic, readonly) CMPersistentTrackID passthroughTrackID;
@property (nonatomic, readonly) NSArray *requiredSourceTrackIDs;
@property (nonatomic, readonly) BOOL containsTweening;

@end

By setting these values appropriately there are performance wins to be had.

passthroughTrackID

Some instructions are simpler than others; they might take just one source and often not even change the frames. For example, in the frames leading up to a transition, the output frames are just the input frames from a particular track. If in the instruction you set the passthroughTrackID to the ID of that track then the compositor will be bypassed.

requiredSourceTrackIDs

Use this to specify the required tracks and to indicate that we do want the compositor to be called. If we have just a single track but we want to modify the contents of the frames in some way then requiredSourceTrackIDs will contain just that single track. Leaving requiredSourceTrackIDs set to nil means deliver all frames from all tracks.

containsTweening

Even if the source frames are the same over time, two static images for example, containsTweening needs to be set to YES if we want a picture-in-picture effect where the smaller image moves within the bigger picture: we have time-extended sources but output that changes every frame. If the smaller image doesn't move, then leaving containsTweening set to YES just re-renders identical output, so containsTweening should instead be set to NO. Then, after the initial frame is rendered, the compositor can optimise by reusing the identical output. A sketch of an instruction object setting these properties follows.
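As a concrete illustration, and my own sketch rather than code from the session, a cross-dissolve instruction object adopting the protocol might look like this. The time range and the two track IDs are assumed to come from wherever the composition is being built:

@interface MyCrossDissolveInstruction : NSObject <AVVideoCompositionInstruction>
// Redeclare the protocol's readonly properties as readwrite for convenience.
@property (nonatomic) CMTimeRange timeRange;
@property (nonatomic) BOOL enablePostProcessing;
@property (nonatomic) BOOL containsTweening;
@property (nonatomic, strong) NSArray *requiredSourceTrackIDs;
@property (nonatomic) CMPersistentTrackID passthroughTrackID;
@end

@implementation MyCrossDissolveInstruction
@end

// Configure an instruction for the dissolve: both tracks are required, nothing is
// passed through, and the opacity ramp means the output changes over time.
MyCrossDissolveInstruction *dissolve = [[MyCrossDissolveInstruction alloc] init];
dissolve.timeRange = transitionTimeRange;
dissolve.requiredSourceTrackIDs = @[ @(foregroundTrackID), @(backgroundTrackID) ];
dissolve.passthroughTrackID = kCMPersistentTrackID_Invalid;
dissolve.containsTweening = YES;
dissolve.enablePostProcessing = NO;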

Pixel buffer formats

  • Performance hit converting sources
    • H.264 decodes to YUV 4:2:0 natively.
    • For best performance, work in YUV 4:2:0
  • Output format less critical; the display can accept multiple formats, for example:
    • BGRA
    • YUV 4:2:0

The AVCustomEdit example code is available here.

Debugging Compositions

  • Common pitfalls
    • Gaps between segments
      • Results in black frames or hanging onto the last frame.
    • Misaligned track segments
      • Rounding errors when working with CMTime etc.
      • Results in a short gap between the end of one segment & the beginning of the next.
    • Misaligned layer instructions
      • Tracks/layers are rendered in the wrong order.
    • Misaligned opacity/audio ramps
      • Opacity/audio ramps over- or undershoot their final value.
    • Bogus layer transforms
      • Errors in your transformation matrix mean layers disappear
      • or end up outside the boundaries of the output frame.

Being able to view the structure of the composition is useful and this is where AVCompositionDebugView comes in.

There is also the composition validation API, which you can adopt: you receive callbacks when something in the video composition appears not to be correct, as sketched below.
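Here is a rough sketch of what adopting the validation API looks like. The videoComposition and asset are assumed to exist already, and only a couple of the delegate callbacks are implemented:

@interface MyCompositionValidator : NSObject <AVVideoCompositionValidationHandling>
@end

@implementation MyCompositionValidator
- (BOOL)videoComposition:(AVVideoComposition *)videoComposition
        shouldContinueValidatingAfterFindingEmptyTimeRange:(CMTimeRange)timeRange {
    NSLog(@"Gap in the composition starting at %f seconds", CMTimeGetSeconds(timeRange.start));
    return YES; // keep validating so that every problem gets reported
}

- (BOOL)videoComposition:(AVVideoComposition *)videoComposition
        shouldContinueValidatingAfterFindingInvalidValueForKey:(NSString *)key {
    NSLog(@"Invalid value for key: %@", key);
    return YES;
}
@end

// Validate the composition over the whole duration of the asset.
MyCompositionValidator *validator = [[MyCompositionValidator alloc] init];
BOOL isValid = [videoComposition isValidForAsset:asset
                                       timeRange:CMTimeRangeMake(kCMTimeZero, asset.duration)
                              validationDelegate:validator];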


Image generation performance

I’m wanting random access to individual movie frames so I’m using the AVAssetImageGenerator class, but for this part of the project using generateCGImagesAsynchronously is not appropriate. Now clearly performance is not the crucial component here, but at the same time you don’t want to do something that is stupidly slow.

I’d like not to have to hold onto an AVAssetImageGenerator object to use each time I need an image, but instead just create one at the time an image is requested. So I thought I’d find out the penalty of creating an AVAssetImageGenerator object each time.

To compare the performance I added some performance tests and ran those tests on my i7 MBP with an SSD and on my iPad Mini 2. I’ve confirmed that the images are generated. See code at end.

On my iPad Mini 2 the measure block in performance test 1 took between 0.25 and 0.45 seconds to run, with most results clustering around 0.45 seconds; it was the second run that returned the 0.25 second result. Performance test 2 on the iPad Mini 2 was much more consistent, with times ranging between 0.5 and 0.52 seconds. But reversing the order in which the tests run reverses these results. I’m not sure what to make of this, but in relation to what I’m testing for I feel comfortable that the cost of creating an AVAssetImageGenerator object before generating an image is minimal in comparison to generating the CGImage.

Strangely my MBP is slower, but doesn’t have the variation observed on the iPad. The measure block in both performance tests takes about 1.1 seconds.

Whatever performance difference there is in keeping an AVAssetImageGenerator object around or not is inconsequential.

    func testAVAssetImageGeneratorPerformance1() {
        let options = [
            AVURLAssetPreferPreciseDurationAndTimingKey: true,
            AVURLAssetReferenceRestrictionsKey:
                AVAssetReferenceRestrictions.RestrictionForbidNone.rawValue
        ]
        
        let asset = AVURLAsset(URL: movieURL, options: options)!
        self.measureBlock() {
            let generator = AVAssetImageGenerator(asset: asset)
            var actualTime:CMTime = CMTimeMake(0, 600)
            for i in 0..<10 {
                let image = generator.copyCGImageAtTime(
                    CMTimeMake(Int64(i) * 600 + 30, 600),
                    actualTime: &actualTime, error: nil)
            }
        }
    }
    
    func functionToTestPerformance(#movieAsset:AVURLAsset, index:Int) -> Void {
        let generator = AVAssetImageGenerator(asset: movieAsset)
        var actualTime:CMTime = CMTimeMake(0, 600)
        let image = generator.copyCGImageAtTime(
            CMTimeMake(Int64(index) * 600 + 30, 600),
            actualTime: &actualTime, error: nil)
    }
    
    func testAVAssetImageGeneratorPerformance2() {
        let options = [
            AVURLAssetPreferPreciseDurationAndTimingKey: true,
            AVURLAssetReferenceRestrictionsKey:
                AVAssetReferenceRestrictions.RestrictionForbidNone.rawValue
        ]
        
        let asset = AVURLAsset(URL: movieURL, options: options)!
        self.measureBlock() {
            for i in 0..<10 {
                self.functionToTestPerformance(movieAsset: asset, index: i)
            }
        }
    }


Getting tracks from an AVAsset

I’ve been playing around with the AVAsset AVFoundation API.

The AVAsset object is at the core of representing an imported movie. An AVAssetTrack is AVFoundation’s representation of a track in a movie. There are multiple ways to get AVAssetTracks from an AVAsset.

You can get a list of all the tracks:

let movie:AVAsset = ...
let tracks:[AVAssetTrack] = movie.tracks

You can get a list of tracks with a specific characteristic, for example a visual characteristic:

let movie:AVAsset = ...
let tracks:[AVAssetTrack] = movie.tracksWithMediaCharacteristic(AVMediaCharacteristicVisual)

Or you can get a list of tracks which have a specific media type, for example audio:

let movie:AVAsset = ...
let tracks:[AVAssetTrack] = movie.tracksWithMediaType(AVMediaTypeAudio)

You can obtain a single AVAssetTrack object if you know its persistent track identifier value.

The persistent track identifier is of type CMPersistentTrackID, which is a 32-bit integer typedef, and the invalid track reference kCMPersistentTrackID_Invalid is an anonymous enum with value 0.

Unfortunately the only way to get the track ID of a track in an imported movie is by querying an AVAssetTrack object, so the persistent track ID is useful when you later want to reference a track that you have previously identified.

From what I understand AVAssetTrack objects are fairly lightweight, so keeping a list of AVAssetTrack objects is not going to be too much of a drain, but you might still prefer to keep a list of persistent track ID values and request an AVAssetTrack object when you need it rather than holding onto a reference to an AVAssetTrack object.

To get a track using the track’s persistent identifier:

let track:AVAssetTrack = movie.trackWithTrackID(2)

Tracks have segments. A segment specifies when one piece of content in a track starts and finishes within the time range of the track, and each segment contains a time mapping between the source and the target. You can get a list of all the segments in a track; this will often be a list of one segment which lasts the full length of the track, but that is not necessarily the case.

To get a list of segments from a track:

let segments = track.segments

You can get the segment that corresponds to a specific track time:

let trackTime = CMTimeMake(60000, 600)
let segment = track.segmentForTrackTime(trackTime)

I’ve created a gist which is a very simple command line tool written in swift demonstrating this blog post.


Thinking about my tests

I’ve installed Yosemite and of course the first thing I did was to run my tests.

Almost every test failed. Generated images are all different. They look the same to my poor eyesight, but pixel values can be quite different: the compare tolerance had to be increased to 26* from the 0 that was previously enough for an image to be identified as the same. I had previously only needed to do this when comparing images created from windows on different monitors. I think perhaps I need to have a think about exactly what it is I’m testing. These tests have saved me a lot of time and given me confidence that I’ve not been breaking stuff, but for so many to break with an OS upgrade doesn’t help.

For now, the failure of the tests beyond the image generation differences described above has informed me about the following changes to ImageIO and the CoreImage filters.

Information returned about functionality provided by ImageIO and CoreImage

ImageIO can now import three new formats: “public.pbm”, “public.pvr”, “com.apple.rjpeg”
ImageIO has lost one import format: “public.xbitmap-image”

I’ve no idea what these formats are and I’ve been unsuccessful at finding information about them.

ImageIO has added export formats: “public.pbm”, “public.pvr”, “com.apple.rjpeg”

Apple has added these new CoreImage filters:

CIAccordionFoldTransition, CIAztecCodeGenerator, CICode128BarcodeGenerator, CIDivideBlendMode, CILinearBurnBlendMode, CILinearDodgeBlendMode, CILinearToSRGBToneCurve, CIMaskedVariableBlur, CIPerspectiveCorrection, CIPinLightBlendMode, CISRGBToneCurveToLinear, CISubtractBlendMode

There are minor configuration or filter property changes to the filters listed below with a brief description of the change:

  • CIBarsSwipeTransition inputAngle given updated values for default and max. Identity attributes removed for inputWidth and inputBarOffset.
  • CIVignetteEffect inputIntensity slider min changed from 0 to -1.
  • CIQRCodeGenerator has spaces added to description of one property, and a description added for another.
  • CILanczosScaleTransform has a fix for the filter display name.
  • CIHighlightShadowAdjust inputRadius has minimum slider value changed from 1 to 0.
  • CICMYKHalftone inputWidth attribute minimum changed from 2 to -2. The inputSharpness attribute type is CIAttributeTypeDistance not CIAttributeTypeScalar.
  • CICircleSplashDistortion inputRadius has a new identity attribute with value 0.1
  • CIBumpDistortionLinear inputScale, inputRadius and inputCenter given slightly more rational default values.
  • CIBumpDistortion inputScale, and inputRadius are given slightly more rational defaults.

*This is comparing images created from an 8 bit per color component bitmap context. So out of a range of 256 possible values, images generated on Mavericks compared to ones generated on Yosemite differ by up to 26 of those 256 values. That’s huge.

Core Image Filter Rendering. Performance & color profiles

The Apple documentation for rendering a Core Image filter chain notes that allowing the filter chain to render in the generic linear color space is faster: if you need better performance and are willing to trade it off against better color matching, then rendering the filter chain in the generic linear color space should be the faster option.

I thought I’d better look at what the impact of this was, both for performance and for color matching. I also wanted to see what difference it made whether the Core Graphics context that the filter chain rendered to was created with an sRGB color profile or a Generic Linear RGB profile, when the context bitmap was saved out to an image file.

All the tests were done on my laptop with the following configuration:

OS: Mavericks 10.9.2
System information: MacBookPro non retina, model: MacBookPro9,1
Chipset Model:	NVIDIA GeForce GT 650M 500MByte.
Chipset Model:	Intel HD Graphics 4000
A 512GByte SSD, 16GByte RAM.

I installed the gfxCardStatus tool some time ago, which allows me to manually switch which card to use, and also informs me when the system automatically changes which card is in use. I used to get changes reported regularly, but after one of the Mavericks updates this happened much less; since that update the only consistent way for the discrete card to be switched on automatically by the system has been to have an external monitor plugged in. I think the OS is trying much harder to keep the discrete graphics card turned off.

I have the NSSupportsAutomaticGraphicsSwitching key in my Info.plist set to YES. I have tried setting the value to NO, and if I run the tests then, as long as software rendering is not specified, I’m informed that the system has turned the discrete graphics card on, but the CoreImage filter render performance is still poor. The consequence is that I’m not really sure the discrete graphics card is being used for these tests. Perhaps I’d get different results as to whether GPU rendering or software rendering was faster if I had a more complex filter chain, so what I might be seeing here is the time needed to push the data to the graphics card and pull it back again dominating the timing results.

First up, when comparing images where the only difference in image generation is whether they were rendered to a CGContext with an sRGB profile or a Generic Linear RGB profile, the images look identical when I view them in Preview. The reported profiles are different though: the image generated from a context with Generic Linear RGB has a reported profile of Generic HDR profile, while the image from a context with an sRGB profile has a reported profile of sRGB IEC61966-2.1.

When the filter chain has the straighten filter and it rotates the image 180 degrees the colors of the output image are exactly the same as the input image when viewed in Preview, no matter the options for generating the output image.

When the filter chain has the box blur filter applied with a radius of 10 pixels, the image rendered in the Generic Linear RGB profile is lighter than the one rendered using the sRGB profile when viewing the output images in Preview. The image rendered using sRGB looks to be a better match for the original colors of the image; the Generic Linear RGB profile appears to lighten the image. The color change is not large and would probably be acceptable for real-time rendering purposes.

Setting kCIContextUseSoftwareRenderer to YES or NO when creating the CIContext makes no difference in terms of the color changes.
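For reference, the context setup being compared looks roughly like this. It is a sketch of my own rather than the actual test code; cgContext, inputImage, destRect and srcRect are assumed to exist, and you would swap the working color space between linear and sRGB for the two cases:

// Working color space: generic linear RGB here, an sRGB color space for the other case.
CGColorSpaceRef workingSpace = CGColorSpaceCreateWithName(kCGColorSpaceGenericRGBLinear);
NSDictionary *ciOptions = @{
    kCIContextWorkingColorSpace : (__bridge id)workingSpace,
    kCIContextUseSoftwareRenderer : @NO }; // @YES forces the software renderer
CIContext *ciContext = [CIContext contextWithCGContext:cgContext options:ciOptions];

CIFilter *boxBlur = [CIFilter filterWithName:@"CIBoxBlur"];
[boxBlur setValue:inputImage forKey:kCIInputImageKey];
[boxBlur setValue:@10.0 forKey:kCIInputRadiusKey];

// Time 200 renders of the filter chain into the Core Graphics context.
CFAbsoluteTime startTime = CFAbsoluteTimeGetCurrent();
for (NSInteger i = 0; i < 200; ++i) {
    [ciContext drawImage:boxBlur.outputImage inRect:destRect fromRect:srcRect];
}
NSLog(@"200 renders took %f seconds", CFAbsoluteTimeGetCurrent() - startTime);
CGColorSpaceRelease(workingSpace);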

However I get the opposite of what I’d expect with speed.

Asking the filter chain with filter CIBoxBlur with radius of 10 to render 200 times to a Core Graphics context with a sRGB color profile:

Software render using sRGB profile: 4.1 seconds
Software render using Linear Generic RGB profile: 5.3 seconds
GPU render using sRGB profile: 7.0 seconds
GPU render using Linear Generic RGB profile: 7.5 seconds

If I create a Core Graphics context with a Generic Linear RGB color profile then:

Software render using sRGB profile: 4.0 seconds
Software render using Linear Generic RGB profile: 5.3 seconds
GPU render using sRGB profile: 7.3 seconds
GPU render using Linear Generic RGB profile: 7.7 seconds

These results are completely 180º turned around from the results that I’d expect. If I were to accept them as unquestioned truth then I’d have to decide to always just work using the sRGB profile and to do all rendering via software, and not worry about using the GPU unless I needed to offload work from the CPU.

A later observation (Friday 2nd May 2014): when drawing text into a bitmap context and running off battery power, I’m informed that the system has switched temporarily to using the discrete graphics card and then informed soon after that it has switched back.


MovingImages CoreImage Transition Filter Example

I’ve written a number of ruby scripts that use MovingImages. One of the recent ones takes advantage of the CoreImage filter functionality that I’ve recently hooked into MovingImages. You’ll get to see this in the second alpha release, which I’m pleased to say will be out soon.

The script is called exactly as shown below:

./dotransition --count 30 --sourceimage "/Users/ktam/Pictures/20140422 Pictures/DSC01625Small2.JPG" --destinationimage "/Users/ktam/Pictures/20140422 Pictures/DSC01625Small2.JPG" --basename PageCurl --outputdir "~/Desktop/deleteme/" --backsideimage "/Users/ktam/Pictures/20140422 Pictures/DSC01625SmallCropped.jpg" --angle=-3.07 --transitionfilter CIPageCurlTransition --extent 0,0,463,694 --verbose

I then used a script that you can download, which works with the first alpha release of MovingImages and was called exactly as shown below:

./createanimation --delaytime 0.1 --outputfile ~/Desktop/PageTurningAnimation.gif --verbose ~/Desktop/deleteme/PageCurl*.tiff

The result of running both those scripts:

[GIF animation: the same page is turned over forever]

The MovingImages documentation shows the output images at each step of the filter chain. Scroll down past the generated json code to see the images.

The create animation script can be viewed here.

The do transition script can be viewed here.

And an earlier demonstration using the embossmask script

Moving Images

Backside image supplied to CIPageCurlTransition filter doesn’t take

Please see note at end.

I’ve not been able to get setting the “inputBacksideImage” key to work when specifying an image to use as the image displayed on the reverse side of a page being curled over. I’ve not seen any reports anywhere on the internet that this is broken, so I thought I’d just let people know here.

As of OS X 10.9.2 using Xcode 5.1 developer tools, this option doesn’t work.

I’ve posted sample code for a command line tool that demonstrates the problem. This is the same code I used in my bug report to Apple. The sample code can be viewed as a gist on GitHub: https://gist.github.com/SheffieldKevin/9873485

Note: This is not broken. The circle in the shading image needs to be partially transparent. The shading image is applied on top of the backside image and covers it up unless it is partially transparent. I’ve updated the code in the gist and everything works as it should.
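For anyone hitting the same thing, the filter setup looks roughly like this. It is a sketch rather than the gist’s actual code; sourceImage, destinationImage, backsideImage and shadingImage are assumed to be CIImages, and the extent, angle and time values are just illustrative. The important part is that shadingImage contains partial transparency:

CIFilter *pageCurl = [CIFilter filterWithName:@"CIPageCurlTransition"];
[pageCurl setValue:sourceImage forKey:kCIInputImageKey];
[pageCurl setValue:destinationImage forKey:kCIInputTargetImageKey];
[pageCurl setValue:backsideImage forKey:@"inputBacksideImage"];
// The shading image is applied on top of the backside image, so unless its circle
// is partially transparent it completely covers the backside image up.
[pageCurl setValue:shadingImage forKey:kCIInputShadingImageKey];
[pageCurl setValue:[CIVector vectorWithX:0.0 Y:0.0 Z:463.0 W:694.0] forKey:kCIInputExtentKey];
[pageCurl setValue:@(-3.07) forKey:kCIInputAngleKey];
[pageCurl setValue:@0.5 forKey:kCIInputTimeKey]; // halfway through the transition
CIImage *curledFrame = pageCurl.outputImage;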

Tags: CoreImage, CIPageCurlTransition, Cocoa, inputBacksideImage, broken, OS X

 

SSD versus HDD, Movie Frame Grabs and the importance of profiling

There was an e-mail to Apple’s cocoa-dev e-mail list that provoked a bit of discussion. The discussion thread starts with this e-mail to cocoa dev by Trygve.

Basically Trygve wanted to get better performance from his code for taking frame grabs from movies and drawing those frame grabs as thumbnails to what I call a cover sheet. He was using NSImage to do the drawing and complained that, based on profiling, his code was spending 90% of its time in a method called drawInRect.

