Wednesday, September 7, 2011

F# pipeline and composition performance

Seems that F# pipeline and function composition is very similar. I experiment the performance of these two statements:

    let f0 x = x + 1
    let f1 x = x * 3

    let f2 = f0 >> f1
    let f3 x = x |> f0 |> f1

    let sw = Stopwatch()

    sw.Reset()
    sw.Start()
    [1..10000] |> Seq.iter (fun n -> (f2 n) |> ignore )
    sw.Stop()
    printfn "%d" (sw.ElapsedMilliseconds)

    sw.Reset()
    sw.Start()
    [1..10000] |> Seq.iter (fun n -> (f3 n) |> ignore )
    sw.Stop()
    printfn "%d" (sw.ElapsedMilliseconds)

    Console.ReadKey() |> ignore

The result is, in release mode, the pipeline is almost 3 times faster than composition.

I run the code from Debug mode and release mode. And invoke the .EXE from file system.  Please feel free to perform the same tests and give me feedback.

As suggested by Arseny Kapoulkine, I tried to use the Seq.map and then Seq.Length to force evaluation. Unfortunately, the result is similiar:


open System.Diagnostics

let f0 x = x + 1
let f1 x = x * 3

let f2 = f0 >> f1
let f3 x = x |> f0 |> f1

let sw = Stopwatch()

sw.Reset()
sw.Start()
[1..10000] |> Seq.map (fun n -> (f2 n) |> ignore ) |> Seq.length |> ignore
sw.Stop()
printfn "f2 = %d" (sw.ElapsedMilliseconds)

sw.Reset()
sw.Start()
[1..10000] |> Seq.map (fun n -> (f3 n) |> ignore ) |> Seq.length |> ignore
sw.Stop()
printfn "f3 = %d" (sw.ElapsedMilliseconds)  

System.Console.ReadKey() |> ignore


Really appreciate the feedback from Arseny!

the code he present is at  http://fssnip.net/8e. The result shows the difference between composite and pipeline is very small.


Now seems that composite and pipeline are same.

5 comments:

Arseny Kapoulkine said...

Your analysis is incorrect. Take a look at generated IL code (I recommend ILSpy) to find out why. The performance is exactly the same, given a valid test case.

Tao Liu said...

Thank you for the feedback, Arseny. I post the screen shot from one of my test machines. Both machines give me same result. I will work with team to figure out why. Thanks again for your feedback.

Arseny Kapoulkine said...

To clarify, the performance of your code is different because in the pipeline case the compiler figures out that the entire computation does not need to be performed, and replaces the iter body with nothing; so you compare construction of a list + iteration with math vs construction of a list + iteration with empty body, which is an unfair comparison. Try replacing Seq.iter with Seq.map followed by Seq.length to force evaluation; you can also replace list construction with Seq.init to eliminate list construction time from the benchmark (though in this case it's not needed because the timings will be the same).

Tao Liu said...

Really appreciate your follow up! Tried your approach, but still the same result.

Arseny Kapoulkine said...

I had the following code in mind:

Seq.init N id |> Seq.map (fun x -> f2 x) |> Seq.length |> ignore

or (better in this case)

Seq.init N id |> Seq.map f2 |> Seq.length |> ignore

Same for f3, obviously.

The second version of the code generates identical bytecode and performs the same, except for ~30ms difference between the two (you can see it by increasing the N - the perf. difference does not vary with N). I don't have a profiler with me, should be some lazy initialization inside F# Core, or maybe JIT. Adding an extra call fixes this, the full code is here: http://fssnip.net/8e

The first version of the code indeed penalizes the composition slightly, since the f2 call is not inlined by F# (not sure if it is inlined by JIT, it's a virtual method call - the performance is similar anyway).