Wednesday, September 7, 2011

F# pipeline and composition performance

It seems that F#'s pipeline operator and function composition are very similar. I experimented with the performance of these two approaches:

    open System
    open System.Diagnostics

    let f0 x = x + 1
    let f1 x = x * 3

    let f2 = f0 >> f1
    let f3 x = x |> f0 |> f1

    let sw = Stopwatch()

    sw.Reset()
    sw.Start()
    [1..10000] |> Seq.iter (fun n -> (f2 n) |> ignore )
    sw.Stop()
    printfn "%d" (sw.ElapsedMilliseconds)

    sw.Reset()
    sw.Start()
    [1..10000] |> Seq.iter (fun n -> (f3 n) |> ignore )
    sw.Stop()
    printfn "%d" (sw.ElapsedMilliseconds)

    Console.ReadKey() |> ignore

The result: in release mode, the pipeline version is almost three times faster than the composition version.

I ran the code in both Debug and Release mode, and also invoked the .EXE directly from the file system. Please feel free to perform the same tests and give me feedback.
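One plausible explanation (worth confirming against the generated IL): `f2 = f0 >> f1` binds a function *value* built once by `(>>)`, so every call goes through `FSharpFunc<_,_>.Invoke`, while `f3` is an ordinary function definition that the compiler can turn into direct calls. Eta-expanding the composition should put the two forms on an equal footing; a minimal sketch:

```fsharp
let f0 x = x + 1
let f1 x = x * 3

// Point-free: a closure is built once; each call is a virtual Invoke
let f2 = f0 >> f1

// Eta-expanded: (>>) is inline, so this compiles to direct calls,
// just like the pipeline version below
let f2' x = (f0 >> f1) x
let f3 x = x |> f0 |> f1
```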

As suggested by Arseny Kapoulkine, I tried using Seq.map followed by Seq.length to force evaluation. Unfortunately, the result is similar:


    open System.Diagnostics

    let f0 x = x + 1
    let f1 x = x * 3

    let f2 = f0 >> f1
    let f3 x = x |> f0 |> f1

    let sw = Stopwatch()

    sw.Reset()
    sw.Start()
    [1..10000] |> Seq.map (fun n -> (f2 n) |> ignore ) |> Seq.length |> ignore
    sw.Stop()
    printfn "f2 = %d" (sw.ElapsedMilliseconds)

    sw.Reset()
    sw.Start()
    [1..10000] |> Seq.map (fun n -> (f3 n) |> ignore ) |> Seq.length |> ignore
    sw.Stop()
    printfn "f3 = %d" (sw.ElapsedMilliseconds)

    System.Console.ReadKey() |> ignore


I really appreciate the feedback from Arseny!

The code he presented is at http://fssnip.net/8e. The result shows that the difference between composition and pipeline is very small.

So it now seems that composition and pipeline perform the same.

6 comments:

Arseny Kapoulkine said...

Your analysis is incorrect. Take a look at generated IL code (I recommend ILSpy) to find out why. The performance is exactly the same, given a valid test case.

Tao said...

Thank you for the feedback, Arseny. I posted a screenshot from one of my test machines. Both machines give me the same result. I will work with the team to figure out why. Thanks again for your feedback.

Arseny Kapoulkine said...

To clarify, the performance of your code is different because in the pipeline case the compiler figures out that the entire computation does not need to be performed, and replaces the iter body with nothing; so you compare construction of a list + iteration with math vs construction of a list + iteration with empty body, which is an unfair comparison. Try replacing Seq.iter with Seq.map followed by Seq.length to force evaluation; you can also replace list construction with Seq.init to eliminate list construction time from the benchmark (though in this case it's not needed because the timings will be the same).

Tao said...

Really appreciate your follow-up! I tried your approach, but I still get the same result.

Arseny Kapoulkine said...

I had the following code in mind:

Seq.init N id |> Seq.map (fun x -> f2 x) |> Seq.length |> ignore

or (better in this case)

Seq.init N id |> Seq.map f2 |> Seq.length |> ignore

Same for f3, obviously.

The second version of the code generates identical bytecode and performs the same, except for ~30ms difference between the two (you can see it by increasing the N - the perf. difference does not vary with N). I don't have a profiler with me, should be some lazy initialization inside F# Core, or maybe JIT. Adding an extra call fixes this, the full code is here: http://fssnip.net/8e

The first version of the code indeed penalizes the composition slightly, since the f2 call is not inlined by F# (not sure if it is inlined by JIT, it's a virtual method call - the performance is similar anyway).
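Arseny's two observations - the one-time initialization cost and the virtual call through the stored composition - can be sketched like this (the warm-up calls are an assumption about what the "extra call" looks like):

```fsharp
let f0 x = x + 1
let f1 x = x * 3
let f2 = f0 >> f1        // stored composition: calls go through FSharpFunc.Invoke
let f3 x = x |> f0 |> f1 // ordinary definition: direct calls

// Warm-up: run each path once so JIT compilation and any lazy
// initialization inside FSharp.Core happen before the timed run
f2 0 |> ignore
f3 0 |> ignore
```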

Anonymous Developer said...

I realize this is old; however, I just wanted to point out that piping and function composition are different things used in different situations. When using function composition in its niche, it can help you write code with a nice pipe-like flow that is faster than code which actually uses piping.

Function composition allows you to mix logic that runs right now with logic that runs later. That means you can build an efficient, streamlined function without much branching to be used in your tight loops. You can do that without function composition, but it would not have the same nice, unbroken forward flow that composition and piping have. Consider the following examples, and assume we know classifications "a", "b", and "c" ahead of time for a very large number of values "v".

type Cases =
    | Case1
    | Case2
    | Case3

// Pure piping - nice, unbroken forward flow + sub-optimal execution
let version1 a b c v =
    (match a with
     | Case1 -> v * 2.0
     | Case2 -> v / 2.0
     | Case3 -> v + 2.0)
    |> fun v ->
        (match b with
         | Case1 -> v + 12.0
         | Case2 -> v - 9.55
         | Case3 -> v * 4.3)
    |> fun v ->
        match c with
        | Case1 -> v - 10.0
        | Case2 -> v - 7.5
        | Case3 -> v + 1.5

// Composition - nice, unbroken forward flow + streamlined result function
let version2 a b c =
    (match a with
     | Case1 -> fun v -> v * 2.0
     | Case2 -> fun v -> v / 2.0
     | Case3 -> fun v -> v + 2.0)
    >> (match b with
        | Case1 -> fun v -> v + 12.0
        | Case2 -> fun v -> v - 9.55
        | Case3 -> fun v -> v * 4.3)
    >> (match c with
        | Case1 -> fun v -> v - 10.0
        | Case2 -> fun v -> v - 7.5
        | Case3 -> fun v -> v + 1.5)

// Declaring functions - broken flow + streamlined result function
let version3 a b c =
    let f1 =
        match a with
        | Case1 -> fun v -> v * 2.0
        | Case2 -> fun v -> v / 2.0
        | Case3 -> fun v -> v + 2.0
    let f2 =
        match b with
        | Case1 -> fun v -> v + 12.0
        | Case2 -> fun v -> v - 9.55
        | Case3 -> fun v -> v * 4.3
    let f3 =
        match c with
        | Case1 -> fun v -> v - 10.0
        | Case2 -> fun v -> v - 7.5
        | Case3 -> fun v -> v + 1.5
    fun v -> v |> f1 |> f2 |> f3

Version 1 maintains a nice, unbroken forward flow, but it is inefficient because it contains branching that will be processed within our tight loop on a large set of data.

Version 2 maintains a nice, unbroken forward flow, and it is efficient because it returns a function that contains no branching whatsoever. That function can be used in a tight loop with high efficiency.

Version 3 does the same thing as version 2, so it is efficient; however, the flow of execution jumps around the code. It is not as easy to read and follow as version 2.

Version 2 gives us the best of both worlds in this particular scenario.
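To make the trade-off concrete, here is a hypothetical use of version2 (the case values and the input data are made up): the branching on "a", "b", and "c" happens once, and the returned function runs branch-free inside the loop.

```fsharp
// Build the streamlined function once, outside the hot loop
let transform = version2 Case1 Case3 Case2

// Apply it to a large data set; no per-element branching remains
let values = Array.init 1_000_000 float
let results = values |> Array.map transform
```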