Pages

Sunday, September 5, 2010

Simulation on Euler Project 205

When I was trying to solve Euler Project 205 using F#, I was thrilled to find an opportunity to use GPGPU to solve this problem. The correct solution for this problem is to use statistic method, but I would like to simulate the process and see what I can get. Again, I do not intend to solve this problem by this approach, just to learn GPGPU.

Simulating this process is simple, I make a 9*1000 size matrix A with random number from 1 to 4. Similarly, matrix B is 6 * 1000 with element ranged from 1 to 6. The Accelerator v2 from Microsoft Research provides a good managed platform for GPGPU. The following is the code:


let dxTarget = new Microsoft.ParallelArrays.DX9Target()

let getWin index (gridSize:int) =
    let shape = [| gridSize; gridSize; |]
    let resultShape = [|gridSize|]
    let zero, one = new FPA(0.0f, resultShape), new FPA(1.0f, resultShape)

    let generatePrymaid i j = float32(random.Next(1,4))
    let generateDice i j = float32(random.Next(1,6))

    let px = new FPA(Array2D.init 9 gridSize generatePrymaid)
    let py = new FPA(Array2D.init 6 gridSize generateDice)
    let pxSum = PA.Sum(px, 0)
    let pySum = PA.Sum(py, 0)
    let cond = FPA.op_GreaterThan(pxSum, pySum)
    let gpu = PA.Sum(PA.Cond(cond, one, zero), 0)  
  
    let a = dxTarget.ToArray1D(gpu)
    let result = Seq.head a
    tempResult <- tempResult + int64(result)
    if (index%500=0) then
        let total = int64(index) * int64(gridSize) + totalTest
        printfn "Time = %A; Result = %A" index (float(tempResult) / float(total))
        appendToFile file (System.String.Format("{0}\t{1}", tempResult, total))
    result

let compute n = Seq.init n (fun i->getWin i gridSize) |> Seq.sum

let n = 250000;
let mutable start = System.DateTime.Now;
printfn "%A" ((compute n) / float32(gridSize*n))
printfn "%A" (System.DateTime.Now - start);

The result is interesting:


  1. on my powerful desktop: the GPU version is slower than CPU version.
  2. on my laptop, the GPU version is 2 time faster than CPU version. The CPU fan is quiet when I do the computation.

No comments: