Monday, 3 March 2014

Performance and Progress

Performance

Performance of actuarial software has always been a major challenge. The standard packages have historically all been written with performance as a priority. They have therefore typically generated code in a highly performant language, usually C or C++. They have also offered ways to utilise multiple machines to carry out the computations. More recently they have offered Cloud compute facilities, for example see:

http://www.milliman.com/insight/insurance/MG-ALFA-takes-to-the-cloud-High-performance-computing-for-Solvency-II/

Even so, there continues to be interest in extracting still more performance. There is an interesting conference principally about this topic next month in Edinburgh - Computation in Finance and Insurance, post Napier. I also noticed an interesting academic paper - An efficient algorithm for the calculation of non-unit linked reserves - which used reasonably complex algorithms and specialised tools such as Fortran and OpenMP.

One of the benefits of using F# is that it offers the potential to be both easy for an actuary to use and highly performant. Where necessary, there is easy access to more powerful computational techniques, including highly optimised libraries such as Math.NET Numerics, the Extreme Optimization Numerical Libraries for .NET and StatFactory FCore. For relevant links, see the Math page on the F# Software Foundation site. There are also straightforward options for parallelising F# code and running it on multiple machines or on GPUs; for more links, see the Cloud and GPU pages on the F# Software Foundation site.

Although Mortality Manager currently does not require high performance, this may become more important when stochastic mortality is added. I therefore decided to take a brief look at the performance of some of the code.

Simple Performance Test

In this section I will test options for coding the calculation of Dx. I will consider both the simplicity of the code and its relative performance, as it might be expected that some loss of simplicity will be required to achieve higher performance.

The test will be to generate 1,000,000 sets of Dx values from age 0 to 120, for interest rates from 0.00001% to 10.0% in steps of 0.00001%.

I will assume I already have a list (ls) containing lx for each age. I can then define a list (is) or array (isa) of interest rates:
//list and array of interest rates
let is = [0.0000001..0.0000001..0.1]
let isa = [|0.0000001..0.0000001..0.1|]
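
For anyone following along, here is one way ls might be set up. The lx values below are purely illustrative (a crude Gompertz-style survival curve), not the standard table values that Mortality Manager actually uses:
//illustrative lx values for ages 0 to 120 (not a real mortality table)
let ls = [for x in 0..120 -> 100000.0 * exp(-0.0005*(exp(0.1*float x) - 1.0))]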

The simplest code for this would be:
//simplest test
let Ds = is|>List.iter(fun i -> ls|>List.mapi(fun n l -> l/(1.0+i)**float(n))|>ignore)

This just calculates the Ds values but does not retain them, as retaining 1,000,000 sets would have significant memory implications. Each value is calculated directly by raising v (i.e. 1/(1+i)) to the required power and multiplying by the relevant lx, so that Dx = lx * v^x.

In some circumstances using an array can give better performance than using a list. So here is an array version:
//array test
let Dsa = isa|>Array.iter(fun i -> ls|>List.mapi(fun n l -> l/(1.0+i)**float(n))|>ignore)

F# makes it very simple to run this code in parallel, using multiple cores. Here is a parallel version:
//array parallel
let Dsap = isa|>Array.Parallel.iter(fun i -> ls|>List.mapi(fun n l -> l/(1.0+i)**float(n))|>ignore)

One known slow element is the use of powers. This can be avoided with a more complex recursive version:
//avoid power
//i = interest rate, ll = remaining lx values, lastv = previous power of v, Dl = accumulated Ds (reversed)
let rec getDs i ll lastv Dl =
    if ll=[] then List.rev Dl
    else
        let newv = lastv/(1.0+i)     //v^n = v^(n-1)/(1+i)
        let D = newv*ll.Head         //Dx = lx * v^x
        getDs i ll.Tail newv (D::Dl)
let Dsrec = isa|>Array.Parallel.iter(fun i -> getDs i ls.Tail 1.0 [ls.[0]]|>ignore)

This starts at age 0 (where D0 is just l0) and then recursively iterates through each subsequent age, calculating each v^n by dividing the previous power by (1+i) rather than recomputing it from scratch.
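
As a quick sanity check (not part of the timing runs below, and purely illustrative), the recursive version can be compared against the direct power calculation for a single interest rate:
//check the recursion against the direct power calculation at i = 5%
let iTest = 0.05
let direct = ls|>List.mapi(fun n l -> l/(1.0+iTest)**float(n))
let viaRec = getDs iTest ls.Tail 1.0 [ls.[0]]
let maxDiff = List.map2 (fun a b -> abs(a-b)) direct viaRec |> List.max
The maximum difference should be at the level of floating point rounding.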

We can use F# Interactive and #time to get quick performance results for the four versions:
  1. Real: 00:00:06.416, CPU: 00:00:06.411, GC gen0: 773, gen1: 0, gen2: 0
  2. Real: 00:00:06.366, CPU: 00:00:06.364, GC gen0: 774, gen1: 0, gen2: 0
  3. Real: 00:00:01.932, CPU: 00:00:06.864, GC gen0: 777, gen1: 1, gen2: 0
  4. Real: 00:00:01.208, CPU: 00:00:03.385, GC gen0: 1541, gen1: 0, gen2: 0
The first thing to notice is the very high performance of all versions. You can see that the first two were virtually identical, as you might expect. On my quad-core machine, the parallel version performed very well and used the available cores efficiently. The last version is interesting in demonstrating that in some cases it is worth making the code more complex to get a performance gain. I currently use this more performant recursive approach.
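
For anyone wanting to reproduce these figures, #time is a directive toggled in F# Interactive; it reports the Real, CPU and GC numbers shown above for each evaluation. For example:
//in F# Interactive: turn timing on, evaluate a version, then turn it off
#time "on"
isa|>Array.iter(fun i -> ls|>List.mapi(fun n l -> l/(1.0+i)**float(n))|>ignore)
#time "off"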

I may return to this topic in a later post - I intend to run this test on GPUs.

Progress

Since my previous post, it has been very straightforward to add further facilities. These include:
  • Creation of rates based on existing rates
  • Generation of Commutation functions
  • Generation of Present Value functions (an illustrative sketch of both follows this list)
  • Support for Joint Life functions
  • Greatly expanded set of UDFs (now 58)
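
The actual implementations live in the Mortality Manager code; purely as an illustrative sketch (getNs and annuityDue are hypothetical names, not the UDFs used in Mortality Manager), a commutation function such as Nx and a simple present value function can be built directly from a list of Dx values:
//illustrative only - not the actual Mortality Manager code
//Nx is the sum of Dt for t >= x; List.scanBack accumulates these sums from the end
//(the final element of the scan is the initial state 0.0 and is never indexed for valid ages)
let getNs (Ds: float list) = List.scanBack (+) Ds 0.0
//whole-life annuity-due at age x: ax = Nx/Dx
let annuityDue (Ds: float list) x = (getNs Ds).[x]/Ds.[x]
The other commutation and present value functions follow similar patterns of simple manipulations of the underlying Dx and lx values.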
I have therefore created a new version - V002. It looks like this:

[screenshot of the Mortality Manager V002 workbook]

This can be set up in the same way as V001 - see the instructions in my previous post. The new version is stored on Bitbucket at:
https://bitbucket.org/pb_bwfc/mortality-manager/src/93331bc1b029b2ed98fe7297ee4d69cc3811f2f2/v002/MortMgrV002.xlsx?at=default

I now think this has a fairly complete set of base functionality. I therefore do not currently intend to add anything further that works with the standard mortality rates. Instead, I will next add modelling of mortality improvements and stochastic mortality modelling.
