Monday, May 19, 2014

Benchmarking Discrete Event Simulation Software

Introduction

Over the last several months Harvey Harrison and Matt Chudleigh have been working hard to increase JaamSim's execution speed. With the release this week of JaamSim2014-20, I am pleased to say that JaamSim is now faster than both Arena and Simio.

Before starting our optimisation process, we searched the internet for any information on benchmarks for discrete event simulation software. Surprisingly, there are no published benchmarks of any kind that compare the execution speeds of the various simulation packages. Some articles compare the features and design of the software packages, but provide no quantitative data. To the best of my knowledge, this blog post is the first to give numerical results.

The first step in any optimisation effort is to decide what to measure. For discrete event software, there are three activities that are likely to be slow enough to bottleneck execution speed:
  1. Event scheduling and processing.
  2. Creating and destroying entities.
  3. Executing blocks that do not involve simulated time.

Event Scheduling and Processing Benchmark

This post focuses on the first of these activities -- scheduling and executing events. The following model was devised to measure this time.
Benchmark Model
In this model, N entities are created at time zero and fed into a delay activity whose duration is selected randomly from the range 0 to 2N seconds (uniformly distributed). On completion of the delay, each entity is routed back to re-execute the delay again. This design results in an average of one entity per second completing the delay and an average of N events on the future event list. The use of the uniform distribution ensures that each new event is inserted in a random position in the future event list.

The average time to create, schedule, and execute an event was measured by running the model for 60 seconds of real time (using a stopwatch) and counting the number of times the delay activity was completed. The effect of the computer speed was allowed for by converting the calculated time into clock cycles. All measurements were made using my laptop computer which has a second generator (Sandybridge) Core i5 processor running at 2.5 GHz.

Performance Results

We began by benchmarking the initial version of JaamSim (2014-11) and comparing the results to those for Arena and Simio. The following graph shows the execution time per event for the three software packages over a range of future event list sizes. This is the "Before" picture for the simulation weight loss clinic.
Before Optimisation
Clearly, JaamSim2014-11 was quite a lot slower than both Arena and Simio for event scheduling and processing. This slowness was caused by its use of system threads to control simulated time. Each event required two context switches -- passing control from one thread to another -- a relatively time consuming process for the computer.

Next we began optimising. The following graph shows the performance of two subsequent releases of JaamSim.
After Optimisation
The biggest gain came from re-designing the event processing engine so that events could executed without context switching. This was no small task and preparations had been made for this change over several previous months. Jaam2014-16 was the first release to implement the new system and the event processing time immediately dropped from 67,000 clock cycles per event to 1,200 clock cycles per event (with an empty future events list). For comparison, Simio required 1,800 clock cycles per event.

The next step was to improve performance when there are a large number of events on the future events list. JaamSim2014-16 used a simple array structure that required O(N) clock cycles to insert a new event. This worked well up to about 100 events on the future event list, but became very inefficient for larger numbers of events. JaamSim2014-18 (not shown) improved this design enough to extend the range of efficient operation up to 1,000 future events, but became too slow at 10,000 events.

Finally, in JaamSim2014-20, we replaced the future events array with a new data structure called a "red-black tree" that requires only O(logN) clock cycles to insert a new event. With this change, JaamSim provides excellent performance to at least 10,000 future events.

Concluding Remarks

As of JaamSim2014-20, we have improved event scheduling and processing to the extent that it is no longer a significant factor in model execution speed. As subsequent posts will show, the execution time for a JaamSim delay block is now about the same as that for any of the other blocks in the Basic Objects palette.

So far, we have only benchmarked Arena and Simio, but more will come over the next 6 months. In particular, I'd like to see how well AnyLogic, FlexSim, and ExtendSim perform. Assistance from any readers who are familiar with these software packages would be most welcome. The hardest part of benchmarking other software is to learn enough to implement the various benchmarks in way that gets the best from the software. In the case of Simio, I am grateful to Alan Sagan for providing an implementation that performed ten times faster than my first attempt.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.