Skip to content

Finding Prime Numbers

ADMIN | 07-11-2011

I’m disappointed when I come across code examples for multiprocessing technology that use trivial algorithms to demonstrate the simplicity of an implementation.

Hexagon pattern 1

I’m generally disappointed when I come across source code examples for multiprocessing technologies that use trivial algorithms to demonstrate the simplicity of an implementation. This not only establishes expectations that are unrealistic but also hides details that are painfully discovered when applied to more complex code. We are after all, trying to improve performance in our real-world applications with multiprocessing by targeting algorithms that have a real impact on the end user, which probably does not include a bubble sort.

As mentioned in my previous post, I have a new toy with 48 cores and thought it would be interesting to put the various multiprocessing technologies that are at my disposal to the test. These are: OpenMP, MPI, PPL (Microsoft Concurrency Runtime), the ACIS thread manager, and the CGM multiprocessing infrastructure. Forgive me for creating yet another trivial example, but finding prime numbers is simple and yet compute intensive, and serves my purpose well. The challenge is to find the number of primes between 1 and 100,000,000 as quickly as possible. I’m using Visual Studio 2010 when possible, targeting 64 bit release binaries. Here are the results:

jefftable

Before evaluating the results, let’s first have a look at the implementation. I’m not going to show all of the code but instead just focus on the highlights. Let’s start with the original implementation used to calculate the serial time. As we can see, the iterations of the loop test for primality and conditionally increment a total. I’ve included the IsPrime function (in a simplified format) just to show that it is reentrant. This function is thread-safe because it does not modify or maintain state data, and will therefore not have any side effects.

 

jeffcode1

 

The OpenMP version only requires one additional line of code to add parallelism. The pragma statement instructs OpenMP to parallelize the loop, to create a thread-local version of Primes to be summed up at the end, and to schedule the iterations using a guided approach. The syntax is quite simple and is explained in more detail on Wikipedia. 

jeffcode2

 

The PPL version also requires only minor changes. I used the parallel_for statement from the Parallel Patterns Library, which is part of the Microsoft Concurrency Runtime. This version takes a start and end value, and a lambda expression, which consists of a capture clause, a parameter list, and a function body. A more thorough explanation can be found on the MSDN website. The Windows kernel function InterlockedIncrement avoids race conditions by incrementing the Primes variable atomically.

 

jeffcode3

 

The other three implementations (ACIS, CGM, and MPI) are not as trivial because we have to divide the work into smaller pieces, typically into one task for each processor. So we divide up the range into 48 pieces and let each process/thread calculate the number of primes in their sub-range. Then we add all the answers together. The ACIS implementation is shown below.

 

 

  Part Ijeffcode4_1

 

 

 

 

 Part IIjeffcode4_2

 

In terms of simplicity and performance, OpenMP is the clear winner. It achieved almost perfect scaling and the implementation is simple and straightforward. PPL is a close second, requiring very few changes to original code with good scalability. The syntax is a bit difficult to understand at first but lambda expressions are the new thing. So much so that they are included in the next C++ standard.

The downfall of these two approaches is that they do not allow the developer to have any control of the threads that are used. When used in ACIS operations, each thread must initialize and terminate the modeler appropriately. Since we do not know which threads will be used by OpenMP and PPL, we must make sure they have properly initialized ACIS in every iteration of the loop. This will certainly have an impact on performance and explains why the ACIS thread manager maintains a pool of pre-initialized threads.

The ACIS and MPI implementations are close in performance and similar in implementation. This is not surprising because the main drawback when using multiple processes is in the overhead of inter-process communication. This is not an issue in the primes example because we are sending two integers to each process (start and stop values) and are receiving one back (the number of primes found in the range). When used in geometric modeling code, this can be a bottleneck.

The CGM results are not what I expected but it’s a bit unfair to be critical at this point. I did after all make significant changes to ACIS in order to handle more than eight threads concurrently. This work has simply not yet been done in CGM. These types of tests put the magnifying glass on the infrastructure, and in the CGM case has exposed additional overhead in creating processes, loading DLLs, and processing the task queue. Things to address in time.

You might also like...

5 Min read
3D Modeling
What is digital manufacturing? Here’s a simple digital manufacturing definition: the process of using computer systems...
5 Min read
CGM Modeler
Software components are like the stage crew at a big concert performance: the audience doesn’t see them, but their...
Application Lifecycle Management Flow
4 Min read
CGM Modeler
When you hear the term, Application Lifecycle Management (ALM), you likely think about the process that a software...
9 Min read
CGM Modeler
SLS in Additive Manufacturing is used to convert 3D CAD designs into physical parts, in a matter of hours.
8 Min read
CGM Modeler
There’s a lot of confusion around what the terms additive manufacturing and 3D printing mean.
4 Min read
3D Modeling
Additive manufacturing, often referred to as 3D printing, is a computer-controlled process for creating 3D objects.
5 Min read
CGM Modeler
Computational Fluid Dynamics (CFD) is a type of analysis that provides insight into solving complex problems, and...
2 Min read
CGM Modeler
WRL files are an extension of the Virtual Reality Modeling Language (VRML) format . VRML file types enable browser...
Voxel model example
3 Min read
CGM Modeler
Voxels are to 3D what pixels are to 2D. Firstly -- let’s examine what pixels actually are. Everything you see on your...
Point_cloud_torus
2 Min read
CGM Modeler
Point-cloud modeling is typically used in the process of 3D scanning objects. Rather than defining surfaces through...
Polygonal Modeling
2 Min read
CGM Modeler
Polygonal (or polyhedral) modeling is the most common type of modeling for video games and animation studios. This type...
aerodynamics-CFD
9 Min read
CGM Modeler
Computational fluid dynamics (CFD) is a science that uses data structures to solve issues of fluid flow -- like...
BREP Model example
2 Min read
CGM Modeler
BRep modeling, or Boundary Representation modeling, is, in CAD applications, the most common type of modeling. BRep is...
Feature Recognition Zoomed
5 Min read
CGM Modeler
IN THIS ARTICLE: What is FEA (Finite Element Analysis) Principles of Finite Element Analysis and Simulation Software A...
3YourMind and Spatial
3 Min read
3D Modeling
As manufacturers begin to rely more and more on additive manufacturing (AM), moving from a few select piece parts that...
Voxeldance and Spatial
2 Min read
3D InterOp
To the uninitiated, 3D printing may seem a simple process — download your CAD file and hit print. But the world of...
BIM_word_cloud_web
2 Min read
3D Modeling
The construction industry has long taken advantage of prebuilt components, from prehung doors to prefabbed roof...
Clash-Detection-fig-1
4 Min read
3D Modeling
A major benefit of constructing a building virtually is the cost savings gained by identifying errors in the design...