We are moving things around in the office here at Spatial to accommodate some new people, so our marketing closet is being cleaned out. Walking by, I noticed t-shirts from the summer of 2009, when Spatial released R20 of the 3D ACIS Modeler. Since that release, the 3D ACIS Modeler has been thread safe, i.e., able to operate on disjoint data concurrently and correctly. The t-shirt in question made some predictions about the future of processor evolution. Some came true; some did not. While the thread-safe 3D ACIS Modeler has delivered enhanced performance for our customers, the number of cores in consumer-grade computers has not grown as fast as predicted.
Challenges with Multithreading
What has stunted the growth in processor core counts is software. If no applications can make use of 64-core processors, there isn’t much incentive to design them. Rather than adding more cores, chip makers can build processors that use less power or have larger caches. The other issue for anyone selling bigger and badder PCs is that the market is shrinking. Most computers sold now are actually inside phones, TVs, cars, watches, or even toasters. For engineering purposes, however, PCs and workstations are still the most common way to go.
Let us discuss a few mathematical models for multithreading to unpack some of the limitations.
- Utopian scaling – all steps in the algorithm may occur in any order; time required is inversely proportional to the number of threads.
- Amdahl’s-law scaling – a fixed portion of the algorithm must run serially; only the remainder speeds up with more threads, so the total speedup is capped no matter how many threads are added.
- Linear overhead – contention grows with the number of threads, so the time may improve at first, but eventually adding more threads slows things down.
- Serial time (no threading) – the process takes the full time no matter how many threads you add.
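To make the four models concrete, here is a minimal Python sketch that tabulates normalized run time (serial run = 1.0) against thread count. The function names and parameter values (a 20% serial fraction, a 0.02 per-thread contention cost) are assumptions chosen for illustration, not measurements of the 3D ACIS Modeler.

```python
# Toy models of run time versus thread count, normalized so that the
# serial run takes 1.0 time unit. Parameter values are illustrative only.

def utopian(n: int) -> float:
    """Perfect scaling: time is inversely proportional to thread count."""
    return 1.0 / n

def amdahl(n: int, serial_fraction: float = 0.2) -> float:
    """A fixed fraction runs serially; only the rest is divided among threads."""
    return serial_fraction + (1.0 - serial_fraction) / n

def linear_overhead(n: int, serial_fraction: float = 0.2,
                    overhead: float = 0.02) -> float:
    """Amdahl-style time plus a contention cost for every thread beyond the first."""
    return amdahl(n, serial_fraction) + overhead * (n - 1)

def serial(n: int) -> float:
    """No threading: extra threads change nothing."""
    return 1.0

if __name__ == "__main__":
    print(f"{'threads':>7} {'utopian':>8} {'amdahl':>8} {'linear':>8} {'serial':>8}")
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(f"{n:>7} {utopian(n):>8.3f} {amdahl(n):>8.3f} "
              f"{linear_overhead(n):>8.3f} {serial(n):>8.3f}")
```

With these made-up parameters the Amdahl curve flattens toward 0.2 (a 5× ceiling), while the linear-overhead curve bottoms out between 4 and 16 threads and is slower than the serial run by 64 threads.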
Measuring success with a multithreaded algorithm is complex. One should not expect an application to run N times faster just because N threads are being used. Like other abstraction mechanisms, threads have overhead. If the overhead is large relative to the work being done, threading will not make the application run faster. It is even possible for a multithreaded version with large enough linear overhead to run slower than the serial version at every thread count.
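A small sketch of that last point, assuming a hypothetical linear-overhead model: the serial version takes `work` time units, and the threaded version takes `setup + work / n + per_thread * n`. Over real-valued n, the threaded time is smallest at `n = sqrt(work / per_thread)`, where it equals `setup + 2 * sqrt(work * per_thread)`; if even that minimum is not below `work`, no thread count helps. The code simply searches integer thread counts.

```python
def threaded_time(n: int, work: float, setup: float, per_thread: float) -> float:
    """Hypothetical model: fixed setup cost, divided work, per-thread overhead."""
    return setup + work / n + per_thread * n

def best_thread_count(work: float, setup: float, per_thread: float,
                      max_threads: int = 64) -> tuple[int, float]:
    """Return (threads, time) minimizing threaded_time over 1..max_threads."""
    candidates = ((n, threaded_time(n, work, setup, per_thread))
                  for n in range(1, max_threads + 1))
    return min(candidates, key=lambda pair: pair[1])

def threading_ever_helps(work: float, setup: float, per_thread: float,
                         max_threads: int = 64) -> bool:
    """True if some thread count beats the serial time `work`."""
    _, best_time = best_thread_count(work, setup, per_thread, max_threads)
    return best_time < work

if __name__ == "__main__":
    for setup, per_thread in ((0.0, 1.0), (5.0, 30.0)):
        n, t = best_thread_count(100.0, setup, per_thread)
        print(f"setup={setup}, per_thread={per_thread}: best n={n}, time={t:.1f}, "
              f"helps={threading_ever_helps(100.0, setup, per_thread)}")
```

With `work=100`, no setup cost, and `per_thread=1`, ten threads bring the time down to 20. With `setup=5` and `per_thread=30`, the best choice is two threads at 115 time units, still slower than the serial 100.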
As engineers, we strive to achieve optimal scaling (in time vs. number of threads, time vs. problem size, memory vs. problem size, etc.). Using development time responsibly also requires us to take maintenance and development costs into account.
Speedups with Thread-Safe 3D ACIS Modeler and Thread Hot APIs
Rather than spend years trying to achieve perfect multithreaded performance, Spatial has taken an incremental approach, achieving performance improvements without ever compromising the underlying algorithms or the integrity of the model. Over time, these gradual improvements compound, providing a better and faster modeler.
Even though the scaling in number of threads is not currently ideal, the multithreaded and thread-safe features in the 3D ACIS Modeler have been a significant success. Customers can write their own multithreaded workflows, provided they respect two limitations: the 3D ACIS Modeler must be initialized on each thread used, and the ENTITYs being modified must be on different history streams. By staying within these limitations, our customers have seen substantial speedups. There are also several APIs that can use multiple threads without much special effort from the customer: just turn on the thread pool, and things get faster. Spatial’s 3D InterOp also takes advantage of this, creating 3D ACIS Modeler entities concurrently on import.
There were several keys to this success. First, the 3D ACIS Modeler makes extensive use of thread-local storage, which lets each thread maintain its own context without requiring locks/mutexes everywhere. Second, the algorithms used in the 3D ACIS Modeler are battle-tested: the core algorithms have been used by customers for more than 20 years. Multithreading an algorithm that doesn’t work, or works prohibitively slowly, does not provide much benefit. Building on well-proven algorithms delivers both speed and correctness.
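The two rules for customer workflows, together with the thread-local storage idea, can be illustrated with a generic Python sketch. Everything here is hypothetical: `init_worker` stands in for per-thread modeler initialization, `HistoryStream` for a history stream, and integers for ENTITYs; none of it is the ACIS API. Each pool thread runs the initializer once, keeps its own counter in `threading.local()` so no lock is needed, and each task modifies only entities on its own stream.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Per-thread context: each worker thread sees its own attributes on this
# object, so reading and writing them needs no lock. (Illustrative only.)
_ctx = threading.local()

def init_worker():
    """Hypothetical per-thread setup, run once when each pool thread starts."""
    _ctx.ready = True
    _ctx.ops = 0  # operations performed by this thread

class HistoryStream:
    """Hypothetical stand-in for a history stream: an append-only change log."""
    def __init__(self, name):
        self.name = name
        self.log = []

def modify_entities(stream, entity_ids):
    """Modify a disjoint set of entities, logging only to this task's own stream."""
    if not getattr(_ctx, "ready", False):
        raise RuntimeError("worker thread was not initialized")
    for eid in entity_ids:
        stream.log.append(f"modified {eid}")  # no other task touches this stream
        _ctx.ops += 1                          # thread-local, so no lock needed
    return stream.name, len(stream.log)

def run(partitions, workers=4):
    """partitions: list of (HistoryStream, entity_ids) pairs with no shared streams."""
    with ThreadPoolExecutor(max_workers=workers, initializer=init_worker) as pool:
        return list(pool.map(lambda job: modify_entities(*job), partitions))

if __name__ == "__main__":
    streams = [HistoryStream(f"stream-{i}") for i in range(3)]
    jobs = [(s, range(i * 10, i * 10 + 5)) for i, s in enumerate(streams)]
    print(run(jobs))
```

Calling `modify_entities` from a thread that never ran `init_worker` raises an error, mirroring the rule that each thread must be initialized before use.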
For the initial implementations in the 3D ACIS Modeler, we made conservative refactorings. We preserved the behavior of the serial versions of algorithms as much as possible and counted any speedup from multithreading as a success. For example, using api_facet_entity with six threads on one BODY results in a 2× speedup, with results nearly indistinguishable from the serial algorithm. Concurrently faceting multiple bodies provides a 6 or 7× speedup on many-core processors. When refactoring api_stitch, we devised a new algorithm that is 10% faster than the original even when run serially.
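One way to read the single-body faceting number is to ask what parallel fraction Amdahl’s law would imply. Solving 2 = 1 / ((1 - p) + p / 6) gives p = 0.6, which would cap the single-body speedup at 1 / (1 - 0.6) = 2.5× however many threads are used. This sketch assumes the Amdahl model applies exactly, which it does not (it folds all threading overhead into the serial part), so treat the result as an effective fraction rather than a property of api_facet_entity.

```python
def amdahl_speedup(p: float, threads: int) -> float:
    """Amdahl's law: speedup when fraction p of the work is parallelizable."""
    return 1.0 / ((1.0 - p) + p / threads)

def implied_parallel_fraction(speedup: float, threads: int) -> float:
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) to solve for p."""
    if threads < 2:
        raise ValueError("need at least two threads")
    if not 1.0 <= speedup <= threads:
        raise ValueError("Amdahl's law only allows speedups between 1 and n")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)

if __name__ == "__main__":
    p = implied_parallel_fraction(2.0, 6)  # 2x observed with six threads
    print(f"implied parallel fraction: {p:.2f}")
    print(f"ceiling as threads grow: {1.0 / (1.0 - p):.2f}x")
```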
Rather than starting over from scratch, development on the 3D ACIS Modeler has taken a pragmatic approach, growing multithreading functionality incrementally. This approach avoids feature regressions, but it sometimes means worse thread scaling. The multithreaded APIs in the 3D ACIS Modeler show speedups, but some show linear-overhead scaling, where adding threads beyond a certain point stops helping. We grow with our customers, with the 3D ACIS Modeler improving as the requirements placed on it grow. I still expect to see huge numbers of cores in engineering desktop computers solving ever-larger geometry problems, but it has not happened as fast as we initially hoped.