MPPDYNA begins each simulation by splitting the model into multiple pieces (domains) and assigning each domain to a CPU core. This is referred to as “domain decomposition.” Efficient MPP processing requires that, as much as possible, every CPU core is kept busy doing useful work; that is, each domain should represent the same amount of work. If one core is assigned a domain that is too large, then at certain points in each timestep cycle the other cores will sit idle, waiting for that core to finish its calculations. The initial decomposition is based primarily on measured execution times for the different types of elements and the different material models, and historically, once the domains are determined, they persist for the duration of the calculation. This approach has two significant problems. First, the element costs used during decomposition are not perfectly accurate. As material types are added, routines are modified, compiler options change, and new CPUs become available, keeping this decomposition timing information up to date is simply infeasible. But even if that could be done, the second issue is that for most materials the computational cost changes during the calculation. As elements distort, or exceed their elastic limit and begin to experience plastic deformation, the element evaluations can become more time consuming. This leads to the inevitable conclusion that any static decomposition will result in at least some computational imbalance. As core counts increase, the need for dynamically adjusting the decomposition will also increase.
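To make the imbalance argument concrete, the following sketch (plain C, not LS-DYNA code; the per-domain cost numbers are hypothetical) computes a simple imbalance ratio, maximum domain cost over mean domain cost, whose inverse bounds the parallel efficiency of a static decomposition once one domain becomes more expensive during the run.

```c
#include <stdio.h>

/* Hypothetical per-domain costs (e.g. measured element evaluation time per
 * timestep).  With a static decomposition the slowest domain sets the pace,
 * so the ratio max(cost)/mean(cost) indicates how much of each cycle the
 * other cores spend waiting. */
static double imbalance(const double *cost, int ndom)
{
    double sum = 0.0, max = cost[0];
    for (int i = 0; i < ndom; i++) {
        sum += cost[i];
        if (cost[i] > max) max = cost[i];
    }
    return max / (sum / ndom);
}

int main(void)
{
    /* Assumed example: one of four domains goes plastic and becomes 40%
     * more expensive than the others. */
    double cost[4] = { 1.0, 1.0, 1.0, 1.4 };
    printf("parallel efficiency ~ %.0f%%\n", 100.0 / imbalance(cost, 4));
    return 0;
}
```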
The default decomposition method for LS-DYNA/MPP is RCB (recursive coordinate bisection), which divides the model based on the initial geometry. If the geometry is not severely distorted during the simulation, this decomposition gives reasonable scaling up to a few hundred cores. LS-DYNA also provides additional “pfile” options that rely on the user’s knowledge of the expected deformation to achieve better MPP efficiency. Unfortunately, many problems cannot be easily treated by those options, e.g. bird strike, water wading, FBO, etc. These simulations involve parts with relative motion, which are difficult to decompose only once, and such jobs usually suffer from poor scaling. Furthermore, as more cores are used in the simulation, load-imbalance effects are amplified, resulting in poor scalability. To achieve better computational load balancing, a new automatic re-decomposition algorithm has recently been implemented. The new method can readjust the load balance during the simulation based on the current geometry. In this study, we give some typical examples showing how to regain load balance and improve parallel efficiency.
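As a rough illustration of the RCB idea referenced above, the sketch below (plain C, with a hypothetical `elem` structure rather than LS-DYNA’s actual data structures) recursively splits the elements at the median of their longest coordinate extent until one piece per core remains; the automatic re-decomposition described in the abstract amounts to repeating such a split on the current, deformed geometry during the run.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical element record: a coordinate (e.g. the element centroid)
 * plus the domain it gets assigned to. */
typedef struct { double x[3]; int domain; } elem;

static int axis;  /* coordinate direction used by the qsort comparator */

static int cmp(const void *a, const void *b)
{
    double d = ((const elem *)a)->x[axis] - ((const elem *)b)->x[axis];
    return (d > 0) - (d < 0);
}

/* Recursive coordinate bisection: split the elements at the median of their
 * longest coordinate extent, then recurse on each half until one piece per
 * domain (core) remains. */
static void rcb(elem *e, int n, int first_dom, int ndom)
{
    if (ndom == 1) {
        for (int i = 0; i < n; i++) e[i].domain = first_dom;
        return;
    }
    /* find the coordinate direction with the largest extent */
    double lo[3], hi[3];
    for (int d = 0; d < 3; d++) lo[d] = hi[d] = e[0].x[d];
    for (int i = 1; i < n; i++)
        for (int d = 0; d < 3; d++) {
            if (e[i].x[d] < lo[d]) lo[d] = e[i].x[d];
            if (e[i].x[d] > hi[d]) hi[d] = e[i].x[d];
        }
    axis = 0;
    for (int d = 1; d < 3; d++)
        if (hi[d] - lo[d] > hi[axis] - lo[axis]) axis = d;

    /* sort along that direction and split in proportion to the core counts */
    qsort(e, n, sizeof *e, cmp);
    int dleft = ndom / 2;
    int nleft = (int)((long long)n * dleft / ndom);
    rcb(e, nleft, first_dom, dleft);
    rcb(e + nleft, n - nleft, first_dom + dleft, ndom - dleft);
}

int main(void)
{
    /* Toy example: eight elements along a line, decomposed onto four cores. */
    elem e[8];
    for (int i = 0; i < 8; i++) { e[i].x[0] = i; e[i].x[1] = e[i].x[2] = 0.0; }
    rcb(e, 8, 0, 4);
    for (int i = 0; i < 8; i++)
        printf("element %d -> domain %d\n", i, e[i].domain);
    return 0;
}
```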
In this paper we discuss Intel’s continued optimization efforts with LS-DYNA® and demonstrate the impact of new Intel technologies. Two different approaches to exploiting Intel® Advanced Vector Extensions 512 (AVX-512) are shown: LS-DYNA® Explicit using Intel compiler vectorization techniques, and LS-DYNA® Implicit using the Intel® Math Kernel Library (MKL) to accelerate dense matrix computational kernels. The numerical accuracy of LS-DYNA® Explicit simulation results is also explored by comparing Intel® SSE2 and Intel® AVX-512 builds. Finally, we reveal the benefits of Intel® Optane™ DC Persistent Memory technology for LS-DYNA® Implicit simulations. For our studies we used the Topcrunch benchmarks (ODB-10M and car2car models) for LS-DYNA® Explicit and the AWE benchmarks (CYL1E6 and CL2E6 models) for LS-DYNA® Implicit.
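As a hedged illustration of the MKL approach mentioned above (this is not LS-DYNA source; the matrix size and fill values are made up), the following C snippet calls the standard CBLAS routine `cblas_dgemm` from MKL, which selects AVX-512 code paths at run time on hardware that supports them when the program is linked against MKL.

```c
#include <stdio.h>
#include <mkl.h>   /* Intel MKL: CBLAS interface, mkl_malloc/mkl_free */

int main(void)
{
    /* Small dense matrices standing in for an implicit-solver kernel;
     * the size and fill values are illustrative only. */
    const int n = 512;
    double *A = mkl_malloc((size_t)n * n * sizeof(double), 64);
    double *B = mkl_malloc((size_t)n * n * sizeof(double), 64);
    double *C = mkl_malloc((size_t)n * n * sizeof(double), 64);
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

    /* C = 1.0*A*B + 0.0*C, computed by MKL's optimized GEMM kernels. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);

    printf("C[0] = %g\n", C[0]);   /* expect n * 1.0 * 2.0 = 1024 */

    mkl_free(A); mkl_free(B); mkl_free(C);
    return 0;
}
```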
From concept to engineering, and from design to test and manufacturing, engineers across a wide range of industries face an ever-increasing need for complex and realistic models to analyze the most challenging industrial problems, and finite element analysis is performed to ensure quality and speed up the development process. Powerful virtual development software aims to meet this need with finite-element-based LS-DYNA simulations of superior robustness, speed, and accuracy. These simulations are designed to run effectively on large-scale High-Performance Computing (HPC) systems.
Mainframe computers are expected to be highly reliable and available. To achieve this high level of reliability and availability, care must be taken from the initial development cycles to ensure robust software and hardware. Here, the discussion will be focused on the structural aspect, namely the hardware assembly. A mainframe computer’s hardware structure consists of the rack, processor drawer, cooling assembly, input and output (I/O) assembly, power supply assembly, memory assembly, and storage drawers. A typical mainframe computer with a single drawer installed is shown in Figure 1. The total height is 2.0 m, and a total of 42 units (U) of many different types of mountable assemblies or drawers can be installed in the rack; 1U is 44.45 mm of vertical height. The height of the assemblies varies from 88 mm to 440 mm. The rack is an EIA (Electronic Industries Alliance) standard 19-inch-wide rack (482.6 mm), while the actual width of the mounting rails in which the assemblies or server drawers are installed is 17 ¾ in. (450.85 mm). The total width of the rack is 600 mm, which provides space to accommodate the cabling and vertical structure outside the width of the server drawer. The rack depth is 1070 mm. The drawer shown in Figure 1 is a 4U server drawer installed at the bottom of the rack, with a total drawer mass of 73 kg.