Proceedings of the International Conference on Supercomputing
N-body problems, such as simulating the motion of stars in a galaxy, are commonly solved using tree codes like Barnes-Hut. ChaNGa is a best-of-breed n-body platform that uses an asymptotically efficient tree traversal strategy known as a dual-tree walk to quickly determine which bodies need to interact with each other to provide an accurate simulation result. However, this strategy does not work well on GPUs because the dual-tree algorithm is highly irregular. On GPUs, ChaNGa therefore uses a hybrid strategy in which the CPU performs the tree walk to determine which bodies interact, while the GPU performs the force computation. In this paper, we show that a highly optimized single-tree walk achieves better GPU performance by significantly accelerating the tree walk and reducing CPU/GPU communication. Our experiments show that this new design can achieve an 8.25× speedup over baseline ChaNGa in a single-node configuration with one process per node.
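To make the idea of a single-tree walk concrete, the following is a minimal illustrative sketch in Python, not the paper's or ChaNGa's implementation. It assumes a one-dimensional domain, bodies at distinct positions inside `[lo, hi]`, a softened inverse-square force, and the standard Barnes-Hut opening test (cell size < `theta` × distance, with `theta` at most 1). All names (`Node`, `build`, `walk`, `direct`, `theta`, `eps`) are hypothetical. Each body traverses the tree independently with an explicit stack, which is the kind of per-body loop that could plausibly map to one GPU thread per body.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    lo: float                      # left edge of the cell
    hi: float                      # right edge of the cell
    mass: float                    # total mass in the cell
    com: float                     # center of mass of the cell
    children: List["Node"] = field(default_factory=list)
    body: Optional[int] = None     # body index if this is a single-body leaf


def build(indices, pos, mass, lo, hi):
    """Recursively bisect [lo, hi] until each leaf holds one body.
    Assumes all positions are distinct and lie inside [lo, hi]."""
    m = sum(mass[i] for i in indices)
    com = sum(mass[i] * pos[i] for i in indices) / m
    if len(indices) == 1:
        return Node(lo, hi, m, com, body=indices[0])
    mid = 0.5 * (lo + hi)
    node = Node(lo, hi, m, com)
    left = [i for i in indices if pos[i] < mid]
    right = [i for i in indices if pos[i] >= mid]
    if left:
        node.children.append(build(left, pos, mass, lo, mid))
    if right:
        node.children.append(build(right, pos, mass, mid, hi))
    return node


def pair_term(m, dx, eps):
    """Softened inverse-square contribution of mass m at offset dx."""
    return m * dx / (dx * dx + eps * eps) ** 1.5


def walk(root, i, pos, theta, eps=1e-3):
    """Single-tree walk: body i alone traverses the tree from the root."""
    acc = 0.0
    stack = [root]
    while stack:
        n = stack.pop()
        if n.body == i:
            continue               # skip self-interaction
        dx = n.com - pos[i]
        if n.body is not None or (n.hi - n.lo) < theta * abs(dx):
            acc += pair_term(n.mass, dx, eps)   # leaf, or far enough to approximate
        else:
            stack.extend(n.children)            # too close: open the cell
    return acc


def direct(i, pos, mass, eps=1e-3):
    """O(n) direct sum for body i, used only as a reference."""
    return sum(pair_term(mass[j], pos[j] - pos[i], eps)
               for j in range(len(pos)) if j != i)
```

With `theta = 0` no cell is ever approximated, so `walk` visits every leaf and reproduces the direct sum. Larger values of `theta` trade accuracy for fewer visited nodes. A dual-tree walk, by contrast, traverses pairs of cells rather than one body against the tree, and the abstract identifies that irregularity as the reason it maps poorly to GPUs.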
Distributed system, GPU, Heterogeneous system, N-body problems, Tree traversal
Liu, Jianqiao; Robson, Michael; Quinn, Thomas; and Kulkarni, Milind, "Efficient GPU Tree Walks for Effective Distributed N-Body Simulations" (2019). Computer Science: Faculty Publications, Smith College, Northampton, MA.