Global and local feature communications with transformers for 3D human pose estimation
Abstract
Recently, spatiotemporal Transformer structures have been widely applied to the problem of 3D human pose estimation, achieving state-of-the-art performance. Many of these approaches consider a single joint in a single frame as a token, and attention is applied over the tokens in either the same frame or the same trajectory. While this struct