Static 3D head avatars are essential for augmented reality (AR), virtual reality (VR), and gaming applications, yet achieving high fidelity and efficient initialization under lightweight setups remains challenging. Gaussian Splatting has proven effective for general scene representation, but it struggles with robust initialization and fidelity when modeling smooth human head geometries. To address these challenges, we propose a hybrid approach that combines random point initialization guided by head-avatar masks with iterative point refinement through addition and pruning, thereby eliminating reliance on Structure-from-Motion (SfM). For rendering, we enhance Gaussian Splatting by integrating a fully connected neural network inspired by NeRF to predict Gaussian attributes such as color and opacity from encoded spatial features. This integration achieves smoother transitions and slight improvements in rendering fidelity, addressing known limitations of the standard spherical-harmonics appearance model. Experiments on the RenderMe-360 dataset demonstrate that our approach provides modest but meaningful improvements in quality and efficiency for lightweight static head avatar modeling and novel view synthesis.
Our method replaces the Structure-from-Motion point cloud initialization of 3D Gaussian Splatting (3DGS) with a random sampling strategy guided by head-avatar image masks, iteratively refining the points through addition and pruning to ensure comprehensive coverage of the head geometry. The resulting point cloud retains the same structure as in the original 3DGS pipeline. Additionally, we replace the per-Gaussian spherical harmonics (SH) color coefficients and learned opacity parameters with a fully connected neural network. This network operates in two stages: the first stage takes hash-encoded Gaussian positions together with the rotation and scale attributes and predicts opacity and a feature vector; the second stage combines this feature vector with the camera viewing direction to predict RGB color. Both steps are sketched below.
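As a rough illustration of the initialization step, the sketch below samples candidate points uniformly inside a bounding volume and keeps those whose projections fall inside the foreground masks of a sufficient fraction of views. The projection model, function name, bounding box, and voting threshold are illustrative assumptions; the paper does not fix these details here.

```python
import numpy as np

def mask_guided_init(masks, cam_matrices, n_points=100_000,
                     bbox_min=(-1, -1, -1), bbox_max=(1, 1, 1), min_views=0.9):
    """Sample random 3D points and keep those covered by the head masks.

    masks        : list of (H, W) boolean foreground masks, one per view.
    cam_matrices : list of 3x4 projection matrices P = K [R | t], one per view.
    min_views    : fraction of views whose mask must cover a point to keep it.
    """
    lo, hi = np.asarray(bbox_min, float), np.asarray(bbox_max, float)
    pts = np.random.uniform(lo, hi, size=(n_points, 3))
    pts_h = np.concatenate([pts, np.ones((n_points, 1))], axis=1)  # homogeneous

    votes = np.zeros(n_points)
    for mask, P in zip(masks, cam_matrices):
        proj = pts_h @ P.T                               # (N, 3) homogeneous pixels
        z = proj[:, 2]
        uv = proj[:, :2] / np.clip(z[:, None], 1e-8, None)
        u = uv[:, 0].round().astype(int)
        v = uv[:, 1].round().astype(int)
        h, w = mask.shape
        inside = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(n_points, dtype=bool)
        hit[inside] = mask[v[inside], u[inside]]         # mask vote per view
        votes += hit
    return pts[votes >= min_views * len(masks)]          # mask-consistent points
```

In the actual pipeline, the surviving points seed the Gaussians, which are then refined by the standard 3DGS densification-and-pruning schedule.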
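The two-stage attribute network can be sketched in PyTorch as follows. The layer widths, feature dimension, hash-table settings, and activation choices are assumptions for illustration, and the simplified hash encoder uses a nearest-corner lookup rather than the trilinear interpolation of a full Instant-NGP-style encoder. Only the overall structure follows the description above: stage 1 maps the encoded position plus rotation and scale to opacity and a feature vector; stage 2 maps that feature plus the viewing direction to RGB.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HashEncoder(nn.Module):
    """Simplified multi-resolution hash encoding of 3D positions."""

    PRIMES = (1, 2654435761, 805459861)  # spatial-hashing primes

    def __init__(self, n_levels=8, feat_dim=2, log2_table=16, base_res=16, growth=1.5):
        super().__init__()
        self.resolutions = [int(base_res * growth**i) for i in range(n_levels)]
        self.table_size = 2**log2_table
        self.tables = nn.ModuleList(
            nn.Embedding(self.table_size, feat_dim) for _ in range(n_levels)
        )
        self.out_dim = n_levels * feat_dim

    def forward(self, xyz):  # xyz normalized to [0, 1]^3, shape (N, 3)
        feats = []
        for res, table in zip(self.resolutions, self.tables):
            grid = (xyz * res).long()                    # nearest grid corner
            h = torch.zeros_like(grid[:, 0])
            for d, p in enumerate(self.PRIMES):          # XOR-hash the coords
                h ^= grid[:, d] * p
            feats.append(table(h % self.table_size))
        return torch.cat(feats, dim=-1)                  # (N, out_dim)


class GaussianAttributeNet(nn.Module):
    def __init__(self, hidden=64, feat_dim=32):
        super().__init__()
        self.encoder = HashEncoder()
        # Stage 1: encoded position + rotation quaternion (4) + scale (3)
        in1 = self.encoder.out_dim + 4 + 3
        self.stage1 = nn.Sequential(
            nn.Linear(in1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + feat_dim),             # opacity logit + feature
        )
        # Stage 2: feature vector + viewing direction (3) -> RGB
        self.stage2 = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, xyz, rotation, scale, view_dir):
        enc = self.encoder(xyz)
        out1 = self.stage1(torch.cat([enc, rotation, scale], dim=-1))
        opacity = torch.sigmoid(out1[:, :1])             # (N, 1) in [0, 1]
        feat = out1[:, 1:]
        view_dir = F.normalize(view_dir, dim=-1)
        rgb = torch.sigmoid(self.stage2(torch.cat([feat, view_dir], dim=-1)))
        return rgb, opacity
```

Decoupling the view-independent stage (opacity and features from geometry) from the view-dependent stage (RGB from features and direction) mirrors the NeRF MLP split and lets the per-Gaussian features be cached across views, with only the small second stage re-evaluated per camera.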