Accurate smartphone-based outdoor localization systems for deep urban canyons are increasingly needed for IoT applications such as augmented reality and intelligent transportation. This article proposes a multi-material image registration solution for accurate pose estimation in urban canyons, where global navigation satellite systems (GNSS) tend to fail. In the offline stage, a material-segmented city model is used to generate segmented images at each candidate pose (six degrees of freedom: position and rotation). In the online stage, an image taken with a smartphone camera provides textual information about the surrounding environment. The approach uses computer vision algorithms to rectify the smartphone image and manually segment the different types of material identified in it. The segmented images generated at the hypothesized (candidate) poses are then matched against the segmented smartphone image, and the pose of the candidate image with the maximum likelihood is taken as the estimated pose of the user. The positioning results achieve 2.0 m level accuracy along common high-rise streets, 5.5 m in foliage-dense environments, and 15.7 m in alleyways, a 45% positioning improvement over the current state-of-the-art method. The yaw estimate achieves 2.3° level accuracy, an eightfold improvement over the smartphone IMU.
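
The candidate-matching step described above can be illustrated with a minimal sketch. It is not the article's implementation: the scoring function (fraction of pixels whose material labels agree) is an assumed stand-in for the likelihood, and the names `Pose`, `LabelImage`, `label_agreement`, and `estimate_pose` are hypothetical. Segmented images are represented as rows of integer material labels.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical 6-DoF pose: (x, y, z, roll, pitch, yaw).
Pose = Tuple[float, float, float, float, float, float]
# A segmented image as rows of integer material labels (label values are assumed).
LabelImage = Sequence[Sequence[int]]


def label_agreement(candidate: LabelImage, query: LabelImage) -> float:
    """Fraction of pixels whose material labels match.

    This is an assumed stand-in for the likelihood; the article's actual
    matching score may differ.
    """
    if len(candidate) != len(query):
        raise ValueError("images must have the same height")
    total = 0
    matches = 0
    for cand_row, query_row in zip(candidate, query):
        if len(cand_row) != len(query_row):
            raise ValueError("images must have the same width")
        for c, q in zip(cand_row, query_row):
            total += 1
            matches += int(c == q)
    return matches / total if total else 0.0


def estimate_pose(
    candidates: Dict[Pose, LabelImage], query: LabelImage
) -> Tuple[Pose, float]:
    """Return the candidate pose with the highest score, and that score."""
    if not candidates:
        raise ValueError("no candidate poses supplied")
    scores = {pose: label_agreement(img, query) for pose, img in candidates.items()}
    best_pose = max(scores, key=scores.get)
    return best_pose, scores[best_pose]
```

In this sketch the offline stage would populate `candidates` with one rendered label image per hypothesized pose, and the online stage would pass the segmented smartphone image as `query`. An exhaustive search over all candidates is used here only for clarity.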