In recent years, researchers have intensively investigated various topics in test prioritization, which aims to re-order tests to increase the rate of fault detection during regression testing. While the main research focus in test prioritization is on proposing novel prioritization techniques and evaluating on more and larger subject systems, little effort has been put on investigating the threats to validity in existing work on test prioritization. One main threat to validity is that existing work mainly evaluates prioritization techniques based on simple artificial changes on the source code and tests. For example, the changes in the source code usually include only seeded program faults, whereas the test suite is usually not augmented at all. On the contrary, in real-world software development, software systems usually undergo various changes on the source code and test suite augmentation. Therefore, it is not clear whether the conclusions drawn by existing work in test prioritization from the artificial changes are still valid for real-world software evolution. In this paper, we present the first empirical study to investigate this important threat to validity in test prioritization. We reimplemented 24 variant techniques of both the traditional and time-aware test prioritization, and investigated the impacts of software evolution on those techniques based on the version history of 8 real-world Java programs from GitHub. The results show that for both traditional and time-aware test prioritization, test suite augmentation significantly hampers their effectiveness, whereas source code changes alone do not influence their effectiveness much.