Abstract:
Author name disambiguation is a well-known problem in digital libraries that aims at identifying the real owner of a scientific contribution. It is challenging because name strings can be ambiguous in different ways. For example, the same name might be written in many different ways. Conversely, the same name string might be shared by different individuals. In this thesis, we propose a new approach to resolving author name ambiguities. Our approach relies on a heuristic-based scoring method that proceeds through different stages in an effort to make the disambiguation decision as early as possible. In addition, the algorithm is designed to be both scalable to large databases and adaptive to the case at hand. The algorithm is validated against a wide variety of manually labeled datasets. Our results show that about 91.03% of the generated profiles exactly matched the profiles in the reference datasets, about 8.18% partially matched the reference profiles, and fewer than 0.8% were erroneous profiles. Moreover, we ran the algorithm against more than 10 million name string instances in a real database within a relatively short time.
Description:
CD, no. of pages 70, 30124, informatics 4/2017