Dr Petr Knoth

Senior Research Fellow in Text and Data Mining

I lead the Big Scientific Data and Text Analytics Group (BSDTAG), which does R&D in text mining, digital libraries and open access/open science. I am the founder and head of CORE (core.ac.uk), a large full-text aggregator of open access papers with millions of monthly active users. CORE makes research papers available for people to freely discover and access, and for machines to text-mine.

Previously, I worked as a Senior Data Scientist at Mendeley on information extraction and content recommendation for research. I have a deep interest in using AI to improve research workflows. I co-founded Semantometrics.org, which aims to go beyond bibliometrics and altmetrics by developing research evaluation methods that use the full texts of publications.

I have been involved, as a researcher and as a PI, in over 20 research projects funded by the European Commission and by national and international bodies, in the areas of text mining, open science and eLearning.


Natural Language Processing, Text and data mining
Open Access, Open Science, Scholarly communication
Information Retrieval, Information Extraction, Recommendation systems, Scientometrics


Ghafourian, Y., Knoth, P. and Hanbury, A. (2021). Information retrieval evaluation in knowledge acquisition tasks. In: WEPIR 2021: The 3rd Workshop on Evaluation of Personalisation in Information Retrieval at CHIIR 2021, 19 Mar 2021, [Online], pp. 88–95.

Taha, Abdel Aziz, Papariello, Luca, Bampoulidis, Alexandros, Knoth, Petr and Lupu, Mihai (2021). Formal Analysis and Estimation of Chance in Datasets Based on Their Properties. IEEE Transactions on Knowledge and Data Engineering (Early Access).

Knoth, Petr, Kunnath, Suchetha N., Gyawali, Bikash, Pride, David, Stahl, Christopher and Herrmannova, Drahomira (2020). 8th International Workshop on Mining Scientific Publications (WOSP 2020). In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, ACM, pp. 581–582.

Kunnath, Suchetha N., Pride, David, Gyawali, Bikash and Knoth, Petr (2020). Overview of the 2020 WOSP 3C Citation Context Classification Task. In: Proceedings of the 8th International Workshop on Mining Scientific Publications, Association for Computational Linguistics, pp. 75–83.

Gyawali, Bikash, Anastasiou, Lucas and Knoth, Petr (2020). Deduplication of Scholarly Documents using Locality Sensitive Hashing and Word Embeddings. In: Proceedings of The 12th Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, pp. 894–903.

Gyawali, Bikash, Pontika, Nancy and Knoth, Petr (2020). Open Access 2007–2017: Country and University Level Perspective. In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ’20).

Pride, David and Knoth, Petr (2020). An Authoritative Approach to Citation Classification. In: ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ’20), 1–5 Aug 2020, Virtual, China.

Brinken, Helene, Kuchma, Iryna, Kalaitzi, Vaso, Davidson, Joy, Pontika, Nancy, Cancellieri, Matteo, Correia, Antonia, Carvalho, Jose, Melero, Reme, Kastelic, Damjana, Borba, Filomena, Lenaki, Katerina, Toelch, Ulf, Zourou, Katerina, Knoth, Petr, Schmidt, Birgit and Rodrigues, Eloy (2019). A Case Report: Building communities with training and resources for Open Science trainers. LIBER Quarterly, 29(1), pp. 1–36.

Knoth, Petr, Anastasiou, Lucas, Cancellieri, Matteo, Gyawali, Bikash, Herrmannova, Drahomira, Misak, Sergei, Huba, Alexander, Pearce, Samuel, Pontika, Nancy, Rumyanceva, Svetlana and Tarasiuk, Maria (2019). Aggregating The World's Open Access Research Papers. Open Science Fair.

Pride, David, Harag, Jozef and Knoth, Petr (2019). ACT: An Annotation Platform for Citation Typing at Scale [JCDL Poster Presentation]. In: JCDL 2019 - ACM/IEEE-CS Joint Conference on Digital Libraries 2019, 2–6 Jun 2019, Urbana-Champaign, Illinois.