Dr Petr Knoth

Senior Research Fellow in Text and Data Mining

I lead the Big Scientific Data and Text Analytics Group (BSDTAG), which does R&D in text mining, digital libraries, and open access and open science. I am the founder and head of CORE (core.ac.uk), a large full-text aggregator of open access papers with millions of monthly active users. CORE makes research papers available for people to freely discover and access, and for machines to text-mine.

Previously, I worked as a Senior Data Scientist at Mendeley on information extraction and content recommendation for research. I have a deep interest in the use of AI to improve research workflows. I co-founded Semantometrics.org, which aims to go beyond bibliometrics and altmetrics to produce new research evaluation methods that make use of publication full texts in research assessment.

I have been involved as a researcher and as a PI in over 20 research projects funded by the European Commission and by national and international funders, in the areas of text mining, open science and eLearning.

Keywords

Natural Language Processing, Text and Data Mining, Open Access, Open Science, Scholarly Communication, Information Retrieval, Information Extraction, Recommendation Systems, Scientometrics


Publications

Ross-Hellauer, Tony, Klebel, Thomas, Knoth, Petr and Pontika, Nancy (2024). Value dissonance in research(er) assessment: individual and perceived institutional priorities in review, promotion, and tenure. Science and Public Policy, 51(3), pp. 337–351.

Mendoza, Óscar E., Kusa, Wojciech, El-Ebshihy, Alaa, Wu, Ronin, Pride, David, Knoth, Petr, Herrmannova, Drahomira, Piroi, Florina, Pasi, Gabriella and Hanbury, Allan (2022). Benchmark for Research Theme Classification of Scholarly Documents. In: COLING 2022: 29th International Conference on Computational Linguistics, 12-17 Oct 2022, Gyeongju, South Korea.

Ghafourian, Yasin, Hanbury, Allan and Knoth, Petr (2023). Ranking for Learning: Studying Users’ Perceptions of Relevance, Understandability, and Engagement. In: International Conference on Theory and Practice of Digital Libraries TPDL 2023: Linking Theory and Practice of Digital Libraries, 26-29 Sep 2023, Zadar, Croatia.

Ghafourian, Yasin, Hanbury, Allan and Knoth, Petr (2023). Readability Measures as Predictors of Understandability and Engagement in Searching to Learn. In: Linking Theory and Practice of Digital Libraries: 27th International Conference on Theory and Practice of Digital Libraries, TPDL 2023, 26-29 Sep 2023, Zadar, Croatia.

Pride, David, Cancellieri, Matteo and Knoth, Petr (2023). CORE-GPT: Combining Open Access Research and Large Language Models for Credible, Trustworthy Question Answering. In: International Conference on Theory and Practice of Digital Libraries TPDL 2023: Linking Theory and Practice of Digital Libraries, 26-29 Sep 2023, Zadar, Croatia.

Kusa, Wojciech, Knoth, Petr and Hanbury, Allan (2023). CRUISE-Screening: Living Literature Reviews Toolbox. In: CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 21-25 Oct 2023, Birmingham, UK.

Nambanoor Kunnath, Suchetha, Pride, David and Knoth, Petr (2023). Prompting Strategies for Citation Classification. In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23), 21-25 Oct 2023, Birmingham, UK.

Kusa, Wojciech, Zuccon, Guido, Knoth, Petr and Hanbury, Allan (2023). Outcome-based Evaluation of Systematic Review Automation. In: ICTIR '23: Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, 23 Jul 2023, Taipei, Taiwan.

Kusa, Wojciech, Lipani, Aldo, Knoth, Petr and Hanbury, Allan (2023). VoMBaT: A Tool for Visualising Evaluation Measure Behaviour in High-Recall Search Tasks. In: SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 23-27 Jul 2023, Taipei, Taiwan.

Kusa, Wojciech, Mendoza, Óscar E., Knoth, Petr, Pasi, Gabriella and Hanbury, Allan (2023). Effective matching of patients to clinical trials using entity extraction and neural re-ranking. Journal of Biomedical Informatics, 144