
Dr Petr Knoth
Senior Research Fellow in Text and Data Mining
I lead the Big Scientific Data and Text Analytics Group (BSDTAG), which does R&D in text mining, digital libraries, and open access and open science. I am the founder and head of CORE (core.ac.uk), a large full-text aggregator of open access papers with millions of monthly active users. CORE makes research papers available for people to freely discover and access and for machines to text-mine.
Previously, I worked as a Senior Data Scientist at Mendeley on information extraction and content recommendation for research. I have a deep interest in the use of AI to improve research workflows. I co-founded Semantometrics.org, which aims to go beyond bibliometrics and altmetrics to produce new research evaluation methods that make use of publication full texts in research assessment.
I have been involved as a researcher and as a PI in over 20 research projects funded by the European Commission and by national and international bodies, in the areas of text mining, open science and eLearning.
Keywords
Natural Language Processing, Text and data mining, Open Access, Open Science, Scholarly communication, Information Retrieval, Information Extraction, Recommendation systems, Scientometrics
Publications
Kusa, Wojciech, E. Mendoza, Oscar, Samwald, Matthias, Knoth, Petr and Hanbury, Allan (2024). CSMED: Bridging the Dataset Gap in Automated Citation Screening for Systematic Literature Reviews. In: 37th Conference on Neural Information Processing Systems (NeurIPS 2023): Track on Datasets and Benchmarks, 10 Dec 2023, New Orleans, USA.
Knoth, Petr, Klein, Martin, Macgregor, George, Cancellieri, Matteo and Walk, Paul (2024). How to make repository content indexed and discoverable. In: The 19th International Conference on Open Repositories, 03-06 Jun 2024, Göteborg, Sweden.
George, Macgregor, Knoth, Petr, Walk, Paul, Dowson, Nicola, Eadie, Mick, Jones, Beverley and Martínez-García, Agustina (2024). Exploring the concept of 'custodianship' in harvesting repository resources and graphing their relations: Rioxx version 3.0. In: The 19th International Conference on Open Repositories, 03-06 Jun 2024, Göteborg, Sweden.
Knoth, Petr, Laurent, Romary, Lopez, Patrice, Di Cosmo, Roberto, Smrz, Pavel, Umerle, Tomasz, Harrison, Melissa, Monteil, Alain, Cancellieri, Matteo and Pride, David (2025). Making Software FAIR: A machine-assisted workflow for the research software lifecycle. In: 19th International Conference on Open Repositories (OR2024), 3-6 Jun 2024, Göteborg, Sweden.
Ross-Hellauer, Tony, Klebel, Thomas, Knoth, Petr and Pontika, Nancy (2024). Value dissonance in research(er) assessment: individual and perceived institutional priorities in review, promotion, and tenure. Science and Public Policy, 51(3), pp. 337–351.
Mendoza, Óscar E., Kusa, Wojciech, El-Ebshihy, Alaa, Wu, Ronin, Pride, David, Knoth, Petr, Herrmannova, Drahomira, Piroi, Florina, Pasi, Gabriella and Hanbury, Allan (2022). Benchmark for Research Theme Classification of Scholarly Documents. In: COLING 2022: 29th International Conference on Computational Linguistics, 12-17 Oct 2022, Gyeongju, South Korea.
Ghafourian, Yasin, Hanbury, Allan and Knoth, Petr (2023). Ranking for Learning: Studying Users’ Perceptions of Relevance, Understandability, and Engagement. In: International Conference on Theory and Practice of Digital Libraries TPDL 2023: Linking Theory and Practice of Digital Libraries, 26-29 Sep 2023, Zadar, Croatia.
Ghafourian, Yasin, Hanbury, Allan and Knoth, Petr (2023). Readability Measures as Predictors of Understandability and Engagement in Searching to Learn. In: Linking Theory and Practice of Digital Libraries 27th International Conference on Theory and Practice of Digital Libraries, TPDL 2023, 26-29 Sep 2023, Zadar, Croatia.
Pride, David, Cancellieri, Matteo and Knoth, Petr (2023). CORE-GPT: Combining Open Access Research and Large Language Models for Credible, Trustworthy Question Answering. In: International Conference on Theory and Practice of Digital Libraries TPDL 2023: Linking Theory and Practice of Digital Libraries, 26-29 Sep 2023, Zadar, Croatia.
Kusa, Wojciech, Knoth, Petr and Hanbury, Allan (2023). CRUISE-Screening: Living Literature Reviews Toolbox. In: CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 21-25 Oct 2023, Birmingham, UK.
Projects
CORE
CORE - COnnecting REpositories
Eurogene Software
Europeana Cloud
FIT4RRI
FOSTER
FOSTER (fosteropenscience.eu)
FOSTER Plus
Frictionless Data Exchange Across Research Data, Software and Scientific Paper Repositories
ON-MERRIT
OpenMinTeD
REF 2021 Predictions
UK Aggregation 2
VocTeach
Themes
Artificial Intelligence and Data Analysis
Topics
Learning and Education
Science and scholarly communication