The Semantic Reader Open Research Platform
The Semantic Reader Project is a collaborative effort of NLP and HCI researchers from non-profit, industry, and academic institutions to create interactive, intelligent reading interfaces for scholarly papers. Our research led to the creation of Semantic Reader, an application used by tens of thousands of scholars each week.
The Semantic Reader Open Research Platform provides resources that enable the broader research community to explore exciting challenges around novel research support tools: PaperMage, a library for processing and analyzing scholarly PDFs, and PaperCraft, a React UI component for building augmented and interactive reading interfaces. Join us in designing the future of scholarly reading interfaces with our open source libraries!
Open Source Libraries
We provide PaperMage and PaperCraft for building intelligent and interactive paper readers. The example below shows how to extract text from a PDF, prompt an LLM for term definitions, and then visually augment the paper with highlights and popups.
PaperMage
Process and Analyze Scholarly PDF Documents
from papermage.recipes import CoreRecipe
# parse the PDF into a structured document
recipe = CoreRecipe()
doc = recipe.run("paper.pdf")
paragraphs_text = [p.text for p in doc.paragraphs]
term_defs = []
for sentence in doc.abstracts[0].sentences:
    print(sentence.text)
    # When reading a scholarly article, inline...
    # However, it can be challenging to pri...
    # ...
    print(sentence.words[:2])
    # ['When', 'reading']
    # ['However', 'it']
    # ...
    # bounding boxes of the 4th word + its definition
    term = sentence.words[3]
    # prompt() is a placeholder for your own LLM call
    term_def = prompt(
        ' '.join(paragraphs_text) +
        f' What is the definition of "{term.text}"?'
    )
    term_defs.append((term.boxes, term_def))
# send_to_paper_craft_ui() is a placeholder for your own hand-off to the UI
send_to_paper_craft_ui(term_defs)
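The snippet above leaves prompt() and send_to_paper_craft_ui() undefined. As a rough sketch of what could sit around those placeholders, the following builds the question text and serializes the (boxes, definition) pairs as JSON for a front end. Everything here is an assumption for illustration: the Box dataclass is a hypothetical stand-in rather than PaperMage's own class, and build_definition_prompt, to_ui_payload, and the max_chars limit are invented names, not part of either library.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Hypothetical stand-in for a bounding box on a page (not PaperMage's class)."""
    l: float   # left edge
    t: float   # top edge
    w: float   # width
    h: float   # height
    page: int  # zero-based page index


def build_definition_prompt(context_paragraphs: List[str], term: str,
                            max_chars: int = 4000) -> str:
    """Join paragraphs into one context string, truncate it, and append the question."""
    context = " ".join(context_paragraphs)[:max_chars]
    return f'{context}\n\nWhat is the definition of "{term}"?'


def to_ui_payload(term_defs: List[Tuple[List[Box], str]]) -> str:
    """Serialize (boxes, definition) pairs as a JSON array a front end could consume."""
    records = [
        {"boxes": [asdict(box) for box in boxes], "definition": definition}
        for boxes, definition in term_defs
    ]
    return json.dumps(records)


if __name__ == "__main__":
    question = build_definition_prompt(["Inline citations help readers."], "citations")
    payload = to_ui_payload([([Box(0.1, 0.2, 0.05, 0.01, 0)], "A reference to a source.")])
    print(question)
    print(payload)
```

Truncating the context is only one way to keep the prompt within a model's input limit; selecting the paragraphs nearest the term would be another reasonable choice.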
PaperCraft
Create Visually Augmented Interactive Readers
import React, { useContext } from 'react'
import {
  DocumentContext, DocumentWrapper, Overlay, PageWrapper
} from '@allenai/pdf-components' // aka PaperCraft
// pdfUrl and BlueTextPopover are assumed to be defined elsewhere in the app
const Reader: React.FC<{termDefinitions: any[]}> = ({termDefinitions}) => {
  const {numPages} = useContext(DocumentContext)
  const pageIndices = [...Array(numPages).keys()]
  /* PageWrapper: render each page */
  /* Overlay: visual augmentations and interactions */
  return (
    <DocumentWrapper file={pdfUrl}>
      {pageIndices.map(pageIndex => (
        <PageWrapper key={pageIndex} pageIndex={pageIndex}>
          <Overlay>
            {/* abstract is on the first page (index 0) */}
            {pageIndex === 0 &&
              termDefinitions.map((termDefinition, i) => (
                <BlueTextPopover
                  key={i}
                  termDefinition={termDefinition}
                />
              ))}
          </Overlay>
        </PageWrapper>
      ))}
    </DocumentWrapper>
  )
}
Research Prototype Showcase
Here we present several interactive demos to showcase systems you can build with PaperMage and PaperCraft.
Papeos
Augmenting Research Papers with Author Talk Videos
Demo, Paper, Presentation
Synergi & Threddy
Clipping Research Threads from Papers for Synthesis and Exploration
Paper, Presentation
Paper Plain
Making Medical Research Papers Approachable to Healthcare Consumers
Demo, Code Tutorial, Paper
LLM Paper Q&A
A GPT-powered PDF QA system with attribution support.
Demo, Code Tutorial
CiteSee
Augmenting Citations in Papers with Persistent and Personalized Context
In-Production, Paper, Presentation
CiteRead
Localizing Incoming Citations from Follow on Papers in the Margins
Demo, Paper, Presentation
Scim
Automatic highlights for skimming support of scientific papers
In-Production, Paper
ScholarPhi
Augmenting Papers with Just-in-Time Definitions of Terms and Symbols
Founding Project, Demo, Paper
Publications
Semantic Reader Project Overview
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie (Yu-Yen) Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, F.Q. Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Michael Kinney, A. Kittur, Hyeonsu B Kang, Egor Klevak, Bailey Kuehl, Michael Langan, Matt Latzke, Jaron Lochner, Kelsey MacMillan, Eric Stuart Marsh, Tyler Murray, Aakanksha Naik, Ngoc-Uyen Nguyen, Srishti Palani, Soya Park, Caroline Paulic, Napol Rachatasumrit, Smita R Rao, P. Sayre, Zejiang Shen, Pao Siangliulue, Luca Soldaini, Huy Tran, Madeleine van Zuylen, Lucy Lu Wang, Christopher Wilhelm, Caroline M Wu, Jiangjiang Yang, Angele Zamarron, Marti A. Hearst, Daniel S. Weld. ArXiv. 2023.
Interactive and Intelligent Reading Interfaces
Qlarify: Bridging Scholarly Abstracts and Papers with Recursively Expandable Summaries
Raymond Fok, Joseph Chee Chang, Tal August, Amy X. Zhang, Daniel S. Weld. ArXiv. 2023.
Papeos: Augmenting Research Papers with Talk Videos
Tae Soo Kim, Matt Latzke, Jonathan Bragg, Amy X. Zhang, Joseph Chee Chang. The ACM Symposium on User Interface Software and Technology. 2023.
Synergi: A Mixed-Initiative System for Scholarly Synthesis and Sensemaking
Hyeonsu B Kang, Sherry Wu, Joseph Chee Chang, A. Kittur. The ACM Symposium on User Interface Software and Technology. 2023.
Best Paper Award
CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context
Joseph Chee Chang, Amy X. Zhang, Jonathan Bragg, Andrew Head, Kyle Lo, Doug Downey, Daniel S. Weld. Proceedings of the CHI Conference on Human Factors in Computing Systems. 2023.
Relatedly: Scaffolding Literature Reviews with Existing Related Work Sections
Srishti Palani, Aakanksha Naik, Doug Downey, Amy X. Zhang, Jonathan Bragg, Joseph Chee Chang. Proceedings of the CHI Conference on Human Factors in Computing Systems. 2023.
CiteRead: Integrating Localized Citation Contexts into Scientific Paper Reading
Napol Rachatasumrit, Jonathan Bragg, Amy X. Zhang, Daniel S. Weld. 27th International Conference on Intelligent User Interfaces. 2022.
Best Paper Award
Math Augmentation: How Authors Enhance the Readability of Formulas using Novel Visual Design Practices
Andrew Head, Amber Xie, Marti A. Hearst. Proceedings of the CHI Conference on Human Factors in Computing Systems. 2022.
Scim: Intelligent Skimming Support for Scientific Papers
Raymond Fok, Hita Kambhamettu, Luca Soldaini, Jonathan Bragg, Kyle Lo, Andrew Head, Marti A. Hearst, Daniel S. Weld. Proceedings of the 28th International Conference on Intelligent User Interfaces. 2022.
Exploring Team-Sourced Hyperlinks to Address Navigation Challenges for Low-Vision Readers of Scientific Papers
Soya Park, Jonathan Bragg, Michael Chang, K. Larson, Danielle Bragg. Proceedings of the ACM on Human-Computer Interaction. 2022.
Paper Plain: Making Medical Research Papers Approachable to Healthcare Consumers with Natural Language Processing
Tal August, Lucy Lu Wang, Jonathan Bragg, Marti A. Hearst, Andrew Head, Kyle Lo. ACM Transactions on Computer-Human Interaction. 2022. Presentation at CHI 2024.
Threddy: An Interactive System for Personalized Thread-based Exploration and Organization of Scientific Literature
Hyeonsu B Kang, Joseph Chee Chang, Yongsung Kim, A. Kittur. Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 2022.
Best Paper Award
SciA11y: Converting Scientific Papers to Accessible HTML
Lucy Lu Wang, Isabel Cachola, Jonathan Bragg, Evie (Yu-Yen) Cheng, Chelsea Hess Haupt, Matt Latzke, Bailey Kuehl, Madeleine van Zuylen, Linda M. Wagner, Daniel S. Weld. Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility. 2021.
Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols
Andrew Head, Kyle Lo, Dongyeop Kang, Raymond Fok, Sam Skjonsberg, Daniel S. Weld, Marti A. Hearst. Proceedings of the CHI Conference on Human Factors in Computing Systems. 2020.
Open Research Resources: Libraries, Models, Datasets
Best Paper Award
PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents
Kyle Lo, Zejiang Shen, Benjamin Newman, Joseph Chee Chang, Russell Authur, Erin Bransom, Stefan Candra, Yoganand Chandrasekhar, Regan Huff, Bailey Kuehl, Amanpreet Singh, Chris Wilhelm, Angele Zamarron, Marti A. Hearst, Daniel S. Weld, Doug Downey, Luca Soldaini. Conference on Empirical Methods in Natural Language Processing: Demos. 2023.
A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents
Benjamin Newman, Luca Soldaini, Raymond Fok, Arman Cohan, Kyle Lo. 2023.
Best Paper Award
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle Lo. ArXiv. 2023.
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents
Catherine Chen, Zejiang Shen, D. Klein, G. Stanovsky, Doug Downey, Kyle Lo. ArXiv. 2023.
The Semantic Scholar Open Data Platform
Rodney Michael Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, D. Graham, F.Q. Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin, Haokun Liu, Kyle Lo, Jaron Lochner, Kelsey MacMillan, Tyler Murray, Christopher Newell, Smita R Rao, Shaurya Rohatgi, P. Sayre, Zejiang Shen, Amanpreet Singh, Luca Soldaini, Shivashankar Subramanian, A. Tanaka, Alex D Wade, Linda M. Wagner, Lucy Lu Wang, Christopher Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Madeleine van Zuylen, Daniel S. Weld. ArXiv. 2023.
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups
Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey. Transactions of the Association for Computational Linguistics. 2021.
Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions
Dongyeop Kang, Andrew Head, Risham Sidhu, Kyle Lo, Daniel S. Weld, Marti A. Hearst. Proceedings of the First Workshop on Scholarly Document Processing @ ACL. 2020.
Core Team
See the Project Overview Paper for a full list of contributors.
For questions and inquiries, please contact Joseph Chee Chang (PaperCraft & Intelligent reading interfaces), or Kyle Lo and Luca Soldaini (PaperMage & Scientific document processing).