The Semantic Reader Open Research Platform

The Semantic Reader Project is a collaborative effort of NLP + HCI researchers from non-profit, industry, and academic institutions to create interactive, intelligent reading interfaces for scholarly papers. Our research led to the creation of Semantic Reader, an application used by tens of thousands of scholars each week.

The Semantic Reader Open Research Platform provides resources that enable the broader research community to explore exciting challenges around novel research support tools: PaperMage, a library for processing and analyzing scholarly PDFs, and PaperCraft, a React UI component for building augmented and interactive reading interfaces. Join us in designing the future of scholarly reading interfaces with our open source libraries!

Read the Overview Paper

Try Semantic Reader

Open Source Libraries

We provide PaperMage + PaperCraft for building intelligent and interactive paper readers. The example below shows how to extract text from a PDF, prompt an LLM for term definitions, and then visually augment the paper with highlights and popups.

PaperMage

Process and Analyze Scholarly PDF Documents

from papermage.recipes import CoreRecipe

recipe = CoreRecipe()
doc = recipe.run("paper.pdf")
paragraphs_text = [p.text for p in doc.paragraphs]

term_defs = []

for sentence in doc.abstracts[0].sentences:
    print(sentence.text)
    # When reading a scholarly article, inline...
    # However, it can be challenging to pri...
    # ...

    print(sentence.words[:2])
    # ['When', 'reading']
    # ['However', 'it']
    # ...

    # pair the bounding boxes of each sentence's 4th word with its definition
    # (prompt() stands in for your own LLM call)
    term = sentence.words[3]
    term_def = prompt(
      ' '.join(paragraphs_text) +
      f' What is the definition of "{term.text}"?'
    )
    term_defs.append((term.boxes, term_def))

send_to_paper_craft_ui(term_defs)
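
The snippet above calls `prompt(...)` without defining it. As a minimal sketch, assuming you bring your own LLM client, the helper below builds the question text with a character budget for the context and delegates the actual call to a function you pass in. `build_definition_prompt`, `define_term`, `max_chars`, and `fake_llm` are hypothetical names for illustration and are not part of PaperMage.

```python
from typing import Callable, List


def build_definition_prompt(
    paragraphs: List[str], term: str, max_chars: int = 8000
) -> str:
    """Join paragraphs into one context string, cut it to max_chars,
    and append the definition question."""
    context = " ".join(paragraphs)[:max_chars]
    return f'{context}\n\nWhat is the definition of "{term}"?'


def define_term(
    paragraphs: List[str],
    term: str,
    complete: Callable[[str], str],
    max_chars: int = 8000,
) -> str:
    """Ask the supplied completion function for a definition.

    `complete` is any function mapping a prompt string to a response
    string (an assumption: wrap your own LLM client to fit this shape).
    """
    return complete(build_definition_prompt(paragraphs, term, max_chars)).strip()


def fake_llm(text: str) -> str:
    """Placeholder model so the sketch runs without network access."""
    return " A placeholder definition. "


if __name__ == "__main__":
    paragraphs = ["Inline citations link claims to their sources."]
    print(define_term(paragraphs, "citations", complete=fake_llm))
```

Passing the completion function in as an argument keeps the sketch independent of any particular LLM provider, and the character budget is a crude guard against sending an entire paper as context.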

PaperCraft

Create Visually Augmented Interactive Readers

Reader.tsx | Popover.tsx

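import React, { useContext } from 'react'
import {
  DocumentContext, DocumentWrapper, Overlay, PageWrapper
} from '@allenai/pdf-components' // aka PaperCraft
import { BlueTextPopover } from './Popover' // defined in Popover.tsx

// pdfUrl and termDefinitions are supplied by the caller
const Reader: React.FC<{ pdfUrl: string, termDefinitions: any[] }> = ({ pdfUrl, termDefinitions }) => {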
  const {numPages} = useContext(DocumentContext)
  const pageIndices = [...Array(numPages).keys()]
  /* PageWrapper: render each page */
  /* Overlay: visual augmentations and interactions */
  return (
    <DocumentWrapper file={pdfUrl}>
      {pageIndices.map(pageIndex => (
        <PageWrapper key={pageIndex} pageIndex={pageIndex}>
          <Overlay>
            {/* abstract is on the first page (index 0) */}
            {pageIndex === 0 &&
              termDefinitions.map((termDefinition, i) => (
                <BlueTextPopover
                  key={i}
                  termDefinition={termDefinition}
                />
              ))}
          </Overlay>
        </PageWrapper>
      ))}
    </DocumentWrapper>
  )
}

import React from 'react'
import { Popover } from 'antd'
import { BoundingBox } from '@allenai/pdf-components'

export const BlueTextPopover: React.FC<{ termDefinition: any }> = (props) => {
  const { termDefinition } = props
  const [box, definition] = termDefinition
  // show definition on click with an antd widget
  // highlight the BoundingBox of the term
  return (
    <Popover
      content={definition}
      trigger="click"
    >
      <BoundingBox
        className="screen-blend-blue"
        isHighlighted={true}
        page={box.page}
        top={box.top}
        left={box.left}
        height={box.height}
        width={box.width}
      />
    </Popover>
  )
}
/* .screen-blend-blue {
      background: blue;
      mix-blend-mode: screen;} */
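
The popover above destructures each `termDefinition` as a `[box, definition]` pair, while the PaperMage snippet collects one `(term.boxes, term_def)` tuple per term, where a term may span several boxes. One way to bridge the two, sketched here under assumptions, is to emit one pair per box and serialize the result as JSON for the React app. `TermBox` and `to_term_definitions` are hypothetical; the field names mirror the `BoundingBox` props used above, and the actual PaperMage box fields and coordinate units may differ and may need converting.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class TermBox:
    """Hypothetical stand-in for one bounding box of a term."""
    page: int
    top: float
    left: float
    height: float
    width: float


def to_term_definitions(term_defs: List[Tuple[List[TermBox], str]]) -> str:
    """Flatten (boxes, definition) tuples into a JSON list of
    [box, definition] pairs, one pair per box."""
    pairs = [
        [asdict(box), definition]
        for boxes, definition in term_defs
        for box in boxes
    ]
    return json.dumps(pairs)


# A term that wraps across two lines has two boxes sharing one definition.
term_defs = [
    (
        [TermBox(0, 0.25, 0.5, 0.125, 0.25), TermBox(0, 0.375, 0.125, 0.125, 0.25)],
        "A short definition.",
    ),
]
payload = to_term_definitions(term_defs)
print(payload)
```

Emitting one pair per box means a term that wraps across lines produces several highlights that share the same definition text.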

Source Code

Read the Tutorial

Research Prototype Showcase

Here we present several interactive demos to showcase systems you can build with PaperMage and PaperCraft.

Tae Soo Kim

Papeos

Augmenting Research Papers with Author Talk Videos

Demo | Paper | Presentation

NEW: UIST 2023

Hyeonsu B. Kang

Synergi & Threddy

Clipping Research Threads from Papers for Synthesis and Exploration

Paper | Presentation

NEW: UIST 2023

Tal August

Paper Plain

Making Medical Research Papers Approachable to Healthcare Consumers

Demo | Code Tutorial | Paper

NEW: TOCHI'22 -> CHI'24

Joseph Chee Chang

Luca Soldaini

LLM Paper Q&A

A GPT-powered PDF QA system with attribution support.

Demo | Code Tutorial

Joseph Chee Chang

CiteSee

Augmenting Citations in Papers with Persistent and Personalized Context

In-Production | Paper | Presentation

Napol Rachatasumrit

CiteRead

Localizing Incoming Citations from Follow-on Papers in the Margins

Demo | Paper | Presentation

Raymond Fok

Scim

Automatic highlights for skimming support of scientific papers

In-Production | Paper

Andrew Head

Kyle Lo

ScholarPhi

Augmenting Papers with Just-in-Time Definitions of Terms and Symbols

Founding Project | Demo | Paper

Publications

Semantic Reader Project Overview

The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie (Yu-Yen) Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, F.Q. Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Michael Kinney, A. Kittur, Hyeonsu B Kang, Egor Klevak, Bailey Kuehl, Michael Langan, Matt Latzke, Jaron Lochner, Kelsey MacMillan, Eric Stuart Marsh, Tyler Murray, Aakanksha Naik, Ngoc-Uyen Nguyen, Srishti Palani, Soya Park, Caroline Paulic, Napol Rachatasumrit, Smita R Rao, P. Sayre, Zejiang Shen, Pao Siangliulue, Luca Soldaini, Huy Tran, Madeleine van Zuylen, Lucy Lu Wang, Christopher Wilhelm, Caroline M Wu, Jiangjiang Yang, Angele Zamarron, Marti A. Hearst, Daniel S. Weld. ArXiv. 2023.

Interactive and Intelligent Reading Interfaces

Qlarify: Bridging Scholarly Abstracts and Papers with Recursively Expandable Summaries
Raymond Fok, Joseph Chee Chang, Tal August, Amy X. Zhang, Daniel S. Weld. ArXiv. 2023.
Papeos: Augmenting Research Papers with Talk Videos
Tae Soo Kim, Matt Latzke, Jonathan Bragg, Amy X. Zhang, Joseph Chee Chang. The ACM Symposium on User Interface Software and Technology. 2023.
Synergi: A Mixed-Initiative System for Scholarly Synthesis and Sensemaking
Hyeonsu B Kang, Sherry Wu, Joseph Chee Chang, A. Kittur. The ACM Symposium on User Interface Software and Technology. 2023.
🏆 Best Paper Award
CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context
Joseph Chee Chang, Amy X. Zhang, Jonathan Bragg, Andrew Head, Kyle Lo, Doug Downey, Daniel S. Weld. Proceedings of the CHI Conference on Human Factors in Computing Systems. 2023.
Relatedly: Scaffolding Literature Reviews with Existing Related Work Sections
Srishti Palani, Aakanksha Naik, Doug Downey, Amy X. Zhang, Jonathan Bragg, Joseph Chee Chang. Proceedings of the CHI Conference on Human Factors in Computing Systems. 2023.
CiteRead: Integrating Localized Citation Contexts into Scientific Paper Reading
Napol Rachatasumrit, Jonathan Bragg, Amy X. Zhang, Daniel S. Weld. 27th International Conference on Intelligent User Interfaces. 2022.
🏆 Best Paper Award
Math Augmentation: How Authors Enhance the Readability of Formulas using Novel Visual Design Practices
Andrew Head, Amber Xie, Marti A. Hearst. Proceedings of the CHI Conference on Human Factors in Computing Systems. 2022.
Scim: Intelligent Skimming Support for Scientific Papers
Raymond Fok, Hita Kambhamettu, Luca Soldaini, Jonathan Bragg, Kyle Lo, Andrew Head, Marti A. Hearst, Daniel S. Weld. Proceedings of the 28th International Conference on Intelligent User Interfaces. 2022.
Exploring Team-Sourced Hyperlinks to Address Navigation Challenges for Low-Vision Readers of Scientific Papers
Soya Park, Jonathan Bragg, Michael Chang, K. Larson, Danielle Bragg. Proceedings of the ACM on Human-Computer Interaction. 2022.
Paper Plain: Making Medical Research Papers Approachable to Healthcare Consumers with Natural Language Processing
Tal August, Lucy Lu Wang, Jonathan Bragg, Marti A. Hearst, Andrew Head, Kyle Lo. ACM Transactions on Computer-Human Interaction. 2022. Presentation at CHI 2024.
Threddy: An Interactive System for Personalized Thread-based Exploration and Organization of Scientific Literature
Hyeonsu B Kang, Joseph Chee Chang, Yongsung Kim, A. Kittur. Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 2022.
🏆 Best Paper Award
SciA11y: Converting Scientific Papers to Accessible HTML
Lucy Lu Wang, Isabel Cachola, Jonathan Bragg, Evie (Yu-Yen) Cheng, Chelsea Hess Haupt, Matt Latzke, Bailey Kuehl, Madeleine van Zuylen, Linda M. Wagner, Daniel S. Weld. Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility. 2021.
Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols
Andrew Head, Kyle Lo, Dongyeop Kang, Raymond Fok, Sam Skjonsberg, Daniel S. Weld, Marti A. Hearst. Proceedings of the CHI Conference on Human Factors in Computing Systems. 2020.

Open Research Resources: Libraries, Models, Datasets

🏆 Best Paper Award
PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents
Kyle Lo, Zejiang Shen, Benjamin Newman, Joseph Chee Chang, Russell Authur, Erin Bransom, Stefan Candra, Yoganand Chandrasekhar, Regan Huff, Bailey Kuehl, Amanpreet Singh, Chris Wilhelm, Angele Zamarron, Marti A. Hearst, Daniel S. Weld, Doug Downey, Luca Soldaini. Conference on Empirical Methods in Natural Language Processing: Demos. 2023.
A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents
Benjamin Newman, Luca Soldaini, Raymond Fok, Arman Cohan, Kyle Lo. 2023.
🏆 Best Paper Award
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle Lo. ArXiv. 2023.
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents
Catherine Chen, Zejiang Shen, D. Klein, G. Stanovsky, Doug Downey, Kyle Lo. ArXiv. 2023.
The Semantic Scholar Open Data Platform
Rodney Michael Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, D. Graham, F.Q. Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin, Haokun Liu, Kyle Lo, Jaron Lochner, Kelsey MacMillan, Tyler Murray, Christopher Newell, Smita R Rao, Shaurya Rohatgi, P. Sayre, Zejiang Shen, Amanpreet Singh, Luca Soldaini, Shivashankar Subramanian, A. Tanaka, Alex D Wade, Linda M. Wagner, Lucy Lu Wang, Christopher Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Madeleine van Zuylen, Daniel S. Weld. ArXiv. 2023.
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups
Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey. Transactions of the Association for Computational Linguistics. 2021.
Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions
Dongyeop Kang, Andrew Head, Risham Sidhu, Kyle Lo, Daniel S. Weld, Marti A. Hearst. Proceedings of the First Workshop on Scholarly Document Processing @ ACL. 2020.

Core Team

See the Project Overview Paper for a full list of contributors.
For questions and inquiries, please contact Joseph Chee Chang (PaperCraft & intelligent reading interfaces) or Kyle Lo and Luca Soldaini (PaperMage & scientific document processing).