The Semantic Reader Open Research Platform

The Semantic Reader Project is a collaboration among NLP and HCI researchers from non-profit, industry, and academic institutions to create interactive, intelligent reading interfaces for scholarly papers. Our research led to the creation of Semantic Reader, an application used by tens of thousands of scholars each week.

The Semantic Reader Open Research Platform provides resources that enable the broader research community to explore exciting challenges around novel research support tools: PaperMage, a library for processing and analyzing scholarly PDFs, and PaperCraft, a React UI component for building augmented and interactive reading interfaces. Join us in designing the future of scholarly reading interfaces with our open source libraries!

AI2 Logo
UW Logo
UCB Logo
UPenn Logo
MIT Logo
UIUC Logo
Minnesota Logo

Open Source Libraries

We provide PaperMage and PaperCraft for building intelligent and interactive paper readers. Below we showcase how to extract text from a PDF, prompt an LLM for term definitions, and then visually augment the paper with highlights and popups.

PaperMage

Process and Analyze Scholarly PDF Documents

from papermage.recipes import CoreRecipe

recipe = CoreRecipe()
doc = recipe.run("paper.pdf")
paragraphs_text = [p.text for p in doc.paragraphs]

term_defs = []

for sentence in doc.abstracts[0].sentences:
    print(sentence.text)
    # When reading a scholarly article, inline...
    # However, it can be challenging to pri...
    # ...

    print(sentence.words[:2])
    # ['When', 'reading']
    # ['However', 'it']
    # ...

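    # bounding boxes of the 4th word + its definition
    term = sentence.words[3]
    # prompt() is a placeholder for your own LLM call
    term_def = prompt(
        ' '.join(paragraphs_text) +
        f' What is the definition of "{term.text}"?'
    )
    term_defs.append((term.boxes, term_def))

# send_to_paper_craft_ui() is a placeholder for handing results to the UI
send_to_paper_craft_ui(term_defs)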
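
The `send_to_paper_craft_ui` call above is left undefined on this page. As a minimal sketch of what such a hand-off might involve, the snippet below serializes (box, definition) pairs to JSON that a front end could consume. Everything in it is an assumption made for illustration: the `Box` dataclass, its field names (chosen to mirror the `page`, `top`, `left`, `height`, and `width` props passed to `BoundingBox` in the PaperCraft example), the one-box-per-term simplification, and the `term_defs_to_json` function. None of these are part of PaperMage or PaperCraft.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Hypothetical box record; field names mirror the BoundingBox props."""
    page: int
    top: float
    left: float
    height: float
    width: float


def term_defs_to_json(term_defs: List[Tuple[Box, str]]) -> str:
    """Serialize (box, definition) pairs into a JSON array of [box, definition]."""
    payload = [[asdict(box), definition] for box, definition in term_defs]
    return json.dumps(payload)


# Made-up sample data, only to show the shape of the output.
example = [
    (Box(page=0, top=0.25, left=0.5, height=0.02, width=0.1), "A sample definition."),
]
print(term_defs_to_json(example))
```

Each JSON entry keeps the `[box, definition]` pair shape that the popover component destructures, so the front end would not need to reshape the data.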

PaperCraft

Create Visually Augmented Interactive Readers

Reader.tsx
import React, { useContext } from 'react'
import {
  DocumentContext, DocumentWrapper, Overlay, PageWrapper
} from '@allenai/pdf-components' // aka PaperCraft
import { BlueTextPopover } from './Popover' // defined in Popover.tsx

type Box = { page: number, top: number, left: number, height: number, width: number }
type ReaderProps = { pdfUrl: string, termDefinitions: [Box, string][] }

const Reader: React.FC<ReaderProps> = ({pdfUrl, termDefinitions}) => {
  const {numPages} = useContext(DocumentContext)
  const pageIndices = [...Array(numPages).keys()]
  /* PageWrapper: render each page */
  /* Overlay: visual augmentations and interactions */
  return (
    <DocumentWrapper file={pdfUrl}>
      {pageIndices.map(pageIndex => (
        <PageWrapper key={pageIndex} pageIndex={pageIndex}>
          <Overlay>
            {/* abstract is on page 1 (pageIndex 0) */}
            {pageIndex === 0 &&
              termDefinitions.map((termDefinition, i) => (
                <BlueTextPopover
                  key={i}
                  termDefinition={termDefinition}
                />
              ))}
          </Overlay>
        </PageWrapper>
      ))}
    </DocumentWrapper>
  )
}

Popover.tsx
import React from 'react'
import { Popover } from 'antd'
import { BoundingBox } from '@allenai/pdf-components'

type Box = { page: number, top: number, left: number, height: number, width: number }
type Props = { termDefinition: [Box, string] }

export const BlueTextPopover: React.FC<Props> = ({ termDefinition }) => {
  const [box, definition] = termDefinition
  // show definition on click with an antd widget
  // highlight the BoundingBox of the term
  return (
    <Popover
      content={definition}
      trigger="click"
    >
      <BoundingBox
        className="screen-blend-blue"
        isHighlighted={true}
        page={box.page}
        top={box.top}
        left={box.left}
        height={box.height}
        width={box.width}
      />
    </Popover>
  )
}
/* Accompanying CSS:
   .screen-blend-blue {
      background: blue;
      mix-blend-mode: screen;} */
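
The `screen-blend-blue` style explains the name `BlueTextPopover`. With `mix-blend-mode: screen`, each color channel of the result is `b + s - b * s`, where `b` is the backdrop and `s` is the overlay. The short sketch below works through that arithmetic in Python, assuming the bounding box paints pure blue over a page of black text on a white background. The `screen` helper and the color constants are illustrative names only.

```python
from typing import Tuple

RGB = Tuple[float, float, float]  # channels in the range [0, 1]


def screen(backdrop: RGB, source: RGB) -> RGB:
    """Screen blend per channel: b + s - b * s."""
    return tuple(b + s - b * s for b, s in zip(backdrop, source))


BLUE: RGB = (0.0, 0.0, 1.0)
WHITE: RGB = (1.0, 1.0, 1.0)
BLACK: RGB = (0.0, 0.0, 0.0)

print(screen(WHITE, BLUE))  # white page stays white: (1.0, 1.0, 1.0)
print(screen(BLACK, BLUE))  # black text turns blue: (0.0, 0.0, 1.0)
```

Under these assumptions the overlay leaves the white page untouched and recolors only the dark glyphs, so the term reads as blue text rather than as a tinted rectangle.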

Publications

Semantic Reader Project Overview

  • The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
    Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie (Yu-Yen) Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, F.Q. Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Michael Kinney, A. Kittur, Hyeonsu B Kang, Egor Klevak, Bailey Kuehl, Michael Langan, Matt Latzke, Jaron Lochner, Kelsey MacMillan, Eric Stuart Marsh, Tyler Murray, Aakanksha Naik, Ngoc-Uyen Nguyen, Srishti Palani, Soya Park, Caroline Paulic, Napol Rachatasumrit, Smita R Rao, P. Sayre, Zejiang Shen, Pao Siangliulue, Luca Soldaini, Huy Tran, Madeleine van Zuylen, Lucy Lu Wang, Christopher Wilhelm, Caroline M Wu, Jiangjiang Yang, Angele Zamarron, Marti A. Hearst, Daniel S. Weld. ArXiv. 2023.

Interactive and Intelligent Reading Interfaces

Open Research Resources: Libraries, Models, Datasets

Core Team

See the Project Overview Paper for a full list of contributors.
† For questions and inquiries, please contact Joseph Chee Chang (PaperCraft and intelligent reading interfaces) or Kyle Lo and Luca Soldaini (PaperMage and scientific document processing).

Research Advisory Board

  • Daniel S. Weld
    Chief Scientist and General Manager
    Allen Institute for AI, Semantic Scholar
  • Marti A. Hearst
    Professor and Interim Dean
    University of California, Berkeley

Intelligent Reading Interfaces Research

  • Jonathan Bragg
    Research Scientist
    Allen Institute for AI, Semantic Scholar
  • Joseph Chang†
    Research Scientist
    Allen Institute for AI, Semantic Scholar
  • Andrew Head
    Assistant Professor
    University of Pennsylvania
  • Tal August
    Assistant Professor
    UIUC
  • Aniket Kittur
    Professor
    Carnegie Mellon University
  • Cassidy Trier
    Product Designer
    Allen Institute for AI, Semantic Scholar
  • Lucy Lu Wang
    Assistant Professor
    University of Washington
  • Matt Latzke
    Product Designer
    Allen Institute for AI, Semantic Scholar
  • Kyle Lo
    Research Scientist
    Allen Institute for AI, Semantic Scholar
  • Amy X. Zhang
    Assistant Professor
    University of Washington

Scientific Document Processing Research

  • Luca Soldaini†
    Research Scientist
    Allen Institute for AI, Semantic Scholar
  • Dongyeop Kang
    Assistant Professor
    University of Minnesota
  • Shannon Shen
    Doctoral Student
    MIT
  • Kyle Lo†
    Research Scientist
    Allen Institute for AI, Semantic Scholar
  • Doug Downey
    Senior Director
    Allen Institute for AI, Semantic Scholar

Research Libraries and Tooling

  • Paul Sayre
    Software Engineer
    Allen Institute for AI, Semantic Scholar
  • YenSung Chen
    Software Engineer
    Allen Institute for AI, Semantic Scholar
  • Smita Rao
    Engineering Manager
    Allen Institute for AI, Semantic Scholar
  • Eric Marsh
    Software Engineer
    Allen Institute for AI, Semantic Scholar
  • Sam Skjonsberg
    Engineering Manager
    Allen Institute for AI, Semantic Scholar
  • Ngoc-Uyen Nguyen
    Software Engineer
    Allen Institute for AI, Semantic Scholar
  • Tyler Murray
    Software Engineer
    Allen Institute for AI, Semantic Scholar
  • Huy Tran
    Software Engineer
    Allen Institute for AI, Semantic Scholar
  • Chloe Anastasiades
    Software Engineer
    Allen Institute for AI, Semantic Scholar
  • Caroline Paulic
    Software Engineer
    Allen Institute for AI, Semantic Scholar