The Semantic Reader Open Research Platform

The Semantic Reader Project is a collaborative effort of NLP + HCI researchers from non-profit, industry, and academic institutions to create interactive, intelligent reading interfaces for scholarly papers. Our research led to the creation of Semantic Reader, an application used by tens of thousands of scholars each week.

The Semantic Reader Open Research Platform provides resources that enable the broader research community to explore exciting challenges around novel research support tools: PaperMage, a library for processing and analyzing scholarly PDFs, and PaperCraft, a React UI component for building augmented and interactive reading interfaces. Join us in designing the future of scholarly reading interfaces with our open source libraries!

AI2 Logo
UW Logo
UCB Logo
UPenn Logo
MIT Logo
UIUC Logo
Minnesota Logo

Open Source Libraries

We provide PaperMage + PaperCraft for building intelligent and interactive paper readers. Below we showcase how to extract text from a PDF, prompt an LLM for term definitions, and then visually augment the paper with highlights and popups.

PaperMage

Process and Analyze Scholarly PDF Documents

from papermage.recipes import CoreRecipe

recipe = CoreRecipe()
doc = recipe.run("paper.pdf")
paragraphs_text = [p.text for p in doc.paragraphs]

term_defs = []

for sentence in doc.abstracts[0].sentences:
    print(sentence.text)
    # When reading a scholarly article, inline...
    # However, it can be challenging to pri...
    # ...

    print(sentence.words[:2])
    # ['When', 'reading']
    # ['However', 'it']
    # ...

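    # bounding boxes of the 4th word + its definition
    # (prompt() is a placeholder for your own LLM call)
    term = sentence.words[3]
    term_def = prompt(
        ' '.join(paragraphs_text) + ' ' +
        f'What is the definition of "{term.text}"?'
    )
    term_defs.append((term.boxes, term_def))

# send_to_paper_craft_ui() is a placeholder for handing results to the UI
send_to_paper_craft_ui(term_defs)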
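
The hand-off between the two libraries is left open above. As a minimal sketch, assuming the UI receives JSON, the (boxes, definition) pairs could be flattened into one [box, definition] entry per box, which is the shape Popover.tsx destructures. The `Box` dataclass and `term_defs_to_json` are hypothetical stand-ins for illustration and are not part of PaperMage; the field names simply mirror the ones the popover reads (page, top, left, height, width).

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class Box:
    # Hypothetical stand-in for a box object; not the PaperMage class.
    # Field names mirror what Popover.tsx reads from each box.
    page: int
    top: float
    left: float
    height: float
    width: float


def term_defs_to_json(term_defs: List[Tuple[List[Box], str]]) -> str:
    """Flatten (boxes, definition) pairs into one [box, definition] entry per box."""
    payload = []
    for boxes, definition in term_defs:
        for box in boxes:
            payload.append([asdict(box), definition])
    return json.dumps(payload)


# Example with made-up coordinates: one term covered by a single box.
example = [([Box(page=0, top=0.1, left=0.2, height=0.02, width=0.05)],
            "a short definition")]
encoded = term_defs_to_json(example)
```

Emitting one entry per box means a term that wraps across two lines yields two highlights sharing the same definition, so the React side never has to handle a list of boxes.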

PaperCraft

Create Visually Augmented Interactive Readers

Reader.tsx
import React, { useContext } from 'react'
import {
  DocumentContext, DocumentWrapper, Overlay, PageWrapper
} from '@allenai/pdf-components' // aka PaperCraft
import { BlueTextPopover, TermDefinition } from './Popover'

type ReaderProps = { pdfUrl: string; termDefinitions: TermDefinition[] }

const Reader: React.FC<ReaderProps> = ({ pdfUrl, termDefinitions }) => {
  const { numPages } = useContext(DocumentContext)
  const pageIndices = [...Array(numPages).keys()]
  /* PageWrapper: render each page */
  /* Overlay: visual augmentations and interactions */
  return (
    <DocumentWrapper file={pdfUrl}>
      {pageIndices.map(pageIndex => (
        <PageWrapper key={pageIndex} pageIndex={pageIndex}>
          <Overlay>
            {/* abstract is on page 1 */}
            {pageIndex === 0 &&
              termDefinitions.map((termDefinition, i) => (
                <BlueTextPopover
                  key={i}
                  termDefinition={termDefinition}
                />
              ))}
          </Overlay>
        </PageWrapper>
      ))}
    </DocumentWrapper>
  )
}

Popover.tsx
import React from 'react'
import { Popover } from 'antd'
import { BoundingBox } from '@allenai/pdf-components'

// [bounding box of the term, definition text]
export type TermDefinition = [
  { page: number; top: number; left: number; height: number; width: number },
  string
]

export const BlueTextPopover: React.FC<{ termDefinition: TermDefinition }> = (props) => {
  const { termDefinition } = props
  const [box, definition] = termDefinition
  // show definition on click with an antd widget
  // highlight the BoundingBox of the term
  return (
    <Popover
      content={definition}
      trigger="click"
    >
      <BoundingBox
        className="screen-blend-blue"
        isHighlighted={true}
        page={box.page}
        top={box.top}
        left={box.left}
        height={box.height}
        width={box.width}
      />
    </Popover>
  )
}
/* .screen-blend-blue {
      background: blue;
      mix-blend-mode: screen;} */

Publications

Semantic Reader Project Overview

  • The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
    Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie (Yu-Yen) Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, F.Q. Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Michael Kinney, A. Kittur, Hyeonsu B Kang, Egor Klevak, Bailey Kuehl, Michael Langan, Matt Latzke, Jaron Lochner, Kelsey MacMillan, Eric Stuart Marsh, Tyler Murray, Aakanksha Naik, Ngoc-Uyen Nguyen, Srishti Palani, Soya Park, Caroline Paulic, Napol Rachatasumrit, Smita R Rao, P. Sayre, Zejiang Shen, Pao Siangliulue, Luca Soldaini, Huy Tran, Madeleine van Zuylen, Lucy Lu Wang, Christopher Wilhelm, Caroline M Wu, Jiangjiang Yang, Angele Zamarron, Marti A. Hearst, Daniel S. Weld. ArXiv. 2023.

Interactive and Intelligent Reading Interfaces

Open Research Resources: Libraries, Models, Datasets

Core Team

See the Project Overview Paper for a full list of contributors.
†For questions and inquiries, please contact Joseph Chee Chang (PaperCraft & Intelligent reading interfaces), or Kyle Lo and Luca Soldaini (PaperMage & Scientific document processing).

Research Advisory Board

Intelligent Reading Interfaces Research

Scientific Document Processing Research

Research Libraries and Tooling