
PRIMARY SOURCES ONLINE

Minutes of the Trustees of the University of Pennsylvania
Creating searchable PDFs

Creating Online PDF Access to the Minutes of the Trustees of the University of Pennsylvania:

To create online PDF (Portable Document Format) versions of the Trustees Minutes, the University Archives scanned each page of the original documents to create digital images, applied optical character recognition (OCR) to the typed text, and edited and proofread the text that lies behind each page image. As a result, the Trustees Minutes and their attachments, from the meeting of 9 June 1969 through recent meetings, are full-text searchable.

The project followed an eight-step procedure:

  1. University Archives staff scan the original documents to create Tagged Image File Format (TIFF) images.
  2. Staff then use optical character recognition (OCR) software to generate a digital text file from each image and to combine the text and the image in a single Adobe Capture Document (ACD) file.
  3. Staff edit the ACD files to correct character recognition errors.
  4. Staff proofread the edited ACD files and save the completed files as Portable Document Format (PDF) files.
  5. Staff create document ID information for the PDF files to support indexing, retrieval, and display.
  6. Staff create Hypertext Markup Language (HTML) links between the PDF files.
  7. Staff create HTML links between the PDF files and the University Archives web site.
  8. Finally, staff write the TIFF, ACD, PDF, and word processing files to compact disc (CD) for storage.
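Linking the finished PDFs from a web page comes down to ordinary HTML hyperlinks. The Python below is a minimal sketch of that idea, not the Archives' actual tooling, which this page does not describe. It builds a simple index page that lists each meeting's PDF in chronological order. The function name `build_index_html`, the `minutes/` file paths, and the second meeting date in the usage example are hypothetical placeholders.

```python
from datetime import date
from html import escape


def build_index_html(minutes: dict[date, str], title: str) -> str:
    """Return an HTML index page linking to each PDF, oldest meeting first.

    `minutes` maps a meeting date to the relative path of its PDF file.
    The paths used with this sketch are hypothetical, not the Archives' real layout.
    """
    items = []
    for meeting, pdf_path in sorted(minutes.items()):
        # Label in the page's own style, e.g. "Meeting of 9 June 1969".
        label = f"Meeting of {meeting.day} {meeting.strftime('%B %Y')}"
        # Escape both the link target and the visible text for safe HTML.
        items.append(f'  <li><a href="{escape(pdf_path)}">{escape(label)}</a></li>')
    return "\n".join([
        "<html>",
        f"<head><title>{escape(title)}</title></head>",
        "<body>",
        f"<h1>{escape(title)}</h1>",
        "<ul>",
        *items,
        "</ul>",
        "</body>",
        "</html>",
    ])


if __name__ == "__main__":
    pages = {
        date(1970, 1, 1): "minutes/1970-01-01.pdf",  # placeholder date, not a real meeting
        date(1969, 6, 9): "minutes/1969-06-09.pdf",  # hypothetical file name
    }
    print(build_index_html(pages, "Minutes of the Trustees"))
```

Sorting by date means files can be added to the mapping in any order. The same pattern could produce links between related PDFs, such as minutes and their attachments.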