Optical character recognition
Optical character recognition (OCR) is a method of automatic data entry. OCR software converts handwritten, typewritten or printed text into data that can be edited on a computer. In simple systems, the paper documents are first scanned with an image scanner. The OCR software then examines the image and compares the shapes of the letters with stored images of letters. The result is a text file that can be edited with a normal text editor.
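The comparison of letter shapes with stored images can be shown with a small Python sketch. It is only an illustration, not how any particular OCR program works: the 3x3 glyph grids, the TEMPLATES table and the function names are all made up for this example, and the sketch assumes the page has already been cut into single-letter images.

```python
# Hypothetical stored letter images: 3x3 grids, "1" = ink, "0" = blank paper.
# Real systems use far larger images and many more characters.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def mismatch(glyph, template):
    """Count the pixels where the scanned glyph differs from a stored template."""
    return sum(
        pixel != stored
        for glyph_row, template_row in zip(glyph, template)
        for pixel, stored in zip(glyph_row, template_row)
    )


def recognize_glyph(glyph):
    """Return the letter whose stored image differs least from the glyph."""
    return min(TEMPLATES, key=lambda letter: mismatch(glyph, TEMPLATES[letter]))


def recognize_line(glyphs):
    """Turn a sequence of glyph images into a text string."""
    return "".join(recognize_glyph(glyph) for glyph in glyphs)


if __name__ == "__main__":
    scanned = [
        ("111", "010", "010"),  # a clean "T"
        ("010", "010", "010"),  # a clean "I"
        ("100", "100", "110"),  # an "L" with one missing pixel
    ]
    print(recognize_line(scanned))
```

Because the closest template wins, the damaged third glyph is still read as "L". A dirtier scan would differ from every template by more pixels, which makes wrong matches more likely.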
More complex systems also analyze the images, the page layout and other features of the document. This lets them make editable electronic versions which look identical to the original documents.
OCR works best with clean, clearly printed materials.
OCR software
- Adobe Acrobat Professional (Windows, Mac OS)
- BIT-Alpha (Windows)
- ExactScan Pro (Mac OS)
- FineReader (Unix, Windows)
- OCRKit (Mac OS)
- Readiris (Unix, Windows, Mac OS)
- Nuance OmniPage (Windows)
- OCRvision (Windows)
Optical Character Recognition Media
Video of the process of scanning and real-time optical character recognition (OCR) with a portable scanner
Occurrence of "laft" and "last" in Google's n-grams database, in English documents from 1700 to 1900, based on OCR scans for the "English 2009" corpus
Occurrence of "laft" and "last" in Google's n-grams database, based on OCR scans for the "English 2012" corpus
In the "English 2012" corpus and later ones, searches for words with a long s are normalized to an ordinary s.
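Normalizing the long s in a search word can be sketched in a few lines of Python. This is only a guess at the general idea, not Google's actual code, and the function name normalize_query is made up for this example. It only handles the long s character itself (Unicode U+017F); it does not repair text where OCR has already misread a long s as the letter f.

```python
LONG_S = "\u017f"  # the long s character


def normalize_query(word):
    """Replace every long s in a search word with an ordinary s."""
    return word.replace(LONG_S, "s")


if __name__ == "__main__":
    # A search word spelled with a long s becomes the modern spelling.
    print(normalize_query("la\u017ft"))
```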