SCANNING PROCESS

Issues of importance

Capture process:
There are several processing options to consider when planning for digitization. Objects can be prepared for capture by removing staples, unfolding materials, and even enhancing faded originals using conservation techniques. Care must be taken to preserve the originals during preparation steps such as removing bindings from books, inserting fragile papers in clear retaining jackets, and unfolding pages. Once the material is ready for capture, the method of capture depends on its form. Scanning devices exist to capture slides, movies, maps, and documents. Digitizing tools also include file conversion programs for material that is already in digital form (e.g., word processing files, GILS data layer files) but must be converted to the target formats.

Onsite scanning versus transport to central processing:
Setting up scanning and recording equipment at the location of source material is justified when:

  1. the amount of digitizing is large
  2. the material cannot be safely transported to central scanning facilities
  3. the amount of digitization is small and the quality of images from low-cost desktop scanning systems is acceptable or
  4. removal of the property from local control is an issue

If production-quality professional scanning equipment is to be used, on-site capture of objects poses some challenges. Several companies offer mobile services at an increased charge. Equipment moved in this way is at risk of damage from vibration and must be calibrated upon setup. However, for large collections of over 5,000 documents, local processing may be feasible.

A newer option is the use of lower-cost desktop processing equipment. With scanning and indexing software, medium-quality scanning devices, and portable storage media such as ZIP disks or recordable CDs, a local library or small organization can collect indexed images.

Image Quality Assurance
It is important during scanning to verify the quality of the captured image. This can be done on each object or on a sample. The operator should retest the quality whenever the nature of the objects changes during a scanning session, because the contrast, hue, and focus may have changed. Objects that fail this test will need to be rescanned. For microfilming, quality assurance must be delayed until the film is developed.
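
As a rough sketch of sample-based checking, the following Python picks a random subset of a scanned batch for the operator to inspect. The function name, the 10 percent rate, and the file names are illustrative assumptions, not part of any scanning package.

```python
import random


def pick_qa_sample(scanned_files, percent=10, seed=None):
    """Return a random subset of scanned files for manual quality review.

    `percent` is a hypothetical sampling rate; at least one file is
    always chosen from a non-empty batch.
    """
    if not scanned_files:
        return []
    # Integer ceiling division so small batches still yield a sample.
    count = max(1, (len(scanned_files) * percent + 99) // 100)
    count = min(count, len(scanned_files))
    rng = random.Random(seed)
    return sorted(rng.sample(scanned_files, count))


# Hypothetical batch of 50 page images.
batch = [f"page_{n:04d}.tif" for n in range(1, 51)]
to_review = pick_qa_sample(batch, percent=10, seed=42)
print(len(to_review), "files selected for review")
```

A new sample would be drawn each time the type of object being scanned changes, since that is when contrast, hue, and focus are most likely to drift.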

Image enhancements during processing
Frequently, source documents, photographs, stereographs, maps, and artwork are in poor condition. Besides being fragile, they may have lost contrast, color definition, and clarity. In addition to physical remedies such as conservation methods, electronic remedies are available. Tonal improvements are possible after initial scanning into digital format. For example, contrast can be increased to improve OCR accuracy.
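
One simple form of tonal improvement is a linear contrast stretch. The sketch below assumes 8-bit grayscale values held in a plain Python list; the function name and sample values are hypothetical, and real imaging software offers more sophisticated adjustments.

```python
def stretch_contrast(pixels, low=0, high=255):
    """Linearly rescale grayscale values so the darkest pixel maps to
    `low` and the brightest maps to `high`."""
    darkest = min(pixels)
    brightest = max(pixels)
    if brightest == darkest:
        # A flat image has no contrast to stretch; return it unchanged.
        return list(pixels)
    scale = (high - low) / (brightest - darkest)
    return [round(low + (p - darkest) * scale) for p in pixels]


# A faded scan uses only a narrow band of the available gray levels.
faded = [100, 110, 120, 130, 140, 150]
print(stretch_contrast(faded))  # [0, 51, 102, 153, 204, 255]
```

Spreading the values over the full range makes dark ink and light paper easier to tell apart, which is the property OCR software depends on.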

Imaging standards
The process of capturing objects into digital form has evolved from several methodologies, and several competing standards exist. Depending upon the purpose of the capture and the ultimate display objectives, a digitization project should aim to comply with the most widely supported and enduring standards.

Resources: Technical Recommendations for Digital Imaging Projects

Library of Congress Imaging Specifications

Options to consider
Digital Files Exist in Several Forms

Pointer records with Thumbnail images:
If the objective of digitization is to create searchable electronic copies for patron review over the Internet, then a navigational process is needed to get to the copy or to the original resource. Often this means small 'thumbnail' images that are hyperlinked to a source file. Limited quality suffices in this application: smaller image files load more quickly in a browser and provide sufficient information to guide the searcher.
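
A thumbnail keeps the proportions of the source image while shrinking its longer side. The following sketch computes the target dimensions only; the 150-pixel limit and the function name are assumptions chosen for illustration.

```python
def thumbnail_size(width, height, max_side=150):
    """Scale (width, height) down so the longer side is at most
    `max_side`, preserving the aspect ratio. Images that are already
    small enough are left unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Integer arithmetic avoids floating-point rounding surprises.
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height


print(thumbnail_size(3000, 2000))  # (150, 100)
print(thumbnail_size(120, 90))     # (120, 90)
```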

Text images
Scanning documents to display their contents in a non-editable form is similar to presenting images. It provides a window for looking at objects without the ability to search the text within. Beyond a readable level of resolution, no further quality is needed.

Preservation Copies
If the goal is to retain a copy of an object for long-term archival access, the methodology of digitization is substantially different from straight dissemination. It may involve microfilming before or during scanning. One approach, 'film first', actually scans from a film copy. It stands to reason that the film copy would be at the highest possible resolution. The archival copy is placed in a protective environment to meet preservation standards. One advantage of this approach is that it allows re-digitization from the master copy as technology evolves.

Film first (intermediary) versus scan first
If the objective of digitization includes archival preservation, then a durable medium such as microforms should be considered. High-resolution images of the objects can be rendered on the microform (usually microfilm). Since the patron will view the digitized object on a screen with limited resolution, capture of the image onto microforms creates the sharpest reproduction. Conversion to electronic image formats can be accomplished through scanning of the microform image. This process produces good results when trained microform camera operators and good equipment are employed. Future improvements to scanning quality can be utilized by rescanning from the microform. It is expected that ultimately electronic scanning will reach or exceed photographic quality. Perhaps the durability of electronic image files will also exceed that of today's microforms.

Other issues, beyond preservation, may justify the use of microforms as an intermediate process. Microform cameras are portable, require minimal calibration, and can be operated by trained local operators. This gives volunteers who are willing to receive camera operation training the opportunity to process objects on site. The film is then sent to a central service for processing and scanning. The electronic files are often loaded onto CDs or high-capacity disks (e.g., Iomega's ZIP disks) and returned to the collection site for indexing.

If microforms are not desired, then scanning the objects directly into electronic files is possible. Typically, the objects are scanned at the highest resolution to create the master image. Lower resolution image files are then made from the master for medium quality and thumbnail-size images.
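
To see why derivatives are made from the master, it helps to estimate uncompressed file sizes at different resolutions. The sketch below uses the standard pixel arithmetic (inches times dots per inch, times bits per pixel); the page size and the three resolutions are hypothetical examples, not recommended settings.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Approximate size in bytes of an uncompressed scan.

    Ignores file headers and any row padding a real format may add.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical letter-size page (8.5 x 11 inches) in 24-bit color.
for label, dpi in [("master", 600), ("medium", 300), ("low", 100)]:
    size = uncompressed_bytes(8.5, 11, dpi)
    print(f"{label:>6} at {dpi} dpi: {size:,} bytes")
```

Doubling the resolution quadruples the pixel count, so a master image is far larger than the copies intended for screen display.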

35 mm Slides
A variation on microform capture is the use of photographic copies on 35 mm slides. This is the preferred medium for capturing images of physical artifacts. The slides are then scanned. Note that 35 mm slides may not satisfy archival needs.

Text searching (OCR)
Optical Character Recognition (OCR) applications attempt to convert images to recognizable text. When the source object has crisp, clean characters, the process works well. Handwritten and aged manuscripts challenge the best OCR tools. Correcting misread words can make the OCR process labor intensive. Once documents have been interpreted into text, the full power of text searching can be applied to index and retrieve these resources. Human editing of OCR records is mandatory to ensure quality control and accuracy.

Optical Character Recognition processing can be applied to textual materials scanned into digital image files. The software converts patterns of light and dark on the image to characters that can be viewed in a word processor. However, even with excellent quality textual documents, some errors in interpretation occur. As the contrast, clarity, and typeface size of the source document decrease, read errors increase and additional human interpretation is required. Typically, this is done on the computer screen by displaying the recorded image and asking the operator to assign the proper character. Even with a spell check, errors in interpretation will get through validation steps. If the purpose of the OCR is to create a full-text reproduction of the words on the source document, then substantial human assistance is required. However, if the objective is only to create a concordance file of words from the document for search and retrieval, then OCR may be cost-effective.
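
A concordance of this kind can be sketched as a mapping from each word to the pages on which it occurs. The example below assumes the OCR text is already available as strings keyed by page number; the function name and the sample sentences are made up for illustration.

```python
import re
from collections import defaultdict


def build_concordance(pages):
    """Map each lowercase word to the sorted list of page numbers it
    appears on. `pages` is a dict of {page_number: ocr_text}."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        # Keep runs of letters only; digits and punctuation are dropped.
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(nums) for word, nums in index.items()}


# Hypothetical OCR output for two pages.
ocr_pages = {
    1: "The county courthouse was built in 1887.",
    2: "Courthouse records list the county clerk.",
}
concordance = build_concordance(ocr_pages)
print(concordance["courthouse"])  # [1, 2]
print(concordance["clerk"])       # [2]
```

Because a search only needs most occurrences of a word to be read correctly, an index like this tolerates OCR errors that would be unacceptable in a full-text reproduction.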

Project checklist:

  • The technology is the most straightforward piece of the project.
  • Leasing or outsourcing for scanning may be the best practice for small organizations.
  • Benchmark the processes and test results before proceeding to capture the information.
  • Scan at the highest quality that the project can afford.
  • Do not process for any specific output.
  • Scan from the original whenever possible.
  • Plan how you will maintain the image over time, including migration to another file format.
  • Consider workflow issues to achieve efficient capture.
  • Plan how to handle the originals to cause the least damage.
  • Plan time to prepare the documents for scanning; preparation takes longer than the actual scanning.
  • Plan how to train scanning technicians and prepare workbooks of protocols and standards.
  • Record administrative data about the scanner for future reference.
  • Capture activities are typically one third of the project cost.
  • Estimate the cost of keeping the digital collections for fifty years.
  • Decisions made now will have impact on access to the digital collections in the future.
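
The two cost items in the checklist can be turned into a back-of-the-envelope estimate. The sketch below is only an illustration: the function names, the dollar figures, and the ten-year migration interval are all assumptions, and the retention estimate ignores inflation.

```python
def project_cost_from_capture(capture_cost):
    """Apply the checklist's rule of thumb that capture is about one
    third of the total project cost."""
    return capture_cost * 3


def retention_cost(annual_storage, years=50, migration_cost=0,
                   migration_interval=10):
    """Rough cost of keeping a digital collection: yearly storage plus
    one format migration every `migration_interval` years."""
    migrations = years // migration_interval
    return annual_storage * years + migration_cost * migrations


# Hypothetical figures for a small project.
print(project_cost_from_capture(20_000))  # 60000
print(retention_cost(1_000, years=50, migration_cost=5_000))  # 75000
```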
