SCANNING PROCESS
Issues of importance
Capture process:
There are several processing options to consider when planning for digitization.
The objects can be prepared for capture by removing staples, unfolding
materials, and even enhancing faded originals using conservation techniques.
Care must be taken to preserve the originals during steps such as removing
bindings on books, inserting fragile papers in clear retaining jackets, and
unfolding pages. Once the material is ready for capture, the methods of capture
vary depending upon its form. Scanning devices exist to capture slides,
movies, maps, and documents. The digitizing tools go further, however,
to include file conversion programs for material that is already in digital
form (e.g., word processing files, GILS data layer files) but requires
conversion to the target formats.
Onsite scanning versus transport to central processing:
Setting up scanning and recording equipment at the location of source
material is justified when:
- the amount of digitizing is large,
- the material cannot be safely transported to central scanning facilities,
- the amount of digitization is small and the image quality from low-cost desktop scanning systems is acceptable, or
- removing the property from local control raises any issue, in which case remote (on-site) scanning may be considered.
If production-quality professional scanning equipment will be used, on-site
capture of objects poses some challenges. Several companies offer mobile
services at an increased charge. Equipment moved in this way is at risk from
vibration and must be calibrated upon setup. However, for large collections
of over 5,000 documents, local processing may be feasible.
A newer option is the use of lower-cost desktop processing equipment. With
scanning and indexing software, medium-quality scanning devices, and portable
storage media such as ZIP disks or recordable CDs, a local library or
small organization can collect indexed images.
Image Quality Assurance
It is important during scanning to verify the quality of the captured
image. This can be done on each object or on a sample. The operator should
retest the quality whenever the nature of the objects changes during a scanning
session, because the contrast, hue, and focus may have changed. Objects that
fail this test will need to be rescanned. For microfilming, quality assurance
must be delayed until the film is developed.
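As a rough illustration of sample-based quality checking, the Python sketch below picks every Nth scan for review and flags images whose contrast falls below a threshold. The function names, the use of standard deviation as a contrast measure, and the 0.10 threshold are all assumptions made for this example, not part of any imaging standard; each image is represented simply as a list of 8-bit gray values.

```python
from statistics import pstdev

def contrast_score(pixels):
    """Return the spread of 8-bit gray values, scaled to the range 0..1.

    Uses the population standard deviation as a simple contrast measure
    (an assumption for this sketch; other measures are possible).
    """
    return pstdev(pixels) / 255.0

def needs_rescan(pixels, min_contrast=0.10):
    """Flag an image whose contrast falls below a hypothetical threshold."""
    return contrast_score(pixels) < min_contrast

def sample_indices(total, every_nth):
    """Return the positions of the scans to check when sampling every Nth object."""
    return list(range(0, total, every_nth))

# Example: a uniformly gray image is flagged, a crisp black-and-white one is not.
flat_image = [128] * 100
crisp_image = [0, 255] * 50
print(needs_rescan(flat_image), needs_rescan(crisp_image))
```

In practice the threshold would be set by benchmarking against scans the operator has already judged acceptable, and the sampling interval would be tightened whenever the type of object changes.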
Image enhancements during processing
Frequently source documents, photographs, stereographs, maps, and artwork
are in poor condition. Besides fragility, they may have lost contrast,
color definition, and clarity. In addition to physical remedies such as
conservation methods, electronic remedies are available. Tonal improvements
are possible after initial scanning into digital format. Contrast can
be increased to improve OCR accuracy.
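One common electronic remedy of this kind is a linear contrast stretch, sketched below in Python. This is only an illustration under simple assumptions: the image is a list of 8-bit gray values, the function name is hypothetical, and real imaging software typically offers more refined tonal controls.

```python
def stretch_contrast(pixels):
    """Linearly rescale 8-bit gray values so the darkest maps to 0 and the lightest to 255."""
    lo, hi = min(pixels), max(pixels)
    span = hi - lo
    if span == 0:
        return list(pixels)  # flat image: there is nothing to stretch
    # Integer arithmetic with rounding to the nearest value avoids float surprises.
    return [((p - lo) * 255 + span // 2) // span for p in pixels]

# A faded scan whose values occupy only the middle of the tonal range:
faded = [100, 125, 150]
print(stretch_contrast(faded))
```

The stretch spreads the existing tones over the full range, which makes faint characters darker relative to the page; it cannot restore detail that the original scan did not capture.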
Imaging standards
The process of capturing objects into digital form has evolved from several
methodologies. Several competing standards exist. Depending upon the purpose
of the capture and the ultimate display objectives, the digitization process
should consider compliance with the most widely supported and enduring standards.
Resources:
- Technical Recommendations for Digital Imaging Projects
- Library of Congress Imaging Specifications
Options to consider
Digital Files Exist in Several Forms
Pointer records with Thumbnail images:
If the objective of digitization is to create searchable electronic copies
for patron review over the Internet, then a navigational process is needed
to get to the copy or to the original resource. Often this means small
'thumbnail' images that are hyperlinked to a source file. Limited
quality will suffice in this application. Smaller image files load more
quickly in browser windows and provide sufficient information to guide
the searcher.
Text images
Scanning documents to display the contents in a non-editable form is similar
to presentation of images. It offers a window onto the objects without the
ability to search the text within them. Beyond a readable level of
resolution, no further quality is needed.
Preservation Copies
If the goal is to retain a copy of an object for long-term archival access,
the methodology of digitization is substantially different from straight
dissemination. It may involve microfilming before or during scanning.
One approach, 'film first', actually scans from a film copy. It stands
to reason that the film copy would be at the highest possible resolution.
The archival copy is placed in a protective environment to meet preservation
standards. One advantage of this approach is that it allows re-digitization
from the master copy as technology evolves.
Film first (intermediary) versus scan first
If the objective of digitization includes archival preservation, then
a durable medium such as microforms should be considered. High-resolution
images of the objects can be rendered on the microform (usually microfilm).
Since the patron will view the digitized object on a screen with limited
resolution, the microform rather than the screen image holds the sharpest
reproduction. Conversion to electronic image formats can be accomplished
through scanning of the microform image. This process produces good results
when trained microform camera operators and good equipment are employed.
Future improvements to scanning quality can be utilized by rescanning
from the microform. It is expected that ultimately electronic scanning
will reach or exceed photographic quality. Perhaps the durability of electronic
image files will also exceed that of today's microforms.
Other issues, beyond
preservation, may justify the use of microforms as an intermediate process.
Microform cameras are portable, require minimal calibration, and can be
operated by local, trained operators. This allows volunteers, willing
to receive camera operation training, the opportunity to process objects
on site. The film is then sent to a central service for processing and
scanning. The electronic files are often loaded onto CDs or high capacity
disks (e.g., Iomega's ZIP disks) and returned to the collection site for
indexing.
If microforms are
not desired, then scanning the objects directly into electronic files
is possible. Typically, the objects are scanned at the highest resolution
to create the master image. Lower resolution image files are then made
from the master for medium quality and thumbnail-size images.
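The relationship between a master image and its derivatives can be sketched in Python as follows. The function names and the target sizes (1024 pixels for medium quality, 150 pixels for thumbnails) are assumptions chosen for illustration; a project would set its own targets.

```python
def derivative_size(master_w, master_h, max_side):
    """Scale the master's pixel dimensions so the longer side is at most max_side."""
    longest = max(master_w, master_h)
    if longest <= max_side:
        return master_w, master_h  # never enlarge beyond the master
    w = max(1, master_w * max_side // longest)
    h = max(1, master_h * max_side // longest)
    return w, h

def derivative_plan(master_w, master_h, targets=None):
    """Return the pixel dimensions of each derivative made from one master image."""
    if targets is None:
        # Hypothetical targets for this sketch only.
        targets = {"medium": 1024, "thumbnail": 150}
    return {name: derivative_size(master_w, master_h, side)
            for name, side in targets.items()}

# A master of 4800 x 6000 pixels (for example, an 8 x 10 inch original at 600 dpi):
print(derivative_plan(4800, 6000))
```

Because every derivative is computed from the master, the smaller files can be regenerated at any time if display requirements change, while the master itself is left untouched.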
35 mm Slides
A variation on microform capture is the use of photographic copies on 35 mm
slides. This is the preferred medium for capturing images of physical
artifacts. The slides are then scanned. Note that 35 mm slides may not
satisfy archival needs.
Text searching (OCR)
Optical Character Recognition (OCR) applications attempt to convert images
to recognizable text. When the source object has crisp, clean characters,
the process works well. Handwritten and aged manuscripts challenge the
best OCR tools. The OCR process can be labor intensive for the correction
of misread words. Once documents have been interpreted into text, the
full power of text searching can be applied to index and retrieve these
resources. Human editing of OCR records is mandatory to ensure quality
control and accuracy.
Optical Character
Recognition processing can be applied to textual materials scanned into
digital image files. The software will convert patterns of light and dark
on the image to characters that can be viewed in a word processor. However,
even with excellent quality textual documents, some errors in interpretation
occur. With decreased contrast, clarity, and typeface size of the source
document, read errors increase. Additional human interpretation is required.
Typically, this is done on the computer screen, displaying the recorded
image and asking the operator to assign the proper character. Even with
a spell check, errors in interpretation will get through validation steps.
If the purpose of the OCR is to create a full-text reproduction of the
words on the source document, then substantial human assistance is required.
However, if the objective is only to create a concordance file of words
from the document for search and retrieval, then OCR may be cost-effective.
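To make the concordance idea concrete, the Python sketch below builds a simple word-to-page index from uncorrected OCR text. The function name, the page identifiers, and the sample text are hypothetical, and the tokenizer keeps only unaccented letters a to z, which is an assumption that suits plain English text.

```python
import re
from collections import defaultdict

def build_concordance(pages):
    """Map each lowercased word to the sorted list of page IDs on which it occurs.

    `pages` is a dict of {page_id: OCR text}.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(page_id)
    return {word: sorted(ids) for word, ids in index.items()}

# Hypothetical OCR output for two scanned pages:
pages = {
    "p1": "The Mill on the river",
    "p2": "River maps, 1890",
}
concordance = build_concordance(pages)
print(concordance["river"])
```

A misread word simply fails to match a search for that word, while the correctly read words on the same page remain findable. This is one reason a concordance can tolerate OCR errors that would be unacceptable in a full-text reproduction.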
Project checklist:
- The technology is the most straightforward piece of the project.
- Leasing or outsourcing for scanning may be the best practice for small organizations.
- Benchmark the processes and test results before proceeding to capture the information.
- Scan at the highest quality that the project can afford.
- Do not tailor processing to any one specific output.
- Scan from the original whenever possible.
- Plan how you will maintain the image over time, including migration to another file format.
- Consider workflow issues to achieve efficient capture.
- Plan how to handle the originals to cause the least damage.
- Plan time to prepare the documents for scanning; preparation takes longer than the actual scanning.
- Plan how to train scanning technicians and prepare workbooks of protocols and standards.
- Record administrative data about the scanner for future reference.
- Capture activities are typically one third of the project cost.
- Estimate the cost of keeping the digital collections for fifty years.
- Decisions made now will have an impact on access to the digital collections in the future.
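Two of the planning items above lend themselves to quick arithmetic, sketched here in Python. The function names and the example figures are assumptions for illustration only. The first function gives the uncompressed size of one scanned image; the second scales a capture estimate up to a whole-project figure using the one-third rule of thumb from the checklist.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one scan: pixel count times bits per pixel, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

def project_cost_from_capture(capture_cost, capture_share=1 / 3):
    """Rough total project cost, assuming capture is a fixed share of the whole."""
    return capture_cost / capture_share

# A letter-size page (8.5 x 11 inches) at 300 dpi in 24-bit color:
print(uncompressed_bytes(8.5, 11, 300, 24))

# A hypothetical capture estimate of 10,000 (in any currency):
print(round(project_cost_from_capture(10_000)))
```

A single letter-size color page at 300 dpi comes to about 25 million bytes before compression, so the choice of resolution and bit depth has a direct effect on long-term storage and migration costs.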