Document Capture Software refers to applications that provide the ability and feature set to automate the process of scanning paper documents or importing electronic documents, often for the purposes of feeding advanced document classification and data collection processes. Most scanning hardware, both scanners and copiers, provides the basic ability to scan to any number of image file formats, including: PDF, TIFF, JPG, BMP, etc. This basic functionality is augmented by document capture software, which can add efficiency and standardization to the process.
Typical features of Document Capture Software include:
- Barcode recognition
- Patch Code recognition
- Optical Character Recognition (OCR)
- Optical Mark Recognition (OMR)
- Quality Assurance
Goal for Implementation of a Document Capture Solution
The goal for implementing a document capture solution is to reduce the amount of time spent scanning, separating, enhancing, organizing, classifying, normalizing, and collecting information from document collections, and to produce metadata along with an image/PDF file, and/or OCR text. This information is then migrated to a file share, FTP site, database, Document Management or Enterprise Content Management system. These systems often provide a search function, allowing search of the assets based on the produced metadata, and then viewed using document imaging software.
Document Capture System Solutions – General
Integration with Document Management System
Main article: Enterprise content management
ECM (Enterprise Content management) and their DMS component (Document Management System) are being adopted by many organizations as a corporate document management system for all types of electronic files, e.g. MS word, PDF … However, much of the information held by organisations is on paper and this needs to be integrated within the same document repository.
By converting paper documents into digital format through scanning, organizations convert paper into image formats such as TIF, JPG, and PDF, and also extract valuable index information or business data from the document using OCR technology. Digital documents and associated metadata can easily be stored in the ECM in a variety of formats. The most popular of these formats is PDF which not only provides an accurate representation of the document but also allows all the OCR text in the document to be stored behind the PDF image. This format is known as PDF with hidden text or text-searchable PDF. This allows users to search for documents by using keywords in the metadata fields or by searching the content of PDF files across the repository.
Advantages of scanning documents into a ECM/DMS
Information held on paper is usually just as valuable to organisations as the electronic documents that are generated internally. Often this information represents a large proportion of the day to day correspondence with suppliers and customers. Having the ability to manage and share this information internally through a document management system such as SharePoint or a CMIS-compatible repository improves collaboration between departments or employees and also eliminates the risk of losing this information through disasters such as floods or fire.
Organisations adopting an ECM/DMS often implement electronic workflow which allows the information held on paper to be included as part of an electronic business process and incorporated into a customer record file along with other associated office documents and emails. For business critical documents, such as purchase orders and supplier invoices, digitising documents helps speed up business transactions as well as reduce manual effort involved in keying data into business systems, such as CRM, ERP and Accounting. Scanned invoices can also be routed to managers for payment approval via email or an electronic workflow.
Electronic Document Capture
In the earlier implementations of Document Capture Software, the technology focused solely on the digitization and capture of information from paper documents. Document images were acquired from document scanners via TWAIN/ISIS drivers. Only image-based file formats like TIF, JPG, and BMP were typically compatible with these solutions. But in recent years, as the volume of electronically-created documents and the number of proprietary file formats continues to increase at exponential rates, the need for handling documents existing in electronic formats has grown. The relevant document capture products have adapted to function with non-image file formats with the end-goal of creating a unified processing workflow capable of handling all incoming documents
The ability to import files from a variety of sources is one example of such adaptation. Importing documents from ECM/DMS software solutions, email servers, FTP, and EDI is now as much of a requirement of document capture software as is paper capture.
The normalization of output files to text-based PDF format is now another critical factor in long-term archival of proprietary electronic file formats. Normalization expands access and usage of files to users throughout the enterprise, rather than only those that created the original electronic file.