Document Manager

Our client had a system that received and processed tens of thousands of documents daily, to be indexed and stored in a multi-terabyte document library. Each document was processed by a conversion program to change it into a common format suitable for display by the client's system. Once a document was converted, it was examined by an editor, corrected if necessary, placed into a test system for previewing, iterated through the correction and test phases as many times as necessary to produce a viewable document, and then built into a release for indexing. Originally, all of these steps were done by hand by a large department of editors and assistants. The only editing tool was a basic text editor, and the edits were made directly to the preview HTML documents.

BSM Development undertook a project to automate all of the steps in the above process, including allowing the documents to be edited directly with a WYSIWYG Java applet delivered by a Web server. The number of editors and assistants was reduced significantly, and the client was able to explore outsourcing options that were impossible before the project was undertaken.

BSM began the project by analyzing the flow of documents through the system. It turned out that a large number of the steps that were done manually would lend themselves very well to automation. Furthermore, the iterative steps of editing a document and then placing it into the test system for viewing could be consolidated into a single step using a WYSIWYG editor. Finally, the reimplementation of the system gave us an opportunity to put in place automated verification algorithms that checked for consistency and made repairs, thereby improving the quality of the resulting documents and reducing the burden on the editors.

The amount of manpower required to run the system was reduced. The vast majority of data handling tasks were eliminated, removing much of the drudgery from the job. The job became much more satisfying to the editors who, consequently, performed at a much higher level. Additionally, since the system could now be run from a remote location by editors using Web browsers, outsourcing of the editing function became possible. The net results were an improvement in quality and a decrease in cost.

Beginning with the receipt of a document from a supplier, we created a system to accept one or more documents in a bulk document stream, storing them on disk in an interim tree hierarchy. The tree was organized to store the documents by publication, the way the editors normally dealt with them, to make manipulating them a natural process. A series of indexes was added to the tree so that documents of interest could be searched for and retrieved by different criteria.
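
To make the idea concrete, here is a minimal sketch in Python of a tree keyed by publication with secondary indexes. It is an illustration only, not the system that was built: the class name InterimTree, the field names, and the in-memory storage are all assumptions made for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class InterimTree:
    """Hypothetical in-memory stand-in for the on-disk interim tree."""

    def __init__(self):
        self.paths = {}  # doc_id -> path within the tree
        # field name -> field value -> set of doc_ids
        self.indexes = defaultdict(lambda: defaultdict(set))

    def store(self, doc_id, publication, fields):
        # File the document under its publication, as the editors think of it.
        path = PurePosixPath(publication) / f"{doc_id}.html"
        self.paths[doc_id] = path
        # Index every supplied field so documents can be found by other criteria.
        for field, value in {"publication": publication, **fields}.items():
            self.indexes[field][value].add(doc_id)
        return path

    def find(self, field, value):
        # Return matching document ids in a stable order.
        return sorted(self.indexes[field].get(value, set()))


tree = InterimTree()
tree.store("d1", "daily-gazette", {"date": "1999-03-01"})
tree.store("d2", "daily-gazette", {"date": "1999-03-02"})
tree.store("d3", "weekly-review", {"date": "1999-03-01"})
```

With this layout, tree.find("publication", "daily-gazette") follows the editors' natural grouping, while tree.find("date", "1999-03-01") cuts across publications.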

Automated verification and repair algorithms were applied to the stored documents as they were placed into the tree. Many common editing operations, such as field normalization, were codified and added to the series of algorithms applied. This step significantly reduced the amount of hand editing that was necessary. The idea was to show the editors a high-quality document, thereby reducing the number of mundane tasks that they would have to perform.
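
A chain of small repair functions followed by a verification pass is one way such a step could be structured. The sketch below is in Python and is hypothetical throughout: the two repairs, the required field names, and the date format are assumptions chosen for illustration, not the rules the project actually used.

```python
import re


def normalize_whitespace(doc):
    # Collapse runs of whitespace in every field to a single space.
    return {k: re.sub(r"\s+", " ", v).strip() for k, v in doc.items()}


def normalize_date(doc):
    # Rewrite MM/DD/YYYY dates as YYYY-MM-DD; leave other values alone.
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", doc.get("date", ""))
    if m:
        month, day, year = m.groups()
        doc = {**doc, "date": f"{year}-{int(month):02d}-{int(day):02d}"}
    return doc


REPAIRS = [normalize_whitespace, normalize_date]  # applied in this order
REQUIRED_FIELDS = ("title", "date", "body")       # assumed field names


def verify_and_repair(doc):
    """Apply each repair, then list the problems left for an editor."""
    for repair in REPAIRS:
        doc = repair(doc)
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not doc.get(f)]
    return doc, problems


fixed, problems = verify_and_repair(
    {"title": "  Council   votes ", "date": "3/1/1999", "body": ""}
)
```

Here the title and date are repaired automatically, and only the empty body is left for an editor to deal with.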

BSM Development designed and built a Web-based user interface to display preview documents directly from the interim document storage, exactly as they would be viewed by the end-user, thereby eliminating the need to put a document into the test system to view it. A WYSIWYG editor application was created, using Java, that allowed the editors to examine the documents and perform QA and repairs directly on them. We also created a set of QA support tools that reduced the strain of checking the tens of thousands of documents received daily.

To complete the system, we built document promotion software that collected documents which were marked OK and submitted them to the production document repository, handling distribution to multiple repositories, if need be. The act of promotion also caused the documents to be passed through an indexer that built the document index. All of these steps, which were formerly done by hand, were automated.

In addition to being automated, the promotion process, which had previously been one of building large batches of documents for submission to the indexer en bloc, was turned into a flow-through process that handled documents incrementally. This in turn significantly reduced the effect of an error on a processing run, since fewer documents were aggregated together. Furthermore, error checking was moved from the indexer to the promotion module, so that documents in error would not cause the indexer to give up. Any document in error was simply kicked back into the interim tree, where it could be easily fixed by an editor, while all of the good documents were sent on to the indexer.
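
The flow-through behaviour can be sketched as a loop that checks each approved document on its own, so that one bad document never blocks the rest. This Python sketch is an assumption-laden illustration: the function promote, its callback parameters, and the "status" field are hypothetical names, not the project's actual interfaces.

```python
def promote(documents, check, index, kick_back):
    """Promote documents one at a time instead of as a single batch.

    check(doc) returns a list of error strings (empty means the document is good).
    index(doc) hands a good document to the indexer.
    kick_back(doc, errors) returns a bad document to the interim tree.
    """
    promoted = rejected = 0
    for doc in documents:
        if doc.get("status") != "OK":
            continue  # not yet approved by an editor
        errors = check(doc)
        if errors:
            kick_back(doc, errors)  # the editor fixes it later
            rejected += 1
        else:
            index(doc)  # good documents carry on regardless
            promoted += 1
    return promoted, rejected


indexed, returned = [], []
docs = [
    {"id": "d1", "status": "OK", "body": "text"},
    {"id": "d2", "status": "OK", "body": ""},
    {"id": "d3", "status": "draft", "body": "text"},
    {"id": "d4", "status": "OK", "body": "more text"},
]
counts = promote(
    docs,
    check=lambda d: [] if d["body"] else ["empty body"],
    index=lambda d: indexed.append(d["id"]),
    kick_back=lambda d, errs: returned.append((d["id"], errs)),
)
```

In this run d2 is kicked back with its error while d1 and d4 still reach the indexer, and the unapproved d3 is left untouched.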

In the end, we saw that a well-designed system, which took into account how documents were handled and processed and automated a number of mundane tasks, was able to reduce the number of people employed in processing large volumes of documents. We also observed a higher quality of output from the system, and less job dissatisfaction and burnout among the people doing the document processing.