FlexiCapture 11 Performance Guide: Scanning & Verification stations

Katja Ovcharova posted this 13 April 2016

Hello Capturedocs participants!

Today we are posting some useful information about Scanning and Verification Stations.
We hope you find it helpful.
Scanning Stations

ABBYY FlexiCapture supports importing images from:
  • a personal scanner (through a thin or local rich client),
  • a network scanner (images go to a folder or e-mail inbox),
  • or from a mobile app.
Each scanning client’s performance is limited by scanner speed and data transfer bandwidth.

The total number of scanning clients matters less for performance than the total input flow: the average and peak number of pages processed per hour or per 24 hours, and the size of each page, which depends on its color mode. The peak input flow should not exceed the System’s capabilities.
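
To make the notion of input flow concrete, here is a minimal Python sketch that converts a page rate and an average page size into a data volume per hour. The page sizes per color mode, the daily volume and the peak factor of 2 are illustrative assumptions, not ABBYY figures; measure your own scans before relying on them.

```python
# Hypothetical average page sizes in kilobytes (assumption: measure real scans).
ASSUMED_PAGE_SIZE_KB = {"black_and_white": 50, "grayscale": 300, "color": 1000}


def input_flow_mb_per_hour(pages_per_hour, avg_page_size_kb):
    """Return the data volume entering the System per hour, in megabytes."""
    return pages_per_hour * avg_page_size_kb / 1024


# Hypothetical load: 300,000 pages spread over an 8-hour working day.
average_pages_per_hour = 300_000 / 8
# Assumption: the busiest hour carries twice the average load.
peak_pages_per_hour = 2 * average_pages_per_hour

for mode, size_kb in ASSUMED_PAGE_SIZE_KB.items():
    avg = input_flow_mb_per_hour(average_pages_per_hour, size_kb)
    peak = input_flow_mb_per_hour(peak_pages_per_hour, size_kb)
    print(f"{mode}: average {avg:.0f} MB/h, peak {peak:.0f} MB/h")
```

The peak figure, not the average, is the one to compare against what the System can absorb.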

Traffic from all Scanning Stations, Verification Stations and Processing Stations passes through the same channel at the Application Server’s gateway.
When traffic from Scanning Stations takes up half of the channel’s bandwidth or more, or exhibits large spikes, allocate a separate network interface on the Application Server to scanning clients. This helps to avoid situations where traffic spikes cause delays on Verification Stations and Processing Stations.
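
The "half of the channel" rule can be checked with a rough calculation. The sketch below is only an estimate under stated assumptions: the function names and the sample numbers are hypothetical, and it covers the bandwidth-share criterion only, not the detection of traffic spikes.

```python
def scanning_share(peak_pages_per_hour, avg_page_size_kb, channel_mbit_per_s):
    """Estimate the fraction of the channel used by scanning traffic at peak."""
    bits_per_hour = peak_pages_per_hour * avg_page_size_kb * 1024 * 8
    scan_mbit_per_s = bits_per_hour / 3600 / 1_000_000
    return scan_mbit_per_s / channel_mbit_per_s


def needs_separate_interface(share, threshold=0.5):
    """Apply the rule of thumb: half of the channel's bandwidth or more."""
    return share >= threshold


# Hypothetical numbers: 75,000 color pages/hour at 1,000 KB each on a 1 Gbit/s link.
share = scanning_share(75_000, 1000, 1000)
print(f"Scanning traffic uses about {share:.0%} of the channel")
print("Separate interface advised:", needs_separate_interface(share))
```

Use the peak page rate rather than the average here, since the delays on Verification and Processing Stations come from the busiest periods.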

If the Application Server is deployed on a cluster of several computers, traffic can be split among them by either:
  • using NLB affinity settings for the cluster (the software level);
  • routing network connections to specific cluster nodes (the hardware level).

Scanning Stations support:
  • Automatic resumption of image uploads to the Application Server after a network connection failure. This helps mitigate traffic spikes from Scanning Stations.
  • Centralized configuration of scanning, image enhancement and export options for sending images to the Application Server. For example, you can define the color mode of scanned images, or detect and delete the unnecessary empty pages produced by duplex scanning, to reduce the input flow to the Application Server.
  • Scheduling of image uploads to the Application Server to balance network loads (e.g. by assigning different upload times to different regional offices).

Katja Ovcharova posted this 13 April 2016

Verification Stations

Automatically processed documents can be verified manually if needed. For this purpose ABBYY FlexiCapture provides rich and thin verification clients.

Verification is a slow and expensive process. ABBYY FlexiCapture provides validation rules that check documents automatically, so documents that pass the rules can skip manual verification.

Another way to reduce the amount of verification work is to clarify with the customer precisely which document fields have to be extracted with 100% quality. Often this is not every field of the document, so verification can focus only on documents with problems in those fields.

To calculate the number of verification operators, you need to know:
  • the number of documents to be processed,
  • how many of them require verification,
  • the period of time allowed for processing the documents according to the Service Level Agreement,
  • and the average time needed to verify one document.
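
The four inputs above combine into a simple estimate: total verification time divided by the time available. Here is a minimal Python sketch; the function and parameter names are hypothetical, and it assumes every operator works for the whole period without breaks.

```python
import math


def verifiers_needed(documents, share_requiring_verification,
                     minutes_per_document, sla_minutes):
    """Estimate how many operators must work in parallel to meet the SLA."""
    total_minutes = documents * share_requiring_verification * minutes_per_document
    # Round before ceil so floating-point noise cannot add a spurious operator.
    return math.ceil(round(total_minutes / sla_minutes, 9))


# 100,000 documents, 30% need verification, 2 minutes each, 8 working hours.
print(verifiers_needed(100_000, 0.3, 2, 8 * 60))  # prints 125
```

In practice you may want to add headroom for breaks, shifts and uneven arrival of documents.
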
Verifiers also generate a workload on the System. A Verification Station interacts with the Application Server in a similar way to a Processing Station: it requests tasks and downloads images and document data from the Application Server and sends modified data back.

NOTE: The processing speed of a Verification Station is much lower than that of a Processing Station, because manual verification usually takes far more time than automatic processing.

NOTE: Verification operators do not always need to see document images in their original quality. The FlexiCapture settings let you change the compression (60% by default) applied to images that operators download from the Application Server.

Thus, we assume that a verifier working at the top of their capacity generates up to 1/3 of the load created by one processing core of a Processing Station.
You may use this assumption to interpret the results of tests conducted without verifiers, using only unattended processing: if the System functions stably with, say, 100 processing cores, you can safely replace some of those cores with verification operators working simultaneously, at a rate of up to 3 operators per core replaced.
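
The conversion between verifiers and processing cores can be sketched as follows. This is only an illustration of the 1/3 assumption above; the function names and the 70-core figure are hypothetical.

```python
import math

# From the assumption above: one verifier creates up to 1/3 of one core's load.
VERIFIERS_PER_CORE = 3


def core_equivalents(verifiers):
    """Upper-bound number of processing cores that the verifiers' load equals."""
    return math.ceil(verifiers / VERIFIERS_PER_CORE)


def max_verifiers(stable_cores, cores_for_processing):
    """Verifiers that fit into the tested capacity not used for processing."""
    spare_cores = max(0, stable_cores - cores_for_processing)
    return spare_cores * VERIFIERS_PER_CORE


# Hypothetical: stable at 100 cores in testing, 70 cores needed for processing.
print(max_verifiers(100, 70))   # prints 90
print(core_equivalents(125))    # prints 42
```

Both results are upper-bound estimates, so treat them as a starting point for a load test with real operators rather than as a guarantee.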

Example 4.
We need to process 100,000 documents in 8 working hours.
As initially assumed, only 30% of documents will require manual verification. Verification of each document takes up to 2 minutes.
(100,000 documents * 0.3 * 2 min) / (8 * 60 min) = 125
Hence, up to 125 verifiers will be required.

Each document has about 3 pages on average, so the daily volume is 300,000 pages. Before going live, you can create a test batch from typical documents and test the System in unattended processing mode. Let’s say the System is stable and shows no bottlenecks with 100 processing cores, while 60 processing cores are already enough to process the required 300,000 pages in 8 hours.
Hence, the System will cope with 125 verifiers working alongside 60 processing cores: the upper-bound estimate for 125 verifiers is 125 / 3 ≈ 42 cores, so the total load is at most about 102 cores, in line with the 100 cores tested.