Process document until a stop processing page is encountered

  • Last Post 14 February 2018
  • Topic Is Solved
Andy posted this 15 January 2018

FlexiCapture 12 for Invoice lets you define a separation page that splits a single document into several documents wherever that page is found.

What I need is not a separation page but a stop-recognition page, to reduce the time needed to process the documents. All pages after the stop-recognition page must be ignored by the system but not removed from the document.

If I understood correctly, in the Document Analysis phase, which comes before Classification, the system detects pictures, tables, and text blocks. Is it possible to configure the system (or write a script) to detect the stop-recognition page in this phase and mark the following pages so that they are skipped in the subsequent Classification and Recognition phases?

If not, is there another way to get what I need?

 

Ekaterina posted this 16 January 2018

Hello,

In this situation we recommend creating a one-page layout that reliably matches the “stop-page”. Then, in the Document Definition settings, set the pages that follow the "stop-page" as annexes, which gives the following document structure:

Invoice_layout 1-1

StopPage 1-1

Annexes 0- ...

Andy posted this 02 February 2018

Hello Ekaterina,

thanks for your answer.

I followed your instructions, but it doesn't work as expected and I don't understand what I'm doing wrong.

My Document Definition has:

- section "Document section 1" with a multipage FlexiLayout

- section "_END_PROCESSING_PAGE" corresponding to the end processing page FlexiLayout

- annex pages enabled

so the final structure is:

Structure

If I process a document, the system applies the layout "Document section 1", detected on the first page, to all pages, as if "_END_PROCESSING_PAGE" did not exist.

Recognized

If I manually force every page to match the right section, I get a 100% confidence level.

Forced

So why does the system apply only the first page's layout?

What do I have to check so that the system applies the right layout to the right page?

 

My document has pages with this layout sequence:

P1 : Document section 1

P2 : Document section 1

P3 : _END_PROCESSING_PAGE

P4 : annex

but the system lists them in reverse order. Why does this happen?

Thanks in advance

 

Andy posted this 02 February 2018

Can you see the images?

Structure.png

Recognized.png

Forced.png

 

Ekaterina posted this 06 February 2018

Hello,

Yes, I do. Could you please send us the project and images to reproduce the issue?

Andy posted this 06 February 2018

Yes. I'm going to prepare it.

Ekaterina posted this 12 February 2018

Hello,

We examined your project and found that your separator page is recognized as part of “FORGHIERI_DDT/Document Section 1”. To check this, add the separator page to the FLS project and match the layout: you will see that the separator page is matched as well.

Therefore there is an assembling conflict. 

We suggest modifying your layout and debugging it in FLS together with the separator page, to make sure the modified layout no longer captures this page.

Alternatively, you can include the separator page in the main layout as a required footer (if you are sure that this page will always follow your main documents).
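The assembling conflict described above can be pictured as two layouts that both accept the same page. The following is only a minimal sketch in Python, not FlexiCapture or FLS code: layouts are modelled as simple predicates over page text, and the names `find_conflicts`, `main_layout`, and `stop_layout` are hypothetical.

```python
from typing import Callable, Dict, List

# Assumption: a "layout" is reduced to a predicate over the page text.
Layout = Callable[[str], bool]

def find_conflicts(pages: List[str], layouts: Dict[str, Layout]) -> Dict[int, List[str]]:
    """Return, per page index, the layouts that match when more than one does."""
    conflicts: Dict[int, List[str]] = {}
    for index, text in enumerate(pages):
        matched = [name for name, matches in layouts.items() if matches(text)]
        if len(matched) > 1:
            conflicts[index] = matched
    return conflicts

# Hypothetical layouts: the main one is too permissive and accepts any non-empty page.
main_layout: Layout = lambda text: len(text.strip()) > 0
stop_layout: Layout = lambda text: text.strip().startswith("XXX")

example_pages = ["Delivery note, page 1", "Delivery note, page 2", "XXX", "Annex"]
example_conflicts = find_conflicts(
    example_pages, {"Document Section 1": main_layout, "_END_PROCESSING_PAGE": stop_layout}
)
```

In this toy model the stop page is the only page claimed by both layouts, which is the situation that tightening the main layout is meant to remove.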

If you need further assistance, please contact your regional support, describe the issue, and send them the project.

Andy posted this 14 February 2018

Hello Ekaterina,

starting from your suggestions I found the solution.

My stop-recognition page contains only a string like "XXX....".

In my FlexiLayout, I added a Footer with a field that matches the text "XXX...". The Footer is optional because the stop-recognition page isn't always present, but the matching field is mandatory.

The Document Definition structure remains simply

Invoice_layout 1-1

Annexes 0- ...

with annex pages enabled.
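As a rough illustration of the resulting behaviour (a sketch in Python, not FlexiCapture scripting), the page sequence is split at the first page whose text starts with the marker. The marker value `"XXX"` is a placeholder because the real string is elided above, the names `is_stop_page` and `assemble` are hypothetical, and treating all pages as main-section pages when the footer is absent is an assumption.

```python
from typing import Dict, List

STOP_MARKER = "XXX"  # placeholder: the actual marker text is not given in full

def is_stop_page(text: str) -> bool:
    """Stand-in for the mandatory field of the optional footer."""
    return text.strip().startswith(STOP_MARKER)

def assemble(pages: List[str]) -> Dict[str, List[int]]:
    """Split page indices into the main section and the annexes."""
    for index, text in enumerate(pages):
        if is_stop_page(text):
            # The footer closes the main section; later pages stay in the
            # document as annexes and are not recognized.
            return {
                "Invoice_layout": list(range(index + 1)),
                "Annexes": list(range(index + 1, len(pages))),
            }
    # Footer absent (it is optional): assume every page belongs to the main section.
    return {"Invoice_layout": list(range(len(pages))), "Annexes": []}
```

For a four-page input whose third page is the stop page, the sketch assigns the first three pages to `Invoice_layout` and the last one to `Annexes`.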

Thank you very much for your help.
