Editing Files after/while verification

Lennart Hagemann posted this 27 March 2018

Hello,

Is it possible to edit the recognized files during or after verification? For example: read a PDF, put a specific stamp (date, company, signature) on the PDF, then export it.

Thanks a lot

AlexeyEfremov posted this 16 April 2018

Hello Lennart,

The answer is yes, you can do it. You can access the page's Picture property (an IPictureObject), get the HBITMAP handle of the image, change the image, and then replace the page image with it:

 

OLE_HANDLE handle = Document.Pages[0].Picture.Handle;
System.Drawing.Bitmap bitmap = System.Drawing.Image.FromHbitmap( handle );

//<do something with bitmap>

IPictureObject FinalPicture = FCTools.PictureFromHBitmap( bitmap.GetHbitmap().ToInt32(), 300 );
Document.Pages[0].ReplaceImage(FinalPicture);

 

You can do this in the export script or create a custom document processing stage.

 

Hope this helps.

Alexey

 

Fritz posted this 16 April 2018

1. After registration I want to come back to the discussion page I had opened before. Please fix that.

2. Now the question. I want to edit the OCR result behind a PDF text without changing the visible document. I scan old Fraktur pages and would like to edit only the OCR results to make machine search easier, for example replacing the original long ſ with a round s (buſt becomes bust). The original image must remain untouched. How can I do that? Thank you.

AlexeyEfremov posted this 16 April 2018

Hello Fritz,

Unfortunately, FlexiCapture does not have such capabilities by design.

Alexey.

 

Fritz posted this 16 April 2018

But I vaguely remember that Abbyy has some software that can do that? It’s a frequent problem. Fritz

AlexeyEfremov posted this 17 April 2018

ABBYY's policy is that the software functions as a "black box": you put an image in and get text out.

The closest you can get to changing the text is using the ABBYY FineReader Engine methods Remove(fromPos, toPos) and Insert(position, insertString, charParams) of the Paragraph object. However, these methods are not meant for wide use because of the processing time they need (irrelevant for one document, but noticeable at scale).

We recommend using third-party tools for the tasks you propose.
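As a minimal sketch of the kind of post-processing a third-party step could do, the snippet below normalizes the long ſ (U+017F) to a round s in plain OCR text that has already been exported. The class and method names are hypothetical, and it does not touch the PDF text layer or call any ABBYY API; writing the corrected text back into a searchable PDF would need a separate PDF tool.

```csharp
using System;

static class FrakturSearchText
{
    // Hypothetical helper: replaces U+017F (LATIN SMALL LETTER LONG S) with 's'
    // so that a search for "bust" also matches text recognized as "buſt".
    public static string NormalizeLongS(string ocrText)
    {
        if (ocrText == null) throw new ArgumentNullException(nameof(ocrText));
        return ocrText.Replace('\u017F', 's');
    }
}
```

For example, FrakturSearchText.NormalizeLongS("bu\u017Ft") returns "bust", while text without a long ſ is returned unchanged.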

Alexey

Fritz posted this 17 April 2018

See https://stackoverflow.com/questions/32914609/how-can-i-edit-the-search-text-of-a-searchable-pdf
   “I'm using ABBYY FineReader 12 Professional. (not open source). Just open a scanned image or scanned pdf and press Verify Text (or Ctrl + F7), then you go over all the spelling errors or low-confidence characters and fix them.
   The program is very good, it shows you the exact place in image/pdf to correct and the OCR guessing side by side for convenience. It iterates all of them.
   [By the way, I'm using the shortcuts to speed up things: Alt+Enter to add the unrecognized word to dictionary. Ctrl+Delete to skip word or confirm in case you fixed it.]
   Then save the document as a pdf file, Menu: File>Save Document As> PDF File, and you can search it on every pdf reader. The saved file looks the same as the scanned one, but 'behind' it there [is the corrected] text.
   It's weird you tried ABBYY with no luck... it's working great for me. Maybe you didn’t try the Professional version.”

Might that work, Alexey? I don't have FineReader; I am just asking with great interest as a tech journalist, see www.Joern.De/Presseausweis (can you give me a free version to try? Fritz@Joern.De).

AlexeyEfremov posted this 17 April 2018

I thought you were asking about the business/industrial solutions.

Yes, desktop solutions have this capability. You can request the trial here:

https://www.abbyy.com/en-eu/download/finereader/

 

Ola Thuresson posted this 4 weeks ago

Dear Alexey,

I have managed to get the script working. I was missing some assemblies and .NET references, the OLE_HANDLE type was wrong, and there was also an issue with the bitmap being indexed. But the script won't work because it is run as an export script?

Assemblies added: System.Drawing.dll, PresentationCore.dll

using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Text;
using System.Windows.Media.Imaging;
using System.Windows.Forms;

// Read the OLE_HANDLE as an int (declaring the variable as OLE_HANDLE generates an error)
int handle = Document.Pages[0].Picture.Handle;
IntPtr xAsIntPtr = new IntPtr(handle);
System.Drawing.Bitmap bitmap = System.Drawing.Image.FromHbitmap( xAsIntPtr );

// Copy the page image into a new bitmap to avoid the indexed-bitmap issue
Bitmap newBitmap = new Bitmap(bitmap.Width, bitmap.Height);
System.Drawing.Graphics g = Graphics.FromImage(newBitmap);
g.DrawImage(bitmap, 0, 0);

// Add fakturanr (the invoice number) to the bitmap
RectangleF rectf = new RectangleF(70, 90, 90, 50);
g.SmoothingMode = SmoothingMode.AntiAlias;
g.InterpolationMode = InterpolationMode.HighQualityBicubic;
g.PixelOffsetMode = PixelOffsetMode.HighQuality;
g.TextRenderingHint = TextRenderingHint.AntiAliasGridFit;

// Create string formatting options (used for alignment)
StringFormat format = new StringFormat()
{
    Alignment = StringAlignment.Far,
    LineAlignment = StringAlignment.Near
};
g.DrawString(fakturanr, new Font("Tahoma",16), Brushes.Black, rectf, format);
g.Flush();

// Save a copy of the stamped image to disk
newBitmap.Save(@"D:\Apps\Abbyy\Fakturor\GarpFakturor\bitamp.bmp");

// Replace the old bitmap with the new bitmap
IPictureObject FinalPicture = FCTools.PictureFromHBitmap( newBitmap.GetHbitmap().ToInt32(), 300 );
Document.Pages[0].ReplaceImage(FinalPicture);
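The indexed-bitmap issue mentioned above can be isolated in a small helper. This is only a sketch using System.Drawing, with hypothetical class and method names: Graphics.FromImage throws for bitmaps with an indexed pixel format, so the page image is first copied into a 32-bit bitmap of the same size and resolution. It assumes a Windows environment where System.Drawing is available.

```csharp
using System.Drawing;
using System.Drawing.Imaging;

static class BitmapStamp
{
    // True if the bitmap uses a palette (1, 4 or 8 bits per pixel, indexed).
    public static bool IsIndexed(Bitmap bitmap)
    {
        return (bitmap.PixelFormat & PixelFormat.Indexed) != 0;
    }

    // Returns a 32bpp copy that Graphics.FromImage can draw on.
    // The caller owns the returned bitmap and should dispose it.
    public static Bitmap ToDrawable(Bitmap source)
    {
        var copy = new Bitmap(source.Width, source.Height, PixelFormat.Format32bppArgb);
        copy.SetResolution(source.HorizontalResolution, source.VerticalResolution);
        using (Graphics g = Graphics.FromImage(copy))
        {
            // Explicit width and height keep the copy pixel-exact
            // even if the two bitmaps report different DPI values.
            g.DrawImage(source, 0, 0, source.Width, source.Height);
        }
        return copy;
    }
}
```

The stamp text can then be drawn on the result of ToDrawable(bitmap) in the same way as on newBitmap in the script above.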

 

Processing Server log:

-1 10 6/21/2018 10:49:42 AM Document 1: European Invoice Export: System.Runtime.InteropServices.COMException (0x80004005): Cannot modify object data from this script.
   at ABBYY.FlexiCapture.IPage.ReplaceImage(IPictureObject _newPicture)
   at Main.AddTextToBitMap(String fakturanr, IDocument Document)
   at Main.Execute(IDocument Document, IProcessingCallback Processing)

So I assume this won't work in export scripts?

 

Regards,

Ola

Tson

AlexeyEfremov posted this 4 weeks ago

I will post our discussion from private messages here:

Dear Alexey,

You wrote this in the forum lately and I tried using it but did not get it to work.

Hello Lennart,

The answer is yes, you can do it. You can access the page's Picture property (an IPictureObject), get the HBITMAP handle of the image, change the image, and then replace the page image with it:

OLE_HANDLE handle = Document.Pages[0].Picture.Handle;
System.Drawing.Bitmap bitmap = System.Drawing.Image.FromHbitmap( handle );

//<do something with bitmap>

IPictureObject FinalPicture = FCTools.PictureFromHBitmap( bitmap.GetHbitmap().ToInt32(), 300 );
Document.Pages[0].ReplaceImage(FinalPicture);

 

My non-working code:

AddTextToBitMap(fakturanr,Document);

public static bool AddTextToBitMap(string fakturanr, IDocument Document)
{
    try {
        // Get handle and bitmap
        // Read the OLE_HANDLE as an int (declaring the variable as OLE_HANDLE generates an error)
        int handle = Document.Pages[0].Picture.Handle;
        IntPtr xAsIntPtr = new IntPtr(handle);
        System.Drawing.Bitmap bitmap = System.Drawing.Image.FromHbitmap( xAsIntPtr );

        // Add fakturanr (the invoice number) to the bitmap
        RectangleF rectf = new RectangleF(70, 90, 90, 50);
        System.Drawing.Graphics g = Graphics.FromImage(bitmap);
        //g.SmoothingMode = SmoothingMode.AntiAlias;
        //g.InterpolationMode = InterpolationMode.HighQualityBicubic;
        //g.PixelOffsetMode = PixelOffsetMode.HighQuality;

        // Create string formatting options (used for alignment)
        StringFormat format = new StringFormat()
        {
            Alignment = StringAlignment.Far,
            LineAlignment = StringAlignment.Near
        };
        g.DrawString(fakturanr, new Font("Tahoma",8), Brushes.Black, rectf, format);
        g.Flush();

        // Replace the old bitmap with the new bitmap
        IPictureObject FinalPicture = FCTools.PictureFromHBitmap( bitmap.GetHbitmap().ToInt32(), 300 );
        Document.Pages[0].ReplaceImage(FinalPicture);
        return true;
    }
    catch {}

    return false;
}

 

Regards,

Ola

---------------------



Dear Ola,

Nice to hear from you. 

Could you please create a new topic or add your reply to the topic mentioned? (This is a bureaucratic issue; otherwise I cannot log the time.)

Thanks in advance.

My advice is to check whether System.Drawing was added to the project as a .NET assembly.

My questions are:

What error messages are you receiving?

Is your machine 32-bit or 64-bit? (Please check the size of the IntPtr handle at runtime.)

Is this FC12?

Are you sure you are using the method in a script workflow stage? (Otherwise it will not work.)

Could you please check the task log in your Processing Server Monitor?
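The 32-bit/64-bit question can be checked with a few lines of plain .NET. This is a sketch with hypothetical names, not part of the FlexiCapture API: IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one, and IntPtr.ToInt32() throws an OverflowException in a 64-bit process when the handle value does not fit into an int, which matters for the ToInt32() call in the script.

```csharp
using System;

static class HandleCheck
{
    // Describes the pointer size of the process the script is running in.
    public static string Describe()
    {
        return "IntPtr.Size = " + IntPtr.Size + " bytes, 64-bit process: " + Environment.Is64BitProcess;
    }

    // True if the handle value can be passed through ToInt32() without overflow.
    public static bool FitsInInt32(IntPtr handle)
    {
        long value = handle.ToInt64();
        return value >= int.MinValue && value <= int.MaxValue;
    }
}
```

Writing HandleCheck.Describe() to a log from inside the script stage shows which case applies on the processing station.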

 

 

 

Regards,

Alexey



 


 

AlexeyEfremov posted this 4 weeks ago

Dear Ola,

The answer is yes: the IPage::Picture object is read-only everywhere except in workflow scripts.

You can create a document processing script stage right before export and place your code there.

 

 

Ola Thuresson posted this 4 weeks ago

Dear Alexey,

 

Thanks for confirming this. I assume there is no way to modify the PDF or add text and/or a picture to it during export? The problem is that I fetch the GL voucher number from the ERP during export. The invoice is then printed as a hard copy, but without any indication of which voucher number it belongs to. Is it possible to open the IPage::Picture object for reading and writing in the export script stage with a system parameter? Is there a workaround within the export script stage?

I could always use iTextSharp to modify the PDF after it has been saved, but I'm not too keen on using an external assembly for this.

 

Regards

Ola

Tson

AlexeyEfremov posted this 4 weeks ago

Dear Ola,

Unfortunately, I do not entirely understand your situation. Could you please let me know why the solution

"You can create a document/batch processing script stage right before export stage and place your code here."

does not work for you?

You can place all the code of your custom export in that stage if you want.

For instructions on how to create a script processing stage, please see the Note in the article "Creating processing stages" in the Developer's Help.

Yours sincerely,

Alexey

 

 

 

Ola Thuresson posted this 4 weeks ago

Dear Alexey,

 

You are right, of course I can use a workflow script stage right before the export. I just didn't understand it at first. I can actually place all exports there, and then I don't have to save the document definition over and over again while testing.

Thank you for the guidance. I now think I can make the export work.

 

Regards,

Ola

Tson

Ola Thuresson posted this 2 weeks ago

Dear Alexey,

My export script works brilliantly, but...

Everything works except the last stage in the workflow. After I have replaced the picture object, the document suddenly arrives at the Export stage as an unprocessed (non-analyzed) document:

-1 2 7/3/2018 2:02:05 PM Document 1: Unable to export a non-analyzed document

 

Stage 1 Verification - check

Stage 2 Custom Export Script Stage - check

Stage 3 ABBYY standard Export Stage - here ABBYY throws the exception

Stage 4 Training -

 

Regards,

Ola

Tson

AlexeyEfremov posted this 2 weeks ago

Hello Ola,

I have consulted the developers, and it turns out that after calling

Document.Pages[0].ReplaceImage(FinalPicture);

the document always becomes unrecognized, and you have to run recognition again.

(Note that the license counter will be decreased again.)

Because I do not know the details of your export, I cannot suggest a proper workaround.

The one I see is to run the process twice, which is most likely not suitable for you.

You can also customize it to save the data before the image replacement and restore the data after it.
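The save-and-restore idea can be outlined as follows. This is a rough sketch only: the three callbacks are hypothetical placeholders for code that reads the field values from the document, calls ReplaceImage, and writes the values back. No FlexiCapture calls are shown, and this thread does not confirm whether restoring the values is enough for the standard export stage to accept the document.

```csharp
using System;
using System.Collections.Generic;

static class FieldSnapshot
{
    // readFields, replaceImage and writeFields are hypothetical callbacks
    // supplied by the script; they are not part of any ABBYY API.
    public static void RunPreservingFields(
        Func<IDictionary<string, string>> readFields,
        Action replaceImage,
        Action<IDictionary<string, string>> writeFields)
    {
        // Copy the values so that later changes to the document cannot affect the snapshot.
        var saved = new Dictionary<string, string>(readFields());

        replaceImage();      // the step after which the recognized data is lost
        writeFields(saved);  // put the captured values back
    }
}
```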

 

Yours sincerely,

Alexey

 

 

 

 

Ola Thuresson posted this 2 weeks ago

Dear Alexey,

Thank you for the answer. I worked around it by excluding the ABBYY standard export and only using the custom export.

The training seems to work OK anyway, with no errors, but does ABBYY learn anything?

regards,

Ola

Tson

AlexeyEfremov posted this 2 weeks ago

Dear Ola,

May I suggest placing your custom export after the training? Will this work for you?

Regarding your question, I have to ask the developers, but probably not.

Yours sincerely,

Alexey

 

 

Ola Thuresson posted this 2 weeks ago

Dear Alexey,

Sure that would work.

Out of curiosity, it would be nice to know :)

Regards,

Ola

 

Tson

AlexeyEfremov posted this 2 weeks ago

Dear Ola,

 

Could you please check the log for the Training task in the Processing Server Monitor?

What does it say?

 

Yours sincerely,

Alexey

Ola Thuresson posted this 2 weeks ago

Dear Alexey,

Log with the export script after training:

-1 1 7/9/2018 3:58:26 PM Task processing is started

-1 2 7/9/2018 3:58:26 PM Document 1: Trying to use the document for training...

-1 3 7/9/2018 3:58:27 PM Document 1: Field training is not available. Document layout has not been modified.

-1 4 7/9/2018 3:58:27 PM Task processing is completed

Log with the export script before training:

-1 1 7/9/2018 4:07:24 PM Task processing is started

-1 2 7/9/2018 4:07:25 PM Document 1: Trying to use the document for training...

-1 3 7/9/2018 4:07:25 PM Document 1: Field training is not available. There is no Document Definition to train.

-1 4 7/9/2018 4:07:26 PM Task processing is completed

Regards,

Ola

Tson

AlexeyEfremov posted this 1 weeks ago

Dear Ola,

With the export script after training, everything works; you just have to move the regions for the fields.

With the export script before training, the training stage actually fails. By default, the properties of the training stage in the workflow define only one exit route, to Processed. There is no route to Exceptions, so the error with the document is ignored.

Kind regards,

Alexey
