OCR – Optical Character Recognition Explained

OCR has greatly improved the data entry process. It is not 100% accurate, but it can extract text that the other two screen-scraping methods (Full Text and Native) cannot, because it works with all applications, including Citrix. UiPath robots are equipped with optical character recognition (OCR), which allows a computer to distinguish a ‘B’ from a ‘D’, for example, even if the size or font is different. While recording, a UiPath user can run OCR, select the appropriate text within the window, and the robot will be able to locate that text on every later run, even if the text has moved to a different place. This makes OCR a dependable way to automate applications that expose no underlying text. The trade-off is accuracy: OCR reaches about 90%, compared with 100% for the other two methods.

Comparing OCR with other methods:

| Method | Speed | Accuracy | Background Execution | Extract Text Position | Extract Hidden Text | Support for Citrix |
|---|---|---|---|---|---|---|
| Full Text | 10/10 | 100% | Yes | No | Yes | No |
| Native | 8/10 | 100% | No | Yes | No | No |
| OCR | 3/10 | 90% | No | Yes | No | Yes |
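
The comparison can also be read as a small decision rule: pick the fastest method that still offers every capability you need. The sketch below encodes the values from the table in Python. The `ScrapingMethod` class and the `pick_method` helper are illustrative names of our own, not part of UiPath.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScrapingMethod:
    name: str
    speed: int            # score out of 10, taken from the comparison
    accuracy: int         # percent, taken from the comparison
    background: bool      # can run in the background
    text_position: bool   # can return the position of the text
    hidden_text: bool     # can read hidden text
    citrix: bool          # works with Citrix


METHODS = [
    ScrapingMethod("Full Text", 10, 100, True, False, True, False),
    ScrapingMethod("Native", 8, 100, False, True, False, False),
    ScrapingMethod("OCR", 3, 90, False, True, False, True),
]


def pick_method(background: bool = False, text_position: bool = False,
                hidden_text: bool = False, citrix: bool = False) -> Optional[ScrapingMethod]:
    """Return the fastest method that offers every requested capability."""
    candidates = [
        m for m in METHODS
        if (m.background or not background)
        and (m.text_position or not text_position)
        and (m.hidden_text or not hidden_text)
        and (m.citrix or not citrix)
    ]
    # None means that no single method covers the combination.
    return max(candidates, key=lambda m: m.speed, default=None)
```

For example, `pick_method(citrix=True)` selects OCR, while asking for both background execution and Citrix support returns `None`, because no row in the comparison offers both.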

Most used OCR Activities in UiPath:

OCR Text Exists:

It checks whether a given text is found in an indicated element by using OCR and returns a Boolean variable: True if the text exists, False otherwise.

Click OCR Text:

It searches for a given string in an indicated element or image using OCR and clicks it. By default, Google OCR is used.

Double Click OCR Text:

It searches for a given string in an indicated element or image using OCR and double-clicks it. By default, Google OCR is used.

Hover OCR Text:

It searches for a given string in an indicated element or image using OCR and hovers over it. By default, Google OCR is used.

Find OCR Text Position:

It searches for a given string in an indicated element and returns an element variable that contains the string. This is useful for locating other elements relative to text on the screen.
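
To make the idea of locating something relative to on-screen text concrete, here is a minimal Python sketch. It is not how UiPath implements the activity. It assumes the OCR engine has already produced a list of words with their rectangles; the `find_text_position` and `offset_point` helpers and the sample values are hypothetical.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]   # (left, top, width, height) in pixels
WordBox = Tuple[Rect, str]         # one recognized word and its rectangle


def find_text_position(words: List[WordBox], phrase: str) -> Optional[Rect]:
    """Return the bounding rectangle of the first run of words matching `phrase`."""
    target = phrase.lower().split()
    if not target:
        return None
    texts = [text.lower() for _, text in words]
    for start in range(len(texts) - len(target) + 1):
        if texts[start:start + len(target)] == target:
            rects = [rect for rect, _ in words[start:start + len(target)]]
            left = min(r[0] for r in rects)
            top = min(r[1] for r in rects)
            right = max(r[0] + r[2] for r in rects)
            bottom = max(r[1] + r[3] for r in rects)
            return (left, top, right - left, bottom - top)
    return None


def offset_point(rect: Rect, dx: int = 0, dy: int = 0) -> Tuple[int, int]:
    """Centre of `rect` shifted by (dx, dy), e.g. to reach a field next to a label."""
    left, top, width, height = rect
    return (left + width // 2 + dx, top + height // 2 + dy)


# Made-up OCR output for a line reading "Invoice Number 10423".
words = [
    ((10, 10, 60, 20), "Invoice"),
    ((75, 10, 70, 20), "Number"),
    ((160, 10, 50, 20), "10423"),
]
label = find_text_position(words, "invoice number")
```

Once the label rectangle is known, `offset_point(label, dx=100)` gives a point 100 pixels to the right of its centre, which is where a value next to the label would typically sit.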

Get OCR Text:

It retrieves a string and its information from an indicated element using the OCR screen-scraping method. This activity is generated automatically, together with its container, when you perform screen scraping. By default, the Google OCR engine is used, but you can easily change it to the engine of your choice.

OCR Engines:

OCR activities can be used to extract a string and its position from a given image by using OCR engines. As input, an OCR activity receives an Image variable that contains the image to be scanned. As output, it returns an IEnumerable<KeyValuePair<Rectangle, String>> variable, which contains the extracted text and its on-screen coordinates, and a String variable, which contains the extracted text.
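
The two outputs are closely related: the string is essentially the rectangle and text pairs read in order. The Python sketch below shows one way to go from pairs to plain text. It is an illustration only, and the `to_plain_text` helper and its `line_tolerance` parameter are our own assumptions, not UiPath's implementation.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]   # (left, top, width, height) in pixels


def to_plain_text(pairs: List[Tuple[Rect, str]], line_tolerance: int = 5) -> str:
    """Join (rectangle, text) pairs into lines, top to bottom and left to right."""
    lines: List[List[Tuple[Rect, str]]] = []
    for rect, text in sorted(pairs, key=lambda p: (p[0][1], p[0][0])):
        # Words whose top edge is within the tolerance of the line's first
        # word are treated as part of the same line.
        if lines and abs(rect[1] - lines[-1][0][0][1]) <= line_tolerance:
            lines[-1].append((rect, text))
        else:
            lines.append([(rect, text)])
    return "\n".join(
        " ".join(text for _, text in sorted(line, key=lambda p: p[0][0]))
        for line in lines
    )
```

Keeping the pairs around is what makes position-based activities possible, while the joined string is enough when you only need the text.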

How different types of OCR Engines vary from each other:

Google OCR (Tesseract OCR):

Google OCR is easy to use because it is built into UiPath. It is open source, so it is free to use, but it is the slowest option, and multi-page documents can take time. Support for additional languages can be added. It also supports color inversion and can restrict recognition to a set of allowed characters.
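
Restricting the allowed characters helps most on fields with a known alphabet, such as numbers. The Python sketch below only approximates the effect as a post-processing step on text that has already been recognized; the engine itself applies the restriction during recognition, which is not reproduced here. The `keep_allowed` helper is a hypothetical name.

```python
import string


def keep_allowed(text: str, allowed: str) -> str:
    """Drop every character of `text` that is not listed in `allowed`."""
    allowed_set = set(allowed)
    return "".join(ch for ch in text if ch in allowed_set)


# Made-up OCR result for an invoice number field.
raw = "Invoice No: 4821/A"
digits_only = keep_allowed(raw, string.digits)
```

Note the limitation of filtering afterwards: a digit misread as a letter is simply dropped rather than corrected, so setting the allowed characters on the engine is usually the better choice.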

Google Cloud OCR:

Google Cloud OCR is fast and accurate, but it requires a Google Cloud API key. The API key comes with a one-month free trial, after which you need to pay.

Microsoft OCR:

Microsoft OCR is similar to the built-in Google OCR option. It is also free and easy to use. For some document types it was more accurate than Google OCR, but for others it was less accurate. It supports multiple languages by default and is better suited to extracting text from a large area.

Microsoft Cloud OCR:

Microsoft Cloud OCR uses the Microsoft Computer Vision API. It is also free, but you need to sign up for an API key.

ABBYY OCR:

ABBYY OCR requires you to install ABBYY FineReader on your local machine and to purchase a license. FineReader can do many things, including converting a scanned PDF to a searchable PDF or to other document formats.

ABBYY Cloud OCR:

ABBYY Cloud OCR requires an Application ID and password from ABBYY Cloud. ABBYY Cloud has a one-month trial, after which you need to pay. Like Google Cloud OCR, it is fast and accurate.

Benefits of OCR:

Faster Searches:
OCR software improves productivity because it enables fast retrieval of data when needed. The time and effort that employees used to spend extracting the relevant data can now be spent on core competencies.

Reduced Cost:
Besides helping an organization in cutting down the cost of hiring manpower for data extraction, OCR also helps in reducing several other costs like printing, copying, shipping charge, etc.

Reduced Errors:
Many organizations struggle with data loss and inaccuracy. OCR helps reduce the errors introduced by manual data entry.

More Storage Space:
The fewer paper documents there are, the more space is available. Organizations have long wanted to go ‘paperless’, and OCR makes that possible. It also saves the expense of file cabinets.

Ready Availability:
By scanning the information off documents through OCR, the data can be made available in several different places. One can carry it in a USB drive and retrieve the wanted information with just a few clicks.

Efficient Management:
With OCR, managing the data of confidential documents becomes easier because the process is automated.

Improved Security:
All organizations give the utmost importance to the security of documents. Theft or fire poses much less of a threat when the documents are scanned and stored in digital formats. Furthermore, access can be limited to avoid mishandling of the documents.

Text Translate:
The Translator Text API is easy to integrate in your applications, websites, tools, and solutions. It allows you to add multi-language user experiences in more than 100 languages, and can be used on any hardware platform with any operating system for text-to-text language translation.
The Translator Text API is part of the API collection of machine learning and AI algorithms in the cloud, and is readily consumable in your development projects. Translator provides multi-language support for translation, transliteration, language detection, and dictionaries.
The Cognitive Activities pack helps you use Google’s, IBM’s, Stanford’s and Microsoft’s APIs, and automatically process the information that they help you extract. The package enables you to translate text from one language to another, as well as extract relevant information from a given piece of text such as the overall sentiment, key phrases, possible encountered errors and the language used.

Use Cases:

OCR usually goes hand in hand with PDF automation, image automation, and text translation, because these are difficult to automate in virtual environments. To illustrate, we created an automation that reads an organization's invoice details from a scanned invoice PDF and then enters the information into an Excel sheet.

  1. Read a scanned PDF using OCR

  • Create a new Sequence.
  • Open the scanned invoice PDF. We recommend using Adobe Acrobat Reader for compatibility reasons.
  • Use the Get OCR Text activity to capture the required data and store it in an output variable, which can later be written to Excel.
  • Select an appropriate OCR engine. In this example we used the Tesseract OCR engine.
  • Capture the other required details in the same way, storing each value in its own output variable.
  • Once all the required data has been fetched from the scanned PDF, write it to the Excel sheet using the Write Range activity.
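
Outside UiPath, the same read, extract, and write pattern can be sketched in a few lines of Python. The example below assumes the OCR step has already produced plain text and uses a CSV file as a stand-in for the Excel sheet. The field names, label patterns, and sample invoice text are all made up for illustration; a real invoice layout will need its own patterns.

```python
import csv
import io
import re
from typing import Dict, List

# Hypothetical label patterns for three invoice fields.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.I),
    "invoice_date": re.compile(r"Date\s*:?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.I),
    "total": re.compile(r"Total\s*:?\s*(\d[\d.,]*)", re.I),
}


def extract_fields(ocr_text: str) -> Dict[str, str]:
    """Pull each field out of the OCR text; missing fields become empty strings."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else ""
    return fields


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELD_PATTERNS), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up OCR output for one scanned invoice.
sample = "ACME Ltd\nInvoice No: INV-1042\nDate: 03/11/2023\nTotal: 1,250.00\n"
row = extract_fields(sample)
```

Because OCR is not fully accurate, it is worth checking for empty or implausible fields before writing the row, rather than trusting every extracted value.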

  2. Read a scanned image using OCR

  • Create a new Sequence.
  • Open the scanned image by using the “Start Process” activity. We recommend using Adobe Acrobat Reader for compatibility reasons.
  • Extract the Invoice To data from the image by using the “Get OCR Text” activity with the ABBYY Cloud OCR engine, and store the output in an invoice-to variable.
  • Extract the invoice number from the image in the same way, and store the output in an invoice-number variable.
  • Similarly, extract each remaining required field from the image and store it in its own output variable.
  • Once all the required data has been fetched from the scanned image, write it to the Excel sheet using the “Write Range” activity.

  3. Translate text using OCR

  • Create a new Sequence.
  • Use the Read specific page activity to read a particular page of the PDF, and store the output in a variable (PDF in this example).
  • Add a Message Box that displays the value returned by the Read specific page activity.
  • Next, use the Translator Text activity (Uipath.MicrosoftTranslatorText.Activities.TranslatorText) to translate the value into another language. This activity needs a subscription key from Microsoft Azure, which is provided for free. Note that the activity can translate only 300 characters. Assign its output to a variable.
  • Use a Message Box to see the output value.
  • Use the Write Text File activity to save the output to a document.
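
A note on the 300-character limit from the translation steps: if the limit applies to each call (an assumption on our part), longer text can be split into chunks and translated piece by piece. The Python sketch below shows the splitting logic. The `translate` argument is a hypothetical callable standing in for whatever translation service is used; it is not a UiPath or Azure API.

```python
from typing import Callable, List


def split_for_translation(text: str, limit: int = 300) -> List[str]:
    """Split `text` into chunks of at most `limit` characters, breaking on whitespace."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size pieces.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def translate_long_text(text: str, translate: Callable[[str], str], limit: int = 300) -> str:
    """Translate `text` chunk by chunk using the supplied `translate` callable."""
    return " ".join(translate(chunk) for chunk in split_for_translation(text, limit))
```

Splitting on whitespace can cut a sentence in half, which may hurt translation quality, so splitting on sentence boundaries is worth considering when the text allows it.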
