Image Entity Recognition with OCR

An entity is a word or sequence of words that consistently refers to the same thing. Every detected entity is classified into a predefined category. For example, a named entity recognition (NER) model might detect the word “London” in a text and classify it as ‘Geography’.

However, our goal is to find entities in images. To do that, we first need to extract the text from the images, and for that we use a technique called OCR.

What is Optical Character Recognition (OCR)?

OCR stands for Optical Character Recognition. It is a widely used technology for recognizing text inside images, such as scanned documents and photos. OCR converts any image containing text (typed, handwritten, or printed) into a machine-readable text format.

To extract the text, we use the open-source OCR engine Tesseract, which can be called from Python through the Pytesseract package.

Techniques used

OpenCV
spaCy
NLTK
OCR
Regex
Pandas

How does Pytesseract work?

Pytesseract reports the recognized text at five hierarchical levels, so we can collect the complete text step by step:

Step 1 -> detect the complete page
Step 2 -> detect individual blocks of the image
Step 3 -> detect paragraphs
Step 4 -> detect lines
Step 5 -> detect words
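A minimal sketch of reading these levels, assuming the `pytesseract` package, Pillow, and a local Tesseract install. `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` returns parallel lists (`level`, `text`, `left`, `top`, `width`, `height`, and others) with one entry per detected element, where `level` runs from 1 (page) to 5 (word). The helper names `words_from_data` and `words_from_image` are hypothetical; the pure filtering step is kept separate so it can be tried on a hand-made dictionary.

```python
from typing import Dict, List, Tuple

# Tesseract layout levels as reported in the "level" field of image_to_data
LEVELS = {1: "page", 2: "block", 3: "paragraph", 4: "line", 5: "word"}

WordBox = Tuple[str, int, int, int, int]  # (text, left, top, width, height)


def words_from_data(data: Dict[str, list]) -> List[WordBox]:
    """Keep only word-level (level 5) entries that contain non-blank text."""
    words = []
    for i, level in enumerate(data["level"]):
        text = data["text"][i].strip()
        if level == 5 and text:
            words.append((text, data["left"][i], data["top"][i],
                          data["width"][i], data["height"][i]))
    return words


def words_from_image(image_path: str) -> List[WordBox]:
    """Run Tesseract on an image file and return its words with their boxes."""
    # Imported lazily so the pure helper above works without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(Image.open(image_path),
                                     output_type=pytesseract.Output.DICT)
    return words_from_data(data)
```

Filtering on level 5 and discarding blank strings matters because Tesseract also emits empty entries for the page, block, paragraph, and line containers.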

To detect entities, we collected the individual words from each image at step 5 (word level) and drew a rectangle over each word using its bounding-box coordinates.
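Tesseract describes each word box as `left, top, width, height`, while OpenCV's `cv2.rectangle` expects two corner points. The sketch below, assuming `opencv-python` is installed, converts between the two and draws a green box over every word; the function names, colour, and line thickness are illustrative choices, not taken from the original project.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]


def box_corners(left: int, top: int, width: int, height: int) -> Tuple[Point, Point]:
    """Convert a Tesseract (left, top, width, height) box into OpenCV corner points."""
    return (left, top), (left + width, top + height)


def draw_word_boxes(image_path: str,
                    words: Iterable[Tuple[str, int, int, int, int]],
                    out_path: str) -> None:
    """Draw a rectangle over every (text, left, top, width, height) word and save it."""
    import cv2  # lazy import: only the drawing step needs OpenCV

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    for _text, left, top, width, height in words:
        top_left, bottom_right = box_corners(left, top, width, height)
        # BGR colour (0, 255, 0) is green; 2 is the line thickness in pixels
        cv2.rectangle(image, top_left, bottom_right, (0, 255, 0), 2)
    cv2.imwrite(out_path, image)
```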

We applied the same procedure to the entire dataset, collected the individual words, and saved them to a CSV file.
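A small sketch of the CSV step using only the standard-library `csv` module, so it runs without extra dependencies. The column names are hypothetical; the post does not list the actual CSV schema.

```python
import csv
from typing import Iterable, TextIO, Tuple

# Hypothetical column layout: source image, the word, and its Tesseract box
FIELDS = ["image", "word", "left", "top", "width", "height"]


def write_word_rows(rows: Iterable[Tuple[str, str, int, int, int, int]], fh: TextIO) -> int:
    """Write a header plus one CSV row per word; return the number of word rows."""
    writer = csv.writer(fh)
    writer.writerow(FIELDS)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

Open the output with `open("words.csv", "w", newline="")` so the `csv` module controls line endings. With pandas, the equivalent is `pd.DataFrame(rows, columns=FIELDS).to_csv("words.csv", index=False)`.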

To detect entities, we need a class label for each word. To build custom entity recognition for images, we used the BIO tagging scheme, where B means the token begins an entity, I means the token is inside an entity, and O means the token is outside any entity. Using five different class labels, we converted the unstructured data into a structured format.
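To make the BIO scheme concrete, the sketch below tags a list of words from entity spans given as word indices. It assumes spans do not overlap; the labels `ORG` and `GPE` in the usage example are placeholders, not the project's actual class labels.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (first word index, end word index exclusive, label)


def spans_to_bio(num_words: int, spans: List[Span]) -> List[str]:
    """Tag each word as B-<label>, I-<label>, or O from non-overlapping spans."""
    tags = ["O"] * num_words
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first word of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining words of the entity
    return tags
```

For the words `["Invoice", "from", "Acme", "Corp", "London"]` with spans `[(2, 4, "ORG"), (4, 5, "GPE")]`, this yields `["O", "O", "B-ORG", "I-ORG", "B-GPE"]`.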

The class labels we used to train the models are:

To train the custom entity recognizer, we started from a pre-trained spaCy model. We therefore converted the data into spaCy's training format: for each image, the complete text and its corresponding words and labels are stored as a dictionary. We applied this process to the entire dataset and then split the data into a training set and a test set.
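A sketch of that conversion, assuming the common spaCy layout of one `(text, {"entities": [(start_char, end_char, label)]})` tuple per example; spaCy 2.x training loops consume it directly, and in spaCy 3.x the dictionary can be passed to `Example.from_dict`. It also assumes the words of an image are joined by single spaces. The helper names and the 80/20 split are illustrative, not the values used in the project.

```python
import random
from typing import Dict, List, Sequence, Tuple

TrainItem = Tuple[str, Dict[str, list]]


def to_spacy_example(words: List[str], tags: List[str]) -> TrainItem:
    """Join words with spaces and turn their BIO tags into character-offset entities."""
    text = " ".join(words)
    entities = []
    current = None  # [start_char, end_char, label] of the entity being built
    offset = 0
    for word, tag in zip(words, tags):
        start, end = offset, offset + len(word)
        offset = end + 1  # skip the joining space
        if tag.startswith("I-") and current is not None and current[2] == tag[2:]:
            current[1] = end  # extend the open entity to this word
            continue
        if current is not None:
            entities.append(tuple(current))
            current = None
        if tag != "O":
            # a B- tag, or a stray I- tag, starts a new entity
            current = [start, end, tag[2:]]
    if current is not None:
        entities.append(tuple(current))
    return text, {"entities": entities}


def split_train_test(items: Sequence, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle reproducibly and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```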

We trained for 50 epochs; at the end of training, the model reached 94% accuracy, 91% precision, and 90.6% recall. We then tested the trained model on around 20 images and checked the results.

However, the model returns one prediction per word, based on the BIO labels we assigned to each word during training. To recover complete entities, we collected each predicted word's index position along with the left and right positions returned by Pytesseract; consecutive words whose BIO labels share the same entity name are then merged into a single detected entity.
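One way to do this merge is sketched below. It assumes each predicted word arrives in reading order with its BIO tag and its Tesseract box; a `B-` tag opens an entity, following `I-` tags with the same label extend it, and the entity's box is the union of its word boxes. This is an illustrative rule, not necessarily the exact logic used in the project, and the function name is hypothetical.

```python
from typing import List, Tuple

# (word, BIO tag, left, top, width, height) for each word, in reading order
Pred = Tuple[str, str, int, int, int, int]
# (entity text, label, left, top, right, bottom)
Ent = Tuple[str, str, int, int, int, int]


def merge_predictions(preds: List[Pred]) -> List[Ent]:
    """Group consecutive B-/I- words with the same label into boxed entities."""
    entities: List[Ent] = []
    current = None  # [words, label, left, top, right, bottom]
    for word, tag, left, top, width, height in preds:
        right, bottom = left + width, top + height
        if tag.startswith("I-") and current is not None and current[1] == tag[2:]:
            current[0].append(word)
            # grow the entity box so it covers this word as well
            current[2] = min(current[2], left)
            current[3] = min(current[3], top)
            current[4] = max(current[4], right)
            current[5] = max(current[5], bottom)
            continue
        if current is not None:
            entities.append((" ".join(current[0]), *current[1:]))
            current = None
        if tag != "O":
            current = [[word], tag[2:], left, top, right, bottom]
    if current is not None:
        entities.append((" ".join(current[0]), *current[1:]))
    return entities
```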

Finally, after adding these index positions, the approach works well for detecting entities in images.

A few predicted images:
