December 10, 2020

Python For Character Recognition – Tesseract

Shubham Prasad (whoami.kdm)

DURATION

10min

INSTALLATION PYTHON (3.X)

Open a terminal or command prompt and run:
pip install pytesseract
pip install opencv-python
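
Note that pytesseract is only a wrapper: it calls the Tesseract OCR executable, which is installed separately from the pip packages. As a rough check, the sketch below looks for that executable on the PATH. The helper name find_tesseract and the candidate file names are assumptions made for this example, not part of pytesseract.

```python
import shutil


def find_tesseract(names=("tesseract", "tesseract.exe")):
    """Return the full path of the first matching executable on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


path = find_tesseract()
if path is None:
    print("Tesseract not found on PATH; install the OCR engine first.")
else:
    print("Tesseract found at:", path)
```

If Tesseract is installed somewhere outside the PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to the full path of the executable.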

OPENING A SIMPLE IMAGE

Import cv2.
Import pytesseract.
Save the test image in the same directory as the script.
Read the image with cv2.imread(), passing the file name as the parameter, and store the result in a variable.
To resize the image, call cv2.resize() with the image and the target size as (width, height).
Display the image with cv2.imshow('window_name', image).
Add cv2.waitKey(0) to keep the window open until a key is pressed.

import pytesseract
import cv2
img = cv2.imread('test.jpg')
img = cv2.resize(img, (720, 480))
cv2.imshow('Result', img)
cv2.waitKey(0)
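
Resizing to a fixed size such as (720, 480) stretches any image whose aspect ratio is not 3:2, and distorted letters may be harder to recognize. A minimal sketch of computing a size that keeps the original proportions is shown below; the helper name scaled_size is hypothetical.

```python
def scaled_size(width, height, target_width):
    """Return (new_width, new_height) that keeps the original aspect ratio."""
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("all dimensions must be positive")
    scale = target_width / width
    # Never let the height collapse to zero for very wide images
    return target_width, max(1, round(height * scale))


print(scaled_size(1920, 1080, 720))  # (720, 405)
```

With OpenCV, the original size can be read with h, w = img.shape[:2], and the result passed directly as the size argument: cv2.resize(img, scaled_size(w, h, 720)).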

CONVERTING IMAGE TO STRING

Import cv2 and pytesseract.
Save the test image in the same directory as the script.
Read the image with cv2.imread(), passing the file name as the parameter, and store the result in a variable.
Display the image with cv2.imshow('window_name', image).
To convert the image to a string, call pytesseract.image_to_string(image) and store the result in a variable.
Print the string.
Add cv2.waitKey(0) to keep the window open until a key is pressed.

import pytesseract
import cv2
img = cv2.imread('test.jpg')

img = cv2.resize(img, (600, 360))
print(pytesseract.image_to_string(img))
cv2.imshow('Result', img)
cv2.waitKey(0)
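
The string returned by image_to_string() is raw OCR output, so it may contain blank lines, stray spaces and a trailing form feed character. A small cleanup sketch is shown below; the function name clean_ocr_text and the sample string are made up for illustration.

```python
def clean_ocr_text(raw):
    """Strip whitespace from each line and drop empty lines and form feeds."""
    lines = (line.strip() for line in raw.replace("\f", "").splitlines())
    return "\n".join(line for line in lines if line)


sample = "  Hello \n\n World\n\f"  # made-up stand-in for OCR output
print(clean_ocr_text(sample))
```

In the script above, this would be used as print(clean_ocr_text(pytesseract.image_to_string(img))).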

PRINTING THE EXACT POSITION OF TEXT/NUMBERS

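The image_to_boxes() function draws an imaginary box around each recognized character and returns one line per character: the character itself, followed by four values that describe its box (and a trailing page number). The coordinates are measured from the bottom-left corner of the image:
a. x coordinate of the left edge.
b. y coordinate of the bottom edge.
c. x coordinate of the right edge (the diagonally opposite corner).
d. y coordinate of the top edge (the diagonally opposite corner).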
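
To make the format concrete, here is a minimal sketch that parses such output into named fields. The CharBox type, the parse_boxes helper and the sample string are assumptions for this example; real output comes from pytesseract.image_to_boxes().

```python
from typing import NamedTuple


class CharBox(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_boxes(box_text):
    """Parse 'char left bottom right top page' lines into CharBox records."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.split(" ")
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(v) for v in parts[1:5])
        boxes.append(CharBox(parts[0], left, bottom, right, top))
    return boxes


sample = "H 12 300 28 330 0\ni 30 300 36 330 0"  # made-up sample output
for box in parse_boxes(sample):
    print(box)
```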

Import cv2 and pytesseract.
Save the test image in the same directory as the script.
Read the image with cv2.imread(), passing the file name as the parameter, and store the result in a variable.
Display the image with cv2.imshow('window_name', image).
To get the coordinates, call pytesseract.image_to_boxes(image) and store the result in a variable.
Print the returned string.
Add cv2.waitKey(0) to keep the window open until a key is pressed.

import pytesseract
import cv2
img = cv2.imread('test.jpg')
img = cv2.resize(img, (600, 360))
print(pytesseract.image_to_boxes(img))
cv2.imshow('Result', img)
cv2.waitKey(0)

DRAW BOXES AROUND THE DETECTED CHARACTERS AND LABEL THEM

To draw boxes around the text and label them, we need two OpenCV functions:

cv2.rectangle(image, first_corner_point, opposite_corner_point, BGR_value_of_color, thickness_of_box)
cv2.putText(image, text, bottom_left_point_of_text, font_name, font_scale, BGR_value_of_color, thickness_of_text)
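
One detail that is easy to miss: OpenCV expects color tuples in blue, green, red order, not red, green, blue. A tiny sketch of the conversion is shown below; the helper name rgb_to_bgr is hypothetical.

```python
def rgb_to_bgr(color):
    """Reverse an (R, G, B) tuple into the (B, G, R) order OpenCV expects."""
    r, g, b = color
    return (b, g, r)


print(rgb_to_bgr((255, 50, 50)))  # (50, 50, 255)
```

So a tuple such as (50, 50, 255) passed to cv2.rectangle() draws a mostly red box, not a blue one.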

Import pytesseract and cv2.
Read the image using cv2.imread().
Store the height and width of the image in two variables using img.shape (an attribute, not a function).
Get the imaginary boxes around each character using pytesseract.image_to_boxes(img).
Create a for loop over the lines of the result and split each line into a list for easy access.
Initialize four variables (x, y, w, h) for the left x, bottom y, right x and top y coordinates.
Assign their respective values from the list created above.
As the list elements are strings, convert them to integers [ex: int(b[1])].
Tesseract measures y from the bottom of the image and OpenCV from the top, so subtract each y value from the image height.
Use the cv2.rectangle() function to draw boxes around the characters.
Use cv2.putText() to add a label next to each character.
Use the cv2.imshow() function to display the final image.
Add cv2.waitKey(0) to keep the window open until a key is pressed.

import pytesseract
import cv2

img = cv2.imread('test.jpg')
img = cv2.resize(img, (600, 360))
hImg, wImg, _ = img.shape  # image height and width in pixels

boxes = pytesseract.image_to_boxes(img)
for b in boxes.splitlines():
  # each line holds: character, left, bottom, right, top, page
  b = b.split(' ')
  print(b)
  # despite their names, w and h hold the right x and top y of the box
  x, y, w, h = int(b[1]), int(b[2]), int(b[3]), int(b[4])
  # Tesseract counts y from the bottom, OpenCV from the top, so flip y
  cv2.rectangle(img, (x, hImg - y), (w, hImg - h), (50, 50, 255), 1)
  cv2.putText(img, b[0], (x, hImg - y + 13), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (50, 205, 50), 1)

cv2.imshow('Detected text', img)
cv2.waitKey(0)
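
The hImg - y arithmetic in the script is the part that most often goes wrong, so here it is isolated as a small sketch. The function name to_opencv_corners and the sample numbers are made up; the sample assumes an image that is 360 pixels tall.

```python
def to_opencv_corners(left, bottom, right, top, image_height):
    """Convert a bottom-left-origin box to two top-left-origin corner points."""
    top_left = (left, image_height - top)
    bottom_right = (right, image_height - bottom)
    return top_left, bottom_right


print(to_opencv_corners(12, 300, 28, 330, image_height=360))
# ((12, 30), (28, 60))
```

The two returned points can be passed as the corner arguments of cv2.rectangle().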

CONVERTING IMAGE-TEXT TO AUDIO

To convert an image to audio, we first convert the image to text and then the text to audio.

Import pytesseract and cv2.
Import os.
Open a command prompt and run: pip install gtts
Add the import: from gtts import gTTS.
Follow the steps above to convert the image to a string.
Store the extracted string in a variable.
Create the audio using the gTTS() function, passing the text and the language as parameters.
Save the audio as an MP3 file using the save() function.
Play the audio using os.system('file_name').

import pytesseract
import cv2
from gtts import gTTS
import os

img = cv2.imread('test.jpg')
img = cv2.resize(img, (600, 360))
hImg, wImg, _ = img.shape

boxes = pytesseract.image_to_boxes(img)
xy = pytesseract.image_to_string(img)  # full text, used for the audio
for b in boxes.splitlines():
  b = b.split(' ')
  x, y, w, h = int(b[1]), int(b[2]), int(b[3]), int(b[4])
  cv2.rectangle(img, (x, hImg - y), (w, hImg - h), (50, 50, 255), 1)
  cv2.putText(img, b[0], (x, hImg - y + 13), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (50, 205, 50), 1)

cv2.imshow('Detected text', img)

audio = gTTS(text=xy, lang='en', slow=False)
audio.save("saved_audio.mp3")  # gTTS writes MP3 data, so use an .mp3 extension
os.system("saved_audio.mp3")  # opens the file with the default player on Windows
cv2.waitKey(0)  # keep the image window open until a key is pressed
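
Passing just a file name to os.system() relies on Windows file associations, so it will not work as-is on macOS or Linux. A rough cross-platform sketch is shown below. The helper names player_command and play are hypothetical, and the Linux branch assumes the xdg-open utility is installed, which is common on desktop systems but not guaranteed.

```python
import platform
import subprocess


def player_command(path, system=None):
    """Return the command list that opens `path` with the default application."""
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe built-in; the empty string is the window title
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    return ["xdg-open", path]  # assumption: a Linux desktop with xdg-open


def play(path):
    """Open the audio file in the default player without raising on failure."""
    subprocess.run(player_command(path), check=False)


print(player_command("saved_audio.mp3", system="Linux"))
```

In the script above, play("saved_audio.mp3") would take the place of the os.system() call.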
