Tesseract is an open-source optical character recognition (OCR) engine, and pytesseract is a Python wrapper around it. It is used to recognize characters embedded in an image. Combined with a library such as OpenCV, it can both localize the text in an image (text detection) and determine what the text says (text recognition).
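Open a terminal or command prompt and type:
pip install pytesseract
pip install opencv-python
pytesseract only wraps the Tesseract engine, so the Tesseract executable itself must also be installed and available on the system PATH.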
Import cv2.
Import pytesseract.
Save the test image in the same directory as the script.
Create a variable to store the image using the cv2.imread() function, passing the name of the image file as the parameter.
To resize the image, use the cv2.resize() function and pass the required resolution as (width, height).
Display the image using cv2.imshow('window_name', image).
Add cv2.waitKey(0) to keep the window open until a key is pressed.
import pytesseract
import cv2
img = cv2.imread('test.jpg')
img = cv2.resize(img, (720, 480))
cv2.imshow('Result', img)
cv2.waitKey(0)
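Resizing to a fixed resolution such as (720, 480) stretches any image whose proportions differ, which can distort the characters. As a minimal sketch, the hypothetical helper below (fit_within is not part of OpenCV) computes a target size that fits inside a bounding resolution while keeping the aspect ratio:

```python
def fit_within(width, height, max_width, max_height):
    """Return (new_width, new_height) that fits inside the given bounds
    while keeping the original aspect ratio."""
    # Use the smaller of the two scale factors so both sides fit.
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Example: a 1920x1080 image fitted into a 720x480 window.
print(fit_within(1920, 1080, 720, 480))  # (720, 405)
```

The result can be passed directly to cv2.resize(), which expects the size as (width, height). Note that img.shape reports (height, width, channels), so the two values have to be swapped when they are read from the image.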
Import cv2 and pytesseract.
Save the test image in the same directory as the script.
Create a variable to store the image using the cv2.imread() function, passing the name of the image file as the parameter.
To extract the text, pass the image variable to pytesseract.image_to_string(image) and store the result in a variable.
Print the string.
Display the image using cv2.imshow('window_name', image).
Add cv2.waitKey(0) to keep the window open until a key is pressed.
import pytesseract
import cv2
img = cv2.imread('test.jpg')
img = cv2.resize(img, (600, 360))
print(pytesseract.image_to_string(img))
cv2.imshow('Result', img)
cv2.waitKey(0)
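The string returned by image_to_string() often contains blank lines, stray spaces, and, with recent Tesseract versions, a trailing form feed character. The sketch below shows one possible way to tidy it before printing. The helper name clean_ocr_text and the sample string are made up for illustration and do not come from pytesseract:

```python
def clean_ocr_text(raw):
    """Trim each line, drop blank lines, and remove form feed characters."""
    # Treat a form feed (page separator) like a line break.
    lines = (line.strip() for line in raw.replace("\f", "\n").splitlines())
    return "\n".join(line for line in lines if line)


# Made-up sample standing in for pytesseract.image_to_string(img).
sample = "  Hello World \n\n OCR test\n\f"
print(clean_ocr_text(sample))
```

In the script above, the same helper could wrap the OCR call, for example print(clean_ocr_text(pytesseract.image_to_string(img))).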
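The image_to_boxes() function finds a bounding box around each recognized character and returns one line of text per character. Each line contains the character itself, four coordinates measured from the bottom-left corner of the image, and a page number. The four coordinates are:
a. x coordinate of the left edge.
b. y coordinate of the bottom edge.
c. x coordinate of the right edge.
d. y coordinate of the top edge.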
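Because image_to_boxes() returns plain text, it helps to turn each line into a structured record before using it. The following is a minimal sketch that assumes the "char left bottom right top page" line layout. CharBox, parse_boxes, and the sample string are illustrative names and data, not part of pytesseract:

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_boxes(box_text: str) -> List[CharBox]:
    """Parse 'char left bottom right top page' lines into CharBox records."""
    boxes = []
    for line in box_text.splitlines():
        # Split from the right so the five numeric fields are always separated.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip lines that do not match the expected layout
        char, left, bottom, right, top, _page = parts
        boxes.append(CharBox(char, int(left), int(bottom), int(right), int(top)))
    return boxes


# Made-up sample standing in for pytesseract.image_to_boxes(img).
sample = "H 23 300 41 325 0\ni 45 300 50 326 0"
for box in parse_boxes(sample):
    print(box.char, box.left, box.bottom, box.right, box.top)
```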
Import cv2 and pytesseract.
Save the test image in the same directory as the script.
Create a variable to store the image using the cv2.imread() function, passing the name of the image file as the parameter.
To get the coordinates, pass the image variable to pytesseract.image_to_boxes(image) and store the result in a variable.
Print the string.
Display the image using cv2.imshow('window_name', image).
Add cv2.waitKey(0) to keep the window open until a key is pressed.
import pytesseract
import cv2
img = cv2.imread('test.jpg')
img = cv2.resize(img, (600, 360))
print(pytesseract.image_to_boxes(img))
cv2.imshow('Result', img)
cv2.waitKey(0)
To draw boxes around the text and label each character, we need two OpenCV functions:
cv2.rectangle(image, (x1, y1), (x2, y2), color, thickness_of_box)
cv2.putText(image, text, (x, y), font, font_scale, color, thickness_of_text)
OpenCV expects colors as (B, G, R) tuples rather than RGB.
Import pytesseract and cv2.
Read the image using cv2.imread().
Create variables to store the height and width of the image using img.shape (an attribute, not a function).
Get the bounding box of each character using pytesseract.image_to_boxes(img).
Create a for loop that splits each line of the result into a list for easy access.
Initialize four variables for the left x, bottom y, right x, and top y coordinates.
Assign their respective values from the list created above.
As the list elements are strings, convert them to integers (e.g. int(b[1])).
Because Tesseract measures y from the bottom of the image and OpenCV measures it from the top, subtract each y value from the image height.
Use the cv2.rectangle() function to draw boxes around the characters.
Use cv2.putText() to add labels next to the characters.
Use the cv2.imshow() function to display the final image.
Keep the window open until a key is pressed using cv2.waitKey(0).
import pytesseract
import cv2
img = cv2.imread('test.jpg')
img = cv2.resize(img, (600, 360))
hImg, wImg, _ = img.shape
boxes = pytesseract.image_to_boxes(img)
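for b in boxes.splitlines():
    # Each line has the form "char left bottom right top page"
    b = b.split(' ')
    print(b)
    x1, y1, x2, y2 = int(b[1]), int(b[2]), int(b[3]), int(b[4])
    # Tesseract measures y from the bottom edge, OpenCV from the top, so flip y
    cv2.rectangle(img, (x1, hImg - y1), (x2, hImg - y2), (50, 50, 255), 1)
    cv2.putText(img, b[0], (x1, hImg - y1 + 13), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (50, 205, 50), 1)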
cv2.imshow('Detected text', img)
cv2.waitKey(0)
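The subtraction from the image height in the code above is needed because Tesseract reports y coordinates from the bottom edge of the image, while OpenCV draws from the top edge. A minimal sketch of that conversion in isolation, using a hypothetical helper name (to_opencv_corners) and made-up coordinates:

```python
def to_opencv_corners(left, bottom, right, top, img_height):
    """Convert a bottom-left-origin box to the top-left and bottom-right
    corners of the same box in a top-left-origin coordinate system."""
    top_left = (left, img_height - top)
    bottom_right = (right, img_height - bottom)
    return top_left, bottom_right


# A character box in a 360-pixel-high image (made-up numbers).
print(to_opencv_corners(23, 300, 41, 325, 360))  # ((23, 35), (41, 60))
```

The two returned points can be passed to cv2.rectangle() as its corner arguments. Only the y values change; the x values are the same in both systems.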
To convert an image to audio, we first convert the image to text and then the text to audio.
Import pytesseract and cv2.
Import os.
Open a command prompt and type: pip install gtts
Import gTTS using: from gtts import gTTS
Follow the steps above to convert the image to a string.
Store the extracted string in a variable.
Create the audio using the gTTS() function, passing the text and the language as parameters.
Save the audio using the save() function.
Play the audio by opening the saved file with os.system('file_name').
import os
import cv2
import pytesseract
from gtts import gTTS
img = cv2.imread('test.jpg')
img = cv2.resize(img, (600, 360))
hImg, wImg, _ = img.shape
boxes = pytesseract.image_to_boxes(img)
xy = pytesseract.image_to_string(img)
for b in boxes.splitlines():
    b = b.split(' ')
    x1, y1, x2, y2 = int(b[1]), int(b[2]), int(b[3]), int(b[4])
    cv2.rectangle(img, (x1, hImg - y1), (x2, hImg - y2), (50, 50, 255), 1)
    cv2.putText(img, b[0], (x1, hImg - y1 + 13), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (50, 205, 50), 1)
cv2.imshow('Detected text', img)
# gTTS produces MP3 data, so save the file with an .mp3 extension
audio = gTTS(text=xy, lang='en', slow=False)
audio.save("saved_audio.mp3")
# Opening a file by its name like this relies on Windows file associations
os.system("saved_audio.mp3")
# Keep the image window open until a key is pressed
cv2.waitKey(0)
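Passing a bare file name to os.system() generally opens the file only on Windows. As a hedged sketch, the hypothetical helper below (build_open_command is not part of any library) picks a commonly available opener for each platform. It assumes that open exists on macOS and xdg-open on the Linux desktop in use, and the file name speech.mp3 is a placeholder:

```python
import platform
from typing import List, Optional


def build_open_command(path: str, system: Optional[str] = None) -> List[str]:
    """Return a command that opens `path` with the default application."""
    system = system or platform.system()
    if system == "Windows":
        # The empty string is the window title argument that `start` expects.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]
    # Assumption: a Linux desktop with xdg-open installed.
    return ["xdg-open", path]


print(build_open_command("speech.mp3", system="Linux"))  # ['xdg-open', 'speech.mp3']
```

The returned list can be handed to subprocess.run() in place of the os.system() call.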