A STUDY TOWARDS MULTI-SCRIPT ARTISTIC IMAGE ANALYSIS
Date
2023
Publisher
Aliah University
Abstract
Computational perception has changed dramatically, moving from handcrafted
feature-based techniques to deep learning. Scene text identification and
recognition, an important aspect of machine vision, have inevitably been swept
up in this wave of upheaval and have entered the deep learning era. The field
has seen significant improvements in thinking, approach, and effectiveness over
time. The goal of this study is to analyze the important developments and to
address the issues in multi-script artistic scene text localization, script
identification, and recognition.
Real-world images often contain embedded text from diverse domains such as
tourism, advertising, communication, education, and entertainment, where it
plays an important role in conveying information. Such images are graphically
rich in terms of font attributes, color distribution, foreground-background
similarity, and component organization. Recognizing text in scene images is
challenging not only because of the inherent complexity of the images but also
because multiple scripts appear in various forms. Text recognition in natural
images involves script identification, which in turn requires text
localization. Localization is not trivial for natural scene images because of
their disparate foreground and background components, and this makes
recognizing text in these images harder. These characteristics are especially
prominent, and the challenge greater, in multi-script artistic scene images
such as movie posters. The difficulty is compounded by the composite nature of
posters: complex graphical backgrounds and the presence of different texts such
as the movie title, names of actors, producers, directors,
and taglines. These texts vary in font, color, size, orientation, and texture.
One of the primary pieces of information in a movie poster is its title.
Automatic recognition of movie titles from images can aid efficient indexing
as well as information conveyance. However, the title is accompanied by other
texts such as names of actors, producers, taglines, dates, etc. Though the
organization of components is somewhat similar across film industries such as
Tollywood (West Bengal), Bollywood (Mumbai), and Hollywood (Los Angeles), the
graffiti patterns differ in many instances. To address the problem of movie
title understanding, a dataset named MOvie POsters-Hollywood Bollywood
Tollywood (MOPO-HBT), comprising movie posters from these industries, is
proposed. The texts were extracted using the M-EAST (modified EAST) model,
which is based on the EAST (Efficient and Accurate Scene Text detector) model
for text localization. The FPS (frames processed per second) of M-EAST was
observed to be 4.6 higher than that of EAST. A novel movie title extraction
algorithm is then proposed to extract the movie title from the pool of text in
the image.
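
As a brief illustration of the FPS metric mentioned above, the following is a minimal sketch of how a frames-processed-per-second figure could be measured for any detector. It is not the evaluation code used in the thesis: the names `frames_per_second` and `measure_fps`, and the `detect` callable standing in for a model such as EAST or M-EAST, are hypothetical and introduced only for illustration.

```python
import time
from typing import Any, Callable, Iterable


def frames_per_second(num_frames: int, elapsed_seconds: float) -> float:
    """FPS is simply the number of frames processed divided by elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_frames / elapsed_seconds


def measure_fps(detect: Callable[[Any], Any], frames: Iterable[Any]) -> float:
    """Time a (hypothetical) detector callable over a batch of frames."""
    frame_list = list(frames)
    start = time.perf_counter()
    for frame in frame_list:
        detect(frame)  # assumption: one call processes one image
    elapsed = time.perf_counter() - start
    return frames_per_second(len(frame_list), elapsed)
```

Under this definition, a difference of 4.6 FPS means one model processes 4.6 more images per second than the other on the same hardware and the same input set.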
Script identification serves as a precursor to text recognition through
optical character recognition (OCR). OCR engines are generally script
dependent, and scripts may be mixed at the block, line, and word levels. Things
get more complicated when artistic texts mix scripts within a single word,
i.e., at the character level. So, before script identification, it is
important to identify the script type, i.e., single or mixed script. A
CNN-based deep learning framework is developed to detect single/mixed-script
images. An OCR system has also been built to recognize the title of the movie
poster. To allow deployment of the system in low-resource environments, a
shallow convolutional neural network (SCNN) architecture was also developed for
the same task.
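
To make the single versus mixed-script distinction concrete, the sketch below labels a word that is already available as Unicode text. This is only an illustration of the labeling rule: the thesis classifies images with a CNN, which this code does not attempt. The helper names `char_script` and `script_type` are hypothetical, and taking the first word of the Unicode character name is an assumed approximation of the script, not an exact script lookup.

```python
import unicodedata


def char_script(ch: str) -> str:
    """Approximate a character's script by the first word of its Unicode name,
    e.g. 'LATIN', 'BENGALI', 'DEVANAGARI'. This is a heuristic assumption."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def script_type(word: str) -> str:
    """Return 'single' if all letters share one script, 'mixed' if several
    scripts appear within the word, and 'none' if the word has no letters."""
    # Only alphabetic characters are considered; digits, punctuation,
    # and combining marks are ignored.
    scripts = {char_script(ch) for ch in word if ch.isalpha()}
    if not scripts:
        return "none"
    return "single" if len(scripts) == 1 else "mixed"
```

For example, a title written entirely in Latin letters is labeled single, while a stylized word that substitutes a Bengali or Devanagari letter for one of its characters is labeled mixed, which is the character-level mixing case described above.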