A STUDY TOWARDS MULTI-SCRIPT ARTISTIC IMAGE ANALYSIS

Ghosh, Mridul

Files

Mridul thesis final.pdf (30.11 MB)

Date

2023

Authors

Ghosh, Mridul

Publisher

Aliah University

Abstract

Computational perception has been dramatically reshaped by the shift from handcrafted feature-based techniques to deep learning. Scene text identification and recognition, an important aspect of machine vision, have inevitably been swept up in this upheaval and have entered the deep learning era. Over time, the field has seen significant improvements in thinking, approach, and effectiveness. The goal of this study is to analyze the important developments and to address the issues in multi-script artistic scene text localization, script identification, and recognition. Real-world images often contain embedded text belonging to diverse domains such as tourism, advertising, education, and amusement, to name a few. Such images are graphically rich in terms of font attributes, color distribution, foreground-background similarity, and component organization. The text in natural images plays an important role in conveying information in many fields, such as communication, education, and entertainment. Recognizing text in scene images is challenging not only because of the inherent complexity of the images but also because multiple scripts appear in various forms. Text recognition in natural images involves script identification, which in turn requires text localization. Localization is not trivial for natural scene images because of the disparate foreground and background components, and this aggravates the difficulty of recognizing text in these images. These characteristics are especially prominent, and the challenge is greater, in multi-script artistic scene images such as movie posters. The challenges are compounded by the composite characteristics of posters, such as complex graphical backgrounds and the presence of different texts like the movie title, the names of actors, producers, and directors, and the tagline.
These texts have miscellaneous fonts and variations in color, size, orientation, and texture. One of the primary pieces of information in a movie poster is the title. Automatic recognition of movie titles from images can aid efficient indexing as well as information conveyance. However, the title is accompanied by other texts such as the names of actors and producers, taglines, and dates. Though the organization of components is somewhat similar across film industries like Tollywood (West Bengal), Bollywood (Mumbai), and Hollywood (Los Angeles), the graffiti patterns differ in many instances. To address the problem of movie title understanding, a dataset named MOvie POsters-Hollywood Bollywood Tollywood (MOPO-HBT), encompassing movie posters from these industries, is proposed. The texts were extracted using an M-EAST (modified EAST) model, which is based on the EAST (efficient and accurate scene text detector) model for text localization. It was observed that the FPS (frames processed per second) value of M-EAST is 4.6 higher than that of EAST. A novel movie title extraction algorithm is thereafter proposed to extract the movie title from the pool of text in the image. Script identification serves as a precursor to text recognition through optical character recognition (OCR). OCR engines are generally script dependent, and scripts can be mixed at the block, line, and word levels. Things get complicated when artistic texts mix scripts within a single word, i.e., at the character level. So, before script identification, it is important to identify the script type, i.e., single or mixed script. A CNN-based deep learning framework is developed to detect single-script and mixed-script images. An OCR system has also been built to recognize the title of the movie poster. To support deployment of the system in low-resource environments, a shallow convolutional neural network (SCNN) architecture was also developed for the same purpose.

URI

http://ssm.ndl.gov.in/handle/123456789/838

Collections

Department of Computer Science
