LINGUIST List 25.3622

Mon Sep 15 2014

Diss: English, Spanish; Comp Ling, Text/Corpus Ling, Translation: Burgos H: 'Towards an image-term co-occurrence model...'

Editor for this issue: Danuta Allen <danuta@linguistlist.org>


Date: 14-Sep-2014
From: Diego Burgos H. <burgosda@wfu.edu>
Subject: Towards an image-term co-occurrence model for multilingual terminology alignment and cross-language image indexing

Institution: Universitat Pompeu Fabra
Program: Linguistic Sciences and Applied Linguistics
Dissertation Status: Completed
Degree Date: 2014

Author: Diego A. Burgos H.

Dissertation Title: Towards an image-term co-occurrence model for multilingual terminology alignment and cross-language image indexing

Dissertation URL: http://www.tdx.cat/handle/10803/145644

Linguistic Field(s): Computational Linguistics
                            Text/Corpus Linguistics
                            Translation

Subject Language(s): English (eng)
                            Spanish (spa)

Dissertation Director:
Leo Wanner

Dissertation Abstract:

This thesis examines how the relation between terms and images in multilingual specialized documentation can support glossary compilation, terminology alignment, and image indexing. It exploits the recurrent use of these two modes of communication (text and images) in digital documents to build a bimodal co-occurrence model aimed at dynamically compiling glossaries with wider coverage. The model draws on developments in content-based image retrieval (CBIR) and on text processing techniques. CBIR is used to match two images of different origin, while text processing supports term recognition, artifact noun classification, and image-term association. The model first aligns an image with its denominating term in the collateral text, then aligns that image with another image of the same artifact from a different document; this in turn allows the two equivalent denominating terms to be aligned. The ultimate goal is to overcome the limitations of current static terminological repositories by generating bimodal, bilingual glossaries that reflect real usage, even when the terms and images originate from noisy corpora.
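
The alignment chain described in the abstract (term to image, image to image, and from there term to term) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the feature vectors are made-up stand-ins for real CBIR descriptors, cosine similarity is swapped in as a generic image-matching measure, and the names `cosine`, `align_terms`, and the `threshold` value are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_terms(doc_a, doc_b, threshold=0.9):
    """Align terms across two documents through their associated images.

    doc_a and doc_b are lists of (term, image_features) pairs, i.e. the
    image-term associations are assumed to be already extracted.
    The threshold is an arbitrary illustrative value.
    """
    pairs = []
    for term_a, feat_a in doc_a:
        best_term, best_score = None, threshold
        for term_b, feat_b in doc_b:
            score = cosine(feat_a, feat_b)
            # Keep the most similar image at or above the threshold.
            if score >= best_score:
                best_term, best_score = term_b, score
        if best_term is not None:
            pairs.append((term_a, best_term))
    return pairs


# Toy data: invented three-dimensional "image features" per term.
english = [("spark plug", [0.9, 0.1, 0.0]), ("piston", [0.1, 0.8, 0.3])]
spanish = [("pistón", [0.1, 0.7, 0.4]), ("bujía", [0.8, 0.2, 0.0])]

glossary = align_terms(english, spanish)
print(glossary)
```

In this toy run, each English term is paired with the Spanish term whose image is most similar, and terms whose images find no sufficiently similar counterpart are left unaligned.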



Page Updated: 15-Sep-2014