Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/39387

Title: Portuguese Archives Handwritten text recognition of passport requisitions
Authors: Melo, Dora
Pimenta Rodrigues, Irene
Ferreira, Lígia
Keywords: handwritten recognition
document annotation
artificial intelligence
data analysis
Issue Date: 11-Jul-2024
Publisher: Universidade de Évora
Citation: Melo, D., Rodrigues, I. P., Ferreira, L. Portuguese Archives Handwritten text recognition of passport requisitions. In Anjos, A., Minhós, F., Carapau, F., Bezzeghoud, M., Correio, P., Oliveira, R. J., Abreu, S. (2024). Book of Abstracts: 2nd International Workshop on Mathematics and Physical Sciences, Universidade de Évora, Évora.
Abstract: The DigitArq platform is the Portuguese National Archive system. It uses well-established description standards, namely ISAD(G) (General International Standard Archival Description) and ISAAR(CPF) (International Standard Archival Authority Record for Corporate Bodies, Persons and Families), with a hierarchical structure adapted to the nature of archival assets. In the EPISA project, one of the tasks was the migration of the DigitArq information into a linked open data model, CIDOC-CRM [5]. This task included representing the textual description in the ISAD(G) element ‘Scope and Content’ by extracting information from natural-language text. The dataset for handwritten text recognition has 1000 registers, each with a digital representation, a text description of the digital content, and the semantic representation in CIDOC-CRM of that text description [6]. This information enables the automatic evaluation of handwritten text recognition and can be used to improve recognition performance through the use of semantic information. The handwritten data was selected from a set of registers in the Portuguese National Archive that have a digital representation (a JPG file). The registers were chosen from those that have a text transcription of the digital representation in the DigitArq platform. Handwritten text recognition is an important task in computer vision that has received considerable attention in recent years [1,2]. In our approach, the open-source document processing platform ArkIndex [3,4] (https://teklia.com/our-solutions/arkindex/) is used to automate a document recognition system adapted to the passport registers with digital representation. Initially, a corpus of 100 registers was built and manually annotated to represent the structure of the pages (pages, text zones and text zone transcriptions), producing an automatic transcription of the handwritten text.
The evaluation of the described approach reveals promising results, which confirm that the initial annotated corpus can be used to obtain a general tool for processing the passport registers in DigitArq.
URI: http://hdl.handle.net/10174/38651
http://hdl.handle.net/10174/39387
Type: lecture
Appears in Collections: INF - Comunicações - Em Congressos Científicos Internacionais

Files in This Item:

File: Book of Abstracts.pdf
Size: 115.12 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
