Repositório Digital de Publicações Científicas: Portuguese Archives Handwritten text recognition of passport requisitions



Repositório Digital de Publicações Científicas da Universidade de Évora


Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/39387

Title:	Portuguese Archives Handwritten text recognition of passport requisitions
Authors:	Melo, Dora; Pimenta Rodrigues, Irene; Ferreira, Lígia
Keywords:	handwritten recognition; document annotation; artificial intelligence; data analysis
Issue Date:	11-Jul-2024
Publisher:	Universidade de Évora
Citation:	Melo, D., Rodrigues, I. P., Ferreira, L. Portuguese Archives Handwritten text recognition of passport requisitions. In Anjos, A., Minhós, F., Carapau, F., Bezzeghoud, M., Correio, P., Oliveira, R. J., Abreu, S. (2024). Book of Abstracts: 2nd International Workshop on Mathematics and Physical Sciences, Universidade de Évora, Évora.
Abstract:	The DigitArq platform is the Portuguese National Archive system. It uses well-established description standards, namely ISAD(G) (General International Standard Archival Description) and ISAAR(CPF) (International Standard Archival Authority Record for Corporate Bodies, Persons and Families), with a hierarchical structure adapted to the nature of archival assets. In the EPISA project, one of the tasks was the migration of the DigitArq information into a linked open data model, CIDOC-CRM [5]. This task included representing the textual description in the ISAD(G) element ‘Scope and Content’ by extracting the information from natural-language text. The dataset for handwritten recognition has 1000 registers, each with a digital representation, a text description of the digital content, and the semantic representation in CIDOC-CRM of the text description [6]. This information enables the automatic evaluation of handwritten recognition and can be used to improve recognition performance through the use of semantic information. The handwritten data was selected from a set of registers with a digital representation, a JPG file, from the Portuguese National Archive. The registers were chosen from those that have a text transcription of the digital representation in the DigitArq platform. Handwritten text recognition is an important task in computer vision that has received considerable attention in recent years [1,2]. In our approach, the open-source document processing platform ArkIndex [3,4] (https://teklia.com/our-solutions/arkindex/) is used to automate a document recognition system adapted to the passport registers with digital representation. Initially, a corpus of 100 registers was built and manually annotated to represent the structure of the pages (text zones, pages, and text zone transcriptions), producing an automatic transcription of the handwritten text.
The evaluation of the described approach shows promising results, which confirm that the initial annotated corpus can be used to obtain a general tool for processing the passport registers in DigitArq.
URI:	http://hdl.handle.net/10174/38651 http://hdl.handle.net/10174/39387
Type:	lecture
Appears in Collections:	INF - Comunicações - Em Congressos Científicos Internacionais

Files in This Item:

File	Description	Size	Format
Book of Abstracts.pdf		115.12 kB	Adobe PDF

Serviços de Ciência e Cooperação - Universidade de Évora