People | Locations |
---|---|
Naji, M. | |
Motta, Antonella | |
Aletan, Dirar | |
Mohamed, Tarek | |
Ertürk, Emre | |
Taccardi, Nicola | |
Kononenko, Denys | |
Petrov, R. H. | Madrid |
Alshaaer, Mazen | Brussels |
Bih, L. | |
Casati, R. | |
Muller, Hermance | |
Kočí, Jan | Prague |
Šuljagić, Marija | |
Kalteremidou, Kalliopi-Artemi | Brussels |
Azam, Siraj | |
Ospanova, Alyiya | |
Blanpain, Bart | |
Ali, M. A. | |
Popa, V. | |
Rančić, M. | |
Ollier, Nadège | |
Azevedo, Nuno Monteiro | |
Landes, Michael | |
Rignanese, Gian-Marco | |
Meziane, Farid
University of Derby
Cooperation score: 37%
Publications
- 2024: Investigation of Artifact Contamination Impact on EEG Oscillations Towards Enhanced Motor Function Characterization
- 2023: The Impact of Arabic Diacritization on Word Embeddings
- 2021: Natural Language Processing and Information Systems; 26th International Conference on Applications of Natural Language to Information Systems, NLDB 2021, Saarbrücken, Germany, June 23–25, 2021, Proceedings
- 2015: An architecture to support ultrasound report generation and standardisation
article
The Impact of Arabic Diacritization on Word Embeddings
Abstract
Word embeddings are used to represent words for text analysis. They play an essential role in many Natural Language Processing (NLP) studies and have contributed substantially to the field's rapid progress in recent years. In Arabic, diacritic marks are vital to the readability and comprehensibility of written text, yet existing Arabic word embeddings are non-diacritized. In this paper, we develop and compare word embedding models trained on diacritized and non-diacritized corpora to study the impact of Arabic diacritization on word embeddings. We evaluate the models in four ways: clustering of the nearest words, morphological semantic analysis, part-of-speech tagging, and semantic analysis. To support the evaluation, we created three new datasets from scratch for the three downstream tasks, which we performed with eight machine learning algorithms and two deep learning algorithms. Experimental results show that the diacritized model is better at capturing syntactic and semantic relations and at clustering words of similar categories. Overall, the diacritized model outperforms the non-diacritized model. We also obtained several further findings. For example, the morphological semantic analysis shows that the advantage of the diacritized model becomes more pronounced as the number of target words increases, and we found that diacritic marks are more significant in part-of-speech (POS) tagging than in the other tasks.
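The two models compared in the abstract differ in whether the training corpus keeps its diacritic marks. The following minimal Python sketch shows one way a non-diacritized corpus could be derived from a diacritized one. It assumes that "diacritics" means the Arabic harakat in the Unicode range U+064B to U+0652 (tanween, fatha, damma, kasra, shadda, sukun) plus the superscript alef U+0670. The paper's actual preprocessing is not described here, and the function name `strip_diacritics` is hypothetical. The example also shows why the choice matters for embeddings. Words that are distinct when diacritized, such as ʿilm ("knowledge") and ʿalam ("flag"), collapse into a single token once the marks are removed, so a non-diacritized model has to learn one vector for both.

```python
import re

# Assumed diacritic set: Arabic harakat U+064B..U+0652 plus superscript alef U+0670.
# This is an illustrative choice, not necessarily the set used in the paper.
_DIACRITICS = re.compile("[\u064b-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return text with the assumed Arabic diacritic marks removed (hypothetical helper)."""
    return _DIACRITICS.sub("", text)


# Two diacritized words that share the same consonant skeleton.
ilm = "\u0639\u0650\u0644\u0652\u0645"   # ʿilm, "knowledge" (kasra, sukun)
alam = "\u0639\u064e\u0644\u064e\u0645"  # ʿalam, "flag" (fatha, fatha)

diacritized_tokens = [ilm, alam]
plain_tokens = [strip_diacritics(t) for t in diacritized_tokens]

# A diacritized corpus keeps two vocabulary entries; the stripped corpus keeps one.
print(len(set(diacritized_tokens)))
print(len(set(plain_tokens)))
```

Running the script prints 2 and then 1: both words reduce to the same undiacritized form, so any contextual distinction between them must be carried by a single embedding.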