
Who We Are
Arabic data provider and NLP consultation firm
AraData was established in response to the artificial intelligence market's growing need for high-quality Arabic data. Although Arabic is the fifth most spoken language in the world and one of the six official languages of the United Nations, the data available in Arabic is still of remarkably low quality. This is a major obstacle to the computational processing of Arabic, especially for machine learning models, which rely heavily on the volume, quality, and organisation of their data.
Our Services
How We Build The Linguistic Resources
Projects We Can Help You With
Building Dictionaries
– general or domain-specific
– based on real, well-balanced corpora
– representing current Arabic usage
– flexibly organized to suit different uses
Text To Speech (TTS)
– prepare the phone set, stress and intonation rules, etc.
– write grapheme-to-phoneme rules
– prepare scripts for recordings
– build a phonetic dictionary for foreign and dialectal words
Automatic Speech Recognition (ASR)
– generic and domain-specific solutions
– prepare the language model
– prepare the phonetic dictionary
– MSA and dialects
Diacritization (Tashkeel)
– systematic diacritization
– fast ML training data preparation
– solutions for dialectal and foreign words
– insightful testing datasets
Spell Checking
– determine spelling mistake patterns
– prepare the language model
– create the SC dictionary
– build a parallel corpus (erroneous and corrected text)
Grammatical Error Correction (GEC)
– define grammatical mistake patterns in Arabic
– develop innovative solutions
– rule-based and ML solutions
– insightful testing datasets
Dialect Projects
– understand the characteristics of each dialect
– build lexical resources for dialects
– collect and analyze real, representative data
– smart ways to link MSA and dialects
Latest Articles
Is Arabic the most difficult among languages?
No language is inherently easy or difficult. There are languages that have been properly studied and cared for, and languages that have not received sufficient study and care.
Why do Arabic language programs fail to reach high accuracy levels? (4/4)
Choosing the structure of linguistic resources and identifying the appropriate tags to solve a problem is not a simple process; it depends on how the problem is understood and how it is approached.
Why do Arabic language programs fail to reach high accuracy levels? (3/4)
There is a great need for language resources that are built by specialized linguists and properly reviewed. There is also a need to coordinate the efforts of the different entities building language resources.