Who We Are

Arabic data provider and NLP consulting firm

AraData was established in response to the artificial intelligence market's growing need for high-quality Arabic data. Although Arabic is the fifth most spoken language in the world and one of the six official languages of the United Nations, the Arabic data available is still of remarkably low quality. This is a major obstacle to the computational processing of Arabic, especially for machine learning models, which rely heavily on the volume, quality, and organisation of their data.

Our Services

How We Build The Linguistic Resources

Projects We Can Help You With

Building Dictionaries

– general or domain-specific
– based on real, well-balanced corpora
– representing current Arabic usage
– flexibly organised to suit different uses

Text To Speech (TTS)

Text To Speech (TTS)

Text To Speech (TTS)

– prepare the phone set, stress and intonation rules, etc.
– grapheme to phoneme rules
– prepare scripts for recordings
– phonetic dictionary for foreign and dialectal words

Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR)

– solutions for generic and domain-specific applications
– prepare the language model
– prepare the phonetic dictionary
– Modern Standard Arabic (MSA) and dialects

Diacritization (Tashkeel)

Diacritization (Tashkeel)

– systematic diacritization
– fast ML training data preparation
– solutions for dialectal and foreign words
– insightful testing datasets

Spell Checking

– determine spelling mistake patterns
– prepare the language model
– create the spell-checking dictionary
– parallel corpora (erroneous and corrected text)

Grammatical Error Correction (GEC)

Grammatical Error Correction (GEC)

– define grammatical mistake patterns in Arabic
– develop innovative solutions
– rule-based and ML solutions
– insightful testing datasets

Dialect Projects

– understand the characteristics of each dialect
– build lexical resources for dialects
– collect and analyze real representative data
– smart ways to link MSA and the dialects

Get Your Free Consultation With Us