Building Lexical Resources

Lexical resources are essential both for general use and for many NLP tasks. Language learners and researchers use different types of dictionaries on a daily basis, while other special-purpose lexical resources support specific NLP tasks.

Examples of lexical resources we can build: phonetic dictionaries, semantic databases, dialectal dictionaries, named entity lists, foreign word lists, function word lists, etc.

ML Training Data

ML models depend mainly on the data they are trained on. We build different training datasets (annotated corpora, parallel corpora, language models, etc.) to support different NLP tasks and solutions. Such tasks include, but are not limited to, PoS tagging, spell checking, diacritization, TTS, and ASR.

We build data quickly and intelligently and, most importantly, we ground it in deep linguistic knowledge. We tailor the data to your task in both size and format.

Testing Data

Knowing how well your model performs is crucial for success in the real world. We design well-balanced testing datasets for your task, so you can gain insight into your model and benchmark it against competitors’ models.

Language Consultations

Our team combines deep linguistic knowledge with long experience in Arabic NLP. We would be glad to help you plan, implement, and test your project.

We can also provide consultation on general language projects in the fields of education, publishing, media and more.


Most Arabic content, including books and other documents, is available in PDF format. These files are often photocopies or scans, which makes it impossible to edit them or search their content. That is why we have developed a complete solution that converts PDF files to Word or TXT format.