Why do Arabic language programs fail to reach high accuracy levels? (1/4)
Although languages share common features, each has its own characteristics, so copying a computational solution from one language to another is a process doomed to failure.
It is no secret to anyone who follows the market for software that deals with human languages that there is a great disparity between the accuracy these programs achieve in languages such as English and the accuracy they achieve in Arabic. This makes asking about the reasons for the disparity a legitimate, even necessary, question.
In this series of short articles, we will try to clarify the most important reasons behind this disparity in the light of practical experience.
Technological solutions to language problems fall into two types: rule-based solutions and machine learning solutions. While some people may think that only the first type requires linguistic knowledge, the truth is that both types require linguistic knowledge and the ability to describe the linguistic phenomenon in a scientific way.
Therefore, the first reason for the shortcomings of Arabic programs is that most of them are not based on a real understanding of the linguistic phenomenon they deal with. Instead, their developers rush to build solutions before defining the problems, describing them, and placing them in a scientific framework; thus, these programs are doomed to failure before they start.
But why does that happen?
One of the reasons is that most of the people who create these programs either come from a purely technical background (software engineers or data scientists), where understanding and explaining linguistic phenomena falls outside their area of expertise, or are language engineers (computational linguists) who believe that solutions which have proved successful in English can be applied as they are to Arabic. The truth is that languages are too complex to be treated so simplistically.
Let us take a practical example of how languages differ by looking at any software that aims to correct grammatical mistakes in Arabic. Arabic marks nouns for grammatical case (nominative, accusative, and genitive), and choosing the right case is a common source of mistakes for most writers. English, on the other hand, does not mark nouns for case in this way, so mistakes of this kind do not arise in English. Consequently, a solution for correcting such mistakes must start from defining, describing, and framing this specific problem rather than from copying a solution built for another language.
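To make the case example concrete, here is a minimal rule-based sketch in Python. It is only an illustration, not a real grammar checker: the `Token` class, the `PREPOSITIONS` list, and the `find_case_errors` function are hypothetical names introduced here, and the sketch assumes that an earlier analysis step has already tagged each word with its part of speech and case. It encodes a single well-known rule of Arabic grammar, namely that a noun governed by a preposition takes the genitive case.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical, non-exhaustive list of common Arabic prepositions.
PREPOSITIONS = {"في", "من", "إلى", "على"}


@dataclass
class Token:
    text: str
    pos: str                    # e.g. "PREP" or "NOUN" (assumed to come from an analyzer)
    case: Optional[str] = None  # "nom", "acc", or "gen" for nouns


def find_case_errors(tokens: List[Token]) -> List[Tuple[str, Optional[str], str]]:
    """Flag nouns that directly follow a preposition but are not genitive.

    Returns tuples of (word, case found, case expected).
    """
    errors = []
    for prev, cur in zip(tokens, tokens[1:]):
        follows_preposition = prev.pos == "PREP" and prev.text in PREPOSITIONS
        if follows_preposition and cur.pos == "NOUN" and cur.case != "gen":
            errors.append((cur.text, cur.case, "gen"))
    return errors


# "in the house", with the noun wrongly marked as nominative.
sentence = [Token("في", "PREP"), Token("البيت", "NOUN", "nom")]
print(find_case_errors(sentence))
```

Even this toy rule depends on first describing the phenomenon (which words govern which case) before writing any code, and nothing comparable could be copied from an English grammar checker.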