Language has been understood as a system of sound-meaning connections for thousands of years (Hauser, Chomsky & Fitch, 2002). This work aims to highlight various ethical challenges arising from the relationship between intelligent computing and the construct of language. Intelligent computing has had a drastic impact on human-computer interaction, specifically where it aims to replicate and mimic the key components of human-to-human interaction. New discoveries and research in this field have driven the frontier of human-computer interaction into a new realm, where users now face ethical issues that have not been part of traditional discussions around human-computer interaction. Replicating human interaction is not only an extremely difficult technical challenge; it also raises a number of ethical issues that humans, and specifically the developers and users of such systems, must begin to recognise and manage appropriately. This work aims to bring some of these ethical challenges to light, to give the reader a better understanding of what should be done from a development perspective, and to highlight what users should be wary of when using these systems.
Language is an extremely intimate and unique aspect of humanity. When developers attempt to replicate it, it is easy to miss the subtleties that underlie, and are the basis of, human-to-human communication and interaction. Although language is a system of communication, this definition fails to recognise that which is unsaid: the practically intangible system of signs and symbols, gestures, posture and facial expressions that is extremely relevant in the act of communication between individuals (Wilkinson, 1975). A number of non-verbal channels, such as chronemics, vocalics, haptics, kinesics, proxemics and artifactual communication (Surbhi, 2015), greatly inform communication and therefore enhance the understanding of a particular language.
One of the key challenges with Natural Language Processing (NLP) systems is the increasingly wide gap between resource-rich and resource-poor languages. This stands alongside the work that still needs to be done to further increase machines' ability to achieve a level of communication that can better replicate existing human-to-human interaction. Many existing algorithms and systems, such as machine translation, spell-checking, speech synthesis and speech recognition, have a much stronger body of research and training data within Western languages (Duong, 2017). This challenge is evident across a range of languages, and this work will not focus on a particular language but rather draw a distinction between resource-rich and resource-poor languages. When an issue is brought into focus, even if an example is drawn from a specific language, the goal is to address the wider issue of the divide between resource-rich and resource-poor languages. The steps that can be taken to avoid the ethical issues illustrated within intelligent language processing as a whole are also addressed. Often, simply making developers and users aware of the issue is enough: highlighting the existing lack of development work is adequate warning for users, and provides a framework for developers to focus their efforts on widening access to such technologies.
How a particular construct is measured in terms of intelligence depends on the context; for instance, specific populations with cultural idiosyncrasies must be considered in this regard. Language recognition is a particular form of intelligence that varies widely from language to language, particularly amongst languages from entirely separate families. An Artificial Intelligence (AI) system that has been built and shown to have a high degree of accuracy runs the risk of humans and companies alike trusting and leaning too heavily on biased software. In turn, this can lead to many issues going unaddressed when the system fails. This phenomenon is happening in real time. The recent internal document leak known as the "Facebook Documents" indicated that the company's AI failed to recognise hate speech and terror-related posts, particularly in languages where the AI had fewer training resources or less manual moderation. In this instance, the two went hand in hand: the system would learn from manual moderators what should be classified as inappropriate under the community standards, and would then be better equipped to flag such content independently. The particular lack of Arabic moderators added to the gap detected in the AI's ability to classify content as inappropriate. This emphasises how wary companies and corporations need to be when developing such AI screening methods; the drawbacks for languages with fewer training samples have proved to be incredibly important considerations. In addition, this is a unique example in that the developers and the users are the same organisation, which in itself poses a serious ethical dilemma that must be underscored.
Manual moderation is a critical component of content moderation; without it, the very AI system that is meant to assist the manual process failed for lack of uniquely human input.
A key technique within NLP is to use processes such as stemming and lemmatization, which reduce a word to a stem or base form, typically by removing inflectional endings. This technique may not be as effective across all languages. Mager et al. (2018) illustrate this using indigenous languages of the Americas, where complex morphological phenomena are common and the languages are not always suffixal; as a result, it is not adequate to remove inflectional endings to obtain a stem. However, Mager et al. (2018) go on to provide solutions whereby neural methods have been used to tackle the rich morphology of these languages. This issue illustrates how important it is to be aware that NLP requires a broader approach than those that have been shown to work within Western languages.
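The suffixal assumption can be made concrete with a toy stemmer. The sketch below is a deliberate simplification (real systems use algorithms such as Porter's, and the suffix list here is invented for illustration): it handles English, which is largely suffixal, but has no purchase on languages that inflect by prefix or by internal change.

```python
# Toy suffix-stripping stemmer: a simplification shown only to
# illustrate the suffixal assumption; the suffix list is illustrative.
SUFFIXES = ["ization", "ation", "ing", "ed", "es", "s"]

def suffix_stem(word: str) -> str:
    """Strip the longest matching suffix, assuming inflection is word-final."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Works tolerably for English, which marks inflection at the end of the word:
print(suffix_stem("connections"))  # connection
print(suffix_stem("translating"))  # translat

# But the same logic fails for prefixal morphology: Swahili ninasoma
# ("I am reading") is ni- (I) + -na- (present) + soma (read), and a
# suffix stripper leaves it untouched.
print(suffix_stem("ninasoma"))  # ninasoma
```

The failure is structural rather than a matter of tuning: no suffix list, however large, can recover a stem that sits at the end of the word rather than the beginning.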
Efforts have been made to address the need for context and to improve accuracy in NLP. For example, in machine translation from Japanese to English, Nasukawa (1996) shows that efforts to provide an accurate and precise translation are limited in their scope. Consequently, the risk of mistranslation when relying on a machine translation algorithm remains significant. This is especially true when the translation service becomes a component for which one party charges another, or where mistranslation can lead to a miscarriage of justice, such as when translation in court is handed off to a machine (Xie, 2019). Although machine translation has come a long way, as with traditional human translation, context and human intuition remain critical components of the translation process. As such, research and development still need to be performed to further incorporate context and other human-like intuition into machine translation.
A further example of a growing ethical challenge with non-Western language algorithms is the use of NLP to detect clickbait. Clickbait's main purpose is to use content in such a way as to attract attention and encourage visitors to click on a link to a particular web page, and it is a concern for many social media companies and internet content sites. Clickbait is a severe issue, and NLP has shown much promise in helping to tackle it. Marreddy et al. (2021) highlight the many resource-poor languages that still require a concerted effort by developers to train existing language processing models, for example a Robustly Optimized BERT Pretraining Approach (RoBERTa; Liu et al., 2019). In this case, the developers worked on Telugu and improved the RoBERTa model, which has shown promise in helping recognise clickbait content in Telugu. In spite of these efforts, however, users should take particular care when browsing non-Western content and understand that the tools that can tackle clickbait for resource-poor languages still require significant development.
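Even the crudest end of the clickbait-detection spectrum shows why the problem is language-bound. The sketch below is a naive keyword heuristic, not the fine-tuned RoBERTa approach of Marreddy et al. (2021); the bait phrases are invented for illustration. Learned models replace this hand-curated cue list with cues induced from labelled data, which is exactly the resource that poor languages lack.

```python
# Naive clickbait heuristic: flag headlines containing known bait phrases.
# The phrase list is invented for illustration; learned classifiers
# induce such cues from labelled examples instead.
BAIT_PHRASES = ["you won't believe", "what happened next", "number 7 will"]

def looks_like_clickbait(headline: str) -> bool:
    h = headline.lower()
    return any(phrase in h for phrase in BAIT_PHRASES)

print(looks_like_clickbait("You won't believe what this cat did"))  # True
print(looks_like_clickbait("Parliament passes annual budget"))      # False
```

Porting even this trivial heuristic to Telugu would require a curated Telugu phrase list, the same per-language data bottleneck that limits the trained models.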
Furthermore, a word can be interpreted differently across different cultures, even in the same language. As a result, the same construct is not being measured across different cultural groups, which could obscure significant interpretation of the data (Dolan et al., 2006; Milfont & Fischer, 2010, as cited in Cockcroft, Alloway, Copello, & Milligan, 2015). For example, in isiXhosa, one of South Africa's eleven official languages, the colour vocabulary is limited, as the isiXhosa words for "blue" and "green" are synonymous (Foxcroft, 2011). Using both colours in a test could prove problematic, as the two constructs are undifferentiated for the isiXhosa speaker. Developers should be aware of nuances and differences such as this: even if an algorithm produces an interpretation that is technically correct, it may still distort the meaning. This represents the limits of intelligent computing and its ability to communicate; these are issues inherent in cross-cultural translation, not just simple language translation.
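The construct problem can be stated as a many-to-one mapping. The sketch below is a simplified illustration, assuming luhlaza as the common isiXhosa term covering both blue and green (and bomvu for red): once two source terms map to one target term, the translation is lossy and cannot be inverted without extra context.

```python
# Simplified English -> isiXhosa colour mapping, assuming "luhlaza"
# covers both blue and green, per the isiXhosa example discussed above.
EN_TO_XH = {"blue": "luhlaza", "green": "luhlaza", "red": "bomvu"}

def translate(colour: str) -> str:
    return EN_TO_XH[colour]

# The forward map erases a distinction the source language draws:
print(translate("blue"))   # luhlaza
print(translate("green"))  # luhlaza

# ...so the inverse is not a function: a back-translation of "luhlaza"
# cannot recover whether the original stimulus was blue or green.
inverse = {}
for en, xh in EN_TO_XH.items():
    inverse.setdefault(xh, []).append(en)
print(inverse["luhlaza"])  # ['blue', 'green']
```

An algorithm applying this mapping is "technically correct" in both directions, yet a test item that depends on distinguishing blue from green is unmeasurable once translated, which is precisely the distortion described above.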
The final ethical issue concerns how NLP and AI algorithms group and tag words, often by placing them in a "bag of words" or bucket. Some of these algorithms perform surprisingly well. However, there are many dangers with such methods that one must be aware of. Bamman et al. (2014) make clear the warning against drawing hard lines, distinctions, and categories when developing NLP:
“If we start with the assumption that ‘female’ and ‘male’ are the relevant categories, then our analyses are incapable of revealing violations of this assumption. . . . [W]hen we turn to a descriptive account of the interaction between language and gender, this analysis becomes a house of mirrors, which by design can only find evidence to support the underlying assumption of a binary gender opposition (p. 148).”
While this warning drew on a particular issue of gender within language, it extends to many other situations, such as social and cultural differences. A system may be developed to simply draw a line where the nuance or diversity of these constructs is not recognised. What the developer must therefore construct is a robust yet flexible process that can handle language without harming or eroding the nuances of idiosyncrasy and diversity. Similarly, the user must always remember that the technology is far from perfect and unable to provide clear-cut solutions at this stage, especially for languages with fewer resources.
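The bag-of-words representation discussed above can be sketched in a few lines; the danger is structural. Once word order and context are discarded, texts that mean opposite things can become indistinguishable to the model, just as Bamman et al.'s warning predicts for any categories baked into the representation.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Represent a text purely by word counts, discarding order and context."""
    return Counter(text.lower().split())

a = bag_of_words("man bites dog")
b = bag_of_words("dog bites man")

# Opposite statements, identical representation:
print(a == b)  # True
```

Whatever distinctions the representation cannot encode, no downstream algorithm can recover; the modelling choice silently decides which nuances survive.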
In summary, replicating human-to-human communication, and developing an intelligent system able to provide an adequate human-to-computer communication process that can be utilised on a global scale, is particularly challenging. The challenge is made even more difficult by the lack of resources available to developers for some languages. It has been shown that developers will often struggle with existing methods such as stemming and lemmatization, and wrongfully apply them to other languages. Moreover, users of non-Western content online need to be wary of issues such as clickbait, hate speech and incitement. The recent example of the Facebook Documents underscores the drawbacks for resource-poor languages, and shows the diverse set of ethical challenges that exist in using AI and intelligent NLP systems. Finally, Bamman's warning about assuming relevant categories is perhaps the most critical thought that developers must keep in mind in order to avoid bias and to create systems that are fair, flexible, and, most important of all, accurate.
References
Cockcroft, K., Alloway, T., Copello, E. and Milligan, R., 2015. A cross-cultural comparison between South African and British students on the wechsler adult intelligence scales third edition (WAIS-III). Frontiers in Psychology, 6, p.297.
Bamman, D., Eisenstein, J. and Schnoebelen, T., 2014. Gender identity and lexical variation in social media. Journal of Sociolinguistics, 18(2), pp.135-160.
Duong, L., 2017. Natural language processing for resource-poor languages (Doctoral dissertation).
Foxcroft, C.D., 2011. Ethical issues related to psychological testing in Africa: What I have learned (so far). Online readings in psychology and culture, 2(2), pp.2307-0919.
Hauser, M.D., Chomsky, N. and Fitch, W.T., 2002. The faculty of language: what is it, who has it, and how did it evolve?. Science, 298(5598), pp.1569-1579.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L. and Stoyanov, V., 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
Mager, M., Gutierrez-Vasques, X., Sierra, G. and Meza, I., 2018. Challenges of language technologies for the indigenous languages of the Americas. arXiv preprint arXiv:1806.04291.
Marreddy, M., Oota, S.R., Vakada, L.S., Chinni, V.C. and Mamidi, R., 2021. Clickbait detection in Telugu: overcoming NLP challenges in resource-poor languages using benchmarked techniques. In 2021 International Joint Conference on Neural Networks (IJCNN), pp.1-8. doi: 10.1109/IJCNN52387.2021.9534382.
Metgud, R., Surbhi, N.S. and Patel, S., 2015. Odontometrics: A useful method for gender determination in Udaipur population. J Forensic Investigation, 3(2), pp.1-5.
Nasukawa, T., 1996. Full-text processing: improving a practical NLP system based on surface information within the context. In COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics.
Seetharaman, D., Horwitz, J. and Scheck, J., 2021. Facebook says AI will clean up the platform. Its own engineers have doubts. Wall Street Journal, 17 Oct. [online] Available at: https://www.wsj.com/articles/facebook-ai-enforce-rules-engineers-doubtful-artificial-intelligence-11634338184?mod=article_inline [Accessed 30 Oct. 2021].
Wilkinson, G.R. and Shand, D.G., 1975. A physiological approach to hepatic drug clearance. Clinical Pharmacology & Therapeutics, 18(4), pp.377-390.
Xie, J., 2019, April. Study on the Ethics Problems Between Translation Service Providers and Consumers. In 3rd International Conference on Culture, Education and Economic Development of Modern Society (ICCESE 2019) (pp. 476-479). Atlantis Press.