That there are academic articles written by AI is something that has been proven beforethe question is how serious it is. To know the magnitude of this practice, a group of researchers has reviewed millions of paper abstracts published in PubMed and have found something interesting: there is a word that the AI loves and the reason why it likes it so much is quite murky.
Delve. Its translation is ‘go deeper’ and its use multiplied by 28 between 2022 and 2024, which coincidentally coincides with the boom of ChatGPT and language models. Other words such as ‘underscore’ or ‘showcasing’ are also cited, with a frequency increase of x13.8 and x10.7 respectively. None of them are a noun or a word related to the content, but rather have more to do with the style of writing and are very characteristic of the flowery language that LLMs usually use.
flowery language. Does this mean that if we see one of these words in a paper it was written with AI? Not necessarily, but the increase is brutal. Researchers have compared the rise of ‘delve’ to other keywords, such as pandemic, which had a huge peak in 2020 and began to decline in 2021. The increase in the frequency of use of ‘delve’ is much more pronounced than all the others.
It’s not coincidental. There is a stage in the process of creating a chatbot like ChatGPT that requires human intervention to fine-tune the responses; This is what is known as reinforcement learning from human feedback (for its acronym in English). RLHF). It turns out that most of the workers who are dedicated to this refining work are in African countries, such as Nigeria. guess where The use of these words in formal English is quite common. Exactly, in Nigeria.
African style. ‘Delve’ is a fairly common word in business English in Africa, especially in Nigeria, and it is not the only one. There are also others like ‘leverage’, ‘explore’ or ‘tapestry’ that are more common in African English. According to 311institutealthough human feedback is very small compared to the enormous amounts of training data, it has a great impact since it is what defines the tone of the model when responding to us.
Data labeling. It is a key step for training large language models and requires humans to be behind it. The problem is that the majority of workers who dedicate themselves to this are from impoverished countries such as Nigeria, Kenya or India, among others. In case the endless days and the ridiculous salaries were not enough, many times workers must review violent and very explicit imagesall without any type of psychological support.
Image | National Institute of Allergy and Infectious Diseases in Unsplash

GIPHY App Key not set. Please check settings