One word has multiplied in scientific articles for a reason: ChatGPT loves it

It has already been shown that some academic articles are written by AI; the question is how widespread the practice is. To measure it, a group of researchers reviewed millions of paper abstracts published in PubMed and found something interesting: there is a word that AI loves, and the reason it likes it so much is rather murky.

Delve. The word means "to dig deeper", and its use multiplied by 28 between 2022 and 2024, which happens to coincide with the boom of ChatGPT and other language models. Other words such as 'underscore' and 'showcasing' are also cited, with frequency increases of 13.8-fold and 10.7-fold respectively. None of them is a noun or a word tied to the content of a paper; they have more to do with writing style and are very characteristic of the flowery language that LLMs tend to use.

Flowery language. Does this mean that a paper containing one of these words was written with AI? Not necessarily, but the increase is dramatic. The researchers compared the rise of 'delve' with that of other keywords, such as 'pandemic', which peaked sharply in 2020 and began to decline in 2021. The increase in the frequency of 'delve' is far more pronounced than any of the others.
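To make a figure like "multiplied by 28" concrete, here is a minimal Python sketch of one way such a ratio could be computed. It assumes that "frequency" means the share of abstracts in a given year that contain any form of the word; the researchers' exact method may differ. The function names, the list of word forms and the four-abstract corpora are hypothetical toy data, not figures from the study.

```python
import re

def word_share(abstracts: list[str], forms: tuple[str, ...]) -> float:
    """Fraction of abstracts containing any of the given word forms.

    Matching is case-insensitive and on whole words only, so 'delves'
    counts only if it is listed in `forms`.
    """
    if not abstracts:
        return 0.0
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(f) for f in forms) + r")\b",
        re.IGNORECASE,
    )
    hits = sum(1 for text in abstracts if pattern.search(text))
    return hits / len(abstracts)

def fold_change(before: list[str], after: list[str], forms: tuple[str, ...]) -> float:
    """How many times more common the word became from `before` to `after`."""
    base = word_share(before, forms)
    if base == 0:
        # The word never appeared in the baseline corpus: the ratio is unbounded.
        return float("inf")
    return word_share(after, forms) / base

# Hypothetical toy corpora standing in for one year of abstracts each.
abstracts_2022 = [
    "We delve into the role of T cells in infection.",
    "A cohort study of vaccine uptake.",
    "Results show reduced mortality.",
    "We measured antibody levels.",
]
abstracts_2024 = [
    "We delve into immune responses.",
    "This study delves into antibody kinetics.",
    "Delving deeper, we showcase a new assay.",
    "A randomized trial of dosing.",
]
DELVE_FORMS = ("delve", "delves", "delved", "delving")  # assumed list of forms

print(fold_change(abstracts_2022, abstracts_2024, DELVE_FORMS))  # 3.0
```

Counting abstracts that contain the word, rather than raw occurrences, keeps one verbose abstract from inflating the ratio; comparing against a content word such as 'pandemic' would just mean calling `fold_change` with a different tuple of forms.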

It's no coincidence. There is a stage in building a chatbot like ChatGPT that requires human intervention to fine-tune its responses, known as reinforcement learning from human feedback (RLHF). It turns out that most of the workers dedicated to this refinement work are in African countries such as Nigeria. Guess where these words are quite common in formal English? Exactly: in Nigeria.

African style. 'Delve' is a fairly common word in business English in Africa, especially in Nigeria, and it is not the only one: others such as 'leverage', 'explore' or 'tapestry' are also more common in African English. According to 311institute, although human feedback is tiny compared with the enormous amount of training data, it has a large impact because it defines the tone the model uses when responding to us.

Data labeling. This is a key step in training large language models, and it requires humans. The problem is that most of the workers who do it are in impoverished countries such as Nigeria, Kenya or India, among others. As if the endless workdays and meager pay were not enough, workers often have to review violent and highly explicit images, all without any kind of psychological support.


Image | National Institute of Allergy and Infectious Diseases on Unsplash
