Large language model

A large language model (LLM) is a type of artificial intelligence model that can process and generate human language. These models are trained on huge amounts of text from books, websites, and other sources.[1]

How they work

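LLMs work by finding statistical patterns in language. During training, a model reads billions of examples and learns to predict missing or upcoming words in a passage; along the way it picks up grammar, facts, and how words relate to each other. Most modern LLMs are built on a neural network design called the transformer, which processes all the words in a passage at the same time instead of one after another. This lets it train on very large amounts of text efficiently.[2]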
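The idea of learning patterns from examples and then guessing the next word can be shown with a much simpler, older kind of language model: a bigram model, which only counts which word tends to follow which. The sketch below is a toy under that assumption, not how a real LLM works. Real LLMs use transformer neural networks with many learned parameters (often billions) and look at far more than one previous word. The function names and the tiny example text are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def next_word_probabilities(model, word):
    """Turn the follow-up counts for `word` into probabilities (empty if unseen)."""
    counts = model.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical tiny training text; real models train on billions of words.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigram_model(corpus)

print(predict_next_word(model, "the"))        # prints: cat
print(next_word_probabilities(model, "sat"))  # prints: {'on': 1.0}
```

Given "the", this model picks "cat" because that pair appears most often in its training text. Many LLMs, such as the GPT models, are trained on a similar goal of predicting the next word (more precisely, the next token), but they learn far richer patterns that can span long passages.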

Limitations

Although LLMs are powerful, they can make mistakes. They can repeat biases found in their training data, and they sometimes produce information that sounds convincing but is false, a problem often called "hallucination". They learn from patterns in existing text rather than having true understanding like humans do.[3]

History

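Before 2017, language models were much simpler. Most were based on statistical word counts or on recurrent neural networks, which read text one word at a time. The big change came in 2017, when researchers at Google introduced the transformer design,[2] building on earlier "attention" methods used in machine translation.[4] Transformers made language models much more powerful.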

Important developments include:

  • 2018: Google released BERT, a transformer model that improved how well computers handle language-understanding tasks such as answering questions[5]
  • 2019: OpenAI announced GPT-2 but at first held back the full model because it worried the system could be misused, for example to write fake text[6]
  • 2022: OpenAI released ChatGPT, a chatbot that quickly became very popular with the public[7]
  • 2023: OpenAI released GPT-4, which can accept both text and images as input[8]

Modern developments

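Today, many different LLMs are available. Some are proprietary, like GPT-4, while others have openly released model weights that anyone can download and use, such as models from DeepSeek and some models from Mistral AI. As of 2024, GPT-4 was considered one of the most capable language models, according to rankings such as the LMSYS Chatbot Arena leaderboard.[9]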

References

  1. "Better Language Models and Their Implications". OpenAI (2019-02-14). Retrieved 2019-08-25.
  2. Vaswani, Ashish. "Attention Is All You Need". Advances in Neural Information Processing Systems 30. Curran Associates, Inc. (2017). Retrieved 2024-01-21.
  3. Manning, Christopher D. "Human Language Understanding & Reasoning". Daedalus 151 (2) (2022), pp. 127–138. doi:10.1162/daed_a_01905. Retrieved 2023-03-09.
  4. Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua. "Neural Machine Translation by Jointly Learning to Align and Translate". arXiv:1409.0473 [cs.CL] (2014).
  5. Rogers, Anna. "A Primer in BERTology: What We Know About How BERT Works". Transactions of the Association for Computational Linguistics 8 (2020), pp. 842–866. doi:10.1162/tacl_a_00349. Retrieved 2024-01-21.
  6. Hern, Alex. "New AI fake text generator may be too dangerous to release, say creators". The Guardian (2019-02-14). Retrieved 2024-01-20.
  7. "ChatGPT a year on: 3 ways the AI chatbot has completely changed the world in 12 months". Euronews (2023-11-30). Retrieved 2024-01-20.
  8. Heaven, Will. "GPT-4 is bigger and better than ChatGPT—but OpenAI won't say why". MIT Technology Review (2023-03-14). Retrieved 2024-01-20.
  9. "LMSYS Chatbot Arena Leaderboard". huggingface.co. Retrieved 2024-06-12.