Llama.cpp
Original author(s) | Georgi Gerganov |
---|---|
Developer(s) | Georgi Gerganov and community |
Initial release | Alpha (b1083) / August 26, 2023 |
Written in | C++ |
License | MIT License |
Website | github |
Llama.cpp is an open-source software library that performs inference on various large language models such as LLaMA.[1] It is written in C++ and is generally smaller and less complex than larger inference frameworks like TensorFlow. As of April 2024, the project had over 55,000 stars on GitHub.[2]
History[edit]
Llama.cpp was started by Georgi Gerganov as a pure C++ implementation of LLaMA with no dependencies. The advantage of this approach is that it can run on more hardware than inference libraries that depend on hardware-specific, closed-source libraries such as CUDA. Before llama.cpp, Gerganov worked on a similar library called whisper.cpp,[3] which implemented OpenAI's Whisper speech-to-text model. Llama.cpp gained traction among users without specialized hardware because it could run on a CPU alone, including on Android devices.[4] In March 2023, Gerganov started a company around llama.cpp called ggml.ai.[5]
Architecture[edit]
Llama.cpp initially ran only on CPUs but can now run on GPUs through multiple back-ends, including Vulkan and SYCL. These back-ends make up the GGML tensor library, which is used by the model-specific front-end code in llama.cpp and also by other projects such as whisper.cpp.[6] Llama.cpp has its own model file format called GGUF (previously referred to as the GGML format).[7] Models in other formats must be converted to GGUF before use, and not all tensor operations required by a given model are necessarily supported by GGML/GGUF. In general, llama.cpp follows the KISS principle to remain as small and easy to use as a dependency as possible.
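To illustrate the GGUF container mentioned above, the file begins with a small fixed-size binary header. The following is a minimal sketch, assuming the published GGUF layout (a 4-byte magic `GGUF`, a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key-value count); the function name and the synthetic header bytes are illustrative, not part of llama.cpp's API:

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header (sketch; assumes the GGUF v2/v3 layout)."""
    if data[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    # After the magic: uint32 version, uint64 tensor count, uint64 metadata KV count,
    # all little-endian.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}

# Synthetic header for illustration: version 3, 2 tensors, 5 metadata pairs.
header = b"GGUF" + struct.pack("<IQQ", 3, 2, 5)
print(read_gguf_header(header))
```

Metadata key-value pairs and per-tensor descriptors follow this header, which is what lets a single self-describing file carry both the weights and the configuration a loader needs.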
References[edit]
- ↑ Connatser, Matthew. "How this open source LLM chatbot runner hit the gas on x86, Arm CPUs". theregister.com. Retrieved 15 April 2024.
- ↑ "ggerganov/llama.cpp". GitHub.
- ↑ "ggerganov/whisper.cpp". GitHub.
- ↑ Edwards, Benj (13 March 2023). "You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi". arstechnica.com. Retrieved 15 April 2024.
- ↑ "GGML - AI at the edge".
- ↑ "GGML - AI at the edge". ggml.ai. Retrieved 16 April 2024.
- ↑ Pounder, Les (25 March 2023). "How To Create Your Own AI Chatbot Server With Raspberry Pi 4". tomshardware.com. Retrieved 16 April 2024.
This article "Llama.cpp" is from Wikipedia. The list of its authors can be seen in its revision history and/or on the page Edithistory:Llama.cpp.