Run SLMs (phi3, gemma2, mathstral, llama3.1) on SBC (LattePanda 3 Delta)

Introduction

In today's era of intelligent computing, Single Board Computers (SBCs) have become increasingly popular among developers thanks to their compact design and solid computing performance. At the same time, Small Language Models (SLMs) play a growing role in diverse application scenarios because of their efficiency and convenience. This article analyzes the performance of several SLMs on the LattePanda 3 Delta, an x86 board running Ubuntu 22.04. We compare models such as mathstral, phi 3, llama 3.1, deepseek v2, gemma2 2b, qwen, tinyllama, and deepseek coder v2 in terms of execution speed, model size, open-source license, and runtime framework, with the goal of giving developers useful data and insights.

 

mathstral-7B-v0.1-q4

Model size: 4.1GB

Speed: <1 token/s

Open-source license: Apache 2.0

Runtime framework: ollama

 

Mathstral is built on Mistral 7B and supports a context window of 32k tokens. It is a model specialized for mathematical and scientific reasoning.

 

Install ollama and run the command:

CODE
curl -fsSL https://ollama.com/install.sh | sh
sudo ollama run mathstral
Token speed of mathstral-7b-v0.1 running on LattePanda 3 Delta
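The token speeds reported in this article can be reproduced with ollama's `--verbose` flag, which prints timing statistics (including an "eval rate" line in tokens/s) after each response. A minimal sketch, with an illustrative prompt and an illustrative rate figure:

```shell
# --verbose makes ollama print timing stats after the model's answer
ollama run mathstral --verbose "What is the derivative of x^3?"

# The stats end with a line such as (figure is illustrative):
#   eval rate:            0.92 tokens/s
# From a saved transcript, the number can be pulled out with awk:
echo "eval rate:            0.92 tokens/s" | awk '/eval rate/ {print $3}'
```

The last command prints just the numeric rate, which is handy when logging several runs for comparison.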

 

phi3 3.8b-q4

Model size: 2.2GB

Speed: <1 token/s

Open-source license: MIT

Runtime framework: ollama

 

Install ollama and run the command:

CODE
sudo ollama run phi3
Token speed of phi3 3.8b-q4 running on LattePanda 3 Delta

 

Llama 3.1-8b-q4

Model size: 4.7GB

Speed: <1 token/s

Open-source license: Llama 3.1 Community License

Runtime framework: ollama

 

Install ollama and run the command:

CODE
sudo ollama run llama3.1
Token speed of Llama 3.1-8b-q4 running on LattePanda 3 Delta

 

gemma2-2b-q4

Model size: 1.6 GB

Speed: 1.4 tokens/s

Open-source license: Gemma Terms of Use

Runtime framework: ollama

 

Install ollama and run the command:

CODE
sudo ollama run gemma2

Token speed of gemma2-2b-q4 running on LattePanda 3 Delta

 

qwen-0.5b

Model size: 395MB

Speed: 7.17 tokens/s

Open-source license: Apache 2.0

Runtime framework: ollama

 

Install ollama and run the command:

CODE
sudo ollama run qwen:0.5b
Token speed of qwen-0.5b running on LattePanda 3 Delta
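Besides the interactive CLI, ollama serves an HTTP API on port 11434, which is useful for the real-time applications a small model like qwen-0.5b targets. A minimal sketch, with an illustrative prompt and illustrative timing figures:

```shell
# Non-streaming generation request against the local ollama server
curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen:0.5b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

# The JSON reply reports eval_count (tokens generated) and eval_duration
# (nanoseconds), so tokens/s = eval_count / eval_duration * 1e9.
# With illustrative figures of 100 tokens in 13.95 s:
awk 'BEGIN { printf "%.2f tokens/s\n", 100 / 13950000000 * 1e9 }'
```

Computing the rate from `eval_count` and `eval_duration` counts only generation time, excluding model load and prompt-evaluation time.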

 

tinyllama

Model size: 638MB

Speed: 2.1 tokens/s

Open-source license: Apache 2.0

Runtime framework: ollama

 

Install ollama and run the command:

CODE
sudo ollama run tinyllama
Token speed of tinyllama running on LattePanda 3 Delta

 

Summary

Differences in SLMs

- Mathstral-7B-v0.1-q4: Focuses on mathematical reasoning; built on Mistral 7B and suited to scenarios that require complex mathematical calculation and reasoning.

- Deepseek V2-7b-q4: Specializes in code-related issues, offering efficient code generation and understanding capabilities, ideal for development and programming applications.

- Phi3 3.8b-q4: Versatile with a wide range of applications, highly flexible, and suitable for general natural language processing tasks.

- Llama 3.1-8b-q4: A powerful general-purpose language model, well-suited for various NLP tasks, including text generation, translation, and dialogue systems.

- Gemma2-2b-q4: A smaller model designed for resource-constrained environments, while still delivering decent performance.

- Qwen-0.5b: Supports Chinese, small in size, and fast, making it ideal for real-time applications that require high responsiveness.

- Tinyllama: Designed for lightweight tasks, offering faster processing speed and a smaller model size.

 

Comparison of Different SLMs on LattePanda 3 Delta

 

Performance Summary of SLMs on LattePanda 3 Delta

This article compares the performance of various small language models (SLMs) on LattePanda 3 Delta hardware. The test results show that Qwen-0.5b performs best in execution speed, reaching 7.17 tokens per second, followed by Tinyllama at 2.1 tokens per second. Larger models like Mathstral-7B-v0.1-q4 and Llama 3.1-8b-q4, however, perform relatively slower.
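A comparison like this can be reproduced by running the same prompt through every model and grepping the "eval rate" line that `--verbose` prints. A minimal sketch, assuming all of the models have already been pulled and using an illustrative prompt:

```shell
# Run one prompt through each tested model and keep only the eval-rate line
for m in mathstral phi3 llama3.1 gemma2:2b qwen:0.5b tinyllama; do
  echo "== $m =="
  ollama run "$m" --verbose "Summarize single-board computers in one sentence." 2>&1 | grep "eval rate"
done
```

Using an identical prompt for every model keeps the prompt-evaluation load comparable, though generation length will still vary between models.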

 

Comparison of Different SLMs on Lattepanda Sigma

Previously, we tested various small language models using the Lattepanda Sigma. For more detailed information, please refer to the following article: Run Small Language Models (mathstral, phi 3, llama 3.1, mamba codestral, deepseek v2, gemma2 2b, gemma2 9b) on SBC Lattepanda Sigma.

 

 

Comparison of SLM Performance on LattePanda 3 Delta and Lattepanda Sigma

By comparing the performance of SLMs on LattePanda 3 Delta and Lattepanda Sigma hardware, we found that different hardware platforms significantly impact model execution. On the Lattepanda Sigma, the execution speed of various models was generally higher than on the LattePanda 3 Delta. For example, Phi3 3.8b-q4 reached a speed of 12 tokens per second on the Lattepanda Sigma, while on the LattePanda 3 Delta, it only achieved 0.98 tokens per second. Similarly, Deepseek-v2-16b-q4 performed at 17 tokens per second on the Lattepanda Sigma, outperforming other models.

 

This indicates that the superior computational power of the Lattepanda Sigma makes it more advantageous for handling larger models, while the LattePanda 3 Delta is better suited for running smaller models.
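The gap between the two boards can be put as a single speedup factor from the figures above. For phi3 3.8b-q4 (12 tokens/s on the Sigma versus 0.98 tokens/s on the 3 Delta):

```shell
# Speedup of the Lattepanda Sigma over the 3 Delta for phi3 3.8b-q4
awk 'BEGIN { printf "%.1fx\n", 12 / 0.98 }'
```

A factor on the order of 12x for a 3.8b model underlines why the smaller sub-1b models are the practical choice on the 3 Delta.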

FAQs

  • Why should I use LattePanda 3 Delta for running small language models?
    LattePanda 3 Delta is optimal for running smaller SLMs due to its compact design and sufficient computing power, making it suitable for applications that do not require extensive computational resources. However, for larger models, you may experience slower execution speeds, indicating that it's less ideal for high-demand tasks.
  • Is LattePanda 3 Delta suitable for high-speed SLM deployment?
    No, LattePanda 3 Delta may not be the best choice for high-speed SLM deployment. Models like Qwen-0.5b perform well with speeds up to 7.17 tokens/s, but larger models such as Mathstral and Llama 3.1 show significantly slower speeds, highlighting limitations in computational power for demanding applications.
  • How do I install and run a model like Mathstral-7B on LattePanda 3 Delta?
    To install and run Mathstral-7B on LattePanda 3 Delta, you must set up the 'ollama' runtime framework and execute the model command. Ensure sufficient disk space and consider slower token speeds due to the model's size and computational demands, which can affect performance.
  • How does the performance of SLMs on LattePanda 3 Delta compare to Lattepanda Sigma?
    SLMs perform better on Lattepanda Sigma, with higher token speeds due to its superior computational capabilities. For example, Phi3 achieves 12 tokens/s on Sigma compared to less than 1 token/s on Delta, indicating that Sigma is preferable for larger model execution.