✅ Supported Operating Systems
The Kairntech platform has been successfully validated on the following operating systems, for both CPU and GPU deployments:
- Ubuntu 18.04.6 LTS (x64), or higher
- RHEL / CentOS 7 (x64), or higher
ℹ️ Note: When deploying in a CPU-only environment, Docker-based virtualization removes most OS-level constraints. However, for GPU-based deployment, compatibility is restricted due to NVIDIA driver requirements. Only the above-listed OS versions are officially supported for GPU setups.
✅ Supported GPU Hardware
To operate on GPU, the Kairntech platform requires NVIDIA GPUs with CUDA compute capability ≥ 7.5, such as:
- NVIDIA T4
- NVIDIA A2
- NVIDIA Titan RTX
- NVIDIA RTX 3000 series (e.g., 3060, 3080, 3090, etc.)
👉 You can consult NVIDIA's full list of GPU compute capabilities here.
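On recent NVIDIA drivers, you can also check the compute capability of an installed GPU directly with nvidia-smi (the compute_cap query field assumes a reasonably recent driver version):
nvidia-smi --query-gpu=name,compute_cap --format=csv
# A T4, for instance, reports 7.5, which meets the ≥ 7.5 requirement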
💻 General Hardware Recommendations (without GPU)
Component | Specification |
---|---|
CPU | 8 physical cores (see details below) |
RAM | 64 GB (128 GB recommended when using entity-fishing or Transformer-based models) |
Disk | 400 GB SSD with high read IOPS (≥ 10,000), especially important for entity-fishing |
Operating System | Ubuntu 20.04.6 LTS or RHEL / CentOS 7 (x64), or higher |
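To verify the read IOPS requirement above, one option is a short random-read benchmark with fio (the package and the test file path below are our choices, not platform requirements):
sudo apt-get install fio
fio --name=randread --filename=/tmp/fio.test --size=1G --rw=randread --bs=4k --direct=1 --ioengine=libaio --iodepth=32 --runtime=30 --time_based --group_reporting
rm /tmp/fio.test
# The reported read IOPS should comfortably exceed 10,000 on a suitable NVMe SSD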
💻 CPU Guidelines per Component
Component | CPU Requirements |
---|---|
Deep Learning (Flair, Transformers, SpaCy…) | High core count is prioritized over single-thread speed |
Entity Linking (entity-fishing) | Requires CPUs with high sustained all-core turbo frequencies, ideally > 3 GHz |
Other Components | Prefer same specs as above, but can run on more modest setups |
✅ Recommended CPU Baseline
For optimal performance, we recommend CPUs that meet the following criteria:
- At least 8 cores, preferably more
- Base clock speed ≥ 3.0 GHz
- Single Thread Performance (STP) index ≥ 2200 (refer to CPU Benchmark for reference)
ℹ️ Example:
- CPU: AMD Ryzen 7 5700G (8 cores / 16 threads, STP index: 3273)
- RAM: 128 GB DDR4
- SSD: Samsung 980 Pro 1 TB (PCIe 4.0 NVMe, high IOPS)
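A quick way to check whether an existing host meets this baseline (standard Linux commands, nothing Kairntech-specific):
nproc # logical core count
lscpu | grep -E 'Model name|^CPU\(s\)|MHz' # CPU model, core count and clock speeds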
🔧 Example #1: Hardware Recommendation for Running NER + Categorization Pipelines (with CPU)
🎯 Objective
Deploy Scikit-Learn, SpaCy, or Flair with or without contextual embeddings (e.g., BERT, RoBERTa) in a production environment. Note that Flair with contextual embeddings will run with low inference performance on CPU.
✅ Recommended Configuration
Component | Recommended Specification |
---|---|
CPU | AMD Ryzen 7 5700G (8 cores / 16 threads, STP index: 3273) |
RAM | 128 GB DDR4 |
Storage | 1 TB NVMe SSD (PCIe Gen4, high IOPS – e.g., Samsung 980 Pro) |
Operating System | Ubuntu 20.04.6 LTS or RHEL / CentOS 7 |
Containerization | Docker |
🧪 Example Realistic Setup
Server: Custom workstation or small-form server (e.g., ASUS ESC500 G6 or equivalent)
GPU: ❌ None (CPU-only setup)
CPU: AMD Ryzen 7 5700G (8 cores / 16 threads, STP index: 3273)
RAM: 128 GB DDR4
Disk: 1 TB NVMe SSD (PCIe Gen4, high IOPS – e.g., Samsung 980 Pro)
OS: Ubuntu 20.04.6 LTS or RHEL / CentOS 7
Containerization: Docker
Use Case: Mid-range NLP workloads (entity-fishing, basic NER/Categorization with Flair, CPU-only inference)
ℹ️ Note: This setup is ideal for CPU-only deployments of NLP services such as lightweight NER/classification pipelines using Flair with precomputed or static embeddings. The Ryzen 7 5700G offers excellent single-thread performance, which benefits components that are sensitive to per-core throughput. While powerful, this configuration may show limited performance for large Transformer-based embeddings (e.g., BERT) or heavy concurrent loads without GPU acceleration. For environments where latency or model size is a concern, this setup can serve as a staging or pre-production node, or as a low-cost inference server.
🔧 Example #2: Hardware Recommendation for Running NER + Categorization Pipelines in Production (with GPU)
🎯 Objective
Deploy Scikit-Learn, SpaCy, or Flair with contextual embeddings (e.g., BERT, RoBERTa) in a production environment with high inference performance.
Deploy Transformers in any environment.
✅ Recommended Configuration (with GPU)
Component | Recommended Specification |
---|---|
GPU | NVIDIA L4 (CUDA compute capability 8.9, 24 GB VRAM, optimized for NLP workloads) |
CPU | AMD Ryzen 9 5950X or Intel Xeon Silver 4314 (≥ 16 cores, ≥ 3.0 GHz base clock) |
RAM | 128 GB DDR4 or DDR5 |
Storage | 1 TB NVMe SSD (PCIe Gen4, high IOPS – e.g., Samsung 980 Pro) |
Operating System | Ubuntu 20.04.6 LTS or RHEL / CentOS 7 |
Containerization | Docker + NVIDIA Container Toolkit (for standardized deployment) |
Why the NVIDIA L4 (and not the L40)?
- The L4 is highly optimized for NLP inference workloads, with excellent performance-per-watt efficiency.
- Its 24 GB VRAM is sufficient for most large transformer models like bert-large or roberta-base, even with moderate batch sizes.
- It's cost-effective, fits in 1U/2U servers, and is widely adopted in production environments for inference APIs.
- The L40 (48 GB VRAM) is more powerful but often overkill for typical NLP tasks; it is better suited for image generation or large-scale model training.
🧪 Example Realistic Setup
Server: Dell PowerEdge R750xs or equivalent
GPU: NVIDIA L4 (24 GB VRAM)
CPU: 16 to 32 cores (e.g., AMD Ryzen 9 / Threadripper or Intel Xeon Silver)
RAM: 128 GB
Disk: 1 TB NVMe SSD (high throughput)
OS: Ubuntu 20.04.6 LTS
ℹ️ Note: This setup is designed for high-performance inference using contextual embeddings (e.g., BERT, RoBERTa) with Flair. It is also valid for the Transformers engine.
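Before deploying, you can confirm that containers actually see the GPU through the NVIDIA Container Toolkit (the CUDA image tag below is only an example):
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
# The output should list the L4 (or your GPU) exactly as on the host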
🔧 Example #3: Hardware Recommendation for Running entity-fishing Component
🎯 Objective
Deploy entity linking on texts in 3–4 languages, using Wikidata-based annotations.
✅ Recommended Configuration (without GPU)
Component | Recommended Spec |
---|---|
CPU | Intel Xeon Silver 4314 / AMD EPYC 7313P (≥ 16 cores, sustained all-core turbo ≥ 3 GHz) |
RAM | 128 GB DDR4 (multilingual + caching of Wikidata dumps) |
Disk | 1 TB NVMe SSD (very high read IOPS ≥ 10,000; Wikidata index access) |
GPU | ❌ Not required |
OS | Ubuntu 20.04.6 LTS or RHEL / CentOS 7 |
Network | 1 Gbps Ethernet (for inter-service communication and response time) |
ℹ️ Note: entity-fishing is CPU-bound and sensitive to disk latency and multithreading performance. No GPU needed.
🔧 Example #4: Hardware Recommendation for Local LLM Inference Server (with GPU)
🎯 Objective
Run a local Large Language Model (LLM) (e.g., Mistral 7B, LLaMA 3, Mixtral) for inference or lightweight fine-tuning.
✅ Recommended Configuration (with GPU)
Component | Recommended Spec |
---|---|
CPU | AMD Threadripper 7970X or Intel Xeon Gold 6426Y (24–32 cores) |
GPU | NVIDIA A100 80GB or RTX 6000 Ada (48 GB) depending on model size |
RAM | 128 GB DDR4 or DDR5 |
Disk | 2 TB NVMe SSD (PCIe Gen4) |
OS | Ubuntu 20.04.6 LTS |
LLM Frameworks | llama.cpp, vLLM, Hugging Face Transformers + Accelerate |
Quantized Models | Optional: GGUF or GPTQ formats to reduce GPU VRAM needs |
ℹ️ Note: This setup enables low-latency LLM inference for models up to 65B parameters (with quantization) or native inference for 7B–13B models.
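As an illustration of the quantized-model option, a GGUF model can be served with llama.cpp's built-in server (the model file name is hypothetical; -ngl 99 offloads all layers to the GPU):
./llama-server -m models/mistral-7b-instruct.Q4_K_M.gguf --port 8080 -ngl 99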
🔧 Example #5: Hardware Recommendation for Local Retrieval-Augmented Generation server including chatbot (with GPU)
🎯 Objective
A production-ready RAG chatbot running fully on-premise, capable of retrieving from a moderately large corpus (e.g., 500K docs) and generating answers using a local language model (e.g., Mistral 7B, LLaMA 3 8B).
✅ Recommended Configuration (with GPU)
Component | Specification |
---|---|
Server | Supermicro SYS-420GP-TNAR+ or equivalent workstation/server |
GPU | NVIDIA RTX 6000 Ada (48 GB VRAM) or NVIDIA A100 (80 GB VRAM) |
CPU | AMD Threadripper PRO 5975WX or Intel Xeon Gold 6430 (24–32 cores) |
RAM | 128 GB DDR4/DDR5 |
Disk | 2 TB NVMe SSD (PCIe Gen4, high IOPS) |
OS | Ubuntu 20.04.6 LTS or Ubuntu 22.04 |
Retriever Stack | FAISS, Weaviate, or Elasticsearch |
Model Host | vLLM, llama.cpp, or Transformers + Accelerate |
Containerization | Docker (optionally with Docker Compose or Kubernetes) |
ℹ️ Note: If you’re using quantized models (e.g., GGUF via llama.cpp), VRAM requirements drop significantly and you can run 7B models on cards like RTX 4090 (24 GB). The retriever’s performance depends heavily on RAM and disk speed, especially with larger corpora or high query throughput. For chatbots that require conversational context memory, consider integrating a vector cache or session-level context tracking. If using external LLM APIs, you can reduce the hardware footprint significantly — the GPU may not be needed at all.
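As a sketch of the model-host side, recent vLLM versions can expose a local model through an OpenAI-compatible API (the model name and port below are examples):
vllm serve mistralai/Mistral-7B-Instruct-v0.2 --port 8000
# The RAG pipeline then sends the retrieved context plus the user question to this endpoint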
✅ Summary Table
Use Case | GPU | CPU Cores | RAM | SSD | Notes |
---|---|---|---|---|---|
NER + Categorization Pipelines (CPU) | ❌ | 8+ | 128 GB | 1 TB NVMe | Low-performance inference with embeddings (Flair); Transformer engine not supported |
NER + Categorization Pipelines (GPU) | ✅ L4 | 16+ | 128 GB | 1 TB NVMe | High-performance inference (Flair), Transformer engines supported |
Entity Linking with entity-fishing engine (3–4 langs) | ❌ | 16+ | 128 GB | 1 TB NVMe | CPU-bound, high IOPS needed |
Local LLM Inference | ✅ A100/RTX | 24+ | 128 GB | 2 TB NVMe | LLaMA/Mistral, quantization optional |
Retrieval-Augmented Generation including chatbot | ✅ RTX 6000 or A100 | 24+ | 128 GB | 2 TB NVMe | The hardware needs vary depending on the retriever scale (size of corpus, vector DB engine) |
Installation steps
All commands listed below were run on Ubuntu 18.04 LTS x64.
- Host configuration prerequisites
- Kairntech platform Docker volumes prerequisites
- Kairntech platform installation
Host configuration prerequisites:
ELASTICSEARCH recommendation
You may need to increase the vm.max_map_count kernel parameter to avoid running out of map areas.
In order to avoid the following message:
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
It is recommended to edit the file /etc/sysctl.conf and insert the following lines:
# ES - at least 262144 for production use
vm.max_map_count=262144
Apply the modification using the following command:
sudo sysctl -p
INOTIFY recommendation
You may need to increase the fs.inotify.max_user_instances parameter to avoid reaching user limits on the number of inotify resources.
In order to avoid the following message:
[Errno 24] inotify instance limit reached
It is recommended to edit the file /etc/sysctl.conf and insert the following lines:
# Prevent [Errno 24] inotify instance limit reached
fs.inotify.max_user_instances = 65530
Apply the modification using the following command:
sudo sysctl -p
HAPROXY recommendation
You may need to set net.ipv4.ip_unprivileged_port_start to give the non-root haproxy user permission to bind to the privileged port 443.
In order to avoid the following messages (in the haproxy container console output):
[ALERT] (1) : Starting frontend http-in-sherpa: cannot bind socket (Permission denied) [0.0.0.0:443]
[ALERT] (1) : [haproxy.main()] Some protocols failed to start their listeners! Exiting.
It is recommended to edit the file /etc/sysctl.conf and insert the following lines:
# Enable haproxy to listen to 443
net.ipv4.ip_unprivileged_port_start=0
Apply the modification using the following command:
sudo sysctl -p
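After applying the three recommendations above, you can verify the resulting values in a single call:
sysctl vm.max_map_count fs.inotify.max_user_instances net.ipv4.ip_unprivileged_port_start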
User/Folders creation
USER creation
It is highly advised to create a specific user for the deployment of the platform:
# FOR A STANDARD USER
sudo adduser kairntech
# OR FOR A HEADLESS USER
sudo adduser --disabled-password --gecos "" kairntech
FOLDERS creation
It is highly advised to create specific folders for the deployment of the platform:
sudo mkdir -p /opt/sherpa
sudo chown -R kairntech. /opt/sherpa
mkdir -p ~/embeddings
mkdir -p ~/vectorizers
The prepared folders will contain the following:
Directory /opt/sherpa/ will store all files and folders relative to the platform (delivered by Kairntech):
- File docker-compose.yml, to be used to deploy/pull Docker images of the platform
- Folder sherpa-core, to be used to store authentication mechanism keys and deploy specific components
- Folder sherpa-haproxy, to be used in case redirections are set (optional)
Directory ~/embeddings will store all files required for the Embeddings volumes (delivered by Kairntech):
- File deploy-embeddings-flair.sh, to be used to deploy Flair embeddings
- File docker-compose.flair.volumes.yml, also used to deploy Flair embeddings
- File deploy-embeddings-fasttext.sh, to be used to deploy fastText embeddings
- File docker-compose.fasttext.volumes.yml, also used to deploy fastText embeddings
- File deploy-knowledge-entityfishing.sh, to be used to deploy entity-fishing knowledge
- File docker-compose.ef.volumes.yml, also used to deploy entity-fishing knowledge
Directory ~/vectorizers will store all files required for the Vectorizers volumes (delivered by Kairntech):
- File docker-compose.vectorizer.allminilml6v2.yml, to be used to deploy the allMiniLML6V2 model
- File docker-compose.vectorizer.multiminilml12v2.yml, to be used to deploy the multiMiniLML12V2 model
- File docker-compose.vectorizer.spdilacamembertgpl.yml, to be used to deploy the spDilaCamembert model
- File docker-compose.vectorizer.sentencecamembertbase.yml, to be used to deploy the sentenceCamembertBase model
Binaries installation
Docker / Docker Compose installation
Since the platform is based on Docker, please install Docker and the Docker Compose plugin.
The official page indicating the installation commands is located here.
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Then you will have to add the kairntech user to the docker group
sudo usermod -aG docker kairntech
As mentioned in the installation guide, log out and log back in so that your group membership is re-evaluated.
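Alternatively, if you prefer not to log out, a shell started with newgrp picks up the new group membership immediately (for that shell only):
newgrp docker
docker run hello-world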
If you want to test, open a new session terminal and run
sudo su - kairntech
docker run hello-world
After installing the compose plugin, you can test via:
sudo su - kairntech
docker compose version
Docker volumes to mount
In order to feed Docker volumes with embeddings files, some scripts will be provided (by Kairntech) in a zip file. You'll need the unzip binary on hand to uncompress them.
FLAIR embeddings
In order to fully utilize the Flair engine, "embeddings" files must be downloaded.
These static files are stored as Docker volumes. In order to download these items, please run:
sudo su - kairntech
cd ~/embeddings
# INSTALL AR, DE, EN AND FR
export FLAIR_LANGS=ar,de,en,fr
docker compose -f docker-compose.flair.volumes.yml -p volumes-flair up
# OR INSTALL ALL LANGUAGES
export FLAIR_LANGS=all
docker compose -f docker-compose.flair.volumes.yml -p volumes-flair up
Once deployed, you should get the following sizes (all languages)
sudo du -hs /var/lib/docker/volumes/sherpashared_flair_suggester_datasets
12K /var/lib/docker/volumes/sherpashared_flair_suggester_datasets
sudo du -hs /var/lib/docker/volumes/sherpashared_flair_suggester_embeddings
35G /var/lib/docker/volumes/sherpashared_flair_suggester_embeddings
The Docker container can be removed, once Flair embeddings are deployed, via:
docker rm flair-suggester-init-job
The table below gives disk usage required to deploy available languages:
Language | Size |
---|---|
Arabic (AR) | 2.9G |
German (DE) | 4.3G |
English (EN) | 3.8G |
Spanish (ES) | 4.2G |
Farsi (FA) | 768M |
French (FR) | 4.2G |
Hindi (HI) | 1.0G |
Italian (IT) | 3.9G |
Dutch (NL) | 3.9G |
Portuguese (PT) | 2.7G |
Russian (RU) | 4.1G |
Chinese (ZH) | 1.6G |
All | 35G |
FASTTEXT embeddings
In order to fully utilize the fastText engine, "embeddings" files must be downloaded.
These static files are stored as Docker volumes. In order to download these items, please run:
sudo su - kairntech
cd ~/embeddings
# INSTALL AR, DE, EN AND FR
export FASTTEXT_LANGS=ar,de,en,fr
docker compose -f docker-compose.fasttext.volumes.yml -p volumes-fasttext up
# OR INSTALL ALL LANGUAGES
export FASTTEXT_LANGS=all
docker compose -f docker-compose.fasttext.volumes.yml -p volumes-fasttext up
Once deployed, you should get the following sizes (all languages)
sudo du -hs /var/lib/docker/volumes/sherpashared_fasttext_suggester_embeddings/
29G /var/lib/docker/volumes/sherpashared_fasttext_suggester_embeddings/
The Docker container can be removed, once fastText embeddings are deployed, via:
docker rm fasttext-suggester-init-job
The table below gives disk usage required to deploy available languages:
Language | Size |
---|---|
Arabic (AR) | 1.5G |
German (DE) | 5.6G |
English (EN) | 6.2G |
Spanish (ES) | 2.5G |
French (FR) | 2.9G |
Italian (IT) | 2.2G |
Japanese (JA) | 1.3G |
Portuguese (PT) | 1.5G |
Russian (RU) | 4.7G |
Chinese (ZH) | 822M |
All | 29G |
ENTITY-FISHING knowledge
In order to fully utilize the entity-fishing engine, "knowledge" files must be downloaded.
These static files are generated every month, and stored as Docker volumes. In order to download these items, please run:
sudo su - kairntech
cd ~/embeddings
# INSTALL AR, DE, EN AND FR
export EF_LANGS=ar,de,en,fr
export EF_DATE=02-03-2023
docker compose -f docker-compose.ef.volumes.yml -p volumes-ef up
# OR INSTALL ALL LANGUAGES
export EF_LANGS=all
export EF_DATE=02-03-2023
docker compose -f docker-compose.ef.volumes.yml -p volumes-ef up
Once deployed, you should get the following sizes (all languages)
sudo du -hs /var/lib/docker/volumes/sherpa_entityfishing_data
100G /var/lib/docker/volumes/sherpa_entityfishing_data
The Docker container can be removed, once entity-fishing knowledge is deployed, via:
docker rm entity-fishing-init-job
The table below gives disk usage required to deploy available languages:
Language | Size |
---|---|
Arabic (AR) | 36.7G (3.7G + 33G) |
German (DE) | 40.0G (6.0G + 33G) |
English (EN) | 49G (16G + 33G) |
Spanish (ES) | 37.4G (4.4G + 33G) |
Farsi (FA) | 36.5G (3.5G + 33G) |
French (FR) | 38.6G (5.6G + 33G) |
Italian (IT) | 36.9G (3.9G + 33G) |
Japanese (JA) | 36.6G (3.6G + 33G) |
Portuguese (PT) | 35.8G (2.8G + 33G) |
Russian (RU) | 39.4G (6.4G + 33G) |
Chinese (ZH) | 36.1G (3.1G + 33G) |
Ukrainian (UA) | 36.6G (3.6G + 33G) |
Hindi (HI) | 33.5G (455M + 33G) |
Swedish (SE) | 37.2G (4.2G + 33G) |
Bengali (BD) | 33.7G (700M + 33G) |
All | 100G (67G + 33G) |
In these metrics, the common knowledge takes 33G of disk usage and is mandatory.
VECTORIZERS
In order to fully utilize the vectorizers, language model files must be downloaded.
These static files are stored as Docker volumes. In order to download these items, please run:
sudo su - kairntech
cd ~/vectorizers
# INSTALL allMiniLML6V2
docker compose -f docker-compose.vectorizer.allminilml6v2.yml -p allminilml6v2 up
# INSTALL multiMiniLML12V2
docker compose -f docker-compose.vectorizer.multiminilml12v2.yml -p multiminilml12v2 up
# INSTALL spDilaCamembert
docker compose -f docker-compose.vectorizer.spdilacamembertgpl.yml -p spdilacamembertgpl up
# INSTALL sentenceCamembertBase
docker compose -f docker-compose.vectorizer.sentencecamembertbase.yml -p sentencecamembertbase up
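Once the compose runs complete, you can list the created volumes to confirm the models were downloaded (exact volume names depend on the compose files delivered by Kairntech):
docker volume ls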
Kairntech platform installation
As a first step, in order to configure JWT authentication, run the following commands:
sudo su - kairntech
cd /opt/sherpa/sherpa-core/jwt
## In order to generate private.pem
openssl genrsa -out private.pem 2048
## In order to generate private_key.pem
openssl pkcs8 -topk8 -inform PEM -in private.pem -out private_key.pem -nocrypt
## In order to generate public.pem
openssl rsa -in private.pem -outform PEM -pubout -out public.pem
This will generate 3 files:
- private.pem, to be kept in a safe place
- private_key.pem, to be used in the sherpa-core/jwt folder
- public.pem, to be used in the sherpa-core/jwt folder
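You can sanity-check the generated key pair before starting the platform (standard openssl commands, not Kairntech-specific):
openssl rsa -in private.pem -check -noout # should report that the key is valid
openssl pkey -pubin -in public.pem -text -noout | head -n 1 # should show a 2048-bit public key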
Then, in order to download the different images needed to install the platform, you must first log in to Docker Hub. (The password to be used will be delivered by Kairntech.)
sudo su - kairntech
cd /opt/sherpa
docker login
username: ktguestkt
password:
Once logged in, you can start downloading the images:
docker compose -f docker-compose.yml pull
Finally, to start the platform, run:
docker compose -f docker-compose.yml up -d
Once the platform is started, you can check the status of the containers; the following console output is given as an example. Some containers may not be present, depending on the kind of deployment you performed.
docker ps -a --format "{{.ID}}\t\t{{.Names}}\t\t{{.Status}}"
79e235f82787 sherpa-core Up 20 sec
e69f95855809 sherpa-crfsuite-suggester Up 20 sec
c9d95639c808 sherpa-entityfishing-suggester Up 20 sec
94e4574b95de sherpa-fasttext-suggester Up 20 sec
8f13e72aeb0d sherpa-phrasematcher-test-suggester Up 20 sec
0f49dec91340 sherpa-phrasematcher-train-suggester Up 20 sec
aa08f1008770 sherpa-sklearn-test-suggester Up 20 sec
988976ef327d sherpa-sklearn-train-suggester Up 20 sec
bed6169d9185 sherpa-spacy-test-suggester Up 20 sec
302bd98a44ab sherpa-spacy-train-suggester Up 20 sec
7754162ae44c sherpa-flair-test-suggester Up 20 sec
08d1ad415adb sherpa-flair-train-suggester Up 20 sec
4835129a77c9 sherpa-bertopic-test-suggester Up 20 sec
b999a848044c sherpa-bertopic-train-suggester Up 20 sec
0826e0dd9c85 sherpa-elasticsearch Up 20 sec
7f781bf11ddf sherpa-mongodb Up 20 sec
d3b0e0557309 sherpa-builtins-importer Up 20 sec
cf075d3b06f4 sherpa-multirole Up 20 sec
ae1b24e0ccdb sherpa-pymultirole Up 20 sec
2a737b399388 sherpa-pymultirole-trf Up 20 sec
f43121e96544 sherpa-pymultirole-ner Up 20 sec
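If a container is missing or keeps restarting, the compose logs are the first place to look (assuming the core service is named sherpa-core in the delivered compose file):
docker compose -f docker-compose.yml logs -f sherpa-core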