Ollama RAG

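An “Ollama RAG app” is a web service that uses Ollama and a vector database to provide Retrieval-Augmented Generation (RAG). Before the model answers, relevant passages from your own documents are retrieved and added to its prompt. Here we set up a local LLM instance with Ollama and use ChromaDB to store and retrieve those passages.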

Rough notes from an initial test, mostly based on a write-up from HackerNoon.

# Check that we have Video Card Support
lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)

# Verify that the Quadro's CUDA compute capability is 5.0 or higher (Ollama's minimum for NVIDIA GPUs) at https://developer.nvidia.com/cuda-gpus
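If you want to script the GPU check, here is a minimal Python sketch that extracts the NVIDIA controllers from `lspci` output. The function name `nvidia_controllers` is hypothetical. Matching "3D controller" lines as well as "VGA compatible controller" lines is an assumption, because some laptops list the discrete GPU that way. The sketch does not determine compute capability, so that still has to be looked up in NVIDIA's table.

```python
import re


def nvidia_controllers(lspci_output: str) -> list[str]:
    """Return the device descriptions of NVIDIA display controllers in `lspci` output."""
    names = []
    for line in lspci_output.splitlines():
        # Expected shape: "01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [...]"
        match = re.match(r"^\S+\s+(?:VGA compatible|3D) controller:\s+(.*)$", line)
        if match and match.group(1).startswith("NVIDIA"):
            names.append(match.group(1))
    return names
```

Feed it `subprocess.run(["lspci"], capture_output=True, text=True).stdout`. With the output captured above, it keeps only the Quadro line and skips the Intel integrated graphics.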

# Install Ollama as per https://github.com/ollama/ollama/blob/main/README.md#quickstart

curl -fsSL https://ollama.com/install.sh | sh

ollama run llama3.2
 
>>> what is your knowledge cutoff?
My knowledge cutoff is currently December 2023. This means that I have information up to that date, but I may not be aware of events, updates, or developments that have occurred after that time.
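The RAG app needs to reach the model from code rather than through the interactive prompt. Here is a minimal standard-library sketch. It assumes Ollama's default address `http://localhost:11434` and its `/api/generate` endpoint with streaming turned off, so the whole answer arrives in one JSON object under `"response"`. The helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def build_generate_request(prompt: str, model: str = "llama3.2",
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2", base_url: str = OLLAMA_URL,
             timeout: float = 120.0) -> str:
    """Send the prompt and return the model's text from the "response" field."""
    request = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running, `generate("what is your knowledge cutoff?")` should return an answer similar to the interactive one above.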

Install ChromaDB and the other Python dependencies, then pull an embedding model for Ollama to serve.

# Install Python dependencies
pip install -q chromadb
pip install -q unstructured langchain langchain-text-splitters
pip install -q "unstructured[all-docs]"
pip install -q flask

# Install the text embedding model
ollama pull nomic-embed-text
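Once `nomic-embed-text` is pulled, Ollama can turn text into vectors over HTTP. This sketch assumes a recent Ollama release that exposes `/api/embed`, which takes an `input` list and returns an `embeddings` list. Older releases may only offer `/api/embeddings`, which takes a single `prompt`. The helper names are hypothetical, and the app in these notes may call the model through a library wrapper instead.

```python
import json
import urllib.request


def build_embed_request(texts: list[str], model: str = "nomic-embed-text",
                        base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST to Ollama's /api/embed endpoint for a batch of texts."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: list[str], model: str = "nomic-embed-text",
          base_url: str = "http://localhost:11434", timeout: float = 60.0) -> list[list[float]]:
    """Return one embedding vector per input text, in the same order."""
    with urllib.request.urlopen(build_embed_request(texts, model, base_url), timeout=timeout) as resp:
        return json.loads(resp.read())["embeddings"]
```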

# Is Ollama running? The install script seems to register it as a service (systemd on Linux), so it should already be up.
curl localhost:11434
Ollama is running
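Here is the same check from Python, as a minimal sketch. It treats a refused connection, a timeout, or a non-200 reply as "not running". The function name and the 2-second default timeout are assumptions.

```python
import urllib.request


def is_ollama_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if GET / answers 200 with Ollama's "Ollama is running" banner."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200 and b"Ollama is running" in resp.read()
    except OSError:
        # URLError (refused/DNS), HTTPError (non-2xx) and socket timeouts all subclass OSError.
        return False
```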


# Add a Markdown document about the holiday schedule through the app's /embed endpoint on port 8080

curl --request POST \
  --url http://localhost:8080/embed \
  --header 'Content-Type: multipart/form-data' \
  --form file=@/fall_schedule.md
  
{
  "message": "File embedded successfully"
}
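Before a document like this can be searched, the app has to split it into chunks and embed each one. The `langchain-text-splitters` package installed earlier provides splitters for that. The sketch below is a simplified stand-in, not LangChain's algorithm. It cuts fixed-size character windows with a small overlap, so text near a boundary appears in both neighbouring chunks. The function name and the default sizes are arbitrary assumptions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and stored in ChromaDB alongside its text.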

# Ask it a question about an event

curl --request POST \
  --url http://localhost:8080/query \
  --header 'Content-Type: application/json' \
  --data '{ "query": "When is fall break?" }'
{
  "message": "Fall break occurs from October 9-12."
}
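Answering a query like this one takes two steps. First, find the stored chunks whose embeddings are closest to the question's embedding. Then pass those chunks to the model along with the question. In the app, ChromaDB performs that search. The sketch below replaces it with an in-memory cosine-similarity ranking to illustrate the idea. The prompt wording and all function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the texts of the k (text, vector) chunks most similar to query_vec, best first."""
    ranked = sorted(chunks, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Hypothetical RAG prompt: retrieved context first, then the question."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The resulting string would then go to the model, for example as the `prompt` of an `/api/generate` call. The model can only answer "When is fall break?" correctly because the schedule chunk was placed in its context.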