# langchain-hana-cache

Semantic caching for LLM responses on SAP HANA Cloud.

Stores prompt embeddings and LLM responses in HANA Cloud. When a semantically similar prompt comes in, the cache returns the stored response instead of calling the LLM, saving tokens and reducing latency.

## How it works
- User sends a prompt to the LLM
- The cache embeds the prompt using the configured embedding model
- Searches HANA for cached entries using `COSINE_SIMILARITY` on a `REAL_VECTOR` column
- If similarity exceeds the threshold (default 0.95), returns the cached response with no LLM call
- If no match, calls the LLM normally, caches the prompt embedding + response, returns the response
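Conceptually, the hit/miss decision is a cosine-similarity comparison against the threshold. In the package this comparison runs inside HANA via `COSINE_SIMILARITY`; a plain-Python sketch of the same check (the helper names here are illustrative, not part of the package API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_cache_hit(query_vec: list[float], cached_vec: list[float],
                 threshold: float = 0.95) -> bool:
    """A cached entry counts as a hit when similarity meets the threshold."""
    return cosine_similarity(query_vec, cached_vec) >= threshold
```

Raising the threshold trades recall (fewer hits) for precision (fewer wrong hits); lowering it does the opposite.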
## Installation

```bash
pip install langchain-hana-cache
```

## Quickstart

```python
import hdbcli.dbapi

from langchain_hana_cache import HANASemanticLLMCache
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.globals import set_llm_cache

connection = hdbcli.dbapi.connect(
    address="your-host.hanacloud.ondemand.com",
    port=443,
    user="DBADMIN",
    password="your-password",
    encrypt=True,
)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

cache = HANASemanticLLMCache(
    connection=connection,
    embedding=embeddings,
    table_name="LLM_CACHE",
    similarity_threshold=0.95,
    ttl_seconds=86400,
)
set_llm_cache(cache)

llm = ChatOpenAI(model="gpt-4o")
response1 = llm.invoke("What are the reporting requirements for article 12?")
response2 = llm.invoke("Tell me about article 12 reporting requirements")  # cache hit
```

## Direct cache API

```python
from langchain_core.outputs import Generation

# Store a response
cache.update(
    "What is the capital of France?",
    "gpt-4o",
    [Generation(text="The capital of France is Paris.")],
)

# Look up a semantically similar prompt
result = cache.lookup("Tell me the capital of France", "gpt-4o")
# result = [Generation(text="The capital of France is Paris.")]
```

## Cache maintenance

```python
# Remove entries older than the configured TTL
cache.evict_expired()

# Keep only the 1000 most recently accessed entries
cache.evict_lru(max_entries=1000)

# Clear all cached entries
cache.clear()
```

## Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| `connection` | `hdbcli.dbapi.Connection` | required | HANA database connection |
| `embedding` | `Embeddings` | required | LangChain embedding model for encoding prompts |
| `table_name` | `str` | `"LLM_CACHE"` | Name of the cache table |
| `similarity_threshold` | `float` | `0.95` | Minimum cosine similarity for a cache hit |
| `ttl_seconds` | `int \| None` | `None` | Time-to-live in seconds (`None` = no expiry) |
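The `ttl_seconds` semantics can be sketched as follows: an entry is expired once its age exceeds the TTL, and `None` disables expiry entirely. This is an illustrative helper, not the package's internal implementation:

```python
import time
from typing import Optional

def is_expired(stored_at: float, ttl_seconds: Optional[int],
               now: Optional[float] = None) -> bool:
    """True once the entry's age exceeds the TTL; None means never expire."""
    if ttl_seconds is None:
        return False
    now = time.time() if now is None else now
    return now - stored_at > ttl_seconds
```

Expired entries are what `cache.evict_expired()` removes from the table.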
## Development

```bash
git clone https://github.com/stubborncoder/langchain-hana-cache.git
cd langchain-hana-cache
pip install -e ".[dev]"

# Run unit tests
pytest tests/test_utils.py tests/test_llm_cache.py -v

# Run integration tests (requires HANA credentials in .env)
pytest tests/test_integration.py -v
```

## License

MIT
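The integration tests read HANA credentials from a `.env` file. The exact variable names are defined by the test suite; a hypothetical layout might look like:

```bash
# .env — variable names are hypothetical; check the test suite for the real ones
HANA_HOST=your-host.hanacloud.ondemand.com
HANA_PORT=443
HANA_USER=DBADMIN
HANA_PASSWORD=your-password
```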