langchain-hana-cache

Semantic caching for LLM responses on SAP HANA Cloud.

Stores prompt embeddings and LLM responses in HANA Cloud. When a semantically similar prompt comes in, it returns the cached response instead of calling the LLM — saving tokens and reducing latency.

How it works

  1. A user sends a prompt to the LLM.
  2. The cache embeds the prompt with the configured embedding model.
  3. It searches HANA for cached entries using COSINE_SIMILARITY on a REAL_VECTOR column.
  4. If the best match exceeds the similarity threshold (default 0.95), the cached response is returned and no LLM call is made.
  5. Otherwise the LLM is called normally, the prompt embedding and response are cached, and the fresh response is returned.
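The flow above can be sketched with a toy in-memory cache. This is an illustration only: the real library stores embeddings in a HANA REAL_VECTOR column and computes COSINE_SIMILARITY in SQL, and the `embed` callable here is a hypothetical stand-in for the configured embedding model.

```python
import math

def cosine_similarity(a, b):
    # Plain-Python cosine similarity; HANA's COSINE_SIMILARITY does this in SQL.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class ToySemanticCache:
    """In-memory sketch of the semantic-cache lookup/update flow."""

    def __init__(self, embed, threshold=0.95):
        self.embed = embed          # callable: str -> list[float]
        self.threshold = threshold
        self.entries = []           # list of (embedding, response) pairs

    def lookup(self, prompt):
        # Embed the incoming prompt and scan for the most similar entry.
        vec = self.embed(prompt)
        best_response, best_sim = None, -1.0
        for emb, response in self.entries:
            sim = cosine_similarity(vec, emb)
            if sim > best_sim:
                best_response, best_sim = response, sim
        # Only similarities at or above the threshold count as a hit.
        return best_response if best_sim >= self.threshold else None

    def update(self, prompt, response):
        # Cache the prompt embedding together with the response.
        self.entries.append((self.embed(prompt), response))
```

Paraphrased prompts that embed close together hit the same entry, which is exactly why the threshold (step 4) controls the precision/recall trade-off of the cache.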

Installation

pip install langchain-hana-cache

Usage

As LangChain global cache

import hdbcli.dbapi
from langchain_hana_cache import HANASemanticLLMCache
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.globals import set_llm_cache

connection = hdbcli.dbapi.connect(
    address="your-host.hanacloud.ondemand.com",
    port=443,
    user="DBADMIN",
    password="your-password",
    encrypt=True,
)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

cache = HANASemanticLLMCache(
    connection=connection,
    embedding=embeddings,
    table_name="LLM_CACHE",
    similarity_threshold=0.95,
    ttl_seconds=86400,
)

set_llm_cache(cache)

llm = ChatOpenAI(model="gpt-4o")
response1 = llm.invoke("What are the reporting requirements for article 12?")
response2 = llm.invoke("Tell me about article 12 reporting requirements")  # cache hit

Manual usage

from langchain_core.outputs import Generation

# Store a response
cache.update(
    "What is the capital of France?",
    "gpt-4o",
    [Generation(text="The capital of France is Paris.")],
)

# Look up a similar prompt
result = cache.lookup("Tell me the capital of France", "gpt-4o")
# result = [Generation(text="The capital of France is Paris.")]
# A prompt with no sufficiently similar cached entry returns None (cache miss)

Eviction

# Remove entries older than TTL
cache.evict_expired()

# Keep only the 1000 most recently accessed entries
cache.evict_lru(max_entries=1000)

# Clear all cached entries
cache.clear()
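The two eviction policies can be expressed as simple filters. This is a hedged in-memory sketch, not the library's implementation (which issues DELETEs against the HANA table); the field names `created_at` and `last_accessed` are illustrative assumptions, not the actual column names.

```python
import time

def evict_expired(entries, ttl_seconds, now=None):
    # TTL eviction: keep only entries younger than ttl_seconds.
    # `created_at` is an assumed per-entry timestamp (seconds since epoch).
    now = time.time() if now is None else now
    return [e for e in entries if now - e["created_at"] < ttl_seconds]

def evict_lru(entries, max_entries):
    # LRU eviction: keep the max_entries most recently accessed entries.
    ranked = sorted(entries, key=lambda e: e["last_accessed"], reverse=True)
    return ranked[:max_entries]
```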

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| connection | hdbcli.dbapi.Connection | required | HANA database connection |
| embedding | Embeddings | required | LangChain embedding model for encoding prompts |
| table_name | str | "LLM_CACHE" | Name of the cache table |
| similarity_threshold | float | 0.95 | Minimum cosine similarity for a cache hit |
| ttl_seconds | int &#124; None | None | Time-to-live in seconds (None = no expiry) |

Development

git clone https://github.com/stubborncoder/langchain-hana-cache.git
cd langchain-hana-cache
pip install -e ".[dev]"

# Run unit tests
pytest tests/test_utils.py tests/test_llm_cache.py -v

# Run integration tests (requires HANA credentials in .env)
pytest tests/test_integration.py -v

License

MIT
