transformer-inference-overhead

Here are 3 public repositories matching this topic.

maestrosalah-dev / relational-time-engine

Relational Time Engine (RTE): runtime density regulation for compute-efficient transformer inference. Demonstrates up to 75% layer reduction with improved latency and throughput.

python benchmark machine-learning deep-learning transformer event-driven energy-efficiency ai-systems inference-optimization early-exit transformer-architecture green-ai ai-optimization runtime-systems efficient-ai energy-efficient-ai transformer-inference-overhead runtime-gating relational-time

Updated Mar 12, 2026
Python

edlansiaux / swiftembed-benchmarks

Repository of benchmarking scripts for the SwiftEmbed embedding system, a static token lookup approach for ultra-low latency text embeddings.

nlp machine-learning lua token embedding ultra-low-latency text-embeddings real-time-applications rust-implementation static-token-lookup transformer-inference-overhead mean-pooling

Updated Oct 30, 2025
Lua

ha-196120 / swiftembed-benchmarks

🚀 Evaluate SwiftEmbed's performance with benchmarking scripts for ultra-fast text embeddings using static token lookup methods.

nlp machine-learning lua token embedding ultra-low-latency text-embeddings real-time-applications rust-implementation static-token-lookup transformer-inference-overhead mean-pooling

Updated May 4, 2026
Lua