Nvidia's KV Cache Transform Coding (KVTC) compresses the LLM key-value (KV) cache by 20x without model changes, cutting GPU memory costs and reducing time-to-first-token by up to 8x for multi-turn AI applications.
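The snippet above doesn't include implementation details, but "transform coding" conventionally means projecting data onto a decorrelating basis, truncating low-energy components, and quantizing what remains. A minimal sketch of that pipeline on a toy KV tensor, assuming a PCA/SVD basis and int8 quantization (all names and shapes here are illustrative, not Nvidia's actual method), might look like:

```python
import numpy as np

# Toy "KV cache": seq_len x head_dim values for one attention head.
# Real KV activations are far more structured (hence more compressible)
# than this random tensor, so treat the numbers below as illustrative only.
rng = np.random.default_rng(0)
kv = rng.standard_normal((512, 64)).astype(np.float32)

# 1. Transform: project onto a data-driven orthogonal basis (SVD/PCA).
mean = kv.mean(axis=0)
u, s, vt = np.linalg.svd(kv - mean, full_matrices=False)

# 2. Truncate: keep only the top-k basis directions.
k = 16
coeffs = (kv - mean) @ vt[:k].T          # (512, k) transform coefficients

# 3. Quantize: store coefficients as int8 with a per-tensor scale.
scale = np.abs(coeffs).max() / 127.0
q = np.round(coeffs / scale).astype(np.int8)

# Decode: dequantize and invert the transform.
recon = (q.astype(np.float32) * scale) @ vt[:k] + mean

orig_bytes = kv.nbytes                                  # float32 storage
comp_bytes = q.nbytes + vt[:k].nbytes + mean.nbytes     # int8 + basis + mean
print("compression ratio: %.1fx" % (orig_bytes / comp_bytes))
print("reconstruction MSE: %.4f" % float(np.mean((recon - kv) ** 2)))
```

On this random input the ratio lands near 10x with visible reconstruction error; the article's 20x figure presumably comes from the strong redundancy in real KV activations plus a more sophisticated codec than this sketch.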
With the widespread deployment of long-context large language models (LLMs), efficient, high-quality generation is becoming increasingly important. Modern LLMs employ batching and key-value (KV) caching ...
Abstract: Large language models (LLMs) are increasingly utilized for complex tasks requiring longer context lengths, with some models supporting up to 128K or 1M tokens. This trend, however, presents ...