Homepage - Sangwoo Kwon

Sangwoo Kwon

Ph.D. Student
Seoul National University

kwonsw055(at)snu.ac.kr

I am a fifth-year Ph.D. student in the Department of Computer Science and Engineering at Seoul National University. I am a member of the Architecture and Code Optimization (ARC) Lab, led by Professor Jae W. Lee. My research focuses on the efficient deployment of deep learning models and LLMs, including efficient computation algorithms and quantization techniques.

Education

Seoul National University

Department of Computer Science and Engineering

Ph.D. Student

Mar. 2022 - present

B.S. in Computer Science and Engineering

Mar. 2018 - Feb. 2022

Teaching Experience

System Programming (M1522.000800)

Spring, 2024

Logic Design (M1522.000700)

Spring, 2020

Services

Student Volunteer, International Symposium on Code Generation and Optimization (CGO)

2022

Publications

DP-LLM: Runtime Model Adaptation with Dynamic Layer-wise Precision Assignment

Sangwoo Kwon, Seong Hoon Seo, Jae W. Lee, Yeonhong Park

39th Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

A novel mechanism that dynamically assigns precision to each layer based on input values. DP-LLM augments each linear layer in an LLM with a precision selector that determines the bitwidth at runtime using a lightweight error estimator and threshold values learned through fine-tuning.

[Paper]

KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction

Jang-Hyun Kim, Jinuk Kim, Sangwoo Kwon, Jae W. Lee, Sangdoo Yun, Hyun Oh Song

39th Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

A query-agnostic KV cache eviction method enabling effective reuse of compressed KV caches across diverse queries. KVzip quantifies the importance of a KV pair using the underlying LLM to reconstruct original contexts from cached KV pairs, subsequently evicting pairs with lower importance.

[Paper]

A 40nm 5.6TOPS/W 239GOPS/mm² Self-Attention Processor with Sign Random Projection-based Approximation

Seong Hoon Seo, Soosung Kim, Sung Jun Jung, Sangwoo Kwon, Hyunseung Lee, Jae W. Lee

48th European Solid-State Circuits Conference (ESSCIRC), 2022

A novel self-attention accelerator that skips most of the computation by utilizing an approximate candidate selection algorithm. Implemented in 40nm CMOS technology, the 5.64 mm² chip operates at 100-600 MHz, consuming 48.3-685 mW, and achieves energy and area efficiencies of 0.354-5.61 TOPS/W and 239 GOPS/mm², respectively.

[Paper]
