Sangwoo Kwon
Ph.D. Student
Seoul National University

I am a fifth-year Ph.D. student in the Department of Computer Science and Engineering at Seoul National University. I am a member of the Architecture and Code Optimization (ARC) Lab, led by Professor Jae W. Lee. My research focuses on the efficient deployment of deep learning models and LLMs, including efficient computation algorithms and quantization techniques.

Education
  • Seoul National University
    Department of Computer Science and Engineering
    Ph.D. Student
    Mar. 2022 - present

    B.S. in Computer Science and Engineering
    Mar. 2018 - Feb. 2022

Teaching Experience
  • System Programming (M1522.000800)
    Spring, 2024
  • Logic Design (M1522.000700)
    Spring, 2020
Services
  • Student Volunteer, International Symposium on Code Generation and Optimization (CGO)
    2022
Publications
DP-LLM: Runtime Model Adaptation with Dynamic Layer-wise Precision Assignment

Sangwoo Kwon, Seong Hoon Seo, Jae W. Lee, Yeonhong Park

39th Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

A novel mechanism that dynamically assigns precision to each layer based on input values. DP-LLM augments each linear layer in an LLM with a precision selector that determines the bitwidth at runtime using a lightweight error estimator and threshold values learned through fine-tuning.

KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction

Jang-Hyun Kim, Jinuk Kim, Sangwoo Kwon, Jae W. Lee, Sangdoo Yun, Hyun Oh Song

39th Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

A query-agnostic KV cache eviction method enabling effective reuse of compressed KV caches across diverse queries. KVzip quantifies the importance of a KV pair using the underlying LLM to reconstruct original contexts from cached KV pairs, subsequently evicting pairs with lower importance.

A 40nm 5.6TOPS/W 239GOPS/mm² Self-Attention Processor with Sign Random Projection-based Approximation

Seong Hoon Seo, Soosung Kim, Sung Jun Jung, Sangwoo Kwon, Hyunseung Lee, Jae W. Lee

48th European Solid-State Circuits Conference (ESSCIRC), 2022

A novel self-attention accelerator that skips most of the computation by utilizing an approximate candidate selection algorithm. Implemented in 40nm CMOS technology, the 5.64 mm² chip operates at 100-600 MHz and consumes 48.3-685 mW, achieving energy and area efficiencies of 0.354-5.61 TOPS/W and 239 GOPS/mm², respectively.
