Speculative Speculative Decoding: Hiding Draft Latency Through Asynchronous Speculation on Multi-GPU Systems

Course project for CMU 15-418, Parallel Computer Architecture and Programming, by Marcus Alenius and William Chien.