Time. 2025–present
Affiliation. The University of Hong Kong (Final Year Project)
Role. FYP owner (architecture + implementation)
Tagline. P2P collaborative inference platform that aims to stitch consumer GPUs into one logical accelerator.
Summary. ParaMind is an ongoing FYP exploring how heterogeneous devices within a household can jointly serve large language models (LLMs) by combining pipeline/tensor parallelism, NAT traversal, and transparent scheduling. The current focus is on designing the end-to-end architecture and building a working multi-peer prototype.
Highlights.
Uses big_modeling offload to maximize single-node capacity while coordinating multi-peer execution with static cut points and KV-cache reuse.
Keywords. Distributed inference, P2P, pipeline parallelism, systems for ML.
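Illustration. The idea of static cut points can be sketched as splitting a model's layers into contiguous ranges, one per peer, fixed before inference starts. The sketch below is not taken from the ParaMind code: the function name static_cut_points, the peer_mem_gb parameter, and the memory-proportional split are all assumptions made for illustration only.

```python
from typing import List, Tuple


def static_cut_points(num_layers: int, peer_mem_gb: List[float]) -> List[Tuple[int, int]]:
    """Assign each peer a contiguous layer range [start, end).

    Hypothetical helper: ranges are sized roughly in proportion to each
    peer's memory (an assumed policy, not the project's documented one).
    """
    num_peers = len(peer_mem_gb)
    if num_peers == 0 or num_layers < num_peers:
        raise ValueError("need at least one layer per peer")

    total_mem = sum(peer_mem_gb)
    ranges: List[Tuple[int, int]] = []
    start = 0
    cumulative = 0.0
    for i, mem in enumerate(peer_mem_gb):
        cumulative += mem
        if i == num_peers - 1:
            end = num_layers  # last peer takes whatever remains
        else:
            end = round(num_layers * cumulative / total_mem)
        # Guarantee at least one layer for this peer...
        end = max(end, start + 1)
        # ...and leave at least one layer for every peer still to come.
        end = min(end, num_layers - (num_peers - 1 - i))
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    # A 32-layer model split across a 24 GB peer and an 8 GB peer.
    print(static_cut_points(32, [24.0, 8.0]))  # [(0, 24), (24, 32)]
```

Because the ranges are computed once and never change, each peer only ever sees the same layers, so under this assumed scheme it could keep the KV cache for those layers locally across decoding steps.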
Links. Work in progress – FYP repo, report, and slides to be released after project completion.