Details

Description
Petals is a decentralized platform that lets users run large language models collaboratively. Each user loads a small part of the model and works together with others who serve the remaining parts, which makes it possible to run inference or fine-tuning on these models. Single-batch inference runs at approximately 1 second per step, up to 10 times faster than running the model locally with offloading, which is fast enough for interactive applications such as chatbots. Petals is also more flexible than traditional language model APIs: users can apply various fine-tuning and sampling methods, execute custom paths through the model, or access its hidden states. In short, it combines the convenience of an API with the flexibility of PyTorch. The tool is currently in development, and interested users can join the waitlist to try it out.
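The core idea, where each participant serves only a slice of the model and activations travel from peer to peer, can be illustrated with a toy, pure-Python simulation. This is not the Petals API: `Block`, `Server`, `find_route`, and the greedy routing rule are hypothetical stand-ins, and each "block" is a simple affine map rather than a real transformer layer. The sketch shows that chaining servers over contiguous block ranges gives the same output as running the whole model on one machine, and that the client can inspect the hidden states between hops.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class Block:
    """Toy stand-in for one transformer block: h -> scale * h + shift (hypothetical)."""
    scale: float
    shift: float

    def __call__(self, h: Vector) -> Vector:
        return [self.scale * x + self.shift for x in h]


@dataclass
class Server:
    """Hypothetical peer that serves the contiguous block range [start, end)."""
    name: str
    start: int
    end: int
    blocks: List[Block]

    def forward(self, h: Vector, first: int) -> Vector:
        # Run only the hosted blocks from index `first` up to `end`.
        for block in self.blocks[first - self.start:]:
            h = block(h)
        return h


def make_server(name: str, model: List[Block], start: int, end: int) -> Server:
    # Each peer holds only its slice of the model, not the whole thing.
    return Server(name, start, end, model[start:end])


def find_route(servers: List[Server], num_blocks: int) -> List[Tuple[Server, int]]:
    """Greedily chain servers so blocks 0..num_blocks-1 each run exactly once, in order."""
    route, pos = [], 0
    while pos < num_blocks:
        candidates = [s for s in servers if s.start <= pos < s.end]
        if not candidates:
            raise RuntimeError(f"no server hosts block {pos}")
        best = max(candidates, key=lambda s: s.end)  # cover as many blocks as possible
        route.append((best, pos))
        pos = best.end
    return route


def run_distributed(route: List[Tuple[Server, int]], h: Vector):
    """Send activations hop by hop; keep each hop's output as an inspectable hidden state."""
    hidden_states = []
    for server, first in route:
        h = server.forward(h, first)
        hidden_states.append(list(h))
    return h, hidden_states


def run_local(model: List[Block], h: Vector) -> Vector:
    # Reference result: the whole model on a single machine.
    for block in model:
        h = block(h)
    return h


model = [Block(scale=2, shift=i) for i in range(6)]
servers = [
    make_server("A", model, 0, 3),
    make_server("B", model, 2, 5),
    make_server("C", model, 3, 6),
]
route = find_route(servers, len(model))
output, hidden = run_distributed(route, [1, 0])
print([s.name for s, _ in route], output)  # prints ['A', 'C'] [121, 57]
```

Here the route is A then C (B is skipped because C covers more of the remaining blocks), and the output matches `run_local(model, [1, 0])`. A real swarm would likely also weigh latency and server load when choosing a route; this sketch ignores both.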