Pamela_Fox
If you haven't tried it already, Ollama is a great tool built on top of llama.cpp that makes it easier to run small language models (SLMs) like Phi-3 and Llama3-8B on your own machine, even if your personal computer has no GPU or has an ARM chip. Ollama provides both a command-line interface to chat with the language model, as well as an OpenAI-compatible chat completion endpoint.
What if your personal computer can't run Ollama for some reason, like if you're using a Chromebook or iPad without the ability to install software? GitHub Codespaces to the rescue! Codespaces is a way to open any GitHub repository in the browser, inside web-based VS Code running a containerized development environment, all customizable via a devcontainer.json file.
We can add Ollama to the Codespace for a repository by adding this community-created feature in devcontainer.json:
```json
"features": {
    "ghcr.io/prulloac/devcontainer-features/ollama:1": {}
},
```
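For context, here's a minimal complete devcontainer.json with that feature in place (a sketch: the name and base image are my own choices, and any dev container image will work):

```json
{
    // Name and image are assumptions; swap in whatever your project uses.
    "name": "ollama-playground",
    "image": "mcr.microsoft.com/devcontainers/universal:2",
    "features": {
        "ghcr.io/prulloac/devcontainer-features/ollama:1": {}
    }
}
```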
Once we open a repository with that feature added, we can open the terminal of the Codespace in the browser and run a model from the Ollama models catalog:
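For example, to download and start chatting with the Phi-3 mini model:

```bash
ollama run phi3:mini
```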
We can also call that Ollama server programmatically, either via its standard endpoint or via its OpenAI-compatible endpoint using an OpenAI SDK:
```python
import openai

# Point the OpenAI SDK at the local Ollama server's OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="nokeyneeded",  # Ollama ignores the key, but the SDK requires one
)

response = client.chat.completions.create(
    model="phi3:mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about a hungry cat"},
    ],
)

print(response.choices[0].message.content)
```
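For comparison, here's a sketch of calling Ollama's native /api/chat endpoint directly with the requests package, no OpenAI SDK needed:

```python
import requests

# Ollama's native chat endpoint; "stream": False returns one JSON response.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "phi3:mini",
        "messages": [
            {"role": "user", "content": "Write a haiku about a hungry cat"},
        ],
        "stream": False,
    },
)
print(response.json()["message"]["content"])
```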
Ollama Python Playground
To make it easy for you to get started using Ollama with Python, open the Codespace for this repository:
GitHub - pamelafox/ollama-python-playground: A dev container with Ollama and Ollama examples using the Python OpenAI SDK
That repo includes the Ollama feature, the OpenAI SDK, a notebook with demonstrations of few-shot prompting and retrieval-augmented generation (RAG), and a Python script for an interactive chat. It's designed to be a helpful resource for teachers and students who want a quick and easy way to get started with small language models.
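As a taste of what the notebook covers, here's a minimal few-shot prompting sketch against the same local endpoint (the example messages are my own, not the notebook's): you seed the conversation with example exchanges so the model picks up the desired style before answering the real question.

```python
import openai

client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key="nokeyneeded")

response = client.chat.completions.create(
    model="phi3:mini",
    messages=[
        # The system prompt and the example exchange teach the answer format.
        {"role": "system", "content": "You answer with a single rhyming couplet."},
        {"role": "user", "content": "What do cats eat?"},
        {"role": "assistant", "content": "A cat will dine on fish or mouse, then nap away inside the house."},
        # The real question: the model should now follow the same pattern.
        {"role": "user", "content": "What do dogs eat?"},
    ],
)
print(response.choices[0].message.content)
```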
Ollama C# Playground
If you want to use Ollama from .NET instead, open this C# playground in Codespaces:
GitHub - elbruno/Ollama-CSharp-Playground: This project is designed to be opened in GitHub Codespaces as an easy way for anyone to try out SLMs (small language models) entirely in the browser.
That repo also includes sample code for common tasks with the SLMs. Bruno put together a video walking through it as well.
Phi-3 CookBook
We've also added the Ollama feature to the Phi-3CookBook repository:
GitHub - microsoft/Phi-3CookBook: This is a Phi-3 book for getting started with Phi-3, a family of open AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and next size up across a variety of language, reasoning, coding, and math benchmarks.
If you open that repo in a GitHub Codespace, then you can use Ollama while reading through the guides. Of course, there is a limit to what you can do in a Codespace, due to the lack of a GPU and general resource constraints. Ollama does a great job optimizing for those scenarios, but that cookbook also contains Jupyter notebooks that use the transformers package, which is not optimized for the non-GPU case. When I tried to run the Phi-3 inference notebook, it took my Codespace a full 1.5 hours to complete the inference. 😱
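For reference, here's roughly what that transformers-based path looks like (a sketch, not the notebook's exact code; depending on your transformers version, the Phi-3 model may also require trust_remote_code=True). Loading full-precision PyTorch weights onto the CPU is what makes this so much slower than Ollama's quantized models:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Full-precision weights on CPU: correct results, but very slow in a Codespace.
model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

# Phi-3 is an instruct model, so format the prompt with its chat template.
messages = [{"role": "user", "content": "Write a haiku about a hungry cat"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```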
We hope that this lowers the barrier even more for everyone interested in trying out small language models! Let us know in the comments if you've added Ollama support to any repositories or if you have other ways that you like to experiment with small language models.