Chat Comparison App
Compare responses from multiple LLM providers side-by-side to evaluate performance, accuracy, and response styles for your specific use cases.
Real-time Model Comparison
Send the same prompt to multiple models and compare their responses instantly
Use Cases
🔍 Prompt Engineering
Test how different models interpret your prompts and refine them for optimal results
⚙️ Model Selection
Choose the best model for your specific task based on actual performance comparisons
💰 Cost Optimization
Find the most cost-effective model that meets your quality requirements
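For cost comparisons, the token counts that providers report back can be combined with per-token prices. A rough sketch; the prices below are placeholders rather than current rates, and `estimate_cost` is a hypothetical helper:

```python
# Placeholder blended prices per 1K tokens -- check each provider's pricing page
PRICE_PER_1K = {
    "openai:gpt-4": 0.03,
    "anthropic:claude-3-sonnet-20240229": 0.003,
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one response, assuming a single blended rate."""
    return (prompt_tokens + completion_tokens) / 1000 * PRICE_PER_1K[model]
```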
Key Features
- Side-by-Side Comparison: View responses from multiple models simultaneously
- Conversation History: Maintain context across multiple interactions (see the session-state sketch after this list)
- Dynamic Model Selection: Switch between models on the fly
- Response Time Tracking: Monitor and compare model latencies
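Conversation history is not wired into the implementation below, but Streamlit's session state makes it easy to add. A minimal sketch, assuming the aisuite client from the Implementation section; `ask_with_history` is a hypothetical helper:

```python
import streamlit as st

# Keep the running conversation in session state so context survives reruns
if "history" not in st.session_state:
    st.session_state.history = []  # list of {"role": ..., "content": ...}

def ask_with_history(client, model: str, user_prompt: str) -> str:
    """Send the full history plus the new prompt, then record both turns."""
    messages = st.session_state.history + [{"role": "user", "content": user_prompt}]
    response = client.chat.completions.create(model=model, messages=messages)
    answer = response.choices[0].message.content
    st.session_state.history += [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": answer},
    ]
    return answer
```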
Architecture
```
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│    User UI    │────▶│    AISuite    │────▶│  Provider 1   │
│  (Streamlit)  │     │    Client     │     └───────────────┘
└───────────────┘     │               │     ┌───────────────┐
                      │    Unified    │────▶│  Provider 2   │
                      │   Interface   │     └───────────────┘
                      │               │     ┌───────────────┐
                      └───────────────┘────▶│  Provider N   │
                                            └───────────────┘
```
The app uses AISuite's unified interface to communicate with multiple providers through a single API, making it trivial to add new models or switch between them.
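Concretely, switching providers only changes the model string; the call shape stays the same:

```python
# The same call against two providers -- only the "provider:model" string differs
messages = [{"role": "user", "content": "Summarize this article in one sentence."}]
gpt_answer = client.chat.completions.create(model="openai:gpt-4", messages=messages)
claude_answer = client.chat.completions.create(
    model="anthropic:claude-3-sonnet-20240229", messages=messages
)
```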
Implementation
```python
import streamlit as st
import aisuite as ai
from dotenv import load_dotenv

# Load API keys from .env and initialize the AISuite client
load_dotenv()
client = ai.Client()

# Configure available models (Anthropic requires the dated model name)
models = [
    {"name": "GPT-4", "provider": "openai", "model": "gpt-4"},
    {"name": "Claude 3", "provider": "anthropic", "model": "claude-3-sonnet-20240229"},
    {"name": "Gemini Pro", "provider": "google", "model": "gemini-pro"},
]

# Streamlit UI
st.set_page_config(layout="wide")
st.title("🤖 LLM Response Comparison")

# Model selection
col1, col2 = st.columns(2)
with col1:
    model1 = st.selectbox("Model 1", [m["name"] for m in models], index=0)
with col2:
    model2 = st.selectbox("Model 2", [m["name"] for m in models], index=1)

# User input
user_prompt = st.text_area("Enter your prompt:", height=100)

if st.button("Compare Responses"):
    if user_prompt:
        col1, col2 = st.columns(2)

        # Get response from Model 1
        with col1:
            st.subheader(f"{model1} Response")
            with st.spinner("Generating..."):
                model1_config = next(m for m in models if m["name"] == model1)
                response1 = client.chat.completions.create(
                    model=f"{model1_config['provider']}:{model1_config['model']}",
                    messages=[{"role": "user", "content": user_prompt}],
                )
                st.write(response1.choices[0].message.content)

        # Get response from Model 2
        with col2:
            st.subheader(f"{model2} Response")
            with st.spinner("Generating..."):
                model2_config = next(m for m in models if m["name"] == model2)
                response2 = client.chat.completions.create(
                    model=f"{model2_config['provider']}:{model2_config['model']}",
                    messages=[{"role": "user", "content": user_prompt}],
                )
                st.write(response2.choices[0].message.content)
```
✨ AISuite Features Highlighted
- Unified Client: A single client instance works with all providers
- Provider Format: A simple "provider:model" string selects the backend
- Consistent API: The same method signature works for every provider
- Parallel Execution: Requests to different providers can run concurrently (the example above queries them sequentially; see the sketch below)
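A minimal sketch of truly parallel comparison using a thread pool, reusing `client`, `models`, and `user_prompt` from the Implementation section:

```python
from concurrent.futures import ThreadPoolExecutor

def query(model_id: str, prompt: str) -> str:
    """One blocking request to a single provider."""
    response = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Fan the same prompt out to all configured models at once
model_ids = [f"{m['provider']}:{m['model']}" for m in models]
with ThreadPoolExecutor(max_workers=len(model_ids)) as pool:
    answers = list(pool.map(lambda mid: query(mid, user_prompt), model_ids))
```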
Try It Out
Quick Start
1. Clone the repository:
```bash
git clone https://github.com/andrewyng/aisuite.git
cd aisuite/examples/chat-ui
```
2. Install dependencies:
```bash
# the "all" extra pulls in the provider SDKs
pip install "aisuite[all]" streamlit python-dotenv
```
3. Configure providers:
```bash
# Create a .env file with your API keys
OPENAI_API_KEY=your-key
ANTHROPIC_API_KEY=your-key
GOOGLE_API_KEY=your-key
```
4. Run the app:
```bash
streamlit run chat.py
```
Extend It
💾 Add Conversation Export
Save comparison results to JSON or CSV for analysis
```python
import json

# Export the prompt and the collected responses for later analysis
results = {
    "prompt": user_prompt,
    "responses": responses,  # e.g. {model_name: response_text}
}
with open("comparison.json", "w") as f:
    json.dump(results, f, indent=2)
```
📊 Add Response Metrics
Track response time, token usage, and costs
```python
import time

# Time the request and capture provider-reported token usage
start = time.time()
response = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[{"role": "user", "content": user_prompt}],
)
latency = time.time() - start
tokens = response.usage  # token counts, where the provider reports them
```
🎯 Add Evaluation Scoring
Let users rate responses to build preference datasets
```python
# Collect a 1-5 user rating for each response
rating = st.slider(
    "Rate this response",
    min_value=1,
    max_value=5,
)
```
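To build a preference dataset from those ratings, each one can be appended to a CSV alongside its prompt and response (a sketch; the file name and `response_text` variable are placeholders):

```python
import csv

# Append one labeled example per rating
with open("ratings.csv", "a", newline="") as f:
    csv.writer(f).writerow([user_prompt, model1, response_text, rating])
```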
🔄 Add Batch Processing
Process multiple prompts across models automatically
```python
# Run every prompt in a list through all configured models
for prompt in prompts:
    responses = compare_all(prompt, models)  # helper defined below
```
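`compare_all` is not defined in the fragment above; a plausible sketch, reusing the aisuite client and `models` list from the Implementation section:

```python
def compare_all(prompt: str, models: list[dict]) -> dict[str, str]:
    """Send one prompt to every configured model; return {model name: response}."""
    results = {}
    for m in models:
        response = client.chat.completions.create(
            model=f"{m['provider']}:{m['model']}",
            messages=[{"role": "user", "content": prompt}],
        )
        results[m["name"]] = response.choices[0].message.content
    return results
```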