Chat Comparison App
Compare responses from multiple LLM providers side-by-side to evaluate performance, accuracy, and response styles for your specific use cases.
Real-time Model Comparison
Send the same prompt to multiple models and compare their responses instantly
Use Cases
🔍 Prompt Engineering
Test how different models interpret your prompts and refine them for optimal results
⚙️ Model Selection
Choose the best model for your specific task based on actual performance comparisons
💰 Cost Optimization
Find the most cost-effective model that meets your quality requirements
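For cost comparisons, the token counts that providers report back can be combined with per-token prices. A rough sketch; the prices below are placeholders rather than current rates, and `estimate_cost` is a hypothetical helper:

```python
# Placeholder blended prices per 1K tokens -- check each provider's pricing page
PRICE_PER_1K = {
    "openai:gpt-4": 0.03,
    "anthropic:claude-3-sonnet-20240229": 0.003,
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one response, assuming a single blended rate."""
    return (prompt_tokens + completion_tokens) / 1000 * PRICE_PER_1K[model]
```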
Key Features
- Side-by-Side Comparison: View responses from multiple models simultaneously
- Conversation History: Maintain context across multiple interactions (see the session-state sketch after this list)
- Dynamic Model Selection: Switch between models on the fly
- Response Time Tracking: Monitor and compare model latencies
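Conversation history is not wired into the implementation below, but Streamlit's session state makes it easy to add. A minimal sketch, assuming the aisuite client from the Implementation section; `ask_with_history` is a hypothetical helper:

```python
import streamlit as st

# Keep the running conversation in session state so context survives reruns
if "history" not in st.session_state:
    st.session_state.history = []  # list of {"role": ..., "content": ...}

def ask_with_history(client, model: str, user_prompt: str) -> str:
    """Send the full history plus the new prompt, then record both turns."""
    messages = st.session_state.history + [{"role": "user", "content": user_prompt}]
    response = client.chat.completions.create(model=model, messages=messages)
    answer = response.choices[0].message.content
    st.session_state.history += [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": answer},
    ]
    return answer
```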
Architecture
```
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│    User UI    │────▶│    AISuite    │────▶│  Provider 1   │
│  (Streamlit)  │     │    Client     │     └───────────────┘
└───────────────┘     │               │     ┌───────────────┐
                      │    Unified    │────▶│  Provider 2   │
                      │   Interface   │     └───────────────┘
                      │               │     ┌───────────────┐
                      └───────────────┘────▶│  Provider N   │
                                            └───────────────┘
```
The app uses AISuite's unified interface to communicate with multiple providers through a single API, making it trivial to add new models or switch between them.
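Concretely, switching providers only changes the model string; the call shape stays the same:

```python
# The same call against two providers -- only the "provider:model" string differs
messages = [{"role": "user", "content": "Summarize this article in one sentence."}]
gpt_answer = client.chat.completions.create(model="openai:gpt-4", messages=messages)
claude_answer = client.chat.completions.create(
    model="anthropic:claude-3-sonnet-20240229", messages=messages
)
```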
Implementation
```python
import streamlit as st
import aisuite as ai
from dotenv import load_dotenv

# Load API keys from .env and initialize the AISuite client
load_dotenv()
client = ai.Client()

# Configure available models (Anthropic requires the dated model name)
models = [
    {"name": "GPT-4", "provider": "openai", "model": "gpt-4"},
    {"name": "Claude 3", "provider": "anthropic", "model": "claude-3-sonnet-20240229"},
    {"name": "Gemini Pro", "provider": "google", "model": "gemini-pro"},
]

# Streamlit UI
st.set_page_config(layout="wide")
st.title("🤖 LLM Response Comparison")

# Model selection
col1, col2 = st.columns(2)
with col1:
    model1 = st.selectbox("Model 1", [m["name"] for m in models], index=0)
with col2:
    model2 = st.selectbox("Model 2", [m["name"] for m in models], index=1)

# User input
user_prompt = st.text_area("Enter your prompt:", height=100)

if st.button("Compare Responses"):
    if user_prompt:
        col1, col2 = st.columns(2)

        # Get response from Model 1
        with col1:
            st.subheader(f"{model1} Response")
            with st.spinner("Generating..."):
                model1_config = next(m for m in models if m["name"] == model1)
                response1 = client.chat.completions.create(
                    model=f"{model1_config['provider']}:{model1_config['model']}",
                    messages=[{"role": "user", "content": user_prompt}],
                )
                st.write(response1.choices[0].message.content)

        # Get response from Model 2
        with col2:
            st.subheader(f"{model2} Response")
            with st.spinner("Generating..."):
                model2_config = next(m for m in models if m["name"] == model2)
                response2 = client.chat.completions.create(
                    model=f"{model2_config['provider']}:{model2_config['model']}",
                    messages=[{"role": "user", "content": user_prompt}],
                )
                st.write(response2.choices[0].message.content)
```
✨ AISuite Features Highlighted
- Unified Client: A single client instance works with all providers
- Provider Format: A simple "provider:model" string selects the backend
- Consistent API: The same method signature works for every provider
- Parallel Execution: Requests to different providers can run concurrently (the example above queries them sequentially; see the sketch below)
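A minimal sketch of truly parallel comparison using a thread pool, reusing `client`, `models`, and `user_prompt` from the Implementation section:

```python
from concurrent.futures import ThreadPoolExecutor

def query(model_id: str, prompt: str) -> str:
    """One blocking request to a single provider."""
    response = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Fan the same prompt out to all configured models at once
model_ids = [f"{m['provider']}:{m['model']}" for m in models]
with ThreadPoolExecutor(max_workers=len(model_ids)) as pool:
    answers = list(pool.map(lambda mid: query(mid, user_prompt), model_ids))
```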
Try It Out
Quick Start
1. Clone the repository:
```bash
git clone https://github.com/andrewyng/aisuite.git
cd aisuite/examples/chat-ui
```
2. Install dependencies:
```bash
# the "all" extra pulls in the provider SDKs
pip install "aisuite[all]" streamlit python-dotenv
```
3. Configure providers:
```bash
# Create a .env file with your API keys
OPENAI_API_KEY=your-key
ANTHROPIC_API_KEY=your-key
GOOGLE_API_KEY=your-key
```
4. Run the app:
```bash
streamlit run chat.py
```
Extend It
💾 Add Conversation Export
Save comparison results to JSON or CSV for analysis
```python
import json

# Export the prompt and the collected responses for later analysis
results = {
    "prompt": user_prompt,
    "responses": responses,  # e.g. {model_name: response_text}
}
with open("comparison.json", "w") as f:
    json.dump(results, f, indent=2)
```
📊 Add Response Metrics
Track response time, token usage, and costs
```python
import time

# Time the request and capture provider-reported token usage
start = time.time()
response = client.chat.completions.create(
    model="openai:gpt-4",
    messages=[{"role": "user", "content": user_prompt}],
)
latency = time.time() - start
tokens = response.usage  # token counts, where the provider reports them
```
🎯 Add Evaluation Scoring
Let users rate responses to build preference datasets
```python
# Collect a 1-5 user rating for each response
rating = st.slider(
    "Rate this response",
    min_value=1,
    max_value=5,
)
```
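To build a preference dataset from those ratings, each one can be appended to a CSV alongside its prompt and response (a sketch; the file name and `response_text` variable are placeholders):

```python
import csv

# Append one labeled example per rating
with open("ratings.csv", "a", newline="") as f:
    csv.writer(f).writerow([user_prompt, model1, response_text, rating])
```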
🔄 Add Batch Processing
Process multiple prompts across models automatically
```python
# Run every prompt in a list through all configured models
for prompt in prompts:
    responses = compare_all(prompt, models)  # helper defined below
```
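`compare_all` is not defined in the fragment above; a plausible sketch, reusing the aisuite client and `models` list from the Implementation section:

```python
def compare_all(prompt: str, models: list[dict]) -> dict[str, str]:
    """Send one prompt to every configured model; return {model name: response}."""
    results = {}
    for m in models:
        response = client.chat.completions.create(
            model=f"{m['provider']}:{m['model']}",
            messages=[{"role": "user", "content": prompt}],
        )
        results[m["name"]] = response.choices[0].message.content
    return results
```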