Write agentic apps once, run with any LLM provider ✨

PDF Q&A Assistant

Build an intelligent document assistant that can answer questions about any PDF using RAG (Retrieval-Augmented Generation) powered by your choice of LLM provider.

Document Intelligence at Scale

Extract insights from PDFs using advanced RAG techniques

Use Cases

πŸ“‹ Contract Analysis

Quickly find specific clauses, terms, and conditions in legal documents

πŸ”¬ Research Papers

Extract key findings, methodologies, and conclusions from academic papers

πŸ“š Documentation

Create intelligent help systems for technical documentation

Key Features

  • Smart Chunking: Intelligently splits documents for optimal context retrieval
  • Vector Search: Uses embeddings for semantic similarity matching
  • Context-Aware Answers: Provides relevant excerpts with answers
  • Multi-Provider Support: Works with any LLM through AISuite (see the sketch below)
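
For example, switching providers with AISuite is just a different model string. A minimal sketch (the model identifiers below are illustrative, and each provider needs its API key configured):

import aisuite as ai

client = ai.Client()
messages = [{"role": "user", "content": "Summarize this clause in one sentence."}]

# Same call, same code -- only the model string changes per provider
for model in ["openai:gpt-4o", "anthropic:claude-3-5-sonnet-20240620"]:
    response = client.chat.completions.create(model=model, messages=messages)
    print(model, "->", response.choices[0].message.content)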

RAG Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   PDF    │────▢│  Text        │────▢│   Chunking  β”‚
β”‚  Upload  β”‚     β”‚  Extraction  β”‚     β”‚   Strategy  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                              β”‚
                                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   User   β”‚     β”‚   Vector     │◀────│  Embeddings β”‚
β”‚ Question │────▢│   Search     β”‚     β”‚   Creation  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                        β–Ό
                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                 β”‚   Relevant   β”‚
                 β”‚   Chunks     β”‚
                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β”‚
                        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Answer  │◀────│   AISuite    │────▢│     LLM     β”‚
β”‚          β”‚     β”‚   Client     β”‚     β”‚  Provider   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Implementation

pdf_qa.py
import aisuite as ai
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
import streamlit as st

class PDFQuestionAnswering:
    def __init__(self, provider="openai:gpt-4"):
        self.client = ai.Client()
        self.provider = provider
        self.embeddings = OpenAIEmbeddings()
        self.vector_store = None
        
    def process_pdf(self, pdf_file):
        """Extract and chunk PDF text"""
        reader = PdfReader(pdf_file)
        text = ""
        for page in reader.pages:
            # extract_text() can return None on image-only pages
            text += (page.extract_text() or "") + "\n"
        
        # Split text into chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len
        )
        chunks = text_splitter.split_text(text)
        
        # Create vector store
        self.vector_store = FAISS.from_texts(chunks, self.embeddings)
        return len(chunks)
    
    def answer_question(self, question):
        """Answer question based on PDF content"""
        if not self.vector_store:
            return "Please upload a PDF first."
        
        # Find relevant chunks
        relevant_docs = self.vector_store.similarity_search(question, k=4)
        context = "\n\n".join([doc.page_content for doc in relevant_docs])
        
        # Create prompt with context
        prompt = f"""Based on the following context from the document, 
        please answer the question. If the answer is not in the context, 
        say so clearly.
        
        Context:
        {context}
        
        Question: {question}
        
        Answer:"""
        
        # Get response from AI
        response = self.client.chat.completions.create(
            model=self.provider,
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt}
            ]
        )
        
        return response.choices[0].message.content

# Streamlit UI
st.title("πŸ“„ PDF Q&A Assistant")
st.write("Upload a PDF and ask questions about its content")

# Initialize the QA system
@st.cache_resource
def init_qa_system():
    return PDFQuestionAnswering()

qa_system = init_qa_system()

# File upload
uploaded_file = st.file_uploader("Choose a PDF file", type="pdf")

if uploaded_file:
    # Streamlit reruns the script on every interaction; only re-process
    # (and re-embed) when a new file is uploaded
    if st.session_state.get("processed_file") != uploaded_file.name:
        with st.spinner("Processing PDF..."):
            chunk_count = qa_system.process_pdf(uploaded_file)
            st.session_state["processed_file"] = uploaded_file.name
            st.success(f"PDF processed! Created {chunk_count} text chunks.")
    
    # Question input
    question = st.text_input("Ask a question about the document:")
    
    if question:
        with st.spinner("Generating answer..."):
            answer = qa_system.answer_question(question)
            st.write("### Answer:")
            st.write(answer)

✨ AISuite Integration Points

  • LLM Flexibility: Switch between providers to balance cost and quality
  • Unified API: The same code works with GPT-4, Claude, or Gemini
  • Context Management: Handles large documents through chunking
  • Error Handling: Handles provider failures gracefully (a fallback sketch follows)
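
The example above leaves provider failures to the caller, but the unified API makes a simple fallback chain easy to add. A minimal sketch, assuming each listed provider is configured (the function name and model list are illustrative):

def ask_with_fallback(client, messages,
                      models=("openai:gpt-4", "anthropic:claude-3-5-sonnet-20240620")):
    """Try providers in order, falling through to the next on failure."""
    last_error = None
    for model in models:
        try:
            response = client.chat.completions.create(model=model, messages=messages)
            return response.choices[0].message.content
        except Exception as exc:  # raised exception types vary by backend
            last_error = exc
    raise RuntimeError("All providers failed") from last_error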

Try It Out

Quick Start

  1. Clone the repository:
     git clone https://github.com/andrewyng/aisuite.git
     cd aisuite/examples
  2. Install dependencies:
     pip install aisuite PyPDF2 langchain faiss-cpu streamlit
  3. Set up API keys:
     export OPENAI_API_KEY="your-key"  # For embeddings and LLM
  4. Run the application:
     streamlit run pdf_qa.py

Extend It

πŸ“ Add Source Citations

Include page numbers and excerpts in answers

# Track source pages: chunk each page separately so every
# chunk keeps the page number it came from
chunks_with_metadata = [
  {"text": chunk, "page": page_num}
  for page_num, page in enumerate(reader.pages, start=1)
  for chunk in text_splitter.split_text(page.extract_text() or "")
]
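
A fuller sketch of how that metadata can flow through the index and into answers; the metadatas parameter and doc.metadata access are standard LangChain FAISS features, and the variable names follow the spirit of the class above:

# Build the index with per-chunk metadata, then cite pages in the context
texts = [c["text"] for c in chunks_with_metadata]
metadatas = [{"page": c["page"]} for c in chunks_with_metadata]
vector_store = FAISS.from_texts(texts, embeddings, metadatas=metadatas)

docs = vector_store.similarity_search(question, k=4)
context = "\n\n".join(
    f"[p. {doc.metadata['page']}] {doc.page_content}" for doc in docs
)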

πŸ—‚οΈ Multi-Document Support

Query across multiple PDFs simultaneously

# Combine multiple PDFs into one index (a chunk-returning variant
# of process_pdf is assumed; the method above returns only a count)
for pdf in pdf_files:
  chunks = process_pdf(pdf)
  vector_store.add_texts(chunks)
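
A self-contained version of that loop, tagging each chunk with its source file; it assumes accept_multiple_files=True on the Streamlit uploader and reuses the splitter settings from process_pdf:

vector_store = None
for pdf in uploaded_files:
    reader = PdfReader(pdf)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    chunks = text_splitter.split_text(text)
    metadatas = [{"source": pdf.name}] * len(chunks)
    if vector_store is None:
        vector_store = FAISS.from_texts(chunks, embeddings, metadatas=metadatas)
    else:
        vector_store.add_texts(chunks, metadatas=metadatas)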

πŸ’Ύ Persistent Storage

Save processed documents for faster retrieval

# Save vector store
vector_store.save_local("./db")
# Load later (recent LangChain releases also require
# allow_dangerous_deserialization=True here)
vector_store = FAISS.load_local("./db", embeddings)
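
A sketch of how the save/load round trip might wrap index construction (the helper name and cache directory are illustrative, not part of the example above):

import os

INDEX_DIR = "./db"  # hypothetical on-disk cache location

def get_vector_store(chunks, embeddings):
    """Load a cached FAISS index if one exists, otherwise build and save it."""
    if os.path.isdir(INDEX_DIR):
        return FAISS.load_local(INDEX_DIR, embeddings)
    store = FAISS.from_texts(chunks, embeddings)
    store.save_local(INDEX_DIR)
    return store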

πŸ” Advanced Retrieval

Implement hybrid search with BM25 + embeddings

# Hybrid retrieval
bm25_results = bm25_search(query)
vector_results = vector_search(query)
combined = merge_results(...)
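
One concrete way to fill in that sketch, using the rank_bm25 package (pip install rank-bm25) for the lexical side and reciprocal rank fusion to merge the two rankings; the function below is illustrative, not part of the example above:

from rank_bm25 import BM25Okapi

def hybrid_search(query, chunks, vector_store, k=4):
    """Merge BM25 and embedding rankings via reciprocal rank fusion."""
    # Lexical ranking over whitespace-tokenized chunks
    bm25 = BM25Okapi([c.split() for c in chunks])
    scores = bm25.get_scores(query.split())
    bm25_order = sorted(range(len(chunks)), key=lambda i: -scores[i])

    # Semantic ranking from the FAISS index
    vector_order = [d.page_content
                    for d in vector_store.similarity_search(query, k=len(chunks))]

    # score(chunk) = sum over both lists of 1 / (60 + rank)
    fused = {}
    for rank, i in enumerate(bm25_order):
        fused[chunks[i]] = fused.get(chunks[i], 0.0) + 1.0 / (60 + rank)
    for rank, text in enumerate(vector_order):
        fused[text] = fused.get(text, 0.0) + 1.0 / (60 + rank)

    return sorted(fused, key=fused.get, reverse=True)[:k]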