Back to Blog
DockerDjangoDevOpsPythonReact

I Built an AI Hiring Agent That Screens Resumes and Conducts Voice Interviews

CodeWithMuh
Muhammad (CodeWithMuh)Mar 28, 2026 · 7 min read
I Built an AI Hiring Agent That Screens Resumes and Conducts Voice Interviews

Text-to-Speech

Speech synthesis not supported
AI Hiring Agent — Full Recruitment Pipeline

Hiring is broken. Recruiters spend 23 hours screening resumes for a single hire. First-round phone screens eat up entire days. And most companies still do this manually in 2026.

So I built an AI agent that handles the entire pipeline: resume screening, voice interviews, transcript analysis, and candidate ranking — all automated, all open-source.

In this article, I'll walk you through exactly how I built it, the architecture decisions, and how you can deploy it yourself with one Docker Compose command.

What It Does

The AI Hiring Agent automates the first two stages of any hiring pipeline:

  1. Resume Screening — Upload a PDF resume, Claude reads it natively (no parser needed), and scores it 0-100 in under 5 seconds
  2. Voice Interview — Candidates scoring 60+ automatically receive a phone call from an AI interviewer that asks 4 structured questions
  3. Transcript Analysis — After the call, Claude analyzes the transcript and scores communication, technical ability, enthusiasm, and experience
  4. Smart Ranking — Overall score = 60% resume + 40% interview. Score 75+ = shortlisted, under 50 = rejected

The entire flow is automated. A candidate applies, gets screened in seconds, receives a phone call, and gets ranked — all without a human touching anything.

The Tech Stack

ComponentTechnologyWhy
BackendDjango 5 + DRFBattle-tested, great ORM, built-in admin
FrontendNext.js 16 + React 19App Router, server components, fast builds
AI (Screening)Claude SonnetNative PDF reading, structured JSON output
AI (Interviews)Claude via VapiCustom LLM endpoint with agentic tool use
VoiceVapi + ElevenLabsOutbound calls with natural-sounding TTS
DatabasePostgreSQL 16Reliable, UUID primary keys, JSON fields
InfraDocker ComposeOne command deployment, all services orchestrated

How Resume Screening Works

Most resume screeners use PDF parsers like PyPDF2 or pdfplumber, then feed the extracted text to an LLM. This loses formatting, tables, and layout — all things that matter in a resume.

Instead, I send the raw PDF as a base64-encoded document directly to Claude's document mode. Claude reads the PDF natively — headers, tables, bullet points, everything preserved.

The screening prompt asks Claude to score on 4 weighted dimensions:

  • Relevant Experience (40%) — Years and depth of relevant work
  • Skills Match (30%) — How well skills align with job requirements
  • Education (15%) — Degree relevance and institution quality
  • Career Trajectory (15%) — Growth pattern, promotions, trajectory

Claude returns a structured JSON response with the score, 3-5 strengths, red flags (if any), and a recommendation: interview (60+), maybe (40-59), or reject (<40).

The whole screening takes under 5 seconds. Compare that to the 7 minutes a human recruiter spends per resume.

How the AI Voice Interview Works

This is the most interesting part. When a candidate scores 60+, the system automatically triggers an outbound phone call using Vapi.

Here's the flow:

  1. Django calls Vapi's API to initiate an outbound call to the candidate's phone number
  2. Vapi handles the voice transport — speech-to-text (STT) and text-to-speech (TTS via ElevenLabs Rachel voice)
  3. For the AI brain, Vapi calls my custom LLM endpoint at /api/vapi/chat/completions/
  4. This endpoint converts Vapi's OpenAI-format messages to Claude format, runs the conversation through Claude, and returns the response
  5. Claude has access to an end_call tool — it uses this to hang up after all 4 questions are answered

The Interview Questions

Claude asks 4 structured questions, one at a time:

  1. "Tell me about your experience with {key_skill}" — Tailored to the candidate's strongest skill from resume screening
  2. "Describe a challenging project you worked on recently"
  3. "Why are you interested in this role?"
  4. "What is your availability and salary expectation?"

The system prompt instructs Claude to keep responses to 1-2 sentences (it's a phone call, not a chatbot), be conversational, and end the call after all questions are answered. The whole interview takes about 5 minutes.

The Agentic Loop

The custom LLM endpoint implements a proper agentic pattern:

def handle_conversation_turn(messages, candidate_name, key_skill):
    # Convert OpenAI format to Claude format
    claude_messages = openai_messages_to_claude(messages)

    # Call Claude with end_call tool available
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        system=get_interview_system_prompt(candidate_name, key_skill),
        messages=claude_messages,
        tools=[END_CALL_TOOL],
    )

    # Check if Claude wants to end the call
    for block in response.content:
        if block.type == "tool_use" and block.name == "end_call":
            return text_response, True  # Signal Vapi to hang up

    return text_response, False

When Claude calls the end_call tool, the endpoint sets an X-Vapi-End-Call header that tells Vapi to terminate the call gracefully.

Transcript Analysis

After the call ends, Vapi sends a webhook to /api/webhooks/vapi/ with the full transcript. Claude then analyzes it across 4 dimensions, each scored 0-25:

  • Communication (0-25) — Clarity, articulation, coherence
  • Technical (0-25) — Domain knowledge, problem-solving ability
  • Enthusiasm (0-25) — Interest level, energy, cultural fit
  • Experience (0-25) — Depth and relevance of background

Total interview score = sum of all 4 dimensions (0-100). Claude also returns highlights (positive moments), concerns (red flags), and a final recommendation: strong_yes, yes, maybe, or no.

The Scoring Formula

The final candidate ranking combines both stages:

Overall Score = (Resume Score × 0.6) + (Interview Score × 0.4)

If score >= 75 → SHORTLISTED
If score < 50  → REJECTED
If 50-75       → INTERVIEWING (needs human decision)

Why 60/40? Resumes show qualifications, but interviews reveal communication and fit. Weighting resumes slightly higher prevents a charismatic but unqualified candidate from gaming the system.

The Dashboard

The Next.js frontend includes a full recruitment dashboard:

  • Stats Overview — Total candidates, average score, interviewed count, shortlisted count (all with animated counters)
  • Candidate Table — Filterable by status, sortable by score, with status badges and action links
  • Candidate Detail Page — Resume score donut chart, interview score breakdown (4 donut charts), pipeline progress visualization, full transcript with chat-bubble styling, strengths/red flags lists
  • Job Board — Browse open positions, filter by department, see applicant counts
  • Multi-Step Application — Personal info → resume upload → time slot picker → review → submit

The frontend also has a mock data fallback — if the Django API is down, it renders demo data so you can still see the UI.

Deploy It Yourself

The entire project is open-source. Here's how to get it running:

# Clone the repo
git clone https://github.com/codewithmuh/ai-hiring-agent.git
cd ai-hiring-agent

# Configure environment
cp backend/.env.example backend/.env
# Add: ANTHROPIC_API_KEY, VAPI_API_KEY, ELEVENLABS keys, DB credentials

# Start everything
docker compose up -d --build

# Seed demo data (5 jobs, 10 candidates, 35 interview slots)
docker compose exec backend python manage.py migrate
docker compose exec backend python manage.py seed_data

# For voice interviews, expose backend publicly
ngrok http 8000
# Set BACKEND_PUBLIC_URL in .env to ngrok URL

Frontend runs on localhost:3001, backend API on localhost:8000. Total infrastructure cost for a small team: roughly $50/month (Claude API + Vapi calls + a small server).

What I Learned Building This

Claude's PDF reading is underrated. Sending base64 PDFs directly eliminates an entire category of parsing bugs. No more PyPDF2 extraction failures on complex layouts.

Voice AI needs short responses. The biggest mistake in voice agent design is having the AI talk too much. Phone conversations need 1-2 sentence responses, not paragraphs.

The agentic tool pattern matters for call control. Without the end_call tool, there's no clean way to terminate a Vapi call from the AI side. The tool-use pattern gives Claude explicit control over call flow.

Weighted scoring prevents gaming. If you only score interviews, a charming candidate with no qualifications can pass. If you only score resumes, you miss communication red flags. The 60/40 blend catches both.

What's Next

This is a v1. Some things I'd like to add:

  • Email notifications to candidates at each stage
  • Slack/Teams integration for hiring manager alerts
  • Multi-language interview support
  • Interview recording playback in the dashboard
  • Analytics: conversion funnel, time-to-hire, score distributions

The full source code is on GitHub. Star it if you find it useful, and feel free to fork and customize for your own hiring needs.

Found this helpful? Share it with a developer friend who's building AI agents.

Written by Muhammad Rashid (CodeWithMuh) — I build AI agents, automate developer workflows, and deploy them to production. Follow me on LinkedIn for daily AI insights.

Share:

Comments

Sign in to leave a comment

No comments yet. Be the first to comment!