Qualifications

Verify that operators and AI agents are competent to handle specific workflows before routing real interactions.

Intermediate
12 min read


Qualifications let you verify that a member -- human or AI -- is ready to handle a specific workflow. Before an operator goes live or an agent is assigned real interactions, they complete requirements (knowledge questions or simulated chat scenarios) that are graded automatically and reviewed by an admin.

This ensures that every person or agent interacting with patients meets a measurable standard of competency before they handle real cases.

Key Concepts

Qualification

A qualification defines what "competent" means for a particular workflow. Each workflow can have at most one active qualification.

A qualification has:

  • Name and description
  • Linked workflow -- which workflow this qualification applies to
  • Requirements -- what a member must do to qualify
  • Auto-advance -- whether members automatically progress when all requirements pass

Status Lifecycle

Members progress through these statuses:

| Status | Meaning | Can Handle Real Interactions? |
| --- | --- | --- |
| Enrolled | Working on requirements | No |
| Submitted | All requirements passed | No |
| Provisional | Admin approved submitted work | Yes (with monitoring) |
| Qualified | Admin confirmed real-world readiness | Yes |
| Suspended | Temporarily removed | No |

Key transitions:

  • Enrolled -> Submitted -- Automatic when all requirements have passing submissions (if auto-advance is on)
  • Submitted -> Provisional -- Admin reviews and approves the submitted work
  • Provisional -> Qualified -- Admin confirms the member performs well on real interactions
  • Any -> Suspended -- Admin action (requires a reason)

The Provisional -> Qualified step is always a human decision. Passing automated tests does not prove real-world readiness.

Requirements

Requirements define what a member must complete. Two types:

Question requirements -- The member answers a written knowledge question. Use these for policy knowledge, protocol understanding, or situational judgment.

Chat requirements -- The member completes a simulated conversation through the workflow. A test scenario is set up with a simulated persona, and an AI plays the role of the caller/patient. Use these for practical skill evaluation.

Requirements are ordered and identified by a unique slug within the qualification.

Criteria

Each requirement is graded by one or more criteria:

Formula criteria -- Automatic pass/fail using CEL (Common Expression Language) expressions. For example: answer.contains("HIPAA") for a question, or assignment.outcome == "success" for a chat scenario. See Writing Formula Criteria for the full reference.

AI Evaluation criteria -- LLM-evaluated using a prompt template. You write a Jinja2 prompt describing what to evaluate, and an AI scores the response on a 0.0--1.0 scale. See Writing AI Evaluation Criteria for the full reference.

Each criterion has a weight (for overall scoring) and a passing threshold (for AI Evaluation criteria).

Setting Up Qualifications

1. Create a Qualification

  1. Go to People -> Qualifications
  2. Click Create Qualification
  3. Enter a name, optional description, and select the workflow
  4. Choose a color for visual identification

2. Add Requirements

  1. Open the qualification and go to the Requirements tab
  2. Click Add Requirement
  3. Configure:
    • Slug -- Unique identifier (e.g., identity-verification)
    • Name -- Display name
    • Type -- Question or Chat
    • For questions: enter the Question Text
    • For chats: configure the Scenario Definition (see below)

3. Add Criteria

For each requirement:

  1. Click Add Criterion
  2. Choose Formula or AI Evaluation
  3. For Formula criteria: write a CEL expression (see Writing Formula Criteria)
  4. For AI Evaluation criteria: write an evaluation prompt template (see Writing AI Evaluation Criteria)
  5. Set the weight and passing threshold

4. Enroll Members

  1. Go to a member's profile -> Qualifications tab
  2. Click Assign Qualification
  3. The member starts in Enrolled status

Chat Scenario Definition

Chat requirements use a scenario definition to set up the simulated environment:

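For example, a scenario definition might look like the sketch below. The persona details, the appointments data type slug, the record values, and the label are all illustrative; use the slugs defined in your own workspace.

```yaml
persona:
  identity: |
    You are Maria Lopez, a 62-year-old patient calling to reschedule a
    cardiology appointment. You are polite but anxious, and you will not
    share your date of birth until the agent asks to verify your identity.
dataRecords:
  appointments:
    - date: "2024-06-12"
      provider: "Dr. Chen"
      status: scheduled
labels:
  - test-cardiology
```

The fields are: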
  • persona -- Defines the simulated caller. The identity field is the system prompt that drives the AI playing the caller role. Write it in second person ("You are...") and include personality traits, goals, and constraints that make the scenario realistic.
  • dataRecords -- Pre-populated data records for the test member, keyed by data type slug. These appear in the system as if they already existed, so the agent or operator can look them up.
  • labels -- Labels applied to the test member before the scenario starts.

The scenario definition only sets up the test environment. Pass/fail evaluation is handled by the criteria you define on the requirement.

Writing Formula Criteria (CEL Expressions)

Formula criteria use CEL (Common Expression Language) expressions that evaluate to true (pass) or false (fail). The available context variables depend on whether the requirement is a question or a chat.

Context Variables for Question Requirements

| Variable | Type | Description |
| --- | --- | --- |
| answer | string | The member's answer text |
| question | string | The requirement's question text |
| member | object | {id, name, labels, form_data} -- the member being evaluated |

Context Variables for Chat Requirements

Aggregate variables

These provide summary information about the completed chat:

| Variable | Type | Description |
| --- | --- | --- |
| assignment | object | {status, outcome, is_test} -- the assignment result |
| chat | object | {channel, message_count, duration_seconds} -- conversation stats |
| assignment_tasks | object | {visited: [...], visited_count} -- which tasks (steps) were visited |
| member | object | {id, name, labels, form_data} -- the member being evaluated |

The messages list

The messages variable is a normalized sequential list of ALL messages in the conversation. This is the most powerful context variable for chat evaluation -- it lets you inspect exactly what happened, in what order.

Each message in the list has the following fields. All fields are always present (empty string or empty list when not applicable), so you can safely access any field without null checks.

| Field | Type | Description |
| --- | --- | --- |
| index | int | Position in the conversation (0-based) |
| role | string | "user", "assistant", "operator", or "tool_result" |
| content | string | The text content of the message |
| tool_calls | list | Tool calls made in this message. Each has {name, arguments} |
| tool_name | string | For tool_result messages, the name of the tool that produced this result |

Example CEL Expressions

Question criteria

Check answer length:

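A minimal sketch; the 100-character threshold is illustrative:

```cel
size(answer) >= 100
```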

Check for required keywords:

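This sketch requires two keywords; substitute the terms your policy actually cares about:

```cel
answer.contains("HIPAA") && answer.contains("consent")
```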

Combine length and content checks:

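Conditions combine with &&; the threshold and keyword here are illustrative:

```cel
size(answer) >= 100 && answer.contains("HIPAA")
```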

Chat outcome checks

Verify the assignment completed successfully:

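Using the assignment aggregate variable:

```cel
assignment.outcome == "success"
```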

Require a minimum conversation length:

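For example, at least six messages; pick a threshold that fits your workflow:

```cel
chat.message_count >= 6
```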

Require the conversation finished within a time limit:

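Here, ten minutes, expressed in seconds:

```cel
chat.duration_seconds <= 600
```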

Verify a specific task (step) was visited during the conversation:

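A sketch using the assignment_tasks variable; the slug identity-verification is illustrative:

```cel
"identity-verification" in assignment_tasks.visited
```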

Tool call existence

Check that the agent called a specific tool at any point:

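The tool name lookup_appointment is hypothetical; replace it with a tool from your workflow:

```cel
messages.exists(m, m.tool_calls.exists(t, t.name == "lookup_appointment"))
```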

Tool call with specific arguments

Verify a tool was called with particular argument values:

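A sketch assuming arguments is a map of argument names to values; the tool name, argument name, and value are all illustrative:

```cel
messages.exists(m, m.tool_calls.exists(t,
  t.name == "reschedule_appointment" && t.arguments.provider == "Dr. Chen"))
```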

Ordering -- tool A before tool B

Verify that identity was checked before medical info was shared:

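One way to express this: every share_medical_info call must be preceded by some verify_identity call (both tool names are hypothetical):

```cel
messages.filter(m, m.tool_calls.exists(t, t.name == "share_medical_info"))
  .all(m, messages.exists(v, v.index < m.index &&
    v.tool_calls.exists(t, t.name == "verify_identity")))
```

Note that if share_medical_info is never called, the filter is empty and all() is vacuously true, so this criterion only constrains conversations where medical info was actually shared.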

Negative assertions

Forbidden tool -- the agent must never escalate:

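Negate an existence check; escalate is an illustrative tool name:

```cel
!messages.exists(m, m.tool_calls.exists(t, t.name == "escalate"))
```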

Forbidden word (case-sensitive):

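A sketch with an illustrative forbidden word; note this scans every message, including the caller's:

```cel
!messages.exists(m, m.content.contains("guarantee"))
```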

Case-insensitive check via regex:

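The same check using the (?i) regex flag:

```cel
!messages.exists(m, m.content.matches("(?i)guarantee"))
```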

Only check assistant messages for forbidden content:

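Filter by role first so the caller's own wording cannot fail the criterion:

```cel
messages.filter(m, m.role == "assistant")
  .all(m, !m.content.contains("guarantee"))
```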

Tool result checks

Verify that a tool returned a specific result:

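This assumes the tool's result text contains a recognizable marker; both the tool name and the marker string are illustrative:

```cel
messages.exists(m, m.tool_name == "lookup_appointment" &&
  m.content.contains("scheduled"))
```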

Counting

Limit how many times a tool was called (e.g., no more than 3 searches):

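This counts messages containing a matching tool call, which is a close proxy for call count when each message makes at most one call; search_records is an illustrative tool name:

```cel
size(messages.filter(m, m.tool_calls.exists(t, t.name == "search_records"))) <= 3
```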

Combined real-world example

A complete criterion that checks multiple behaviors at once -- the agent must look up the appointment, then reschedule it, confirm the change to the caller, and never escalate:

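A sketch combining the patterns above; all tool names and the confirmation regex are illustrative:

```cel
// Looked up the appointment at some point
messages.exists(m, m.tool_calls.exists(t, t.name == "lookup_appointment"))
// Every reschedule call happens after a lookup
&& messages.filter(m, m.tool_calls.exists(t, t.name == "reschedule_appointment"))
  .all(m, messages.exists(p, p.index < m.index &&
    p.tool_calls.exists(t, t.name == "lookup_appointment")))
// Confirmed the change to the caller
&& messages.exists(m, m.role == "assistant" && m.content.matches("(?i)resched"))
// Never escalated
&& !messages.exists(m, m.tool_calls.exists(t, t.name == "escalate"))
```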

CEL Quick Reference

| Function | Description |
| --- | --- |
| size(list) / size(string) | Returns the length of a list or string |
| string.contains("substr") | Returns true if the string contains the substring |
| string.matches("regex") | Returns true if the string matches the regex. Supports (?i) for case-insensitive matching |
| list.exists(x, condition) | Returns true if any element in the list satisfies the condition |
| list.all(x, condition) | Returns true if every element in the list satisfies the condition |
| list.filter(x, condition) | Returns a new list containing only elements that satisfy the condition |
| "value" in list | Returns true if the value is contained in the list |

Writing AI Evaluation Criteria (Jinja2 Templates)

AI Evaluation criteria use a Jinja2 prompt template that is rendered with context variables and sent to an LLM. The LLM returns a structured evaluation: a score (0.0--1.0), a pass/fail determination based on the passing threshold you set, and written reasoning.

Use AI Evaluation criteria when the judgment is too nuanced for a formula -- things like empathy, communication quality, clinical accuracy, or adherence to complex protocols.

Available Template Variables

For question requirements

| Variable | Description |
| --- | --- |
| {{ requirement.name }} | The requirement's display name |
| {{ requirement.question }} | The question text that was asked |
| {{ submission.answer }} | The member's submitted answer |
| {{ member.name }} | The member's name |
| {{ member.labels }} | Labels assigned to the member |
| {{ answers.other_requirement_slug }} | The member's answer to another requirement (by slug). Useful for cross-referencing. |

For chat requirements

| Variable | Description |
| --- | --- |
| {{ transcript }} | List of message dicts from the conversation |
| {{ assignment.status }} | The assignment's final status |
| {{ assignment.outcome }} | The assignment's outcome (e.g., "success", "failure") |
| {{ requirement.scenario }} | The scenario definition YAML |
| {{ member.name }} | The member's name |

Example Prompts

Question evaluation

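A sketch of a question-evaluation prompt; the evaluation focus (escalation protocol) is illustrative:

```jinja2
You are grading a qualification answer for the requirement
"{{ requirement.name }}".

Question: {{ requirement.question }}

Answer from {{ member.name }}:
{{ submission.answer }}

Score from 0.0 to 1.0 how completely the answer covers the escalation
protocol. A passing answer names at least one concrete escalation
trigger and describes who to hand off to.
```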

Chat evaluation

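A sketch of a chat-evaluation prompt, assuming each transcript entry exposes role and content fields; the scoring dimensions are illustrative:

```jinja2
You are evaluating how {{ member.name }} handled a simulated patient call.

Scenario:
{{ requirement.scenario }}

Transcript:
{% for message in transcript %}
[{{ message.role }}] {{ message.content }}
{% endfor %}

The assignment ended with outcome "{{ assignment.outcome }}".

Score from 0.0 to 1.0 on two equally weighted dimensions: empathy
(acknowledging the caller's concerns) and protocol adherence (verifying
identity before discussing any medical details).
```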

Cross-referencing other answers

You can reference the member's answers to other requirements by slug. This is useful for evaluating consistency between what someone says they will do and what they actually do in a simulation.

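A sketch cross-referencing a hypothetical requirement with the slug escalation_policy:

```jinja2
In an earlier requirement the member described their escalation policy:

{{ answers.escalation_policy }}

Transcript of the simulated call:
{% for message in transcript %}
[{{ message.role }}] {{ message.content }}
{% endfor %}

Score from 0.0 to 1.0 how consistently the member's actual handling of
the call follows the policy they described.
```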

Evaluation Pipeline

When a member submits an answer or completes a chat scenario:

  1. Formula criteria run first (CEL auto-evaluation)
  2. If any Formula criterion fails -> Auto Failed (AI Evaluation criteria are skipped)
  3. AI Evaluation criteria run next (LLM evaluation)
  4. If AI Evaluation criteria exist -> Pending Review (needs human review)
  5. If only Formula criteria and all pass -> Auto Passed
  6. An overall score is computed as a weighted average of all criterion scores

Human Review

Admins can review any submission:

  1. Go to the qualification -> Submissions tab
  2. Click a submission to see criterion results
  3. Approve or Reject the submission
  4. Optionally add review notes
  5. Override individual criterion scores if needed

After approval, if auto-advance is enabled and all requirements now have passing submissions, the member automatically advances from Enrolled to Submitted.

Permissions

| Action | Required Scope |
| --- | --- |
| View qualifications and submissions | members:read |
| Create/edit qualifications, requirements, criteria | members:write |
| Enroll members, review submissions | members:write |
| Archive/delete qualifications | members:admin |

Tips

  • Start with Formula criteria for clear-cut requirements (e.g., "assignment must complete successfully"), then add AI Evaluation criteria for nuanced evaluation.
  • Use messages.exists() for tool call checks instead of relying solely on aggregate variables -- it gives you ordering context and argument-level inspection.
  • Start simple. assignment.outcome == "success" catches most chat failures. Add more specific criteria only when you need them.
  • Use (?i) in matches() for case-insensitive word checks in CEL expressions.
  • Test your CEL expressions against sample data before deploying them to a live qualification.
  • Use auto-advance to reduce manual work -- members progress automatically when requirements pass.
  • Provisional status is your safety net -- let members handle real interactions under monitoring before granting full qualification.
  • One qualification per workflow keeps things simple -- if a workflow needs different skill levels, use different requirements within one qualification.
  • Review AI Evaluation criteria results carefully -- LLM evaluation is helpful but not infallible; the human review step exists for a reason.