Custom Scoring Rubrics

Gradient’s default rubric (v5) evaluates candidates across five categories (Correctness, Deliverable Quality, Reflection Quality, AI Fluency, Prioritized Skills) totaling 100 points. You can customize the sub-criteria in every category except AI Fluency to match your organization’s priorities. See Scoring for what each category measures.

Default rubric

The overall score out of 100 is a weighted blend of the five categories. The category weights are fixed by the rubric version; what you customize is the sub-criteria inside each category.

Category	Weight	Scoring method	Sub-criteria customizable
Correctness	20%	Deterministic	Yes
Deliverable Quality	20%	LLM Judge	Yes
Reflection Quality	10%	LLM Judge	Yes
AI Fluency	25%	Hybrid	No (managed by Gradient)
Prioritized Skills	25%	LLM Judge	Yes (built from the role’s priority skills)
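To make "weighted blend" concrete, here is a minimal Python sketch that combines per-category results using the default weights from the table. It assumes each category reports the fraction (0.0 to 1.0) of its points the candidate earned. The function name and that input shape are illustrative assumptions, not Gradient's scoring code.

```python
# Default v5 category weights, copied from the table above (they sum to 100).
DEFAULT_WEIGHTS = {
    "Correctness": 20,
    "Deliverable Quality": 20,
    "Reflection Quality": 10,
    "AI Fluency": 25,
    "Prioritized Skills": 25,
}


def overall_score(category_fractions: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Blend per-category results into a score out of 100.

    Assumption: each category reports the fraction (0.0-1.0) of its points
    the candidate earned. Illustrative only, not Gradient's implementation.
    """
    missing = sorted(set(weights) - set(category_fractions))
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for name in weights:
        if not 0.0 <= category_fractions[name] <= 1.0:
            raise ValueError(f"{name}: fraction must be between 0 and 1")
    # Each category contributes its weight times the share of points earned.
    blended = sum(weight * category_fractions[name] for name, weight in weights.items())
    return 100 * blended / sum(weights.values())
```

Under these assumptions, a candidate who earns half of Correctness and everything else scores 90, because Correctness carries 20 of the 100 weight points.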

Each category contains sub-criteria. Every sub-criterion carries a point value, a short description, and a 0-4 anchored scale (its anchors) that defines what each score from 0 to 4 means. The anchors are the scale the judge actually grades against, and they are what calibration reads and sharpens. Prioritized Skills has one sub-criterion per approved priority skill on the role.
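This page does not say exactly how a 0-4 level turns into points. The Python sketch below assumes a simple linear mapping (level L earns points × L / 4) and skips disabled sub-criteria. Both the mapping and the `SubCriterion` type are hypothetical, shown only to illustrate how point values and anchored levels could combine into a category result.

```python
from dataclasses import dataclass


@dataclass
class SubCriterion:
    id: str
    points: int
    enabled: bool = True


def category_fraction(subcriteria: list, levels: dict) -> float:
    """Fraction of a category's points earned, given a 0-4 level per enabled sub-criterion.

    Hypothetical linear mapping: level L earns points * L / 4.
    """
    enabled = [s for s in subcriteria if s.enabled]
    possible = sum(s.points for s in enabled)
    if possible == 0:
        return 0.0
    earned = 0.0
    for s in enabled:
        level = levels[s.id]
        if not 0 <= level <= 4:
            raise ValueError(f"{s.id}: level {level} is outside 0-4")
        earned += s.points * level / 4
    return earned / possible
```

With the Correctness sub-criteria from the example request (15, 8, and 10 points), top marks on Fact Accuracy and Required Elements but 0 on Judgment would earn 23 of 33 points under this assumption.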

AI Fluency is locked. It is centrally managed by Gradient and cannot be edited or reweighted. Edits to it are ignored on write.
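One way to picture "ignored on write" is a merge step that keeps the stored AI Fluency category no matter what the incoming payload contains. The Python sketch below assumes categories are matched by display name and that omitting the locked category also counts as an ignored edit; both are assumptions, and this is not Gradient's server code.

```python
import copy

# Assumption: the locked category is matched by its display name.
LOCKED_CATEGORY_NAMES = {"AI Fluency"}


def apply_rubric_update(stored: dict, incoming: dict) -> dict:
    """Return the rubric that would be saved, with locked categories left as stored."""
    locked = {
        c["name"]: c for c in stored["categories"] if c["name"] in LOCKED_CATEGORY_NAMES
    }
    result = copy.deepcopy(incoming)
    merged = []
    for category in result["categories"]:
        if category["name"] in locked:
            # Any edit to a locked category is discarded in favor of the stored copy.
            merged.append(copy.deepcopy(locked[category["name"]]))
        else:
            merged.append(category)
    # Assumption: omitting a locked category also counts as an ignored edit.
    present = {c["name"] for c in merged}
    for name, category in locked.items():
        if name not in present:
            merged.append(copy.deepcopy(category))
    result["categories"] = merged
    return result
```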

Customizing the rubric

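Update an assessment’s scoring rubric with a PATCH request to the API. Send the full scoring_rubric object with your changes. The example below is abbreviated to the Correctness category; replace ASSESSMENT_ID and gai_your_key with your assessment ID and API key: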

curl -X PATCH "https://app.trygradient.ai/api/assessments/ASSESSMENT_ID" \
  -H "Authorization: Bearer gai_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "scoring_rubric": {
      "version": "5.0",
      "totalPoints": 100,
      "categories": [
        {
          "id": "correctness_v5",
          "name": "Correctness",
          "points": 33,
          "scoringMethod": "deterministic",
          "subCriteria": [
            {
              "id": "fact_accuracy",
              "name": "Fact Accuracy",
              "points": 15,
              "description": "Are the facts in the deliverable correct against the ground truth?",
              "enabled": true,
              "anchors": [
                { "level": 0, "descriptor": "No facts correct, or key figures wrong." },
                { "level": 1, "descriptor": "A few facts correct; several material errors." },
                { "level": 2, "descriptor": "Most facts correct; some errors remain." },
                { "level": 3, "descriptor": "Nearly all facts correct; at most a minor slip." },
                { "level": 4, "descriptor": "Every fact correct against the ground truth." }
              ]
            },
            {
              "id": "required_elements",
              "name": "Required Elements Present",
              "points": 8,
              "description": "Did the candidate include everything the brief asked for?",
              "enabled": true,
              "anchors": [
                { "level": 0, "descriptor": "None of the required elements present." },
                { "level": 1, "descriptor": "One required element present." },
                { "level": 2, "descriptor": "About half the required elements present." },
                { "level": 3, "descriptor": "Most required elements present." },
                { "level": 4, "descriptor": "Every required element present." }
              ]
            },
            {
              "id": "judgment",
              "name": "Judgment",
              "points": 10,
              "description": "Did the candidate make sound judgment calls on the task?",
              "enabled": true,
              "anchors": [
                { "level": 0, "descriptor": "Took the obvious-but-wrong answer on every judgment call." },
                { "level": 1, "descriptor": "Got one call right, missed the rest." },
                { "level": 2, "descriptor": "Got about half the calls right." },
                { "level": 3, "descriptor": "Got most calls right, missed one." },
                { "level": 4, "descriptor": "Made the correct call every time." }
              ]
            }
          ]
        }
      ]
    }
  }'
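If you script rubric updates, the same request can be built with Python's standard library. This sketch only mirrors the URL, method, headers, and body shape of the curl call above; the function name is hypothetical, and the response format is not described on this page, so the sketch builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://app.trygradient.ai/api"


def build_rubric_patch(assessment_id: str, api_key: str, scoring_rubric: dict) -> urllib.request.Request:
    """Build (but do not send) the same PATCH request as the curl example."""
    body = json.dumps({"scoring_rubric": scoring_rubric}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/assessments/{assessment_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Sending is left out here: urllib.request.urlopen(request) would perform the call.
```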

Scoring is frozen once an assessment is published or has candidates, so lock in the rubric before you go live. To change scoring after that, duplicate the assessment.

Anchored 0-4 scales

Every sub-criterion should carry an anchors ladder: five descriptors, one for each level from 0 (absent) to 4 (exceptional).

"anchors": [
  { "level": 0, "descriptor": "None of the required elements present." },
  { "level": 1, "descriptor": "One required element present." },
  { "level": 2, "descriptor": "About half the required elements present." },
  { "level": 3, "descriptor": "Most required elements present." },
  { "level": 4, "descriptor": "Every required element present." }
]
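Before sending a rubric, it can help to check that every ladder has this shape. The Python sketch below checks only what this section describes: exactly one non-empty descriptor for each level 0 through 4. The function name is hypothetical, and the API may enforce more or fewer rules than this.

```python
def validate_anchors(anchors: list) -> list:
    """Return problems with an anchors ladder; an empty list means it looks well-formed."""
    problems = []
    levels = [a.get("level") for a in anchors]
    # Each level 0-4 should appear exactly once.
    for level in range(5):
        count = levels.count(level)
        if count == 0:
            problems.append(f"missing level {level}")
        elif count > 1:
            problems.append(f"level {level} appears {count} times")
    for a in anchors:
        if a.get("level") not in range(5):
            problems.append(f"unexpected level {a.get('level')!r}")
        if not str(a.get("descriptor", "")).strip():
            problems.append(f"level {a.get('level')!r} has an empty descriptor")
    return problems
```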

The anchors are not decoration. They matter in two ways:

They are the scoring scale. The judge grades a sub-criterion against its anchor descriptors, not its one-line description. The description frames the criterion; the anchors define what each level actually means. A criterion with vague or missing anchors produces vague scores.
They are what calibration tunes. When a reviewer grades a candidate on the 0-4 ladder and disagrees with the AI, calibration sharpens these anchor descriptors so future scoring matches your standard. A sub-criterion with no anchors cannot be calibrated, because there is no ladder to sharpen.

Write anchors as observable, mutually exclusive descriptions of the work, not as adjectives. “Every required element present” beats “excellent.” The clearer the rung, the more consistent the score and the less calibration you need.
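A rough lint can catch anchors that are only adjectives, such as "Excellent" or "Very good". The Python sketch below flags a descriptor when every word in it comes from a small list of quality words and fillers. Both word lists are hypothetical starting points, and this heuristic will miss vague phrasing that uses other words.

```python
import re

# Hypothetical word lists; extend them for your own rubrics.
QUALITY_WORDS = {"excellent", "good", "great", "fair", "poor", "bad", "weak", "strong", "adequate", "outstanding"}
FILLER_WORDS = {"very", "and", "or", "not", "quite"}


def vague_anchors(anchors: list) -> list:
    """Return the levels whose descriptor contains only quality adjectives and fillers."""
    allowed = QUALITY_WORDS | FILLER_WORDS
    flagged = []
    for anchor in anchors:
        words = re.findall(r"[a-z]+", anchor["descriptor"].lower())
        if words and all(word in allowed for word in words):
            flagged.append(anchor["level"])
    return flagged
```

An observable descriptor like "Most required elements present." passes because it names something a reviewer can check in the work.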

Scoring methods

Sub-criteria are scored by one of these methods:

Deterministic

Ground-truth checks against known-correct facts and required elements. Used for Correctness.
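As an illustration of a deterministic check, the Python sketch below counts how many required elements appear in a deliverable and buckets the share onto the 0-4 ladder, echoing the Required Elements anchors. The case-insensitive substring match and the bucket thresholds are assumptions; Gradient's actual checks are not described here. The key property is that the same input always produces the same level.

```python
def required_elements_level(deliverable: str, required: list) -> int:
    """Map the share of required elements found in a deliverable onto the 0-4 ladder.

    Assumptions: case-insensitive substring matching and the thresholds below.
    """
    if not required:
        raise ValueError("no required elements to check")
    text = deliverable.lower()
    found = sum(1 for element in required if element.lower() in text)
    if found == 0:
        return 0
    if found == len(required):
        return 4
    share = found / len(required)
    # Partial credit, roughly "one", "about half", and "most" from the anchors.
    if share < 0.4:
        return 1
    if share < 0.75:
        return 2
    return 3
```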

LLM Judge

An AI evaluator grades each sub-criterion against its description and anchored 0-4 scale. Best for subjective qualities like insight and reflection.
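To show why the anchors matter to the judge, the Python sketch below assembles a grading prompt that lists the full 0-4 ladder alongside the description, adds customInstructions when present, and strictly parses a single-integer reply. The prompt wording, the reply format, and treating customInstructions as a plain string are assumptions for illustration; Gradient's actual judge prompt is not published here.

```python
def build_judge_prompt(sub_criterion: dict, deliverable: str) -> str:
    """Assemble a grading prompt that puts the anchors in front of the judge."""
    ladder = "\n".join(
        f"{a['level']}: {a['descriptor']}"
        for a in sorted(sub_criterion["anchors"], key=lambda a: a["level"])
    )
    lines = [
        f"Criterion: {sub_criterion['name']}",
        f"What it covers: {sub_criterion['description']}",
    ]
    # Assumption: customInstructions is an optional free-text string.
    if sub_criterion.get("customInstructions"):
        lines.append(f"Role-specific guidance: {sub_criterion['customInstructions']}")
    lines += [
        "Score the deliverable on this 0-4 scale:",
        ladder,
        "Reply with a single integer from 0 to 4.",
        "",
        "Deliverable:",
        deliverable,
    ]
    return "\n".join(lines)


def parse_level(reply: str) -> int:
    """Extract the 0-4 level from the judge's reply, rejecting anything else."""
    text = reply.strip()
    if text not in {"0", "1", "2", "3", "4"}:
        raise ValueError(f"unexpected judge reply: {reply!r}")
    return int(text)
```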

Event Analysis

Automated analysis of candidate behavior patterns. Best for measurable actions like whether a figure was verified before use.
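For the "verified before use" example, an event-based check might walk the candidate's actions in time order. The Python sketch below uses a hypothetical event schema (a timestamp, a kind of "verify" or "use", and a figure label); Gradient's real event log format is not described on this page.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float
    kind: str    # hypothetical kinds: "verify" or "use"
    figure: str  # label for the figure, e.g. "Q3 revenue"


def unverified_uses(events: list) -> list:
    """Return figures that were used before any verification event for them."""
    verified = set()
    flagged = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "verify":
            verified.add(event.figure)
        elif event.kind == "use" and event.figure not in verified and event.figure not in flagged:
            flagged.append(event.figure)
    return flagged
```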

Hybrid

Combines approaches. AI Fluency is graded this way, centrally, by Gradient.

Tips for effective rubrics

Keep total points at 100. The scoring engine normalizes to 100 points for percentile calculations. Using a different total will produce unexpected percentile rankings.
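A quick pre-flight check can catch totals that drift from 100. The Python sketch below assumes category points are plain numbers, as in the example request, and checks that totalPoints is 100 and that the category points add up to it. Run it against the full rubric rather than the abbreviated example, which lists only one category. The function name is hypothetical.

```python
def check_totals(rubric: dict) -> list:
    """Flag point totals that would skew percentile rankings."""
    problems = []
    total = rubric.get("totalPoints")
    if total != 100:
        problems.append(f"totalPoints is {total}, expected 100")
    category_sum = sum(c.get("points", 0) for c in rubric.get("categories", []))
    if category_sum != total:
        problems.append(f"category points sum to {category_sum}, not totalPoints ({total})")
    return problems
```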

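Weight what matters most. Category weights are fixed, so express priorities through sub-criterion points: if fact accuracy matters more than judgment for your role, give it more of the category’s points.
Write specific descriptions and anchors. The LLM judge uses the description to frame a sub-criterion and the anchors to score it, so vagueness in either produces vague scores.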
Use customInstructions on sub-criteria to give the judge role-specific context (for example, “For a data analyst role, prioritize accuracy of calculations over visual design”). This is also the channel that calibration sharpens automatically.
Disable sub-criteria you don’t need. Set enabled: false rather than removing them, so you can re-enable later.
Do not try to edit AI Fluency. It is locked; changes are ignored.
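The "disable rather than remove" tip can be scripted as a small helper that flips the existing enabled flag and leaves the entry and its anchors in place. The Python sketch below is hypothetical, not part of any Gradient SDK; it also skips AI Fluency, since edits there are ignored anyway.

```python
import copy


def set_subcriterion_enabled(rubric: dict, sub_id: str, enabled: bool) -> dict:
    """Return a copy of the rubric with one sub-criterion switched on or off."""
    updated = copy.deepcopy(rubric)
    for category in updated["categories"]:
        if category["name"] == "AI Fluency":
            continue  # locked; edits would be ignored on write
        for sub in category.get("subCriteria", []):
            if sub["id"] == sub_id:
                # Toggle the flag instead of deleting, so it can be re-enabled later.
                sub["enabled"] = enabled
                return updated
    raise KeyError(f"no editable sub-criterion with id {sub_id!r}")
```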
