
The Science Behind Capability-Based Hiring

Why credentials fail, what actually predicts job performance, and how structured proof of work changes the signal-to-noise ratio in hiring decisions.

8 min read · Research-backed · Last updated May 2026

01 The Problem

Why credentials are broken as a hiring signal

Degrees, job titles, and CV bullet points share a fundamental flaw: they measure proximity to capability, not capability itself.

A degree from a prestigious university tells you that someone was admitted, paid tuition, and completed the required coursework — not that they can do the work you're hiring them to do. A job title tells you what someone was called, not what they built. A CV bullet point claiming "increased revenue by 40%" is unverifiable, uncontextualised, and entirely self-reported.

The hiring research literature has been clear on this for decades. A landmark meta-analysis by Schmidt and Hunter (1998), covering 85 years of selection research, found that unstructured interviews and educational credentials are among the weakest predictors of job performance — yet they remain the dominant hiring tools in most organizations.

The validity coefficient of unstructured interviews is approximately 0.18 on a 0-1 scale, where 1.0 would be perfect prediction. By comparison, work sample tests score 0.54, and structured behavioral interviews reach 0.51. Most hiring still relies on methods in the 0.10-0.20 range.
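One way to read these coefficients is to square them: for a correlation r, r² is the share of variance in job performance that the method statistically accounts for. The sketch below applies that to the figures quoted above. The dictionary and helper names are illustrative assumptions, not part of any Veryfy tooling.

```python
# Validity coefficients exactly as quoted in the paragraph above.
VALIDITY = {
    "unstructured interview": 0.18,
    "structured behavioral interview": 0.51,
    "work sample test": 0.54,
}


def variance_explained(r: float) -> float:
    """Share of performance variance accounted for by a predictor with validity r."""
    return r * r


for method, r in VALIDITY.items():
    print(f"{method}: r = {r:.2f}, r^2 = {variance_explained(r):.1%}")
```

On these figures, a work sample test accounts for roughly nine times as much performance variance as an unstructured interview (about 29% against about 3%).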

The problem is compounded by reference inflation: candidates supply only hand-picked referees, who give almost uniformly positive assessments. Research puts the predictive validity of reference checks at around 0.26, and this drops sharply when references are self-selected, as they almost always are in standard hiring.

CV claims are unverified

Studies show 56-75% of CVs contain at least one significant inaccuracy. There is no standard for what "led", "owned", or "drove" means.

References are biased by design

Candidates choose their own referees. The system is structurally incapable of providing independent verification.

Credentials measure proximity

A degree proves institutional access, not capability. The correlation between degree prestige and job performance is consistently weak across research.

Skill tests lack context

Knowing how to answer a case study is not the same as having shipped under real constraints. Abstract tests miss applied capability entirely.

02 The Evidence

What actually predicts job performance

The research is unambiguous: the closer a selection method is to actual work, the better it predicts performance.

The Schmidt & Hunter hierarchy of predictive validity places work sample tests at the top among single methods (validity ~0.54), followed by structured behavioral interviews (~0.51). Combining cognitive ability tests with structured assessment raises validity further (~0.63). What all of these have in common is that they require candidates to demonstrate capability, rather than describe it.

The implication for hiring is direct: the best evidence of future performance is structured past performance — specific work, in specific contexts, with measurable outcomes, verified by people who were there.

| Selection Method | Predictive Validity (r) | Used in Veryfy |
| --- | --- | --- |
| CV screening (unstructured) | ~0.18 (low) | |
| Reference checks (self-selected) | ~0.26 (moderate) | |
| Unstructured interview | ~0.18 (low) | |
| Structured interview | ~0.51 (strong) | |
| Work sample test | ~0.54 (strong) | |
| Structured past performance (Veryfy) | ~0.55–0.62 (estimated) | ✓ Core model |

Veryfy's model is designed around structured past performance — not CV claims, but structured work entries with defined context, contribution, measurable outcome, and multi-source verification. This is the only selection input that combines the predictive validity of work samples with the breadth of a full career history.

03 The Model

Structured proof of work: the 5-layer architecture

Each work entry in a Veryfy Passport is structured around five mandatory fields. This structure is what makes entries comparable, searchable, and verifiable.

Context

Company, team, time period, scope of role. Places the work in a verifiable situation.

Contribution

Specific role in the work. What the candidate personally did vs. the team.

Artifacts

Linked evidence: repos, designs, documents, platform integrations.

Outcome

Measurable result. Tagged to outcome categories for searchability.

Verification

Third-party signals confirming the entry is accurate.

This 5-layer structure means that every entry in a Veryfy Passport answers the same set of questions. It's not a portfolio — it's structured evidence. A hiring manager reviewing 20 passports is comparing apples to apples, not sifting through 20 different formats of self-promotion.
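The five fields can be pictured as a single record type. The sketch below is a hypothetical shape only: the class name, field names, and types are assumptions made for illustration, since the source does not publish the actual Passport schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class WorkEntry:
    """Hypothetical shape for one Passport work entry (names are assumed)."""
    context: str            # company, team, time period, scope of role
    contribution: str       # what the candidate personally did
    artifacts: list[str]    # links to repos, designs, documents
    outcome: str            # measurable result
    verification: list[str] = field(default_factory=list)  # third-party signals

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


entry = WorkEntry(
    context="Acme, Growth team, 2023-2024",
    contribution="Led the checkout redesign",
    artifacts=["https://example.com/design-doc"],
    outcome="+18pp conversion",
)
print(entry.missing_fields())  # ['verification']
```

Because every entry has the same five slots, a reviewer or a program can ask the same question of each one, such as which entries still lack verification.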

The outcome tagging layer is particularly powerful. Tags like +18pp conversion, 0-to-1 product, or team of 8 are machine-readable, enabling semantic search across the full candidate pool without requiring a human to read every entry.
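As a rough illustration of why machine-readable tags matter, the sketch below filters a candidate pool by exact tag match. This is deliberately simpler than the semantic search described above. The pool data, candidate ids, and function name are all hypothetical.

```python
# Hypothetical candidate pool: candidate id -> outcome tags across their entries.
POOL = {
    "cand-001": {"+18pp conversion", "team of 8"},
    "cand-002": {"0-to-1 product"},
    "cand-003": {"0-to-1 product", "team of 8"},
}


def find_candidates(pool: dict[str, set[str]], required: set[str]) -> list[str]:
    """Return ids of candidates whose tags include every required tag."""
    return sorted(cid for cid, tags in pool.items() if required <= tags)


print(find_candidates(POOL, {"0-to-1 product", "team of 8"}))  # ['cand-003']
```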

04 Trust Architecture

Signal weights and the verification trust hierarchy

Not all verification is equal. Veryfy weights signals according to their structural independence from the candidate.

The core principle is simple: the less the candidate can influence or select the verifier, the higher the trust weight of that signal. Platform Pull — automated data from GitHub, Figma, or analytics platforms — carries the highest weight because the candidate cannot fabricate a commit history or inflate a conversion rate retroactively.

Platform Pull (40%, Highest)

Automated verification from integrated platforms (GitHub, Figma, Jira, Analytics). Cannot be coached or staged.

Manager Stamp (30%, High)

Named manager confirms the role, contribution, and outcome via one-click confirmation. Manager's own trust score modulates the weight.

Peer Endorsement (20%, Medium)

Collaborators on the same project confirm involvement. Cross-reference checks prevent multiple people claiming sole ownership.

Self-Declaration (10%, Base)

The baseline layer: the candidate's own structured claim. Carries low weight alone, but acts as the foundation all other signals build on.
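The four weights can be combined in several ways, and the source does not say which one Veryfy uses. The sketch below assumes the simplest option, adding the base weights of the distinct signal types present on an entry. The additive rule, the key names, and the function are assumptions for illustration.

```python
# Base weights as listed above; key names are illustrative.
SIGNAL_WEIGHTS = {
    "platform_pull": 0.40,
    "manager_stamp": 0.30,
    "peer_endorsement": 0.20,
    "self_declaration": 0.10,
}


def trust_weight(signals: set[str]) -> float:
    """Sum the base weights of the distinct signal types on an entry.

    Assumption: weights combine additively; the real rule is not published.
    """
    unknown = signals - set(SIGNAL_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown signal types: {sorted(unknown)}")
    return sum(SIGNAL_WEIGHTS[s] for s in signals)


print(f"{trust_weight({'self_declaration'}):.2f}")                   # 0.10
print(f"{trust_weight({'self_declaration', 'manager_stamp'}):.2f}")  # 0.40
print(f"{trust_weight(set(SIGNAL_WEIGHTS)):.2f}")                    # 1.00
```

Under this assumed rule, an entry backed only by the candidate's own claim reaches a tenth of the maximum, which matches the description of Self-Declaration as a low-weight foundation.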

Signal weights are not static. They decay over time (older verifications from people who've since left the company carry less weight), and they are subject to anomaly detection that flags sudden coordinated verification spikes. The system is designed to be expensive to game and cheap to be honest on.
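Time decay could take many forms. The sketch below assumes exponential decay with a two-year half-life, purely to show the shape of the idea. The decay curve, the `half_life_days` default, and the function name are assumptions, not documented Veryfy behavior.

```python
def decayed_weight(base_weight: float, age_days: float,
                   half_life_days: float = 730.0) -> float:
    """Exponentially decay a signal weight; it halves every half_life_days.

    Assumption: exponential form and a 730-day half-life, chosen for illustration.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return base_weight * 0.5 ** (age_days / half_life_days)


# A Manager Stamp (base weight 0.30) at three different ages.
print(f"{decayed_weight(0.30, 0):.3f}")     # 0.300
print(f"{decayed_weight(0.30, 730):.3f}")   # 0.150
print(f"{decayed_weight(0.30, 1460):.3f}")  # 0.075
```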

05 The Score

The LeaDe Score: a composite of five capability signals

The LeaDe Score is not a rating — it's a weighted composite of five distinct capability dimensions, each independently measured.

LeaDe Score Components
Depth: 30%
Task Performance: 25%
Recency: 20%
Verification Density: 15%
Outcome Quality: 10%

Depth captures the breadth and quality of a candidate's strongest domain, weighted by the significance of outcomes. Recency applies temporal decay: a profile with work from the last 90 days scores higher than one with identical work done three years ago, reflecting the reality that skills and context change. Verification Density measures what percentage of entries are independently verified, not just self-declared.

The score is designed to be informative, not decisive. A score of 84 vs. 86 is not meaningful; a score of 84 vs. 54 is. The real value is in the composition of the score — a candidate with high Depth but low Recency tells a very different story than one with high Recency but low Verification Density.
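To make the point about composition concrete, the sketch below computes a weighted sum using the five published weights. It assumes each component is scored on a 0 to 100 scale. The scale, the key names, and the two example profiles are assumptions for illustration.

```python
# Component weights as listed above; key names are illustrative.
LEADE_WEIGHTS = {
    "depth": 0.30,
    "task_performance": 0.25,
    "recency": 0.20,
    "verification_density": 0.15,
    "outcome_quality": 0.10,
}


def leade_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    if components.keys() != LEADE_WEIGHTS.keys():
        raise ValueError("expected exactly the five LeaDe components")
    return sum(LEADE_WEIGHTS[name] * value for name, value in components.items())


# Two hypothetical profiles with different strengths.
deep_but_stale = {"depth": 95, "task_performance": 80, "recency": 40,
                  "verification_density": 80, "outcome_quality": 75}
recent_but_unverified = {"depth": 70, "task_performance": 80, "recency": 95,
                         "verification_density": 50, "outcome_quality": 75}

print(f"{leade_score(deep_but_stale):.1f}")         # 76.0
print(f"{leade_score(recent_but_unverified):.1f}")  # 75.0
```

The two totals differ by a single point, yet the profiles behind them are quite different, which is why the breakdown is more informative than the headline number.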

Veryfy explicitly discourages using the score as a filter threshold. The score is a navigation tool, not a gate. It directs attention to the structured evidence beneath it — the actual work entries, artifacts, and verification signals that form the real basis of a hiring decision.
