How It Works

    How We Built a Map of the Entire U.S. Labor Market

    Behind the data infrastructure that powers PathScorer: 1,000+ occupations, 55,000 job titles, federal salary records, and a skill graph that connects them.

    The hardest part of building PathScorer wasn’t the matching algorithm. It wasn’t the resume parser, the scoring logic, or the geographic salary comparison engine.

    It was the map.

    Before you can match someone’s skill profile against the labor market, you need a model of the labor market that’s actually worth matching against. That means answering a deceptively complex question: what is the complete structure of work in the United States, described at a level of detail fine enough to be useful?

    This is what we spent the most time on. Here’s how it works.

    The raw material: two federal databases nobody talks about

    The U.S. government has quietly assembled some of the most detailed labor market data in the world. Two databases sit at the foundation of everything PathScorer does.

    The first is O*NET, the Occupational Information Network, maintained by the Department of Labor. O*NET describes over 1,000 distinct occupations, each characterized across dozens of dimensions: the skills required, the knowledge domains involved, the physical and cognitive demands, the work context, the technology tools used, the educational requirements, and the typical activities performed day to day. The data comes from surveys of people actually doing each job, updated on a rolling basis, with occupational analysts reviewing and validating the results.

    The second is the Bureau of Labor Statistics Occupational Employment and Wage Statistics (OEWS) program, which collects wage data from roughly 1.1 million business establishments through semiannual survey panels and publishes wage data for every occupation at the national level, at the state level, and for more than 600 metropolitan and nonmetropolitan areas. This is where the salary layer comes from. Not crowdsourced estimates, not self-reported figures from a job board, but federal wage data derived from employer payroll records at significant scale.

    Neither database was built for the purpose of individual career matching. They were built for workforce policy, labor market research, and economic analysis. Repurposing them for a consumer-facing product required building a considerable amount of infrastructure on top of them.

    The occupational taxonomy: 1,000+ isn’t a round number

    O*NET’s 1,000+ occupations are not an arbitrary count. They reflect a genuine attempt to carve the labor market at its joints, distinguishing occupations that require meaningfully different skill profiles while aggregating roles that are substantially similar.

    The taxonomy is hierarchical. At the top level are 23 major occupational groups, things like “Business and Financial Operations” or “Healthcare Practitioners and Technical.” These break into progressively finer categories, down to individual occupation codes. A Registered Nurse is 29-1141.00. A Nurse Practitioner is 29-1171.00. A Clinical Nurse Specialist is 29-1141.04.

    The decimal structure matters. Occupations sharing the same six-digit base are close relatives in skill space. Occupations in the same two-digit major group are cousins. The taxonomy encodes a rough skill proximity structure that’s useful as a starting framework, even before you run any calculations.

    Major Group: 29 — Healthcare Practitioners
        Minor Group: 29-1000 — Health Diagnosing/Treating
            Broad: 29-1140 — Registered Nurses
                Detailed: 29-1141.00 — Registered Nurses
                Detailed: 29-1141.04 — Clinical Nurse Specialists
            Broad: 29-1170 — Nurse Practitioners
                Detailed: 29-1171.00 — Nurse Practitioners
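    The prefix structure lends itself to direct computation. Here is a minimal sketch of parsing an O*NET-SOC code into its hierarchy levels and scoring rough taxonomic closeness by shared prefixes; the function names and heuristic are illustrative, not PathScorer's actual implementation:

```python
def soc_levels(code: str) -> dict:
    """Split an O*NET-SOC code like '29-1141.04' into its hierarchy levels."""
    base, _, _detail = code.partition(".")
    major, fine = base.split("-")
    return {
        "major_group": major,                          # e.g. '29'
        "minor_group": major + "-" + fine[0] + "000",  # e.g. '29-1000'
        "broad": major + "-" + fine[:3] + "0",         # e.g. '29-1140'
        "detailed": code,                              # e.g. '29-1141.04'
    }

def taxonomy_proximity(a: str, b: str) -> int:
    """Rough closeness score: how many hierarchy levels two codes share."""
    la, lb = soc_levels(a), soc_levels(b)
    return sum(la[k] == lb[k] for k in ("major_group", "minor_group", "broad", "detailed"))

# RN and Clinical Nurse Specialist share major, minor, and broad groups.
assert taxonomy_proximity("29-1141.00", "29-1141.04") == 3
# RN and Nurse Practitioner share only the major and minor groups.
assert taxonomy_proximity("29-1141.00", "29-1171.00") == 2
```

    A prefix score like this is only the starting framework the article describes; the real proximity signal comes from the skill matrix later in the pipeline.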

    The taxonomy is not perfect. It lags emerging roles by several years, which is an inherent limitation of any system built on survey research and government publication cycles. We’ll come back to how that gets handled.


    The problem with 1,000 occupations: 55,000 job titles

    Here’s where things get complicated.

    Real job postings don’t use O*NET occupation titles. They use whatever title the hiring company chose to print on the requisition. And companies have not coordinated on this.

    “Account Executive,” “Sales Representative,” “Business Development Manager,” “Revenue Growth Specialist,” “Client Partnership Lead,” and “Commercial Account Manager” are all titles that appear in job postings for roles that map to the same small cluster of O*NET occupations. Conversely, “Software Engineer” can map to a dozen different O*NET occupations depending on what the role actually involves.

    The BLS estimates there are approximately 55,000 distinct job titles in active use across the U.S. labor market. They all need to map somewhere in the 1,000-occupation taxonomy.

    Building that mapping is not a one-time project. It’s ongoing maintenance, because the pool of active job titles shifts constantly as companies invent new vocabulary, rebrand existing roles, and create genuinely new positions that don’t fit cleanly into existing categories.

    The mapping layer is built using a combination of approaches. For well-established titles, the O*NET crosswalk provides direct correspondence. For novel or ambiguous titles, the mapping uses the skill requirements extracted from job postings as the primary signal: if the actual required skills cluster around a particular occupation’s profile, the title maps there regardless of what it’s called.
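    A toy version of that two-path mapping, with a tiny crosswalk and hypothetical skill profiles standing in for the real components (actual crosswalks and skill extraction are far larger):

```python
# Direct crosswalk for well-established titles (tiny illustrative sample).
CROSSWALK = {
    "registered nurse": "29-1141.00",
    "sales representative": "41-4012.00",
}

# Occupation skill profiles would come from O*NET; stubbed here with toy data.
OCC_SKILLS = {
    "29-1141.00": {"patient care", "coordination", "monitoring"},
    "41-4012.00": {"persuasion", "negotiation", "service orientation"},
}

def map_title(title: str, posting_skills: set) -> str:
    """Map a job title to an occupation code: crosswalk first, skills as fallback."""
    key = title.lower().strip()
    if key in CROSSWALK:
        return CROSSWALK[key]
    # Novel or ambiguous title: pick the occupation whose skill profile
    # overlaps most with the skills extracted from the posting text.
    return max(OCC_SKILLS, key=lambda code: len(OCC_SKILLS[code] & posting_skills))

assert map_title("Registered Nurse", set()) == "29-1141.00"
# "Revenue Growth Specialist" isn't in the crosswalk; its skills decide.
assert map_title("Revenue Growth Specialist",
                 {"persuasion", "negotiation", "cold calling"}) == "41-4012.00"
```

    The design point is the fallback order: a curated crosswalk wins when it has an answer, and skill overlap decides only when the title itself carries no signal.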

    55,000 job titles
        ↓ Title normalization layer (aliases, variants, compound titles)
        ↓ Skill extraction from posting text (required skills, preferred skills, responsibilities, tools)
        ↓ O*NET occupation mapping (1,000+ occupation nodes)
        ↓ BLS salary attachment (by occupation × geography)

    This is the foundation of the map. Every job title in active use sits somewhere in the taxonomy. Every occupation in the taxonomy has a salary record attached to it. The structure is navigable.

    The skill matrix: 35 dimensions across 1,000 occupations

    The core data structure for matching is a matrix. Rows are occupations. Columns are skill dimensions. Each cell contains a score representing how important that skill is to competent performance in that occupation, rated on a standardized scale.

    O*NET provides this matrix in raw form, collected through structured surveys of job incumbents and occupational analysts. The 35 dimensions cover the range of what distinguishes one occupation from another:

    Skills (cross-occupation)

    Active Listening, Speaking, Reading Comprehension, Writing, Mathematics, Science, Critical Thinking, Active Learning, Learning Strategies, Monitoring, Social Perceptiveness, Coordination, Persuasion, Negotiation, Instructing, Service Orientation, Complex Problem Solving, Operations Analysis, Technology Design, Equipment Selection, Installation, Programming, Operations Monitoring, Operation and Control, Equipment Maintenance, Repairing, Troubleshooting, Quality Control Analysis, Judgment & Decision Making, Systems Analysis, Systems Evaluation, Time Management, Mgmt of Financial Resources, Mgmt of Material Resources, Mgmt of Personnel Resources

    A few examples of what this looks like across different occupations:

                            Persuasion   Programming   Equip. Maint.   Coordination
    Sales Manager               88            12              8             82
    Software Engineer           31            94             22             61
    Aircraft Mechanic           18            24             97             74
    High School Teacher         61            19              9             78

    The matrix encodes the actual skill topology of the labor market. Occupations that require similar mixes of these dimensions sit close together in skill space regardless of what industry they’re in or what title vocabulary surrounds them. That proximity structure is what makes cross-sector matching possible.
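    With occupations represented as vectors, skill-space proximity reduces to vector math. A sketch using the four sample dimensions from the table above (real vectors carry all 35 dimensions, and cosine similarity is one of several reasonable distance choices):

```python
import math

# Rows from the sample table: (Persuasion, Programming, Equip. Maint., Coordination)
occupations = {
    "Sales Manager":       [88, 12,  8, 82],
    "Software Engineer":   [31, 94, 22, 61],
    "Aircraft Mechanic":   [18, 24, 97, 74],
    "High School Teacher": [61, 19,  9, 78],
}

def cosine(a, b):
    """Cosine similarity between two skill vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Which occupation sits closest to Sales Manager in this 4-D skill space?
sims = {name: cosine(occupations["Sales Manager"], vec)
        for name, vec in occupations.items() if name != "Sales Manager"}
closest = max(sims, key=sims.get)
assert closest == "High School Teacher"  # heavy on persuasion and coordination, like sales
```

    Even this toy example shows the cross-sector effect: by skill mix, a sales manager is closer to a teacher than to an engineer, despite sitting in a different industry entirely.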

    The salary layer: geography as a dimension

    Attaching salary data to the occupation matrix introduces a third dimension beyond skill profile and industry category: geography.

    BLS wage data is available at four geographic levels: national, state, metropolitan statistical area (MSA), and nonmetropolitan area. For the MSA level, the data covers over 600 distinct labor markets across the country.

    The salary layer isn’t a single number per occupation. It’s a distribution, with separate values at the 10th, 25th, 50th (median), 75th, and 90th percentiles for each occupation in each geography. This matters because the interesting comparison is often not “what does this occupation pay nationally” but “what does the 75th percentile look like in the highest-paying metro for this occupation versus my current market.”

    Occupation: Registered Nurse (29-1141.00)

    National median: $81,220

    By MSA (selected)

    Columbus, OH          $66,430
    Chicago, IL           $77,890
    Seattle, WA           $97,860
    San Jose, CA         $133,340
    San Francisco, CA    $151,640

    Percentile spread (national)

    P10     $59,450
    P25     $69,830
    P50     $81,220
    P75     $97,180
    P90    $120,550

    The geographic salary layer is what makes relocation analysis possible. Given a user’s current location, target occupation, and stated openness to relocation, the system can calculate not just what the occupation pays here but what it pays in the places most likely to represent a material improvement.
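    The relocation comparison is then a lookup-and-sort over the occupation × geography structure. A minimal sketch using the RN medians shown above (a real version would also filter by the user's distance and relocation preferences):

```python
# Median RN salary by MSA, from the figures above.
rn_median = {
    "Columbus, OH":       66_430,
    "Chicago, IL":        77_890,
    "Seattle, WA":        97_860,
    "San Jose, CA":      133_340,
    "San Francisco, CA": 151_640,
}

def relocation_uplift(current_msa: str, salaries: dict, top_n: int = 3):
    """Rank other metros by median salary uplift over the user's current market."""
    base = salaries[current_msa]
    ranked = sorted(
        ((msa, pay - base) for msa, pay in salaries.items() if msa != current_msa),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_n]

best_msa, uplift = relocation_uplift("Columbus, OH", rn_median)[0]
assert best_msa == "San Francisco, CA" and uplift == 85_210
```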

    The skill graph: edges matter as much as nodes

    A matrix is a good data structure for matching. It’s not the best structure for understanding relationships between occupations, which is a different and equally important problem.

    For that, the occupation taxonomy gets converted into a graph. Nodes are occupations. Edges connect occupations that share significant skill overlap, weighted by the strength of that overlap. The resulting structure looks nothing like the hierarchical taxonomy it was derived from.

    O*NET Taxonomy (hierarchical)

        Healthcare
            Nurses
            Practitioners
        Business
            Managers
            Analysts

    Skill Graph (by actual overlap)

        RN — NP (strong)
        RN — Healthcare Social Worker (moderate)
        Sales Mgr — Marketing Mgr (strong)
        Sales Mgr — Ops Mgr (moderate)
        Tech Writer — Instructional Designer (strong)
        Tech Writer — Software Dev (weak-mod)

    The graph structure exposes things the taxonomy hides. A Technical Writer and a Software Developer are in completely different branches of the O*NET hierarchy, different major groups, different career ladders. But they share meaningful overlap in systems thinking, logical reasoning, and communication skills. The graph edge between them is weak but real, and it’s exactly the kind of cross-category connection that conventional career thinking misses.

    The graph also powers the “adjacent roles” analysis: given an occupation a user currently occupies, what are the nearest neighbors in skill space, and what’s the salary differential between the current position and each neighbor? The edges with high skill overlap and high salary uplift are the interesting ones.
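    That adjacency query is straightforward once the graph exists: keep edges above an overlap threshold, then sort a node's neighbors by salary differential. A sketch with hypothetical overlap weights and illustrative medians (not PathScorer's actual edge weights):

```python
# Skill-overlap edges (hypothetical weights, 0-1) and illustrative median salaries.
edges = {
    ("Registered Nurse", "Nurse Practitioner"): 0.82,
    ("Registered Nurse", "Healthcare Social Worker"): 0.55,
    ("Registered Nurse", "Pharmacy Technician"): 0.31,
}
median = {
    "Registered Nurse": 81_220,
    "Nurse Practitioner": 126_260,
    "Healthcare Social Worker": 62_940,
    "Pharmacy Technician": 40_300,
}

def adjacent_roles(occ, edges, median, min_overlap=0.5):
    """Neighbors above the overlap threshold, ranked by salary differential."""
    neighbors = []
    for (a, b), weight in edges.items():
        if occ in (a, b) and weight >= min_overlap:
            other = b if a == occ else a
            neighbors.append((other, weight, median[other] - median[occ]))
    return sorted(neighbors, key=lambda n: n[2], reverse=True)

top = adjacent_roles("Registered Nurse", edges, median)[0]
assert top[0] == "Nurse Practitioner"  # strong overlap, large uplift
```

    The threshold matters: Pharmacy Technician is dropped here not because the salary math is uninteresting but because the skill overlap is too weak to make the transition realistic.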

    Handling what the data doesn’t cover

    Federal occupational data has two well-known limitations.

    The first is lag. O*NET and BLS data takes time to collect, validate, and publish. Emerging roles often don’t appear in the taxonomy until they’ve been widespread for several years. AI prompt engineer, for example, or machine learning operations engineer, or climate risk analyst. These roles exist and have real labor markets, but their formal O*NET profiles are either absent or underdeveloped.

    The approach here is to map novel titles to the closest available occupation profiles based on skill requirements extracted from actual job postings, then flag them as “emerging role” in the output so users understand the data confidence is lower. A best available approximation is more useful than a blank.

    The second limitation is granularity. Some occupation categories in O*NET are broader than they ideally should be for matching purposes. “Software Developers” is one code that covers frontend engineers, backend engineers, systems programmers, and embedded software engineers, roles that have somewhat different skill profiles and meaningfully different compensation ranges.

    For broadly defined occupations like this, the skill profiles get supplemented with data extracted from actual job postings, which provide finer-grained signal about the specific skill clusters within the category. The matching can then operate at a sub-occupation level even when the formal taxonomy doesn’t support it.
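    One lightweight way to recover that sub-occupation structure is to cluster the skill sets of postings filed under a single broad code. A toy sketch with hypothetical postings and a greedy Jaccard-overlap grouping (a real pipeline would use a proper clustering algorithm over far richer skill vectors):

```python
# Hypothetical postings all filed under the one "Software Developers" code.
postings = [
    {"react", "css", "javascript"},
    {"javascript", "css", "accessibility"},
    {"postgres", "api design", "go"},
    {"go", "api design", "kubernetes"},
]

def greedy_cluster(skill_sets, min_jaccard=0.25):
    """Greedily group postings whose skill sets overlap (Jaccard similarity)."""
    clusters = []
    for skills in skill_sets:
        for cluster in clusters:
            representative = cluster[0]
            overlap = len(skills & representative) / len(skills | representative)
            if overlap >= min_jaccard:
                cluster.append(skills)
                break
        else:
            clusters.append([skills])  # no match: start a new sub-occupation cluster
    return clusters

clusters = greedy_cluster(postings)
assert len(clusters) == 2  # a frontend cluster and a backend cluster emerge
```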

    What the map looks like in practice

    Put together, the infrastructure looks like this:

    Raw data sources

    • O*NET occupation database — 1,000+ occupations, 35 skill dimensions
    • BLS OEWS wage surveys — salary by occupation × geography
    • Job posting corpus — 55,000 title variants, real-time skill signals

    Processing layers

    • Title normalization — 55,000 titles → 1,000+ occupation nodes
    • Skill matrix construction — occupation × skill dimension scores
    • Salary layer attachment — occupation × MSA × percentile
    • Occupation graph generation — skill-weighted edges between occupations
    • Automation risk scoring — task-type analysis per occupation
    • Growth projection layer — BLS 10-year employment projections

    Output structures

    • Occupation vectors — for cosine similarity matching
    • Skill gap matrices — for transition cost analysis
    • Salary comparison tables — for geographic optimization
    • Career path graphs — for "how do I get there" analysis

    The user’s skill vector, extracted from their resume and supplemented with hidden skills they add during intake, is matched against the occupation vectors in this map. The result is a ranked list of occupations with associated salary data, skill gap analysis, and transition paths, all grounded in the same federal data that labor economists and workforce policy researchers rely on.
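    Compressed to its essentials, that final step is a similarity ranking over occupation vectors with the salary layer joined in and a per-dimension gap computed for each candidate. A sketch with two toy occupations and three toy dimensions (names and numbers are illustrative):

```python
import math

# Toy occupation vectors (3 of the 35 dimensions) with illustrative medians attached.
occupations = {
    "Operations Analyst":    {"vector": [70, 85, 60], "median": 85_720},
    "Logistics Coordinator": {"vector": [75, 60, 80], "median": 48_560},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rank_matches(user_vector, occupations):
    """Rank occupations by similarity to the user's skill vector, with salary and gaps."""
    results = []
    for name, occ in occupations.items():
        # Per-dimension shortfall: what the role demands beyond what the user has.
        gap = [max(required - have, 0) for required, have in zip(occ["vector"], user_vector)]
        results.append({
            "occupation": name,
            "score": round(cosine(user_vector, occ["vector"]), 3),
            "median_salary": occ["median"],
            "skill_gap": gap,  # feeds the transition cost analysis
        })
    return sorted(results, key=lambda r: r["score"], reverse=True)

matches = rank_matches([72, 80, 58], occupations)
assert matches[0]["occupation"] == "Operations Analyst"
```

    The gap vector is the bridge to the "how do I get there" analysis: a high-scoring match with a small gap is an adjacent move, while a high-scoring match with a large gap in one dimension points at a specific skill to close.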

    Why this infrastructure matters for the output

    The map determines what’s possible for the matching algorithm. A system built on a thin or incomplete model of the labor market produces thin or incomplete recommendations, regardless of how sophisticated the matching logic is.

    The depth of the O*NET skill profiles is what makes cross-sector discovery work. Without 35-dimensional occupation vectors, the system can’t recognize that a logistics coordinator and an operations analyst are close neighbors in skill space even though they carry different titles in different industries.

    The BLS salary layer at MSA granularity is what makes geographic optimization concrete rather than vague. “You could earn more on the coasts” is not useful. “Registered nurses in San Jose earn a median of $133,340 compared to $66,430 in Columbus, and here are the three highest-paying metros within driving distance of your current location” is.

    The job title mapping across 55,000 variants is what connects the map to reality. The labor market doesn’t run on O*NET codes. It runs on job postings with idiosyncratic titles. Without the translation layer, the map is accurate but unreachable from where people actually start their searches.

    The map is the infrastructure. Everything else is analysis built on top of it.

    See what the map finds for you

    PathScorer runs on O*NET, BLS wage data, and a skill graph covering 1,000+ occupations and 55,000 job titles. Two minutes, free to try.

    Score my career — free
    Tags: labor market data infrastructure, O*NET skill taxonomy, BLS occupational wage data, career matching algorithm, job title mapping