Parse Job Postings into Hiring Briefs with ScrapeGraphAI

Build a .NET agent that reads public job postings and turns them into structured role briefs with requirements, signals, and next steps.

Overview

Hiring pages are packed with useful information, but they are rarely shaped for quick decisions. A recruiter wants intake notes. A candidate wants preparation guidance. A hiring manager wants a quick check on role scope, seniority, and required skills. Everyone is looking at the same public job posting, but each person needs it turned into a practical brief.

In this tutorial, you will build a Job Page Parser with Microsoft Agent Framework and ScrapeGraphAI.AgentFramework. The agent takes a public job posting URL, reads or extracts the important role facts, and returns a concise hiring brief with a source URL.

The workflow is intentionally simple:

  1. Provide a public job posting URL.
  2. Choose direct extraction or page reading.
  3. Preserve missing or uncertain fields.
  4. Return a brief a person can act on.

Agent Anatomy

An extraction agent still needs a persona and a brain, but the most important capability is the bridge from a messy public page to structured role facts.

  • 🎭 Persona: A concise hiring analyst.
  • 🧠 Brain: Reasoning about role fit.
  • 🧾 Extractor (building): extract_from_page pulls role facts.
  • 📖 Reader (building): scrape_page returns page text.
  • 📌 Brief (result): Role summary and next steps.

From Page Content to Business Action

A job posting is not just text. It contains requirements, seniority signals, team priorities, and hidden hints about the hiring process. extract_from_page is best when you need fields. scrape_page is best when you want the agent to read the page and write a more flexible brief.

Set up your environment

Create a console app and install the packages that connect Agent Framework, OpenAI-compatible chat, and ScrapeGraphAI tools.

📋 Pre-flight Checklist

  • 🛠️ .NET 10.0 SDK (or later) installed.
  • 🤖 AI Provider: An OpenAI-compatible endpoint such as LM Studio, Ollama, OpenAI, or a compatible hosted service.
  • 🔑 ScrapeGraphAI: A valid SGAI_API_KEY for web scraping and extraction.
  • 🌐 Job URL: A public job posting page you are allowed to access.

1 Create the project

Open your terminal and create a new console application:

dotnet new console -n ScrapeGraphAI.JobPageParser -f net10.0
cd ScrapeGraphAI.JobPageParser

2 Install packages

Install the ScrapeGraphAI Agent Framework integration, the OpenAI-compatible Agent Framework bridge, and the supporting configuration packages.

dotnet add package ScrapeGraphAI.AgentFramework --version 1.0.0
dotnet add package Microsoft.Agents.AI.OpenAI
dotnet add package Microsoft.Extensions.Configuration
dotnet add package Microsoft.Extensions.DependencyInjection
dotnet add package Microsoft.Extensions.Logging.Console
dotnet add package OpenAI
dotnet restore

Package Anatomy

  • 🛠️ ScrapeGraphAI.AgentFramework: Registers ScrapeGraphAI web, scrape, extract, crawl, monitor, history, health, and credit tools as Agent Framework AITool instances.
  • 🔌 Microsoft.Agents.AI.OpenAI: Provides the Agent Framework OpenAI integration and the AsAIAgent extension used to build the agent.
  • 🤖 OpenAI: Creates an OpenAI-compatible chat client. This works with local endpoints such as LM Studio as well as hosted compatible services.

3 Configure credentials

Before setting environment variables, get your ScrapeGraphAI API key from the dashboard.

Get your ScrapeGraphAI API key

  1. Sign in to the ScrapeGraphAI dashboard.
  2. Open Settings.
  3. Copy the API key from the API Key section.
  4. Store it in an environment variable named SGAI_API_KEY.

The official docs describe the same flow in Managing your API keys. Keep the key server-side, never commit it to source control, and rotate it from Settings if it is exposed.

Set the ScrapeGraphAI key and your OpenAI-compatible chat endpoint.

PowerShell:

$env:SGAI_API_KEY = "<your-scrapegraphai-api-key>"
$env:OPENAI_BASE_URL = "http://localhost:1234/v1"
$env:OPENAI_MODEL = "<your-tool-capable-model>"
$env:OPENAI_API_KEY = "lm-studio"

Bash or zsh:

export SGAI_API_KEY="<your-scrapegraphai-api-key>"
export OPENAI_BASE_URL="http://localhost:1234/v1"
export OPENAI_MODEL="<your-tool-capable-model>"
export OPENAI_API_KEY="lm-studio"

For LM Studio, the API key can be a placeholder. For a hosted OpenAI-compatible endpoint, use the real API key required by that provider.

Build the job parser

The agent registers one ScrapeGraphAI tool at a time. The two Program.cs variants in this section differ in the tool they register, the tool options, and the wording of the default prompt and instructions.

sequenceDiagram
    participant User
    participant Agent as Job Parser Agent
    participant Extract as extract_from_page
    participant Scrape as scrape_page
    participant Brief as Hiring Brief

    User->>Agent: Analyze a public job posting URL
    alt Direct structured extraction
        Agent->>Extract: Extract role facts from the URL
        Extract-->>Agent: Structured job details
    else Page reading
        Agent->>Scrape: Fetch the posting as markdown
        Scrape-->>Agent: Page content
    end
    Agent->>Brief: Create concise brief with missing fields
    Brief-->>User: Role summary, skills, signals, and next steps

Choose the tool

ScrapeGraphAI charges different credit amounts depending on the service and format you call. For this tutorial’s two job-posting paths, the important baseline is:

  • scrape_page with markdown output: 1 credit for a basic page scrape.
  • extract_from_page: 5 credits for structured data extraction.

See the official ScrapeGraphAI pricing page and Scrape service pricing for the latest credit table before running high-volume workflows.
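
To see how these baselines add up over a batch of postings, here is a minimal sketch. The EstimateCredits helper and the batch size of 200 are illustrative assumptions, not part of the ScrapeGraphAI packages. It also assumes one tool call per URL with no retries, and it uses only the two baseline figures quoted above, so check the pricing page for current values.

```csharp
using System;
using System.Collections.Generic;

// Baseline credits per call, copied from the two figures quoted above.
// Assumption: one tool call per URL and no retries; real pricing may differ.
var creditsPerCall = new Dictionary<string, int>
{
    ["scrape_page"] = 1,        // markdown output, basic page scrape
    ["extract_from_page"] = 5   // structured data extraction
};

// Hypothetical helper: estimates the credits for a batch of postings.
int EstimateCredits(string toolName, int urlCount)
{
    if (urlCount < 0)
        throw new ArgumentOutOfRangeException(nameof(urlCount));
    if (!creditsPerCall.TryGetValue(toolName, out var perCall))
        throw new ArgumentException($"Unknown tool: {toolName}", nameof(toolName));
    return checked(perCall * urlCount);
}

Console.WriteLine($"scrape_page x 200: {EstimateCredits("scrape_page", 200)} credits");             // 200
Console.WriteLine($"extract_from_page x 200: {EstimateCredits("extract_from_page", 200)} credits"); // 1000
```

At these baselines, structured extraction costs five times as much per posting, which is worth weighing against the extra prompt work needed to get stable fields out of markdown.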

Use extract_from_page when…

You know the fields you want: role title, company, skills, responsibilities, seniority, work mode, and unclear fields. It costs more because ScrapeGraphAI performs structured extraction, but it is the better default for repeatable business workflows.

Use scrape_page when…

You want the page content first: markdown, summaries, links, or a more exploratory brief where the agent decides what matters from the posting text. Markdown scraping is cheaper, but the model must infer the structure from the returned content.

1 Implement the Agent and Tools
🎭 Persona
🛠️ Tools

Replace the contents of Program.cs with one of these two complete programs. The first uses extract_from_page; the second uses scrape_page.

using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using OpenAI;
using OpenAI.Chat;
using ScrapeGraphAI;
using ScrapeGraphAI.AgentFramework;
using System.ClientModel;

const string DefaultJobUrl = "https://example.com/careers/software-engineer";

var runAgent = args.Contains("--run", StringComparer.OrdinalIgnoreCase);
var jobUrl = GetOption(args, "--url") ?? DefaultJobUrl;
var prompt = GetOption(args, "--prompt") ?? $"""
Analyze this job posting: {jobUrl}.
Extract the role title, company, location, work mode, seniority, required skills,
preferred skills, responsibilities, hiring signals, and candidate preparation notes.
Return a concise brief with source URL.
""";

var endpoint = GetOption(args, "--endpoint")
    ?? Environment.GetEnvironmentVariable("OPENAI_BASE_URL")
    ?? Environment.GetEnvironmentVariable("LMSTUDIO_BASE_URL")
    ?? "http://localhost:1234/v1";

var configuration = new ConfigurationBuilder()
    .AddInMemoryCollection(new Dictionary<string, string?>
    {
        ["OpenAI:ApiKey"] = Environment.GetEnvironmentVariable("OPENAI_API_KEY") ?? "lm-studio",
        ["OpenAI:Model"] = GetOption(args, "--model")
            ?? Environment.GetEnvironmentVariable("OPENAI_MODEL")
            ?? Environment.GetEnvironmentVariable("LMSTUDIO_MODEL")
            ?? "google/gemma-4-e4b",
        ["OpenAI:Endpoint"] = endpoint,
        ["ScrapeGraphAI:ApiKey"] = Environment.GetEnvironmentVariable("SGAI_API_KEY")
            ?? (runAgent ? null : "sgai-placeholder")
    })
    .Build();

var openAiApiKey = configuration["OpenAI:ApiKey"];
var model = configuration["OpenAI:Model"]!;
var openAiEndpoint = configuration["OpenAI:Endpoint"]!;
var scrapeGraphApiKey = configuration["ScrapeGraphAI:ApiKey"];

using var cancellation = new CancellationTokenSource();
Console.CancelKeyPress += (_, eventArgs) =>
{
    eventArgs.Cancel = true;
    cancellation.Cancel();
};

var services = new ServiceCollection();
services.Configure<ScrapeGraphOptions>(configuration.GetSection("ScrapeGraphAI"));
services.AddLogging(logging =>
{
    logging.AddSimpleConsole(options => options.SingleLine = true);
    logging.SetMinimumLevel(LogLevel.Warning);
});

services.AddScrapeGraphAI()
    .ConfigureHttpClient(client =>
    {
        client.Timeout = TimeSpan.FromSeconds(90);
    });

services.AddScrapeGraphAgentTools(options =>
{
    options.MaxResultCharacters = 12_000;
    options.IncludedTools =
    [
        ScrapeGraphAgentToolNames.ExtractFromPage
    ];
});

await using var provider = services.BuildServiceProvider();
var loggerFactory = provider.GetRequiredService<ILoggerFactory>();
var scrapeGraphTools = provider.GetRequiredService<ScrapeGraphAgentTools>();
var aiTools = scrapeGraphTools
    .AsAITools(
        ScrapeGraphAgentToolNames.ExtractFromPage)
    .ToArray();

Console.WriteLine("ScrapeGraphAI job page parser");
Console.WriteLine();
Console.WriteLine($"Model: {model}");
Console.WriteLine($"OpenAI-compatible endpoint: {openAiEndpoint}");
Console.WriteLine($"Job URL: {jobUrl}");
Console.WriteLine("Registered tools:");
foreach (AITool tool in aiTools)
{
    Console.WriteLine($"- {tool.Name}: {tool.Description}");
}
Console.WriteLine();

if (!runAgent)
{
    Console.WriteLine("No API calls were made. Add --run after setting SGAI_API_KEY and starting your model endpoint.");
    return 0;
}

if (string.IsNullOrWhiteSpace(openAiApiKey))
{
    Console.Error.WriteLine("Set OPENAI_API_KEY before running with --run, or use a local endpoint placeholder such as lm-studio.");
    return 2;
}

if (string.IsNullOrWhiteSpace(scrapeGraphApiKey))
{
    Console.Error.WriteLine("Set SGAI_API_KEY before running with --run.");
    return 2;
}

if (!Uri.TryCreate(openAiEndpoint, UriKind.Absolute, out var endpointUri))
{
    Console.Error.WriteLine($"Invalid OpenAI-compatible endpoint: {openAiEndpoint}");
    return 2;
}

var chatClient = new ChatClient(
    model,
    new ApiKeyCredential(openAiApiKey),
    new OpenAIClientOptions
    {
        Endpoint = endpointUri
    });

var agent = chatClient.AsAIAgent(
    name: "JobPageParser",
    instructions: """
        You are a concise hiring analyst.

        For job posting analysis:
        - Use extract_from_page before answering.
        - Prefer structured extraction for role facts from the URL.
        - Preserve uncertainty when a field is missing or unclear.
        - Do not invent company details, compensation, seniority, or work mode.
        - Return a practical brief, not a generic summary.
        - Include the source URL.
        - End with one recommended next action.
        """,
    tools: aiTools,
    clientFactory: innerClient => new FunctionInvokingChatClient(innerClient, loggerFactory, provider)
    {
        MaximumIterationsPerRequest = 10
    },
    loggerFactory: loggerFactory,
    services: provider);

Console.WriteLine("Prompt:");
Console.WriteLine(prompt);
Console.WriteLine();
Console.WriteLine("Response:");

var response = await agent.RunAsync(prompt, cancellationToken: cancellation.Token).ConfigureAwait(false);
Console.WriteLine(response.Text);

return 0;

static string? GetOption(string[] args, string name)
{
    for (var i = 0; i < args.Length; i++)
    {
        if (string.Equals(args[i], name, StringComparison.OrdinalIgnoreCase) && i + 1 < args.Length)
        {
            return args[i + 1];
        }

        var prefix = name + "=";
        if (args[i].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return args[i][prefix.Length..];
        }
    }

    return null;
}

scrape_page variant of Program.cs:

using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using OpenAI;
using OpenAI.Chat;
using ScrapeGraphAI;
using ScrapeGraphAI.AgentFramework;
using System.ClientModel;

const string DefaultJobUrl = "https://example.com/careers/software-engineer";

var runAgent = args.Contains("--run", StringComparer.OrdinalIgnoreCase);
var jobUrl = GetOption(args, "--url") ?? DefaultJobUrl;
var prompt = GetOption(args, "--prompt") ?? $"""
Analyze this job posting: {jobUrl}.
Read the page, identify the role title, company, location, work mode, seniority,
required skills, preferred skills, responsibilities, hiring signals, and candidate preparation notes.
Return a concise brief with source URL.
""";

var endpoint = GetOption(args, "--endpoint")
    ?? Environment.GetEnvironmentVariable("OPENAI_BASE_URL")
    ?? Environment.GetEnvironmentVariable("LMSTUDIO_BASE_URL")
    ?? "http://localhost:1234/v1";

var configuration = new ConfigurationBuilder()
    .AddInMemoryCollection(new Dictionary<string, string?>
    {
        ["OpenAI:ApiKey"] = Environment.GetEnvironmentVariable("OPENAI_API_KEY") ?? "lm-studio",
        ["OpenAI:Model"] = GetOption(args, "--model")
            ?? Environment.GetEnvironmentVariable("OPENAI_MODEL")
            ?? Environment.GetEnvironmentVariable("LMSTUDIO_MODEL")
            ?? "google/gemma-4-e4b",
        ["OpenAI:Endpoint"] = endpoint,
        ["ScrapeGraphAI:ApiKey"] = Environment.GetEnvironmentVariable("SGAI_API_KEY")
            ?? (runAgent ? null : "sgai-placeholder")
    })
    .Build();

var openAiApiKey = configuration["OpenAI:ApiKey"];
var model = configuration["OpenAI:Model"]!;
var openAiEndpoint = configuration["OpenAI:Endpoint"]!;
var scrapeGraphApiKey = configuration["ScrapeGraphAI:ApiKey"];

using var cancellation = new CancellationTokenSource();
Console.CancelKeyPress += (_, eventArgs) =>
{
    eventArgs.Cancel = true;
    cancellation.Cancel();
};

var services = new ServiceCollection();
services.Configure<ScrapeGraphOptions>(configuration.GetSection("ScrapeGraphAI"));
services.AddLogging(logging =>
{
    logging.AddSimpleConsole(options => options.SingleLine = true);
    logging.SetMinimumLevel(LogLevel.Warning);
});

services.AddScrapeGraphAI()
    .ConfigureHttpClient(client =>
    {
        client.Timeout = TimeSpan.FromSeconds(90);
    });

services.AddScrapeGraphAgentTools(options =>
{
    options.DefaultFormat = ScrapeFormatType.Markdown;
    options.MaxResultCharacters = 12_000;
    options.IncludedTools =
    [
        ScrapeGraphAgentToolNames.ScrapePage
    ];
});

await using var provider = services.BuildServiceProvider();
var loggerFactory = provider.GetRequiredService<ILoggerFactory>();
var scrapeGraphTools = provider.GetRequiredService<ScrapeGraphAgentTools>();
var aiTools = scrapeGraphTools
    .AsAITools(
        ScrapeGraphAgentToolNames.ScrapePage)
    .ToArray();

Console.WriteLine("ScrapeGraphAI job page parser");
Console.WriteLine();
Console.WriteLine($"Model: {model}");
Console.WriteLine($"OpenAI-compatible endpoint: {openAiEndpoint}");
Console.WriteLine($"Job URL: {jobUrl}");
Console.WriteLine("Registered tools:");
foreach (AITool tool in aiTools)
{
    Console.WriteLine($"- {tool.Name}: {tool.Description}");
}
Console.WriteLine();

if (!runAgent)
{
    Console.WriteLine("No API calls were made. Add --run after setting SGAI_API_KEY and starting your model endpoint.");
    return 0;
}

if (string.IsNullOrWhiteSpace(openAiApiKey))
{
    Console.Error.WriteLine("Set OPENAI_API_KEY before running with --run, or use a local endpoint placeholder such as lm-studio.");
    return 2;
}

if (string.IsNullOrWhiteSpace(scrapeGraphApiKey))
{
    Console.Error.WriteLine("Set SGAI_API_KEY before running with --run.");
    return 2;
}

if (!Uri.TryCreate(openAiEndpoint, UriKind.Absolute, out var endpointUri))
{
    Console.Error.WriteLine($"Invalid OpenAI-compatible endpoint: {openAiEndpoint}");
    return 2;
}

var chatClient = new ChatClient(
    model,
    new ApiKeyCredential(openAiApiKey),
    new OpenAIClientOptions
    {
        Endpoint = endpointUri
    });

var agent = chatClient.AsAIAgent(
    name: "JobPageParser",
    instructions: """
        You are a concise hiring analyst.

        For job posting analysis:
        - Use scrape_page before answering.
        - Read the returned markdown carefully before writing the brief.
        - Preserve uncertainty when a field is missing or unclear.
        - Do not invent company details, compensation, seniority, or work mode.
        - Return a practical brief, not a generic summary.
        - Include the source URL.
        - End with one recommended next action.
        """,
    tools: aiTools,
    clientFactory: innerClient => new FunctionInvokingChatClient(innerClient, loggerFactory, provider)
    {
        MaximumIterationsPerRequest = 10
    },
    loggerFactory: loggerFactory,
    services: provider);

Console.WriteLine("Prompt:");
Console.WriteLine(prompt);
Console.WriteLine();
Console.WriteLine("Response:");

var response = await agent.RunAsync(prompt, cancellationToken: cancellation.Token).ConfigureAwait(false);
Console.WriteLine(response.Text);

return 0;

static string? GetOption(string[] args, string name)
{
    for (var i = 0; i < args.Length; i++)
    {
        if (string.Equals(args[i], name, StringComparison.OrdinalIgnoreCase) && i + 1 < args.Length)
        {
            return args[i + 1];
        }

        var prefix = name + "=";
        if (args[i].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return args[i][prefix.Length..];
        }
    }

    return null;
}

2 Run a dry registration check

Run the app without --run first. This checks that dependency injection and tool registration work without calling ScrapeGraphAI or your model endpoint.

Credit-safe dry run

Running without --run only prints configuration and registered tools. It does not call ScrapeGraphAI, does not call your model endpoint, and should not consume ScrapeGraphAI credits.

dotnet run

Expected output for the extract_from_page variant:

ScrapeGraphAI job page parser

Model: <your model>
OpenAI-compatible endpoint: http://localhost:1234/v1
Job URL: https://example.com/careers/software-engineer
Registered tools:
- extract_from_page: Extract structured JSON from a URL, raw HTML, or markdown using a natural-language prompt.

No API calls were made. Add --run after setting SGAI_API_KEY and starting your model endpoint.

Expected output for the scrape_page variant:

ScrapeGraphAI job page parser

Model: <your model>
OpenAI-compatible endpoint: http://localhost:1234/v1
Job URL: https://example.com/careers/software-engineer
Registered tools:
- scrape_page: Fetch a URL and return content in markdown, HTML, links, images, summary, JSON, branding, or screenshot format.

No API calls were made. Add --run after setting SGAI_API_KEY and starting your model endpoint.

3 Run the agent

Once your model endpoint is running and SGAI_API_KEY is set, pass a real public job posting URL:

ScrapeGraphAI credit usage

Running with --run can consume ScrapeGraphAI credits because the selected tool fetches or extracts from the target page. In the default tutorial paths, markdown scrape_page starts at 1 credit, while extract_from_page starts at 5 credits. Use a small number of test URLs while developing, and avoid repeatedly running against the same page unless you are intentionally testing the live workflow.

dotnet run --run --url "https://example.com/careers/software-engineer"

The answer will vary because the agent is working from live page content, but it should identify role facts, list missing fields, and provide a concrete next step.
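
The programs above validate the model endpoint with Uri.TryCreate but pass the --url value to the agent as-is. Checking the job URL the same way before a credit-consuming run is one possible addition. A minimal sketch, where the IsHttpUrl helper is a hypothetical name and not part of either package:

```csharp
#nullable enable
using System;

// Hypothetical helper: accepts only absolute http or https URLs.
static bool IsHttpUrl(string? value) =>
    Uri.TryCreate(value, UriKind.Absolute, out var uri)
    && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);

string[] samples =
[
    "https://example.com/careers/software-engineer",
    "careers/software-engineer",   // relative, rejected
    "ftp://example.com/job"        // wrong scheme, rejected
];

foreach (var url in samples)
{
    Console.WriteLine($"{url} -> {IsHttpUrl(url)}");
}
```

In Program.cs, a check like this could sit next to the existing endpoint validation and return exit code 2 on failure, matching the other configuration errors.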

Try it

Run these prompts with either Program.cs variant. Prefer ExtractFromPage for repeatable field extraction, and ScrapePage when the agent needs more room to interpret the posting text.

dotnet run --run --url "https://example.com/careers/software-engineer" --prompt "Analyze this job posting for a candidate. Summarize must-have skills, likely interview topics, gaps to prepare for, and a 3-step study plan. Include the source URL."

Works with either approach. Use ScrapePage when you want a richer narrative brief; use ExtractFromPage when you want the prep plan grounded in specific extracted fields.

dotnet run --run --url "https://example.com/careers/software-engineer" --prompt "Analyze this job posting for recruiter intake. Extract role title, team, location, seniority, must-have skills, nice-to-have skills, screening questions, and unclear fields. Include the source URL."

Best with ExtractFromPage because the output depends on stable fields a recruiter can reuse for sourcing, screening, or handoff notes.

dotnet run --run --url "https://example.com/careers/software-engineer" --prompt "Analyze this job posting and create a skills gap checklist for a candidate with C# and Azure experience but limited frontend experience. Include required skills, matching strengths, gaps, and next learning steps."

Often better with ScrapePage because the agent can reason across the full posting text, but ExtractFromPage is useful when you want the gap analysis tied to explicit required and preferred skills.

dotnet run --run --url "https://example.com/careers/software-engineer" --prompt "Analyze this job posting and generate 8 interview questions mapped to the role's required skills and responsibilities. Include the source URL."

Often better with ScrapePage because interview questions benefit from the posting’s full context, tone, and responsibilities.

Guardrails for job-page agents

Job postings often omit important details. The agent should treat missing information as missing, not as an invitation to guess.

Do not invent hiring facts

This tutorial intentionally registers one ScrapeGraphAI tool at a time. Whether you choose extract_from_page or scrape_page, the agent should not infer compensation, remote policy, sponsorship, seniority, or interview process unless the page provides evidence. Missing fields should be called out explicitly.

The most important prompt rules are:

  • Use the selected ScrapeGraphAI tool before answering.
  • Preserve the source URL.
  • Separate required skills from preferred skills.
  • Mark missing or ambiguous fields.
  • Avoid guessing compensation, work mode, or seniority.
  • Keep the final brief short enough to use in a hiring workflow.
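
One way to make the "mark missing or ambiguous fields" rule checkable in code is to keep every field nullable and report the empty ones instead of filling them in. A minimal sketch, assuming a hypothetical field dictionary; the field names and sample values are illustrative and are not output from either tool:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical brief: field name -> value, or null when the posting does not say.
var brief = new Dictionary<string, string?>
{
    ["sourceUrl"] = "https://example.com/careers/software-engineer",
    ["roleTitle"] = "Software Engineer",
    ["seniority"] = null,       // not stated on the page, so left empty
    ["workMode"] = null,
    ["compensation"] = "   "    // whitespace counts as missing too
};

// Collect the fields that should be reported as missing instead of guessed.
static IReadOnlyList<string> MissingFields(IReadOnlyDictionary<string, string?> fields) =>
    fields.Where(pair => string.IsNullOrWhiteSpace(pair.Value))
          .Select(pair => pair.Key)
          .OrderBy(name => name, StringComparer.Ordinal)
          .ToList();

var missing = MissingFields(brief);
Console.WriteLine($"Missing: {string.Join(", ", missing)}");
// prints "Missing: compensation, seniority, workMode"

// System.Text.Json writes null values by default, so the gaps stay visible downstream.
Console.WriteLine(JsonSerializer.Serialize(brief));
```

Keeping nulls in the serialized form lets a reviewer or a downstream system tell "not stated" apart from "not extracted yet".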

What to build next

This tutorial gives you a useful first job-page workflow. From here, you can grow the agent in four directions:

Structured Output

Add a JSON schema so the brief can feed an ATS, CRM, or spreadsheet.

Multi-page Career Research

Add crawl tools to inspect multiple postings from the same careers site.

Posting Monitor

Add monitor tools to detect when a role changes or disappears.

Candidate Matcher

Compare extracted role requirements with a candidate profile or resume summary.
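
As a starting point for the Candidate Matcher direction, the comparison can be a plain set operation once the required skills are available as a list. A minimal sketch, assuming hypothetical skill lists; a real matcher would likely also need to normalize synonyms such as "JS" and "JavaScript", which this does not attempt:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical inputs: skills from a role brief and from a candidate profile.
string[] requiredSkills = ["C#", "Azure", "React", "SQL"];
string[] candidateSkills = ["c#", "azure", "Docker"];

// Compare case-insensitively so "c#" matches "C#".
var candidate = new HashSet<string>(candidateSkills, StringComparer.OrdinalIgnoreCase);

var strengths = requiredSkills.Where(skill => candidate.Contains(skill)).ToList();
var gaps = requiredSkills.Where(skill => !candidate.Contains(skill)).ToList();

Console.WriteLine($"Strengths: {string.Join(", ", strengths)}"); // Strengths: C#, Azure
Console.WriteLine($"Gaps: {string.Join(", ", gaps)}");           // Gaps: React, SQL
```

Only required skills are compared here. Preferred skills could be handled the same way and reported separately, in line with the guardrail about keeping the two apart.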

Summary

You built a focused job-page parser that turns a public posting into a role brief. The key move was comparing two narrow tool paths before adding more capability:

job URL -> extract_from_page -> hiring brief
job URL -> scrape_page -> hiring brief

That flow is simple enough to debug, useful enough for recruiters or candidates, and strong enough to become the base for structured output, monitoring, and candidate matching.