Parse Job Postings into Hiring Briefs with ScrapeGraphAI
Build a .NET agent that reads public job postings and turns them into structured role briefs with requirements, signals, and next steps.
Overview
Hiring pages are packed with useful information, but they are rarely shaped for quick decisions. A recruiter wants intake notes. A candidate wants preparation guidance. A hiring manager wants a quick check on role scope, seniority, and required skills. Everyone is looking at the same public job posting, but each person needs it turned into a practical brief.
In this tutorial, you will build a Job Page Parser with Microsoft Agent Framework and ScrapeGraphAI.AgentFramework. The agent takes a public job posting URL, reads or extracts the important role facts, and returns a concise hiring brief with a source URL.
The workflow is intentionally simple:
- Provide a public job posting URL.
- Choose direct extraction or page reading.
- Preserve missing or uncertain fields.
- Return a brief a person can act on.
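The third rule, preserving missing or uncertain fields, is easiest to see as data. This is a minimal sketch and is not part of ScrapeGraphAI or Agent Framework. It assumes the brief's fields are held in a simple map, and the field names and the FindMissing helper are hypothetical. Blank values are reported as missing instead of being filled in with a guess.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical brief fields; null or blank means the posting did not state it.
var brief = new Dictionary<string, string?>
{
    ["role_title"] = "Software Engineer",
    ["company"] = "Example Co",
    ["location"] = null,
    ["work_mode"] = null,
    ["seniority"] = " ",
    ["source_url"] = "https://example.com/careers/software-engineer",
};

foreach (var field in FindMissing(brief))
{
    Console.WriteLine($"Not stated in the posting: {field}");
}

// Hypothetical helper: returns the fields that are null or blank, sorted by
// name, so the brief can report them instead of filling them with guesses.
static List<string> FindMissing(IReadOnlyDictionary<string, string?> fields) =>
    fields
        .Where(pair => string.IsNullOrWhiteSpace(pair.Value))
        .Select(pair => pair.Key)
        .OrderBy(key => key, StringComparer.Ordinal)
        .ToList();
```

For these sample values it lists location, seniority, and work_mode. Blank strings are treated the same as null, so whitespace-only values are not mistaken for real answers.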
Agent Anatomy
An extraction agent still needs a persona and a brain, but the most important capability is the bridge from a messy public page to structured role facts.
- Persona: A concise hiring analyst.
- Brain: Reasoning about role fit.
- Tools: extract_from_page pulls role facts; scrape_page returns page text.
- Output: A role summary and next steps.
From Page Content to Business Action
A job posting is not just text. It contains requirements, seniority signals, team priorities, and hidden hints about the hiring process. extract_from_page is best when you need fields. scrape_page is best when you want the agent to read the page and write a more flexible brief.
Setup your environment
Create a console app and install the packages that connect Agent Framework, OpenAI-compatible chat, and ScrapeGraphAI tools.
📋 Pre-flight Checklist
- 🛠️ .NET 10.0 SDK (or later) installed.
- 🤖 AI Provider: An OpenAI-compatible endpoint such as LM Studio, Ollama, OpenAI, or a compatible hosted service.
- 🔑 ScrapeGraphAI: A valid SGAI_API_KEY for web scraping and extraction.
- 🌐 Job URL: A public job posting page you are allowed to access.
1 Create the project
Open your terminal and create a new console application:
dotnet new console -n ScrapeGraphAI.JobPageParser -f net10.0
cd ScrapeGraphAI.JobPageParser
2 Install packages
Install the ScrapeGraphAI Agent Framework integration, the OpenAI-compatible Agent Framework bridge, and the supporting configuration packages.
dotnet add package ScrapeGraphAI.AgentFramework --version 1.0.0
dotnet add package Microsoft.Agents.AI.OpenAI
dotnet add package Microsoft.Extensions.Configuration
dotnet add package Microsoft.Extensions.DependencyInjection
dotnet add package Microsoft.Extensions.Logging.Console
dotnet add package OpenAI
dotnet restore
- ScrapeGraphAI.AgentFramework: Registers ScrapeGraphAI web, scrape, extract, crawl, monitor, history, health, and credit tools as Agent Framework AITool instances.
- Microsoft.Agents.AI.OpenAI: Provides the Agent Framework OpenAI integration and the AsAIAgent extension used to build the agent.
- OpenAI: Creates an OpenAI-compatible chat client. This works with local endpoints such as LM Studio as well as hosted compatible services.
3 Configure credentials
Before setting environment variables, get your ScrapeGraphAI API key from the dashboard.
Get your ScrapeGraphAI API key
- Sign in to the ScrapeGraphAI dashboard.
- Open Settings.
- Copy the API key from the API Key section.
- Store it in an environment variable named SGAI_API_KEY.
The official docs describe the same flow in Managing your API keys. Keep the key server-side, never commit it to source control, and rotate it from Settings if it is exposed.
Set the ScrapeGraphAI key and your OpenAI-compatible chat endpoint.
PowerShell:
$env:SGAI_API_KEY = "<your-scrapegraphai-api-key>"
$env:OPENAI_BASE_URL = "http://localhost:1234/v1"
$env:OPENAI_MODEL = "<your-tool-capable-model>"
$env:OPENAI_API_KEY = "lm-studio"
Bash or zsh:
export SGAI_API_KEY="<your-scrapegraphai-api-key>"
export OPENAI_BASE_URL="http://localhost:1234/v1"
export OPENAI_MODEL="<your-tool-capable-model>"
export OPENAI_API_KEY="lm-studio"
For LM Studio, the API key can be a placeholder. For a hosted OpenAI-compatible endpoint, use the real API key required by that provider.
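Program.cs resolves the endpoint through a fallback chain: the --endpoint option, then OPENAI_BASE_URL, then LMSTUDIO_BASE_URL, then http://localhost:1234/v1. The model name follows the same pattern with --model, OPENAI_MODEL, and LMSTUDIO_MODEL. This minimal sketch shows that precedence using a hypothetical ResolveEndpoint helper. An in-memory dictionary stands in for the real environment so the result is predictable. It is simpler than GetOption in Program.cs and handles only the space-separated, case-sensitive --endpoint form.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the process environment, so the example does not
// depend on what is actually set on your machine.
var environment = new Dictionary<string, string>
{
    ["OPENAI_BASE_URL"] = "https://api.openai.com/v1",
    ["LMSTUDIO_BASE_URL"] = "http://localhost:1234/v1",
};

Console.WriteLine(ResolveEndpoint(["--run"], environment));

// Same precedence as the ?? chain in Program.cs: --endpoint, then
// OPENAI_BASE_URL, then LMSTUDIO_BASE_URL, then the LM Studio default.
// Simplified: only the case-sensitive "--endpoint <value>" form is handled.
static string ResolveEndpoint(string[] args, IReadOnlyDictionary<string, string> env)
{
    var index = Array.IndexOf(args, "--endpoint");
    string? fromArgs = index >= 0 && index + 1 < args.Length ? args[index + 1] : null;

    return fromArgs
        ?? env.GetValueOrDefault("OPENAI_BASE_URL")
        ?? env.GetValueOrDefault("LMSTUDIO_BASE_URL")
        ?? "http://localhost:1234/v1";
}
```

With both variables set, OPENAI_BASE_URL wins, so this prints https://api.openai.com/v1. If both are exported in your shell and you meant to use LM Studio, unset OPENAI_BASE_URL or pass --endpoint explicitly.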
Build the job parser
The agent registers one ScrapeGraphAI tool at a time. Pick the behavior you want to teach: direct extraction with extract_from_page, or page reading with scrape_page.
sequenceDiagram
participant User
participant Agent as Job Parser Agent
participant Extract as extract_from_page
participant Scrape as scrape_page
participant Brief as Hiring Brief
User->>Agent: Analyze a public job posting URL
alt Direct structured extraction
Agent->>Extract: Extract role facts from the URL
Extract-->>Agent: Structured job details
else Page reading
Agent->>Scrape: Fetch the posting as markdown
Scrape-->>Agent: Page content
end
Agent->>Brief: Create concise brief with missing fields
Brief-->>User: Role summary, skills, signals, and next steps
Choose the tool
ScrapeGraphAI charges different credit amounts depending on the service and format you call. For this tutorial’s two job-posting paths, the important baseline is:
- scrape_page with markdown output: 1 credit for a basic page scrape.
- extract_from_page: 5 credits for structured data extraction.
See the official ScrapeGraphAI pricing page and Scrape service pricing for the latest credit table before running high-volume workflows.
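Because the cost is charged per call, it helps to estimate credits before a batch of test runs. This is a minimal sketch. It assumes the baseline costs above: 1 credit for a markdown scrape_page call and 5 credits for an extract_from_page call. It also uses a hypothetical EstimateCredits helper. Real usage can be higher if the model calls the tool more than once in a single run.

```csharp
using System;

// Baseline credit costs quoted in this tutorial; confirm them against the
// current ScrapeGraphAI pricing page before relying on them.
const int ScrapeMarkdownCredits = 1;
const int ExtractCredits = 5;

var urls = 20;        // postings you plan to test
var runsPerUrl = 3;   // how often you rerun each posting while developing
var callsPerRun = 1;  // tool calls per agent run; the model may make more

Console.WriteLine($"scrape_page (markdown): {EstimateCredits(urls, runsPerUrl, callsPerRun, ScrapeMarkdownCredits)} credits");
Console.WriteLine($"extract_from_page: {EstimateCredits(urls, runsPerUrl, callsPerRun, ExtractCredits)} credits");

// Hypothetical helper: total = URLs x runs x tool calls per run x credits per call.
// checked(...) throws OverflowException instead of silently wrapping around.
static int EstimateCredits(int urlCount, int runsPerUrl, int callsPerRun, int creditsPerCall)
    => checked(urlCount * runsPerUrl * callsPerRun * creditsPerCall);
```

With 20 postings rerun 3 times each, this prints 60 credits for scrape_page and 300 credits for extract_from_page.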
Use extract_from_page when you know the fields you want: role title, company, skills, responsibilities, seniority, work mode, and unclear fields. It costs more because ScrapeGraphAI performs structured extraction, but it is the better default for repeatable business workflows.
Use scrape_page when you want the page content first: markdown, summaries, links, or a more exploratory brief where the agent decides what matters from the posting text. Markdown scraping is cheaper, but the model must infer the structure from the returned content.
1 Implement the Agent and Tools
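Replace the contents of Program.cs with one of these two complete programs. They differ in the registered tool and its options, the default prompt, and the agent instructions.
Program.cs (extract_from_page version):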
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using OpenAI;
using OpenAI.Chat;
using ScrapeGraphAI;
using ScrapeGraphAI.AgentFramework;
using System.ClientModel;
const string DefaultJobUrl = "https://example.com/careers/software-engineer";
var runAgent = args.Contains("--run", StringComparer.OrdinalIgnoreCase);
var jobUrl = GetOption(args, "--url") ?? DefaultJobUrl;
var prompt = GetOption(args, "--prompt") ?? $"""
Analyze this job posting: {jobUrl}.
Extract the role title, company, location, work mode, seniority, required skills,
preferred skills, responsibilities, hiring signals, and candidate preparation notes.
Return a concise brief with source URL.
""";
var endpoint = GetOption(args, "--endpoint")
?? Environment.GetEnvironmentVariable("OPENAI_BASE_URL")
?? Environment.GetEnvironmentVariable("LMSTUDIO_BASE_URL")
?? "http://localhost:1234/v1";
var configuration = new ConfigurationBuilder()
.AddInMemoryCollection(new Dictionary<string, string?>
{
["OpenAI:ApiKey"] = Environment.GetEnvironmentVariable("OPENAI_API_KEY") ?? "lm-studio",
["OpenAI:Model"] = GetOption(args, "--model")
?? Environment.GetEnvironmentVariable("OPENAI_MODEL")
?? Environment.GetEnvironmentVariable("LMSTUDIO_MODEL")
?? "google/gemma-4-e4b",
["OpenAI:Endpoint"] = endpoint,
["ScrapeGraphAI:ApiKey"] = Environment.GetEnvironmentVariable("SGAI_API_KEY")
?? (runAgent ? null : "sgai-placeholder")
})
.Build();
var openAiApiKey = configuration["OpenAI:ApiKey"];
var model = configuration["OpenAI:Model"]!;
var openAiEndpoint = configuration["OpenAI:Endpoint"]!;
var scrapeGraphApiKey = configuration["ScrapeGraphAI:ApiKey"];
using var cancellation = new CancellationTokenSource();
Console.CancelKeyPress += (_, eventArgs) =>
{
eventArgs.Cancel = true;
cancellation.Cancel();
};
var services = new ServiceCollection();
services.Configure<ScrapeGraphOptions>(configuration.GetSection("ScrapeGraphAI"));
services.AddLogging(logging =>
{
logging.AddSimpleConsole(options => options.SingleLine = true);
logging.SetMinimumLevel(LogLevel.Warning);
});
services.AddScrapeGraphAI()
.ConfigureHttpClient(client =>
{
client.Timeout = TimeSpan.FromSeconds(90);
});
// Register only the extract_from_page tool from the ScrapeGraphAI tool set.
services.AddScrapeGraphAgentTools(options =>
{
options.MaxResultCharacters = 12_000;
options.IncludedTools =
[
ScrapeGraphAgentToolNames.ExtractFromPage
];
});
await using var provider = services.BuildServiceProvider();
var loggerFactory = provider.GetRequiredService<ILoggerFactory>();
var scrapeGraphTools = provider.GetRequiredService<ScrapeGraphAgentTools>();
var aiTools = scrapeGraphTools
.AsAITools(
ScrapeGraphAgentToolNames.ExtractFromPage)
.ToArray();
Console.WriteLine("ScrapeGraphAI job page parser");
Console.WriteLine();
Console.WriteLine($"Model: {model}");
Console.WriteLine($"OpenAI-compatible endpoint: {openAiEndpoint}");
Console.WriteLine($"Job URL: {jobUrl}");
Console.WriteLine("Registered tools:");
foreach (AITool tool in aiTools)
{
Console.WriteLine($"- {tool.Name}: {tool.Description}");
}
Console.WriteLine();
if (!runAgent)
{
Console.WriteLine("No API calls were made. Add --run after setting SGAI_API_KEY and starting your model endpoint.");
return 0;
}
if (string.IsNullOrWhiteSpace(openAiApiKey))
{
Console.Error.WriteLine("Set OPENAI_API_KEY before running with --run, or use a local endpoint placeholder such as lm-studio.");
return 2;
}
if (string.IsNullOrWhiteSpace(scrapeGraphApiKey))
{
Console.Error.WriteLine("Set SGAI_API_KEY before running with --run.");
return 2;
}
if (!Uri.TryCreate(openAiEndpoint, UriKind.Absolute, out var endpointUri))
{
Console.Error.WriteLine($"Invalid OpenAI-compatible endpoint: {openAiEndpoint}");
return 2;
}
var chatClient = new ChatClient(
model,
new ApiKeyCredential(openAiApiKey),
new OpenAIClientOptions
{
Endpoint = endpointUri
});
var agent = chatClient.AsAIAgent(
name: "JobPageParser",
instructions: """
You are a concise hiring analyst.
For job posting analysis:
- Use extract_from_page before answering.
- Prefer structured extraction for role facts from the URL.
- Preserve uncertainty when a field is missing or unclear.
- Do not invent company details, compensation, seniority, or work mode.
- Return a practical brief, not a generic summary.
- Include the source URL.
- End with one recommended next action.
""",
tools: aiTools,
clientFactory: innerClient => new FunctionInvokingChatClient(innerClient, loggerFactory, provider)
{
MaximumIterationsPerRequest = 10
},
loggerFactory: loggerFactory,
services: provider);
Console.WriteLine("Prompt:");
Console.WriteLine(prompt);
Console.WriteLine();
Console.WriteLine("Response:");
var response = await agent.RunAsync(prompt, cancellationToken: cancellation.Token).ConfigureAwait(false);
Console.WriteLine(response.Text);
return 0;
static string? GetOption(string[] args, string name)
{
for (var i = 0; i < args.Length; i++)
{
if (string.Equals(args[i], name, StringComparison.OrdinalIgnoreCase) && i + 1 < args.Length)
{
return args[i + 1];
}
var prefix = name + "=";
if (args[i].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
{
return args[i][prefix.Length..];
}
}
return null;
}
Program.cs (scrape_page version):
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using OpenAI;
using OpenAI.Chat;
using ScrapeGraphAI;
using ScrapeGraphAI.AgentFramework;
using System.ClientModel;
const string DefaultJobUrl = "https://example.com/careers/software-engineer";
var runAgent = args.Contains("--run", StringComparer.OrdinalIgnoreCase);
var jobUrl = GetOption(args, "--url") ?? DefaultJobUrl;
var prompt = GetOption(args, "--prompt") ?? $"""
Analyze this job posting: {jobUrl}.
Read the page, identify the role title, company, location, work mode, seniority,
required skills, preferred skills, responsibilities, hiring signals, and candidate preparation notes.
Return a concise brief with source URL.
""";
var endpoint = GetOption(args, "--endpoint")
?? Environment.GetEnvironmentVariable("OPENAI_BASE_URL")
?? Environment.GetEnvironmentVariable("LMSTUDIO_BASE_URL")
?? "http://localhost:1234/v1";
var configuration = new ConfigurationBuilder()
.AddInMemoryCollection(new Dictionary<string, string?>
{
["OpenAI:ApiKey"] = Environment.GetEnvironmentVariable("OPENAI_API_KEY") ?? "lm-studio",
["OpenAI:Model"] = GetOption(args, "--model")
?? Environment.GetEnvironmentVariable("OPENAI_MODEL")
?? Environment.GetEnvironmentVariable("LMSTUDIO_MODEL")
?? "google/gemma-4-e4b",
["OpenAI:Endpoint"] = endpoint,
["ScrapeGraphAI:ApiKey"] = Environment.GetEnvironmentVariable("SGAI_API_KEY")
?? (runAgent ? null : "sgai-placeholder")
})
.Build();
var openAiApiKey = configuration["OpenAI:ApiKey"];
var model = configuration["OpenAI:Model"]!;
var openAiEndpoint = configuration["OpenAI:Endpoint"]!;
var scrapeGraphApiKey = configuration["ScrapeGraphAI:ApiKey"];
using var cancellation = new CancellationTokenSource();
Console.CancelKeyPress += (_, eventArgs) =>
{
eventArgs.Cancel = true;
cancellation.Cancel();
};
var services = new ServiceCollection();
services.Configure<ScrapeGraphOptions>(configuration.GetSection("ScrapeGraphAI"));
services.AddLogging(logging =>
{
logging.AddSimpleConsole(options => options.SingleLine = true);
logging.SetMinimumLevel(LogLevel.Warning);
});
services.AddScrapeGraphAI()
.ConfigureHttpClient(client =>
{
client.Timeout = TimeSpan.FromSeconds(90);
});
// Register only the scrape_page tool and default its output format to markdown.
services.AddScrapeGraphAgentTools(options =>
{
options.DefaultFormat = ScrapeFormatType.Markdown;
options.MaxResultCharacters = 12_000;
options.IncludedTools =
[
ScrapeGraphAgentToolNames.ScrapePage
];
});
await using var provider = services.BuildServiceProvider();
var loggerFactory = provider.GetRequiredService<ILoggerFactory>();
var scrapeGraphTools = provider.GetRequiredService<ScrapeGraphAgentTools>();
var aiTools = scrapeGraphTools
.AsAITools(
ScrapeGraphAgentToolNames.ScrapePage)
.ToArray();
Console.WriteLine("ScrapeGraphAI job page parser");
Console.WriteLine();
Console.WriteLine($"Model: {model}");
Console.WriteLine($"OpenAI-compatible endpoint: {openAiEndpoint}");
Console.WriteLine($"Job URL: {jobUrl}");
Console.WriteLine("Registered tools:");
foreach (AITool tool in aiTools)
{
Console.WriteLine($"- {tool.Name}: {tool.Description}");
}
Console.WriteLine();
if (!runAgent)
{
Console.WriteLine("No API calls were made. Add --run after setting SGAI_API_KEY and starting your model endpoint.");
return 0;
}
if (string.IsNullOrWhiteSpace(openAiApiKey))
{
Console.Error.WriteLine("Set OPENAI_API_KEY before running with --run, or use a local endpoint placeholder such as lm-studio.");
return 2;
}
if (string.IsNullOrWhiteSpace(scrapeGraphApiKey))
{
Console.Error.WriteLine("Set SGAI_API_KEY before running with --run.");
return 2;
}
if (!Uri.TryCreate(openAiEndpoint, UriKind.Absolute, out var endpointUri))
{
Console.Error.WriteLine($"Invalid OpenAI-compatible endpoint: {openAiEndpoint}");
return 2;
}
var chatClient = new ChatClient(
model,
new ApiKeyCredential(openAiApiKey),
new OpenAIClientOptions
{
Endpoint = endpointUri
});
var agent = chatClient.AsAIAgent(
name: "JobPageParser",
instructions: """
You are a concise hiring analyst.
For job posting analysis:
- Use scrape_page before answering.
- Read the returned markdown carefully before writing the brief.
- Preserve uncertainty when a field is missing or unclear.
- Do not invent company details, compensation, seniority, or work mode.
- Return a practical brief, not a generic summary.
- Include the source URL.
- End with one recommended next action.
""",
tools: aiTools,
clientFactory: innerClient => new FunctionInvokingChatClient(innerClient, loggerFactory, provider)
{
MaximumIterationsPerRequest = 10
},
loggerFactory: loggerFactory,
services: provider);
Console.WriteLine("Prompt:");
Console.WriteLine(prompt);
Console.WriteLine();
Console.WriteLine("Response:");
var response = await agent.RunAsync(prompt, cancellationToken: cancellation.Token).ConfigureAwait(false);
Console.WriteLine(response.Text);
return 0;
static string? GetOption(string[] args, string name)
{
for (var i = 0; i < args.Length; i++)
{
if (string.Equals(args[i], name, StringComparison.OrdinalIgnoreCase) && i + 1 < args.Length)
{
return args[i + 1];
}
var prefix = name + "=";
if (args[i].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
{
return args[i][prefix.Length..];
}
}
return null;
}
2 Run a dry registration check
Run the app without --run first. This checks that dependency injection and tool registration work without calling ScrapeGraphAI or your model endpoint.
Credit-safe dry run
Running without --run only prints configuration and registered tools. It does not call ScrapeGraphAI, does not call your model endpoint, and should not consume ScrapeGraphAI credits.
dotnet run
Expected output with the extract_from_page version:
ScrapeGraphAI job page parser
Model: <your model>
OpenAI-compatible endpoint: http://localhost:1234/v1
Job URL: https://example.com/careers/software-engineer
Registered tools:
- extract_from_page: Extract structured JSON from a URL, raw HTML, or markdown using a natural-language prompt.
No API calls were made. Add --run after setting SGAI_API_KEY and starting your model endpoint.
Expected output with the scrape_page version:
ScrapeGraphAI job page parser
Model: <your model>
OpenAI-compatible endpoint: http://localhost:1234/v1
Job URL: https://example.com/careers/software-engineer
Registered tools:
- scrape_page: Fetch a URL and return content in markdown, HTML, links, images, summary, JSON, branding, or screenshot format.
No API calls were made. Add --run after setting SGAI_API_KEY and starting your model endpoint.
3 Run the agent
Once your model endpoint is running and SGAI_API_KEY is set, pass a real public job posting URL:
ScrapeGraphAI credit usage
Running with --run can consume ScrapeGraphAI credits because the selected tool fetches or extracts from the target page. In the default tutorial paths, markdown scrape_page starts at 1 credit, while extract_from_page starts at 5 credits. Use a small number of test URLs while developing, and avoid repeatedly running against the same page unless you are intentionally testing the live workflow.
dotnet run --run --url "https://example.com/careers/software-engineer"
The answer will vary because the agent is working from live page content, but it should identify role facts, list missing fields, and provide a concrete next step.
Try it
Run these prompts with either Program.cs version. Prefer ExtractFromPage for repeatable field extraction, and ScrapePage when the agent needs more room to interpret the posting text.
dotnet run --run --url "https://example.com/careers/software-engineer" --prompt "Analyze this job posting for a candidate. Summarize must-have skills, likely interview topics, gaps to prepare for, and a 3-step study plan. Include the source URL."
Works with either approach. Use ScrapePage when you want a richer narrative brief; use ExtractFromPage when you want the prep plan grounded in specific extracted fields.
dotnet run --run --url "https://example.com/careers/software-engineer" --prompt "Analyze this job posting for recruiter intake. Extract role title, team, location, seniority, must-have skills, nice-to-have skills, screening questions, and unclear fields. Include the source URL."
Best with ExtractFromPage because the output depends on stable fields a recruiter can reuse for sourcing, screening, or handoff notes.
dotnet run --run --url "https://example.com/careers/software-engineer" --prompt "Analyze this job posting and create a skills gap checklist for a candidate with C# and Azure experience but limited frontend experience. Include required skills, matching strengths, gaps, and next learning steps."
Often better with ScrapePage because the agent can reason across the full posting text, but ExtractFromPage is useful when you want the gap analysis tied to explicit required and preferred skills.
dotnet run --run --url "https://example.com/careers/software-engineer" --prompt "Analyze this job posting and generate 8 interview questions mapped to the role's required skills and responsibilities. Include the source URL."
Often better with ScrapePage because interview questions benefit from the posting's full context, tone, and responsibilities.
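The skills-gap prompt asks the model to compare the posting with a candidate profile in prose. If you already have the required skills as a list, for example from extract_from_page, a deterministic comparison is a useful cross-check on the model's answer. This is a minimal sketch with made-up skill lists and a hypothetical CompareSkills helper. It matches exact skill names only, ignoring case and surrounding spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical inputs: skills pulled from a posting and a candidate profile.
string[] requiredSkills = ["C#", "Azure", "React", "SQL"];
string[] candidateSkills = ["c#", "azure", "Docker"];

var (strengths, gaps) = CompareSkills(requiredSkills, candidateSkills);
Console.WriteLine($"Matching strengths: {string.Join(", ", strengths)}");
Console.WriteLine($"Gaps to prepare for: {string.Join(", ", gaps)}");

// Hypothetical helper: splits required skills into those the candidate lists
// (case-insensitive, trimmed) and those they do not. Exact names only, so
// "React" and "React.js" count as different skills.
static (List<string> Strengths, List<string> Gaps) CompareSkills(
    IEnumerable<string> required, IEnumerable<string> candidate)
{
    var have = new HashSet<string>(candidate.Select(s => s.Trim()), StringComparer.OrdinalIgnoreCase);
    var matches = new List<string>();
    var missing = new List<string>();
    foreach (var skill in required)
    {
        (have.Contains(skill.Trim()) ? matches : missing).Add(skill);
    }
    return (matches, missing);
}
```

For these inputs it reports C# and Azure as matching strengths, and React and SQL as gaps. Synonyms are not merged, so treat the result as a starting checklist rather than a verdict.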
Guardrails for job-page agents
Job postings often omit important details. The agent should treat missing information as missing, not as an invitation to guess.
Do not invent hiring facts
This tutorial intentionally registers one ScrapeGraphAI tool at a time. Whether you choose extract_from_page or scrape_page, the agent should not infer compensation, remote policy, sponsorship, seniority, or interview process unless the page provides evidence. Missing fields should be called out explicitly.
The most important prompt rules are:
- Use the selected ScrapeGraphAI tool before answering.
- Preserve the source URL.
- Separate required skills from preferred skills.
- Mark missing or ambiguous fields.
- Avoid guessing compensation, work mode, or seniority.
- Keep the final brief short enough to use in a hiring workflow.
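Prompt rules are instructions, not guarantees. A cheap post-check can catch obvious violations before a brief reaches a hiring workflow. This minimal sketch assumes you have both the agent's final text and the page text available. The hypothetical CheckBrief helper flags two problems: a missing source URL, and any dollar amount in the brief that does not appear verbatim on the page. It is a heuristic only. It does not catch invented seniority or work mode, and it only recognizes dollar figures.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical inputs: the page text and the agent's final brief.
var pageText = "Senior Software Engineer. Required: C#, Azure. Compensation not listed.";
var brief = "Role: Senior Software Engineer. Salary: $150,000. Next action: screen for Azure depth.";
var sourceUrl = "https://example.com/careers/software-engineer";

foreach (var issue in CheckBrief(brief, sourceUrl, pageText))
{
    Console.WriteLine($"Guardrail: {issue}");
}

// Hypothetical post-check for two of the prompt rules. It catches obvious
// misses; it does not prove that the brief is grounded in the page.
static List<string> CheckBrief(string brief, string sourceUrl, string pageText)
{
    var problems = new List<string>();
    if (!brief.Contains(sourceUrl, StringComparison.OrdinalIgnoreCase))
    {
        problems.Add("source URL is missing from the brief");
    }
    // Every dollar amount in the brief (e.g. $150,000 or $120k) must also
    // appear in the page text; otherwise it may be invented compensation.
    foreach (Match amount in Regex.Matches(brief, @"\$\s?\d+(?:,\d{3})*(?:\.\d+)?[kK]?"))
    {
        if (!pageText.Contains(amount.Value, StringComparison.Ordinal))
        {
            problems.Add($"compensation figure {amount.Value} is not in the page text");
        }
    }
    return problems;
}
```

For the sample inputs it reports two issues: the source URL is missing, and $150,000 does not appear in the page text.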
What to build next
This tutorial gives you a useful first job-page workflow. From here, you can grow the agent in four directions:
- Add a JSON schema so the brief can feed an ATS, CRM, or spreadsheet.
- Add crawl tools to inspect multiple postings from the same careers site.
- Add monitor tools to detect when a role changes or disappears.
- Compare extracted role requirements with a candidate profile or resume summary.
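For the first idea, the brief needs a stable machine-readable shape. This is a minimal sketch using System.Text.Json with a hypothetical payload. The field names are illustrative and are not a ScrapeGraphAI or ATS schema. Missing values are written as explicit nulls and listed in missing_fields, so a downstream system sees that a field was not stated rather than receiving a guessed value.

```csharp
using System;
using System.Text.Json;

// Hypothetical ATS payload; field names are illustrative only.
// Values the posting did not state stay null rather than being guessed.
var payload = new
{
    role_title = "Software Engineer",
    company = "Example Co",
    work_mode = (string?)null,
    required_skills = new[] { "C#", "Azure" },
    preferred_skills = Array.Empty<string>(),
    missing_fields = new[] { "work_mode", "compensation" },
    source_url = "https://example.com/careers/software-engineer",
};

var json = JsonSerializer.Serialize(payload, new JsonSerializerOptions { WriteIndented = true });
Console.WriteLine(json);
```

System.Text.Json writes null properties by default, so work_mode appears in the output as null instead of being dropped.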
Summary
You built a focused job-page parser that turns a public posting into a role brief. The key move was comparing two narrow tool paths before adding more capability:
job URL -> extract_from_page -> hiring brief
job URL -> scrape_page -> hiring brief
That loop is simple enough to debug, useful enough for recruiters or candidates, and strong enough to become the base for structured output, monitoring, and candidate matching.