
Confidently Extract Data with Validation and Inference Support: What’s New in the Extraction API
Extracting structured data from unstructured content is a foundational capability in many AI-powered workflows — from reading receipts and invoices to parsing forms and legal documents. With the latest enhancements to the Extraction API, we’re making that process not only smarter, but more trustworthy and configurable than ever before.
In this post, we’ll walk through the two key new capabilities:
- Validation Configuration for hallucination detection and accuracy checks
- Inference Flags for properties that should be inferred, not directly extracted
Let’s look at how these work in practice with a real example.
✅ Validating the Extraction
You can now configure validation behavior as part of your extraction request. This includes:
- Hallucination Detection: Checks whether extracted values actually appear in the source text
- Ground Truth Comparison: Compares extracted data against known correct values to assess accuracy
Here’s a sample request using the new validationConfig
option:
"validationConfig": {
"validationLevel": "advanced",
"groundTruthData": {
"passengers": [
{ "firstName": "Mathieu", "lastName": "Isabel" },
{ "firstName": "Arthur", "lastName": "Isabel" }
],
"transactions": [
{
"transactionDate": "2024-01-14",
"description": "Flight Montreal to Moscow",
"category": "Air Fare",
"quantity": 2,
"unitPrice": 7000,
"subTotal": 7000
},
...
]
}
}
When paired with the input text — in this case, a booking receipt — the API extracts both passengers
and transactions
, and validates every field against the ground truth.
🚨 Detecting Hallucinations, Highlighting Accuracy
The response includes a detailed validation
section for every extracted property. Here’s a snippet:
{
"jsonPath": "$.transactions[2].description",
"extractionReasoning": "The description 'Deluxe Suite' was extracted from the table under 'Description', specifying the nature of the transaction as a hotel booking.",
"hallucinated": false,
"matchesGroundTruth": true
}
For each property, you get:
- The path to the field in the response (
jsonPath
) - The reasoning behind the extraction
- Whether it was hallucinated
- Whether it matched the ground truth
This makes debugging and auditing AI behavior dramatically easier — especially in production-grade pipelines.
The summary stats in the response are even more telling:
"summary": {
"overallAccuracyPercentage": 100.0,
"hallucinationPercentage": 0.0
}
🧠 Flagging Inferred Properties
Some properties aren’t always present verbatim in the text — they’re derived from context. For example:
"category": {
"type": "string",
"description": "The category assigned the transaction based on its description.",
"enum": ["Air Fare", "Hotel", "Other"],
"inferred": true
}
By setting "inferred": true
, you’re telling the API:
- This property doesn’t need to be textually grounded
- It should be excluded from hallucination detection
- It can still be compared to ground truth (if provided)
In our example, all three transaction category
values — “Air Fare”, “Air Fare”, and “Hotel” — are inferred correctly from the description field, such as “Deluxe Suite” or “Flight Montreal to Moscow.”
This makes it possible to blend high-confidence extractions with contextual AI inferences, without penalizing the latter in validation logic.
🔍 End-to-End Confidence
With these new features, the Extraction API gives you:
- Transparent validation reporting
- Control over validation depth via configuration
- Separation of direct extraction vs. inference
- Compatibility with ground truth evaluation for test and prod
Whether you’re ingesting receipts, medical notes, or insurance forms, these updates help you trust the output — and prove its accuracy.
📌 Try It Out
You can start using these features today in any extraction request by including:
validationConfig
"inferred": true
flags on schema properties where applicable
We’d love to hear how you’re using these features and what else you’d like to see.
Go check out the Content API on RapidAPI and give the extraction capability a try!