Confidently Extract Data with Validation and Inference Support: What’s New in the Extraction API

Extracting structured data from unstructured content is a foundational capability in many AI-powered workflows — from reading receipts and invoices to parsing forms and legal documents. With the latest enhancements to the Extraction API, we’re making that process not only smarter, but more trustworthy and configurable than ever before.

In this post, we’ll walk through the two key new capabilities:

Validation Configuration for hallucination detection and accuracy checks
Inference Flags for properties that should be inferred, not directly extracted

Let’s look at how these work in practice with a real example.

✅ Validating the Extraction

You can now configure validation behavior as part of your extraction request. This includes:

Hallucination Detection: Checks whether extracted values actually appear in the source text
Ground Truth Comparison: Compares extracted data against known correct values to assess accuracy
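To make the idea of hallucination detection concrete, here is a deliberately naive Python sketch. It is not how the API performs the check (the post does not describe that); it only assumes that "grounded" means the extracted value's text can be found in the source. The function name and the sample receipt text are made up for illustration.

```python
def appears_in_source(value, source_text):
    """Naive grounding check: is the value's text present in the source?

    Illustrative assumption only; the real service may normalize numbers,
    dates, and whitespace in ways this sketch does not.
    """
    return str(value).strip().lower() in source_text.lower()


# Made-up receipt text, used only to exercise the function.
source_text = "Deluxe Suite, 3 nights. Flight Montreal to Moscow."

print(appears_in_source("Deluxe Suite", source_text))        # True
print(appears_in_source("Presidential Suite", source_text))  # False
```

A plain substring test like this would reject legitimately derived values, such as a category inferred from a description, so it only suits fields that are copied from the text.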

Here’s the validationConfig portion of a sample request:

"validationConfig": {
  "validationLevel": "advanced",
  "groundTruthData": {
    "passengers": [
      { "firstName": "Mathieu", "lastName": "Isabel" },
      { "firstName": "Arthur", "lastName": "Isabel" }
    ],
    "transactions": [
      {
        "transactionDate": "2024-01-14",
        "description": "Flight Montreal to Moscow",
        "category": "Air Fare",
        "quantity": 2,
        "unitPrice": 7000,
        "subTotal": 7000
      },
      ...
    ]
  }
}

When paired with the input text — in this case, a booking receipt — the API extracts both passengers and transactions, and validates every field against the ground truth.
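As a rough sketch of how a client might assemble such a request in Python: only validationConfig, validationLevel, and groundTruthData come from the sample above. The top-level field names `text` and `schema`, and the helper name itself, are assumptions made for illustration, so check the API reference for the real request shape.

```python
import json


def build_extraction_request(text, schema, ground_truth, level="advanced"):
    """Assemble a request body that carries a validationConfig block.

    `text` and `schema` are hypothetical field names; only the keys inside
    validationConfig are taken from the sample in this post.
    """
    return {
        "text": text,      # assumed name for the source document field
        "schema": schema,  # assumed name for the extraction schema field
        "validationConfig": {
            "validationLevel": level,
            "groundTruthData": ground_truth,
        },
    }


ground_truth = {
    "passengers": [
        {"firstName": "Mathieu", "lastName": "Isabel"},
        {"firstName": "Arthur", "lastName": "Isabel"},
    ]
}

body = build_extraction_request(
    text="(booking receipt text goes here)",
    schema={"type": "object"},  # placeholder schema
    ground_truth=ground_truth,
)
print(json.dumps(body, indent=2))
```

Keeping the ground truth in a separate variable makes it easy to drop it from the request once a test run is over.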

🚨 Detecting Hallucinations, Highlighting Accuracy

The response includes a detailed validation section for every extracted property. Here’s a snippet:

{
  "jsonPath": "$.transactions[2].description",
  "extractionReasoning": "The description 'Deluxe Suite' was extracted from the table under 'Description', specifying the nature of the transaction as a hotel booking.",
  "hallucinated": false,
  "matchesGroundTruth": true
}

For each property, you get:

The path to the field in the response (jsonPath)
The reasoning behind the extraction
Whether it was hallucinated
Whether it matched the ground truth

Because every value carries its own reasoning and flags, you can trace a wrong field back to its cause when debugging or auditing, which matters most in production pipelines.
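One way to act on these per-property entries is to collect the paths that need a human look. The Python sketch below assumes the entries have already been pulled out of the response into a list of dictionaries; where that list sits in the response is not shown in this post. The function name is made up, and the second and third sample entries are invented to exercise the failure cases.

```python
def fields_needing_review(validation_entries):
    """Return the jsonPath of every entry that is hallucinated or that
    does not match the ground truth."""
    flagged = []
    for entry in validation_entries:
        hallucinated = entry.get("hallucinated", False)
        # A missing matchesGroundTruth key is treated as "not compared",
        # so only an explicit False counts as a mismatch.
        mismatched = entry.get("matchesGroundTruth") is False
        if hallucinated or mismatched:
            flagged.append(entry["jsonPath"])
    return flagged


entries = [
    # Taken from the snippet above.
    {"jsonPath": "$.transactions[2].description",
     "hallucinated": False, "matchesGroundTruth": True},
    # Invented entries, for illustration only.
    {"jsonPath": "$.transactions[0].unitPrice",
     "hallucinated": True, "matchesGroundTruth": False},
    {"jsonPath": "$.passengers[0].firstName",
     "hallucinated": False, "matchesGroundTruth": False},
]

print(fields_needing_review(entries))
# prints ['$.transactions[0].unitPrice', '$.passengers[0].firstName']
```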

The response also rolls these per-property results up into summary statistics:

"summary": {
  "overallAccuracyPercentage": 100.0,
  "hallucinationPercentage": 0.0
}
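These two numbers lend themselves to an automated check, for example in a regression test that runs extraction against a known document. The Python sketch below uses the two keys shown in the summary; the function name and the threshold values are arbitrary assumptions, not recommendations from the API.

```python
def passes_quality_gate(summary, min_accuracy=95.0, max_hallucination=0.0):
    """Return True when the summary meets both thresholds.

    The default thresholds are illustrative; pick values that fit
    your own tolerance for errors.
    """
    return (
        summary["overallAccuracyPercentage"] >= min_accuracy
        and summary["hallucinationPercentage"] <= max_hallucination
    )


# The summary from the example above.
good = {"overallAccuracyPercentage": 100.0, "hallucinationPercentage": 0.0}
# An invented summary that should fail the gate.
bad = {"overallAccuracyPercentage": 80.0, "hallucinationPercentage": 10.0}

print(passes_quality_gate(good))  # True
print(passes_quality_gate(bad))   # False
```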

🧠 Flagging Inferred Properties

Some properties aren’t always present verbatim in the text; they’re derived from context. For example:

"category": {
  "type": "string",
  "description": "The category assigned to the transaction based on its description.",
  "enum": ["Air Fare", "Hotel", "Other"],
  "inferred": true
}

By setting "inferred": true, you’re telling the API:

This property doesn’t need to be textually grounded
It should be excluded from hallucination detection
It can still be compared to ground truth (if provided)

In our example, the three transactions are categorized as “Air Fare”, “Air Fare”, and “Hotel”. Each value is inferred correctly from the transaction’s description field, for example “Hotel” from “Deluxe Suite” and “Air Fare” from “Flight Montreal to Moscow”.

This makes it possible to blend high-confidence extractions with contextual AI inferences, without penalizing the latter in validation logic.
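When a schema grows, it can help to list which properties carry the inferred flag before sending it. The Python sketch below walks a schema and collects those paths. It assumes the schema follows the usual JSON Schema layout, with nested `properties` and array `items`; the function name and the `[*]` path notation are choices made for this illustration, not part of the API.

```python
def inferred_paths(schema, prefix="$"):
    """Collect the paths of schema properties flagged with "inferred": true."""
    paths = []
    if schema.get("inferred") is True:
        paths.append(prefix)
    # Recurse into object properties.
    for name, sub_schema in schema.get("properties", {}).items():
        paths.extend(inferred_paths(sub_schema, f"{prefix}.{name}"))
    # Recurse into array items, marking the array level with [*].
    if isinstance(schema.get("items"), dict):
        paths.extend(inferred_paths(schema["items"], f"{prefix}[*]"))
    return paths


# Assumed overall schema shape; only the category property is from the post.
schema = {
    "type": "object",
    "properties": {
        "transactions": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "category": {
                        "type": "string",
                        "enum": ["Air Fare", "Hotel", "Other"],
                        "inferred": True,
                    },
                },
            },
        }
    },
}

print(inferred_paths(schema))
# prints ['$.transactions[*].category']
```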

🔍 End-to-End Confidence

With these new features, the Extraction API gives you:

Transparent validation reporting
Control over validation depth via configuration
Separation of direct extraction vs. inference
Compatibility with ground truth evaluation in both test and production

Whether you’re ingesting receipts, medical notes, or insurance forms, these updates help you trust the output — and prove its accuracy.

📌 Try It Out

You can start using these features today in any extraction request by including:

validationConfig
"inferred": true flags on schema properties where applicable

We’d love to hear how you’re using these features and what else you’d like to see.

Go check out the Content API on RapidAPI and give the extraction capability a try!

Dretza