mathieu.isabel  

Refining Extraction Requests with Feedback: A Smarter Way to Boost Data Accuracy

In document intelligence workflows, accurate information extraction is critical, but often elusive. Whether due to formatting quirks, ambiguous structures, or specific business rules (like how names should appear), getting perfect results on the first extraction pass isn't always feasible.

That’s why we’re introducing a powerful new capability: refinable extraction requests. This lets you use structured feedback—whether from a human reviewer or an LLM—to automatically improve the definition of an extraction request. The result? Better performance, greater alignment with real-world needs, and fewer cycles of trial and error.

Let’s take a look at how this works through a real-world example involving a booking receipt.

1. The Original Extraction Request

In this example, the goal is to extract structured data from a booking receipt that includes both passenger names and travel-related transactions.

The original request defines an object with two arrays: passengers and transactions. It includes validation rules and uses sample input that resembles a typical travel invoice.

Key fields:

  • Passengers – Each with a firstName and lastName.
  • Transactions – Including transactionDate, description, category, quantity, unitPrice, and subTotal.

The request also uses advanced validation and provides ground truth data for benchmarking.

But after running this request, feedback revealed a formatting issue.

2. The Feedback Loop

A reviewer submitted feedback noting:

“Passenger names should have been in upper case.”

They marked this as a formatting requirement that had not been enforced in the original request. While the extracted values were correct, the format was not acceptable to downstream systems.

This is the kind of issue that often gets flagged late in the workflow—by QA teams, integration layers, or customers. Now, instead of reworking the request manually, the feedback is structured and submitted alongside the original extraction.
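A structured feedback payload for this review comment could look something like the sketch below. The field names (`source`, `feedbackType`, `propertyPath`, `comment`) are hypothetical, chosen only to show the shape of machine-readable feedback, not the platform's actual schema.

```python
# Hypothetical structured-feedback record submitted alongside
# the original extraction; field names are illustrative.
feedback = {
    "source": "human-reviewer",
    "feedbackType": "formatting",
    "propertyPath": "$.passengers[*]",
    "comment": "Passenger names should have been in upper case.",
}
```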

3. The Refinement Process

The refinement engine produced a refined extraction request based on the feedback, along with a clear summary of changes. Below are the actual changes, with before and after views for each key area.

1. Schema Restructuring

Before:

{
  "schema": {
    "type": "object",
    "properties": {
      "passengers": { ... },
      "transactions": { ... }
    }
  }
}

After:

{
  "schema": {
    "type": "object",
    "properties": {
      "bookingReceipt": {
        "type": "object",
        "properties": {
          "passengers": { ... },
          "transactions": { ... }
        }
      }
    }
  }
}

🔁 Reason: Grouping related fields under bookingReceipt improves organization and allows for easier expansion later.
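Mechanically, this restructuring just moves the two top-level properties under a new parent object. A minimal sketch in Python (the helper name `nest_under` is ours, not part of the platform):

```python
def nest_under(schema, group_name, keys):
    """Move the named top-level properties under a new object property."""
    props = schema["properties"]
    grouped = {k: props.pop(k) for k in keys}  # detach the originals
    props[group_name] = {"type": "object", "properties": grouped}
    return schema

original = {
    "type": "object",
    "properties": {"passengers": {"type": "array"}, "transactions": {"type": "array"}},
}
refined = nest_under(original, "bookingReceipt", ["passengers", "transactions"])
```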

2. Formatting Enforcement

Before:

"firstName": {
  "type": "string",
  "description": "Passenger first name"
}

After:

"firstName": {
  "type": "string",
  "description": "Passenger first name (must be in uppercase)",
  "examples": ["JANE"]
}

Same applies to lastName:

Before:

"lastName": {
  "type": "string",
  "description": "Passenger last name"
}

After:

"lastName": {
  "type": "string",
  "description": "Passenger last name (must be in uppercase)",
  "examples": ["DOE"]
}

🔁 Reason: Ensures downstream systems that require uppercase input (e.g. flight booking systems) receive it in the correct format.
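A downstream consumer can check this formatting rule cheaply before accepting an extraction. A minimal sketch, assuming passengers arrive as a list of dicts with `firstName`/`lastName` keys:

```python
def names_are_uppercase(passengers):
    """True if every passenger's first and last name are fully uppercase."""
    return all(
        p["firstName"].isupper() and p["lastName"].isupper() for p in passengers
    )
```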

3. Instruction Improvements

Before:

"instructions": "Extract the requested information from the receipt."

After:

"instructions": "Extract all passengers and transactions from the receipt. Ensure that all passenger names are fully capitalized (uppercase). Double-check for accuracy and consistency before submitting the extracted values."

🔁 Reason: Clearer expectations and actionable formatting guidance reduce ambiguity and improve consistency.

4. Property Descriptions and Examples

More descriptive metadata was added across properties:

Before (transaction field example):

"quantity": {
  "type": "integer"
}

After:

"quantity": {
  "type": "integer",
  "description": "Number of items in this transaction line",
  "examples": [2]
}

🔁 Reason: Descriptions and examples help guide human annotators and improve model alignment.

4. Summary of Key Refinements

  • Data Structure – Nested under bookingReceipt
  • Formatting Rule – Names must be uppercase
  • Instructions – More specific, with formatting and verification guidance
  • Property Metadata – Added examples and clarified descriptions

Each proposed refinement is also explicitly tracked with a path, description, and reasoning—for example:

{
  "propertyPath": "$.passengers[*].firstName",
  "type": "modified",
  "description": "Passenger first names must be in uppercase letters.",
  "reasoning": "The system requires passenger names to be in uppercase to ensure consistent formatting and acceptance."
}
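Because each refinement carries a propertyPath, it can be applied to a schema programmatically. The sketch below resolves only the simple `$.a[*].b` paths used here; it is a toy resolver for illustration, not a full JSONPath implementation or the platform's actual engine.

```python
def apply_description_refinement(schema, refinement, new_description):
    """Set the description at the refinement's propertyPath.

    Resolves simple paths of the form "$.a[*].b" against a JSON-schema
    fragment; a toy resolver for illustration, not full JSONPath.
    """
    node = schema
    for part in refinement["propertyPath"].lstrip("$.").split("."):
        node = node["properties"][part.removesuffix("[*]")]
        if part.endswith("[*]"):
            node = node["items"]  # descend into the array's item schema
    node["description"] = new_description
    return schema

schema = {
    "type": "object",
    "properties": {
        "passengers": {
            "type": "array",
            "items": {"properties": {"firstName": {"type": "string"}}},
        }
    },
}
refinement = {"propertyPath": "$.passengers[*].firstName"}
apply_description_refinement(
    schema, refinement, "Passenger first name (must be in uppercase)"
)
```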

5. Why It Matters

This kind of refinement capability turns feedback into compounding value:

  • Reduces Rework – Fix issues at the definition level, not in post-processing.
  • Captures Tribal Knowledge – Business-specific rules like formatting preferences become part of the system.
  • Improves Generalization – Each refinement makes future extractions more accurate—even across similar document types.
  • Builds Trust – Teams can see how feedback is applied, closing the loop between input and improvement.

6. Next Steps

You can start using refinement feedback in your extraction flows right now. Whether your feedback comes from human reviewers, automated QA checks, or even LLM-based validators, you can use it to improve your definitions automatically.
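The overall loop is simple to wire up. In the toy sketch below, `refine_request` merely folds reviewer comments into the instructions; a real refinement engine would also adjust the schema (descriptions, examples, structure), as the earlier sections showed.

```python
def refine_request(request, feedback_items):
    """Toy refinement step: fold reviewer comments into the instructions.

    Illustrative stub only; it shows the shape of the feedback loop,
    not the platform's actual refinement logic.
    """
    refined = dict(request)  # leave the original request untouched
    notes = " ".join(item["comment"] for item in feedback_items)
    refined["instructions"] = f'{request["instructions"]} {notes}'
    return refined

request = {"instructions": "Extract the requested information from the receipt."}
feedback = [{"comment": "Ensure that all passenger names are fully capitalized."}]
refined = refine_request(request, feedback)
```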

And this is just the beginning—we’re working on extending refinements to cover logic-level changes, schema restructuring based on missing fields, and even multi-document alignment scenarios.

Want to try it yourself? Feed your extraction requests and feedback into our platform, and see how smarter definitions lead to smarter automation.
