
Refining Extraction Requests with Feedback: A Smarter Way to Boost Data Accuracy
In document intelligence workflows, accurate information extraction is critical—but often elusive. Whether due to formatting quirks, ambiguous structures, or specific business rules (like how names should appear), getting perfect results from the first extraction pass isn’t always feasible.
That’s why we’re introducing a powerful new capability: refinable extraction requests. This lets you use structured feedback—whether from a human reviewer or an LLM—to automatically improve the definition of an extraction request. The result? Better performance, greater alignment with real-world needs, and fewer cycles of trial and error.
Let’s take a look at how this works through a real-world example involving a booking receipt.
1. The Original Extraction Request
In this example, the goal is to extract structured data from a booking receipt that includes both passenger names and travel-related transactions.
The original request defines an object with two arrays: `passengers` and `transactions`. It includes validation rules and uses sample input that resembles a typical travel invoice.
Key fields:
- Passengers – Each with a `firstName` and `lastName`.
- Transactions – Including `transactionDate`, `description`, `category`, `quantity`, `unitPrice`, and `subTotal`.
The request also uses advanced validation and provides ground truth data for benchmarking.
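As a rough illustration, the original request might look something like the Python sketch below. The exact field names (`instructions`, `schema`, `groundTruth`) are assumptions for illustration, not the platform's literal API.

```python
# Hypothetical sketch of the original extraction request payload.
# Top-level keys are illustrative assumptions, not the exact API contract.
original_request = {
    "instructions": "Extract the requested information from the receipt.",
    "schema": {
        "type": "object",
        "properties": {
            "passengers": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "firstName": {"type": "string"},
                        "lastName": {"type": "string"},
                    },
                },
            },
            "transactions": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "transactionDate": {"type": "string"},
                        "description": {"type": "string"},
                        "category": {"type": "string"},
                        "quantity": {"type": "integer"},
                        "unitPrice": {"type": "number"},
                        "subTotal": {"type": "number"},
                    },
                },
            },
        },
    },
    # Ground-truth values used purely for benchmarking extraction accuracy.
    "groundTruth": {
        "passengers": [{"firstName": "Jane", "lastName": "Doe"}],
    },
}
```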
But after running this request, feedback revealed a formatting issue.
2. The Feedback Loop
A reviewer submitted feedback noting:
“Passenger names should have been in upper case.”
They marked this as a formatting requirement that had not been enforced in the original request. While the extracted values were correct, the format was not acceptable to downstream systems.
This is the kind of issue that often gets flagged late in the workflow—by QA teams, integration layers, or customers. Now, instead of reworking the request manually, the feedback is structured and submitted alongside the original extraction.
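Structured feedback like this can be captured as a small payload. The key names below (`target`, `category`, `comment`) are illustrative assumptions rather than the platform's exact contract.

```python
# Hypothetical shape of one structured feedback item submitted
# alongside the original extraction request.
feedback = {
    "target": "$.passengers[*]",  # which extracted values the note applies to
    "category": "formatting",     # the reviewer flagged a formatting rule
    "comment": "Passenger names should have been in upper case.",
}

def is_formatting_feedback(item: dict) -> bool:
    """Return True when a feedback item flags a formatting requirement."""
    return item.get("category") == "formatting"
```

Keeping the feedback machine-readable is what lets the refinement engine act on it automatically instead of relying on a human to re-edit the request.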
3. The Refinement Process
The refinement engine produced a refined extraction request based on the feedback, along with a clear summary of changes. Below are the actual changes, with before-and-after views for each key area.
✅ 1. Schema Restructuring
Before:
```json
{
  "schema": {
    "type": "object",
    "properties": {
      "passengers": { ... },
      "transactions": { ... }
    }
  }
}
```
After:
```json
{
  "schema": {
    "type": "object",
    "properties": {
      "bookingReceipt": {
        "type": "object",
        "properties": {
          "passengers": { ... },
          "transactions": { ... }
        }
      }
    }
  }
}
```
🔁 Reason: Grouping related fields under `bookingReceipt` improves organization and allows for easier expansion later.
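If you already have output produced against the flat schema, adapting it to the refined structure is straightforward. This is a minimal sketch, assuming the extracted data arrives as a plain Python `dict`.

```python
def nest_under_booking_receipt(flat: dict) -> dict:
    """Wrap the previously top-level arrays under a single
    'bookingReceipt' object, matching the refined schema."""
    return {
        "bookingReceipt": {
            "passengers": flat.get("passengers", []),
            "transactions": flat.get("transactions", []),
        }
    }
```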
✅ 2. Formatting Enforcement
Before:
```json
"firstName": {
  "type": "string",
  "description": "Passenger first name"
}
```
After:
```json
"firstName": {
  "type": "string",
  "description": "Passenger first name (must be in uppercase)",
  "examples": ["JANE"]
}
```
The same applies to `lastName`:
Before:
```json
"lastName": {
  "type": "string",
  "description": "Passenger last name"
}
```
After:
```json
"lastName": {
  "type": "string",
  "description": "Passenger last name (must be in uppercase)",
  "examples": ["DOE"]
}
```
🔁 Reason: Ensures downstream systems that require uppercase input (e.g. flight booking systems) receive it in the correct format.
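Descriptions and examples steer the model, but they don't mechanically enforce the rule. A small post-processing guard — a sketch, not the platform's code — can normalize any names that slip through in mixed case.

```python
def normalize_passenger_names(passengers: list[dict]) -> list[dict]:
    """Uppercase firstName/lastName so downstream systems that require
    uppercase input always receive the expected format."""
    return [
        {
            **p,
            "firstName": p.get("firstName", "").upper(),
            "lastName": p.get("lastName", "").upper(),
        }
        for p in passengers
    ]
```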
✅ 3. Instruction Improvements
Before:
```json
"instructions": "Extract the requested information from the receipt."
```
After:
```json
"instructions": "Extract all passengers and transactions from the receipt. Ensure that all passenger names are fully capitalized (uppercase). Double-check for accuracy and consistency before submitting the extracted values."
```
🔁 Reason: Clearer expectations and actionable formatting guidance reduce ambiguity and improve consistency.
✅ 4. Property Descriptions and Examples
More descriptive metadata was added across properties:
Before (transaction field example):
```json
"quantity": {
  "type": "integer"
}
```
After:
```json
"quantity": {
  "type": "integer",
  "description": "Number of items in this transaction line",
  "examples": [2]
}
```
🔁 Reason: Descriptions and examples help guide human annotators and improve model alignment.
4. Summary of Key Refinements
| Area | Change |
|---|---|
| Data Structure | Nested under `bookingReceipt` |
| Formatting Rule | Names must be uppercase |
| Instructions | More specific, with formatting and verification guidance |
| Property Metadata | Added examples and clarified descriptions |
Each proposed refinement is also explicitly tracked with a path, description, and reasoning—for example:
```json
{
  "propertyPath": "$.passengers[*].firstName",
  "type": "modified",
  "description": "Passenger first names must be in uppercase letters.",
  "reasoning": "The system requires passenger names to be in uppercase to ensure consistent formatting and acceptance."
}
```
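A tracked refinement like this can also be applied programmatically. The sketch below is an assumption about how one might consume the `propertyPath`: it supports only the tiny slice of JSONPath used in the example (`$`, plain keys, and the `[*]` wildcard).

```python
def apply_uppercase_refinement(data: dict, property_path: str) -> dict:
    """Walk a JSONPath-like path such as '$.passengers[*].firstName'
    and uppercase the string values it points at. A sketch only:
    supports just '$', plain keys, and the '[*]' array wildcard."""
    parts = property_path.lstrip("$.").split(".")

    def walk(node, remaining):
        key = remaining[0]
        if key.endswith("[*]"):
            # Descend into every element of the named array.
            for item in node.get(key[:-3], []):
                walk(item, remaining[1:])
        elif len(remaining) == 1:
            # Leaf: uppercase the string value in place.
            if isinstance(node.get(key), str):
                node[key] = node[key].upper()
        else:
            walk(node.get(key, {}), remaining[1:])

    walk(data, parts)
    return data
```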
5. Why It Matters
This kind of refinement capability turns feedback into compounding value:
- ✅ Reduces Rework – Fix issues at the definition level, not in post-processing.
- ✅ Captures Tribal Knowledge – Business-specific rules like formatting preferences become part of the system.
- ✅ Improves Generalization – Each refinement makes future extractions more accurate—even across similar document types.
- ✅ Builds Trust – Teams can see how feedback is applied, closing the loop between input and improvement.
6. Next Steps
You can start using refinement feedback in your extraction flows right now. Whether your feedback comes from human reviewers, automated QA checks, or even LLM-based validators, you can use it to improve your definitions automatically.
And this is just the beginning—we’re working on extending refinements to cover logic-level changes, schema restructuring based on missing fields, and even multi-document alignment scenarios.
Want to try it yourself? Feed your extraction requests and feedback into our platform, and see how smarter definitions lead to smarter automation.