Dynamic Structuring of Content Analysis
While working on the next evolution of how content is processed on my personal project with Dretza, I had to design and build new APIs to assist in that area. These APIs offer powerful capabilities for generating assertions, determining critical aspects for content reviews, and automating the content analysis process itself. There were also a few updates to the content assertion API to improve flexibility and output quality.
The overarching goal with the recent updates was to automate a large portion of the onboarding of new product categories on the platform. These APIs are used at various stages of both the onboarding of product categories and the ingestion of the specific products into the catalog. Those capabilities are also core to doing this dynamically, based on specific product research inquiries on the platform.
While designing for that purpose, I tried to keep the design as generic as possible so the APIs could apply to use cases outside of the scope of that particular site.
Here’s a closer look at each API and its potential use cases.
1. Generate Analysis Aspects
The Generate Analysis Aspects API dynamically determines a list of critical aspects to consider when reviewing a particular item. This API ensures that all essential factors are covered, providing a comprehensive review framework.
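The shape of the API's output can be modeled client-side. Here is a minimal sketch in Python; the `AnalysisAspect` dataclass and its field names simply mirror the `label`/`description`/`reasoning` keys in the example output below, while the `parse_aspects` helper is my own illustration, not part of the API:

```python
from dataclasses import dataclass

@dataclass
class AnalysisAspect:
    """One critical aspect to consider when reviewing an item."""
    label: str        # short name, e.g. "Usability"
    description: str  # what to evaluate for this aspect
    reasoning: str    # why this aspect matters for the review

def parse_aspects(payload: list) -> list:
    """Convert raw API output into typed aspect objects,
    skipping entries missing any required field."""
    required = ("label", "description", "reasoning")
    return [
        AnalysisAspect(**{k: item[k] for k in required})
        for item in payload
        if all(k in item for k in required)
    ]

aspects = parse_aspects([
    {"label": "Usability",
     "description": "Evaluate how intuitive the application is.",
     "reasoning": "Usability directly impacts user satisfaction."},
    {"label": "Performance"},  # incomplete entry, dropped
])
print([a.label for a in aspects])  # only the complete entry survives
```

Typing the payload this way makes downstream steps (like feeding the aspects back into the analysis request) easier to validate.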
Examples
Here are a couple of simple outputs from the API:
User Feedback Analysis
In the following example, we’re asking to generate the various aspects to consider when analyzing user feedback for an application:
[
  {
    "label": "Usability",
    "description": "Evaluate how intuitive and user-friendly the application is.",
    "reasoning": "Usability directly impacts user satisfaction and can highlight areas where users struggle to navigate or use the application effectively."
  },
  {
    "label": "Performance",
    "description": "Assess the speed, responsiveness, and stability of the application.",
    "reasoning": "Performance issues can lead to frustration and decreased productivity, making it crucial to identify and address any lag, crashes, or slow load times."
  },
  {
    "label": "Functionality",
    "description": "Examine whether the application meets the users' needs and performs its intended functions correctly.",
    "reasoning": "Ensuring that the application fulfills its core purposes is essential for user satisfaction and overall effectiveness."
  },
  {
    "label": "Accessibility",
    "description": "Determine how accessible the application is to users with disabilities.",
    "reasoning": "Accessibility is important for inclusivity and can expand the user base by making the application usable for people with various impairments."
  },
  {
    "label": "Aesthetics",
    "description": "Consider the visual appeal and design consistency of the application.",
    "reasoning": "Aesthetics can influence user perception and engagement, making the application more attractive and enjoyable to use."
  },
  ... rest of aspects removed to keep it shorter
]
Product Type Review Analysis
In this example, we’re asking for review aspects to analyze reviews about soundbars.
[
  {
    "label": "Sound Quality",
    "description": "The overall audio performance, including clarity, bass, treble, and balance.",
    "reasoning": "Sound quality is the primary function of a soundbar, and it directly impacts the listening experience."
  },
  {
    "label": "Connectivity",
    "description": "The types and number of input/output options available, such as HDMI, Bluetooth, and optical inputs.",
    "reasoning": "Connectivity options determine how easily the soundbar can be integrated with other devices and systems."
  },
  {
    "label": "Ease of Use",
    "description": "The user-friendliness of the soundbar, including setup, controls, and remote functionality.",
    "reasoning": "A soundbar that is difficult to use can frustrate users and detract from the overall experience."
  },
  {
    "label": "Design and Build Quality",
    "description": "The aesthetic appeal and durability of the soundbar, including materials used and overall construction.",
    "reasoning": "A well-designed and sturdy soundbar not only looks good but also tends to last longer."
  },
  ... rest of aspects removed to keep it shorter
]
2. Analyze Content
The Analyze Content API extracts relevant information from content based on a predefined list of analysis aspects. It automates the extraction process, ensuring that all critical information is captured and analyzed efficiently. This is the logical successor to the previous step.
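Chaining the two steps together is straightforward: the aspects generated in step 1 are passed along in the `analysisAspects` field of the analyze request, next to the raw content segments. A small sketch of assembling that payload (the field names come from the request example below; the `build_analysis_request` helper itself is my own illustration):

```python
def build_analysis_request(subject, aspects, texts):
    """Assemble an Analyze Content request from previously
    generated analysis aspects and one or more text segments."""
    return [{
        "subject": subject,
        "analysisAspects": aspects,  # output of the Generate Analysis Aspects step
        "segments": [{"text": t} for t in texts],
    }]

req = build_analysis_request(
    "Application Review",
    [{"label": "Usability",
      "description": "Evaluate how intuitive the application is.",
      "reasoning": "Usability directly impacts user satisfaction."}],
    ["Every page transition was slow for some reason."],
)
print(req[0]["segments"][0]["text"])
```

Keeping the step-1 output untouched and nesting it verbatim means the engine analyzes against exactly the aspects it generated.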
Example
Analyzing User Feedback
In the example below, we’ll analyze a fictitious piece of feedback received from a user.
Request
[
  {
    "subject": "Application Review",
    "analysisAspects": [
      ... the analysis aspects from previous step
    ],
    "segments": [
      {
        "text": "I had issues all day with the application. Every page transition was slow for some reason. I also have a hard time understanding the forms as there's very little explanation as to what I need to provide. The document upload experience was terrible. You ask way too much information for a simple request."
      }
    ]
  }
]
Response
Note the following from the output below:
- Each analysis aspect that was picked up during the analysis is returned in a structured way
- The relevant statements from the user feedback are extracted, labeled with a sentiment, and grouped under the proper aspect
- A short summary of each statement is also provided, along with the rationale behind the sentiment
With this structured data in hand, one could do further analysis on it; in the context of product reviews, it could feed a chart or dashboard breaking down sentiment per aspect.
Here is the API response itself, with some of the content removed to keep it short.
[
  {
    "contentAnalysis": {
      "positivityRatio": 0.0,
      "aspects": [
        {
          "name": "Usability",
          "positivityRatio": 0.0,
          "analysisStatements": [
            {
              "sentiment": "negative",
              "extract": [
                {
                  "languageCode": "en_us",
                  "value": "I had issues all day with the application."
                }
              ],
              "summary": [
                {
                  "languageCode": "en_us",
                  "value": "Issues with the application all day."
                }
              ],
              "sentimentRationale": [
                {
                  "languageCode": "en_us",
                  "value": "The user experienced ongoing problems with the application, indicating poor usability."
                }
              ],
              "metadata": {
                "source": "",
                "baseSource": "",
                "scope": "",
                "title": "",
                "contentType": ""
              }
            }
          ]
        }
      ]
    }
  }
]
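The `positivityRatio` fields lend themselves to downstream aggregation. As a sketch of how one might recompute a ratio from the statements client-side (I'm assuming the ratio is simply the fraction of statements with a `positive` sentiment, which is consistent with the 0.0 in the all-negative response above but not documented behavior):

```python
def positivity_ratio(statements):
    """Fraction of analysis statements with positive sentiment.
    Returns 0.0 when there are no statements at all."""
    if not statements:
        return 0.0
    positive = sum(1 for s in statements if s["sentiment"] == "positive")
    return positive / len(statements)

stmts = [
    {"sentiment": "negative"},
    {"sentiment": "negative"},
    {"sentiment": "positive"},
]
print(positivity_ratio(stmts))  # 1 positive out of 3 statements
```

Aggregating this per aspect, then across aspects, yields the kind of sentiment breakdown a review dashboard would display.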
3. Generate Assertions
The Generate Assertions API is designed to dynamically generate a set of relevant assertions for a particular piece of content to support a specific conclusion. For instance, if you need to determine the relevance of a specific item, this API can create assertions that justify its relevance based on the content provided.
Use Cases:
- Product Reviews: Identify key aspects like quality, durability, user experience, and price when reviewing products.
- User Feedback Reviews: Identify key areas of improvements or focus given customer feedback about a product or solution.
- Content Quality Assessment: Determine critical factors such as readability, accuracy, engagement, and relevance in content evaluation.
- Content Validation: Automatically generate assertions to verify the validity of information in articles, reports, or research papers.
- Decision Support: Aid in decision-making processes by generating assertions that support or refute specific conclusions.
Example
Let’s see how the API works. To generate the assertions, you need to provide some key information:
- Content, to help the engine understand what kind of content you’re dealing with
- A conclusion describing what you’re trying to validate
In the example below, we’re checking whether a particular search engine result is relevant to what we’re after. We’ve all been there: the content the search engine deems most relevant is not exactly what we’re looking for.
[
  {
    "assertionMethodVersion": "v2",
    "assertionReasoningLevel": "advanced",
    "content": "Content Title: iPad Air 6th Generation Specs\nContent Snippet: Here are the specifications for the iPad Air 6th generation. Apple M2",
    "conclusion": "Determine if the content is relevant to the topic: iPad Air 2024"
  }
]
Result
As you can see in the output below, a list of assertions to be made against the content has been generated. Also note how the API provides reasoning as to why each assertion is important for reaching the conclusion.
[
  {
    "content": "Content Title: iPad Air 6th Generation Specs\nContent Snippet: Here are the specifications for the iPad Air 6th generation. Apple M2",
    "conclusion": "Determine if the content is relevant to the topic: iPad Air 2024",
    "assertions": [
      {
        "assertionInstruction": "Check if the content mentions the release year of the iPad Air 6th generation.",
        "reasoning": "The release year is a critical piece of information to determine if the content is relevant to the topic 'iPad Air 2024'.",
        "outcomeDataType": "boolean",
        "possibleOutcomes": [
          "true",
          "false"
        ]
      },
      {
        "assertionInstruction": "Verify if the content includes the term '2024'.",
        "reasoning": "The presence of the term '2024' would directly indicate relevance to the topic 'iPad Air 2024'.",
        "outcomeDataType": "boolean",
        "possibleOutcomes": [
          "true",
          "false"
        ]
      },
      ... other assertions left out to keep the example brief
    ]
  }
]
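Some generated assertions are mechanical enough to evaluate without another model call. The second assertion above ("Verify if the content includes the term '2024'") reduces to a substring check; here is a sketch of that shortcut (the local-evaluation idea is my own illustration, not a feature of the API, and real term matching would likely need normalization):

```python
def evaluate_term_assertion(content: str, term: str) -> str:
    """Evaluate a simple 'content includes term' assertion locally,
    returning one of the API's string outcomes: 'true' or 'false'."""
    return "true" if term in content else "false"

content = ("Content Title: iPad Air 6th Generation Specs\n"
           "Content Snippet: Here are the specifications for the "
           "iPad Air 6th generation. Apple M2")
print(evaluate_term_assertion(content, "2024"))  # term absent in snippet
print(evaluate_term_assertion(content, "M2"))    # term present
```

Evaluating cheap assertions locally and reserving the model for judgment calls is one way to keep per-item analysis costs down.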
4. Content Assertion Updates
In a previous post, I went over the content assertion API. Since then, the API has received a few enhancements. Here are some of the improvements:
- To better guide an assertion, the API can now accept reasoning to provide additional context while running the assertion against the content.
- You get better control over the reasoning behavior of each assertion:
- You can ask the API to provide reasoning alongside the assertion output.
- Depending on the complexity of the assertion to be made, you can adjust the level of reasoning required. This was done to keep costs down for simple assertions.
Example
Continuing with the previous example, here’s a sample request to make some assertions about the content and return structured information.
- Note how each assertion has a different reasoning level and output requirement.
Request
[
  {
    "assertionMethodVersion": "v2",
    "assertionReasoningLevel": "advanced",
    "content": "Content Title: iPad Air 6th Generation Specs\nContent Snippet: Here are the specifications for the iPad Air 6th generation. Apple M2",
    "desiredAssertions": [
      {
        "assertionInstruction": "Check if the content mentions the release year 2024.",
        "reasoning": "The topic is specifically about the iPad Air 2024, so the content should mention the release year to be relevant.",
        "assertionReasoningLevel": "advanced",
        "provideReasoning": true,
        "outcomeDataType": "boolean",
        "possibleOutcomes": [
          "true",
          "false"
        ]
      },
      {
        "assertionInstruction": "Verify if the content specifies the generation of the iPad Air.",
        "reasoning": "The content should mention the generation to ensure it is discussing the correct model of the iPad Air.",
        "assertionReasoningLevel": "basic",
        "provideReasoning": false,
        "outcomeDataType": "boolean",
        "possibleOutcomes": [
          "true",
          "false"
        ]
      }
    ]
  }
]
Result
Given the above request, here’s what the API returns. Note the following:
- Each assertion returns the most relevant outcome for the requested assertion
- When requested, the assertion API provides reasoning as to why it picked a particular outcome
[
  {
    "content": "Content Title: iPad Air 6th Generation Specs\nContent Snippet: Here are the specifications for the iPad Air 6th generation. Apple M2",
    "assertions": [
      {
        "assertionInstruction": "Check if the content mentions the release year 2024.",
        "outcome": "false",
        "reasoning": "The provided content snippet does not mention the release year 2024. It only mentions the specifications for the iPad Air 6th generation and the Apple M2 chip."
      },
      {
        "assertionInstruction": "Verify if the content specifies the generation of the iPad Air.",
        "outcome": "true"
      }
    ]
  }
]
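A useful client-side check on responses like the one above is validating that each returned outcome is among the declared `possibleOutcomes`, and that reasoning is present exactly when `provideReasoning` was requested. A hedged sketch (the `validate_assertion_results` helper is my own illustration; field names match the request/response pair above):

```python
def validate_assertion_results(requested, returned):
    """Pair requested assertions with returned ones by instruction text;
    flag outcomes outside the declared possibleOutcomes and any missing
    reasoning that was explicitly requested."""
    by_instruction = {r["assertionInstruction"]: r for r in returned}
    problems = []
    for req in requested:
        res = by_instruction.get(req["assertionInstruction"])
        if res is None:
            problems.append("missing result: " + req["assertionInstruction"])
            continue
        if res["outcome"] not in req["possibleOutcomes"]:
            problems.append("unexpected outcome: " + res["outcome"])
        if req.get("provideReasoning") and "reasoning" not in res:
            problems.append("reasoning requested but not returned")
    return problems

requested = [
    {"assertionInstruction": "Check if the content mentions the release year 2024.",
     "provideReasoning": True,
     "possibleOutcomes": ["true", "false"]},
]
returned = [
    {"assertionInstruction": "Check if the content mentions the release year 2024.",
     "outcome": "false",
     "reasoning": "The snippet does not mention the release year 2024."},
]
print(validate_assertion_results(requested, returned))  # empty list: no problems
```

Since the outcomes are strings constrained by `possibleOutcomes`, this kind of validation catches drift between what was asked and what came back before the results feed any downstream decisions.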