Technology
Mathieu Isabel  

Content Analyzer Updates – Content Q&A Support and More…

Content Analyzer has been augmented with a suite of new features designed to streamline the process of content analysis, especially when dealing with unstructured data. These enhancements include support for PDF content ingestion, the introduction of criteria under analysis aspects, advanced data extraction capabilities, question answering based on content analysis, and generating proposals based on assertion outcomes. Let’s dive into each of these exciting new additions.

1. Support for Question Answering Based on Content Analysis

Overview: The Content Analyzer now includes a powerful question-answering feature that allows users to query the content directly and receive precise answers based on the analysis. This enhancement leverages advanced natural language processing to provide accurate and contextually relevant responses.

Use Cases:

  • Customer Support: Quickly find answers to customer queries by analyzing support documents and FAQs.
  • Research Assistance: Get precise answers to specific questions from large volumes of research data.
  • Business Intelligence: Query business reports and documents to extract relevant insights and data points.

The ability to have a conversation with content has been available on Dretza for around a year, but with the major refactoring effort around content analysis and management, it’s now more flexible because it’s less focused on product Q&A.

A key aspect of how the Q&A functionality works in the platform is that it relies on the heavy lifting done by prior analyses of the document. This improves the relevance of the knowledge chunks from which the answers are generated.

Use Case: Questions about Credit Card Benefits

Here’s an example of how you could ask questions about the content in a particular document explaining your benefits as a cardholder.

Sometimes you may want to know where the answer actually came from in the source document. Clicking the Source link on the answer provides additional information. For example:

2. Support for Unstructured Content Ingestion from PDFs

Overview: The Content Analyzer now supports the ingestion of unstructured content directly from PDF files. This enhancement allows users to analyze and extract meaningful information from a wide range of documents, including reports, articles, research papers, and more. This capability has existed almost since the beginning of the project but was somewhat hidden behind the scenes. It’s now more cleanly integrated into the content management features of the platform.

Use Cases:

  • Document Management: Automatically analyze large volumes of PDF documents for relevant information.
  • Research and Academia: Extract and analyze data from academic papers and research documents.
  • Business Reports: Ingest and evaluate content from business reports and financial statements.

As next steps, I’d like to add support for additional file formats and improve the extraction logic to better preserve the semantic structure of the original document.

Use Case: Analyzing Credit Card Benefits

As an example, let’s step back and see how we originally ingested the content used in the Q&A example earlier in this post. Here’s a snapshot of what the document that describes the cardholder benefits looks like:

The first step to ingest that document is to create the content entity through the API and specify the source URL of the document:

POST <api base URL>/contents
{
    "contentType": "Ad-hoc",
    "subject": "credit card benefit wording",
    "url": "https://www.tangerine.ca/fberoot/pdf/en/Tangerine_World_ECCI-SM_EN.pdf"
}

Once the content entity has been created, you can proceed to the next step and extract the content by calling the scrape method on that content:

POST <api base URL>/contents/{contentId}/scrape
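For readers who prefer to script these two calls, here is a minimal Python sketch that builds both requests with the standard library. The base URL and the content id are placeholders (the post only shows `<api base URL>` and `{contentId}`), and authentication is left out because it isn't described here, so treat this as an illustration rather than the platform's client code.

```python
import json
import urllib.request

# Placeholder base URL: the post only shows "<api base URL>".
API_BASE = "https://api.example.invalid"


def build_create_request(subject: str, url: str,
                         content_type: str = "Ad-hoc") -> urllib.request.Request:
    """Build the POST /contents request shown above (it is not sent here)."""
    body = json.dumps(
        {"contentType": content_type, "subject": subject, "url": url}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/contents",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_scrape_request(content_id: str) -> urllib.request.Request:
    """Build the POST /contents/{contentId}/scrape request (it is not sent here)."""
    return urllib.request.Request(
        f"{API_BASE}/contents/{content_id}/scrape",
        data=b"",
        method="POST",
    )


create_req = build_create_request(
    subject="credit card benefit wording",
    url="https://www.tangerine.ca/fberoot/pdf/en/Tangerine_World_ECCI-SM_EN.pdf",
)
# Placeholder id: in practice it would come from the create call's response.
scrape_req = build_scrape_request("example-content-id")
```

Each request could then be sent with `urllib.request.urlopen(...)`. How the content id is returned by the create call isn't shown in this post, so the sketch stops short of parsing a response.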

The process then fetches the document from its source and stores two versions of the original text:

  • The raw text of the full document
  • A segmented version of the document based on its sections

Those serve as the foundation for the various analyses performed on top of the content.
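To make the idea of storing a raw and a segmented version more concrete, here is a small Python sketch. It is not the platform's extraction logic: the heading rule (a line starting with a number and a dot, such as "1. Coverage"), the `Section` class and the `segment` function are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    text: str


# Assumed heading convention: a line such as "1. Coverage" or "2.1. Claims".
HEADING = re.compile(r"^\d+(?:\.\d+)*\.\s+\S")


def segment(raw_text: str) -> list[Section]:
    """Split raw text into sections at lines that look like numbered headings."""
    sections: list[Section] = []
    title, lines = "Preamble", []
    for line in raw_text.splitlines():
        stripped = line.strip()
        if HEADING.match(stripped):
            # Close the previous section (skip an empty preamble).
            if lines or title != "Preamble":
                sections.append(Section(title, "\n".join(lines).strip()))
            title, lines = stripped, []
        else:
            lines.append(line)
    sections.append(Section(title, "\n".join(lines).strip()))
    return sections


raw = "Welcome\n1. Coverage\nTrip delay is covered.\n2. Claims\nCall us within 30 days."
# Keep both versions side by side, mirroring the two bullets above.
stored = {"raw": raw, "sections": segment(raw)}
```

A regex on numbered headings is a deliberately naive heuristic. Real PDFs often need layout or font information to find section boundaries reliably.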

3. Introduction of Criteria Under Analysis Aspect

Overview: The notion of criteria under analysis aspects has been added, providing a more granular approach to content analysis. This allows users to define specific criteria that need to be evaluated within each aspect, leading to a more detailed and structured review process.

Spelling out the specific criteria under an analysis aspect improves the quality of the analysis results.

Example

Here’s an example of an analysis aspect with the new notion of criteria:

{
    "label": "Display Quality",
    "description": "Evaluates the resolution, color accuracy, and brightness of the device's screen.",
    "reasoning": "High-quality display enhances user experience, especially for media consumption and productivity tasks.",
    "criteria": [
        {
            "label": "Resolution",
            "description": "The number of pixels on the screen, determining the sharpness of the display.",
            "reasoning": "Higher resolution provides clearer and more detailed images."
        },
        {
            "label": "Color Accuracy",
            "description": "How accurately the display reproduces colors compared to real life.",
            "reasoning": "Accurate colors are important for media editing and viewing."
        },
        {
            "label": "Brightness",
            "description": "The maximum brightness level of the display.",
            "reasoning": "Brighter displays are easier to see in various lighting conditions."
        }
    ]
}
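If you consume this structure from code, one way to model it is with two small dataclasses. The sketch below is only an illustration: the field names come from the JSON above, while the class names and the `parse_aspect` and `checklist` helpers are hypothetical and not part of the platform.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Criterion:
    label: str
    description: str
    reasoning: str


@dataclass
class Aspect:
    label: str
    description: str
    reasoning: str
    criteria: list[Criterion] = field(default_factory=list)


def parse_aspect(payload: str) -> Aspect:
    """Parse an aspect JSON document; criteria are optional."""
    data = json.loads(payload)
    criteria = [Criterion(**c) for c in data.get("criteria", [])]
    return Aspect(data["label"], data["description"], data["reasoning"], criteria)


def checklist(aspect: Aspect) -> list[str]:
    """One line per criterion, useful when reviewing each criterion separately."""
    return [f"{aspect.label} / {c.label}: {c.description}" for c in aspect.criteria]


# Shortened copy of the example above (two of the three criteria).
payload = json.dumps({
    "label": "Display Quality",
    "description": "Evaluates the resolution, color accuracy, and brightness of the device's screen.",
    "reasoning": "High-quality display enhances user experience.",
    "criteria": [
        {"label": "Resolution",
         "description": "The number of pixels on the screen, determining the sharpness of the display.",
         "reasoning": "Higher resolution provides clearer and more detailed images."},
        {"label": "Brightness",
         "description": "The maximum brightness level of the display.",
         "reasoning": "Brighter displays are easier to see in various lighting conditions."},
    ],
})
aspect = parse_aspect(payload)
lines = checklist(aspect)
```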

Here’s an example of the results where we can see the aspect, the criteria under it and the associated statement from the content:

4. Support for Data Extraction from Content

Overview: The Content Analyzer can now detect an analysis objective around extracting data points. This builds on top of the extraction capabilities that have been there since the beginning of the project.

One recent twist is the ability to automatically generate the list of data points to extract for a given objective. This has been added as a separate API method but is also embedded in the overall content analysis process.

Use Case – Boarding Pass Extraction

Let’s see how that looks in the Content Analyzer. Given the following boarding pass:

Let’s describe what we’re trying to do:

The engine then automatically determines the relevant data points to extract and returns them as structured data.

Here’s what the results of the extraction look like:

Note this approach didn’t require any particular training.

Right now, the approach is limited to extracting key/value pairs and cannot extract arrays of items. Maybe in the future!
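To illustrate what "key/value pairs only" means in practice, here is a tiny Python check that flags any field whose value is not a plain scalar. The function name and the sample values are made up for illustration. They are not the actual output of the boarding pass extraction, and the platform's real response shape may differ.

```python
from typing import Any


def non_scalar_fields(extraction: dict[str, Any]) -> list[str]:
    """Return the keys whose values are not plain scalars (lists, dicts, ...)."""
    scalar = (str, int, float, bool, type(None))
    return [key for key, value in extraction.items() if not isinstance(value, scalar)]


# Illustrative values only, not taken from the boarding pass above.
flat = {"passengerName": "DOE/JANE", "flightNumber": "XX123", "seat": "12A"}
nested = {"passengerName": "DOE/JANE", "segments": [{"from": "AAA", "to": "BBB"}]}

flat_issues = non_scalar_fields(flat)      # no keys flagged
nested_issues = non_scalar_fields(nested)  # flags "segments"
```

A result like `flat` fits the current key/value limitation, while something like `nested`, with an array of flight segments, is the kind of output that isn't supported yet.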

5. Proposals Based on Assertion Outcomes

Overview: The Content Analyzer can now generate proposals based on the outcomes of assertions. This feature helps users to formulate actionable recommendations and strategies based on the analysis results, making the tool not just analytical but also prescriptive.

Use Cases:

  • Strategic Planning: Develop business strategies based on the analysis of market research and competitive intelligence.
  • Product Development: Create proposals for product improvements or new features based on customer feedback analysis.
  • Policy Making: Formulate policy recommendations based on the analysis of regulatory documents and reports.

Use Case – User Story Analysis

Here’s an example of a proposal that was generated in the context of analyzing a user story. Here’s the background that was provided to the Content Analyzer:

Once the analysis is complete, we can dig into the results and see that two particular assertions failed, along with the proposal made to address those points.

Conclusion

The latest enhancements to the Content Analyzer significantly broaden its capabilities, making it an indispensable tool for organizations dealing with vast amounts of unstructured data. With support for PDF content ingestion, detailed criteria under analysis aspects, advanced data extraction, robust question answering, and the ability to generate actionable proposals, the Content Analyzer is poised to transform the way content is analyzed and utilized. These new features not only enhance analytical accuracy but also provide actionable insights, driving informed decision-making across various domains.

As the content analysis capabilities continue to develop to support other areas of my personal project, I’ll be exploring other areas of improvement, both as a learning opportunity and as a way to refine the site and make it easier to use and consume. Stay tuned!
