Extract, Understand, and Transform Unstructured Content
The Content API is your toolkit for working with raw data — text, documents, images, and more. Whether you’re building content analytics, processing sensitive information, or powering AI features, the Content API gives you everything you need to parse, analyze, and generate content with precision.
There are multiple ways to approach an analysis task, depending on what you want to achieve. At a high level, the most common approach is:
- Create an Analysis Definition that describes the analysis to perform
  - An Analysis Definition lets you define an analysis task once and then apply it to multiple content objects
- Create the content object within the platform; it serves as the central point for future analysis tasks
- Once the content object is created, extract the raw text of the document
  - You do this by providing an Analysis Definition that dictates how the raw text extraction should occur
  - For instance, you can choose to store the extracted text in its entirety, or only an anonymized version if you don’t want to persist PII/PHI
- With the raw text available, perform more advanced analysis on the content, such as summarizations, extractions, and assertions
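The steps above can be sketched as an ordered series of JSON requests. This is a minimal sketch rather than the documented API: the endpoint paths, the field names (`type`, `store`, `definition`), and the `plan_workflow` helper are all hypothetical placeholders chosen to illustrate the order of operations. Nothing is sent over the network.

```python
import json

# Hypothetical sketch: none of these paths or field names come from the
# Content API reference; they only illustrate the order of operations.


def plan_workflow(file_name: str, persist_pii: bool) -> list:
    """Return the ordered requests for the high-level workflow."""
    extraction_definition = {
        "name": "raw-text-extraction",
        "type": "text_extraction",
        # Keep the full text, or only an anonymized version of it.
        "store": "full_text" if persist_pii else "anonymized_text",
    }
    summary_definition = {"name": "summary", "type": "summarization"}
    return [
        # 1. Define the analyses once so they can be reused.
        {"method": "POST", "path": "/analysis-definitions",
         "body": extraction_definition},
        {"method": "POST", "path": "/analysis-definitions",
         "body": summary_definition},
        # 2. Create the content object that later analyses refer to.
        {"method": "POST", "path": "/contents",
         "body": {"file_name": file_name}},
        # 3. Extract the raw text using the extraction definition.
        {"method": "POST", "path": "/contents/{content_id}/analyses",
         "body": {"definition": extraction_definition["name"]}},
        # 4. Run a more advanced analysis on the extracted text.
        {"method": "POST", "path": "/contents/{content_id}/analyses",
         "body": {"definition": summary_definition["name"]}},
    ]


if __name__ == "__main__":
    for request in plan_workflow("statement.pdf", persist_pii=False):
        print(json.dumps(request))
```

Defining both Analysis Definitions before creating the content object mirrors the "define once, apply many times" idea: the same two definitions could be reused for every subsequent document.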
Sample Walkthroughs
From PDF to JSON: Extracting Credit‑Card Statements with the Content API
🧱 Core Capabilities
📁 Content Ingestion Methods
Manage and Store Content Objects
CRUD operations let you create, read, update, and delete content objects — including documents, HTML pages, image files, and audio transcripts.
- Store the content in its original form (e.g. the original PDF file)
- Trigger text extraction and store the raw text from the original content
- Trigger analysis tasks
🧠 Analysis Definition Methods
Define Custom Analyses
Tailor analysis workflows by defining exactly what to extract or measure.
- Define reusable analysis tasks which can then be leveraged against multiple content objects
- Currently supported:
  - Summarization
  - Assertion
  - Text extraction/scraping
  - Data points extraction
  - Anonymization
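To make the idea of a reusable definition concrete, here is a minimal sketch of defining an analysis once and applying it to several content objects. The `AnalysisDefinition` class, its fields, the type identifiers, and `build_requests` are assumptions made for illustration; the actual request schema may differ.

```python
from dataclasses import dataclass, field

# Analysis types listed above; the identifier spellings are hypothetical.
SUPPORTED_TYPES = {
    "summarization",
    "assertion",
    "text_extraction",
    "data_points_extraction",
    "anonymization",
}


@dataclass
class AnalysisDefinition:
    """A reusable description of one analysis task (hypothetical shape)."""

    name: str
    analysis_type: str
    options: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject anything outside the supported list early.
        if self.analysis_type not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported analysis type: {self.analysis_type}")


def build_requests(definition: AnalysisDefinition, content_ids: list) -> list:
    """Build one analysis request body per content object."""
    return [
        {"content_id": content_id, "definition": definition.name}
        for content_id in content_ids
    ]


if __name__ == "__main__":
    summary = AnalysisDefinition("short-summary", "summarization",
                                 {"max_sentences": 3})
    print(build_requests(summary, ["doc-1", "doc-2"]))
```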
📊 Analysis Results Methods
Retrieve Structured Analysis Output
Fetch detailed insights from completed analyses.
- Retrieve the results of prior content analysis
Core Analysis Methods
The core analysis methods serve as the foundation for the higher-level content analysis APIs. If your use case doesn’t need the more advanced capabilities of the higher-level Content API methods, you can use these lower-level methods directly.
✍️ Text Extraction & Analysis
Extract Meaning from Raw Text
Built-in tools analyze structure, language, and sentiment.
- Key/value pair extraction
- Object extraction
- Sentiment and tone analysis
- Entity recognition
🔎 Extraction Methods
Pinpoint Data from Documents
Extract structured fields from PDFs, HTML, and more.
- Data extraction based on large language models (LLMs)
Related Blog Posts
Confidently Extract Data with Validation and Inference Support: What’s New in the Extraction API
🧾 Assertion Methods
Validate Conditions or Presence of Key Elements
Make assertions about content using logical checks or presence tests.
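As a local illustration of the two kinds of check mentioned here, the sketch below runs a presence test and combines several of them into one logical check. It is a conceptual stand-in only: the function names and result shape are hypothetical, and simple substring matching is an assumption that says nothing about how the Assertion methods evaluate content.

```python
def presence_test(text: str, phrase: str) -> bool:
    """Presence test: is the phrase in the text, ignoring case?"""
    return phrase.casefold() in text.casefold()


def evaluate(text: str, required: list, forbidden: list) -> dict:
    """Logical check: every required phrase present AND no forbidden one."""
    missing = [p for p in required if not presence_test(text, p)]
    found = [p for p in forbidden if presence_test(text, p)]
    return {
        "passed": not missing and not found,
        "missing": missing,
        "forbidden_found": found,
    }


if __name__ == "__main__":
    contract = "This agreement includes a Termination Clause and a signature."
    print(evaluate(contract, ["termination clause", "signature"], ["draft"]))
```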
📝 Summarization Methods
Summarize Text and Visual Content
Automatically condense large documents or image-based content.
- Abstractive and extractive summarization
- Context-aware multi-paragraph summarization
- Image captioning and visual summaries
✨ Generation Methods
Create Content from Prompts or Inputs
Leverage generative AI to produce new content.
- Text rewriting and auto-expansion
- Custom prompt-based generation
🛡️ Anonymization Methods
Protect Privacy, Stay Compliant
Strip sensitive or personal data from content.
- Detect and redact PII/PHI
- Replace with tokens or generalizations
- Log anonymization steps for auditing
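The three capabilities above (detect, replace with tokens, log for auditing) can be illustrated with a small local sketch. It uses two simple regular expressions for emails and US-style phone numbers, which is an assumption made for illustration; real PII/PHI detection covers far more cases, and the `anonymize` function and log format here are hypothetical rather than part of the Content API.

```python
import re

# Deliberately simple patterns, for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def anonymize(text: str) -> tuple:
    """Replace detected values with tokens and return (text, audit_log)."""
    audit_log = []
    for label, pattern in PATTERNS.items():

        def redact(match, label=label):
            # Log what kind of value was removed, never the value itself.
            audit_log.append({"type": label, "length": len(match.group())})
            return f"[{label}]"

        text = pattern.sub(redact, text)
    return text, audit_log


if __name__ == "__main__":
    redacted, log = anonymize("Contact jane@example.com or 555-123-4567.")
    print(redacted)
    print(log)
```

The audit log records only the type and length of each redacted value, so the log itself does not persist the sensitive data it describes.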
🚫 Text Filtering Methods
Filter for Efficiency
Flag or remove unwanted or unsafe content.
- Remove unwanted text to speed up and focus downstream analysis tasks
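A minimal sketch of the idea, assuming the unwanted text is page footers and blank lines: dropping them before analysis leaves less text for downstream tasks to process. The patterns and the `filter_text` helper are hypothetical examples, not the filtering rules the API applies.

```python
import re

# Example patterns for text that adds nothing to downstream analysis.
UNWANTED = [
    re.compile(r"^\s*Page \d+ of \d+\s*$"),  # page footers
    re.compile(r"^\s*$"),                    # blank lines
]


def filter_text(text: str, patterns=UNWANTED) -> str:
    """Drop every line that matches one of the unwanted patterns."""
    kept = [
        line
        for line in text.splitlines()
        if not any(pattern.match(line) for pattern in patterns)
    ]
    return "\n".join(kept)


if __name__ == "__main__":
    page = "Balance: $120.00\nPage 1 of 3\n\nDue: 2024-01-31"
    print(filter_text(page))
```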
🔧 Use It Your Way
- RESTful API with JSON input/output
- Plug into pipelines or standalone workflows
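Because the API accepts and returns JSON over HTTP, a request can be assembled with the Python standard library alone. The sketch below only builds the request object and does not send it. The base URL, the `/contents` path, the payload fields, and the bearer-token header are all assumptions for illustration; check the API reference for the real values.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host


def build_json_request(path: str, payload: dict, api_key: str):
    """Build (but do not send) a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + path,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Bearer authentication is an assumption, not documented here.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_json_request("/contents", {"file_name": "statement.pdf"},
                                 "YOUR_API_KEY")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the same helper easy to reuse inside a pipeline or a standalone script.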