How do I see metadata in a PowerPoint file?

Open the file in PowerPoint and go to File > Info to see basic properties like author, title, and dates. For a deeper inspection, click Check for Issues > Inspect Document to scan for hidden metadata including comments, speaker notes, custom XML, and revision tracking data. For programmatic access, use python-pptx in Python or extract the .pptx as a ZIP archive and read the XML files in the docProps folder.

Do PowerPoint speaker notes count as metadata?

Yes. Speaker notes are stored as XML inside the .pptx archive and travel with the file when shared. They are not visible during a normal slideshow, but anyone who opens the file in PowerPoint can view them through the Notes pane or Notes Page view. The Document Inspector categorizes them as hidden content and can remove them in bulk.

How do I remove hidden data from PowerPoint before sharing?

Use the Document Inspector: go to File > Info > Check for Issues > Inspect Document. Run the inspection on all categories, then click Remove All for each category with findings. Always do this on a copy of the original, because removal cannot be undone. For batch cleanup, use python-pptx or a document sanitization API to strip properties and notes programmatically.

What metadata do embedded images in PowerPoint carry?

Embedded images can retain their original EXIF metadata, which may include GPS coordinates, camera model, capture timestamp, lens information, and software used for editing. This metadata persists when the image is inserted into a presentation. The Document Inspector does not scan embedded media EXIF data, so you need to extract images from the ppt/media/ folder inside the .pptx archive and use an EXIF reader to audit them.

Can I extract metadata from older .ppt files?

Yes, but the approach differs. Older .ppt files use a binary format rather than the ZIP-based Open XML format, so python-pptx, which only supports .pptx, cannot read them. For .ppt files, you can use Apache POI's HSLF module (Java), COM automation on Windows, a conversion to .pptx first, or a library like Aspose.Slides that supports both formats.
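When a folder mixes both formats, a quick look at each file's leading bytes can route it to the right tool. The sketch below assumes the standard ZIP signature for .pptx and the OLE compound file signature for legacy .ppt. The function name detect_presentation_format is illustrative, and the check identifies only the container type, not whether the file is really a presentation.

```python
# Container signatures: ZIP local file header and OLE compound file header.
ZIP_MAGIC = b"PK\x03\x04"
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def detect_presentation_format(path):
    """Guess the container type from the first bytes of a file (hypothetical helper)."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip-based (likely .pptx)"
    if head.startswith(OLE_MAGIC):
        return "ole-binary (likely legacy .ppt)"
    return "unknown"
```

Files reported as OLE binaries can then be queued for conversion or for a library that reads the legacy format.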

How do I extract metadata from PowerPoint files in bulk?

Write a script that walks your file directory and processes each .pptx file. Python with python-pptx is the most common approach. Extract core properties, count slides with notes, and inventory embedded media for each file. Output the results to JSON or CSV for compliance review. For cloud-stored files, platforms like Fast.io with Intelligence Mode can auto-index and extract document metadata at upload time.

How to Extract Metadata from PowerPoint Presentations

What Counts as PowerPoint Metadata

PowerPoint metadata goes well beyond the author name and creation date most people think of. A .pptx file is actually a ZIP archive built on the Open Packaging Conventions (OPC) standard. Inside that archive, metadata lives in several distinct locations.

Document properties sit in the docProps/ folder. The core.xml file stores Dublin Core fields like title, subject, author, keywords, description, last modified by, revision number, and creation/modification timestamps. The app.xml file stores application-level properties: which version of PowerPoint created the file, total editing time, slide count, hidden slide count, and company name.
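As a rough illustration of the application-level layer, the sketch below reads docProps/app.xml straight from the archive using only the Python standard library. The function name read_app_properties is hypothetical, and which fields appear (for example Slides, HiddenSlides, Company, TotalTime) depends on what the authoring application chose to write.

```python
import zipfile
from xml.etree import ElementTree


def read_app_properties(pptx_path):
    """Return the simple text fields of docProps/app.xml as a dict (illustrative helper)."""
    with zipfile.ZipFile(pptx_path) as z:
        root = ElementTree.fromstring(z.read("docProps/app.xml"))
    props = {}
    for child in root:                       # direct children only
        tag = child.tag.split("}")[-1]       # drop the XML namespace prefix
        if child.text and child.text.strip():
            props[tag] = child.text.strip()  # nested structures have no text and are skipped
    return props
```

Values come back as strings, so a field such as Slides needs an int() conversion before any arithmetic.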

Slide-level metadata includes speaker notes attached to each slide, comments and annotations left by reviewers, and hidden slides that remain in the file but are not shown during playback.

Embedded media metadata is the layer most people miss. Every image, video, or audio file embedded in a presentation carries its own metadata. A photo dropped into a slide might still contain EXIF data with GPS coordinates, camera model, and capture timestamp. An embedded Excel chart carries its own document properties. This creates a metadata chain where one presentation can leak information from dozens of source files.

Revision and tracking data can include the names of people who have saved or commented on the file, revision counters, and edit timestamps. Exactly what is recorded depends on the PowerPoint version and the collaboration features in use.

What to Check Before Scaling Metadata Extraction from PowerPoint Presentations

Microsoft PowerPoint includes the Document Inspector, which can surface most metadata categories without any third-party software.

To access it, open your presentation and go to File > Info > Check for Issues > Inspect Document. The inspector scans for:

Document properties and personal information
Comments and annotations
Speaker notes on all slides
Hidden slides
Custom XML data
Embedded document content
Content add-ins and task pane add-ins

After the scan completes, each category shows whether items were found. You can click Remove All next to any category to strip that metadata. Always work on a copy of the original file, because removal is not reversible.

Limitations to know about. The Document Inspector cannot detect metadata embedded inside grouped objects or complex OLE (Object Linking and Embedding) items. If someone embedded a Word document inside your presentation, the Word file's own metadata survives inspection. Older versions of PowerPoint also cannot detect revision tracking data added by newer Microsoft 365 builds, so cross-version workflows create blind spots.

For a quick check without opening PowerPoint, right-click the file in Windows Explorer and select Properties > Details. This shows core properties like author, title, and dates, but it will not surface speaker notes or embedded media metadata.

Extracting Metadata Programmatically

When you need to process dozens or hundreds of presentations, manual inspection does not scale. Programmatic extraction gives you structured access to every metadata layer.

Python with python-pptx

The python-pptx library provides direct access to core document properties:

from pptx import Presentation

prs = Presentation("quarterly-review.pptx")
props = prs.core_properties

print(f"Title: {props.title}")
print(f"Author: {props.author}")
print(f"Last modified by: {props.last_modified_by}")
print(f"Created: {props.created}")
print(f"Modified: {props.modified}")
print(f"Revision: {props.revision}")
print(f"Keywords: {props.keywords}")
print(f"Subject: {props.subject}")

To extract speaker notes from every slide:

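for i, slide in enumerate(prs.slides, start=1):
    # Accessing slide.notes_slide creates an empty notes slide when none
    # exists, so check has_notes_slide first.
    if not slide.has_notes_slide:
        continue
    frame = slide.notes_slide.notes_text_frame
    if frame is not None and frame.text.strip():
        print(f"Slide {i} notes: {frame.text}")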

The ZIP Extraction Method

Since .pptx files are ZIP archives, you can extract raw XML metadata without any presentation library:

import zipfile
from xml.etree import ElementTree

with zipfile.ZipFile("quarterly-review.pptx", "r") as z:
    core = z.read("docProps/core.xml")
    app = z.read("docProps/app.xml")  # application properties; parse the same way

# Print each non-empty core property, stripping the XML namespace from the tag
tree = ElementTree.fromstring(core)
for elem in tree.iter():
    if elem.text and elem.text.strip():
        tag = elem.tag.split("}")[-1]
        print(f"{tag}: {elem.text}")

This approach also lets you inventory embedded media files by listing everything in the ppt/media/ directory inside the archive, then extracting individual files to read their EXIF or XMP metadata separately.
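A minimal sketch of that media inventory, using only the standard library, might look like the following. The helper name list_embedded_media is an assumption for illustration. It only lists the entries and their sizes and does not read any image metadata.

```python
import zipfile


def list_embedded_media(pptx_path):
    """Return (archive_name, size_in_bytes) for each file under ppt/media/ (illustrative)."""
    with zipfile.ZipFile(pptx_path) as z:
        return [
            (info.filename, info.file_size)
            for info in z.infolist()
            if info.filename.startswith("ppt/media/") and not info.is_dir()
        ]
```

Each listed name can be passed to ZipFile.read() to pull the raw bytes for a separate EXIF or XMP check.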

PowerShell for Windows Environments

For IT teams working in Windows-heavy environments:

$shell = New-Object -ComObject Shell.Application
$folder = $shell.Namespace("C:\Presentations")
$file = $folder.ParseName("quarterly-review.pptx")
for ($i = 0; $i -lt 300; $i++) {
    $name = $folder.GetDetailsOf($null, $i)
    $value = $folder.GetDetailsOf($file, $i)
    if ($value) { Write-Output "$name = $value" }
}

Speaker Notes and Hidden Content Risks

Speaker notes are the most commonly leaked metadata in shared presentations. They often contain internal talking points, client-specific pricing, competitive intelligence, or candid commentary that was never meant to leave the organization.

The risk is straightforward: when you export a .pptx to PDF, speaker notes are stripped by default. But when you share the .pptx file directly, every note on every slide travels with it. Recipients can view notes by switching to Notes Page view or opening the Notes pane.

Hidden slides present a similar problem. A presentation might contain slides with draft pricing, internal strategy, or rejected concepts that were hidden rather than deleted. The slides remain fully readable in the file. Anyone who opens it in PowerPoint can unhide them with a right-click.
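To flag hidden slides without opening PowerPoint, one option is to look at each slide part's root element, which carries a show attribute set to a false value when the slide is hidden. The sketch below assumes the usual ppt/slides/slideN.xml part naming. The function name find_hidden_slide_parts is hypothetical.

```python
import re
import zipfile
from xml.etree import ElementTree

# Matches slide parts such as ppt/slides/slide3.xml (not layouts or masters).
SLIDE_PART = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def find_hidden_slide_parts(pptx_path):
    """Return archive names of slide parts whose root element is marked hidden."""
    hidden = []
    with zipfile.ZipFile(pptx_path) as z:
        for name in z.namelist():
            if SLIDE_PART.match(name):
                root = ElementTree.fromstring(z.read(name))
                # show is an XML boolean; "0" or "false" means the slide is hidden
                if root.get("show") in ("0", "false"):
                    hidden.append(name)
    return sorted(hidden)
```

Part numbers do not necessarily match the on-screen slide order, so treat the result as a list of files to review rather than slide positions.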

Embedded content creates deeper exposure. Over 60% of corporate presentations contain embedded media with its own metadata layer. A product photo might carry EXIF GPS coordinates revealing where the image was taken. An embedded Excel file might expose formulas, named ranges, or hidden sheets with sensitive data. A linked OLE object might reference a network path that reveals internal server names or directory structures.
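For a first-pass audit of embedded photos, a rough presence check can tell you which JPEG files still carry an EXIF block before you reach for a full EXIF reader. The sketch below walks the JPEG header segments looking for an APP1 segment tagged "Exif". It is an assumption-laden shortcut: it handles only JPEG data, does not decode any fields such as GPS coordinates, and the name jpeg_has_exif is illustrative.

```python
import struct


def jpeg_has_exif(data):
    """Return True if JPEG bytes contain an APP1 segment tagged 'Exif' (presence check only)."""
    if data[:2] != b"\xff\xd8":               # missing SOI marker: not a JPEG
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:                 # lost sync with the segment structure
            return False
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):            # start of scan or end of image: headers are over
            return False
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xE1 and data[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True
        pos += 2 + length                     # length counts its own two bytes
    return False
```

Images that return True are candidates for a closer look with a dedicated EXIF tool, which can show whether location or device details are actually present.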

For compliance teams, the risk compounds with volume. A single presentation shared externally might contain metadata from the original author, three reviewers, two embedded spreadsheets, and fifteen photographs, each carrying its own metadata chain. Auditing this manually is not realistic at scale.

Bulk Extraction for Compliance and Auditing

Organizations that handle sensitive presentations regularly need systematic extraction workflows rather than file-by-file inspection.

Building a Metadata Inventory

A practical approach is to build a script that walks a directory of .pptx files and produces a structured inventory:

import os
import json
from pptx import Presentation

def extract_metadata(filepath):
    prs = Presentation(filepath)
    props = prs.core_properties
    # has_notes_slide avoids creating empty notes slides as a side effect
    notes_count = sum(
        1 for s in prs.slides
        if s.has_notes_slide and
        s.notes_slide.notes_text_frame is not None and
        s.notes_slide.notes_text_frame.text.strip()
    )
    return {
        "file": os.path.basename(filepath),
        "author": props.author,
        "last_modified_by": props.last_modified_by,
        "created": str(props.created),
        "modified": str(props.modified),
        "revision": props.revision,
        "slide_count": len(prs.slides),
        "slides_with_notes": notes_count,
    }

results = []
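for root, dirs, files in os.walk("/path/to/presentations"):
    for f in files:
        # Match .PPTX as well, and skip PowerPoint's "~$" lock files
        if f.lower().endswith(".pptx") and not f.startswith("~$"):
            path = os.path.join(root, f)
            try:
                results.append(extract_metadata(path))
            except Exception as exc:  # corrupt or password-protected file
                results.append({"file": f, "error": str(exc)})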

with open("metadata_inventory.json", "w") as out:
    json.dump(results, out, indent=2)

This gives compliance teams a searchable record of who created what, when it was last touched, and which files contain speaker notes that need review before external sharing.
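If reviewers prefer a spreadsheet to JSON, the same list of dictionaries can be written as CSV. The sketch below is one way to do that with the standard library. The function name write_inventory_csv is hypothetical, and it assumes the first row's keys define the columns.

```python
import csv


def write_inventory_csv(rows, out_path):
    """Write a list of metadata dicts to CSV and return the number of rows written."""
    if not rows:
        return 0
    fieldnames = list(rows[0].keys())        # column order follows the first row
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops keys that are not in the header row
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Rows that lack a column are written with an empty cell, so entries of differing shape do not break the export.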

Cloud-Based Extraction at Scale

For teams that store presentations in cloud workspaces, platforms with built-in intelligence features can automate the extraction step. Fast.io's Intelligence Mode auto-indexes uploaded files for semantic search and summarization. For structured extraction, Metadata Views go further: describe the columns you need in plain English (author, slide count, has speaker notes, last modified by, company name) and the AI extracts those fields into a sortable, filterable grid across every presentation in the workspace. No scripts, no templates. Add new extraction columns later, like "contains external data connections" or "hidden slide count," without reprocessing existing files.

This approach works well alongside programmatic methods. Use Metadata Views for the document-level properties and audit layer that compliance teams check most often, and use scripts for deep extraction (embedded media EXIF, OLE metadata chains) that requires format-specific parsing.

Other options for cloud-based extraction include GroupDocs (which offers both API and web-based extraction) and Aspose.Slides (with .NET and Python SDKs for server-side processing).

Cleaning Metadata Before Sharing

Extraction is half the equation. The other half is making sure metadata is cleaned before presentations leave your organization.

Manual Cleanup Workflow

Make a copy of the original file. Never clean the master copy.
Open the copy in PowerPoint. Go to File > Info > Check for Issues > Inspect Document.
Check all categories and click Inspect.
Click Remove All for each category that found results.
Save and close.
Verify by reopening and running the inspector again.

Automated Cleanup

For batch processing, combine extraction with removal. The python-pptx library lets you clear core properties:

from pptx import Presentation

prs = Presentation("outgoing-deck.pptx")
props = prs.core_properties
props.author = ""
props.last_modified_by = ""
props.comments = ""
props.keywords = ""
props.subject = ""

for slide in prs.slides:
    # Skip slides without notes; accessing notes_slide would create one
    if not slide.has_notes_slide:
        continue
    frame = slide.notes_slide.notes_text_frame
    if frame is not None:
        frame.clear()  # leaves a single empty paragraph in the notes

prs.save("outgoing-deck-clean.pptx")

Remember that this clears core properties and speaker notes, but does not reach embedded media metadata or OLE object properties. For complete sanitization, you need to also extract and re-embed media files after stripping their EXIF data, or use a dedicated document sanitization tool.
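After an automated pass, it can help to confirm that the saved file no longer carries identifying core properties. The sketch below re-reads docProps/core.xml from the cleaned archive and reports any of a few fields that are still populated. The helper name residual_core_fields and the choice of fields to check are assumptions for illustration.

```python
import zipfile
from xml.etree import ElementTree

# Element names in core.xml for author, last modified by, keywords, subject, comments.
CHECKED_FIELDS = {"creator", "lastModifiedBy", "keywords", "subject", "description"}


def residual_core_fields(pptx_path):
    """Return a dict of checked core.xml fields that still contain text."""
    with zipfile.ZipFile(pptx_path) as z:
        root = ElementTree.fromstring(z.read("docProps/core.xml"))
    leftovers = {}
    for elem in root.iter():
        tag = elem.tag.split("}")[-1]        # drop the XML namespace prefix
        if tag in CHECKED_FIELDS and elem.text and elem.text.strip():
            leftovers[tag] = elem.text.strip()
    return leftovers
```

An empty dictionary means the checked fields are blank. It says nothing about notes, embedded media, or OLE objects, which need their own checks.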

Setting Up a Pre-Share Checklist

For teams that regularly share presentations externally, build a standard checklist:

Run Document Inspector and remove all findings
Check for hidden slides and either delete or unhide them
Review embedded media for EXIF data (especially GPS coordinates)
Verify that linked OLE objects do not reference internal network paths
Confirm that the file's revision history does not expose sensitive contributor names
Store the cleaned version in a dedicated outgoing workspace with audit logging enabled
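The linked-object item on this checklist can be partly automated. In the Open Packaging Conventions, links that point outside the archive are recorded in .rels files as relationships with TargetMode set to External. The sketch below lists them. The function name find_external_targets is hypothetical, and the output also includes ordinary hyperlinks, so the results still need a human review.

```python
import zipfile
from xml.etree import ElementTree

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def find_external_targets(pptx_path):
    """Return (rels_part_name, target) pairs for relationships that point outside the archive."""
    found = []
    with zipfile.ZipFile(pptx_path) as z:
        for name in z.namelist():
            if not name.endswith(".rels"):
                continue
            root = ElementTree.fromstring(z.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    found.append((name, rel.get("Target")))
    return found
```

Targets that look like file paths or internal hostnames are the ones worth fixing before the deck leaves the organization.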
