How to Compare File Metadata and Detect Changes Between Versions

File metadata comparison tools let you spot changes in authorship, timestamps, permissions, and embedded data between file versions. This guide covers the CLI tools, Python scripts, and platform features that forensics teams, compliance officers, and developers use to track metadata changes across files.

Fast.io Editorial Team 11 min read
Audit trail showing file metadata changes and version history

What File Metadata Comparison Actually Does

A file metadata diff tool compares the properties and attributes of two or more files to identify changes in authorship, timestamps, permissions, or embedded data between versions. Think of it as running a diff on everything except the file's visible content.

Every file carries metadata that most people never see. A JPEG stores the camera model, GPS coordinates, and the software used to edit it. A PDF embeds the author name, creation date, and the tool that generated it. A Word document tracks revision history, total editing time, and the last person who saved it.

When you compare metadata between two copies of the same file, you can answer questions that content-level diffs cannot:

  • Did someone change the "Date Created" field to make a file look older than it is?
  • Was a document opened and re-saved with different software between versions?
  • Did file permissions change when the file moved between systems?
  • Has the embedded author name been stripped or altered?

These questions matter in digital forensics, regulatory compliance, legal discovery, and software version control. Forensics investigators routinely compare metadata across dozens or hundreds of file versions during a single investigation. Tampering is often detectable through timestamp inconsistency analysis and tool-signature comparison, because altering one metadata field without updating correlated fields leaves forensic artifacts behind.

Comparing Metadata on the Command Line with ExifTool

ExifTool, created by Phil Harvey, is the most widely used command-line tool for reading, writing, and comparing file metadata. It supports over 400 file formats and recognizes thousands of metadata tags across EXIF, IPTC, XMP, ICC Profile, and format-specific schemas.

Installing ExifTool

ExifTool runs on Windows, macOS, and Linux. Install it through your package manager or download directly from exiftool.org.

macOS (Homebrew):

brew install exiftool

Ubuntu/Debian:

sudo apt install libimage-exiftool-perl

Windows:

Download the standalone executable from exiftool.org and add it to your PATH.

Comparing Two Files Side by Side

The simplest approach writes each file's metadata to a text file and compares the two with your system's diff tool:

exiftool -a -G1 -s file_v1.pdf > meta_v1.txt
exiftool -a -G1 -s file_v2.pdf > meta_v2.txt
diff meta_v1.txt meta_v2.txt

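The -a flag shows duplicate tags, -G1 prefixes each tag with its family 1 group name (the specific location, such as IFD0, ExifIFD, or XMP-dc), and -s prints tag names instead of descriptions for cleaner output. This approach works with any file format ExifTool supports and produces output that standard diff tools can process. Expect file-level tags such as FileName and FileModifyDate to differ in every comparison.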
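If the file-level tags make the diff noisy, the two dumps can be filtered before comparing. The following is a minimal sketch, assuming the text layout produced by exiftool -a -G1 -s; the helper names and the list of ignored tags are illustrative choices, not ExifTool defaults.

```python
import difflib

# Tags that differ between almost any two files on disk.
# This list is an assumption for illustration; adjust it to your case.
IGNORED_TAGS = {"FileName", "Directory", "FileModifyDate",
                "FileAccessDate", "FileInodeChangeDate"}

def tag_name(line):
    """Return the tag name from a line like '[System] FileName : a.pdf'."""
    left = line.split(":", 1)[0]        # '[System] FileName '
    return left.split("]")[-1].strip()  # 'FileName'

def diff_dumps(text_a, text_b, ignored=IGNORED_TAGS):
    """Unified diff of two dumps, skipping lines whose tag is ignored."""
    def keep(text):
        return [ln for ln in text.splitlines() if tag_name(ln) not in ignored]
    return list(difflib.unified_diff(keep(text_a), keep(text_b),
                                     fromfile="v1", tofile="v2", lineterm=""))
```

Reading meta_v1.txt and meta_v2.txt into strings and passing them to diff_dumps returns only the lines that differ outside the ignored set.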

Reading Specific Tags for Targeted Comparison

When you only care about certain properties, extract just those tags:

exiftool -CreateDate -ModifyDate -Author -Software file_v1.docx file_v2.docx

This prints the selected tags for both files in sequence, making it easy to visually compare timestamps, authorship, and the software used to create each version.

Batch Comparison Across Directories

To compare metadata across all files in two directories:

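(cd dir_v1 && exiftool -r -csv . > ../batch_v1.csv)
(cd dir_v2 && exiftool -r -csv . > ../batch_v2.csv)
diff batch_v1.csv batch_v2.csv

The -csv flag outputs structured data you can also open in a spreadsheet for side-by-side review. The -r flag recurses into subdirectories. Running each extraction from inside its own directory keeps the SourceFile column identical across both exports, so diff does not flag every row just because the directory name differs.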
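A line-based diff of two CSV exports is sensitive to row order and to the path stored in the SourceFile column. One way around that is a keyed comparison, sketched below with the standard csv module. It assumes only that the export has a SourceFile column, which ExifTool's CSV output includes; the function names and the root parameter are hypothetical.

```python
import csv
import io

def load_rows(csv_text, root=""):
    """Index CSV rows by SourceFile, with an optional leading root removed."""
    rows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = row.pop("SourceFile")
        if root and source.startswith(root):
            source = source[len(root):].lstrip("/")
        rows[source] = row
    return rows

def compare_exports(rows_a, rows_b):
    """Yield (file, tag, old, new) for every cell that differs."""
    for name in sorted(rows_a.keys() & rows_b.keys()):
        a, b = rows_a[name], rows_b[name]
        for tag in sorted(a.keys() | b.keys()):
            old, new = a.get(tag, ""), b.get(tag, "")
            if old != new:
                yield name, tag, old, new
```

Files that exist in only one export are skipped here; comparing the two key sets directly would list them.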

Audit log showing metadata changes tracked across file versions

Python Scripts for Custom Metadata Diffing

When ExifTool's command-line output does not fit your workflow, Python gives you full control over extraction, comparison logic, and output formatting. The standard library covers filesystem metadata, while third-party packages handle format-specific embedded data.

Comparing Filesystem Metadata with os.stat

Python's os.stat() returns the filesystem-level properties of a file, including size, permissions, and timestamps:

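import os
import stat
from datetime import datetime

def get_file_meta(path):
    st = os.stat(path)
    return {
        "size": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime),
        # st_ctime is the inode change time on Unix and the creation time on Windows
        "changed_or_created": datetime.fromtimestamp(st.st_ctime),
        # Permission bits only, without the file-type bits
        "permissions": oct(stat.S_IMODE(st.st_mode)),
    }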

meta_a = get_file_meta("report_v1.pdf")
meta_b = get_file_meta("report_v2.pdf")

for key in meta_a:
    if meta_a[key] != meta_b[key]:
        print(f"{key}: {meta_a[key]} -> {meta_b[key]}")

This catches permission changes, size differences, and timestamp shifts. It does not read embedded metadata like EXIF or XMP, which lives inside the file's binary content rather than the filesystem.

Comparing Embedded Metadata with PyExifTool

For embedded metadata, the PyExifTool package wraps ExifTool in a Python-friendly interface:

import exiftool

files = ["contract_v1.pdf", "contract_v2.pdf"]

with exiftool.ExifToolHelper() as et:
    metadata = et.get_metadata(files)

meta_a, meta_b = metadata[0], metadata[1]
all_keys = set(meta_a.keys()) | set(meta_b.keys())

for key in sorted(all_keys):
    val_a = meta_a.get(key)
    val_b = meta_b.get(key)
    if val_a != val_b:
        print(f"{key}:")
        print(f"  v1: {val_a}")
        print(f"  v2: {val_b}")
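A flat listing does not distinguish a tag that was stripped from one whose value changed. The sketch below separates the two cases. It works on plain dictionaries, so it applies to any tag-to-value mapping; the function name is hypothetical and the sample keys in the usage note are made up for illustration.

```python
def diff_metadata(meta_a, meta_b):
    """Split differences between two tag dicts into added, removed, and changed."""
    keys_a, keys_b = set(meta_a), set(meta_b)
    return {
        # Present only in the second file
        "added": {k: meta_b[k] for k in keys_b - keys_a},
        # Present only in the first file (for example, a stripped author tag)
        "removed": {k: meta_a[k] for k in keys_a - keys_b},
        # Present in both, with different values
        "changed": {k: (meta_a[k], meta_b[k])
                    for k in keys_a & keys_b if meta_a[k] != meta_b[k]},
    }
```

For example, diff_metadata({"PDF:Author": "Alice"}, {}) reports the author tag under "removed" rather than as a change to None.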

Hashing for Integrity Verification

To confirm whether the file's bytes changed at all, combine metadata comparison with content hashing:

import hashlib

def file_hash(path, algo="sha256"):
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while chunk := f.read(65536):
            h.update(chunk)
    return h.hexdigest()

hash_a = file_hash("document_v1.pdf")
hash_b = file_hash("document_v2.pdf")

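if hash_a == hash_b:
    print("File bytes are identical; only filesystem metadata can differ")
else:
    print("File bytes have changed (content, embedded metadata, or both)")

Because embedded metadata is stored inside the file, a matching hash means the embedded fields are identical too, and any remaining difference is in filesystem metadata such as timestamps, permissions, or ownership. When the hash differs but the visible content looks the same, someone may have edited embedded properties like the author name or dates without touching the document's body. That pattern is a red flag in forensic investigations and is worth confirming with an embedded-metadata diff.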
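The hash check and the filesystem check can be combined into a single triage step. This is a rough sketch, assuming that modification time and mode bits are the filesystem fields of interest; classify_pair and its return labels are hypothetical names.

```python
import hashlib
import os

def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def classify_pair(path_a, path_b):
    """Return a coarse label describing how two files differ."""
    if sha256_of(path_a) != sha256_of(path_b):
        return "bytes differ"
    st_a, st_b = os.stat(path_a), os.stat(path_b)
    same_fs = ((int(st_a.st_mtime), st_a.st_mode)
               == (int(st_b.st_mtime), st_b.st_mode))
    return "no difference found" if same_fs else "filesystem metadata differs"
```

A "bytes differ" result says nothing about where the change is, so it is the cue to run an embedded-metadata comparison next.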

Fast.io features

Track File Changes Without Running Diff Scripts

Fast.io logs every file operation with timestamps and user identity. Enable Intelligence on a workspace and your files are automatically indexed for search and comparison. 50 GB free, no credit card required. Built for file metadata comparison and diff workflows.

Use Cases: Forensics, Compliance, and Version Control

Metadata comparison is not an abstract exercise. It solves concrete problems across several fields.

Digital Forensics

Forensic investigators compare metadata to establish whether evidence has been tampered with. A photo presented as evidence in court should have consistent metadata: the creation timestamp should precede the modification timestamp, the camera model should match the device in question, and the GPS coordinates should align with the claimed location.

When someone alters a file's "Date Created" to backdate a document, they rarely update all correlated timestamp fields. NTFS filesystems maintain separate timestamps in the Master File Table ($MFT) and the USN Journal, and inconsistencies between these records are a strong indicator of manipulation. Forensic tools like Autopsy, FTK, and X-Ways Forensics automate these cross-reference checks.

The examination typically involves:

  • Extracting metadata from the original evidence and any copies
  • Comparing timestamps across EXIF, filesystem, and journal records
  • Identifying tool signatures (what software created or modified the file)
  • Flagging inconsistencies that suggest post-creation editing
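The cross-checks in this list can be expressed as simple ordering rules. The sketch below is illustrative only: the field names in the record are hypothetical, and a finding is a prompt for closer review (clock skew and time zone handling can produce the same pattern), not proof of tampering.

```python
from datetime import datetime

def find_timestamp_anomalies(record):
    """Return findings for one file's timestamps.

    `record` maps hypothetical field names to datetime values (or None).
    """
    findings = []
    created = record.get("embedded_created")
    modified = record.get("embedded_modified")
    fs_modified = record.get("fs_modified")
    if created and modified and modified < created:
        findings.append("embedded modify date precedes embedded create date")
    if created and fs_modified and fs_modified < created:
        findings.append("filesystem modify time precedes embedded create date")
    return findings
```

All values should be normalized to the same time zone before they are compared.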

Regulatory Compliance

Industries with document retention requirements need to prove that files have not been altered after a certain date. Financial services firms, healthcare organizations, and government agencies compare metadata to verify document integrity during audits.

A compliance workflow might extract metadata from every version of a regulated document, compare timestamps and author fields across versions, and flag any version where the "last modified" date precedes the "created" date of the previous version. That ordering should not occur in a normal edit history, so it points to tampering, clock skew, or a copy operation that reset the creation date.
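The ordering check described here fits in a few lines. This is a minimal sketch, assuming each version is represented as a dictionary with hypothetical 'name', 'created', and 'modified' keys and that the list is ordered oldest first.

```python
from datetime import datetime

def flag_out_of_order(versions):
    """Return names of versions modified before the previous version was created."""
    flagged = []
    for prev, cur in zip(versions, versions[1:]):
        if cur["modified"] < prev["created"]:
            flagged.append(cur["name"])
    return flagged
```

Flagged versions are candidates for manual review; the check itself does not say why the ordering is off.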

Software Version Control

Developers compare file metadata to debug build issues, verify deployment artifacts, and track asset provenance. When a compiled binary behaves differently between environments, comparing its metadata (compiler version, build timestamp, signing certificate) can reveal the root cause faster than reading source diffs.

Asset pipelines for games and media applications also depend on metadata comparison. When a texture file passes through multiple tools, comparing the embedded metadata at each stage confirms that color profiles, resolution, and compression settings were preserved correctly.

Dedicated Comparison Tools Beyond ExifTool

ExifTool handles most metadata comparison tasks, but several other tools serve specific niches.

Beyond Compare

Beyond Compare by Scooter Software supports file metadata comparison alongside its content diffing capabilities. It can display EXIF data for images in its picture comparison mode and lets you define custom comparison criteria. This is useful when you want a visual side-by-side view rather than command-line output.

Best for: Teams that already use Beyond Compare for code review and want metadata comparison in the same tool.

Metadata2Go

Metadata2Go is a free online tool that extracts and displays metadata from uploaded files. It supports comparison between two files by running separate extractions and highlighting differences. No installation required.

Best for: Quick one-off comparisons when you do not want to install command-line tools.

online-metadata.com

This web-based tool lets you upload two files and view their metadata side by side with differences highlighted. It supports images, PDFs, and office documents.

Best for: Non-technical users who need a visual comparison without learning CLI commands.

X-Ways Forensics

X-Ways Forensics includes a dedicated "Compare Data" feature designed for forensic investigators. It creates a search list of all differences between two disk images or file sets, including metadata discrepancies. It works alongside the broader X-Ways investigation workflow.

Best for: Professional forensic investigators working on legal cases.

Fast.io Audit Trails and Metadata Views

For teams that need ongoing metadata tracking rather than one-time comparisons, Fast.io's audit trail logs every file operation with timestamps, user identity, and action type. Metadata Views add a structured layer on top: define extraction columns (author, creation date, software version, last modified by) in natural language, and the AI populates a sortable, filterable grid across all files in the workspace. This creates a persistent comparison baseline without running scripts. When a new file version is uploaded, the extracted fields update automatically, so compliance teams can spot changes in authorship, tool signatures, or timestamps at a glance.

Best for: Teams managing shared file workflows who need persistent, queryable metadata records alongside their file storage.

File organization hierarchy showing permissions and access controls

Building a Metadata Comparison Workflow

A repeatable metadata comparison process saves time and reduces the chance of missing changes. Here is a practical workflow you can adapt to your needs.

Step 1: Define What You Are Comparing

Start by listing the metadata fields that matter for your use case. Forensics teams care about timestamps, GPS data, and tool signatures. Compliance teams focus on author fields, modification dates, and document versions. Developers track compiler versions, build timestamps, and signing certificates.

Narrowing your scope prevents information overload. A JPEG can carry hundreds of metadata tags, but most comparisons only need a dozen.

Step 2: Extract Metadata Consistently

Use the same tool and flags for every extraction. If you extract metadata from version 1 with exiftool -a -G1 -s and version 2 with exiftool -G1, the output formats will not match and your diff will show false positives.

For automated pipelines, write a wrapper script that standardizes extraction:

#!/bin/bash
exiftool -a -G1 -s -json "$1" > "${1%.*}_meta.json"

JSON output is easier to parse programmatically than ExifTool's default text format.
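For pipelines that are already written in Python, the same standardization can live in a small helper. This sketch assumes the exiftool executable is on PATH; output_path and extract are hypothetical names, and the flags match the ones used throughout this guide.

```python
import subprocess
from pathlib import Path

# One fixed flag set, so every extraction produces comparable output
EXIFTOOL_FLAGS = ["-a", "-G1", "-s", "-json"]

def output_path(path):
    """Replace the final extension with '_meta.json' (report.pdf -> report_meta.json)."""
    p = Path(path)
    return p.with_name(p.stem + "_meta.json")

def extract(path):
    """Run exiftool on one file and write its JSON output next to the file."""
    result = subprocess.run(["exiftool", *EXIFTOOL_FLAGS, str(path)],
                            capture_output=True, text=True, check=True)
    out = output_path(path)
    out.write_text(result.stdout, encoding="utf-8")
    return out
```

Passing check=True makes a failed extraction raise an exception instead of silently writing an empty file.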

Step 3: Compare and Filter

Run your comparison and filter out noise. Some metadata fields change every time a file is opened (like "last accessed" timestamps) and are rarely meaningful. Exclude those from your comparison unless access tracking is specifically what you are investigating.
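One way to apply this filtering is to drop volatile tags before the comparison runs. The sketch below assumes tag keys in the Group:Tag form that ExifTool's JSON output uses with -G1; the pattern list and function name are illustrative assumptions.

```python
from fnmatch import fnmatchcase

# Patterns for tags that change without the file being edited.
# Illustrative only; extend or trim the list for your investigation.
VOLATILE_PATTERNS = ["SourceFile", "*:FileAccessDate", "*:FileInodeChangeDate"]

def drop_volatile(metadata, patterns=VOLATILE_PATTERNS):
    """Return a copy of the tag dict without keys matching any pattern."""
    return {key: value for key, value in metadata.items()
            if not any(fnmatchcase(key, pattern) for pattern in patterns)}
```

Applying drop_volatile to both sides before diffing keeps the report focused on fields that only change when someone edits the file.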

Step 4: Log and Store Results

Store comparison results alongside the files they describe. For ongoing compliance workflows, this means keeping a record of every comparison run, what was compared, and what differences were found.
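A simple way to keep such a record is an append-only log with one JSON object per comparison run. The following is a sketch under that assumption; the file layout, field names, and log_comparison helper are hypothetical.

```python
import json
from datetime import datetime, timezone

def log_comparison(log_path, file_a, file_b, differences):
    """Append one comparison result to a JSON Lines log and return the entry."""
    entry = {
        "compared_at": datetime.now(timezone.utc).isoformat(),
        "files": [file_a, file_b],
        "differences": differences,
    }
    # Append mode preserves earlier runs; each line is a standalone JSON object
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry
```

Because each line is independent, the log can be filtered with ordinary text tools or loaded line by line for later review.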

Platforms like Fast.io handle this automatically through audit trails. Every file upload, download, and modification is logged with timestamps and user identity. For teams managing shared workspaces, this eliminates the need to run separate metadata extraction scripts because the platform captures the change history as it happens.

Step 5: Automate for Recurring Checks

If you compare metadata regularly, automate the extraction and comparison steps. A cron job that runs ExifTool against a watched directory, diffs the output against the previous run, and emails a report on changes can catch unauthorized modifications within hours instead of weeks.
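The diff-against-previous-run step can be sketched as a snapshot comparison. To stay dependency-free, this example snapshots filesystem metadata with os.stat rather than calling ExifTool, which is a substitution made for illustration; the function names are hypothetical, and scheduling and email delivery are left out.

```python
import os

def snapshot(directory):
    """Map each file's relative path to its size and modification time."""
    result = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            # Lists (not tuples) so a JSON round trip compares equal
            result[os.path.relpath(path, directory)] = [st.st_size, int(st.st_mtime)]
    return result

def changes_since(previous, current):
    """Compare two snapshots and list added, removed, and modified paths."""
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "modified": sorted(k for k in current.keys() & previous.keys()
                           if current[k] != previous[k]),
    }
```

A scheduled job would save each snapshot to disk (for example with json.dump), load the previous one on the next run, and report whatever changes_since returns.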

For teams using Fast.io, webhooks provide a cleaner alternative to polling. Configure a webhook to fire when files change in a workspace, then run your metadata comparison script in response to each event.

Frequently Asked Questions

How do you compare metadata between two files?

The most common method uses ExifTool on the command line. Extract metadata from both files with 'exiftool -a -G1 -s', save each result to a text file, and compare the two with your system's diff command. For programmatic comparison, Python's os.stat() handles filesystem metadata while PyExifTool wraps ExifTool for embedded metadata like EXIF, XMP, and IPTC tags.

What tools can diff file metadata?

ExifTool is the standard CLI tool for metadata extraction and comparison across 400+ file formats. Beyond Compare offers visual metadata comparison alongside content diffs. Online tools like Metadata2Go and online-metadata.com provide browser-based comparison without installation. For forensic investigations, X-Ways Forensics and Autopsy include specialized metadata comparison features.

How do you detect if file metadata has been modified?

Look for inconsistencies across correlated metadata fields. When someone changes a 'Date Created' timestamp, they rarely update all related records. On NTFS filesystems, comparing Master File Table timestamps against USN Journal entries reveals manipulation. Tool-signature analysis can also show if a file was re-saved with different software than originally created it.

Can you track metadata changes over time?

Yes. For manual tracking, schedule regular ExifTool extractions and diff each run against the previous one. For automated tracking, platforms like Fast.io log every file operation in an audit trail with timestamps and user identity. Webhook integrations can trigger metadata comparison scripts whenever files change in a workspace.

What is the difference between filesystem metadata and embedded metadata?

Filesystem metadata includes properties managed by the operating system: file size, permissions, creation date, and modification date. You can read these with os.stat() in Python or 'stat' on the command line. Embedded metadata lives inside the file itself and includes format-specific data like EXIF camera info in photos, author fields in documents, and XMP tags in creative files. Changing embedded metadata requires tools that understand the file format.

Does copying a file preserve its metadata?

It depends on the method. A standard file copy on most operating systems preserves embedded metadata (EXIF, XMP) but resets filesystem timestamps like the creation date. Transferring files via email or cloud upload may strip certain metadata fields. Forensic imaging tools create exact bit-for-bit copies that preserve all metadata, which is why they are used for evidence collection.
