AI & Agents

How to Extract File Metadata with C# and .NET Libraries

C# has several mature libraries for reading metadata from files, each targeting different formats. This guide compares MetadataExtractor for images, TagLib# for audio, iText7 for PDFs, and built-in .NET APIs for basic file properties. You will learn how to install each library, write extraction code, handle cross-platform pitfalls, and build a metadata pipeline that works in ASP.NET and Azure Functions.

Fast.io Editorial Team 11 min read

What File Metadata Is and Why C# Developers Extract It

Every file stores two kinds of information: the content itself and the properties embedded alongside it. A JPEG carries the pixel data you see, but it also holds EXIF tags recording the camera model, shutter speed, GPS coordinates, and capture timestamp. An MP3 bundles ID3 tags with artist, album, track number, and album art. A PDF embeds author, title, creation date, and producer application. These embedded properties are metadata.

C# metadata extraction means programmatically reading those properties so you can catalog, filter, audit, or transform files without opening each one by hand. Common use cases include:

  • Digital asset management: Sorting thousands of images by camera, resolution, or capture date
  • Media libraries: Building searchable indexes from audio and video tag data
  • Document compliance: Auditing PDFs and Office files for authorship, revision history, or hidden properties before sharing externally
  • Data pipelines: Filtering incoming files by format, dimensions, or encoding before processing
  • Forensic analysis: Extracting timestamps, GPS data, and software identifiers for incident investigation

The complication is that metadata formats vary by file type. EXIF uses a binary IFD (Image File Directory) structure, ID3 tags have their own framing, and PDF metadata lives in a document info dictionary or XMP stream. No single library handles all of them, which is why the .NET ecosystem has specialized packages for each file family.

Comparing the Major .NET Metadata Libraries

Before writing any code, here is a breakdown of the major libraries and what each one covers.

MetadataExtractor (drewnoakes/metadata-extractor-dotnet)

  • File types: JPEG, TIFF, PNG, BMP, GIF, ICO, PCX, WebP, PSD, HEIF/HEIC, AVIF, TGA, EPS, camera RAW (CR2, CR3, NEF, ARW, ORF, DNG, RAF, PEF, RW2, and more), MP4, MOV, AVI, MP3, WAV
  • Metadata scope: EXIF, IPTC, XMP, ICC profiles, Photoshop data, JFIF, makernotes from 20+ camera manufacturers
  • Install: dotnet add package MetadataExtractor
  • Current version: 2.9.3 (April 2026), targeting .NET 8, .NET Standard 2.0, and .NET Standard 2.1
  • Best for: Image-heavy workflows and any project that needs to read metadata from a wide range of formats without external dependencies. Supports NativeAOT on .NET 8.

TagLib# (TagLibSharp)

  • File types: MP3, FLAC, OGG, AAC, AIFF, WMA, WAV, APE, DSF, M4A, M4B, MPC, Opus, WavPack, WebM, and additional audio/video containers
  • Metadata scope: ID3v1/v2 tags, Vorbis comments, APEv2 tags, MP4 atoms (title, artist, album, track, bitrate, duration, album art)
  • Install: dotnet add package TagLibSharp
  • Current version: 2.3.0
  • Best for: Audio metadata workflows, music libraries, and podcast management. Also reads basic video properties.

iText7

  • File types: PDF
  • Metadata scope: Document info dictionary (author, title, subject, keywords, creator, producer, creation/modification dates), XMP metadata, page count, encryption status
  • Install: dotnet add package itext7
  • Best for: PDF document processing, compliance auditing, and legal discovery workflows. Supports both reading and writing metadata.

System.IO.FileInfo (built-in)

  • File types: Any file on disk
  • Metadata scope: File system properties only: creation time, last modified time, last access time, file size, attributes (read-only, hidden, archive)
  • Install: Built into .NET, no package needed
  • Best for: Basic file inventory where you only need filesystem-level properties, not format-specific embedded data

System.Drawing.Common (legacy)

  • File types: JPEG, PNG, TIFF, BMP, GIF (Windows only in .NET 6+)
  • Metadata scope: EXIF property items via Image.PropertyItems
  • Install: dotnet add package System.Drawing.Common
  • Best for: Legacy .NET Framework projects on Windows. Not recommended for new development. Microsoft restricted this package to Windows-only starting in .NET 6 and deprecated it further in .NET 7+, because its cross-platform implementation depended on libgdiplus, an untested 30,000-line C reimplementation of Windows GDI+.
Diagram showing metadata extraction across different file formats

Reading Image Metadata with MetadataExtractor

MetadataExtractor is the go-to library for image metadata in .NET. It parses the binary structure of each file format directly, so it works on any platform without native dependencies.

Install the package:

dotnet add package MetadataExtractor

Here is a working example that reads EXIF data from a JPEG:

using MetadataExtractor;
using MetadataExtractor.Formats.Exif;

var directories = ImageMetadataReader.ReadMetadata("photo.jpg");

var subIfd = directories.OfType<ExifSubIfdDirectory>().FirstOrDefault();
var ifd0 = directories.OfType<ExifIfd0Directory>().FirstOrDefault();

Console.WriteLine($"Camera: {ifd0?.GetDescription(ExifDirectoryBase.TagMake)}");
Console.WriteLine($"Model: {ifd0?.GetDescription(ExifDirectoryBase.TagModel)}");
Console.WriteLine($"Date: {subIfd?.GetDescription(ExifDirectoryBase.TagDateTimeOriginal)}");
Console.WriteLine($"ISO: {subIfd?.GetDescription(ExifDirectoryBase.TagIsoEquivalent)}");
Console.WriteLine($"Aperture: {subIfd?.GetDescription(ExifDirectoryBase.TagFNumber)}");

The ImageMetadataReader.ReadMetadata() method returns a collection of Directory objects, each representing a metadata block (EXIF IFD0, EXIF Sub IFD, IPTC, XMP, GPS, and so on). You filter by type and look up specific tags.

To dump every metadata tag from any supported file:

var directories = ImageMetadataReader.ReadMetadata("input-file.cr2");

foreach (var directory in directories)
{
    Console.WriteLine($"--- {directory.Name} ---");
    foreach (var tag in directory.Tags)
    {
        Console.WriteLine($"  {tag.Name}: {tag.Description}");
    }
}

This works the same way for JPEG, PNG, TIFF, HEIF, WebP, RAW files, MP4, MOV, MP3, and WAV. The library detects the format automatically and applies the correct parser.

Extracting GPS Coordinates

GPS data sits in its own directory:

using MetadataExtractor.Formats.Exif;

var gps = directories.OfType<GpsDirectory>().FirstOrDefault();
var location = gps?.GetGeoLocation();

if (location != null)
{
    Console.WriteLine($"Latitude: {location.Latitude}");
    Console.WriteLine($"Longitude: {location.Longitude}");
}

The GetGeoLocation() method handles the conversion from degrees/minutes/seconds to decimal coordinates, which saves you from parsing the raw EXIF GPS tags manually.

Reading from Streams

If your files come from a network request, blob storage, or an upload endpoint, read directly from a stream instead of saving to disk first:

using var stream = File.OpenRead("photo.jpg");
var directories = ImageMetadataReader.ReadMetadata(stream);

This is essential for ASP.NET applications and Azure Functions where files arrive as IFormFile or Stream objects.

Fastio features

Extract and query file metadata without writing custom parsers

Fast.io Metadata Views turn documents into a searchable, filterable database. Describe the fields you need in plain English and structured data appears automatically. 50 GB free, no credit card required.

Extracting Audio Tags with TagLib#

TagLib# handles audio metadata across every common format. Where MetadataExtractor focuses on images and basic audio, TagLib# provides deep access to ID3 tags, Vorbis comments, and format-specific audio properties.

Install the package:

dotnet add package TagLibSharp

Read tags from an MP3 file:

using TagLib;

var file = TagLib.File.Create("track.mp3");

Console.WriteLine($"Title: {file.Tag.Title}");
Console.WriteLine($"Artist: {file.Tag.FirstPerformer}");
Console.WriteLine($"Album: {file.Tag.Album}");
Console.WriteLine($"Year: {file.Tag.Year}");
Console.WriteLine($"Track: {file.Tag.Track}");
Console.WriteLine($"Genre: {file.Tag.FirstGenre}");
Console.WriteLine($"Duration: {file.Properties.Duration}");
Console.WriteLine($"Bitrate: {file.Properties.AudioBitrate} kbps");
Console.WriteLine($"Sample Rate: {file.Properties.AudioSampleRate} Hz");

The Tag property gives you a unified interface regardless of whether the underlying format uses ID3v2, Vorbis comments, or APE tags. TagLib# normalizes the differences so you write one set of property accesses for all formats.

Extracting Album Art

var pictures = file.Tag.Pictures;

if (pictures.Length > 0)
{
    var cover = pictures[0];
    System.IO.File.WriteAllBytes(
        $"cover.{cover.MimeType.Split('/')[1]}",
        cover.Data.Data
    );
    Console.WriteLine($"Saved album art ({cover.MimeType})");
}

FLAC and OGG Files

The same API works for FLAC, OGG Vorbis, Opus, and other formats:

var flac = TagLib.File.Create("album.flac");
Console.WriteLine($"Title: {flac.Tag.Title}");
Console.WriteLine($"Channels: {flac.Properties.AudioChannels}");
Console.WriteLine($"Bits per sample: {flac.Properties.BitsPerSample}");

TagLib# detects the container format automatically and applies the right tag parser. You do not need to specify the format in advance.

Reading PDF Metadata with iText7

For PDF documents, iText7 provides access to both the document info dictionary and XMP metadata streams. It reads author, title, creation dates, producer application, and custom properties.

Install the package:

dotnet add package itext7

Extract standard PDF metadata:

using iText.Kernel.Pdf;

using var reader = new PdfReader("document.pdf");
using var pdf = new PdfDocument(reader);

var info = pdf.GetDocumentInfo();

Console.WriteLine($"Title: {info.GetTitle()}");
Console.WriteLine($"Author: {info.GetAuthor()}");
Console.WriteLine($"Subject: {info.GetSubject()}");
Console.WriteLine($"Keywords: {info.GetKeywords()}");
Console.WriteLine($"Creator: {info.GetCreator()}");
Console.WriteLine($"Producer: {info.GetProducer()}");
Console.WriteLine($"Pages: {pdf.GetNumberOfPages()}");

Checking Encryption Status

Before processing PDFs in a pipeline, you often need to know whether a file is encrypted:

using var reader = new PdfReader("encrypted.pdf");
Console.WriteLine($"Encrypted: {reader.IsEncrypted()}");

For password-protected files, pass the password to the reader:

var readerProps = new ReaderProperties()
    .SetPassword(System.Text.Encoding.UTF8.GetBytes("mypassword"));
using var reader = new PdfReader("protected.pdf", readerProps);
using var pdf = new PdfDocument(reader);

Reading XMP

Metadata PDFs can also carry XMP (Extensible Metadata Platform) data, which stores richer metadata in XML format:

var xmpBytes = pdf.GetXmpMetadata();
if (xmpBytes != null)
{
    var xmpString = System.Text.Encoding.UTF8.GetString(xmpBytes);
    Console.WriteLine(xmpString);
}

One thing to be aware of: iText7 uses the AGPL license for its open-source version. If you are building a commercial product and cannot comply with AGPL terms, you will need either a commercial iText license or an alternative library like PdfPig (MIT license), which provides read-only PDF parsing with metadata access.

Structured data fields extracted from document metadata

Building a Multi-Format Metadata Pipeline

Real projects rarely deal with a single file type. A document management system receives images, PDFs, audio files, and Office documents. Here is a pattern that routes files to the correct library based on extension and returns a consistent result:

using MetadataExtractor;
using MetadataExtractor.Formats.Exif;

public record FileMetadata(
    string FileName,
    string Format,
    Dictionary<string, string> Properties
);

public static class MetadataService
{
    private static readonly HashSet<string> ImageExts = new(
        StringComparer.OrdinalIgnoreCase)
    { ".jpg", ".jpeg", ".png", ".tiff", ".heic",
      ".webp", ".bmp", ".gif", ".cr2", ".nef", ".arw" };

private static readonly HashSet<string> AudioExts = new(
        StringComparer.OrdinalIgnoreCase)
    { ".mp3", ".flac", ".ogg", ".m4a", ".wav",
      ".aac", ".wma", ".opus" };

public static FileMetadata Extract(string filePath)
    {
        var ext = Path.GetExtension(filePath);
        var props = new Dictionary<string, string>();

if (ImageExts.Contains(ext))
            ExtractImage(filePath, props);
        else if (AudioExts.Contains(ext))
            ExtractAudio(filePath, props);
        else if (ext.Equals(".pdf", StringComparison.OrdinalIgnoreCase))
            ExtractPdf(filePath, props);
        else
            ExtractBasic(filePath, props);

return new FileMetadata(
            Path.GetFileName(filePath),
            ext.TrimStart('.').ToUpperInvariant(),
            props
        );
    }

private static void ExtractImage(
        string path, Dictionary<string, string> props)
    {
        var dirs = ImageMetadataReader.ReadMetadata(path);
        var ifd0 = dirs.OfType<ExifIfd0Directory>().FirstOrDefault();
        var sub = dirs.OfType<ExifSubIfdDirectory>().FirstOrDefault();

if (ifd0 != null)
        {
            AddIfPresent(props, "Camera",
                ifd0.GetDescription(ExifDirectoryBase.TagMake));
            AddIfPresent(props, "Model",
                ifd0.GetDescription(ExifDirectoryBase.TagModel));
        }
        if (sub != null)
        {
            AddIfPresent(props, "DateTaken",
                sub.GetDescription(ExifDirectoryBase.TagDateTimeOriginal));
            AddIfPresent(props, "ISO",
                sub.GetDescription(ExifDirectoryBase.TagIsoEquivalent));
        }
    }

private static void ExtractAudio(
        string path, Dictionary<string, string> props)
    {
        var file = TagLib.File.Create(path);
        props["Title"] = file.Tag.Title ?? "";
        props["Artist"] = file.Tag.FirstPerformer ?? "";
        props["Album"] = file.Tag.Album ?? "";
        props["Duration"] = file.Properties.Duration.ToString();
        props["Bitrate"] = $"{file.Properties.AudioBitrate} kbps";
    }

private static void ExtractPdf(
        string path, Dictionary<string, string> props)
    {
        using var reader = new iText.Kernel.Pdf.PdfReader(path);
        using var pdf = new iText.Kernel.Pdf.PdfDocument(reader);
        var info = pdf.GetDocumentInfo();
        props["Title"] = info.GetTitle() ?? "";
        props["Author"] = info.GetAuthor() ?? "";
        props["Pages"] = pdf.GetNumberOfPages().ToString();
    }

private static void ExtractBasic(
        string path, Dictionary<string, string> props)
    {
        var fi = new FileInfo(path);
        props["Size"] = $"{fi.Length} bytes";
        props["Created"] = fi.CreationTimeUtc.ToString("o");
        props["Modified"] = fi.LastWriteTimeUtc.ToString("o");
    }

private static void AddIfPresent(
        Dictionary<string, string> dict, string key, string? value)
    {
        if (!string.IsNullOrEmpty(value))
            dict[key] = value;
    }
}

This pattern works well in ASP.NET middleware, background services, and Azure Functions. The Extract method accepts a file path and returns a uniform FileMetadata record regardless of format.

Processing Directories in Parallel

For batch operations, use Parallel.ForEachAsync to process files concurrently:

var files = Directory.GetFiles("/uploads", "*.*",
    SearchOption.AllDirectories);

var results = new ConcurrentBag<FileMetadata>();

await Parallel.ForEachAsync(files,
    new ParallelOptions { MaxDegreeOfParallelism = 8 },
    async (file, ct) =>
{
    var metadata = MetadataService.Extract(file);
    results.Add(metadata);
});

Console.WriteLine($"Processed {results.Count} files");

Set MaxDegreeOfParallelism based on your workload. Metadata extraction is mostly I/O-bound for small files and CPU-bound for large RAW images, so 4 to 8 threads is a reasonable starting point.

Error Handling at the Boundary

Files from external sources can be corrupt, truncated, or misnamed. Wrap extraction calls in try-catch at the pipeline boundary:

public static FileMetadata? TryExtract(string filePath)
{
    try
    {
        return MetadataService.Extract(filePath);
    }
    catch (ImageProcessingException ex)
    {
        Console.WriteLine($"Skipping {filePath}: {ex.Message}");
        return null;
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Unexpected error on {filePath}: {ex.Message}");
        return null;
    }
}

MetadataExtractor throws ImageProcessingException for files it cannot parse, while TagLib# throws CorruptFileException. Catching these at the boundary lets the pipeline continue processing remaining files.

Scaling Beyond Code with Workspace-Level Extraction

Writing extraction code works well for developer tooling and automated pipelines. But when you are managing thousands of files across a team, or when the metadata you need is semantic rather than technical (contract dates, invoice amounts, policy numbers), maintaining custom parsers for every format becomes a burden.

Local extraction code has a few constraints at scale. You handle file storage, access control, result persistence, and schema changes yourself. If a teammate needs to query extracted metadata, they either run your code or you build a shared database layer. When requirements change, someone updates the code and reprocesses everything.

Fast.io's Metadata Views approach the problem differently. Instead of writing format-specific extraction logic, you describe the fields you want in plain English. The system designs a typed schema (text, integer, decimal, boolean, URL, JSON, date), matches files in the workspace, and populates a sortable, filterable spreadsheet. Adding a new column does not require reprocessing existing files. It works across PDFs, images, Word documents, spreadsheets, presentations, scanned pages, and handwritten notes.

This complements C# extraction pipelines well. You might use MetadataExtractor for technical metadata (EXIF settings, codecs, dimensions) and Metadata Views for business metadata that requires AI interpretation, like extracting counterparty names from contracts or coverage limits from insurance documents.

For agent-driven workflows, Metadata Views are accessible through the Fast.io MCP server, so an AI agent can create views, trigger extraction, and query results programmatically. The free plan includes 50 GB storage and 5,000 monthly credits with no credit card required.

Other cloud services cover parts of this space. AWS Textract handles document extraction, Google Document AI processes forms and invoices, and Azure AI Document Intelligence works well inside the Azure ecosystem. Fast.io differentiates by combining extraction with workspace features like file versioning, granular permissions, audit trails, and Intelligence Mode for semantic search across your entire file library.

Workspace interface showing structured metadata extracted from uploaded files

Frequently Asked Questions

How do you read EXIF data in C#?

Use the MetadataExtractor library. Call ImageMetadataReader.ReadMetadata() with a file path or stream, then filter the returned directories by type. For example, cast to ExifIfd0Directory to access camera make and model, or ExifSubIfdDirectory for exposure settings and capture date. The library handles JPEG, TIFF, PNG, HEIF, WebP, and RAW formats with a single method call.

What is the best .NET library for metadata extraction?

MetadataExtractor is the best general-purpose option. It supports over 30 image, video, and audio formats, runs cross-platform on .NET 8 and .NET Standard, and has no native dependencies. For audio-specific workflows with ID3 tags, TagLib# provides deeper tag access. For PDF metadata, iText7 or PdfPig are the standard choices.

How do you extract metadata from PDF in C#?

Use iText7. Create a PdfReader and PdfDocument, then call GetDocumentInfo() to access title, author, subject, keywords, creator, and producer fields. For page count, use GetNumberOfPages(). For XMP metadata, call GetXmpMetadata() to get the raw XML. If you need an MIT-licensed alternative, PdfPig provides similar read-only access.

Can C# read ID3 tags from MP3 files?

Yes. TagLib# reads ID3v1 and ID3v2 tags from MP3 files through a clean property-based API. Access file.Tag.Title, file.Tag.FirstPerformer, file.Tag.Album, and file.Tag.Year for standard tags. Duration and bitrate are available through file.Properties. The same API works for FLAC, OGG, AAC, and other audio formats.

Why should I avoid System.Drawing.Common for metadata extraction?

Microsoft restricted System.Drawing.Common to Windows-only starting in .NET 6 because its cross-platform implementation depended on libgdiplus, a largely untested reimplementation of Windows GDI+. In .NET 7 and later, using it on Linux or macOS throws a PlatformNotSupportedException. MetadataExtractor is the recommended cross-platform replacement for EXIF reading.

Does MetadataExtractor support camera RAW files?

Yes. MetadataExtractor reads metadata from CR2, CR3, NEF, ARW, ORF, DNG, RAF, PEF, RW2, and other RAW formats. It extracts EXIF, makernote, and XMP data without decoding the full image, so it runs quickly even on large RAW files.

Related Resources

Fastio features

Extract and query file metadata without writing custom parsers

Fast.io Metadata Views turn documents into a searchable, filterable database. Describe the fields you need in plain English and structured data appears automatically. 50 GB free, no credit card required.