How to Extract File Metadata with JavaScript and Node.js
JavaScript and Node.js metadata extraction libraries let developers read embedded file properties like EXIF tags, ID3 audio tags, and PDF document info directly in server-side or browser-based applications. This guide compares the five most popular libraries, walks through code examples for each file type, and covers how to handle metadata extraction at scale without writing custom parsers.
What File Metadata Means for JavaScript Developers
Every file carries hidden properties alongside its visible content. A JPEG stores the camera model, GPS coordinates, and exposure settings. An MP3 embeds artist names, album art, and track numbers. A PDF records its author, creation date, and page count. These embedded properties are metadata, and reading them programmatically is a common requirement in web applications, media pipelines, and content management systems.
Node.js is a natural fit for this work. It handles binary data well through Buffers and streams, uses the same language as browser code, and has a package ecosystem with specialized parsers for most common file formats. Whether you are building a photo gallery that sorts images by capture date, a podcast platform that reads episode tags from MP3 uploads, or a document management system that indexes PDF properties, there is likely a Node.js library purpose-built for the job.
The catch is that no single library covers every format. Image EXIF data, audio ID3 tags, and PDF document properties each use different binary structures. You need to pick the right tool for each file type and understand the tradeoffs between speed, bundle size, streaming support, and format coverage.
Common use cases for metadata extraction in JavaScript:
- Photo management: Reading GPS coordinates, camera settings, and timestamps from uploaded images
- Music applications: Parsing artist, album, genre, and cover art from audio files
- Document indexing: Extracting author, title, and creation dates from PDFs and Office files
- Content pipelines: Filtering uploads by resolution, codec, duration, or file properties before processing
- Forensic analysis: Auditing files for hidden properties, revision history, and embedded identifiers
Comparing Node.js Metadata Libraries
Choosing the right library depends on which file types you work with, whether you need streaming support for large files, and how much bundle size you can afford. Here is how the five most widely used options compare.
exifr is the fastest of the image parsers covered here. It reads files in chunks rather than loading the entire file into memory, averaging 2.5 ms per image in benchmarks. The library supports EXIF, GPS, XMP, IPTC, and ICC metadata and works in both Node.js and the browser with zero dependencies.
ExifReader offers the most configurable approach to image metadata. You can filter specific tags with the includeTags/excludeTags options and tree-shake unused parsers to get the bundle as small as 4 KB gzipped. It also supports JPEG XL and handles the widest range of image formats in this list.
sharp is primarily an image processing library built on the native libvips C library, but its metadata() function provides fast access to image dimensions, format, color space, and embedded EXIF and ICC data without decoding pixel data. EXIF and ICC data come back as raw Buffers, so you still need a separate parser to read individual EXIF tags. If you already use sharp for resizing or format conversion, its metadata extraction comes at no extra cost. Sharp processes over 1 billion images per month across its user base.
music-metadata is the standard choice for audio files. It parses ID3v1, ID3v2, APE, Vorbis comments, iTunes/MP4 tags, and more across 30+ audio formats. The library supports Node.js Readable streams, making it practical for processing large audio files or reading metadata from network sources without downloading the full file.
pdf-parse extracts document properties from PDFs, including title, author, creator, producer, creation date, and modification date. It is written in pure TypeScript and works cross-platform without native dependencies.
Reading Image EXIF Data with exifr
exifr is the go-to library when you need image metadata fast and don't want to pull in native bindings. Install it with npm:
npm install exifr
Here is a basic example that reads EXIF data from a JPEG file:
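import exifr from 'exifr';
const metadata = await exifr.parse('./photo.jpg');
// parse() resolves to undefined when a file carries no metadata, so guard before reading tags
if (!metadata) throw new Error('No EXIF data found in photo.jpg');
console.log(metadata.Make); // "Canon"
console.log(metadata.Model); // "EOS R5"
console.log(metadata.DateTimeOriginal); // Date object
console.log(metadata.latitude); // 37.7749
console.log(metadata.longitude); // -122.4194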
The parse function returns a flat object with all recognized tags. For GPS data, exifr automatically converts coordinates to decimal degrees, saving you from parsing DMS (degrees, minutes, seconds) values manually.
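For context, this is the arithmetic exifr saves you from. EXIF stores each GPS coordinate as three rational values (degrees, minutes, seconds) plus a hemisphere reference (N/S or E/W). A minimal sketch, assuming the rationals have already been decoded to plain numbers; the helper name `dmsToDecimal` is hypothetical and not part of exifr:

```javascript
// Hypothetical helper: convert EXIF-style degrees/minutes/seconds to decimal degrees.
// `ref` is the GPSLatitudeRef / GPSLongitudeRef value: 'N', 'S', 'E', or 'W'.
function dmsToDecimal(degrees, minutes, seconds, ref) {
  const value = degrees + minutes / 60 + seconds / 3600;
  // Southern latitudes and western longitudes are negative in decimal notation.
  return ref === 'S' || ref === 'W' ? -value : value;
}

console.log(dmsToDecimal(37, 46, 29.64, 'N').toFixed(4)); // "37.7749"
console.log(dmsToDecimal(122, 25, 9.84, 'W').toFixed(4)); // "-122.4194"
```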
You can also extract specific segments to reduce processing time:
// Only read GPS data
const gps = await exifr.gps('./photo.jpg');
console.log(gps); // { latitude: 37.7749, longitude: -122.4194 }
// Only read orientation
const orientation = await exifr.orientation('./photo.jpg');
console.log(orientation); // 1
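The orientation value is the raw EXIF Orientation tag, an integer from 1 to 8. Values 5 through 8 include a quarter turn, so the displayed image has its width and height swapped relative to the stored pixels, and values 2, 4, 5, and 7 include a mirror flip. A small sketch that turns the tag into layout information before rendering a thumbnail; `displayInfo` is a hypothetical helper, not an exifr function:

```javascript
// Hypothetical helper: derive display properties from an EXIF Orientation value (1-8).
function displayInfo(width, height, orientation = 1) {
  // Orientations 5-8 include a 90 or 270 degree rotation, so width and height trade places.
  const swapped = orientation >= 5 && orientation <= 8;
  // Orientations 2, 4, 5 and 7 are the mirrored variants.
  const mirrored = [2, 4, 5, 7].includes(orientation);
  return {
    width: swapped ? height : width,
    height: swapped ? width : height,
    mirrored,
  };
}

console.log(displayInfo(6000, 4000, 6)); // { width: 4000, height: 6000, mirrored: false }
```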
For browser environments, exifr accepts multiple input types:
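// From a URL (cross-origin servers must allow CORS)
const meta = await exifr.parse('https://example.com/photo.jpg');
// From an <img> element
const imgEl = document.querySelector('img');
const meta2 = await exifr.parse(imgEl);
// From a File object (drag-and-drop or file input), once the user picks a file
const fileInput = document.querySelector('input[type="file"]');
fileInput.addEventListener('change', async () => {
  const meta3 = await exifr.parse(fileInput.files[0]);
  console.log(meta3);
});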
What makes exifr fast is its chunked reading strategy. Instead of loading the entire file into memory, it reads just enough bytes to find segment offsets, then jumps directly to the metadata blocks. On a benchmark of 5,000 images, exifr averaged 2.5ms per file compared to 9.5ms for ExifReader and 76ms for the exiftool wrapper.
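To make the chunked-reading idea concrete, here is a minimal sketch of a segment walk for JPEG files. It is not exifr's code; the helper names `readHead` and `findExifSegment` and the 64 KB chunk size are assumptions for illustration. JPEG metadata lives in marker segments (an `FF xx` marker followed by a two-byte big-endian length) ahead of the compressed image data, so a parser can hop from marker to marker and stop at the APP1 segment tagged `Exif`:

```javascript
// Read only the first `size` bytes of a Blob or File (browsers and Node.js 18+).
async function readHead(blob, size = 64 * 1024) {
  return new Uint8Array(await blob.slice(0, size).arrayBuffer());
}

// Walk JPEG marker segments and return where the APP1 "Exif" payload sits.
// Works on any Uint8Array, including a Node.js Buffer.
function findExifSegment(bytes) {
  // Every JPEG starts with the SOI marker FF D8.
  if (bytes.length < 4 || bytes[0] !== 0xff || bytes[1] !== 0xd8) return null;
  const EXIF_ID = [0x45, 0x78, 0x69, 0x66, 0x00, 0x00]; // "Exif\0\0"
  let pos = 2;
  while (pos + 4 <= bytes.length) {
    if (bytes[pos] !== 0xff) return null; // lost sync with the marker stream
    const marker = bytes[pos + 1];
    // SOS (DA) starts the compressed image data and EOI (D9) ends the file.
    if (marker === 0xda || marker === 0xd9) return null;
    // Big-endian length that counts its own two bytes but not the marker.
    const length = (bytes[pos + 2] << 8) | bytes[pos + 3];
    const payload = pos + 4;
    if (marker === 0xe1 && EXIF_ID.every((b, i) => bytes[payload + i] === b)) {
      return { offset: payload, length: length - 2 };
    }
    pos += 2 + length; // skip the whole segment without reading its contents
  }
  return null; // chunk ended first (or no EXIF): a real parser would read more and retry
}
```

In a browser, `readHead(fileInput.files[0])` feeds the walker directly; on Node.js 20+, `fs.openAsBlob()` turns a file path into a Blob the same way. Only the segment headers are touched, which is why this style of parser stays fast on large photos.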
If you need broader format support at the cost of some speed, ExifReader (v4.38.1) is a strong alternative. It handles JPEG XL, WebP, and GIF in addition to the formats exifr supports, and its tree-shakeable architecture lets you build minimal bundles for production:
import ExifReader from 'exifreader';
const tags = await ExifReader.load('./photo.webp');
console.log(tags.ImageWidth?.value); // tags missing from the file are undefined
console.log(tags.DateTimeOriginal?.description);
Extract structured metadata from any file, no parsers required
Fast.io Metadata Views turns documents into queryable databases. Describe the fields you need in plain language and get structured data from PDFs, images, and documents. 50 GB free, no credit card required.
Parsing Audio File Tags with music-metadata
The music-metadata library handles audio metadata across 30+ formats, from MP3 and FLAC to Ogg Opus and WebM. It parses every major tagging standard: ID3v1, ID3v2, APE, Vorbis comments, iTunes/MP4 atoms, and RIFF/INFO headers.
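To see why a dedicated library helps, consider the simplest of these standards. ID3v1 is a fixed 128-byte block at the very end of an MP3: the ASCII marker `TAG`, then 30-byte title, artist, and album fields, a 4-byte year, a 30-byte comment, and a 1-byte genre index (ID3v1.1 borrows the last two comment bytes for a track number). A hand-rolled reader fits in a few lines; it is shown only as an illustration, with a hypothetical `parseId3v1` name, and is not how music-metadata is implemented. ID3v2, Vorbis comments, and MP4 atoms are far more involved:

```javascript
// Read a fixed-width ID3v1 text field: Latin-1, terminated or padded with NULs/spaces.
function readField(tag, start, length) {
  const raw = tag.toString('latin1', start, start + length);
  const end = raw.indexOf('\0');
  return (end === -1 ? raw : raw.slice(0, end)).trimEnd();
}

// Hypothetical reader for the 128-byte ID3v1/ID3v1.1 block at the end of an MP3 Buffer.
function parseId3v1(file) {
  if (file.length < 128) return null;
  const tag = file.subarray(file.length - 128);
  if (tag.toString('latin1', 0, 3) !== 'TAG') return null;
  // ID3v1.1: a zero at byte 125 followed by a non-zero byte 126 marks a track number.
  const hasTrack = tag[125] === 0 && tag[126] !== 0;
  return {
    title: readField(tag, 3, 30),
    artist: readField(tag, 33, 30),
    album: readField(tag, 63, 30),
    year: readField(tag, 93, 4),
    comment: readField(tag, 97, hasTrack ? 28 : 30),
    track: hasTrack ? tag[126] : null,
    genreIndex: tag[127], // index into the standard ID3v1 genre list; 255 means none
  };
}
```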
npm install music-metadata
Basic usage for reading tags from a local file:
import { parseFile } from 'music-metadata';
const metadata = await parseFile('./track.mp3');
console.log(metadata.common.title); // "Bohemian Rhapsody"
console.log(metadata.common.artist); // "Queen"
console.log(metadata.common.album); // "A Night at the Opera"
console.log(metadata.common.year); // 1975
console.log(metadata.common.genre); // ["Rock"]
console.log(metadata.format.duration); // 354.32 (seconds)
console.log(metadata.format.codec); // "MPEG 1 Layer 3"
The library normalizes tags from different standards into a unified common object, so you don't need to know whether a file uses ID3v2, Vorbis comments, or iTunes atoms. The raw tags from each standard are also available in the native property if you need them.
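Conceptually, that normalization is a lookup from each format's native tag identifiers to shared field names. The toy version below is not music-metadata's implementation: the `TAG_MAP` table covers only four fields, and the input shape (an object keyed by tag format, each holding `{ id, value }` pairs) is an assumption modeled loosely on the `native` property:

```javascript
// Hypothetical mapping from native tag identifiers to common field names.
const TAG_MAP = new Map([
  ['TIT2', 'title'], ['TPE1', 'artist'], ['TALB', 'album'], ['TCON', 'genre'], // ID3v2.3/2.4 frames
  ['TITLE', 'title'], ['ARTIST', 'artist'], ['ALBUM', 'album'], ['GENRE', 'genre'], // Vorbis comments
  ['©nam', 'title'], ['©ART', 'artist'], ['©alb', 'album'], ['©gen', 'genre'], // iTunes/MP4 atoms
]);

function normalizeTags(native) {
  const common = {};
  for (const tags of Object.values(native)) {
    for (const { id, value } of tags) {
      // Vorbis comment names are case-insensitive, so retry in upper case.
      const field = TAG_MAP.get(id) ?? TAG_MAP.get(id.toUpperCase());
      // The first format that supplies a field wins; later duplicates are ignored.
      if (field && common[field] === undefined) common[field] = value;
    }
  }
  return common;
}
```

A real library also has to reconcile conflicting values, multi-valued fields, and encodings, which is why the raw tags stay available alongside the normalized view.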
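For large files or network sources, streaming is the better approach. When a format stores its tags near the start of the file, as FLAC and ID3v2-tagged MP3 do, the parser can read the metadata without downloading the entire track; tags kept at the end of a file, such as ID3v1 or APE, can only be reached by reading to the end of the stream: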
import { parseStream } from 'music-metadata';
import { createReadStream } from 'node:fs';
const stream = createReadStream('./large-album.flac');
const metadata = await parseStream(stream);
stream.destroy(); // Close the stream after reading metadata
console.log(metadata.common.title);
console.log(metadata.format.bitsPerSample); // 24
console.log(metadata.format.sampleRate); // 96000
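Streaming works especially well for FLAC because the format puts its technical properties up front: the file opens with the marker `fLaC`, and the first metadata block must be STREAMINFO, which packs sample rate, channel count, bit depth, and total sample count into fixed bit fields. The sketch below, a hypothetical `parseFlacStreamInfo` rather than music-metadata's code, decodes them from the first 42 bytes to show how little data a parser needs:

```javascript
// Hypothetical decoder for FLAC's STREAMINFO block, given at least the first 42 bytes.
function parseFlacStreamInfo(head) {
  if (head.length < 42) return null;
  // "fLaC" marker, then a 4-byte block header; the low 7 bits of byte 4 are the block type.
  const isFlac = head[0] === 0x66 && head[1] === 0x4c && head[2] === 0x61 && head[3] === 0x43;
  if (!isFlac || (head[4] & 0x7f) !== 0) return null; // type 0 = STREAMINFO
  // STREAMINFO starts at byte 8; bytes 8-17 hold block and frame sizes.
  // Bytes 18-21 pack sample rate (20 bits), channels - 1 (3 bits), bits per sample - 1 (5 bits).
  const sampleRate = (head[18] << 12) | (head[19] << 4) | (head[20] >> 4);
  const channels = ((head[20] >> 1) & 0x07) + 1;
  const bitsPerSample = (((head[20] & 0x01) << 4) | (head[21] >> 4)) + 1;
  // Total sample count (36 bits): low nibble of byte 21, then bytes 22-25.
  const totalSamples = (head[21] & 0x0f) * 2 ** 32 +
    ((head[22] << 24) >>> 0) + (head[23] << 16) + (head[24] << 8) + head[25];
  return {
    sampleRate,
    channels,
    bitsPerSample,
    duration: sampleRate ? totalSamples / sampleRate : undefined,
  };
}
```

Real-world files add edge cases, such as an ID3v2 tag placed before the `fLaC` marker, which a maintained library already handles.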
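Cover art extraction works too. Album artwork comes back as raw image bytes together with its MIME type, ready to save or serve:
import { parseFile } from 'music-metadata';
import { writeFile } from 'node:fs/promises';
const metadata = await parseFile('./track.mp3');
const picture = metadata.common.picture?.[0];
if (picture) {
  console.log(picture.format); // "image/jpeg"
  // picture.data holds the image bytes; write them to disk or serve them over HTTP
  const ext = picture.format === 'image/png' ? 'png' : 'jpg'; // assume JPEG unless tagged PNG
  await writeFile(`cover.${ext}`, picture.data);
}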
Extracting PDF Properties with pdf-parse
PDF files embed document metadata in their info dictionary: title, author, subject, creator application, producer, and timestamps. The pdf-parse library reads these properties without requiring native dependencies.
npm install pdf-parse
Here is how to extract metadata from a PDF:
import { readFile } from 'node:fs/promises';
import { PDFParse } from 'pdf-parse';
// pdf-parse v2 takes the document in the constructor; pass the bytes as a plain Uint8Array
const parser = new PDFParse({ data: new Uint8Array(await readFile('./contract.pdf')) });
const { info } = await parser.getInfo();
await parser.destroy(); // release the underlying pdf.js document
console.log(info.Title); // "Service Agreement 2026"
console.log(info.Author); // "Legal Department"
console.log(info.Creator); // "Microsoft Word"
console.log(info.Producer); // "macOS Quartz PDFContext"
console.log(info.CreationDate); // "D:20260415120000Z"
console.log(info.ModDate); // "D:20260420093000Z"
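PDF dates use the format D:YYYYMMDDHHmmSS, optionally followed by a time zone: Z for UTC or an offset such as +02'00'. You may want to parse them into JavaScript Date objects that respect that offset:
function parsePdfDate(pdfDate) {
  if (!pdfDate) return null;
  // Groups: year, month, day, hour, minute, second, then optional offset sign, hours, minutes
  const match = pdfDate.match(
    /D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(?:Z|([+-])(\d{2})'?(\d{2})?'?)?/
  );
  if (!match) return null;
  const [, year, month, day, hour, min, sec, sign, offH, offM = '00'] = match;
  // Z or a missing zone is treated as UTC; otherwise build an ISO offset such as +02:00
  const zone = sign ? `${sign}${offH}:${offM}` : 'Z';
  return new Date(`${year}-${month}-${day}T${hour}:${min}:${sec}${zone}`);
}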
For batch processing, you can combine pdf-parse with Node.js filesystem operations to scan a directory of PDFs and build a metadata index:
import { readdir, readFile } from 'node:fs/promises';
import { join } from 'node:path';
import { PDFParse } from 'pdf-parse';
// parsePdfDate() is the helper defined above
async function indexPdfMetadata(directory) {
  const files = await readdir(directory);
  const pdfs = files.filter((f) => f.toLowerCase().endsWith('.pdf'));
  // Promise.all starts every parse at once and rejects if any single file fails
  return Promise.all(
    pdfs.map(async (filename) => {
      const data = new Uint8Array(await readFile(join(directory, filename)));
      const parser = new PDFParse({ data }); // one parser per document
      try {
        const { info = {} } = await parser.getInfo();
        return {
          filename,
          title: info.Title || filename,
          author: info.Author || 'Unknown',
          created: parsePdfDate(info.CreationDate),
        };
      } finally {
        await parser.destroy();
      }
    })
  );
}
This pattern works well for small to medium collections. When you hit hundreds or thousands of PDFs and need the results queryable rather than just logged, dedicated extraction platforms become worth considering.
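Before reaching that point, two weaknesses of the batch function above are worth fixing: `Promise.all` opens every file at once, and a single corrupt PDF rejects the whole batch. A small helper can cap how many files are in flight and record failures per item, similar to `Promise.allSettled`. The name `mapSettledWithLimit` and the default limit of 8 are assumptions; tune the limit to your memory and file-descriptor budget:

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
// Results keep input order; failures are captured instead of rejecting the batch.
async function mapSettledWithLimit(items, worker, limit = 8) {
  const results = new Array(items.length);
  let next = 0;
  async function runLane() {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two lanes take the same item
      try {
        results[index] = { status: 'fulfilled', value: await worker(items[index], index) };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, runLane));
  return results;
}
```

Inside `indexPdfMetadata`, you would pass the PDF file names and the per-file callback in place of `Promise.all(pdfs.map(...))`, keep the `fulfilled` entries, and log the `rejected` ones for review.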
Scaling Metadata Extraction Beyond Custom Code
Writing extraction scripts works when you control the file types and volumes are manageable. The challenge grows when you face mixed file formats, thousands of files arriving continuously, or teams that need to query extracted metadata without running code.
At that point, you have a few options. You could build your own extraction service that wraps these libraries behind an API, handling format detection, error recovery, and result storage yourself. Tools like Apache Tika provide a heavier, Java-based alternative that covers hundreds of formats through a unified interface. Cloud services from AWS (Textract), Google (Document AI), and Azure (AI Document Intelligence, formerly Form Recognizer) offer managed extraction for specific document types, typically with per-page pricing.
For teams that work with mixed document types and want structured, queryable results without building infrastructure, Fast.io Metadata Views takes a different approach. Instead of writing extraction code for each format, you describe the fields you want in plain language. The AI designs a typed schema with columns for text, numbers, dates, booleans, or JSON, then extracts those fields from every matching file in your workspace. The results populate a sortable, filterable spreadsheet you can query through the UI or the Fast.io API.
Metadata Views work across PDFs, images, Word documents, spreadsheets, presentations, scanned pages, and handwritten notes. You can add new columns without reprocessing existing files. For a legal team, that might mean extracting contract dates and counterparties. For a media team, AI-tagging photos with subjects and dominant colors. For finance, pulling invoice line items and totals from a folder of vendor PDFs.
The distinction from code-based extraction is that Metadata Views handle format detection, schema design, and result storage as a single managed layer. Your Node.js scripts still have their place for embedded workflows, CI pipelines, and processing files before they reach a shared workspace. But when the goal is making file properties searchable by a team, a managed extraction layer removes the maintenance burden of keeping parsers updated across every format you encounter.
Fast.io offers a free plan with 50 GB of storage and 5,000 credits per month, enough to test Metadata Views on a real document collection without commitment.
Frequently Asked Questions
How do I read EXIF data in Node.js?
Install the exifr package with npm install exifr, then call exifr.parse() with a file path or Buffer. It returns an object containing EXIF properties like camera make, model, date taken, GPS coordinates, and exposure settings. For GPS specifically, exifr.gps() returns latitude and longitude as decimal numbers.
What is the best JavaScript library for image metadata?
It depends on your priorities. exifr is the fastest at ~2.5 ms per file, with chunked reading and zero dependencies. ExifReader supports the widest range of formats, including JPEG XL and WebP, with tree-shakeable builds as small as 4 KB gzipped. sharp is best if you already use it for image processing, since its metadata function adds no extra dependency.
Can I extract PDF metadata with Node.js?
Yes. The pdf-parse package reads PDF document properties including title, author, creator, producer, creation date, and modification date. It is written in pure TypeScript with no native dependencies. For more comprehensive PDF processing, unpdf is a modern alternative with broader runtime support.
How do I read audio file tags in JavaScript?
The music-metadata library parses tags from 30+ audio formats including MP3, FLAC, WAV, OGG, and M4A. It normalizes tags from different standards (ID3v2, Vorbis, iTunes) into a unified common object. It also supports streaming, so you can read metadata from large files without loading them entirely into memory.
Can I extract metadata from files in the browser?
Both exifr and ExifReader work in browser environments. exifr accepts File objects from drag-and-drop or file inputs, img elements, and URLs. ExifReader supports ArrayBuffer and DataView inputs. music-metadata also has browser support through bundlers like Webpack. pdf-parse works in browser contexts as well.
How do I handle metadata extraction for multiple file types?
Use separate libraries for each file family and route files based on their MIME type or extension. exifr or ExifReader handles images, music-metadata handles audio, and pdf-parse handles PDFs. For a unified approach without maintaining multiple parsers, Fast.io Metadata Views can extract structured fields from any supported file type through a single interface.
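A minimal sketch of that routing, with two assumptions stated up front: the helper name `detectFamily` is hypothetical, and it recognizes only a few common signatures (the "magic numbers" at the start of a file) before falling back to the extension, because extensions and browser-supplied MIME types can be wrong. Packages such as file-type cover far more formats:

```javascript
// Hypothetical router: pick an extractor family from a file's first bytes, then its name.
const SIGNATURES = [
  { family: 'image', bytes: [0xff, 0xd8, 0xff] }, // JPEG
  { family: 'image', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] }, // PNG
  { family: 'image', ascii: 'GIF8' }, // GIF87a / GIF89a
  { family: 'pdf', ascii: '%PDF-' }, // PDF
  { family: 'audio', ascii: 'ID3' }, // MP3 with an ID3v2 tag
  { family: 'audio', ascii: 'fLaC' }, // FLAC
  { family: 'audio', ascii: 'OggS' }, // Ogg container (usually Vorbis or Opus audio)
];

const EXTENSIONS = {
  image: ['jpg', 'jpeg', 'png', 'gif', 'webp', 'tif', 'tiff', 'heic'],
  audio: ['mp3', 'flac', 'ogg', 'opus', 'm4a', 'wav'],
  pdf: ['pdf'],
};

function startsWith(head, bytes, offset = 0) {
  return bytes.every((b, i) => head[offset + i] === b);
}

function detectFamily(head, filename = '') {
  for (const sig of SIGNATURES) {
    const bytes = sig.bytes ?? [...sig.ascii].map((c) => c.charCodeAt(0));
    if (startsWith(head, bytes)) return sig.family;
  }
  // RIFF containers: bytes 8-11 say whether it is WAVE audio or a WEBP image.
  if (startsWith(head, [0x52, 0x49, 0x46, 0x46])) {
    const kind = String.fromCharCode(...head.subarray(8, 12));
    if (kind === 'WAVE') return 'audio';
    if (kind === 'WEBP') return 'image';
  }
  // No known signature: fall back to the file extension.
  const ext = filename.toLowerCase().split('.').pop();
  for (const [family, exts] of Object.entries(EXTENSIONS)) {
    if (exts.includes(ext)) return family;
  }
  return null; // unknown: skip it or hand it to a general-purpose tool
}
```

The returned family then decides which parser runs: exifr or ExifReader for `image`, music-metadata for `audio`, and pdf-parse for `pdf`.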