How to Preserve Metadata During File Conversion
Converting files between formats often strips embedded metadata without warning. Author names, creation dates, GPS coordinates, and copyright notices can vanish in a single Save As operation. This guide covers which metadata survives common conversions, which tools preserve it, and how to build a workflow that keeps your file properties intact.
Why Metadata Disappears During Conversion
Every file format stores metadata differently. JPEG files use EXIF for camera settings and GPS data, IPTC for captions and credits, and XMP for a broader range of custom fields. PNG files were designed without EXIF support and instead store metadata in text chunks (tEXt, zTXt, iTXt). Word documents embed properties in OPC package fields. MP4 videos carry metadata in atoms. When you convert between formats, the conversion tool has to decide what to do with metadata that exists in the source but has no equivalent container in the destination.
Most tools take the path of least resistance: they drop what does not fit. A JPEG-to-PNG converter typically reads the pixel data, compresses it as PNG, and writes a new file. The EXIF block, which contains your camera model, shutter speed, and location, gets left behind because PNG historically had no standard place to put it. The tool did not maliciously strip your data. It simply never read it in the first place.
Three patterns account for most metadata loss:
- Format incompatibility. The destination format lacks a container for the metadata type. PNG gained an official EXIF chunk (eXIf) only in 2017, and many tools still do not read or write it. Plain text has no metadata container at all. WAV files support limited metadata compared to FLAC.
- Tool behavior. Many conversion tools prioritize speed and file size over metadata fidelity. Browser-based converters and bulk processing scripts often skip metadata entirely. Even professional tools like Photoshop will strip EXIF from certain export paths unless you check the right option.
- Lossy re-encoding. Some converters rebuild the file from scratch rather than remuxing or transcoding in place. This destroys not just metadata but also any embedded profiles, thumbnails, and sidecar references.
The first step in preserving metadata is understanding which conversion you are performing and whether the destination format can hold what the source format contains.
What Survives Common File Conversions
Not all conversions are equally destructive. Some format pairs share metadata standards, making preservation straightforward. Others are fundamentally incompatible, requiring you to extract metadata separately before converting.
Here is how common conversion pairs handle metadata:
Two takeaways stand out. First, conversions within the same media type (image to image, document to document) preserve more metadata than cross-type conversions (video to audio). Second, formats designed for archiving, like TIFF for images and PDF/A for documents, tend to preserve metadata by design. The Library of Congress specifically recommends PDF/A for long-term document preservation because the ISO 19005 standard requires embedded XMP metadata.
When your conversion pair appears in the "usually lost" column, you will need to extract metadata before converting and reapply it afterward.
Keep Your File Metadata Organized After Conversion
Fast.io's Metadata Views extract and verify metadata fields across PDFs, images, and documents. Describe what you need in plain language and get a searchable, sortable spreadsheet. 50 GB free, no credit card required.
Tools That Preserve Metadata During Conversion
The right tool makes the difference between losing metadata and keeping it. Here are the most reliable options across file types.
ExifTool for Images
ExifTool by Phil Harvey is the standard for reading, writing, and transferring metadata across image formats. It supports over 400 file types and can copy metadata between files that a normal converter would treat as incompatible.
To copy all metadata from a source file to a converted file:
exiftool -TagsFromFile source.jpg converted.png
This reads every metadata tag from the JPEG and writes whatever the PNG format can accept. For batch processing across a directory:
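exiftool -TagsFromFile %d%f.jpg -All:All -ext png .
Here the final argument is the directory to process, -ext png limits the run to PNG files, and %d%f.jpg tells ExifTool to read tags from the JPEG that shares each PNG's directory and base name. Add the -P flag to either command to preserve file modification timestamps, which is useful when your filesystem dates matter for sorting or compliance.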
FFmpeg for Video and Audio
FFmpeg handles metadata transfer during media conversion through the -map_metadata flag. By default, FFmpeg copies global metadata from the first input to the output, but explicit mapping gives you more control:
ffmpeg -i input.mov -map_metadata 0 -c:v libx264 output.mp4
The 0 tells FFmpeg to copy metadata from input index 0. For stream-level metadata (per-track titles, language tags), add -map_metadata:s:0 0:s:0 to map stream metadata explicitly.
One common mistake: using FFmpeg's -an flag to strip audio also removes audio stream metadata. If you need the video metadata but not the audio track, use -map_metadata 0 alongside -an to keep global metadata intact.
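As a rough sketch of the flags discussed above, the wrapper below assembles an FFmpeg command that maps global metadata and the first stream's metadata. The function name convert_keep_meta and the DRY_RUN variable are assumptions made for illustration and are not part of FFmpeg. With DRY_RUN=1 the command is only printed, so you can review it before running it.

```shell
# convert_keep_meta: hypothetical helper, not part of FFmpeg.
# Usage: convert_keep_meta INPUT OUTPUT
convert_keep_meta() {
    # Copy global metadata from input 0 and map the first input
    # stream's metadata onto the first output stream explicitly.
    set -- ffmpeg -i "$1" -map_metadata 0 -map_metadata:s:0 0:s:0 -c:v libx264 "$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"
    fi
}
```

Calling convert_keep_meta input.mov output.mp4 with DRY_RUN=1 prints the full command line. Unset DRY_RUN to run the conversion for real.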
LibreOffice for Documents
LibreOffice's command-line interface can convert documents to PDF while preserving standard properties:
libreoffice --headless --convert-to pdf document.docx
This carries over author, title, subject, and keywords. For archival needs, convert to PDF/A instead:
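libreoffice --headless --convert-to 'pdf:writer_pdf_Export:{"SelectPdfVersion":{"type":"long","value":"2"}}' document.docx
The SelectPdfVersion value of 2 produces PDF/A-2b, which requires XMP metadata and is the Library of Congress recommended format for long-term preservation. Filter options are passed as JSON with an explicit type and value, a form that only newer LibreOffice releases accept, so confirm it against the documentation for your version.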
ImageMagick for Batch Image Processing
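ImageMagick generally carries embedded profiles (EXIF, IPTC, XMP, and ICC color profiles) into the output when the destination format can store them. Metadata is lost when the command includes -strip or +profile "*", both of which remove profiles, or -thumbnail, which discards everything except the color profile. A plain conversion with none of those options is the safe form:
magick input.jpg output.tiff
On ImageMagick 6, use convert in place of magick. The same applies to PNG output, although how much of the EXIF data lands in the PNG depends on the ImageMagick and libpng versions in use:
magick input.jpg output.png
Because the result depends on the version and the destination format, verify the output with ExifTool rather than assuming the profiles survived.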
Step-by-Step Metadata Preservation Workflow
Preserving metadata across a batch of files requires a consistent process, not just the right tool. Here is a workflow that works across file types.
1. Audit Your Source Metadata
Before converting anything, catalog what metadata your source files contain. ExifTool can generate a CSV report across an entire directory:
exiftool -csv -r /source/folder > metadata_audit.csv
This gives you a baseline. If metadata goes missing after conversion, you know exactly what was lost and can trace it back to a specific step.
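A baseline is most useful when it also tells you which source files were already incomplete. The sketch below is one possible way to do that, assuming Copyright, Artist, and DateTimeOriginal are the fields you treat as critical. The function names audit_critical and list_incomplete are hypothetical.

```shell
# The three tags below are an assumed policy; adjust them to your needs.
audit_critical() {
    # -T prints tab-separated values and "-" for any tag that is absent.
    exiftool -T -r -filename -copyright -artist -datetimeoriginal "$1"
}

# Reads audit_critical output on stdin and prints the names of files
# where at least one of the requested tags is missing.
list_incomplete() {
    awk -F '\t' '{
        for (i = 2; i <= NF; i++)
            if ($i == "-") { print $1; break }
    }'
}
```

Running audit_critical /source/folder | list_incomplete before conversion separates fields that were never there from fields a later step dropped. A tag whose real value is a single hyphen would be misreported as missing.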
2. Extract Metadata as Sidecar Files
For conversions where the destination format cannot hold your metadata, extract it into XMP sidecar files before converting:
exiftool -o %d%f.xmp /source/folder/*.jpg
XMP sidecars are plain XML files that sit alongside your converted files. Adobe applications, Darktable, and most digital asset management systems read them automatically. This approach decouples your metadata from the file format, so you never lose data regardless of how many conversions you perform.
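Before converting, it can help to confirm that every source image actually received a sidecar. The following is a minimal sketch, assuming sidecars share the image's base name as in the %d%f.xmp pattern above. The function name missing_sidecars is hypothetical.

```shell
# Prints each .jpg under DIR that has no matching .xmp sidecar.
# The match on '*.jpg' is case-sensitive; widen it if you also have .JPG files.
missing_sidecars() {
    find "$1" -type f -name '*.jpg' | while IFS= read -r img; do
        sidecar="${img%.*}.xmp"    # swap the extension for .xmp
        [ -f "$sidecar" ] || printf '%s\n' "$img"
    done
}
```

An empty result from missing_sidecars /source/folder means every JPEG has a sidecar next to it.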
3. Convert With Metadata-Aware Tools
Use the tools from the previous section. Avoid browser-based converters and free online tools for files where metadata matters. These services almost universally strip metadata, sometimes for privacy reasons and sometimes because they simply do not handle it.
4. Verify the Output
After conversion, run a metadata comparison between source and destination:
exiftool -a -G source.jpg > source_meta.txt
exiftool -a -G converted.png > converted_meta.txt
diff source_meta.txt converted_meta.txt
This shows exactly which fields were preserved, which were lost, and which were transformed. Pay special attention to date fields (formats vary between standards), GPS coordinates (precision may change), and copyright notices (encoding differences can corrupt special characters).
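A raw diff also flags tags that merely moved from one group to another, for example from EXIF to XMP. As a rough alternative, the sketch below compares tag names only and ignores groups and values. The function names tag_names and lost_tags are hypothetical, and the parsing assumes ExifTool's usual "[Group] Name : value" layout from -G -s.

```shell
# Prints the unique tag names ExifTool reports for a file.
tag_names() {
    exiftool -a -G -s "$1" | awk '{
        sub(/^\[[^]]*\][ \t]*/, "")      # drop the leading [Group] label
        n = index($0, ":")               # tag name ends at the first colon
        if (n) {
            name = substr($0, 1, n - 1)
            gsub(/[ \t]+$/, "", name)    # trim padding before the colon
            print name
        }
    }' | sort -u
}

# Prints tag names present in SOURCE but absent from CONVERTED.
lost_tags() {
    src_list=$(mktemp) && dst_list=$(mktemp) || return 1
    tag_names "$1" > "$src_list"
    tag_names "$2" > "$dst_list"
    comm -23 "$src_list" "$dst_list"
    rm -f "$src_list" "$dst_list"
}
```

Running lost_tags source.jpg converted.png will usually list some format-specific tags, such as JPEG encoding details, that have no meaning in the destination. Those are expected, so focus on the descriptive fields in the list.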
5. Reapply Missing Metadata
For any fields that did not survive the conversion, reapply them from your sidecar files or the original audit:
exiftool -TagsFromFile source.jpg -All:All converted.png
This final step closes the loop. You started with an audit, converted with the best available tool, verified what survived, and patched what did not.
Format-Specific Considerations
Some conversion scenarios deserve extra attention because they are either unusually tricky or unusually common.
JPEG to PNG (and Back)
This is the most common metadata-destroying conversion. PNG was designed for lossless web graphics, not photography, so EXIF support was added late and inconsistently. If you must convert JPEG to PNG and need metadata preserved, your best option is to write XMP data into the PNG's iTXt chunk using ExifTool after conversion. Not all viewers will read it, but the data will be there.
Going the other direction (PNG to JPEG) is simpler. JPEG natively supports EXIF, IPTC, and XMP, so ExifTool can write whatever metadata you supply.
DOCX to PDF for Legal and Compliance
When converting Word documents to PDF for legal filing or regulatory compliance, use PDF/A rather than standard PDF. Standard PDF conversion through Microsoft Word's "Save as PDF" preserves basic document properties (author, title, creation date) but drops revision history, comments, and custom properties. PDF/A-2b or PDF/A-3b preserves standard metadata and, critically, requires the conversion tool to embed XMP metadata. This makes the resulting file self-describing, which is why courts and archives prefer it.
The GoldFynch eDiscovery blog documents cases where organizations destroyed legally relevant metadata by using "Print to PDF" instead of proper PDF export. The difference matters: Print to PDF creates a new document with a new creation date, while Export to PDF preserves the original document properties.
Video Format Conversions
Video containers (MP4, MKV, MOV, AVI) each store metadata in different internal structures. MP4 and MOV both use atoms (MP4's structure is derived from the QuickTime MOV format), MKV uses EBML elements, and AVI uses RIFF chunks. Converting between containers with FFmpeg using -map_metadata 0 handles most global metadata, but stream-level metadata (track names, language codes, chapter markers) requires explicit stream mapping.
Extracting audio from video is particularly destructive. When you pull an MP3 from an MP4, the video-level metadata (resolution, frame rate, video codec) is irrelevant, but creation date, copyright, and custom tags often get dropped too. Use -map_metadata 0 and verify the output.
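One way to do that verification is to compare the container-level tag keys that ffprobe reports for the input and the output. This is a sketch under assumptions: the function names format_tag_keys and dropped_tag_keys are hypothetical, and keys are compared case-insensitively by name only.

```shell
# Lists the lower-cased container-level tag keys that ffprobe reports.
format_tag_keys() {
    ffprobe -v error -show_entries format_tags \
        -of default=noprint_wrappers=1 "$1" |
        sed -n 's/^TAG:\([^=]*\)=.*/\1/p' |
        tr '[:upper:]' '[:lower:]' | sort -u
}

# Prints keys present in the source container but absent from the output.
dropped_tag_keys() {
    a=$(mktemp) && b=$(mktemp) || return 1
    format_tag_keys "$1" > "$a"
    format_tag_keys "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

Running dropped_tag_keys input.mp4 output.mp3 lists candidates for manual review. Containers sometimes rename a key rather than drop it, so a listed key is a prompt to check, not proof of loss.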
Bulk Conversion Pipelines
For organizations processing hundreds or thousands of files, manual metadata checks are not practical. Build verification into your pipeline: extract metadata before conversion, convert, extract metadata after, and flag any files where critical fields (copyright, author, date) are missing from the output. Tools like Apache Tika can extract metadata from over a thousand file formats, making it useful as a universal verification layer.
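The flagging step might look something like the sketch below, which checks each converted file for a required set of ExifTool tags and returns a nonzero status when any are missing. The REQUIRED_TAGS list and the function names has_required_tags and verify_dir are assumptions for illustration, not an established convention.

```shell
# REQUIRED_TAGS is an assumed policy; the names are ExifTool tag names.
REQUIRED_TAGS="Copyright Artist CreateDate"

# Returns 0 if FILE carries every required tag; otherwise prints the
# missing tag names and returns 1.
has_required_tags() {
    status=0
    for tag in $REQUIRED_TAGS; do
        # -s3 prints only the value; empty output means the tag is absent.
        value=$(exiftool -s3 "-$tag" "$1")
        if [ -z "$value" ]; then
            printf '%s: missing %s\n' "$1" "$tag"
            status=1
        fi
    done
    return "$status"
}

# Flags every regular file in DIR that fails the check. The nonzero
# return status lets a scheduler or CI job react to failures.
verify_dir() {
    failed=0
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        has_required_tags "$f" || failed=1
    done
    return "$failed"
}
```

This version starts ExifTool once per tag per file, which keeps the logic easy to follow but is slow on large batches. For thousands of files, a single recursive ExifTool run with tabular output would be the faster design.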
If your team stores converted files in shared workspaces, Fast.io's Metadata Views can automate post-conversion verification. Describe the metadata fields you care about in plain language, and the AI extraction layer checks each file against your schema. This replaces manual spot-checking with systematic coverage, and you can add new verification columns without reprocessing existing files.
Preventing Metadata Loss Before It Happens
The most reliable way to preserve metadata is to avoid unnecessary conversions in the first place. Before converting a file, ask whether the conversion is actually needed or whether the consuming application can handle the original format.
When conversion is unavoidable, these practices reduce the risk of metadata loss:
- Choose metadata-rich destination formats. TIFF over PNG for archival images. PDF/A over standard PDF for documents. FLAC over MP3 when audio quality and metadata both matter. WebP over PNG when you need web-optimized images with EXIF support.
- Use XMP as your universal metadata layer. XMP is supported across images, documents, video, and audio. It is extensible, human-readable (it is just XML), and can be stored as a sidecar file when the destination format cannot embed it. Adobe created XMP specifically to solve the cross-format metadata problem.
- Keep originals. Never delete source files after conversion until you have verified that all critical metadata survived. Storage is cheap compared to the cost of recreating lost metadata, especially for GPS coordinates, copyright assignments, and regulatory timestamps.
- Standardize your toolchain. Pick one conversion tool per file type and stick with it. Inconsistent tools produce inconsistent metadata results. Document which flags and settings your team uses so that conversions are reproducible.
- Automate verification. Even with the right tools and settings, individual files can fail silently. A script that compares source and destination metadata counts after each batch run catches problems before they compound.
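The count comparison mentioned in the last practice could be sketched as follows. The function names tag_count and check_counts are hypothetical, and the default threshold of 80 percent is an arbitrary assumption that you would tune per format pair.

```shell
# Counts the tags ExifTool reports for FILE (duplicates included).
tag_count() {
    exiftool -a -s "$1" | wc -l | tr -d ' '
}

# Warns and returns 1 when DST has fewer than PERCENT percent of SRC's tags.
# Usage: check_counts SRC DST [PERCENT]   (PERCENT defaults to 80)
check_counts() {
    src_n=$(tag_count "$1")
    dst_n=$(tag_count "$2")
    min=$(( $src_n * ${3:-80} / 100 ))
    if [ "$dst_n" -lt "$min" ]; then
        printf 'WARN %s: %s of %s tags survived\n' "$2" "$dst_n" "$src_n"
        return 1
    fi
}
```

A raw count is a coarse signal, because formats expose different numbers of structural tags. It works best as a cheap first filter ahead of a field-by-field comparison.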
For teams managing large file libraries, combining local conversion tools with cloud-based metadata extraction creates a safety net. Convert files locally with ExifTool or FFmpeg, upload to a shared workspace, and let Fast.io's Metadata Views verify that the fields you care about are present and correctly formatted. The extraction works across PDFs, images, Word documents, spreadsheets, and scanned pages, so it covers the full range of conversion outputs.
Frequently Asked Questions
Does converting a file remove metadata?
It depends on the format pair and the tool. Some conversions preserve metadata automatically (JPEG to TIFF, RAW to JPEG), while others strip it by default (JPEG to PNG, Print to PDF). The conversion tool matters as much as the format. ExifTool and FFmpeg preserve metadata when configured correctly, while many browser-based converters strip it entirely.
How do I keep EXIF data when converting images?
Use ExifTool to copy metadata after conversion. Run "exiftool -TagsFromFile source.jpg converted.png" to transfer all compatible tags from the original to the converted file. For batch processing, ExifTool supports wildcards and directory recursion. If your destination format has limited EXIF support (like PNG), ExifTool writes whatever the format can hold, so check the output to see which metadata groups the tags ended up in.
Which file formats preserve the most metadata?
TIFF and JPEG support EXIF, IPTC, and XMP, making them the most metadata-rich image formats. PDF/A is the gold standard for documents because the ISO 19005 specification requires embedded XMP metadata. For audio, FLAC preserves Vorbis comments and supports extensive tagging. For video, MKV (Matroska) supports the broadest range of metadata fields across containers.
Does converting DOCX to PDF keep metadata?
Standard DOCX-to-PDF conversion preserves basic properties like author, title, and creation date, but drops tracked changes, comments, and custom document properties. Using "Export to PDF" in Word preserves more than "Print to PDF," which creates a new document with new timestamps. For maximum metadata preservation, convert to PDF/A format, which requires embedded XMP metadata by specification.
What is the best tool for preserving image metadata?
ExifTool is the most capable and widely used tool. It supports over 400 file types, can copy metadata between incompatible formats, and handles batch operations. It is free, open source, and available on Windows, macOS, and Linux. For specific use cases, ImageMagick (when run without options that strip profiles) and Adobe Bridge also preserve metadata during conversion.
Can I recover metadata after a conversion strips it?
Only if you still have the original file or extracted the metadata beforehand. ExifTool can generate XMP sidecar files from your originals, and these sidecars can be applied to converted files at any time. If the original is gone and no sidecar exists, the metadata cannot be recovered from the converted file alone.
Why does PDF/A preserve metadata better than standard PDF?
PDF/A follows ISO 19005, which mandates embedded XMP metadata as part of the format specification. Standard PDF allows metadata but does not require it, so conversion tools can skip it without producing an invalid file. PDF/A also restricts features like encryption and JavaScript that can interfere with long-term metadata accessibility.