How to Extract Metadata from CAD Files (DWG and DXF)
CAD file metadata includes drawing properties like author, title, revision history, creation and modification timestamps, units, coordinate systems, layer definitions, and block reference counts. This guide covers practical methods for extracting that metadata from DWG and DXF files using the Python library ezdxf, ODA File Converter, LibreDWG, and structured extraction platforms.
What Metadata Lives Inside CAD Files
Every DWG or DXF file carries two layers of information: the geometry people draw and the metadata that describes the drawing itself. That metadata is spread across the file header, drawing properties, and structural elements like layers and blocks. Knowing where to find it is the first step toward extracting it programmatically.
DWG is AutoCAD's native binary format, used by over 5 million AutoCAD users worldwide. DXF (Drawing Exchange Format) is its text-based sibling, designed for interoperability between CAD applications. Both formats store the same core metadata, but DXF exposes it as human-readable ASCII in its HEADER section, while DWG buries it in a proprietary binary structure that requires specialized parsers.
Here is what you can expect to find in a typical CAD file's metadata:
- Drawing properties: Author, title, subject, keywords, comments, and revision number. These are set manually in AutoCAD's Drawing Properties dialog or through scripting.
- Timestamps: Creation date ($TDCREATE) and last modification date ($TDUPDATE), stored as Julian dates (a day number plus a fraction of a day).
- Version identifier: The $ACADVER variable identifies the file format version, which maps to an AutoCAD release. AC1032 is the format used by AutoCAD 2018 and later. Because users can save to older formats, it tells you the saved format rather than the exact application that wrote the file.
- Units and scale: $INSUNITS defines the drawing units (millimeters, inches, meters), while $MEASUREMENT distinguishes between Imperial and Metric defaults.
- Coordinate system: The current UCS (User Coordinate System) origin, X-axis, and Y-axis ($UCSORG, $UCSXDIR, $UCSYDIR) define how geometry is oriented in space.
- Last saved by: The $LASTSAVEDBY variable records the username of the person who last saved the file.
- Code page: $DWGCODEPAGE specifies the character encoding, which matters when layer names or text entities use non-ASCII characters.
DXF files store these variables in a HEADER section containing over 200 system variables. Each variable begins with group code 9 followed by the variable name, then one or more group code/value pairs that hold the value (a point such as $EXTMIN uses three: codes 10, 20, and 30). For DWG files, the same data exists but requires parsing the binary header, which is where tools like ODA SDK and LibreDWG come in.
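To see that group-code layout concretely, here is a minimal sketch that walks the tag pairs of an ASCII DXF HEADER section using only the standard library. It assumes a well-formed ASCII (not binary) DXF, and the function name read_header_vars is illustrative; in practice ezdxf does this parsing for you.

```python
def read_header_vars(lines):
    """Collect HEADER variables from an ASCII DXF given as a list of lines.

    Returns {"$NAME": [(group_code, value), ...]}.
    """
    # An ASCII DXF is a flat sequence of (group code, value) line pairs.
    # Values are stripped for simplicity, which drops meaningful edge spaces.
    pairs = [(lines[i].strip(), lines[i + 1].strip()) for i in range(0, len(lines) - 1, 2)]

    header, current, in_header = {}, None, False
    for code, value in pairs:
        if not in_header:
            # The HEADER section starts with "0/SECTION" followed by "2/HEADER"
            in_header = code == "2" and value == "HEADER"
            continue
        if code == "0" and value == "ENDSEC":
            break  # end of the HEADER section
        if code == "9":
            current = value  # group code 9 introduces a variable name
            header[current] = []
        elif current is not None:
            header[current].append((int(code), value))
    return header
```

To use it on a file, read the text and pass `f.read().splitlines()`. Pre-R2007 files use the code page named in $DWGCODEPAGE, while R2007 and later are UTF-8, so choose the `encoding` argument of `open()` accordingly.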
Beyond header variables, structural metadata is equally valuable. Layer definitions tell you how a drawing is organized (electrical, plumbing, structural). Block references reveal reusable component counts. The CLASSES section describes custom objects. For architecture and engineering firms managing thousands of drawings, this structural metadata is often more useful than timestamps.
Extracting DXF Metadata with Python ezdxf
The ezdxf library is the de facto standard Python tool for working with DXF files. It reads and writes DXF R12 (AC1009) and R2000 (AC1015) through R2018 (AC1032), can also open some older versions such as R13 and R14, and gives you direct access to every header variable and entity in the drawing.
Install and load a DXF file
```bash
pip install ezdxf
```

```python
import ezdxf

doc = ezdxf.readfile("floorplan.dxf")
```
Read header variables
The header section is accessible through doc.header, which behaves like a dictionary keyed by variable names:
```python
version = doc.header["$ACADVER"]
created = doc.header["$TDCREATE"]
updated = doc.header["$TDUPDATE"]
units = doc.header["$INSUNITS"]
last_saved_by = doc.header.get("$LASTSAVEDBY", "Unknown")
code_page = doc.header.get("$DWGCODEPAGE", "Unknown")

print(f"DXF Version: {version}")
print(f"Created: {created}")
print(f"Last Modified: {updated}")
print(f"Units: {units}")
print(f"Last Saved By: {last_saved_by}")
```
The get() method with a default value is important here. Not every DXF file includes every variable, especially older versions. Using bracket access on a missing variable raises a DXFKeyError, while get() returns your fallback value silently.
The $INSUNITS value is an integer code: 0 means unitless, 1 is inches, 2 is feet, 4 is millimeters, 5 is centimeters, 6 is meters. The ezdxf documentation includes a complete mapping, but those six cover the majority of real-world files.
Iterate over all header variables
If you want a complete dump of every header variable in a file, loop through the available names:
```python
for varname in doc.header.varnames():
    value = doc.header[varname]
    print(f"{varname}: {value}")
```
This is useful for discovering which variables a particular file actually contains. A file from AutoCAD 2024 will have more variables than one from AutoCAD R14.
Extract layer definitions
Layers are one of the most informative structural metadata elements. They reveal how the drawing is organized, which disciplines are represented, and which layers are frozen or locked:
```python
for layer in doc.layers:
    print(f"Layer: {layer.dxf.name}")
    print(f"  Color: {layer.dxf.color}")
    print(f"  Linetype: {layer.dxf.linetype}")
    print(f"  Frozen: {layer.is_frozen()}")
    print(f"  Locked: {layer.is_locked()}")
    print(f"  Off: {layer.is_off()}")
```
A mechanical drawing might have layers like DIMENSIONS, CENTERLINES, HIDDEN, and SECTION-CUTS. An architectural plan might use A-WALL, A-DOOR, A-WINDOW, E-POWER, and P-PIPE. The layer naming convention alone can tell you the drawing's discipline and the standards it follows.
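That naming convention can be turned into a rough discipline profile. A minimal sketch, assuming NCS/AIA-style names where the part before the first hyphen is a one- or two-letter discipline designator; the mapping is a hand-picked subset of the U.S. National CAD Standard designators, and names that do not follow the pattern return None:

```python
from collections import Counter

# Partial, illustrative subset of NCS discipline designators (first letter)
DISCIPLINES = {
    "A": "Architectural", "C": "Civil", "E": "Electrical", "F": "Fire Protection",
    "G": "General", "L": "Landscape", "M": "Mechanical", "P": "Plumbing",
    "S": "Structural", "T": "Telecommunications",
}


def discipline_of(layer_name):
    """Guess a layer's discipline from an NCS-style name such as 'A-WALL'."""
    prefix, sep, _ = layer_name.partition("-")
    # Require a hyphen and a short designator so names like 'SECTION-CUTS' are ignored
    if not sep or not 1 <= len(prefix) <= 2:
        return None
    return DISCIPLINES.get(prefix[0].upper())


def summarize_disciplines(layer_names):
    """Count layers per recognized discipline, skipping unrecognized names."""
    return Counter(d for d in map(discipline_of, layer_names) if d is not None)
```

With an ezdxf document you would call `summarize_disciplines(layer.dxf.name for layer in doc.layers)`.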
Count entities and blocks
Entity counts give you a quick profile of the drawing's complexity:
```python
msp = doc.modelspace()
entity_counts = {}
for entity in msp:
    etype = entity.dxftype()
    entity_counts[etype] = entity_counts.get(etype, 0) + 1

for etype, count in sorted(entity_counts.items()):
    print(f"{etype}: {count}")
```
Block definitions show reusable components. A drawing with 500 INSERT references to a "DOOR-36" block tells you more about the building than the raw line count does. This loop lists each named block definition and the number of entities it contains:
```python
for block in doc.blocks:
    if not block.name.startswith("*"):
        entities = list(block)
        print(f"Block: {block.name} ({len(entities)} entities)")
```
Blocks whose names start with * are either anonymous blocks (generated for dimensions, hatches, and similar objects) or the *Model_Space and *Paper_Space layout blocks, and are usually excluded from metadata reports.
Reading DWG Files with ODA File Converter and LibreDWG
DWG is a proprietary binary format. Autodesk has never published a full public specification, so third-party tools rely on reverse-engineered documentation or licensed SDKs. Two free options handle DWG reading for metadata extraction: ODA File Converter and GNU LibreDWG.
ODA File Converter
The Open Design Alliance provides a free converter that transforms DWG files to DXF (and vice versa). It runs on Windows, macOS, and Linux, and supports both a GUI and command-line interface.
Download it from the Open Design Alliance website. After installation, the command-line usage looks like this:
```bash
ODAFileConverter "input_folder" "output_folder" ACAD2018 DXF 0 1
```
The parameters are: input directory, output directory, output version (ACAD2018), output format (DXF), recurse subdirectories (0 = no, 1 = yes), and audit (0 = no, 1 = yes). An optional seventh argument filters the input files (for example "*.DWG"); by default the converter processes every DWG and DXF file in the input folder and writes the converted files to the output folder.
Once you have DXF files, use ezdxf to read them as shown in the previous section. This two-step approach (convert then parse) is the most reliable free method for DWG metadata extraction.
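If you drive the converter from a Python pipeline, building the argument list explicitly avoids shell-quoting problems with folder names that contain spaces. A minimal sketch, assuming the positional argument order described above; the executable name or full path depends on your platform and install location, and the helper names are illustrative:

```python
import subprocess


def build_oda_command(exe, input_dir, output_dir, version="ACAD2018",
                      fmt="DXF", recurse=False, audit=True, file_filter=None):
    """Build the ODA File Converter argument list in its positional order."""
    cmd = [
        str(exe), str(input_dir), str(output_dir),
        version,                  # output version, e.g. "ACAD2018"
        fmt,                      # output format, e.g. "DXF"
        "1" if recurse else "0",  # recurse into subdirectories
        "1" if audit else "0",    # audit each file
    ]
    if file_filter:
        cmd.append(file_filter)   # optional input filter such as "*.DWG"
    return cmd


def run_oda(cmd):
    """Run the converter; passing a list means no shell quoting is involved."""
    # Don't rely on the exit code alone: also confirm the expected output files exist.
    subprocess.run(cmd, check=True)
```

For example, `run_oda(build_oda_command("ODAFileConverter", "drawings_dwg", "drawings_dxf", recurse=True))` mirrors the command line shown earlier with recursion turned on.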
ezdxf's built-in ODA integration
The ezdxf library includes an odafc addon that wraps ODA File Converter, so you can load DWG files directly in Python without a manual conversion step:
```python
from ezdxf.addons import odafc

doc = odafc.readfile("structure.dwg")
print(f"Loaded as DXF version: {doc.dxfversion}")

version = doc.header["$ACADVER"]
units = doc.header["$INSUNITS"]
print(f"AutoCAD version: {version}")
print(f"Drawing units: {units}")
```
Under the hood, odafc.readfile() converts the DWG to a temporary DXF file and loads it with ezdxf. You get the same full access to headers, layers, blocks, and entities. The ODA File Converter must be installed separately for this to work.
GNU LibreDWG
LibreDWG is a free C library licensed under GPLv3 that reads DWG files directly without converting to DXF first (its write support is more limited). It ships with several command-line utilities:
- dwgread: Dumps DWG file contents including header variables, objects, and entities
- dwg2dxf: Converts DWG to DXF format
- dwggrep: Searches for text strings across DWG files
- dwglayers: Lists all layer names in a DWG file
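To extract header metadata from a DWG file, convert it to DXF with dwg2dxf (which writes drawing.dxf to the current directory) and search the text header:

```bash
dwg2dxf drawing.dwg
grep -A 2 '^\$INSUNITS' drawing.dxf   # variable name, group code, value
grep '^\$' drawing.dxf                # every header variable name
```

For a full dump of objects and entities, run dwgread with a higher verbosity level such as -v3. For layer names only: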
```bash
dwglayers drawing.dwg
```
LibreDWG works well for quick command-line inspection and scripting, but its support for newer DWG versions (2018+) can be inconsistent. For production pipelines, the ODA converter is more reliable across AutoCAD versions.
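To use the LibreDWG tools from a Python script, a thin subprocess wrapper is usually enough. A minimal sketch, assuming dwglayers prints one layer name per line (verify against your LibreDWG version); list_dwg_layers is an illustrative name:

```python
import shutil
import subprocess


def list_dwg_layers(dwg_path, exe="dwglayers"):
    """Return layer names reported by LibreDWG's dwglayers utility."""
    exe_path = shutil.which(exe)
    if exe_path is None:
        raise FileNotFoundError(f"{exe} not found on PATH; is LibreDWG installed?")
    result = subprocess.run(
        [exe_path, str(dwg_path)],
        capture_output=True, text=True, errors="replace", check=True,
    )
    # Assumes one layer name per output line; blank lines are dropped
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Checking for the executable up front gives a clearer error than a bare subprocess failure when LibreDWG is missing from a build server.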
Commercial SDKs
If you need to process DWG files at scale without the conversion step, commercial options include the ODA Drawings SDK (C++/.NET), Aspose.CAD (Java/.NET/Python), and GroupDocs.Metadata. These provide direct binary parsing with full version support, but they come with licensing costs that may not justify the investment for occasional metadata extraction.
Centralize Your CAD Metadata Extraction
Upload DWG, DXF, and project documents to a Fast.io workspace. Define the fields you need in plain English and let Metadata Views extract structured data automatically. Free plan includes 50 GB storage and 5,000 credits per month, no credit card required.
Building a Batch Metadata Extraction Pipeline
Individual file inspection is useful for troubleshooting, but the real value of programmatic metadata extraction shows up when you need to catalog hundreds or thousands of CAD files. A batch pipeline pulls properties from every file in a directory tree and produces a structured report.
A complete batch extraction script
This script processes DXF files directly; the next subsection adds DWG support through ODA conversion:

```python
import csv
from pathlib import Path

import ezdxf


def extract_metadata(filepath):
    """Extract metadata from a single DXF file."""
    doc = ezdxf.readfile(str(filepath))
    header = doc.header

    layers = [layer.dxf.name for layer in doc.layers]
    msp = doc.modelspace()
    entity_count = sum(1 for _ in msp)
    # Skip anonymous and layout blocks, whose names start with "*"
    blocks = [b.name for b in doc.blocks if not b.name.startswith("*")]

    return {
        "filename": filepath.name,
        "version": header.get("$ACADVER", ""),
        "created": str(header.get("$TDCREATE", "")),
        "modified": str(header.get("$TDUPDATE", "")),
        "units": header.get("$INSUNITS", 0),
        "last_saved_by": header.get("$LASTSAVEDBY", ""),
        "layer_count": len(layers),
        "layers": "; ".join(layers),
        "entity_count": entity_count,
        "block_count": len(blocks),
    }


def batch_extract(directory, output_csv):
    """Process all DXF files in a directory tree and write one CSV row per file."""
    results = []
    for dxf_path in Path(directory).rglob("*.dxf"):
        try:
            results.append(extract_metadata(dxf_path))
            print(f"Extracted: {dxf_path.name}")
        except Exception as e:  # log the failure and keep going
            print(f"Failed: {dxf_path.name} - {e}")

    if results:
        with open(output_csv, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=results[0].keys())
            writer.writeheader()
            writer.writerows(results)
    print(f"Processed {len(results)} files -> {output_csv}")


batch_extract("/path/to/drawings", "cad_metadata.csv")
```
Adding DWG support with ODA
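To handle DWG files in the same pipeline, load each file according to its extension. The ezdxf odafc addon makes this straightforward:

```python
from pathlib import Path

import ezdxf
from ezdxf.addons import odafc


def load_cad_document(filepath):
    """Load a DXF file directly, or a DWG file through ODA File Converter."""
    filepath = Path(filepath)
    ext = filepath.suffix.lower()
    if ext == ".dwg":
        # Requires ODA File Converter to be installed
        return odafc.readfile(str(filepath))
    if ext == ".dxf":
        return ezdxf.readfile(str(filepath))
    raise ValueError(f"Unsupported format: {ext}")
```

Then replace the ezdxf.readfile() call in extract_metadata() with load_cad_document(filepath), and collect both extensions in one pass by filtering on the suffix, for example `[p for p in Path(directory).rglob("*") if p.suffix.lower() in {".dwg", ".dxf"}]`. This also catches uppercase extensions such as .DWG, which a glob pattern like "*.d[wx][gf]" misses on case-sensitive file systems (and that pattern would match .dwf files as well).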
Interpreting unit codes
The $INSUNITS integer needs translation for human-readable reports:
```python
UNIT_NAMES = {
    0: "Unitless", 1: "Inches", 2: "Feet",
    3: "Miles", 4: "Millimeters", 5: "Centimeters",
    6: "Meters", 7: "Kilometers", 8: "Microinches",
    9: "Mils", 10: "Yards", 11: "Angstroms",
    12: "Nanometers", 13: "Microns", 14: "Decimeters",
    15: "Decameters", 16: "Hectometers",
    17: "Gigameters", 18: "Astronomical Units",
    19: "Light Years", 20: "Parsecs",
    21: "US Survey Feet", 22: "US Survey Inches",
    23: "US Survey Yards", 24: "US Survey Miles",
}

units_code = doc.header.get("$INSUNITS", 0)
unit_name = UNIT_NAMES.get(units_code, "Unknown")
```
Most architectural drawings use inches or feet. Mechanical and structural engineering files typically use millimeters or meters. Civil engineering drawings sometimes use feet with decimal subdivisions. Knowing the unit system matters when you are comparing dimensions across files from different teams or regions.
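When you do compare dimensions across files, what you usually need is a conversion factor between two unit codes. A minimal sketch covering only the most common codes; the meters-per-unit values are the standard exact definitions, and conversion_factor is an illustrative name:

```python
# Meters per unit for common $INSUNITS codes
METERS_PER_UNIT = {
    1: 0.0254,        # inches
    2: 0.3048,        # feet
    3: 1609.344,      # miles
    4: 0.001,         # millimeters
    5: 0.01,          # centimeters
    6: 1.0,           # meters
    7: 1000.0,        # kilometers
    10: 0.9144,       # yards
    14: 0.1,          # decimeters
    21: 1200 / 3937,  # US survey feet
}


def conversion_factor(from_code, to_code):
    """Return the factor that converts lengths in from_code units to to_code units."""
    try:
        return METERS_PER_UNIT[from_code] / METERS_PER_UNIT[to_code]
    except KeyError as exc:
        # 0 (unitless) has no physical length, so it is deliberately absent
        raise ValueError(f"No conversion for $INSUNITS code {exc.args[0]}") from None
```

Unitless drawings (code 0) raise an error on purpose: silently treating them as meters or inches is a common source of scaling mistakes.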
Handling timestamp conversion
The $TDCREATE and $TDUPDATE values are Julian day fractions, which are not immediately useful. Convert them to standard dates:
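```python
from datetime import datetime, timedelta

# AutoCAD stores <Julian day number>.<fraction of the day since midnight>,
# so 2440588.0 corresponds to 1970-01-01 00:00.
UNIX_EPOCH_JD = 2440588.0


def julian_to_datetime(julian_value):
    """Convert an AutoCAD Julian date ($TDCREATE, $TDUPDATE) to a naive datetime."""
    if not julian_value:
        return None  # unpopulated timestamps are stored as 0
    return datetime(1970, 1, 1) + timedelta(days=julian_value - UNIX_EPOCH_JD)
```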
Note that some DXF files store timestamps as zero when the creating application does not populate them, so check for zero values before converting.
Structured Extraction with AI Platforms
Scripts work well when you know exactly which fields to extract and every file follows the same format. But CAD projects often involve mixed file versions, inconsistent property usage, and supplementary documents (specifications, revision logs, transmittal sheets) that carry metadata outside the DWG/DXF files themselves.
AI-powered extraction platforms handle this variability without custom parsing logic. Instead of writing regex patterns or field-specific code, you describe what you want in natural language and let the platform figure out where to find it.
Fast.io Metadata Views
Fast.io's Metadata Views turn uploaded documents into a live, queryable database. You describe the fields you want extracted in plain English, and the AI designs a typed schema with field types like Text, Integer, Decimal, Boolean, Date & Time, and JSON. The system matches files in your workspace and populates a sortable, filterable spreadsheet.
For CAD project management, you might define columns like:
- Drawing Number (Text): The unique identifier from the title block
- Revision (Text): Current revision letter or number
- Author (Text): Who created or last modified the drawing
- Last Modified (Date & Time): When the file was last saved
- Discipline (Text): Architectural, structural, mechanical, or electrical
Upload your CAD files (or supplementary PDFs, spreadsheets, and transmittals that reference drawing metadata) to a Fast.io workspace, define the columns, and the extraction runs automatically. Add new columns later without reprocessing existing files.
Agents can also create Views, trigger extraction, and query results through the Fast.io MCP server, which exposes Streamable HTTP at /mcp. This means you can integrate structured CAD metadata extraction into automated pipelines where an agent processes incoming drawings, extracts properties, and surfaces them for human review.
When to use scripts vs. AI extraction
Use ezdxf and ODA when you need exact header variables read directly from DXF and DWG files, like $ACADVER, $INSUNITS, or layer definitions. These tools parse the file format directly and give you precise, deterministic results.
Use AI extraction when metadata lives in title blocks, revision tables, or accompanying documents rather than header variables. Title block text is geometry (lines and text entities), not structured metadata. Parsing it with ezdxf means writing custom code to find the right text entities in the right locations, which breaks when title block layouts change. AI extraction reads the visual layout the way a human would.
The most effective approach combines both: use ezdxf for header-level metadata and an AI platform like Fast.io for title block and document-level metadata.
Common Pitfalls and Troubleshooting
CAD metadata extraction sounds straightforward until you encounter the real-world edge cases that accumulate across decades of AutoCAD versions and third-party CAD applications.
Missing or empty properties
Many CAD applications do not populate drawing properties automatically. A file created in BricsCAD, DraftSight, or a custom CAD application might have blank $LASTSAVEDBY and zero-valued timestamps. Your extraction code should treat empty values as expected rather than errors. Use header.get() with fallback defaults, and document which files returned incomplete metadata rather than failing silently.
DWG version compatibility
The DWG format has changed across AutoCAD versions. Files from AutoCAD 2000 (AC1015), 2004 (AC1018), 2007 (AC1021), 2010 (AC1024), 2013 (AC1027), and 2018 (AC1032) each use a different binary encoding. LibreDWG reads most of these versions but can struggle with the newest format (AC1032, used from AutoCAD 2018 onward). ODA File Converter has the broadest version support since the Open Design Alliance actively maintains compatibility.
If a DWG file fails to convert, check its version string: every DWG file begins with a six-character code such as AC1027, matching the $ACADVER values above. You may need a newer version of your conversion tool, or the file may be corrupted.
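Because that code sits in the first six bytes, you can read it without any CAD library. A minimal sketch; the release labels are the usual AutoCAD release ranges for each format, and dwg_version is an illustrative name:

```python
ACADVER_RELEASES = {
    "AC1009": "AutoCAD R11/R12",
    "AC1012": "AutoCAD R13",
    "AC1014": "AutoCAD R14",
    "AC1015": "AutoCAD 2000-2002",
    "AC1018": "AutoCAD 2004-2006",
    "AC1021": "AutoCAD 2007-2009",
    "AC1024": "AutoCAD 2010-2012",
    "AC1027": "AutoCAD 2013-2017",
    "AC1032": "AutoCAD 2018 and later",
}


def dwg_version(path):
    """Return (version_code, release_label) from a DWG file's first six bytes."""
    with open(path, "rb") as f:
        magic = f.read(6)
    code = magic.decode("ascii", errors="replace")
    return code, ACADVER_RELEASES.get(code, "unknown or not a DWG file")
```

Running this over the failure list from a batch job quickly shows whether failures cluster on one format version.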
Encoding issues with international text
The $DWGCODEPAGE variable specifies the character encoding. Files from Japanese, Chinese, Korean, or Eastern European AutoCAD installations use encodings like ANSI_932, ANSI_936, ANSI_949, or ANSI_1250. If layer names or text properties display as garbled characters, you are reading with the wrong encoding.
ezdxf detects the encoding automatically for most files, but you can override it for DXF files older than R2007 (R2007 and later are always UTF-8, and ezdxf ignores the argument for them):

```python
doc = ezdxf.readfile("drawing.dxf", encoding="cp932")
```
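The encoding argument expects a Python codec name, while $DWGCODEPAGE uses AutoCAD's name for it. A minimal sketch of the mapping, assuming the common ANSI_<number> values correspond to Python's cp<number> codecs and falling back to cp1252 otherwise; the function name is illustrative:

```python
import codecs


def codepage_to_python_encoding(dwgcodepage, default="cp1252"):
    """Map a $DWGCODEPAGE value such as 'ANSI_932' to a Python codec name."""
    value = dwgcodepage.strip().upper()
    if value.startswith("ANSI_"):
        candidate = "cp" + value[len("ANSI_"):]
        try:
            codecs.lookup(candidate)  # confirm Python actually has this codec
            return candidate
        except LookupError:
            pass
    return default
```

The $DWGCODEPAGE value itself is plain ASCII, so you can read it from the header first and then reload the file with the matching encoding.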
Proxy entities and custom objects
Drawings created with AutoCAD-based vertical products (Civil 3D, Plant 3D), or exported to DWG from other applications such as Revit, may contain proxy entities and custom objects that third-party readers cannot fully parse. The metadata for these entities is often incomplete or inaccessible. The standard drawing properties and header variables are still readable, but you may see warnings about unresolved proxies.
Title block metadata vs. header metadata
Architects and engineers often store critical metadata (drawing number, project name, revision, approver signatures) in the title block rather than in drawing properties. Title blocks are composed of LINE, TEXT, MTEXT, and ATTRIB entities placed at specific coordinates. Extracting this data requires either positional logic tied to a specific title block template, or attribute extraction from block references.
If your title block uses attributed blocks (the recommended AutoCAD practice), ezdxf can read the attributes:
```python
msp = doc.modelspace()
for insert in msp.query("INSERT"):
    if insert.dxf.name == "TITLE_BLOCK":
        for attrib in insert.attribs:
            print(f"{attrib.dxf.tag}: {attrib.dxf.text}")
```
This works reliably when the block name and attribute tags are consistent across your drawing set. When they are not, AI extraction becomes the more practical path.
Batch processing failures
When processing large file sets, expect a failure rate between 1% and 5% due to corrupted files, unsupported versions, or encoding problems. Build your pipeline to log failures and continue processing rather than stopping on the first error. After the batch completes, review the failure log and handle edge cases individually.
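A minimal sketch of that log-and-continue pattern, assuming extract is any per-file function (such as extract_metadata from the batch script) and that a CSV failure log is convenient for later review; run_batch is an illustrative name:

```python
import csv


def run_batch(paths, extract, failure_log):
    """Apply extract() to each path, logging failures to CSV instead of stopping."""
    results, failures = [], []
    for path in paths:
        try:
            results.append(extract(path))
        except Exception as exc:  # record what went wrong and move on
            failures.append({"file": str(path), "error": type(exc).__name__, "detail": str(exc)})

    with open(failure_log, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "error", "detail"])
        writer.writeheader()
        writer.writerows(failures)
    return results, failures
```

Recording the exception type alongside the message makes it easy to group the log by cause (structure errors, missing converter, encoding problems) when you work through it afterward.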
Frequently Asked Questions
How do I extract metadata from a DWG file?
DWG is a proprietary binary format, so you cannot read it with a text editor. Use ODA File Converter to convert the DWG to DXF, then parse the DXF with Python ezdxf. Alternatively, use the ezdxf odafc addon (`from ezdxf.addons import odafc`) to load DWG files directly: `doc = odafc.readfile('drawing.dwg')`. You can then access header variables like $ACADVER, $TDCREATE, $INSUNITS, and $LASTSAVEDBY through the dictionary-like doc.header object.
What metadata is stored in DXF files?
DXF files store metadata in a human-readable HEADER section containing over 200 system variables. Key metadata includes the AutoCAD version ($ACADVER), creation and modification timestamps ($TDCREATE, $TDUPDATE), drawing units ($INSUNITS), last saved by ($LASTSAVEDBY), and character encoding ($DWGCODEPAGE). Beyond the header, DXF files also contain layer definitions, block references, dimension styles, and text styles as structural metadata.
Can Python read DWG file properties?
Python cannot read DWG files natively because the format is proprietary binary. However, the ezdxf library's odafc addon wraps ODA File Converter to convert DWG to DXF transparently, giving you full access to header variables, layers, blocks, and entities. Libraries like Aspose.CAD and GroupDocs.Metadata for Python also provide direct DWG parsing through commercial licenses.
How to batch extract CAD file metadata?
Use Python's pathlib to glob DXF files in a directory tree, then call ezdxf.readfile() on each file to extract header variables, layer counts, and entity statistics. Write the results to CSV or JSON for analysis. For DWG files, add a conversion step using ODA File Converter or the ezdxf odafc addon. Expect a 1-5% failure rate on large file sets due to version mismatches or corruption, so build error handling into your pipeline.
What is the difference between DWG and DXF metadata?
Both formats store the same core metadata (version, timestamps, units, layers, blocks), but they encode it differently. DXF is a text-based format where metadata appears as readable variable names in the HEADER section. DWG is a proprietary binary format that requires specialized parsers like ODA SDK or LibreDWG to read. DXF is easier to inspect manually, while DWG is more compact and is the format AutoCAD uses natively.
How do I extract title block information from CAD files?
Title block data (drawing number, project name, revision) is typically stored as block attributes rather than header variables. In ezdxf, query for INSERT entities matching your title block name, then iterate through the attribs collection to read tag-value pairs. If your title blocks do not use attributed blocks, the data exists as positioned TEXT or MTEXT entities, which requires positional parsing or AI-based extraction.