How to Redact a PDF Properly (Why a Black Rectangle Doesn't Work)

Written by The PDFOutfit Team

Updated May 28, 2026 • 13 min read

🔑 Key Takeaways

A black rectangle drawn on top of text is often not real redaction. In many cases the underlying characters remain selectable, copyable, and searchable in the file.
True redaction permanently removes or destroys the underlying content. A visible black box should be a marker for where content was — not the thing hiding it.
Several famous failures, including Manafort (2019), TSA (2009), and the U.S. military's Calipari report (2005), involved black rectangles placed over intact text.
A safer workflow: apply true redaction, remove hidden metadata, flatten or sanitize annotations and form fields, then verify by searching and copy-pasting the output.
HIPAA's 18 PHI identifiers and Federal Rule of Civil Procedure 5.2 set specific redaction requirements for healthcare and federal court filings.

How Do You Redact a PDF Properly?

A black rectangle drawn over text in a PDF is often a separate object placed on top of unchanged text, and in that case anyone can select, copy, and read the underlying content. Real redaction permanently removes or destroys the affected content in the file itself. A safer workflow has four steps: (1) apply a true redaction operation; (2) remove document metadata that might identify the author, software, or origin; (3) flatten or sanitize annotation and form-field layers when applicable; (4) verify the output by searching for the redacted terms and attempting to copy-paste the redacted areas. PDFOutfit's Redact Text tool offers a secure, image-based redaction mode (on by default) that rasterizes the affected pages, converting the text into pixels so the original character data no longer exists in the file. Files are processed locally in your browser; document contents are not uploaded to PDFOutfit's servers for processing.

The Court Filing That Compromised an Investigation

On January 8, 2019, Paul Manafort's lawyers filed a 44-page response in the U.S. District Court for the District of Columbia.

The filing was the defense's reply to allegations from special counsel Robert Mueller that Manafort, the former chair of Donald Trump's 2016 presidential campaign, had lied to investigators in breach of his plea agreement. Roughly a fifth of the document was supposed to be redacted — black bars covering sensitive details about Manafort's alleged contacts with Konstantin Kilimnik, a business associate the FBI has tied to Russian intelligence.

Within hours of the document being unsealed, journalists from the Associated Press, Fortune, and The Hill noticed something the defense team apparently did not: the "redacted" sections were copy-pasteable.

The lawyers had drawn black rectangles over the sensitive text in their PDF. The underlying characters were untouched. Anyone could select the black box area, copy it, paste it into a plain text editor, and read the contents.

What came out: Manafort had shared 2016 campaign polling data with Kilimnik. He had met with him in Madrid. He had discussed a Ukrainian peace plan with him on more than one occasion during the campaign. These were the first publicly attributed details linking a Trump campaign official to someone the U.S. intelligence community had tied to the Kremlin — disclosed accidentally, by a "redaction" that didn't redact.

This is what every professional who handles confidential documents needs to understand: a black rectangle alone is not redaction. It can be a visual cover sitting on top of fully intact text. Real redaction is a different operation — and some tools make it easy to confuse visual markup with true content removal.

What Real Redaction Actually Is

In modern legal and document-handling contexts, to "redact" means to permanently remove specified information from a document such that the underlying data is destroyed, not just hidden. This distinction matters because of how PDFs are built.

A PDF can store text, graphics, images, annotations, form fields, comments, and metadata as separate objects within the file. When a reader displays a page, it composes the visible result by rendering all of these objects together. A black rectangle drawn over text typically adds a new annotation object on top of the original text object — it doesn't modify the text underneath. The visible result looks redacted. The file structure tells a different story.

Three Common Failure Modes

Annotation rectangles over text. The Manafort case. The black box is a separate object; the text remains selectable, copyable, and searchable.
White text on white background. Common in Word-to-PDF conversions. The text is invisible to the eye but still present and fully searchable.
Image overlays without text-layer modification. Tools that place a black bar image over the text area without touching the underlying PDF text.
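
To make the failure modes above concrete, here is a minimal sketch of a page whose text has been covered by a filled black rectangle. The content stream is hand-written for illustration (it is not taken from any real filing), and the extractor is a deliberately naive regular expression, not a full PDF parser.

```python
import re

# Hypothetical, hand-written content stream for one page (not from a real file).
# First line: show a text string. Second line: paint a filled black rectangle
# over the same area, which is what a cosmetic "redaction" amounts to.
CONTENT_STREAM = (
    b"BT /F1 12 Tf 72 700 Td (SSN 123-45-6789) Tj ET\n"
    b"0 0 0 rg 70 695 130 16 re f\n"
)

def extract_literal_text(stream: bytes) -> list[str]:
    """Return every literal string that the stream shows with the Tj operator."""
    return [s.decode("latin-1") for s in re.findall(rb"\(([^()]*)\)\s*Tj", stream)]

recovered = extract_literal_text(CONTENT_STREAM)
print(recovered)  # the covered text is still in the stream
```

The rectangle changes what a viewer paints, not what the file stores. Any tool that reads the text operators, including ordinary copy and paste, still sees the string.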

Real redaction takes one of two paths:

Content-stream modification — the tool edits the PDF's internal structure to remove the character codes representing the redacted information. The black box in the output is a marker showing where the removed content used to be; the effectiveness comes from the underlying data being gone.
Page rasterization — the tool renders the affected pages to images, then rebuilds the page from those images. The original text objects cease to exist because the page is no longer composed of text objects.

Both approaches can produce real redaction when implemented correctly. They differ in cost — content-stream modification preserves searchability and accessibility for non-redacted content; rasterization is simpler to reason about but converts whole pages to images.
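
As a rough illustration of the content-stream path, the sketch below empties the operand of a text-showing operator when it contains the sensitive string. It assumes text stored as plain literal strings shown with Tj. Real files also use hex strings, TJ arrays, and font-specific encodings, so treat this as a toy model of the idea rather than how any particular product implements it. The function name is made up for this example.

```python
import re

def blank_matching_text(stream: bytes, secret: bytes) -> bytes:
    """Empty the operand of any Tj operator whose literal string contains `secret`."""
    def replace(match: re.Match) -> bytes:
        # Note: this blanks the whole string, which is coarser than a real tool.
        return b"() Tj" if secret in match.group(1) else match.group(0)
    return re.sub(rb"\(([^()]*)\)\s*Tj", replace, stream)

# Hypothetical two-line content stream.
before = (
    b"BT /F1 12 Tf 72 720 Td (Patient intake form) Tj ET\n"
    b"BT /F1 12 Tf 72 700 Td (SSN 123-45-6789) Tj ET\n"
)
after = blank_matching_text(before, b"123-45-6789")
print(after.decode("latin-1"))
```

The untouched line stays as text, which is why this approach keeps the rest of the page searchable.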

A simple first check: open any redacted PDF, press Ctrl+F (or Cmd+F), and search for the redacted term. Zero results is a useful early signal. But for high-risk documents, search alone is not enough — there are additional places redacted content can survive, which is why this guide presents redaction as a four-step workflow rather than a single operation.
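
The search check can also be scripted. The sketch below looks for a term in the raw file bytes and inside any stream that decompresses with zlib (the Flate filter). It is a crude screen built on assumptions: it will miss hex strings, custom font encodings, encrypted files, and other filters, so a clean result is a signal, not proof. The function name and sample bytes are hypothetical.

```python
import re
import zlib

# Matches the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def term_survives(pdf_bytes: bytes, term: str) -> bool:
    """Return True if `term` appears in the raw file or in a Flate-compressed stream."""
    needle = term.encode("utf-8")
    if needle in pdf_bytes:
        return True
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # uncompressed, or a filter this sketch does not handle
        if needle in data:
            return True
    return False

# Hypothetical fragment of a PDF object with one compressed content stream.
fake_pdf = (
    b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(b"BT (met Kilimnik in Madrid) Tj ET")
    + b"\nendstream\nendobj\n"
)
print(term_survives(fake_pdf, "Kilimnik"))
```

With a real file you would pass the result of reading it in binary mode, for example the bytes from open(path, "rb").read().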

The Hall of Fame of Failed Redactions

The Manafort case is the most famous example, but not the first or most consequential. Improper redaction has embarrassed governments, corporations, and law firms for at least two decades.

Manafort court filing — January 8, 2019

The case detailed above. Defense lawyers used black rectangles on top of intact PDF text in a federal court filing. Reporters extracted the entire redacted contents within hours. The disclosed information became central to the public narrative of the Mueller investigation's findings on the campaign's Russia contacts.

TSA Standard Operating Procedures manual — December 2009

The Transportation Security Administration posted a 93-page screening procedures manual on a federal procurement website. The redacted sections covered methods for screening foreign dignitaries and CIA-escorted travelers, calibration settings for X-ray machines and metal detectors, sample images of DHS and CIA credentials, and lists of countries whose citizens were always subject to enhanced screening.

The redaction method: black boxes drawn on top of intact text. Within days, copies of the unredacted version appeared on Cryptome and WikiLeaks, and the document remained accessible permanently. TSA placed five employees on administrative leave.

U.S. military Calipari incident report — May 2005

On March 4, 2005, U.S. forces in Iraq fired on a vehicle near Baghdad airport, killing Italian intelligence officer Nicola Calipari, who was escorting a just-released kidnapping hostage. The Multi-National Force Iraq investigation report was posted online with extensive redactions covering classified procedures, rules of engagement, and personnel names. As The Register reported on May 3, 2005, the file had not been saved with the edits actually applied, so a simple copy-and-paste revealed the full report. Italian journalists extracted and published the unredacted contents.

The pattern

In all three cases the people doing the redacting were professionals (defense lawyers, security officials, and military staff), not careless amateurs. The error was always the same: confusing a visual overlay with content modification. Drawing on top of a PDF is not the same as removing content from a PDF.

A Safer Redaction Workflow (Four Steps)

Real redaction is a sequence of operations, not a single tool. Here is a workflow that protects against the failure modes above.

Step 1: Apply true redaction

Use a tool that either modifies the PDF's content stream to remove character codes, or rasterizes the affected pages so the original text objects no longer exist. PDFOutfit's Redact Text tool defaults to the rasterization approach — converting affected pages to images so the underlying text data is no longer present. After this step, the redacted information should not be present in the file; the visible black indicator is just a marker showing where content was removed.

Step 2: Remove hidden metadata

PDFs carry hidden metadata that often reveals as much as the visible content: author name (inherited from your OS user account), software identifiers like "Microsoft Word for Mac 16.78," creation timestamps to the second, and sometimes the file path on the originating computer. None of this is removed by redacting visible text. Use Edit Metadata to clear or replace the author, creator, producer, and timestamp fields — especially important for documents released publicly.
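
If you want to see what a file reveals before cleaning it, a quick scan for common document-information keys can help. This is a minimal sketch that assumes the values are stored as plain literal strings in an uncompressed part of the file. It will miss values stored as hex strings, values containing unescaped parentheses, XMP metadata, and anything inside compressed object streams. The sample bytes are invented.

```python
import re

# Common Info-dictionary keys that can identify the author or origin.
INFO_KEYS = (b"Author", b"Creator", b"Producer", b"Title", b"CreationDate", b"ModDate")

def find_info_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Return Info-dictionary values that are stored as plain literal strings."""
    found = {}
    for key in INFO_KEYS:
        match = re.search(rb"/" + key + rb"\s*\(((?:\\.|[^\\()])*)\)", pdf_bytes)
        if match:
            found[key.decode("ascii")] = match.group(1).decode("latin-1")
    return found

# Hypothetical Info dictionary as it might appear in a file.
sample = (
    b"<< /Author (jsmith) /Producer (Microsoft Word for Mac 16.78) "
    b"/CreationDate (D:20260528101502Z) >>"
)
print(find_info_fields(sample))
```

An empty result from this scan does not mean the file is clean. It only means nothing was found in the one place this sketch looks.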

Step 3: Flatten or sanitize annotations and form fields (when applicable)

Even after redaction and metadata cleanup, the file may still contain annotation layers, comment threads, form fields, and tracked changes that aren't visible on the rendered page but are present in the file structure. Flatten PDF consolidates annotations into the rendered content. When the redaction step rasterizes the affected pages — as PDFOutfit's secure redaction does — much of this concern is handled automatically for those pages, because annotations have been baked into the image.

Step 4: Verify

Verification is non-negotiable. Three checks for any redacted document before transmission:

Search check. Search (Ctrl+F / Cmd+F) for each redacted term. Zero results is a useful first signal — but not the whole story.
Copy-paste check. Try to select the black indicator areas, copy, and paste into a plain text editor. Nothing relevant should paste.
Hidden-content check. For high-risk documents, verify there are no OCR layers, embedded files, hidden form fields, comments, alternate encodings, or thumbnail caches preserving the content.

If any check reveals the redacted content is still recoverable, do not transmit the document. Repeat the redaction — for PDFOutfit, ensure Secure redaction is enabled — and verify again.
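
For the hidden-content check, one low-effort starting point is to look for PDF name objects that signal content outside the visible page text. The sketch below is an assumption-laden screen: it only sees names written literally in uncompressed parts of the file, so files that use compressed object streams need a proper PDF library instead. A hit means "review this", not "this leaks".

```python
# PDF names that hint at content outside the visible page text.
MARKERS = {
    b"/Annots": "annotations",
    b"/AcroForm": "form fields",
    b"/EmbeddedFile": "embedded files",
    b"/Thumb": "page thumbnails",
    b"/Metadata": "XMP metadata stream",
}

def hidden_content_hints(pdf_bytes: bytes) -> list[str]:
    """Return a label for each marker name found in the raw file bytes."""
    return [label for name, label in MARKERS.items() if name in pdf_bytes]

# Hypothetical page dictionary with annotations and a thumbnail.
sample_page = b"<< /Type /Page /Annots [4 0 R] /Thumb 9 0 R >>"
print(hidden_content_hints(sample_page))
```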

How to Redact a PDF in PDFOutfit

PDFOutfit's Redact Text tool runs entirely in your browser. Files are processed locally; document contents are not uploaded to PDFOutfit's servers. This matters specifically for redaction, because the documents being redacted are, by definition, sensitive.

⚠ Before you redact: Always work on a copy of the file. Keep the original stored separately in case you need it for an audit trail or to redo the redaction with different settings. Once a page is rasterized, the redacted version cannot be reverted.

Step-by-step walkthrough

Open pdfoutfit.com/redact-text in any modern browser.
Drop your PDF into the upload area.
Confirm Secure redaction is on — it's the default. This mode rasterizes affected pages so the underlying text cannot be recovered.
Either select text directly on the page preview, or use the search feature to redact all instances of a recurring term — Social Security Numbers, phone numbers, email addresses, or custom patterns for case and account numbers.
Click Redact Text (or Redact N Areas if you drew boxes).
Download the redacted file.

The output file has the affected pages converted to images with the redacted areas removed at the pixel level. The original text objects on those pages no longer exist in the PDF.
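
Pattern-based redaction of recurring terms usually comes down to regular expressions. The patterns below are illustrative assumptions for US-style formats, not the expressions PDFOutfit uses, and they are intentionally simple. They are a way to preview what a pattern search would catch in text you have extracted yourself.

```python
import re

# Illustrative patterns only; real-world formats vary more than this.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_phone": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
}

def find_matches(text: str) -> dict[str, list[str]]:
    """Return every match for each named pattern."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}

sample_text = "Call 202-555-0143 or mail jane.doe@example.com; SSN 123-45-6789."
print(find_matches(sample_text))
```

Patterns like these both over-match and under-match, so review the hit list before redacting and search again afterward for variants such as SSNs written without dashes.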

What PDFOutfit's redactor does that simpler tools don't

True redaction by default. Secure redaction is the default mode and converts affected pages to images, so the original text data is no longer in the file structure.
Pattern matching. Redact every SSN, every email address, or every instance of a name in one operation rather than clicking each individually.
Local processing in the browser. For documents under legal hold, attorney-client privilege, or HIPAA protection, the fact that the file is not uploaded to a server is often the deciding factor.

How PDFOutfit's Redactor Actually Works

A PDF file is structured as a series of objects. One category contains the document's text content — character codes stored alongside positioning data. Another contains the document's annotations — rectangles, highlights, comments. When a tool draws a black rectangle annotation over text, the PDF stores the rectangle as a new annotation object and the original text object is untouched. At render time the text is drawn first, then the annotation on top. The visual result looks redacted; the file structure is not.

PDFOutfit's redactor takes a different path. Rather than editing the PDF's content stream to delete specific character codes, it rasterizes the affected pages — rendering them to images, then rebuilding the page from those images. The original text objects no longer exist because the page is no longer composed of text objects. It's a single flattened image with the redacted regions removed at the pixel level.
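
A toy model can make the rasterization idea easier to picture. In the sketch below a "page" is a list of text objects, rendering turns them into a grid of ink and blank cells, and redaction fills a box in that grid. Everything here is a simplification invented for illustration (one grid cell per character, no fonts, no real image format). The only point is that the output contains a grid and no text objects.

```python
from dataclasses import dataclass

@dataclass
class TextObject:
    text: str
    row: int  # grid row the text sits on
    col: int  # grid column of the first character

def rasterize(objects: list[TextObject], width: int, height: int) -> list[list[int]]:
    """Render text objects to a grid: 1 = ink, 0 = blank. Characters are not kept."""
    grid = [[0] * width for _ in range(height)]
    for obj in objects:
        for offset, ch in enumerate(obj.text):
            if ch != " " and obj.col + offset < width:
                grid[obj.row][obj.col + offset] = 1
    return grid

def redact(objects, width, height, box):
    """Rasterize, then fill `box` (top, left, bottom, right; inclusive) with ink."""
    grid = rasterize(objects, width, height)
    top, left, bottom, right = box
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            grid[r][c] = 1
    return grid  # the output page is only this grid; no TextObject survives

page = [TextObject("Name: Jane", 0, 0), TextObject("SSN 123-45-6789", 1, 0)]
image = redact(page, width=20, height=2, box=(1, 3, 1, 15))
for row in image:
    print("".join("#" if cell else "." for cell in row))
```

Inside the box every cell is ink, so nothing about the original characters can be read back from the output.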

This produces a security model similar to printing the document, redacting it on paper with a marker, and scanning the result back into a PDF, a long-standing practice for sensitive paper documents. PDFOutfit does the digital version without the printer.

Compared with content-stream modification (the approach Adobe Acrobat Pro uses), rasterization has different strengths and costs. Content-stream modification preserves text searchability and accessibility for non-redacted text on each page; rasterization destroys those for the entire page. Both can produce true redaction when implemented correctly, in the sense that matters most: the original redacted information is no longer recoverable from the file.

The Honest Trade-offs of Image-Based Redaction

Rasterization is not free. Here is what changes when affected pages are converted to images:

File size increases. A 200 KB text-only page can become 1–3 MB as an image. Compress PDF can recover some of this after redaction.
Searchability ends on rasterized pages. Recipients can no longer use Ctrl+F to find any text on a redacted page. Pages without redactions remain searchable.
Accessibility is reduced on rasterized pages. Screen readers cannot read image-based text. If a document must be both redacted and accessible, content-stream redaction (Acrobat Pro) is typically the better choice.
Text crispness depends on resolution. Rasterized pages look like scanned documents — fine at default zoom, pixelated when zoomed past native resolution.
Copy-paste of any text on the page is gone. Recipients can't extract a quoted paragraph from a redacted page, even content that wasn't part of the redaction.
PDF/A conformance may be affected. Documents that must comply with archival standards should be re-validated after redaction.
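
The file-size and crispness trade-offs above both follow from resolution. The arithmetic below assumes a US Letter page (8.5 × 11 inches) stored as 24-bit RGB at 150 and 300 DPI. These values are chosen for illustration and are not PDFOutfit's documented settings.

```python
def raster_page(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int = 3):
    """Return (width_px, height_px, uncompressed_bytes) for a rasterized page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel

for dpi in (150, 300):
    w, h, raw = raster_page(8.5, 11, dpi)
    print(f"{dpi} DPI: {w} x {h} px, {raw / 1_000_000:.1f} MB uncompressed")
# 150 DPI: 1275 x 1650 px, 6.3 MB uncompressed
# 300 DPI: 2550 x 3300 px, 25.2 MB uncompressed
```

Image compression brings these raw figures down substantially, which is how a rasterized page can land in the low-megabyte range. Doubling the DPI quadruples the pixel count, so sharper text at high zoom is paid for in file size.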

These trade-offs are why image-based redaction isn't right for every document. For a document that must remain text-searchable for the recipient, rasterization is the wrong tool. But for documents where the cost of accidental disclosure exceeds the cost of larger files and reduced searchability — most of the use cases redaction was invented for — rasterization is among the strongest available options.

Redaction vs. Flattening vs. Metadata Removal

These three operations are often confused — but they protect against different risks. A complete document-sanitization workflow usually involves all three.

Operation	What it does	Protects against	What it doesn't do
Redaction	Permanently removes specific visible content. With rasterization, affected pages become images so the original text data is gone.	Disclosure of the redacted content.	Doesn't remove metadata or annotations outside the redacted area.
Flattening	Consolidates annotations, comments, form fields, and tracked changes into the rendered content.	Recovery of hidden comments, draft notes, tracked changes, or form responses.	Doesn't redact visible content or remove metadata fields.
Metadata removal	Clears or replaces hidden document properties — author, software, timestamps, file paths, custom XMP fields.	Identification of the author, organization, or origin computer.	Doesn't redact visible content or remove annotations.

A common mistake is doing one of these and assuming the others are handled. Redacting a name doesn't remove the author's name from metadata. Flattening doesn't redact what's visible. Removing metadata doesn't address collapsed comments still present in the file. For documents going outside your organization, plan for all three — and verify after each.

The Most Common Redaction Mistakes

Eight mistakes recur in failed redactions:

Using a highlight or annotation tool instead of a redaction tool. Highlights, comments, and shapes are annotation objects — they cover text visually without modifying it.
Drawing a black rectangle in a generic PDF editor. The rectangle is a separate object placed over intact text.
Using "fill" with black or white to cover text. The fill is a shape object; the text underneath remains.
Failing to remove metadata. The visible text is gone but document properties still identify author, employer, software, and timestamp.
Skipping the verification step. A quick search and copy-paste test would have caught each of the public failures described above.
Redacting only visible text, ignoring margin comments. Comments and tracked changes persist even when collapsed in the UI.
Not addressing OCR layers in scans. A scanned PDF may have an invisible OCR text layer beneath the image; redacting the image alone leaves it intact.
Forgetting that earlier drafts exist elsewhere. Drafts shared with reviewers may persist as previous versions in collaboration tools.

Industry-Specific Redaction Requirements

Note: This guide is technical information about how PDF redaction works. It is not legal advice. Compliance rules vary by jurisdiction, document type, and use case. Consult counsel for specific situations.

HIPAA — Healthcare

The HIPAA Privacy Rule's Safe Harbor de-identification standard at 45 CFR § 164.514(b)(2)(i) lists 18 categories of identifiers that must be removed before health information qualifies as de-identified:

Names
Geographic subdivisions smaller than state
All date elements (except year) tied to an individual
Telephone numbers
Fax numbers
Email addresses
Social Security Numbers
Medical record numbers
Health plan beneficiary numbers
Account numbers
Certificate or license numbers
Vehicle identifiers and serial numbers
Device identifiers and serial numbers
Web URLs
IP addresses
Biometric identifiers
Full-face photographs and comparable images
Any other unique identifying number or code

Under Safe Harbor, covered entities generally must remove all 18 categories and have no actual knowledge that the remaining information could identify the individual. The HHS Office for Civil Rights guidance covers both the Safe Harbor and Expert Determination methods in detail.

FRCP Rule 5.2 — Federal Court Filings

Federal Rule of Civil Procedure 5.2 requires that filings in federal court redact:

Social Security Numbers to the last four digits
Taxpayer Identification Numbers to the last four digits
Names of minor children to initials only
Dates of birth to the year only
Financial account numbers to the last four digits

State court rules typically mirror this but vary by jurisdiction. Bankruptcy filings under FRBP 9037 have additional requirements.
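
The Rule 5.2 truncations listed above are simple enough to express as string operations. The helper names and the masking style ("XXX-XX-", "XXXX") are assumptions made for this sketch. The rule specifies what may remain, not how the removed part must be displayed, so check your court's local practice.

```python
import re

def ssn_last_four(value: str) -> str:
    """Keep only the last four digits of an SSN or taxpayer ID."""
    digits = re.sub(r"\D", "", value)
    return "XXX-XX-" + digits[-4:]

def account_last_four(value: str) -> str:
    """Keep only the last four digits of a financial account number."""
    digits = re.sub(r"\D", "", value)
    return "XXXX" + digits[-4:]

def birth_year_only(iso_date: str) -> str:
    """Expects an ISO date such as 2011-03-09 and keeps only the year."""
    return iso_date[:4]

def minor_initials(full_name: str) -> str:
    """Reduce a minor's name to initials."""
    return "".join(part[0].upper() + "." for part in full_name.split())

print(ssn_last_four("123-45-6789"))          # XXX-XX-6789
print(account_last_four("0012 3456 7890"))   # XXXX7890
print(birth_year_only("2011-03-09"))         # 2011
print(minor_initials("Jane Q Doe"))          # J.Q.D.
```

These helpers only produce the replacement text. The original values still have to be removed from the PDF itself with a true redaction step.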

GDPR — European Personal Data

GDPR Article 17 establishes a right to erasure that can require deletion of personal data in certain circumstances. The right is not absolute: it depends on the legal basis for the original processing and is subject to several exceptions. If a PDF is shared after applicable data has been erased, the data should be removed from the file, not merely covered visually. The same technical options apply: remove the text from the content stream or rasterize the affected pages.

Frequently Asked Questions

Is drawing a black rectangle the same as redacting?

In most cases, no. A black rectangle in a PDF is often an annotation drawn on top of the document's existing text layer. The underlying text is unchanged and can be extracted by anyone with a PDF reader and the ability to copy and paste. Real redaction removes or destroys the text in the file structure itself, either by editing the PDF's content stream or by rasterizing the affected pages.

How do I know if my redaction actually worked?

Three checks. First, open the redacted file and search (Ctrl+F / Cmd+F) for one of the terms you redacted — zero matches is a useful first signal. Second, try selecting the black areas with your mouse, copying, and pasting into a plain text editor — nothing relevant should paste from redacted regions. Third, for high-risk documents, additionally verify there are no OCR layers, embedded attachments, hidden comments, or alternate encodings that could preserve the content. The search check alone is not sufficient for documents where the consequences of a leak are severe.

Can I redact a PDF for free?

Yes. PDFOutfit's Redact Text tool is free for up to two operations per day (five with a free account) and processes the file locally in your browser. The free version of Adobe Reader does not include redaction; that is part of Acrobat Pro. Recent versions of macOS Preview include a Redact tool under the Tools menu. Whichever tool you use, run the verification checks described in this guide before sending the file.

What's the difference between hiding and redacting text?

Hiding text covers it visually but leaves the underlying content in the file. Examples include drawing black rectangles, changing text color to match the background, applying highlight annotations, or placing image overlays. Redacting text removes the underlying content from the file structure — either by editing the content stream or by converting the affected pages to images. The distinction isn't visible in the rendered document; it shows up when you try to search, copy, or extract.

Do I need to remove metadata when I redact?

For internal use, often not. For external transmission of documents that are intended to be anonymous or de-identified, yes. PDF metadata typically contains the author's name (inherited from your operating system), the software used to create the file, exact timestamps, and sometimes file paths or computer names. Use PDFOutfit's Edit Metadata tool to clear these fields after redacting visible text.

How do I redact a scanned PDF?

Scanned PDFs sometimes have only an image of the page, and sometimes have an OCR text layer beneath the image. PDFOutfit's secure redaction mode rasterizes the affected pages — and because the original page is replaced with a rasterized image, existing OCR text layers on the affected page are removed along with the original text content. For image-only scans (no OCR), the rasterization approach works directly: the redacted area is removed at the pixel level.

Can I undo a redaction?

If the redaction was done properly (text removed from the content stream, or the page rasterized), no. The data is gone from the file. If the redaction was cosmetic (an annotation drawn over text), the original content is recoverable by anyone with basic PDF tools. This asymmetry is exactly why proper redaction matters: doing it right takes a couple of minutes, while doing it wrong has historically had real consequences.

What tools do attorneys use for redaction?

Adobe Acrobat Pro's redaction tools are widely used in legal practice and produce true content-stream redaction while preserving searchability of non-redacted text. PDFOutfit produces true redaction via rasterization with browser-based local processing, which suits documents under attorney-client privilege or an active litigation hold, at the cost of searchability and accessibility on rasterized pages. Specialized e-discovery platforms (Relativity, Everlaw, Logikcull) include redaction tooling integrated with document review but require a subscription and training.

Redact a PDF the right way

Free, no account required, processed locally in your browser. Secure redaction is on by default—and the verification checks above let you confirm the redaction worked before you send a single byte.

Related Privacy-Safe PDF Tools

Proper redaction is a workflow, not a single tool. Each runs in your browser and pairs naturally with the redaction process:

Redact Text — the tool walked through above
Edit Metadata — remove author, software, and timestamp fields before transmission
Flatten PDF — consolidate annotation layers into the rendered content
Compress PDF — recover file size after rasterization
Add Password — encrypt the final redacted file for delivery

Sources

Bertrand, Natasha. "Redaction error in court filing reveals Manafort shared polling info." The Hill, January 8, 2019.
Boyle, Christina; Schabner, Dean; Cole, Matthew. "Massive TSA Security Breach As Agency Gives Away Its Secrets." ABC News, December 8, 2009.
McCarthy, Kieren. "That classified US military report's secrets in full." The Register, May 3, 2005.
U.S. Department of Health and Human Services. "Guidance Regarding Methods for De-identification of Protected Health Information." HHS.gov.
45 CFR § 164.514; Federal Rule of Civil Procedure 5.2.
