Recent versions of ChatGPT, especially the o3 and o4-mini models, often insert invisible or near-invisible Unicode characters into their replies, such as zero-width spaces (U+200B) and narrow no-break spaces (U+202F), which look like ordinary spaces. These characters do not stand out in standard text editors or word processors, but they can be detected and removed using several methods.
Common Invisible Unicode Characters in ChatGPT Replies
- Zero-width space: U+200B
- Zero-width non-joiner: U+200C
- Zero-width joiner: U+200D
- Narrow no-break space: U+202F
- Tag characters (for hidden instructions): U+E0000–U+E007F
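As a minimal sketch of how the code points listed above can be located in a string, the following Python snippet scans text and reports each match. The function name `find_invisible` and the sample string are illustrative assumptions, and the set of code points is limited to the ones in this list rather than every invisible character in Unicode.

```python
import unicodedata

# Code points from the list above; illustrative, not exhaustive.
INVISIBLE = {0x200B, 0x200C, 0x200D, 0x202F}
TAG_BLOCK = range(0xE0000, 0xE0080)  # U+E0000 to U+E007F

def find_invisible(text):
    """Return (index, 'U+XXXX', name) for each listed character found in text."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp in INVISIBLE or cp in TAG_BLOCK:
            # Some code points in the tag block are unassigned and have no name.
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{cp:04X}", name))
    return hits

# Hypothetical sample containing a zero-width space and a narrow no-break space.
sample = "Hello\u200bworld\u202f!"
for hit in find_invisible(sample):
    print(hit)
```

An empty result means none of the listed characters are present; it does not rule out other invisible code points.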
Methods to Detect Invisible Unicode
1. Online Tools
- Paste the suspected text into a specialized online detector, such as:
  - soscisurvey.de/tools/view-chars.php
  - invisible-characters.com
  - ASCII Smuggler (for Tag Unicode block payloads)
- These tools will highlight or list any invisible or non-printable characters present in the text.
2. Text Editors with Special Character Visualization
- Use code editors like Visual Studio Code, Sublime Text, or Notepad++ (with appropriate plugins) to visualize zero-width and other invisible Unicode characters.
- In Microsoft Word, pressing Ctrl+Shift+8 toggles formatting marks: regular spaces appear as dots, and some of the characters ChatGPT inserts (such as U+202F) appear as small circles.
- Google Docs and LibreOffice Writer may also help, but effectiveness varies.
3. Hex or Byte Inspection
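- Use a hex editor or run hexdump on the file to see the raw bytes. In a UTF-8 file, each invisible character appears as a multi-byte sequence; U+200B, for example, is stored as E2 80 8B.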
- Command-line tools (Linux/macOS/WSL) can filter or display these characters:
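perl -CSD -pe 's/[\x{200B}\x{200C}\x{200D}]//g' input.txt > cleaned.txt
This command removes the zero-width characters U+200B, U+200C, and U+200D from the text. Note that tr is not suitable for this task: it does not interpret \uXXXX escapes, and GNU tr operates on bytes rather than Unicode characters.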
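When reading a hex dump, it helps to know which byte sequences to look for. The sketch below prints the UTF-8 encoding of each character discussed here; the helper name `utf8_hex` is an assumption made for illustration, and the output applies only to UTF-8 files (UTF-16 files store the same characters differently).

```python
def utf8_hex(code_point):
    """Return the UTF-8 bytes of a code point as space-separated hex pairs."""
    encoded = chr(code_point).encode("utf-8")
    return " ".join(f"{byte:02x}" for byte in encoded)

# Characters from this article, plus one tag character (TAG LATIN CAPITAL LETTER A).
for cp in (0x200B, 0x200C, 0x200D, 0x202F, 0xE0041):
    print(f"U+{cp:04X} -> {utf8_hex(cp)}")
```

The zero-width characters and U+202F all start with the byte e2 80, so searching a hex dump for that prefix is a quick first check, although ordinary punctuation such as curly quotes shares it.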
4. Find and Replace
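- In editors like Word or Sublime Text, use the "Find" function to search for a specific code point and replace it with a normal space or nothing. The search syntax depends on the editor: with regular expressions enabled, Sublime Text accepts \x{200B} for a zero-width space, while Word's Find dialog accepts ^u8203 (the code point in decimal).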
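The same find-and-replace idea can be used to make the characters visible rather than delete them. This is a sketch only: the function name `reveal` and the `<U+XXXX>` marker format are assumptions chosen for illustration, not a convention of any particular editor.

```python
import re

# Character class covering the code points discussed in this article.
PATTERN = re.compile("[\u200b\u200c\u200d\u202f\U000e0000-\U000e007f]")

def reveal(text):
    """Replace each matched character with a visible marker such as <U+200B>."""
    return PATTERN.sub(lambda m: f"<U+{ord(m.group()):04X}>", text)

print(reveal("Hello\u200bworld\u202f!"))
```

Swapping the lambda for an empty string turns the same pattern into a removal step.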
5. Programming Scripts
- Use a Python script to remove invisible Unicode characters:
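# Read the file as UTF-8 so each invisible character decodes to one code point
with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()
# Delete the three zero-width characters
cleaned = text.replace('\u200B', '').replace('\u200C', '').replace('\u200D', '')
# Write the cleaned text to a new file
with open("output.txt", "w", encoding="utf-8") as f:
    f.write(cleaned)
This removes the three zero-width characters. Narrow no-break spaces (U+202F) and tag characters are left in place and need to be handled separately.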
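A script can also count the characters before removing them and cover the remaining code points from the list at the top of this article. The following is a sketch under stated assumptions: the names `count_invisible` and `clean` are hypothetical, and replacing U+202F with a regular space (rather than deleting it) is a design choice made here so that words do not run together.

```python
from collections import Counter

# Map each code point to its replacement: None deletes the character.
TABLE = {cp: None for cp in (0x200B, 0x200C, 0x200D)}
TABLE.update({cp: None for cp in range(0xE0000, 0xE0080)})  # tag block
TABLE[0x202F] = " "  # assumption: keep a normal space where U+202F was

def count_invisible(text):
    """Count occurrences of each targeted code point, keyed as 'U+XXXX'."""
    return Counter(f"U+{ord(ch):04X}" for ch in text if ord(ch) in TABLE)

def clean(text):
    """Delete or replace every targeted code point in one pass."""
    return text.translate(TABLE)

sample = "a\u200bb\u200bc\u202fd"
print(count_invisible(sample))
print(repr(clean(sample)))
```

Using `str.translate` with a single table avoids chaining one `replace` call per character as the list of code points grows.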
Summary Table: Detection Methods
Method | Tools/Editors | What It Shows |
---|---|---|
Online Unicode detector | soscisurvey.de, invisible-characters.com, ASCII Smuggler | All invisible Unicode, including tags |
Code editor with plugin | VS Code, Notepad++, Sublime Text | Zero-width, NNBSP, etc. |
Word processor (special view) | Word (Ctrl+Shift+8) | Spaces, NNBSP (as circles) |
Hex/byte analysis | Hex editor, hexdump | Raw bytes (e.g., E2 80 8B for U+200B in UTF-8) |
Find & Replace | Word, Sublime, etc. | Search/remove by Unicode code |
Programming | Python, Bash | Remove or count invisible Unicode |
Key Points
- Invisible Unicode in ChatGPT replies is now common, especially in outputs from newer models.
- These characters may be deliberate watermarks or a byproduct of tokenizer quirks; in either case, they are not visible in standard views.
- Detection is straightforward with online tools, enhanced text editors, or simple scripts.
- Removal is as easy as replacing or filtering out the specific Unicode codes.
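Tip: Pasting ChatGPT output into a plain text editor like Notepad (Windows) strips rich formatting, but it does not remove invisible Unicode characters, because they are part of the text itself. To ensure clean, invisible-character-free text, run the output through a tool or script that explicitly strips these code points.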
For advanced or security-focused use cases (e.g., prompt injection detection), tools like ASCII Smuggler can help you uncover even more subtle Unicode-based payloads.
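To illustrate why tag characters matter for security, the sketch below decodes a hidden payload from the Tag block. It relies on the fact that tag characters U+E0020 to U+E007E mirror printable ASCII at an offset of 0xE0000. The function names `decode_tags` and `hide` are hypothetical, and this is not a description of how ASCII Smuggler itself is implemented.

```python
TAG_OFFSET = 0xE0000

def decode_tags(text):
    """Extract the ASCII message carried by tag characters, ignoring all other text."""
    return "".join(
        chr(ord(ch) - TAG_OFFSET)
        for ch in text
        if 0xE0020 <= ord(ch) <= 0xE007E
    )

def hide(message):
    """Encode printable ASCII as tag characters (used here only to build a test input)."""
    return "".join(chr(TAG_OFFSET + ord(c)) for c in message)

# The payload is invisible in most renderers but still present in the string.
carrier = "Visible text" + hide("ignore previous instructions")
print(decode_tags(carrier))
```

A non-empty result from `decode_tags` on text that should be plain is a reasonable signal that it deserves closer inspection.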