LIGHT profile

Lossless and safe to apply by default. Every step is classified as encoding repair: it fixes how the text is encoded (Unicode normalization form, bidirectional control characters, presentation forms, tatweel, look-alike letters, whitespace) without discarding any linguistic signal (ADR-0004).

The profile runs these steps in order:

| # | Step                   | Safety          | Lossless?  |
|---|------------------------|-----------------|------------|
| 1 | NormalizeUnicode       | encoding_repair | ✓ lossless |
| 2 | StripBidi              | encoding_repair | ✓ lossless |
| 3 | FoldPresentationForms  | encoding_repair | ✓ lossless |
| 4 | RemoveTatweel          | encoding_repair | ✓ lossless |
| 5 | UnifyLookalikes        | encoding_repair | ✓ lossless |
| 6 | CollapseWhitespace     | encoding_repair | ✓ lossless |
| 7 | NormalizeUnicode       | encoding_repair | ✓ lossless |
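
To make the step order concrete, here is a minimal standard-library sketch of a pipeline shaped like the one listed above. It is not the library's implementation: the function names, the choice of NFC, the set of bidi controls, and the two-entry look-alike map are all assumptions made for illustration. See the API reference for the real steps and their options.

```python
import re
import unicodedata

# Bidi controls: ALM, LRM/RLM, embeddings/overrides, isolates (assumed set).
_BIDI = re.compile("[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]")
_TATWEEL = "\u0640"
# Hypothetical look-alike map: Persian keheh and Farsi yeh to Arabic kaf and yeh.
_LOOKALIKES = str.maketrans({"\u06a9": "\u0643", "\u06cc": "\u064a"})


def normalize_unicode(text: str) -> str:
    # Assumption: the step normalizes to NFC.
    return unicodedata.normalize("NFC", text)


def strip_bidi(text: str) -> str:
    return _BIDI.sub("", text)


def _is_presentation_form(ch: str) -> bool:
    # Arabic Presentation Forms-A and the letter range of Forms-B.
    return "\ufb50" <= ch <= "\ufdff" or "\ufe70" <= ch <= "\ufefc"


def fold_presentation_forms(text: str) -> str:
    # Compatibility-normalize only presentation forms; leave other characters alone.
    return "".join(
        unicodedata.normalize("NFKC", ch) if _is_presentation_form(ch) else ch
        for ch in text
    )


def remove_tatweel(text: str) -> str:
    return text.replace(_TATWEEL, "")


def unify_lookalikes(text: str) -> str:
    return text.translate(_LOOKALIKES)


def collapse_whitespace(text: str) -> str:
    # Collapse whitespace runs to one space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


# Same order as the listed steps, including the repeated Unicode normalization.
LIGHT_STEPS = [
    normalize_unicode,
    strip_bidi,
    fold_presentation_forms,
    remove_tatweel,
    unify_lookalikes,
    collapse_whitespace,
    normalize_unicode,
]


def apply_light(text: str) -> str:
    for step in LIGHT_STEPS:
        text = step(text)
    return text
```

For example, `apply_light` maps a word written with tatweel and a trailing right-to-left mark, and the same word written in presentation forms, to the same plain letters. Running the sketch a second time on its own output changes nothing.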

See the API reference for what each step does and how to configure it.