CLASSICAL profile
Lossless: safe to apply by default. Every step is an encoding repair (Unicode normalization form, bidi controls, presentation forms, tatweel, look-alike letters, whitespace), and none discards linguistic signal (ADR-0004).
It assembles these steps, in order:
| # | Step | Safety | Lossless? |
|---|---|---|---|
| 1 | NormalizeUnicode | encoding_repair | ✓ lossless |
| 2 | StripBidi | encoding_repair | ✓ lossless |
| 3 | FoldPresentationForms | encoding_repair | ✓ lossless |
| 4 | RemoveTatweel | encoding_repair | ✓ lossless |
| 5 | UnifyLookalikes | encoding_repair | ✓ lossless |
| 6 | CollapseWhitespace | encoding_repair | ✓ lossless |
| 7 | NormalizeUnicode | encoding_repair | ✓ lossless |

See the API reference for what each step does and how to configure it.
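
To make the step order concrete, here is a minimal standard-library sketch of a pipeline with the same seven stages. It is not this library's implementation or API: the function names, the choice of NFC, the bidi code point set, and the two-entry look-alike map are all assumptions made for illustration. The final NFC pass is presumably there to restore normalization after the folding steps.

```python
import re
import unicodedata

# Assumption: the bidi controls to strip (ALM, LRM/RLM, embeddings/overrides, isolates).
BIDI_CONTROLS = dict.fromkeys(
    [0x061C, 0x200E, 0x200F, *range(0x202A, 0x202F), *range(0x2066, 0x206A)]
)
TATWEEL = "\u0640"
# Assumption: a tiny illustrative map (Farsi yeh -> Arabic yeh, keheh -> kaf).
LOOKALIKES = str.maketrans({"\u06CC": "\u064A", "\u06A9": "\u0643"})


def normalize_unicode(text: str) -> str:
    # Assumption: NFC is the target form; the real step may be configurable.
    return unicodedata.normalize("NFC", text)


def strip_bidi(text: str) -> str:
    # Keys mapped to None are deleted by str.translate.
    return text.translate(BIDI_CONTROLS)


def fold_presentation_forms(text: str) -> str:
    # Apply NFKC only to Arabic Presentation Forms-A/B, leaving other text alone.
    def fold(ch: str) -> str:
        cp = ord(ch)
        if 0xFB50 <= cp <= 0xFDFF or 0xFE70 <= cp <= 0xFEFC:
            return unicodedata.normalize("NFKC", ch)
        return ch

    return "".join(fold(ch) for ch in text)


def remove_tatweel(text: str) -> str:
    return text.replace(TATWEEL, "")


def unify_lookalikes(text: str) -> str:
    return text.translate(LOOKALIKES)


def collapse_whitespace(text: str) -> str:
    # Assumption: any run of Unicode whitespace becomes one ASCII space.
    return re.sub(r"\s+", " ", text)


# Same order as the table above, including the second normalization pass.
CLASSICAL_STEPS = [
    normalize_unicode,
    strip_bidi,
    fold_presentation_forms,
    remove_tatweel,
    unify_lookalikes,
    collapse_whitespace,
    normalize_unicode,
]


def apply_classical(text: str) -> str:
    for step in CLASSICAL_STEPS:
        text = step(text)
    return text
```

Calling `apply_classical` on a string that mixes an RLM mark, a tatweel, a lam-alef ligature, and a Farsi yeh should return plain base letters separated by single spaces, and applying it a second time should leave the result unchanged.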