Pattern matching in Autocorrect

 

Note: From LibreOffice 4.2.5, the wildcard character sequence is .* (dot asterisk) instead of the plain * (asterisk) – 2014-06-04.
A new patch for the Autocorrect feature allows text to be replaced before or after arbitrary affixes, depending on whether the wildcard character * starts or ends the Autocorrect Replace pattern. This is a small but useful enhancement for word processing, especially for affix-rich languages, and I will also show a nice improvement for French typography that uses this feature.
[Screenshot: Autocorrect pattern]
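
The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the LibreOffice implementation: the names `WILDCARD` and `apply_pattern` are hypothetical, and the sketch assumes the `.*` sequence mentioned in the note and a single wildcard at one end of the pattern.

```python
WILDCARD = ".*"  # sequence used since LibreOffice 4.2.5 (plain "*" before that)


def apply_pattern(word, pattern, replacement):
    """Return the corrected word, or None if the pattern does not match."""
    if pattern.endswith(WILDCARD):
        stem = pattern[:-len(WILDCARD)]
        # Trailing wildcard: match the stem at the start, keep the suffix.
        if stem and word.startswith(stem):
            return replacement + word[len(stem):]
    elif pattern.startswith(WILDCARD):
        stem = pattern[len(WILDCARD):]
        # Leading wildcard: match the stem at the end, keep the prefix.
        if stem and word.endswith(stem):
            return word[:-len(stem)] + replacement
    elif word == pattern:
        # No wildcard: classic whole-word replacement.
        return replacement
    return None


print(apply_pattern("i18ns", "i18n.*", "internationalization"))
# prints "internationalizations"
```

Whatever the wildcard stands for (here the suffix "s") is copied unchanged into the result, which is why a single list item can cover many inflected forms.
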
For example, with the item “i18n*” → “internationalization”, Autocorrect will also find i18ns and replace it with internationalizations. The Hungarian spelling dictionary handles two thousand suffixes for a given noun, and dozens of them are quite frequent, which simply exceeds the limits of the old Autocorrect feature. With the new patch and a modified Autocorrect list, LibreOffice will be able to handle all forms of serious misspellings and common abbreviations, which is a real innovation for Hungarian and similar languages. The following examples also help word processing in English and other languages:

  • Typographic correction of ellipses (with the precomposed ellipsis character U+2026): *..., e.g. word... → word… (see the screenshot below)
  • The same combined with quotation marks: “...* → “… and *...” → …”, e.g. “...and a quote...” → “…and a quote…”
  • Simplified input for special symbols: *%o, e.g. 7%o → 7‰
  • French punctuation. LibreOffice has only a poor man’s input method for French typography: it inserts full-length (“typewriter”) spaces before question and exclamation marks, colons and semicolons, and before and after guillemets (only the Graphite fonts Linux Libertine G and Biolinum G support French typography well). With the new Autocorrect patch and the following replacements, it is possible to get better spacing in Unicode fonts that contain the narrow no-break space (U+202F): *! → <U+202F>!, and *<U+202F>! → <U+202F>! (a replacement of the sequence by itself, to avoid inserting the narrow no-break space more than once), etc. This looks usable as a general method, because in fonts that lack the narrow no-break space it is replaced by a normal space (as in the current poor man’s method). Fonts that do contain it, such as DejaVu Serif, Liberation Sans and Serif, Linux Libertine and Biolinum (including the non-Graphite versions), give better French typography. To use the new method, switch the French poor man’s method off in the Localized options of the Autocorrect settings:

[Screenshot: ellipsis and French punctuation replacements]
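
Why the French rules come as a pair can be shown with a small Python sketch. It is an assumption-laden illustration rather than LibreOffice code: `RULES` and `french_exclamation` are hypothetical names, the rules are written as plain suffixes instead of Autocorrect patterns, and the second pattern is assumed to contain U+202F before the exclamation mark.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Hypothetical rule list; the more specific suffix is checked first.
RULES = [
    (NNBSP + "!", NNBSP + "!"),  # already spaced: replace the sequence by itself
    ("!", NNBSP + "!"),          # bare "!": insert U+202F before it
]


def french_exclamation(word):
    """Apply the first rule whose suffix matches the end of the word."""
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            return word[:-len(suffix)] + replacement
    return word


once = french_exclamation("Bonjour!")
twice = french_exclamation(once)
print(once == twice)  # prints "True"
```

Without the first rule, every later pass over the same word would match the bare "!" again and add one more narrow no-break space.
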

Comments

  1. Hi László,

    Thanks for this useful feature.

    About the French typography: in 2009 it was decided to exclude the narrow non-breaking space from automatic replacement, because many fonts don’t contain this character: http://wiki.openoffice.org/wiki/Non_Breaking_Spaces_Before_Punctuation_In_French_%28espaces_ins%C3%A9cables%29#Exclusion_of_the_NARROW_NO-BREAK_SPACE_.28U.2B202F.29

    Maybe it’s time to change this behavior?

    BTW, I recently asked for the narrow non-breaking space to be made visible. Its invisibility is another reason for not using it.
    https://bugs.freedesktop.org/show_bug.cgi?id=67669

    • Hi Olivier, Many thanks for your comment. I didn’t know about the discussion, and I am glad to see your extension and the coincidence. 🙂 It seems that the narrow space, together with the spacing of the dot glyph, is often a little longer than the correct length, but it’s better than the normal space. The narrow/thin space length without the spacing of the dot glyph would be fine, so I don’t know the ideal solution. Thanks for the links! László

  2. There is a bug in the handling of “special” spaces on Linux with 4.1; you may want to check that you are not running into it: https://bugs.freedesktop.org/show_bug.cgi?id=66715