Diffstat (limited to 'doc')
-rw-r--r-- | doc/manual.md | 254 |
1 files changed, 37 insertions, 217 deletions
diff --git a/doc/manual.md b/doc/manual.md
index aa3d7013..b57b964a 100644
--- a/doc/manual.md
+++ b/doc/manual.md
@@ -1618,8 +1618,9 @@ as running it.
 
 ### Requirements for Linux and BSD
 
-First, Linux and BSD systems need either the [GNU C compiler][] (*gcc*) or
-[Clang][] (*clang*), as well as [GNU Make][] (*make* or *gmake*). BSD users
+First, Linux and BSD systems need either the [GNU C compiler][] (*gcc*) version
+4.9 or later (circa early 2014) or [Clang][] (*clang*), [libstdc++][] 4.9 or
+later (circa early 2014), and [GNU Make][] (*make* or *gmake*). BSD users
 additionally need to have [pkg-config][] and [libiconv][] installed. All of
 these should be available for your distribution through a package manager. For
 example, Ubuntu includes these tools in the "build-essential" package.
@@ -1639,6 +1640,7 @@ users _also_ need "libncursesw5-dev".)
 
 [GNU C compiler]: http://gcc.gnu.org
 [Clang]: http://clang.llvm.org/
+[libstdc++]: http://gcc.gnu.org
 [GNU Make]: http://www.gnu.org/software/make/
 [pkg-config]: http://www.freedesktop.org/wiki/Software/pkg-config/
 [libiconv]: http://www.gnu.org/software/libiconv/
@@ -1649,15 +1651,18 @@ users _also_ need "libncursesw5-dev".)
 
 Compiling Textadept on Windows is no longer supported. The preferred way to
 compile for Windows is cross-compiling from Linux. In order to do so, you need
-[MinGW][] with the Windows header files. Your package manager should offer them.
+[MinGW][] or [mingw-w64][] version 4.9 or later with the Windows header files.
+Your package manager should offer them.
 
 Note: compiling on Windows requires a C compiler that supports the C99 standard,
-the [GTK+ for Windows bundle][] (2.24 is recommended), and
+a C++ compiler that supports the C++11 standard, a C++ standard library that
+supports C++11, the [GTK+ for Windows bundle][] version 2.24, and
 [libiconv for Windows][] (the "Developer files" and "Binaries" zip files).
 The terminal (pdcurses) version requires my [win32curses bundle][] instead of
 GTK+ and libiconv.
 
 [MinGW]: http://mingw.org
+[mingw-w64]: http://mingw-w64.org/
 [GTK+ for Windows bundle]: http://www.gtk.org/download/windows.php
 [libiconv for Windows]: http://gnuwin32.sourceforge.net/packages/libiconv.htm
 [win32curses bundle]: download/win32curses.zip
@@ -1665,10 +1670,15 @@ and libiconv.
 ### Requirements for Mac OSX
 
 Compiling Textadept on Mac OSX is no longer supported. The preferred way is
-cross-compiling from Linux. In order to do so, you need the
-[Apple Cross-compiler][] binaries.
+cross-compiling from Linux. In order to do so, you need install an [OSX cross
+toolchain][] _with GCC_ version 4.9 or later. You will need to run
+`./build_binutils.sh` _before_ `./build_gcc.sh`. OSX SDK tarballs like
+*MacOSX10.5.tar.gz* can be found readily on the internet.
 
-[Apple Cross-compiler]: https://launchpad.net/~flosoft/+archive/cross-apple
+Note that building an OSX toolchain can easily take 30 minutes or more and
+ultimately consume nearly 3.5GB of disk space.
+
+[OSX cross toolchain]: https://github.com/tpoechtrager/osxcross
 
 ## Compiling
 
@@ -1729,12 +1739,13 @@ Similarly, `make curses` and `make curses install` installs the curses version.
 
 When cross-compiling from within Linux, first make a note of your MinGW compiler
 names. You may have to either modify the `CROSS` variable in the
-"win32" block of *src/Makefile* or append something like "CROSS=i486-mingw32-"
-when running `make`. After considering your MinGW compiler names, run
-`make win32-deps` or `make CROSS=i486-mingw32- win32-deps` to prepare the build
-environment followed by `make win32` or `make CROSS=i486-mingw32- win32` to
-build *../textadept.exe* and *../textadeptjit.exe*. Finally, copy the dll files
-from *src/win32gtk/bin/* to the directory containing the Textadept executables.
+"win32" block of *src/Makefile* or append something like
+"CROSS=i586-mingw32msvc-" when running `make`. After considering your MinGW
+compiler names, run `make win32-deps` or
+`make CROSS=i586-mingw32msvc- win32-deps` to prepare the build environment
+followed by `make win32` or `make CROSS=i586-mingw32msvc- win32` to build
+*../textadept.exe* and *../textadeptjit.exe*. Finally, copy the dll files from
+*src/win32gtk/bin/* to the directory containing the Textadept executables.
 
 Similarly for the terminal version, run `make win32-curses` or its variant as
 suggested above to build *../textadept-curses.exe* and
@@ -1869,210 +1880,12 @@ Textadept has a [mailing list][] and a [wiki][].
 
 ## Regular Expressions
 
-Textadept uses [TRE][] as its regular expression library. TRE is a "lightweight,
-robust, and efficient POSIX compliant regexp matching library".
-
-The following is from the [TRE Regexp Syntax][].
-
-This section describes the POSIX 1003.2 extended RE (ERE) syntax as implemented
-by TRE, and the TRE extensions to the ERE syntax. A simple Extended Backus-Naur
-Form (EBNF) style notation is used to describe the grammar.
+Textadept's regular expressions are based on the C++11 standard for ECMAScript.
+There are a number of references for this syntax on the internet including:
 
-**Alternation operator**
-
-    extended-regexp ::= branch
-                    |   extended-regexp "|" branch
-
-An extended regexp (ERE) is one or more branches, separated by `|`. An ERE
-matches anything that matches one or more of the branches.
-
-**Catenation of REs**
-
-    branch ::= piece
-           |   branch piece
-
-A branch is one or more pieces concatenated. It matches a match for the first
-piece, followed by a match for the second piece, and so on.
-
-    piece ::= atom
-          |   atom repeat-operator
-          |   atom approx-settings
-
-A piece is an atom possibly followed by a repeat operator or an expression
-controlling approximate matching parameters for the atom.
-
-    atom ::= "(" extended-regexp ")"
-         |   bracket-expression
-         |   "."
-         |   assertion
-         |   literal
-         |   back-reference
-         |   "(?#" comment-text ")"
-         |   "(?" options ")" extended-regexp
-         |   "(?" options ":" extended-regexp ")"
-
-An atom is either an ERE enclosed in parenthesis, a bracket expression, a `.`
-(period), an assertion, or a literal.
-
-The dot (`.`) matches any single character.
-
-Comment-text can contain any characters except for a closing parenthesis `)`.
-The text in the comment is completely ignored by the regex parser and it used
-solely for readability purposes.
-
-**Repeat operators**
-
-    repeat-operator ::= "*"
-                    |   "+"
-                    |   "?"
-                    |   bound
-                    |   "*?"
-                    |   "+?"
-                    |   "??"
-                    |   bound ?
-
-An atom followed by `*` matches a sequence of 0 or more matches of the atom. `+`
-is similar to `*`, matching a sequence of 1 or more matches of the atom. An atom
-followed by `?` matches a sequence of 0 or 1 matches of the atom.
-
-A bound is one of the following, where *m* and *n* are unsigned decimal integers
-between 0 and `RE_DUP_MAX`:
-
-1. {*m*,*n*}
-2. {*m*,}
-3. {*m*}
-
-An atom followed by [1] matches a sequence of *m* through *n* (inclusive)
-matches of the atom. An atom followed by [2] matches a sequence of *m* or more
-matches of the atom. An atom followed by [3] matches a sequence of exactly *m*
-matches of the atom.
-
-Adding a `?` to a repeat operator makes the subexpression minimal, or
-non-greedy. Normally a repeated expression is greedy, that is, it matches as
-many characters as possible. A non-greedy subexpression matches as few
-characters as possible. Note that this does not (always) mean the same thing as
-matching as many or few repetitions as possible.
-
-**Bracket expressions**
-
-    bracket-expression ::= "[" item+ "]"
-                       |   "[^" item+ "]"
-
-A bracket expression specifies a set of characters by enclosing a nonempty list
-of items in brackets. Normally anything matching any item in the list is
-matched. If the list begins with `^` the meaning is negated; any character
-matching no item in the list is matched.
-
-An item is any of the following:
-
-* A single character, matching that character.
-* Two characters separated by `-`. This is shorthand for the full range of
-  characters between those two (inclusive) in the collating sequence. For
-  example, `[0-9]` in ASCII matches any decimal digit.
-* A collating element enclosed in `[.` and `.]`, matching the collating element.
-  This can be used to include a literal `-` or a multi-character collating
-  element in the list.
-* A collating element enclosed in `[=` and `=]` (an equivalence class), matching
-  all collating elements with the same primary collation weight as that element,
-  including the element itself.
-* The name of a character class enclosed in `[:` and `:]`, matching any
-  character belonging to the class. The set of valid names depends on the
-  `LC_CTYPE` category of the current locale, but the following names are valid
-  in all locales:
-  + `alnum` -- alphanumeric characters
-  + `alpha` -- alphabetic characters
-  + `blank` -- blank characters
-  + `cntrl` -- control characters
-  + `digit` -- decimal digits (0 through 9)
-  + `graph` -- all printable characters except space
-  + `lower` -- lower-case letters
-  + `print` -- printable characters including space
-  + `punct` -- printable characters not space or alphanumeric
-  + `space` -- white-space characters
-  + `upper` -- upper case letters
-  + `xdigit` -- hexadecimal digits
-
-To include a literal `-` in the list, make it either the first or last item, the
-second endpoint of a range, or enclose it in `[.` and `.]` to make it a
-collating element. To include a literal `]` in the list, make it either the
-first item, the second endpoint of a range, or enclose it in `[.` and `.]`. To
-use a literal `-` as the first endpoint of a range, enclose it in `[.` and `.].`
-
-**Assertions**
-
-    assertion ::= "^"
-              |   "$"
-              |   "\" assertion-character
-
-The expressions `^` and `$` are called "left anchor" and "right anchor",
-respectively. The left anchor matches the empty string at the beginning of the
-string. The right anchor matches the empty string at the end of the string.
-
-An assertion-character can be any of the following:
-
-* `<` -- Beginning of word
-* `>` -- End of word
-* `b` -- Word boundary
-* `B` -- Non-word boundary
-* `d` -- Digit character (equivalent to `[[:digit:]]`)
-* `D` -- Non-digit character (equivalent to `[^[:digit:]]`)
-* `s` -- Space character (equivalent to `[[:space:]]`)
-* `S` -- Non-space character (equivalent to `[^[:space:]]`)
-* `w` -- Word character (equivalent to `[[:alnum:]_]`)
-* `W` -- Non-word character (equivalent to `[^[:alnum:]_]`)
-
-**Literals**
-
-    literal ::= ordinary-character
-            |   "\x" ["1"-"9" "a"-"f" "A"-"F"]{0,2}
-            |   "\x{" ["1"-"9" "a"-"f" "A"-"F"]* "}"
-            |   "\" character
-
-A literal is either an ordinary character (a character that has no other
-significance in the context), an 8 bit hexadecimal encoded character (e.g.
-`\x1B`), a wide hexadecimal encoded character (e.g. `\x{263a}`), or an escaped
-character. An escaped character is a `\` followed by any character, and matches
-that character. Escaping can be used to match characters which have a special
-meaning in regexp syntax. A `\` cannot be the last character of an ERE. Escaping
-also allows you to include a few non-printable characters in the regular
-expression. These special escape sequences include:
-
-* `\a` -- Bell character (ASCII code 7)
-* `\e` -- Escape character (ASCII code 27)
-* `\f` -- Form-feed character (ASCII code 12)
-* `\n` -- New-line/line-feed character (ASCII code 10)
-* `\r` -- Carriage return character (ASCII code 13)
-* `\t` -- Horizontal tab character (ASCII code 9)
-
-An ordinary character is just a single character with no other significance, and
-matches that character. A `{` followed by something else than a digit is
-considered an ordinary character.
-
-**Back references**
-
-    back-reference ::= "\" ["1"-"9"]
-
-A back reference is a backslash followed by a single non-zero decimal digit *d*.
-It matches the same sequence of characters matched by the *d*th parenthesized
-subexpression.
-
-**Options**
-
-    options ::= ["i" "n" "r" "U"]* ("-" ["i" "n" "r" "U"]*)?
-
-Options allow compile time options to be turned on/off for particular parts of
-the regular expression. If the option is specified in the first section, it is
-turned on. If it is specified in the second section (after the `-`), it is
-turned off.
-
-* `i` -- Case insensitive.
-* `n` -- Forces special handling of the new line character.
-* `r` -- Causes the regex to be matched in a right associative manner rather than
-  the normal left associative manner.
-* `U` -- Forces repetition operators to be non-greedy unless a `?` is appended.
-
-[TRE]: https://github.com/laurikari/tre
-[TRE Regexp Syntax]: http://laurikari.net/tre/documentation/regex-syntax/
+* [ECMAScript syntax C++ reference](http://www.cplusplus.com/reference/regex/ECMAScript/)
+* [Modified ECMAScript regular expression grammar](http://en.cppreference.com/w/cpp/regex/ecmascript)
+* [Regular Expressions (C++)](https://docs.microsoft.com/en-us/cpp/standard-library/regular-expressions-cpp)
 
 ## Lua Patterns
 
@@ -2277,12 +2090,19 @@ Simply copying the contents of your *~/.textadept/properties.lua* into
 Lexers are now written in a more object-oriented way. Legacy lexers are still
 supported, but it is recommended that you [migrate them][].
 
+[migrate them]: api.html#lexer.Migrating.Legacy.Lexers
+
 #### Key Bindings Changes
 
 The terminal version's key sequence for `Ctrl+Space` is now `'c '` instead of
 `'c@'`.
 
-[migrate them]: api.html#lexer.Migrating.Legacy.Lexers
+#### Regex Changes
+
+Textadept now uses [C++11's ECMAScript regex syntax](#Regular.Expressions)
+instead of [TRE][].
+
+[TRE]: https://github.com/laurikari/tre
 
 ### Textadept 8 to 9
 