rule does not specify any ordering in and of itself. Instead, it “resets” the ordering for subsequent shift rules to cause them to be taken in relation to a given character. Either of the following rules resets subsequent shift rules to be taken in relation to the letter 'A':
<reset>A</reset>
<reset>\u0041</reset>
• The <p>, <s>, and <t> shift rules define primary, secondary, and tertiary differences of a character from another character:
  • Use primary differences to distinguish separate letters.
  • Use secondary differences to distinguish accent variations.
  • Use tertiary differences to distinguish lettercase variations.
Either of these rules specifies a primary shift rule for the 'G' character:
<p>G</p>
<p>\u0047</p>
• The <i> shift rule indicates that one character sorts identically to another. The following rules cause 'b' to sort the same as 'a':
<reset>a</reset>
<i>b</i>
• Abbreviated shift syntax specifies multiple shift rules using a single pair of tags. The following table shows the correspondence between abbreviated syntax rules and the equivalent nonabbreviated rules.

Table 10.5 Abbreviated Shift Syntax

Abbreviated Syntax    Nonabbreviated Syntax
<pc>xyz</pc>          <p>x</p><p>y</p><p>z</p>
<sc>xyz</sc>          <s>x</s><s>y</s><s>z</s>
<tc>xyz</tc>          <t>x</t><t>y</t><t>z</t>
<ic>xyz</ic>          <i>x</i><i>y</i><i>z</i>
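The correspondence in Table 10.5 is mechanical, so it can be sketched in a few lines of Python. This is only an illustration: the function name expand_abbreviated is hypothetical, the code is not part of MySQL, and it assumes the abbreviated tags are <pc>, <sc>, <tc>, and <ic> with literal characters inside (no \uXXXX escapes or XML entities).

```python
import re

# Abbreviated tag -> nonabbreviated shift tag (assumed mapping, per Table 10.5).
ABBREVIATED = {"pc": "p", "sc": "s", "tc": "t", "ic": "i"}


def expand_abbreviated(rules):
    """Rewrite each abbreviated rule as one shift rule per character."""

    def repl(match):
        tag = ABBREVIATED[match.group(1)]
        # One <tag>c</tag> element for every literal character in the body.
        return "".join(f"<{tag}>{ch}</{tag}>" for ch in match.group(2))

    return re.sub(r"<(pc|sc|tc|ic)>(.*?)</\1>", repl, rules)


print(expand_abbreviated("<reset>a</reset><pc>xyz</pc>"))
```

Rules that are already nonabbreviated pass through unchanged, so the helper can be applied to a whole rules fragment.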
• An expansion is a reset rule that establishes an anchor point for a multiple-character sequence. MySQL supports expansions 2 to 6 characters long. The following rules put 'z' greater at the primary level than the sequence of three characters 'abc':
<reset>abc</reset>
<p>z</p>
• A contraction is a shift rule that sorts a multiple-character sequence. MySQL supports contractions 2 to 6 characters long. The following rules put the sequence of three characters 'xyz' greater at the primary level than 'a':
<reset>a</reset>
<p>xyz</p>
• Long expansions and long contractions can be used together. These rules put the sequence of three characters 'xyz' greater at the primary level than the sequence of three characters 'abc':
<reset>abc</reset>
<p>xyz</p>
• Normal expansion syntax uses <x> plus <extend> elements to specify an expansion. The following rules put the character 'k' greater at the secondary level than the sequence 'ch'. That is, 'k' behaves as if it expands to a character after 'c' followed by 'h':
<reset>c</reset>
<x><s>k</s><extend>h</extend></x>
This syntax permits long sequences. These rules sort the sequence 'ccs' greater at the tertiary level than the sequence 'cscs':
<reset>cs</reset>
<x><t>ccs</t><extend>cs</extend></x>
The LDML specification describes normal expansion syntax as “tricky.” See that specification for details.
• Previous context syntax uses <x> plus <context> elements to specify that the context before a character affects how it sorts. The following rules put '-' greater at the secondary level than 'a', but only when '-' occurs after 'b':
<reset>a</reset>
<x><context>b</context><s>-</s></x>
• Previous context syntax can include the <extend> element. These rules put 'def' greater at the primary level than 'aghi', but only when 'def' comes after 'abc':
<reset>a</reset>
<x><context>abc</context><p>def</p><extend>ghi</extend></x>
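• Reset rules permit a before attribute. Normally, shift rules after a reset rule indicate characters that sort after the reset character. Shift rules after a reset rule that has the before attribute indicate characters that sort before the reset character. The following rules put the character 'b' immediately before 'a' at the primary level:
<reset before="primary">a</reset>
<p>b</p>
Permissible before attribute values specify the sort level by name or the equivalent numeric value:
<reset before="primary">
<reset before="1">
<reset before="secondary">
<reset before="2">
<reset before="tertiary">
<reset before="3">
• A reset rule can name a logical reset position rather than a literal character:
<first_tertiary_ignorable/>
<last_tertiary_ignorable/>
<first_secondary_ignorable/>
<last_secondary_ignorable/>
<first_primary_ignorable/>
<last_primary_ignorable/>
<first_variable/>
<last_variable/>
<first_non_ignorable/>
<last_non_ignorable/>
<first_trailing/>
<last_trailing/>
These rules put 'z' greater at the primary level than nonignorable characters that have a Default Unicode Collation Element Table (DUCET) entry and that are not CJK:
<reset><last_non_ignorable/></reset>
<p>z</p>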
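Logical positions have the code points shown in the following table.

Table 10.6 Logical Reset Position Code Points

Logical Position                Unicode 4.0.0 Code Point    Unicode 5.2.0 Code Point
<first_non_ignorable/>          U+02D0                      U+02D0
<last_non_ignorable/>           U+A48C                      U+1342E
<first_primary_ignorable/>      U+0332                      U+0332
<last_primary_ignorable/>       U+20EA                      U+101FD
<first_secondary_ignorable/>    U+0000                      U+0000
<last_secondary_ignorable/>     U+FE73                      U+FE73
<first_tertiary_ignorable/>     U+0000                      U+0000
<last_tertiary_ignorable/>      U+FE73                      U+FE73
<first_trailing/>               U+0000                      U+0000
<last_trailing/>                U+0000                      U+0000
<first_variable/>               U+0009                      U+0009
<last_variable/>                U+2183                      U+1D371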
• The <collation> element permits a shift-after-method attribute that affects character weight calculation for shift rules. The attribute has these permitted values:
  • simple: Calculate character weights as for reset rules that do not have a before attribute. This is the default if the attribute is not given.
  • expand: Use expansions for shifts after reset rules.
Suppose that '0' and '1' have weights of 0E29 and 0E2A and we want to put all basic Latin letters between '0' and '1':
<reset>0</reset>
<pc>abcdefghijklmnopqrstuvwxyz</pc>
For simple shift mode, weights are calculated as follows:
'a' has weight 0E29+1
'b' has weight 0E29+2
'c' has weight 0E29+3
...
However, there are not enough vacant positions to put 26 characters between '0' and '1'. The result is that digits and letters are intermixed.
To solve this, use shift-after-method="expand". Then weights are calculated like this:
'a' has weight [0E29][233D+1]
'b' has weight [0E29][233D+2]
'c' has weight [0E29][233D+3]
...
233D is the UCA 4.0.0 weight for character 0xA48C, which is the last nonignorable character (a sort of the greatest character in the collation, excluding CJK). UCA 5.2.0 is similar but uses 3ACA, for character 0x1342E.

MySQL-Specific LDML Extensions
An extension to LDML rules permits the <collation> element to include an optional version attribute in <collation> tags to indicate the UCA version on which the collation is based. If the version attribute is omitted, its default value is 4.0.0. For example, this specification indicates a collation that is based on UCA 5.2.0:
<collation id="nnn" name="utf8_xxx_ci" version="5.2.0">
...
</collation>
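Returning to the shift-after-method attribute described above, the difference between simple and expand weights can be modeled with a short Python sketch. It is a simplification, not MySQL's implementation: weights are represented as tuples compared element by element, the function names are hypothetical, and the constants are the example values quoted in the text (0E29 and 0E2A for '0' and '1', 233D for the last nonignorable character in UCA 4.0.0).

```python
import string

WEIGHT_0 = 0x0E29            # weight of '0' in the example
WEIGHT_1 = 0x0E2A            # weight of '1' in the example
LAST_NON_IGNORABLE = 0x233D  # UCA 4.0.0 weight of U+A48C


def simple_weights(base, chars):
    # simple: one weight per character, counted up from the reset character.
    return {ch: (base + i,) for i, ch in enumerate(chars, start=1)}


def expand_weights(base, chars):
    # expand: the reset character's weight, then a second weight counted up
    # from the last nonignorable weight.
    return {ch: (base, LAST_NON_IGNORABLE + i) for i, ch in enumerate(chars, start=1)}


digits = {"0": (WEIGHT_0,), "1": (WEIGHT_1,)}
letters = string.ascii_lowercase

simple = {**digits, **simple_weights(WEIGHT_0, letters)}
expand = {**digits, **expand_weights(WEIGHT_0, letters)}

# In simple mode 'a' receives 0E2A, the same weight as '1', and later
# letters sort after '1'.
print(simple["a"] == simple["1"])
print(sorted("0a1z", key=simple.__getitem__))

# In expand mode every letter stays between '0' and '1'.
print(sorted("0a1z", key=expand.__getitem__))
```

Under these assumptions the simple mode runs out of room after the first letter, while the two-weight form keeps all 26 letters ahead of '1' because their first weight is still 0E29.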
10.13.4.3 Diagnostics During Index.xml Parsing
The MySQL server generates diagnostics when it finds problems while parsing the Index.xml file:
• Unknown tags are written to the error log. For example, the following message results if a collation definition contains an <aaa> tag:
[Warning] Buffered warning: Unknown LDML tag: 'charsets/charset/collation/rules/aaa'
• If collation initialization is not possible, the server reports an “Unknown collation” error, and also generates warnings explaining the problems, such as in the previous example. In other cases, when a collation description is generally correct but contains some unknown tags, the collation is initialized and is available for use. The unknown parts are ignored, but a warning is generated in the error log.
• Problems with collations generate warnings that clients can display with SHOW WARNINGS. Suppose that a reset rule contains an expansion longer than the maximum supported length of 6 characters:
<reset>abcdefghi</reset>
<i>x</i>
An attempt to use the collation produces warnings:
mysql> SELECT _utf8'test' COLLATE utf8_test_ci;
ERROR 1273 (HY000): Unknown collation: 'utf8_test_ci'
mysql> SHOW WARNINGS;
+---------+------+----------------------------------------+
| Level   | Code | Message                                |
+---------+------+----------------------------------------+
| Error   | 1273 | Unknown collation: 'utf8_test_ci'      |
| Warning | 1273 | Expansion is too long at 'abcdefghi=x' |
+---------+------+----------------------------------------+
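A rules fragment can be checked for overlong sequences before the server ever loads it. The following Python sketch is a hypothetical pre-flight check, not MySQL's parser: the function name overlong_sequences is invented for illustration, the tag set is assumed to be reset plus the p, s, t, and i shift rules, and only literal characters are counted (a \uXXXX escape would be miscounted as six characters).

```python
import xml.etree.ElementTree as ET

MAX_SEQUENCE_LENGTH = 6  # longest expansion or contraction, per the text above


def overlong_sequences(rules_xml):
    """Return (tag, text) pairs whose character sequence exceeds the limit."""
    root = ET.fromstring(rules_xml)
    problems = []
    for element in root.iter():
        if element.tag in {"reset", "p", "s", "t", "i"}:
            text = (element.text or "").strip()
            if len(text) > MAX_SEQUENCE_LENGTH:
                problems.append((element.tag, text))
    return problems


rules = "<rules><reset>abcdefghi</reset><i>x</i></rules>"
print(overlong_sequences(rules))
```

An empty list means no sequence in the fragment exceeds the limit; it does not guarantee that the server accepts the collation.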
10.14 Character Set Configuration
The MySQL server has a compiled-in default character set and collation. To change these defaults, use the --character-set-server and --collation-server options when you start the server. See Section 5.1.6, “Server Command Options”. The collation must be a legal collation for the default character set. To determine which collations are available for each character set, use the SHOW COLLATION statement or query the INFORMATION_SCHEMA COLLATIONS table.
If you try to use a character set that is not compiled into your binary, you might run into the following problems:
• If your program uses an incorrect path to determine where the character sets are stored (which is typically the share/mysql/charsets or share/charsets directory under the MySQL installation directory), this can be fixed by using the --character-sets-dir option when you run the program. For example, to specify a directory to be used by MySQL client programs, list it in the [client] group of your option file. The examples given here show what the setting might look like for Unix or Windows, respectively:
[client]
character-sets-dir=/usr/local/mysql/share/mysql/charsets

[client]
character-sets-dir="C:/Program Files/MySQL/MySQL Server 5.7/share/charsets"
• If the character set is a complex character set that cannot be loaded dynamically, you must recompile the program with support for the character set. For Unicode character sets, you can define collations without recompiling by using LDML notation. See Section 10.13.4, “Adding a UCA Collation to a Unicode Character Set”.
• If the character set is a dynamic character set, but you do not have a configuration file for it, you should install the configuration file for the character set from a new MySQL distribution.
• If your character set index file (Index.xml) does not contain the name for the character set, your program displays an error message:
Character set 'charset_name' is not a compiled character set and is not specified in the '/usr/share/mysql/charsets/Index.xml' file
To solve this problem, you should either get a new index file or manually add the name of any missing character sets to the current file.
You can force client programs to use a specific character set as follows:
[client]
default-character-set=charset_name
This is normally unnecessary. However, when character_set_system differs from character_set_server or character_set_client, and you input characters manually (as database object identifiers, column values, or both), these may be displayed incorrectly in output from the client or the output itself may be formatted incorrectly. In such cases, starting the mysql client with --default-character-set=system_character_set—that is, setting the client character set to match the system character set—should fix the problem.
10.15 MySQL Server Locale Support
The locale indicated by the lc_time_names system variable controls the language used to display day and month names and abbreviations. This variable affects the output from the DATE_FORMAT(), DAYNAME(), and MONTHNAME() functions.
lc_time_names does not affect the STR_TO_DATE() or GET_FORMAT() function.
The lc_time_names value does not affect the result from FORMAT(), but this function takes an optional third parameter that enables a locale to be specified to be used for the result number's decimal point, thousands separator, and grouping between separators. Permissible locale values are the same as the legal values for the lc_time_names system variable.
Locale names have language and region subtags listed by IANA (http://www.iana.org/assignments/language-subtag-registry) such as 'ja_JP' or 'pt_BR'. The default value is 'en_US' regardless of your system's locale setting, but you can set the value at server startup, or set the GLOBAL value at runtime if you have privileges sufficient to set global system variables; see Section 5.1.8.1, “System Variable Privileges”. Any client can examine the value of lc_time_names or set its SESSION value to affect the locale for its own connection.
mysql> SET NAMES 'utf8';
Query OK, 0 rows affected (0.09 sec)

mysql> SELECT @@lc_time_names;
+-----------------+
| @@lc_time_names |
+-----------------+
| en_US           |
+-----------------+
1 row in set (0.00 sec)

mysql> SELECT DAYNAME('2010-01-01'), MONTHNAME('2010-01-01');
+-----------------------+-------------------------+
| DAYNAME('2010-01-01') | MONTHNAME('2010-01-01') |
+-----------------------+-------------------------+
| Friday                | January                 |
+-----------------------+-------------------------+
1 row in set (0.00 sec)

mysql> SELECT DATE_FORMAT('2010-01-01','%W %a %M %b');
+-----------------------------------------+
| DATE_FORMAT('2010-01-01','%W %a %M %b') |
+-----------------------------------------+
| Friday Fri January Jan                  |
+-----------------------------------------+
1 row in set (0.00 sec)

mysql> SET lc_time_names = 'es_MX';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@lc_time_names;
+-----------------+
| @@lc_time_names |
+-----------------+
| es_MX           |
+-----------------+
1 row in set (0.00 sec)

mysql> SELECT DAYNAME('2010-01-01'), MONTHNAME('2010-01-01');
+-----------------------+-------------------------+
| DAYNAME('2010-01-01') | MONTHNAME('2010-01-01') |
+-----------------------+-------------------------+
| viernes               | enero                   |
+-----------------------+-------------------------+
1 row in set (0.00 sec)

mysql> SELECT DATE_FORMAT('2010-01-01','%W %a %M %b');
+-----------------------------------------+
| DATE_FORMAT('2010-01-01','%W %a %M %b') |
+-----------------------------------------+
| viernes vie enero ene                   |
+-----------------------------------------+
1 row in set (0.00 sec)
The day or month name for each of the affected functions is converted from utf8 to the character set indicated by the character_set_connection system variable.
lc_time_names may be set to any of the following locale values. The set of locales supported by MySQL may differ from those supported by your operating system.
Locale Value: Meaning
ar_AE: Arabic - United Arab Emirates
ar_BH: Arabic - Bahrain
ar_DZ: Arabic - Algeria
ar_EG: Arabic - Egypt
ar_IN: Arabic - India
ar_IQ: Arabic - Iraq
ar_JO: Arabic - Jordan
ar_KW: Arabic - Kuwait
ar_LB: Arabic - Lebanon
ar_LY: Arabic - Libya
ar_MA: Arabic - Morocco
ar_OM: Arabic - Oman
ar_QA: Arabic - Qatar
ar_SA: Arabic - Saudi Arabia
ar_SD: Arabic - Sudan
ar_SY: Arabic - Syria
ar_TN: Arabic - Tunisia
ar_YE: Arabic - Yemen
be_BY: Belarusian - Belarus
bg_BG: Bulgarian - Bulgaria
ca_ES: Catalan - Spain
cs_CZ: Czech - Czech Republic
da_DK: Danish - Denmark
de_AT: German - Austria
de_BE: German - Belgium
de_CH: German - Switzerland
de_DE: German - Germany
de_LU: German - Luxembourg
el_GR: Greek - Greece
en_AU: English - Australia
en_CA: English - Canada
en_GB: English - United Kingdom
en_IN: English - India
en_NZ: English - New Zealand
en_PH: English - Philippines
en_US: English - United States
en_ZA: English - South Africa
en_ZW: English - Zimbabwe
es_AR: Spanish - Argentina
es_BO: Spanish - Bolivia
es_CL: Spanish - Chile
es_CO: Spanish - Colombia
es_CR: Spanish - Costa Rica
es_DO: Spanish - Dominican Republic
es_EC: Spanish - Ecuador
es_ES: Spanish - Spain
es_GT: Spanish - Guatemala
es_HN: Spanish - Honduras
es_MX: Spanish - Mexico
es_NI: Spanish - Nicaragua
es_PA: Spanish - Panama
es_PE: Spanish - Peru
es_PR: Spanish - Puerto Rico
es_PY: Spanish - Paraguay
es_SV: Spanish - El Salvador
es_US: Spanish - United States
es_UY: Spanish - Uruguay
es_VE: Spanish - Venezuela
et_EE: Estonian - Estonia
eu_ES: Basque - Spain
fi_FI: Finnish - Finland
fo_FO: Faroese - Faroe Islands
fr_BE: French - Belgium
fr_CA: French - Canada
fr_CH: French - Switzerland
fr_FR: French - France
fr_LU: French - Luxembourg
gl_ES: Galician - Spain
gu_IN: Gujarati - India
he_IL: Hebrew - Israel
hi_IN: Hindi - India
hr_HR: Croatian - Croatia
hu_HU: Hungarian - Hungary
id_ID: Indonesian - Indonesia
is_IS: Icelandic - Iceland
it_CH: Italian - Switzerland
it_IT: Italian - Italy
ja_JP: Japanese - Japan
ko_KR: Korean - Republic of Korea
lt_LT: Lithuanian - Lithuania
lv_LV: Latvian - Latvia
mk_MK: Macedonian - FYROM
mn_MN: Mongolian - Mongolia
ms_MY: Malay - Malaysia
nb_NO: Norwegian (Bokmål) - Norway
nl_BE: Dutch - Belgium
nl_NL: Dutch - The Netherlands
no_NO: Norwegian - Norway
pl_PL: Polish - Poland
pt_BR: Portuguese - Brazil
pt_PT: Portuguese - Portugal
rm_CH: Romansh - Switzerland
ro_RO: Romanian - Romania
ru_RU: Russian - Russia
ru_UA: Russian - Ukraine
sk_SK: Slovak - Slovakia
sl_SI: Slovenian - Slovenia
sq_AL: Albanian - Albania
sr_RS: Serbian - Serbia
sv_FI: Swedish - Finland
sv_SE: Swedish - Sweden
ta_IN: Tamil - India
te_IN: Telugu - India
th_TH: Thai - Thailand
tr_TR: Turkish - Turkey
uk_UA: Ukrainian - Ukraine
ur_PK: Urdu - Pakistan
vi_VN: Vietnamese - Viet Nam
zh_CN: Chinese - China
zh_HK: Chinese - Hong Kong
zh_TW: Chinese - Taiwan Province of China
Chapter 11 Data Types
Table of Contents
11.1 Data Type Overview
  11.1.1 Numeric Type Overview
  11.1.2 Date and Time Type Overview
  11.1.3 String Type Overview
11.2 Numeric Types
  11.2.1 Integer Types (Exact Value) - INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT
  11.2.2 Fixed-Point Types (Exact Value) - DECIMAL, NUMERIC
  11.2.3 Floating-Point Types (Approximate Value) - FLOAT, DOUBLE
  11.2.4 Bit-Value Type - BIT
  11.2.5 Numeric Type Attributes
  11.2.6 Out-of-Range and Overflow Handling
11.3 Date and Time Types
  11.3.1 The DATE, DATETIME, and TIMESTAMP Types
  11.3.2 The TIME Type
  11.3.3 The YEAR Type
  11.3.4 YEAR(2) Limitations and Migrating to YEAR(4)
  11.3.5 Automatic Initialization and Updating for TIMESTAMP and DATETIME
  11.3.6 Fractional Seconds in Time Values
  11.3.7 Conversion Between Date and Time Types
  11.3.8 Two-Digit Years in Dates
11.4 String Types
  11.4.1 The CHAR and VARCHAR Types
  11.4.2 The BINARY and VARBINARY Types
  11.4.3 The BLOB and TEXT Types
  11.4.4 The ENUM Type
  11.4.5 The SET Type
11.5 Spatial Data Types
  11.5.1 Spatial Data Types
  11.5.2 The OpenGIS Geometry Model
  11.5.3 Supported Spatial Data Formats
  11.5.4 Geometry Well-Formedness and Validity
  11.5.5 Creating Spatial Columns
  11.5.6 Populating Spatial Columns
  11.5.7 Fetching Spatial Data
  11.5.8 Optimizing Spatial Analysis
  11.5.9 Creating Spatial Indexes
  11.5.10 Using Spatial Indexes
11.6 The JSON Data Type
11.7 Data Type Default Values
11.8 Data Type Storage Requirements
11.9 Choosing the Right Type for a Column
11.10 Using Data Types from Other Database Engines
MySQL supports a number of SQL data types in several categories: numeric types, date and time types, string (character and byte) types, spatial types, and the JSON data type. This chapter provides an overview of these data types, a more detailed description of the properties of the types in each category, and a summary of the data type storage requirements. The initial overview is intentionally brief. The more detailed descriptions later in the chapter should be consulted for additional information about particular data types, such as the permissible formats in which you can specify values. Data type descriptions use these conventions:
•
M indicates the maximum display width for integer types. For floating-point and fixed-point types, M is the total number of digits that can be stored (the precision). For string types, M is the maximum length. The maximum permissible value of M depends on the data type.
•
D applies to floating-point and fixed-point types and indicates the number of digits following the decimal point (the scale). The maximum possible value is 30, but should be no greater than M−2.
•
fsp applies to the TIME, DATETIME, and TIMESTAMP types and represents fractional seconds precision; that is, the number of digits following the decimal point for fractional parts of seconds. The fsp value, if given, must be in the range 0 to 6. A value of 0 signifies that there is no fractional part. If omitted, the default precision is 0. (This differs from the standard SQL default of 6, for compatibility with previous MySQL versions.)
•
Square brackets ([ and ]) indicate optional parts of type definitions.
11.1 Data Type Overview
11.1.1 Numeric Type Overview
A summary of the numeric data types follows. For additional information about properties and storage requirements of the numeric types, see Section 11.2, “Numeric Types”, and Section 11.8, “Data Type Storage Requirements”.
M indicates the maximum display width for integer types. The maximum display width is 255. Display width is unrelated to the range of values a type can contain, as described in Section 11.2, “Numeric Types”. For floating-point and fixed-point types, M is the total number of digits that can be stored.
If you specify ZEROFILL for a numeric column, MySQL automatically adds the UNSIGNED attribute to the column.
Numeric data types that permit the UNSIGNED attribute also permit SIGNED. However, these data types are signed by default, so the SIGNED attribute has no effect.
SERIAL is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.
SERIAL DEFAULT VALUE in the definition of an integer column is an alias for NOT NULL AUTO_INCREMENT UNIQUE.
Warning
When you use subtraction between integer values where one is of type UNSIGNED, the result is unsigned unless the NO_UNSIGNED_SUBTRACTION SQL mode is enabled. See Section 12.10, “Cast Functions and Operators”.
•
BIT[(M)] A bit-value type. M indicates the number of bits per value, from 1 to 64. The default is 1 if M is omitted.
•
TINYINT[(M)] [UNSIGNED] [ZEROFILL] A very small integer. The signed range is -128 to 127. The unsigned range is 0 to 255.
•
BOOL, BOOLEAN These types are synonyms for TINYINT(1). A value of zero is considered false. Nonzero values are considered true:
mysql> SELECT IF(0, 'true', 'false');
+------------------------+
| IF(0, 'true', 'false') |
+------------------------+
| false                  |
+------------------------+

mysql> SELECT IF(1, 'true', 'false');
+------------------------+
| IF(1, 'true', 'false') |
+------------------------+
| true                   |
+------------------------+

mysql> SELECT IF(2, 'true', 'false');
+------------------------+
| IF(2, 'true', 'false') |
+------------------------+
| true                   |
+------------------------+
However, the values TRUE and FALSE are merely aliases for 1 and 0, respectively, as shown here:
mysql> SELECT IF(0 = FALSE, 'true', 'false');
+--------------------------------+
| IF(0 = FALSE, 'true', 'false') |
+--------------------------------+
| true                           |
+--------------------------------+

mysql> SELECT IF(1 = TRUE, 'true', 'false');
+-------------------------------+
| IF(1 = TRUE, 'true', 'false') |
+-------------------------------+
| true                          |
+-------------------------------+

mysql> SELECT IF(2 = TRUE, 'true', 'false');
+-------------------------------+
| IF(2 = TRUE, 'true', 'false') |
+-------------------------------+
| false                         |
+-------------------------------+

mysql> SELECT IF(2 = FALSE, 'true', 'false');
+--------------------------------+
| IF(2 = FALSE, 'true', 'false') |
+--------------------------------+
| false                          |
+--------------------------------+
The last two statements display the results shown because 2 is equal to neither 1 nor 0.
•
SMALLINT[(M)] [UNSIGNED] [ZEROFILL] A small integer. The signed range is -32768 to 32767. The unsigned range is 0 to 65535.
•
MEDIUMINT[(M)] [UNSIGNED] [ZEROFILL] A medium-sized integer. The signed range is -8388608 to 8388607. The unsigned range is 0 to 16777215.
•
INT[(M)] [UNSIGNED] [ZEROFILL] A normal-size integer. The signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.
•
INTEGER[(M)] [UNSIGNED] [ZEROFILL] This type is a synonym for INT.
•
BIGINT[(M)] [UNSIGNED] [ZEROFILL]
A large integer. The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615.
SERIAL is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.
Some things you should be aware of with respect to BIGINT columns:
  • All arithmetic is done using signed BIGINT or DOUBLE values, so you should not use unsigned big integers larger than 9223372036854775807 (63 bits) except with bit functions. If you do that, some of the last digits in the result may be wrong because of rounding errors when converting a BIGINT value to a DOUBLE.
    MySQL can handle BIGINT in the following cases:
    • When using integers to store large unsigned values in a BIGINT column.
    • In MIN(col_name) or MAX(col_name), where col_name refers to a BIGINT column.
    • When using operators (+, -, *, and so on) where both operands are integers.
  • You can always store an exact integer value in a BIGINT column by storing it using a string. In this case, MySQL performs a string-to-number conversion that involves no intermediate double-precision representation.
  • The -, +, and * operators use BIGINT arithmetic when both operands are integer values. This means that if you multiply two big integers (or results from functions that return integers), you may get unexpected results when the result is larger than 9223372036854775807.
•
DECIMAL[(M[,D])] [UNSIGNED] [ZEROFILL] A packed “exact” fixed-point number. M is the total number of digits (the precision) and D is the number of digits after the decimal point (the scale). The decimal point and (for negative numbers) the - sign are not counted in M. If D is 0, values have no decimal point or fractional part. The maximum number of digits (M) for DECIMAL is 65. The maximum number of supported decimals (D) is 30. If D is omitted, the default is 0. If M is omitted, the default is 10. UNSIGNED, if specified, disallows negative values. All basic calculations (+, -, *, /) with DECIMAL columns are done with a precision of 65 digits.
•
DEC[(M[,D])] [UNSIGNED] [ZEROFILL], NUMERIC[(M[,D])] [UNSIGNED] [ZEROFILL], FIXED[(M[,D])] [UNSIGNED] [ZEROFILL] These types are synonyms for DECIMAL. The FIXED synonym is available for compatibility with other database systems.
•
FLOAT[(M,D)] [UNSIGNED] [ZEROFILL] A small (single-precision) floating-point number. Permissible values are -3.402823466E+38 to -1.175494351E-38, 0, and 1.175494351E-38 to 3.402823466E+38. These are the theoretical limits, based on the IEEE standard. The actual range might be slightly smaller depending on your hardware or operating system. M is the total number of digits and D is the number of digits following the decimal point. If M and D are omitted, values are stored to the limits permitted by the hardware. A single-precision floating-point number is accurate to approximately 7 decimal places. UNSIGNED, if specified, disallows negative values. Using FLOAT might give you some unexpected problems because all calculations in MySQL are done with double precision. See Section B.6.4.7, “Solving Problems with No Matching Rows”.
•
FLOAT(p) [UNSIGNED] [ZEROFILL] A floating-point number. p represents the precision in bits, but MySQL uses this value only to determine whether to use FLOAT or DOUBLE for the resulting data type. If p is from 0 to 24, the data type becomes FLOAT with no M or D values. If p is from 25 to 53, the data type becomes DOUBLE with no M or D values. The range of the resulting column is the same as for the single-precision FLOAT or double-precision DOUBLE data types described earlier in this section. FLOAT(p) syntax is provided for ODBC compatibility.
•
DOUBLE[(M,D)] [UNSIGNED] [ZEROFILL] A normal-size (double-precision) floating-point number. Permissible values are -1.7976931348623157E+308 to -2.2250738585072014E-308, 0, and 2.2250738585072014E-308 to 1.7976931348623157E+308. These are the theoretical limits, based on the IEEE standard. The actual range might be slightly smaller depending on your hardware or operating system. M is the total number of digits and D is the number of digits following the decimal point. If M and D are omitted, values are stored to the limits permitted by the hardware. A double-precision floating-point number is accurate to approximately 15 decimal places. UNSIGNED, if specified, disallows negative values.
•
DOUBLE PRECISION[(M,D)] [UNSIGNED] [ZEROFILL], REAL[(M,D)] [UNSIGNED] [ZEROFILL] These types are synonyms for DOUBLE. Exception: If the REAL_AS_FLOAT SQL mode is enabled, REAL is a synonym for FLOAT rather than DOUBLE.
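The integer ranges listed above follow directly from each type's storage size, and the BIGINT rounding caveat follows from the 53-bit significand of a double. The Python sketch below illustrates both. The byte sizes (1, 2, 3, 4, and 8) are an assumption taken from the storage requirements section rather than from this overview, and the helper name integer_range is hypothetical.

```python
# Storage size in bytes for each integer type (assumed; see Section 11.8).
INTEGER_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def integer_range(type_name, unsigned=False):
    """Return (min, max) for an integer type of the given signedness."""
    bits = 8 * INTEGER_BYTES[type_name]
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name in INTEGER_BYTES:
    print(name, integer_range(name), integer_range(name, unsigned=True))

# A double cannot represent every 64-bit integer exactly, so a round trip
# through a double changes the largest signed BIGINT value.
big = 9223372036854775807
print(int(float(big)) == big)
```

The final comparison is false: the value comes back as 9223372036854775808, which is the kind of last-digit error the BIGINT notes warn about when arithmetic passes through DOUBLE.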
11.1.2 Date and Time Type Overview
A summary of the temporal data types follows. For additional information about properties and storage requirements of the temporal types, see Section 11.3, “Date and Time Types”, and Section 11.8, “Data Type Storage Requirements”. For descriptions of functions that operate on temporal values, see Section 12.7, “Date and Time Functions”.
For the DATE and DATETIME range descriptions, “supported” means that although earlier values might work, there is no guarantee.
MySQL permits fractional seconds for TIME, DATETIME, and TIMESTAMP values, with up to microseconds (6 digits) precision. To define a column that includes a fractional seconds part, use the syntax type_name(fsp), where type_name is TIME, DATETIME, or TIMESTAMP, and fsp is the fractional seconds precision. For example:
CREATE TABLE t1 (t TIME(3), dt DATETIME(6));
The fsp value, if given, must be in the range 0 to 6. A value of 0 signifies that there is no fractional part. If omitted, the default precision is 0. (This differs from the standard SQL default of 6, for compatibility with previous MySQL versions.)
Any TIMESTAMP or DATETIME column in a table can have automatic initialization and updating properties.
•
DATE A date. The supported range is '1000-01-01' to '9999-12-31'. MySQL displays DATE values in 'YYYY-MM-DD' format, but permits assignment of values to DATE columns using either strings or numbers.
•
DATETIME[(fsp)]
A date and time combination. The supported range is '1000-01-01 00:00:00.000000' to '9999-12-31 23:59:59.999999'. MySQL displays DATETIME values in 'YYYY-MM-DD HH:MM:SS[.fraction]' format, but permits assignment of values to DATETIME columns using either strings or numbers. An optional fsp value in the range from 0 to 6 may be given to specify fractional seconds precision. A value of 0 signifies that there is no fractional part. If omitted, the default precision is 0. Automatic initialization and updating to the current date and time for DATETIME columns can be specified using DEFAULT and ON UPDATE column definition clauses, as described in Section 11.3.5, “Automatic Initialization and Updating for TIMESTAMP and DATETIME”. •
TIMESTAMP[(fsp)] A timestamp. The range is '1970-01-01 00:00:01.000000' UTC to '2038-01-19 03:14:07.999999' UTC. TIMESTAMP values are stored as the number of seconds since the epoch ('1970-01-01 00:00:00' UTC). A TIMESTAMP cannot represent the value '1970-01-01 00:00:00' because that is equivalent to 0 seconds from the epoch and the value 0 is reserved for representing '0000-00-00 00:00:00', the “zero” TIMESTAMP value. An optional fsp value in the range from 0 to 6 may be given to specify fractional seconds precision. A value of 0 signifies that there is no fractional part. If omitted, the default precision is 0. The way the server handles TIMESTAMP definitions depends on the value of the explicit_defaults_for_timestamp system variable (see Section 5.1.7, “Server System Variables”). If explicit_defaults_for_timestamp is enabled, there is no automatic assignment of the DEFAULT CURRENT_TIMESTAMP or ON UPDATE CURRENT_TIMESTAMP attributes to any TIMESTAMP column. They must be included explicitly in the column definition. Also, any TIMESTAMP not explicitly declared as NOT NULL permits NULL values. If explicit_defaults_for_timestamp is disabled, the server handles TIMESTAMP as follows: Unless specified otherwise, the first TIMESTAMP column in a table is defined to be automatically set to the date and time of the most recent modification if not explicitly assigned a value. This makes TIMESTAMP useful for recording the timestamp of an INSERT or UPDATE operation. You can also set any TIMESTAMP column to the current date and time by assigning it a NULL value, unless it has been defined with the NULL attribute to permit NULL values. Automatic initialization and updating to the current date and time can be specified using DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP column definition clauses. By default, the first TIMESTAMP column has these properties, as previously noted. However, any TIMESTAMP column in a table can be defined to have these properties.
•
TIME[(fsp)] A time. The range is '-838:59:59.000000' to '838:59:59.000000'. MySQL displays TIME values in 'HH:MM:SS[.fraction]' format, but permits assignment of values to TIME columns using either strings or numbers. An optional fsp value in the range from 0 to 6 may be given to specify fractional seconds precision. A value of 0 signifies that there is no fractional part. If omitted, the default precision is 0.
•
YEAR[(4)] A year in four-digit format. MySQL displays YEAR values in YYYY format, but permits assignment of values to YEAR columns using either strings or numbers. Values display as 1901 to 2155, and 0000.
Note
The YEAR(2) data type is deprecated and support for it is removed in MySQL 5.7.5. To convert YEAR(2) columns to YEAR(4), see Section 11.3.4, “YEAR(2) Limitations and Migrating to YEAR(4)”.
For additional information about YEAR display format and interpretation of input values, see Section 11.3.3, “The YEAR Type”.
The SUM() and AVG() aggregate functions do not work with temporal values. (They convert the values to numbers, losing everything after the first nonnumeric character.) To work around this problem, convert to numeric units, perform the aggregate operation, and convert back to a temporal value. Examples:
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(time_col))) FROM tbl_name;
SELECT FROM_DAYS(SUM(TO_DAYS(date_col))) FROM tbl_name;
Note
The MySQL server can be run with the MAXDB SQL mode enabled. In this case, TIMESTAMP is identical with DATETIME. If this mode is enabled at the time that a table is created, TIMESTAMP columns are created as DATETIME columns. As a result, such columns use DATETIME display format, have the same range of values, and there is no automatic initialization or updating to the current date and time. See Section 5.1.10, “Server SQL Modes”.
Note
As of MySQL 5.7.22, MAXDB is deprecated. It will be removed in a future version of MySQL.
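The fractional seconds and automatic initialization properties summarized in this section can be combined in one table definition. The following is a minimal sketch, assuming a MySQL 5.7 server; the table name event_log and its column names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table; all names are illustrative assumptions.
CREATE TABLE event_log (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  -- Millisecond precision, set on insert and refreshed on each update.
  -- When an fsp is given, the same value is used throughout the definition.
  modified TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3)
                        ON UPDATE CURRENT_TIMESTAMP(3),
  -- A DATETIME column can be given the same property explicitly.
  created  DATETIME(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
  -- TIME also accepts an fsp value from 0 to 6.
  elapsed  TIME(3)
);

INSERT INTO event_log (elapsed) VALUES ('00:00:01.250');
SELECT id, modified, created, elapsed FROM event_log;
```

Declaring the DEFAULT and ON UPDATE clauses explicitly keeps the behavior independent of the explicit_defaults_for_timestamp setting.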
11.1.3 String Type Overview
A summary of the string data types follows. For additional information about properties and storage requirements of the string types, see Section 11.4, “String Types”, and Section 11.8, “Data Type Storage Requirements”.
In some cases, MySQL may change a string column to a type different from that given in a CREATE TABLE or ALTER TABLE statement. See Section 13.1.18.7, “Silent Column Specification Changes”.
MySQL interprets length specifications in character column definitions in character units. This applies to CHAR, VARCHAR, and the TEXT types.
Column definitions for character string data types (CHAR, VARCHAR, the TEXT types, ENUM, SET, and any synonyms) can specify the column character set and collation:
• CHARACTER SET specifies the character set. If desired, a collation for the character set can be specified with the COLLATE attribute, along with any other attributes. For example:
CREATE TABLE t
(
    c1 VARCHAR(20) CHARACTER SET utf8,
    c2 TEXT CHARACTER SET latin1 COLLATE latin1_general_cs
);
This table definition creates a column named c1 that has a character set of utf8 with the default collation for that character set, and a column named c2 that has a character set of latin1 and a case-sensitive collation.
The rules for assigning the character set and collation when either or both of CHARACTER SET and the COLLATE attribute are missing are described in Section 10.3.5, “Column Character Set and Collation”. CHARSET is a synonym for CHARACTER SET.
• Specifying the CHARACTER SET binary attribute for a character string data type causes the column to be created as the corresponding binary string data type: CHAR becomes BINARY, VARCHAR becomes VARBINARY, and TEXT becomes BLOB. For the ENUM and SET data types, this does not occur; they are created as declared. Suppose that you specify a table using this definition:
CREATE TABLE t
(
    c1 VARCHAR(10) CHARACTER SET binary,
    c2 TEXT CHARACTER SET binary,
    c3 ENUM('a','b','c') CHARACTER SET binary
);
The resulting table has this definition:
CREATE TABLE t
(
    c1 VARBINARY(10),
    c2 BLOB,
    c3 ENUM('a','b','c') CHARACTER SET binary
);
• The BINARY attribute is shorthand for specifying the table default character set and the binary (_bin) collation of that character set. In this case, comparison and sorting are based on numeric character code values. • The ASCII attribute is shorthand for CHARACTER SET latin1. • The UNICODE attribute is shorthand for CHARACTER SET ucs2. Character column comparison and sorting are based on the collation assigned to the column. For the CHAR, VARCHAR, TEXT, ENUM, and SET data types, you can declare a column with a binary (_bin) collation or the BINARY attribute to cause comparison and sorting to use the underlying character code values rather than a lexical ordering. For additional information about use of character sets in MySQL, see Chapter 10, Character Sets, Collations, Unicode. •
[NATIONAL] CHAR[(M)] [CHARACTER SET charset_name] [COLLATE collation_name] A fixed-length string that is always right-padded with spaces to the specified length when stored. M represents the column length in characters. The range of M is 0 to 255. If M is omitted, the length is 1.
Note
Trailing spaces are removed when CHAR values are retrieved unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.
CHAR is shorthand for CHARACTER. NATIONAL CHAR (or its equivalent short form, NCHAR) is the standard SQL way to define that a CHAR column should use some predefined character set. MySQL uses utf8 as this predefined character set. See Section 10.3.7, “The National Character Set”.
The CHAR BYTE data type is an alias for the BINARY data type. This is a compatibility feature.
MySQL permits you to create a column of type CHAR(0). This is useful primarily when you have to be compliant with old applications that depend on the existence of a column but that do not actually
use its value. CHAR(0) is also useful when you need a column that can take only two values: a column that is defined as CHAR(0) NULL occupies only one bit and can take only the values NULL and '' (the empty string).
•
[NATIONAL] VARCHAR(M) [CHARACTER SET charset_name] [COLLATE collation_name] A variable-length string. M represents the maximum column length in characters. The range of M is 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. For example, utf8 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8 character set can be declared to be a maximum of 21,844 characters. See Section C.10.4, “Limits on Table Column Count and Row Size”.
MySQL stores VARCHAR values as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A VARCHAR column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
Note
MySQL follows the standard SQL specification, and does not remove trailing spaces from VARCHAR values.
VARCHAR is shorthand for CHARACTER VARYING. NATIONAL VARCHAR is the standard SQL way to define that a VARCHAR column should use some predefined character set. MySQL uses utf8 as this predefined character set. See Section 10.3.7, “The National Character Set”. NVARCHAR is shorthand for NATIONAL VARCHAR.
•
BINARY[(M)] The BINARY type is similar to the CHAR type, but stores binary byte strings rather than nonbinary character strings. An optional length M represents the column length in bytes. If omitted, M defaults to 1.
•
VARBINARY(M) The VARBINARY type is similar to the VARCHAR type, but stores binary byte strings rather than nonbinary character strings. M represents the maximum column length in bytes.
•
TINYBLOB A BLOB column with a maximum length of 255 (2^8 − 1) bytes. Each TINYBLOB value is stored using a 1-byte length prefix that indicates the number of bytes in the value.
•
TINYTEXT [CHARACTER SET charset_name] [COLLATE collation_name] A TEXT column with a maximum length of 255 (2^8 − 1) characters. The effective maximum length is less if the value contains multibyte characters. Each TINYTEXT value is stored using a 1-byte length prefix that indicates the number of bytes in the value.
•
BLOB[(M)] A BLOB column with a maximum length of 65,535 (2^16 − 1) bytes. Each BLOB value is stored using a 2-byte length prefix that indicates the number of bytes in the value.
An optional length M can be given for this type. If this is done, MySQL creates the column as the smallest BLOB type large enough to hold values M bytes long.
•
TEXT[(M)] [CHARACTER SET charset_name] [COLLATE collation_name] A TEXT column with a maximum length of 65,535 (2^16 − 1) characters. The effective maximum length is less if the value contains multibyte characters. Each TEXT value is stored using a 2-byte length prefix that indicates the number of bytes in the value.
An optional length M can be given for this type. If this is done, MySQL creates the column as the smallest TEXT type large enough to hold values M characters long.
•
MEDIUMBLOB A BLOB column with a maximum length of 16,777,215 (2^24 − 1) bytes. Each MEDIUMBLOB value is stored using a 3-byte length prefix that indicates the number of bytes in the value.
•
MEDIUMTEXT [CHARACTER SET charset_name] [COLLATE collation_name] A TEXT column with a maximum length of 16,777,215 (2^24 − 1) characters. The effective maximum length is less if the value contains multibyte characters. Each MEDIUMTEXT value is stored using a 3-byte length prefix that indicates the number of bytes in the value.
•
LONGBLOB A BLOB column with a maximum length of 4,294,967,295 or 4GB (2^32 − 1) bytes. The effective maximum length of LONGBLOB columns depends on the configured maximum packet size in the client/server protocol and available memory. Each LONGBLOB value is stored using a 4-byte length prefix that indicates the number of bytes in the value.
•
LONGTEXT [CHARACTER SET charset_name] [COLLATE collation_name] A TEXT column with a maximum length of 4,294,967,295 or 4GB (2^32 − 1) characters. The effective maximum length is less if the value contains multibyte characters. The effective maximum length of LONGTEXT columns also depends on the configured maximum packet size in the client/server protocol and available memory. Each LONGTEXT value is stored using a 4-byte length prefix that indicates the number of bytes in the value.
•
ENUM('value1','value2',...) [CHARACTER SET charset_name] [COLLATE collation_name] An enumeration. A string object that can have only one value, chosen from the list of values 'value1', 'value2', ..., NULL or the special '' error value. ENUM values are represented internally as integers. An ENUM column can have a maximum of 65,535 distinct elements. (The practical limit is less than 3000.) A table can have no more than 255 unique element list definitions among its ENUM and SET columns considered as a group. For more information on these limits, see Section C.10.5, “Limits Imposed by .frm File Structure”.
•
SET('value1','value2',...) [CHARACTER SET charset_name] [COLLATE collation_name] A set. A string object that can have zero or more values, each of which must be chosen from the list of values 'value1', 'value2', ... SET values are represented internally as integers. A SET column can have a maximum of 64 distinct members. A table can have no more than 255 unique element list definitions among its ENUM and SET columns considered as a group. For more information on this limit, see Section C.10.5, “Limits Imposed by .frm File Structure”.
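The 21,844-character limit quoted for utf8 VARCHAR columns follows from the row-size arithmetic described above, and the CHAR(0) NULL behavior can be checked directly. The statements below are a sketch, assuming a MySQL 5.7 server; the table name flags and column name is_set are hypothetical.

```sql
-- 65,535-byte row limit, minus the 2-byte length prefix,
-- divided by up to 3 bytes per utf8 character.
SELECT FLOOR((65535 - 2) / 3) AS max_utf8_varchar_chars;   -- 21844

-- Hypothetical two-state column: CHAR(0) NULL holds only NULL or ''.
CREATE TABLE flags (is_set CHAR(0) NULL);
INSERT INTO flags VALUES (NULL), ('');
SELECT is_set IS NULL AS is_null FROM flags;
```

The division assumes that the VARCHAR column is the only column that counts toward the row size; other columns in the same table reduce the limit further.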
11.2 Numeric Types MySQL supports all standard SQL numeric data types. These types include the exact numeric data types (INTEGER, SMALLINT, DECIMAL, and NUMERIC), as well as the approximate numeric data types (FLOAT, REAL, and DOUBLE PRECISION). The keyword INT is a synonym for INTEGER, and
the keywords DEC and FIXED are synonyms for DECIMAL. MySQL treats DOUBLE as a synonym for DOUBLE PRECISION (a nonstandard extension). MySQL also treats REAL as a synonym for DOUBLE PRECISION (a nonstandard variation), unless the REAL_AS_FLOAT SQL mode is enabled. The BIT data type stores bit values and is supported for MyISAM, MEMORY, InnoDB, and NDB tables. For information about how MySQL handles assignment of out-of-range values to columns and overflow during expression evaluation, see Section 11.2.6, “Out-of-Range and Overflow Handling”. For information about numeric type storage requirements, see Section 11.8, “Data Type Storage Requirements”. The data type used for the result of a calculation on numeric operands depends on the types of the operands and the operations performed on them. For more information, see Section 12.6.1, “Arithmetic Operators”.
11.2.1 Integer Types (Exact Value) - INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT
MySQL supports the SQL standard integer types INTEGER (or INT) and SMALLINT. As an extension to the standard, MySQL also supports the integer types TINYINT, MEDIUMINT, and BIGINT. The following table shows the required storage and range for each integer type.
Table 11.1 Required Storage and Range for Integer Types Supported by MySQL
Type       Storage (Bytes)  Minimum Value Signed  Minimum Value Unsigned  Maximum Value Signed  Maximum Value Unsigned
TINYINT    1                -128                  0                       127                   255
SMALLINT   2                -32768                0                       32767                 65535
MEDIUMINT  3                -8388608              0                       8388607               16777215
INT        4                -2147483648           0                       2147483647            4294967295
BIGINT     8                -2^63                 0                       2^63-1                2^64-1
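The maximum values in Table 11.1 all have the form 2^n − 1, where n is the number of value bits. As an illustrative check (not part of the table itself), they can be reproduced with MySQL's 64-bit bit operators; the column aliases are arbitrary.

```sql
-- Bit operations work on 64-bit unsigned integers.
SELECT (1 << 7)  - 1 AS tinyint_signed_max,      -- 127
       (1 << 8)  - 1 AS tinyint_unsigned_max,    -- 255
       (1 << 16) - 1 AS smallint_unsigned_max,   -- 65535
       (1 << 24) - 1 AS mediumint_unsigned_max,  -- 16777215
       (1 << 32) - 1 AS int_unsigned_max;        -- 4294967295

-- BIGINT limits: all 64 bits set, and the same value shifted right by one.
SELECT ~0      AS bigint_unsigned_max,           -- 18446744073709551615 (2^64 - 1)
       ~0 >> 1 AS bigint_signed_max;             -- 9223372036854775807  (2^63 - 1)
```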
11.2.2 Fixed-Point Types (Exact Value) - DECIMAL, NUMERIC
The DECIMAL and NUMERIC types store exact numeric data values. These types are used when it is important to preserve exact precision, for example with monetary data. In MySQL, NUMERIC is implemented as DECIMAL, so the following remarks about DECIMAL apply equally to NUMERIC.
MySQL stores DECIMAL values in binary format. See Section 12.22, “Precision Math”.
In a DECIMAL column declaration, the precision and scale can be (and usually are) specified; for example:
salary DECIMAL(5,2)
In this example, 5 is the precision and 2 is the scale. The precision represents the number of significant digits that are stored for values, and the scale represents the number of digits that can be stored following the decimal point. Standard SQL requires that DECIMAL(5,2) be able to store any value with five digits and two decimals, so values that can be stored in the salary column range from -999.99 to 999.99. In standard SQL, the syntax DECIMAL(M) is equivalent to DECIMAL(M,0). Similarly, the syntax DECIMAL is equivalent to DECIMAL(M,0), where the implementation is permitted to decide the value of M. MySQL supports both of these variant forms of DECIMAL syntax. The default value of M is 10. If the scale is 0, DECIMAL values contain no decimal point or fractional part.
The maximum number of digits for DECIMAL is 65, but the actual range for a given DECIMAL column can be constrained by the precision or scale for a given column. When such a column is assigned a value with more digits following the decimal point than are permitted by the specified scale, the value is converted to that scale. (The precise behavior is operating system-specific, but generally the effect is truncation to the permissible number of digits.)
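To see how precision and scale constrain a DECIMAL(5,2) column in practice, a small experiment along the following lines may help. It is a sketch only: the table name payroll is hypothetical, and the stored result of the last insert should be inspected on your own server rather than assumed.

```sql
-- Hypothetical table using the salary column from the example above.
CREATE TABLE payroll (salary DECIMAL(5,2));

-- The endpoints of the range implied by precision 5 and scale 2.
INSERT INTO payroll VALUES (999.99), (-999.99);

-- Three fractional digits exceed the declared scale of 2,
-- so the value is converted to that scale when stored.
INSERT INTO payroll VALUES (123.456);
SHOW WARNINGS;

SELECT salary FROM payroll;
```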
11.2.3 Floating-Point Types (Approximate Value) - FLOAT, DOUBLE
The FLOAT and DOUBLE types represent approximate numeric data values. MySQL uses four bytes for single-precision values and eight bytes for double-precision values.
For FLOAT, the SQL standard permits an optional specification of the precision (but not the range of the exponent) in bits following the keyword FLOAT in parentheses; that is, FLOAT(p). MySQL also supports this optional precision specification, but the precision value in FLOAT(p) is used only to determine storage size. A precision from 0 to 24 results in a 4-byte single-precision FLOAT column. A precision from 25 to 53 results in an 8-byte double-precision DOUBLE column.
MySQL permits a nonstandard syntax: FLOAT(M,D) or REAL(M,D) or DOUBLE PRECISION(M,D). Here, (M,D) means that values can be stored with up to M digits in total, of which D digits may be after the decimal point. For example, a column defined as FLOAT(7,4) displays values in the form -999.9999. MySQL performs rounding when storing values, so if you insert 999.00009 into a FLOAT(7,4) column, the approximate result is 999.0001.
Because floating-point values are approximate and not stored as exact values, attempts to treat them as exact in comparisons may lead to problems. They are also subject to platform or implementation dependencies. For more information, see Section B.6.4.8, “Problems with Floating-Point Values”.
For maximum portability, code requiring storage of approximate numeric data values should use FLOAT or DOUBLE PRECISION with no specification of precision or number of digits.
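The effect of the FLOAT(p) precision on the resulting column type, and the rounding described for FLOAT(M,D), can be observed with a throwaway table. This is a sketch under the assumption of a MySQL 5.7 server; the table name fp and its column names are hypothetical.

```sql
-- Hypothetical table: f1 uses a small bit precision, f2 a large one,
-- and f3 uses the nonstandard (M,D) form.
CREATE TABLE fp (
  f1 FLOAT(10),
  f2 FLOAT(40),
  f3 FLOAT(7,4)
);

-- Shows the types actually created (single precision for f1,
-- double precision for f2).
SHOW CREATE TABLE fp;

-- The value is rounded to four decimals when stored.
INSERT INTO fp (f3) VALUES (999.00009);
SELECT f3 FROM fp;

-- Equality tests against approximate values are unreliable;
-- comparing within a tolerance is a common alternative.
SELECT f3 = 999.0001 AS exact_match,
       ABS(f3 - 999.0001) < 0.00005 AS within_tolerance
FROM fp;
```

The tolerance of 0.00005 is an arbitrary choice for this example; a suitable value depends on the scale of the data.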
11.2.4 Bit-Value Type - BIT The BIT data type is used to store bit values. A type of BIT(M) enables storage of M-bit values. M can range from 1 to 64. To specify bit values, b'value' notation can be used. value is a binary value written using zeros and ones. For example, b'111' and b'10000000' represent 7 and 128, respectively. See Section 9.1.5, “Bit-Value Literals”. If you assign a value to a BIT(M) column that is less than M bits long, the value is padded on the left with zeros. For example, assigning a value of b'101' to a BIT(6) column is, in effect, the same as assigning b'000101'. NDB Cluster. The maximum combined size of all BIT columns used in a given NDB table must not exceed 4096 bits.
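The bit-value notation and left-padding described above can be tried as follows. This is an illustrative sketch; the table name bits and column name b are hypothetical. Adding 0 converts a bit value to a number so that it displays as an integer.

```sql
-- Bit-value literals interpreted as numbers.
SELECT b'111' + 0 AS seven,          -- 7
       b'10000000' + 0 AS one_two_eight;   -- 128

-- Assigning b'101' to a BIT(6) column stores b'000101'.
CREATE TABLE bits (b BIT(6));
INSERT INTO bits VALUES (b'101');

SELECT b + 0 AS as_number,                    -- 5
       LPAD(BIN(b + 0), 6, '0') AS as_bits    -- 000101
FROM bits;
```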
11.2.5 Numeric Type Attributes MySQL supports an extension for optionally specifying the display width of integer data types in parentheses following the base keyword for the type. For example, INT(4) specifies an INT with a display width of four digits. This optional display width may be used by applications to display integer values having a width less than the width specified for the column by left-padding them with spaces. (That is, this width is present in the metadata returned with result sets. Whether it is used or not is up to the application.) The display width does not constrain the range of values that can be stored in the column. Nor does it prevent values wider than the column display width from being displayed correctly. For example, a column specified as SMALLINT(3) has the usual SMALLINT range of -32768 to 32767, and values outside the range permitted by three digits are displayed in full using more than three digits.
When used in conjunction with the optional (nonstandard) attribute ZEROFILL, the default padding of spaces is replaced with zeros. For example, for a column declared as INT(4) ZEROFILL, a value of 5 is retrieved as 0005.
Note
The ZEROFILL attribute is ignored when a column is involved in expressions or UNION queries. If you store values larger than the display width in an integer column that has the ZEROFILL attribute, you may experience problems when MySQL generates temporary tables for some complicated joins. In these cases, MySQL assumes that the data values fit within the column display width.
All integer types can have an optional (nonstandard) attribute UNSIGNED. An unsigned type can be used to permit only nonnegative numbers in a column or when you need a larger upper numeric range for the column. For example, if an INT column is UNSIGNED, the size of the column's range is the same but its endpoints shift from -2147483648 and 2147483647 up to 0 and 4294967295.
Floating-point and fixed-point types also can be UNSIGNED. As with integer types, this attribute prevents negative values from being stored in the column. Unlike the integer types, the upper range of column values remains the same.
If you specify ZEROFILL for a numeric column, MySQL automatically adds the UNSIGNED attribute to the column.
Integer or floating-point data types can have the additional attribute AUTO_INCREMENT. When you insert a value of NULL into an indexed AUTO_INCREMENT column, the column is set to the next sequence value. Typically this is value+1, where value is the largest value for the column currently in the table. (AUTO_INCREMENT sequences begin with 1.)
Storing 0 into an AUTO_INCREMENT column has the same effect as storing NULL, unless the NO_AUTO_VALUE_ON_ZERO SQL mode is enabled.
Inserting NULL to generate AUTO_INCREMENT values requires that the column be declared NOT NULL. If the column is declared NULL, inserting NULL stores a NULL.
When you insert any other value into an AUTO_INCREMENT column, the column is set to that value and the sequence is reset so that the next automatically generated value follows sequentially from the inserted value. Negative values for AUTO_INCREMENT columns are not supported.
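The ZEROFILL, UNSIGNED, and AUTO_INCREMENT attributes described in this section can be seen together in a short session. The following is a sketch, assuming a MySQL 5.7 server and the default storage engine; the table name counters and its column names are hypothetical.

```sql
-- Hypothetical table combining the attributes discussed above.
CREATE TABLE counters (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  n  INT(4) ZEROFILL          -- UNSIGNED is added automatically
);

-- NULL in the AUTO_INCREMENT column requests the next sequence value.
INSERT INTO counters (id, n) VALUES (NULL, 5), (NULL, 12345);

-- 5 is padded to the display width; 12345 is wider than the
-- display width and is still displayed in full.
SELECT id, n FROM counters;

-- Confirms that the column was created as unsigned zerofill.
SHOW CREATE TABLE counters;

-- An explicit value resets the sequence; the next generated id follows it.
INSERT INTO counters (id, n) VALUES (100, 1);
INSERT INTO counters (n) VALUES (2);
SELECT id, n FROM counters ORDER BY id;
```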
11.2.6 Out-of-Range and Overflow Handling When MySQL stores a value in a numeric column that is outside the permissible range of the column data type, the result depends on the SQL mode in effect at the time: • If strict SQL mode is enabled, MySQL rejects the out-of-range value with an error, and the insert fails, in accordance with the SQL standard. • If no restrictive modes are enabled, MySQL clips the value to the appropriate endpoint of the column data type range and stores the resulting value instead. When an out-of-range value is assigned to an integer column, MySQL stores the value representing the corresponding endpoint of the column data type range. When a floating-point or fixed-point column is assigned a value that exceeds the range implied by the specified (or default) precision and scale, MySQL stores the value representing the corresponding endpoint of that range. Suppose that a table t1 has this definition:
CREATE TABLE t1 (i1 TINYINT, i2 TINYINT UNSIGNED);
With strict SQL mode enabled, an out of range error occurs:
mysql> SET sql_mode = 'TRADITIONAL';
mysql> INSERT INTO t1 (i1, i2) VALUES(256, 256);
ERROR 1264 (22003): Out of range value for column 'i1' at row 1
mysql> SELECT * FROM t1;
Empty set (0.00 sec)
With strict SQL mode not enabled, clipping with warnings occurs:
mysql> SET sql_mode = '';
mysql> INSERT INTO t1 (i1, i2) VALUES(256, 256);
mysql> SHOW WARNINGS;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1264 | Out of range value for column 'i1' at row 1 |
| Warning | 1264 | Out of range value for column 'i2' at row 1 |
+---------+------+---------------------------------------------+
mysql> SELECT * FROM t1;
+------+------+
| i1   | i2   |
+------+------+
|  127 |  255 |
+------+------+
When strict SQL mode is not enabled, column-assignment conversions that occur due to clipping are reported as warnings for ALTER TABLE, LOAD DATA, UPDATE, and multiple-row INSERT statements. In strict mode, these statements fail, and some or all the values are not inserted or changed, depending on whether the table is a transactional table and other factors. For details, see Section 5.1.10, “Server SQL Modes”.
Overflow during numeric expression evaluation results in an error. For example, the largest signed BIGINT value is 9223372036854775807, so the following expression produces an error:
mysql> SELECT 9223372036854775807 + 1;
ERROR 1690 (22003): BIGINT value is out of range in '(9223372036854775807 + 1)'
To enable the operation to succeed in this case, convert the value to unsigned:
mysql> SELECT CAST(9223372036854775807 AS UNSIGNED) + 1;
+-------------------------------------------+
| CAST(9223372036854775807 AS UNSIGNED) + 1 |
+-------------------------------------------+
|                       9223372036854775808 |
+-------------------------------------------+
Whether overflow occurs depends on the range of the operands, so another way to handle the preceding expression is to use exact-value arithmetic because DECIMAL values have a larger range than integers:
mysql> SELECT 9223372036854775807.0 + 1;
+---------------------------+
| 9223372036854775807.0 + 1 |
+---------------------------+
|     9223372036854775808.0 |
+---------------------------+
Subtraction between integer values, where one is of type UNSIGNED, produces an unsigned result by default. If the result would otherwise have been negative, an error results:
mysql> SET sql_mode = '';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT CAST(0 AS UNSIGNED) - 1;
ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(cast(0 as unsigned) - 1)'
If the NO_UNSIGNED_SUBTRACTION SQL mode is enabled, the result is negative:
mysql> SET sql_mode = 'NO_UNSIGNED_SUBTRACTION';
mysql> SELECT CAST(0 AS UNSIGNED) - 1;
+-------------------------+
| CAST(0 AS UNSIGNED) - 1 |
+-------------------------+
|                      -1 |
+-------------------------+
If the result of such an operation is used to update an UNSIGNED integer column, the result is clipped to the maximum value for the column type, or clipped to 0 if NO_UNSIGNED_SUBTRACTION is enabled. If strict SQL mode is enabled, an error occurs and the column remains unchanged.
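Where changing the SQL mode is not desirable, one possible alternative (an assumption of this example, not something this section prescribes) is to convert the unsigned operand to a signed value before subtracting, so that the arithmetic is carried out on signed integers:

```sql
-- No special SQL mode is needed for this form.
SET sql_mode = '';

-- The inner CAST produces the unsigned operand from the example above;
-- the outer CAST makes it signed before the subtraction.
SELECT CAST(CAST(0 AS UNSIGNED) AS SIGNED) - 1 AS signed_result;   -- -1
```

This only works when the unsigned value fits in the signed BIGINT range; larger values need a different approach, such as the DECIMAL arithmetic shown earlier in this section.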
11.3 Date and Time Types
The date and time types for representing temporal values are DATE, TIME, DATETIME, TIMESTAMP, and YEAR. Each temporal type has a range of valid values, as well as a “zero” value that may be used when you specify an invalid value that MySQL cannot represent. The TIMESTAMP type has special automatic updating behavior, described later. For temporal type storage requirements, see Section 11.8, “Data Type Storage Requirements”.
Keep in mind these general considerations when working with date and time types:
• MySQL retrieves values for a given date or time type in a standard output format, but it attempts to interpret a variety of formats for input values that you supply (for example, when you specify a value to be assigned to or compared to a date or time type). For a description of the permitted formats for date and time types, see Section 9.1.3, “Date and Time Literals”. It is expected that you supply valid values. Unpredictable results may occur if you use values in other formats.
• Although MySQL tries to interpret values in several formats, date parts must always be given in year-month-day order (for example, '98-09-04'), rather than in the month-day-year or day-month-year orders commonly used elsewhere (for example, '09-04-98', '04-09-98').
• Dates containing two-digit year values are ambiguous because the century is unknown. MySQL interprets two-digit year values using these rules:
  • Year values in the range 70-99 are converted to 1970-1999.
  • Year values in the range 00-69 are converted to 2000-2069.
  See also Section 11.3.8, “Two-Digit Years in Dates”.
• Conversion of values from one temporal type to another occurs according to the rules in Section 11.3.7, “Conversion Between Date and Time Types”.
• MySQL automatically converts a date or time value to a number if the value is used in a numeric context and vice versa.
• By default, when MySQL encounters a value for a date or time type that is out of range or otherwise invalid for the type, it converts the value to the “zero” value for that type. The exception is that out-of-range TIME values are clipped to the appropriate endpoint of the TIME range.
• By setting the SQL mode to the appropriate value, you can specify more exactly what kind of dates you want MySQL to support. (See Section 5.1.10, “Server SQL Modes”.) You can get MySQL to accept certain dates, such as '2009-11-31', by enabling the ALLOW_INVALID_DATES SQL
mode. This is useful when you want to store a “possibly wrong” value which the user has specified (for example, in a web form) in the database for future processing. Under this mode, MySQL verifies only that the month is in the range from 1 to 12 and that the day is in the range from 1 to 31.
• MySQL permits you to store dates where the day or month and day are zero in a DATE or DATETIME column. This is useful for applications that need to store birthdates for which you may not know the exact date. In this case, you simply store the date as '2009-00-00' or '2009-01-00'. If you store dates such as these, you should not expect to get correct results for functions such as DATE_SUB() or DATE_ADD() that require complete dates. To disallow zero month or day parts in dates, enable the NO_ZERO_IN_DATE mode.
• MySQL permits you to store a “zero” value of '0000-00-00' as a “dummy date.” This is in some cases more convenient than using NULL values, and uses less data and index space. To disallow '0000-00-00', enable the NO_ZERO_DATE mode.
• “Zero” date or time values used through Connector/ODBC are converted automatically to NULL because ODBC cannot handle such values.
The following table shows the format of the “zero” value for each type. The “zero” values are special, but you can store or refer to them explicitly using the values shown in the table. You can also do this using the values '0' or 0, which are easier to write. For temporal types that include a date part (DATE, DATETIME, and TIMESTAMP), use of these values produces warnings if the NO_ZERO_DATE SQL mode is enabled.
Data Type    “Zero” Value
DATE         '0000-00-00'
TIME         '00:00:00'
DATETIME     '0000-00-00 00:00:00'
TIMESTAMP    '0000-00-00 00:00:00'
YEAR         0000
11.3.1 The DATE, DATETIME, and TIMESTAMP Types
The DATE, DATETIME, and TIMESTAMP types are related. This section describes their characteristics, how they are similar, and how they differ. MySQL recognizes DATE, DATETIME, and TIMESTAMP values in several formats, described in Section 9.1.3, “Date and Time Literals”. For the DATE and DATETIME range descriptions, “supported” means that although earlier values might work, there is no guarantee.
The DATE type is used for values with a date part but no time part. MySQL retrieves and displays DATE values in 'YYYY-MM-DD' format. The supported range is '1000-01-01' to '9999-12-31'.
The DATETIME type is used for values that contain both date and time parts. MySQL retrieves and displays DATETIME values in 'YYYY-MM-DD HH:MM:SS' format. The supported range is '1000-01-01 00:00:00' to '9999-12-31 23:59:59'.
The TIMESTAMP data type is used for values that contain both date and time parts. TIMESTAMP has a range of '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC.
A DATETIME or TIMESTAMP value can include a trailing fractional seconds part in up to microseconds (6 digits) precision. In particular, any fractional part in a value inserted into a DATETIME or TIMESTAMP column is stored rather than discarded. With the fractional part included, the format for these values is 'YYYY-MM-DD HH:MM:SS[.fraction]', the range for DATETIME values is '1000-01-01 00:00:00.000000' to '9999-12-31 23:59:59.999999', and the range for TIMESTAMP values is '1970-01-01 00:00:01.000000' to '2038-01-19 03:14:07.999999'. The fractional part should always be separated from the rest of the time by a decimal point; no other fractional seconds delimiter is recognized. For information about fractional seconds support in MySQL, see Section 11.3.6, “Fractional Seconds in Time Values”.
The TIMESTAMP and DATETIME data types offer automatic initialization and updating to the current date and time. For more information, see Section 11.3.5, “Automatic Initialization and Updating for TIMESTAMP and DATETIME”.
MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. (This does not occur for other types such as DATETIME.) By default, the current time zone for each connection is the server's time zone. The time zone can be set on a per-connection basis. As long as the time zone setting remains constant, you get back the same value you store. If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored. This occurs because the same time zone was not used for conversion in both directions. The current time zone is available as the value of the time_zone system variable. For more information, see Section 5.1.12, “MySQL Server Time Zone Support”.
Invalid DATE, DATETIME, or TIMESTAMP values are converted to the “zero” value of the appropriate type ('0000-00-00' or '0000-00-00 00:00:00').
Be aware of certain properties of date value interpretation in MySQL:
• MySQL permits a “relaxed” format for values specified as strings, in which any punctuation character may be used as the delimiter between date parts or time parts. In some cases, this syntax can be deceiving. For example, a value such as '10:11:12' might look like a time value because of the :, but is interpreted as the date '2010-11-12' if used in a date context. The value '10:45:15' is converted to '0000-00-00' because '45' is not a valid month. The only delimiter recognized between a date and time part and a fractional seconds part is the decimal point.
• The server requires that month and day values be valid, and not merely in the range 1 to 12 and 1 to 31, respectively. With strict mode disabled, invalid dates such as '2004-04-31' are converted to '0000-00-00' and a warning is generated. With strict mode enabled, invalid dates generate an error. To permit such dates, enable ALLOW_INVALID_DATES. See Section 5.1.10, “Server SQL Modes”, for more information.
• MySQL does not accept TIMESTAMP values that include a zero in the day or month column or values that are not a valid date. The sole exception to this rule is the special “zero” value '0000-00-00 00:00:00'.
• Dates containing two-digit year values are ambiguous because the century is unknown. MySQL interprets two-digit year values using these rules:
  • Year values in the range 00-69 are converted to 2000-2069.
  • Year values in the range 70-99 are converted to 1970-1999.
  See also Section 11.3.8, “Two-Digit Years in Dates”.
Note
The MySQL server can be run with the MAXDB SQL mode enabled. In this case, TIMESTAMP is identical with DATETIME. If this mode is enabled at the time that a table is created, TIMESTAMP columns are created as DATETIME columns. As a result, such columns use DATETIME display format, have the same range of values, and there is no automatic initialization or updating to the current date and time. See Section 5.1.10, “Server SQL Modes”.
Note
As of MySQL 5.7.22, MAXDB is deprecated. It will be removed in a future version of MySQL.
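The interpretation rules for delimited date strings described in this section can be modeled outside the server. The following Python sketch is only an illustration, not server code: the function names interpret_date and days_in_month and the allow_invalid_dates parameter are hypothetical, it assumes strict mode is disabled (so an invalid value becomes the “zero” date instead of an error), and it does not model dates with zero month or day parts.

```python
import re
import string

ZERO_DATE = "0000-00-00"

def days_in_month(year, month):
    """Number of days in the given month of the given (Gregorian) year."""
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31

def interpret_date(text, allow_invalid_dates=False):
    """Model how a delimited string is read in a date context (non-strict mode)."""
    # "Relaxed" format: any punctuation character may separate the parts.
    parts = re.split("[" + re.escape(string.punctuation) + "]", text.strip())
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        return ZERO_DATE
    year, month, day = (int(part) for part in parts)
    # Two-digit years: 00-69 -> 2000-2069, 70-99 -> 1970-1999.
    if len(parts[0]) == 2:
        year += 2000 if year <= 69 else 1900
    if not 1 <= month <= 12:
        return ZERO_DATE
    if allow_invalid_dates:
        # Assumed ALLOW_INVALID_DATES behavior: only the range 1 to 31 is checked.
        valid = 1 <= day <= 31
    else:
        # Default: the day must actually exist in that month.
        valid = 1 <= day <= days_in_month(year, month)
    return f"{year:04d}-{month:02d}-{day:02d}" if valid else ZERO_DATE
```

Under these assumptions, interpret_date('10:11:12') yields '2010-11-12', interpret_date('10:45:15') yields the “zero” date, and '2004-04-31' is kept only when allow_invalid_dates is true.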
11.3.2 The TIME Type
MySQL retrieves and displays TIME values in 'HH:MM:SS' format (or 'HHH:MM:SS' format for large hours values). TIME values may range from '-838:59:59' to '838:59:59'. The hours part may be so large because the TIME type can be used not only to represent a time of day (which must be less than 24 hours), but also elapsed time or a time interval between two events (which may be much greater than 24 hours, or even negative).
MySQL recognizes TIME values in several formats, some of which can include a trailing fractional seconds part in up to microseconds (6 digits) precision. See Section 9.1.3, “Date and Time Literals”. For information about fractional seconds support in MySQL, see Section 11.3.6, “Fractional Seconds in Time Values”. In particular, any fractional part in a value inserted into a TIME column is stored rather than discarded. With the fractional part included, the range for TIME values is '-838:59:59.000000' to '838:59:59.000000'.
Be careful about assigning abbreviated values to a TIME column. MySQL interprets abbreviated TIME values with colons as time of the day. That is, '11:12' means '11:12:00', not '00:11:12'. MySQL interprets abbreviated values without colons using the assumption that the two rightmost digits represent seconds (that is, as elapsed time rather than as time of day). For example, you might think of '1112' and 1112 as meaning '11:12:00' (12 minutes after 11 o'clock), but MySQL interprets them as '00:11:12' (11 minutes, 12 seconds). Similarly, '12' and 12 are interpreted as '00:00:12'.
The only delimiter recognized between a time part and a fractional seconds part is the decimal point.
By default, values that lie outside the TIME range but are otherwise valid are clipped to the closest endpoint of the range. For example, '-850:00:00' and '850:00:00' are converted to '-838:59:59' and '838:59:59'. Invalid TIME values are converted to '00:00:00'. Note that because '00:00:00' is itself a valid TIME value, there is no way to tell, from a value of '00:00:00' stored in a table, whether the original value was specified as '00:00:00' or whether it was invalid.
For more restrictive treatment of invalid TIME values, enable strict SQL mode to cause errors to occur. See Section 5.1.10, “Server SQL Modes”.
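The handling of abbreviated and out-of-range TIME values described above can be sketched as a small model. This Python code is an illustration under stated assumptions, not the server's parser: the name interpret_time is hypothetical, strict mode is assumed to be disabled, and fractional seconds and the 'D HH:MM:SS' day syntax are deliberately left out.

```python
MAX_SECONDS = 838 * 3600 + 59 * 60 + 59  # the endpoint '838:59:59'

def interpret_time(value):
    """Return 'HH:MM:SS' for a full or abbreviated TIME value (non-strict mode)."""
    text = str(value).strip()
    negative = text.startswith("-")
    if negative:
        text = text[1:]
    if ":" in text:
        # With colons, parts are read left to right: hours, minutes, seconds.
        fields = text.split(":")
        if not 2 <= len(fields) <= 3 or not all(f.isdigit() for f in fields):
            return "00:00:00"
        fields += ["0"] * (3 - len(fields))
        hours, minutes, seconds = (int(f) for f in fields)
    else:
        # Without colons, the two rightmost digits are seconds, the next two minutes.
        if not text.isdigit():
            return "00:00:00"
        number = int(text)
        hours, minutes, seconds = number // 10000, number // 100 % 100, number % 100
    if minutes > 59 or seconds > 59:
        return "00:00:00"  # invalid values become the "zero" value
    # Valid but out-of-range values are clipped to the closest endpoint.
    total = min(hours * 3600 + minutes * 60 + seconds, MAX_SECONDS)
    hours, rest = divmod(total, 3600)
    sign = "-" if negative and total else ""
    return f"{sign}{hours:02d}:{rest // 60:02d}:{rest % 60:02d}"
```

With this model, interpret_time('11:12') gives '11:12:00' while interpret_time('1112') gives '00:11:12', which is the asymmetry the text warns about.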
11.3.3 The YEAR Type
The YEAR type is a 1-byte type used to represent year values. It can be declared as YEAR or YEAR(4) and has a display width of four characters.
Note
The YEAR(2) data type is deprecated and support for it is removed in MySQL 5.7.5. To convert YEAR(2) columns to YEAR(4), see Section 11.3.4, “YEAR(2) Limitations and Migrating to YEAR(4)”.
MySQL displays YEAR values in YYYY format, with a range of 1901 to 2155, or 0000.
You can specify input YEAR values in a variety of formats:
• As a 4-digit number in the range 1901 to 2155.
• As a 4-digit string in the range '1901' to '2155'.
• As a 1- or 2-digit number in the range 1 to 99. MySQL converts values in the ranges 1 to 69 and 70 to 99 to YEAR values in the ranges 2001 to 2069 and 1970 to 1999.
• As a 1- or 2-digit string in the range '0' to '99'. MySQL converts values in the ranges '0' to '69' and '70' to '99' to YEAR values in the ranges 2000 to 2069 and 1970 to 1999.
• The result of inserting a numeric 0 has a display value of 0000 and an internal value of 0000. To insert zero and have it be interpreted as 2000, specify it as a string '0' or '00'.
• As the result of a function that returns a value that is acceptable in a YEAR context, such as NOW().
MySQL converts invalid YEAR values to 0000.
See also Section 11.3.8, “Two-Digit Years in Dates”.
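The input formats listed above treat numbers and strings differently, most visibly for zero. The following Python sketch models only the documented numeric and string forms; the name interpret_year is hypothetical, the return value 0 stands for the internal value 0000, and inputs not covered by the list (for example, 3-digit strings) are simply treated as invalid here, which is an assumption.

```python
def interpret_year(value):
    """Return the internal YEAR value for an input; 0 stands for 0000."""
    if isinstance(value, str):
        text = value.strip()
        if not text.isdigit():
            return 0  # invalid values are converted to 0000
        number = int(text)
        if len(text) <= 2:
            # Strings '0'..'69' -> 2000..2069, '70'..'99' -> 1970..1999.
            return 2000 + number if number <= 69 else 1900 + number
    else:
        number = value
        if number == 0:
            return 0  # numeric 0 is stored as 0000, not 2000
        if 1 <= number <= 69:
            return 2000 + number
        if 70 <= number <= 99:
            return 1900 + number
    # 4-digit forms must fall in the supported range.
    return number if 1901 <= number <= 2155 else 0
```

The pair interpret_year(0) and interpret_year('0') shows the difference: the first returns 0 (displayed as 0000), the second returns 2000.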
11.3.4 YEAR(2) Limitations and Migrating to YEAR(4)
This section describes problems that can occur when using YEAR(2) and provides information about converting existing YEAR(2) columns to YEAR(4).
Although the internal range of values for YEAR(4) and the deprecated YEAR(2) type is the same (1901 to 2155, and 0000), the display width for YEAR(2) makes that type inherently ambiguous because displayed values indicate only the last two digits of the internal values and omit the century digits. The result can be a loss of information under certain circumstances. For this reason, before MySQL 5.7.5, avoid using YEAR(2) in your applications and use YEAR(4) wherever you need a YEAR data type. As of MySQL 5.7.5, support for YEAR(2) is removed and existing YEAR(2) columns must be converted to YEAR(4) to become usable again.
YEAR(2) Limitations
Issues with the YEAR(2) data type include ambiguity of displayed values, and possible loss of information when values are dumped and reloaded or converted to strings.
• Displayed YEAR(2) values can be ambiguous. It is possible for up to three YEAR(2) values that have different internal values to have the same displayed value, as the following example demonstrates:
mysql> CREATE TABLE t (y2 YEAR(2), y4 YEAR(4));
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO t (y2) VALUES(1912),(2012),(2112);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> UPDATE t SET y4 = y2;
Query OK, 3 rows affected (0.00 sec)
Rows matched: 3 Changed: 3 Warnings: 0
mysql> SELECT * FROM t;
+------+------+
| y2   | y4   |
+------+------+
|   12 | 1912 |
|   12 | 2012 |
|   12 | 2112 |
+------+------+
3 rows in set (0.00 sec)
• If you use mysqldump to dump the table created in the preceding item, the dump file represents all y2 values using the same 2-digit representation (12). If you reload the table from the dump file, all resulting rows have internal value 2012 and display value 12, thus losing the distinctions among them. • Conversion of a YEAR(2) or YEAR(4) data value to string form uses the display width of the YEAR type. Suppose that YEAR(2) and YEAR(4) columns both contain the value 1970. Assigning each column to a string results in a value of '70' or '1970', respectively. That is, loss of information occurs for conversion from YEAR(2) to string. • Values outside the range from 1970 to 2069 are stored incorrectly when inserted into a YEAR(2) column in a CSV table. For example, inserting 2111 results in a display value of 11 but an internal value of 2011.
To avoid these problems, use YEAR(4) rather than YEAR(2). Suggestions regarding migration strategies appear later in this section.
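The loss of information on dump and reload described in this section follows from two steps: the display drops the century, and reading the value back applies the two-digit year rule. The following Python sketch is a model of those two steps only; the function names display_year2 and reload_year2 are hypothetical, and no actual mysqldump output is parsed.

```python
def display_year2(internal):
    """YEAR(2) shows only the last two digits of the internal value."""
    return f"{internal % 100:02d}"

def reload_year2(displayed):
    """Reading a two-digit value back applies the two-digit year rule."""
    number = int(displayed)
    return 2000 + number if number <= 69 else 1900 + number

# Three distinct internal values, as in the table t shown earlier.
internal_values = [1912, 2012, 2112]
dumped = [display_year2(year) for year in internal_values]
reloaded = [reload_year2(text) for text in dumped]
```

After these statements, every element of dumped is '12' and every element of reloaded is 2012, so the three original values can no longer be told apart.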
Reduced/Removed YEAR(2) Support in MySQL 5.7
Before MySQL 5.7.5, support for YEAR(2) is diminished. As of MySQL 5.7.5, support for YEAR(2) is removed.
• YEAR(2) column definitions for new tables produce warnings or errors:
  • Before MySQL 5.7.5, YEAR(2) column definitions for new tables are converted (with an ER_INVALID_YEAR_COLUMN_LENGTH warning) to YEAR(4):
mysql> CREATE TABLE t1 (y YEAR(2));
Query OK, 0 rows affected, 1 warning (0.04 sec)
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1818
Message: YEAR(2) column type is deprecated. Creating YEAR(4) column instead.
1 row in set (0.00 sec)
mysql> SHOW CREATE TABLE t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `y` year(4) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
  • As of MySQL 5.7.5, YEAR(2) column definitions for new tables produce an ER_INVALID_YEAR_COLUMN_LENGTH error:
mysql> CREATE TABLE t1 (y YEAR(2));
ERROR 1818 (HY000): Supports only YEAR or YEAR(4) column.
• YEAR(2) columns in existing tables remain as YEAR(2):
  • Before MySQL 5.7.5, YEAR(2) is processed in queries as in older versions of MySQL.
  • As of MySQL 5.7.5, YEAR(2) columns in queries produce warnings or errors.
• Several programs or statements convert YEAR(2) to YEAR(4) automatically:
  • ALTER TABLE statements that result in a table rebuild.
  • REPAIR TABLE (which CHECK TABLE recommends you use if it finds that a table contains YEAR(2) columns).
  • mysql_upgrade (which uses REPAIR TABLE).
  • Dumping with mysqldump and reloading the dump file. Unlike the conversions performed by the preceding three items, a dump and reload has the potential to change values.
A MySQL upgrade usually involves at least one of the last two items. However, with respect to YEAR(2), mysql_upgrade is preferable. You should avoid using mysqldump because, as noted, that can change values.
Migrating from YEAR(2) to YEAR(4)
To convert YEAR(2) columns to YEAR(4), you can do so manually at any time without upgrading. Alternatively, you can upgrade to a version of MySQL with reduced or removed support for YEAR(2)
(MySQL 5.6.6 or later), then have MySQL convert YEAR(2) columns automatically. In the latter case, avoid upgrading by dumping and reloading your data because that can change data values. In addition, if you use replication, there are upgrade considerations you must take into account.
To convert YEAR(2) columns to YEAR(4) manually, use ALTER TABLE or REPAIR TABLE. Suppose that a table t1 has this definition:
CREATE TABLE t1 (ycol YEAR(2) NOT NULL DEFAULT '70');
Modify the column using ALTER TABLE as follows:
ALTER TABLE t1 FORCE;
The ALTER TABLE statement converts the table without changing YEAR(2) values. If the server is a replication master, the ALTER TABLE statement replicates to slaves and makes the corresponding table change on each one. Another migration method is to perform a binary upgrade: Install MySQL without dumping and reloading your data. Then run mysql_upgrade, which uses REPAIR TABLE to convert YEAR(2) columns to YEAR(4) without changing data values. If the server is a replication master, the REPAIR TABLE statements replicate to slaves and make the corresponding table changes on each one, unless you invoke mysql_upgrade with the --skip-write-binlog option. Upgrades to replication servers usually involve upgrading slaves to a newer version of MySQL, then upgrading the master. For example, if a master and slave both run MySQL 5.5, a typical upgrade sequence involves upgrading the slave to 5.6, then upgrading the master to 5.6. With regard to the different treatment of YEAR(2) as of MySQL 5.6.6, that upgrade sequence results in a problem: Suppose that the slave has been upgraded but not yet the master. Then creating a table containing a YEAR(2) column on the master results in a table containing a YEAR(4) column on the slave. Consequently, these operations will have a different result on the master and slave, if you use statement-based replication: • Inserting numeric 0. The resulting value has an internal value of 2000 on the master but 0000 on the slave. • Converting YEAR(2) to string. This operation uses the display value of YEAR(2) on the master but YEAR(4) on the slave. To avoid such problems, modify all YEAR(2) columns on the master to YEAR(4) before upgrading. (Use ALTER TABLE, as described previously.) Then you can upgrade normally (slave first, then master) without introducing any YEAR(2) to YEAR(4) differences between the master and slave. One migration method should be avoided: Do not dump your data with mysqldump and reload the dump file after upgrading. 
This has the potential to change YEAR(2) values, as described previously. A migration from YEAR(2) to YEAR(4) should also involve examining application code for the possibility of changed behavior under conditions such as these: • Code that expects selecting a YEAR column to produce exactly two digits. • Code that does not account for different handling for inserts of numeric 0: Inserting 0 into YEAR(2) or YEAR(4) results in an internal value of 2000 or 0000, respectively.
11.3.5 Automatic Initialization and Updating for TIMESTAMP and DATETIME
TIMESTAMP and DATETIME columns can be automatically initialized and updated to the current date and time (that is, the current timestamp). For any TIMESTAMP or DATETIME column in a table, you can assign the current timestamp as the default value, the auto-update value, or both:
• An auto-initialized column is set to the current timestamp for inserted rows that specify no value for the column.
• An auto-updated column is automatically updated to the current timestamp when the value of any other column in the row is changed from its current value. An auto-updated column remains unchanged if all other columns are set to their current values. To prevent an auto-updated column from updating when other columns change, explicitly set it to its current value. To update an auto-updated column even when other columns do not change, explicitly set it to the value it should have (for example, set it to CURRENT_TIMESTAMP).
In addition, if the explicit_defaults_for_timestamp system variable is disabled, you can initialize or update any TIMESTAMP (but not DATETIME) column to the current date and time by assigning it a NULL value, unless it has been defined with the NULL attribute to permit NULL values.
To specify automatic properties, use the DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP clauses in column definitions. The order of the clauses does not matter. If both are present in a column definition, either can occur first. Any of the synonyms for CURRENT_TIMESTAMP have the same meaning as CURRENT_TIMESTAMP. These are CURRENT_TIMESTAMP(), NOW(), LOCALTIME, LOCALTIME(), LOCALTIMESTAMP, and LOCALTIMESTAMP().
Use of DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP is specific to TIMESTAMP and DATETIME. The DEFAULT clause also can be used to specify a constant (nonautomatic) default value; for example, DEFAULT 0 or DEFAULT '2000-01-01 00:00:00'.
Note
The following examples use DEFAULT 0, a default that can produce warnings or errors depending on whether strict SQL mode or the NO_ZERO_DATE SQL mode is enabled. Be aware that the TRADITIONAL SQL mode includes strict mode and NO_ZERO_DATE. See Section 5.1.10, “Server SQL Modes”.
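The auto-update rule stated above (update only when some other column actually changes, unless the column itself is assigned explicitly) can be expressed as a short model. This Python sketch is illustrative only: the function apply_update, its parameters, and the representation of a row as a dict are hypothetical, and the caller passes in the "current timestamp" as the now argument.

```python
def apply_update(row, changes, auto_column, now):
    """Return the row that results from assigning `changes` to `row`."""
    updated = dict(row)
    updated.update(changes)
    if auto_column in changes:
        # An explicit assignment to the auto-updated column is used as given.
        return updated
    # The column is touched only if another column changes from its current value.
    if any(row[name] != value for name, value in changes.items()):
        updated[auto_column] = now
    return updated
```

For example, with row = {'name': 'a', 'ts': 'old'}, assigning {'name': 'b'} sets ts to now, assigning {'name': 'a'} leaves ts unchanged, and assigning {'name': 'b', 'ts': 'old'} keeps ts at 'old' even though name changed.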
TIMESTAMP or DATETIME column definitions can specify the current timestamp for both the default and auto-update values, for one but not the other, or for neither. Different columns can have different combinations of automatic properties. The following rules describe the possibilities:
• With both DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP, the column has the current timestamp for its default value and is automatically updated to the current timestamp.
CREATE TABLE t1 (
  ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  dt DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
• With a DEFAULT clause but no ON UPDATE CURRENT_TIMESTAMP clause, the column has the given default value and is not automatically updated to the current timestamp. The default depends on whether the DEFAULT clause specifies CURRENT_TIMESTAMP or a constant value. With CURRENT_TIMESTAMP, the default is the current timestamp.
CREATE TABLE t1 (
  ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  dt DATETIME DEFAULT CURRENT_TIMESTAMP
);
With a constant, the default is the given value. In this case, the column has no automatic properties at all.
CREATE TABLE t1 (
  ts TIMESTAMP DEFAULT 0,
  dt DATETIME DEFAULT 0
);
• With an ON UPDATE CURRENT_TIMESTAMP clause and a constant DEFAULT clause, the column is automatically updated to the current timestamp and has the given constant default value.
CREATE TABLE t1 (
  ts TIMESTAMP DEFAULT 0 ON UPDATE CURRENT_TIMESTAMP,
  dt DATETIME DEFAULT 0 ON UPDATE CURRENT_TIMESTAMP
);
• With an ON UPDATE CURRENT_TIMESTAMP clause but no DEFAULT clause, the column is automatically updated to the current timestamp but does not have the current timestamp for its default value. The default in this case is type dependent. TIMESTAMP has a default of 0 unless defined with the NULL attribute, in which case the default is NULL.
CREATE TABLE t1 (
  ts1 TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,      -- default 0
  ts2 TIMESTAMP NULL ON UPDATE CURRENT_TIMESTAMP  -- default NULL
);
DATETIME has a default of NULL unless defined with the NOT NULL attribute, in which case the default is 0.
CREATE TABLE t1 (
  dt1 DATETIME ON UPDATE CURRENT_TIMESTAMP,         -- default NULL
  dt2 DATETIME NOT NULL ON UPDATE CURRENT_TIMESTAMP -- default 0
);
TIMESTAMP and DATETIME columns have no automatic properties unless they are specified explicitly, with this exception: If the explicit_defaults_for_timestamp system variable is disabled, the first TIMESTAMP column has both DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP if neither is specified explicitly. To suppress automatic properties for the first TIMESTAMP column, use one of these strategies:
• Enable the explicit_defaults_for_timestamp system variable. In this case, the DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP clauses that specify automatic initialization and updating are available, but are not assigned to any TIMESTAMP column unless explicitly included in the column definition.
• Alternatively, if explicit_defaults_for_timestamp is disabled, do either of the following:
  • Define the column with a DEFAULT clause that specifies a constant default value.
  • Specify the NULL attribute. This also causes the column to permit NULL values, which means that you cannot assign the current timestamp by setting the column to NULL. Assigning NULL sets the column to NULL, not the current timestamp. To assign the current timestamp, set the column to CURRENT_TIMESTAMP or a synonym such as NOW().
Consider these table definitions:
CREATE TABLE t1 (
  ts1 TIMESTAMP DEFAULT 0,
  ts2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP);
CREATE TABLE t2 (
  ts1 TIMESTAMP NULL,
  ts2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP);
CREATE TABLE t3 (
  ts1 TIMESTAMP NULL DEFAULT 0,
ts2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP);
The tables have these properties:
• In each table definition, the first TIMESTAMP column has no automatic initialization or updating.
• The tables differ in how the ts1 column handles NULL values. For t1, ts1 is NOT NULL and assigning it a value of NULL sets it to the current timestamp. For t2 and t3, ts1 permits NULL and assigning it a value of NULL sets it to NULL.
• t2 and t3 differ in the default value for ts1. For t2, ts1 is defined to permit NULL, so the default is also NULL in the absence of an explicit DEFAULT clause. For t3, ts1 permits NULL but has an explicit default of 0.
If a TIMESTAMP or DATETIME column definition includes an explicit fractional seconds precision value anywhere, the same value must be used throughout the column definition. This is permitted:
CREATE TABLE t1 (
  ts TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6)
);
This is not permitted:
CREATE TABLE t1 (
  ts TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP(3)
);
TIMESTAMP Initialization and the NULL Attribute
If the explicit_defaults_for_timestamp system variable is disabled, TIMESTAMP columns by default are NOT NULL, cannot contain NULL values, and assigning NULL assigns the current timestamp. To permit a TIMESTAMP column to contain NULL, explicitly declare it with the NULL attribute. In this case, the default value also becomes NULL unless overridden with a DEFAULT clause that specifies a different default value. DEFAULT NULL can be used to explicitly specify NULL as the default value. (For a TIMESTAMP column not declared with the NULL attribute, DEFAULT NULL is invalid.) If a TIMESTAMP column permits NULL values, assigning NULL sets it to NULL, not to the current timestamp.
The following table contains several TIMESTAMP columns that permit NULL values:
CREATE TABLE t (
  ts1 TIMESTAMP NULL DEFAULT NULL,
  ts2 TIMESTAMP NULL DEFAULT 0,
  ts3 TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP
);
A TIMESTAMP column that permits NULL values does not take on the current timestamp at insert time except under one of the following conditions:
• Its default value is defined as CURRENT_TIMESTAMP and no value is specified for the column.
• CURRENT_TIMESTAMP or any of its synonyms such as NOW() is explicitly inserted into the column.
In other words, a TIMESTAMP column defined to permit NULL values auto-initializes only if its definition includes DEFAULT CURRENT_TIMESTAMP:
CREATE TABLE t (ts TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP);
If the TIMESTAMP column permits NULL values but its definition does not include DEFAULT CURRENT_TIMESTAMP, you must explicitly insert a value corresponding to the current date and time. Suppose that tables t1 and t2 have these definitions:
CREATE TABLE t1 (ts TIMESTAMP NULL DEFAULT '0000-00-00 00:00:00');
CREATE TABLE t2 (ts TIMESTAMP NULL DEFAULT NULL);
To set the TIMESTAMP column in either table to the current timestamp at insert time, explicitly assign it that value. For example:
INSERT INTO t2 VALUES (CURRENT_TIMESTAMP);
INSERT INTO t1 VALUES (NOW());
If the explicit_defaults_for_timestamp system variable is enabled, TIMESTAMP columns permit NULL values only if declared with the NULL attribute. Also, TIMESTAMP columns do not permit assigning NULL to assign the current timestamp, whether declared with the NULL or NOT NULL attribute. To assign the current timestamp, set the column to CURRENT_TIMESTAMP or a synonym such as NOW().
11.3.6 Fractional Seconds in Time Values
MySQL 5.7 has fractional seconds support for TIME, DATETIME, and TIMESTAMP values, with up to microseconds (6 digits) precision:
• To define a column that includes a fractional seconds part, use the syntax type_name(fsp), where type_name is TIME, DATETIME, or TIMESTAMP, and fsp is the fractional seconds precision. For example:
CREATE TABLE t1 (t TIME(3), dt DATETIME(6));
The fsp value, if given, must be in the range 0 to 6. A value of 0 signifies that there is no fractional part. If omitted, the default precision is 0. (This differs from the standard SQL default of 6, for compatibility with previous MySQL versions.)
• Inserting a TIME, DATETIME, or TIMESTAMP value with a fractional seconds part into a column of the same type but having fewer fractional digits results in rounding. Consider a table created and populated as follows:
CREATE TABLE fractest( c1 TIME(2), c2 DATETIME(2), c3 TIMESTAMP(2) );
INSERT INTO fractest VALUES
('17:51:04.777', '2018-09-08 17:51:04.777', '2018-09-08 17:51:04.777');
The temporal values are inserted into the table with rounding:
mysql> SELECT * FROM fractest;
+-------------+------------------------+------------------------+
| c1          | c2                     | c3                     |
+-------------+------------------------+------------------------+
| 17:51:04.78 | 2018-09-08 17:51:04.78 | 2018-09-08 17:51:04.78 |
+-------------+------------------------+------------------------+
No warning or error is given when such rounding occurs. This behavior follows the SQL standard, and is not affected by the server sql_mode setting.
• Functions that take temporal arguments accept values with fractional seconds. Return values from temporal functions include fractional seconds as appropriate. For example, NOW() with no argument returns the current date and time with no fractional part, but takes an optional argument from 0 to 6 to specify that the return value includes a fractional seconds part of that many digits.
• Syntax for temporal literals produces temporal values: DATE 'str', TIME 'str', and TIMESTAMP 'str', and the ODBC-syntax equivalents. The resulting value includes a trailing fractional seconds part if specified. Previously, the temporal type keyword was ignored and these constructs produced the string value. See Standard SQL and ODBC Date and Time Literals.
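The rounding applied when a value has more fractional digits than the column allows can be reproduced with integer arithmetic. The following Python sketch is a model, not the server implementation: the name round_time is hypothetical, only non-negative 'HH:MM:SS[.fraction]' strings are handled, and ties are assumed to round up, consistent with the '23:59:59.500' example given for DATETIME-to-DATE conversion in Section 11.3.7.

```python
def round_time(text, fsp):
    """Round an 'HH:MM:SS[.fraction]' string to fsp fractional digits (0 to 6)."""
    if not 0 <= fsp <= 6:
        raise ValueError("fsp must be in the range 0 to 6")
    clock, _, fraction = text.partition(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    # Normalize the fraction to microseconds (6 digits).
    micros = int(fraction[:6].ljust(6, "0")) if fraction else 0
    total = (hours * 3600 + minutes * 60 + seconds) * 1_000_000 + micros
    unit = 10 ** (6 - fsp)                      # one step at the target precision
    total = (total + unit // 2) // unit * unit  # round half up
    whole, micros = divmod(total, 1_000_000)
    result = f"{whole // 3600:02d}:{whole // 60 % 60:02d}:{whole % 60:02d}"
    if fsp:
        result += "." + f"{micros:06d}"[:fsp]
    return result
```

Here round_time('17:51:04.777', 2) returns '17:51:04.78', matching the c1 column of the fractest example. Working in whole microseconds means a carry such as 59.999 seconds rounding up propagates into the minutes and hours without special cases.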
11.3.7 Conversion Between Date and Time Types
To some extent, you can convert a value from one temporal type to another. However, there may be some alteration of the value or loss of information. In all cases, conversion between temporal types is subject to the range of valid values for the resulting type. For example, although DATE, DATETIME, and TIMESTAMP values all can be specified using the same set of formats, the types do not all have the same range of values. TIMESTAMP values cannot be earlier than 1970 UTC or later than '2038-01-19 03:14:07' UTC. This means that a date such as '1968-01-01', while valid as a DATE or DATETIME value, is not valid as a TIMESTAMP value and is converted to 0.
Conversion of DATE values:
• Conversion to a DATETIME or TIMESTAMP value adds a time part of '00:00:00' because the DATE value contains no time information.
• Conversion to a TIME value is not useful; the result is '00:00:00'.
Conversion of DATETIME and TIMESTAMP values:
• Conversion to a DATE value takes fractional seconds into account and rounds the time part. For example, '1999-12-31 23:59:59.499' becomes '1999-12-31', whereas '1999-12-31 23:59:59.500' becomes '2000-01-01'.
• Conversion to a TIME value discards the date part because the TIME type contains no date information.
For conversion of TIME values to other temporal types, the value of CURRENT_DATE() is used for the date part. The TIME is interpreted as elapsed time (not time of day) and added to the date. This means that the date part of the result differs from the current date if the time value is outside the range from '00:00:00' to '23:59:59'.
Suppose that the current date is '2012-01-01'. TIME values of '12:00:00', '24:00:00', and '-12:00:00', when converted to DATETIME or TIMESTAMP values, result in '2012-01-01 12:00:00', '2012-01-02 00:00:00', and '2011-12-31 12:00:00', respectively.
Conversion of TIME to DATE is similar but discards the time part from the result: '2012-01-01', '2012-01-02', and '2011-12-31', respectively.
Explicit conversion can be used to override implicit conversion. For example, in comparison of DATE and DATETIME values, the DATE value is coerced to the DATETIME type by adding a time part of '00:00:00'. To perform the comparison by ignoring the time part of the DATETIME value instead, use the CAST() function in the following way:
date_col = CAST(datetime_col AS DATE)
Conversion of TIME and DATETIME values to numeric form (for example, by adding +0) depends on whether the value contains a fractional seconds part. TIME(N) or DATETIME(N) is converted to integer when N is 0 (or omitted) and to a DECIMAL value with N decimal digits when N is greater than 0:
mysql> SELECT CURTIME(), CURTIME()+0, CURTIME(3)+0;
+-----------+-------------+--------------+
| CURTIME() | CURTIME()+0 | CURTIME(3)+0 |
+-----------+-------------+--------------+
| 09:28:00  |       92800 |    92800.887 |
+-----------+-------------+--------------+
mysql> SELECT NOW(), NOW()+0, NOW(3)+0;
+---------------------+----------------+--------------------+
| NOW()               | NOW()+0        | NOW(3)+0           |
+---------------------+----------------+--------------------+
| 2012-08-15 09:28:00 | 20120815092800 | 20120815092800.889 |
1566
Two-Digit Years in Dates
+---------------------+----------------+--------------------+
11.3.8 Two-Digit Years in Dates

Date values with two-digit years are ambiguous because the century is unknown. Such values must be interpreted into four-digit form because MySQL stores years internally using four digits.

For DATETIME, DATE, and TIMESTAMP types, MySQL interprets dates specified with ambiguous year values using these rules:

• Year values in the range 00-69 are converted to 2000-2069.
• Year values in the range 70-99 are converted to 1970-1999.

For YEAR, the rules are the same, with this exception: A numeric 00 inserted into YEAR(4) results in 0000 rather than 2000. To specify zero for YEAR(4) and have it be interpreted as 2000, specify it as a string '0' or '00'.

Remember that these rules are only heuristics that provide reasonable guesses as to what your data values mean. If the rules used by MySQL do not produce the values you require, you must provide unambiguous input containing four-digit year values.

ORDER BY properly sorts YEAR values that have two-digit years.

Some functions like MIN() and MAX() convert a YEAR to a number. This means that a value with a two-digit year does not work properly with these functions. The fix in this case is to convert the YEAR to four-digit year format.
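As a quick check of the 00-69 and 70-99 rule, the following sketch casts two string literals that sit on either side of the boundary. The literals and the column aliases y69 and y70 are chosen here for illustration only.

```sql
-- '69' falls in the 00-69 range and '70' in the 70-99 range.
SELECT CAST('69-12-31' AS DATE) AS y69,   -- 2069-12-31
       CAST('70-01-01' AS DATE) AS y70;   -- 1970-01-01
```

Writing the values with four-digit years ('2069-12-31', '1970-01-01') avoids relying on the heuristic at all.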
11.4 String Types The string types are CHAR, VARCHAR, BINARY, VARBINARY, BLOB, TEXT, ENUM, and SET. This section describes how these types work and how to use them in your queries. For string type storage requirements, see Section 11.8, “Data Type Storage Requirements”.
11.4.1 The CHAR and VARCHAR Types The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. They also differ in maximum length and in whether trailing spaces are retained. The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters. The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled. Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. See Section C.10.4, “Limits on Table Column Count and Row Size”. In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes. If strict SQL mode is not enabled and you assign a value to a CHAR or VARCHAR column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For truncation of nonspace characters, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode. See Section 5.1.10, “Server SQL Modes”.
For VARCHAR columns, trailing spaces in excess of the column length are truncated prior to insertion and a warning is generated, regardless of the SQL mode in use. For CHAR columns, truncation of excess trailing spaces from inserted values is performed silently regardless of the SQL mode.

VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL.

The following table illustrates the differences between CHAR and VARCHAR by showing the result of storing various string values into CHAR(4) and VARCHAR(4) columns (assuming that the column uses a single-byte character set such as latin1).

Value       CHAR(4)  Storage Required  VARCHAR(4)  Storage Required
''          '    '   4 bytes           ''          1 byte
'ab'        'ab  '   4 bytes           'ab'        3 bytes
'abcd'      'abcd'   4 bytes           'abcd'      5 bytes
'abcdefgh'  'abcd'   4 bytes           'abcd'      5 bytes
The values shown as stored in the last row of the table apply only when not using strict mode; if MySQL is running in strict mode, values that exceed the column length are not stored, and an error results.

InnoDB encodes fixed-length fields greater than or equal to 768 bytes in length as variable-length fields, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4.

If a given value is stored into the CHAR(4) and VARCHAR(4) columns, the values retrieved from the columns are not always the same because trailing spaces are removed from CHAR columns upon retrieval. The following example illustrates this difference:

mysql> CREATE TABLE vc (v VARCHAR(4), c CHAR(4));
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO vc VALUES ('ab  ', 'ab  ');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT CONCAT('(', v, ')'), CONCAT('(', c, ')') FROM vc;
+---------------------+---------------------+
| CONCAT('(', v, ')') | CONCAT('(', c, ')') |
+---------------------+---------------------+
| (ab  )              | (ab)                |
+---------------------+---------------------+
1 row in set (0.06 sec)
Values in CHAR and VARCHAR columns are sorted and compared according to the character set collation assigned to the column.

All MySQL collations are of type PAD SPACE. This means that all CHAR, VARCHAR, and TEXT values are compared without regard to any trailing spaces. “Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant. For example:

mysql> CREATE TABLE names (myname CHAR(10));
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO names VALUES ('Monty');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT myname = 'Monty', myname = 'Monty  ' FROM names;
+------------------+--------------------+
| myname = 'Monty' | myname = 'Monty  ' |
+------------------+--------------------+
|                1 |                  1 |
+------------------+--------------------+
1 row in set (0.00 sec)

mysql> SELECT myname LIKE 'Monty', myname LIKE 'Monty  ' FROM names;
+---------------------+-----------------------+
| myname LIKE 'Monty' | myname LIKE 'Monty  ' |
+---------------------+-----------------------+
|                   1 |                     0 |
+---------------------+-----------------------+
1 row in set (0.00 sec)
This is true for all MySQL versions, and is not affected by the server SQL mode. Note For more information about MySQL character sets and collations, see Chapter 10, Character Sets, Collations, Unicode. For additional information about storage requirements, see Section 11.8, “Data Type Storage Requirements”. For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters will result in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error.
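The duplicate-key behavior described above can be reproduced with a sketch like the following. The table name pad_demo is hypothetical, and the example assumes a column that uses one of the PAD SPACE collations discussed in this section.

```sql
-- Hypothetical table with a unique index on a VARCHAR column.
CREATE TABLE pad_demo (c VARCHAR(10), UNIQUE KEY (c));

INSERT INTO pad_demo VALUES ('a');

-- 'a ' differs from 'a' only by a trailing space. The comparison ignores
-- trailing spaces, so this statement fails with a duplicate-key error.
INSERT INTO pad_demo VALUES ('a ');
```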
11.4.2 The BINARY and VARBINARY Types

The BINARY and VARBINARY types are similar to CHAR and VARCHAR, except that they contain binary strings rather than nonbinary strings. That is, they contain byte strings rather than character strings. This means they have the binary character set and collation, and comparison and sorting are based on the numeric values of the bytes in the values.

The permissible maximum length is the same for BINARY and VARBINARY as it is for CHAR and VARCHAR, except that the length for BINARY and VARBINARY is a length in bytes rather than in characters.

The BINARY and VARBINARY data types are distinct from the CHAR BINARY and VARCHAR BINARY data types. For the latter types, the BINARY attribute does not cause the column to be treated as a binary string column. Instead, it causes the binary (_bin) collation for the column character set to be used, and the column itself contains nonbinary character strings rather than binary byte strings. For example, CHAR(5) BINARY is treated as CHAR(5) CHARACTER SET latin1 COLLATE latin1_bin, assuming that the default character set is latin1. This differs from BINARY(5), which stores 5-byte binary strings that have the binary character set and collation. For information about differences between binary strings and binary collations for nonbinary strings, see Section 10.8.5, “The binary Collation Compared to _bin Collations”.

If strict SQL mode is not enabled and you assign a value to a BINARY or VARBINARY column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For cases of truncation, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode. See Section 5.1.10, “Server SQL Modes”.

When BINARY values are stored, they are right-padded with the pad value to the specified length. The pad value is 0x00 (the zero byte).
Values are right-padded with 0x00 on insert, and no trailing bytes are removed on select. All bytes are significant in comparisons, including ORDER BY and DISTINCT operations. 0x00 bytes and spaces are different in comparisons, with 0x00 < space.

Example: For a BINARY(3) column, 'a ' becomes 'a \0' when inserted. 'a\0' becomes 'a\0\0' when inserted. Both inserted values remain unchanged when selected.

For VARBINARY, there is no padding on insert and no bytes are stripped on select. All bytes are significant in comparisons, including ORDER BY and DISTINCT operations. 0x00 bytes and spaces are different in comparisons, with 0x00 < space.
For those cases where trailing pad bytes are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad bytes will result in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a\0' causes a duplicate-key error.

You should consider the preceding padding and stripping characteristics carefully if you plan to use the BINARY data type for storing binary data and you require that the value retrieved be exactly the same as the value stored. The following example illustrates how 0x00-padding of BINARY values affects column value comparisons:

mysql> CREATE TABLE t (c BINARY(3));
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO t SET c = 'a';
Query OK, 1 row affected (0.01 sec)

mysql> SELECT HEX(c), c = 'a', c = 'a\0\0' from t;
+--------+---------+-------------+
| HEX(c) | c = 'a' | c = 'a\0\0' |
+--------+---------+-------------+
| 610000 |       0 |           1 |
+--------+---------+-------------+
1 row in set (0.09 sec)
If the value retrieved must be the same as the value specified for storage with no padding, it might be preferable to use VARBINARY or one of the BLOB data types instead.
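For contrast with the BINARY(3) example, the same statements can be run against a VARBINARY(3) column. This is a sketch only; the table name vb is hypothetical.

```sql
-- Hypothetical table: same shape as the BINARY(3) example, but VARBINARY.
CREATE TABLE vb (c VARBINARY(3));
INSERT INTO vb SET c = 'a';

-- No 0x00 padding is added on insert, so the stored value is the single
-- byte 0x61: HEX(c) is 61, c = 'a' is 1, and c = 'a\0\0' is 0.
SELECT HEX(c), c = 'a', c = 'a\0\0' FROM vb;
```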
11.4.3 The BLOB and TEXT Types

A BLOB is a binary large object that can hold a variable amount of data. The four BLOB types are TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. These differ only in the maximum length of the values they can hold. The four TEXT types are TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT. These correspond to the four BLOB types and have the same maximum lengths and storage requirements. See Section 11.8, “Data Type Storage Requirements”.

BLOB values are treated as binary strings (byte strings). They have the binary character set and collation, and comparison and sorting are based on the numeric values of the bytes in column values. TEXT values are treated as nonbinary strings (character strings). They have a character set other than binary, and values are sorted and compared based on the collation of the character set.

If strict SQL mode is not enabled and you assign a value to a BLOB or TEXT column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For truncation of nonspace characters, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode. See Section 5.1.10, “Server SQL Modes”.

Truncation of excess trailing spaces from values to be inserted into TEXT columns always generates a warning, regardless of the SQL mode.

For TEXT and BLOB columns, there is no padding on insert and no bytes are stripped on select.

If a TEXT column is indexed, index entry comparisons are space-padded at the end. This means that, if the index requires unique values, duplicate-key errors will occur for values that differ only in the number of trailing spaces. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error. This is not true for BLOB columns.

In most respects, you can regard a BLOB column as a VARBINARY column that can be as large as you like. Similarly, you can regard a TEXT column as a VARCHAR column.
BLOB and TEXT differ from VARBINARY and VARCHAR in the following ways:

• For indexes on BLOB and TEXT columns, you must specify an index prefix length. For CHAR and VARCHAR, a prefix length is optional. See Section 8.3.4, “Column Indexes”.
• BLOB and TEXT columns cannot have DEFAULT values.
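The index prefix requirement in the list above might look like this in practice. The table, column, and index names are hypothetical, and the prefix length of 100 is an arbitrary illustrative choice rather than a recommendation.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE articles (
    title VARCHAR(200),
    body  TEXT,
    INDEX idx_title (title),       -- prefix length is optional for VARCHAR
    INDEX idx_body  (body(100))    -- prefix length is required for TEXT
);
```

Omitting the prefix length from idx_body causes the CREATE TABLE statement to fail.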
If you use the BINARY attribute with a TEXT data type, the column is assigned the binary (_bin) collation of the column character set.

LONG and LONG VARCHAR map to the MEDIUMTEXT data type. This is a compatibility feature.

MySQL Connector/ODBC defines BLOB values as LONGVARBINARY and TEXT values as LONGVARCHAR.

Because BLOB and TEXT values can be extremely long, you might encounter some constraints in using them:

• Only the first max_sort_length bytes of the column are used when sorting. The default value of max_sort_length is 1024. You can make more bytes significant in sorting or grouping by increasing the value of max_sort_length at server startup or runtime. Any client can change the value of its session max_sort_length variable:

  mysql> SET max_sort_length = 2000;
  mysql> SELECT id, comment FROM t
      -> ORDER BY comment;
• Instances of BLOB or TEXT columns in the result of a query that is processed using a temporary table cause the server to use a table on disk rather than in memory because the MEMORY storage engine does not support those data types (see Section 8.4.4, “Internal Temporary Table Use in MySQL”). Use of disk incurs a performance penalty, so include BLOB or TEXT columns in the query result only if they are really needed. For example, avoid using SELECT *, which selects all columns.
• The maximum size of a BLOB or TEXT object is determined by its type, but the largest value you actually can transmit between the client and server is determined by the amount of available memory and the size of the communications buffers. You can change the message buffer size by changing the value of the max_allowed_packet variable, but you must do so for both the server and your client program. For example, both mysql and mysqldump enable you to change the client-side max_allowed_packet value. See Section 5.1.1, “Configuring the Server”, Section 4.5.1, “mysql — The MySQL Command-Line Client”, and Section 4.5.4, “mysqldump — A Database Backup Program”. You may also want to compare the packet sizes and the size of the data objects you are storing with the storage requirements; see Section 11.8, “Data Type Storage Requirements”.

Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened.

In some cases, it may be desirable to store binary data such as media files in BLOB or TEXT columns. You may find MySQL's string handling functions useful for working with such data. See Section 12.5, “String Functions”. For security and other reasons, it is usually preferable to do so using application code rather than giving application users the FILE privilege.
You can discuss specifics for various languages and platforms in the MySQL Forums (http://forums.mysql.com/).
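The max_allowed_packet adjustment mentioned in this section could be sketched as follows. The 64MB figure is an arbitrary illustrative value, not a recommendation, and changing the global value requires a suitably privileged account.

```sql
-- Inspect the current value.
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Server side: raise the limit to 64MB (67108864 bytes).
-- The new global value applies to connections made after the change.
SET GLOBAL max_allowed_packet = 67108864;

-- Client side: the client program needs a matching setting, for example
-- by starting the command-line client as:
--   mysql --max_allowed_packet=64M
```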
11.4.4 The ENUM Type An ENUM is a string object with a value chosen from a list of permitted values that are enumerated explicitly in the column specification at table creation time. It has these advantages: • Compact data storage in situations where a column has a limited set of possible values. The strings you specify as input values are automatically encoded as numbers. See Section 11.8, “Data Type Storage Requirements” for the storage requirements for ENUM types. • Readable queries and output. The numbers are translated back to the corresponding strings in query results. and these potential issues to consider:
• If you make enumeration values that look like numbers, it is easy to mix up the literal values with their internal index numbers, as explained in Enumeration Limitations.
• Using ENUM columns in ORDER BY clauses requires extra care, as explained in Enumeration Sorting.

The ENUM type is described under these headings:

• Creating and Using ENUM Columns
• Index Values for Enumeration Literals
• Handling of Enumeration Literals
• Empty or NULL Enumeration Values
• Enumeration Sorting
• Enumeration Limitations
Creating and Using ENUM Columns

An enumeration value must be a quoted string literal. For example, you can create a table with an ENUM column like this:

CREATE TABLE shirts (
    name VARCHAR(40),
    size ENUM('x-small', 'small', 'medium', 'large', 'x-large')
);
INSERT INTO shirts (name, size) VALUES ('dress shirt','large'), ('t-shirt','medium'),
  ('polo shirt','small');
SELECT name, size FROM shirts WHERE size = 'medium';
+---------+--------+
| name    | size   |
+---------+--------+
| t-shirt | medium |
+---------+--------+
UPDATE shirts SET size = 'small' WHERE size = 'large';
COMMIT;
Inserting 1 million rows into this table with a value of 'medium' would require 1 million bytes of storage, as opposed to 6 million bytes if you stored the actual string 'medium' in a VARCHAR column.
Index Values for Enumeration Literals

Each enumeration value has an index:

• The elements listed in the column specification are assigned index numbers, beginning with 1.
• The index value of the empty string error value is 0. This means that you can use the following SELECT statement to find rows into which invalid ENUM values were assigned:

  mysql> SELECT * FROM tbl_name WHERE enum_col=0;

• The index of the NULL value is NULL.
• The term “index” here refers to a position within the list of enumeration values. It has nothing to do with table indexes.

For example, a column specified as ENUM('Mercury', 'Venus', 'Earth') can have any of the values shown here. The index of each value is also shown.
Value      Index
NULL       NULL
''         0
'Mercury'  1
'Venus'    2
'Earth'    3
An ENUM column can have a maximum of 65,535 distinct elements. (The practical limit is less than 3000.) A table can have no more than 255 unique element list definitions among its ENUM and SET columns considered as a group. For more information on these limits, see Section C.10.5, “Limits Imposed by .frm File Structure”. If you retrieve an ENUM value in a numeric context, the column value's index is returned. For example, you can retrieve numeric values from an ENUM column like this: mysql> SELECT enum_col+0 FROM tbl_name;
Functions such as SUM() or AVG() that expect a numeric argument cast the argument to a number if necessary. For ENUM values, the index number is used in the calculation.
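A small sketch of index retrieval, using the ENUM('Mercury', 'Venus', 'Earth') definition from the table above. The table name planets and the alias idx are hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE planets (name ENUM('Mercury', 'Venus', 'Earth'));
INSERT INTO planets VALUES ('Earth'), ('Mercury'), (NULL);

-- Numeric context returns the index: NULL -> NULL, 'Mercury' -> 1, 'Earth' -> 3.
SELECT name, name+0 AS idx FROM planets ORDER BY name;

-- Aggregate functions use the index numbers, so the sum here is 1 + 3 = 4.
SELECT SUM(name) FROM planets;
```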
Handling of Enumeration Literals Trailing spaces are automatically deleted from ENUM member values in the table definition when a table is created. When retrieved, values stored into an ENUM column are displayed using the lettercase that was used in the column definition. Note that ENUM columns can be assigned a character set and collation. For binary or case-sensitive collations, lettercase is taken into account when assigning values to the column. If you store a number into an ENUM column, the number is treated as the index into the possible values, and the value stored is the enumeration member with that index. (However, this does not work with LOAD DATA, which treats all input as strings.) If the numeric value is quoted, it is still interpreted as an index if there is no matching string in the list of enumeration values. For these reasons, it is not advisable to define an ENUM column with enumeration values that look like numbers, because this can easily become confusing. For example, the following column has enumeration members with string values of '0', '1', and '2', but numeric index values of 1, 2, and 3: numbers ENUM('0','1','2')
If you store 2, it is interpreted as an index value, and becomes '1' (the value with index 2). If you store '2', it matches an enumeration value, so it is stored as '2'. If you store '3', it does not match any enumeration value, so it is treated as an index and becomes '2' (the value with index 3).

mysql> INSERT INTO t (numbers) VALUES(2),('2'),('3');
mysql> SELECT * FROM t;
+---------+
| numbers |
+---------+
| 1       |
| 2       |
| 2       |
+---------+
To determine all possible values for an ENUM column, use SHOW COLUMNS FROM tbl_name LIKE 'enum_col' and parse the ENUM definition in the Type column of the output. In the C API, ENUM values are returned as strings. For information about using result set metadata to distinguish them from other strings, see Section 27.8.5, “C API Data Structures”.
Empty or NULL Enumeration Values An enumeration value can also be the empty string ('') or NULL under certain circumstances: • If you insert an invalid value into an ENUM (that is, a string not present in the list of permitted values), the empty string is inserted instead as a special error value. This string can be distinguished from a “normal” empty string by the fact that this string has the numeric value 0. See Index Values for Enumeration Literals for details about the numeric indexes for the enumeration values. If strict SQL mode is enabled, attempts to insert invalid ENUM values result in an error. • If an ENUM column is declared to permit NULL, the NULL value is a valid value for the column, and the default value is NULL. If an ENUM column is declared NOT NULL, its default value is the first element of the list of permitted values.
Enumeration Sorting ENUM values are sorted based on their index numbers, which depend on the order in which the enumeration members were listed in the column specification. For example, 'b' sorts before 'a' for ENUM('b', 'a'). The empty string sorts before nonempty strings, and NULL values sort before all other enumeration values. To prevent unexpected results when using the ORDER BY clause on an ENUM column, use one of these techniques: • Specify the ENUM list in alphabetic order. • Make sure that the column is sorted lexically rather than by index number by coding ORDER BY CAST(col AS CHAR) or ORDER BY CONCAT(col).
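The two sort orders can be compared with a sketch like the following, based on the ENUM('b', 'a') definition mentioned above. The table name enum_sort is hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE enum_sort (col ENUM('b', 'a'));
INSERT INTO enum_sort VALUES ('a'), ('b');

-- Sorted by index number: 'b' (index 1) comes before 'a' (index 2).
SELECT col FROM enum_sort ORDER BY col;

-- Sorted lexically: 'a' comes before 'b'.
SELECT col FROM enum_sort ORDER BY CAST(col AS CHAR);
```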
Enumeration Limitations

An enumeration value cannot be an expression, even one that evaluates to a string value. For example, this CREATE TABLE statement does not work because the CONCAT function cannot be used to construct an enumeration value:

CREATE TABLE sizes (
    size ENUM('small', CONCAT('med','ium'), 'large')
);

You also cannot employ a user variable as an enumeration value. This pair of statements does not work:

SET @mysize = 'medium';
CREATE TABLE sizes (
    size ENUM('small', @mysize, 'large')
);
We strongly recommend that you do not use numbers as enumeration values, because it does not save on storage over the appropriate TINYINT or SMALLINT type, and it is easy to mix up the strings and the underlying number values (which might not be the same) if you quote the ENUM values incorrectly. If you do use a number as an enumeration value, always enclose it in quotation marks. If the quotation marks are omitted, the number is regarded as an index. See Handling of Enumeration Literals to see how even a quoted number could be mistakenly used as a numeric index value. Duplicate values in the definition cause a warning, or an error if strict SQL mode is enabled.
11.4.5 The SET Type

A SET is a string object that can have zero or more values, each of which must be chosen from a list of permitted values specified when the table is created. SET column values that consist of multiple set members are specified with members separated by commas (,). A consequence of this is that SET member values should not themselves contain commas.

For example, a column specified as SET('one', 'two') NOT NULL can have any of these values:

''
'one'
'two'
'one,two'
A SET column can have a maximum of 64 distinct members. A table can have no more than 255 unique element list definitions among its ENUM and SET columns considered as a group. For more information on this limit, see Section C.10.5, “Limits Imposed by .frm File Structure”. Duplicate values in the definition cause a warning, or an error if strict SQL mode is enabled. Trailing spaces are automatically deleted from SET member values in the table definition when a table is created. When retrieved, values stored in a SET column are displayed using the lettercase that was used in the column definition. Note that SET columns can be assigned a character set and collation. For binary or case-sensitive collations, lettercase is taken into account when assigning values to the column. MySQL stores SET values numerically, with the low-order bit of the stored value corresponding to the first set member. If you retrieve a SET value in a numeric context, the value retrieved has bits set corresponding to the set members that make up the column value. For example, you can retrieve numeric values from a SET column like this: mysql> SELECT set_col+0 FROM tbl_name;
If a number is stored into a SET column, the bits that are set in the binary representation of the number determine the set members in the column value. For a column specified as SET('a','b','c','d'), the members have the following decimal and binary values.

SET Member  Decimal Value  Binary Value
'a'         1              0001
'b'         2              0010
'c'         4              0100
'd'         8              1000
If you assign a value of 9 to this column, that is 1001 in binary, so the first and fourth SET value members 'a' and 'd' are selected and the resulting value is 'a,d'. For a value containing more than one SET element, it does not matter what order the elements are listed in when you insert the value. It also does not matter how many times a given element is listed in the value. When the value is retrieved later, each element in the value appears once, with elements listed according to the order in which they were specified at table creation time. For example, suppose that a column is specified as SET('a','b','c','d'): mysql> CREATE TABLE myset (col SET('a', 'b', 'c', 'd'));
If you insert the values 'a,d', 'd,a', 'a,d,d', 'a,d,a', and 'd,a,d':

mysql> INSERT INTO myset (col) VALUES
    -> ('a,d'), ('d,a'), ('a,d,a'), ('a,d,d'), ('d,a,d');
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

Then all these values appear as 'a,d' when retrieved:

mysql> SELECT col FROM myset;
+------+
| col  |
+------+
| a,d  |
| a,d  |
| a,d  |
| a,d  |
| a,d  |
+------+
5 rows in set (0.04 sec)

If you set a SET column to an unsupported value, the value is ignored and a warning is issued:

mysql> INSERT INTO myset (col) VALUES ('a,d,d,s');
Query OK, 1 row affected, 1 warning (0.03 sec)

mysql> SHOW WARNINGS;
+---------+------+------------------------------------------+
| Level   | Code | Message                                  |
+---------+------+------------------------------------------+
| Warning | 1265 | Data truncated for column 'col' at row 1 |
+---------+------+------------------------------------------+
1 row in set (0.04 sec)

mysql> SELECT col FROM myset;
+------+
| col  |
+------+
| a,d  |
| a,d  |
| a,d  |
| a,d  |
| a,d  |
| a,d  |
+------+
6 rows in set (0.01 sec)
If strict SQL mode is enabled, attempts to insert invalid SET values result in an error.

SET values are sorted numerically. NULL values sort before non-NULL SET values.

Functions such as SUM() or AVG() that expect a numeric argument cast the argument to a number if necessary. For SET values, the cast operation causes the numeric value to be used.

Normally, you search for SET values using the FIND_IN_SET() function or the LIKE operator:

mysql> SELECT * FROM tbl_name WHERE FIND_IN_SET('value',set_col)>0;
mysql> SELECT * FROM tbl_name WHERE set_col LIKE '%value%';

The first statement finds rows where set_col contains the value set member. The second is similar, but not the same: It finds rows where set_col contains value anywhere, even as a substring of another set member.

The following statements also are permitted:

mysql> SELECT * FROM tbl_name WHERE set_col & 1;
mysql> SELECT * FROM tbl_name WHERE set_col = 'val1,val2';
The first of these statements looks for values containing the first set member. The second looks for an exact match. Be careful with comparisons of the second type. Comparing set values to 'val1,val2' returns different results than comparing values to 'val2,val1'. You should specify the values in the same order they are listed in the column definition.
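The search forms discussed above can be tried together in a sketch such as the following. The table name set_demo is hypothetical; the column definition reuses the SET('a','b','c','d') example from this section.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE set_demo (col SET('a', 'b', 'c', 'd'));

-- 9 is 1001 in binary, so it is stored as 'a,d'.
INSERT INTO set_demo (col) VALUES (9), ('b'), ('a,b');

-- Rows containing member 'a': 'a,d' (numeric value 9) and 'a,b' (numeric value 3).
SELECT col, col+0 FROM set_demo WHERE FIND_IN_SET('a', col) > 0;

-- Same rows, found by testing the bit of the first set member.
SELECT col FROM set_demo WHERE col & 1;

-- Exact match: members listed in definition order match one row ...
SELECT col FROM set_demo WHERE col = 'a,b';

-- ... but the reversed order matches no rows.
SELECT col FROM set_demo WHERE col = 'b,a';
```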
To determine all possible values for a SET column, use SHOW COLUMNS FROM tbl_name LIKE set_col and parse the SET definition in the Type column of the output. In the C API, SET values are returned as strings. For information about using result set metadata to distinguish them from other strings, see Section 27.8.5, “C API Data Structures”.
11.5 Spatial Data Types The Open Geospatial Consortium (OGC) is an international consortium of more than 250 companies, agencies, and universities participating in the development of publicly available conceptual solutions that can be useful with all kinds of applications that manage spatial data. The Open Geospatial Consortium publishes the OpenGIS® Implementation Standard for Geographic information - Simple Feature Access - Part 2: SQL Option, a document that proposes several conceptual ways for extending an SQL RDBMS to support spatial data. This specification is available from the OGC website at http://www.opengeospatial.org/standards/sfs. Following the OGC specification, MySQL implements spatial extensions as a subset of the SQL with Geometry Types environment. This term refers to an SQL environment that has been extended with a set of geometry types. A geometry-valued SQL column is implemented as a column that has a geometry type. The specification describes a set of SQL geometry types, as well as functions on those types to create and analyze geometry values. MySQL spatial extensions enable the generation, storage, and analysis of geographic features: • Data types for representing spatial values • Functions for manipulating spatial values • Spatial indexing for improved access times to spatial columns The spatial data types and functions are available for MyISAM, InnoDB, NDB, and ARCHIVE tables. For indexing spatial columns, MyISAM and InnoDB support both SPATIAL and non-SPATIAL indexes. The other storage engines support non-SPATIAL indexes, as described in Section 13.1.14, “CREATE INDEX Syntax”. A geographic feature is anything in the world that has a location. A feature can be: • An entity. For example, a mountain, a pond, a city. • A space. For example, town district, the tropics. • A definable location. For example, a crossroad, as a particular place where two streets intersect. 
Some documents use the term geospatial feature to refer to geographic features.

Geometry is another word that denotes a geographic feature. Originally the word geometry meant measurement of the earth. Another meaning comes from cartography, referring to the geometric features that cartographers use to map the world.

The discussion here considers these terms synonymous: geographic feature, geospatial feature, feature, or geometry. The term most commonly used is geometry, defined as a point or an aggregate of points representing anything in the world that has a location.

The following material covers these topics:

• The spatial data types implemented in MySQL model
• The basis of the spatial extensions in the OpenGIS geometry model
• Data formats for representing spatial data
• How to use spatial data in MySQL
• Use of indexing for spatial data
• MySQL differences from the OpenGIS specification

For information about functions that operate on spatial data, see Section 12.16, “Spatial Analysis Functions”.
MySQL GIS Conformance and Compatibility
MySQL does not implement the following GIS features:
• Additional Metadata Views
  OpenGIS specifications propose several additional metadata views. For example, a system view named GEOMETRY_COLUMNS contains a description of geometry columns, one row for each geometry column in the database.
• The OpenGIS function Length() on LineString and MultiLineString should be called in MySQL as ST_Length().
  The problem is that there is an existing SQL function Length() that calculates the length of string values, and sometimes it is not possible to distinguish whether the function is called in a textual or spatial context.
Additional Resources
The Open Geospatial Consortium publishes the OpenGIS® Implementation Standard for Geographic information - Simple feature access - Part 2: SQL option, a document that proposes several conceptual ways for extending an SQL RDBMS to support spatial data. The Open Geospatial Consortium (OGC) maintains a website at http://www.opengeospatial.org/. The specification is available there at http://www.opengeospatial.org/standards/sfs. It contains additional information relevant to the material here.
If you have questions or concerns about the use of the spatial extensions to MySQL, you can discuss them in the GIS forum: https://forums.mysql.com/list.php?23.
11.5.1 Spatial Data Types MySQL has spatial data types that correspond to OpenGIS classes. The basis for these types is described in Section 11.5.2, “The OpenGIS Geometry Model”. Some spatial data types hold single geometry values: • GEOMETRY • POINT • LINESTRING • POLYGON GEOMETRY can store geometry values of any type. The other single-value types (POINT, LINESTRING, and POLYGON) restrict their values to a particular geometry type. The other spatial data types hold collections of values: • MULTIPOINT • MULTILINESTRING
• MULTIPOLYGON • GEOMETRYCOLLECTION GEOMETRYCOLLECTION can store a collection of objects of any type. The other collection types (MULTIPOINT, MULTILINESTRING, and MULTIPOLYGON) restrict collection members to those having a particular geometry type. Example: To create a table named geom that has a column named g that can store values of any geometry type, use this statement: CREATE TABLE geom (g GEOMETRY);
SPATIAL indexes can be created only on spatial columns that are declared NOT NULL, so if you plan to index the column, declare it NOT NULL:
CREATE TABLE geom (g GEOMETRY NOT NULL);
For other examples showing how to use spatial data types in MySQL, see Section 11.5.5, “Creating Spatial Columns”.
11.5.2 The OpenGIS Geometry Model The set of geometry types proposed by OGC's SQL with Geometry Types environment is based on the OpenGIS Geometry Model. In this model, each geometric object has the following general properties: • It is associated with a spatial reference system, which describes the coordinate space in which the object is defined. • It belongs to some geometry class.
11.5.2.1 The Geometry Class Hierarchy The geometry classes define a hierarchy as follows: • Geometry (noninstantiable) • Point (instantiable) • Curve (noninstantiable) • LineString (instantiable) • Line • LinearRing • Surface (noninstantiable) • Polygon (instantiable) • GeometryCollection (instantiable) • MultiPoint (instantiable) • MultiCurve (noninstantiable) • MultiLineString (instantiable) • MultiSurface (noninstantiable)
• MultiPolygon (instantiable) It is not possible to create objects in noninstantiable classes. It is possible to create objects in instantiable classes. All classes have properties, and instantiable classes may also have assertions (rules that define valid class instances). Geometry is the base class. It is an abstract class. The instantiable subclasses of Geometry are restricted to zero-, one-, and two-dimensional geometric objects that exist in two-dimensional coordinate space. All instantiable geometry classes are defined so that valid instances of a geometry class are topologically closed (that is, all defined geometries include their boundary). The base Geometry class has subclasses for Point, Curve, Surface, and GeometryCollection: • Point represents zero-dimensional objects. • Curve represents one-dimensional objects, and has subclass LineString, with sub-subclasses Line and LinearRing. • Surface is designed for two-dimensional objects and has subclass Polygon. • GeometryCollection has specialized zero-, one-, and two-dimensional collection classes named MultiPoint, MultiLineString, and MultiPolygon for modeling geometries corresponding to collections of Points, LineStrings, and Polygons, respectively. MultiCurve and MultiSurface are introduced as abstract superclasses that generalize the collection interfaces to handle Curves and Surfaces. Geometry, Curve, Surface, MultiCurve, and MultiSurface are defined as noninstantiable classes. They define a common set of methods for their subclasses and are included for extensibility. Point, LineString, Polygon, GeometryCollection, MultiPoint, MultiLineString, and MultiPolygon are instantiable classes.
11.5.2.2 Geometry Class
Geometry is the root class of the hierarchy. It is a noninstantiable class but has a number of properties, described in the following list, that are common to all geometry values created from any of the Geometry subclasses. Particular subclasses have their own specific properties, described later.
Geometry Properties
A geometry value has the following properties:
• Its type. Each geometry belongs to one of the instantiable classes in the hierarchy.
• Its SRID, or spatial reference identifier. This value identifies the geometry's associated spatial reference system that describes the coordinate space in which the geometry object is defined.
  In MySQL, the SRID value is an integer associated with the geometry value. The maximum usable SRID value is 2^32−1. If a larger value is given, only the lower 32 bits are used. All computations are done assuming SRID 0, regardless of the actual SRID value. SRID 0 represents an infinite flat Cartesian plane with no units assigned to its axes.
• Its coordinates in its spatial reference system, represented as double-precision (8-byte) numbers. All nonempty geometries include at least one pair of (X,Y) coordinates. Empty geometries contain no coordinates.
  Coordinates are related to the SRID. For example, in different coordinate systems, the distance between two objects may differ even when objects have the same coordinates, because the distance on the planar coordinate system and the distance on the geodetic system (coordinates on the Earth's surface) are different things.
• Its interior, boundary, and exterior. Every geometry occupies some position in space. The exterior of a geometry is all space not occupied by the geometry. The interior is the space occupied by the geometry. The boundary is the interface between the geometry's interior and exterior. • Its MBR (minimum bounding rectangle), or envelope. This is the bounding geometry, formed by the minimum and maximum (X,Y) coordinates: ((MINX MINY, MAXX MINY, MAXX MAXY, MINX MAXY, MINX MINY))
• Whether the value is simple or nonsimple. Geometry values of types (LineString, MultiPoint, MultiLineString) are either simple or nonsimple. Each type determines its own assertions for being simple or nonsimple.
• Whether the value is closed or not closed. Geometry values of types (LineString, MultiLineString) are either closed or not closed. Each type determines its own assertions for being closed or not closed.
• Whether the value is empty or nonempty. A geometry is empty if it does not have any points. Exterior, interior, and boundary of an empty geometry are not defined (that is, they are represented by a NULL value). An empty geometry is defined to be always simple and has an area of 0.
• Its dimension. A geometry can have a dimension of −1, 0, 1, or 2:
  • −1 for an empty geometry.
  • 0 for a geometry with no length and no area.
  • 1 for a geometry with nonzero length and zero area.
  • 2 for a geometry with nonzero area.
  Point objects have a dimension of zero. LineString objects have a dimension of 1. Polygon objects have a dimension of 2. The dimensions of MultiPoint, MultiLineString, and MultiPolygon objects are the same as the dimensions of the elements they consist of.
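The MBR property in the preceding list can be illustrated outside the server. The following Python sketch is not MySQL code; the function name mbr_ring and the representation of a geometry as a list of (x, y) pairs are assumptions made for illustration. It finds the minimum and maximum X and Y values and returns the envelope corners in the order MINX MINY, MAXX MINY, MAXX MAXY, MINX MAXY, MINX MINY.

```python
def mbr_ring(points):
    """Return the MBR of a list of (x, y) pairs as a closed five-point ring."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    # Corner order follows the envelope definition:
    # MINX MINY, MAXX MINY, MAXX MAXY, MINX MAXY, MINX MINY
    return [(min_x, min_y), (max_x, min_y), (max_x, max_y),
            (min_x, max_y), (min_x, min_y)]


# Hypothetical input: the four points of a LineString.
print(mbr_ring([(0, 0), (10, 10), (20, 25), (50, 60)]))
```

For a single point, all five corners coincide, so the rectangle degenerates into that point.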
11.5.2.3 Point Class A Point is a geometry that represents a single location in coordinate space. Point Examples • Imagine a large-scale map of the world with many cities. A Point object could represent each city. • On a city map, a Point object could represent a bus stop. Point Properties • X-coordinate value. • Y-coordinate value. • Point is defined as a zero-dimensional geometry. • The boundary of a Point is the empty set.
11.5.2.4 Curve Class A Curve is a one-dimensional geometry, usually represented by a sequence of points. Particular subclasses of Curve define the type of interpolation between points. Curve is a noninstantiable class.
Curve Properties • A Curve has the coordinates of its points. • A Curve is defined as a one-dimensional geometry. • A Curve is simple if it does not pass through the same point twice, with the exception that a curve can still be simple if the start and end points are the same. • A Curve is closed if its start point is equal to its endpoint. • The boundary of a closed Curve is empty. • The boundary of a nonclosed Curve consists of its two endpoints. • A Curve that is simple and closed is a LinearRing.
11.5.2.5 LineString Class A LineString is a Curve with linear interpolation between points. LineString Examples • On a world map, LineString objects could represent rivers. • In a city map, LineString objects could represent streets. LineString Properties • A LineString has coordinates of segments, defined by each consecutive pair of points. • A LineString is a Line if it consists of exactly two points. • A LineString is a LinearRing if it is both closed and simple.
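The closed, Line, and boundary rules for Curve and LineString values can be sketched in a few lines of Python. This is an illustration only, not MySQL code: the function names are hypothetical, and a LineString is assumed to be a list of (x, y) pairs. The test for simplicity is omitted because it requires checking segments for self-intersection.

```python
def is_closed(points):
    """A curve is closed if its start point equals its end point."""
    return len(points) >= 2 and points[0] == points[-1]


def is_line(points):
    """A LineString is a Line if it consists of exactly two points."""
    return len(points) == 2


def boundary(points):
    """Closed curves have an empty boundary; otherwise the two endpoints."""
    return [] if is_closed(points) else [points[0], points[-1]]


ring = [(0, 0), (1, 0), (1, 1), (0, 0)]   # closed
path = [(0, 0), (1, 1), (2, 2)]           # not closed
print(is_closed(ring), boundary(ring))
print(is_closed(path), boundary(path))
```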
11.5.2.6 Surface Class A Surface is a two-dimensional geometry. It is a noninstantiable class. Its only instantiable subclass is Polygon. Surface Properties • A Surface is defined as a two-dimensional geometry. • The OpenGIS specification defines a simple Surface as a geometry that consists of a single “patch” that is associated with a single exterior boundary and zero or more interior boundaries. • The boundary of a simple Surface is the set of closed curves corresponding to its exterior and interior boundaries.
11.5.2.7 Polygon Class A Polygon is a planar Surface representing a multisided geometry. It is defined by a single exterior boundary and zero or more interior boundaries, where each interior boundary defines a hole in the Polygon. Polygon Examples • On a region map, Polygon objects could represent forests, districts, and so on. Polygon Assertions • The boundary of a Polygon consists of a set of LinearRing objects (that is, LineString objects that are both simple and closed) that make up its exterior and interior boundaries.
• A Polygon has no rings that cross. The rings in the boundary of a Polygon may intersect at a Point, but only as a tangent. • A Polygon has no lines, spikes, or punctures. • A Polygon has an interior that is a connected point set. • A Polygon may have holes. The exterior of a Polygon with holes is not connected. Each hole defines a connected component of the exterior. The preceding assertions make a Polygon a simple geometry.
11.5.2.8 GeometryCollection Class A GeometryCollection is a geometry that is a collection of zero or more geometries of any class. All the elements in a geometry collection must be in the same spatial reference system (that is, in the same coordinate system). There are no other constraints on the elements of a geometry collection, although the subclasses of GeometryCollection described in the following sections may restrict membership. Restrictions may be based on: • Element type (for example, a MultiPoint may contain only Point elements) • Dimension • Constraints on the degree of spatial overlap between elements
11.5.2.9 MultiPoint Class A MultiPoint is a geometry collection composed of Point elements. The points are not connected or ordered in any way. MultiPoint Examples • On a world map, a MultiPoint could represent a chain of small islands. • On a city map, a MultiPoint could represent the outlets for a ticket office. MultiPoint Properties • A MultiPoint is a zero-dimensional geometry. • A MultiPoint is simple if no two of its Point values are equal (have identical coordinate values). • The boundary of a MultiPoint is the empty set.
11.5.2.10 MultiCurve Class
A MultiCurve is a geometry collection composed of Curve elements. MultiCurve is a noninstantiable class.
MultiCurve Properties
• A MultiCurve is a one-dimensional geometry.
• A MultiCurve is simple if and only if all of its elements are simple; the only intersections between any two elements occur at points that are on the boundaries of both elements.
• A MultiCurve boundary is obtained by applying the “mod 2 union rule” (also known as the “odd-even rule”): A point is in the boundary of a MultiCurve if it is in the boundaries of an odd number of Curve elements.
• A MultiCurve is closed if all of its elements are closed.
• The boundary of a closed MultiCurve is always empty.
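The “mod 2 union rule” can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not MySQL code: each curve is assumed to be a list of (x, y) pairs, and the function name multicurve_boundary is hypothetical. Each nonclosed curve contributes its two endpoints, and a point belongs to the result only if it was contributed an odd number of times.

```python
from collections import Counter


def multicurve_boundary(curves):
    """Apply the mod 2 union rule to a list of curves (lists of (x, y) pairs)."""
    counts = Counter()
    for points in curves:
        if points[0] != points[-1]:   # a closed curve has an empty boundary
            counts[points[0]] += 1
            counts[points[-1]] += 1
    # Keep the points that lie in the boundaries of an odd number of curves.
    return sorted(p for p, n in counts.items() if n % 2 == 1)


two = [[(0, 0), (1, 1)], [(1, 1), (2, 0)]]
three = two + [[(1, 1), (1, 5)]]
print(multicurve_boundary(two))     # (1, 1) is shared twice, so it drops out
print(multicurve_boundary(three))   # (1, 1) is shared three times, so it stays
```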
11.5.2.11 MultiLineString Class A MultiLineString is a MultiCurve geometry collection composed of LineString elements. MultiLineString Examples • On a region map, a MultiLineString could represent a river system or a highway system.
11.5.2.12 MultiSurface Class A MultiSurface is a geometry collection composed of surface elements. MultiSurface is a noninstantiable class. Its only instantiable subclass is MultiPolygon. MultiSurface Assertions • Surfaces within a MultiSurface have no interiors that intersect. • Surfaces within a MultiSurface have boundaries that intersect at most at a finite number of points.
11.5.2.13 MultiPolygon Class
A MultiPolygon is a MultiSurface object composed of Polygon elements.
MultiPolygon Examples
• On a region map, a MultiPolygon could represent a system of lakes.
MultiPolygon Assertions
• A MultiPolygon has no two Polygon elements with interiors that intersect.
• A MultiPolygon has no two Polygon elements that cross (crossing is also forbidden by the previous assertion), or that touch at an infinite number of points.
• A MultiPolygon may not have cut lines, spikes, or punctures. A MultiPolygon is a regular, closed point set.
• A MultiPolygon that has more than one Polygon has an interior that is not connected. The number of connected components of the interior of a MultiPolygon is equal to the number of Polygon values in the MultiPolygon.
MultiPolygon Properties
• A MultiPolygon is a two-dimensional geometry.
• A MultiPolygon boundary is a set of closed curves (LineString values) corresponding to the boundaries of its Polygon elements.
• Each Curve in the boundary of the MultiPolygon is in the boundary of exactly one Polygon element.
• Every Curve in the boundary of a Polygon element is in the boundary of the MultiPolygon.
11.5.3 Supported Spatial Data Formats Two standard spatial data formats are used to represent geometry objects in queries: • Well-Known Text (WKT) format
• Well-Known Binary (WKB) format Internally, MySQL stores geometry values in a format that is not identical to either WKT or WKB format. (Internal format is like WKB but with an initial 4 bytes to indicate the SRID.) There are functions available to convert between different data formats; see Section 12.16.6, “Geometry Format Conversion Functions”. The following sections describe the spatial data formats MySQL uses: • Well-Known Text (WKT) Format • Well-Known Binary (WKB) Format • Internal Geometry Storage Format
Well-Known Text (WKT) Format The Well-Known Text (WKT) representation of geometry values is designed for exchanging geometry data in ASCII form. The OpenGIS specification provides a Backus-Naur grammar that specifies the formal production rules for writing WKT values (see Section 11.5, “Spatial Data Types”). Examples of WKT representations of geometry objects: • A Point: POINT(15 20)
The point coordinates are specified with no separating comma. This differs from the syntax for the SQL Point() function, which requires a comma between the coordinates. Take care to use the syntax appropriate to the context of a given spatial operation. For example, the following statements both use ST_X() to extract the X-coordinate from a Point object. The first produces the object directly using the Point() function. The second uses a WKT representation converted to a Point with ST_GeomFromText().
mysql> SELECT ST_X(Point(15, 20));
+---------------------+
| ST_X(POINT(15, 20)) |
+---------------------+
|                  15 |
+---------------------+
mysql> SELECT ST_X(ST_GeomFromText('POINT(15 20)'));
+---------------------------------------+
| ST_X(ST_GeomFromText('POINT(15 20)')) |
+---------------------------------------+
|                                    15 |
+---------------------------------------+
• A LineString with four points: LINESTRING(0 0, 10 10, 20 25, 50 60)
The point coordinate pairs are separated by commas. • A Polygon with one exterior ring and one interior ring: POLYGON((0 0,10 0,10 10,0 10,0 0),(5 5,7 5,7 7,5 7, 5 5))
• A MultiPoint with three Point values:
MULTIPOINT(0 0, 20 20, 60 60)
As of MySQL 5.7.9, spatial functions such as ST_MPointFromText() and ST_GeomFromText() that accept WKT-format representations of MultiPoint values permit individual points within values to be surrounded by parentheses. For example, both of the following function calls are valid, whereas before MySQL 5.7.9 the second one produces an error: ST_MPointFromText('MULTIPOINT (1 1, 2 2, 3 3)') ST_MPointFromText('MULTIPOINT ((1 1), (2 2), (3 3))')
As of MySQL 5.7.9, output for MultiPoint values includes parentheses around each point. For example:
mysql> SET @mp = 'MULTIPOINT(1 1, 2 2, 3 3)';
mysql> SELECT ST_AsText(ST_GeomFromText(@mp));
+---------------------------------+
| ST_AsText(ST_GeomFromText(@mp)) |
+---------------------------------+
| MULTIPOINT((1 1),(2 2),(3 3))   |
+---------------------------------+
Before MySQL 5.7.9, output for the same value does not include parentheses around each point:
mysql> SET @mp = 'MULTIPOINT(1 1, 2 2, 3 3)';
mysql> SELECT ST_AsText(ST_GeomFromText(@mp));
+---------------------------------+
| ST_AsText(ST_GeomFromText(@mp)) |
+---------------------------------+
| MULTIPOINT(1 1,2 2,3 3)         |
+---------------------------------+
• A MultiLineString with two LineString values: MULTILINESTRING((10 10, 20 20), (15 15, 30 15))
• A MultiPolygon with two Polygon values: MULTIPOLYGON(((0 0,10 0,10 10,0 10,0 0)),((5 5,7 5,7 7,5 7, 5 5)))
• A GeometryCollection consisting of two Point values and one LineString: GEOMETRYCOLLECTION(POINT(10 10), POINT(30 30), LINESTRING(15 15, 20 20))
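A client program that builds WKT strings has to follow the separator rules shown in the preceding examples: a space between the X and Y values of a point, commas between points, and parentheses around each ring. The following Python sketch illustrates this for three types. The function names are hypothetical, and the number formatting (15 significant digits) is an assumption rather than anything MySQL requires.

```python
def num(v):
    """Format a coordinate without a trailing '.0' (15 significant digits)."""
    return format(v, ".15g")


def coords(points):
    """Join (x, y) pairs as 'x y, x y, ...'."""
    return ", ".join("%s %s" % (num(x), num(y)) for x, y in points)


def point_wkt(x, y):
    return "POINT(%s %s)" % (num(x), num(y))


def linestring_wkt(points):
    return "LINESTRING(%s)" % coords(points)


def polygon_wkt(rings):
    # Each ring is parenthesized; the first ring is the exterior ring.
    return "POLYGON(%s)" % ", ".join("(%s)" % coords(r) for r in rings)


print(point_wkt(15, 20))
print(linestring_wkt([(0, 0), (10, 10), (20, 25), (50, 60)]))
print(polygon_wkt([[(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]]))
```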
Well-Known Binary (WKB) Format The Well-Known Binary (WKB) representation of geometric values is used for exchanging geometry data as binary streams represented by BLOB values containing geometric WKB information. This format is defined by the OpenGIS specification (see Section 11.5, “Spatial Data Types”). It is also defined in the ISO SQL/MM Part 3: Spatial standard. WKB uses 1-byte unsigned integers, 4-byte unsigned integers, and 8-byte double-precision numbers (IEEE 754 format). A byte is eight bits. For example, a WKB value that corresponds to POINT(1 -1) consists of this sequence of 21 bytes, each represented by two hexadecimal digits: 0101000000000000000000F03F000000000000F0BF
The sequence consists of the components shown in the following table.
Table 11.2 WKB Components Example
Component      Size      Value
Byte order     1 byte    01
WKB type       4 bytes   01000000
X coordinate   8 bytes   000000000000F03F
Y coordinate   8 bytes   000000000000F0BF
Component representation is as follows: • The byte order indicator is either 1 or 0 to signify little-endian or big-endian storage. The little-endian and big-endian byte orders are also known as Network Data Representation (NDR) and External Data Representation (XDR), respectively. • The WKB type is a code that indicates the geometry type. MySQL uses values from 1 through 7 to indicate Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. • A Point value has X and Y coordinates, each represented as a double-precision value. WKB values for more complex geometry values have more complex data structures, as detailed in the OpenGIS specification.
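The component layout can be checked by decoding the example value with Python's standard struct module. This is a minimal sketch that handles only Point values; the function name parse_wkb_point is hypothetical.

```python
import struct


def parse_wkb_point(wkb):
    """Decode a WKB Point into (byte_order, wkb_type, x, y)."""
    byte_order = wkb[0]
    # 1 = little-endian (NDR), 0 = big-endian (XDR)
    prefix = "<" if byte_order == 1 else ">"
    (wkb_type,) = struct.unpack_from(prefix + "I", wkb, 1)
    if wkb_type != 1:
        raise ValueError("not a Point: WKB type %d" % wkb_type)
    x, y = struct.unpack_from(prefix + "dd", wkb, 5)
    return byte_order, wkb_type, x, y


# The 21-byte WKB value for POINT(1 -1) shown earlier in this section.
wkb = bytes.fromhex("0101000000000000000000F03F000000000000F0BF")
print(len(wkb), parse_wkb_point(wkb))
```

The "<" and ">" prefixes select standard sizes with no padding, so the 4-byte type code starts at offset 1 and the two 8-byte doubles start at offset 5.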
Internal Geometry Storage Format
MySQL stores geometry values using 4 bytes to indicate the SRID followed by the WKB representation of the value. For a description of WKB format, see Well-Known Binary (WKB) Format.
For the WKB part, these MySQL-specific considerations apply:
• The byte-order indicator byte is 1 because MySQL stores geometries as little-endian values.
• MySQL supports geometry types of Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. Other geometry types are not supported.
The LENGTH() function returns the space in bytes required for value storage. Example:
mysql> SET @g = ST_GeomFromText('POINT(1 -1)');
mysql> SELECT LENGTH(@g);
+------------+
| LENGTH(@g) |
+------------+
|         25 |
+------------+
mysql> SELECT HEX(@g);
+----------------------------------------------------+
| HEX(@g)                                            |
+----------------------------------------------------+
| 000000000101000000000000000000F03F000000000000F0BF |
+----------------------------------------------------+
The value length is 25 bytes, made up of these components (as can be seen from the hexadecimal value):
• 4 bytes for integer SRID (0)
• 1 byte for integer byte order (1 = little-endian)
• 4 bytes for integer type information (1 = Point)
• 8 bytes for double-precision X coordinate (1)
• 8 bytes for double-precision Y coordinate (−1)
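To see how the 25 bytes divide into an SRID prefix and a WKB value, the HEX(@g) output can be split in a client program. The following Python sketch is an illustration, not a MySQL API: the function name split_internal is hypothetical, and reading the SRID prefix as little-endian is an assumption (with SRID 0, as in this example, the byte order makes no difference).

```python
import struct


def split_internal(value):
    """Split a MySQL internal geometry value into (srid, wkb)."""
    # Assumption: the 4-byte SRID prefix is stored little-endian.
    (srid,) = struct.unpack_from("<I", value, 0)
    return srid, value[4:]


# HEX(@g) output for POINT(1 -1), taken from the example above.
internal = bytes.fromhex("000000000101000000000000000000F03F000000000000F0BF")
srid, wkb = split_internal(internal)
print(len(internal), srid, wkb.hex().upper())
```

The remaining 21 bytes are the same WKB value shown for POINT(1 -1) in the WKB format description.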
11.5.4 Geometry Well-Formedness and Validity
For geometry values, MySQL distinguishes between the concepts of syntactically well-formed and geometrically valid.
A geometry is syntactically well-formed if it satisfies conditions such as those in this (nonexhaustive) list:
• Linestrings have at least two points
• Polygons have at least one ring
• Polygon rings are closed (first and last points the same)
• Polygon rings have at least 4 points (minimum polygon is a triangle with first and last points the same)
• Collections are not empty (except GeometryCollection)
A geometry is geometrically valid if it is syntactically well-formed and satisfies conditions such as those in this (nonexhaustive) list:
• Polygons are not self-intersecting
• Polygon interior rings are inside the exterior ring
• Multipolygons do not have overlapping polygons
Spatial functions fail if a geometry is not syntactically well-formed. Spatial import functions that parse WKT or WKB values raise an error for attempts to create a geometry that is not syntactically well-formed. Syntactic well-formedness is also checked for attempts to store geometries into tables.
It is permitted to insert, select, and update geometrically invalid geometries, but they must be syntactically well-formed. Due to the computational expense, MySQL does not check explicitly for geometric validity. Spatial computations may detect some cases of invalid geometries and raise an error, but they may also return an undefined result without detecting the invalidity. Applications that require geometrically valid geometries should check them using the ST_IsValid() function.
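An application can apply some of the well-formedness conditions before sending a geometry to the server. The following Python sketch covers only the linestring and polygon conditions listed above. It is not how MySQL implements the checks; the function names and the list-of-pairs representation are assumptions. It does not test geometric validity, for which ST_IsValid() remains the appropriate tool.

```python
def linestring_well_formed(points):
    """Linestrings have at least two points."""
    return len(points) >= 2


def ring_well_formed(ring):
    """Polygon rings have at least 4 points and are closed."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def polygon_well_formed(rings):
    """Polygons have at least one ring, and every ring is well-formed."""
    return len(rings) >= 1 and all(ring_well_formed(r) for r in rings)


triangle = [(0, 0), (1, 0), (0, 1), (0, 0)]   # minimum polygon ring
unclosed = [(0, 0), (1, 0), (0, 1), (1, 1)]   # first and last points differ
print(polygon_well_formed([triangle]))
print(polygon_well_formed([unclosed]))
```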
11.5.5 Creating Spatial Columns MySQL provides a standard way of creating spatial columns for geometry types, for example, with CREATE TABLE or ALTER TABLE. Spatial columns are supported for MyISAM, InnoDB, NDB, and ARCHIVE tables. See also the notes about spatial indexes under Section 11.5.9, “Creating Spatial Indexes”. • Use the CREATE TABLE statement to create a table with a spatial column: CREATE TABLE geom (g GEOMETRY);
• Use the ALTER TABLE statement to add or drop a spatial column to or from an existing table: ALTER TABLE geom ADD pt POINT; ALTER TABLE geom DROP pt;
11.5.6 Populating Spatial Columns
After you have created spatial columns, you can populate them with spatial data.
Values should be stored in internal geometry format, but you can convert them to that format from either Well-Known Text (WKT) or Well-Known Binary (WKB) format. The following examples demonstrate how to insert geometry values into a table by converting WKT values to internal geometry format:
• Perform the conversion directly in the INSERT statement:
  INSERT INTO geom VALUES (ST_GeomFromText('POINT(1 1)'));

  SET @g = 'POINT(1 1)';
  INSERT INTO geom VALUES (ST_GeomFromText(@g));
• Perform the conversion prior to the INSERT:
  SET @g = ST_GeomFromText('POINT(1 1)');
  INSERT INTO geom VALUES (@g);
The following examples insert more complex geometries into the table:
SET @g = 'LINESTRING(0 0,1 1,2 2)';
INSERT INTO geom VALUES (ST_GeomFromText(@g));

SET @g = 'POLYGON((0 0,10 0,10 10,0 10,0 0),(5 5,7 5,7 7,5 7, 5 5))';
INSERT INTO geom VALUES (ST_GeomFromText(@g));

SET @g = 'GEOMETRYCOLLECTION(POINT(1 1),LINESTRING(0 0,1 1,2 2,3 3,4 4))';
INSERT INTO geom VALUES (ST_GeomFromText(@g));
The preceding examples use ST_GeomFromText() to create geometry values. You can also use type-specific functions:
SET @g = 'POINT(1 1)';
INSERT INTO geom VALUES (ST_PointFromText(@g));

SET @g = 'LINESTRING(0 0,1 1,2 2)';
INSERT INTO geom VALUES (ST_LineStringFromText(@g));

SET @g = 'POLYGON((0 0,10 0,10 10,0 10,0 0),(5 5,7 5,7 7,5 7, 5 5))';
INSERT INTO geom VALUES (ST_PolygonFromText(@g));

SET @g = 'GEOMETRYCOLLECTION(POINT(1 1),LINESTRING(0 0,1 1,2 2,3 3,4 4))';
INSERT INTO geom VALUES (ST_GeomCollFromText(@g));
A client application program that wants to use WKB representations of geometry values is responsible for sending correctly formed WKB in queries to the server. There are several ways to satisfy this requirement. For example: • Inserting a POINT(1 1) value with hex literal syntax: INSERT INTO geom VALUES (ST_GeomFromWKB(X'0101000000000000000000F03F000000000000F03F'));
• An ODBC application can send a WKB representation, binding it to a placeholder using an argument of BLOB type: INSERT INTO geom VALUES (ST_GeomFromWKB(?))
Other programming interfaces may support a similar placeholder mechanism. • In a C program, you can escape a binary value using mysql_real_escape_string_quote() and include the result in a query string that is sent to the server. See Section 27.8.7.56, “mysql_real_escape_string_quote()”.
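As an illustration of producing correctly formed WKB on the client side, the following Python sketch builds the hex literal used in the first item of the preceding list. It is a sketch only: the function name point_wkb_literal is hypothetical, it handles Point values only, and it always emits little-endian WKB.

```python
import struct


def point_wkb_literal(x, y):
    """Return a MySQL hex literal holding the little-endian WKB of POINT(x y)."""
    # Byte order 1 (little-endian), WKB type 1 (Point), then X and Y doubles.
    wkb = struct.pack("<BIdd", 1, 1, float(x), float(y))
    return "X'%s'" % wkb.hex().upper()


print("INSERT INTO geom VALUES (ST_GeomFromWKB(%s));" % point_wkb_literal(1, 1))
```

Where the client interface supports placeholders, binding the WKB bytes as a parameter is generally preferable to building statement text.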
11.5.7 Fetching Spatial Data Geometry values stored in a table can be fetched in internal format. You can also convert them to WKT or WKB format. • Fetching spatial data in internal format: Fetching geometry values using internal format can be useful in table-to-table transfers: CREATE TABLE geom2 (g GEOMETRY) SELECT g FROM geom;
• Fetching spatial data in WKT format: The ST_AsText() function converts a geometry from internal format to a WKT string. SELECT ST_AsText(g) FROM geom;
• Fetching spatial data in WKB format: The ST_AsBinary() function converts a geometry from internal format to a BLOB containing the WKB value. SELECT ST_AsBinary(g) FROM geom;
11.5.8 Optimizing Spatial Analysis For MyISAM and InnoDB tables, search operations in columns containing spatial data can be optimized using SPATIAL indexes. The most typical operations are: • Point queries that search for all objects that contain a given point • Region queries that search for all objects that overlap a given region MySQL uses R-Trees with quadratic splitting for SPATIAL indexes on spatial columns. A SPATIAL index is built using the minimum bounding rectangle (MBR) of a geometry. For most geometries, the MBR is a minimum rectangle that surrounds the geometries. For a horizontal or a vertical linestring, the MBR is a rectangle degenerated into the linestring. For a point, the MBR is a rectangle degenerated into the point. It is also possible to create normal indexes on spatial columns. In a non-SPATIAL index, you must declare a prefix for any spatial column except for POINT columns. MyISAM and InnoDB support both SPATIAL and non-SPATIAL indexes. Other storage engines support non-SPATIAL indexes, as described in Section 13.1.14, “CREATE INDEX Syntax”.
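The degenerate MBR cases, and the kind of rectangle comparison that an MBR-based search relies on, can be sketched in Python. This is not the server's implementation: the function names are hypothetical, geometries are assumed to be lists of (x, y) pairs, and the containment test counts shared edges as contained, which may differ from how MySQL's MBR functions treat boundary cases.

```python
def mbr(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_contains(outer, inner):
    """True if rectangle `inner` lies inside rectangle `outer`.

    Assumption: shared edges count as contained.
    """
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


region = mbr([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)])
horizontal = mbr([(2, 5), (8, 5)])   # degenerates to a line: min_y == max_y
point = mbr([(20, 20)])              # degenerates to a point

print(region, horizontal, point)
print(mbr_contains(region, horizontal), mbr_contains(region, point))
```

Because only four numbers per geometry are compared, the test is cheap, but it can report a match for geometries whose rectangles nest even though the shapes themselves do not.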
11.5.9 Creating Spatial Indexes
For InnoDB and MyISAM tables, MySQL can create spatial indexes using syntax similar to that for creating regular indexes, but using the SPATIAL keyword. Columns in spatial indexes must be declared NOT NULL. The following examples demonstrate how to create spatial indexes:
• With CREATE TABLE:
  CREATE TABLE geom (g GEOMETRY NOT NULL, SPATIAL INDEX(g));
• With ALTER TABLE:
  CREATE TABLE geom (g GEOMETRY NOT NULL);
  ALTER TABLE geom ADD SPATIAL INDEX(g);
• With CREATE INDEX:
  CREATE TABLE geom (g GEOMETRY NOT NULL);
  CREATE SPATIAL INDEX g ON geom (g);
SPATIAL INDEX creates an R-tree index. For storage engines that support nonspatial indexing of spatial columns, the engine creates a B-tree index. A B-tree index on spatial values is useful for exact-value lookups, but not for range scans.
For more information on indexing spatial columns, see Section 13.1.14, “CREATE INDEX Syntax”.
To drop spatial indexes, use ALTER TABLE or DROP INDEX:
• With ALTER TABLE:
  ALTER TABLE geom DROP INDEX g;
• With DROP INDEX:
  DROP INDEX g ON geom;
Example: Suppose that a table geom contains more than 32,000 geometries, which are stored in the column g of type GEOMETRY. The table also has an AUTO_INCREMENT column fid for storing object ID values.
mysql> DESCRIBE geom;
+-------+----------+------+-----+---------+----------------+
| Field | Type     | Null | Key | Default | Extra          |
+-------+----------+------+-----+---------+----------------+
| fid   | int(11)  |      | PRI | NULL    | auto_increment |
| g     | geometry |      |     |         |                |
+-------+----------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> SELECT COUNT(*) FROM geom;
+----------+
| count(*) |
+----------+
|    32376 |
+----------+
1 row in set (0.00 sec)
To add a spatial index on the column g, use this statement:
mysql> ALTER TABLE geom ADD SPATIAL INDEX(g);
Query OK, 32376 rows affected (4.05 sec)
Records: 32376 Duplicates: 0 Warnings: 0
11.5.10 Using Spatial Indexes
The optimizer investigates whether available spatial indexes can be involved in the search for queries that use a function such as MBRContains() or MBRWithin() in the WHERE clause. The following query finds all objects that are in the given rectangle:
mysql> SET @poly =
    -> 'Polygon((30000 15000, 31000 15000, 31000 16000, 30000 16000, 30000 15000))';
mysql> SELECT fid,ST_AsText(g) FROM geom WHERE
    -> MBRContains(ST_GeomFromText(@poly),g);
+-----+---------------------------------------------------------------+ | fid | ST_AsText(g) | +-----+---------------------------------------------------------------+ | 21 | LINESTRING(30350.4 15828.8,30350.6 15845,30333.8 15845,30 ... | | 22 | LINESTRING(30350.6 15871.4,30350.6 15887.8,30334 15887.8, ... | | 23 | LINESTRING(30350.6 15914.2,30350.6 15930.4,30334 15930.4, ... | | 24 | LINESTRING(30290.2 15823,30290.2 15839.4,30273.4 15839.4, ... | | 25 | LINESTRING(30291.4 15866.2,30291.6 15882.4,30274.8 15882. ... | | 26 | LINESTRING(30291.6 15918.2,30291.6 15934.4,30275 15934.4, ... | | 249 | LINESTRING(30337.8 15938.6,30337.8 15946.8,30320.4 15946. ... | | 1 | LINESTRING(30250.4 15129.2,30248.8 15138.4,30238.2 15136. ... | | 2 | LINESTRING(30220.2 15122.8,30217.2 15137.8,30207.6 15136, ... | | 3 | LINESTRING(30179 15114.4,30176.6 15129.4,30167 15128,3016 ... | | 4 | LINESTRING(30155.2 15121.4,30140.4 15118.6,30142 15109,30 ... | | 5 | LINESTRING(30192.4 15085,30177.6 15082.2,30179.2 15072.4, ... | | 6 | LINESTRING(30244 15087,30229 15086.2,30229.4 15076.4,3024 ... | | 7 | LINESTRING(30200.6 15059.4,30185.6 15058.6,30186 15048.8, ... | | 10 | LINESTRING(30179.6 15017.8,30181 15002.8,30190.8 15003.6, ... | | 11 | LINESTRING(30154.2 15000.4,30168.6 15004.8,30166 15014.2, ... | | 13 | LINESTRING(30105 15065.8,30108.4 15050.8,30118 15053,3011 ... | | 154 | LINESTRING(30276.2 15143.8,30261.4 15141,30263 15131.4,30 ... | | 155 | LINESTRING(30269.8 15084,30269.4 15093.4,30258.6 15093,30 ... | | 157 | LINESTRING(30128.2 15011,30113.2 15010.2,30113.6 15000.4, ... | +-----+---------------------------------------------------------------+ 20 rows in set (0.00 sec)
Use EXPLAIN to check the way this query is executed:

mysql> SET @poly =
    -> 'Polygon((30000 15000, 31000 15000, 31000 16000, 30000 16000, 30000 15000))';
mysql> EXPLAIN SELECT fid,ST_AsText(g) FROM geom WHERE
    -> MBRContains(ST_GeomFromText(@poly),g)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: geom
         type: range
possible_keys: g
          key: g
      key_len: 32
          ref: NULL
         rows: 50
        Extra: Using where
1 row in set (0.00 sec)
Check what would happen without a spatial index:

mysql> SET @poly =
    -> 'Polygon((30000 15000, 31000 15000, 31000 16000, 30000 16000, 30000 15000))';
mysql> EXPLAIN SELECT fid,ST_AsText(g) FROM geom IGNORE INDEX (g) WHERE
    -> MBRContains(ST_GeomFromText(@poly),g)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: geom
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 32376
        Extra: Using where
1 row in set (0.00 sec)
Executing the SELECT statement without the spatial index yields the same result but causes the execution time to rise from 0.00 seconds to 0.46 seconds:

mysql> SET @poly =
    -> 'Polygon((30000 15000, 31000 15000, 31000 16000, 30000 16000, 30000 15000))';
mysql> SELECT fid,ST_AsText(g) FROM geom IGNORE INDEX (g) WHERE
    -> MBRContains(ST_GeomFromText(@poly),g);
+-----+---------------------------------------------------------------+
| fid | ST_AsText(g)                                                  |
+-----+---------------------------------------------------------------+
|   1 | LINESTRING(30250.4 15129.2,30248.8 15138.4,30238.2 15136. ... |
|   2 | LINESTRING(30220.2 15122.8,30217.2 15137.8,30207.6 15136, ... |
|   3 | LINESTRING(30179 15114.4,30176.6 15129.4,30167 15128,3016 ... |
|   4 | LINESTRING(30155.2 15121.4,30140.4 15118.6,30142 15109,30 ... |
|   5 | LINESTRING(30192.4 15085,30177.6 15082.2,30179.2 15072.4, ... |
|   6 | LINESTRING(30244 15087,30229 15086.2,30229.4 15076.4,3024 ... |
|   7 | LINESTRING(30200.6 15059.4,30185.6 15058.6,30186 15048.8, ... |
|  10 | LINESTRING(30179.6 15017.8,30181 15002.8,30190.8 15003.6, ... |
|  11 | LINESTRING(30154.2 15000.4,30168.6 15004.8,30166 15014.2, ... |
|  13 | LINESTRING(30105 15065.8,30108.4 15050.8,30118 15053,3011 ... |
|  21 | LINESTRING(30350.4 15828.8,30350.6 15845,30333.8 15845,30 ... |
|  22 | LINESTRING(30350.6 15871.4,30350.6 15887.8,30334 15887.8, ... |
|  23 | LINESTRING(30350.6 15914.2,30350.6 15930.4,30334 15930.4, ... |
|  24 | LINESTRING(30290.2 15823,30290.2 15839.4,30273.4 15839.4, ... |
|  25 | LINESTRING(30291.4 15866.2,30291.6 15882.4,30274.8 15882. ... |
|  26 | LINESTRING(30291.6 15918.2,30291.6 15934.4,30275 15934.4, ... |
| 154 | LINESTRING(30276.2 15143.8,30261.4 15141,30263 15131.4,30 ... |
| 155 | LINESTRING(30269.8 15084,30269.4 15093.4,30258.6 15093,30 ... |
| 157 | LINESTRING(30128.2 15011,30113.2 15010.2,30113.6 15000.4, ... |
| 249 | LINESTRING(30337.8 15938.6,30337.8 15946.8,30320.4 15946. ... |
+-----+---------------------------------------------------------------+
20 rows in set (0.46 sec)
11.6 The JSON Data Type

• Creating JSON Values
• Normalization, Merging, and Autowrapping of JSON Values
• Searching and Modifying JSON Values
• JSON Path Syntax
• Comparison and Ordering of JSON Values
• Converting between JSON and non-JSON values
• Aggregation of JSON Values

As of MySQL 5.7.8, MySQL supports a native JSON data type defined by RFC 7159 that enables efficient access to data in JSON (JavaScript Object Notation) documents. The JSON data type provides these advantages over storing JSON-format strings in a string column:

• Automatic validation of JSON documents stored in JSON columns. Invalid documents produce an error.

• Optimized storage format. JSON documents stored in JSON columns are converted to an internal format that permits quick read access to document elements. When the server later must read a JSON value stored in this binary format, the value need not be parsed from a text representation. The binary format is structured to enable the server to look up subobjects or nested values directly by key or array index without reading all values before or after them in the document.
Note
This discussion uses JSON in monotype to indicate specifically the JSON data type and “JSON” in regular font to indicate JSON data in general.

The space required to store a JSON document is roughly the same as for LONGBLOB or LONGTEXT; see Section 11.8, “Data Type Storage Requirements”, for more information. It is important to keep in mind that the size of any JSON document stored in a JSON column is limited to the value of the max_allowed_packet system variable. (When the server is manipulating a JSON value internally in memory, it can be larger than this; the limit applies when the server stores it.)

A JSON column cannot have a non-NULL default value.

Along with the JSON data type, a set of SQL functions is available to enable operations on JSON values, such as creation, manipulation, and searching. The following discussion shows examples of these operations. For details about individual functions, see Section 12.17, “JSON Functions”.

A set of spatial functions for operating on GeoJSON values is also available. See Section 12.16.11, “Spatial GeoJSON Functions”.

JSON columns, like columns of other binary types, are not indexed directly; instead, you can create an index on a generated column that extracts a scalar value from the JSON column. See Indexing a Generated Column to Provide a JSON Column Index for a detailed example.

The MySQL optimizer also looks for compatible indexes on virtual columns that match JSON expressions.

MySQL NDB Cluster 7.5 (7.5.2 and later) supports JSON columns and MySQL JSON functions, including creation of an index on a column generated from a JSON column as a workaround for being unable to index a JSON column. A maximum of 3 JSON columns per NDB table is supported.

The next few sections provide basic information regarding the creation and manipulation of JSON values.
Creating JSON Values

A JSON array contains a list of values separated by commas and enclosed within [ and ] characters:

["abc", 10, null, true, false]

A JSON object contains a set of key-value pairs separated by commas and enclosed within { and } characters:

{"k1": "value", "k2": 10}

As the examples illustrate, JSON arrays and objects can contain scalar values that are strings or numbers, the JSON null literal, or the JSON boolean true or false literals. Keys in JSON objects must be strings. Temporal (date, time, or datetime) scalar values are also permitted:

["12:18:29.000000", "2015-07-29", "2015-07-29 12:18:29.000000"]

Nesting is permitted within JSON array elements and JSON object key values:

[99, {"id": "HK500", "cost": 75.99}, ["hot", "cold"]]
{"k1": "value", "k2": [10, 20]}
You can also obtain JSON values from a number of functions supplied by MySQL for this purpose (see Section 12.17.2, “Functions That Create JSON Values”) as well as by casting values of other types to the JSON type using CAST(value AS JSON) (see Converting between JSON and non-JSON values). The next several paragraphs describe how MySQL handles JSON values provided as input.

In MySQL, JSON values are written as strings. MySQL parses any string used in a context that requires a JSON value, and produces an error if it is not valid as JSON. These contexts include inserting a value into a column that has the JSON data type and passing an argument to a function that expects a JSON value (usually shown as json_doc or json_val in the documentation for MySQL JSON functions), as the following examples demonstrate:

• Attempting to insert a value into a JSON column succeeds if the value is a valid JSON value, but fails if it is not:

mysql> CREATE TABLE t1 (jdoc JSON);
Query OK, 0 rows affected (0.20 sec)

mysql> INSERT INTO t1 VALUES('{"key1": "value1", "key2": "value2"}');
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO t1 VALUES('[1, 2,');
ERROR 3140 (22032) at line 2: Invalid JSON text: "Invalid value." at position 6 in value (or column) '[1, 2,'.
Positions for “at position N” in such error messages are 0-based, but should be considered rough indications of where the problem in a value actually occurs.

• The JSON_TYPE() function expects a JSON argument and attempts to parse it into a JSON value. It returns the value's JSON type if it is valid and produces an error otherwise:

mysql> SELECT JSON_TYPE('["a", "b", 1]');
+----------------------------+
| JSON_TYPE('["a", "b", 1]') |
+----------------------------+
| ARRAY                      |
+----------------------------+

mysql> SELECT JSON_TYPE('"hello"');
+----------------------+
| JSON_TYPE('"hello"') |
+----------------------+
| STRING               |
+----------------------+

mysql> SELECT JSON_TYPE('hello');
ERROR 3146 (22032): Invalid data type for JSON data in argument 1 to function json_type; a JSON string or JSON type is required.
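The parse-then-classify behavior described above can be approximated outside the server. The following Python sketch is not MySQL code: it uses Python's standard json module as a stand-in parser, and the helper name json_type is hypothetical. It covers only the types that plain JSON text commonly produces and raises an error for text that is not valid JSON.

```python
import json

def _reject_constant(name):
    # Python's parser accepts NaN and Infinity by default; JSON text does not.
    raise ValueError("Invalid JSON text: " + name)

def json_type(text):
    """Hypothetical helper: parse text as JSON and name the resulting type."""
    value = json.loads(text, parse_constant=_reject_constant)
    if value is None:
        return "NULL"
    if isinstance(value, bool):      # test bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, list):
        return "ARRAY"
    return "OBJECT"

print(json_type('["a", "b", 1]'))    # ARRAY
print(json_type('"hello"'))          # STRING
try:
    json_type('hello')               # unquoted, so not a valid JSON string
except ValueError as err:
    print("error:", err)
```

As in the server examples, the unquoted text hello is rejected while the quoted text "hello" is classified as a string.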
MySQL handles strings used in JSON context using the utf8mb4 character set and utf8mb4_bin collation. Strings in other character sets are converted to utf8mb4 as necessary. (For strings in the ascii or utf8 character sets, no conversion is needed because ascii and utf8 are subsets of utf8mb4.)

As an alternative to writing JSON values using literal strings, functions exist for composing JSON values from component elements. JSON_ARRAY() takes a (possibly empty) list of values and returns a JSON array containing those values:

mysql> SELECT JSON_ARRAY('a', 1, NOW());
+----------------------------------------+
| JSON_ARRAY('a', 1, NOW())              |
+----------------------------------------+
| ["a", 1, "2015-07-27 09:43:47.000000"] |
+----------------------------------------+
JSON_OBJECT() takes a (possibly empty) list of key-value pairs and returns a JSON object containing those pairs:

mysql> SELECT JSON_OBJECT('key1', 1, 'key2', 'abc');
+---------------------------------------+
| JSON_OBJECT('key1', 1, 'key2', 'abc') |
+---------------------------------------+
| {"key1": 1, "key2": "abc"}            |
+---------------------------------------+
JSON_MERGE() takes two or more JSON documents and returns the combined result:

mysql> SELECT JSON_MERGE('["a", 1]', '{"key": "value"}');
+--------------------------------------------+
| JSON_MERGE('["a", 1]', '{"key": "value"}') |
+--------------------------------------------+
| ["a", 1, {"key": "value"}]                 |
+--------------------------------------------+

For information about the merging rules, see Normalization, Merging, and Autowrapping of JSON Values.

JSON values can be assigned to user-defined variables:

mysql> SET @j = JSON_OBJECT('key', 'value');
mysql> SELECT @j;
+------------------+
| @j               |
+------------------+
| {"key": "value"} |
+------------------+

However, user-defined variables cannot be of JSON data type, so although @j in the preceding example looks like a JSON value and has the same character set and collation as a JSON value, it does not have the JSON data type. Instead, the result from JSON_OBJECT() is converted to a string when assigned to the variable.

Strings produced by converting JSON values have a character set of utf8mb4 and a collation of utf8mb4_bin:

mysql> SELECT CHARSET(@j), COLLATION(@j);
+-------------+---------------+
| CHARSET(@j) | COLLATION(@j) |
+-------------+---------------+
| utf8mb4     | utf8mb4_bin   |
+-------------+---------------+

Because utf8mb4_bin is a binary collation, comparison of JSON values is case-sensitive.

mysql> SELECT JSON_ARRAY('x') = JSON_ARRAY('X');
+-----------------------------------+
| JSON_ARRAY('x') = JSON_ARRAY('X') |
+-----------------------------------+
|                                 0 |
+-----------------------------------+
Case sensitivity also applies to the JSON null, true, and false literals, which always must be written in lowercase:

mysql> SELECT JSON_VALID('null'), JSON_VALID('Null'), JSON_VALID('NULL');
+--------------------+--------------------+--------------------+
| JSON_VALID('null') | JSON_VALID('Null') | JSON_VALID('NULL') |
+--------------------+--------------------+--------------------+
|                  1 |                  0 |                  0 |
+--------------------+--------------------+--------------------+

mysql> SELECT CAST('null' AS JSON);
+----------------------+
| CAST('null' AS JSON) |
+----------------------+
| null                 |
+----------------------+
1 row in set (0.00 sec)

mysql> SELECT CAST('NULL' AS JSON);
ERROR 3141 (22032): Invalid JSON text in argument 1 to function cast_as_json: "Invalid value." at position 0 in 'NULL'.
Case sensitivity of the JSON literals differs from that of the SQL NULL, TRUE, and FALSE literals, which can be written in any lettercase:

mysql> SELECT ISNULL(null), ISNULL(Null), ISNULL(NULL);
+--------------+--------------+--------------+
| ISNULL(null) | ISNULL(Null) | ISNULL(NULL) |
+--------------+--------------+--------------+
|            1 |            1 |            1 |
+--------------+--------------+--------------+
Sometimes it may be necessary or desirable to insert quote characters (" or ') into a JSON document. Assume for this example that you want to insert some JSON objects containing strings representing sentences that state some facts about MySQL, each paired with an appropriate keyword, into a table created using the SQL statement shown here:

mysql> CREATE TABLE facts (sentence JSON);

Among these keyword-sentence pairs is this one:

mascot: The MySQL mascot is a dolphin named "Sakila".

One way to insert this as a JSON object into the facts table is to use the MySQL JSON_OBJECT() function. In this case, you must escape each quote character using a backslash, as shown here:

mysql> INSERT INTO facts VALUES
     >   (JSON_OBJECT("mascot", "Our mascot is a dolphin named \"Sakila\"."));

This does not work in the same way if you insert the value as a JSON object literal, in which case, you must use the double backslash escape sequence, like this:

mysql> INSERT INTO facts VALUES
     >   ('{"mascot": "Our mascot is a dolphin named \\"Sakila\\"."}');

Using the double backslash keeps MySQL from performing escape sequence processing, and instead causes it to pass the string literal to the storage engine for processing. After inserting the JSON object in either of the ways just shown, you can see that the backslashes are present in the JSON column value by doing a simple SELECT, like this:

mysql> SELECT sentence FROM facts;
+---------------------------------------------------------+
| sentence                                                |
+---------------------------------------------------------+
| {"mascot": "Our mascot is a dolphin named \"Sakila\"."} |
+---------------------------------------------------------+
To look up this particular sentence employing mascot as the key, you can use the column-path operator ->, as shown here:

mysql> SELECT sentence->"$.mascot" FROM facts;
+---------------------------------------------+
| sentence->"$.mascot"                        |
+---------------------------------------------+
| "Our mascot is a dolphin named \"Sakila\"." |
+---------------------------------------------+
1 row in set (0.00 sec)
This leaves the backslashes intact, along with the surrounding quote marks. To display the desired value using mascot as the key, but without including the surrounding quote marks or any escapes, use the inline path operator ->>, like this:

mysql> SELECT sentence->>"$.mascot" FROM facts;
+-----------------------------------------+
| sentence->>"$.mascot"                   |
+-----------------------------------------+
| Our mascot is a dolphin named "Sakila". |
+-----------------------------------------+

Note
The previous example does not work as shown if the NO_BACKSLASH_ESCAPES server SQL mode is enabled. If this mode is set, a single backslash instead of double backslashes can be used to insert the JSON object literal, and the backslashes are preserved. If you use the JSON_OBJECT() function when performing the insert and this mode is set, you must alternate single and double quotes, like this:

mysql> INSERT INTO facts VALUES
     >   (JSON_OBJECT('mascot', 'Our mascot is a dolphin named "Sakila".'));

See the description of the JSON_UNQUOTE() function for more information about the effects of this mode on escaped characters in JSON values.
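The difference between the quoted, escaped form and the unquoted form of the same string can be reproduced with any JSON library. The following Python sketch is only an analogy using the standard json module (it does not involve MySQL or its SQL modes): serializing adds the surrounding quotation marks and backslash escapes, and parsing removes them.

```python
import json

sentence = 'Our mascot is a dolphin named "Sakila".'

# Serializing to JSON text escapes the embedded double quotes with backslashes.
doc = json.dumps({"mascot": sentence})
print(doc)        # {"mascot": "Our mascot is a dolphin named \"Sakila\"."}

# The member as JSON text keeps its quotes and escapes (compare the -> result).
quoted = json.dumps(json.loads(doc)["mascot"])
print(quoted)     # "Our mascot is a dolphin named \"Sakila\"."

# The parsed member has no surrounding quotes or escapes (compare the ->> result).
unquoted = json.loads(doc)["mascot"]
print(unquoted)   # Our mascot is a dolphin named "Sakila".
```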
Normalization, Merging, and Autowrapping of JSON Values

When a string is parsed and found to be a valid JSON document, it is also normalized: Members with keys that duplicate a key found earlier in the document are discarded (even if the values differ). The object value produced by the following JSON_OBJECT() call does not include the second key1 element because that key name occurs earlier in the value:

mysql> SELECT JSON_OBJECT('key1', 1, 'key2', 'abc', 'key1', 'def');
+------------------------------------------------------+
| JSON_OBJECT('key1', 1, 'key2', 'abc', 'key1', 'def') |
+------------------------------------------------------+
| {"key1": 1, "key2": "abc"}                           |
+------------------------------------------------------+

Note
This “first key wins” handling of duplicate keys is not consistent with RFC 7159. This is a known issue in MySQL 5.7, which is fixed in MySQL 8.0. (Bug #86866, Bug #26369555)

MySQL also discards extra whitespace between keys, values, or elements in the original JSON document. To make lookups more efficient, it also sorts the keys of a JSON object. You should be aware that the result of this ordering is subject to change and not guaranteed to be consistent across releases.

MySQL functions that produce JSON values (see Section 12.17.2, “Functions That Create JSON Values”) always return normalized values.
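The “first key wins” rule and the removal of extra whitespace can be modeled in a few lines. This is a minimal Python sketch, not MySQL's implementation: the helper names first_key_wins and normalize are hypothetical, and key sorting is deliberately not modeled because the ordering is described above as subject to change.

```python
import json

def first_key_wins(pairs):
    """Build an object from (key, value) pairs, keeping the first value per key."""
    obj = {}
    for key, value in pairs:
        obj.setdefault(key, value)   # later duplicates of a key are discarded
    return obj

def normalize(text):
    """Hypothetical helper: parse JSON text, then re-serialize it compactly."""
    value = json.loads(text, object_pairs_hook=first_key_wins)
    return json.dumps(value)

print(normalize('{"x": 17, "x": "red", "x": [3, 5, 7]}'))   # {"x": 17}
print(normalize('{ "key1" : 1,   "key2":"abc" }'))          # {"key1": 1, "key2": "abc"}
```

Note that Python's json module on its own keeps the last duplicate key, which is why the sketch needs the object_pairs_hook argument to reproduce the 5.7 behavior.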
Merging JSON Values

In contexts that combine multiple arrays, the arrays are merged into a single array by concatenating arrays named later to the end of the first array. In the following example, JSON_MERGE() merges its arguments into a single array:

mysql> SELECT JSON_MERGE('[1, 2]', '["a", "b"]', '[true, false]');
+-----------------------------------------------------+
| JSON_MERGE('[1, 2]', '["a", "b"]', '[true, false]') |
+-----------------------------------------------------+
| [1, 2, "a", "b", true, false]                       |
+-----------------------------------------------------+

Normalization is also performed when values are inserted into JSON columns, as shown here:

mysql> CREATE TABLE t1 (c1 JSON);

mysql> INSERT INTO t1 VALUES
     >   ('{"x": 17, "x": "red"}'),
     >   ('{"x": 17, "x": "red", "x": [3, 5, 7]}');

mysql> SELECT c1 FROM t1;
+-----------+
| c1        |
+-----------+
| {"x": 17} |
| {"x": 17} |
+-----------+
Multiple objects when merged produce a single object. If multiple objects have the same key, the value for that key in the resulting merged object is an array containing the key values:

mysql> SELECT JSON_MERGE('{"a": 1, "b": 2}', '{"c": 3, "a": 4}');
+----------------------------------------------------+
| JSON_MERGE('{"a": 1, "b": 2}', '{"c": 3, "a": 4}') |
+----------------------------------------------------+
| {"a": [1, 4], "b": 2, "c": 3}                      |
+----------------------------------------------------+

Nonarray values used in a context that requires an array value are autowrapped: The value is surrounded by [ and ] characters to convert it to an array. In the following statement, each argument is autowrapped as an array ([1], [2]). These are then merged to produce a single result array:

mysql> SELECT JSON_MERGE('1', '2');
+----------------------+
| JSON_MERGE('1', '2') |
+----------------------+
| [1, 2]               |
+----------------------+

Array and object values are merged by autowrapping the object as an array and merging the two arrays:

mysql> SELECT JSON_MERGE('[10, 20]', '{"a": "x", "b": "y"}');
+------------------------------------------------+
| JSON_MERGE('[10, 20]', '{"a": "x", "b": "y"}') |
+------------------------------------------------+
| [10, 20, {"a": "x", "b": "y"}]                 |
+------------------------------------------------+
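The merging and autowrapping rules above can be summarized as a small recursive function. The following Python sketch is a model under stated assumptions, not the server's code: merge_two and json_merge are hypothetical names operating on already-parsed values, and the sketch assumes that the values of a duplicated key are combined by these same rules.

```python
import json

def merge_two(a, b):
    """Merge two parsed JSON values following the rules described above."""
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            # Assumption: values of a duplicated key are merged by the same rules.
            merged[key] = merge_two(merged[key], value) if key in merged else value
        return merged
    # A value that is not an array is autowrapped; arrays are then concatenated.
    left = a if isinstance(a, list) else [a]
    right = b if isinstance(b, list) else [b]
    return left + right

def json_merge(*docs):
    """Merge two or more parsed documents from left to right."""
    result = docs[0]
    for doc in docs[1:]:
        result = merge_two(result, doc)
    return result

print(json.dumps(json_merge({"a": 1, "b": 2}, {"c": 3, "a": 4})))
# {"a": [1, 4], "b": 2, "c": 3}
print(json.dumps(json_merge(1, 2)))                            # [1, 2]
print(json.dumps(json_merge([10, 20], {"a": "x", "b": "y"})))  # [10, 20, {"a": "x", "b": "y"}]
```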
Searching and Modifying JSON Values

A JSON path expression selects a value within a JSON document.

Path expressions are useful with functions that extract parts of or modify a JSON document, to specify where within that document to operate. For example, the following query extracts from a JSON document the value of the member with the name key:

mysql> SELECT JSON_EXTRACT('{"id": 14, "name": "Aztalan"}', '$.name');
+---------------------------------------------------------+
| JSON_EXTRACT('{"id": 14, "name": "Aztalan"}', '$.name') |
+---------------------------------------------------------+
| "Aztalan"                                               |
+---------------------------------------------------------+
Path syntax uses a leading $ character to represent the JSON document under consideration, optionally followed by selectors that indicate successively more specific parts of the document:

• A period followed by a key name names the member in an object with the given key. The key name must be specified within double quotation marks if the name without quotes is not legal within path expressions (for example, if it contains a space).

• [N] appended to a path that selects an array names the value at position N within the array. Array positions are integers beginning with zero. If path does not select an array value, path[0] evaluates to the same value as path:

mysql> SELECT JSON_SET('"x"', '$[0]', 'a');
+------------------------------+
| JSON_SET('"x"', '$[0]', 'a') |
+------------------------------+
| "a"                          |
+------------------------------+
1 row in set (0.00 sec)

• Paths can contain * or ** wildcards:

  • .* evaluates to the values of all members in a JSON object.

  • [*] evaluates to the values of all elements in a JSON array.

  • prefix**suffix evaluates to all paths that begin with the named prefix and end with the named suffix.

• A path that does not exist in the document (evaluates to nonexistent data) evaluates to NULL.

Let $ refer to this JSON array with three elements:

[3, {"a": [5, 6], "b": 10}, [99, 100]]
Then:

• $[0] evaluates to 3.

• $[1] evaluates to {"a": [5, 6], "b": 10}.

• $[2] evaluates to [99, 100].

• $[3] evaluates to NULL (it refers to the fourth array element, which does not exist).

Because $[1] and $[2] evaluate to nonscalar values, they can be used as the basis for more-specific path expressions that select nested values. Examples:

• $[1].a evaluates to [5, 6].

• $[1].a[1] evaluates to 6.

• $[1].b evaluates to 10.

• $[2][0] evaluates to 99.

As mentioned previously, path components that name keys must be quoted if the unquoted key name is not legal in path expressions. Let $ refer to this value:

{"a fish": "shark", "a bird": "sparrow"}
The keys both contain a space and must be quoted:

• $."a fish" evaluates to shark.

• $."a bird" evaluates to sparrow.

Paths that use wildcards evaluate to an array that can contain multiple values:

mysql> SELECT JSON_EXTRACT('{"a": 1, "b": 2, "c": [3, 4, 5]}', '$.*');
+---------------------------------------------------------+
| JSON_EXTRACT('{"a": 1, "b": 2, "c": [3, 4, 5]}', '$.*') |
+---------------------------------------------------------+
| [1, 2, [3, 4, 5]]                                       |
+---------------------------------------------------------+

mysql> SELECT JSON_EXTRACT('{"a": 1, "b": 2, "c": [3, 4, 5]}', '$.c[*]');
+------------------------------------------------------------+
| JSON_EXTRACT('{"a": 1, "b": 2, "c": [3, 4, 5]}', '$.c[*]') |
+------------------------------------------------------------+
| [3, 4, 5]                                                  |
+------------------------------------------------------------+

In the following example, the path $**.b evaluates to multiple paths ($.a.b and $.c.b) and produces an array of the matching path values:

mysql> SELECT JSON_EXTRACT('{"a": {"b": 1}, "c": {"b": 2}}', '$**.b');
+---------------------------------------------------------+
| JSON_EXTRACT('{"a": {"b": 1}, "c": {"b": 2}}', '$**.b') |
+---------------------------------------------------------+
| [1, 2]                                                  |
+---------------------------------------------------------+
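To make the selector rules concrete, here is a minimal Python sketch of path evaluation for paths without wildcards. It is an illustration only: the extract helper is hypothetical, identifiers are approximated by an ASCII pattern, and Python's None stands in for SQL NULL (so the sketch cannot tell a nonexistent path from a JSON null).

```python
import re

# One selector: ."quoted key", .identifier, or [N]. Wildcards are not modeled.
_LEG = re.compile(r'\.(?:"([^"]*)"|([A-Za-z_$][A-Za-z0-9_$]*))|\[(\d+)\]')

def extract(doc, path):
    """Hypothetical helper: evaluate a wildcard-free path against parsed JSON."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    value, pos = doc, 1
    while pos < len(path):
        leg = _LEG.match(path, pos)
        if leg is None:
            raise ValueError("unsupported path syntax at position %d" % pos)
        quoted, bare, index = leg.groups()
        if index is not None:
            i = int(index)
            if isinstance(value, list):
                if i >= len(value):
                    return None           # no such element: NULL
                value = value[i]
            elif i != 0:
                return None               # only [0] selects a nonarray value itself
        else:
            key = quoted if quoted is not None else bare
            if not isinstance(value, dict) or key not in value:
                return None               # no such member: NULL
            value = value[key]
        pos = leg.end()
    return value

doc = [3, {"a": [5, 6], "b": 10}, [99, 100]]
print(extract(doc, "$[1].a[1]"))   # 6
print(extract(doc, "$[3]"))        # None
print(extract({"a fish": "shark", "a bird": "sparrow"}, '$."a fish"'))   # shark
```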
In MySQL 5.7.9 and later, you can use column->path with a JSON column identifier and JSON path expression as a synonym for JSON_EXTRACT(column, path). See Section 12.17.3, “Functions That Search JSON Values”, for more information. See also Indexing a Generated Column to Provide a JSON Column Index.

Some functions take an existing JSON document, modify it in some way, and return the resulting modified document. Path expressions indicate where in the document to make changes. For example, the JSON_SET(), JSON_INSERT(), and JSON_REPLACE() functions each take a JSON document, plus one or more path/value pairs that describe where to modify the document and the values to use. The functions differ in how they handle existing and nonexisting values within the document.

Consider this document:

mysql> SET @j = '["a", {"b": [true, false]}, [10, 20]]';
JSON_SET() replaces values for paths that exist and adds values for paths that do not exist:

mysql> SELECT JSON_SET(@j, '$[1].b[0]', 1, '$[2][2]', 2);
+--------------------------------------------+
| JSON_SET(@j, '$[1].b[0]', 1, '$[2][2]', 2) |
+--------------------------------------------+
| ["a", {"b": [1, false]}, [10, 20, 2]]      |
+--------------------------------------------+

In this case, the path $[1].b[0] selects an existing value (true), which is replaced with the value following the path argument (1). The path $[2][2] does not exist, so the corresponding value (2) is added to the value selected by $[2].

JSON_INSERT() adds new values but does not replace existing values:

mysql> SELECT JSON_INSERT(@j, '$[1].b[0]', 1, '$[2][2]', 2);
+-----------------------------------------------+
| JSON_INSERT(@j, '$[1].b[0]', 1, '$[2][2]', 2) |
+-----------------------------------------------+
| ["a", {"b": [true, false]}, [10, 20, 2]]      |
+-----------------------------------------------+
JSON_REPLACE() replaces existing values and ignores new values:

mysql> SELECT JSON_REPLACE(@j, '$[1].b[0]', 1, '$[2][2]', 2);
+------------------------------------------------+
| JSON_REPLACE(@j, '$[1].b[0]', 1, '$[2][2]', 2) |
+------------------------------------------------+
| ["a", {"b": [1, false]}, [10, 20]]             |
+------------------------------------------------+

The path/value pairs are evaluated left to right. The document produced by evaluating one pair becomes the new value against which the next pair is evaluated.

JSON_REMOVE() takes a JSON document and one or more paths that specify values to be removed from the document. The return value is the original document minus the values selected by paths that exist within the document:

mysql> SELECT JSON_REMOVE(@j, '$[2]', '$[1].b[1]', '$[1].b[1]');
+---------------------------------------------------+
| JSON_REMOVE(@j, '$[2]', '$[1].b[1]', '$[1].b[1]') |
+---------------------------------------------------+
| ["a", {"b": [true]}]                              |
+---------------------------------------------------+

The paths have these effects:

• $[2] matches [10, 20] and removes it.

• The first instance of $[1].b[1] matches false in the b element and removes it.

• The second instance of $[1].b[1] matches nothing: That element has already been removed, so the path no longer exists and has no effect.
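The way the four functions treat existing and nonexisting paths can be modeled compactly. In this Python sketch, which is an illustration rather than MySQL's implementation, all function names are hypothetical and each path is given as a pre-parsed, nonempty list of steps (strings for keys, integers for array positions) instead of a path string. Autowrapping of nonarray values is not modeled.

```python
import copy
import json

def _exists(parent, step):
    """True if step names an existing array element or object member."""
    if isinstance(parent, list):
        return isinstance(step, int) and 0 <= step < len(parent)
    if isinstance(parent, dict):
        return step in parent
    return False

def _parent_of(doc, steps):
    """Follow all but the last step; return None if any step is missing."""
    parent = doc
    for step in steps[:-1]:
        if not _exists(parent, step):
            return None
        parent = parent[step]
    return parent

def _modify(doc, pairs, replace_existing, add_missing):
    doc = copy.deepcopy(doc)
    for steps, value in pairs:               # pairs are applied left to right
        parent, last = _parent_of(doc, steps), steps[-1]
        if _exists(parent, last):
            if replace_existing:
                parent[last] = value
        elif add_missing and isinstance(parent, list) and isinstance(last, int):
            parent.append(value)             # past the end: becomes the last element
        elif add_missing and isinstance(parent, dict) and isinstance(last, str):
            parent[last] = value
    return doc

def json_set(doc, *pairs):     return _modify(doc, pairs, True, True)
def json_insert(doc, *pairs):  return _modify(doc, pairs, False, True)
def json_replace(doc, *pairs): return _modify(doc, pairs, True, False)

def json_remove(doc, *paths):
    doc = copy.deepcopy(doc)
    for steps in paths:                      # each removal sees the previous result
        parent = _parent_of(doc, steps)
        if _exists(parent, steps[-1]):
            del parent[steps[-1]]
    return doc

j = json.loads('["a", {"b": [true, false]}, [10, 20]]')
pairs = (([1, "b", 0], 1), ([2, 2], 2))      # '$[1].b[0]' -> 1, '$[2][2]' -> 2
print(json.dumps(json_set(j, *pairs)))       # ["a", {"b": [1, false]}, [10, 20, 2]]
print(json.dumps(json_insert(j, *pairs)))    # ["a", {"b": [true, false]}, [10, 20, 2]]
print(json.dumps(json_replace(j, *pairs)))   # ["a", {"b": [1, false]}, [10, 20]]
print(json.dumps(json_remove(j, [2], [1, "b", 1], [1, "b", 1])))   # ["a", {"b": [true]}]
```

The two boolean flags capture the whole difference between the three modifying functions: JSON_SET() both replaces and adds, JSON_INSERT() only adds, and JSON_REPLACE() only replaces.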
JSON Path Syntax

Many of the JSON functions supported by MySQL and described elsewhere in this Manual (see Section 12.17, “JSON Functions”) require a path expression in order to identify a specific element in a JSON document. A path consists of the path's scope followed by one or more path legs. For paths used in MySQL JSON functions, the scope is always the document being searched or otherwise operated on, represented by a leading $ character. Path legs are separated by period characters (.). Cells in arrays are represented by [N], where N is a non-negative integer. Names of keys must be double-quoted strings or valid ECMAScript identifiers (see http://www.ecma-international.org/ecma-262/5.1/#sec-7.6). Path expressions, like JSON text, should be encoded using the ascii, utf8, or utf8mb4 character set. Other character encodings are implicitly coerced to utf8mb4. The complete syntax is shown here:

pathExpression:
    scope[(pathLeg)*]

pathLeg:
    member | arrayLocation | doubleAsterisk

member:
    period ( keyName | asterisk )

arrayLocation:
    leftBracket ( nonNegativeInteger | asterisk ) rightBracket

keyName:
    ESIdentifier | doubleQuotedString

doubleAsterisk:
    '**'

period:
    '.'

asterisk:
    '*'

leftBracket:
    '['

rightBracket:
    ']'
As noted previously, in MySQL, the scope of the path is always the document being operated on, represented as $. You can use '$' as a synonym for the document in JSON path expressions.

Note
Some implementations support column references for scopes of JSON paths; currently, MySQL does not support these.

The wildcard * and ** tokens are used as follows:

• .* represents the values of all members in the object.

• [*] represents the values of all cells in the array.

• [prefix]**suffix represents all paths beginning with prefix and ending with suffix. prefix is optional, while suffix is required; in other words, a path may not end in **.

  In addition, a path may not contain the sequence ***.

For path syntax examples, see the descriptions of the various JSON functions that take paths as arguments, such as JSON_CONTAINS_PATH(), JSON_SET(), and JSON_REPLACE(). For examples which include the use of the * and ** wildcards, see the description of the JSON_SEARCH() function.
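The grammar and the two wildcard restrictions can be checked mechanically. The following Python sketch is a simplified validator, not the parser MySQL uses: the function name is_valid_path is hypothetical, ESIdentifier is approximated by an ASCII-only identifier pattern, and whitespace inside paths is not handled.

```python
import re

# One path leg: member, arrayLocation, or doubleAsterisk.
# Assumption: ESIdentifier is approximated by an ASCII-only identifier.
_LEG = re.compile(r'''
      \.(?:[A-Za-z_$][A-Za-z0-9_$]*   # period keyName (identifier)
         |"(?:[^"\\]|\\.)*"           # period keyName (double-quoted string)
         |\*)                         # period asterisk
    | \[(?:\d+|\*)\]                  # leftBracket (integer | asterisk) rightBracket
    | \*\*                            # doubleAsterisk
''', re.VERBOSE)

def is_valid_path(path):
    """Return True if path matches the simplified grammar sketched above."""
    if not path.startswith("$"):
        return False                  # the scope is always a leading $
    pos, last = 1, None
    while pos < len(path):
        leg = _LEG.match(path, pos)
        if leg is None:
            return False
        token = leg.group(0)
        if last is not None and last.endswith("*") and token.startswith("*"):
            return False              # would form the forbidden sequence ***
        last, pos = token, leg.end()
    return last != "**"               # a path may not end in **

for p in ('$', '$[1].a[1]', '$."a fish"', '$**.b', '$.a**', '$.***.a', 'name'):
    print(p, is_valid_path(p))
```

The first four sample paths are accepted; the last three are rejected because they end in **, contain ***, or lack the leading $ scope.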
Comparison and Ordering of JSON Values

JSON values can be compared using the =, <, <=, >, >=, <>, !=, and <=> operators.

The following comparison operators and functions are not yet supported with JSON values:

• BETWEEN

• IN()

• GREATEST()

• LEAST()

A workaround for the comparison operators and functions just listed is to cast JSON values to a native MySQL numeric or string data type so they have a consistent non-JSON scalar type.
Comparison of JSON values takes place at two levels. The first level of comparison is based on the JSON types of the compared values. If the types differ, the comparison result is determined solely by which type has higher precedence. If the two values have the same JSON type, a second level of comparison occurs using type-specific rules.

The following list shows the precedences of JSON types, from highest precedence to the lowest. (The type names are those returned by the JSON_TYPE() function.) Types shown together on a line have the same precedence. Any value having a JSON type listed earlier in the list compares greater than any value having a JSON type listed later in the list.

BLOB
BIT
OPAQUE
DATETIME
TIME
DATE
BOOLEAN
ARRAY
OBJECT
STRING
INTEGER, DOUBLE
NULL
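The first level of comparison can be sketched as a lookup in a precedence table. This Python example is illustrative only: the names PRECEDENCE, type_name, and compare_types are hypothetical, and only the types that plain JSON text produces are modeled (BLOB, BIT, OPAQUE, and the temporal types are omitted).

```python
# Higher number means higher precedence, following the list above.
# INTEGER and DOUBLE share a line in the list, so they share a rank.
PRECEDENCE = {"BOOLEAN": 5, "ARRAY": 4, "OBJECT": 3, "STRING": 2,
              "INTEGER": 1, "DOUBLE": 1, "NULL": 0}

def type_name(value):
    """Name the JSON type of a parsed Python value."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):       # test bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, list):
        return "ARRAY"
    if isinstance(value, dict):
        return "OBJECT"
    raise TypeError("not a JSON value: %r" % (value,))

def compare_types(a, b):
    """Return -1, 0, or 1 based on type precedence only (the first level)."""
    pa, pb = PRECEDENCE[type_name(a)], PRECEDENCE[type_name(b)]
    return (pa > pb) - (pa < pb)

values = [None, 7, "seven", {"n": 7}, [7], True]
ranked = sorted(values, key=lambda v: PRECEDENCE[type_name(v)])
print([type_name(v) for v in ranked])
# ['NULL', 'INTEGER', 'STRING', 'OBJECT', 'ARRAY', 'BOOLEAN']
```

A result of 0 from compare_types means only that the two values have the same precedence; deciding their order then requires the type-specific second-level rules.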
For JSON values of the same precedence, the comparison rules are type specific:

• BLOB

  The first N bytes of the two values are compared, where N is the number of bytes in the shorter value. If the first N bytes of the two values are identical, the shorter value is ordered before the longer value.

• BIT

  Same rules as for BLOB.

• OPAQUE

  Same rules as for BLOB. OPAQUE values are values that are not classified as one of the other types.

• DATETIME

  A value that represents an earlier point in time is ordered before a value that represents a later point in time. If two values originally come from the MySQL DATETIME and TIMESTAMP types, respectively, they are equal if they represent the same point in time.

• TIME

  The smaller of two time values is ordered before the larger one.

• DATE

  The earlier date is ordered before the more recent date.

• ARRAY

  Two JSON arrays are equal if they have the same length and values in corresponding positions in the arrays are equal.

  If the arrays are not equal, their order is determined by the elements in the first position where there is a difference. The array with the smaller value in that position is ordered first. If all values of the shorter array are equal to the corresponding values in the longer array, the shorter array is ordered first.

  Example:

  [] < ["a"] < ["ab"] < ["ab", "cd", "ef"] < ["ab", "ef"]
• BOOLEAN

  The JSON false literal is less than the JSON true literal.

• OBJECT

  Two JSON objects are equal if they have the same set of keys, and each key has the same value in both objects.

  Example:

  {"a": 1, "b": 2} = {"b": 2, "a": 1}

  The order of two objects that are not equal is unspecified but deterministic.

• STRING

  Strings are ordered lexically on the first N bytes of the utf8mb4 representation of the two strings being compared, where N is the length of the shorter string. If the first N bytes of the two strings are identical, the shorter string is considered smaller than the longer string.

  Example:

  "a" < "ab" < "b" < "bc"

  This ordering is equivalent to the ordering of SQL strings with collation utf8mb4_bin. Because utf8mb4_bin is a binary collation, comparison of JSON values is case-sensitive:

  "A" < "a"
• INTEGER, DOUBLE
  JSON values can contain exact-value numbers and approximate-value numbers. For a general discussion of these types of numbers, see Section 9.1.2, “Numeric Literals”.

  The rules for comparing native MySQL numeric types are discussed in Section 12.2, “Type Conversion in Expression Evaluation”, but the rules for comparing numbers within JSON values differ somewhat:

  • In a comparison between two columns that use the native MySQL INT and DOUBLE numeric types, respectively, it is known that all comparisons involve an integer and a double, so the integer is converted to double for all rows. That is, exact-value numbers are converted to approximate-value numbers.

  • On the other hand, if the query compares two JSON columns containing numbers, it cannot be known in advance whether numbers will be integer or double. To provide the most consistent behavior across all rows, MySQL converts approximate-value numbers to exact-value numbers. The resulting ordering is consistent and does not lose precision for the exact-value numbers. For example, given the scalars 9223372036854775805, 9223372036854775806, 9223372036854775807 and 9.223372036854776e18, the order is as follows:

        9223372036854775805 < 9223372036854775806 < 9223372036854775807
        < 9.223372036854776e18 = 9223372036854776000 < 9223372036854776001
Were JSON comparisons to use the non-JSON numeric comparison rules, inconsistent ordering could occur. The usual MySQL comparison rules for numbers yield these orderings:

  • Integer comparison:

        9223372036854775805 < 9223372036854775806 < 9223372036854775807

    (not defined for 9.223372036854776e18)

  • Double comparison:

        9223372036854775805 = 9223372036854775806 = 9223372036854775807 = 9.223372036854776e18
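The precedence list and the type-specific rules above can be tried out with the ordinary comparison operators. The statement below is a minimal sketch, assuming a server that supports the JSON data type; the column aliases are illustrative names only. According to the rules described above, each expression is expected to evaluate to 1 (true).

```sql
SELECT
  -- Different types: BOOLEAN is listed before ARRAY, so it compares greater
  CAST('true' AS JSON) > CAST('[1, 2]' AS JSON)                       AS boolean_gt_array,
  -- Different types: STRING is listed before INTEGER
  CAST('"abc"' AS JSON) > CAST('10' AS JSON)                          AS string_gt_integer,
  -- Same type (ARRAY): the first position that differs decides the order
  CAST('["ab", "cd", "ef"]' AS JSON) < CAST('["ab", "ef"]' AS JSON)   AS array_order,
  -- Same type (OBJECT): equality does not depend on key order
  CAST('{"a": 1, "b": 2}' AS JSON) = CAST('{"b": 2, "a": 1}' AS JSON) AS object_equal,
  -- Same type (STRING): binary comparison, so uppercase sorts before lowercase
  CAST('"A"' AS JSON) < CAST('"a"' AS JSON)                           AS string_case;
```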
For comparison of any JSON value to SQL NULL, the result is UNKNOWN.

For comparison of JSON and non-JSON values, the non-JSON value is converted to JSON according to the rules in the following table, then the values are compared as described previously.
Converting between JSON and non-JSON values

The following table provides a summary of the rules that MySQL follows when casting between JSON values and values of other types:

Table 11.3 JSON Conversion Rules

| other type | CAST(other type AS JSON) | CAST(JSON AS other type) |
|---|---|---|
| JSON | No change | No change |
| utf8 character type (utf8mb4, utf8, ascii) | The string is parsed into a JSON value. | The JSON value is serialized into a utf8mb4 string. |
| Other character types | Other character encodings are implicitly converted to utf8mb4 and treated as described for utf8 character type. | The JSON value is serialized into a utf8mb4 string, then cast to the other character encoding. The result may not be meaningful. |
| NULL | Results in a NULL value of type JSON. | Not applicable. |
| Geometry types | The geometry value is converted into a JSON document by calling ST_AsGeoJSON(). | Illegal operation. Workaround: Pass the result of CAST(json_val AS CHAR) to ST_GeomFromGeoJSON(). |
| All other types | Results in a JSON document consisting of a single scalar value. | Succeeds if the JSON document consists of a single scalar value of the target type and that scalar value can be cast to the target type. Otherwise, returns NULL and produces a warning. |
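The conversion rules in Table 11.3 can be illustrated with a few CAST() calls. This is a sketch rather than an exhaustive test; the literal values and column aliases are made up for illustration, and the results are not shown because they depend on the server.

```sql
-- utf8mb4 string -> JSON: the string is parsed as a JSON document
SELECT CAST('{"id": 7, "tags": ["a", "b"]}' AS JSON) AS parsed;

-- JSON -> character type: the JSON value is serialized into a utf8mb4 string
SELECT CAST(JSON_ARRAY(1, 'two', NULL) AS CHAR) AS serialized;

-- SQL NULL -> JSON: the result is a NULL value of type JSON
SELECT CAST(NULL AS JSON) AS null_json;

-- Other scalar type -> JSON: a document consisting of a single scalar value
SELECT JSON_TYPE(CAST(42 AS JSON)) AS scalar_type;

-- JSON scalar -> other type: succeeds when the scalar can be cast to the target type
SELECT CAST(CAST('42' AS JSON) AS UNSIGNED) AS back_to_number;

-- Geometry -> JSON goes through ST_AsGeoJSON(); the reverse direction uses the
-- documented workaround of casting to CHAR and calling ST_GeomFromGeoJSON()
SELECT ST_AsText(
         ST_GeomFromGeoJSON(
           CAST(CAST(ST_GeomFromText('POINT(1 2)') AS JSON) AS CHAR)
         )
       ) AS round_trip;
```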
ORDER BY and GROUP BY for JSON values work according to these principles:

• Ordering of scalar JSON values uses the same rules as in the preceding discussion.
• For ascending sorts, SQL NULL orders before all JSON values, including the JSON null literal; for descending sorts, SQL NULL orders after all JSON values, including the JSON null literal.
• Sort keys for JSON values are bound by the value of the max_sort_length system variable, so keys that differ only after the first max_sort_length bytes compare as equal.
• Sorting of nonscalar values is not currently supported and a warning occurs.

For sorting, it can be beneficial to cast a JSON scalar to some other native MySQL type. For example, if a column named jdoc contains JSON objects having a member consisting of an id key and a nonnegative value, use this expression to sort by id values:

    ORDER BY CAST(JSON_EXTRACT(jdoc, '$.id') AS UNSIGNED)
If there happens to be a generated column defined to use the same expression as in the ORDER BY, the MySQL optimizer recognizes that and considers using the index for the query execution plan. See Section 8.3.10, “Optimizer Use of Generated Column Indexes”.
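As a sketch of the generated-column case just described, the following defines an indexed generated column that uses the same expression as the ORDER BY. The table name docs, the column name doc_id, and the index name are hypothetical, and InnoDB is assumed so that the virtual column can be indexed. Whether the optimizer actually chooses the index depends on the data and the query, so EXPLAIN is used to inspect the plan rather than to promise a result.

```sql
-- Hypothetical table; jdoc holds objects such as {"id": 12, "name": "x"}
CREATE TABLE docs (
  jdoc   JSON,
  -- Generated column defined with the same expression as the ORDER BY below;
  -- BIGINT UNSIGNED matches the result type of CAST(... AS UNSIGNED)
  doc_id BIGINT UNSIGNED
         GENERATED ALWAYS AS (CAST(JSON_EXTRACT(jdoc, '$.id') AS UNSIGNED)) VIRTUAL,
  INDEX idx_doc_id (doc_id)
) ENGINE=InnoDB;

INSERT INTO docs (jdoc) VALUES ('{"id": 30}'), ('{"id": 4}'), ('{"id": 112}');

-- Sort numerically by the id member
SELECT jdoc
FROM docs
ORDER BY CAST(JSON_EXTRACT(jdoc, '$.id') AS UNSIGNED);

-- Inspect whether the optimizer considers idx_doc_id for this query
EXPLAIN
SELECT jdoc
FROM docs
ORDER BY CAST(JSON_EXTRACT(jdoc, '$.id') AS UNSIGNED);
```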
Aggregation of JSON Values

For aggregation of JSON values, SQL NULL values are ignored as for other data types. Non-NULL values are converted to a numeric type and aggregated, except for MIN(), MAX(), and GROUP_CONCAT(). The conversion to number should produce a meaningful result for JSON values that are numeric scalars, although (depending on the values) truncation and loss of precision may occur. Conversion to number of other JSON values may not produce a meaningful result.
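A small sketch of numeric aggregation over JSON scalars follows. The table name readings and its contents are hypothetical; the SQL NULL row is there to show that it is skipped, as described above.

```sql
CREATE TABLE readings (val JSON);

-- Three numeric JSON scalars and one SQL NULL
INSERT INTO readings VALUES ('1'), ('2.5'), ('10'), (NULL);

-- Each non-NULL JSON value is converted to a number before aggregation;
-- COUNT(val) counts only the non-NULL rows
SELECT COUNT(val) AS non_null_rows,
       SUM(val)   AS total,
       AVG(val)   AS mean
FROM readings;
```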
11.7 Data Type Default Values

Data type specifications can have explicit or implicit default values.

• Handling of Explicit Defaults
• Handling of Implicit Defaults
Handling of Explicit Defaults

A DEFAULT value clause in a data type specification explicitly indicates a default value for a column. Examples:

    CREATE TABLE t1 (
      i     INT DEFAULT -1,
      c     VARCHAR(10) DEFAULT '',
      price DOUBLE(16,2) DEFAULT '0.00'
    );
SERIAL DEFAULT VALUE is a special case. In the definition of an integer column, it is an alias for NOT NULL AUTO_INCREMENT UNIQUE.

With one exception, the default value specified in a DEFAULT clause must be a literal constant; it cannot be a function or an expression. This means, for example, that you cannot set the default for a date column to be the value of a function such as NOW() or CURRENT_DATE. The exception is that, for TIMESTAMP and DATETIME columns, you can specify CURRENT_TIMESTAMP as the default. See Section 11.3.5, “Automatic Initialization and Updating for TIMESTAMP and DATETIME”.

The BLOB, TEXT, GEOMETRY, and JSON data types cannot be assigned a default value.
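The rules for explicit defaults can be combined in one table definition. The following is an illustrative sketch; the table and column names are hypothetical, and the commented-out line shows a definition that the rules above say is not permitted.

```sql
CREATE TABLE orders (
  id      INT SERIAL DEFAULT VALUE,            -- alias for NOT NULL AUTO_INCREMENT UNIQUE
  status  VARCHAR(10) DEFAULT 'new',           -- literal constant
  qty     INT DEFAULT 1,                       -- literal constant
  created DATETIME DEFAULT CURRENT_TIMESTAMP,  -- the one permitted nonliteral default
  updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  notes   TEXT                                 -- TEXT columns cannot be given a DEFAULT
  -- due  DATE DEFAULT CURRENT_DATE            -- not permitted: not a literal constant
);

-- Shows which columns carry an explicit DEFAULT clause
SHOW CREATE TABLE orders;
```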
Handling of Implicit Defaults

If a data type specification includes no explicit DEFAULT value, MySQL determines the default value as follows:

If the column can take NULL as a value, the column is defined with an explicit DEFAULT NULL clause.

If the column cannot take NULL as a value, MySQL defines the column with no explicit DEFAULT clause. Exception: If the column is defined as part of a PRIMARY KEY but not explicitly as NOT NULL, MySQL creates it as a NOT NULL column (because PRIMARY KEY columns must be NOT NULL).

For data entry into a NOT NULL column that has no explicit DEFAULT clause, if an INSERT or REPLACE statement includes no value for the column, or an UPDATE statement sets the column to NULL, MySQL handles the column according to the SQL mode in effect at the time:

• If strict SQL mode is enabled, an error occurs for transactional tables and the statement is rolled back. For nontransactional tables, an error occurs, but if this happens for the second or subsequent row of a multiple-row statement, the preceding rows will have been inserted.
• If strict mode is not enabled, MySQL sets the column to the implicit default value for the column data type.

Suppose that a table t is defined as follows:

    CREATE TABLE t (i INT NOT NULL);

In this case, i has no explicit default, so in strict mode each of the following statements produces an error and no row is inserted. When not using strict mode, only the third statement produces an error; the implicit default is inserted for the first two statements, but the third fails because DEFAULT(i) cannot produce a value:

    INSERT INTO t VALUES();
    INSERT INTO t VALUES(DEFAULT);
    INSERT INTO t VALUES(DEFAULT(i));
See Section 5.1.10, “Server SQL Modes”.

For a given table, the SHOW CREATE TABLE statement displays which columns have an explicit DEFAULT clause.

Implicit defaults are defined as follows:

• For numeric types, the default is 0, with the exception that for integer or floating-point types declared with the AUTO_INCREMENT attribute, the default is the next value in the sequence.
• For date and time types other than TIMESTAMP, the default is the appropriate “zero” value for the type. This is also true for TIMESTAMP if the explicit_defaults_for_timestamp system variable is enabled (see Section 5.1.7, “Server System Variables”). Otherwise, for the first TIMESTAMP column in a table, the default value is the current date and time. See Section 11.3, “Date and Time Types”.
• For string types other than ENUM, the default value is the empty string. For ENUM, the default is the first enumeration value.
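The implicit defaults listed above can be observed with strict mode turned off for the session. This is a sketch using a hypothetical table named implicit_demo. Based on the rules above, the inserted row is expected to hold 0, the “zero” date, the empty string, and the first enumeration value, with warnings reported for the missing values.

```sql
-- Disable strict mode (and the other modes) for this session only
SET SESSION sql_mode = '';

CREATE TABLE implicit_demo (
  n INT NOT NULL,                  -- numeric type: implicit default 0
  d DATE NOT NULL,                 -- date type: implicit default is the "zero" date
  s VARCHAR(10) NOT NULL,          -- string type: implicit default is the empty string
  e ENUM('low', 'high') NOT NULL   -- ENUM: implicit default is the first value
);

-- No values supplied, so every column receives its implicit default
INSERT INTO implicit_demo VALUES ();
SHOW WARNINGS;

SELECT * FROM implicit_demo;

-- None of the columns should show an explicit DEFAULT clause
SHOW CREATE TABLE implicit_demo;
```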
11.8 Data Type Storage Requirements

• InnoDB Table Storage Requirements
• NDB Table Storage Requirements
• Numeric Type Storage Requirements
• Date and Time Type Storage Requirements
• String Type Storage Requirements
• Spatial Type Storage Requirements
• JSON Storage Requirements

The storage requirements for table data on disk depend on several factors. Different storage engines represent data types and store raw data differently. Table data might be compressed, either for a column or an entire row, complicating the calculation of storage requirements for a table or column.

Despite differences in storage layout on disk, the internal MySQL APIs that communicate and exchange information about table rows use a consistent data structure that applies across all storage engines.
This section includes guidelines and information for the storage requirements for each data type supported by MySQL, including the internal format and size for storage engines that use a fixed-size representation for data types. Information is listed by category or storage engine.

The internal representation of a table has a maximum row size of 65,535 bytes, even if the storage engine is capable of supporting larger rows. This figure excludes BLOB or TEXT columns, which contribute only 9 to 12 bytes toward this size. For BLOB and TEXT data, the information is stored internally in a different area of memory than the row buffer.

Different storage engines handle the allocation and storage of this data in different ways, according to the method they use for handling the corresponding types. For more information, see Chapter 15, Alternative Storage Engines, and Section C.10.4, “Limits on Table Column Count and Row Size”.
InnoDB Table Storage Requirements

See Section 14.11, “InnoDB Row Formats” for information about storage requirements for InnoDB tables.
NDB Table Storage Requirements

Important

NDB tables use 4-byte alignment; all NDB data storage is done in multiples of 4 bytes. Thus, a column value that would typically take 15 bytes requires 16 bytes in an NDB table. For example, in NDB tables, the TINYINT, SMALLINT, MEDIUMINT, and INTEGER (INT) column types each require 4 bytes storage per record due to the alignment factor.

Each BIT(M) column takes M bits of storage space. Although an individual BIT column is not 4-byte aligned, NDB reserves 4 bytes (32 bits) per row for the first 1-32 bits needed for BIT columns, then another 4 bytes for bits 33-64, and so on.

While a NULL itself does not require any storage space, NDB reserves 4 bytes per row if the table definition contains any columns defined as NULL, up to 32 NULL columns. (If an NDB Cluster table is defined with more than 32 NULL columns up to 64 NULL columns, then 8 bytes per row are reserved.)

Every table using the NDB storage engine requires a primary key; if you do not define a primary key, a “hidden” primary key is created by NDB. This hidden primary key consumes 31-35 bytes per table record.

You can use the ndb_size.pl Perl script to estimate NDB storage requirements. It connects to a current MySQL (not NDB Cluster) database and creates a report on how much space that database would require if it used the NDB storage engine. See Section 21.4.29, “ndb_size.pl — NDBCLUSTER Size Requirement Estimator” for more information.
Numeric Type Storage Requirements

| Data Type | Storage Required |
|---|---|
| TINYINT | 1 byte |
| SMALLINT | 2 bytes |
| MEDIUMINT | 3 bytes |
| INT, INTEGER | 4 bytes |
| BIGINT | 8 bytes |
| FLOAT(p) | 4 bytes if 0 <= p <= 24, 8 bytes if 25 <= p <= 53 |
| FLOAT | 4 bytes |
| DOUBLE [PRECISION], REAL | 8 bytes |
| DECIMAL(M,D), NUMERIC(M,D) | Varies; see following discussion |
| BIT(M) | approximately (M+7)/8 bytes |
Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value is determined separately. Each multiple of nine digits requires four bytes, and the “leftover” digits require some fraction of four bytes. The storage required for excess digits is given by the following table.

| Leftover Digits | Number of Bytes |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 3 |
| 6 | 3 |
| 7 | 4 |
| 8 | 4 |
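As a worked example of the DECIMAL rules, consider DECIMAL(20,6): the integer part has 14 digits (one group of nine plus 5 leftover digits, so 4 + 3 = 7 bytes) and the fractional part has 6 leftover digits (3 bytes), for 10 bytes in total. The sketch below performs the same arithmetic in SQL. The user variables @m and @d are illustrative, and CEILING(n / 2) is simply a compact way to reproduce the leftover-digit table above, not a function MySQL provides for this purpose.

```sql
-- Precision (M) and scale (D) of the column being sized
SET @m = 20, @d = 6;

SELECT
  -- Integer part: 4 bytes per full group of 9 digits, plus bytes for leftover digits
  ((@m - @d) DIV 9) * 4 + CEILING(((@m - @d) MOD 9) / 2) AS integer_bytes,
  -- Fractional part: same rule applied to the D fractional digits
  (@d DIV 9) * 4 + CEILING((@d MOD 9) / 2)               AS fraction_bytes;
```

Adding the two columns gives the storage for one value of the column.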
Date and Time Type Storage Requirements

For TIME, DATETIME, and TIMESTAMP columns, the storage required for tables created before MySQL 5.6.4 differs from tables created from 5.6.4 on. This is due to a change in 5.6.4 that permits these types to have a fractional part, which requires from 0 to 3 bytes.

| Data Type | Storage Required Before MySQL 5.6.4 | Storage Required as of MySQL 5.6.4 |
|---|---|---|
| YEAR | 1 byte | 1 byte |
| DATE | 3 bytes | 3 bytes |
| TIME | 3 bytes | 3 bytes + fractional seconds storage |
| DATETIME | 8 bytes | 5 bytes + fractional seconds storage |
| TIMESTAMP | 4 bytes | 4 bytes + fractional seconds storage |
As of MySQL 5.6.4, storage for YEAR and DATE remains unchanged. However, TIME, DATETIME, and TIMESTAMP are represented differently. DATETIME is packed more efficiently, requiring 5 rather than 8 bytes for the nonfractional part, and all three types have a fractional part that requires from 0 to 3 bytes, depending on the fractional seconds precision of stored values.

| Fractional Seconds Precision | Storage Required |
|---|---|
| 0 | 0 bytes |
| 1, 2 | 1 byte |
| 3, 4 | 2 bytes |
| 5, 6 | 3 bytes |
For example, TIME(0), TIME(2), TIME(4), and TIME(6) use 3, 4, 5, and 6 bytes, respectively. TIME and TIME(0) are equivalent and require the same storage. For details about internal representation of temporal values, see MySQL Internals: Important Algorithms and Structures.
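The same arithmetic applies to DATETIME and TIMESTAMP columns. The following sketch uses a hypothetical table, assumed to be created on MySQL 5.6.4 or later; the comments give the per-value storage computed from the two tables above.

```sql
CREATE TABLE temporal_demo (
  t0  TIME(0),            -- 3 bytes + 0 bytes = 3 bytes
  t4  TIME(4),            -- 3 bytes + 2 bytes = 5 bytes
  dt0 DATETIME,           -- 5 bytes + 0 bytes = 5 bytes (same as DATETIME(0))
  dt6 DATETIME(6),        -- 5 bytes + 3 bytes = 8 bytes
  ts3 TIMESTAMP(3) NULL   -- 4 bytes + 2 bytes = 6 bytes
);
```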
String Type Storage Requirements

In the following table, M represents the declared column length in characters for nonbinary string types and bytes for binary string types. L represents the actual length in bytes of a given string value.

| Data Type | Storage Required |
|---|---|
| CHAR(M) | The compact family of InnoDB row formats optimize storage for variable-length character sets. See COMPACT Row Format Storage Characteristics. Otherwise, M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set. |
| BINARY(M) | M bytes, 0 <= M <= 255 |
| VARCHAR(M), VARBINARY(M) | L + 1 bytes if column values require 0 − 255 bytes, L + 2 bytes if values may require more than 255 bytes |
| TINYBLOB, TINYTEXT | L + 1 bytes, where L < 2^8 |
| BLOB, TEXT | L + 2 bytes, where L < 2^16 |
| MEDIUMBLOB, MEDIUMTEXT | L + 3 bytes, where L < 2^24 |
| LONGBLOB, LONGTEXT | L + 4 bytes, where L < 2^32 |
| ENUM('value1','value2',...) | 1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum) |
| SET('value1','value2',...) | 1, 2, 3, 4, or 8 bytes, depending on the number of set members (64 members maximum) |
Variable-length string types are stored using a length prefix plus data. The length prefix requires from one to four bytes depending on the data type, and the value of the prefix is L (the byte length of the string). For example, storage for a MEDIUMTEXT value requires L bytes to store the value plus three bytes to store the length of the value.

To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multibyte characters. In particular, when using a utf8 Unicode character set, you must keep in mind that not all characters use the same number of bytes. utf8mb3 and utf8mb4 character sets can require up to three and four bytes per character, respectively. For a breakdown of the storage used for different categories of utf8mb3 or utf8mb4 characters, see Section 10.9, “Unicode Support”.

VARCHAR, VARBINARY, and the BLOB and TEXT types are variable-length types. For each, the storage requirements depend on these factors:

• The actual length of the column value
• The column's maximum possible length
• The character set used for the column, because some character sets contain multibyte characters

For example, a VARCHAR(255) column can hold a string with a maximum length of 255 characters. Assuming that the column uses the latin1 character set (one byte per character), the actual storage required is the length of the string (L), plus one byte to record the length of the string. For the string 'abcd', L is 4 and the storage requirement is five bytes. If the same column is instead declared to use the ucs2 double-byte character set, the storage requirement is 10 bytes: The length of 'abcd' is eight bytes and the column requires two bytes to store lengths because the maximum length is greater than 255 (up to 510 bytes).
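The byte length L used in the VARCHAR(255) example can be checked with the LENGTH() function, which counts bytes, while CHAR_LENGTH() counts characters. The following sketch converts the same four-character string to latin1 and ucs2 and adds the length prefix described above; the column aliases are illustrative.

```sql
SELECT
  CHAR_LENGTH(CONVERT('abcd' USING latin1)) AS characters,
  -- latin1: one byte per character, plus a 1-byte length prefix for VARCHAR(255)
  LENGTH(CONVERT('abcd' USING latin1)) + 1  AS latin1_storage_bytes,
  -- ucs2: two bytes per character, plus a 2-byte length prefix because the
  -- column can hold up to 510 bytes
  LENGTH(CONVERT('abcd' USING ucs2)) + 2    AS ucs2_storage_bytes;
```

The two storage columns correspond to the five-byte and 10-byte figures given in the text.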
The effective maximum number of bytes that can be stored in a VARCHAR or VARBINARY column is subject to the maximum row size of 65,535 bytes, which is shared among all columns. For a VARCHAR column that stores multibyte characters, the effective maximum number of characters is less. For example, utf8mb3 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8mb3 character set can be declared to be a maximum of 21,844 characters. See Section C.10.4, “Limits on Table Column Count and Row Size”.

InnoDB encodes fixed-length fields greater than or equal to 768 bytes in length as variable-length fields, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4.

The NDB storage engine supports variable-width columns. This means that a VARCHAR column in an NDB Cluster table requires the same amount of storage as it would with any other storage engine, with the exception that such values are 4-byte aligned. Thus, the string 'abcd' stored in a VARCHAR(50) column using the latin1 character set requires 8 bytes (rather than 5 bytes for the same column value in a MyISAM table).

TEXT and BLOB columns are implemented differently in the NDB storage engine, wherein each row in a TEXT column is made up of two separate parts. One of these is of fixed size (256 bytes), and is actually stored in the original table. The other consists of any data in excess of 256 bytes, which is stored in a hidden table. The rows in this second table are always 2000 bytes long. This means that the size of a TEXT column is 256 if size <= 256 (where size represents the size of the row); otherwise, the size is 256 + size + (2000 × (size − 256) % 2000).

The size of an ENUM object is determined by the number of different enumeration values. One byte is used for enumerations with up to 255 possible values. Two bytes are used for enumerations having between 256 and 65,535 possible values. See Section 11.4.4, “The ENUM Type”.

The size of a SET object is determined by the number of different set members. If the set size is N, the object occupies (N+7)/8 bytes, rounded up to 1, 2, 3, 4, or 8 bytes. A SET can have a maximum of 64 members. See Section 11.4.5, “The SET Type”.
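The ENUM and SET sizing rules can be applied to a concrete definition. In this sketch the table, column, and member names are hypothetical, and the comments show the per-value size that follows from the rules above, taking (N+7)/8 as integer division.

```sql
CREATE TABLE enum_set_demo (
  -- 3 enumeration values (255 or fewer): 1 byte
  size  ENUM('small', 'medium', 'large'),
  -- 3 set members: (3+7)/8 = 1 byte
  perms SET('r', 'w', 'x'),
  -- 9 set members: (9+7)/8 = 2 bytes
  days  SET('mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun', 'hol', 'alt')
);
```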
Spatial Type Storage Requirements

MySQL stores geometry values using 4 bytes to indicate the SRID followed by the WKB representation of the value. The LENGTH() function returns the space in bytes required for value storage.

For descriptions of WKB and internal storage formats for spatial values, see Section 11.5.3, “Supported Spatial Data Formats”.
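The relationship between the stored value and its WKB representation can be checked with LENGTH(). In this sketch the point coordinates and the variable name @g are arbitrary; given the layout described above, the first result is expected to exceed the second by the 4 bytes used for the SRID.

```sql
SET @g = ST_GeomFromText('POINT(1 -1)');

SELECT
  LENGTH(@g)           AS stored_bytes,  -- SRID (4 bytes) + WKB
  LENGTH(ST_AsWKB(@g)) AS wkb_bytes;     -- WKB only
```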
JSON Storage Requirements

In general, the storage requirement for a JSON column is approximately the same as for a LONGBLOB or LONGTEXT column; that is, the space consumed by a JSON document is roughly the same as it would be for the document's string representation stored in a column of one of these types. However, there is an overhead imposed by the binary encoding, including metadata and dictionaries needed for lookup, of the individual values stored in the JSON document. For example, a string stored in a JSON document requires 4 to 10 bytes additional storage, depending on the length of the string and the size of the object or array in which it is stored.

In addition, MySQL imposes a limit on the size of any JSON document stored in a JSON column such that it cannot be any larger than the value of max_allowed_packet.
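One way to compare the text size of a document with its binary encoding is sketched below. It assumes a server version that provides the JSON_STORAGE_SIZE() function (not every release of the JSON data type includes it); the sample document and the variable name @doc are made up for illustration.

```sql
SET @doc = '{"name": "widget", "tags": ["a", "b", "c"], "price": 9.99}';

SELECT
  LENGTH(@doc)                          AS text_bytes,         -- size of the string form
  JSON_STORAGE_SIZE(CAST(@doc AS JSON)) AS json_binary_bytes,  -- size of the binary encoding
  @@max_allowed_packet                  AS max_document_bytes; -- upper limit for a stored document
```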
11.9 Choosing the Right Type for a Column

For optimum storage, you should try to use the most precise type in all cases. For example, if an integer column is used for values in the range from 1 to 99999, MEDIUMINT UNSIGNED is the best type. Of the types that represent all the required values, this type uses the least amount of storage.

All basic calculations (+, -, *, and /) with DECIMAL columns are done with precision of 65 decimal (base 10) digits. See Section 11.1.1, “Numeric Type Overview”.

If accuracy is not too important or if speed is the highest priority, the DOUBLE type may be good enough. For high precision, you can always convert to a fixed-point type stored in a BIGINT. This enables you to do all calculations with 64-bit integers and then convert results back to floating-point values as necessary.
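The fixed-point approach can be sketched as follows, using a hypothetical ledger table that stores monetary amounts as a whole number of cents in a BIGINT column. The scale factor of 100 is an assumption of this example.

```sql
-- Amounts are stored as integer cents, so 1999 represents 19.99
CREATE TABLE ledger (amount_cents BIGINT NOT NULL);

INSERT INTO ledger VALUES (1999), (501), (-250);

SELECT
  SUM(amount_cents)         AS total_cents,  -- exact 64-bit integer arithmetic
  SUM(amount_cents) / 100e0 AS total_units   -- converted to floating point only for display
FROM ledger;
```

Dividing by the approximate-value literal 100e0 produces a floating-point result; dividing by the integer 100 would instead yield an exact-value DECIMAL.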
11.10 Using Data Types from Other Database Engines

To facilitate the use of code written for SQL implementations from other vendors, MySQL maps data types as shown in the following table. These mappings make it easier to import table definitions from other database systems into MySQL.

| Other Vendor Type | MySQL Type |
|---|---|
| BOOL | TINYINT |
| BOOLEAN | TINYINT |
| CHARACTER VARYING(M) | VARCHAR(M) |
| FIXED | DECIMAL |
| FLOAT4 | FLOAT |
| FLOAT8 | DOUBLE |
| INT1 | TINYINT |
| INT2 | SMALLINT |
| INT3 | MEDIUMINT |
| INT4 | INT |
| INT8 | BIGINT |
| LONG VARBINARY | MEDIUMBLOB |
| LONG VARCHAR | MEDIUMTEXT |
| LONG | MEDIUMTEXT |
| MIDDLEINT | MEDIUMINT |
| NUMERIC | DECIMAL |
Data type mapping occurs at table creation time, after which the original type specifications are discarded. If you create a table with types used by other vendors and then issue a DESCRIBE tbl_name statement, MySQL reports the table structure using the equivalent MySQL types. For example:

    mysql> CREATE TABLE t (a BOOL, b FLOAT8, c LONG VARCHAR, d NUMERIC);
    Query OK, 0 rows affected (0.00 sec)

    mysql> DESCRIBE t;
    +-------+---------------+------+-----+---------+-------+
    | Field | Type          | Null | Key | Default | Extra |
    +-------+---------------+------+-----+---------+-------+
    | a     | tinyint(1)    | YES  |     | NULL    |       |
    | b     | double        | YES  |     | NULL    |       |
    | c     | mediumtext    | YES  |     | NULL    |       |
    | d     | decimal(10,0) | YES  |     | NULL    |       |
    +-------+---------------+------+-----+---------+-------+
    4 rows in set (0.01 sec)
Chapter 12 Functions and Operators

Table of Contents

12.1 Function and Operator Reference
12.2 Type Conversion in Expression Evaluation
12.3 Operators
    12.3.1 Operator Precedence
    12.3.2 Comparison Functions and Operators
    12.3.3 Logical Operators
    12.3.4 Assignment Operators
12.4 Control Flow Functions
12.5 String Functions
    12.5.1 String Comparison Functions
    12.5.2 Regular Expressions
    12.5.3 Character Set and Collation of Function Results
12.6 Numeric Functions and Operators
    12.6.1 Arithmetic Operators
    12.6.2 Mathematical Functions
12.7 Date and Time Functions
12.8 What Calendar Is Used By MySQL?
12.9 Full-Text Search Functions
    12.9.1 Natural Language Full-Text Searches
    12.9.2 Boolean Full-Text Searches
    12.9.3 Full-Text Searches with Query Expansion
    12.9.4 Full-Text Stopwords
    12.9.5 Full-Text Restrictions
    12.9.6 Fine-Tuning MySQL Full-Text Search
    12.9.7 Adding a Collation for Full-Text Indexing
    12.9.8 ngram Full-Text Parser
    12.9.9 MeCab Full-Text Parser Plugin
12.10 Cast Functions and Operators
12.11 XML Functions
12.12 Bit Functions and Operators
12.13 Encryption and Compression Functions
12.14 Locking Functions
12.15 Information Functions
12.16 Spatial Analysis Functions
    12.16.1 Spatial Function Reference
    12.16.2 Argument Handling by Spatial Functions
    12.16.3 Functions That Create Geometry Values from WKT Values
    12.16.4 Functions That Create Geometry Values from WKB Values
    12.16.5 MySQL-Specific Functions That Create Geometry Values
    12.16.6 Geometry Format Conversion Functions
    12.16.7 Geometry Property Functions
    12.16.8 Spatial Operator Functions
    12.16.9 Functions That Test Spatial Relations Between Geometry Objects
    12.16.10 Spatial Geohash Functions
    12.16.11 Spatial GeoJSON Functions
    12.16.12 Spatial Convenience Functions
12.17 JSON Functions
    12.17.1 JSON Function Reference
    12.17.2 Functions That Create JSON Values
    12.17.3 Functions That Search JSON Values
    12.17.4 Functions That Modify JSON Values
    12.17.5 Functions That Return JSON Value Attributes
    12.17.6 JSON Utility Functions
12.18 Functions Used with Global Transaction Identifiers (GTIDs)
12.19 MySQL Enterprise Encryption Functions
    12.19.1 MySQL Enterprise Encryption Installation
    12.19.2 MySQL Enterprise Encryption Usage and Examples
    12.19.3 MySQL Enterprise Encryption Function Reference
    12.19.4 MySQL Enterprise Encryption Function Descriptions
12.20 Aggregate (GROUP BY) Functions
    12.20.1 Aggregate (GROUP BY) Function Descriptions
    12.20.2 GROUP BY Modifiers
    12.20.3 MySQL Handling of GROUP BY
    12.20.4 Detection of Functional Dependence
12.21 Miscellaneous Functions
12.22 Precision Math
    12.22.1 Types of Numeric Values
    12.22.2 DECIMAL Data Type Characteristics
    12.22.3 Expression Handling
    12.22.4 Rounding Behavior
    12.22.5 Precision Math Examples
Expressions can be used at several points in SQL statements, such as in the ORDER BY or HAVING clauses of SELECT statements, in the WHERE clause of a SELECT, DELETE, or UPDATE statement, or in SET statements. Expressions can be written using literal values, column values, NULL, built-in functions, stored functions, user-defined functions, and operators. This chapter describes the functions and operators that are permitted for writing expressions in MySQL. Instructions for writing stored functions and user-defined functions are given in Section 23.2, “Using Stored Routines (Procedures and Functions)”, and Section 28.4, “Adding New Functions to MySQL”. See Section 9.2.4, “Function Name Parsing and Resolution”, for the rules describing how the server interprets references to different kinds of functions.

An expression that contains NULL always produces a NULL value unless otherwise indicated in the documentation for a particular function or operator.

Note
By default, there must be no whitespace between a function name and the parenthesis following it. This helps the MySQL parser distinguish between function calls and references to tables or columns that happen to have the same name as a function. However, spaces around function arguments are permitted.

You can tell the MySQL server to accept spaces after function names by starting it with the --sql-mode=IGNORE_SPACE option. (See Section 5.1.10, “Server SQL Modes”.) Individual client programs can request this behavior by using the CLIENT_IGNORE_SPACE option for mysql_real_connect(). In either case, all function names become reserved words.

For the sake of brevity, most examples in this chapter display the output from the mysql program in abbreviated form. Rather than showing examples in this format:

mysql> SELECT MOD(29,9);
+-----------+
| mod(29,9) |
+-----------+
|         2 |
+-----------+
1 row in set (0.00 sec)

This format is used instead:

mysql> SELECT MOD(29,9);
        -> 2
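The whitespace rule described in the note earlier in this introduction can be tried directly in a session. The following is a minimal sketch, assuming a scratch table named t1 (a hypothetical name used only for illustration) and a session in which the sql_mode system variable may be changed. The exact error text for the rejected statement is not shown here because it depends on the statement and the server version.

```sql
-- Hypothetical scratch table used only for this illustration.
CREATE TABLE t1 (i INT);
INSERT INTO t1 VALUES (1), (2), (3);

-- Clear the session SQL mode so that IGNORE_SPACE is not in effect.
-- (This also clears any other modes for the session.)
SET SESSION sql_mode = '';

-- No whitespace before the parenthesis: parsed as a call to COUNT().
SELECT COUNT(*) FROM t1;

-- Whitespace before the parenthesis: expected to be rejected, because
-- COUNT is not treated as a function name in this position.
SELECT COUNT (*) FROM t1;

-- With IGNORE_SPACE enabled for the session, the same statement
-- is expected to be accepted.
SET SESSION sql_mode = 'IGNORE_SPACE';
SELECT COUNT (*) FROM t1;

DROP TABLE t1;
```

Because enabling IGNORE_SPACE makes function names reserved words, identifiers that share a name with a function may then need to be quoted.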
12.1 Function and Operator Reference

Table 12.1 Functions and Operators
Name
Description
ABS()
Return the absolute value
ACOS()
Return the arc cosine
ADDDATE()
Add time values (intervals) to a date value
ADDTIME()
Add time
AES_DECRYPT()
Decrypt using AES
AES_ENCRYPT()
Encrypt using AES
AND, &&
Logical AND
ANY_VALUE()
Suppress ONLY_FULL_GROUP_BY value rejection
Area() (deprecated 5.7.6)
Return Polygon or MultiPolygon area
AsBinary(), AsWKB() (deprecated 5.7.6)
Convert from internal geometry format to WKB
ASCII()
Return numeric value of left-most character
ASIN()
Return the arc sine
=
Assign a value (as part of a SET statement, or as part of the SET clause in an UPDATE statement)
:=
Assign a value
AsText(), AsWKT() (deprecated 5.7.6)
Convert from internal geometry format to WKT
ASYMMETRIC_DECRYPT()
Decrypt ciphertext using private or public key
ASYMMETRIC_DERIVE()
Derive symmetric key from asymmetric keys
ASYMMETRIC_ENCRYPT()
Encrypt cleartext using private or public key
ASYMMETRIC_SIGN()
Generate signature from digest
ASYMMETRIC_VERIFY()
Verify that signature matches digest
ATAN()
Return the arc tangent
ATAN2(), ATAN()
Return the arc tangent of the two arguments
AVG()
Return the average value of the argument
BENCHMARK()
Repeatedly execute an expression
BETWEEN ... AND ...
Check whether a value is within a range of values
BIN()
Return a string containing binary representation of a number
BINARY
Cast a string to a binary string
BIT_AND()
Return bitwise AND
BIT_COUNT()
Return the number of bits that are set
BIT_LENGTH()
Return length of argument in bits
BIT_OR()
Return bitwise OR
BIT_XOR()
Return bitwise XOR
&
Bitwise AND
~
Bitwise inversion
|
Bitwise OR
^
Bitwise XOR
Buffer() (deprecated 5.7.6)
Return geometry of points within given distance from geometry
CASE
Case operator
CAST()
Cast a value as a certain type
CEIL()
Return the smallest integer value not less than the argument
CEILING()
Return the smallest integer value not less than the argument
Centroid() (deprecated 5.7.6)
Return centroid as a point
CHAR()
Return the character for each integer passed
CHAR_LENGTH()
Return number of characters in argument
CHARACTER_LENGTH()
Synonym for CHAR_LENGTH()
CHARSET()
Return the character set of the argument
COALESCE()
Return the first non-NULL argument
COERCIBILITY()
Return the collation coercibility value of the string argument
COLLATION()
Return the collation of the string argument
COMPRESS()
Return result as a binary string
CONCAT()
Return concatenated string
CONCAT_WS()
Return concatenated string with separator
CONNECTION_ID()
Return the connection ID (thread ID) for the connection
Contains() (deprecated 5.7.6)
Whether MBR of one geometry contains MBR of another
CONV()
Convert numbers between different number bases
CONVERT()
Cast a value as a certain type
CONVERT_TZ()
Convert from one time zone to another
ConvexHull() (deprecated 5.7.6)
Return convex hull of geometry
COS()
Return the cosine
COT()
Return the cotangent
COUNT()
Return a count of the number of rows returned
COUNT(DISTINCT)
Return the count of a number of different values
CRC32()
Compute a cyclic redundancy check value
CREATE_ASYMMETRIC_PRIV_KEY()
Create private key
CREATE_ASYMMETRIC_PUB_KEY()
Create public key
CREATE_DH_PARAMETERS()
Generate shared DH secret
CREATE_DIGEST()
Generate digest from string
Crosses() (deprecated 5.7.6)
Whether one geometry crosses another
CURDATE()
Return the current date
CURRENT_DATE(), CURRENT_DATE
Synonyms for CURDATE()
CURRENT_TIME(), CURRENT_TIME
Synonyms for CURTIME()
CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP
Synonyms for NOW()
CURRENT_USER(), CURRENT_USER
The authenticated user name and host name
CURTIME()
Return the current time
DATABASE()
Return the default (current) database name
DATE()
Extract the date part of a date or datetime expression
DATE_ADD()
Add time values (intervals) to a date value
DATE_FORMAT()
Format date as specified
DATE_SUB()
Subtract a time value (interval) from a date
DATEDIFF()
Subtract two dates
DAY()
Synonym for DAYOFMONTH()
DAYNAME()
Return the name of the weekday
DAYOFMONTH()
Return the day of the month (0-31)
DAYOFWEEK()
Return the weekday index of the argument
DAYOFYEAR()
Return the day of the year (1-366)
DECODE() (deprecated 5.7.2)
Decode a string encrypted using ENCODE()
DEFAULT()
Return the default value for a table column
DEGREES()
Convert radians to degrees
DES_DECRYPT() (deprecated 5.7.6)
Decrypt a string
DES_ENCRYPT() (deprecated 5.7.6)
Encrypt a string
Dimension() (deprecated 5.7.6)
Dimension of geometry
Disjoint() (deprecated 5.7.6)
Whether MBRs of two geometries are disjoint
Distance() (deprecated 5.7.6)
The distance of one geometry from another
DIV
Integer division
/
Division operator
ELT()
Return string at index number
ENCODE() (deprecated 5.7.2)
Encode a string
ENCRYPT() (deprecated 5.7.6)
Encrypt a string
EndPoint() (deprecated 5.7.6)
End Point of LineString
Envelope() (deprecated 5.7.6)
Return MBR of geometry
=
Equal operator
<=>
NULL-safe equal to operator
Equals() (deprecated 5.7.6)
Whether MBRs of two geometries are equal
EXP()
Return e raised to the power of the argument
EXPORT_SET()
Return a string such that for every bit set in the value bits, you get an on string and for every unset bit, you get an off string
ExteriorRing() (deprecated 5.7.6)
Return exterior ring of Polygon
EXTRACT()
Extract part of a date
ExtractValue()
Extract a value from an XML string using XPath notation
FIELD()
Index (position) of first argument in subsequent arguments
FIND_IN_SET()
Index (position) of first argument within second argument
FLOOR()
Return the largest integer value not greater than the argument
FORMAT()
Return a number formatted to specified number of decimal places
FOUND_ROWS()
For a SELECT with a LIMIT clause, the number of rows that would be returned were there no LIMIT clause
FROM_BASE64()
Decode base64 encoded string and return result
FROM_DAYS()
Convert a day number to a date
FROM_UNIXTIME()
Format Unix timestamp as a date
GeomCollFromText(), GeometryCollectionFromText() (deprecated 5.7.6)
Return geometry collection from WKT
GeomCollFromWKB(), GeometryCollectionFromWKB() (deprecated 5.7.6)
Return geometry collection from WKB
GeometryCollection()
Construct geometry collection from geometries
GeometryN() (deprecated 5.7.6)
Return N-th geometry from geometry collection
GeometryType() (deprecated 5.7.6)
Return name of geometry type
GeomFromText(), GeometryFromText() (deprecated 5.7.6)
Return geometry from WKT
GeomFromWKB(), GeometryFromWKB() (deprecated 5.7.6)
Return geometry from WKB
GET_FORMAT()
Return a date format string
GET_LOCK()
Get a named lock
GLength() (deprecated 5.7.6)
Return length of LineString
>
Greater than operator
>=
Greater than or equal operator
GREATEST()
Return the largest argument
GROUP_CONCAT()
Return a concatenated string
GTID_SUBSET()
Return true if all GTIDs in subset are also in set; otherwise false.
GTID_SUBTRACT()
Return all GTIDs in set that are not in subset.
HEX()
Hexadecimal representation of decimal or string value
HOUR()
Extract the hour
IF()
If/else construct
IFNULL()
Null if/else construct
IN()
Check whether a value is within a set of values
INET_ATON()
Return the numeric value of an IP address
INET_NTOA()
Return the IP address from a numeric value
INET6_ATON()
Return the numeric value of an IPv6 address
INET6_NTOA()
Return the IPv6 address from a numeric value
INSERT()
Insert substring at specified position up to specified number of characters
INSTR()
Return the index of the first occurrence of substring
InteriorRingN() (deprecated 5.7.6)
Return N-th interior ring of Polygon
Intersects() (deprecated 5.7.6)
Whether MBRs of two geometries intersect
INTERVAL()
Return the index of the argument that is less than the first argument
IS
Test a value against a boolean
IS_FREE_LOCK()
Whether the named lock is free
IS_IPV4()
Whether argument is an IPv4 address
IS_IPV4_COMPAT()
Whether argument is an IPv4-compatible address
IS_IPV4_MAPPED()
Whether argument is an IPv4-mapped address
IS_IPV6()
Whether argument is an IPv6 address
IS NOT
Test a value against a boolean
IS NOT NULL
NOT NULL value test
IS NULL
NULL value test
IS_USED_LOCK()
Whether the named lock is in use; return connection identifier if true
IsClosed() (deprecated 5.7.6)
Whether a geometry is closed and simple
IsEmpty() (deprecated 5.7.6)
Placeholder function
ISNULL()
Test whether the argument is NULL
IsSimple() (deprecated 5.7.6)
Whether a geometry is simple
JSON_APPEND() (deprecated 5.7.9)
Append data to JSON document
JSON_ARRAY()
Create JSON array
JSON_ARRAY_APPEND()
Append data to JSON document
JSON_ARRAY_INSERT()
Insert into JSON array
JSON_ARRAYAGG()
Return result set as a single JSON array
->
Return value from JSON column after evaluating path; equivalent to JSON_EXTRACT().
JSON_CONTAINS()
Whether JSON document contains specific object at path
JSON_CONTAINS_PATH()
Whether JSON document contains any data at path
JSON_DEPTH()
Maximum depth of JSON document
JSON_EXTRACT()
Return data from JSON document
->>
Return value from JSON column after evaluating path and unquoting the result; equivalent to JSON_UNQUOTE(JSON_EXTRACT()).
JSON_INSERT()
Insert data into JSON document
JSON_KEYS()
Array of keys from JSON document
JSON_LENGTH()
Number of elements in JSON document
JSON_MERGE() (deprecated 5.7.22)
Merge JSON documents, preserving duplicate keys. Deprecated synonym for JSON_MERGE_PRESERVE()
JSON_MERGE_PATCH()
Merge JSON documents, replacing values of duplicate keys
JSON_MERGE_PRESERVE()
Merge JSON documents, preserving duplicate keys
JSON_OBJECT()
Create JSON object
JSON_OBJECTAGG()
Return result set as a single JSON object
JSON_PRETTY()
Print a JSON document in human-readable format
JSON_QUOTE()
Quote JSON document
JSON_REMOVE()
Remove data from JSON document
JSON_REPLACE()
Replace values in JSON document
JSON_SEARCH()
Path to value within JSON document
JSON_SET()
Insert data into JSON document
JSON_STORAGE_SIZE()
Space used for storage of binary representation of a JSON document
JSON_TYPE()
Type of JSON value
JSON_UNQUOTE()
Unquote JSON value
JSON_VALID()
Whether JSON value is valid
LAST_DAY
Return the last day of the month for the argument
LAST_INSERT_ID()
Value of the AUTO_INCREMENT column for the last INSERT
LCASE()
Synonym for LOWER()
LEAST()
Return the smallest argument
LEFT()
Return the leftmost number of characters as specified
<<
Left shift
LENGTH()
Return the length of a string in bytes
<
Less than operator
<=
Less than or equal operator
LIKE
Simple pattern matching
LineFromText(), LineStringFromText() (deprecated 5.7.6)
Construct LineString from WKT
LineFromWKB(), LineStringFromWKB() (deprecated 5.7.6)
Construct LineString from WKB
LineString()
Construct LineString from Point values
LN()
Return the natural logarithm of the argument
LOAD_FILE()
Load the named file
LOCALTIME(), LOCALTIME
Synonym for NOW()
LOCALTIMESTAMP, LOCALTIMESTAMP()
Synonym for NOW()
LOCATE()
Return the position of the first occurrence of substring
LOG()
Return the natural logarithm of the first argument
LOG10()
Return the base-10 logarithm of the argument
LOG2()
Return the base-2 logarithm of the argument
LOWER()
Return the argument in lowercase
LPAD()
Return the string argument, left-padded with the specified string
LTRIM()
Remove leading spaces
MAKE_SET()
Return a set of comma-separated strings that have the corresponding bit in bits set
MAKEDATE()
Create a date from the year and day of year
MAKETIME()
Create time from hour, minute, second
MASTER_POS_WAIT()
Block until the slave has read and applied all updates up to the specified position
MATCH
Perform full-text search
MAX()
Return the maximum value
MBRContains()
Whether MBR of one geometry contains MBR of another
MBRCoveredBy()
Whether one MBR is covered by another
MBRCovers()
Whether one MBR covers another
MBRDisjoint()
Whether MBRs of two geometries are disjoint
MBREqual() (deprecated 5.7.6)
Whether MBRs of two geometries are equal
MBREquals()
Whether MBRs of two geometries are equal
MBRIntersects()
Whether MBRs of two geometries intersect
MBROverlaps()
Whether MBRs of two geometries overlap
MBRTouches()
Whether MBRs of two geometries touch
MBRWithin()
Whether MBR of one geometry is within MBR of another
MD5()
Calculate MD5 checksum
MICROSECOND()
Return the microseconds from argument
MID()
Return a substring starting from the specified position
MIN()
Return the minimum value
-
Minus operator
MINUTE()
Return the minute from the argument
MLineFromText(), MultiLineStringFromText() (deprecated 5.7.6)
Construct MultiLineString from WKT
MLineFromWKB(), MultiLineStringFromWKB() (deprecated 5.7.6)
Construct MultiLineString from WKB
MOD()
Return the remainder
%, MOD
Modulo operator
MONTH()
Return the month from the date passed
MONTHNAME()
Return the name of the month
MPointFromText(), MultiPointFromText() (deprecated 5.7.6)
Construct MultiPoint from WKT
MPointFromWKB(), MultiPointFromWKB() (deprecated 5.7.6)
Construct MultiPoint from WKB
MPolyFromText(), MultiPolygonFromText() (deprecated 5.7.6)
Construct MultiPolygon from WKT
MPolyFromWKB(), MultiPolygonFromWKB() (deprecated 5.7.6)
Construct MultiPolygon from WKB
MultiLineString()
Construct MultiLineString from LineString values
MultiPoint()
Construct MultiPoint from Point values
MultiPolygon()
Construct MultiPolygon from Polygon values
NAME_CONST()
Cause the column to have the given name
NOT, !
Negates value
NOT BETWEEN ... AND ...
Check whether a value is not within a range of values
!=, <>
Not equal operator
NOT IN()
Check whether a value is not within a set of values
NOT LIKE
Negation of simple pattern matching
NOT REGEXP
Negation of REGEXP
NOW()
Return the current date and time
NULLIF()
Return NULL if expr1 = expr2
NumGeometries() (deprecated 5.7.6)
Return number of geometries in geometry collection
NumInteriorRings() (deprecated 5.7.6)
Return number of interior rings in Polygon
NumPoints() (deprecated 5.7.6)
Return number of points in LineString
OCT()
Return a string containing octal representation of a number
OCTET_LENGTH()
Synonym for LENGTH()
OLD_PASSWORD()
Return the value of the pre-4.1 implementation of PASSWORD
||, OR
Logical OR
ORD()
Return character code for leftmost character of the argument
Overlaps() (deprecated 5.7.6)
Whether MBRs of two geometries overlap
PASSWORD() (deprecated 5.7.6)
Calculate and return a password string
PERIOD_ADD()
Add a period to a year-month
PERIOD_DIFF()
Return the number of months between periods
PI()
Return the value of pi
+
Addition operator
Point()
Construct Point from coordinates
PointFromText() (deprecated 5.7.6)
Construct Point from WKT
PointFromWKB() (deprecated 5.7.6)
Construct Point from WKB
PointN() (deprecated 5.7.6)
Return N-th point from LineString
PolyFromText(), PolygonFromText() (deprecated 5.7.6)
Construct Polygon from WKT
PolyFromWKB(), PolygonFromWKB() (deprecated 5.7.6)
Construct Polygon from WKB
Polygon()
Construct Polygon from LineString arguments
POSITION()
Synonym for LOCATE()
POW()
Return the argument raised to the specified power
POWER()
Return the argument raised to the specified power
PROCEDURE ANALYSE() (deprecated 5.7.18)
Analyze the results of a query
QUARTER()
Return the quarter from a date argument
QUOTE()
Escape the argument for use in an SQL statement
RADIANS()
Return argument converted to radians
RAND()
Return a random floating-point value
RANDOM_BYTES()
Return a random byte vector
REGEXP
Whether string matches regular expression
RELEASE_ALL_LOCKS()
Release all current named locks
RELEASE_LOCK()
Release the named lock
REPEAT()
Repeat a string the specified number of times
REPLACE()
Replace occurrences of a specified string
REVERSE()
Reverse the characters in a string
RIGHT()
Return the specified rightmost number of characters
>>
Right shift
RLIKE
Whether string matches regular expression
ROUND()
Round the argument
ROW_COUNT()
The number of rows updated
RPAD()
Return the string argument, right-padded with the specified string
RTRIM()
Remove trailing spaces
SCHEMA()
Synonym for DATABASE()
SEC_TO_TIME()
Converts seconds to 'HH:MM:SS' format
SECOND()
Return the second (0-59)
SESSION_USER()
Synonym for USER()
SHA1(), SHA()
Calculate an SHA-1 160-bit checksum
SHA2()
Calculate an SHA-2 checksum
SIGN()
Return the sign of the argument
SIN()
Return the sine of the argument
SLEEP()
Sleep for a number of seconds
SOUNDEX()
Return a soundex string
SOUNDS LIKE
Compare sounds
SPACE()
Return a string of the specified number of spaces
SQRT()
Return the square root of the argument
SRID() (deprecated 5.7.6)
Return spatial reference system ID for geometry
ST_Area()
Return Polygon or MultiPolygon area
ST_AsBinary(), ST_AsWKB()
Convert from internal geometry format to WKB
ST_AsGeoJSON()
Generate GeoJSON object from geometry
ST_AsText(), ST_AsWKT()
Convert from internal geometry format to WKT
ST_Buffer()
Return geometry of points within given distance from geometry
ST_Buffer_Strategy()
Produce strategy option for ST_Buffer()
ST_Centroid()
Return centroid as a point
ST_Contains()
Whether one geometry contains another
ST_ConvexHull()
Return convex hull of geometry
ST_Crosses()
Whether one geometry crosses another
ST_Difference()
Return point set difference of two geometries
ST_Dimension()
Dimension of geometry
ST_Disjoint()
Whether one geometry is disjoint from another
ST_Distance()
The distance of one geometry from another
ST_Distance_Sphere()
Minimum distance on earth between two geometries
ST_EndPoint()
End Point of LineString
ST_Envelope()
Return MBR of geometry
ST_Equals()
Whether one geometry is equal to another
ST_ExteriorRing()
Return exterior ring of Polygon
ST_GeoHash()
Produce a geohash value
ST_GeomCollFromText(), ST_GeometryCollectionFromText(), ST_GeomCollFromTxt()
Return geometry collection from WKT
ST_GeomCollFromWKB(), ST_GeometryCollectionFromWKB()
Return geometry collection from WKB
ST_GeometryN()
Return N-th geometry from geometry collection
ST_GeometryType()
Return name of geometry type
ST_GeomFromGeoJSON()
Generate geometry from GeoJSON object
ST_GeomFromText(), ST_GeometryFromText()
Return geometry from WKT
ST_GeomFromWKB(), ST_GeometryFromWKB()
Return geometry from WKB
ST_InteriorRingN()
Return N-th interior ring of Polygon
ST_Intersection()
Return point set intersection of two geometries
ST_Intersects()
Whether one geometry intersects another
ST_IsClosed()
Whether a geometry is closed and simple
ST_IsEmpty()
Placeholder function
ST_IsSimple()
Whether a geometry is simple
ST_IsValid()
Whether a geometry is valid
ST_LatFromGeoHash()
Return latitude from geohash value
ST_Length()
Return length of LineString
ST_LineFromText(), ST_LineStringFromText()
Construct LineString from WKT
ST_LineFromWKB(), ST_LineStringFromWKB()
Construct LineString from WKB
ST_LongFromGeoHash()
Return longitude from geohash value
ST_MakeEnvelope()
Rectangle around two points
ST_MLineFromText(), ST_MultiLineStringFromText()
Construct MultiLineString from WKT
ST_MLineFromWKB(), ST_MultiLineStringFromWKB()
Construct MultiLineString from WKB
ST_MPointFromText(), ST_MultiPointFromText()
Construct MultiPoint from WKT
ST_MPointFromWKB(), ST_MultiPointFromWKB()
Construct MultiPoint from WKB
ST_MPolyFromText(), ST_MultiPolygonFromText()
Construct MultiPolygon from WKT
ST_MPolyFromWKB(), ST_MultiPolygonFromWKB()
Construct MultiPolygon from WKB
ST_NumGeometries()
Return number of geometries in geometry collection
ST_NumInteriorRing(), ST_NumInteriorRings()
Return number of interior rings in Polygon
ST_NumPoints()
Return number of points in LineString
ST_Overlaps()
Whether one geometry overlaps another
ST_PointFromGeoHash()
Convert geohash value to POINT value
ST_PointFromText()
Construct Point from WKT
ST_PointFromWKB()
Construct Point from WKB
ST_PointN()
Return N-th point from LineString
ST_PolyFromText(), ST_PolygonFromText()
Construct Polygon from WKT
ST_PolyFromWKB(), ST_PolygonFromWKB()
Construct Polygon from WKB
ST_Simplify()
Return simplified geometry
ST_SRID()
Return spatial reference system ID for geometry
ST_StartPoint()
Start Point of LineString
ST_SymDifference()
Return point set symmetric difference of two geometries
ST_Touches()
Whether one geometry touches another
ST_Union()
Return point set union of two geometries
ST_Validate()
Return validated geometry
ST_Within()
Whether one geometry is within another
ST_X()
Return X coordinate of Point
ST_Y()
Return Y coordinate of Point
StartPoint() (deprecated 5.7.6)
Start Point of LineString
STD()
Return the population standard deviation
STDDEV()
Return the population standard deviation
STDDEV_POP()
Return the population standard deviation
STDDEV_SAMP()
Return the sample standard deviation
STR_TO_DATE()
Convert a string to a date
STRCMP()
Compare two strings
SUBDATE()
Synonym for DATE_SUB() when invoked with three arguments
SUBSTR()
Return the substring as specified
SUBSTRING()
Return the substring as specified
SUBSTRING_INDEX()
Return a substring from a string before the specified number of occurrences of the delimiter
SUBTIME()
Subtract times
SUM()
Return the sum
SYSDATE()
Return the time at which the function executes
SYSTEM_USER()
Synonym for USER()
TAN()
Return the tangent of the argument
TIME()
Extract the time portion of the expression passed
TIME_FORMAT()
Format as time
TIME_TO_SEC()
Return the argument converted to seconds
TIMEDIFF()
Subtract time
*
Multiplication operator
TIMESTAMP()
With a single argument, this function returns the date or datetime expression; with two arguments, the sum of the arguments
TIMESTAMPADD()
Add an interval to a datetime expression
TIMESTAMPDIFF()
Return the difference between two datetime expressions, in the specified unit
TO_BASE64()
Return the argument converted to a base-64 string
TO_DAYS()
Return the date argument converted to days
TO_SECONDS()
Return the date or datetime argument converted to seconds since Year 0
Touches() (deprecated 5.7.6)
Whether one geometry touches another
TRIM()
Remove leading and trailing spaces
TRUNCATE()
Truncate to specified number of decimal places
UCASE()
Synonym for UPPER()
-
Change the sign of the argument
UNCOMPRESS()
Uncompress a string compressed
UNCOMPRESSED_LENGTH()
Return the length of a string before compression
UNHEX()
Return the string encoded by a hexadecimal representation (inverse of HEX())
UNIX_TIMESTAMP()
Return a Unix timestamp
UpdateXML()
Return replaced XML fragment
UPPER()
Convert to uppercase
USER()
The user name and host name provided by the client
UTC_DATE()
Return the current UTC date
UTC_TIME()
Return the current UTC time
UTC_TIMESTAMP()
Return the current UTC date and time
UUID()
Return a Universal Unique Identifier (UUID)
UUID_SHORT()
Return an integer-valued universal identifier
VALIDATE_PASSWORD_STRENGTH()
Determine strength of password
VALUES()
Define the values to be used during an INSERT
VAR_POP()
Return the population standard variance
VAR_SAMP()
Return the sample variance
VARIANCE()
Return the population standard variance
VERSION()
Return a string that indicates the MySQL server version
WAIT_FOR_EXECUTED_GTID_SET()
Wait until the given GTIDs have executed on slave.
WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS()
Wait until the given GTIDs have executed on slave.
WEEK()
Return the week number
WEEKDAY()
Return the weekday index
WEEKOFYEAR()
Return the calendar week of the date (1-53)
WEIGHT_STRING()
Return the weight string for a string
Within() (deprecated 5.7.6)
Whether MBR of one geometry is within MBR of another
X() (deprecated 5.7.6)
Return X coordinate of Point
XOR
Logical XOR
Y() (deprecated 5.7.6)
Return Y coordinate of Point
YEAR()
Return the year
YEARWEEK()
Return the year and week
12.2 Type Conversion in Expression Evaluation

When an operator is used with operands of different types, type conversion occurs to make the operands compatible. Some conversions occur implicitly. For example, MySQL automatically converts strings to numbers as necessary, and vice versa.

mysql> SELECT 1+'1';
        -> 2
mysql> SELECT CONCAT(2,' test');
        -> '2 test'
It is also possible to convert a number to a string explicitly using the CAST() function. Conversion occurs implicitly with the CONCAT() function because it expects string arguments.

mysql> SELECT 38.8, CAST(38.8 AS CHAR);
        -> 38.8, '38.8'
mysql> SELECT 38.8, CONCAT(38.8);
        -> 38.8, '38.8'
See later in this section for information about the character set of implicit number-to-string conversions, and for modified rules that apply to CREATE TABLE ... SELECT statements.

The following rules describe how conversion occurs for comparison operations:

• If one or both arguments are NULL, the result of the comparison is NULL, except for the NULL-safe <=> equality comparison operator. For NULL <=> NULL, the result is true. No conversion is needed.

• If both arguments in a comparison operation are strings, they are compared as strings.

• If both arguments are integers, they are compared as integers.

• Hexadecimal values are treated as binary strings if not compared to a number.

• If one of the arguments is a TIMESTAMP or DATETIME column and the other argument is a constant, the constant is converted to a timestamp before the comparison is performed. This is done to be more ODBC-friendly. This is not done for the arguments to IN(). To be safe, always use complete datetime, date, or time strings when doing comparisons. For example, to achieve best results when using BETWEEN with date or time values, use CAST() to explicitly convert the values to the desired data type.

  A single-row subquery from a table or tables is not considered a constant. For example, if a subquery returns an integer to be compared to a DATETIME value, the comparison is done as two integers. The integer is not converted to a temporal value. To compare the operands as DATETIME values, use CAST() to explicitly convert the subquery value to DATETIME.

• If one of the arguments is a decimal value, comparison depends on the other argument. The arguments are compared as decimal values if the other argument is a decimal or integer value, or as floating-point values if the other argument is a floating-point value.

• In all other cases, the arguments are compared as floating-point (real) numbers.

For information about conversion of values from one temporal type to another, see Section 11.3.7, “Conversion Between Date and Time Types”.

Comparison of JSON values takes place at two levels. The first level of comparison is based on the JSON types of the compared values. If the types differ, the comparison result is determined solely by which type has higher precedence. If the two values have the same JSON type, a second level of comparison occurs using type-specific rules. For comparison of JSON and non-JSON values, the non-JSON value is converted to JSON and the values compared as JSON values. For details, see Comparison and Ordering of JSON Values.

The following examples illustrate conversion of strings to numbers for comparison operations:

mysql> SELECT 1 > '6x';
        -> 0
mysql> SELECT 7 > '6x';
        -> 1
mysql> SELECT 0 > 'x6';
        -> 0
mysql> SELECT 0 = 'x6';
        -> 1
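Several of the other comparison rules listed earlier in this section can be exercised in the same way. The following is a minimal sketch. The table events and its columns are hypothetical names introduced only for illustration, and the expected results in the comments follow from the rules as stated rather than from captured output.

```sql
-- NULL handling: = yields NULL when either side is NULL,
-- whereas the NULL-safe <=> operator yields 1 or 0.
SELECT NULL = NULL, NULL <=> NULL, 1 <=> NULL;
-- expected: NULL, 1, 0

-- Hexadecimal values: compared as a binary string against a string,
-- and as a number against a number.
SELECT 0x41 = 'A', 0x41 = 65;
-- expected: 1, 1

-- Temporal comparison with explicit CAST(), as recommended for BETWEEN.
-- The table and column names here are hypothetical.
CREATE TABLE events (id INT, created_at DATETIME);
INSERT INTO events VALUES
  (1, '2017-01-15 12:00:00'),
  (2, '2017-02-01 00:00:00');

SELECT id
  FROM events
 WHERE created_at BETWEEN CAST('2017-01-01 00:00:00' AS DATETIME)
                      AND CAST('2017-01-31 23:59:59' AS DATETIME);
-- expected: only the row with id = 1

DROP TABLE events;
```

Writing the bounds as complete datetime values and casting them explicitly means the comparison does not depend on how a partial string would be converted.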
For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. If str_col is an indexed string column, the index cannot be used when performing the lookup in the following statement:

SELECT * FROM tbl_name WHERE str_col=1;
The reason for this is that there are many different strings that may convert to the value 1, such as '1', ' 1', or '1a'.

Comparisons that use floating-point numbers (or values that are converted to floating-point numbers) are approximate because such numbers are inexact. This might lead to results that appear inconsistent:

mysql> SELECT '18015376320243458' = 18015376320243458;
        -> 1
mysql> SELECT '18015376320243459' = 18015376320243459;
        -> 0
Such results can occur because the values are converted to floating-point numbers, which have only 53 bits of precision and are subject to rounding:

mysql> SELECT '18015376320243459'+0.0;
        -> 1.8015376320243e+16
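The 53-bit limit can be seen with smaller values as well. The following sketch assumes IEEE 754 double-precision arithmetic with round-to-nearest conversion, which is typical but platform-dependent. It compares two adjacent integers just above 2^53 (9007199254740992), first as integers and then after forcing both sides to floating-point by adding the approximate-value literal 0e0.

```sql
-- 9007199254740993 is 2^53 + 1, which has no exact double representation.
SELECT 9007199254740993 = 9007199254740992             AS as_integers,
       9007199254740993 + 0e0 = 9007199254740992 + 0e0 AS as_doubles;
-- expected on IEEE 754 platforms: as_integers = 0, as_doubles = 1
```

The integer comparison distinguishes the two values, whereas the floating-point comparison is expected to treat them as equal because both round to the same double.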
Furthermore, the conversions from string to floating-point and from integer to floating-point do not necessarily occur the same way. The integer may be converted to floating-point by the CPU, whereas the string is converted digit by digit in an operation that involves floating-point multiplications.

The results shown will vary on different systems, and can be affected by factors such as computer architecture or the compiler version or optimization level. One way to avoid such problems is to use CAST() so that a value is not converted implicitly to a floating-point number:

mysql> SELECT CAST('18015376320243459' AS UNSIGNED) = 18015376320243459;
        -> 1
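As a variation on the preceding example, the same idea can be sketched with a DECIMAL cast, which by the comparison rules given earlier in this section causes a decimal rather than floating-point comparison against an integer. The column aliases are illustrative only, and the result of the first (implicit) comparison may vary by platform, as noted in the text.

```sql
SELECT '18015376320243459' = 18015376320243459
         AS implicit_float,   -- compared as floating-point; may be 0
       CAST('18015376320243459' AS UNSIGNED) = 18015376320243459
         AS as_unsigned,      -- compared as integers; expected 1
       CAST('18015376320243459' AS DECIMAL(20,0)) = 18015376320243459
         AS as_decimal;       -- compared as decimals; expected 1
```

DECIMAL(20,0) is chosen here only because it is wide enough for the 17-digit value; any precision that holds the value would serve.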
For more information about floating-point comparisons, see Section B.6.4.8, “Problems with Floating-Point Values”.

The server includes dtoa, a conversion library that provides the basis for improved conversion between string or DECIMAL values and approximate-value (FLOAT/DOUBLE) numbers:

• Consistent conversion results across platforms, which eliminates, for example, Unix versus Windows conversion differences.

• Accurate representation of values in cases where results previously did not provide sufficient precision, such as for values close to IEEE limits.

• Conversion of numbers to string format with the best possible precision. The precision of dtoa is always the same or better than that of the standard C library functions.

Because the conversions produced by this library differ in some cases from non-dtoa results, the potential exists for incompatibilities in applications that rely on previous results. For example, applications that depend on a specific exact result from previous conversions might need adjustment to accommodate additional precision.

The dtoa library provides conversions with the following properties. D represents a value with a DECIMAL or string representation, and F represents a floating-point number in native binary (IEEE) format.

• F -> D conversion is done with the best possible precision, returning D as the shortest string that yields F when read back in and rounded to the nearest value in native binary format as specified by IEEE.

• D -> F conversion is done such that F is the nearest native binary number to the input decimal string D.

These properties imply that F -> D -> F conversions are lossless unless F is -inf, +inf, or NaN. The latter values are not supported because the SQL standard defines them as invalid values for FLOAT or DOUBLE.

For D -> F -> D conversions, a sufficient condition for losslessness is that D uses 15 or fewer digits of precision, is not a denormal value, -inf, +inf, or NaN. In some cases, the conversion is lossless even if D has more than 15 digits of precision, but this is not always the case.

Implicit conversion of a numeric or temporal value to string produces a value that has a character set and collation determined by the character_set_connection and collation_connection system variables. (These variables commonly are set with SET NAMES. For information about connection character sets, see Section 10.4, “Connection Character Sets and Collations”.)

This means that such a conversion results in a character (nonbinary) string (a CHAR, VARCHAR, or LONGTEXT value), except in the case that the connection character set is set to binary. In that case, the conversion result is a binary string (a BINARY, VARBINARY, or LONGBLOB value).

For integer expressions, the preceding remarks about expression evaluation apply somewhat differently for expression assignment; for example, in a statement such as this:

CREATE TABLE t SELECT integer_expr;
In this case, the column in the table resulting from the expression has type INT or BIGINT depending on the length of the integer expression. If the maximum length of the expression does not fit in an INT, BIGINT is used instead. The length is taken from the max_length value of the SELECT result set metadata (see Section 27.8.5, “C API Data Structures”). This means that you can force a BIGINT rather than INT by use of a sufficiently long expression:
CREATE TABLE t SELECT 000000000000000000000;
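The round-trip properties of dtoa conversions described earlier in this section can be illustrated outside MySQL. The following is a minimal Python sketch, not MySQL code. It assumes only that a Python float is an IEEE double and that repr() yields the shortest string that reads back as the same value, which mirrors the F -> D property. The helper names are hypothetical.

```python
def f_to_d(f: float) -> str:
    """F -> D: shortest decimal string that reads back as the same double."""
    return repr(f)

def d_to_f(d: str) -> float:
    """D -> F: nearest IEEE double to the decimal string."""
    return float(d)

def d_round_trip(d: str, digits: int = 15) -> str:
    """D -> F -> D, rendering the result with the given significant digits."""
    return format(d_to_f(d), f".{digits}g")

# F -> D -> F is lossless for finite values.
f = 0.1 + 0.2
assert d_to_f(f_to_d(f)) == f

# D -> F -> D is lossless when D has 15 or fewer significant digits.
d = "0.123456789012345"
assert d_round_trip(d) == d
```

With more than 15 digits the second check may or may not hold, which matches the caveat in the text.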
12.3 Operators
Table 12.2 Operators
Name                      Description
AND, &&                   Logical AND
=                         Assign a value (as part of a SET statement, or as part of the SET clause in an UPDATE statement)
:=                        Assign a value
BETWEEN ... AND ...       Check whether a value is within a range of values
BINARY                    Cast a string to a binary string
&                         Bitwise AND
~                         Bitwise inversion
|                         Bitwise OR
^                         Bitwise XOR
CASE                      Case operator
DIV                       Integer division
/                         Division operator
=                         Equal operator
<=>                       NULL-safe equal to operator
>                         Greater than operator
>=                        Greater than or equal operator
IS                        Test a value against a boolean
IS NOT                    Test a value against a boolean
IS NOT NULL               NOT NULL value test
IS NULL                   NULL value test
->                        Return value from JSON column after evaluating path; equivalent to JSON_EXTRACT().
->>                       Return value from JSON column after evaluating path and unquoting the result; equivalent to JSON_UNQUOTE(JSON_EXTRACT()).
<<                        Left shift
<                         Less than operator
<=                        Less than or equal operator
LIKE                      Simple pattern matching
-                         Minus operator
%, MOD                    Modulo operator
NOT, !                    Negates value
NOT BETWEEN ... AND ...   Check whether a value is not within a range of values
!=, <>                    Not equal operator
NOT LIKE                  Negation of simple pattern matching
NOT REGEXP                Negation of REGEXP
||, OR                    Logical OR
+                         Addition operator
REGEXP                    Whether string matches regular expression
>>                        Right shift
RLIKE                     Whether string matches regular expression
SOUNDS LIKE               Compare sounds
*                         Multiplication operator
-                         Change the sign of the argument
XOR                       Logical XOR
12.3.1 Operator Precedence
Operator precedences are shown in the following list, from highest precedence to the lowest. Operators that are shown together on a line have the same precedence.
INTERVAL
BINARY, COLLATE
!
- (unary minus), ~ (unary bit inversion)
^
*, /, DIV, %, MOD
-, +
<<, >>
&
|
= (comparison), <=>, >=, >, <=, <, <>, !=, IS, LIKE, REGEXP, IN
BETWEEN, CASE, WHEN, THEN, ELSE
NOT
AND, &&
XOR
OR, ||
= (assignment), :=
The precedence of = depends on whether it is used as a comparison operator (=) or as an assignment operator (=). When used as a comparison operator, it has the same precedence as <=>, >=, >, <=, <, <>, !=, IS, LIKE, REGEXP, and IN. When used as an assignment operator, it has the same precedence as :=. Section 13.7.4.1, “SET Syntax for Variable Assignment”, and Section 9.4, “User-Defined Variables”, explain how MySQL determines which interpretation of = should apply.
For operators that occur at the same precedence level within an expression, evaluation proceeds left to right, with the exception that assignments evaluate right to left.
The precedence and meaning of some operators depend on the SQL mode:
• By default, || is a logical OR operator. With PIPES_AS_CONCAT enabled, || is string concatenation, with a precedence between ^ and the unary operators.
• By default, ! has a higher precedence than NOT. With HIGH_NOT_PRECEDENCE enabled, ! and NOT have the same precedence.
See Section 5.1.10, “Server SQL Modes”.
The precedence of operators determines the order of evaluation of terms in an expression. To override this order and group terms explicitly, use parentheses. For example:
mysql> SELECT 1+2*3;
        -> 7
mysql> SELECT (1+2)*3;
        -> 9
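To see how a precedence list drives grouping, the following Python sketch evaluates a tiny subset of the operators (+, -, *, !, NOT, AND, XOR, OR, and parentheses) by precedence climbing. It is a toy model under stated assumptions, not MySQL's parser: the numeric levels are illustrative (only their relative order is taken from the list above), NULL is not modeled, and the high_not_precedence flag is a hypothetical stand-in for the HIGH_NOT_PRECEDENCE SQL mode.

```python
import re

# Relative binding strength for a small subset of the operators listed above
# (larger numbers bind tighter). The numbers themselves are illustrative.
BINARY = {"OR": 1, "XOR": 2, "AND": 3, "+": 6, "-": 6, "*": 7}
NOT_LEVEL = 4    # NOT ranks just above AND
BANG_LEVEL = 9   # ! ranks above every binary operator in this subset

TOKEN = re.compile(r"\s*(\d+|[A-Za-z]+|[-+*!()])")

def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1).upper())
        pos = match.end()
    return tokens

def apply_binary(op, left, right):
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    if op == "*":
        return left * right
    if op == "AND":
        return int(bool(left) and bool(right))
    if op == "OR":
        return int(bool(left) or bool(right))
    return int(bool(left) != bool(right))  # XOR

def evaluate(text, high_not_precedence=False):
    tokens = tokenize(text)
    pos = 0
    not_level = BANG_LEVEL if high_not_precedence else NOT_LEVEL

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse(min_level):
        token = advance()
        if token == "(":
            left = parse(0)
            advance()  # consume the closing parenthesis
        elif token == "NOT":
            left = int(not parse(not_level))
        elif token == "!":
            left = int(not parse(BANG_LEVEL))
        else:
            left = int(token)
        # Absorb binary operators that bind at least as tightly as min_level.
        while peek() in BINARY and BINARY[peek()] >= min_level:
            op = advance()
            right = parse(BINARY[op] + 1)  # +1 makes operators left-associative
            left = apply_binary(op, left, right)
        return left

    return parse(0)

assert evaluate("1+2*3") == 7
assert evaluate("(1+2)*3") == 9
assert evaluate("! 1+1") == 1      # groups as (!1)+1
assert evaluate("NOT 1+1") == 0    # groups as NOT (1+1)
```

Passing high_not_precedence=True makes NOT group like !, so "NOT 1+1" then evaluates to 1 in this model.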
12.3.2 Comparison Functions and Operators
Table 12.3 Comparison Operators
Name                      Description
BETWEEN ... AND ...       Check whether a value is within a range of values
COALESCE()                Return the first non-NULL argument
=                         Equal operator
<=>                       NULL-safe equal to operator
>                         Greater than operator
>=                        Greater than or equal operator
GREATEST()                Return the largest argument
IN()                      Check whether a value is within a set of values
INTERVAL()                Return the index of the argument that is less than the first argument
IS                        Test a value against a boolean
IS NOT                    Test a value against a boolean
IS NOT NULL               NOT NULL value test
IS NULL                   NULL value test
ISNULL()                  Test whether the argument is NULL
LEAST()                   Return the smallest argument
<                         Less than operator
<=                        Less than or equal operator
LIKE                      Simple pattern matching
NOT BETWEEN ... AND ...   Check whether a value is not within a range of values
!=, <>                    Not equal operator
NOT IN()                  Check whether a value is not within a set of values
NOT LIKE                  Negation of simple pattern matching
STRCMP()                  Compare two strings
Comparison operations result in a value of 1 (TRUE), 0 (FALSE), or NULL. These operations work for both numbers and strings. Strings are automatically converted to numbers and numbers to strings as necessary.
The following relational comparison operators can be used to compare not only scalar operands, but row operands:
=  >  <  >=  <=  <>  !=
The descriptions for those operators later in this section detail how they work with row operands. For additional examples of row comparisons in the context of row subqueries, see Section 13.2.10.5, “Row Subqueries”. Some of the functions in this section return values other than 1 (TRUE), 0 (FALSE), or NULL. LEAST() and GREATEST() are examples of such functions; Section 12.2, “Type Conversion in Expression Evaluation”, describes the rules for comparison operations performed by these and similar functions for determining their return values.
To convert a value to a specific type for comparison purposes, you can use the CAST() function. String values can be converted to a different character set using CONVERT(). See Section 12.10, “Cast Functions and Operators”.
By default, string comparisons are not case-sensitive and use the current character set. The default is latin1 (cp1252 West European), which also works well for English.
• =
Equal:
mysql> SELECT 1 = 0;
        -> 0
mysql> SELECT '0' = 0;
        -> 1
mysql> SELECT '0.0' = 0;
        -> 1
mysql> SELECT '0.01' = 0;
        -> 0
mysql> SELECT '.01' = 0.01;
        -> 1
For row comparisons, (a, b) = (x, y) is equivalent to: (a = x) AND (b = y)
• <=>
NULL-safe equal. This operator performs an equality comparison like the = operator, but returns 1 rather than NULL if both operands are NULL, and 0 rather than NULL if one operand is NULL.
The <=> operator is equivalent to the standard SQL IS NOT DISTINCT FROM operator.
mysql> SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
        -> 1, 1, 0
mysql> SELECT 1 = 1, NULL = NULL, 1 = NULL;
        -> 1, NULL, NULL
For row comparisons, (a, b) <=> (x, y) is equivalent to: (a <=> x) AND (b <=> y)
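The difference between = and <=> can be modeled in a few lines of Python, using None to stand in for NULL. This is only a sketch of the rules stated above; the function names are hypothetical, and type conversion between strings and numbers is not modeled.

```python
def sql_eq(a, b):
    """Model of =: NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return int(a == b)

def sql_null_safe_eq(a, b):
    """Model of <=>: never returns NULL."""
    if a is None or b is None:
        return int(a is None and b is None)
    return int(a == b)

def row_null_safe_eq(left, right):
    """Model of (a, b) <=> (x, y) as (a <=> x) AND (b <=> y)."""
    return int(all(sql_null_safe_eq(l, r) for l, r in zip(left, right)))

# Mirrors the two SELECT statements in the <=> description.
assert [sql_null_safe_eq(1, 1), sql_null_safe_eq(None, None), sql_null_safe_eq(1, None)] == [1, 1, 0]
assert [sql_eq(1, 1), sql_eq(None, None), sql_eq(1, None)] == [1, None, None]
```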
•
<>, != Not equal: mysql> SELECT '.01' <> '0.01'; -> 1 mysql> SELECT .01 <> '0.01'; -> 0 mysql> SELECT 'zapp' <> 'zappp'; -> 1
For row comparisons, (a, b) <> (x, y) and (a, b) != (x, y) are equivalent to: (a <> x) OR (b <> y)
•
<= Less than or equal:
mysql> SELECT 0.1 <= 2; -> 1
For row comparisons, (a, b) <= (x, y) is equivalent to: (a < x) OR ((a = x) AND (b <= y))
•
< Less than: mysql> SELECT 2 < 2; -> 0
For row comparisons, (a, b) < (x, y) is equivalent to: (a < x) OR ((a = x) AND (b < y))
•
>= Greater than or equal: mysql> SELECT 2 >= 2; -> 1
For row comparisons, (a, b) >= (x, y) is equivalent to: (a > x) OR ((a = x) AND (b >= y))
•
> Greater than: mysql> SELECT 2 > 2; -> 0
For row comparisons, (a, b) > (x, y) is equivalent to: (a > x) OR ((a = x) AND (b > y))
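The row-comparison expansions given for <, <=, >, and >= amount to lexicographic ordering of two-element rows. The Python sketch below writes out each expansion literally and checks it against Python's built-in tuple ordering. It covers non-NULL operands only, and the function names are hypothetical.

```python
from itertools import product

def row_lt(left, right):
    (a, b), (x, y) = left, right
    return int((a < x) or ((a == x) and (b < y)))

def row_le(left, right):
    (a, b), (x, y) = left, right
    return int((a < x) or ((a == x) and (b <= y)))

def row_gt(left, right):
    (a, b), (x, y) = left, right
    return int((a > x) or ((a == x) and (b > y)))

def row_ge(left, right):
    (a, b), (x, y) = left, right
    return int((a > x) or ((a == x) and (b >= y)))

# Each expansion agrees with lexicographic tuple comparison.
for a, b, x, y in product(range(3), repeat=4):
    assert row_lt((a, b), (x, y)) == int((a, b) < (x, y))
    assert row_le((a, b), (x, y)) == int((a, b) <= (x, y))
    assert row_gt((a, b), (x, y)) == int((a, b) > (x, y))
    assert row_ge((a, b), (x, y)) == int((a, b) >= (x, y))
```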
•
IS boolean_value Tests a value against a boolean value, where boolean_value can be TRUE, FALSE, or UNKNOWN. mysql> SELECT 1 IS TRUE, 0 IS FALSE, NULL IS UNKNOWN; -> 1, 1, 1
•
IS NOT boolean_value Tests a value against a boolean value, where boolean_value can be TRUE, FALSE, or UNKNOWN. mysql> SELECT 1 IS NOT UNKNOWN, 0 IS NOT UNKNOWN, NULL IS NOT UNKNOWN; -> 1, 1, 0
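Unlike the comparison operators, the IS and IS NOT tests never return NULL. A minimal Python model, with None standing in for NULL and hypothetical function names, makes this explicit:

```python
def is_true(value):
    return int(value is not None and value != 0)

def is_false(value):
    return int(value is not None and value == 0)

def is_unknown(value):
    return int(value is None)

def is_not(test, value):
    """Model of IS NOT: the negation of the corresponding IS test."""
    return 1 - test(value)

# Mirrors the IS and IS NOT examples above.
assert [is_true(1), is_false(0), is_unknown(None)] == [1, 1, 1]
assert [is_not(is_unknown, v) for v in (1, 0, None)] == [1, 1, 0]
```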
•
IS NULL Tests whether a value is NULL.
mysql> SELECT 1 IS NULL, 0 IS NULL, NULL IS NULL; -> 0, 0, 1
To work well with ODBC programs, MySQL supports the following extra features when using IS NULL:
• If the sql_auto_is_null variable is set to 1, then after a statement that successfully inserts an automatically generated AUTO_INCREMENT value, you can find that value by issuing a statement of the following form:
SELECT * FROM tbl_name WHERE auto_col IS NULL
If the statement returns a row, the value returned is the same as if you invoked the LAST_INSERT_ID() function. For details, including the return value after a multiple-row insert, see Section 12.15, “Information Functions”. If no AUTO_INCREMENT value was successfully inserted, the SELECT statement returns no row. The behavior of retrieving an AUTO_INCREMENT value by using an IS NULL comparison can be disabled by setting sql_auto_is_null = 0. See Section 5.1.7, “Server System Variables”. The default value of sql_auto_is_null is 0. • For DATE and DATETIME columns that are declared as NOT NULL, you can find the special date '0000-00-00' by using a statement like this: SELECT * FROM tbl_name WHERE date_column IS NULL
This is needed to get some ODBC applications to work because ODBC does not support a '0000-00-00' date value. See Obtaining Auto-Increment Values, and the description for the FLAG_AUTO_IS_NULL option at Connector/ODBC Connection Parameters. •
IS NOT NULL Tests whether a value is not NULL. mysql> SELECT 1 IS NOT NULL, 0 IS NOT NULL, NULL IS NOT NULL; -> 1, 1, 0
• expr BETWEEN min AND max
If expr is greater than or equal to min and expr is less than or equal to max, BETWEEN returns 1, otherwise it returns 0. This is equivalent to the expression (min <= expr AND expr <= max) if all the arguments are of the same type. Otherwise type conversion takes place according to the rules described in Section 12.2, “Type Conversion in Expression Evaluation”, but applied to all the three arguments.
mysql> SELECT 2 BETWEEN 1 AND 3, 2 BETWEEN 3 and 1;
        -> 1, 0
mysql> SELECT 1 BETWEEN 2 AND 3;
        -> 0
mysql> SELECT 'b' BETWEEN 'a' AND 'c';
        -> 1
mysql> SELECT 2 BETWEEN 2 AND '3';
        -> 1
mysql> SELECT 2 BETWEEN 2 AND 'x-3';
        -> 0
For best results when using BETWEEN with date or time values, use CAST() to explicitly convert the values to the desired data type. Examples: If you compare a DATETIME to two DATE values, convert the DATE values to DATETIME values. If you use a string constant such as '2001-1-1' in a comparison to a DATE, cast the string to a DATE.
• expr NOT BETWEEN min AND max
This is the same as NOT (expr BETWEEN min AND max).
• COALESCE(value,...)
Returns the first non-NULL value in the list, or NULL if there are no non-NULL values.
The return type of COALESCE() is the aggregated type of the argument types.
mysql> SELECT COALESCE(NULL,1);
        -> 1
mysql> SELECT COALESCE(NULL,NULL,NULL);
        -> NULL
• GREATEST(value1,value2,...)
With two or more arguments, returns the largest (maximum-valued) argument. The arguments are compared using the same rules as for LEAST().
mysql> SELECT GREATEST(2,0);
        -> 2
mysql> SELECT GREATEST(34.0,3.0,5.0,767.0);
        -> 767.0
mysql> SELECT GREATEST('B','A','C');
        -> 'C'
GREATEST() returns NULL if any argument is NULL.
• expr IN (value,...)
Returns 1 if expr is equal to any of the values in the IN list, else returns 0. If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants. Otherwise, type conversion takes place according to the rules described in Section 12.2, “Type Conversion in Expression Evaluation”, but applied to all the arguments.
mysql> SELECT 2 IN (0,3,5,7);
        -> 0
mysql> SELECT 'wefwf' IN ('wee','wefwf','weg');
        -> 1
IN can be used to compare row constructors:
mysql> SELECT (3,4) IN ((1,2), (3,4));
        -> 1
mysql> SELECT (3,4) IN ((1,2), (3,5));
        -> 0
You should never mix quoted and unquoted values in an IN list because the comparison rules for quoted values (such as strings) and unquoted values (such as numbers) differ. Mixing types may therefore lead to inconsistent results. For example, do not write an IN expression like this:
SELECT val1 FROM tbl1 WHERE val1 IN (1,2,'a');
Instead, write it like this:
SELECT val1 FROM tbl1 WHERE val1 IN ('1','2','a');
The number of values in the IN list is only limited by the max_allowed_packet value.
To comply with the SQL standard, IN returns NULL not only if the expression on the left hand side is NULL, but also if no match is found in the list and one of the expressions in the list is NULL.
IN() syntax can also be used to write certain types of subqueries. See Section 13.2.10.3, “Subqueries with ANY, IN, or SOME”.
• expr NOT IN (value,...)
This is the same as NOT (expr IN (value,...)).
• ISNULL(expr)
If expr is NULL, ISNULL() returns 1, otherwise it returns 0.
mysql> SELECT ISNULL(1+1);
        -> 0
mysql> SELECT ISNULL(1/0);
        -> 1
ISNULL() can be used instead of = to test whether a value is NULL. (Comparing a value to NULL using = always yields NULL.)
The ISNULL() function shares some special behaviors with the IS NULL comparison operator. See the description of IS NULL.
• INTERVAL(N,N1,N2,N3,...)
Returns 0 if N < N1, 1 if N < N2 and so on or -1 if N is NULL. All arguments are treated as integers. It is required that N1 < N2 < N3 < ... < Nn for this function to work correctly. This is because a binary search is used (very fast).
mysql> SELECT INTERVAL(23, 1, 15, 17, 30, 44, 200);
        -> 3
mysql> SELECT INTERVAL(10, 1, 10, 100, 1000);
        -> 2
mysql> SELECT INTERVAL(22, 23, 30, 44, 200);
        -> 0
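The INTERVAL() rule is the same as counting how many of the sorted bounds N1, N2, ... are less than or equal to N, which a binary search finds directly. The Python sketch below models that with the standard bisect module. It is an illustration of the documented behavior rather than MySQL's implementation, the function name is hypothetical, and like the original it relies on the bounds being sorted.

```python
from bisect import bisect_right

def interval(n, *bounds):
    """Model of INTERVAL(N, N1, N2, ...): -1 for NULL, else the number of
    bounds that are <= N, located by binary search."""
    if n is None:
        return -1
    return bisect_right(bounds, n)

# Mirrors the INTERVAL() examples above.
assert interval(23, 1, 15, 17, 30, 44, 200) == 3
assert interval(10, 1, 10, 100, 1000) == 2
assert interval(22, 23, 30, 44, 200) == 0
```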
• LEAST(value1,value2,...)
With two or more arguments, returns the smallest (minimum-valued) argument. The arguments are compared using the following rules:
  • If any argument is NULL, the result is NULL. No comparison is needed.
  • If all arguments are integer-valued, they are compared as integers.
  • If at least one argument is double precision, they are compared as double-precision values. Otherwise, if at least one argument is a DECIMAL value, they are compared as DECIMAL values.
  • If the arguments comprise a mix of numbers and strings, they are compared as numbers.
  • If any argument is a nonbinary (character) string, the arguments are compared as nonbinary strings.
  • In all other cases, the arguments are compared as binary strings.
The return type of LEAST() is the aggregated type of the comparison argument types.
mysql> SELECT LEAST(2,0);
        -> 0
mysql> SELECT LEAST(34.0,3.0,5.0,767.0);
        -> 3.0
mysql> SELECT LEAST('B','A','C');
        -> 'A'
12.3.3 Logical Operators
Table 12.4 Logical Operators
Name                      Description
AND, &&                   Logical AND
NOT, !                    Negates value
||, OR                    Logical OR
XOR                       Logical XOR
In SQL, all logical operators evaluate to TRUE, FALSE, or NULL (UNKNOWN). In MySQL, these are implemented as 1 (TRUE), 0 (FALSE), and NULL. Most of this is common to different SQL database servers, although some servers may return any nonzero value for TRUE.
MySQL evaluates any nonzero, non-NULL value to TRUE. For example, the following statements all evaluate to TRUE:
mysql> SELECT 10 IS TRUE;
        -> 1
mysql> SELECT -10 IS TRUE;
        -> 1
mysql> SELECT 'string' IS NOT NULL;
        -> 1
• NOT, !
Logical NOT. Evaluates to 1 if the operand is 0, to 0 if the operand is nonzero, and NOT NULL returns NULL.
mysql> SELECT NOT 10;
        -> 0
mysql> SELECT NOT 0;
        -> 1
mysql> SELECT NOT NULL;
        -> NULL
mysql> SELECT ! (1+1);
        -> 0
mysql> SELECT ! 1+1;
        -> 1
The last example produces 1 because the expression evaluates the same way as (!1)+1.
• AND, &&
Logical AND. Evaluates to 1 if all operands are nonzero and not NULL, to 0 if one or more operands are 0, otherwise NULL is returned.
mysql> SELECT 1 AND 1;
        -> 1
mysql> SELECT 1 AND 0;
        -> 0
mysql> SELECT 1 AND NULL;
        -> NULL
mysql> SELECT 0 AND NULL;
        -> 0
mysql> SELECT NULL AND 0;
        -> 0
• OR, ||
Logical OR. When both operands are non-NULL, the result is 1 if any operand is nonzero, and 0 otherwise. With a NULL operand, the result is 1 if the other operand is nonzero, and NULL otherwise. If both operands are NULL, the result is NULL.
mysql> SELECT 1 OR 1;
        -> 1
mysql> SELECT 1 OR 0;
        -> 1
mysql> SELECT 0 OR 0;
        -> 0
mysql> SELECT 0 OR NULL;
        -> NULL
mysql> SELECT 1 OR NULL;
        -> 1
• XOR
Logical XOR. Returns NULL if either operand is NULL. For non-NULL operands, evaluates to 1 if an odd number of operands is nonzero, otherwise 0 is returned.
mysql> SELECT 1 XOR 1;
        -> 0
mysql> SELECT 1 XOR 0;
        -> 1
mysql> SELECT 1 XOR NULL;
        -> NULL
mysql> SELECT 1 XOR 1 XOR 1;
        -> 1
a XOR b is mathematically equal to (a AND (NOT b)) OR ((NOT a) AND b).
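The three-valued behavior of NOT, AND, OR, and XOR can be captured in a small Python model, with None standing in for NULL. The sketch follows the rules stated in this section, uses hypothetical function names, and also checks the XOR identity above for every combination of 0, 1, and NULL.

```python
def truth(value):
    """Map a SQL value to True, False, or None (UNKNOWN)."""
    return None if value is None else value != 0

def sql_not(a):
    t = truth(a)
    return None if t is None else int(not t)

def sql_and(a, b):
    ta, tb = truth(a), truth(b)
    if ta is False or tb is False:
        return 0          # a single 0 operand decides the result
    if ta is None or tb is None:
        return None
    return 1

def sql_or(a, b):
    ta, tb = truth(a), truth(b)
    if ta is True or tb is True:
        return 1          # a single nonzero operand decides the result
    if ta is None or tb is None:
        return None
    return 0

def sql_xor(a, b):
    ta, tb = truth(a), truth(b)
    if ta is None or tb is None:
        return None
    return int(ta != tb)

# a XOR b equals (a AND (NOT b)) OR ((NOT a) AND b), NULL cases included.
for a in (0, 1, None):
    for b in (0, 1, None):
        expanded = sql_or(sql_and(a, sql_not(b)), sql_and(sql_not(a), b))
        assert sql_xor(a, b) == expanded
```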
12.3.4 Assignment Operators
Table 12.5 Assignment Operators
Name                      Description
=                         Assign a value (as part of a SET statement, or as part of the SET clause in an UPDATE statement)
:=                        Assign a value
•
:= Assignment operator. Causes the user variable on the left hand side of the operator to take on the value to its right. The value on the right hand side may be a literal value, another variable storing a value, or any legal expression that yields a scalar value, including the result of a query (provided that this value is a scalar value). You can perform multiple assignments in the same SET statement. You can perform multiple assignments in the same statement. Unlike =, the := operator is never interpreted as a comparison operator. This means you can use := in any valid SQL statement (not just in SET statements) to assign a value to a variable. mysql> SELECT @var1, @var2; -> NULL, NULL mysql> SELECT @var1 := 1, @var2; -> 1, NULL mysql> SELECT @var1, @var2; -> 1, NULL mysql> SELECT @var1, @var2 := @var1;
-> 1, 1 mysql> SELECT @var1, @var2; -> 1, 1 mysql> SELECT @var1:=COUNT(*) FROM t1; -> 4 mysql> SELECT @var1; -> 4
You can make value assignments using := in other statements besides SELECT, such as UPDATE, as shown here: mysql> SELECT @var1; -> 4 mysql> SELECT * FROM t1; -> 1, 3, 5, 7 mysql> UPDATE t1 SET c1 = 2 WHERE c1 = @var1:= 1; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> SELECT @var1; -> 1 mysql> SELECT * FROM t1; -> 2, 3, 5, 7
While it is also possible both to set and to read the value of the same variable in a single SQL statement using the := operator, this is not recommended. Section 9.4, “User-Defined Variables”, explains why you should avoid doing this. •
= This operator is used to perform value assignments in two cases, described in the next two paragraphs. Within a SET statement, = is treated as an assignment operator that causes the user variable on the left hand side of the operator to take on the value to its right. (In other words, when used in a SET statement, = is treated identically to :=.) The value on the right hand side may be a literal value, another variable storing a value, or any legal expression that yields a scalar value, including the result of a query (provided that this value is a scalar value). You can perform multiple assignments in the same SET statement. In the SET clause of an UPDATE statement, = also acts as an assignment operator; in this case, however, it causes the column named on the left hand side of the operator to assume the value given to the right, provided any WHERE conditions that are part of the UPDATE are met. You can make multiple assignments in the same SET clause of an UPDATE statement. In any other context, = is treated as a comparison operator. mysql> SELECT @var1, @var2; -> NULL, NULL mysql> SELECT @var1 := 1, @var2; -> 1, NULL mysql> SELECT @var1, @var2; -> 1, NULL mysql> SELECT @var1, @var2 := @var1; -> 1, 1 mysql> SELECT @var1, @var2; -> 1, 1
For more information, see Section 13.7.4.1, “SET Syntax for Variable Assignment”, Section 13.2.11, “UPDATE Syntax”, and Section 13.2.10, “Subquery Syntax”.
12.4 Control Flow Functions
Table 12.6 Flow Control Operators
Name                      Description
CASE                      Case operator
IF()                      If/else construct
IFNULL()                  Null if/else construct
NULLIF()                  Return NULL if expr1 = expr2
• CASE value WHEN [compare_value] THEN result [WHEN [compare_value] THEN result ...] [ELSE result] END
CASE WHEN [condition] THEN result [WHEN [condition] THEN result ...] [ELSE result] END
The first CASE syntax returns the result for the first value=compare_value comparison that is true. The second syntax returns the result for the first condition that is true. If no comparison or condition is true, the result after ELSE is returned, or NULL if there is no ELSE part.
Note
The syntax of the CASE expression described here differs slightly from that of the SQL CASE statement described in Section 13.6.5.1, “CASE Syntax”, for use inside stored programs. The CASE statement cannot have an ELSE NULL clause, and it is terminated with END CASE instead of END.
The return type of a CASE expression result is the aggregated type of all result values.
mysql> SELECT CASE 1 WHEN 1 THEN 'one'
    ->     WHEN 2 THEN 'two' ELSE 'more' END;
        -> 'one'
mysql> SELECT CASE WHEN 1>0 THEN 'true' ELSE 'false' END;
        -> 'true'
mysql> SELECT CASE BINARY 'B'
    ->     WHEN 'a' THEN 1 WHEN 'b' THEN 2 END;
        -> NULL
• IF(expr1,expr2,expr3)
If expr1 is TRUE (expr1 <> 0 and expr1 IS NOT NULL), IF() returns expr2. Otherwise, it returns expr3.
Note
There is also an IF statement, which differs from the IF() function described here. See Section 13.6.5.2, “IF Syntax”.
If only one of expr2 or expr3 is explicitly NULL, the result type of the IF() function is the type of the non-NULL expression.
The default return type of IF() (which may matter when it is stored into a temporary table) is calculated as follows:
  • If expr2 or expr3 produce a string, the result is a string. If expr2 and expr3 are both strings, the result is case-sensitive if either string is case sensitive.
  • If expr2 or expr3 produce a floating-point value, the result is a floating-point value.
  • If expr2 or expr3 produce an integer, the result is an integer.
mysql> SELECT IF(1>2,2,3);
        -> 3
mysql> SELECT IF(1<2,'yes','no');
        -> 'yes'
mysql> SELECT IF(STRCMP('test','test1'),'no','yes');
        -> 'no'
•
IFNULL(expr1,expr2) If expr1 is not NULL, IFNULL() returns expr1; otherwise it returns expr2. mysql> SELECT IFNULL(1,0); -> 1 mysql> SELECT IFNULL(NULL,10); -> 10 mysql> SELECT IFNULL(1/0,10); -> 10 mysql> SELECT IFNULL(1/0,'yes'); -> 'yes'
The default return type of IFNULL(expr1,expr2) is the more “general” of the two expressions, in the order STRING, REAL, or INTEGER. Consider the case of a table based on expressions or where MySQL must internally store a value returned by IFNULL() in a temporary table:
mysql> CREATE TABLE tmp SELECT IFNULL(1,'test') AS test;
mysql> DESCRIBE tmp;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| test  | varbinary(4) | NO   |     |         |       |
+-------+--------------+------+-----+---------+-------+
In this example, the type of the test column is VARBINARY(4) (a string type).
• NULLIF(expr1,expr2)
Returns NULL if expr1 = expr2 is true, otherwise returns expr1. This is the same as CASE WHEN expr1 = expr2 THEN NULL ELSE expr1 END.
The return value has the same type as the first argument.
mysql> SELECT NULLIF(1,1);
        -> NULL
mysql> SELECT NULLIF(1,2);
        -> 1
Note
MySQL evaluates expr1 twice if the arguments are not equal.
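The value selection performed by IF(), IFNULL(), and NULLIF() can be summarized in a short Python model, with None standing in for NULL. This sketch covers only which argument is returned. Result types, collations, and the double evaluation of expr1 noted above are not modeled, and the function names are hypothetical.

```python
def sql_if(expr1, expr2, expr3):
    """Model of IF(): expr2 when expr1 is nonzero and not NULL, else expr3."""
    return expr2 if (expr1 is not None and expr1 != 0) else expr3

def ifnull(expr1, expr2):
    """Model of IFNULL(): expr1 unless it is NULL."""
    return expr1 if expr1 is not None else expr2

def nullif(expr1, expr2):
    """Model of NULLIF(): NULL only when expr1 = expr2 is true."""
    equal = expr1 is not None and expr2 is not None and expr1 == expr2
    return None if equal else expr1

# Mirrors examples from this section.
assert sql_if(int(1 > 2), 2, 3) == 3
assert sql_if(-1, "no", "yes") == "no"    # STRCMP('test','test1') is nonzero
assert ifnull(None, 10) == 10
assert nullif(1, 1) is None and nullif(1, 2) == 1
```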
12.5 String Functions
Table 12.7 String Operators
Name                      Description
ASCII()                   Return numeric value of left-most character
BIN()                     Return a string containing binary representation of a number
BIT_LENGTH()              Return length of argument in bits
CHAR()                    Return the character for each integer passed
CHAR_LENGTH()             Return number of characters in argument
CHARACTER_LENGTH()        Synonym for CHAR_LENGTH()
CONCAT()                  Return concatenated string
CONCAT_WS()               Return concatenated string with separator
ELT()                     Return string at index number
EXPORT_SET()              Return a string such that for every bit set in the value bits, you get an on string and for every unset bit, you get an off string
FIELD()                   Index (position) of first argument in subsequent arguments
FIND_IN_SET()             Index (position) of first argument within second argument
FORMAT()                  Return a number formatted to specified number of decimal places
FROM_BASE64()             Decode base64 encoded string and return result
HEX()                     Hexadecimal representation of decimal or string value
INSERT()                  Insert substring at specified position up to specified number of characters
INSTR()                   Return the index of the first occurrence of substring
LCASE()                   Synonym for LOWER()
LEFT()                    Return the leftmost number of characters as specified
LENGTH()                  Return the length of a string in bytes
LIKE                      Simple pattern matching
LOAD_FILE()               Load the named file
LOCATE()                  Return the position of the first occurrence of substring
LOWER()                   Return the argument in lowercase
LPAD()                    Return the string argument, left-padded with the specified string
LTRIM()                   Remove leading spaces
MAKE_SET()                Return a set of comma-separated strings that have the corresponding bit in bits set
MATCH                     Perform full-text search
MID()                     Return a substring starting from the specified position
NOT LIKE                  Negation of simple pattern matching
NOT REGEXP                Negation of REGEXP
OCT()                     Return a string containing octal representation of a number
OCTET_LENGTH()            Synonym for LENGTH()
ORD()                     Return character code for leftmost character of the argument
POSITION()                Synonym for LOCATE()
QUOTE()                   Escape the argument for use in an SQL statement
REGEXP                    Whether string matches regular expression
REPEAT()                  Repeat a string the specified number of times
REPLACE()                 Replace occurrences of a specified string
REVERSE()                 Reverse the characters in a string
RIGHT()                   Return the specified rightmost number of characters
RLIKE                     Whether string matches regular expression
RPAD()                    Return the string argument, right-padded with the specified string
RTRIM()                   Remove trailing spaces
SOUNDEX()                 Return a soundex string
SOUNDS LIKE               Compare sounds
SPACE()                   Return a string of the specified number of spaces
STRCMP()                  Compare two strings
SUBSTR()                  Return the substring as specified
SUBSTRING()               Return the substring as specified
SUBSTRING_INDEX()         Return a substring from a string before the specified number of occurrences of the delimiter
TO_BASE64()               Return the argument converted to a base-64 string
TRIM()                    Remove leading and trailing spaces
UCASE()                   Synonym for UPPER()
UNHEX()                   Return the string encoded by a hexadecimal representation
UPPER()                   Convert to uppercase
WEIGHT_STRING()           Return the weight string for a string
String-valued functions return NULL if the length of the result would be greater than the value of the max_allowed_packet system variable. See Section 5.1.1, “Configuring the Server”.
For functions that operate on string positions, the first position is numbered 1.
For functions that take length arguments, noninteger arguments are rounded to the nearest integer.
• ASCII(str)
Returns the numeric value of the leftmost character of the string str. Returns 0 if str is the empty string. Returns NULL if str is NULL. ASCII() works for 8-bit characters.
mysql> SELECT ASCII('2');
        -> 50
mysql> SELECT ASCII(2);
        -> 50
mysql> SELECT ASCII('dx');
        -> 100
See also the ORD() function.
• BIN(N)
Returns a string representation of the binary value of N, where N is a longlong (BIGINT) number. This is equivalent to CONV(N,10,2). Returns NULL if N is NULL.
mysql> SELECT BIN(12);
        -> '1100'
• BIT_LENGTH(str)
Returns the length of the string str in bits.
mysql> SELECT BIT_LENGTH('text');
        -> 32
• CHAR(N,... [USING charset_name])
CHAR() interprets each argument N as an integer and returns a string consisting of the characters given by the code values of those integers. NULL values are skipped.
mysql> SELECT CHAR(77,121,83,81,'76');
        -> 'MySQL'
mysql> SELECT CHAR(77,77.3,'77.3');
        -> 'MMM'
CHAR() arguments larger than 255 are converted into multiple result bytes. For example, CHAR(256) is equivalent to CHAR(1,0), and CHAR(256*256) is equivalent to CHAR(1,0,0):
mysql> SELECT HEX(CHAR(1,0)), HEX(CHAR(256));
+----------------+----------------+
| HEX(CHAR(1,0)) | HEX(CHAR(256)) |
+----------------+----------------+
| 0100           | 0100           |
+----------------+----------------+
mysql> SELECT HEX(CHAR(1,0,0)), HEX(CHAR(256*256));
+------------------+--------------------+
| HEX(CHAR(1,0,0)) | HEX(CHAR(256*256)) |
+------------------+--------------------+
| 010000           | 010000             |
+------------------+--------------------+
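The multiple-byte expansion of large CHAR() arguments corresponds to writing each integer in big-endian order using as few bytes as it needs. The Python sketch below models that for nonnegative integer arguments only. Conversion of string or fractional arguments and the USING clause are not modeled, and char_bytes is a hypothetical name.

```python
def char_bytes(*codes):
    """Model of CHAR() with the default binary result: each integer becomes
    its minimal big-endian byte sequence; NULL (None) arguments are skipped."""
    out = bytearray()
    for code in codes:
        if code is None:
            continue
        length = max(1, (code.bit_length() + 7) // 8)  # at least one byte
        out += code.to_bytes(length, "big")
    return bytes(out)

# Mirrors the HEX(CHAR(...)) results above.
assert char_bytes(256).hex() == char_bytes(1, 0).hex() == "0100"
assert char_bytes(256 * 256).hex() == char_bytes(1, 0, 0).hex() == "010000"
assert char_bytes(77, 121, 83, 81, 76) == b"MySQL"
```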
By default, CHAR() returns a binary string. To produce a string in a given character set, use the optional USING clause:
mysql> SELECT CHARSET(CHAR(X'65')), CHARSET(CHAR(X'65' USING utf8));
+----------------------+---------------------------------+
| CHARSET(CHAR(X'65')) | CHARSET(CHAR(X'65' USING utf8)) |
+----------------------+---------------------------------+
| binary               | utf8                            |
+----------------------+---------------------------------+
If USING is given and the result string is illegal for the given character set, a warning is issued. Also, if strict SQL mode is enabled, the result from CHAR() becomes NULL.
• CHAR_LENGTH(str)
Returns the length of the string str, measured in characters. A multibyte character counts as a single character. This means that for a string containing five 2-byte characters, LENGTH() returns 10, whereas CHAR_LENGTH() returns 5.
• CHARACTER_LENGTH(str)
CHARACTER_LENGTH() is a synonym for CHAR_LENGTH().
•
CONCAT(str1,str2,...)
Returns the string that results from concatenating the arguments. May have one or more arguments. If all arguments are nonbinary strings, the result is a nonbinary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent nonbinary string form.
CONCAT() returns NULL if any argument is NULL.
mysql> SELECT CONCAT('My', 'S', 'QL');
        -> 'MySQL'
mysql> SELECT CONCAT('My', NULL, 'QL');
        -> NULL
mysql> SELECT CONCAT(14.3);
        -> '14.3'
For quoted strings, concatenation can be performed by placing the strings next to each other:
mysql> SELECT 'My' 'S' 'QL';
        -> 'MySQL'
• CONCAT_WS(separator,str1,str2,...)
CONCAT_WS() stands for Concatenate With Separator and is a special form of CONCAT(). The first argument is the separator for the rest of the arguments. The separator is added between the strings to be concatenated. The separator can be a string, as can the rest of the arguments. If the separator is NULL, the result is NULL.
mysql> SELECT CONCAT_WS(',','First name','Second name','Last Name');
        -> 'First name,Second name,Last Name'
mysql> SELECT CONCAT_WS(',','First name',NULL,'Last Name');
        -> 'First name,Last Name'
CONCAT_WS() does not skip empty strings. However, it does skip any NULL values after the separator argument.
• ELT(N,str1,str2,str3,...)
ELT() returns the Nth element of the list of strings: str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less than 1 or greater than the number of arguments. ELT() is the complement of FIELD().
mysql> SELECT ELT(1, 'Aa', 'Bb', 'Cc', 'Dd');
        -> 'Aa'
mysql> SELECT ELT(4, 'Aa', 'Bb', 'Cc', 'Dd');
        -> 'Dd'
• EXPORT_SET(bits,on,off[,separator[,number_of_bits]]) Returns a string such that for every bit set in the value bits, you get an on string and for every bit not set in the value, you get an off string. Bits in bits are examined from right to left (from low-order to high-order bits). Strings are added to the result from left to right, separated by the separator string (the default being the comma character ,). The number of bits examined is given by number_of_bits, which has a default of 64 if not specified. number_of_bits is silently clipped to 64 if larger than 64. It is treated as an unsigned integer, so a value of −1 is effectively the same as 64. mysql> SELECT EXPORT_SET(5,'Y','N',',',4); -> 'Y,N,Y,N' mysql> SELECT EXPORT_SET(6,'1','0',',',10); -> '0,1,1,0,0,0,0,0,0,0'
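The bit-scanning order described for EXPORT_SET() can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not MySQL's implementation; the function name export_set and its keyword defaults are illustrative.

```python
def export_set(bits, on, off, separator=",", number_of_bits=64):
    """Model of EXPORT_SET() as documented above (illustrative only)."""
    # number_of_bits is treated as an unsigned 64-bit value, then clipped to 64,
    # so -1 wraps around to a huge number and ends up as 64.
    number_of_bits = min(number_of_bits % 2**64, 64)
    # Bits are examined from low-order to high-order; the resulting strings
    # are appended left to right.
    parts = [on if (bits >> i) & 1 else off for i in range(number_of_bits)]
    return separator.join(parts)


print(export_set(5, "Y", "N", ",", 4))   # Y,N,Y,N
print(export_set(6, "1", "0", ",", 10))  # 0,1,1,0,0,0,0,0,0,0
```

Because the low-order bit comes first, the output reads in the opposite direction from the usual binary notation of bits.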
• FIELD(str,str1,str2,str3,...) Returns the index (position) of str in the str1, str2, str3, ... list. Returns 0 if str is not found. If all arguments to FIELD() are strings, all arguments are compared as strings. If all arguments are numbers, they are compared as numbers. Otherwise, the arguments are compared as double. If str is NULL, the return value is 0 because NULL fails equality comparison with any value. FIELD() is the complement of ELT(). mysql> SELECT FIELD('Bb', 'Aa', 'Bb', 'Cc', 'Dd', 'Ff'); -> 2 mysql> SELECT FIELD('Gg', 'Aa', 'Bb', 'Cc', 'Dd', 'Ff');
-> 0
• FIND_IN_SET(str,strlist) Returns a value in the range of 1 to N if the string str is in the string list strlist consisting of N substrings. A string list is a string composed of substrings separated by , characters. If the first argument is a constant string and the second is a column of type SET, the FIND_IN_SET() function is optimized to use bit arithmetic. Returns 0 if str is not in strlist or if strlist is the empty string. Returns NULL if either argument is NULL. This function does not work properly if the first argument contains a comma (,) character. mysql> SELECT FIND_IN_SET('b','a,b,c,d'); -> 2
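The return values listed for FIND_IN_SET() can be summarized with a small Python model. It is a sketch under simplifying assumptions: strings are compared exactly (collations are ignored), None stands in for NULL, and the name find_in_set is illustrative.

```python
def find_in_set(s, strlist):
    """Model of FIND_IN_SET(): 1-based position of s in a comma-separated list."""
    if s is None or strlist is None:
        return None          # NULL in, NULL out
    if strlist == "":
        return 0             # an empty list contains nothing
    items = strlist.split(",")
    return items.index(s) + 1 if s in items else 0


print(find_in_set("b", "a,b,c,d"))   # 2
print(find_in_set("a,b", "a,b,c"))   # 0
```

The second call shows why a first argument containing a comma cannot work: the list is split on every comma, so no single element can ever equal 'a,b'.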
• FORMAT(X,D[,locale]) Formats the number X to a format like '#,###,###.##', rounded to D decimal places, and returns the result as a string. If D is 0, the result has no decimal point or fractional part. The optional third parameter enables a locale to be specified to be used for the result number's decimal point, thousands separator, and grouping between separators. Permissible locale values are the same as the legal values for the lc_time_names system variable (see Section 10.15, “MySQL Server Locale Support”). If no locale is specified, the default is 'en_US'. mysql> SELECT FORMAT(12332.123456, 4); -> '12,332.1235' mysql> SELECT FORMAT(12332.1,4); -> '12,332.1000' mysql> SELECT FORMAT(12332.2,0); -> '12,332' mysql> SELECT FORMAT(12332.2,2,'de_DE'); -> '12.332,20'
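For comparison, the same grouping and rounding can be sketched in Python. This is an approximation rather than MySQL's code: it works on Python floats, models only the 'en_US' and 'de_DE' locales used in the examples above, and the name format_number is illustrative.

```python
def format_number(x, d, locale="en_US"):
    """Rough model of FORMAT(X, D[, locale]) for two locales only."""
    # Group thousands with ',' and round to d decimal places (none if d is 0).
    text = format(x, f",.{max(d, 0)}f")
    if locale == "de_DE":
        # German convention: '.' groups thousands, ',' marks the decimals.
        text = text.translate(str.maketrans(",.", ".,"))
    return text


print(format_number(12332.123456, 4))     # 12,332.1235
print(format_number(12332.2, 2, "de_DE")) # 12.332,20
```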
• FROM_BASE64(str) Takes a string encoded with the base-64 encoding rules used by TO_BASE64() and returns the decoded result as a binary string. The result is NULL if the argument is NULL or not a valid base-64 string. See the description of TO_BASE64() for details about the encoding and decoding rules. mysql> SELECT TO_BASE64('abc'), FROM_BASE64(TO_BASE64('abc')); -> 'YWJj', 'abc'
• HEX(str), HEX(N) For a string argument str, HEX() returns a hexadecimal string representation of str where each byte of each character in str is converted to two hexadecimal digits. (Multibyte characters therefore become more than two digits.) The inverse of this operation is performed by the UNHEX() function. For a numeric argument N, HEX() returns a hexadecimal string representation of the value of N treated as a longlong (BIGINT) number. This is equivalent to CONV(N,10,16). The inverse of this operation is performed by CONV(HEX(N),16,10). mysql> SELECT X'616263', HEX('abc'), UNHEX(HEX('abc')); -> 'abc', 616263, 'abc' mysql> SELECT HEX(255), CONV(HEX(255),16,10); -> 'FF', 255
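The two forms of HEX() can be illustrated with a Python sketch. It assumes the string is encoded as UTF-8 and that numeric arguments are reduced to an unsigned 64-bit value; the helper names hex_str and hex_num are illustrative, and the code is a model rather than MySQL's implementation.

```python
def hex_str(s, encoding="utf-8"):
    """String form: every byte of every character becomes two hex digits."""
    return s.encode(encoding).hex().upper()


def hex_num(n):
    """Numeric form: the value is treated as a 64-bit integer (assumed unsigned)."""
    return format(n % 2**64, "X")


print(hex_str("abc"))            # 616263
print(hex_str("\u00e9"))         # C3A9
print(hex_num(255))              # FF
print(int(hex_num(255), 16))     # 255
```

The second call shows the multibyte case: a single character that occupies two bytes in UTF-8 produces four hexadecimal digits.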
• INSERT(str,pos,len,newstr) Returns the string str, with the substring beginning at position pos and len characters long replaced by the string newstr. Returns the original string if pos is not within the length of the string.
Replaces the rest of the string from position pos if len is not within the length of the rest of the string. Returns NULL if any argument is NULL. mysql> SELECT INSERT('Quadratic', 3, 4, 'What'); -> 'QuWhattic' mysql> SELECT INSERT('Quadratic', -1, 4, 'What'); -> 'Quadratic' mysql> SELECT INSERT('Quadratic', 3, 100, 'What'); -> 'QuWhat'
This function is multibyte safe. • INSTR(str,substr) Returns the position of the first occurrence of substring substr in string str. This is the same as the two-argument form of LOCATE(), except that the order of the arguments is reversed. mysql> SELECT INSTR('foobarbar', 'bar'); -> 4 mysql> SELECT INSTR('xbar', 'foobar'); -> 0
This function is multibyte safe, and is case-sensitive only if at least one argument is a binary string. • LCASE(str) LCASE() is a synonym for LOWER(). In MySQL 5.7, LCASE() used in a view is rewritten as LOWER() when storing the view's definition. (Bug #12844279) • LEFT(str,len) Returns the leftmost len characters from the string str, or NULL if any argument is NULL. mysql> SELECT LEFT('foobarbar', 5); -> 'fooba'
This function is multibyte safe. • LENGTH(str) Returns the length of the string str, measured in bytes. A multibyte character counts as multiple bytes. This means that for a string containing five 2-byte characters, LENGTH() returns 10, whereas CHAR_LENGTH() returns 5. mysql> SELECT LENGTH('text'); -> 4
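The difference between byte length and character length is easy to reproduce outside the server. The following Python sketch assumes UTF-8, in which the character U+00E9 occupies two bytes; it illustrates the distinction only and does not call MySQL.

```python
# Five copies of a character that needs two bytes in UTF-8.
s = "\u00e9" * 5

char_length = len(s)                   # counts characters, like CHAR_LENGTH()
byte_length = len(s.encode("utf-8"))   # counts bytes, like LENGTH()

print(char_length, byte_length)        # 5 10
```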
Note The Length() OpenGIS spatial function is named ST_Length() in MySQL. •
LOAD_FILE(file_name) Reads the file and returns the file contents as a string. To use this function, the file must be located on the server host, you must specify the full path name to the file, and you must have the FILE privilege. The file must be readable by all and its size less than max_allowed_packet bytes. If the secure_file_priv system variable is set to a nonempty directory name, the file to be loaded must be located in that directory.
If the file does not exist or cannot be read because one of the preceding conditions is not satisfied, the function returns NULL. The character_set_filesystem system variable controls interpretation of file names that are given as literal strings. mysql> UPDATE t SET blob_col=LOAD_FILE('/tmp/picture') WHERE id=1;
• LOCATE(substr,str), LOCATE(substr,str,pos) The first syntax returns the position of the first occurrence of substring substr in string str. The second syntax returns the position of the first occurrence of substring substr in string str, starting at position pos. Returns 0 if substr is not in str. Returns NULL if substr or str is NULL. mysql> SELECT LOCATE('bar', 'foobarbar'); -> 4 mysql> SELECT LOCATE('xbar', 'foobar'); -> 0 mysql> SELECT LOCATE('bar', 'foobarbar', 5); -> 7
This function is multibyte safe, and is case-sensitive only if at least one argument is a binary string. • LOWER(str) Returns the string str with all characters changed to lowercase according to the current character set mapping. The default is latin1 (cp1252 West European). mysql> SELECT LOWER('QUADRATICALLY'); -> 'quadratically'
LOWER() (and UPPER()) are ineffective when applied to binary strings (BINARY, VARBINARY, BLOB). To perform lettercase conversion, convert the string to a nonbinary string:
mysql> SET @str = BINARY 'New York';
mysql> SELECT LOWER(@str), LOWER(CONVERT(@str USING latin1));
+-------------+-----------------------------------+
| LOWER(@str) | LOWER(CONVERT(@str USING latin1)) |
+-------------+-----------------------------------+
| New York    | new york                          |
+-------------+-----------------------------------+
For collations of Unicode character sets, LOWER() and UPPER() work according to the Unicode Collation Algorithm (UCA) version in the collation name, if there is one, and UCA 4.0.0 if no version is specified. For example, utf8_unicode_520_ci works according to UCA 5.2.0, whereas utf8_unicode_ci works according to UCA 4.0.0. See Section 10.10.1, “Unicode Character Sets”. This function is multibyte safe. In previous versions of MySQL, LOWER() used within a view was rewritten as LCASE() when storing the view's definition. In MySQL 5.7, LOWER() is never rewritten in such cases, but LCASE() used within views is instead rewritten as LOWER(). (Bug #12844279) • LPAD(str,len,padstr) Returns the string str, left-padded with the string padstr to a length of len characters. If str is longer than len, the return value is shortened to len characters.
mysql> SELECT LPAD('hi',4,'??'); -> '??hi' mysql> SELECT LPAD('hi',1,'??'); -> 'h'
• LTRIM(str) Returns the string str with leading space characters removed. mysql> SELECT LTRIM('  barbar'); -> 'barbar'
This function is multibyte safe. • MAKE_SET(bits,str1,str2,...) Returns a set value (a string containing substrings separated by , characters) consisting of the strings that have the corresponding bit in bits set. str1 corresponds to bit 0, str2 to bit 1, and so on. NULL values in str1, str2, ... are not appended to the result. mysql> SELECT MAKE_SET(1,'a','b','c'); -> 'a' mysql> SELECT MAKE_SET(1 | 4,'hello','nice','world'); -> 'hello,world' mysql> SELECT MAKE_SET(1 | 4,'hello','nice',NULL,'world'); -> 'hello' mysql> SELECT MAKE_SET(0,'a','b','c'); -> ''
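The bit-to-string mapping used by MAKE_SET() can be modeled as follows. This Python sketch uses None for NULL and the illustrative name make_set; it mirrors the documented behavior and is not taken from MySQL's source.

```python
def make_set(bits, *strs):
    """Model of MAKE_SET(): pick strs[i] when bit i of bits is set."""
    if bits is None:
        return None
    # str1 corresponds to bit 0, str2 to bit 1, and so on; NULLs are skipped.
    chosen = [s for i, s in enumerate(strs) if (bits >> i) & 1 and s is not None]
    return ",".join(chosen)


print(make_set(1 | 4, "hello", "nice", "world"))        # hello,world
print(make_set(1 | 4, "hello", "nice", None, "world"))  # hello
```

In the second call bit 2 selects the NULL argument, which is dropped, and bit 3 ('world') is not set, so only 'hello' remains.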
• MID(str,pos,len) MID(str,pos,len) is a synonym for SUBSTRING(str,pos,len). • OCT(N) Returns a string representation of the octal value of N, where N is a longlong (BIGINT) number. This is equivalent to CONV(N,10,8). Returns NULL if N is NULL. mysql> SELECT OCT(12); -> '14'
• OCTET_LENGTH(str) OCTET_LENGTH() is a synonym for LENGTH(). • ORD(str) If the leftmost character of the string str is a multibyte character, returns the code for that character, calculated from the numeric values of its constituent bytes using this formula: (1st byte code) + (2nd byte code * 256) + (3rd byte code * 256^2) ...
If the leftmost character is not a multibyte character, ORD() returns the same value as the ASCII() function. mysql> SELECT ORD('2'); -> 50
• POSITION(substr IN str)
POSITION(substr IN str) is a synonym for LOCATE(substr,str). • QUOTE(str) Quotes a string to produce a result that can be used as a properly escaped data value in an SQL statement. The string is returned enclosed by single quotation marks and with each instance of backslash (\), single quote ('), ASCII NUL, and Control+Z preceded by a backslash. If the argument is NULL, the return value is the word “NULL” without enclosing single quotation marks. mysql> SELECT QUOTE('Don\'t!'); -> 'Don\'t!' mysql> SELECT QUOTE(NULL); -> NULL
For comparison, see the quoting rules for literal strings and within the C API in Section 9.1.1, “String Literals”, and Section 27.8.7.56, “mysql_real_escape_string_quote()”. • REPEAT(str,count) Returns a string consisting of the string str repeated count times. If count is less than 1, returns an empty string. Returns NULL if str or count are NULL. mysql> SELECT REPEAT('MySQL', 3); -> 'MySQLMySQLMySQL'
• REPLACE(str,from_str,to_str) Returns the string str with all occurrences of the string from_str replaced by the string to_str. REPLACE() performs a case-sensitive match when searching for from_str. mysql> SELECT REPLACE('www.mysql.com', 'w', 'Ww'); -> 'WwWwWw.mysql.com'
This function is multibyte safe. • REVERSE(str) Returns the string str with the order of the characters reversed. mysql> SELECT REVERSE('abc'); -> 'cba'
This function is multibyte safe. • RIGHT(str,len) Returns the rightmost len characters from the string str, or NULL if any argument is NULL. mysql> SELECT RIGHT('foobarbar', 4); -> 'rbar'
This function is multibyte safe. • RPAD(str,len,padstr) Returns the string str, right-padded with the string padstr to a length of len characters. If str is longer than len, the return value is shortened to len characters. mysql> SELECT RPAD('hi',5,'?');
-> 'hi???' mysql> SELECT RPAD('hi',1,'?'); -> 'h'
This function is multibyte safe. • RTRIM(str) Returns the string str with trailing space characters removed. mysql> SELECT RTRIM('barbar   '); -> 'barbar'
This function is multibyte safe. • SOUNDEX(str) Returns a soundex string from str. Two strings that sound almost the same should have identical soundex strings. A standard soundex string is four characters long, but the SOUNDEX() function returns an arbitrarily long string. You can use SUBSTRING() on the result to get a standard soundex string. All nonalphabetic characters in str are ignored. All international alphabetic characters outside the A-Z range are treated as vowels. Important When using SOUNDEX(), you should be aware of the following limitations: • This function, as currently implemented, is intended to work well with strings that are in the English language only. Strings in other languages may not produce reliable results. • This function is not guaranteed to provide consistent results with strings that use multibyte character sets, including utf-8. See Bug #22638 for more information. mysql> SELECT SOUNDEX('Hello'); -> 'H400' mysql> SELECT SOUNDEX('Quadratically'); -> 'Q36324'
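The two results above can be reproduced with a short Python sketch. It models the original Soundex variant (vowels are discarded first, then repeated codes), assumes the conventional Soundex digit table, and pads short results with '0' to four characters. It is not MySQL's source code, and the name soundex is illustrative.

```python
# Conventional Soundex digit table; every other letter counts as a vowel ("0").
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(s):
    """Arbitrarily long soundex string, original (vowels-first) variant."""
    letters = [c.upper() for c in s if c.isalpha()]  # nonalphabetic chars ignored
    if not letters:
        return ""
    result = letters[0]
    last = _CODES.get(letters[0], "0")
    for c in letters[1:]:
        code = _CODES.get(c, "0")       # letters outside the table act as vowels
        if code != "0" and code != last:
            result += code
            last = code                 # vowels never reset the last code
    return result.ljust(4, "0")


print(soundex("Hello"))          # H400
print(soundex("Quadratically"))  # Q36324
```

Taking the first four characters of the result, as the text suggests doing with SUBSTRING(), gives the standard-length code.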
Note This function implements the original Soundex algorithm, not the more popular enhanced version (also described by D. Knuth). The difference is that the original version discards vowels first and duplicates second, whereas the enhanced version discards duplicates first and vowels second. • expr1 SOUNDS LIKE expr2 This is the same as SOUNDEX(expr1) = SOUNDEX(expr2). • SPACE(N) Returns a string consisting of N space characters. mysql> SELECT SPACE(6); -> '      '
• SUBSTR(str,pos), SUBSTR(str FROM pos), SUBSTR(str,pos,len), SUBSTR(str FROM pos FOR len) SUBSTR() is a synonym for SUBSTRING().
• SUBSTRING(str,pos), SUBSTRING(str FROM pos), SUBSTRING(str,pos,len), SUBSTRING(str FROM pos FOR len) The forms without a len argument return a substring from string str starting at position pos. The forms with a len argument return a substring len characters long from string str, starting at position pos. The forms that use FROM are standard SQL syntax. It is also possible to use a negative value for pos. In this case, the beginning of the substring is pos characters from the end of the string, rather than the beginning. A negative value may be used for pos in any of the forms of this function. For all forms of SUBSTRING(), the position of the first character in the string from which the substring is to be extracted is reckoned as 1. mysql> SELECT SUBSTRING('Quadratically',5); -> 'ratically' mysql> SELECT SUBSTRING('foobarbar' FROM 4); -> 'barbar' mysql> SELECT SUBSTRING('Quadratically',5,6); -> 'ratica' mysql> SELECT SUBSTRING('Sakila', -3); -> 'ila' mysql> SELECT SUBSTRING('Sakila', -5, 3); -> 'aki' mysql> SELECT SUBSTRING('Sakila' FROM -4 FOR 2); -> 'ki'
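The position rules for SUBSTRING() (1-based counting, negative pos counted from the end) can be modeled in Python. The sketch makes some assumptions that the examples above do not cover: a pos of 0, a pos beyond either end of the string, and a len below 1 are all taken to yield the empty string. The name substring is illustrative.

```python
def substring(s, pos, length=None):
    """Model of SUBSTRING(str, pos[, len]) with 1-based positions."""
    if pos == 0:
        return ""                        # assumption: position 0 matches nothing
    # Positive pos counts from the start, negative pos from the end.
    start = pos - 1 if pos > 0 else len(s) + pos
    if start < 0 or start >= len(s):
        return ""                        # assumption: out-of-range start
    if length is None:
        return s[start:]
    if length < 1:
        return ""
    return s[start:start + length]


print(substring("Quadratically", 5, 6))  # ratica
print(substring("Sakila", -5, 3))        # aki
```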
This function is multibyte safe. If len is less than 1, the result is the empty string. • SUBSTRING_INDEX(str,delim,count) Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim. mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2); -> 'www.mysql' mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2); -> 'mysql.com'
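The counting rule for SUBSTRING_INDEX() can be sketched in Python by splitting on the delimiter and rejoining the wanted pieces. The sketch assumes that a count of 0 or an empty delimiter gives an empty string, and that occurrences of the delimiter do not overlap. The name substring_index is illustrative.

```python
def substring_index(s, delim, count):
    """Model of SUBSTRING_INDEX(str, delim, count)."""
    if count == 0 or delim == "":
        return ""                          # assumption for the degenerate cases
    parts = s.split(delim)                 # case-sensitive, like the function
    if count > 0:
        return delim.join(parts[:count])   # left of the count-th delimiter
    return delim.join(parts[count:])       # right of the count-th from the end


print(substring_index("www.mysql.com", ".", 2))   # www.mysql
print(substring_index("www.mysql.com", ".", -2))  # mysql.com
```

When count exceeds the number of delimiters present, the slices cover every piece and the whole string comes back unchanged.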
This function is multibyte safe. • TO_BASE64(str) Converts the string argument to base-64 encoded form and returns the result as a character string with the connection character set and collation. If the argument is not a string, it is converted to a string before conversion takes place. The result is NULL if the argument is NULL. Base-64 encoded strings can be decoded using the FROM_BASE64() function. mysql> SELECT TO_BASE64('abc'), FROM_BASE64(TO_BASE64('abc')); -> 'YWJj', 'abc'
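The round trip can be checked independently with Python's standard base64 module. The sketch below models the scheme as standard base-64 ('+' and '/' for values 62 and 63, '=' padding) with a newline after every 76 output characters, and a decoder that ignores newline, carriage return, tab, and space. The function names are illustrative, and the assumption that no newline follows the final line is not confirmed by the text.

```python
import base64

_IGNORED = {ord(c): None for c in "\n\r\t "}   # characters the decoder skips


def to_base64(data: bytes) -> str:
    """Encode, then break the output into lines of at most 76 characters."""
    encoded = base64.b64encode(data).decode("ascii")
    return "\n".join(encoded[i:i + 76] for i in range(0, len(encoded), 76))


def from_base64(text: str):
    """Decode; return None (standing in for NULL) on invalid input."""
    try:
        return base64.b64decode(text.translate(_IGNORED), validate=True)
    except ValueError:                      # binascii.Error is a ValueError
        return None


print(to_base64(b"abc"))                    # YWJj
print(from_base64(to_base64(b"abc")))       # b'abc'
```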
Different base-64 encoding schemes exist. These are the encoding and decoding rules used by TO_BASE64() and FROM_BASE64(): • The encoding for alphabet value 62 is '+'. • The encoding for alphabet value 63 is '/'.
• Encoded output consists of groups of 4 printable characters. Each 3 bytes of the input data are encoded using 4 characters. If the last group is incomplete, it is padded with '=' characters to a length of 4. • A newline is added after each 76 characters of encoded output to divide long output into multiple lines. • Decoding recognizes and ignores newline, carriage return, tab, and space. • TRIM([{BOTH | LEADING | TRAILING} [remstr] FROM] str), TRIM([remstr FROM] str) Returns the string str with all remstr prefixes or suffixes removed. If none of the specifiers BOTH, LEADING, or TRAILING is given, BOTH is assumed. remstr is optional and, if not specified, spaces are removed. mysql> SELECT TRIM(' bar '); -> 'bar' mysql> SELECT TRIM(LEADING 'x' FROM 'xxxbarxxx'); -> 'barxxx' mysql> SELECT TRIM(BOTH 'x' FROM 'xxxbarxxx'); -> 'bar' mysql> SELECT TRIM(TRAILING 'xyz' FROM 'barxxyz'); -> 'barx'
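The TRIM() examples show that remstr is removed as a whole unit, repeatedly, rather than as a set of individual characters. A Python sketch of that behavior follows; the name trim and the mode keyword are illustrative, and an empty remstr is assumed to leave the string unchanged.

```python
def trim(s, remstr=" ", mode="BOTH"):
    """Model of TRIM([{BOTH | LEADING | TRAILING} [remstr] FROM] str)."""
    if remstr == "":
        return s                              # assumption: nothing to remove
    if mode in ("BOTH", "LEADING"):
        while s.startswith(remstr):           # strip whole copies of remstr
            s = s[len(remstr):]
    if mode in ("BOTH", "TRAILING"):
        while s.endswith(remstr):
            s = s[:-len(remstr)]
    return s


print(trim("xxxbarxxx", "x", "LEADING"))      # barxxx
print(trim("barxxyz", "xyz", "TRAILING"))     # barx
```

This differs from Python's own str.strip('xyz'), which treats its argument as a set of characters and would reduce 'barxxyz' to 'bar'.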
This function is multibyte safe. • UCASE(str) UCASE() is a synonym for UPPER(). In MySQL 5.7, UCASE() used in a view is rewritten as UPPER() when storing the view's definition. (Bug #12844279) • UNHEX(str) For a string argument str, UNHEX(str) interprets each pair of characters in the argument as a hexadecimal number and converts it to the byte represented by the number. The return value is a binary string. mysql> SELECT UNHEX('4D7953514C'); -> 'MySQL' mysql> SELECT X'4D7953514C'; -> 'MySQL' mysql> SELECT UNHEX(HEX('string')); -> 'string' mysql> SELECT HEX(UNHEX('1267')); -> '1267'
The characters in the argument string must be legal hexadecimal digits: '0' .. '9', 'A' .. 'F', 'a' .. 'f'. If the argument contains any nonhexadecimal digits, the result is NULL: mysql> SELECT UNHEX('GG'); +-------------+ | UNHEX('GG') | +-------------+ | NULL | +-------------+
A NULL result can occur if the argument to UNHEX() is a BINARY column, because values are padded with 0x00 bytes when stored but those bytes are not stripped on retrieval. For example, '41' is stored into a CHAR(3) column as '41 ' and retrieved as '41' (with the trailing pad
space stripped), so UNHEX() for the column value returns 'A'. By contrast '41' is stored into a BINARY(3) column as '41\0' and retrieved as '41\0' (with the trailing pad 0x00 byte not stripped). '\0' is not a legal hexadecimal digit, so UNHEX() for the column value returns NULL. For a numeric argument N, the inverse of HEX(N) is not performed by UNHEX(). Use CONV(HEX(N),16,10) instead. See the description of HEX(). • UPPER(str) Returns the string str with all characters changed to uppercase according to the current character set mapping. The default is latin1 (cp1252 West European). mysql> SELECT UPPER('Hej'); -> 'HEJ'
See the description of LOWER() for information that also applies to UPPER(). This includes information about how to perform lettercase conversion of binary strings (BINARY, VARBINARY, BLOB) for which these functions are ineffective, and information about case folding for Unicode character sets. This function is multibyte safe. In previous versions of MySQL, UPPER() used within a view was rewritten as UCASE() when storing the view's definition. In MySQL 5.7, UPPER() is never rewritten in such cases, but UCASE() used within views is instead rewritten as UPPER(). (Bug #12844279)
• WEIGHT_STRING(str [AS {CHAR|BINARY}(N)] [LEVEL levels] [flags]) levels: N [ASC|DESC|REVERSE] [, N [ASC|DESC|REVERSE]] ...
This function returns the weight string for the input string. The return value is a binary string that represents the comparison and sorting value of the string. It has these properties:
• If WEIGHT_STRING(str1) = WEIGHT_STRING(str2), then str1 = str2 (str1 and str2 are considered equal)
• If WEIGHT_STRING(str1) < WEIGHT_STRING(str2), then str1 < str2 (str1 sorts before str2)
WEIGHT_STRING() is a debugging function intended for internal use. Its behavior can change without notice between MySQL versions. It can be used for testing and debugging of collations, especially if you are adding a new collation. See Section 10.13, “Adding a Collation to a Character Set”.
This list briefly summarizes the arguments. More details are given in the discussion following the list.
• str: The input string expression.
• AS clause: Optional; cast the input string to a given type and length.
• LEVEL clause: Optional; specify weight levels for the return value.
• flags: Optional; unused.
The input string, str, is a string expression. If the input is a nonbinary (character) string such as a CHAR, VARCHAR, or TEXT value, the return value contains the collation weights for the string.
If the input is a binary (byte) string such as a BINARY, VARBINARY, or BLOB value, the return value is the same as the input (the weight for each byte in a binary string is the byte value). If the input is NULL, WEIGHT_STRING() returns NULL. Examples:
mysql> SET @s = _latin1 'AB' COLLATE latin1_swedish_ci; mysql> SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s)); +------+---------+------------------------+ | @s | HEX(@s) | HEX(WEIGHT_STRING(@s)) | +------+---------+------------------------+ | AB | 4142 | 4142 | +------+---------+------------------------+
mysql> SET @s = _latin1 'ab' COLLATE latin1_swedish_ci; mysql> SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s)); +------+---------+------------------------+ | @s | HEX(@s) | HEX(WEIGHT_STRING(@s)) | +------+---------+------------------------+ | ab | 6162 | 4142 | +------+---------+------------------------+
mysql> SET @s = CAST('AB' AS BINARY); mysql> SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s)); +------+---------+------------------------+ | @s | HEX(@s) | HEX(WEIGHT_STRING(@s)) | +------+---------+------------------------+ | AB | 4142 | 4142 | +------+---------+------------------------+
mysql> SET @s = CAST('ab' AS BINARY); mysql> SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s)); +------+---------+------------------------+ | @s | HEX(@s) | HEX(WEIGHT_STRING(@s)) | +------+---------+------------------------+ | ab | 6162 | 6162 | +------+---------+------------------------+
The preceding examples use HEX() to display the WEIGHT_STRING() result. Because the result is a binary value, HEX() can be especially useful when the result contains nonprinting values, to display it in printable form: mysql> SET @s = CONVERT(X'C39F' USING utf8) COLLATE utf8_czech_ci; mysql> SELECT HEX(WEIGHT_STRING(@s)); +------------------------+ | HEX(WEIGHT_STRING(@s)) | +------------------------+ | 0FEA0FEA | +------------------------+
For non-NULL return values, the data type of the value is VARBINARY if its length is within the maximum length for VARBINARY, otherwise the data type is BLOB. The AS clause may be given to cast the input string to a nonbinary or binary string and to force it to a given length: • AS CHAR(N) casts the string to a nonbinary string and pads it on the right with spaces to a length of N characters. N must be at least 1. If N is less than the length of the input string, the string is truncated to N characters. No warning occurs for truncation. • AS BINARY(N) is similar but casts the string to a binary string, N is measured in bytes (not characters), and padding uses 0x00 bytes (not spaces). mysql> SET NAMES 'latin1'; mysql> SELECT HEX(WEIGHT_STRING('ab' AS CHAR(4))); +-------------------------------------+ | HEX(WEIGHT_STRING('ab' AS CHAR(4))) | +-------------------------------------+
| 41422020 | +-------------------------------------+ mysql> SET NAMES 'utf8'; mysql> SELECT HEX(WEIGHT_STRING('ab' AS CHAR(4))); +-------------------------------------+ | HEX(WEIGHT_STRING('ab' AS CHAR(4))) | +-------------------------------------+ | 0041004200200020 | +-------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING('ab' AS BINARY(4))); +---------------------------------------+ | HEX(WEIGHT_STRING('ab' AS BINARY(4))) | +---------------------------------------+ | 61620000 | +---------------------------------------+
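The padding and truncation performed by the AS clause can be modeled separately from the weight computation. The Python sketch below covers only the cast step; the helper names cast_char and cast_binary are illustrative, and collation weights are not modeled.

```python
def cast_char(s: str, n: int) -> str:
    """AS CHAR(N): truncate to N characters, pad on the right with spaces."""
    return s[:n].ljust(n, " ")


def cast_binary(data: bytes, n: int) -> bytes:
    """AS BINARY(N): truncate to N bytes, pad on the right with 0x00 bytes."""
    return data[:n].ljust(n, b"\x00")


print(repr(cast_char("ab", 4)))        # 'ab  '
print(cast_binary(b"ab", 4).hex())     # 61620000
```

For a binary string the weight of each byte is the byte itself, so the second result matches the AS BINARY(4) example directly.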
The LEVEL clause may be given to specify that the return value should contain weights for specific collation levels. The levels specifier following the LEVEL keyword may be given either as a list of one or more integers separated by commas, or as a range of two integers separated by a dash. Whitespace around the punctuation characters does not matter. Examples: LEVEL 1 LEVEL 2, 3, 5 LEVEL 1-3
Any level less than 1 is treated as 1. Any level greater than the maximum for the input string collation is treated as the maximum for the collation. The maximum varies per collation, but is never greater than 6. In a list of levels, levels must be given in increasing order. In a range of levels, if the second number is less than the first, it is treated as the first number (for example, 4-2 is the same as 4-4). If the LEVEL clause is omitted, MySQL assumes LEVEL 1 - max, where max is the maximum level for the collation. If LEVEL is specified using list syntax (not range syntax), any level number can be followed by these modifiers:
• ASC: Return the weights without modification. This is the default.
• DESC: Return bitwise-inverted weights (for example, 0x78f0 DESC = 0x870f).
• REVERSE: Return the weights in reverse order (that is, the weights for the reversed string, with the first character last and the last first).
Examples:
mysql> SELECT HEX(WEIGHT_STRING(0x007fff LEVEL 1));
+--------------------------------------+
| HEX(WEIGHT_STRING(0x007fff LEVEL 1)) |
+--------------------------------------+
| 007FFF                               |
+--------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING(0x007fff LEVEL 1 DESC)); +-------------------------------------------+ | HEX(WEIGHT_STRING(0x007fff LEVEL 1 DESC)) | +-------------------------------------------+
| FF8000 | +-------------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING(0x007fff LEVEL 1 REVERSE)); +----------------------------------------------+ | HEX(WEIGHT_STRING(0x007fff LEVEL 1 REVERSE)) | +----------------------------------------------+ | FF7F00 | +----------------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING(0x007fff LEVEL 1 DESC REVERSE)); +---------------------------------------------------+ | HEX(WEIGHT_STRING(0x007fff LEVEL 1 DESC REVERSE)) | +---------------------------------------------------+ | 0080FF | +---------------------------------------------------+
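The effect of the DESC and REVERSE modifiers on the examples above can be reproduced with a small Python model. It assumes one byte per weight, which holds for the binary input 0x007fff used here but not for collations with multibyte weights. The function name apply_modifiers is illustrative.

```python
def apply_modifiers(weights: bytes, desc=False, reverse=False) -> bytes:
    """Model of the DESC and REVERSE level modifiers on single-byte weights."""
    if desc:
        weights = bytes(b ^ 0xFF for b in weights)   # bitwise inversion
    if reverse:
        weights = weights[::-1]                      # last weight first
    return weights


w = bytes.fromhex("007fff")
print(apply_modifiers(w).hex().upper())                           # 007FFF
print(apply_modifiers(w, desc=True).hex().upper())                # FF8000
print(apply_modifiers(w, reverse=True).hex().upper())             # FF7F00
print(apply_modifiers(w, desc=True, reverse=True).hex().upper())  # 0080FF
```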
The flags clause currently is unused.
12.5.1 String Comparison Functions
Table 12.8 String Comparison Operators
Name        Description
LIKE        Simple pattern matching
NOT LIKE    Negation of simple pattern matching
STRCMP()    Compare two strings
If a string function is given a binary string as an argument, the resulting string is also a binary string. A number converted to a string is treated as a binary string. This affects only comparisons. Normally, if any expression in a string comparison is case sensitive, the comparison is performed in case-sensitive fashion. • expr LIKE pat [ESCAPE 'escape_char'] Pattern matching using an SQL pattern. Returns 1 (TRUE) or 0 (FALSE). If either expr or pat is NULL, the result is NULL. The pattern need not be a literal string. For example, it can be specified as a string expression or table column. Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator: mysql> SELECT 'ä' LIKE 'ae' COLLATE latin1_german2_ci; +-----------------------------------------+ | 'ä' LIKE 'ae' COLLATE latin1_german2_ci | +-----------------------------------------+ | 0 | +-----------------------------------------+ mysql> SELECT 'ä' = 'ae' COLLATE latin1_german2_ci; +--------------------------------------+ | 'ä' = 'ae' COLLATE latin1_german2_ci | +--------------------------------------+ | 1 | +--------------------------------------+
In particular, trailing spaces are significant, which is not true for CHAR or VARCHAR comparisons performed with the = operator: mysql> SELECT 'a' = 'a ', 'a' LIKE 'a ';
+------------+---------------+
| 'a' = 'a ' | 'a' LIKE 'a ' |
+------------+---------------+
|          1 |             0 |
+------------+---------------+
1 row in set (0.00 sec)
With LIKE you can use the following two wildcard characters in the pattern: • % matches any number of characters, even zero characters. • _ matches exactly one character. mysql> SELECT 'David!' LIKE 'David_'; -> 1 mysql> SELECT 'David!' LIKE '%D%v%'; -> 1
To test for literal instances of a wildcard character, precede it by the escape character. If you do not specify the ESCAPE character, \ is assumed. • \% matches one % character. • \_ matches one _ character. mysql> SELECT 'David!' LIKE 'David\_'; -> 0 mysql> SELECT 'David_' LIKE 'David\_'; -> 1
To specify a different escape character, use the ESCAPE clause: mysql> SELECT 'David_' LIKE 'David|_' ESCAPE '|'; -> 1
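The wildcard and escape rules can be modeled by translating a LIKE pattern into a regular expression. The Python sketch below is a simplified model, not MySQL's matcher: it is always case-sensitive (collations are ignored), it uses None for NULL, and the name like is illustrative. The pattern argument is the string as the server would see it after the SQL parser has processed the literal, which is why the examples use Python raw strings.

```python
import re


def like(expr, pattern, escape="\\"):
    """Model of expr LIKE pattern [ESCAPE escape]; returns True, False, or None."""
    if expr is None or pattern is None:
        return None
    regex, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if escape and ch == escape and i + 1 < len(pattern):
            regex.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            regex.append(".*")      # any number of characters, even zero
        elif ch == "_":
            regex.append(".")       # exactly one character
        else:
            regex.append(re.escape(ch))  # includes a trailing escape character
        i += 1
    return re.fullmatch("".join(regex), expr, flags=re.DOTALL) is not None


print(like("David!", "David_"))                 # True
print(like("David!", r"David\_"))               # False
print(like("David_", "David|_", escape="|"))    # True
print(like("a", "a "))                          # False
```

The last call reflects the per-character matching described earlier: the trailing space in the pattern has to match a character in expr.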
The escape sequence should be empty or one character long. The expression must evaluate as a constant at execution time. If the NO_BACKSLASH_ESCAPES SQL mode is enabled, the sequence cannot be empty. The following statements illustrate that string comparisons are not case-sensitive unless one of the operands is case-sensitive (uses a case-sensitive collation or is a binary string): mysql> SELECT 'abc' LIKE 'ABC'; -> 1 mysql> SELECT 'abc' LIKE _latin1 'ABC' COLLATE latin1_general_cs; -> 0 mysql> SELECT 'abc' LIKE _latin1 'ABC' COLLATE latin1_bin; -> 0 mysql> SELECT 'abc' LIKE BINARY 'ABC'; -> 0
As an extension to standard SQL, MySQL permits LIKE on numeric expressions. mysql> SELECT 10 LIKE '1%'; -> 1
Note Because MySQL uses C escape syntax in strings (for example, \n to represent a newline character), you must double any \ that you use in LIKE strings. For example, to search for \n, specify it as \\n. To search for \, specify it as \\\\; this is because the backslashes are stripped once by the
parser and again when the pattern match is made, leaving a single backslash to be matched against. Exception: At the end of the pattern string, backslash can be specified as \\. At the end of the string, backslash stands for itself because there is nothing following to escape. Suppose that a table contains the following values: mysql> SELECT filename FROM t1; +--------------+ | filename | +--------------+ | C: | | C:\ | | C:\Programs | | C:\Programs\ | +--------------+
To test for values that end with backslash, you can match the values using either of the following patterns: mysql> SELECT filename, filename LIKE '%\\' FROM t1; +--------------+---------------------+ | filename | filename LIKE '%\\' | +--------------+---------------------+ | C: | 0 | | C:\ | 1 | | C:\Programs | 0 | | C:\Programs\ | 1 | +--------------+---------------------+ mysql> SELECT filename, filename LIKE '%\\\\' FROM t1; +--------------+-----------------------+ | filename | filename LIKE '%\\\\' | +--------------+-----------------------+ | C: | 0 | | C:\ | 1 | | C:\Programs | 0 | | C:\Programs\ | 1 | +--------------+-----------------------+
• expr NOT LIKE pat [ESCAPE 'escape_char'] This is the same as NOT (expr LIKE pat [ESCAPE 'escape_char']). Note Aggregate queries involving NOT LIKE comparisons with columns containing NULL may yield unexpected results. For example, consider the following table and data: CREATE TABLE foo (bar VARCHAR(10)); INSERT INTO foo VALUES (NULL), (NULL);
The query SELECT COUNT(*) FROM foo WHERE bar LIKE '%baz%'; returns 0. You might assume that SELECT COUNT(*) FROM foo WHERE bar NOT LIKE '%baz%'; would return 2. However, this is not the case: The second query returns 0. This is because NULL NOT LIKE expr always returns NULL, regardless of the value of expr. The same is true for aggregate queries involving NULL and comparisons using NOT RLIKE or NOT REGEXP. In such cases, you must test explicitly for NOT NULL using OR (and not AND), as shown here:
SELECT COUNT(*) FROM foo WHERE bar NOT LIKE '%baz%' OR bar IS NULL;
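The NULL behavior described in this note can be reproduced with a short script. The following is a minimal sketch, not MySQL itself: it uses Python's built-in sqlite3 module as a stand-in server, on the assumption that NULL propagation through LIKE and NOT LIKE follows standard SQL in both systems. The helper name count_rows is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server (assumption:
# NULL handling in LIKE comparisons follows standard SQL in both systems).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (bar VARCHAR(10))")
conn.execute("INSERT INTO foo VALUES (NULL), (NULL)")

def count_rows(where_clause):
    """Return COUNT(*) for the rows of foo that satisfy where_clause."""
    sql = "SELECT COUNT(*) FROM foo WHERE " + where_clause
    return conn.execute(sql).fetchone()[0]

like_count = count_rows("bar LIKE '%baz%'")
not_like_count = count_rows("bar NOT LIKE '%baz%'")
with_null_test = count_rows("bar NOT LIKE '%baz%' OR bar IS NULL")

print(like_count, not_like_count, with_null_test)  # 0 0 2
```

Both comparisons evaluate to NULL for every row, so neither query counts anything; only the explicit IS NULL test, joined with OR, brings the two rows back.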
• STRCMP(expr1,expr2) STRCMP() returns 0 if the strings are the same, -1 if the first argument is smaller than the second according to the current sort order, and 1 otherwise. mysql> SELECT STRCMP('text', 'text2'); -> -1 mysql> SELECT STRCMP('text2', 'text'); -> 1 mysql> SELECT STRCMP('text', 'text'); -> 0
STRCMP() performs the comparison using the collation of the arguments. mysql> SET @s1 = _latin1 'x' COLLATE latin1_general_ci; mysql> SET @s2 = _latin1 'X' COLLATE latin1_general_ci; mysql> SET @s3 = _latin1 'x' COLLATE latin1_general_cs; mysql> SET @s4 = _latin1 'X' COLLATE latin1_general_cs; mysql> SELECT STRCMP(@s1, @s2), STRCMP(@s3, @s4); +------------------+------------------+ | STRCMP(@s1, @s2) | STRCMP(@s3, @s4) | +------------------+------------------+ | 0 | 1 | +------------------+------------------+
If the collations are incompatible, one of the arguments must be converted to be compatible with the other. See Section 10.8.4, “Collation Coercibility in Expressions”.
mysql> SELECT STRCMP(@s1, @s3); ERROR 1267 (HY000): Illegal mix of collations (latin1_general_ci,IMPLICIT) and (latin1_general_cs,IMPLICIT) for operation 'strcmp' mysql> SELECT STRCMP(@s1, @s3 COLLATE latin1_general_ci); +--------------------------------------------+ | STRCMP(@s1, @s3 COLLATE latin1_general_ci) | +--------------------------------------------+ | 0 | +--------------------------------------------+
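The return-value convention of STRCMP() can be sketched as a three-way comparison. This is an illustration only: the function strcmp and its case_sensitive flag are hypothetical, case folding is a crude stand-in for a _ci collation, and plain code-point order is a crude stand-in for a _cs collation. Real collations define their own orderings.

```python
def strcmp(expr1, expr2, case_sensitive=True):
    """Three-way comparison returning -1, 0, or 1, like STRCMP()."""
    if not case_sensitive:
        # Stand-in for a case-insensitive collation: compare folded strings.
        expr1, expr2 = expr1.casefold(), expr2.casefold()
    # True and False behave as 1 and 0, so this yields -1, 0, or 1.
    return (expr1 > expr2) - (expr1 < expr2)

print(strcmp('text', 'text2'))                 # -1
print(strcmp('text2', 'text'))                 # 1
print(strcmp('text', 'text'))                  # 0
print(strcmp('x', 'X', case_sensitive=False))  # 0
print(strcmp('x', 'X', case_sensitive=True))   # 1
```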
12.5.2 Regular Expressions

Table 12.9 Regular Expression Operators

Name          Description
NOT REGEXP    Negation of REGEXP
REGEXP        Whether string matches regular expression
RLIKE         Whether string matches regular expression
A regular expression is a powerful way of specifying a pattern for a complex search. This section discusses the operators available for regular expression matching and illustrates, with examples, some of the special characters and constructs that can be used for regular expression operations. See also Section 3.3.4.7, “Pattern Matching”. MySQL uses Henry Spencer's implementation of regular expressions, which is aimed at conformance with POSIX 1003.2. MySQL uses the extended version to support regular expression pattern-matching operations in SQL statements. This section does not contain all the details that can be found in Henry Spencer's regex(7) manual page. That manual page is included in MySQL source distributions, in the regex.7 file under the regex directory. • Regular Expression Operators
• Regular Expression Syntax
Regular Expression Operators • expr NOT REGEXP pat, expr NOT RLIKE pat This is the same as NOT (expr REGEXP pat). •
expr REGEXP pat, expr RLIKE pat Returns 1 if the string expr matches the regular expression specified by the pattern pat, 0 otherwise. If either expr or pat is NULL, the return value is NULL. RLIKE is a synonym for REGEXP. The pattern can be an extended regular expression, the syntax for which is discussed in Regular Expression Syntax. The pattern need not be a literal string. For example, it can be specified as a string expression or table column. Note Because MySQL uses the C escape syntax in strings (for example, \n to represent the newline character), you must double any \ that you use in your REGEXP arguments. Regular expression operations use the character set and collation of the string expression and pattern arguments when deciding the type of a character and performing the comparison. If the arguments have different character sets or collations, coercibility rules apply as described in Section 10.8.4, “Collation Coercibility in Expressions”. If either argument is a binary string, the arguments are handled in case-sensitive fashion as binary strings. mysql> SELECT 'Michael!' REGEXP '.*'; +------------------------+ | 'Michael!' REGEXP '.*' | +------------------------+ | 1 | +------------------------+ mysql> SELECT 'new*\n*line' REGEXP 'new\\*.\\*line'; +---------------------------------------+ | 'new*\n*line' REGEXP 'new\\*.\\*line' | +---------------------------------------+ | 0 | +---------------------------------------+ mysql> SELECT 'a' REGEXP '^[a-d]'; +---------------------+ | 'a' REGEXP '^[a-d]' | +---------------------+ | 1 | +---------------------+ mysql> SELECT 'a' REGEXP 'A', 'a' REGEXP BINARY 'A'; +----------------+-----------------------+ | 'a' REGEXP 'A' | 'a' REGEXP BINARY 'A' | +----------------+-----------------------+ | 1 | 0 | +----------------+-----------------------+
Warning The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multibyte safe and may produce unexpected results with multibyte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
Regular Expression Syntax

A regular expression describes a set of strings. The simplest regular expression is one that has no special characters in it. For example, the regular expression hello matches hello and nothing else.

Nontrivial regular expressions use certain special constructs so that they can match more than one string. For example, the regular expression hello|world contains the | alternation operator and matches either hello or world.

As a more complex example, the regular expression B[an]*s matches any of the strings Bananas, Baaaaas, Bs, and any other string starting with a B, ending with an s, and containing any number of a or n characters in between.

A regular expression for the REGEXP operator may use any of the following special characters and constructs:

• ^
Match the beginning of a string.
mysql> SELECT 'fo\nfo' REGEXP '^fo$';                   -> 0
mysql> SELECT 'fofo' REGEXP '^fo';                      -> 1

• $
Match the end of a string.
mysql> SELECT 'fo\no' REGEXP '^fo\no$';                 -> 1
mysql> SELECT 'fo\no' REGEXP '^fo$';                    -> 0

• .
Match any character (including carriage return and newline).
mysql> SELECT 'fofo' REGEXP '^f.*$';                    -> 1
mysql> SELECT 'fo\r\nfo' REGEXP '^f.*$';                -> 1

• a*
Match any sequence of zero or more a characters.
mysql> SELECT 'Ban' REGEXP '^Ba*n';                     -> 1
mysql> SELECT 'Baaan' REGEXP '^Ba*n';                   -> 1
mysql> SELECT 'Bn' REGEXP '^Ba*n';                      -> 1

• a+
Match any sequence of one or more a characters.
mysql> SELECT 'Ban' REGEXP '^Ba+n';                     -> 1
mysql> SELECT 'Bn' REGEXP '^Ba+n';                      -> 0

• a?
Match either zero or one a character.
mysql> SELECT 'Bn' REGEXP '^Ba?n';                      -> 1
mysql> SELECT 'Ban' REGEXP '^Ba?n';                     -> 1
mysql> SELECT 'Baan' REGEXP '^Ba?n';                    -> 0
• de|abc
Alternation; match either of the sequences de or abc.
mysql> SELECT 'pi' REGEXP 'pi|apa';                     -> 1
mysql> SELECT 'axe' REGEXP 'pi|apa';                    -> 0
mysql> SELECT 'apa' REGEXP 'pi|apa';                    -> 1
mysql> SELECT 'apa' REGEXP '^(pi|apa)$';                -> 1
mysql> SELECT 'pi' REGEXP '^(pi|apa)$';                 -> 1
mysql> SELECT 'pix' REGEXP '^(pi|apa)$';                -> 0
• (abc)*
Match zero or more instances of the sequence abc.
mysql> SELECT 'pi' REGEXP '^(pi)*$';                    -> 1
mysql> SELECT 'pip' REGEXP '^(pi)*$';                   -> 0
mysql> SELECT 'pipi' REGEXP '^(pi)*$';                  -> 1
• {1}, {2,3}
Repetition; {n} and {m,n} notation provide a more general way of writing regular expressions that match many occurrences of the previous atom (or “piece”) of the pattern. m and n are integers.

  • a* Can be written as a{0,}.
  • a+ Can be written as a{1,}.
  • a? Can be written as a{0,1}.

To be more precise, a{n} matches exactly n instances of a. a{n,} matches n or more instances of a. a{m,n} matches m through n instances of a, inclusive. If both m and n are given, m must be less than or equal to n.

m and n must be in the range from 0 to RE_DUP_MAX (default 255), inclusive.
mysql> SELECT 'abcde' REGEXP 'a[bcd]{2}e';              -> 0
mysql> SELECT 'abcde' REGEXP 'a[bcd]{3}e';              -> 1
mysql> SELECT 'abcde' REGEXP 'a[bcd]{1,10}e';           -> 1
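The repetition rules can be tried outside the server as well. The sketch below uses Python's re module, which is not Henry Spencer's library; it is assumed here only that the brace quantifiers and the *, +, and ? shorthands behave the same way for these simple patterns. The helper regexp is hypothetical and mimics the 1/0 result of expr REGEXP pat.

```python
import re

def regexp(expr, pat):
    """Return 1 if pat matches somewhere in expr, else 0 (like expr REGEXP pat)."""
    return 1 if re.search(pat, expr) else 0

# Exactly 2, exactly 3, and 1 to 10 repetitions of [bcd] between a and e.
results = [
    regexp('abcde', 'a[bcd]{2}e'),
    regexp('abcde', 'a[bcd]{3}e'),
    regexp('abcde', 'a[bcd]{1,10}e'),
]
print(results)  # [0, 1, 1]

# Each shorthand quantifier agrees with its brace form on these inputs.
pairs = [('^Ba*n', '^Ba{0,}n'), ('^Ba+n', '^Ba{1,}n'), ('^Ba?n', '^Ba{0,1}n')]
agreement = all(
    regexp(s, short) == regexp(s, braces)
    for short, braces in pairs
    for s in ('Bn', 'Ban', 'Baan', 'Baaan')
)
print(agreement)  # True
```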
• [a-dX], [^a-dX]
Matches any character that is (or is not, if ^ is used) either a, b, c, d or X. A - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.
mysql> SELECT 'aXbc' REGEXP '[a-dXYZ]';                 -> 1
mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]$';               -> 0
mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]+$';              -> 1
mysql> SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$';             -> 0
mysql> SELECT 'gheis' REGEXP '^[^a-dXYZ]+$';            -> 1
mysql> SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$';           -> 0

• [.characters.]
Within a bracket expression (written using [ and ]), matches the sequence of characters of that collating element. characters is either a single character or a character name like newline. The following table shows the permissible character names and the characters that they match. For characters given as numeric values, the values are represented in octal.
Name                    Character
NUL                     0
SOH                     001
STX                     002
ETX                     003
EOT                     004
ENQ                     005
ACK                     006
BEL                     007
alert                   007
BS                      010
backspace               '\b'
HT                      011
tab                     '\t'
LF                      012
newline                 '\n'
VT                      013
vertical-tab            '\v'
FF                      014
form-feed               '\f'
CR                      015
carriage-return         '\r'
SO                      016
SI                      017
DLE                     020
DC1                     021
DC2                     022
DC3                     023
DC4                     024
NAK                     025
SYN                     026
ETB                     027
CAN                     030
EM                      031
SUB                     032
ESC                     033
IS4                     034
FS                      034
IS3                     035
GS                      035
IS2                     036
RS                      036
IS1                     037
US                      037
space                   ' '
exclamation-mark        '!'
quotation-mark          '"'
number-sign             '#'
dollar-sign             '$'
percent-sign            '%'
ampersand               '&'
apostrophe              '\''
left-parenthesis        '('
right-parenthesis       ')'
asterisk                '*'
plus-sign               '+'
comma                   ','
hyphen                  '-'
hyphen-minus            '-'
period                  '.'
full-stop               '.'
slash                   '/'
solidus                 '/'
zero                    '0'
one                     '1'
two                     '2'
three                   '3'
four                    '4'
five                    '5'
six                     '6'
seven                   '7'
eight                   '8'
nine                    '9'
colon                   ':'
semicolon               ';'
less-than-sign          '<'
equals-sign             '='
greater-than-sign       '>'
question-mark           '?'
commercial-at           '@'
left-square-bracket     '['
backslash               '\\'
reverse-solidus         '\\'
right-square-bracket    ']'
circumflex              '^'
circumflex-accent       '^'
underscore              '_'
low-line                '_'
grave-accent            '`'
left-brace              '{'
left-curly-bracket      '{'
vertical-line           '|'
right-brace             '}'
right-curly-bracket     '}'
tilde                   '~'
DEL                     177
mysql> SELECT '~' REGEXP '[[.~.]]';                     -> 1
mysql> SELECT '~' REGEXP '[[.tilde.]]';                 -> 1
• [=character_class=] Within a bracket expression (written using [ and ]), [=character_class=] represents an equivalence class. It matches all characters with the same collation value, including itself. For example, if o and (+) are the members of an equivalence class, [[=o=]], [[=(+)=]], and [o(+)] are all synonymous. An equivalence class may not be used as an endpoint of a range. • [:character_class:] Within a bracket expression (written using [ and ]), [:character_class:] represents a character class that matches all characters belonging to that class. The following table lists the standard class names. These names stand for the character classes defined in the ctype(3) manual page. A particular locale may provide other class names. A character class may not be used as an endpoint of a range.
Character Class Name    Meaning
alnum                   Alphanumeric characters
alpha                   Alphabetic characters
blank                   Whitespace characters
cntrl                   Control characters
digit                   Digit characters
graph                   Graphic characters
lower                   Lowercase alphabetic characters
print                   Graphic or space characters
punct                   Punctuation characters
space                   Space, tab, newline, and carriage return
upper                   Uppercase alphabetic characters
xdigit                  Hexadecimal digit characters
mysql> SELECT 'justalnums' REGEXP '[[:alnum:]]+';       -> 1
mysql> SELECT '!!' REGEXP '[[:alnum:]]+';               -> 0

• [[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]';   -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]';  -> 0
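Readers more familiar with Perl-style syntax may find it helpful to compare the word-boundary markers with \b. The sketch below is an analogy, not an equivalence proof: it assumes that Python's \b, which also treats alphanumerics and the underscore as word characters, is close enough to [[:<:]] and [[:>:]] for these ASCII examples. The helper has_word is hypothetical.

```python
import re

def has_word(text, word):
    """Return 1 if word occurs in text as a whole word, else 0."""
    # \b on each side plays the role of [[:<:]] and [[:>:]].
    pattern = r'\b' + re.escape(word) + r'\b'
    return 1 if re.search(pattern, text) else 0

print(has_word('a word a', 'word'))   # 1
print(has_word('a xword a', 'word'))  # 0
print(has_word('a word_ a', 'word'))  # 0, because the underscore is a word character
```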
To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one:
mysql> SELECT '1+2' REGEXP '1+2';                       -> 0
mysql> SELECT '1+2' REGEXP '1\+2';                      -> 0
mysql> SELECT '1+2' REGEXP '1\\+2';                     -> 1
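The two layers of backslash interpretation can be modeled in two explicit steps. The following is a simplified sketch: mysql_unescape is a hypothetical helper that models only a few string-literal escapes (and drops the backslash before any other character), and Python's re module stands in for the regular expression library.

```python
import re

def mysql_unescape(literal):
    """Apply a simplified version of the string-literal escape pass."""
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == '\\' and i + 1 < len(literal):
            nxt = literal[i + 1]
            # Only a few escapes are modeled; any other escaped character
            # stands for itself, so the backslash is dropped.
            out.append({'n': '\n', 't': '\t', '\\': '\\'}.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return ''.join(out)

def regexp(expr, sql_pattern):
    """Unescape the literal first, then hand the result to the regex engine."""
    pattern = mysql_unescape(sql_pattern)
    return 1 if re.search(pattern, expr) else 0

# Raw strings hold exactly the characters typed between the SQL quotes.
print(regexp('1+2', r'1+2'))     # 0: + is a quantifier
print(regexp('1+2', r'1\+2'))    # 0: the parser consumes the only backslash
print(regexp('1+2', r'1\\+2'))   # 1: one backslash survives to escape the +
```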
12.5.3 Character Set and Collation of Function Results

MySQL has many operators and functions that return a string. This section answers the question: What is the character set and collation of such a string?

For simple functions that take string input and return a string result as output, the output's character set and collation are the same as those of the principal input value. For example, UPPER(X) returns a string with the same character set and collation as X. The same applies for INSTR(), LCASE(), LOWER(), LTRIM(), MID(), REPEAT(), REPLACE(), REVERSE(), RIGHT(), RPAD(), RTRIM(), SOUNDEX(), SUBSTRING(), TRIM(), UCASE(), and UPPER().

Note
The REPLACE() function, unlike all other functions, always ignores the collation of the string input and performs a case-sensitive comparison.

If a string input or function result is a binary string, the string has the binary character set and collation. This can be checked by using the CHARSET() and COLLATION() functions, both of which return binary for a binary string argument:

mysql> SELECT CHARSET(BINARY 'a'), COLLATION(BINARY 'a');
+---------------------+-----------------------+
| CHARSET(BINARY 'a') | COLLATION(BINARY 'a') |
+---------------------+-----------------------+
| binary              | binary                |
+---------------------+-----------------------+
For operations that combine multiple string inputs and return a single string output, the “aggregation rules” of standard SQL apply for determining the collation of the result:

• If an explicit COLLATE Y occurs, use Y.
• If explicit COLLATE Y and COLLATE Z occur, raise an error.
• Otherwise, if all collations are Y, use Y.
• Otherwise, the result has no collation.

For example, with CASE ... WHEN a THEN b WHEN b THEN c COLLATE X END, the resulting collation is X. The same applies for UNION, ||, CONCAT(), ELT(), GREATEST(), IF(), and LEAST().

For operations that convert to character data, the character set and collation of the strings that result from the operations are defined by the character_set_connection and collation_connection system variables that determine the default connection character set and collation (see Section 10.4, “Connection Character Sets and Collations”). This applies only to CAST(), CONV(), FORMAT(), HEX(), and SPACE().

As of MySQL 5.7.19, an exception to the preceding principle occurs for expressions for virtual generated columns. In such expressions, the table character set is used for CONV() or HEX() results, regardless of connection character set.

If there is any question about the character set or collation of the result returned by a string function, use the CHARSET() or COLLATION() function to find out:

mysql> SELECT USER(), CHARSET(USER()), COLLATION(USER());
+----------------+-----------------+-------------------+
| USER()         | CHARSET(USER()) | COLLATION(USER()) |
+----------------+-----------------+-------------------+
| test@localhost | utf8            | utf8_general_ci   |
+----------------+-----------------+-------------------+
mysql> SELECT CHARSET(COMPRESS('abc')), COLLATION(COMPRESS('abc'));
+--------------------------+----------------------------+
| CHARSET(COMPRESS('abc')) | COLLATION(COMPRESS('abc')) |
+--------------------------+----------------------------+
| binary                   | binary                     |
+--------------------------+----------------------------+
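The four aggregation rules listed earlier in this section can be written as a small decision function. This is a minimal sketch of those rules exactly as stated, not of the server's full coercibility logic; the function aggregate_collation and its (name, is_explicit) input format are hypothetical.

```python
def aggregate_collation(operands):
    """Pick a result collation from (collation_name, is_explicit) pairs.

    Returns the collation name, or None when the result has no collation.
    Raises ValueError when two different explicit collations are mixed.
    """
    explicit = {name for name, is_explicit in operands if is_explicit}
    if len(explicit) > 1:
        # Explicit COLLATE Y and COLLATE Z both occur: error.
        raise ValueError("conflicting explicit collations: %s" % sorted(explicit))
    if len(explicit) == 1:
        # A single explicit COLLATE Y wins.
        return next(iter(explicit))
    names = {name for name, _ in operands}
    if len(names) == 1:
        # No explicit collation, but every operand agrees on Y.
        return next(iter(names))
    return None  # otherwise the result has no collation

# CASE ... WHEN a THEN b WHEN b THEN c COLLATE X END: the explicit X wins.
print(aggregate_collation([("Y", False), ("Z", False), ("X", True)]))  # X
print(aggregate_collation([("Y", False), ("Y", False)]))               # Y
print(aggregate_collation([("Y", False), ("Z", False)]))               # None
```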
12.6 Numeric Functions and Operators

Table 12.10 Numeric Functions and Operators

Name               Description
ABS()              Return the absolute value
ACOS()             Return the arc cosine
ASIN()             Return the arc sine
ATAN()             Return the arc tangent
ATAN2(), ATAN()    Return the arc tangent of the two arguments
CEIL()             Return the smallest integer value not less than the argument
CEILING()          Return the smallest integer value not less than the argument
CONV()             Convert numbers between different number bases
COS()              Return the cosine
COT()              Return the cotangent
CRC32()            Compute a cyclic redundancy check value
DEGREES()          Convert radians to degrees
DIV                Integer division
/                  Division operator
EXP()              Raise to the power of
FLOOR()            Return the largest integer value not greater than the argument
LN()               Return the natural logarithm of the argument
LOG()              Return the natural logarithm of the first argument
LOG10()            Return the base-10 logarithm of the argument
LOG2()             Return the base-2 logarithm of the argument
-                  Minus operator
MOD()              Return the remainder
%, MOD             Modulo operator
PI()               Return the value of pi
+                  Addition operator
POW()              Return the argument raised to the specified power
POWER()            Return the argument raised to the specified power
RADIANS()          Return argument converted to radians
RAND()             Return a random floating-point value
ROUND()            Round the argument
SIGN()             Return the sign of the argument
SIN()              Return the sine of the argument
SQRT()             Return the square root of the argument
TAN()              Return the tangent of the argument
*                  Multiplication operator
TRUNCATE()         Truncate to specified number of decimal places
-                  Change the sign of the argument
12.6.1 Arithmetic Operators

Table 12.11 Arithmetic Operators

Name      Description
DIV       Integer division
/         Division operator
-         Minus operator
%, MOD    Modulo operator
+         Addition operator
*         Multiplication operator
-         Change the sign of the argument
The usual arithmetic operators are available. The result is determined according to the following rules: • In the case of -, +, and *, the result is calculated with BIGINT (64-bit) precision if both operands are integers. • If both operands are integers and any of them are unsigned, the result is an unsigned integer. For subtraction, if the NO_UNSIGNED_SUBTRACTION SQL mode is enabled, the result is signed even if any operand is unsigned. • If any of the operands of a +, -, /, *, % is a real or string value, the precision of the result is the precision of the operand with the maximum precision.
• In division performed with /, the scale of the result when using two exact-value operands is the scale of the first operand plus the value of the div_precision_increment system variable (which is 4 by default). For example, the result of the expression 5.05 / 0.014 has a scale of six decimal places (360.714286).

These rules are applied for each operation, such that nested calculations imply the precision of each component. Hence, (14620 / 9432456) / (24250 / 9432456) resolves first to (0.0014) / (0.0026), with the final result having 8 decimal places (0.60288653).

Because of these rules and the way they are applied, care should be taken to ensure that components and subcomponents of a calculation use the appropriate level of precision. See Section 12.10, “Cast Functions and Operators”.

For information about handling of overflow in numeric expression evaluation, see Section 11.2.6, “Out-of-Range and Overflow Handling”.

Arithmetic operators apply to numbers. For other types of values, alternative operations may be available. For example, to add date values, use DATE_ADD(); see Section 12.7, “Date and Time Functions”.

• +
Addition:
mysql> SELECT 3+5;
        -> 8

• -
Subtraction:
mysql> SELECT 3-5;
        -> -2

• -
Unary minus. This operator changes the sign of the operand.
mysql> SELECT - 2;
        -> -2

Note
If this operator is used with a BIGINT, the return value is also a BIGINT. This means that you should avoid using - on integers that may have the value of -2^63.

• *
Multiplication:
mysql> SELECT 3*5;
        -> 15
mysql> SELECT 18014398509481984*18014398509481984.0;
        -> 324518553658426726783156020576256.0
mysql> SELECT 18014398509481984*18014398509481984;
        -> out-of-range error

The last expression produces an error because the result of the integer multiplication exceeds the 64-bit range of BIGINT calculations. (See Section 11.2, “Numeric Types”.)

• /
Division:
mysql> SELECT 3/5;
        -> 0.60

Division by zero produces a NULL result:
mysql> SELECT 102/(1-1);
        -> NULL

A division is calculated with BIGINT arithmetic only if performed in a context where its result is converted to an integer.

• DIV
Integer division. Discards from the division result any fractional part to the right of the decimal point.

If either operand has a noninteger type, the operands are converted to DECIMAL and divided using DECIMAL arithmetic before converting the result to BIGINT. If the result exceeds BIGINT range, an error occurs.
mysql> SELECT 5 DIV 2, -5 DIV 2, 5 DIV -2, -5 DIV -2;
        -> 2, -2, -2, 2
• N % M, N MOD M Modulo operation. Returns the remainder of N divided by M. For more information, see the description for the MOD() function in Section 12.6.2, “Mathematical Functions”.
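Because DIV discards the fractional part, it truncates toward zero, which differs from floor division in languages where the quotient is rounded down. The sketch below illustrates the difference for integer operands only; sql_div is a hypothetical helper, it assumes a nonzero divisor, and it ignores the DECIMAL conversion and BIGINT range checks described above.

```python
def sql_div(n, m):
    """Integer division that discards the fractional part (truncates toward zero)."""
    quotient = abs(n) // abs(m)
    # The result is negative exactly when the operands have opposite signs.
    return quotient if (n >= 0) == (m >= 0) else -quotient

# Mirrors: SELECT 5 DIV 2, -5 DIV 2, 5 DIV -2, -5 DIV -2;
print([sql_div(5, 2), sql_div(-5, 2), sql_div(5, -2), sql_div(-5, -2)])  # [2, -2, -2, 2]

# Python's own // operator rounds toward negative infinity instead.
print(-5 // 2)  # -3
```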
12.6.2 Mathematical Functions

Table 12.12 Mathematical Functions

Name               Description
ABS()              Return the absolute value
ACOS()             Return the arc cosine
ASIN()             Return the arc sine
ATAN()             Return the arc tangent
ATAN2(), ATAN()    Return the arc tangent of the two arguments
CEIL()             Return the smallest integer value not less than the argument
CEILING()          Return the smallest integer value not less than the argument
CONV()             Convert numbers between different number bases
COS()              Return the cosine
COT()              Return the cotangent
CRC32()            Compute a cyclic redundancy check value
DEGREES()          Convert radians to degrees
EXP()              Raise to the power of
FLOOR()            Return the largest integer value not greater than the argument
LN()               Return the natural logarithm of the argument
LOG()              Return the natural logarithm of the first argument
LOG10()            Return the base-10 logarithm of the argument
LOG2()             Return the base-2 logarithm of the argument
MOD()              Return the remainder
PI()               Return the value of pi
POW()              Return the argument raised to the specified power
POWER()            Return the argument raised to the specified power
RADIANS()          Return argument converted to radians
RAND()             Return a random floating-point value
ROUND()            Round the argument
SIGN()             Return the sign of the argument
SIN()              Return the sine of the argument
SQRT()             Return the square root of the argument
TAN()              Return the tangent of the argument
TRUNCATE()         Truncate to specified number of decimal places
All mathematical functions return NULL in the event of an error. • ABS(X) Returns the absolute value of X. mysql> SELECT ABS(2); -> 2 mysql> SELECT ABS(-32); -> 32
This function is safe to use with BIGINT values. • ACOS(X) Returns the arc cosine of X, that is, the value whose cosine is X. Returns NULL if X is not in the range -1 to 1. mysql> SELECT ACOS(1); -> 0 mysql> SELECT ACOS(1.0001); -> NULL mysql> SELECT ACOS(0); -> 1.5707963267949
• ASIN(X) Returns the arc sine of X, that is, the value whose sine is X. Returns NULL if X is not in the range -1 to 1. mysql> SELECT ASIN(0.2); -> 0.20135792079033 mysql> SELECT ASIN('foo'); +-------------+ | ASIN('foo') | +-------------+ | 0 | +-------------+ 1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS;
+---------+------+-----------------------------------------+ | Level | Code | Message | +---------+------+-----------------------------------------+ | Warning | 1292 | Truncated incorrect DOUBLE value: 'foo' | +---------+------+-----------------------------------------+
• ATAN(X) Returns the arc tangent of X, that is, the value whose tangent is X. mysql> SELECT ATAN(2); -> 1.1071487177941 mysql> SELECT ATAN(-2); -> -1.1071487177941
• ATAN(Y,X), ATAN2(Y,X) Returns the arc tangent of the two variables X and Y. It is similar to calculating the arc tangent of Y / X, except that the signs of both arguments are used to determine the quadrant of the result. mysql> SELECT ATAN(-2,2); -> -0.78539816339745 mysql> SELECT ATAN2(PI(),0); -> 1.5707963267949
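The quadrant handling of the two-argument form can be checked with any standard math library. The sketch below uses Python's math.atan2 as a stand-in, on the assumption that it follows the same convention as the two-argument ATAN() for these inputs.

```python
import math

# Same inputs as the examples above.
print(math.atan2(-2, 2))        # about -0.785398 (-pi/4)
print(math.atan2(math.pi, 0))   # about 1.570796 (pi/2); Y / X would divide by zero

# atan(Y / X) cannot tell (1, 1) from (-1, -1); atan2 can.
same_ratio = math.atan(1 / 1) == math.atan(-1 / -1)
first_quadrant = math.atan2(1, 1)     # pi/4
third_quadrant = math.atan2(-1, -1)   # -3*pi/4
print(same_ratio, first_quadrant, third_quadrant)
```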
• CEIL(X) CEIL() is a synonym for CEILING(). • CEILING(X) Returns the smallest integer value not less than X. mysql> SELECT CEILING(1.23); -> 2 mysql> SELECT CEILING(-1.23); -> -1
For exact-value numeric arguments, the return value has an exact-value numeric type. For string or floating-point arguments, the return value has a floating-point type. • CONV(N,from_base,to_base) Converts numbers between different number bases. Returns a string representation of the number N, converted from base from_base to base to_base. Returns NULL if any argument is NULL. The argument N is interpreted as an integer, but may be specified as an integer or a string. The minimum base is 2 and the maximum base is 36. If from_base is a negative number, N is regarded as a signed number. Otherwise, N is treated as unsigned. CONV() works with 64-bit precision. mysql> SELECT CONV('a',16,2); -> '1010' mysql> SELECT CONV('6E',18,8); -> '172' mysql> SELECT CONV(-17,10,-18); -> '-H' mysql> SELECT CONV(10+'10'+'10'+X'0a',10,10); -> '40'
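The base conversion that CONV() performs can be sketched in a few lines. This is a simplified illustration, not the server's algorithm: the function conv is hypothetical, it handles only non-negative values and positive bases, and it raises an exception for an invalid base rather than modeling the server's behavior. Signed handling and 64-bit wraparound are left out.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def conv(n, from_base, to_base):
    """Convert the non-negative integer written as n in from_base to to_base."""
    if n is None or from_base is None or to_base is None:
        return None  # any NULL argument gives NULL
    if not (2 <= from_base <= 36 and 2 <= to_base <= 36):
        raise ValueError("this sketch supports bases 2 through 36 only")
    value = int(str(n), from_base)  # parse the digits in the source base
    if value < 0:
        raise ValueError("this sketch supports non-negative values only")
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, to_base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))

print(conv('a', 16, 2))    # 1010
print(conv('6E', 18, 8))   # 172
```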
• COS(X) Returns the cosine of X, where X is given in radians. mysql> SELECT COS(PI());
-> -1
• COT(X) Returns the cotangent of X. mysql> SELECT COT(12); -> -1.5726734063977 mysql> SELECT COT(0); -> out-of-range error
• CRC32(expr) Computes a cyclic redundancy check value and returns a 32-bit unsigned value. The result is NULL if the argument is NULL. The argument is expected to be a string and (if possible) is treated as one if it is not. mysql> SELECT CRC32('MySQL'); -> 3259397556 mysql> SELECT CRC32('mysql'); -> 2501908538
• DEGREES(X) Returns the argument X, converted from radians to degrees. mysql> SELECT DEGREES(PI()); -> 180 mysql> SELECT DEGREES(PI() / 2); -> 90
• EXP(X) Returns the value of e (the base of natural logarithms) raised to the power of X. The inverse of this function is LOG() (using a single argument only) or LN(). mysql> SELECT EXP(2); -> 7.3890560989307 mysql> SELECT EXP(-2); -> 0.13533528323661 mysql> SELECT EXP(0); -> 1
• FLOOR(X) Returns the largest integer value not greater than X. mysql> SELECT FLOOR(1.23), FLOOR(-1.23); -> 1, -2
For exact-value numeric arguments, the return value has an exact-value numeric type. For string or floating-point arguments, the return value has a floating-point type. • FORMAT(X,D) Formats the number X to a format like '#,###,###.##', rounded to D decimal places, and returns the result as a string. For details, see Section 12.5, “String Functions”. • HEX(N_or_S)
This function can be used to obtain a hexadecimal representation of a decimal number or a string; the manner in which it does so varies according to the argument's type. See this function's description in Section 12.5, “String Functions”, for details. • LN(X) Returns the natural logarithm of X; that is, the base-e logarithm of X. If X is less than or equal to 0.0E0, the function returns NULL and (as of MySQL 5.7.4) a warning “Invalid argument for logarithm” is reported. mysql> SELECT LN(2); -> 0.69314718055995 mysql> SELECT LN(-2); -> NULL
This function is synonymous with LOG(X). The inverse of this function is the EXP() function. • LOG(X), LOG(B,X) If called with one parameter, this function returns the natural logarithm of X. If X is less than or equal to 0.0E0, the function returns NULL and (as of MySQL 5.7.4) a warning “Invalid argument for logarithm” is reported. The inverse of this function (when called with a single argument) is the EXP() function. mysql> SELECT LOG(2); -> 0.69314718055995 mysql> SELECT LOG(-2); -> NULL
If called with two parameters, this function returns the logarithm of X to the base B. If X is less than or equal to 0, or if B is less than or equal to 1, then NULL is returned. mysql> SELECT LOG(2,65536); -> 16 mysql> SELECT LOG(10,100); -> 2 mysql> SELECT LOG(1,100); -> NULL
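The argument checks and the two-argument form can be sketched with the change-of-base identity. The helper sql_log below is hypothetical; it returns None where the text above says NULL, does not model the warning, and computes in double precision, so results are compared with a tolerance rather than printed as exact values.

```python
import math

def sql_log(*args):
    """LOG(X) or LOG(B, X); returns None where the manual says NULL."""
    if len(args) == 1:
        (x,) = args
        return math.log(x) if x > 0 else None
    b, x = args
    if x <= 0 or b <= 1:
        return None
    # Change of base: log_B(X) = ln(X) / ln(B).
    return math.log(x) / math.log(b)

print(math.isclose(sql_log(2, 65536), 16))   # True
print(math.isclose(sql_log(10, 100), 2))     # True
print(sql_log(1, 100))                       # None
print(sql_log(-2))                           # None
```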
LOG(B,X) is equivalent to LOG(X) / LOG(B). • LOG2(X) Returns the base-2 logarithm of X. If X is less than or equal to 0.0E0, the function returns NULL and (as of MySQL 5.7.4) a warning “Invalid argument for logarithm” is reported. mysql> SELECT LOG2(65536); -> 16 mysql> SELECT LOG2(-100); -> NULL
LOG2() is useful for finding out how many bits a number requires for storage. This function is equivalent to the expression LOG(X) / LOG(2). • LOG10(X) Returns the base-10 logarithm of X. If X is less than or equal to 0.0E0, the function returns NULL and (as of MySQL 5.7.4) a warning “Invalid argument for logarithm” is reported.
mysql> SELECT LOG10(2); -> 0.30102999566398 mysql> SELECT LOG10(100); -> 2 mysql> SELECT LOG10(-100); -> NULL
LOG10(X) is equivalent to LOG(10,X).

• MOD(N,M), N % M, N MOD M
Modulo operation. Returns the remainder of N divided by M.
mysql> SELECT MOD(234, 10);
        -> 4
mysql> SELECT 253 % 7;
        -> 1
mysql> SELECT MOD(29,9);
        -> 2
mysql> SELECT 29 MOD 9;
        -> 2
This function is safe to use with BIGINT values. MOD() also works on values that have a fractional part and returns the exact remainder after division: mysql> SELECT MOD(34.5,3); -> 1.5
MOD(N,0) returns NULL. • PI() Returns the value of π (pi). The default number of decimal places displayed is seven, but MySQL uses the full double-precision value internally. mysql> SELECT PI(); -> 3.141593 mysql> SELECT PI()+0.000000000000000000; -> 3.141592653589793116
• POW(X,Y) Returns the value of X raised to the power of Y. mysql> SELECT POW(2,2); -> 4 mysql> SELECT POW(2,-2); -> 0.25
• POWER(X,Y) This is a synonym for POW(). • RADIANS(X) Returns the argument X, converted from degrees to radians. (Note that π radians equals 180 degrees.) mysql> SELECT RADIANS(90); -> 1.5707963267949
• RAND([N]) Returns a random floating-point value v in the range 0 <= v < 1.0. To obtain a random integer R in the range i <= R < j, use the expression FLOOR(i + RAND() * (j - i)). For example, to obtain a random integer in the range 7 <= R < 12, use the following statement: SELECT FLOOR(7 + (RAND() * 5));
If an integer argument N is specified, it is used as the seed value: • With a constant initializer argument, the seed is initialized once when the statement is prepared, prior to execution. • With a nonconstant initializer argument (such as a column name), the seed is initialized with the value for each invocation of RAND(). One implication of this behavior is that for equal argument values, RAND(N) returns the same value each time, and thus produces a repeatable sequence of column values. In the following example, the sequence of values produced by RAND(3) is the same both places it occurs.
mysql> CREATE TABLE t (i INT); Query OK, 0 rows affected (0.42 sec) mysql> INSERT INTO t VALUES(1),(2),(3); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> SELECT i, RAND() FROM t; +------+------------------+ | i | RAND() | +------+------------------+ | 1 | 0.61914388706828 | | 2 | 0.93845168309142 | | 3 | 0.83482678498591 | +------+------------------+ 3 rows in set (0.00 sec) mysql> SELECT i, RAND(3) FROM t; +------+------------------+ | i | RAND(3) | +------+------------------+ | 1 | 0.90576975597606 | | 2 | 0.37307905813035 | | 3 | 0.14808605345719 | +------+------------------+ 3 rows in set (0.00 sec) mysql> SELECT i, RAND() FROM t; +------+------------------+ | i | RAND() | +------+------------------+ | 1 | 0.35877890638893 | | 2 | 0.28941420772058 | | 3 | 0.37073435016976 | +------+------------------+ 3 rows in set (0.00 sec) mysql> SELECT i, RAND(3) FROM t; +------+------------------+ | i | RAND(3) | +------+------------------+ | 1 | 0.90576975597606 | | 2 | 0.37307905813035 | | 3 | 0.14808605345719 | +------+------------------+ 3 rows in set (0.01 sec)
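The range formula and the repeatability of a seeded sequence can both be tried outside the server. The sketch below uses Python's random module, whose generator is unrelated to the one behind RAND(), so the actual numbers differ from the output above; only the two properties carry over. The helper random_int is hypothetical.

```python
import math
import random

def random_int(i, j, rng=random):
    """Random integer R with i <= R < j, built as FLOOR(i + RAND() * (j - i))."""
    return math.floor(i + rng.random() * (j - i))

# Counterpart of SELECT FLOOR(7 + (RAND() * 5)), sampled many times.
samples = [random_int(7, 12) for _ in range(1000)]
print(min(samples) >= 7, max(samples) < 12)

# Two generators seeded with the same value produce the same sequence,
# much as RAND(3) repeats its column of values in the example above.
first = random.Random(3)
second = random.Random(3)
seq1 = [first.random() for _ in range(3)]
seq2 = [second.random() for _ in range(3)]
print(seq1 == seq2)  # True
```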
RAND() in a WHERE clause is evaluated for every row (when selecting from one table) or combination of rows (when selecting from a multiple-table join). Thus, for optimizer purposes, RAND() is not a constant value and cannot be used for index optimizations. For more information, see Section 8.2.1.18, “Function Call Optimization”. Use of a column with RAND() values in an ORDER BY or GROUP BY clause may yield unexpected results because for either clause a RAND() expression can be evaluated multiple times for the same row, each time returning a different result. If the goal is to retrieve rows in random order, you can use a statement like this: SELECT * FROM tbl_name ORDER BY RAND();
To select a random sample from a set of rows, combine ORDER BY RAND() with LIMIT: SELECT * FROM table1, table2 WHERE a=b AND c
RAND() is not meant to be a perfect random generator. It is a fast way to generate random numbers on demand that is portable between platforms for the same MySQL version.
This function is unsafe for statement-based replication. A warning is logged if you use this function when binlog_format is set to STATEMENT. (Bug #49222)
• ROUND(X), ROUND(X,D)
Rounds the argument X to D decimal places. The rounding algorithm depends on the data type of X. D defaults to 0 if not specified. D can be negative to cause D digits left of the decimal point of the value X to become zero.
mysql> SELECT ROUND(-1.23);
        -> -1
mysql> SELECT ROUND(-1.58);
        -> -2
mysql> SELECT ROUND(1.58);
        -> 2
mysql> SELECT ROUND(1.298, 1);
        -> 1.3
mysql> SELECT ROUND(1.298, 0);
        -> 1
mysql> SELECT ROUND(23.298, -1);
        -> 20
The return value has the same type as the first argument (assuming that it is integer, double, or decimal). This means that for an integer argument, the result is an integer (no decimal places):
mysql> SELECT ROUND(150.000,2), ROUND(150,2);
+------------------+--------------+
| ROUND(150.000,2) | ROUND(150,2) |
+------------------+--------------+
|           150.00 |          150 |
+------------------+--------------+
ROUND() uses the following rules depending on the type of the first argument:
• For exact-value numbers, ROUND() uses the “round half away from zero” or “round toward nearest” rule: A value with a fractional part of .5 or greater is rounded up to the next integer if positive or down to the next integer if negative. (In other words, it is rounded away from zero.) A value with a fractional part less than .5 is rounded down to the next integer if positive or up to the next integer if negative.
• For approximate-value numbers, the result depends on the C library. On many systems, this means that ROUND() uses the “round to nearest even” rule: A value with a fractional part exactly half way between two integers is rounded to the nearest even integer.
The following example shows how rounding differs for exact and approximate values:
mysql> SELECT ROUND(2.5), ROUND(25E-1);
+------------+--------------+
| ROUND(2.5) | ROUND(25E-1) |
+------------+--------------+
| 3          |            2 |
+------------+--------------+
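The two rounding rules can be compared outside the server. The following Python sketch is an illustration only, not the server's implementation. It assumes that the decimal module with ROUND_HALF_UP (ties away from zero) stands in for exact-value rounding, and that the built-in round() on a binary float (ties to the nearest even integer) stands in for approximate-value rounding on a typical C library. The helper name round_exact is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_exact(x, d=0):
    """Round the decimal string x to d places, with ties going away from zero."""
    quantum = Decimal(1).scaleb(-d)   # d=1 -> 0.1, d=0 -> 1, d=-1 -> 1E+1
    return Decimal(x).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_exact("2.5"))         # exact-value tie: rounded away from zero
print(round(2.5))                 # binary float tie: rounded to the even neighbor
print(round_exact("-1.58"))
print(round_exact("23.298", -1))  # negative d zeroes digits left of the decimal point
```

Passing the exact value as a string keeps it out of binary floating point, which is the distinction between the literals 2.5 and 25E-1 in the example above.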
For more information, see Section 12.22, “Precision Math”.
• SIGN(X)
Returns the sign of the argument as -1, 0, or 1, depending on whether X is negative, zero, or positive.
mysql> SELECT SIGN(-32);
        -> -1
mysql> SELECT SIGN(0);
        -> 0
mysql> SELECT SIGN(234);
        -> 1
• SIN(X)
Returns the sine of X, where X is given in radians.
mysql> SELECT SIN(PI());
        -> 1.2246063538224e-16
mysql> SELECT ROUND(SIN(PI()));
        -> 0
• SQRT(X)
Returns the square root of a nonnegative number X.
mysql> SELECT SQRT(4);
        -> 2
mysql> SELECT SQRT(20);
        -> 4.4721359549996
mysql> SELECT SQRT(-16);
        -> NULL
• TAN(X)
Returns the tangent of X, where X is given in radians.
mysql> SELECT TAN(PI());
        -> -1.2246063538224e-16
mysql> SELECT TAN(PI()+1);
        -> 1.5574077246549
• TRUNCATE(X,D)
Returns the number X, truncated to D decimal places. If D is 0, the result has no decimal point or fractional part. D can be negative to cause D digits left of the decimal point of the value X to become zero.
mysql> SELECT TRUNCATE(1.223,1);
        -> 1.2
mysql> SELECT TRUNCATE(1.999,1);
        -> 1.9
mysql> SELECT TRUNCATE(1.999,0);
        -> 1
mysql> SELECT TRUNCATE(-1.999,1);
        -> -1.9
mysql> SELECT TRUNCATE(122,-2);
        -> 100
mysql> SELECT TRUNCATE(10.28*100,0);
        -> 1028
All numbers are rounded toward zero.
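Truncation toward zero can be modeled in the same way. The following Python sketch is an illustration only and assumes that decimal's ROUND_DOWN mode (which discards digits, moving the value toward zero) corresponds to the behavior described for TRUNCATE(). The helper name truncate is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

def truncate(x, d):
    """Drop the digits of the decimal string x beyond d places (toward zero)."""
    quantum = Decimal(1).scaleb(-d)   # d=1 -> 0.1, d=-2 -> 1E+2
    return Decimal(x).quantize(quantum, rounding=ROUND_DOWN)

for value, places in [("1.999", 1), ("-1.999", 1), ("122", -2)]:
    print(value, places, truncate(value, places))
```

Note that -1.999 becomes -1.9 rather than -2.0: the result moves toward zero for both positive and negative arguments.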
12.7 Date and Time Functions
This section describes the functions that can be used to manipulate temporal values. See Section 11.3, “Date and Time Types”, for a description of the range of values each date and time type has and the valid formats in which values may be specified.
Table 12.13 Date and Time Functions
Name                                    Description
ADDDATE()                               Add time values (intervals) to a date value
ADDTIME()                               Add time
CONVERT_TZ()                            Convert from one time zone to another
CURDATE()                               Return the current date
CURRENT_DATE(), CURRENT_DATE            Synonyms for CURDATE()
CURRENT_TIME(), CURRENT_TIME            Synonyms for CURTIME()
CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP  Synonyms for NOW()
CURTIME()                               Return the current time
DATE()                                  Extract the date part of a date or datetime expression
DATE_ADD()                              Add time values (intervals) to a date value
DATE_FORMAT()                           Format date as specified
DATE_SUB()                              Subtract a time value (interval) from a date
DATEDIFF()                              Subtract two dates
DAY()                                   Synonym for DAYOFMONTH()
DAYNAME()                               Return the name of the weekday
DAYOFMONTH()                            Return the day of the month (0-31)
DAYOFWEEK()                             Return the weekday index of the argument
DAYOFYEAR()                             Return the day of the year (1-366)
EXTRACT()                               Extract part of a date
FROM_DAYS()                             Convert a day number to a date
FROM_UNIXTIME()                         Format Unix timestamp as a date
GET_FORMAT()                            Return a date format string
HOUR()                                  Extract the hour
LAST_DAY                                Return the last day of the month for the argument
LOCALTIME(), LOCALTIME                  Synonym for NOW()
LOCALTIMESTAMP, LOCALTIMESTAMP()        Synonym for NOW()
MAKEDATE()                              Create a date from the year and day of year
MAKETIME()                              Create time from hour, minute, second
MICROSECOND()                           Return the microseconds from argument
MINUTE()                                Return the minute from the argument
MONTH()                                 Return the month from the date passed
MONTHNAME()                             Return the name of the month
NOW()                                   Return the current date and time
PERIOD_ADD()                            Add a period to a year-month
PERIOD_DIFF()                           Return the number of months between periods
QUARTER()                               Return the quarter from a date argument
SEC_TO_TIME()                           Converts seconds to 'HH:MM:SS' format
SECOND()                                Return the second (0-59)
STR_TO_DATE()                           Convert a string to a date
SUBDATE()                               Synonym for DATE_SUB() when invoked with three arguments
SUBTIME()                               Subtract times
SYSDATE()                               Return the time at which the function executes
TIME()                                  Extract the time portion of the expression passed
TIME_FORMAT()                           Format as time
TIME_TO_SEC()                           Return the argument converted to seconds
TIMEDIFF()                              Subtract time
TIMESTAMP()                             With a single argument, this function returns the date or datetime
                                        expression; with two arguments, the sum of the arguments
TIMESTAMPADD()                          Add an interval to a datetime expression
TIMESTAMPDIFF()                         Subtract an interval from a datetime expression
TO_DAYS()                               Return the date argument converted to days
TO_SECONDS()                            Return the date or datetime argument converted to seconds since Year 0
UNIX_TIMESTAMP()                        Return a Unix timestamp
UTC_DATE()                              Return the current UTC date
UTC_TIME()                              Return the current UTC time
UTC_TIMESTAMP()                         Return the current UTC date and time
WEEK()                                  Return the week number
WEEKDAY()                               Return the weekday index
WEEKOFYEAR()                            Return the calendar week of the date (1-53)
YEAR()                                  Return the year
YEARWEEK()                              Return the year and week
Here is an example that uses date functions. The following query selects all rows with a date_col value from within the last 30 days:
mysql> SELECT something FROM tbl_name
    -> WHERE DATE_SUB(CURDATE(),INTERVAL 30 DAY) <= date_col;
The query also selects rows with dates that lie in the future.
Functions that expect date values usually accept datetime values and ignore the time part. Functions that expect time values usually accept datetime values and ignore the date part.
Functions that return the current date or time each are evaluated only once per query at the start of query execution. This means that multiple references to a function such as NOW() within a single query always produce the same result. (For our purposes, a single query also includes a call to a stored program (stored routine, trigger, or event) and all subprograms called by that program.) This principle also applies to CURDATE(), CURTIME(), UTC_DATE(), UTC_TIME(), UTC_TIMESTAMP(), and to any of their synonyms.
The CURRENT_TIMESTAMP(), CURRENT_TIME(), CURRENT_DATE(), and FROM_UNIXTIME() functions return values in the current session time zone, which is available as the session value of the time_zone system variable. In addition, UNIX_TIMESTAMP() assumes that its argument is a datetime value in the session time zone. See Section 5.1.12, “MySQL Server Time Zone Support”.
Some date functions can be used with “zero” dates or incomplete dates such as '2001-11-00', whereas others cannot. Functions that extract parts of dates typically work with incomplete dates and thus can return 0 when you might otherwise expect a nonzero value. For example:
mysql> SELECT DAYOFMONTH('2001-11-00'), MONTH('2005-00-00');
        -> 0, 0
Other functions expect complete dates and return NULL for incomplete dates. These include functions that perform date arithmetic or that map parts of dates to names. For example:
mysql> SELECT DATE_ADD('2006-05-00',INTERVAL 1 DAY);
        -> NULL
mysql> SELECT DAYNAME('2006-05-00');
        -> NULL
Several functions are more strict when passed a DATE() function value as their argument and reject incomplete dates with a day part of zero. These functions are affected: CONVERT_TZ(), DATE_ADD(), DATE_SUB(), DAYOFYEAR(), LAST_DAY() (permits a day part of zero), TIMESTAMPDIFF(), TO_DAYS(), TO_SECONDS(), WEEK(), WEEKDAY(), WEEKOFYEAR(), YEARWEEK().
Fractional seconds for TIME, DATETIME, and TIMESTAMP values are supported, with up to microsecond precision. Functions that take temporal arguments accept values with fractional seconds. Return values from temporal functions include fractional seconds as appropriate.
• ADDDATE(date,INTERVAL expr unit), ADDDATE(expr,days)
When invoked with the INTERVAL form of the second argument, ADDDATE() is a synonym for DATE_ADD(). The related function SUBDATE() is a synonym for DATE_SUB(). For information on the INTERVAL unit argument, see Temporal Intervals.
mysql> SELECT DATE_ADD('2008-01-02', INTERVAL 31 DAY);
        -> '2008-02-02'
mysql> SELECT ADDDATE('2008-01-02', INTERVAL 31 DAY);
        -> '2008-02-02'
When invoked with the days form of the second argument, MySQL treats it as an integer number of days to be added to expr.
mysql> SELECT ADDDATE('2008-01-02', 31);
        -> '2008-02-02'
• ADDTIME(expr1,expr2)
ADDTIME() adds expr2 to expr1 and returns the result. expr1 is a time or datetime expression, and expr2 is a time expression.
mysql> SELECT ADDTIME('2007-12-31 23:59:59.999999', '1 1:1:1.000002');
        -> '2008-01-02 01:01:01.000001'
mysql> SELECT ADDTIME('01:00:00.999999', '02:00:00.999998');
        -> '03:00:01.999997'
• CONVERT_TZ(dt,from_tz,to_tz)
CONVERT_TZ() converts a datetime value dt from the time zone given by from_tz to the time zone given by to_tz and returns the resulting value. Time zones are specified as described in Section 5.1.12, “MySQL Server Time Zone Support”. This function returns NULL if the arguments are invalid.
If the value falls out of the supported range of the TIMESTAMP type when converted from from_tz to UTC, no conversion occurs. The TIMESTAMP range is described in Section 11.1.2, “Date and Time Type Overview”.
mysql> SELECT CONVERT_TZ('2004-01-01 12:00:00','GMT','MET');
        -> '2004-01-01 13:00:00'
mysql> SELECT CONVERT_TZ('2004-01-01 12:00:00','+00:00','+10:00');
        -> '2004-01-01 22:00:00'
Note
To use named time zones such as 'MET' or 'Europe/Amsterdam', the time zone tables must be properly set up. For instructions, see Section 5.1.12, “MySQL Server Time Zone Support”.
• CURDATE()
Returns the current date as a value in 'YYYY-MM-DD' or YYYYMMDD format, depending on whether the function is used in a string or numeric context.
mysql> SELECT CURDATE();
        -> '2008-06-13'
mysql> SELECT CURDATE() + 0;
        -> 20080613
• CURRENT_DATE, CURRENT_DATE()
CURRENT_DATE and CURRENT_DATE() are synonyms for CURDATE().
• CURRENT_TIME, CURRENT_TIME([fsp])
CURRENT_TIME and CURRENT_TIME() are synonyms for CURTIME().
• CURRENT_TIMESTAMP, CURRENT_TIMESTAMP([fsp])
CURRENT_TIMESTAMP and CURRENT_TIMESTAMP() are synonyms for NOW().
• CURTIME([fsp])
Returns the current time as a value in 'HH:MM:SS' or HHMMSS format, depending on whether the function is used in a string or numeric context. The value is expressed in the session time zone.
If the fsp argument is given to specify a fractional seconds precision from 0 to 6, the return value includes a fractional seconds part of that many digits.
mysql> SELECT CURTIME();
        -> '23:50:26'
mysql> SELECT CURTIME() + 0;
        -> 235026.000000
• DATE(expr)
Extracts the date part of the date or datetime expression expr.
mysql> SELECT DATE('2003-12-31 01:02:03');
        -> '2003-12-31'
• DATEDIFF(expr1,expr2)
DATEDIFF() returns expr1 − expr2 expressed as a value in days from one date to the other. expr1 and expr2 are date or date-and-time expressions. Only the date parts of the values are used in the calculation.
mysql> SELECT DATEDIFF('2007-12-31 23:59:59','2007-12-30');
        -> 1
mysql> SELECT DATEDIFF('2010-11-30 23:59:59','2010-12-31');
        -> -31
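The rule that only the date parts take part in the calculation can be sketched in Python. This is an illustrative model rather than the server's code. The function name datediff is hypothetical, and it assumes ISO-style input strings that datetime.fromisoformat() accepts.

```python
from datetime import datetime

def datediff(expr1, expr2):
    """Days from expr2 to expr1, ignoring any time-of-day part."""
    d1 = datetime.fromisoformat(expr1).date()   # .date() drops the time part
    d2 = datetime.fromisoformat(expr2).date()
    return (d1 - d2).days

print(datediff("2007-12-31 23:59:59", "2007-12-30"))
print(datediff("2010-11-30 23:59:59", "2010-12-31"))
```

Because the time part is discarded before subtracting, '2007-12-31 23:59:59' and '2007-12-31 00:00:00' give the same result.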
• DATE_ADD(date,INTERVAL expr unit), DATE_SUB(date,INTERVAL expr unit)
These functions perform date arithmetic. The date argument specifies the starting date or datetime value. expr is an expression specifying the interval value to be added or subtracted from the starting date. expr is evaluated as a string; it may start with a - for negative intervals. unit is a keyword indicating the units in which the expression should be interpreted.
For more information about temporal interval syntax, including a full list of unit specifiers, the expected form of the expr argument for each unit value, and rules for operand interpretation in temporal arithmetic, see Temporal Intervals.
The return value depends on the arguments:
  • DATE if the date argument is a DATE value and your calculations involve only YEAR, MONTH, and DAY parts (that is, no time parts).
  • DATETIME if the first argument is a DATETIME (or TIMESTAMP) value, or if the first argument is a DATE and the unit value uses HOURS, MINUTES, or SECONDS.
  • String otherwise.
To ensure that the result is DATETIME, you can use CAST() to convert the first argument to DATETIME.
mysql> SELECT DATE_ADD('2018-05-01',INTERVAL 1 DAY);
        -> '2018-05-02'
mysql> SELECT DATE_SUB('2018-05-01',INTERVAL 1 YEAR);
        -> '2017-05-01'
mysql> SELECT DATE_ADD('2020-12-31 23:59:59',
    ->                 INTERVAL 1 SECOND);
        -> '2021-01-01 00:00:00'
mysql> SELECT DATE_ADD('2018-12-31 23:59:59',
    ->                 INTERVAL 1 DAY);
        -> '2019-01-01 23:59:59'
mysql> SELECT DATE_ADD('2100-12-31 23:59:59',
    ->                 INTERVAL '1:1' MINUTE_SECOND);
        -> '2101-01-01 00:01:00'
mysql> SELECT DATE_SUB('2025-01-01 00:00:00',
    ->                 INTERVAL '1 1:1:1' DAY_SECOND);
        -> '2024-12-30 22:58:59'
mysql> SELECT DATE_ADD('1900-01-01 00:00:00',
    ->                 INTERVAL '-1 10' DAY_HOUR);
        -> '1899-12-30 14:00:00'
mysql> SELECT DATE_SUB('1998-01-02', INTERVAL 31 DAY);
        -> '1997-12-02'
mysql> SELECT DATE_ADD('1992-12-31 23:59:59.000002',
    ->            INTERVAL '1.999999' SECOND_MICROSECOND);
        -> '1993-01-01 00:00:01.000001'
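The arithmetic behind the compound-unit examples can be checked by hand. The following Python sketch is an illustration only: it spells out each interval as a datetime.timedelta instead of parsing the interval string the way the server does, and the variable names are hypothetical.

```python
from datetime import datetime, timedelta

# INTERVAL '1 1:1:1' DAY_SECOND: 1 day, 1 hour, 1 minute, 1 second
start = datetime(2025, 1, 1, 0, 0, 0)
day_second = timedelta(days=1, hours=1, minutes=1, seconds=1)
print(start - day_second)

# INTERVAL '-1 10' DAY_HOUR: minus (1 day and 10 hours)
epoch = datetime(1900, 1, 1, 0, 0, 0)
day_hour = -timedelta(days=1, hours=10)
print(epoch + day_hour)

# INTERVAL '1.999999' SECOND_MICROSECOND: 1 second and 999999 microseconds
base = datetime(1992, 12, 31, 23, 59, 59, 2)
second_microsecond = timedelta(seconds=1, microseconds=999999)
print(base + second_microsecond)
```

In each case the interval string is read as a sequence of fields, one per component of the compound unit, and the leading - in '-1 10' applies to the whole interval.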
• DATE_FORMAT(date,format)
Formats the date value according to the format string.
The specifiers shown in the following table may be used in the format string. The % character is required before format specifier characters. The specifiers apply to other functions as well: STR_TO_DATE(), TIME_FORMAT(), UNIX_TIMESTAMP().
Specifier  Description
%a         Abbreviated weekday name (Sun..Sat)
%b         Abbreviated month name (Jan..Dec)
%c         Month, numeric (0..12)
%D         Day of the month with English suffix (0th, 1st, 2nd, 3rd, …)
%d         Day of the month, numeric (00..31)
%e         Day of the month, numeric (0..31)
%f         Microseconds (000000..999999)
%H         Hour (00..23)
%h         Hour (01..12)
%I         Hour (01..12)
%i         Minutes, numeric (00..59)
%j         Day of year (001..366)
%k         Hour (0..23)
%l         Hour (1..12)
%M         Month name (January..December)
%m         Month, numeric (00..12)
%p         AM or PM
%r         Time, 12-hour (hh:mm:ss followed by AM or PM)
%S         Seconds (00..59)
%s         Seconds (00..59)
%T         Time, 24-hour (hh:mm:ss)
%U         Week (00..53), where Sunday is the first day of the week; WEEK() mode 0
%u         Week (00..53), where Monday is the first day of the week; WEEK() mode 1
%V         Week (01..53), where Sunday is the first day of the week; WEEK() mode 2; used with %X
%v         Week (01..53), where Monday is the first day of the week; WEEK() mode 3; used with %x
%W         Weekday name (Sunday..Saturday)
%w         Day of the week (0=Sunday..6=Saturday)
%X         Year for the week where Sunday is the first day of the week, numeric, four digits; used with %V
%x         Year for the week, where Monday is the first day of the week, numeric, four digits; used with %v
%Y         Year, numeric, four digits
%y         Year, numeric (two digits)
%%         A literal % character
%x         x, for any “x” not listed above
Ranges for the month and day specifiers begin with zero due to the fact that MySQL permits the storing of incomplete dates such as '2014-00-00'.
The language used for day and month names and abbreviations is controlled by the value of the lc_time_names system variable (Section 10.15, “MySQL Server Locale Support”).
For the %U, %u, %V, and %v specifiers, see the description of the WEEK() function for information about the mode values. The mode affects how week numbering occurs.
DATE_FORMAT() returns a string with a character set and collation given by character_set_connection and collation_connection so that it can return month and weekday names containing non-ASCII characters.
mysql> SELECT DATE_FORMAT('2009-10-04 22:23:00', '%W %M %Y');
        -> 'Sunday October 2009'
mysql> SELECT DATE_FORMAT('2007-10-04 22:23:00', '%H:%i:%s');
        -> '22:23:00'
mysql> SELECT DATE_FORMAT('1900-10-04 22:23:00',
    ->                 '%D %y %a %d %m %b %j');
        -> '4th 00 Thu 04 10 Oct 277'
mysql> SELECT DATE_FORMAT('1997-10-04 22:23:00',
    ->                 '%H %k %I %r %T %S %w');
        -> '22 22 10 10:23:00 PM 22:23:00 00 6'
mysql> SELECT DATE_FORMAT('1999-01-01', '%X %V');
        -> '1998 52'
mysql> SELECT DATE_FORMAT('2006-06-00', '%d');
        -> '00'
• DATE_SUB(date,INTERVAL expr unit)
See the description for DATE_ADD().
• DAY(date)
DAY() is a synonym for DAYOFMONTH().
• DAYNAME(date)
Returns the name of the weekday for date. The language used for the name is controlled by the value of the lc_time_names system variable (Section 10.15, “MySQL Server Locale Support”).
mysql> SELECT DAYNAME('2007-02-03');
        -> 'Saturday'
• DAYOFMONTH(date)
Returns the day of the month for date, in the range 1 to 31, or 0 for dates such as '0000-00-00' or '2008-00-00' that have a zero day part.
mysql> SELECT DAYOFMONTH('2007-02-03');
        -> 3
• DAYOFWEEK(date)
Returns the weekday index for date (1 = Sunday, 2 = Monday, …, 7 = Saturday). These index values correspond to the ODBC standard.
mysql> SELECT DAYOFWEEK('2007-02-03');
        -> 7
• DAYOFYEAR(date)
Returns the day of the year for date, in the range 1 to 366.
mysql> SELECT DAYOFYEAR('2007-02-03');
        -> 34
• EXTRACT(unit FROM date)
The EXTRACT() function uses the same kinds of unit specifiers as DATE_ADD() or DATE_SUB(), but extracts parts from the date rather than performing date arithmetic. For information on the unit argument, see Temporal Intervals.
mysql> SELECT EXTRACT(YEAR FROM '2019-07-02');
        -> 2019
mysql> SELECT EXTRACT(YEAR_MONTH FROM '2019-07-02 01:02:03');
        -> 201907
mysql> SELECT EXTRACT(DAY_MINUTE FROM '2019-07-02 01:02:03');
        -> 20102
mysql> SELECT EXTRACT(MICROSECOND
    ->                FROM '2003-01-02 10:30:00.000123');
        -> 123
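The results for compound units in the examples above are consistent with reading the component fields side by side as decimal digits. The following Python sketch shows that reading. It is an inference from the examples, not a statement of how the server computes the value, and both function names are hypothetical.

```python
from datetime import datetime

def extract_year_month(dt):
    """Year followed by a two-digit month, as one integer."""
    return dt.year * 100 + dt.month

def extract_day_minute(dt):
    """Day, two-digit hour, and two-digit minute, as one integer."""
    return dt.day * 10000 + dt.hour * 100 + dt.minute

dt = datetime(2019, 7, 2, 1, 2, 3)
print(extract_year_month(dt))
print(extract_day_minute(dt))
```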
• FROM_DAYS(N)
Given a day number N, returns a DATE value.
mysql> SELECT FROM_DAYS(730669);
        -> '2000-07-03'
Use FROM_DAYS() with caution on old dates. It is not intended for use with values that precede the advent of the Gregorian calendar (1582). See Section 12.8, “What Calendar Is Used By MySQL?”.
• FROM_UNIXTIME(unix_timestamp[,format])
Returns a representation of the unix_timestamp argument as a value in 'YYYY-MM-DD HH:MM:SS' or YYYYMMDDHHMMSS.uuuuuu format, depending on whether the function is used in a string or numeric context. unix_timestamp is an internal timestamp value representing seconds since '1970-01-01 00:00:00' UTC, such as produced by the UNIX_TIMESTAMP() function.
The return value is expressed in the session time zone. (Clients can set the session time zone as described in Section 5.1.12, “MySQL Server Time Zone Support”.) The format string, if given, is used to format the result the same way as described in the entry for the DATE_FORMAT() function.
mysql> SELECT FROM_UNIXTIME(1447430881);
        -> '2015-11-13 10:08:01'
mysql> SELECT FROM_UNIXTIME(1447430881) + 0;
        -> 20151113100801
mysql> SELECT FROM_UNIXTIME(1447430881,
    ->                      '%Y %D %M %h:%i:%s %x');
        -> '2015 13th November 10:08:01 2015'
Note
If you use UNIX_TIMESTAMP() and FROM_UNIXTIME() to convert between values in a non-UTC time zone and Unix timestamp values, the conversion is lossy because the mapping is not one-to-one in both directions. For details, see the description of the UNIX_TIMESTAMP() function.
• GET_FORMAT({DATE|TIME|DATETIME}, {'EUR'|'USA'|'JIS'|'ISO'|'INTERNAL'})
Returns a format string. This function is useful in combination with the DATE_FORMAT() and the STR_TO_DATE() functions.
The possible values for the first and second arguments result in several possible format strings (for the specifiers used, see the table in the DATE_FORMAT() function description). ISO format refers to ISO 9075, not ISO 8601.
Function Call                     Result
GET_FORMAT(DATE,'USA')            '%m.%d.%Y'
GET_FORMAT(DATE,'JIS')            '%Y-%m-%d'
GET_FORMAT(DATE,'ISO')            '%Y-%m-%d'
GET_FORMAT(DATE,'EUR')            '%d.%m.%Y'
GET_FORMAT(DATE,'INTERNAL')       '%Y%m%d'
GET_FORMAT(DATETIME,'USA')        '%Y-%m-%d %H.%i.%s'
GET_FORMAT(DATETIME,'JIS')        '%Y-%m-%d %H:%i:%s'
GET_FORMAT(DATETIME,'ISO')        '%Y-%m-%d %H:%i:%s'
GET_FORMAT(DATETIME,'EUR')        '%Y-%m-%d %H.%i.%s'
GET_FORMAT(DATETIME,'INTERNAL')   '%Y%m%d%H%i%s'
GET_FORMAT(TIME,'USA')            '%h:%i:%s %p'
GET_FORMAT(TIME,'JIS')            '%H:%i:%s'
GET_FORMAT(TIME,'ISO')            '%H:%i:%s'
GET_FORMAT(TIME,'EUR')            '%H.%i.%s'
GET_FORMAT(TIME,'INTERNAL')       '%H%i%s'
TIMESTAMP can also be used as the first argument to GET_FORMAT(), in which case the function returns the same values as for DATETIME.
mysql> SELECT DATE_FORMAT('2003-10-03',GET_FORMAT(DATE,'EUR'));
        -> '03.10.2003'
mysql> SELECT STR_TO_DATE('10.31.2003',GET_FORMAT(DATE,'USA'));
        -> '2003-10-31'
• HOUR(time)
Returns the hour for time. The range of the return value is 0 to 23 for time-of-day values. However, the range of TIME values actually is much larger, so HOUR can return values greater than 23.
mysql> SELECT HOUR('10:05:03');
        -> 10
mysql> SELECT HOUR('272:59:59');
        -> 272
• LAST_DAY(date)
Takes a date or datetime value and returns the corresponding value for the last day of the month. Returns NULL if the argument is invalid.
mysql> SELECT LAST_DAY('2003-02-05');
        -> '2003-02-28'
mysql> SELECT LAST_DAY('2004-02-05');
        -> '2004-02-29'
mysql> SELECT LAST_DAY('2004-01-01 01:01:01');
        -> '2004-01-31'
mysql> SELECT LAST_DAY('2003-03-32');
        -> NULL
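The month-end calculation, including the leap-year case and the NULL result for an invalid date, can be modeled as follows. This Python sketch is an illustration only and the function name last_day is hypothetical. None stands in for NULL, and, unlike the server, this model also rejects a day part of zero.

```python
import calendar
from datetime import date

def last_day(year, month, day):
    """Return the last date of the given month, or None for an invalid date."""
    try:
        date(year, month, day)   # validate the input date first
    except ValueError:
        return None
    # monthrange() returns (weekday of day 1, number of days in the month)
    return date(year, month, calendar.monthrange(year, month)[1])

print(last_day(2003, 2, 5))
print(last_day(2004, 2, 5))
print(last_day(2003, 3, 32))
```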
• LOCALTIME, LOCALTIME([fsp])
LOCALTIME and LOCALTIME() are synonyms for NOW().
• LOCALTIMESTAMP, LOCALTIMESTAMP([fsp])
LOCALTIMESTAMP and LOCALTIMESTAMP() are synonyms for NOW().
• MAKEDATE(year,dayofyear)
Returns a date, given year and day-of-year values. dayofyear must be greater than 0 or the result is NULL.
mysql> SELECT MAKEDATE(2011,31), MAKEDATE(2011,32);
        -> '2011-01-31', '2011-02-01'
mysql> SELECT MAKEDATE(2011,365), MAKEDATE(2014,365);
        -> '2011-12-31', '2014-12-31'
mysql> SELECT MAKEDATE(2011,0);
        -> NULL
• MAKETIME(hour,minute,second)
Returns a time value calculated from the hour, minute, and second arguments. The second argument can have a fractional part.
mysql> SELECT MAKETIME(12,15,30);
        -> '12:15:30'
• MICROSECOND(expr)
Returns the microseconds from the time or datetime expression expr as a number in the range from 0 to 999999.
mysql> SELECT MICROSECOND('12:00:00.123456');
        -> 123456
mysql> SELECT MICROSECOND('2019-12-31 23:59:59.000010');
        -> 10
• MINUTE(time)
Returns the minute for time, in the range 0 to 59.
mysql> SELECT MINUTE('2008-02-03 10:05:03');
        -> 5
• MONTH(date)
Returns the month for date, in the range 1 to 12 for January to December, or 0 for dates such as '0000-00-00' or '2008-00-00' that have a zero month part.
mysql> SELECT MONTH('2008-02-03');
        -> 2
• MONTHNAME(date)
Returns the full name of the month for date. The language used for the name is controlled by the value of the lc_time_names system variable (Section 10.15, “MySQL Server Locale Support”).
mysql> SELECT MONTHNAME('2008-02-03');
        -> 'February'
• NOW([fsp])
Returns the current date and time as a value in 'YYYY-MM-DD HH:MM:SS' or YYYYMMDDHHMMSS format, depending on whether the function is used in a string or numeric context. The value is expressed in the session time zone.
If the fsp argument is given to specify a fractional seconds precision from 0 to 6, the return value includes a fractional seconds part of that many digits.
mysql> SELECT NOW();
        -> '2007-12-15 23:50:26'
mysql> SELECT NOW() + 0;
        -> 20071215235026.000000
NOW() returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.) This differs from the behavior for SYSDATE(), which returns the exact time at which it executes.
mysql> SELECT NOW(), SLEEP(2), NOW();
+---------------------+----------+---------------------+
| NOW()               | SLEEP(2) | NOW()               |
+---------------------+----------+---------------------+
| 2006-04-12 13:47:36 |        0 | 2006-04-12 13:47:36 |
+---------------------+----------+---------------------+

mysql> SELECT SYSDATE(), SLEEP(2), SYSDATE();
+---------------------+----------+---------------------+
| SYSDATE()           | SLEEP(2) | SYSDATE()           |
+---------------------+----------+---------------------+
| 2006-04-12 13:47:44 |        0 | 2006-04-12 13:47:46 |
+---------------------+----------+---------------------+
In addition, the SET TIMESTAMP statement affects the value returned by NOW() but not by SYSDATE(). This means that timestamp settings in the binary log have no effect on invocations of SYSDATE(). Setting the timestamp to a nonzero value causes each subsequent invocation of NOW() to return that value. Setting the timestamp to zero cancels this effect so that NOW() once again returns the current date and time.
See the description for SYSDATE() for additional information about the differences between the two functions.
• PERIOD_ADD(P,N)
Adds N months to period P (in the format YYMM or YYYYMM). Returns a value in the format YYYYMM. Note that the period argument P is not a date value.
mysql> SELECT PERIOD_ADD(200801,2);
        -> 200803
• PERIOD_DIFF(P1,P2)
Returns the number of months between periods P1 and P2. P1 and P2 should be in the format YYMM or YYYYMM. Note that the period arguments P1 and P2 are not date values.
mysql> SELECT PERIOD_DIFF(200802,200703);
        -> 11
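Period arithmetic amounts to converting a period to a count of months, operating on the count, and converting back. The following Python sketch is an illustration only. It handles the YYYYMM form and makes no attempt to interpret two-digit years (the YYMM form). The function names mirror the SQL functions but are otherwise hypothetical.

```python
def period_add(p, n):
    """Add n months to a YYYYMM period and return a YYYYMM period."""
    year, month = divmod(p, 100)
    total = year * 12 + (month - 1) + n      # zero-based month count
    return (total // 12) * 100 + total % 12 + 1

def period_diff(p1, p2):
    """Number of months from period p2 to period p1 (both YYYYMM)."""
    y1, m1 = divmod(p1, 100)
    y2, m2 = divmod(p2, 100)
    return (y1 * 12 + m1) - (y2 * 12 + m2)

print(period_add(200801, 2))
print(period_diff(200802, 200703))
```

Working with a zero-based month count in period_add keeps December from rolling over into month 00 of the following year.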
• QUARTER(date)
Returns the quarter of the year for date, in the range 1 to 4.
mysql> SELECT QUARTER('2008-04-01');
        -> 2
• SECOND(time)
Returns the second for time, in the range 0 to 59.
mysql> SELECT SECOND('10:05:03');
        -> 3
• SEC_TO_TIME(seconds)
Returns the seconds argument, converted to hours, minutes, and seconds, as a TIME value. The range of the result is constrained to that of the TIME data type. A warning occurs if the argument corresponds to a value outside that range.
mysql> SELECT SEC_TO_TIME(2378);
        -> '00:39:38'
mysql> SELECT SEC_TO_TIME(2378) + 0;
        -> 3938
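The conversion is repeated division by 60. The following Python sketch is an illustration only. The function name sec_to_time is hypothetical, and the model does not clamp the result to the range of the TIME data type as the server does.

```python
def sec_to_time(seconds):
    """Format a whole number of seconds as [-]HH:MM:SS."""
    sign = "-" if seconds < 0 else ""
    minutes, secs = divmod(abs(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{sign}{hours:02d}:{minutes:02d}:{secs:02d}"

print(sec_to_time(2378))
```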
• STR_TO_DATE(str,format)
This is the inverse of the DATE_FORMAT() function. It takes a string str and a format string format. STR_TO_DATE() returns a DATETIME value if the format string contains both date and time parts, or a DATE or TIME value if the string contains only date or time parts. If the date, time, or datetime value extracted from str is illegal, STR_TO_DATE() returns NULL and produces a warning.
The server scans str attempting to match format to it. The format string can contain literal characters and format specifiers beginning with %. Literal characters in format must match literally in str. Format specifiers in format must match a date or time part in str. For the specifiers that can be used in format, see the DATE_FORMAT() function description.
mysql> SELECT STR_TO_DATE('01,5,2013','%d,%m,%Y');
        -> '2013-05-01'
mysql> SELECT STR_TO_DATE('May 1, 2013','%M %d,%Y');
        -> '2013-05-01'
Scanning starts at the beginning of str and fails if format is found not to match. Extra characters at the end of str are ignored.
mysql> SELECT STR_TO_DATE('a09:30:17','a%h:%i:%s');
        -> '09:30:17'
mysql> SELECT STR_TO_DATE('a09:30:17','%h:%i:%s');
        -> NULL
mysql> SELECT STR_TO_DATE('09:30:17a','%h:%i:%s');
        -> '09:30:17'
Unspecified date or time parts have a value of 0, so incompletely specified values in str produce a result with some or all parts set to 0:
mysql> SELECT STR_TO_DATE('abc','abc');
        -> '0000-00-00'
mysql> SELECT STR_TO_DATE('9','%m');
        -> '0000-09-00'
mysql> SELECT STR_TO_DATE('9','%s');
        -> '00:00:09'
Range checking on the parts of date values is as described in Section 11.3.1, “The DATE, DATETIME, and TIMESTAMP Types”. This means, for example, that “zero” dates or dates with part values of 0 are permitted unless the SQL mode is set to disallow such values.
mysql> SELECT STR_TO_DATE('00/00/0000', '%m/%d/%Y');
        -> '0000-00-00'
mysql> SELECT STR_TO_DATE('04/31/2004', '%m/%d/%Y');
        -> '2004-04-31'
If the NO_ZERO_DATE or NO_ZERO_IN_DATE SQL mode is enabled, zero dates or part of dates are disallowed. In that case, STR_TO_DATE() returns NULL and generates a warning:
mysql> SET sql_mode = '';
mysql> SELECT STR_TO_DATE('15:35:00', '%H:%i:%s');
+-------------------------------------+
| STR_TO_DATE('15:35:00', '%H:%i:%s') |
+-------------------------------------+
| 15:35:00                            |
+-------------------------------------+
mysql> SET sql_mode = 'NO_ZERO_IN_DATE';
mysql> SELECT STR_TO_DATE('15:35:00', '%h:%i:%s');
+-------------------------------------+
| STR_TO_DATE('15:35:00', '%h:%i:%s') |
+-------------------------------------+
| NULL                                |
+-------------------------------------+
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1411
Message: Incorrect datetime value: '15:35:00' for function str_to_date
Note
You cannot use format "%X%V" to convert a year-week string to a date because the combination of a year and week does not uniquely identify a year and month if the week crosses a month boundary. To convert a year-week to a date, you should also specify the weekday:
mysql> SELECT STR_TO_DATE('200442 Monday', '%X%V %W');
        -> '2004-10-18'
• SUBDATE(date,INTERVAL expr unit), SUBDATE(expr,days)
When invoked with the INTERVAL form of the second argument, SUBDATE() is a synonym for DATE_SUB(). For information on the INTERVAL unit argument, see the discussion for DATE_ADD().
mysql> SELECT DATE_SUB('2008-01-02', INTERVAL 31 DAY);
        -> '2007-12-02'
mysql> SELECT SUBDATE('2008-01-02', INTERVAL 31 DAY);
        -> '2007-12-02'
The second form enables the use of an integer value for days. In such cases, it is interpreted as the number of days to be subtracted from the date or datetime expression expr.
mysql> SELECT SUBDATE('2008-01-02 12:00:00', 31);
        -> '2007-12-02 12:00:00'
• SUBTIME(expr1,expr2)
SUBTIME() returns expr1 − expr2 expressed as a value in the same format as expr1. expr1 is a time or datetime expression, and expr2 is a time expression.
mysql> SELECT SUBTIME('2007-12-31 23:59:59.999999','1 1:1:1.000002');
        -> '2007-12-30 22:58:58.999997'
mysql> SELECT SUBTIME('01:00:00.999999', '02:00:00.999998');
        -> '-00:59:59.999999'
• SYSDATE([fsp])
Returns the current date and time as a value in 'YYYY-MM-DD HH:MM:SS' or YYYYMMDDHHMMSS format, depending on whether the function is used in a string or numeric context.
If the fsp argument is given to specify a fractional seconds precision from 0 to 6, the return value includes a fractional seconds part of that many digits.
SYSDATE() returns the time at which it executes. This differs from the behavior for NOW(), which returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.)
mysql> SELECT NOW(), SLEEP(2), NOW();
+---------------------+----------+---------------------+
| NOW()               | SLEEP(2) | NOW()               |
+---------------------+----------+---------------------+
| 2006-04-12 13:47:36 |        0 | 2006-04-12 13:47:36 |
+---------------------+----------+---------------------+

mysql> SELECT SYSDATE(), SLEEP(2), SYSDATE();
+---------------------+----------+---------------------+
| SYSDATE()           | SLEEP(2) | SYSDATE()           |
+---------------------+----------+---------------------+
| 2006-04-12 13:47:44 |        0 | 2006-04-12 13:47:46 |
+---------------------+----------+---------------------+
In addition, the SET TIMESTAMP statement affects the value returned by NOW() but not by SYSDATE(). This means that timestamp settings in the binary log have no effect on invocations of SYSDATE().
Because SYSDATE() can return different values even within the same statement, and is not affected by SET TIMESTAMP, it is nondeterministic and therefore unsafe for replication if statement-based binary logging is used. If that is a problem, you can use row-based logging.
Alternatively, you can use the --sysdate-is-now option to cause SYSDATE() to be an alias for NOW(). This works if the option is used on both the master and the slave.
The nondeterministic nature of SYSDATE() also means that indexes cannot be used for evaluating expressions that refer to it.
• TIME(expr)
Extracts the time part of the time or datetime expression expr and returns it as a string.
This function is unsafe for statement-based replication. A warning is logged if you use this function when binlog_format is set to STATEMENT. mysql> SELECT TIME('2003-12-31 01:02:03'); -> '01:02:03' mysql> SELECT TIME('2003-12-31 01:02:03.000123'); -> '01:02:03.000123'
• TIMEDIFF(expr1,expr2) TIMEDIFF() returns expr1 − expr2 expressed as a time value. expr1 and expr2 are time or date-and-time expressions, but both must be of the same type. The result returned by TIMEDIFF() is limited to the range allowed for TIME values. Alternatively, you can use either of the functions TIMESTAMPDIFF() and UNIX_TIMESTAMP(), both of which return integers. mysql> SELECT TIMEDIFF('2000:01:01 00:00:00', '2000:01:01 00:00:00.000001'); -> '-00:00:00.000001' mysql> SELECT TIMEDIFF('2008-12-31 23:59:59.000001', '2008-12-30 01:01:01.000002'); -> '46:58:57.999999'
• TIMESTAMP(expr), TIMESTAMP(expr1,expr2) With a single argument, this function returns the date or datetime expression expr as a datetime value. With two arguments, it adds the time expression expr2 to the date or datetime expression expr1 and returns the result as a datetime value. mysql> SELECT TIMESTAMP('2003-12-31'); -> '2003-12-31 00:00:00' mysql> SELECT TIMESTAMP('2003-12-31 12:00:00','12:00:00'); -> '2004-01-01 00:00:00'
• TIMESTAMPADD(unit,interval,datetime_expr) Adds the integer expression interval to the date or datetime expression datetime_expr. The unit for interval is given by the unit argument, which should be one of the following values: MICROSECOND (microseconds), SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR. The unit value may be specified using one of the keywords shown, or with a prefix of SQL_TSI_. For example, DAY and SQL_TSI_DAY are both legal. mysql> SELECT TIMESTAMPADD(MINUTE,1,'2003-01-02'); -> '2003-01-02 00:01:00' mysql> SELECT TIMESTAMPADD(WEEK,1,'2003-01-02'); -> '2003-01-09'
• TIMESTAMPDIFF(unit,datetime_expr1,datetime_expr2) Returns datetime_expr2 − datetime_expr1, where datetime_expr1 and datetime_expr2 are date or datetime expressions. One expression may be a date and the other a datetime; a date value is treated as a datetime having the time part '00:00:00' where necessary. The unit for the result (an integer) is given by the unit argument. The legal values for unit are the same as those listed in the description of the TIMESTAMPADD() function. mysql> SELECT TIMESTAMPDIFF(MONTH,'2003-02-01','2003-05-01'); -> 3 mysql> SELECT TIMESTAMPDIFF(YEAR,'2002-05-01','2001-01-01');
-> -1 mysql> SELECT TIMESTAMPDIFF(MINUTE,'2003-02-01','2003-05-01 12:05:55'); -> 128885
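The MINUTE result can be checked by hand: February 1 to May 1, 2003 spans 28 + 31 + 30 = 89 days (128160 minutes), and the remaining 12:05:55 contributes 725 whole minutes. The following Python sketch repeats that arithmetic. It assumes the result is truncated toward zero, which is inferred from the examples rather than stated in the text, and diff_minutes is a hypothetical helper name, not a MySQL function.

```python
from datetime import datetime

def diff_minutes(start: datetime, end: datetime) -> int:
    """Whole minutes from start to end, truncated toward zero (assumed)."""
    seconds = int((end - start).total_seconds())
    sign = -1 if seconds < 0 else 1
    return sign * (abs(seconds) // 60)

start = datetime(2003, 2, 1)             # a bare date is treated as 00:00:00
end = datetime(2003, 5, 1, 12, 5, 55)
print(diff_minutes(start, end))          # 128885; the trailing 55 seconds are dropped
```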
Note The order of the date or datetime arguments for this function is the opposite of that used with the TIMESTAMP() function when invoked with 2 arguments. • TIME_FORMAT(time,format) This is used like the DATE_FORMAT() function, but the format string may contain format specifiers only for hours, minutes, seconds, and microseconds. Other specifiers produce a NULL value or 0. If the time value contains an hour part that is greater than 23, the %H and %k hour format specifiers produce a value larger than the usual range of 0..23. The other hour format specifiers produce the hour value modulo 12. mysql> SELECT TIME_FORMAT('100:00:00', '%H %k %h %I %l'); -> '100 100 04 04 4'
• TIME_TO_SEC(time) Returns the time argument, converted to seconds. mysql> SELECT TIME_TO_SEC('22:23:00'); -> 80580 mysql> SELECT TIME_TO_SEC('00:39:38'); -> 2378
• TO_DAYS(date) Given a date date, returns a day number (the number of days since year 0). mysql> SELECT TO_DAYS(950501); -> 728779 mysql> SELECT TO_DAYS('2007-10-07'); -> 733321
TO_DAYS() is not intended for use with values that precede the advent of the Gregorian calendar (1582), because it does not take into account the days that were lost when the calendar was changed. For dates before 1582 (and possibly a later year in other locales), results from this function are not reliable. See Section 12.8, “What Calendar Is Used By MySQL?”, for details. Remember that MySQL converts two-digit year values in dates to four-digit form using the rules in Section 11.3, “Date and Time Types”. For example, '2008-10-07' and '08-10-07' are seen as identical dates: mysql> SELECT TO_DAYS('2008-10-07'), TO_DAYS('08-10-07'); -> 733687, 733687
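The day numbers shown for TO_DAYS() can be reproduced outside the server. Python's date.toordinal() also counts days on a proleptic Gregorian calendar, but starts at 0001-01-01 = 1. The sketch below assumes a constant offset of 365 between the two numbering schemes; that offset was chosen because it reproduces the examples in this section and is not taken from the MySQL source. Python cannot represent year 0, so the zero-date cases are out of scope here.

```python
from datetime import date

# Assumption: offset between Python ordinals and the TO_DAYS() values
# quoted in the text (fits the examples; not derived from MySQL itself).
TO_DAYS_OFFSET = 365

def to_days(d: date) -> int:
    """Day number in the style of TO_DAYS() for dates from year 1 onward."""
    return d.toordinal() + TO_DAYS_OFFSET

print(to_days(date(1995, 5, 1)))    # 728779
print(to_days(date(2007, 10, 7)))   # 733321
print(to_days(date(2008, 10, 7)))   # 733687
```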
In MySQL, the zero date is defined as '0000-00-00', even though this date is itself considered invalid. This means that, for '0000-00-00' and '0000-01-01', TO_DAYS() returns the values shown here: mysql> SELECT TO_DAYS('0000-00-00'); +-----------------------+ | to_days('0000-00-00') | +-----------------------+ | NULL | +-----------------------+
1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS; +---------+------+----------------------------------------+ | Level | Code | Message | +---------+------+----------------------------------------+ | Warning | 1292 | Incorrect datetime value: '0000-00-00' | +---------+------+----------------------------------------+ 1 row in set (0.00 sec)
mysql> SELECT TO_DAYS('0000-01-01'); +-----------------------+ | to_days('0000-01-01') | +-----------------------+ | 1 | +-----------------------+ 1 row in set (0.00 sec)
This is true whether or not the ALLOW_INVALID_DATES SQL server mode is enabled. • TO_SECONDS(expr) Given a date or datetime expr, returns the number of seconds since the year 0. If expr is not a valid date or datetime value, returns NULL. mysql> SELECT TO_SECONDS(950501); -> 62966505600 mysql> SELECT TO_SECONDS('2009-11-29'); -> 63426672000 mysql> SELECT TO_SECONDS('2009-11-29 13:43:32'); -> 63426721412 mysql> SELECT TO_SECONDS( NOW() ); -> 63426721458
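The TO_SECONDS() examples are consistent with "day number times 86400, plus the seconds elapsed in the day". The following Python sketch checks that arithmetic. The offset of 365 between Python ordinals and MySQL day numbers is an assumption that fits the values quoted here, and to_seconds is a hypothetical helper name.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400
DAY_OFFSET = 365  # assumption: aligns Python ordinals with the quoted values

def to_seconds(dt: datetime) -> int:
    """Seconds since year 0 in the style of TO_SECONDS() (years >= 1 only)."""
    day_number = dt.toordinal() + DAY_OFFSET
    seconds_in_day = dt.hour * 3600 + dt.minute * 60 + dt.second
    return day_number * SECONDS_PER_DAY + seconds_in_day

print(to_seconds(datetime(2009, 11, 29)))              # 63426672000
print(to_seconds(datetime(2009, 11, 29, 13, 43, 32)))  # 63426721412
```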
Like TO_DAYS(), TO_SECONDS() is not intended for use with values that precede the advent of the Gregorian calendar (1582), because it does not take into account the days that were lost when the calendar was changed. For dates before 1582 (and possibly a later year in other locales), results from this function are not reliable. See Section 12.8, “What Calendar Is Used By MySQL?”, for details. Like TO_DAYS(), TO_SECONDS() converts two-digit year values in dates to four-digit form using the rules in Section 11.3, “Date and Time Types”. In MySQL, the zero date is defined as '0000-00-00', even though this date is itself considered invalid. This means that, for '0000-00-00' and '0000-01-01', TO_SECONDS() returns the values shown here: mysql> SELECT TO_SECONDS('0000-00-00'); +--------------------------+ | TO_SECONDS('0000-00-00') | +--------------------------+ | NULL | +--------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS; +---------+------+----------------------------------------+ | Level | Code | Message | +---------+------+----------------------------------------+ | Warning | 1292 | Incorrect datetime value: '0000-00-00' | +---------+------+----------------------------------------+ 1 row in set (0.00 sec)
mysql> SELECT TO_SECONDS('0000-01-01'); +--------------------------+ | TO_SECONDS('0000-01-01') | +--------------------------+ | 86400 | +--------------------------+ 1 row in set (0.00 sec)
This is true whether or not the ALLOW_INVALID_DATES SQL server mode is enabled. • UNIX_TIMESTAMP([date]) If UNIX_TIMESTAMP() is called with no date argument, it returns a Unix timestamp representing seconds since '1970-01-01 00:00:00' UTC. If UNIX_TIMESTAMP() is called with a date argument, it returns the value of the argument as seconds since '1970-01-01 00:00:00' UTC. The server interprets date as a value in the session time zone and converts it to an internal Unix timestamp value in UTC. (Clients can set the session time zone as described in Section 5.1.12, “MySQL Server Time Zone Support”.) The date argument may be a DATE, DATETIME, or TIMESTAMP string, or a number in YYMMDD, YYMMDDHHMMSS, YYYYMMDD, or YYYYMMDDHHMMSS format. If the argument includes a time part, it may optionally include a fractional seconds part. The return value is an integer if no argument is given or the argument does not include a fractional seconds part, or DECIMAL if an argument is given that includes a fractional seconds part. When the date argument is a TIMESTAMP column, UNIX_TIMESTAMP() returns the internal timestamp value directly, with no implicit “string-to-Unix-timestamp” conversion. The valid range of argument values is the same as for the TIMESTAMP data type: '1970-01-01 00:00:01.000000' UTC to '2038-01-19 03:14:07.999999' UTC. If you pass an out-of-range date to UNIX_TIMESTAMP(), it returns 0. mysql> SELECT UNIX_TIMESTAMP(); -> 1447431666 mysql> SELECT UNIX_TIMESTAMP('2015-11-13 10:20:19'); -> 1447431619 mysql> SELECT UNIX_TIMESTAMP('2015-11-13 10:20:19.012'); -> 1447431619.012
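Because the argument is interpreted in the session time zone, the same string yields different timestamps under different time zone settings. The value shown for '2015-11-13 10:20:19' corresponds to 16:20:19 UTC, which is consistent with a session time zone six hours behind UTC. That offset is an inference from the numbers, not something the text states. The Python sketch below models it as a fixed UTC-06:00 offset with no DST handling; SESSION_TZ and unix_timestamp are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the session time zone is a fixed UTC-06:00 offset (no DST).
SESSION_TZ = timezone(timedelta(hours=-6))

def unix_timestamp(local: datetime) -> int:
    """Interpret a naive datetime in SESSION_TZ; return whole seconds since the epoch."""
    return int(local.replace(tzinfo=SESSION_TZ).timestamp())

print(unix_timestamp(datetime(2015, 11, 13, 10, 20, 19)))  # 1447431619
```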
If you use UNIX_TIMESTAMP() and FROM_UNIXTIME() to convert between values in a non-UTC time zone and Unix timestamp values, the conversion is lossy because the mapping is not one-to-one in both directions. For example, due to conventions for local time zone changes such as Daylight Saving Time (DST), it is possible for UNIX_TIMESTAMP() to map two values that are distinct in a non-UTC time zone to the same Unix timestamp value. FROM_UNIXTIME() will map that value back to only one of the original values. Here is an example, using values that are distinct in the MET time zone: mysql> SET time_zone = 'MET'; mysql> SELECT UNIX_TIMESTAMP('2005-03-27 03:00:00'); +---------------------------------------+ | UNIX_TIMESTAMP('2005-03-27 03:00:00') | +---------------------------------------+ | 1111885200 | +---------------------------------------+ mysql> SELECT UNIX_TIMESTAMP('2005-03-27 02:00:00'); +---------------------------------------+ | UNIX_TIMESTAMP('2005-03-27 02:00:00') | +---------------------------------------+ | 1111885200 | +---------------------------------------+ mysql> SELECT FROM_UNIXTIME(1111885200); +---------------------------+ | FROM_UNIXTIME(1111885200) | +---------------------------+ | 2005-03-27 03:00:00 | +---------------------------+
Note To use named time zones such as 'MET' or 'Europe/Amsterdam', the time zone tables must be properly set up. For instructions, see Section 5.1.12, “MySQL Server Time Zone Support”. If you want to subtract UNIX_TIMESTAMP() columns, you might want to cast them to signed integers. See Section 12.10, “Cast Functions and Operators”. • UTC_DATE, UTC_DATE() Returns the current UTC date as a value in 'YYYY-MM-DD' or YYYYMMDD format, depending on whether the function is used in a string or numeric context. mysql> SELECT UTC_DATE(), UTC_DATE() + 0; -> '2003-08-14', 20030814
• UTC_TIME, UTC_TIME([fsp]) Returns the current UTC time as a value in 'HH:MM:SS' or HHMMSS format, depending on whether the function is used in a string or numeric context. If the fsp argument is given to specify a fractional seconds precision from 0 to 6, the return value includes a fractional seconds part of that many digits. mysql> SELECT UTC_TIME(), UTC_TIME() + 0; -> '18:07:53', 180753.000000
• UTC_TIMESTAMP, UTC_TIMESTAMP([fsp]) Returns the current UTC date and time as a value in 'YYYY-MM-DD HH:MM:SS' or YYYYMMDDHHMMSS format, depending on whether the function is used in a string or numeric context. If the fsp argument is given to specify a fractional seconds precision from 0 to 6, the return value includes a fractional seconds part of that many digits. mysql> SELECT UTC_TIMESTAMP(), UTC_TIMESTAMP() + 0; -> '2003-08-14 18:08:04', 20030814180804.000000
• WEEK(date[,mode]) This function returns the week number for date. The two-argument form of WEEK() enables you to specify whether the week starts on Sunday or Monday and whether the return value should be in the range from 0 to 53 or from 1 to 53. If the mode argument is omitted, the value of the default_week_format system variable is used. See Section 5.1.7, “Server System Variables”. The following table describes how the mode argument works.
Mode  First day of week  Range  Week 1 is the first week …
0     Sunday             0-53   with a Sunday in this year
1     Monday             0-53   with 4 or more days this year
2     Sunday             1-53   with a Sunday in this year
3     Monday             1-53   with 4 or more days this year
4     Sunday             0-53   with 4 or more days this year
5     Monday             0-53   with a Monday in this year
6     Sunday             1-53   with 4 or more days this year
7     Monday             1-53   with a Monday in this year
For mode values with a meaning of “with 4 or more days this year,” weeks are numbered according to ISO 8601:1988: • If the week containing January 1 has 4 or more days in the new year, it is week 1. • Otherwise, it is the last week of the previous year, and the next week is week 1. mysql> SELECT WEEK('2008-02-20'); -> 7 mysql> SELECT WEEK('2008-02-20',0); -> 7 mysql> SELECT WEEK('2008-02-20',1); -> 8 mysql> SELECT WEEK('2008-12-31',1); -> 53
If a date falls in the last week of the previous year, MySQL returns 0 if you do not use 2, 3, 6, or 7 as the optional mode argument: mysql> SELECT YEAR('2000-01-01'), WEEK('2000-01-01',0); -> 2000, 0
One might argue that WEEK() should return 52 because the given date actually occurs in the 52nd week of 1999. WEEK() returns 0 instead so that the return value is “the week number in the given year.” This makes use of the WEEK() function reliable when combined with other functions that extract a date part from a date. If you prefer a result evaluated with respect to the year that contains the first day of the week for the given date, use 0, 2, 5, or 7 as the optional mode argument. mysql> SELECT WEEK('2000-01-01',2); -> 52
Alternatively, use the YEARWEEK() function: mysql> SELECT YEARWEEK('2000-01-01'); -> 199952 mysql> SELECT MID(YEARWEEK('2000-01-01'),5,2); -> '52'
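Two of the week-numbering rules have close counterparts in Python's standard library, which can help when cross-checking results. The sketch below assumes that strftime("%U") (Sunday-first weeks, with days before the first Sunday in week 0) corresponds to mode 0 and that ISO 8601 week numbers from isocalendar() correspond to mode 3. These correspondences are observations that match the examples in this section, not claims about how MySQL is implemented, and the function names are hypothetical.

```python
from datetime import date

def week_mode0(d: date) -> int:
    """Sunday-first week number; days before the first Sunday fall in week 0."""
    return int(d.strftime("%U"))

def week_mode3(d: date) -> int:
    """ISO 8601 week number: Monday-first, week 1 has 4 or more days in the year."""
    return tuple(d.isocalendar())[1]

print(week_mode0(date(2008, 2, 20)))            # 7
print(week_mode3(date(2008, 2, 20)))            # 8
print(week_mode0(date(2000, 1, 1)))             # 0
print(tuple(date(2000, 1, 1).isocalendar())[:2])  # (1999, 52)
```

The last line shows the ISO year together with the week: January 1, 2000 belongs to week 52 of 1999, which is why a week number of 52 only makes sense when paired with the year, as YEARWEEK() does.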
• WEEKDAY(date) Returns the weekday index for date (0 = Monday, 1 = Tuesday, … 6 = Sunday). mysql> SELECT WEEKDAY('2008-02-03 22:23:00'); -> 6 mysql> SELECT WEEKDAY('2007-11-06'); -> 1
• WEEKOFYEAR(date) Returns the calendar week of the date as a number in the range from 1 to 53. WEEKOFYEAR() is a compatibility function that is equivalent to WEEK(date,3). mysql> SELECT WEEKOFYEAR('2008-02-20');
-> 8
• YEAR(date) Returns the year for date, in the range 1000 to 9999, or 0 for the “zero” date. mysql> SELECT YEAR('1987-01-01'); -> 1987
• YEARWEEK(date), YEARWEEK(date,mode) Returns year and week for a date. The year in the result may be different from the year in the date argument for the first and the last week of the year. The mode argument works exactly like the mode argument to WEEK(). For the single-argument syntax, a mode value of 0 is used. Unlike WEEK(), the value of default_week_format does not influence YEARWEEK(). mysql> SELECT YEARWEEK('1987-01-01'); -> 198652
The week number is different from what the WEEK() function would return (0) for optional arguments 0 or 1, as WEEK() then returns the week in the context of the given year.
12.8 What Calendar Is Used By MySQL?
MySQL uses what is known as a proleptic Gregorian calendar. Every country that has switched from the Julian to the Gregorian calendar has had to discard at least ten days during the switch. To see how this works, consider the month of October 1582, when the first Julian-to-Gregorian switch occurred.
Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday
1       2        3          4         15      16        17
18      19       20         21        22      23        24
25      26       27         28        29      30        31
There are no dates between October 4 and October 15. This discontinuity is called the cutover. Any dates before the cutover are Julian, and any dates following the cutover are Gregorian. Dates during a cutover are nonexistent. A calendar applied to dates when it was not actually in use is called proleptic. Thus, if we assume there was never a cutover and Gregorian rules always rule, we have a proleptic Gregorian calendar. This is what is used by MySQL, as is required by standard SQL. For this reason, dates prior to the cutover stored as MySQL DATE or DATETIME values must be adjusted to compensate for the difference. It is important to realize that the cutover did not occur at the same time in all countries, and that the later it happened, the more days were lost. For example, in Great Britain, it took place in 1752, when Wednesday September 2 was followed by Thursday September 14. Russia remained on the Julian calendar until 1918, losing 13 days in the process, and what is popularly referred to as its “October Revolution” occurred in November according to the Gregorian calendar.
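Python's datetime.date also uses a proleptic Gregorian calendar, so it offers a convenient way to see what "assuming there was never a cutover" means in practice. In the sketch below, the dates skipped by the historical cutover are ordinary valid dates, and October 4, 1582 falls on a Monday rather than the Thursday it was under the Julian calendar then in use. This illustrates the concept only; it says nothing about MySQL internals.

```python
from datetime import date

last_julian_day = date(1582, 10, 4)       # interpreted with Gregorian rules here
first_gregorian_day = date(1582, 10, 15)

# A proleptic calendar has no gap: the "missing" days exist and are counted.
print((first_gregorian_day - last_julian_day).days)   # 11
print(date(1582, 10, 10).isoformat())                 # 1582-10-10

# weekday(): 0 = Monday ... 6 = Sunday
print(first_gregorian_day.weekday())   # 4 (Friday, as in the table above)
print(last_julian_day.weekday())       # 0 (Monday, not the historical Thursday)
```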
12.9 Full-Text Search Functions
MATCH (col1,col2,...) AGAINST (expr [search_modifier])
search_modifier: { IN NATURAL LANGUAGE MODE | IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION | IN BOOLEAN MODE | WITH QUERY EXPANSION }
MySQL has support for full-text indexing and searching: • A full-text index in MySQL is an index of type FULLTEXT. • Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns. • As of MySQL 5.7.6, MySQL provides a built-in full-text ngram parser that supports Chinese, Japanese, and Korean (CJK), and an installable MeCab full-text parser plugin for Japanese. Parsing differences are outlined in Section 12.9.8, “ngram Full-Text Parser”, and Section 12.9.9, “MeCab Full-Text Parser Plugin”. • A FULLTEXT index definition can be given in the CREATE TABLE statement when a table is created, or added later using ALTER TABLE or CREATE INDEX. • For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index after that, than to load data into a table that has an existing FULLTEXT index. Full-text searching is performed using MATCH() ... AGAINST syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row. There are three types of full-text searches: • A natural language search interprets the search string as a phrase in natural human language (a phrase in free text). There are no special operators, with the exception of double quote (") characters. The stopword list applies. For more information about stopword lists, see Section 12.9.4, “Full-Text Stopwords”. Full-text searches are natural language searches if the IN NATURAL LANGUAGE MODE modifier is given or if no modifier is given. For more information, see Section 12.9.1, “Natural Language Full-Text Searches”. • A boolean search interprets the search string using the rules of a special query language.
The string contains the words to search for. It can also contain operators that specify requirements such that a word must be present or absent in matching rows, or that it should be weighted higher or lower than usual. Certain common words (stopwords) are omitted from the search index and do not match if present in the search string. The IN BOOLEAN MODE modifier specifies a boolean search. For more information, see Section 12.9.2, “Boolean Full-Text Searches”. • A query expansion search is a modification of a natural language search. The search string is used to perform a natural language search. Then words from the most relevant rows returned by the search are added to the search string and the search is done again. The query returns the rows from the second search. The IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION or WITH QUERY EXPANSION modifier specifies a query expansion search. For more information, see Section 12.9.3, “Full-Text Searches with Query Expansion”. For information about FULLTEXT query performance, see Section 8.3.4, “Column Indexes”. For more information about InnoDB FULLTEXT indexes, see Section 14.6.2.4, “InnoDB FULLTEXT Indexes”. Constraints on full-text searching are listed in Section 12.9.5, “Full-Text Restrictions”.
The myisam_ftdump utility dumps the contents of a MyISAM full-text index. This may be helpful for debugging full-text queries. See Section 4.6.2, “myisam_ftdump — Display Full-Text Index information”.
12.9.1 Natural Language Full-Text Searches By default or with the IN NATURAL LANGUAGE MODE modifier, the MATCH() function performs a natural language search for a string against a text collection. A collection is a set of one or more columns included in a FULLTEXT index. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value; that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list. mysql> CREATE TABLE articles ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, title VARCHAR(200), body TEXT, FULLTEXT (title,body) ) ENGINE=InnoDB; Query OK, 0 rows affected (0.08 sec) mysql> INSERT INTO articles (title,body) VALUES ('MySQL Tutorial','DBMS stands for DataBase ...'), ('How To Use MySQL Well','After you went through a ...'), ('Optimizing MySQL','In this tutorial we will show ...'), ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'), ('MySQL vs. YourSQL','In the following database comparison ...'), ('MySQL Security','When configured properly, MySQL ...'); Query OK, 6 rows affected (0.01 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE); +----+-------------------+------------------------------------------+ | id | title | body | +----+-------------------+------------------------------------------+ | 1 | MySQL Tutorial | DBMS stands for DataBase ... | | 5 | MySQL vs. YourSQL | In the following database comparison ... | +----+-------------------+------------------------------------------+ 2 rows in set (0.00 sec)
By default, the search is performed in case-insensitive fashion. To perform a case-sensitive full-text search, use a binary collation for the indexed columns. For example, a column that uses the latin1 character set can be assigned a collation of latin1_bin to make it case-sensitive for full-text searches. When MATCH() is used in a WHERE clause, as in the example shown earlier, the rows returned are automatically sorted with the highest relevance first. Relevance values are nonnegative floating-point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row (document), the number of unique words in the row, the total number of words in the collection, and the number of rows that contain a particular word. Note The term “document” may be used interchangeably with the term “row”, and both terms refer to the indexed part of the row. The term “collection” refers to the indexed columns and encompasses all rows. To simply count matches, you could use a query like this: mysql> SELECT COUNT(*) FROM articles WHERE MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE); +----------+ | COUNT(*) | +----------+ | 2 | +----------+ 1 row in set (0.00 sec)
You might find it quicker to rewrite the query as follows: mysql> SELECT COUNT(IF(MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE), 1, NULL)) AS count FROM articles; +-------+ | count | +-------+ | 2 | +-------+ 1 row in set (0.03 sec)
The first query does some extra work (sorting the results by relevance) but also can use an index lookup based on the WHERE clause. The index lookup might make the first query faster if the search matches few rows. The second query performs a full table scan, which might be faster than the index lookup if the search term was present in most rows. For natural-language full-text searches, the columns named in the MATCH() function must be the same columns included in some FULLTEXT index in your table. For the preceding query, note that the columns named in the MATCH() function (title and body) are the same as those named in the definition of the articles table's FULLTEXT index. To search the title or body separately, you would create separate FULLTEXT indexes for each column. You can also perform a boolean search or a search with query expansion. These search types are described in Section 12.9.2, “Boolean Full-Text Searches”, and Section 12.9.3, “Full-Text Searches with Query Expansion”. A full-text search that uses an index can name columns only from a single table in the MATCH() clause because an index cannot span multiple tables. For MyISAM tables, a boolean search can be done in the absence of an index (albeit more slowly), in which case it is possible to name columns from multiple tables. The preceding example is a basic illustration that shows how to use the MATCH() function where rows are returned in order of decreasing relevance. The next example shows how to retrieve the relevance values explicitly. Returned rows are not ordered because the SELECT statement includes neither WHERE nor ORDER BY clauses: mysql> SELECT id, MATCH (title,body) AGAINST ('Tutorial' IN NATURAL LANGUAGE MODE) AS score FROM articles; +----+---------------------+ | id | score | +----+---------------------+ | 1 | 0.22764469683170319 | | 2 | 0 | | 3 | 0.22764469683170319 | | 4 | 0 | | 5 | 0 | | 6 | 0 | +----+---------------------+ 6 rows in set (0.00 sec)
The following example is more complex. The query returns the relevance values and it also sorts the rows in order of decreasing relevance. To achieve this result, specify MATCH() twice: once in the SELECT list and once in the WHERE clause. This causes no additional overhead, because the MySQL
optimizer notices that the two MATCH() calls are identical and invokes the full-text search code only once. mysql> SELECT id, body, MATCH (title,body) AGAINST ('Security implications of running MySQL as root' IN NATURAL LANGUAGE MODE) AS score FROM articles WHERE MATCH (title,body) AGAINST ('Security implications of running MySQL as root' IN NATURAL LANGUAGE MODE); +----+-------------------------------------+-----------------+ | id | body | score | +----+-------------------------------------+-----------------+ | 4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 | | 6 | When configured properly, MySQL ... | 1.3114095926285 | +----+-------------------------------------+-----------------+ 2 rows in set (0.00 sec)
A phrase that is enclosed within double quote (") characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase". If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stopwords or shorter than the minimum length of indexed words, the result is empty. The MySQL FULLTEXT implementation regards any sequence of true word characters (letters, digits, and underscores) as a word. That sequence may also contain apostrophes ('), but not more than one in a row. This means that aaa'bbb is regarded as one word, but aaa''bbb is regarded as two words. Apostrophes at the beginning or the end of a word are stripped by the FULLTEXT parser; 'aaa'bbb' would be parsed as aaa'bbb. The built-in FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example, (space), , (comma), and . (period). If words are not separated by delimiters (as in, for example, Chinese), the built-in FULLTEXT parser cannot determine where a word begins or ends. To be able to add words or other indexed terms in such languages to a FULLTEXT index that uses the built-in FULLTEXT parser, you must preprocess them so that they are separated by some arbitrary delimiter. Alternatively, as of MySQL 5.7.6, you can create FULLTEXT indexes using the ngram parser plugin (for Chinese, Japanese, or Korean) or the MeCab parser plugin (for Japanese). It is possible to write a plugin that replaces the built-in full-text parser. For details, see Section 28.2, “The MySQL Plugin API”. For example parser plugin source code, see the plugin/fulltext directory of a MySQL source distribution. 
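The word rule described above (runs of letters, digits, and underscores, optionally joined by single apostrophes, with leading and trailing apostrophes stripped) can be approximated with a regular expression. The Python sketch below only illustrates that rule. It is not the built-in FULLTEXT parser and it ignores stopwords, minimum token length, and character set details; tokenize is a hypothetical name.

```python
import re

# One or more word characters, optionally followed by groups of
# "single apostrophe + word characters". Two apostrophes in a row
# cannot match, so they split the text into separate words.
WORD = re.compile(r"\w+(?:'\w+)*")

def tokenize(text: str) -> list:
    """Split text into words following the rule sketched above."""
    return WORD.findall(text)

print(tokenize("aaa'bbb"))       # ["aaa'bbb"]
print(tokenize("aaa''bbb"))      # ['aaa', 'bbb']
print(tokenize("'aaa'bbb'"))     # ["aaa'bbb"]
print(tokenize("test, phrase"))  # ['test', 'phrase']
```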
Some words are ignored in full-text searches: • Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is three characters for InnoDB search indexes, or four characters for MyISAM. You can control the cutoff by setting a configuration option before creating the index: innodb_ft_min_token_size configuration option for InnoDB search indexes, or ft_min_word_len for MyISAM. Note This behavior does not apply to FULLTEXT indexes that use the ngram parser. For the ngram parser, token length is defined by the ngram_token_size option. • Words in the stopword list are ignored. A stopword is a word such as “the” or “some” that is so common that it is considered to have zero semantic value. There is a built-in stopword list, but it can be overridden by a user-defined list. The stopword lists and related configuration options are different for InnoDB search indexes and MyISAM ones. Stopword processing is controlled by the
configuration options innodb_ft_enable_stopword, innodb_ft_server_stopword_table, and innodb_ft_user_stopword_table for InnoDB search indexes, and ft_stopword_file for MyISAM ones. See Section 12.9.4, “Full-Text Stopwords” to view default stopword lists and how to change them. The default minimum word length can be changed as described in Section 12.9.6, “Fine-Tuning MySQL Full-Text Search”. Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Thus, a word that is present in many documents has a lower weight, because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row. This technique works best with large collections. MyISAM Limitation For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results for search indexes on MyISAM tables. For example, although the word “MySQL” is present in every row of the articles table shown earlier, a search for the word in a MyISAM search index produces no results: mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('MySQL' IN NATURAL LANGUAGE MODE); Empty set (0.00 sec)
The search result is empty because the word “MySQL” is present in at least 50% of the rows, and so is effectively treated as a stopword. This filtering technique is more suitable for large data sets, where you might not want the result set to return every second row from a 1GB table, than for small data sets where it might cause poor results for popular terms. The 50% threshold can surprise you when you first try full-text searching to see how it works, and makes InnoDB tables more suited to experimentation with full-text searches. If you create a MyISAM table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results until the table contains more rows. Users who need to bypass the 50% limitation can build search indexes on InnoDB tables, or use the boolean search mode explained in Section 12.9.2, “Boolean Full-Text Searches”.
12.9.2 Boolean Full-Text Searches

MySQL can perform boolean full-text searches using the IN BOOLEAN MODE modifier. With this modifier, certain characters have special meaning at the beginning or end of words in the search string. In the following query, the + and - operators indicate that a word must be present or absent, respectively, for a match to occur. Thus, the query retrieves all the rows that contain the word “MySQL” but that do not contain the word “YourSQL”:

mysql> SELECT * FROM articles WHERE MATCH (title,body)
    AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
+----+-----------------------+-------------------------------------+
| id | title                 | body                                |
+----+-----------------------+-------------------------------------+
|  1 | MySQL Tutorial        | DBMS stands for DataBase ...        |
|  2 | How To Use MySQL Well | After you went through a ...        |
|  3 | Optimizing MySQL      | In this tutorial we will show ...   |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ... |
|  6 | MySQL Security        | When configured properly, MySQL ... |
+----+-----------------------+-------------------------------------+
Note

In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which

• + stands for AND
• - stands for NOT
• [no operator] implies OR

Boolean full-text searches have these characteristics:

• They do not automatically sort rows in order of decreasing relevance.
• InnoDB tables require a FULLTEXT index on all columns of the MATCH() expression to perform boolean queries. Boolean queries against a MyISAM search index can work even without a FULLTEXT index, although a search executed in this fashion would be quite slow.
• The minimum and maximum word length full-text parameters apply to FULLTEXT indexes created using the built-in FULLTEXT parser and MeCab parser plugin. innodb_ft_min_token_size and innodb_ft_max_token_size are used for InnoDB search indexes. ft_min_word_len and ft_max_word_len are used for MyISAM search indexes. Minimum and maximum word length full-text parameters do not apply to FULLTEXT indexes created using the ngram parser. ngram token size is defined by the ngram_token_size option.
• The stopword list applies, controlled by innodb_ft_enable_stopword, innodb_ft_server_stopword_table, and innodb_ft_user_stopword_table for InnoDB search indexes, and ft_stopword_file for MyISAM ones.
• InnoDB full-text search does not support the use of multiple operators on a single search word, as in this example: '++apple'. Use of multiple operators on a single search word returns a syntax error to standard out. MyISAM full-text search successfully processes the same search, ignoring all operators except for the operator immediately adjacent to the search word.
• InnoDB full-text search only supports leading plus or minus signs. For example, InnoDB supports '+apple' but does not support 'apple+'. Specifying a trailing plus or minus sign causes InnoDB to report a syntax error.
• InnoDB full-text search does not support the use of a leading plus sign with wildcard ('+*'), a plus and minus sign combination ('+-'), or a leading plus and minus sign combination ('+-apple'). These invalid queries return a syntax error.
• InnoDB full-text search does not support the use of the @ symbol in boolean full-text searches. The @ symbol is reserved for use by the @distance proximity search operator.
• They do not use the 50% threshold that applies to MyISAM search indexes.

The boolean full-text search capability supports the following operators:

• +
  A leading or trailing plus sign indicates that this word must be present in each row that is returned. InnoDB only supports leading plus signs.
• -
  A leading or trailing minus sign indicates that this word must not be present in any of the rows that are returned. InnoDB only supports leading minus signs.
  Note: The - operator acts only to exclude rows that are otherwise matched by other search terms. Thus, a boolean-mode search that contains only terms preceded by - returns an empty result. It does not return “all rows except those containing any of the excluded terms.”
• (no operator)
  By default (when neither + nor - is specified), the word is optional, but the rows that contain it are rated higher. This mimics the behavior of MATCH() ... AGAINST() without the IN BOOLEAN MODE modifier.
• @distance
  This operator works on InnoDB tables only. It tests whether two or more words all start within a specified distance from each other, measured in words. Specify the search words within a double-quoted string immediately before the @distance operator, for example:
  MATCH(col1) AGAINST('"word1 word2 word3" @8' IN BOOLEAN MODE)
• > <
  These two operators are used to change a word's contribution to the relevance value that is assigned to a row. The > operator increases the contribution and the < operator decreases it. See the example following this list.
• ( )
  Parentheses group words into subexpressions. Parenthesized groups can be nested.
• ~
  A leading tilde acts as a negation operator, causing the word's contribution to the row's relevance to be negative. This is useful for marking “noise” words. A row containing such a word is rated lower than others, but is not excluded altogether, as it would be with the - operator.
• *
  The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it is appended to the word to be affected. Words match if they begin with the word preceding the * operator.
  If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short or a stopword. Whether a word is too short is determined from the innodb_ft_min_token_size setting for InnoDB tables, or ft_min_word_len for MyISAM tables. These options are not applicable to FULLTEXT indexes that use the ngram parser.
  The wildcarded word is considered as a prefix that must be present at the start of one or more words. If the minimum word length is 4, a search for '+word +the*' could return fewer rows than a search for '+word +the', because the second query ignores the too-short search term the.
• "
  A phrase that is enclosed within double quote (") characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase".
  If the phrase contains no words that are in the index, the result is empty. The words might not be in the index because of a combination of factors: if they do not exist in the text, are stopwords, or are shorter than the minimum length of indexed words.

The following examples demonstrate some search strings that use boolean full-text operators:
• 'apple banana'
  Find rows that contain at least one of the two words.
• '+apple +juice'
  Find rows that contain both words.
• '+apple macintosh'
  Find rows that contain the word “apple”, but rank rows higher if they also contain “macintosh”.
• '+apple -macintosh'
  Find rows that contain the word “apple” but not “macintosh”.
• '+apple ~macintosh'
  Find rows that contain the word “apple”, but if the row also contains the word “macintosh”, rate it lower than if the row does not. This is “softer” than a search for '+apple -macintosh', for which the presence of “macintosh” causes the row not to be returned at all.
• '+apple +(>turnover <strudel)'
  Find rows that contain the words “apple” and “turnover”, or “apple” and “strudel” (in any order), but rank “apple turnover” higher than “apple strudel”.
• 'apple*'
  Find rows that contain words such as “apple”, “apples”, “applesauce”, or “applet”.
• '"some words"'
  Find rows that contain the exact phrase “some words” (for example, rows that contain “some words of wisdom” but not “some noise words”). Note that the " characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotation marks that enclose the search string itself.
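The implied Boolean logic behind these examples can be illustrated with a short Python sketch. It is a simplified model rather than the server's parser: the helper names are hypothetical, and stopwords, word-length limits, phrases, parentheses, and the relevance operators (> < ~) are deliberately left out.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a simplification of the built-in parser."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def boolean_match(row_text, search):
    """Evaluate +word, -word, word, and word* terms against one row.

    Hypothetical helper covering only the operators named above.
    """
    words = tokenize(row_text)
    required, excluded, optional = [], [], []
    for term in search.split():
        bucket = optional
        if term[0] == "+":
            bucket, term = required, term[1:]
        elif term[0] == "-":
            bucket, term = excluded, term[1:]
        bucket.append(term.lower())

    def present(term):
        if term.endswith("*"):  # truncation operator: prefix match
            return any(w.startswith(term[:-1]) for w in words)
        return term in words

    if any(present(t) for t in excluded):
        return False
    if required:
        return all(present(t) for t in required)
    # With no required terms, at least one optional term must match, so a
    # search made only of -terms matches nothing.
    return any(present(t) for t in optional)

rows = ["apple turnover", "apple macintosh", "banana juice", "applesauce recipe"]
print([r for r in rows if boolean_match(r, "+apple -macintosh")])
print([r for r in rows if boolean_match(r, "apple*")])
```

The final branch reflects the note on the - operator: exclusion only filters rows that some other term has already matched.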
Relevancy Rankings for InnoDB Boolean Mode Search

InnoDB full-text search is modeled on the Sphinx full-text search engine, and the algorithms used are based on BM25 and TF-IDF ranking algorithms. For these reasons, relevancy rankings for InnoDB boolean full-text search may differ from MyISAM relevancy rankings.

InnoDB uses a variation of the “term frequency-inverse document frequency” (TF-IDF) weighting system to rank a document's relevance for a given full-text search query. The TF-IDF weighting is based on how frequently a word appears in a document, offset by how frequently the word appears in all documents in the collection. In other words, the more frequently a word appears in a document, and the less frequently the word appears in the document collection, the higher the document is ranked.
How Relevancy Ranking is Calculated

The term frequency (TF) value is the number of times that a word appears in a document. The inverse document frequency (IDF) value of a word is calculated using the following formula, where total_records is the number of records in the collection, and matching_records is the number of records that the search term appears in.

${IDF} = log10( ${total_records} / ${matching_records} )

When a document contains a word multiple times, the IDF value is multiplied by the TF value:

${TF} * ${IDF}

Using the TF and IDF values, the relevancy ranking for a document is calculated using this formula:

${rank} = ${TF} * ${IDF} * ${IDF}

The formula is demonstrated in the following examples.
Relevancy Ranking for a Single Word Search

This example demonstrates the relevancy ranking calculation for a single-word search.

mysql> CREATE TABLE articles (
    id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    title VARCHAR(200),
    body TEXT,
    FULLTEXT (title,body)
    ) ENGINE=InnoDB;
Query OK, 0 rows affected (1.04 sec)

mysql> INSERT INTO articles (title,body) VALUES
    ('MySQL Tutorial','This database tutorial ...'),
    ("How To Use MySQL",'After you went through a ...'),
    ('Optimizing Your Database','In this database tutorial ...'),
    ('MySQL vs. YourSQL','When comparing databases ...'),
    ('MySQL Security','When configured properly, MySQL ...'),
    ('Database, Database, Database','database database database'),
    ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
    ('MySQL Full-Text Indexes', 'MySQL fulltext indexes use a ..');
Query OK, 8 rows affected (0.06 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> SELECT id, title, body,
    MATCH (title,body) AGAINST ('database' IN BOOLEAN MODE) AS score
    FROM articles ORDER BY score DESC;
+----+------------------------------+-------------------------------------+---------------------+
| id | title                        | body                                | score               |
+----+------------------------------+-------------------------------------+---------------------+
|  6 | Database, Database, Database | database database database          |  1.0886961221694946 |
|  3 | Optimizing Your Database     | In this database tutorial ...       | 0.36289870738983154 |
|  1 | MySQL Tutorial               | This database tutorial ...          | 0.18144935369491577 |
|  2 | How To Use MySQL             | After you went through a ...        |                   0 |
|  4 | MySQL vs. YourSQL            | When comparing databases ...        |                   0 |
|  5 | MySQL Security               | When configured properly, MySQL ... |                   0 |
|  7 | 1001 MySQL Tricks            | 1. Never run mysqld as root. 2. ... |                   0 |
|  8 | MySQL Full-Text Indexes      | MySQL fulltext indexes use a ..     |                   0 |
+----+------------------------------+-------------------------------------+---------------------+
8 rows in set (0.00 sec)
There are 8 records in total, with 3 that match the “database” search term. The first record (id 6) contains the search term 6 times and has a relevancy ranking of 1.0886961221694946. This ranking value is calculated using a TF value of 6 (the “database” search term appears 6 times in record id 6) and an IDF value of 0.42596873216370745, which is calculated as follows (where 8 is the total number of records and 3 is the number of records that the search term appears in):

${IDF} = log10( 8 / 3 ) = 0.42596873216370745

The TF and IDF values are then entered into the ranking formula:

${rank} = ${TF} * ${IDF} * ${IDF}

Performing the calculation in the MySQL command-line client returns a ranking value of 1.088696164686938.

mysql> SELECT 6*log10(8/3)*log10(8/3);
+-------------------------+
| 6*log10(8/3)*log10(8/3) |
+-------------------------+
|       1.088696164686938 |
+-------------------------+
1 row in set (0.00 sec)
Note You may notice a slight difference in the ranking values returned by the SELECT ... MATCH ... AGAINST statement and the MySQL command-line client (1.0886961221694946 versus 1.088696164686938). The difference is due to how the casts between integers and floats/doubles are performed internally by InnoDB (along with related precision and rounding decisions), and how they are performed elsewhere, such as in the MySQL command-line client or other types of calculators.
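The same arithmetic can be checked outside the server. The following Python sketch implements the ranking formula as stated above; the function names are hypothetical. Because of the floating-point differences described in the note, expect the results to agree with the InnoDB scores only approximately.

```python
import math

def idf(total_records, matching_records):
    """Inverse document frequency: log10(total_records / matching_records)."""
    return math.log10(total_records / matching_records)

def rank(tf, total_records, matching_records):
    """Ranking for one search term: TF * IDF * IDF."""
    value = idf(total_records, matching_records)
    return tf * value * value

# 8 records in total, 3 of which contain "database".
print(rank(6, 8, 3))  # record id 6, TF = 6
print(rank(2, 8, 3))  # record id 3, TF = 2
print(rank(1, 8, 3))  # record id 1, TF = 1
```

A record's score grows linearly with TF, while the IDF factor is shared by every record that matches the same term.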
Relevancy Ranking for a Multiple Word Search

This example demonstrates the relevancy ranking calculation for a multiple-word full-text search based on the articles table and data used in the previous example.

If you search on more than one word, the relevancy ranking value is a sum of the relevancy ranking value for each word, as shown in this formula:

${rank} = ${TF} * ${IDF} * ${IDF} + ${TF} * ${IDF} * ${IDF}

Performing a search on two terms ('mysql tutorial') returns the following results:

mysql> SELECT id, title, body, MATCH (title,body)
    AGAINST ('mysql tutorial' IN BOOLEAN MODE) AS score
    FROM articles ORDER BY score DESC;
+----+------------------------------+-------------------------------------+----------------------+
| id | title                        | body                                | score                |
+----+------------------------------+-------------------------------------+----------------------+
|  1 | MySQL Tutorial               | This database tutorial ...          |   0.7405621409416199 |
|  3 | Optimizing Your Database     | In this database tutorial ...       |   0.3624762296676636 |
|  5 | MySQL Security               | When configured properly, MySQL ... | 0.031219376251101494 |
|  8 | MySQL Full-Text Indexes      | MySQL fulltext indexes use a ..     | 0.031219376251101494 |
|  2 | How To Use MySQL             | After you went through a ...        | 0.015609688125550747 |
|  4 | MySQL vs. YourSQL            | When comparing databases ...        | 0.015609688125550747 |
|  7 | 1001 MySQL Tricks            | 1. Never run mysqld as root. 2. ... | 0.015609688125550747 |
|  6 | Database, Database, Database | database database database          |                    0 |
+----+------------------------------+-------------------------------------+----------------------+
8 rows in set (0.00 sec)
In the first record (id 1), 'mysql' appears once and 'tutorial' appears twice. There are six matching records for 'mysql' and two matching records for 'tutorial'. The MySQL command-line client returns the expected ranking value when inserting these values into the ranking formula for a multiple word search:

mysql> SELECT (1*log10(8/6)*log10(8/6)) + (2*log10(8/2)*log10(8/2));
+-------------------------------------------------------+
| (1*log10(8/6)*log10(8/6)) + (2*log10(8/2)*log10(8/2)) |
+-------------------------------------------------------+
|                                    0.7405621541938003 |
+-------------------------------------------------------+
1 row in set (0.00 sec)
Note The slight difference in the ranking values returned by the SELECT ... MATCH ... AGAINST statement and the MySQL command-line client is explained in the preceding example.
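To see where every score in the result set comes from, the whole table can be scored with a small Python sketch. The row data is copied from the INSERT statement above. The tokenizer and function names are hypothetical simplifications (no stopwords or token-size limits), so this is an approximation of the documented formula, not InnoDB's implementation.

```python
import math
import re

ARTICLES = {
    1: ("MySQL Tutorial", "This database tutorial ..."),
    2: ("How To Use MySQL", "After you went through a ..."),
    3: ("Optimizing Your Database", "In this database tutorial ..."),
    4: ("MySQL vs. YourSQL", "When comparing databases ..."),
    5: ("MySQL Security", "When configured properly, MySQL ..."),
    6: ("Database, Database, Database", "database database database"),
    7: ("1001 MySQL Tricks", "1. Never run mysqld as root. 2. ..."),
    8: ("MySQL Full-Text Indexes", "MySQL fulltext indexes use a .."),
}

def tokenize(text):
    """Lowercase word tokens; a simplification of the built-in parser."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def score_all(query):
    """Sum TF * IDF * IDF over the query terms for every record."""
    docs = {doc_id: tokenize(title + " " + body)
            for doc_id, (title, body) in ARTICLES.items()}
    scores = dict.fromkeys(docs, 0.0)
    for term in tokenize(query):
        matching = sum(1 for words in docs.values() if term in words)
        if matching == 0:
            continue  # term occurs nowhere, so it contributes nothing
        idf = math.log10(len(docs) / matching)
        for doc_id, words in docs.items():
            scores[doc_id] += words.count(term) * idf * idf
    return scores

scores = score_all("mysql tutorial")
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, round(scores[doc_id], 6))
```

Note that “mysqld”, “YourSQL”, and “databases” are distinct tokens, which is why they do not count toward the TF of “mysql” or “database”.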
12.9.3 Full-Text Searches with Query Expansion

Full-text search supports query expansion (and in particular, its variant “blind query expansion”). This is generally useful when a search phrase is too short, which often means that the user is relying on implied knowledge that the full-text search engine lacks. For example, a user searching for “database” may really mean that “MySQL”, “Oracle”, “DB2”, and “RDBMS” all are phrases that should match “databases” and should be returned, too. This is implied knowledge.

Blind query expansion (also known as automatic relevance feedback) is enabled by adding WITH QUERY EXPANSION or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION following the search phrase. It works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant documents from the first search. Thus, if one of these documents contains the word “databases” and the word “MySQL”, the second search finds the documents that contain the word “MySQL” even if they do not contain the word “database”. The following example shows this difference:

mysql> SELECT * FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' WITH QUERY EXPANSION);
+----+-----------------------+------------------------------------------+
| id | title                 | body                                     |
+----+-----------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL     | In the following database comparison ... |
|  1 | MySQL Tutorial        | DBMS stands for DataBase ...             |
|  3 | Optimizing MySQL      | In this tutorial we will show ...        |
|  6 | MySQL Security        | When configured properly, MySQL ...      |
|  2 | How To Use MySQL Well | After you went through a ...             |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ...      |
+----+-----------------------+------------------------------------------+
6 rows in set (0.00 sec)
Another example could be searching for books by Georges Simenon about Maigret, when a user is not sure how to spell “Maigret”. A search for “Megre and the reluctant witnesses” finds only “Maigret and the Reluctant Witnesses” without query expansion. A search with query expansion finds all books with the word “Maigret” on the second pass. Note Because blind query expansion tends to increase noise significantly by returning nonrelevant documents, use it only when a search phrase is short.
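The two-pass idea behind blind query expansion can be sketched in Python. Everything here is illustrative: the documents, the function names, and the top_n parameter are hypothetical, and relevance is approximated by a plain count of query-term occurrences rather than by the server's weighting.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a simplification of the built-in parser."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def search(docs, terms):
    """Return ids of matching documents, most query-term occurrences first."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

def search_with_expansion(docs, query, top_n=1):
    """First pass with the query, second pass with the query plus the
    words of the top_n most relevant documents from the first pass."""
    terms = set(tokenize(query))
    first_pass = search(docs, terms)
    for doc_id in first_pass[:top_n]:
        terms.update(tokenize(docs[doc_id]))
    return search(docs, terms)

docs = {
    1: "database comparison of MySQL and YourSQL",
    2: "MySQL security settings",
    3: "gardening tips",
}
print(search(docs, {"database"}))
print(search_with_expansion(docs, "database"))
```

The second call also returns document 2, which never mentions “database”. That is the benefit of expansion, and also the source of the extra noise that the note above warns about.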
12.9.4 Full-Text Stopwords

The stopword list is loaded and searched for full-text queries using the server character set and collation (the values of the character_set_server and collation_server system variables). False hits or misses might occur for stopword lookups if the stopword file or columns used for full-text indexing or searches have a character set or collation different from character_set_server or collation_server.

Case sensitivity of stopword lookups depends on the server collation. For example, lookups are case-insensitive if the collation is latin1_swedish_ci, whereas lookups are case-sensitive if the collation is latin1_general_cs or latin1_bin.

• Stopwords for InnoDB Search Indexes
• Stopwords for MyISAM Search Indexes
Stopwords for InnoDB Search Indexes

InnoDB has a relatively short list of default stopwords, because documents from technical, literary, and other sources often use short words as keywords or in significant phrases. For example, you might search for “to be or not to be” and expect to get a sensible result, rather than having all those words ignored.

To see the default InnoDB stopword list, query the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table.

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
+-------+
| value |
+-------+
| a     |
| about |
| an    |
| are   |
| as    |
| at    |
| be    |
| by    |
| com   |
| de    |
| en    |
| for   |
| from  |
| how   |
| i     |
| in    |
| is    |
| it    |
| la    |
| of    |
| on    |
| or    |
| that  |
| the   |
| this  |
| to    |
| was   |
| what  |
| when  |
| where |
| who   |
| will  |
| with  |
| und   |
| the   |
| www   |
+-------+
36 rows in set (0.00 sec)
To define your own stopword list for all InnoDB tables, define a table with the same structure as the INNODB_FT_DEFAULT_STOPWORD table, populate it with stopwords, and set the value of the innodb_ft_server_stopword_table option to a value in the form db_name/table_name before creating the full-text index. The stopword table must have a single VARCHAR column named value. The following example demonstrates creating and configuring a new global stopword table for InnoDB.

-- Create a new stopword table

mysql> CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
Query OK, 0 rows affected (0.01 sec)

-- Insert stopwords (for simplicity, a single stopword is used in this example)

mysql> INSERT INTO my_stopwords(value) VALUES ('Ishmael');
Query OK, 1 row affected (0.00 sec)

-- Create the table

mysql> CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

-- Insert data into the table

mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
('Call me Ishmael.','Herman Melville','Moby-Dick'),
('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
('I am an invisible man.','Ralph Ellison','Invisible Man'),
('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
('It was love at first sight.','Joseph Heller','Catch-22'),
('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

-- Set the innodb_ft_server_stopword_table option to the new stopword table

mysql> SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';
Query OK, 0 rows affected (0.00 sec)

-- Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined)

mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (1.17 sec)
Records: 0  Duplicates: 0  Warnings: 1
Verify that the specified stopword ('Ishmael') does not appear by querying the words in INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE.

Note

By default, words less than 3 characters in length or greater than 84 characters in length do not appear in an InnoDB full-text search index. Maximum and minimum word length values are configurable using the innodb_ft_max_token_size and innodb_ft_min_token_size variables. This default behavior does not apply to the ngram parser plugin. ngram token size is defined by the ngram_token_size option.

mysql> SET GLOBAL innodb_ft_aux_table='test/opening_lines';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE LIMIT 15;
+-----------+
| word      |
+-----------+
| across    |
| all       |
| burn      |
| buy       |
| call      |
| comes     |
| dalloway  |
| first     |
| flowers   |
| happened  |
| herself   |
| invisible |
| less      |
| love      |
| man       |
+-----------+
15 rows in set (0.00 sec)
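The combined effect of the custom stopword table and the default token-size limits can be approximated with a short Python sketch. The function and parameter names are hypothetical, the tokenizer is a simplification of the built-in parser, and stopword matching is assumed to be case-insensitive (which in practice depends on the server collation).

```python
import re

OPENING_LINES = [
    "Call me Ishmael.",
    "A screaming comes across the sky.",
    "I am an invisible man.",
    "Where now? Who now? When now?",
    "It was love at first sight.",
    "All this happened, more or less.",
    "Mrs. Dalloway said she would buy the flowers herself.",
    "It was a pleasure to burn.",
]

def indexed_words(lines, stopwords, min_size=3, max_size=84):
    """Return the sorted set of words that pass the length and stopword filters.

    min_size and max_size mirror the documented defaults of
    innodb_ft_min_token_size and innodb_ft_max_token_size.
    """
    stop = {w.lower() for w in stopwords}
    words = set()
    for line in lines:
        for token in re.findall(r"[a-z0-9_]+", line.lower()):
            if min_size <= len(token) <= max_size and token not in stop:
                words.add(token)
    return sorted(words)

print(indexed_words(OPENING_LINES, ["Ishmael"])[:15])
```

Short words such as “me”, “am”, and “an” drop out because of the minimum token size, and “ishmael” drops out because it is in the stopword table.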
To create stopword lists on a table-by-table basis, create other stopword tables and use the innodb_ft_user_stopword_table option to specify the stopword table that you want to use before you create the full-text index.
Stopwords for MyISAM Search Indexes

The stopword file is loaded and searched using latin1 if character_set_server is ucs2, utf16, utf16le, or utf32.

To override the default stopword list for MyISAM tables, set the ft_stopword_file system variable. (See Section 5.1.7, “Server System Variables”.) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.

The stopword list is free-form, separating stopwords with any nonalphanumeric character such as newline, space, or comma. Exceptions are the underscore character (_) and a single apostrophe (') which are treated as part of a word. The character set of the stopword list is the server's default character set; see Section 10.3.2, “Server Character Set and Collation”.

The following list shows the default stopwords for MyISAM search indexes. In a MySQL source distribution, you can find this list in the storage/myisam/ft_static.c file.

a's able about above according
accordingly across actually after afterwards
again against ain't all allow
allows almost alone along already
also although always am among
amongst an and another any
anybody anyhow anyone anything anyway
anyways anywhere apart appear appreciate
appropriate are aren't around as
aside ask asking associated at
available away awfully be became
because become becomes becoming been
before beforehand behind being believe
below beside besides best better
between beyond both brief but
by c'mon c's came can
can't cannot cant cause causes
certain certainly changes clearly co
com come comes concerning consequently
consider considering contain containing contains
corresponding could couldn't course currently
definitely described despite did didn't
different do does doesn't doing
don't done down downwards during
each edu eg eight either
else elsewhere enough entirely especially
et etc even ever every
everybody everyone everything everywhere ex
exactly example except far few
fifth first five followed following
follows for former formerly forth
four from further furthermore get
gets getting given gives go
goes going gone got gotten
greetings had hadn't happens hardly
has hasn't have haven't having
he he's hello help hence
her here here's hereafter hereby
herein hereupon hers herself hi
him himself his hither hopefully
how howbeit however i'd i'll
i'm i've ie if ignored
immediate in inasmuch inc indeed
indicate indicated indicates inner insofar
instead into inward is isn't
it it'd it'll it's its
itself just keep keeps kept
know known knows last lately
later latter latterly least less
lest let let's like liked
likely little look looking looks
ltd mainly many may maybe
me mean meanwhile merely might
more moreover most mostly much
must my myself name namely
nd near nearly necessary need
needs neither never nevertheless new
next nine no nobody non
none noone nor normally not
nothing novel now nowhere obviously
of off often oh ok
okay old on once one
ones only onto or other
others otherwise ought our ours
ourselves out outside over overall
own particular particularly per perhaps
placed please plus possible presumably
probably provides que quite qv
rather rd re really reasonably
regarding regardless regards relatively respectively
right said same saw say
saying says second secondly see
seeing seem seemed seeming seems
seen self selves sensible sent
serious seriously seven several shall
she should shouldn't since six
so some somebody somehow someone
something sometime sometimes somewhat somewhere
soon sorry specified specify specifying
still sub such sup sure
t's take taken tell tends
th than thank thanks thanx
that that's thats the their
theirs them themselves then thence
there there's thereafter thereby therefore
therein theres thereupon these they
they'd they'll they're they've think
third this thorough thoroughly those
though three through throughout thru
thus to together too took
toward towards tried tries truly
try trying twice two un
under unfortunately unless unlikely until
unto up upon us use
used useful uses using usually
value various very via viz
vs want wants was wasn't
way we we'd we'll we're
we've welcome well went were
weren't what what's whatever when
whence whenever where where's whereafter
whereas whereby wherein whereupon wherever
whether which while whither who
who's whoever whole whom whose
why will willing wish with
within without won't wonder would
wouldn't yes yet you you'd
you'll you're you've your yours
yourself yourselves zero
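The free-form file format described above can be illustrated with a Python sketch that splits a stopword list into words. It is not the server's parser. The function name is hypothetical, matching is restricted to ASCII letters and digits as an assumption, and runs of several apostrophes are not treated specially.

```python
import re

def parse_stopword_list(text):
    """Split a free-form stopword list into words.

    Letters, digits, underscore, and apostrophe count as word characters;
    any other character acts as a separator.
    """
    return re.findall(r"[A-Za-z0-9_']+", text)

sample = "a's, able about\nabove;according can't my_word"
print(parse_stopword_list(sample))
```

Commas, spaces, newlines, and semicolons all separate entries here, while “a's”, “can't”, and “my_word” each stay in one piece.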
12.9.5 Full-Text Restrictions

• Full-text searches are supported for InnoDB and MyISAM tables only.
• Full-text searches are not supported for partitioned tables. See Section 22.6, “Restrictions and Limitations on Partitioning”.
• Full-text searches can be used with most multibyte character sets. The exception is that for Unicode, the utf8 character set can be used, but not the ucs2 character set. Although FULLTEXT indexes on ucs2 columns cannot be used, you can perform IN BOOLEAN MODE searches on a ucs2 column that has no such index.
  The remarks for utf8 also apply to utf8mb4, and the remarks for ucs2 also apply to utf16, utf16le, and utf32.
• Ideographic languages such as Chinese and Japanese do not have word delimiters. Therefore, the built-in full-text parser cannot determine where words begin and end in these and other such languages.
  In MySQL 5.7.6, a character-based ngram full-text parser that supports Chinese, Japanese, and Korean (CJK), and a word-based MeCab parser plugin that supports Japanese are provided for use with InnoDB and MyISAM tables.
• Although the use of multiple character sets within a single table is supported, all columns in a FULLTEXT index must use the same character set and collation.
• The MATCH() column list must match exactly the column list in some FULLTEXT index definition for the table, unless this MATCH() is IN BOOLEAN MODE on a MyISAM table. For MyISAM tables, boolean-mode searches can be done on nonindexed columns, although they are likely to be slow.
• The argument to AGAINST() must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row.
• Index hints are more limited for FULLTEXT searches than for non-FULLTEXT searches. See Section 8.9.4, “Index Hints”.
• For InnoDB, all DML operations (INSERT, UPDATE, DELETE) involving columns with full-text indexes are processed at transaction commit time. For example, for an INSERT operation, an inserted string is tokenized and decomposed into individual words. The individual words are then added to full-text index tables when the transaction is committed. As a result, full-text searches only return committed data.
• The '%' character is not a supported wildcard character for full-text searches.
12.9.6 Fine-Tuning MySQL Full-Text Search

MySQL's full-text search capability has few user-tunable parameters. You can exert more control over full-text searching behavior if you have a MySQL source distribution because some changes require source code modifications. See Section 2.9, “Installing MySQL from Source”.

Full-text search is carefully tuned for effectiveness. Modifying the default behavior in most cases can actually decrease effectiveness. Do not alter the MySQL sources unless you know what you are doing.

Most full-text variables described in this section must be set at server startup time. A server restart is required to change them; they cannot be modified while the server is running.

Some variable changes require that you rebuild the FULLTEXT indexes in your tables. Instructions for doing so are given later in this section.

• Configuring Minimum and Maximum Word Length
• Configuring the Natural Language Search Threshold
• Modifying Boolean Full-Text Search Operators
• Character Set Modifications
• Rebuilding InnoDB Full-Text Indexes
• Optimizing InnoDB Full-Text Indexes
• Rebuilding MyISAM Full-Text Indexes
Configuring Minimum and Maximum Word Length

The minimum and maximum lengths of words to be indexed are defined by the innodb_ft_min_token_size and innodb_ft_max_token_size for InnoDB search indexes, and ft_min_word_len and ft_max_word_len for MyISAM ones.

Note

Minimum and maximum word length full-text parameters do not apply to FULLTEXT indexes created using the ngram parser. ngram token size is defined by the ngram_token_size option.

After changing any of these options, rebuild your FULLTEXT indexes for the change to take effect. For example, to make two-character words searchable, you could put the following lines in an option file:

[mysqld]
innodb_ft_min_token_size=2
ft_min_word_len=2
Then restart the server and rebuild your FULLTEXT indexes. For MyISAM tables, note the remarks regarding myisamchk in the instructions that follow for rebuilding MyISAM full-text indexes.
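The effect of the length limits can be sketched in Python. This is an illustration only, not the server's parser: the helper name indexable_words is hypothetical, words are split on whitespace for simplicity, stopwords are not considered, and the bounds are passed in explicitly (84 is used here only as an example upper bound).

```python
def indexable_words(text, min_len, max_len):
    """Return the words whose length falls within [min_len, max_len].

    Hypothetical helper: splits on whitespace only, which is simpler
    than the word-boundary rules of the built-in full-text parser.
    """
    return [w for w in text.split() if min_len <= len(w) <= max_len]


# With a minimum length of 4, two- and three-character words are dropped.
print(indexable_words("an ox ate the green hay", 4, 84))  # ['green']

# Lowering the minimum to 2 keeps the short words as well.
print(indexable_words("an ox ate the green hay", 2, 84))
```

Because the limits are applied when the index is built, words dropped under the old setting are not in the index at all, which is why a rebuild is required after changing them.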
Configuring the Natural Language Search Threshold For MyISAM search indexes, the 50% threshold for natural language searches is determined by the particular weighting scheme chosen. To disable it, look for the following line in storage/myisam/ftdefs.h: #define GWS_IN_USE GWS_PROB
Change that line to this: #define GWS_IN_USE GWS_FREQ
Then recompile MySQL. There is no need to rebuild the indexes in this case. Note By making this change, you severely decrease MySQL's ability to provide adequate relevance values for the MATCH() function. If you really need to search for such common words, it would be better to search using IN BOOLEAN MODE instead, which does not observe the 50% threshold.
Modifying Boolean Full-Text Search Operators To change the operators used for boolean full-text searches on MyISAM tables, set the ft_boolean_syntax system variable. (InnoDB does not have an equivalent setting.) This variable can be changed while the server is running, but you must have privileges sufficient to set global system variables (see Section 5.1.8.1, “System Variable Privileges”). No rebuilding of indexes is necessary in this case.
Character Set Modifications For the built-in full-text parser, you can change the set of characters that are considered word characters in several ways, as described in the following list. After making the modification, rebuild the indexes for each table that contains any FULLTEXT indexes. Suppose that you want to treat the hyphen character ('-') as a word character. Use one of these methods:
• Modify the MySQL source: In storage/innobase/handler/ha_innodb.cc (for InnoDB), or in storage/myisam/ftdefs.h (for MyISAM), see the true_word_char() and misc_word_char() macros. Add '-' to one of those macros and recompile MySQL. • Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. You can edit the contents of the <map> array in one of the character set XML files to specify that '-' is a “letter.” Then use the given character set for your FULLTEXT indexes. For information about the <map> array format, see Section 10.12.1, “Character Definition Arrays”. • Add a new collation for the character set used by the indexed columns, and alter the columns to use that collation. For general information about adding collations, see Section 10.13, “Adding a Collation to a Character Set”. For an example specific to full-text indexing, see Section 12.9.7, “Adding a Collation for Full-Text Indexing”.
Rebuilding InnoDB Full-Text Indexes For the changes to take effect, FULLTEXT indexes must be rebuilt after modifying any of the following full-text index variables: innodb_ft_min_token_size; innodb_ft_max_token_size; innodb_ft_server_stopword_table; innodb_ft_user_stopword_table; innodb_ft_enable_stopword; ngram_token_size. Modifying innodb_ft_min_token_size, innodb_ft_max_token_size, or ngram_token_size requires restarting the server. To rebuild FULLTEXT indexes for an InnoDB table, use ALTER TABLE with the DROP INDEX and ADD INDEX options to drop and re-create each index.
Optimizing InnoDB Full-Text Indexes Running OPTIMIZE TABLE on a table with a full-text index rebuilds the full-text index, removing deleted Document IDs and consolidating multiple entries for the same word, where possible. To optimize a full-text index, enable innodb_optimize_fulltext_only and run OPTIMIZE TABLE. mysql> set GLOBAL innodb_optimize_fulltext_only=ON; Query OK, 0 rows affected (0.01 sec) mysql> OPTIMIZE TABLE opening_lines; +--------------------+----------+----------+----------+ | Table | Op | Msg_type | Msg_text | +--------------------+----------+----------+----------+ | test.opening_lines | optimize | status | OK | +--------------------+----------+----------+----------+ 1 row in set (0.01 sec)
To avoid lengthy rebuild times for full-text indexes on large tables, you can use the innodb_ft_num_word_optimize option to perform the optimization in stages. The innodb_ft_num_word_optimize option defines the number of words that are optimized each time OPTIMIZE TABLE is run. The default setting is 2000, which means that 2000 words are optimized each time OPTIMIZE TABLE is run. Subsequent OPTIMIZE TABLE operations continue from where the preceding OPTIMIZE TABLE operation ended.
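As a rough arithmetic sketch of staged optimization, the number of OPTIMIZE TABLE runs needed to pass over an index once can be estimated as follows. The function name optimize_runs_needed is hypothetical, and the sketch assumes that each run processes exactly innodb_ft_num_word_optimize words and that the index does not change between runs.

```python
import math


def optimize_runs_needed(word_count, words_per_run=2000):
    """Estimate how many OPTIMIZE TABLE runs cover every word once.

    words_per_run stands in for innodb_ft_num_word_optimize; 2000 is
    the default value stated in the text above.
    """
    if words_per_run <= 0:
        raise ValueError("words_per_run must be positive")
    return math.ceil(word_count / words_per_run)


print(optimize_runs_needed(9500))        # 5
print(optimize_runs_needed(9500, 5000))  # 2
```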
Rebuilding MyISAM Full-Text Indexes If you modify full-text variables that affect indexing (ft_min_word_len, ft_max_word_len, or ft_stopword_file), or if you change the stopword file itself, you must rebuild your FULLTEXT indexes after making the changes and restarting the server. To rebuild the FULLTEXT indexes for a MyISAM table, it is sufficient to do a QUICK repair operation: mysql> REPAIR TABLE tbl_name QUICK;
Alternatively, use ALTER TABLE as just described. In some cases, this may be faster than a repair operation. Each table that contains any FULLTEXT index must be repaired as just shown. Otherwise, queries for the table may yield incorrect results, and modifications to the table will cause the server to see the table as corrupt and in need of repair. If you use myisamchk to perform an operation that modifies MyISAM table indexes (such as repair or analyze), the FULLTEXT indexes are rebuilt using the default full-text parameter values for minimum word length, maximum word length, and stopword file unless you specify otherwise. This can result in queries failing. The problem occurs because these parameters are known only by the server. They are not stored in MyISAM index files. To avoid the problem if you have modified the minimum or maximum word length or stopword file values used by the server, specify the same ft_min_word_len, ft_max_word_len, and ft_stopword_file values for myisamchk that you use for mysqld. For example, if you have set the minimum word length to 3, you can repair a table with myisamchk like this: myisamchk --recover --ft_min_word_len=3 tbl_name.MYI
To ensure that myisamchk and the server use the same values for full-text parameters, place each one in both the [mysqld] and [myisamchk] sections of an option file: [mysqld] ft_min_word_len=3 [myisamchk] ft_min_word_len=3
An alternative to using myisamchk for MyISAM table index modification is to use the REPAIR TABLE, ANALYZE TABLE, OPTIMIZE TABLE, or ALTER TABLE statements. These statements are performed by the server, which knows the proper full-text parameter values to use.
12.9.7 Adding a Collation for Full-Text Indexing This section describes how to add a new collation for full-text searches using the built-in full-text parser. The sample collation is like latin1_swedish_ci but treats the '-' character as a letter rather than as a punctuation character so that it can be indexed as a word character. General information about adding collations is given in Section 10.13, “Adding a Collation to a Character Set”; it is assumed that you have read it and are familiar with the files involved. To add a collation for full-text indexing, use the following procedure. The instructions here add a collation for a simple character set, which as discussed in Section 10.13, “Adding a Collation to a Character Set”, can be created using a configuration file that describes the character set properties. For a complex character set such as Unicode, create collations using C source files that describe the character set properties. 1. Add a collation to the Index.xml file. The collation ID must be unused, so choose a value different from 1000 if that ID is already taken on your system. ...
2. Declare the sort order for the collation in the latin1.xml file. In this case, the order can be copied from latin1_swedish_ci:
<map>
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49
44 4E 4F 4F 4F 4F 5D D7 D8 55 55 55 59 59 DE DF
41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49
44 4E 4F 4F 4F 4F 5D F7 D8 55 55 55 59 59 DE FF
</map>
3. Modify the ctype array in latin1.xml. Change the value corresponding to 0x2D (which is the code for the '-' character) from 10 (punctuation) to 01 (small letter). In the following array, this is the element in the fourth row down, third value from the end.
<map>
00
20 20 20 20 20 20 20 20 20 28 28 28 28 28 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
48 10 10 10 10 10 10 10 10 10 10 10 10 01 10 10
84 84 84 84 84 84 84 84 84 84 10 10 10 10 10 10
10 81 81 81 81 81 81 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 10 10 10 10 10
10 82 82 82 82 82 82 02 02 02 02 02 02 02 02 02
02 02 02 02 02 02 02 02 02 02 02 10 10 10 10 20
10 00 10 02 10 10 10 10 10 10 01 10 01 00 01 00
00 10 10 10 10 10 10 10 10 10 02 10 02 00 02 01
48 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 10 01 01 01 01 01 01 01 02
02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
02 02 02 02 02 02 02 10 02 02 02 02 02 02 02 02
</map>
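The position of a character code within the ctype array can be checked with a small Python sketch. It assumes the layout described in step 3: a single leading 00 value on its own row, followed by 256 values printed 16 per row. The function name ctype_position is hypothetical.

```python
ROW_WIDTH = 16  # values per printed row after the lone leading "00"


def ctype_position(code):
    """Locate a character code in the printed ctype array.

    Returns (row, from_end): the 1-based row, counting the leading
    "00" as row 1, and the 1-based position counted from the end of
    that row.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("code must be a single-byte value")
    row = 2 + code // ROW_WIDTH
    from_end = ROW_WIDTH - code % ROW_WIDTH
    return row, from_end


# 0x2D is the hyphen: fourth row down, third value from the end.
print(ctype_position(0x2D))  # (4, 3)
```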
4. Restart the server. 5. To employ the new collation, include it in the definition of columns that are to use it: mysql> DROP TABLE IF EXISTS t1; Query OK, 0 rows affected (0.13 sec) mysql> CREATE TABLE t1 ( a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, FULLTEXT INDEX(a) ) ENGINE=InnoDB; Query OK, 0 rows affected (0.47 sec)
6. Test the collation to verify that hyphen is considered as a word character:
mysql> INSERT INTO t1 VALUES ('----'),('....'),('abcd');
Query OK, 3 rows affected (0.22 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM t1 WHERE MATCH a AGAINST ('----' IN BOOLEAN MODE);
+------+
| a    |
+------+
| ---- |
+------+
1 row in set (0.00 sec)
12.9.8 ngram Full-Text Parser The built-in MySQL full-text parser uses the white space between words as a delimiter to determine where words begin and end, which is a limitation when working with ideographic languages that do not use word delimiters. To address this limitation, MySQL provides an ngram full-text parser that supports Chinese, Japanese, and Korean (CJK). The ngram full-text parser is supported for use with InnoDB and MyISAM. Note MySQL also provides a MeCab full-text parser plugin for Japanese, which tokenizes documents into meaningful words. For more information, see Section 12.9.9, “MeCab Full-Text Parser Plugin”. An ngram is a contiguous sequence of n characters from a given sequence of text. The ngram parser tokenizes a sequence of text into a contiguous sequence of n characters. For example, you can tokenize “abcd” for different values of n using the ngram full-text parser.
n=1: 'a', 'b', 'c', 'd'
n=2: 'ab', 'bc', 'cd'
n=3: 'abc', 'bcd'
n=4: 'abcd'
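The tokenization of “abcd” shown above can be reproduced with a few lines of Python. This is a minimal sketch of the n-gram idea only; the function name ngrams is hypothetical and the code is not the parser's implementation.

```python
def ngrams(text, n):
    """Return every contiguous run of n characters in text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


for n in range(1, 5):
    print(f"n={n}:", ngrams("abcd", n))
# n=1: ['a', 'b', 'c', 'd']
# n=2: ['ab', 'bc', 'cd']
# n=3: ['abc', 'bcd']
# n=4: ['abcd']
```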
The ngram full-text parser, introduced in MySQL 5.7.6, is a built-in server plugin. As with other built-in server plugins, it is automatically loaded when the server is started. The full-text search syntax described in Section 12.9, “Full-Text Search Functions” applies to the ngram parser plugin. Differences in parsing behavior are described in this section. Fulltext-related configuration options, except for minimum and maximum word length options (innodb_ft_min_token_size, innodb_ft_max_token_size, ft_min_word_len, ft_max_word_len) are also applicable.
Configuring ngram Token Size The ngram parser has a default ngram token size of 2 (bigram). For example, with a token size of 2, the ngram parser parses the string “abc def” into four tokens: “ab”, “bc”, “de” and “ef”. ngram token size is configurable using the ngram_token_size configuration option, which has a minimum value of 1 and maximum value of 10. Typically, ngram_token_size is set to the size of the largest token that you want to search for. If you only intend to search for single characters, set ngram_token_size to 1. A smaller token size produces a smaller full-text search index, and faster searches. If you need to search for words comprised of more than one character, set ngram_token_size accordingly. For example, “Happy Birthday” is “生日快乐” in simplified Chinese, where “生日” is “birthday”, and “快乐” translates as “happy”. To search on two-character words such as these, set ngram_token_size to a value of 2 or higher. As a read-only variable, ngram_token_size may only be set as part of a startup string or in a configuration file: • Startup string: mysqld --ngram_token_size=2
• Configuration file: [mysqld] ngram_token_size=2
Note The following minimum and maximum word length configuration options are ignored for FULLTEXT indexes that use the ngram parser: innodb_ft_min_token_size, innodb_ft_max_token_size, ft_min_word_len, and ft_max_word_len.
Creating a FULLTEXT Index that Uses the ngram Parser To create a FULLTEXT index that uses the ngram parser, specify WITH PARSER ngram with CREATE TABLE, ALTER TABLE, or CREATE INDEX. The following example demonstrates creating a table with an ngram FULLTEXT index, inserting sample data (Simplified Chinese text), and viewing tokenized data in the INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE table. mysql> USE test; mysql> CREATE TABLE articles ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, title VARCHAR(200), body TEXT, FULLTEXT (title,body) WITH PARSER ngram ) ENGINE=InnoDB CHARACTER SET utf8mb4; mysql> SET NAMES utf8mb4; INSERT INTO articles (title,body) VALUES ('数据库管理','在本教程中我将向你展示如何管理数据库'), ('数据库应用开发','学习开发数据库应用程序'); mysql> SET GLOBAL innodb_ft_aux_table="test/articles"; mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position;
To add a FULLTEXT index to an existing table, you can use ALTER TABLE or CREATE INDEX. For example: CREATE TABLE articles ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, title VARCHAR(200), body TEXT ) ENGINE=InnoDB CHARACTER SET utf8; ALTER TABLE articles ADD FULLTEXT INDEX ft_index (title,body) WITH PARSER ngram; # Or: CREATE FULLTEXT INDEX ft_index ON articles (title,body) WITH PARSER ngram;
ngram Parser Space Handling The ngram parser eliminates spaces when parsing. For example: • “ab cd” is parsed to “ab”, “cd” • “a bc” is parsed to “bc”
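The space-handling rule can be modeled by splitting the text on spaces first and generating ngrams within each piece, so that no token spans a space and pieces shorter than the token size yield nothing. The sketch below is an assumption-based illustration with a hypothetical function name, not the parser's code.

```python
def ngram_tokens(text, n=2):
    """Tokenize text into n-character tokens that never contain a space."""
    tokens = []
    for chunk in text.split():  # spaces act only as boundaries
        for i in range(len(chunk) - n + 1):
            tokens.append(chunk[i:i + n])
    return tokens


print(ngram_tokens("ab cd"))    # ['ab', 'cd']
print(ngram_tokens("a bc"))     # ['bc']
print(ngram_tokens("abc def"))  # ['ab', 'bc', 'de', 'ef']
```

In the second case the single character “a” is shorter than the token size of 2, so it produces no token.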
ngram Parser Stopword Handling The built-in MySQL full-text parser compares words to entries in the stopword list. If a word is equal to an entry in the stopword list, the word is excluded from the index. For the ngram parser, stopword handling is performed differently. Instead of excluding tokens that are equal to entries in the stopword list, the ngram parser excludes tokens that contain stopwords. For example, assuming ngram_token_size=2, a document that contains “a,b” is parsed to “a,” and “,b”. If a comma (“,”) is defined as a stopword, both “a,” and “,b” are excluded from the index because they contain a comma.
By default, the ngram parser uses the default stopword list, which contains a list of English stopwords. For a stopword list applicable to Chinese, Japanese, or Korean, you must create your own. For information about creating a stopword list, see Section 12.9.4, “Full-Text Stopwords”. Stopwords greater in length than ngram_token_size are ignored.
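The stopword rules for the ngram parser can be sketched as a filter over the generated tokens. The function name is hypothetical, matching is done with a plain case-sensitive substring test, and space handling is left out; all of these are simplifying assumptions.

```python
def ngram_index_tokens(text, stopwords, n=2):
    """Return the ngram tokens that survive stopword filtering.

    A token is excluded if it contains any stopword. Stopwords longer
    than the token size are ignored.
    """
    active = [s for s in stopwords if 0 < len(s) <= n]
    tokens = [text[i:i + n] for i in range(len(text) - n + 1)]
    return [t for t in tokens if not any(s in t for s in active)]


# Both "a," and ",b" contain the stopword ",", so nothing is indexed.
print(ngram_index_tokens("a,b", [","]))     # []
# "ab" and "bc" contain "b"; only "cd" survives.
print(ngram_index_tokens("abcd", ["b"]))    # ['cd']
# "abc" is longer than the token size, so it has no effect.
print(ngram_index_tokens("abcd", ["abc"]))  # ['ab', 'bc', 'cd']
```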
ngram Parser Term Search For natural language mode search, the search term is converted to a union of ngram terms. For example, the string “abc” (assuming ngram_token_size=2) is converted to “ab bc”. Given two documents, one containing “ab” and the other containing “abc”, the search term “ab bc” matches both documents. For boolean mode search, the search term is converted to an ngram phrase search. For example, the string 'abc' (assuming ngram_token_size=2) is converted to '“ab bc”'. Given two documents, one containing 'ab' and the other containing 'abc', the search phrase '“ab bc”' only matches the document containing 'abc'.
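The difference between the two modes can be illustrated with a small Python model. The function names are hypothetical, and a phrase match is modeled simply as the term's ngrams appearing consecutively in the document's ngram list; the server's actual matching logic is not shown in this section, so treat this as a sketch.

```python
def ngrams(text, n=2):
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def matches_natural(doc, term, n=2):
    """Union semantics: any ngram of the term occurs in the document."""
    doc_tokens = set(ngrams(doc, n))
    return any(t in doc_tokens for t in ngrams(term, n))


def matches_boolean(doc, term, n=2):
    """Phrase semantics: the term's ngrams occur consecutively, in order."""
    doc_tokens, wanted = ngrams(doc, n), ngrams(term, n)
    if not wanted:
        return False
    return any(doc_tokens[i:i + len(wanted)] == wanted
               for i in range(len(doc_tokens) - len(wanted) + 1))


docs = ["ab", "abc"]
print([d for d in docs if matches_natural(d, "abc")])  # ['ab', 'abc']
print([d for d in docs if matches_boolean(d, "abc")])  # ['abc']
```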
ngram Parser Wildcard Search Because an ngram FULLTEXT index contains only ngrams, and does not contain information about the beginning of terms, wildcard searches may return unexpected results. The following behaviors apply to wildcard searches using ngram FULLTEXT search indexes: • If the prefix term of a wildcard search is shorter than ngram token size, the query returns all indexed rows that contain ngram tokens starting with the prefix term. For example, assuming ngram_token_size=2, a search on “a*” returns all rows that contain ngram tokens starting with “a”. • If the prefix term of a wildcard search is longer than ngram token size, the prefix term is converted to an ngram phrase and the wildcard operator is ignored. For example, assuming ngram_token_size=2, an “abc*” wildcard search is converted to “ab bc”.
ngram Parser Phrase Search Phrase searches are converted to ngram phrase searches. For example, the search phrase “abc” is converted to “ab bc”, which returns documents containing “abc” and “ab bc”. The search phrase “abc def” is converted to “ab bc de ef”, which returns documents containing “abc def” and “ab bc de ef”. A document that contains “abcdef” is not returned.
12.9.9 MeCab Full-Text Parser Plugin The built-in MySQL full-text parser uses the white space between words as a delimiter to determine where words begin and end, which is a limitation when working with ideographic languages that do not use word delimiters. To address this limitation for Japanese, MySQL provides a MeCab full-text parser plugin. The MeCab full-text parser plugin is supported for use with InnoDB and MyISAM. Note MySQL also provides an ngram full-text parser plugin that supports Japanese. For more information, see Section 12.9.8, “ngram Full-Text Parser”. The MeCab full-text parser plugin, introduced in MySQL 5.7.6, is a full-text parser plugin for Japanese that tokenizes a sequence of text into meaningful words. For example, MeCab tokenizes “データベース管理” (“Database Management”) into “データベース” (“Database”) and “管理” (“Management”). By comparison, the ngram full-text parser tokenizes text into a contiguous sequence of n characters, where n represents a number between 1 and 10. In addition to tokenizing text into meaningful words, MeCab indexes are typically smaller than ngram indexes, and MeCab full-text searches are generally faster. One drawback is that it may take longer for the MeCab full-text parser to tokenize documents, compared to the ngram full-text parser.
The full-text search syntax described in Section 12.9, “Full-Text Search Functions” applies to the MeCab parser plugin. Differences in parsing behavior are described in this section. Full-text related configuration options are also applicable. For additional information about the MeCab parser, refer to the MeCab: Yet Another Part-of-Speech and Morphological Analyzer project on GitHub.
Installing the MeCab Parser Plugin The MeCab parser plugin requires mecab and mecab-ipadic. On supported Fedora, Debian and Ubuntu platforms (except Ubuntu 12.04 where the system mecab version is too old), MySQL dynamically links to the system mecab installation if it is installed to the default location. On other supported Unix-like platforms, libmecab.so is statically linked in libpluginmecab.so, which is located in the MySQL plugin directory. mecab-ipadic is included in MySQL binaries and is located in MYSQL_HOME/lib/mecab. You can install mecab and mecab-ipadic using a native package management utility (on Fedora, Debian, and Ubuntu), or you can build mecab and mecab-ipadic from source. For information about installing mecab and mecab-ipadic using a native package management utility, see Installing MeCab From a Binary Distribution (Optional). If you want to build mecab and mecab-ipadic from source, see Building MeCab From Source (Optional). On Windows, libmecab.dll is found in the MySQL bin directory. mecab-ipadic is located in MYSQL_HOME/lib/mecab. To install and configure the MeCab parser plugin, perform the following steps: 1. In the MySQL configuration file, set the mecab_rc_file configuration option to the location of the mecabrc configuration file, which is the configuration file for MeCab. If you are using the MeCab package distributed with MySQL, the mecabrc file is located in MYSQL_HOME/lib/mecab/etc/. [mysqld] loose-mecab-rc-file=MYSQL_HOME/lib/mecab/etc/mecabrc
The loose prefix is an option modifier. The mecab_rc_file option is not recognized by MySQL until the MeCab parser plugin is installed, but it must be set before attempting to install the MeCab parser plugin. The loose prefix allows you to restart MySQL without encountering an error due to an unrecognized variable. If you use your own MeCab installation, or build MeCab from source, the location of the mecabrc configuration file may differ. For information about the MySQL configuration file and its location, see Section 4.2.6, “Using Option Files”. 2. Also in the MySQL configuration file, set the minimum token size to 1 or 2, which are the values recommended for use with the MeCab parser. For InnoDB tables, minimum token size is defined by the innodb_ft_min_token_size configuration option, which has a default value of 3. For MyISAM tables, minimum token size is defined by ft_min_word_len, which has a default value of 4. [mysqld] innodb_ft_min_token_size=1
3. Modify the mecabrc configuration file to specify the dictionary you want to use. The mecab-ipadic package distributed with MySQL binaries includes three dictionaries (ipadic_euc-jp, ipadic_sjis, and ipadic_utf-8). The mecabrc configuration file packaged with MySQL contains an entry similar to the following: dicdir = /path/to/mysql/lib/mecab/lib/mecab/dic/ipadic_euc-jp
To use the ipadic_utf-8 dictionary, for example, modify the entry as follows:
dicdir=MYSQL_HOME/lib/mecab/dic/ipadic_utf-8
If you are using your own MeCab installation or have built MeCab from source, the default dicdir entry in the mecabrc file will differ, as will the dictionaries and their location. Note After the MeCab parser plugin is installed, you can use the mecab_charset status variable to view the character set used with MeCab. The three MeCab dictionaries provided with the MySQL binary support the following character sets. • The ipadic_euc-jp dictionary supports the ujis and eucjpms character sets. • The ipadic_sjis dictionary supports the sjis and cp932 character sets. • The ipadic_utf-8 dictionary supports the utf8 and utf8mb4 character sets. mecab_charset only reports the first supported character set. For example, the ipadic_utf-8 dictionary supports both utf8 and utf8mb4. mecab_charset always reports utf8 when this dictionary is in use. 4. Restart MySQL. 5. Install the MeCab parser plugin: The MeCab parser plugin is installed using INSTALL PLUGIN syntax. The plugin name is mecab, and the shared library name is libpluginmecab.so. For additional information about installing plugins, see Section 5.5.1, “Installing and Uninstalling Plugins”. INSTALL PLUGIN mecab SONAME 'libpluginmecab.so';
Once installed, the MeCab parser plugin loads at every normal MySQL restart. 6. Verify that the MeCab parser plugin is loaded using the SHOW PLUGINS statement. mysql> SHOW PLUGINS;
A mecab plugin should appear in the list of plugins.
Creating a FULLTEXT Index that uses the MeCab Parser To create a FULLTEXT index that uses the mecab parser, specify WITH PARSER mecab with CREATE TABLE, ALTER TABLE, or CREATE INDEX. This example demonstrates creating a table with a mecab FULLTEXT index, inserting sample data, and viewing tokenized data in the INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE table: mysql> USE test; mysql> CREATE TABLE articles ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, title VARCHAR(200), body TEXT, FULLTEXT (title,body) WITH PARSER mecab ) ENGINE=InnoDB CHARACTER SET utf8; mysql> SET NAMES utf8; mysql> INSERT INTO articles (title,body) VALUES ('データベース管理','このチュートリアルでは、私はどのようにデータベースを管理する方法を紹介します'),
('データベースアプリケーション開発','データベースアプリケーションを開発することを学ぶ'); mysql> SET GLOBAL innodb_ft_aux_table="test/articles"; mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position;
To add a FULLTEXT index to an existing table, you can use ALTER TABLE or CREATE INDEX. For example: CREATE TABLE articles ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, title VARCHAR(200), body TEXT ) ENGINE=InnoDB CHARACTER SET utf8; ALTER TABLE articles ADD FULLTEXT INDEX ft_index (title,body) WITH PARSER mecab; # Or: CREATE FULLTEXT INDEX ft_index ON articles (title,body) WITH PARSER mecab;
MeCab Parser Space Handling The MeCab parser uses spaces as separators in query strings. For example, the MeCab parser tokenizes データベース管理 as データベース and 管理.
MeCab Parser Stopword Handling By default, the MeCab parser uses the default stopword list, which contains a short list of English stopwords. For a stopword list applicable to Japanese, you must create your own. For information about creating stopword lists, see Section 12.9.4, “Full-Text Stopwords”.
MeCab Parser Term Search For natural language mode search, the search term is converted to a union of tokens. For example, データベース管理 is converted to データベース 管理. SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST('データベース管理' IN NATURAL LANGUAGE MODE);
For boolean mode search, the search term is converted to a search phrase. For example, データベース管理 is converted to データベース 管理. SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST('データベース管理' IN BOOLEAN MODE);
MeCab Parser Wildcard Search Wildcard search terms are not tokenized. A search on データベース管理* is performed on the prefix, データベース管理. SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST('データベース*' IN BOOLEAN MODE);
MeCab Parser Phrase Search Phrases are tokenized. For example, データベース管理 is tokenized as データベース 管理. SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST('"データベース管理"' IN BOOLEAN MODE);
Installing MeCab From a Binary Distribution (Optional) This section describes how to install mecab and mecab-ipadic from a binary distribution using a native package management utility. For example, on Fedora, you can use Yum to perform the installation: yum install mecab-devel
On Debian or Ubuntu, you can perform an APT installation:
apt-get install mecab apt-get install mecab-ipadic
Installing MeCab From Source (Optional) If you want to build mecab and mecab-ipadic from source, basic installation steps are provided below. For additional information, refer to the MeCab documentation. 1. Download the tar.gz packages for mecab and mecab-ipadic from http://taku910.github.io/mecab/#download. As of February, 2016, the latest available packages are mecab-0.996.tar.gz and mecab-ipadic-2.7.0-20070801.tar.gz. 2. Install mecab: tar zxfv mecab-0.996.tar.gz cd mecab-0.996 ./configure make make check su make install
3. Install mecab-ipadic: tar zxfv mecab-ipadic-2.7.0-20070801.tar.gz cd mecab-ipadic-2.7.0-20070801 ./configure make su make install
4. Compile MySQL using the WITH_MECAB CMake option. Set the WITH_MECAB option to system if you have installed mecab and mecab-ipadic to the default location. -DWITH_MECAB=system
If you defined a custom installation directory, set WITH_MECAB to the custom directory. For example: -DWITH_MECAB=/path/to/mecab
12.10 Cast Functions and Operators
Table 12.14 Cast Functions and Operators
Name         Description
BINARY       Cast a string to a binary string
CAST()       Cast a value as a certain type
CONVERT()    Cast a value as a certain type
Cast functions and operators enable conversion of values from one data type to another. CONVERT() with a USING clause provides a way to convert data between different character sets: CONVERT(expr USING transcoding_name)
In MySQL, transcoding names are the same as the corresponding character set names. Examples: SELECT CONVERT(_latin1'Müller' USING utf8); INSERT INTO utf8_table (utf8_column) SELECT CONVERT(latin1_column USING utf8) FROM latin1_table;
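What transcoding does to the underlying bytes can be illustrated outside the server. The Python sketch below is an analogy only: the helper transcode is hypothetical, and the Python codec names latin-1 and utf-8 stand in for the MySQL character sets latin1 and utf8 (MySQL's latin1 is close to, but not identical with, ISO 8859-1).

```python
def transcode(data, source, target):
    """Re-encode bytes from one character encoding to another."""
    return data.decode(source).encode(target)


# 'Müller' as latin-1 bytes: the u-umlaut is the single byte 0xFC.
latin1_bytes = "M\u00fcller".encode("latin-1")
utf8_bytes = transcode(latin1_bytes, "latin-1", "utf-8")

print(latin1_bytes)  # b'M\xfcller'
print(utf8_bytes)    # b'M\xc3\xbcller'
```

The characters are unchanged by the conversion; only their byte representation differs, which is why the converted value compares equal to the original text under a compatible collation.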
You can also use CONVERT() without USING or CAST() to convert strings between different character sets: CONVERT(string, CHAR[(N)] CHARACTER SET charset_name) CAST(string AS CHAR[(N)] CHARACTER SET charset_name)
Examples: SELECT CONVERT('test', CHAR CHARACTER SET utf8); SELECT CAST('test' AS CHAR CHARACTER SET utf8);
If you specify CHARACTER SET charset_name as just shown, the resulting character set and collation are charset_name and the default collation of charset_name. If you omit CHARACTER SET charset_name, the resulting character set and collation are defined by the character_set_connection and collation_connection system variables that determine the default connection character set and collation (see Section 10.4, “Connection Character Sets and Collations”). A COLLATE clause is not permitted within a CONVERT() or CAST() call, but you can apply it to the function result. For example, this is legal: SELECT CAST('test' AS CHAR CHARACTER SET utf8) COLLATE utf8_bin;
But this is illegal: SELECT CAST('test' AS CHAR CHARACTER SET utf8 COLLATE utf8_bin);
Normally, you cannot compare a BLOB value or other binary string in case-insensitive fashion because binary strings use the binary character set, which has no collation with the concept of lettercase. To perform a case-insensitive comparison, use the CONVERT() or CAST() function to convert the value to a nonbinary string. Comparisons of the resulting string use its collation. For example, if the conversion result character set has a case-insensitive collation, a LIKE operation is not case-sensitive: SELECT 'A' LIKE CONVERT(blob_col USING latin1) FROM tbl_name;
To use a different character set, substitute its name for latin1 in the preceding statement. To specify a particular collation for the converted string, use a COLLATE clause following the CONVERT() call: SELECT 'A' LIKE CONVERT(blob_col USING latin1) COLLATE latin1_german1_ci FROM tbl_name;
CONVERT() and CAST() can be used more generally for comparing strings that are represented in different character sets. For example, a comparison of these strings results in an error because they have different character sets: mysql> SET @s1 = _latin1 'abc', @s2 = _latin2 'abc'; mysql> SELECT @s1 = @s2; ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (latin2_general_ci,IMPLICIT) for operation '='
Converting one of the strings to a character set compatible with the other enables the comparison to occur without error: mysql> SELECT @s1 = CONVERT(@s2 USING latin1); +---------------------------------+ | @s1 = CONVERT(@s2 USING latin1) | +---------------------------------+ | 1 |
+---------------------------------+
For string literals, another way to specify the character set is to use a character set introducer (_latin1 and _latin2 in the preceding example are instances of introducers). Unlike conversion functions such as CAST() or CONVERT(), which convert a string from one character set to another, an introducer designates a string literal as having a particular character set, with no conversion involved. For more information, see Section 10.3.8, “Character Set Introducers”. Character set conversion is also useful preceding lettercase conversion of binary strings. LOWER() and UPPER() are ineffective when applied directly to binary strings because the concept of lettercase does not apply. To perform lettercase conversion of a binary string, first convert it to a nonbinary string: mysql> SET @str = BINARY 'New York'; mysql> SELECT LOWER(@str), LOWER(CONVERT(@str USING latin1)); +-------------+-----------------------------------+ | LOWER(@str) | LOWER(CONVERT(@str USING latin1)) | +-------------+-----------------------------------+ | New York | new york | +-------------+-----------------------------------+
If you convert an indexed column using BINARY, CAST(), or CONVERT(), MySQL may not be able to use the index efficiently.

The cast functions are useful for creating a column with a specific type in a CREATE TABLE ... SELECT statement:

mysql> CREATE TABLE new_table SELECT CAST('2000-01-01' AS DATE) AS c1;
mysql> SHOW CREATE TABLE new_table\G
*************************** 1. row ***************************
       Table: new_table
Create Table: CREATE TABLE `new_table` (
  `c1` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The cast functions are useful for sorting ENUM columns in lexical order. Normally, sorting of ENUM columns occurs using the internal numeric values. Casting the values to CHAR results in a lexical sort:

SELECT enum_col FROM tbl_name ORDER BY CAST(enum_col AS CHAR);
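To make the difference concrete, here is a small sketch with a hypothetical table enum_demo whose ENUM members are deliberately declared out of alphabetical order.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE enum_demo (size ENUM('small', 'medium', 'large'));
INSERT INTO enum_demo VALUES ('large'), ('small'), ('medium');

-- Sorted by internal numeric value (declaration order):
-- small, medium, large
SELECT size FROM enum_demo ORDER BY size;

-- Sorted lexically after casting to CHAR:
-- large, medium, small
SELECT size FROM enum_demo ORDER BY CAST(size AS CHAR);
```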
CAST() also changes the result if you use it as part of a more complex expression such as CONCAT('Date: ',CAST(NOW() AS DATE)).

For temporal values, there is little need to use CAST() to extract data in different formats. Instead, use a function such as EXTRACT(), DATE_FORMAT(), or TIME_FORMAT(). See Section 12.7, “Date and Time Functions”.

To cast a string to a number, you normally need do nothing other than use the string value in numeric context:

mysql> SELECT 1+'1';
        -> 2

That is also true for hexadecimal and bit literals, which are binary strings by default:

mysql> SELECT X'41', X'41'+0;
        -> 'A', 65
mysql> SELECT b'1100001', b'1100001'+0;
        -> 'a', 97
A string used in an arithmetic operation is converted to a floating-point number during expression evaluation.
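A few further cases of numeric context, shown as a hedged sketch. The literal values are arbitrary and chosen only for illustration.

```sql
-- The string is converted to a floating-point number before the addition.
SELECT '1.5' + 1;        -- 2.5

-- Only the leading numeric part of the string is used; the server
-- reports a "Truncated incorrect DOUBLE value" warning.
SELECT '3 apples' + 1;   -- 4
SHOW WARNINGS;

-- CAST(... AS UNSIGNED) is another way to put a hexadecimal literal
-- into numeric context.
SELECT CAST(X'41' AS UNSIGNED);   -- 65
```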
A number used in string context is converted to a string:

mysql> SELECT CONCAT('hello you ',2);
        -> 'hello you 2'

For information about implicit conversion of numbers to strings, see Section 12.2, “Type Conversion in Expression Evaluation”.

MySQL supports arithmetic with both signed and unsigned 64-bit values. For numeric operators (such as + or -) where one of the operands is an unsigned integer, the result is unsigned by default (see Section 12.6.1, “Arithmetic Operators”). To override this, use the SIGNED or UNSIGNED cast operator to cast a value to a signed or unsigned 64-bit integer, respectively.

mysql> SELECT 1 - 2;
        -> -1
mysql> SELECT CAST(1 - 2 AS UNSIGNED);
        -> 18446744073709551615
mysql> SELECT CAST(CAST(1 - 2 AS UNSIGNED) AS SIGNED);
        -> -1

If either operand is a floating-point value, the result is a floating-point value and is not affected by the preceding rule. (In this context, DECIMAL column values are regarded as floating-point values.)

mysql> SELECT CAST(1 AS UNSIGNED) - 2.0;
        -> -1.0
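The unsigned-result rule matters most when the mathematical result is negative. The sketch below assumes the NO_UNSIGNED_SUBTRACTION SQL mode is not enabled.

```sql
-- One operand is unsigned, so the result type is unsigned. The value -1
-- cannot be represented, and the statement fails with
-- ERROR 1690 (BIGINT UNSIGNED value is out of range).
SELECT CAST(1 AS UNSIGNED) - 2;

-- Casting the unsigned operand to SIGNED first keeps the arithmetic
-- signed, so the subtraction returns -1.
SELECT CAST(CAST(1 AS UNSIGNED) AS SIGNED) - 2;
```

Casting the operand, rather than the whole expression, avoids the out-of-range error because the subtraction itself is then performed on signed values.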
The SQL mode affects the result of conversion operations (see Section 5.1.10, “Server SQL Modes”). Examples:

• For conversion of a “zero” date string to a date, CONVERT() and CAST() return NULL and produce a warning when the NO_ZERO_DATE SQL mode is enabled.

• For integer subtraction, if the NO_UNSIGNED_SUBTRACTION SQL mode is enabled, the subtraction result is signed even if any operand is unsigned.

The following list describes the available cast functions and operators:

• BINARY expr

The BINARY operator converts the expression to a binary string. A common use for BINARY is to force a character string comparison to be done byte by byte rather than character by character, in effect becoming case-sensitive. The BINARY operator also causes trailing spaces in comparisons to be significant.

mysql> SELECT 'a' = 'A';
        -> 1
mysql> SELECT BINARY 'a' = 'A';
        -> 0
mysql> SELECT 'a' = 'a ';
        -> 1
mysql> SELECT BINARY 'a' = 'a ';
        -> 0
In a comparison, BINARY affects the entire operation; it can be given before either operand with the same result.

For purposes of converting a string expression to a binary string, these constructs are equivalent:

BINARY expr
CAST(expr AS BINARY)
CONVERT(expr USING BINARY)
If a value is a string literal, it can be designated as a binary string without performing any conversion by using the _binary character set introducer:

mysql> SELECT 'a' = 'A';
        -> 1
mysql> SELECT _binary 'a' = 'A';
        -> 0
For information about introducers, see Section 10.3.8, “Character Set Introducers”.

The BINARY operator in expressions differs in effect from the BINARY attribute in character column definitions. A character column defined with the BINARY attribute is assigned the table default character set and the binary (_bin) collation of that character set. Every nonbinary character set has a _bin collation. For example, the binary collation for the utf8 character set is utf8_bin, so if the table default character set is utf8, these two column definitions are equivalent:

CHAR(10) BINARY
CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin
The use of CHARACTER SET binary in the definition of a CHAR, VARCHAR, or TEXT column causes the column to be treated as the corresponding binary string data type. For example, the following pairs of definitions are equivalent:

CHAR(10) CHARACTER SET binary
BINARY(10)

VARCHAR(10) CHARACTER SET binary
VARBINARY(10)

TEXT CHARACTER SET binary
BLOB
• CAST(expr AS type)

The CAST() function takes an expression of any type and produces a result value of the specified type, similar to CONVERT(). For more information, see the description of CONVERT().

CAST() is standard SQL syntax.

• CONVERT(expr,type), CONVERT(expr USING transcoding_name)

The CONVERT() function takes an expression of any type and produces a result value of the specified type.

Discussion of CONVERT(expr, type) syntax here also applies to CAST(expr AS type), which is equivalent.

CONVERT(... USING ...) is standard SQL syntax. The non-USING form of CONVERT() is ODBC syntax.

CONVERT() with USING converts data between different character sets. In MySQL, transcoding names are the same as the corresponding character set names. For example, this statement converts the string 'abc' in the default character set to the corresponding string in the utf8 character set:

SELECT CONVERT('abc' USING utf8);
CONVERT() without USING and CAST() take an expression and a type value specifying the result type. These type values are permitted:
• BINARY[(N)]

Produces a string with the BINARY data type. See Section 11.4.2, “The BINARY and VARBINARY Types” for a description of how this affects comparisons. If the optional length N is given, BINARY(N) causes the cast to use no more than N bytes of the argument. Values shorter than N bytes are padded with 0x00 bytes to a length of N.

• CHAR[(N)] [charset_info]

Produces a string with the CHAR data type. If the optional length N is given, CHAR(N) causes the cast to use no more than N characters of the argument. No padding occurs for values shorter than N characters.

With no charset_info clause, CHAR produces a string with the default character set. To specify the character set explicitly, these charset_info values are permitted:

  • CHARACTER SET charset_name: Produces a string with the given character set.

  • ASCII: Shorthand for CHARACTER SET latin1.

  • UNICODE: Shorthand for CHARACTER SET ucs2.

In all cases, the string has the default collation for the character set.

• DATE

Produces a DATE value.

• DATETIME

Produces a DATETIME value.

• DECIMAL[(M[,D])]

Produces a DECIMAL value. If the optional M and D values are given, they specify the maximum number of digits (the precision) and the number of digits following the decimal point (the scale).

• JSON (added in MySQL 5.7.8)

Produces a JSON value. For details on the rules for conversion of values between JSON and other types, see Comparison and Ordering of JSON Values.

• NCHAR[(N)]

Like CHAR, but produces a string with the national character set. See Section 10.3.7, “The National Character Set”.

Unlike CHAR, NCHAR does not permit trailing character set information to be specified.

• SIGNED [INTEGER]

Produces a signed integer value.

• TIME

Produces a TIME value.

• UNSIGNED [INTEGER]

Produces an unsigned integer value.
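The following statements sketch how several of these type values behave. The literal values are arbitrary, and HEX() and CHARSET() are used only to make the results visible.

```sql
-- BINARY(N): values shorter than N bytes are padded with 0x00 bytes.
SELECT HEX(CAST('a' AS BINARY(3)));            -- 610000

-- CHAR(N): no more than N characters of the argument are used
-- (the server reports a truncation warning).
SELECT CAST('abcdef' AS CHAR(3));              -- abc

-- CHAR with a charset_info clause.
SELECT CHARSET(CAST('abc' AS CHAR CHARACTER SET utf8));   -- utf8

-- DECIMAL(M,D): precision 5, scale 2.
SELECT CAST(12.347 AS DECIMAL(5,2));           -- 12.35

-- DATE: the time part of the value is dropped.
SELECT CAST('2000-01-01 12:34:56' AS DATE);    -- 2000-01-01

-- UNSIGNED: a negative value wraps to the unsigned 64-bit range.
SELECT CAST(-1 AS UNSIGNED);                   -- 18446744073709551615
```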
XML Functions
12.11 XML Functions

Table 12.15 XML Functions

Name              Description
ExtractValue()    Extract a value from an XML string using XPath notation
UpdateXML()       Return replaced XML fragment
This section discusses XML and related functionality in MySQL.

Note
It is possible to obtain XML-formatted output from MySQL in the mysql and mysqldump clients by invoking them with the --xml option. See Section 4.5.1, “mysql — The MySQL Command-Line Client”, and Section 4.5.4, “mysqldump — A Database Backup Program”.

Two functions providing basic XPath 1.0 (XML Path Language, version 1.0) capabilities are available. Some basic information about XPath syntax and usage is provided later in this section; however, an in-depth discussion of these topics is beyond the scope of this manual, and you should refer to the XML Path Language (XPath) 1.0 standard for definitive information. A useful resource for those new to XPath or who desire a refresher in the basics is the Zvon.org XPath Tutorial, which is available in several languages.

Note
These functions remain under development. We continue to improve these and other aspects of XML and XPath functionality in MySQL 5.7 and onwards. You may discuss these, ask questions about them, and obtain help from other users with them in the MySQL XML User Forum.

XPath expressions used with these functions support user variables and local stored program variables. User variables are weakly checked; variables local to stored programs are strongly checked (see also Bug #26518):

• User variables (weak checking). Variables using the syntax $@variable_name (that is, user variables) are not checked. No warnings or errors are issued by the server if a variable has the wrong type or has previously not been assigned a value. This also means the user is fully responsible for any typographical errors, since no warnings will be given if (for example) $@myvairable is used where $@myvariable was intended.
Example:

mysql> SET @xml = '<a><b>X</b><b>Y</b></a>';
Query OK, 0 rows affected (0.00 sec)

mysql> SET @i =1, @j = 2;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @i, ExtractValue(@xml, '//b[$@i]');
+------+--------------------------------+
| @i   | ExtractValue(@xml, '//b[$@i]') |
+------+--------------------------------+
|    1 | X                              |
+------+--------------------------------+
1 row in set (0.00 sec)

mysql> SELECT @j, ExtractValue(@xml, '//b[$@j]');
+------+--------------------------------+
| @j   | ExtractValue(@xml, '//b[$@j]') |
+------+--------------------------------+
|    2 | Y                              |
+------+--------------------------------+
1 row in set (0.00 sec)

mysql> SELECT @k, ExtractValue(@xml, '//b[$@k]');
+------+--------------------------------+
| @k   | ExtractValue(@xml, '//b[$@k]') |
+------+--------------------------------+
| NULL |                                |
+------+--------------------------------+
1 row in set (0.00 sec)
• Variables in stored programs (strong checking). Variables using the syntax $variable_name can be declared and used with these functions when they are called inside stored programs. Such variables are local to the stored program in which they are defined, and are strongly checked for type and value.

Example:

mysql> DELIMITER |
mysql> CREATE PROCEDURE myproc ()
    -> BEGIN
    ->   DECLARE i INT DEFAULT 1;
    ->   DECLARE xml VARCHAR(25) DEFAULT '<a>X</a><a>Y</a><a>Z</a>';
    ->
    ->   WHILE i < 4 DO
    ->     SELECT xml, i, ExtractValue(xml, '//a[$i]');
    ->     SET i = i+1;
    ->   END WHILE;
    -> END |
Query OK, 0 rows affected (0.01 sec)

mysql> DELIMITER ;
mysql> CALL myproc();
+--------------------------+---+------------------------------+
| xml                      | i | ExtractValue(xml, '//a[$i]') |
+--------------------------+---+------------------------------+
| <a>X</a><a>Y</a><a>Z</a> | 1 | X                            |
+--------------------------+---+------------------------------+
1 row in set (0.00 sec)

+--------------------------+---+------------------------------+
| xml                      | i | ExtractValue(xml, '//a[$i]') |
+--------------------------+---+------------------------------+
| <a>X</a><a>Y</a><a>Z</a> | 2 | Y                            |
+--------------------------+---+------------------------------+
1 row in set (0.01 sec)

+--------------------------+---+------------------------------+
| xml                      | i | ExtractValue(xml, '//a[$i]') |
+--------------------------+---+------------------------------+
| <a>X</a><a>Y</a><a>Z</a> | 3 | Z                            |
+--------------------------+---+------------------------------+
1 row in set (0.01 sec)
Parameters. Variables used in XPath expressions inside stored routines that are passed in as parameters are also subject to strong checking.

Expressions containing user variables or variables local to stored programs must otherwise (except for notation) conform to the rules for XPath expressions containing variables as given in the XPath 1.0 specification.

Note
A user variable used to store an XPath expression is treated as an empty string. Because of this, it is not possible to store an XPath expression as a user variable. (Bug #32911)
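The practical consequence of weak checking for user variables can be sketched as follows. The variable names @idx and @idxx are hypothetical; the second is a deliberate misspelling of the first.

```sql
SET @xml = '<a><b>X</b><b>Y</b></a>';
SET @idx = 2;

-- Correct variable name: selects the second <b> element.
SELECT ExtractValue(@xml, '//b[$@idx]');    -- Y

-- Misspelled variable name: @idxx was never assigned, yet the server
-- issues no warning or error. The result is simply an empty string.
SELECT ExtractValue(@xml, '//b[$@idxx]');
```

Because the second statement fails silently, it is worth double-checking variable names whenever an XPath expression that uses user variables unexpectedly returns nothing.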
• ExtractValue(xml_frag, xpath_expr)

ExtractValue() takes two string arguments, a fragment of XML markup xml_frag and an XPath expression xpath_expr (also known as a locator); it returns the text (CDATA) of the first text node which is a child of the element or elements matched by the XPath expression.

Using this function is the equivalent of performing a match using the xpath_expr after appending /text(). In other words, ExtractValue('<a><b>Sakila</b></a>', '/a/b') and ExtractValue('<a><b>Sakila</b></a>', '/a/b/text()') produce the same result.

If multiple matches are found, the content of the first child text node of each matching element is returned (in the order matched) as a single, space-delimited string.

If no matching text node is found for the expression (including the implicit /text()), an empty string is returned. This holds for whatever reason the match fails, as long as xpath_expr is valid and xml_frag consists of elements which are properly nested and closed. No distinction is made between a match on an empty element and no match at all. This is by design.

If you need to determine whether no matching element was found in xml_frag or such an element was found but contained no child text nodes, you should test the result of an expression that uses the XPath count() function. For example, both of these statements return an empty string, as shown here:

mysql> SELECT ExtractValue('<a><b/></a>', '/a/b');
+-------------------------------------+
| ExtractValue('<a><b/></a>', '/a/b') |
+-------------------------------------+
|                                     |
+-------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT ExtractValue('<a><c/></a>', '/a/b');
+-------------------------------------+
| ExtractValue('<a><c/></a>', '/a/b') |
+-------------------------------------+
|                                     |
+-------------------------------------+
1 row in set (0.00 sec)
However, you can determine whether there was actually a matching element using the following:

mysql> SELECT ExtractValue('<a><b/></a>', 'count(/a/b)');
+--------------------------------------------+
| ExtractValue('<a><b/></a>', 'count(/a/b)') |
+--------------------------------------------+
| 1                                          |
+--------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT ExtractValue('<a><c/></a>', 'count(/a/b)');
+--------------------------------------------+
| ExtractValue('<a><c/></a>', 'count(/a/b)') |
+--------------------------------------------+
| 0                                          |
+--------------------------------------------+
1 row in set (0.01 sec)
Important
ExtractValue() returns only CDATA, and does not return any tags that might be contained within a matching tag, nor any of their content (see the result returned as val1 in the following example).

mysql> SELECT
    ->   ExtractValue('<a>ccc<b>ddd</b></a>', '/a') AS val1,
    ->   ExtractValue('<a>ccc<b>ddd</b></a>', '/a/b') AS val2,
    ->   ExtractValue('<a>ccc<b>ddd</b></a>', '//b') AS val3,
    ->   ExtractValue('<a>ccc<b>ddd</b></a>', '/b') AS val4,
    ->   ExtractValue('<a>ccc<b>ddd</b><b>eee</b></a>', '//b') AS val5;
+------+------+------+------+---------+
| val1 | val2 | val3 | val4 | val5    |
+------+------+------+------+---------+
| ccc  | ddd  | ddd  |      | ddd eee |
+------+------+------+------+---------+
This function uses the current SQL collation for making comparisons with contains(), performing the same collation aggregation as other string functions (such as CONCAT()), in taking into account the collation coercibility of their arguments; see Section 10.8.4, “Collation Coercibility in Expressions”, for an explanation of the rules governing this behavior. (Previously, binary—that is, case-sensitive—comparison was always used.) NULL is returned if xml_frag contains elements which are not properly nested or closed, and a warning is generated, as shown in this example: mysql> SELECT ExtractValue('cc