rule does not specify any ordering in and of itself. Instead, it “resets” the ordering for subsequent shift rules to cause them to be taken in relation to a given character. Either of the following rules resets subsequent shift rules to be taken in relation to the letter 'A': A \u0041
• The , <s>, and shift rules define primary, secondary, and tertiary differences of a character from another character: • Use primary differences to distinguish separate letters. • Use secondary differences to distinguish accent variations. • Use tertiary differences to distinguish lettercase variations. Either of these rules specifies a primary shift rule for the 'G' character: G
\u0047
• The shift rule indicates that one character sorts identically to another. The following rules cause 'b' to sort the same as 'a': a b
The shift rules is supported as of MySQL 5.5.3. Prior to 5.5.3, use <s> ... instead.
10.5 Character Set Configuration You can change the default server character set and collation with the --character-set-server and --collation-server options when you start the server. The collation must be a legal collation for the default character set. (Use the SHOW COLLATION statement to determine which collations are available for each character set.) See Section 5.1.4, “Server Command Options”. If you try to use a character set that is not compiled into your binary, you might run into the following problems: • Your program uses an incorrect path to determine where the character sets are stored (which is typically the share/mysql/charsets or share/charsets directory under the MySQL installation directory). This can be fixed by using the --character-sets-dir option when you run the program in question. For example, to specify a directory to be used by MySQL client programs, list it in the [client] group of your option file. The examples given here show what the setting might look like for Unix or Windows, respectively:
1065
MySQL Server Time Zone Support
[client] character-sets-dir=/usr/local/mysql/share/mysql/charsets [client] character-sets-dir="C:/Program Files/MySQL/MySQL Server 5.5/share/charsets"
• The character set is a complex character set that cannot be loaded dynamically. In this case, you must recompile the program with support for the character set. For Unicode character sets, you can define collations without recompiling by using LDML notation. See Section 10.4.4, “Adding a UCA Collation to a Unicode Character Set”. • The character set is a dynamic character set, but you do not have a configuration file for it. In this case, you should install the configuration file for the character set from a new MySQL distribution. • If your character set index file does not contain the name for the character set, your program displays an error message. The file is named Index.xml and the message is: Character set 'charset_name' is not a compiled character set and is not specified in the '/usr/share/mysql/charsets/Index.xml' file
To solve this problem, you should either get a new index file or manually add the name of any missing character sets to the current file. You can force client programs to use specific character set as follows: [client] default-character-set=charset_name
This is normally unnecessary. However, when character_set_system differs from character_set_server or character_set_client, and you input characters manually (as database object identifiers, column values, or both), these may be displayed incorrectly in output from the client or the output itself may be formatted incorrectly. In such cases, starting the mysql client with --default-character-set=system_character_set—that is, setting the client character set to match the system character set—should fix the problem. For MyISAM tables, you can check the character set name and number for a table with myisamchk dvv tbl_name.
10.6 MySQL Server Time Zone Support MySQL Server maintains several time zone settings: • The system time zone. When the server starts, it attempts to determine the time zone of the host machine and uses it to set the system_time_zone system variable. The value does not change thereafter. You can set the system time zone for MySQL Server at startup with the -timezone=timezone_name option to mysqld_safe. You can also set it by setting the TZ environment variable before you start mysqld. The permissible values for --timezone or TZ are system dependent. Consult your operating system documentation to see what values are acceptable. • The server's current time zone. The global time_zone system variable indicates the time zone the server currently is operating in. The initial value for time_zone is 'SYSTEM', which indicates that the server time zone is the same as the system time zone. The initial global server time zone value can be specified explicitly at startup with the --defaulttime-zone=timezone option on the command line, or you can use the following line in an option file: 1066
Populating the Time Zone Tables
default-time-zone='timezone'
If you have the SUPER privilege, you can set the global server time zone value at runtime with this statement: mysql> SET GLOBAL time_zone = timezone;
• Per-connection time zones. Each client that connects has its own time zone setting, given by the session time_zone variable. Initially, the session variable takes its value from the global time_zone variable, but the client can change its own time zone with this statement: mysql> SET time_zone = timezone;
The current session time zone setting affects display and storage of time values that are zonesensitive. This includes the values displayed by functions such as NOW() or CURTIME(), and values stored in and retrieved from TIMESTAMP columns. Values for TIMESTAMP columns are converted from the current time zone to UTC for storage, and from UTC to the current time zone for retrieval. The current time zone setting does not affect values displayed by functions such as UTC_TIMESTAMP() or values in DATE, TIME, or DATETIME columns. Nor are values in those data types stored in UTC; the time zone applies for them only when converting from TIMESTAMP values. If you want locale-specific arithmetic for DATE, TIME, or DATETIME values, convert them to UTC, perform the arithmetic, and then convert back. The current values of the global and client-specific time zones can be retrieved like this: mysql> SELECT @@global.time_zone, @@session.time_zone;
timezone values can be given in several formats, none of which are case sensitive: • The value 'SYSTEM' indicates that the time zone should be the same as the system time zone. • The value can be given as a string indicating an offset from UTC, such as '+10:00' or '-6:00'. • The value can be given as a named time zone, such as 'Europe/Helsinki', 'US/Eastern', or 'MET'. Named time zones can be used only if the time zone information tables in the mysql database have been created and populated.
Populating the Time Zone Tables Several tables in the mysql system database exist to maintain time zone information (see Section 5.3, “The mysql System Database”). The MySQL installation procedure creates the time zone tables, but does not load them. You must do so manually using the following instructions. Note Loading the time zone information is not necessarily a one-time operation because the information changes occasionally. When such changes occur, applications that use the old rules become out of date and you may find it necessary to reload the time zone tables to keep the information used by your MySQL server current. See the notes at the end of this section. If your system has its own zoneinfo database (the set of files describing time zones), you should use the mysql_tzinfo_to_sql program for filling the time zone tables. Examples of such systems are Linux, FreeBSD, Solaris, and OS X. One likely location for these files is the /usr/share/zoneinfo directory. If your system does not have a zoneinfo database, you can use the downloadable package described later in this section.
1067
Staying Current with Time Zone Changes
The mysql_tzinfo_to_sql program is used to load the time zone tables. On the command line, pass the zoneinfo directory path name to mysql_tzinfo_to_sql and send the output into the mysql program. For example: shell> mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql
mysql_tzinfo_to_sql reads your system's time zone files and generates SQL statements from them. mysql processes those statements to load the time zone tables. mysql_tzinfo_to_sql also can be used to load a single time zone file or to generate leap second information: • To load a single time zone file tz_file that corresponds to a time zone name tz_name, invoke mysql_tzinfo_to_sql like this: shell> mysql_tzinfo_to_sql tz_file tz_name | mysql -u root mysql
With this approach, you must execute a separate command to load the time zone file for each named zone that the server needs to know about. • If your time zone needs to account for leap seconds, initialize the leap second information like this, where tz_file is the name of your time zone file: shell> mysql_tzinfo_to_sql --leap tz_file | mysql -u root mysql
• After running mysql_tzinfo_to_sql, it is best to restart the server so that it does not continue to use any previously cached time zone data. If your system is one that has no zoneinfo database (for example, Windows), you can use a package that is available for download at the MySQL Developer Zone: http://dev.mysql.com/downloads/timezones.html
You can use either a package that contains SQL statements to populate your existing time zone tables, or a package that contains pre-built MyISAM time zone tables to replace your existing tables: • To use a time zone package that contains SQL statements, download and unpack it, then load the package file contents into your existing time zone tables: shell> mysql -u root mysql < file_name
Then restart the server. • To use a time zone package that contains .frm, .MYD, and .MYI files for the MyISAM time zone tables, download and unpack it. These table files are part of the mysql database, so you should place the files in the mysql subdirectory of your MySQL server's data directory. Stop the server before doing this and restart it afterward. Warning Do not use a downloadable package if your system has a zoneinfo database. Use the mysql_tzinfo_to_sql utility instead. Otherwise, you may cause a difference in datetime handling between MySQL and other applications on your system. For information about time zone settings in replication setup, please see Section 17.4.1, “Replication Features and Issues”.
10.6.1 Staying Current with Time Zone Changes 1068
Staying Current with Time Zone Changes
When time zone rules change, applications that use the old rules become out of date. To stay current, it is necessary to make sure that your system uses current time zone information is used. For MySQL, there are two factors to consider in staying current: • The operating system time affects the value that the MySQL server uses for times if its time zone is set to SYSTEM. Make sure that your operating system is using the latest time zone information. For most operating systems, the latest update or service pack prepares your system for the time changes. Check the Web site for your operating system vendor for an update that addresses the time changes. • If you replace the system's /etc/localtime timezone file with a version that uses rules differing from those in effect at mysqld startup, you should restart mysqld so that it uses the updated rules. Otherwise, mysqld might not notice when the system changes its time. • If you use named time zones with MySQL, make sure that the time zone tables in the mysql database are up to date. If your system has its own zoneinfo database, you should reload the MySQL time zone tables whenever the zoneinfo database is updated. For systems that do not have their own zoneinfo database, check the MySQL Developer Zone for updates. When a new update is available, download it and use it to replace the content of your current time zone tables. For instructions for both methods, see Populating the Time Zone Tables. mysqld caches time zone information that it looks up, so after updating the time zone tables, you should restart mysqld to make sure that it does not continue to serve outdated time zone data. If you are uncertain whether named time zones are available, for use either as the server's time zone setting or by clients that set their own time zone, check whether your time zone tables are empty. The following query determines whether the table that contains time zone names has any rows: mysql> SELECT COUNT(*) FROM mysql.time_zone_name; +----------+ | COUNT(*) | +----------+ | 0 | +----------+
A count of zero indicates that the table is empty. In this case, no one can be using named time zones, and you don't need to update the tables. A count greater than zero indicates that the table is not empty and that its contents are available to be used for named time zone support. In this case, you should be sure to reload your time zone tables so that anyone who uses named time zones will get correct query results. To check whether your MySQL installation is updated properly for a change in Daylight Saving Time rules, use a test like the one following. The example uses values that are appropriate for the 2007 DST 1-hour change that occurs in the United States on March 11 at 2 a.m. The test uses these two queries: SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central'); SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central');
The two time values indicate the times at which the DST change occurs, and the use of named time zones requires that the time zone tables be used. The desired result is that both queries return the same result (the input time, converted to the equivalent value in the 'US/Central' time zone). Before updating the time zone tables, you would see an incorrect result like this: mysql> SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central'); +------------------------------------------------------------+ | CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central') | +------------------------------------------------------------+ | 2007-03-11 01:00:00 |
1069
Time Zone Leap Second Support
+------------------------------------------------------------+ mysql> SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central'); +------------------------------------------------------------+ | CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central') | +------------------------------------------------------------+ | 2007-03-11 02:00:00 | +------------------------------------------------------------+
After updating the tables, you should see the correct result: mysql> SELECT CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central'); +------------------------------------------------------------+ | CONVERT_TZ('2007-03-11 2:00:00','US/Eastern','US/Central') | +------------------------------------------------------------+ | 2007-03-11 01:00:00 | +------------------------------------------------------------+ mysql> SELECT CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central'); +------------------------------------------------------------+ | CONVERT_TZ('2007-03-11 3:00:00','US/Eastern','US/Central') | +------------------------------------------------------------+ | 2007-03-11 01:00:00 | +------------------------------------------------------------+
10.6.2 Time Zone Leap Second Support Leap second values are returned with a time part that ends with :59:59. This means that a function such as NOW() can return the same value for two or three consecutive seconds during the leap second. It remains true that literal temporal values having a time part that ends with :59:60 or :59:61 are considered invalid. If it is necessary to search for TIMESTAMP values one second before the leap second, anomalous results may be obtained if you use a comparison with 'YYYY-MM-DD hh:mm:ss' values. The following example demonstrates this. It changes the local time zone to UTC so there is no difference between internal values (which are in UTC) and displayed values (which have time zone correction applied). mysql> CREATE TABLE t1 ( a INT, ts TIMESTAMP DEFAULT NOW(), PRIMARY KEY (ts) ); Query OK, 0 rows affected (0.01 sec) mysql> -- change to UTC mysql> SET time_zone = '+00:00'; Query OK, 0 rows affected (0.00 sec) mysql> -- Simulate NOW() = '2008-12-31 23:59:59' mysql> SET timestamp = 1230767999; Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO t1 (a) VALUES (1); Query OK, 1 row affected (0.00 sec) mysql> -- Simulate NOW() = '2008-12-31 23:59:60' mysql> SET timestamp = 1230768000; Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO t1 (a) VALUES (2); Query OK, 1 row affected (0.00 sec) mysql> -- values differ internally but display the same mysql> SELECT a, ts, UNIX_TIMESTAMP(ts) FROM t1; +------+---------------------+--------------------+ | a | ts | UNIX_TIMESTAMP(ts) |
1070
MySQL Server Locale Support
+------+---------------------+--------------------+ | 1 | 2008-12-31 23:59:59 | 1230767999 | | 2 | 2008-12-31 23:59:59 | 1230768000 | +------+---------------------+--------------------+ 2 rows in set (0.00 sec) mysql> -- only the non-leap value matches mysql> SELECT * FROM t1 WHERE ts = '2008-12-31 23:59:59'; +------+---------------------+ | a | ts | +------+---------------------+ | 1 | 2008-12-31 23:59:59 | +------+---------------------+ 1 row in set (0.00 sec) mysql> -- the leap value with seconds=60 is invalid mysql> SELECT * FROM t1 WHERE ts = '2008-12-31 23:59:60'; Empty set, 2 warnings (0.00 sec)
To work around this, you can use a comparison based on the UTC value actually stored in column, which has the leap second correction applied: mysql> -- selecting using UNIX_TIMESTAMP value return leap value mysql> SELECT * FROM t1 WHERE UNIX_TIMESTAMP(ts) = 1230768000; +------+---------------------+ | a | ts | +------+---------------------+ | 2 | 2008-12-31 23:59:59 | +------+---------------------+ 1 row in set (0.00 sec)
10.7 MySQL Server Locale Support The locale indicated by the lc_time_names system variable controls the language used to display day and month names and abbreviations. This variable affects the output from the DATE_FORMAT(), DAYNAME(), and MONTHNAME() functions. lc_time_names does not affect the STR_TO_DATE() or GET_FORMAT() function. The lc_time_names value does not affect the result from FORMAT(), but this function takes an optional third parameter that enables a locale to be specified to be used for the result number's decimal point, thousands separator, and grouping between separators. Permissible locale values are the same as the legal values for the lc_time_names system variable. Locale names have language and region subtags listed by IANA (http://www.iana.org/assignments/ language-subtag-registry) such as 'ja_JP' or 'pt_BR'. The default value is 'en_US' regardless of your system's locale setting, but you can set the value at server startup or set the GLOBAL value if you have the SUPER privilege. Any client can examine the value of lc_time_names or set its SESSION value to affect the locale for its own connection. mysql> SET NAMES 'utf8'; Query OK, 0 rows affected (0.09 sec) mysql> SELECT @@lc_time_names; +-----------------+ | @@lc_time_names | +-----------------+ | en_US | +-----------------+ 1 row in set (0.00 sec) mysql> SELECT DAYNAME('2010-01-01'), MONTHNAME('2010-01-01'); +-----------------------+-------------------------+ | DAYNAME('2010-01-01') | MONTHNAME('2010-01-01') | +-----------------------+-------------------------+ | Friday | January |
1071
MySQL Server Locale Support
+-----------------------+-------------------------+ 1 row in set (0.00 sec) mysql> SELECT DATE_FORMAT('2010-01-01','%W %a %M %b'); +-----------------------------------------+ | DATE_FORMAT('2010-01-01','%W %a %M %b') | +-----------------------------------------+ | Friday Fri January Jan | +-----------------------------------------+ 1 row in set (0.00 sec) mysql> SET lc_time_names = 'es_MX'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT @@lc_time_names; +-----------------+ | @@lc_time_names | +-----------------+ | es_MX | +-----------------+ 1 row in set (0.00 sec) mysql> SELECT DAYNAME('2010-01-01'), MONTHNAME('2010-01-01'); +-----------------------+-------------------------+ | DAYNAME('2010-01-01') | MONTHNAME('2010-01-01') | +-----------------------+-------------------------+ | viernes | enero | +-----------------------+-------------------------+ 1 row in set (0.00 sec) mysql> SELECT DATE_FORMAT('2010-01-01','%W %a %M %b'); +-----------------------------------------+ | DATE_FORMAT('2010-01-01','%W %a %M %b') | +-----------------------------------------+ | viernes vie enero ene | +-----------------------------------------+ 1 row in set (0.00 sec)
The day or month name for each of the affected functions is converted from utf8 to the character set indicated by the character_set_connection system variable. lc_time_names may be set to any of the following locale values. The set of locales supported by MySQL may differ from those supported by your operating system. ar_AE: Arabic - United Arab Emirates
ar_BH: Arabic - Bahrain
ar_DZ: Arabic - Algeria
ar_EG: Arabic - Egypt
ar_IN: Arabic - India
ar_IQ: Arabic - Iraq
ar_JO: Arabic - Jordan
ar_KW: Arabic - Kuwait
ar_LB: Arabic - Lebanon
ar_LY: Arabic - Libya
ar_MA: Arabic - Morocco
ar_OM: Arabic - Oman
ar_QA: Arabic - Qatar
ar_SA: Arabic - Saudi Arabia
ar_SD: Arabic - Sudan
ar_SY: Arabic - Syria
ar_TN: Arabic - Tunisia
ar_YE: Arabic - Yemen
be_BY: Belarusian - Belarus
bg_BG: Bulgarian - Bulgaria
ca_ES: Catalan - Spain
cs_CZ: Czech - Czech Republic
da_DK: Danish - Denmark
de_AT: German - Austria
de_BE: German - Belgium
de_CH: German - Switzerland
de_DE: German - Germany
de_LU: German - Luxembourg
el_GR: Greek - Greece
en_AU: English - Australia
en_CA: English - Canada
en_GB: English - United Kingdom
1072
MySQL Server Locale Support
en_IN: English - India
en_NZ: English - New Zealand
en_PH: English - Philippines
en_US: English - United States
en_ZA: English - South Africa
en_ZW: English - Zimbabwe
es_AR: Spanish - Argentina
es_BO: Spanish - Bolivia
es_CL: Spanish - Chile
es_CO: Spanish - Columbia
es_CR: Spanish - Costa Rica
es_DO: Spanish - Dominican Republic
es_EC: Spanish - Ecuador
es_ES: Spanish - Spain
es_GT: Spanish - Guatemala
es_HN: Spanish - Honduras
es_MX: Spanish - Mexico
es_NI: Spanish - Nicaragua
es_PA: Spanish - Panama
es_PE: Spanish - Peru
es_PR: Spanish - Puerto Rico
es_PY: Spanish - Paraguay
es_SV: Spanish - El Salvador
es_US: Spanish - United States
es_UY: Spanish - Uruguay
es_VE: Spanish - Venezuela
et_EE: Estonian - Estonia
eu_ES: Basque - Basque
fi_FI: Finnish - Finland
fo_FO: Faroese - Faroe Islands
fr_BE: French - Belgium
fr_CA: French - Canada
fr_CH: French - Switzerland
fr_FR: French - France
fr_LU: French - Luxembourg
gl_ES: Galician - Spain
gu_IN: Gujarati - India
he_IL: Hebrew - Israel
hi_IN: Hindi - India
hr_HR: Croatian - Croatia
hu_HU: Hungarian - Hungary
id_ID: Indonesian - Indonesia
is_IS: Icelandic - Iceland
it_CH: Italian - Switzerland
it_IT: Italian - Italy
ja_JP: Japanese - Japan
ko_KR: Korean - Republic of Korea
lt_LT: Lithuanian - Lithuania
lv_LV: Latvian - Latvia
mk_MK: Macedonian - FYROM
mn_MN: Mongolia - Mongolian
ms_MY: Malay - Malaysia
nb_NO: Norwegian(Bokmål) - Norway
nl_BE: Dutch - Belgium
nl_NL: Dutch - The Netherlands
no_NO: Norwegian - Norway
pl_PL: Polish - Poland
pt_BR: Portugese - Brazil
pt_PT: Portugese - Portugal
ro_RO: Romanian - Romania
ru_RU: Russian - Russia
ru_UA: Russian - Ukraine
sk_SK: Slovak - Slovakia
sl_SI: Slovenian - Slovenia
sq_AL: Albanian - Albania
sr_RS: Serbian - Yugoslavia
sv_FI: Swedish - Finland
sv_SE: Swedish - Sweden
ta_IN: Tamil - India
te_IN: Telugu - India
th_TH: Thai - Thailand
tr_TR: Turkish - Turkey
uk_UA: Ukrainian - Ukraine
ur_PK: Urdu - Pakistan
vi_VN: Vietnamese - Viet Nam
zh_CN: Chinese - China
zh_HK: Chinese - Hong Kong
zh_TW: Chinese - Taiwan Province of China
1073
1074
Chapter 11 Data Types Table of Contents 11.1 Data Type Overview ........................................................................................................ 11.1.1 Numeric Type Overview ........................................................................................ 11.1.2 Date and Time Type Overview .............................................................................. 11.1.3 String Type Overview ............................................................................................ 11.2 Numeric Types ................................................................................................................ 11.2.1 Integer Types (Exact Value) - INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT ........................................................................................................................... 11.2.2 Fixed-Point Types (Exact Value) - DECIMAL, NUMERIC ........................................ 11.2.3 Floating-Point Types (Approximate Value) - FLOAT, DOUBLE ................................ 11.2.4 Bit-Value Type - BIT ............................................................................................. 11.2.5 Numeric Type Attributes ........................................................................................ 11.2.6 Out-of-Range and Overflow Handling ..................................................................... 11.3 Date and Time Types ...................................................................................................... 11.3.1 The DATE, DATETIME, and TIMESTAMP Types ................................................... 11.3.2 The TIME Type .................................................................................................... 11.3.3 The YEAR Type ................................................................................................... 11.3.4 YEAR(2) Limitations and Migrating to YEAR(4) ...................................................... 11.3.5 Automatic Initialization and Updating for TIMESTAMP ............................................ 11.3.6 Fractional Seconds in Time Values ........................................................................ 11.3.7 Conversion Between Date and Time Types ............................................................ 11.3.8 Two-Digit Years in Dates ...................................................................................... 11.4 String Types .................................................................................................................... 11.4.1 The CHAR and VARCHAR Types ......................................................................... 11.4.2 The BINARY and VARBINARY Types ................................................................... 11.4.3 The BLOB and TEXT Types ................................................................................. 11.4.4 The ENUM Type .................................................................................................. 11.4.5 The SET Type ...................................................................................................... 11.5 Extensions for Spatial Data .............................................................................................. 11.5.1 Spatial Data Types ............................................................................................... 11.5.2 The OpenGIS Geometry Model ............................................................................. 11.5.3 Supported Spatial Data Formats ............................................................................ 11.5.4 Creating Spatial Columns ...................................................................................... 11.5.5 Populating Spatial Columns ................................................................................... 11.5.6 Fetching Spatial Data ............................................................................................ 11.5.7 Optimizing Spatial Analysis ................................................................................... 11.5.8 Creating Spatial Indexes ....................................................................................... 11.5.9 Using Spatial Indexes ........................................................................................... 11.6 Data Type Default Values ................................................................................................ 11.7 Data Type Storage Requirements .................................................................................... 11.8 Choosing the Right Type for a Column ............................................................................. 11.9 Using Data Types from Other Database Engines ..............................................................
1076 1076 1079 1080 1084 1084 1084 1085 1085 1086 1087 1088 1089 1091 1091 1092 1093 1096 1097 1097 1098 1098 1100 1101 1102 1105 1107 1109 1110 1115 1118 1118 1119 1120 1120 1121 1123 1124 1128 1128
MySQL supports a number of SQL data types in several categories: numeric types, date and time types, string (character and byte) types, and spatial types. This chapter provides an overview of these data types, a more detailed description of the properties of the types in each category, and a summary of the data type storage requirements. The initial overview is intentionally brief. The more detailed descriptions later in the chapter should be consulted for additional information about particular data types, such as the permissible formats in which you can specify values. Data type descriptions use these conventions:
1075
Data Type Overview
•
M indicates the maximum display width for integer types. For floating-point and fixed-point types, M is the total number of digits that can be stored (the precision). For string types, M is the maximum length. The maximum permissible value of M depends on the data type.
•
D applies to floating-point and fixed-point types and indicates the number of digits following the decimal point (the scale). The maximum possible value is 30, but should be no greater than M−2.
•
Square brackets ([ and ]) indicate optional parts of type definitions.
11.1 Data Type Overview 11.1.1 Numeric Type Overview A summary of the numeric data types follows. For additional information about properties and storage requirements of the numeric types, see Section 11.2, “Numeric Types”, and Section 11.7, “Data Type Storage Requirements”. M indicates the maximum display width for integer types. The maximum display width is 255. Display width is unrelated to the range of values a type can contain, as described in Section 11.2, “Numeric Types”. For floating-point and fixed-point types, M is the total number of digits that can be stored. If you specify ZEROFILL for a numeric column, MySQL automatically adds the UNSIGNED attribute to the column. Numeric data types that permit the UNSIGNED attribute also permit SIGNED. However, these data types are signed by default, so the SIGNED attribute has no effect. SERIAL is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE. SERIAL DEFAULT VALUE in the definition of an integer column is an alias for NOT NULL AUTO_INCREMENT UNIQUE. Warning When you use subtraction between integer values where one is of type UNSIGNED, the result is unsigned unless the NO_UNSIGNED_SUBTRACTION SQL mode is enabled. See Section 12.10, “Cast Functions and Operators”. •
BIT[(M)] A bit-value type. M indicates the number of bits per value, from 1 to 64. The default is 1 if M is omitted.
•
TINYINT[(M)] [UNSIGNED] [ZEROFILL] A very small integer. The signed range is -128 to 127. The unsigned range is 0 to 255.
•
BOOL, BOOLEAN These types are synonyms for TINYINT(1). A value of zero is considered false. Nonzero values are considered true: mysql> SELECT IF(0, 'true', 'false'); +------------------------+ | IF(0, 'true', 'false') | +------------------------+ | false | +------------------------+ mysql> SELECT IF(1, 'true', 'false'); +------------------------+
1076
Numeric Type Overview
| IF(1, 'true', 'false') | +------------------------+ | true | +------------------------+ mysql> SELECT IF(2, 'true', 'false'); +------------------------+ | IF(2, 'true', 'false') | +------------------------+ | true | +------------------------+
However, the values TRUE and FALSE are merely aliases for 1 and 0, respectively, as shown here: mysql> SELECT IF(0 = FALSE, 'true', 'false'); +--------------------------------+ | IF(0 = FALSE, 'true', 'false') | +--------------------------------+ | true | +--------------------------------+ mysql> SELECT IF(1 = TRUE, 'true', 'false'); +-------------------------------+ | IF(1 = TRUE, 'true', 'false') | +-------------------------------+ | true | +-------------------------------+ mysql> SELECT IF(2 = TRUE, 'true', 'false'); +-------------------------------+ | IF(2 = TRUE, 'true', 'false') | +-------------------------------+ | false | +-------------------------------+ mysql> SELECT IF(2 = FALSE, 'true', 'false'); +--------------------------------+ | IF(2 = FALSE, 'true', 'false') | +--------------------------------+ | false | +--------------------------------+
The last two statements display the results shown because 2 is equal to neither 1 nor 0. •
SMALLINT[(M)] [UNSIGNED] [ZEROFILL] A small integer. The signed range is -32768 to 32767. The unsigned range is 0 to 65535.
•
MEDIUMINT[(M)] [UNSIGNED] [ZEROFILL] A medium-sized integer. The signed range is -8388608 to 8388607. The unsigned range is 0 to 16777215.
•
INT[(M)] [UNSIGNED] [ZEROFILL] A normal-size integer. The signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.
•
INTEGER[(M)] [UNSIGNED] [ZEROFILL] This type is a synonym for INT.
•
BIGINT[(M)] [UNSIGNED] [ZEROFILL] A large integer. The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615. SERIAL is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE.
1077
Numeric Type Overview
Some things you should be aware of with respect to BIGINT columns: • All arithmetic is done using signed BIGINT or DOUBLE values, so you should not use unsigned big integers larger than 9223372036854775807 (63 bits) except with bit functions! If you do that, some of the last digits in the result may be wrong because of rounding errors when converting a BIGINT value to a DOUBLE. MySQL can handle BIGINT in the following cases: • When using integers to store large unsigned values in a BIGINT column. • In MIN(col_name) or MAX(col_name), where col_name refers to a BIGINT column. • When using operators (+, -, *, and so on) where both operands are integers. • You can always store an exact integer value in a BIGINT column by storing it using a string. In this case, MySQL performs a string-to-number conversion that involves no intermediate doubleprecision representation. • The -, +, and * operators use BIGINT arithmetic when both operands are integer values. This means that if you multiply two big integers (or results from functions that return integers), you may get unexpected results when the result is larger than 9223372036854775807. •
DECIMAL[(M[,D])] [UNSIGNED] [ZEROFILL] A packed “exact” fixed-point number. M is the total number of digits (the precision) and D is the number of digits after the decimal point (the scale). The decimal point and (for negative numbers) the - sign are not counted in M. If D is 0, values have no decimal point or fractional part. The maximum number of digits (M) for DECIMAL is 65. The maximum number of supported decimals (D) is 30. If D is omitted, the default is 0. If M is omitted, the default is 10. UNSIGNED, if specified, disallows negative values. All basic calculations (+, -, *, /) with DECIMAL columns are done with a precision of 65 digits.
•
DEC[(M[,D])] [UNSIGNED] [ZEROFILL], NUMERIC[(M[,D])] [UNSIGNED] [ZEROFILL], FIXED[(M[,D])] [UNSIGNED] [ZEROFILL] These types are synonyms for DECIMAL. The FIXED synonym is available for compatibility with other database systems.
•
FLOAT[(M,D)] [UNSIGNED] [ZEROFILL] A small (single-precision) floating-point number. Permissible values are -3.402823466E+38 to -1.175494351E-38, 0, and 1.175494351E-38 to 3.402823466E+38. These are the theoretical limits, based on the IEEE standard. The actual range might be slightly smaller depending on your hardware or operating system. M is the total number of digits and D is the number of digits following the decimal point. If M and D are omitted, values are stored to the limits permitted by the hardware. A single-precision floating-point number is accurate to approximately 7 decimal places. UNSIGNED, if specified, disallows negative values. Using FLOAT might give you some unexpected problems because all calculations in MySQL are done with double precision. See Section B.5.4.7, “Solving Problems with No Matching Rows”.
•
DOUBLE[(M,D)] [UNSIGNED] [ZEROFILL] A normal-size (double-precision) floating-point number. Permissible values are -1.7976931348623157E+308 to -2.2250738585072014E-308, 0, and
1078
Date and Time Type Overview
2.2250738585072014E-308 to 1.7976931348623157E+308. These are the theoretical limits, based on the IEEE standard. The actual range might be slightly smaller depending on your hardware or operating system. M is the total number of digits and D is the number of digits following the decimal point. If M and D are omitted, values are stored to the limits permitted by the hardware. A double-precision floating-point number is accurate to approximately 15 decimal places. UNSIGNED, if specified, disallows negative values. •
DOUBLE PRECISION[(M,D)] [UNSIGNED] [ZEROFILL], REAL[(M,D)] [UNSIGNED] [ZEROFILL] These types are synonyms for DOUBLE. Exception: If the REAL_AS_FLOAT SQL mode is enabled, REAL is a synonym for FLOAT rather than DOUBLE.
•
FLOAT(p) [UNSIGNED] [ZEROFILL] A floating-point number. p represents the precision in bits, but MySQL uses this value only to determine whether to use FLOAT or DOUBLE for the resulting data type. If p is from 0 to 24, the data type becomes FLOAT with no M or D values. If p is from 25 to 53, the data type becomes DOUBLE with no M or D values. The range of the resulting column is the same as for the single-precision FLOAT or double-precision DOUBLE data types described earlier in this section. FLOAT(p) syntax is provided for ODBC compatibility.
11.1.2 Date and Time Type Overview A summary of the temporal data types follows. For additional information about properties and storage requirements of the temporal types, see Section 11.3, “Date and Time Types”, and Section 11.7, “Data Type Storage Requirements”. For descriptions of functions that operate on temporal values, see Section 12.7, “Date and Time Functions”. For the DATE and DATETIME range descriptions, “supported” means that although earlier values might work, there is no guarantee. •
DATE A date. The supported range is '1000-01-01' to '9999-12-31'. MySQL displays DATE values in 'YYYY-MM-DD' format, but permits assignment of values to DATE columns using either strings or numbers.
•
DATETIME A date and time combination. The supported range is '1000-01-01 00:00:00' to '9999-12-31 23:59:59'. MySQL displays DATETIME values in 'YYYY-MM-DD HH:MM:SS' format, but permits assignment of values to DATETIME columns using either strings or numbers.
•
TIMESTAMP A timestamp. The range is '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC. TIMESTAMP values are stored as the number of seconds since the epoch ('1970-01-01 00:00:00' UTC). A TIMESTAMP cannot represent the value '1970-01-01 00:00:00' because that is equivalent to 0 seconds from the epoch and the value 0 is reserved for representing '0000-00-00 00:00:00', the “zero” TIMESTAMP value. Unless specified otherwise, the first TIMESTAMP column in a table is defined to be automatically set to the date and time of the most recent modification if not explicitly assigned a value. This makes TIMESTAMP useful for recording the timestamp of an INSERT or UPDATE operation. You can also set any TIMESTAMP column to the current date and time by assigning it a NULL value, unless it has been defined with the NULL attribute to permit NULL values. The automatic initialization and
1079
String Type Overview
updating to the current date and time can be specified using DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP clauses, as described in Section 11.3.5, “Automatic Initialization and Updating for TIMESTAMP”. •
TIME A time. The range is '-838:59:59' to '838:59:59'. MySQL displays TIME values in 'HH:MM:SS' format, but permits assignment of values to TIME columns using either strings or numbers.
•
YEAR[(2|4)] A year in two-digit or four-digit format. The default is four-digit format. YEAR(2) or YEAR(4) differ in display format, but have the same range of values. In four-digit format, values display as 1901 to 2155, and 0000. In two-digit format, values display as 70 to 69, representing years from 1970 to 2069. MySQL displays YEAR values in YYYY or YY format, but permits assignment of values to YEAR columns using either strings or numbers. Note The YEAR(2) data type has certain issues that you should consider before choosing to use it. As of MySQL 5.5.27, YEAR(2) is deprecated. For more information, see Section 11.3.4, “YEAR(2) Limitations and Migrating to YEAR(4)”. For additional information about YEAR display format and interpretation of input values, see Section 11.3.3, “The YEAR Type”.
The SUM() and AVG() aggregate functions do not work with temporal values. (They convert the values to numbers, losing everything after the first nonnumeric character.) To work around this problem, convert to numeric units, perform the aggregate operation, and convert back to a temporal value. Examples: SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(time_col))) FROM tbl_name; SELECT FROM_DAYS(SUM(TO_DAYS(date_col))) FROM tbl_name;
Note The MySQL server can be run with the MAXDB SQL mode enabled. In this case, TIMESTAMP is identical with DATETIME. If this mode is enabled at the time that a table is created, TIMESTAMP columns are created as DATETIME columns. As a result, such columns use DATETIME display format, have the same range of values, and there is no automatic initialization or updating to the current date and time. See Section 5.1.8, “Server SQL Modes”.
11.1.3 String Type Overview A summary of the string data types follows. For additional information about properties and storage requirements of the string types, see Section 11.4, “String Types”, and Section 11.7, “Data Type Storage Requirements”. In some cases, MySQL may change a string column to a type different from that given in a CREATE TABLE or ALTER TABLE statement. See Section 13.1.17.7, “Silent Column Specification Changes”. MySQL interprets length specifications in character column definitions in character units. This applies to CHAR, VARCHAR, and the TEXT types. Column definitions for many string data types can include attributes that specify the character set or collation of the column. These attributes apply to the CHAR, VARCHAR, the TEXT types, ENUM, and SET data types:
1080
String Type Overview
• The CHARACTER SET attribute specifies the character set, and the COLLATE attribute specifies a collation for the character set. For example: CREATE TABLE t ( c1 VARCHAR(20) CHARACTER SET utf8, c2 TEXT CHARACTER SET latin1 COLLATE latin1_general_cs );
This table definition creates a column named c1 that has a character set of utf8 with the default collation for that character set, and a column named c2 that has a character set of latin1 and a case-sensitive collation. The rules for assigning the character set and collation when either or both of the CHARACTER SET and COLLATE attributes are missing are described in Section 10.1.3.5, “Column Character Set and Collation”. CHARSET is a synonym for CHARACTER SET. • Specifying the CHARACTER SET binary attribute for a character string data type causes the column to be created as the corresponding binary string data type: CHAR becomes BINARY, VARCHAR becomes VARBINARY, and TEXT becomes BLOB. For the ENUM and SET data types, this does not occur; they are created as declared. Suppose that you specify a table using this definition: CREATE TABLE t ( c1 VARCHAR(10) CHARACTER SET binary, c2 TEXT CHARACTER SET binary, c3 ENUM('a','b','c') CHARACTER SET binary );
The resulting table has this definition: CREATE TABLE t ( c1 VARBINARY(10), c2 BLOB, c3 ENUM('a','b','c') CHARACTER SET binary );
• The BINARY attribute is shorthand for specifying the table default character set and the binary (_bin) collation of that character set. In this case, comparison and sorting are based on numeric character code values. • The ASCII attribute is shorthand for CHARACTER SET latin1. • The UNICODE attribute is shorthand for CHARACTER SET ucs2. Character column comparison and sorting are based on the collation assigned to the column. For the CHAR, VARCHAR, TEXT, ENUM, and SET data types, you can declare a column with a binary (_bin) collation or the BINARY attribute to cause comparison and sorting to use the underlying character code values rather than a lexical ordering. For additional information about use of character sets in MySQL, see Section 10.1, “Character Set Support”. •
[NATIONAL] CHAR[(M)] [CHARACTER SET charset_name] [COLLATE collation_name] A fixed-length string that is always right-padded with spaces to the specified length when stored. M represents the column length in characters. The range of M is 0 to 255. If M is omitted, the length is 1. 1081
String Type Overview
Note Trailing spaces are removed when CHAR values are retrieved unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled. CHAR is shorthand for CHARACTER. NATIONAL CHAR (or its equivalent short form, NCHAR) is the standard SQL way to define that a CHAR column should use some predefined character set. MySQL uses utf8 as this predefined character set. Section 10.1.3.7, “The National Character Set”. The CHAR BYTE data type is an alias for the BINARY data type. This is a compatibility feature. MySQL permits you to create a column of type CHAR(0). This is useful primarily when you have to be compliant with old applications that depend on the existence of a column but that do not actually use its value. CHAR(0) is also quite nice when you need a column that can take only two values: A column that is defined as CHAR(0) NULL occupies only one bit and can take only the values NULL and '' (the empty string). •
[NATIONAL] VARCHAR(M) [CHARACTER SET charset_name] [COLLATE collation_name] A variable-length string. M represents the maximum column length in characters. The range of M is 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. For example, utf8 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8 character set can be declared to be a maximum of 21,844 characters. See Section C.10.4, “Limits on Table Column Count and Row Size”. MySQL stores VARCHAR values as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A VARCHAR column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes. Note MySQL follows the standard SQL specification, and does not remove trailing spaces from VARCHAR values. VARCHAR is shorthand for CHARACTER VARYING. NATIONAL VARCHAR is the standard SQL way to define that a VARCHAR column should use some predefined character set. MySQL uses utf8 as this predefined character set. Section 10.1.3.7, “The National Character Set”. NVARCHAR is shorthand for NATIONAL VARCHAR.
•
BINARY(M) The BINARY type is similar to the CHAR type, but stores binary byte strings rather than nonbinary character strings. M represents the column length in bytes.
•
VARBINARY(M) The VARBINARY type is similar to the VARCHAR type, but stores binary byte strings rather than nonbinary character strings. M represents the maximum column length in bytes.
•
TINYBLOB A BLOB column with a maximum length of 255 (2 − 1) bytes. Each TINYBLOB value is stored using a 1-byte length prefix that indicates the number of bytes in the value. 8
•
TINYTEXT [CHARACTER SET charset_name] [COLLATE collation_name] A TEXT column with a maximum length of 255 (2 − 1) characters. The effective maximum length is less if the value contains multibyte characters. Each TINYTEXT value is stored using a 1-byte length prefix that indicates the number of bytes in the value. 8
1082
String Type Overview
•
BLOB[(M)] A BLOB column with a maximum length of 65,535 (2 − 1) bytes. Each BLOB value is stored using a 2-byte length prefix that indicates the number of bytes in the value. 16
An optional length M can be given for this type. If this is done, MySQL creates the column as the smallest BLOB type large enough to hold values M bytes long. •
TEXT[(M)] [CHARACTER SET charset_name] [COLLATE collation_name] A TEXT column with a maximum length of 65,535 (2 − 1) characters. The effective maximum length is less if the value contains multibyte characters. Each TEXT value is stored using a 2-byte length prefix that indicates the number of bytes in the value. 16
An optional length M can be given for this type. If this is done, MySQL creates the column as the smallest TEXT type large enough to hold values M characters long. •
MEDIUMBLOB A BLOB column with a maximum length of 16,777,215 (2 − 1) bytes. Each MEDIUMBLOB value is stored using a 3-byte length prefix that indicates the number of bytes in the value. 24
•
MEDIUMTEXT [CHARACTER SET charset_name] [COLLATE collation_name] A TEXT column with a maximum length of 16,777,215 (2 − 1) characters. The effective maximum length is less if the value contains multibyte characters. Each MEDIUMTEXT value is stored using a 3byte length prefix that indicates the number of bytes in the value. 24
•
LONGBLOB A BLOB column with a maximum length of 4,294,967,295 or 4GB (2 − 1) bytes. The effective maximum length of LONGBLOB columns depends on the configured maximum packet size in the client/server protocol and available memory. Each LONGBLOB value is stored using a 4-byte length prefix that indicates the number of bytes in the value. 32
•
LONGTEXT [CHARACTER SET charset_name] [COLLATE collation_name] A TEXT column with a maximum length of 4,294,967,295 or 4GB (2 − 1) characters. The effective maximum length is less if the value contains multibyte characters. The effective maximum length of LONGTEXT columns also depends on the configured maximum packet size in the client/server protocol and available memory. Each LONGTEXT value is stored using a 4-byte length prefix that indicates the number of bytes in the value. 32
•
ENUM('value1','value2',...) [CHARACTER SET charset_name] [COLLATE collation_name] An enumeration. A string object that can have only one value, chosen from the list of values 'value1', 'value2', ..., NULL or the special '' error value. ENUM values are represented internally as integers. An ENUM column can have a maximum of 65,535 distinct elements. (The practical limit is less than 3000.) A table can have no more than 255 unique element list definitions among its ENUM and SET columns considered as a group. For more information on these limits, see Section C.10.5, “Limits Imposed by .frm File Structure”.
•
SET('value1','value2',...) [CHARACTER SET charset_name] [COLLATE collation_name] A set. A string object that can have zero or more values, each of which must be chosen from the list of values 'value1', 'value2', ... SET values are represented internally as integers. 1083
Numeric Types
A SET column can have a maximum of 64 distinct members. A table can have no more than 255 unique element list definitions among its ENUM and SET columns considered as a group. For more information on this limit, see Section C.10.5, “Limits Imposed by .frm File Structure”.
11.2 Numeric Types MySQL supports all standard SQL numeric data types. These types include the exact numeric data types (INTEGER, SMALLINT, DECIMAL, and NUMERIC), as well as the approximate numeric data types (FLOAT, REAL, and DOUBLE PRECISION). The keyword INT is a synonym for INTEGER, and the keywords DEC and FIXED are synonyms for DECIMAL. MySQL treats DOUBLE as a synonym for DOUBLE PRECISION (a nonstandard extension). MySQL also treats REAL as a synonym for DOUBLE PRECISION (a nonstandard variation), unless the REAL_AS_FLOAT SQL mode is enabled. The BIT data type stores bit values and is supported for MyISAM, MEMORY, InnoDB, and NDBCLUSTER tables. For information about how MySQL handles assignment of out-of-range values to columns and overflow during expression evaluation, see Section 11.2.6, “Out-of-Range and Overflow Handling”. For information about numeric type storage requirements, see Section 11.7, “Data Type Storage Requirements”. The data type used for the result of a calculation on numeric operands depends on the types of the operands and the operations performed on them. For more information, see Section 12.6.1, “Arithmetic Operators”.
11.2.1 Integer Types (Exact Value) - INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT MySQL supports the SQL standard integer types INTEGER (or INT) and SMALLINT. As an extension to the standard, MySQL also supports the integer types TINYINT, MEDIUMINT, and BIGINT. The following table shows the required storage and range for each integer type. Type TINYINT
SMALLINT
MEDIUMINT
INT
BIGINT
Storage
Minimum Value
Maximum Value
(Bytes)
(Signed/Unsigned)
(Signed/Unsigned)
1
-128
127
0
255
-32768
32767
0
65535
-8388608
8388607
0
16777215
-2147483648
2147483647
0
4294967295
-9223372036854775808
9223372036854775807
0
18446744073709551615
2 3 4 8
11.2.2 Fixed-Point Types (Exact Value) - DECIMAL, NUMERIC The DECIMAL and NUMERIC types store exact numeric data values. These types are used when it is important to preserve exact precision, for example with monetary data. In MySQL, NUMERIC is implemented as DECIMAL, so the following remarks about DECIMAL apply equally to NUMERIC.
1084
Floating-Point Types (Approximate Value) - FLOAT, DOUBLE
MySQL stores DECIMAL values in binary format. See Section 12.18, “Precision Math”. In a DECIMAL column declaration, the precision and scale can be (and usually is) specified; for example: salary DECIMAL(5,2)
In this example, 5 is the precision and 2 is the scale. The precision represents the number of significant digits that are stored for values, and the scale represents the number of digits that can be stored following the decimal point. Standard SQL requires that DECIMAL(5,2) be able to store any value with five digits and two decimals, so values that can be stored in the salary column range from -999.99 to 999.99. In standard SQL, the syntax DECIMAL(M) is equivalent to DECIMAL(M,0). Similarly, the syntax DECIMAL is equivalent to DECIMAL(M,0), where the implementation is permitted to decide the value of M. MySQL supports both of these variant forms of DECIMAL syntax. The default value of M is 10. If the scale is 0, DECIMAL values contain no decimal point or fractional part. The maximum number of digits for DECIMAL is 65, but the actual range for a given DECIMAL column can be constrained by the precision or scale for a given column. When such a column is assigned a value with more digits following the decimal point than are permitted by the specified scale, the value is converted to that scale. (The precise behavior is operating system-specific, but generally the effect is truncation to the permissible number of digits.)
11.2.3 Floating-Point Types (Approximate Value) - FLOAT, DOUBLE The FLOAT and DOUBLE types represent approximate numeric data values. MySQL uses four bytes for single-precision values and eight bytes for double-precision values. For FLOAT, the SQL standard permits an optional specification of the precision (but not the range of the exponent) in bits following the keyword FLOAT in parentheses. MySQL also supports this optional precision specification, but the precision value is used only to determine storage size. A precision from 0 to 23 results in a 4-byte single-precision FLOAT column. A precision from 24 to 53 results in an 8-byte double-precision DOUBLE column. MySQL permits a nonstandard syntax: FLOAT(M,D) or REAL(M,D) or DOUBLE PRECISION(M,D). Here, (M,D) means than values can be stored with up to M digits in total, of which D digits may be after the decimal point. For example, a column defined as FLOAT(7,4) will look like -999.9999 when displayed. MySQL performs rounding when storing values, so if you insert 999.00009 into a FLOAT(7,4) column, the approximate result is 999.0001. Because floating-point values are approximate and not stored as exact values, attempts to treat them as exact in comparisons may lead to problems. They are also subject to platform or implementation dependencies. For more information, see Section B.5.4.8, “Problems with Floating-Point Values” For maximum portability, code requiring storage of approximate numeric data values should use FLOAT or DOUBLE PRECISION with no specification of precision or number of digits.
11.2.4 Bit-Value Type - BIT The BIT data type is used to store bit values. A type of BIT(M) enables storage of M-bit values. M can range from 1 to 64. To specify bit values, b'value' notation can be used. value is a binary value written using zeros and ones. For example, b'111' and b'10000000' represent 7 and 128, respectively. See Section 9.1.5, “Bit-Value Literals”.
1085
Numeric Type Attributes
If you assign a value to a BIT(M) column that is less than M bits long, the value is padded on the left with zeros. For example, assigning a value of b'101' to a BIT(6) column is, in effect, the same as assigning b'000101'. NDB Cluster. The maximum combined size of all BIT columns used in a given NDB table must not exceed 4096 bits.
11.2.5 Numeric Type Attributes MySQL supports an extension for optionally specifying the display width of integer data types in parentheses following the base keyword for the type. For example, INT(4) specifies an INT with a display width of four digits. This optional display width may be used by applications to display integer values having a width less than the width specified for the column by left-padding them with spaces. (That is, this width is present in the metadata returned with result sets. Whether it is used or not is up to the application.) The display width does not constrain the range of values that can be stored in the column. Nor does it prevent values wider than the column display width from being displayed correctly. For example, a column specified as SMALLINT(3) has the usual SMALLINT range of -32768 to 32767, and values outside the range permitted by three digits are displayed in full using more than three digits. When used in conjunction with the optional (nonstandard) attribute ZEROFILL, the default padding of spaces is replaced with zeros. For example, for a column declared as INT(4) ZEROFILL, a value of 5 is retrieved as 0005. Note The ZEROFILL attribute is ignored when a column is involved in expressions or UNION queries. If you store values larger than the display width in an integer column that has the ZEROFILL attribute, you may experience problems when MySQL generates temporary tables for some complicated joins. In these cases, MySQL assumes that the data values fit within the column display width. All integer types can have an optional (nonstandard) attribute UNSIGNED. Unsigned type can be used to permit only nonnegative numbers in a column or when you need a larger upper numeric range for the column. For example, if an INT column is UNSIGNED, the size of the column's range is the same but its endpoints shift from -2147483648 and 2147483647 up to 0 and 4294967295. Floating-point and fixed-point types also can be UNSIGNED. As with integer types, this attribute prevents negative values from being stored in the column. Unlike the integer types, the upper range of column values remains the same. If you specify ZEROFILL for a numeric column, MySQL automatically adds the UNSIGNED attribute to the column. Integer or floating-point data types can have the additional attribute AUTO_INCREMENT. When you insert a value of NULL into an indexed AUTO_INCREMENT column, the column is set to the next sequence value. Typically this is value+1, where value is the largest value for the column currently in the table. (AUTO_INCREMENT sequences begin with 1.) Storing 0 into an AUTO_INCREMENT column has the same effect as storing NULL, unless the NO_AUTO_VALUE_ON_ZERO SQL mode is enabled. Inserting NULL to generate AUTO_INCREMENT values requires that the column be declared NOT NULL. If the column is declared NULL, inserting NULL stores a NULL. When you insert any other value into an AUTO_INCREMENT column, the column is set to that value and the sequence is reset so that the next automatically generated value follows sequentially from the inserted value. 1086
Out-of-Range and Overflow Handling
As of MySQL 5.5.29, negative values for AUTO_INCREMENT columns are not supported.
11.2.6 Out-of-Range and Overflow Handling When MySQL stores a value in a numeric column that is outside the permissible range of the column data type, the result depends on the SQL mode in effect at the time: • If strict SQL mode is enabled, MySQL rejects the out-of-range value with an error, and the insert fails, in accordance with the SQL standard. • If no restrictive modes are enabled, MySQL clips the value to the appropriate endpoint of the column data type range and stores the resulting value instead. When an out-of-range value is assigned to an integer column, MySQL stores the value representing the corresponding endpoint of the column data type range. When a floating-point or fixed-point column is assigned a value that exceeds the range implied by the specified (or default) precision and scale, MySQL stores the value representing the corresponding endpoint of that range. Suppose that a table t1 has this definition: CREATE TABLE t1 (i1 TINYINT, i2 TINYINT UNSIGNED);
With strict SQL mode enabled, an out of range error occurs: mysql> SET sql_mode = 'TRADITIONAL'; mysql> INSERT INTO t1 (i1, i2) VALUES(256, 256); ERROR 1264 (22003): Out of range value for column 'i1' at row 1 mysql> SELECT * FROM t1; Empty set (0.00 sec)
With strict SQL mode not enabled, clipping with warnings occurs: mysql> SET sql_mode = ''; mysql> INSERT INTO t1 (i1, i2) VALUES(256, 256); mysql> SHOW WARNINGS; +---------+------+---------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------+ | Warning | 1264 | Out of range value for column 'i1' at row 1 | | Warning | 1264 | Out of range value for column 'i2' at row 1 | +---------+------+---------------------------------------------+ mysql> SELECT * FROM t1; +------+------+ | i1 | i2 | +------+------+ | 127 | 255 | +------+------+
When strict SQL mode is not enabled, column-assignment conversions that occur due to clipping are reported as warnings for ALTER TABLE, LOAD DATA INFILE, UPDATE, and multiple-row INSERT statements. In strict mode, these statements fail, and some or all the values are not inserted or changed, depending on whether the table is a transactional table and other factors. For details, see Section 5.1.8, “Server SQL Modes”. Overflow during numeric expression evaluation results in an error. For example, the largest signed BIGINT value is 9223372036854775807, so the following expression produces an error: mysql> SELECT 9223372036854775807 + 1; ERROR 1690 (22003): BIGINT value is out of range in '(9223372036854775807 + 1)'
1087
Date and Time Types
To enable the operation to succeed in this case, convert the value to unsigned; mysql> SELECT CAST(9223372036854775807 AS UNSIGNED) + 1; +-------------------------------------------+ | CAST(9223372036854775807 AS UNSIGNED) + 1 | +-------------------------------------------+ | 9223372036854775808 | +-------------------------------------------+
Whether overflow occurs depends on the range of the operands, so another way to handle the preceding expression is to use exact-value arithmetic because DECIMAL values have a larger range than integers: mysql> SELECT 9223372036854775807.0 + 1; +---------------------------+ | 9223372036854775807.0 + 1 | +---------------------------+ | 9223372036854775808.0 | +---------------------------+
Subtraction between integer values, where one is of type UNSIGNED, produces an unsigned result by default. If the result would otherwise have been negative, an error results: mysql> SET sql_mode = ''; Query OK, 0 rows affected (0.00 sec) mysql> SELECT CAST(0 AS UNSIGNED) - 1; ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(cast(0 as unsigned) - 1)'
If the NO_UNSIGNED_SUBTRACTION SQL mode is enabled, the result is negative: mysql> SET sql_mode = 'NO_UNSIGNED_SUBTRACTION'; mysql> SELECT CAST(0 AS UNSIGNED) - 1; +-------------------------+ | CAST(0 AS UNSIGNED) - 1 | +-------------------------+ | -1 | +-------------------------+
If the result of such an operation is used to update an UNSIGNED integer column, the result is clipped to the maximum value for the column type, or clipped to 0 if NO_UNSIGNED_SUBTRACTION is enabled. If strict SQL mode is enabled, an error occurs and the column remains unchanged.
11.3 Date and Time Types The date and time types for representing temporal values are DATE, TIME, DATETIME, TIMESTAMP, and YEAR. Each temporal type has a range of valid values, as well as a “zero” value that may be used when you specify an invalid value that MySQL cannot represent. The TIMESTAMP type has special automatic updating behavior, described later. For temporal type storage requirements, see Section 11.7, “Data Type Storage Requirements”. Keep in mind these general considerations when working with date and time types: • MySQL retrieves values for a given date or time type in a standard output format, but it attempts to interpret a variety of formats for input values that you supply (for example, when you specify a value to be assigned to or compared to a date or time type). For a description of the permitted formats for date and time types, see Section 9.1.3, “Date and Time Literals”. It is expected that you supply valid values. Unpredictable results may occur if you use values in other formats. • Although MySQL tries to interpret values in several formats, date parts must always be given in yearmonth-day order (for example, '98-09-04'), rather than in the month-day-year or day-month-year orders commonly used elsewhere (for example, '09-04-98', '04-09-98').
1088
The DATE, DATETIME, and TIMESTAMP Types
• Dates containing two-digit year values are ambiguous because the century is unknown. MySQL interprets two-digit year values using these rules: • Year values in the range 70-99 are converted to 1970-1999. • Year values in the range 00-69 are converted to 2000-2069. See also Section 11.3.8, “Two-Digit Years in Dates”. • Conversion of values from one temporal type to another occurs according to the rules in Section 11.3.7, “Conversion Between Date and Time Types”. • MySQL automatically converts a date or time value to a number if the value is used in a numeric context and vice versa. • By default, when MySQL encounters a value for a date or time type that is out of range or otherwise invalid for the type, it converts the value to the “zero” value for that type. The exception is that out-ofrange TIME values are clipped to the appropriate endpoint of the TIME range. • By setting the SQL mode to the appropriate value, you can specify more exactly what kind of dates you want MySQL to support. (See Section 5.1.8, “Server SQL Modes”.) You can get MySQL to accept certain dates, such as '2009-11-31', by enabling the ALLOW_INVALID_DATES SQL mode. This is useful when you want to store a “possibly wrong” value which the user has specified (for example, in a web form) in the database for future processing. Under this mode, MySQL verifies only that the month is in the range from 1 to 12 and that the day is in the range from 1 to 31. • MySQL permits you to store dates where the day or month and day are zero in a DATE or DATETIME column. This is useful for applications that need to store birthdates for which you may not know the exact date. In this case, you simply store the date as '2009-00-00' or '2009-01-00'. If you store dates such as these, you should not expect to get correct results for functions such as DATE_SUB() or DATE_ADD() that require complete dates. To disallow zero month or day parts in dates, enable the NO_ZERO_IN_DATE SQL mode. • MySQL permits you to store a “zero” value of '0000-00-00' as a “dummy date.” This is in some cases more convenient than using NULL values, and uses less data and index space. To disallow '0000-00-00', enable the NO_ZERO_DATE SQL mode. • “Zero” date or time values used through Connector/ODBC are converted automatically to NULL because ODBC cannot handle such values. The following table shows the format of the “zero” value for each type. The “zero” values are special, but you can store or refer to them explicitly using the values shown in the table. You can also do this using the values '0' or 0, which are easier to write. For temporal types that include a date part (DATE, DATETIME, and TIMESTAMP), use of these values produces warnings if the NO_ZERO_DATE SQL mode is enabled. Data Type
“Zero” Value
DATE
'0000-00-00'
TIME
'00:00:00'
DATETIME
'0000-00-00 00:00:00'
TIMESTAMP
'0000-00-00 00:00:00'
YEAR
0000
11.3.1 The DATE, DATETIME, and TIMESTAMP Types The DATE, DATETIME, and TIMESTAMP types are related. This section describes their characteristics, how they are similar, and how they differ. MySQL recognizes DATE, DATETIME, and TIMESTAMP values in several formats, described in Section 9.1.3, “Date and Time Literals”. For the DATE and
1089
The DATE, DATETIME, and TIMESTAMP Types
DATETIME range descriptions, “supported” means that although earlier values might work, there is no guarantee. The DATE type is used for values with a date part but no time part. MySQL retrieves and displays DATE values in 'YYYY-MM-DD' format. The supported range is '1000-01-01' to '9999-12-31'. The DATETIME type is used for values that contain both date and time parts. MySQL retrieves and displays DATETIME values in 'YYYY-MM-DD HH:MM:SS' format. The supported range is '1000-01-01 00:00:00' to '9999-12-31 23:59:59'. The TIMESTAMP data type is used for values that contain both date and time parts. TIMESTAMP has a range of '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC. MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. (This does not occur for other types such as DATETIME.) By default, the current time zone for each connection is the server's time. The time zone can be set on a per-connection basis. As long as the time zone setting remains constant, you get back the same value you store. If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored. This occurs because the same time zone was not used for conversion in both directions. The current time zone is available as the value of the time_zone system variable. For more information, see Section 10.6, “MySQL Server Time Zone Support”. The TIMESTAMP data type offers automatic initialization and updating to the current date and time. For more information, see Section 11.3.5, “Automatic Initialization and Updating for TIMESTAMP”. A DATETIME or TIMESTAMP value can include a trailing fractional seconds part in up to microseconds (6 digits) precision. Although this fractional part is recognized, it is discarded from values stored into DATETIME or TIMESTAMP columns. For information about fractional seconds support in MySQL, see Section 11.3.6, “Fractional Seconds in Time Values”. Invalid DATE, DATETIME, or TIMESTAMP values are converted to the “zero” value of the appropriate type ('0000-00-00' or '0000-00-00 00:00:00'). Be aware of certain properties of date value interpretation in MySQL: • MySQL permits a “relaxed” format for values specified as strings, in which any punctuation character may be used as the delimiter between date parts or time parts. In some cases, this syntax can be deceiving. For example, a value such as '10:11:12' might look like a time value because of the :, but is interpreted as the year '2010-11-12' if used in a date context. The value '10:45:15' is converted to '0000-00-00' because '45' is not a valid month. • The server requires that month and day values be valid, and not merely in the range 1 to 12 and 1 to 31, respectively. With strict mode disabled, invalid dates such as '2004-04-31' are converted to '0000-00-00' and a warning is generated. With strict mode enabled, invalid dates generate an error. To permit such dates, enable ALLOW_INVALID_DATES. See Section 5.1.8, “Server SQL Modes”, for more information. • MySQL does not accept TIMESTAMP values that include a zero in the day or month column or values that are not a valid date. The sole exception to this rule is the special “zero” value '0000-00-00 00:00:00'. • CAST() treats a TIMESTAMP value as a string when not selecting from a table. (This is true even if you specify FROM DUAL.) See Section 12.10, “Cast Functions and Operators”. • Dates containing two-digit year values are ambiguous because the century is unknown. MySQL interprets two-digit year values using these rules: • Year values in the range 00-69 are converted to 2000-2069. • Year values in the range 70-99 are converted to 1970-1999.
1090
The TIME Type
See also Section 11.3.8, “Two-Digit Years in Dates”. Note The MySQL server can be run with the MAXDB SQL mode enabled. In this case, TIMESTAMP is identical with DATETIME. If this mode is enabled at the time that a table is created, TIMESTAMP columns are created as DATETIME columns. As a result, such columns use DATETIME display format, have the same range of values, and there is no automatic initialization or updating to the current date and time. See Section 5.1.8, “Server SQL Modes”.
11.3.2 The TIME Type MySQL retrieves and displays TIME values in 'HH:MM:SS' format (or 'HHH:MM:SS' format for large hours values). TIME values may range from '-838:59:59' to '838:59:59'. The hours part may be so large because the TIME type can be used not only to represent a time of day (which must be less than 24 hours), but also elapsed time or a time interval between two events (which may be much greater than 24 hours, or even negative). MySQL recognizes TIME values in several formats, described in Section 9.1.3, “Date and Time Literals”. Some of these formats can include a trailing fractional seconds part in up to microseconds (6 digits) precision. Although this fractional part is recognized, it is discarded from values stored into TIME columns. For information about fractional seconds support in MySQL, see Section 11.3.6, “Fractional Seconds in Time Values”. Be careful about assigning abbreviated values to a TIME column. MySQL interprets abbreviated TIME values with colons as time of the day. That is, '11:12' means '11:12:00', not '00:11:12'. MySQL interprets abbreviated values without colons using the assumption that the two rightmost digits represent seconds (that is, as elapsed time rather than as time of day). For example, you might think of '1112' and 1112 as meaning '11:12:00' (12 minutes after 11 o'clock), but MySQL interprets them as '00:11:12' (11 minutes, 12 seconds). Similarly, '12' and 12 are interpreted as '00:00:12'. By default, values that lie outside the TIME range but are otherwise valid are clipped to the closest endpoint of the range. For example, '-850:00:00' and '850:00:00' are converted to '-838:59:59' and '838:59:59'. Invalid TIME values are converted to '00:00:00'. Note that because '00:00:00' is itself a valid TIME value, there is no way to tell, from a value of '00:00:00' stored in a table, whether the original value was specified as '00:00:00' or whether it was invalid. For more restrictive treatment of invalid TIME values, enable strict SQL mode to cause errors to occur. See Section 5.1.8, “Server SQL Modes”.
11.3.3 The YEAR Type The YEAR type is a 1-byte type used to represent year values. It can be declared as YEAR(4) or YEAR(2) to specify a display width of four or two characters. The default is four characters if no width is given. Note The YEAR(2) data type has certain issues that you should consider before choosing to use it. As of MySQL 5.5.27, YEAR(2) is deprecated. For more information, see Section 11.3.4, “YEAR(2) Limitations and Migrating to YEAR(4)”. YEAR(4) and YEAR(2) differ in display format, but have the same range of values. For 4-digit format, MySQL displays YEAR values in YYYY format, with a range of 1901 to 2155, or 0000. For 2-digit format, MySQL displays only the last two (least significant) digits; for example, 70 (1970 or 2070) or 69 (2069). You can specify input YEAR values in a variety of formats:
1091
YEAR(2) Limitations and Migrating to YEAR(4)
• As a 4-digit number in the range 1901 to 2155. • As a 4-digit string in the range '1901' to '2155'. • As a 1- or 2-digit number in the range 1 to 99. MySQL converts values in the ranges 1 to 69 and 70 to 99 to YEAR values in the ranges 2001 to 2069 and 1970 to 1999. • As a 1- or 2-digit string in the range '0' to '99'. MySQL converts values in the ranges '0' to '69' and '70' to '99' to YEAR values in the ranges 2000 to 2069 and 1970 to 1999. • Inserting a numeric 0 has a different effect for YEAR(2) and YEAR(4). For YEAR(2), the result has a display value of 00 and an internal value of 2000. For YEAR(4), the result has a display value of 0000 and an internal value of 0000. To specify zero for YEAR(4) and have it be interpreted as 2000, specify it as a string '0' or '00'. • As the result of a function that returns a value that is acceptable in a YEAR context, such as NOW(). MySQL converts invalid YEAR values to 0000. See also Section 11.3.8, “Two-Digit Years in Dates”.
11.3.4 YEAR(2) Limitations and Migrating to YEAR(4) This section describes problems that can occur when using YEAR(2) and provides information about converting existing YEAR(2) columns to YEAR(4). Although the internal range of values for YEAR(4) and YEAR(2) is the same (1901 to 2155, and 0000), the display width for YEAR(2) makes that type inherently ambiguous because displayed values indicate only the last two digits of the internal values and omit the century digits. The result can be a loss of information under certain circumstances. For this reason, consider avoiding YEAR(2) throughout your applications and using YEAR(4) wherever you need a YEAR data type. Note that conversion will become necessary at some point because support for YEAR data types with display values other than 4, most notably YEAR(2), is reduced as of MySQL 5.6.6 and will be removed entirely in a future release.
YEAR(2) Limitations Issues with the YEAR(2) data type include ambiguity of displayed values, and possible loss of information when values are dumped and reloaded or converted to strings. • Displayed YEAR(2) values can be ambiguous. It is possible for up to three YEAR(2) values that have different internal values to have the same displayed value, as the following example demonstrates: mysql> CREATE TABLE t (y2 YEAR(2), y4 YEAR(4)); Query OK, 0 rows affected (0.01 sec) mysql> INSERT INTO t (y2) VALUES(1912),(2012),(2112); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> UPDATE t SET y4 = y2; Query OK, 3 rows affected (0.00 sec) Rows matched: 3 Changed: 3 Warnings: 0 mysql> SELECT * FROM t; +------+------+ | y2 | y4 | +------+------+ | 12 | 1912 | | 12 | 2012 | | 12 | 2112 | +------+------+ 3 rows in set (0.00 sec)
1092
Automatic Initialization and Updating for TIMESTAMP
• If you use mysqldump to dump the table created in the preceding item, the dump file represents all y2 values using the same 2-digit representation (12). If you reload the table from the dump file, all resulting rows have internal value 2012 and display value 12, thus losing the distinctions among them. • Conversion of a YEAR(2) or YEAR(4) data value to string form uses the display width of the YEAR type. Suppose that YEAR(2) and YEAR(4) columns both contain the value 1970. Assigning each column to a string results in a value of '70' or '1970', respectively. That is, loss of information occurs for conversion from YEAR(2) to string. • Values outside the range from 1970 to 2069 are stored incorrectly when inserted into a YEAR(2) column in a CSV table. For example, inserting 2111 results in a display value of 11 but an internal value of 2011. To avoid these problems, use YEAR(4) rather than YEAR(2). Suggestions regarding migration strategies appear later in this section.
Migrating from YEAR(2) to YEAR(4) To convert YEAR(2) columns to YEAR(4), use ALTER TABLE. Suppose that a table t1 has this definition: CREATE TABLE t1 (ycol YEAR(2) NOT NULL DEFAULT '70');
Modify the column using ALTER TABLE as follows. Remember to include any column attributes such as NOT NULL or DEFAULT: ALTER TABLE t1 MODIFY ycol YEAR(4) NOT NULL DEFAULT '1970';
The ALTER TABLE statement converts the table without changing YEAR(2) values. If the server is a replication master, the ALTER TABLE statement replicates to slaves and makes the corresponding table change on each one. One migration method should be avoided: Do not dump your data with mysqldump and reload the dump file after upgrading. This has the potential to change YEAR(2) values, as described previously. A migration from YEAR(2) to YEAR(4) should also involve examining application code for the possibility of changed behavior under conditions such as these: • Code that expects selecting a YEAR column to produce exactly two digits. • Code that does not account for different handling for inserts of numeric 0: Inserting 0 into YEAR(2) or YEAR(4) results in an internal value of 2000 or 0000, respectively.
11.3.5 Automatic Initialization and Updating for TIMESTAMP Note In older versions of MySQL (prior to 4.1), the properties of the TIMESTAMP data type differed significantly in several ways from what is described in this section (see the MySQL 3.23, 4.0, 4.1 Reference Manual for details); these include syntax extensions which are deprecated in MySQL 5.1, and no longer supported in MySQL 5.5. This has implications for performing a dump and restore or replicating between MySQL Server versions. If you are using columns that are defined using the old TIMESTAMP(N) syntax, see Section 2.11.1.1, “Changes Affecting Upgrades to MySQL 5.5”, prior to upgrading to MySQL 5.5. The TIMESTAMP data type offers automatic initialization and updating to the current date and time (that is, the current timestamp). You can choose whether to use these properties and which column should have them:
1093
Automatic Initialization and Updating for TIMESTAMP
• One TIMESTAMP column in a table can have the current timestamp as the default value for initializing the column, as the auto-update value, or both. It is not possible to have the current timestamp be the default value for one column and the auto-update value for another column. • If the column is auto-initialized, it is set to the current timestamp for inserted rows that specify no value for the column. • If the column is auto-updated, it is automatically updated to the current timestamp when the value of any other column in the row is changed from its current value. The column remains unchanged if all other columns are set to their current values. To prevent the column from updating when other columns change, explicitly set it to its current value. To update the column even when other columns do not change, explicitly set it to the value it should have (for example, set it to CURRENT_TIMESTAMP). In addition, you can initialize or update any TIMESTAMP column to the current date and time by assigning it a NULL value, unless it has been defined with the NULL attribute to permit NULL values. To specify automatic properties, use the DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP clauses. The order of the clauses does not matter. If both are present in a column definition, either can occur first. Any of the synonyms for CURRENT_TIMESTAMP have the same meaning as CURRENT_TIMESTAMP. These are CURRENT_TIMESTAMP(), NOW(), LOCALTIME, LOCALTIME(), LOCALTIMESTAMP, and LOCALTIMESTAMP(). Use of DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP is specific to TIMESTAMP. The DEFAULT clause also can be used to specify a constant (nonautomatic) default value; for example, DEFAULT 0 or DEFAULT '2000-01-01 00:00:00'. Note The following examples use DEFAULT 0, a default that can produce warnings or errors depending on whether strict SQL mode or the NO_ZERO_DATE SQL mode is enabled. Be aware that the TRADITIONAL SQL mode includes strict mode and NO_ZERO_DATE. See Section 5.1.8, “Server SQL Modes”. The following rules describe the possibilities for defining the first TIMESTAMP column in a table with the current timestamp for both the default and auto-update values, for one but not the other, or for neither: • With both DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP, the column has the current timestamp for its default value and is automatically updated to the current timestamp. CREATE TABLE t1 ( ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP );
• With neither DEFAULT CURRENT_TIMESTAMP nor ON UPDATE CURRENT_TIMESTAMP, it is the same as specifying both DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP. CREATE TABLE t1 ( ts TIMESTAMP );
• With a DEFAULT clause but no ON UPDATE CURRENT_TIMESTAMP clause, the column has the given default value and is not automatically updated to the current timestamp. The default depends on whether the DEFAULT clause specifies CURRENT_TIMESTAMP or a constant value. With CURRENT_TIMESTAMP, the default is the current timestamp. CREATE TABLE t1 ( ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP );
1094
Automatic Initialization and Updating for TIMESTAMP
With a constant, the default is the given value. In this case, the column has no automatic properties at all. CREATE TABLE t1 ( ts TIMESTAMP DEFAULT 0 );
• With an ON UPDATE CURRENT_TIMESTAMP clause and a constant DEFAULT clause, the column is automatically updated to the current timestamp and has the given constant default value. CREATE TABLE t1 ( ts TIMESTAMP DEFAULT 0 ON UPDATE CURRENT_TIMESTAMP );
• With an ON UPDATE CURRENT_TIMESTAMP clause but no DEFAULT clause, the column is automatically updated to the current timestamp. The default is 0 unless the column is defined with the NULL attribute, in which case the default is NULL. CREATE TABLE t1 ( ts TIMESTAMP ON UPDATE CURRENT_TIMESTAMP -- default 0 ); CREATE TABLE t2 ( ts TIMESTAMP NULL ON UPDATE CURRENT_TIMESTAMP -- default NULL );
It need not be the first TIMESTAMP column in a table that is automatically initialized or updated to the current timestamp. However, to specify automatic initialization or updating for a different TIMESTAMP column, you must suppress the automatic properties for the first one. Then, for the other TIMESTAMP column, the rules for the DEFAULT and ON UPDATE clauses are the same as for the first TIMESTAMP column, except that if you omit both clauses, no automatic initialization or updating occurs. To suppress automatic properties for the first TIMESTAMP column, do either of the following: • Define the column with a DEFAULT clause that specifies a constant default value. • Specify the NULL attribute. This also causes the column to permit NULL values, which means that you cannot assign the current timestamp by setting the column to NULL. Assigning NULL sets the column to NULL. Consider these table definitions: CREATE TABLE t1 ( ts1 TIMESTAMP DEFAULT 0, ts2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP); CREATE TABLE t2 ( ts1 TIMESTAMP NULL, ts2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP); CREATE TABLE t3 ( ts1 TIMESTAMP NULL DEFAULT 0, ts2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP);
The tables have these properties: • In each table definition, the first TIMESTAMP column has no automatic initialization or updating. • The tables differ in how the ts1 column handles NULL values. For t1, ts1 is NOT NULL and assigning it a value of NULL sets it to the current timestamp. For t2 and t3, ts1 permits NULL and assigning it a value of NULL sets it to NULL.
1095
Fractional Seconds in Time Values
• t2 and t3 differ in the default value for ts1. For t2, ts1 is defined to permit NULL, so the default is also NULL in the absence of an explicit DEFAULT clause. For t3, ts1 permits NULL but has an explicit default of 0.
TIMESTAMP Initialization and the NULL Attribute By default, TIMESTAMP columns are NOT NULL, cannot contain NULL values, and assigning NULL assigns the current timestamp. To permit a TIMESTAMP column to contain NULL, explicitly declare it with the NULL attribute. In this case, the default value also becomes NULL unless overridden with a DEFAULT clause that specifies a different default value. DEFAULT NULL can be used to explicitly specify NULL as the default value. (For a TIMESTAMP column not declared with the NULL attribute, DEFAULT NULL is invalid.) If a TIMESTAMP column permits NULL values, assigning NULL sets it to NULL, not to the current timestamp. The following table contains several TIMESTAMP columns that permit NULL values: CREATE TABLE t ( ts1 TIMESTAMP NULL DEFAULT NULL, ts2 TIMESTAMP NULL DEFAULT 0, ts3 TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP );
A TIMESTAMP column that permits NULL values does not take on the current timestamp at insert time except under one of the following conditions: • Its default value is defined as CURRENT_TIMESTAMP and no value is specified for the column • CURRENT_TIMESTAMP or any of its synonyms such as NOW() is explicitly inserted into the column In other words, a TIMESTAMP column defined to permit NULL values auto-initializes only if its definition includes DEFAULT CURRENT_TIMESTAMP: CREATE TABLE t (ts TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP);
If the TIMESTAMP column permits NULL values but its definition does not include DEFAULT CURRENT_TIMESTAMP, you must explicitly insert a value corresponding to the current date and time. Suppose that tables t1 and t2 have these definitions: CREATE TABLE t1 (ts TIMESTAMP NULL DEFAULT '0000-00-00 00:00:00'); CREATE TABLE t2 (ts TIMESTAMP NULL DEFAULT NULL);
To set the TIMESTAMP column in either table to the current timestamp at insert time, explicitly assign it that value. For example: INSERT INTO t2 VALUES (CURRENT_TIMESTAMP); INSERT INTO t1 VALUES (NOW());
11.3.6 Fractional Seconds in Time Values A trailing fractional seconds part is permissible for temporal values in contexts such as literal values, and in the arguments to or return values from some temporal functions. Example: mysql> SELECT MICROSECOND('2010-12-10 14:12:09.019473'); +-------------------------------------------+ | MICROSECOND('2010-12-10 14:12:09.019473') | +-------------------------------------------+ | 19473 | +-------------------------------------------+
1096
Conversion Between Date and Time Types
However, when MySQL stores a value into a column of any temporal data type, it discards any fractional part and does not store it.
11.3.7 Conversion Between Date and Time Types To some extent, you can convert a value from one temporal type to another. However, there may be some alteration of the value or loss of information. In all cases, conversion between temporal types is subject to the range of valid values for the resulting type. For example, although DATE, DATETIME, and TIMESTAMP values all can be specified using the same set of formats, the types do not all have the same range of values. TIMESTAMP values cannot be earlier than 1970 UTC or later than '2038-01-19 03:14:07' UTC. This means that a date such as '1968-01-01', while valid as a DATE or DATETIME value, is not valid as a TIMESTAMP value and is converted to 0. Conversion of DATE values: • Conversion to a DATETIME or TIMESTAMP value adds a time part of '00:00:00' because the DATE value contains no time information. • Conversion to a TIME value is not useful; the result is '00:00:00'. Conversion of DATETIME and TIMESTAMP values: • Conversion to a DATE value discards the time part because the DATE type contains no time information. • Conversion to a TIME value discards the date part because the TIME type contains no date information. Conversion of TIME values: MySQL converts a time value to a date or date-and-time value by parsing the string value of the time as a date or date-and-time. This is unlikely to be useful. For example, '23:12:31' interpreted as a date becomes '2023-12-31'. Time values not valid as dates become '0000-00-00' or NULL. Explicit conversion can be used to override implicit conversion. For example, in comparison of DATE and DATETIME values, the DATE value is coerced to the DATETIME type by adding a time part of '00:00:00'. To perform the comparison by ignoring the time part of the DATETIME value instead, use the CAST() function in the following way: date_col = CAST(datetime_col AS DATE)
Conversion of TIME or DATETIME values to numeric form (for example, by adding +0) results in a double-precision value with a microseconds part of .000000: mysql> SELECT CURTIME(), CURTIME()+0; +-----------+---------------+ | CURTIME() | CURTIME()+0 | +-----------+---------------+ | 10:41:36 | 104136.000000 | +-----------+---------------+ mysql> SELECT NOW(), NOW()+0; +---------------------+-----------------------+ | NOW() | NOW()+0 | +---------------------+-----------------------+ | 2007-11-30 10:41:47 | 20071130104147.000000 | +---------------------+-----------------------+
11.3.8 Two-Digit Years in Dates Date values with two-digit years are ambiguous because the century is unknown. Such values must be interpreted into four-digit form because MySQL stores years internally using four digits.
1097
String Types
For DATETIME, DATE, and TIMESTAMP types, MySQL interprets dates specified with ambiguous year values using these rules: • Year values in the range 00-69 are converted to 2000-2069. • Year values in the range 70-99 are converted to 1970-1999. For YEAR, the rules are the same, with this exception: A numeric 00 inserted into YEAR(4) results in 0000 rather than 2000. To specify zero for YEAR(4) and have it be interpreted as 2000, specify it as a string '0' or '00'. Remember that these rules are only heuristics that provide reasonable guesses as to what your data values mean. If the rules used by MySQL do not produce the values you require, you must provide unambiguous input containing four-digit year values. ORDER BY properly sorts YEAR values that have two-digit years. Some functions like MIN() and MAX() convert a YEAR to a number. This means that a value with a two-digit year does not work properly with these functions. The fix in this case is to convert the YEAR to four-digit year format.
11.4 String Types The string types are CHAR, VARCHAR, BINARY, VARBINARY, BLOB, TEXT, ENUM, and SET. This section describes how these types work and how to use them in your queries. For string type storage requirements, see Section 11.7, “Data Type Storage Requirements”.
11.4.1 The CHAR and VARCHAR Types The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. They also differ in maximum length and in whether trailing spaces are retained. The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters. The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled. Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. See Section C.10.4, “Limits on Table Column Count and Row Size”. In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes. If strict SQL mode is not enabled and you assign a value to a CHAR or VARCHAR column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For truncation of nonspace characters, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode. See Section 5.1.8, “Server SQL Modes”. For VARCHAR columns, trailing spaces in excess of the column length are truncated prior to insertion and a warning is generated, regardless of the SQL mode in use. For CHAR columns, truncation of excess trailing spaces from inserted values is performed silently regardless of the SQL mode. VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL.
1098
The CHAR and VARCHAR Types
The following table illustrates the differences between CHAR and VARCHAR by showing the result of storing various string values into CHAR(4) and VARCHAR(4) columns (assuming that the column uses a single-byte character set such as latin1). Value
CHAR(4)
Storage Required
VARCHAR(4)
Storage Required
''
'
'
4 bytes
''
1 byte
'ab'
'ab
'
4 bytes
'ab'
3 bytes
'abcd'
'abcd'
4 bytes
'abcd'
5 bytes
'abcdefgh'
'abcd'
4 bytes
'abcd'
5 bytes
The values shown as stored in the last row of the table apply only when not using strict mode; if MySQL is running in strict mode, values that exceed the column length are not stored, and an error results. InnoDB encodes fixed-length fields greater than or equal to 768 bytes in length as variable-length fields, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4. If a given value is stored into the CHAR(4) and VARCHAR(4) columns, the values retrieved from the columns are not always the same because trailing spaces are removed from CHAR columns upon retrieval. The following example illustrates this difference: mysql> CREATE TABLE vc (v VARCHAR(4), c CHAR(4)); Query OK, 0 rows affected (0.01 sec) mysql> INSERT INTO vc VALUES ('ab ', 'ab Query OK, 1 row affected (0.00 sec)
');
mysql> SELECT CONCAT('(', v, ')'), CONCAT('(', c, ')') FROM vc; +---------------------+---------------------+ | CONCAT('(', v, ')') | CONCAT('(', c, ')') | +---------------------+---------------------+ | (ab ) | (ab) | +---------------------+---------------------+ 1 row in set (0.06 sec)
Values in CHAR and VARCHAR columns are sorted and compared according to the character set collation assigned to the column. All MySQL collations are of type PAD SPACE. This means that all CHAR, VARCHAR, and TEXT values are compared without regard to any trailing spaces. “Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant. For example: mysql> CREATE TABLE names (myname CHAR(10)); Query OK, 0 rows affected (0.03 sec) mysql> INSERT INTO names VALUES ('Monty'); Query OK, 1 row affected (0.00 sec) mysql> SELECT myname = 'Monty', myname = 'Monty +------------------+--------------------+ | myname = 'Monty' | myname = 'Monty ' | +------------------+--------------------+ | 1 | 1 | +------------------+--------------------+ 1 row in set (0.00 sec)
' FROM names;
mysql> SELECT myname LIKE 'Monty', myname LIKE 'Monty +---------------------+-----------------------+ | myname LIKE 'Monty' | myname LIKE 'Monty ' | +---------------------+-----------------------+ | 1 | 0 | +---------------------+-----------------------+
1099
' FROM names;
The BINARY and VARBINARY Types
1 row in set (0.00 sec)
This is true for all MySQL versions, and is not affected by the server SQL mode. Note For more information about MySQL character sets and collations, see Section 10.1, “Character Set Support”. For additional information about storage requirements, see Section 11.7, “Data Type Storage Requirements”. For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters will result in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error.
11.4.2 The BINARY and VARBINARY Types The BINARY and VARBINARY types are similar to CHAR and VARCHAR, except that they contain binary strings rather than nonbinary strings. That is, they contain byte strings rather than character strings. This means they have the binary character set and collation, and comparison and sorting are based on the numeric values of the bytes in the values. The permissible maximum length is the same for BINARY and VARBINARY as it is for CHAR and VARCHAR, except that the length for BINARY and VARBINARY is a length in bytes rather than in characters. The BINARY and VARBINARY data types are distinct from the CHAR BINARY and VARCHAR BINARY data types. For the latter types, the BINARY attribute does not cause the column to be treated as a binary string column. Instead, it causes the binary (_bin) collation for the column character set to be used, and the column itself contains nonbinary character strings rather than binary byte strings. For example, CHAR(5) BINARY is treated as CHAR(5) CHARACTER SET latin1 COLLATE latin1_bin, assuming that the default character set is latin1. This differs from BINARY(5), which stores 5-bytes binary strings that have the binary character set and collation. For information about differences between binary strings and binary collations for nonbinary strings, see Section 10.1.8.5, “The binary Collation Compared to _bin Collations”. If strict SQL mode is not enabled and you assign a value to a BINARY or VARBINARY column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For cases of truncation, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode. See Section 5.1.8, “Server SQL Modes”. When BINARY values are stored, they are right-padded with the pad value to the specified length. The pad value is 0x00 (the zero byte). Values are right-padded with 0x00 on insert, and no trailing bytes are removed on select. All bytes are significant in comparisons, including ORDER BY and DISTINCT operations. 0x00 bytes and spaces are different in comparisons, with 0x00 < space. Example: For a BINARY(3) column, 'a ' becomes 'a \0' when inserted. 'a\0' becomes 'a \0\0' when inserted. Both inserted values remain unchanged when selected. For VARBINARY, there is no padding on insert and no bytes are stripped on select. All bytes are significant in comparisons, including ORDER BY and DISTINCT operations. 0x00 bytes and spaces are different in comparisons, with 0x00 < space. For those cases where trailing pad bytes are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad bytes will result in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a\0' causes a duplicate-key error. You should consider the preceding padding and stripping characteristics carefully if you plan to use the BINARY data type for storing binary data and you require that the value retrieved be exactly the same
1100
The BLOB and TEXT Types
as the value stored. The following example illustrates how 0x00-padding of BINARY values affects column value comparisons: mysql> CREATE TABLE t (c BINARY(3)); Query OK, 0 rows affected (0.01 sec) mysql> INSERT INTO t SET c = 'a'; Query OK, 1 row affected (0.01 sec) mysql> SELECT HEX(c), c = 'a', c = 'a\0\0' from t; +--------+---------+-------------+ | HEX(c) | c = 'a' | c = 'a\0\0' | +--------+---------+-------------+ | 610000 | 0 | 1 | +--------+---------+-------------+ 1 row in set (0.09 sec)
If the value retrieved must be the same as the value specified for storage with no padding, it might be preferable to use VARBINARY or one of the BLOB data types instead.
11.4.3 The BLOB and TEXT Types A BLOB is a binary large object that can hold a variable amount of data. The four BLOB types are TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. These differ only in the maximum length of the values they can hold. The four TEXT types are TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT. These correspond to the four BLOB types and have the same maximum lengths and storage requirements. See Section 11.7, “Data Type Storage Requirements”. BLOB values are treated as binary strings (byte strings). They have the binary character set and collation, and comparison and sorting are based on the numeric values of the bytes in column values. TEXT values are treated as nonbinary strings (character strings). They have a character set other than binary, and values are sorted and compared based on the collation of the character set. If strict SQL mode is not enabled and you assign a value to a BLOB or TEXT column that exceeds the column's maximum length, the value is truncated to fit and a warning is generated. For truncation of nonspace characters, you can cause an error to occur (rather than a warning) and suppress insertion of the value by using strict SQL mode. See Section 5.1.8, “Server SQL Modes”. Truncation of excess trailing spaces from values to be inserted into TEXT columns always generates a warning, regardless of the SQL mode. For TEXT and BLOB columns, there is no padding on insert and no bytes are stripped on select. If a TEXT column is indexed, index entry comparisons are space-padded at the end. This means that, if the index requires unique values, duplicate-key errors will occur for values that differ only in the number of trailing spaces. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicatekey error. This is not true for BLOB columns. In most respects, you can regard a BLOB column as a VARBINARY column that can be as large as you like. Similarly, you can regard a TEXT column as a VARCHAR column. BLOB and TEXT differ from VARBINARY and VARCHAR in the following ways: • For indexes on BLOB and TEXT columns, you must specify an index prefix length. For CHAR and VARCHAR, a prefix length is optional. See Section 8.3.4, “Column Indexes”. •
BLOB and TEXT columns cannot have DEFAULT values.
If you use the BINARY attribute with a TEXT data type, the column is assigned the binary (_bin) collation of the column character set. LONG and LONG VARCHAR map to the MEDIUMTEXT data type. This is a compatibility feature.
1101
The ENUM Type
MySQL Connector/ODBC defines BLOB values as LONGVARBINARY and TEXT values as LONGVARCHAR. Because BLOB and TEXT values can be extremely long, you might encounter some constraints in using them: • Only the first max_sort_length bytes of the column are used when sorting. The default value of max_sort_length is 1024. You can make more bytes significant in sorting or grouping by increasing the value of max_sort_length at server startup or runtime. Any client can change the value of its session max_sort_length variable: mysql> SET max_sort_length = 2000; mysql> SELECT id, comment FROM t -> ORDER BY comment;
• Instances of BLOB or TEXT columns in the result of a query that is processed using a temporary table causes the server to use a table on disk rather than in memory because the MEMORY storage engine does not support those data types (see Section 8.4.4, “Internal Temporary Table Use in MySQL”). Use of disk incurs a performance penalty, so include BLOB or TEXT columns in the query result only if they are really needed. For example, avoid using SELECT *, which selects all columns. • The maximum size of a BLOB or TEXT object is determined by its type, but the largest value you actually can transmit between the client and server is determined by the amount of available memory and the size of the communications buffers. You can change the message buffer size by changing the value of the max_allowed_packet variable, but you must do so for both the server and your client program. For example, both mysql and mysqldump enable you to change the client-side max_allowed_packet value. See Section 5.1.1, “Configuring the Server”, Section 4.5.1, “mysql — The MySQL Command-Line Tool”, and Section 4.5.4, “mysqldump — A Database Backup Program”. You may also want to compare the packet sizes and the size of the data objects you are storing with the storage requirements, see Section 11.7, “Data Type Storage Requirements” Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened. In some cases, it may be desirable to store binary data such as media files in BLOB or TEXT columns. You may find MySQL's string handling functions useful for working with such data. See Section 12.5, “String Functions”. For security and other reasons, it is usually preferable to do so using application code rather than giving application users the FILE privilege. You can discuss specifics for various languages and platforms in the MySQL Forums (http://forums.mysql.com/).
11.4.4 The ENUM Type An ENUM is a string object with a value chosen from a list of permitted values that are enumerated explicitly in the column specification at table creation time. It has these advantages: • Compact data storage in situations where a column has a limited set of possible values. The strings you specify as input values are automatically encoded as numbers. See Section 11.7, “Data Type Storage Requirements” for the storage requirements for ENUM types. • Readable queries and output. The numbers are translated back to the corresponding strings in query results. and these potential issues to consider: • If you make enumeration values that look like numbers, it is easy to mix up the literal values with their internal index numbers, as explained in Enumeration Limitations. • Using ENUM columns in ORDER BY clauses requires extra care, as explained in Enumeration Sorting. • Creating and Using ENUM Columns
1102
The ENUM Type
• Index Values for Enumeration Literals • Handling of Enumeration Literals • Empty or NULL Enumeration Values • Enumeration Sorting • Enumeration Limitations
Creating and Using ENUM Columns An enumeration value must be a quoted string literal. For example, you can create a table with an ENUM column like this: CREATE TABLE shirts ( name VARCHAR(40), size ENUM('x-small', 'small', 'medium', 'large', 'x-large') ); INSERT INTO shirts (name, size) VALUES ('dress shirt','large'), ('t-shirt','medium'), ('polo shirt','small'); SELECT name, size FROM shirts WHERE size = 'medium'; +---------+--------+ | name | size | +---------+--------+ | t-shirt | medium | +---------+--------+ UPDATE shirts SET size = 'small' WHERE size = 'large'; COMMIT;
Inserting 1 million rows into this table with a value of 'medium' would require 1 million bytes of storage, as opposed to 6 million bytes if you stored the actual string 'medium' in a VARCHAR column.
Index Values for Enumeration Literals Each enumeration value has an index: • The elements listed in the column specification are assigned index numbers, beginning with 1. • The index value of the empty string error value is 0. This means that you can use the following SELECT statement to find rows into which invalid ENUM values were assigned: mysql> SELECT * FROM tbl_name WHERE enum_col=0;
• The index of the NULL value is NULL. • The term “index” here refers to a position within the list of enumeration values. It has nothing to do with table indexes. For example, a column specified as ENUM('Mercury', 'Venus', 'Earth') can have any of the values shown here. The index of each value is also shown. Value
Index
NULL
NULL
''
0
'Mercury'
1
'Venus'
2
'Earth'
3
An ENUM column can have a maximum of 65,535 distinct elements. (The practical limit is less than 3000.) A table can have no more than 255 unique element list definitions among its ENUM and SET
1103
The ENUM Type
columns considered as a group. For more information on these limits, see Section C.10.5, “Limits Imposed by .frm File Structure”. If you retrieve an ENUM value in a numeric context, the column value's index is returned. For example, you can retrieve numeric values from an ENUM column like this: mysql> SELECT enum_col+0 FROM tbl_name;
Functions such as SUM() or AVG() that expect a numeric argument cast the argument to a number if necessary. For ENUM values, the index number is used in the calculation.
Handling of Enumeration Literals Trailing spaces are automatically deleted from ENUM member values in the table definition when a table is created. When retrieved, values stored into an ENUM column are displayed using the lettercase that was used in the column definition. Note that ENUM columns can be assigned a character set and collation. For binary or case-sensitive collations, lettercase is taken into account when assigning values to the column. If you store a number into an ENUM column, the number is treated as the index into the possible values, and the value stored is the enumeration member with that index. (However, this does not work with LOAD DATA, which treats all input as strings.) If the numeric value is quoted, it is still interpreted as an index if there is no matching string in the list of enumeration values. For these reasons, it is not advisable to define an ENUM column with enumeration values that look like numbers, because this can easily become confusing. For example, the following column has enumeration members with string values of '0', '1', and '2', but numeric index values of 1, 2, and 3: numbers ENUM('0','1','2')
If you store 2, it is interpreted as an index value, and becomes '1' (the value with index 2). If you store '2', it matches an enumeration value, so it is stored as '2'. If you store '3', it does not match any enumeration value, so it is treated as an index and becomes '2' (the value with index 3). mysql> INSERT INTO t (numbers) VALUES(2),('2'),('3'); mysql> SELECT * FROM t; +---------+ | numbers | +---------+ | 1 | | 2 | | 2 | +---------+
To determine all possible values for an ENUM column, use SHOW COLUMNS FROM tbl_name LIKE 'enum_col' and parse the ENUM definition in the Type column of the output. In the C API, ENUM values are returned as strings. For information about using result set metadata to distinguish them from other strings, see Section 23.8.5, “C API Data Structures”.
Empty or NULL Enumeration Values An enumeration value can also be the empty string ('') or NULL under certain circumstances: • If you insert an invalid value into an ENUM (that is, a string not present in the list of permitted values), the empty string is inserted instead as a special error value. This string can be distinguished from a “normal” empty string by the fact that this string has the numeric value 0. See Index Values for Enumeration Literals for details about the numeric indexes for the enumeration values. If strict SQL mode is enabled, attempts to insert invalid ENUM values result in an error.
1104
The SET Type
• If an ENUM column is declared to permit NULL, the NULL value is a valid value for the column, and the default value is NULL. If an ENUM column is declared NOT NULL, its default value is the first element of the list of permitted values.
Enumeration Sorting ENUM values are sorted based on their index numbers, which depend on the order in which the enumeration members were listed in the column specification. For example, 'b' sorts before 'a' for ENUM('b', 'a'). The empty string sorts before nonempty strings, and NULL values sort before all other enumeration values. To prevent unexpected results when using the ORDER BY clause on an ENUM column, use one of these techniques: • Specify the ENUM list in alphabetic order. • Make sure that the column is sorted lexically rather than by index number by coding ORDER BY CAST(col AS CHAR) or ORDER BY CONCAT(col).
Enumeration Limitations An enumeration value cannot be an expression, even one that evaluates to a string value. For example, this CREATE TABLE statement does not work because the CONCAT function cannot be used to construct an enumeration value: CREATE TABLE sizes ( size ENUM('small', CONCAT('med','ium'), 'large') );
You also cannot employ a user variable as an enumeration value. This pair of statements do not work: SET @mysize = 'medium'; CREATE TABLE sizes ( size ENUM('small', @mysize, 'large') );
We strongly recommend that you do not use numbers as enumeration values, because it does not save on storage over the appropriate TINYINT or SMALLINT type, and it is easy to mix up the strings and the underlying number values (which might not be the same) if you quote the ENUM values incorrectly. If you do use a number as an enumeration value, always enclose it in quotation marks. If the quotation marks are omitted, the number is regarded as an index. See Handling of Enumeration Literals to see how even a quoted number could be mistakenly used as a numeric index value. Duplicate values in the definition cause a warning, or an error if strict SQL mode is enabled.
11.4.5 The SET Type A SET is a string object that can have zero or more values, each of which must be chosen from a list of permitted values specified when the table is created. SET column values that consist of multiple set members are specified with members separated by commas (,). A consequence of this is that SET member values should not themselves contain commas. For example, a column specified as SET('one', 'two') NOT NULL can have any of these values: '' 'one' 'two' 'one,two'
1105
The SET Type
A SET column can have a maximum of 64 distinct members. A table can have no more than 255 unique element list definitions among its ENUM and SET columns considered as a group. For more information on this limit, see Section C.10.5, “Limits Imposed by .frm File Structure”. Duplicate values in the definition cause a warning, or an error if strict SQL mode is enabled. Trailing spaces are automatically deleted from SET member values in the table definition when a table is created. When retrieved, values stored in a SET column are displayed using the lettercase that was used in the column definition. Note that SET columns can be assigned a character set and collation. For binary or case-sensitive collations, lettercase is taken into account when assigning values to the column. MySQL stores SET values numerically, with the low-order bit of the stored value corresponding to the first set member. If you retrieve a SET value in a numeric context, the value retrieved has bits set corresponding to the set members that make up the column value. For example, you can retrieve numeric values from a SET column like this: mysql> SELECT set_col+0 FROM tbl_name;
If a number is stored into a SET column, the bits that are set in the binary representation of the number determine the set members in the column value. For a column specified as SET('a','b','c','d'), the members have the following decimal and binary values. SET Member
Decimal Value
Binary Value
'a'
1
0001
'b'
2
0010
'c'
4
0100
'd'
8
1000
If you assign a value of 9 to this column, that is 1001 in binary, so the first and fourth SET value members 'a' and 'd' are selected and the resulting value is 'a,d'. For a value containing more than one SET element, it does not matter what order the elements are listed in when you insert the value. It also does not matter how many times a given element is listed in the value. When the value is retrieved later, each element in the value appears once, with elements listed according to the order in which they were specified at table creation time. For example, suppose that a column is specified as SET('a','b','c','d'): mysql> CREATE TABLE myset (col SET('a', 'b', 'c', 'd'));
If you insert the values 'a,d', 'd,a', 'a,d,d', 'a,d,a', and 'd,a,d': mysql> INSERT INTO myset (col) VALUES -> ('a,d'), ('d,a'), ('a,d,a'), ('a,d,d'), ('d,a,d'); Query OK, 5 rows affected (0.01 sec) Records: 5 Duplicates: 0 Warnings: 0
Then all these values appear as 'a,d' when retrieved: mysql> SELECT col FROM myset; +------+ | col | +------+ | a,d | | a,d | | a,d | | a,d |
1106
Extensions for Spatial Data
| a,d | +------+ 5 rows in set (0.04 sec)
If you set a SET column to an unsupported value, the value is ignored and a warning is issued: mysql> INSERT INTO myset (col) VALUES ('a,d,d,s'); Query OK, 1 row affected, 1 warning (0.03 sec) mysql> SHOW WARNINGS; +---------+------+------------------------------------------+ | Level | Code | Message | +---------+------+------------------------------------------+ | Warning | 1265 | Data truncated for column 'col' at row 1 | +---------+------+------------------------------------------+ 1 row in set (0.04 sec) mysql> SELECT col FROM myset; +------+ | col | +------+ | a,d | | a,d | | a,d | | a,d | | a,d | | a,d | +------+ 6 rows in set (0.01 sec)
If strict SQL mode is enabled, attempts to insert invalid SET values result in an error. SET values are sorted numerically. NULL values sort before non-NULL SET values. Functions such as SUM() or AVG() that expect a numeric argument cast the argument to a number if necessary. For SET values, the cast operation causes the numeric value to be used. Normally, you search for SET values using the FIND_IN_SET() function or the LIKE operator: mysql> SELECT * FROM tbl_name WHERE FIND_IN_SET('value',set_col)>0; mysql> SELECT * FROM tbl_name WHERE set_col LIKE '%value%';
The first statement finds rows where set_col contains the value set member. The second is similar, but not the same: It finds rows where set_col contains value anywhere, even as a substring of another set member. The following statements also are permitted: mysql> SELECT * FROM tbl_name WHERE set_col & 1; mysql> SELECT * FROM tbl_name WHERE set_col = 'val1,val2';
The first of these statements looks for values containing the first set member. The second looks for an exact match. Be careful with comparisons of the second type. Comparing set values to 'val1,val2' returns different results than comparing values to 'val2,val1'. You should specify the values in the same order they are listed in the column definition. To determine all possible values for a SET column, use SHOW COLUMNS FROM tbl_name LIKE set_col and parse the SET definition in the Type column of the output. In the C API, SET values are returned as strings. For information about using result set metadata to distinguish them from other strings, see Section 23.8.5, “C API Data Structures”.
11.5 Extensions for Spatial Data 1107
Extensions for Spatial Data
The Open Geospatial Consortium (OGC) is an international consortium of more than 250 companies, agencies, and universities participating in the development of publicly available conceptual solutions that can be useful with all kinds of applications that manage spatial data. The Open Geospatial Consortium publishes the OpenGIS® Implementation Standard for Geographic information - Simple feature access - Part 2: SQL option, a document that proposes several conceptual ways for extending an SQL RDBMS to support spatial data. This specification is available from the OGC Web site at http://www.opengeospatial.org/standards/sfs. Following the OGC specification, MySQL implements spatial extensions as a subset of the SQL with Geometry Types environment. This term refers to an SQL environment that has been extended with a set of geometry types. A geometry-valued SQL column is implemented as a column that has a geometry type. The specification describes a set of SQL geometry types, as well as functions on those types to create and analyze geometry values. MySQL spatial extensions enable the generation, storage, and analysis of geographic features: • Data types for representing spatial values • Functions for manipulating spatial values • Spatial indexing for improved access times to spatial columns The spatial data types and functions are available for MyISAM, InnoDB, NDB, and ARCHIVE tables. For indexing spatial columns, MyISAM supports both SPATIAL and non-SPATIAL indexes. The other storage engines support non-SPATIAL indexes, as described in Section 13.1.13, “CREATE INDEX Syntax”. A geographic feature is anything in the world that has a location. A feature can be: • An entity. For example, a mountain, a pond, a city. • A space. For example, town district, the tropics. • A definable location. For example, a crossroad, as a particular place where two streets intersect. Some documents use the term geospatial feature to refer to geographic features. Geometry is another word that denotes a geographic feature. Originally the word geometry meant measurement of the earth. Another meaning comes from cartography, referring to the geometric features that cartographers use to map the world. The discussion here considers these terms synonymous: geographic feature, geospatial feature, feature, or geometry. The term most commonly used is geometry, defined as a point or an aggregate of points representing anything in the world that has a location. The following material covers these topics: • The spatial data types implemented in MySQL model • The basis of the spatial extensions in the OpenGIS geometry model • Data formats for representing spatial data • How to use spatial data in MySQL • Use of indexing for spatial data • MySQL differences from the OpenGIS specification For information about functions that operate on spatial data, see Section 12.15, “Spatial Analysis Functions”.
1108
MySQL GIS Conformance and Compatibility
MySQL GIS Conformance and Compatibility MySQL does not implement the following GIS features: • Additional Metadata Views OpenGIS specifications propose several additional metadata views. For example, a system view named GEOMETRY_COLUMNS contains a description of geometry columns, one row for each geometry column in the database. • The OpenGIS function Length() on LineString and MultiLineString should be called in MySQL as GLength() The problem is that there is an existing SQL function Length() that calculates the length of string values, and sometimes it is not possible to distinguish whether the function is called in a textual or spatial context.
Additional Resources • The Open Geospatial Consortium publishes the OpenGIS® Implementation Standard for Geographic information - Simple feature access - Part 2: SQL option, a document that proposes several conceptual ways for extending an SQL RDBMS to support spatial data. The Open Geospatial Consortium (OGC) maintains a Web site at http://www.opengeospatial.org/. The specification is available there at http://www.opengeospatial.org/standards/sfs. It contains additional information relevant to the material here. • If you have questions or concerns about the use of the spatial extensions to MySQL, you can discuss them in the GIS forum: http://forums.mysql.com/list.php?23.
11.5.1 Spatial Data Types MySQL has spatial data types that correspond to OpenGIS classes. The basis for these types is described in Section 11.5.2, “The OpenGIS Geometry Model”. Some spatial data types hold single geometry values: • GEOMETRY • POINT • LINESTRING • POLYGON GEOMETRY can store geometry values of any type. The other single-value types (POINT, LINESTRING, and POLYGON) restrict their values to a particular geometry type. The other spatial data types hold collections of values: • MULTIPOINT • MULTILINESTRING • MULTIPOLYGON • GEOMETRYCOLLECTION GEOMETRYCOLLECTION can store a collection of objects of any type. The other collection types (MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, and GEOMETRYCOLLECTION) restrict collection members to those having a particular geometry type. 1109
The OpenGIS Geometry Model
Example: To create a table named geom that has a column named g that can store values of any geometry type, use this statement: CREATE TABLE geom (g GEOMETRY);
SPATIAL indexes can be created on NOT NULL spatial columns, so if you plan to index the column, declare it NOT NULL: CREATE TABLE geom (g GEOMETRY NOT NULL);
For other examples showing how to use spatial data types in MySQL, see Section 11.5.4, “Creating Spatial Columns”.
11.5.2 The OpenGIS Geometry Model The set of geometry types proposed by OGC's SQL with Geometry Types environment is based on the OpenGIS Geometry Model. In this model, each geometric object has the following general properties: • It is associated with a spatial reference system, which describes the coordinate space in which the object is defined. • It belongs to some geometry class.
11.5.2.1 The Geometry Class Hierarchy The geometry classes define a hierarchy as follows: • Geometry (noninstantiable) • Point (instantiable) • Curve (noninstantiable) • LineString (instantiable) • Line • LinearRing • Surface (noninstantiable) • Polygon (instantiable) • GeometryCollection (instantiable) • MultiPoint (instantiable) • MultiCurve (noninstantiable) • MultiLineString (instantiable) • MultiSurface (noninstantiable) • MultiPolygon (instantiable) It is not possible to create objects in noninstantiable classes. It is possible to create objects in instantiable classes. All classes have properties, and instantiable classes may also have assertions (rules that define valid class instances). Geometry is the base class. It is an abstract class. The instantiable subclasses of Geometry are restricted to zero-, one-, and two-dimensional geometric objects that exist in two-dimensional
1110
The OpenGIS Geometry Model
coordinate space. All instantiable geometry classes are defined so that valid instances of a geometry class are topologically closed (that is, all defined geometries include their boundary). The base Geometry class has subclasses for Point, Curve, Surface, and GeometryCollection: • Point represents zero-dimensional objects. • Curve represents one-dimensional objects, and has subclass LineString, with sub-subclasses Line and LinearRing. • Surface is designed for two-dimensional objects and has subclass Polygon. • GeometryCollection has specialized zero-, one-, and two-dimensional collection classes named MultiPoint, MultiLineString, and MultiPolygon for modeling geometries corresponding to collections of Points, LineStrings, and Polygons, respectively. MultiCurve and MultiSurface are introduced as abstract superclasses that generalize the collection interfaces to handle Curves and Surfaces. Geometry, Curve, Surface, MultiCurve, and MultiSurface are defined as noninstantiable classes. They define a common set of methods for their subclasses and are included for extensibility. Point, LineString, Polygon, GeometryCollection, MultiPoint, MultiLineString, and MultiPolygon are instantiable classes.
11.5.2.2 Geometry Class Geometry is the root class of the hierarchy. It is a noninstantiable class but has a number of properties, described in the following list, that are common to all geometry values created from any of the Geometry subclasses. Particular subclasses have their own specific properties, described later. Geometry Properties A geometry value has the following properties: • Its type. Each geometry belongs to one of the instantiable classes in the hierarchy. • Its SRID, or spatial reference identifier. This value identifies the geometry's associated spatial reference system that describes the coordinate space in which the geometry object is defined. In MySQL, the SRID value is an integer associated with the geometry value. The maximum usable 32 SRID value is 2 −1. If a larger value is given, only the lower 32 bits are used. All computations are done assuming SRID 0, regardless of the actual SRID value. SRID 0 represents an infinite flat Cartesian plane with no units assigned to its axes. • Its coordinates in its spatial reference system, represented as double-precision (8-byte) numbers. All nonempty geometries include at least one pair of (X,Y) coordinates. Empty geometries contain no coordinates. Coordinates are related to the SRID. For example, in different coordinate systems, the distance between two objects may differ even when objects have the same coordinates, because the distance on the planar coordinate system and the distance on the geodetic system (coordinates on the Earth's surface) are different things. • Its interior, boundary, and exterior. Every geometry occupies some position in space. The exterior of a geometry is all space not occupied by the geometry. The interior is the space occupied by the geometry. The boundary is the interface between the geometry's interior and exterior. • Its MBR (minimum bounding rectangle), or envelope. This is the bounding geometry, formed by the minimum and maximum (X,Y) coordinates:
1111
The OpenGIS Geometry Model
((MINX MINY, MAXX MINY, MAXX MAXY, MINX MAXY, MINX MINY))
• Whether the value is simple or nonsimple. Geometry values of types (LineString, MultiPoint, MultiLineString) are either simple or nonsimple. Each type determines its own assertions for being simple or nonsimple. • Whether the value is closed or not closed. Geometry values of types (LineString, MultiString) are either closed or not closed. Each type determines its own assertions for being closed or not closed. • Whether the value is empty or nonempty A geometry is empty if it does not have any points. Exterior, interior, and boundary of an empty geometry are not defined (that is, they are represented by a NULL value). An empty geometry is defined to be always simple and has an area of 0. • Its dimension. A geometry can have a dimension of −1, 0, 1, or 2: • −1 for an empty geometry. • 0 for a geometry with no length and no area. • 1 for a geometry with nonzero length and zero area. • 2 for a geometry with nonzero area. Point objects have a dimension of zero. LineString objects have a dimension of 1. Polygon objects have a dimension of 2. The dimensions of MultiPoint, MultiLineString, and MultiPolygon objects are the same as the dimensions of the elements they consist of.
11.5.2.3 Point Class A Point is a geometry that represents a single location in coordinate space. Point Examples • Imagine a large-scale map of the world with many cities. A Point object could represent each city. • On a city map, a Point object could represent a bus stop. Point Properties • X-coordinate value. • Y-coordinate value. • Point is defined as a zero-dimensional geometry. • The boundary of a Point is the empty set.
11.5.2.4 Curve Class A Curve is a one-dimensional geometry, usually represented by a sequence of points. Particular subclasses of Curve define the type of interpolation between points. Curve is a noninstantiable class. Curve Properties • A Curve has the coordinates of its points. • A Curve is defined as a one-dimensional geometry. • A Curve is simple if it does not pass through the same point twice, with the exception that a curve can still be simple if the start and end points are the same.
1112
The OpenGIS Geometry Model
• A Curve is closed if its start point is equal to its endpoint. • The boundary of a closed Curve is empty. • The boundary of a nonclosed Curve consists of its two endpoints. • A Curve that is simple and closed is a LinearRing.
11.5.2.5 LineString Class A LineString is a Curve with linear interpolation between points. LineString Examples • On a world map, LineString objects could represent rivers. • In a city map, LineString objects could represent streets. LineString Properties • A LineString has coordinates of segments, defined by each consecutive pair of points. • A LineString is a Line if it consists of exactly two points. • A LineString is a LinearRing if it is both closed and simple.
11.5.2.6 Surface Class A Surface is a two-dimensional geometry. It is a noninstantiable class. Its only instantiable subclass is Polygon. Surface Properties • A Surface is defined as a two-dimensional geometry. • The OpenGIS specification defines a simple Surface as a geometry that consists of a single “patch” that is associated with a single exterior boundary and zero or more interior boundaries. • The boundary of a simple Surface is the set of closed curves corresponding to its exterior and interior boundaries.
11.5.2.7 Polygon Class A Polygon is a planar Surface representing a multisided geometry. It is defined by a single exterior boundary and zero or more interior boundaries, where each interior boundary defines a hole in the Polygon. Polygon Examples • On a region map, Polygon objects could represent forests, districts, and so on. Polygon Assertions • The boundary of a Polygon consists of a set of LinearRing objects (that is, LineString objects that are both simple and closed) that make up its exterior and interior boundaries. • A Polygon has no rings that cross. The rings in the boundary of a Polygon may intersect at a Point, but only as a tangent. • A Polygon has no lines, spikes, or punctures. • A Polygon has an interior that is a connected point set.
1113
The OpenGIS Geometry Model
• A Polygon may have holes. The exterior of a Polygon with holes is not connected. Each hole defines a connected component of the exterior. The preceding assertions make a Polygon a simple geometry.
11.5.2.8 GeometryCollection Class A GeometryCollection is a geometry that is a collection of one or more geometries of any class. All the elements in a GeometryCollection must be in the same spatial reference system (that is, in the same coordinate system). There are no other constraints on the elements of a GeometryCollection, although the subclasses of GeometryCollection described in the following sections may restrict membership. Restrictions may be based on: • Element type (for example, a MultiPoint may contain only Point elements) • Dimension • Constraints on the degree of spatial overlap between elements
11.5.2.9 MultiPoint Class A MultiPoint is a geometry collection composed of Point elements. The points are not connected or ordered in any way. MultiPoint Examples • On a world map, a MultiPoint could represent a chain of small islands. • On a city map, a MultiPoint could represent the outlets for a ticket office. MultiPoint Properties • A MultiPoint is a zero-dimensional geometry. • A MultiPoint is simple if no two of its Point values are equal (have identical coordinate values). • The boundary of a MultiPoint is the empty set.
11.5.2.10 MultiCurve Class A MultiCurve is a geometry collection composed of Curve elements. MultiCurve is a noninstantiable class. MultiCurve Properties • A MultiCurve is a one-dimensional geometry. • A MultiCurve is simple if and only if all of its elements are simple; the only intersections between any two elements occur at points that are on the boundaries of both elements. • A MultiCurve boundary is obtained by applying the “mod 2 union rule” (also known as the “oddeven rule”): A point is in the boundary of a MultiCurve if it is in the boundaries of an odd number of Curve elements. • A MultiCurve is closed if all of its elements are closed. • The boundary of a closed MultiCurve is always empty.
11.5.2.11 MultiLineString Class A MultiLineString is a MultiCurve geometry collection composed of LineString elements.
1114
Supported Spatial Data Formats
MultiLineString Examples • On a region map, a MultiLineString could represent a river system or a highway system.
11.5.2.12 MultiSurface Class A MultiSurface is a geometry collection composed of surface elements. MultiSurface is a noninstantiable class. Its only instantiable subclass is MultiPolygon. MultiSurface Assertions • Surfaces within a MultiSurface have no interiors that intersect. • Surfaces within a MultiSurface have boundaries that intersect at most at a finite number of points.
11.5.2.13 MultiPolygon Class A MultiPolygon is a MultiSurface object composed of Polygon elements. MultiPolygon Examples • On a region map, a MultiPolygon could represent a system of lakes. MultiPolygon Assertions • A MultiPolygon has no two Polygon elements with interiors that intersect. • A MultiPolygon has no two Polygon elements that cross (crossing is also forbidden by the previous assertion), or that touch at an infinite number of points. • A MultiPolygon may not have cut lines, spikes, or punctures. A MultiPolygon is a regular, closed point set. • A MultiPolygon that has more than one Polygon has an interior that is not connected. The number of connected components of the interior of a MultiPolygon is equal to the number of Polygon values in the MultiPolygon. MultiPolygon Properties • A MultiPolygon is a two-dimensional geometry. • A MultiPolygon boundary is a set of closed curves (LineString values) corresponding to the boundaries of its Polygon elements. • Each Curve in the boundary of the MultiPolygon is in the boundary of exactly one Polygon element. • Every Curve in the boundary of an Polygon element is in the boundary of the MultiPolygon.
11.5.3 Supported Spatial Data Formats Two standard spatial data formats are used to represent geometry objects in queries: • Well-Known Text (WKT) format • Well-Known Binary (WKB) format Internally, MySQL stores geometry values in a format that is not identical to either WKT or WKB format. (Internal format is like WKB but with an initial 4 bytes to indicate the SRID.) There are functions available to convert between different data formats; see Section 12.15.6, “Geometry Format Conversion Functions”.
1115
Supported Spatial Data Formats
The following sections describe the spatial data formats MySQL uses: • Well-Known Text (WKT) Format • Well-Known Binary (WKB) Format • Internal Geometry Storage Format
Well-Known Text (WKT) Format The Well-Known Text (WKT) representation of geometry values is designed for exchanging geometry data in ASCII form. The OpenGIS specification provides a Backus-Naur grammar that specifies the formal production rules for writing WKT values (see Section 11.5, “Extensions for Spatial Data”). Examples of WKT representations of geometry objects: • A Point: POINT(15 20)
The point coordinates are specified with no separating comma. This differs from the syntax for the SQL Point() function, which requires a comma between the coordinates. Take care to use the syntax appropriate to the context of a given spatial operation. For example, the following statements both use X() to extract the X-coordinate from a Point object. The first produces the object directly using the Point() function. The second uses a WKT representation converted to a Point with GeomFromText(). mysql> SELECT X(Point(15, 20)); +------------------+ | X(POINT(15, 20)) | +------------------+ | 15 | +------------------+ mysql> SELECT X(GeomFromText('POINT(15 20)')); +---------------------------------+ | X(GeomFromText('POINT(15 20)')) | +---------------------------------+ | 15 | +---------------------------------+
• A LineString with four points: LINESTRING(0 0, 10 10, 20 25, 50 60)
The point coordinate pairs are separated by commas. • A Polygon with one exterior ring and one interior ring: POLYGON((0 0,10 0,10 10,0 10,0 0),(5 5,7 5,7 7,5 7, 5 5))
• A MultiPoint with three Point values: MULTIPOINT(0 0, 20 20, 60 60)
• A MultiLineString with two LineString values: MULTILINESTRING((10 10, 20 20), (15 15, 30 15))
• A MultiPolygon with two Polygon values: 1116
Supported Spatial Data Formats
MULTIPOLYGON(((0 0,10 0,10 10,0 10,0 0)),((5 5,7 5,7 7,5 7, 5 5)))
• A GeometryCollection consisting of two Point values and one LineString: GEOMETRYCOLLECTION(POINT(10 10), POINT(30 30), LINESTRING(15 15, 20 20))
Well-Known Binary (WKB) Format The Well-Known Binary (WKB) representation of geometric values is used for exchanging geometry data as binary streams represented by BLOB values containing geometric WKB information. This format is defined by the OpenGIS specification (see Section 11.5, “Extensions for Spatial Data”). It is also defined in the ISO SQL/MM Part 3: Spatial standard. WKB uses 1-byte unsigned integers, 4-byte unsigned integers, and 8-byte double-precision numbers (IEEE 754 format). A byte is eight bits. For example, a WKB value that corresponds to POINT(1 -1) consists of this sequence of 21 bytes, each represented by two hexadecimal digits: 0101000000000000000000F03F000000000000F0BF
The sequence consists of the components shown in the following table. Table 11.1 WKB Components Example Component
Size
Value
Byte order
1 byte
01
WKB type
4 bytes
01000000
X coordinate
8 bytes
000000000000F03F
Y coordinate
8 bytes
000000000000F0BF
Component representation is as follows: • The byte order indicator is either 1 or 0 to signify little-endian or big-endian storage. The little-endian and big-endian byte orders are also known as Network Data Representation (NDR) and External Data Representation (XDR), respectively. • The WKB type is a code that indicates the geometry type. MySQL uses values from 1 through 7 to indicate Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. • A Point value has X and Y coordinates, each represented as a double-precision value. WKB values for more complex geometry values have more complex data structures, as detailed in the OpenGIS specification.
Internal Geometry Storage Format MySQL stores geometry values using 4 bytes to indicate the SRID followed by the WKB representation of the value. For a description of WKB format, see Well-Known Binary (WKB) Format. For the WKB part, these MySQL-specific considerations apply: • The byte-order indicator byte is 1 because MySQL stores geometries as little-ending values. • MySQL supports geometry types of Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. Other geometry types are not supported.
1117
Creating Spatial Columns
The LENGTH() function returns the space in bytes required for value storage. Example: mysql> SET @g = GeomFromText('POINT(1 -1)'); mysql> SELECT LENGTH(@g); +------------+ | LENGTH(@g) | +------------+ | 25 | +------------+ mysql> SELECT HEX(@g); +----------------------------------------------------+ | HEX(@g) | +----------------------------------------------------+ | 000000000101000000000000000000F03F000000000000F0BF | +----------------------------------------------------+
The value length is 25 bytes, made up of these components (as can be seen from the hexadecimal value): • 4 bytes for integer SRID (0) • 1 byte for integer byte order (1 = little-endian) • 4 bytes for integer type information (1 = Point) • 8 bytes for double-precision X coordinate (1) • 8 bytes for double-precision Y coordinate (−1)
11.5.4 Creating Spatial Columns MySQL provides a standard way of creating spatial columns for geometry types, for example, with CREATE TABLE or ALTER TABLE. Spatial columns are supported for MyISAM, InnoDB, NDB, and ARCHIVE tables. See also the notes about spatial indexes under Section 11.5.8, “Creating Spatial Indexes”. • Use the CREATE TABLE statement to create a table with a spatial column: CREATE TABLE geom (g GEOMETRY);
• Use the ALTER TABLE statement to add or drop a spatial column to or from an existing table: ALTER TABLE geom ADD pt POINT; ALTER TABLE geom DROP pt;
11.5.5 Populating Spatial Columns After you have created spatial columns, you can populate them with spatial data. Values should be stored in internal geometry format, but you can convert them to that format from either Well-Known Text (WKT) or Well-Known Binary (WKB) format. The following examples demonstrate how to insert geometry values into a table by converting WKT values to internal geometry format: • Perform the conversion directly in the INSERT statement: INSERT INTO geom VALUES (GeomFromText('POINT(1 1)')); SET @g = 'POINT(1 1)'; INSERT INTO geom VALUES (GeomFromText(@g));
• Perform the conversion prior to the INSERT:
1118
Fetching Spatial Data
SET @g = GeomFromText('POINT(1 1)'); INSERT INTO geom VALUES (@g);
The following examples insert more complex geometries into the table: SET @g = 'LINESTRING(0 0,1 1,2 2)'; INSERT INTO geom VALUES (GeomFromText(@g)); SET @g = 'POLYGON((0 0,10 0,10 10,0 10,0 0),(5 5,7 5,7 7,5 7, 5 5))'; INSERT INTO geom VALUES (GeomFromText(@g)); SET @g = 'GEOMETRYCOLLECTION(POINT(1 1),LINESTRING(0 0,1 1,2 2,3 3,4 4))'; INSERT INTO geom VALUES (GeomFromText(@g));
The preceding examples use GeomFromText() to create geometry values. You can also use typespecific functions: SET @g = 'POINT(1 1)'; INSERT INTO geom VALUES (PointFromText(@g)); SET @g = 'LINESTRING(0 0,1 1,2 2)'; INSERT INTO geom VALUES (LineStringFromText(@g)); SET @g = 'POLYGON((0 0,10 0,10 10,0 10,0 0),(5 5,7 5,7 7,5 7, 5 5))'; INSERT INTO geom VALUES (PolygonFromText(@g)); SET @g = 'GEOMETRYCOLLECTION(POINT(1 1),LINESTRING(0 0,1 1,2 2,3 3,4 4))'; INSERT INTO geom VALUES (GeomCollFromText(@g));
A client application program that wants to use WKB representations of geometry values is responsible for sending correctly formed WKB in queries to the server. There are several ways to satisfy this requirement. For example: • Inserting a POINT(1 1) value with hex literal syntax: INSERT INTO geom VALUES (GeomFromWKB(X'0101000000000000000000F03F000000000000F03F'));
• An ODBC application can send a WKB representation, binding it to a placeholder using an argument of BLOB type: INSERT INTO geom VALUES (GeomFromWKB(?))
Other programming interfaces may support a similar placeholder mechanism. • In a C program, you can escape a binary value using mysql_real_escape_string() and include the result in a query string that is sent to the server. See Section 23.8.7.53, “mysql_real_escape_string()”.
11.5.6 Fetching Spatial Data Geometry values stored in a table can be fetched in internal format. You can also convert them to WKT or WKB format. • Fetching spatial data in internal format: Fetching geometry values using internal format can be useful in table-to-table transfers:
1119
Optimizing Spatial Analysis
CREATE TABLE geom2 (g GEOMETRY) SELECT g FROM geom;
• Fetching spatial data in WKT format: The AsText() function converts a geometry from internal format to a WKT string. SELECT AsText(g) FROM geom;
• Fetching spatial data in WKB format: The AsBinary() function converts a geometry from internal format to a BLOB containing the WKB value. SELECT AsBinary(g) FROM geom;
11.5.7 Optimizing Spatial Analysis For MyISAM tables, search operations in columns containing spatial data can be optimized using SPATIAL indexes. The most typical operations are: • Point queries that search for all objects that contain a given point • Region queries that search for all objects that overlap a given region MySQL uses R-Trees with quadratic splitting for SPATIAL indexes on spatial columns. A SPATIAL index is built using the minimum bounding rectangle (MBR) of a geometry. For most geometries, the MBR is a minimum rectangle that surrounds the geometries. For a horizontal or a vertical linestring, the MBR is a rectangle degenerated into the linestring. For a point, the MBR is a rectangle degenerated into the point. It is also possible to create normal indexes on spatial columns. In a non-SPATIAL index, you must declare a prefix for any spatial column except for POINT columns. MyISAM supports both SPATIAL and non-SPATIAL indexes. Other storage engines support non-SPATIAL indexes, as described in Section 13.1.13, “CREATE INDEX Syntax”.
11.5.8 Creating Spatial Indexes For MyISAM tables, MySQL can create spatial indexes using syntax similar to that for creating regular indexes, but using the SPATIAL keyword. Columns in spatial indexes must be declared NOT NULL. The following examples demonstrate how to create spatial indexes: • With CREATE TABLE: CREATE TABLE geom (g GEOMETRY NOT NULL, SPATIAL INDEX(g)) ENGINE=MyISAM;
• With ALTER TABLE: CREATE TABLE geom (g GEOMETRY NOT NULL) ENGINE=MyISAM; ALTER TABLE geom ADD SPATIAL INDEX(g);
• With CREATE INDEX: CREATE TABLE geom (g GEOMETRY NOT NULL) ENGINE=MyISAM; CREATE SPATIAL INDEX g ON geom (g);
SPATIAL INDEX creates an R-tree index. For storage engines that support nonspatial indexing of spatial columns, the engine creates a B-tree index. A B-tree index on spatial values is useful for exactvalue lookups, but not for range scans.
1120
Using Spatial Indexes
For more information on indexing spatial columns, see Section 13.1.13, “CREATE INDEX Syntax”. To drop spatial indexes, use ALTER TABLE or DROP INDEX: • With ALTER TABLE: ALTER TABLE geom DROP INDEX g;
• With DROP INDEX: DROP INDEX g ON geom;
Example: Suppose that a table geom contains more than 32,000 geometries, which are stored in the column g of type GEOMETRY. The table also has an AUTO_INCREMENT column fid for storing object ID values. mysql> DESCRIBE geom; +-------+----------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------+----------+------+-----+---------+----------------+ | fid | int(11) | | PRI | NULL | auto_increment | | g | geometry | | | | | +-------+----------+------+-----+---------+----------------+ 2 rows in set (0.00 sec) mysql> SELECT COUNT(*) FROM geom; +----------+ | count(*) | +----------+ | 32376 | +----------+ 1 row in set (0.00 sec)
To add a spatial index on the column g, use this statement: mysql> ALTER TABLE geom ADD SPATIAL INDEX(g) ENGINE=MyISAM; Query OK, 32376 rows affected (4.05 sec) Records: 32376 Duplicates: 0 Warnings: 0
11.5.9 Using Spatial Indexes The optimizer investigates whether available spatial indexes can be involved in the search for queries that use a function such as MBRContains() or MBRWithin() in the WHERE clause. The following query finds all objects that are in the given rectangle: mysql> SET @poly = -> 'Polygon((30000 15000, 31000 15000, 31000 16000, 30000 16000, 30000 15000))'; mysql> SELECT fid,AsText(g) FROM geom WHERE -> MBRContains(GeomFromText(@poly),g); +-----+---------------------------------------------------------------+ | fid | AsText(g) | +-----+---------------------------------------------------------------+ | 21 | LINESTRING(30350.4 15828.8,30350.6 15845,30333.8 15845,30 ... | | 22 | LINESTRING(30350.6 15871.4,30350.6 15887.8,30334 15887.8, ... | | 23 | LINESTRING(30350.6 15914.2,30350.6 15930.4,30334 15930.4, ... | | 24 | LINESTRING(30290.2 15823,30290.2 15839.4,30273.4 15839.4, ... | | 25 | LINESTRING(30291.4 15866.2,30291.6 15882.4,30274.8 15882. ... | | 26 | LINESTRING(30291.6 15918.2,30291.6 15934.4,30275 15934.4, ... | | 249 | LINESTRING(30337.8 15938.6,30337.8 15946.8,30320.4 15946. ... | | 1 | LINESTRING(30250.4 15129.2,30248.8 15138.4,30238.2 15136. ... |
1121
Using Spatial Indexes
| 2 | LINESTRING(30220.2 15122.8,30217.2 15137.8,30207.6 15136, ... | | 3 | LINESTRING(30179 15114.4,30176.6 15129.4,30167 15128,3016 ... | | 4 | LINESTRING(30155.2 15121.4,30140.4 15118.6,30142 15109,30 ... | | 5 | LINESTRING(30192.4 15085,30177.6 15082.2,30179.2 15072.4, ... | | 6 | LINESTRING(30244 15087,30229 15086.2,30229.4 15076.4,3024 ... | | 7 | LINESTRING(30200.6 15059.4,30185.6 15058.6,30186 15048.8, ... | | 10 | LINESTRING(30179.6 15017.8,30181 15002.8,30190.8 15003.6, ... | | 11 | LINESTRING(30154.2 15000.4,30168.6 15004.8,30166 15014.2, ... | | 13 | LINESTRING(30105 15065.8,30108.4 15050.8,30118 15053,3011 ... | | 154 | LINESTRING(30276.2 15143.8,30261.4 15141,30263 15131.4,30 ... | | 155 | LINESTRING(30269.8 15084,30269.4 15093.4,30258.6 15093,30 ... | | 157 | LINESTRING(30128.2 15011,30113.2 15010.2,30113.6 15000.4, ... | +-----+---------------------------------------------------------------+ 20 rows in set (0.00 sec)
Use EXPLAIN to check the way this query is executed: mysql> SET @poly = -> 'Polygon((30000 15000, 31000 15000, 31000 16000, 30000 16000, 30000 15000))'; mysql> EXPLAIN SELECT fid,AsText(g) FROM geom WHERE -> MBRContains(GeomFromText(@poly),g)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: geom type: range possible_keys: g key: g key_len: 32 ref: NULL rows: 50 Extra: Using where 1 row in set (0.00 sec)
Check what would happen without a spatial index: mysql> SET @poly = -> 'Polygon((30000 15000, 31000 15000, 31000 16000, 30000 16000, 30000 15000))'; mysql> EXPLAIN SELECT fid,AsText(g) FROM g IGNORE INDEX (g) WHERE -> MBRContains(GeomFromText(@poly),g)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: geom type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 32376 Extra: Using where 1 row in set (0.00 sec)
Executing the SELECT statement without the spatial index yields the same result but causes the execution time to rise from 0.00 seconds to 0.46 seconds: mysql> SET @poly = -> 'Polygon((30000 15000, 31000 15000, 31000 16000,
1122
Data Type Default Values
30000 16000, 30000 15000))'; mysql> SELECT fid,AsText(g) FROM geom IGNORE INDEX (g) WHERE -> MBRContains(GeomFromText(@poly),g); +-----+---------------------------------------------------------------+ | fid | AsText(g) | +-----+---------------------------------------------------------------+ | 1 | LINESTRING(30250.4 15129.2,30248.8 15138.4,30238.2 15136. ... | | 2 | LINESTRING(30220.2 15122.8,30217.2 15137.8,30207.6 15136, ... | | 3 | LINESTRING(30179 15114.4,30176.6 15129.4,30167 15128,3016 ... | | 4 | LINESTRING(30155.2 15121.4,30140.4 15118.6,30142 15109,30 ... | | 5 | LINESTRING(30192.4 15085,30177.6 15082.2,30179.2 15072.4, ... | | 6 | LINESTRING(30244 15087,30229 15086.2,30229.4 15076.4,3024 ... | | 7 | LINESTRING(30200.6 15059.4,30185.6 15058.6,30186 15048.8, ... | | 10 | LINESTRING(30179.6 15017.8,30181 15002.8,30190.8 15003.6, ... | | 11 | LINESTRING(30154.2 15000.4,30168.6 15004.8,30166 15014.2, ... | | 13 | LINESTRING(30105 15065.8,30108.4 15050.8,30118 15053,3011 ... | | 21 | LINESTRING(30350.4 15828.8,30350.6 15845,30333.8 15845,30 ... | | 22 | LINESTRING(30350.6 15871.4,30350.6 15887.8,30334 15887.8, ... | | 23 | LINESTRING(30350.6 15914.2,30350.6 15930.4,30334 15930.4, ... | | 24 | LINESTRING(30290.2 15823,30290.2 15839.4,30273.4 15839.4, ... | | 25 | LINESTRING(30291.4 15866.2,30291.6 15882.4,30274.8 15882. ... | | 26 | LINESTRING(30291.6 15918.2,30291.6 15934.4,30275 15934.4, ... | | 154 | LINESTRING(30276.2 15143.8,30261.4 15141,30263 15131.4,30 ... | | 155 | LINESTRING(30269.8 15084,30269.4 15093.4,30258.6 15093,30 ... | | 157 | LINESTRING(30128.2 15011,30113.2 15010.2,30113.6 15000.4, ... | | 249 | LINESTRING(30337.8 15938.6,30337.8 15946.8,30320.4 15946. ... | +-----+---------------------------------------------------------------+ 20 rows in set (0.46 sec)
11.6 Data Type Default Values The DEFAULT value clause in a data type specification indicates a default value for a column. With one exception, the default value must be a constant; it cannot be a function or an expression. This means, for example, that you cannot set the default for a date column to be the value of a function such as NOW() or CURRENT_DATE. The exception is that you can specify CURRENT_TIMESTAMP as the default for a TIMESTAMP column. See Section 11.3.5, “Automatic Initialization and Updating for TIMESTAMP”. BLOB and TEXT columns cannot be assigned a default value. If a column definition includes no explicit DEFAULT value, MySQL determines the default value as follows: If the column can take NULL as a value, the column is defined with an explicit DEFAULT NULL clause. If the column cannot take NULL as the value, MySQL defines the column with no explicit DEFAULT clause. Exception: If the column is defined as part of a PRIMARY KEY but not explicitly as NOT NULL, MySQL creates it as a NOT NULL column (because PRIMARY KEY columns must be NOT NULL), but also assigns it a DEFAULT clause using the implicit default value. To prevent this, include an explicit NOT NULL in the definition of any PRIMARY KEY column. For data entry into a NOT NULL column that has no explicit DEFAULT clause, if an INSERT or REPLACE statement includes no value for the column, or an UPDATE statement sets the column to NULL, MySQL handles the column according to the SQL mode in effect at the time: • If strict SQL mode is enabled, an error occurs for transactional tables and the statement is rolled back. For nontransactional tables, an error occurs, but if this happens for the second or subsequent row of a multiple-row statement, the preceding rows will have been inserted. • If strict mode is not enabled, MySQL sets the column to the implicit default value for the column data type. Suppose that a table t is defined as follows:
1123
Data Type Storage Requirements
CREATE TABLE t (i INT NOT NULL);
In this case, i has no explicit default, so in strict mode each of the following statements produce an error and no row is inserted. When not using strict mode, only the third statement produces an error; the implicit default is inserted for the first two statements, but the third fails because DEFAULT(i) cannot produce a value: INSERT INTO t VALUES(); INSERT INTO t VALUES(DEFAULT); INSERT INTO t VALUES(DEFAULT(i));
See Section 5.1.8, “Server SQL Modes”. For a given table, you can use the SHOW CREATE TABLE statement to see which columns have an explicit DEFAULT clause. Implicit defaults are defined as follows: • For numeric types, the default is 0, with the exception that for integer or floating-point types declared with the AUTO_INCREMENT attribute, the default is the next value in the sequence. • For date and time types other than TIMESTAMP, the default is the appropriate “zero” value for the type. For the first TIMESTAMP column in a table, the default value is the current date and time. See Section 11.3, “Date and Time Types”. • For string types other than ENUM, the default value is the empty string. For ENUM, the default is the first enumeration value. SERIAL DEFAULT VALUE in the definition of an integer column is an alias for NOT NULL AUTO_INCREMENT UNIQUE.
11.7 Data Type Storage Requirements • InnoDB Table Storage Requirements • NDBCLUSTER Table Storage Requirements • Numeric Type Storage Requirements • Date and Time Type Storage Requirements • String Type Storage Requirements • Spatial Type Storage Requirements The storage requirements for data vary, according to the storage engine being used for the table in question. Different storage engines use different methods for recording the raw data and different data types. In addition, some engines may compress the information in a given row, either on a column or entire row basis, making calculation of the storage requirements for a given table or column structure. However, all storage engines must communicate and exchange information on a given row within a table using the same structure, and this information is consistent, irrespective of the storage engine used to write the information to disk. This sections includes some guideliness and information for the storage requirements for each data type supported by MySQL, including details for the internal format and the sizes used by storage engines that used a fixed size representation for different types. Information is listed by category or storage engine. The internal representation of a table has a maximum row size of 65,535 bytes, even if the storage engine is capable of supporting larger rows. This figure excludes BLOB or TEXT columns, which
1124
InnoDB Table Storage Requirements
contribute only 9 to 12 bytes toward this size. For BLOB and TEXT data, the information is stored internally in a different area of memory than the row buffer. Different storage engines handle the allocation and storage of this data in different ways, according to the method they use for handling the corresponding types. For more information, see Chapter 15, Alternative Storage Engines, and Section C.10.4, “Limits on Table Column Count and Row Size”.
InnoDB Table Storage Requirements See Section 14.11.1.2, “The Physical Row Structure of an InnoDB Table” for information about storage requirements for InnoDB tables.
NDBCLUSTER Table Storage Requirements Important For tables using the NDBCLUSTER storage engine, there is the factor of 4-byte alignment to be taken into account when calculating storage requirements. This means that all NDB data storage is done in multiples of 4 bytes. Thus, a column value that would take 15 bytes in a table using a storage engine other than NDB requires 16 bytes in an NDB table. This requirement applies in addition to any other considerations that are discussed in this section. For example, in NDBCLUSTER tables, the TINYINT, SMALLINT, MEDIUMINT, and INTEGER (INT) column types each require 4 bytes storage per record due to the alignment factor. An exception to this rule is the BIT type, which is not 4-byte aligned. In NDB Cluster tables, a BIT(M) column takes M bits of storage space. However, if a table definition contains 1 or more BIT columns (up to 32 BIT columns), then NDBCLUSTER reserves 4 bytes (32 bits) per row for these. If a table definition contains more than 32 BIT columns (up to 64 such columns), then NDBCLUSTER reserves 8 bytes (that is, 64 bits) per row. In addition, while a NULL itself does not require any storage space, NDBCLUSTER reserves 4 bytes per row if the table definition contains any columns defined as NULL, up to 32 NULL columns. (If an NDB Cluster table is defined with more than 32 NULL columns up to 64 NULL columns, then 8 bytes per row is reserved.) When calculating storage requirements for NDB Cluster tables, you must also remember that every table using the NDBCLUSTER storage engine requires a primary key; if no primary key is defined by the user, then a “hidden” primary key will be created by NDB. This hidden primary key consumes 31-35 bytes per table record. You may find the ndb_size.pl utility to be useful for estimating NDB storage requirements. This Perl script connects to a current MySQL (non-Cluster) database and creates a report on how much space that database would require if it used the NDBCLUSTER storage engine. See Section 18.4.25, “ndb_size.pl — NDBCLUSTER Size Requirement Estimator”, for more information.
Numeric Type Storage Requirements Data Type
Storage Required
TINYINT
1 byte
SMALLINT
2 bytes
MEDIUMINT
3 bytes
INT, INTEGER
4 bytes
BIGINT
8 bytes
1125
Date and Time Type Storage Requirements
Data Type
Storage Required
FLOAT(p)
4 bytes if 0 <= p <= 24, 8 bytes if 25 <= p <= 53
FLOAT
4 bytes
DOUBLE [PRECISION], REAL
8 bytes
DECIMAL(M,D), NUMERIC(M,D)
Varies; see following discussion
BIT(M)
approximately (M+7)/8 bytes
Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires four bytes, and the “leftover” digits require some fraction of four bytes. The storage required for excess digits is given by the following table. Leftover Digits
Number of Bytes
0
0
1
1
2
1
3
2
4
2
5
3
6
3
7
4
8
4
Date and Time Type Storage Requirements Data Type
Storage Required
DATE
3 bytes
TIME
3 bytes
DATETIME
8 bytes
TIMESTAMP
4 bytes
YEAR
1 byte
For details about internal representation of temporal values, see MySQL Internals: Important Algorithms and Structures.
String Type Storage Requirements In the following table, M represents the declared column length in characters for nonbinary string types and bytes for binary string types. L represents the actual length in bytes of a given string value. Data Type
Storage Required
CHAR(M)
M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set. See Section 14.11.1.2, “The Physical Row Structure of an InnoDB Table” for information about CHAR data type storage requirements for InnoDB tables.
BINARY(M)
M bytes, 0 <= M <= 255
VARCHAR(M), VARBINARY(M)
L + 1 bytes if column values require 0 − 255 bytes, L + 2 bytes if values may require more than 255 bytes
1126
String Type Storage Requirements
Data Type
Storage Required
TINYBLOB, TINYTEXT
L + 1 bytes, where L < 2
BLOB, TEXT
L + 2 bytes, where L < 2
MEDIUMBLOB, MEDIUMTEXT
L + 3 bytes, where L < 2
LONGBLOB, LONGTEXT
L + 4 bytes, where L < 2
ENUM('value1','value2',...)
1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum)
SET('value1','value2',...)
1, 2, 3, 4, or 8 bytes, depending on the number of set members (64 members maximum)
8 16 24 32
Variable-length string types are stored using a length prefix plus data. The length prefix requires from one to four bytes depending on the data type, and the value of the prefix is L (the byte length of the string). For example, storage for a MEDIUMTEXT value requires L bytes to store the value plus three bytes to store the length of the value. To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multibyte characters. In particular, when using a utf8 Unicode character set, you must keep in mind that not all characters use the same number of bytes. utf8mb3 and utf8mb4 character sets can require up to three and four bytes per character, respectively. For a breakdown of the storage used for different categories of utf8mb3 or utf8mb4 characters, see Section 10.1.9, “Unicode Support”. VARCHAR, VARBINARY, and the BLOB and TEXT types are variable-length types. For each, the storage requirements depend on these factors: • The actual length of the column value • The column's maximum possible length • The character set used for the column, because some character sets contain multibyte characters For example, a VARCHAR(255) column can hold a string with a maximum length of 255 characters. Assuming that the column uses the latin1 character set (one byte per character), the actual storage required is the length of the string (L), plus one byte to record the length of the string. For the string 'abcd', L is 4 and the storage requirement is five bytes. If the same column is instead declared to use the ucs2 double-byte character set, the storage requirement is 10 bytes: The length of 'abcd' is eight bytes and the column requires two bytes to store lengths because the maximum length is greater than 255 (up to 510 bytes). The effective maximum number of bytes that can be stored in a VARCHAR or VARBINARY column is subject to the maximum row size of 65,535 bytes, which is shared among all columns. For a VARCHAR column that stores multibyte characters, the effective maximum number of characters is less. For example, utf8mb3 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8mb3 character set can be declared to be a maximum of 21,844 characters. See Section C.10.4, “Limits on Table Column Count and Row Size”. InnoDB encodes fixed-length fields greater than or equal to 768 bytes in length as variable-length fields, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4. The NDBCLUSTER storage engine supports variable-width columns. This means that a VARCHAR column in an NDB Cluster table requires the same amount of storage as would any other storage engine, with the exception that such values are 4-byte aligned. Thus, the string 'abcd' stored in a VARCHAR(50) column using the latin1 character set requires 8 bytes (rather than 5 bytes for the same column value in a MyISAM table). TEXT and BLOB columns are implemented differently in the NDB Cluster storage engine, wherein each row in a TEXT column is made up of two separate parts. One of these is of fixed size (256 bytes), and
1127
Spatial Type Storage Requirements
is actually stored in the original table. The other consists of any data in excess of 256 bytes, which is stored in a hidden table. The rows in this second table are always 2,000 bytes long. This means that the size of a TEXT column is 256 if size <= 256 (where size represents the size of the row); otherwise, the size is 256 + size + (2000 − (size − 256) % 2000). The size of an ENUM object is determined by the number of different enumeration values. One byte is used for enumerations with up to 255 possible values. Two bytes are used for enumerations having between 256 and 65,535 possible values. See Section 11.4.4, “The ENUM Type”. The size of a SET object is determined by the number of different set members. If the set size is N, the object occupies (N+7)/8 bytes, rounded up to 1, 2, 3, 4, or 8 bytes. A SET can have a maximum of 64 members. See Section 11.4.5, “The SET Type”.
Spatial Type Storage Requirements MySQL stores geometry values using 4 bytes to indicate the SRID followed by the WKB representation of the value. The LENGTH() function returns the space in bytes required for value storage. For descriptions of WKB and internal storage formats for spatial values, see Section 11.5.3, “Supported Spatial Data Formats”.
11.8 Choosing the Right Type for a Column For optimum storage, you should try to use the most precise type in all cases. For example, if an integer column is used for values in the range from 1 to 99999, MEDIUMINT UNSIGNED is the best type. Of the types that represent all the required values, this type uses the least amount of storage. All basic calculations (+, -, *, and /) with DECIMAL columns are done with precision of 65 decimal (base 10) digits. See Section 11.1.1, “Numeric Type Overview”. If accuracy is not too important or if speed is the highest priority, the DOUBLE type may be good enough. For high precision, you can always convert to a fixed-point type stored in a BIGINT. This enables you to do all calculations with 64-bit integers and then convert results back to floating-point values as necessary. PROCEDURE ANALYSE can be used to obtain suggestions for optimal column data types. For more information, see Section 8.4.2.4, “Using PROCEDURE ANALYSE”.
11.9 Using Data Types from Other Database Engines To facilitate the use of code written for SQL implementations from other vendors, MySQL maps data types as shown in the following table. These mappings make it easier to import table definitions from other database systems into MySQL. Other Vendor Type
MySQL Type
BOOL
TINYINT
BOOLEAN
TINYINT
CHARACTER VARYING(M)
VARCHAR(M)
FIXED
DECIMAL
FLOAT4
FLOAT
FLOAT8
DOUBLE
INT1
TINYINT
INT2
SMALLINT
INT3
MEDIUMINT
INT4
INT
1128
Using Data Types from Other Database Engines
Other Vendor Type
MySQL Type
INT8
BIGINT
LONG VARBINARY
MEDIUMBLOB
LONG VARCHAR
MEDIUMTEXT
LONG
MEDIUMTEXT
MIDDLEINT
MEDIUMINT
NUMERIC
DECIMAL
Data type mapping occurs at table creation time, after which the original type specifications are discarded. If you create a table with types used by other vendors and then issue a DESCRIBE tbl_name statement, MySQL reports the table structure using the equivalent MySQL types. For example: mysql> CREATE TABLE t (a BOOL, b FLOAT8, c LONG VARCHAR, d NUMERIC); Query OK, 0 rows affected (0.00 sec) mysql> DESCRIBE t; +-------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+---------------+------+-----+---------+-------+ | a | tinyint(1) | YES | | NULL | | | b | double | YES | | NULL | | | c | mediumtext | YES | | NULL | | | d | decimal(10,0) | YES | | NULL | | +-------+---------------+------+-----+---------+-------+ 4 rows in set (0.01 sec)
1129
1130
Chapter 12 Functions and Operators Table of Contents 12.1 Function and Operator Reference .................................................................................... 12.2 Type Conversion in Expression Evaluation ....................................................................... 12.3 Operators ........................................................................................................................ 12.3.1 Operator Precedence ............................................................................................ 12.3.2 Comparison Functions and Operators .................................................................... 12.3.3 Logical Operators ................................................................................................. 12.3.4 Assignment Operators ........................................................................................... 12.4 Control Flow Functions .................................................................................................... 12.5 String Functions .............................................................................................................. 12.5.1 String Comparison Functions ................................................................................. 12.5.2 Regular Expressions ............................................................................................. 12.5.3 Character Set and Collation of Function Results ..................................................... 12.6 Numeric Functions and Operators .................................................................................... 12.6.1 Arithmetic Operators ............................................................................................. 12.6.2 Mathematical Functions ......................................................................................... 12.7 Date and Time Functions ................................................................................................. 12.8 What Calendar Is Used By MySQL? ................................................................................ 12.9 Full-Text Search Functions .............................................................................................. 12.9.1 Natural Language Full-Text Searches .................................................................... 12.9.2 Boolean Full-Text Searches .................................................................................. 12.9.3 Full-Text Searches with Query Expansion .............................................................. 12.9.4 Full-Text Stopwords .............................................................................................. 12.9.5 Full-Text Restrictions ............................................................................................ 12.9.6 Fine-Tuning MySQL Full-Text Search .................................................................... 12.9.7 Adding a Collation for Full-Text Indexing ................................................................ 12.10 Cast Functions and Operators ........................................................................................ 12.11 XML Functions .............................................................................................................. 12.12 Bit Functions and Operators ........................................................................................... 12.13 Encryption and Compression Functions .......................................................................... 12.14 Information Functions ..................................................................................................... 12.15 Spatial Analysis Functions ............................................................................................. 12.15.1 Spatial Function Reference ................................................................................. 12.15.2 Argument Handling by Spatial Functions .............................................................. 12.15.3 Functions That Create Geometry Values from WKT Values ................................... 12.15.4 Functions That Create Geometry Values from WKB Values ................................... 12.15.5 MySQL-Specific Functions That Create Geometry Values ..................................... 12.15.6 Geometry Format Conversion Functions .............................................................. 12.15.7 Geometry Property Functions .............................................................................. 12.15.8 Spatial Operator Functions .................................................................................. 12.15.9 Functions That Test Spatial Relations Between Geometry Objects ......................... 12.16 Aggregate (GROUP BY) Functions ................................................................................. 12.16.1 Aggregate (GROUP BY) Function Descriptions .................................................... 12.16.2 GROUP BY Modifiers ......................................................................................... 12.16.3 MySQL Handling of GROUP BY .......................................................................... 12.17 Miscellaneous Functions ................................................................................................ 12.18 Precision Math .............................................................................................................. 12.18.1 Types of Numeric Values .................................................................................... 12.18.2 DECIMAL Data Type Characteristics ................................................................... 12.18.3 Expression Handling ........................................................................................... 12.18.4 Rounding Behavior ............................................................................................. 12.18.5 Precision Math Examples ....................................................................................
1131
1132 1141 1143 1144 1145 1151 1153 1154 1156 1168 1171 1177 1178 1179 1181 1189 1211 1211 1212 1216 1218 1219 1222 1222 1224 1226 1232 1243 1244 1251 1260 1261 1262 1263 1263 1264 1265 1265 1270 1271 1273 1273 1277 1280 1282 1287 1287 1288 1289 1290 1291
Function and Operator Reference
Expressions can be used at several points in SQL statements, such as in the ORDER BY or HAVING clauses of SELECT statements, in the WHERE clause of a SELECT, DELETE, or UPDATE statement, or in SET statements. Expressions can be written using literal values, column values, NULL, built-in functions, stored functions, user-defined functions, and operators. This chapter describes the functions and operators that are permitted for writing expressions in MySQL. Instructions for writing stored functions and user-defined functions are given in Section 20.2, “Using Stored Routines (Procedures and Functions)”, and Section 24.4, “Adding New Functions to MySQL”. See Section 9.2.4, “Function Name Parsing and Resolution”, for the rules describing how the server interprets references to different kinds of functions. An expression that contains NULL always produces a NULL value unless otherwise indicated in the documentation for a particular function or operator. Note By default, there must be no whitespace between a function name and the parenthesis following it. This helps the MySQL parser distinguish between function calls and references to tables or columns that happen to have the same name as a function. However, spaces around function arguments are permitted. You can tell the MySQL server to accept spaces after function names by starting it with the --sqlmode=IGNORE_SPACE option. (See Section 5.1.8, “Server SQL Modes”.) Individual client programs can request this behavior by using the CLIENT_IGNORE_SPACE option for mysql_real_connect(). In either case, all function names become reserved words. For the sake of brevity, most examples in this chapter display the output from the mysql program in abbreviated form. Rather than showing examples in this format: mysql> SELECT MOD(29,9); +-----------+ | mod(29,9) | +-----------+ | 2 | +-----------+ 1 rows in set (0.00 sec)
This format is used instead: mysql> SELECT MOD(29,9); -> 2
12.1 Function and Operator Reference Table 12.1 Functions/Operators Name
Description
ABS()
Return the absolute value
ACOS()
Return the arc cosine
ADDDATE()
Add time values (intervals) to a date value
ADDTIME()
Add time
AES_DECRYPT()
Decrypt using AES
AES_ENCRYPT()
Encrypt using AES
AND, &&
Logical AND
Area()
Return Polygon or MultiPolygon area
AsBinary(), AsWKB()
Convert from internal geometry format to WKB
ASCII()
Return numeric value of left-most character
ASIN()
Return the arc sine
1132
Function and Operator Reference
Name
Description
=
Assign a value (as part of a SET statement, or as part of the SET clause in an UPDATE statement)
:=
Assign a value
AsText(), AsWKT()
Convert from internal geometry format to WKT
ATAN()
Return the arc tangent
ATAN2(), ATAN()
Return the arc tangent of the two arguments
AVG()
Return the average value of the argument
BENCHMARK()
Repeatedly execute an expression
BETWEEN ... AND ...
Check whether a value is within a range of values
BIN()
Return a string containing binary representation of a number
BINARY
Cast a string to a binary string
BIT_AND()
Return bitwise AND
BIT_COUNT()
Return the number of bits that are set
BIT_LENGTH()
Return length of argument in bits
BIT_OR()
Return bitwise OR
BIT_XOR()
Return bitwise XOR
&
Bitwise AND
~
Bitwise inversion
|
Bitwise OR
^
Bitwise XOR
CASE
Case operator
CAST()
Cast a value as a certain type
CEIL()
Return the smallest integer value not less than the argument
CEILING()
Return the smallest integer value not less than the argument
Centroid()
Return centroid as a point
CHAR()
Return the character for each integer passed
CHAR_LENGTH()
Return number of characters in argument
CHARACTER_LENGTH()
Synonym for CHAR_LENGTH()
CHARSET()
Return the character set of the argument
COALESCE()
Return the first non-NULL argument
COERCIBILITY()
Return the collation coercibility value of the string argument
COLLATION()
Return the collation of the string argument
COMPRESS()
Return result as a binary string
CONCAT()
Return concatenated string
CONCAT_WS()
Return concatenate with separator
CONNECTION_ID()
Return the connection ID (thread ID) for the connection
Contains()
Whether MBR of one geometry contains MBR of another
CONV()
Convert numbers between different number bases
CONVERT()
Cast a value as a certain type
CONVERT_TZ()
Convert from one time zone to another
COS()
Return the cosine
1133
Function and Operator Reference
Name
Description
COT()
Return the cotangent
COUNT()
Return a count of the number of rows returned
COUNT(DISTINCT)
Return the count of a number of different values
CRC32()
Compute a cyclic redundancy check value
Crosses()
Whether one geometry crosses another
CURDATE()
Return the current date
CURRENT_DATE(), CURRENT_DATE
Synonyms for CURDATE()
CURRENT_TIME(), CURRENT_TIME
Synonyms for CURTIME()
CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP
Synonyms for NOW()
CURRENT_USER(), CURRENT_USER
The authenticated user name and host name
CURTIME()
Return the current time
DATABASE()
Return the default (current) database name
DATE()
Extract the date part of a date or datetime expression
DATE_ADD()
Add time values (intervals) to a date value
DATE_FORMAT()
Format date as specified
DATE_SUB()
Subtract a time value (interval) from a date
DATEDIFF()
Subtract two dates
DAY()
Synonym for DAYOFMONTH()
DAYNAME()
Return the name of the weekday
DAYOFMONTH()
Return the day of the month (0-31)
DAYOFWEEK()
Return the weekday index of the argument
DAYOFYEAR()
Return the day of the year (1-366)
DECODE()
Decodes a string encrypted using ENCODE()
DEFAULT()
Return the default value for a table column
DEGREES()
Convert radians to degrees
DES_DECRYPT()
Decrypt a string
DES_ENCRYPT()
Encrypt a string
Dimension()
Dimension of geometry
Disjoint()
Whether MBRs of two geometries are disjoint
DIV
Integer division
/
Division operator
ELT()
Return string at index number
ENCODE()
Encode a string
ENCRYPT()
Encrypt a string
EndPoint()
End Point of LineString
Envelope()
Return MBR of geometry
=
Equal operator
<=>
NULL-safe equal to operator
Equals()
Whether MBRs of two geometries are equal
EXP()
Raise to the power of
1134
Function and Operator Reference
Name
Description
EXPORT_SET()
Return a string such that for every bit set in the value bits, you get an on string and for every unset bit, you get an off string
ExteriorRing()
Return exterior ring of Polygon
EXTRACT()
Extract part of a date
ExtractValue()
Extracts a value from an XML string using XPath notation
FIELD()
Return the index (position) of the first argument in the subsequent arguments
FIND_IN_SET()
Return the index position of the first argument within the second argument
FLOOR()
Return the largest integer value not greater than the argument
FORMAT()
Return a number formatted to specified number of decimal places
FOUND_ROWS()
For a SELECT with a LIMIT clause, the number of rows that would be returned were there no LIMIT clause
FROM_DAYS()
Convert a day number to a date
FROM_UNIXTIME()
Format Unix timestamp as a date
GeomCollFromText(), GeometryCollectionFromText()
Return geometry collection from WKT
GeomCollFromWKB(), GeometryCollectionFromWKB()
Return geometry collection from WKB
GeometryCollection()
Construct geometry collection from geometries
GeometryN()
Return N-th geometry from geometry collection
GeometryType()
Return name of geometry type
GeomFromText(), GeometryFromText()
Return geometry from WKT
GeomFromWKB(), GeometryFromWKB()
Return geometry from WKB
GET_FORMAT()
Return a date format string
GET_LOCK()
Get a named lock
GLength()
Return length of LineString
>
Greater than operator
>=
Greater than or equal operator
GREATEST()
Return the largest argument
GROUP_CONCAT()
Return a concatenated string
HEX()
Return a hexadecimal representation of a decimal or string value
HOUR()
Extract the hour
IF()
If/else construct
IFNULL()
Null if/else construct
IN()
Check whether a value is within a set of values
INET_ATON()
Return the numeric value of an IP address
INET_NTOA()
Return the IP address from a numeric value
1135
Function and Operator Reference
Name
Description
INSERT()
Insert a substring at the specified position up to the specified number of characters
INSTR()
Return the index of the first occurrence of substring
InteriorRingN()
Return N-th interior ring of Polygon
Intersects()
Whether MBRs of two geometries intersect
INTERVAL()
Return the index of the argument that is less than the first argument
IS
Test a value against a boolean
IS_FREE_LOCK()
Whether the named lock is free
IS NOT
Test a value against a boolean
IS NOT NULL
NOT NULL value test
IS NULL
NULL value test
IS_USED_LOCK()
Whether the named lock is in use; return connection identifier if true
IsClosed()
Whether a geometry is closed and simple
IsEmpty()
Placeholder function
ISNULL()
Test whether the argument is NULL
IsSimple()
Whether a geometry is simple
LAST_DAY
Return the last day of the month for the argument
LAST_INSERT_ID()
Value of the AUTOINCREMENT column for the last INSERT
LCASE()
Synonym for LOWER()
LEAST()
Return the smallest argument
LEFT()
Return the leftmost number of characters as specified
<<
Left shift
LENGTH()
Return the length of a string in bytes
<
Less than operator
<=
Less than or equal operator
LIKE
Simple pattern matching
LineFromText(), LineStringFromText()
Construct LineString from WKT
LineFromWKB(), LineStringFromWKB()
Construct LineString from WKB
LineString()
Construct LineString from Point values
LN()
Return the natural logarithm of the argument
LOAD_FILE()
Load the named file
LOCALTIME(), LOCALTIME
Synonym for NOW()
LOCALTIMESTAMP, LOCALTIMESTAMP()
Synonym for NOW()
LOCATE()
Return the position of the first occurrence of substring
LOG()
Return the natural logarithm of the first argument
LOG10()
Return the base-10 logarithm of the argument
LOG2()
Return the base-2 logarithm of the argument
1136
Function and Operator Reference
Name
Description
LOWER()
Return the argument in lowercase
LPAD()
Return the string argument, left-padded with the specified string
LTRIM()
Remove leading spaces
MAKE_SET()
Return a set of comma-separated strings that have the corresponding bit in bits set
MAKEDATE()
Create a date from the year and day of year
MAKETIME()
Create time from hour, minute, second
MASTER_POS_WAIT()
Block until the slave has read and applied all updates up to the specified position
MATCH
Perform full-text search
MAX()
Return the maximum value
MBRContains()
Whether MBR of one geometry contains MBR of another
MBRDisjoint()
Whether MBRs of two geometries are disjoint
MBREqual()
Whether MBRs of two geometries are equal
MBRIntersects()
Whether MBRs of two geometries intersect
MBROverlaps()
Whether MBRs of two geometries overlap
MBRTouches()
Whether MBRs of two geometries touch
MBRWithin()
Whether MBR of one geometry is within MBR of another
MD5()
Calculate MD5 checksum
MICROSECOND()
Return the microseconds from argument
MID()
Return a substring starting from the specified position
MIN()
Return the minimum value
-
Minus operator
MINUTE()
Return the minute from the argument
MLineFromText(), MultiLineStringFromText()
Construct MultiLineString from WKT
MLineFromWKB(), MultiLineStringFromWKB()
Construct MultiLineString from WKB
MOD()
Return the remainder
%, MOD
Modulo operator
MONTH()
Return the month from the date passed
MONTHNAME()
Return the name of the month
MPointFromText(), MultiPointFromText()
Construct MultiPoint from WKT
MPointFromWKB(), MultiPointFromWKB()
Construct MultiPoint from WKB
MPolyFromText(), MultiPolygonFromText()
Construct MultiPolygon from WKT
MPolyFromWKB(), MultiPolygonFromWKB()
Construct MultiPolygon from WKB
MultiLineString()
Contruct MultiLineString from LineString values
MultiPoint()
Construct MultiPoint from Point values
1137
Function and Operator Reference
Name
Description
MultiPolygon()
Construct MultiPolygon from Polygon values
NAME_CONST()
Causes the column to have the given name
NOT, !
Negates value
NOT BETWEEN ... AND ...
Check whether a value is not within a range of values
!=, <>
Not equal operator
NOT IN()
Check whether a value is not within a set of values
NOT LIKE
Negation of simple pattern matching
NOT REGEXP
Negation of REGEXP
NOW()
Return the current date and time
NULLIF()
Return NULL if expr1 = expr2
NumGeometries()
Return number of geometries in geometry collection
NumInteriorRings()
Return number of interior rings in Polygon
NumPoints()
Return number of points in LineString
OCT()
Return a string containing octal representation of a number
OCTET_LENGTH()
Synonym for LENGTH()
OLD_PASSWORD()
Return the value of the pre-4.1 implementation of PASSWORD
||, OR
Logical OR
ORD()
Return character code for leftmost character of the argument
Overlaps()
Whether MBRs of two geometries overlap
PASSWORD()
Calculate and return a password string
PERIOD_ADD()
Add a period to a year-month
PERIOD_DIFF()
Return the number of months between periods
PI()
Return the value of pi
+
Addition operator
Point()
Construct Point from coordinates
PointFromText()
Construct Point from WKT
PointFromWKB()
Construct Point from WKB
PointN()
Return N-th point from LineString
PolyFromText(), PolygonFromText()
Construct Polygon from WKT
PolyFromWKB(), PolygonFromWKB()
Construct Polygon from WKB
Polygon()
Construct Polygon from LineString arguments
POSITION()
Synonym for LOCATE()
POW()
Return the argument raised to the specified power
POWER()
Return the argument raised to the specified power
PROCEDURE ANALYSE()
Analyze the results of a query
QUARTER()
Return the quarter from a date argument
QUOTE()
Escape the argument for use in an SQL statement
RADIANS()
Return argument converted to radians
RAND()
Return a random floating-point value
1138
Function and Operator Reference
Name
Description
REGEXP
Pattern matching using regular expressions
RELEASE_LOCK()
Releases the named lock
REPEAT()
Repeat a string the specified number of times
REPLACE()
Replace occurrences of a specified string
REVERSE()
Reverse the characters in a string
RIGHT()
Return the specified rightmost number of characters
>>
Right shift
RLIKE
Synonym for REGEXP
ROUND()
Round the argument
ROW_COUNT()
The number of rows updated
RPAD()
Append string the specified number of times
RTRIM()
Remove trailing spaces
SCHEMA()
Synonym for DATABASE()
SEC_TO_TIME()
Converts seconds to 'HH:MM:SS' format
SECOND()
Return the second (0-59)
SESSION_USER()
Synonym for USER()
SHA1(), SHA()
Calculate an SHA-1 160-bit checksum
SHA2()
Calculate an SHA-2 checksum
SIGN()
Return the sign of the argument
SIN()
Return the sine of the argument
SLEEP()
Sleep for a number of seconds
SOUNDEX()
Return a soundex string
SOUNDS LIKE
Compare sounds
SPACE()
Return a string of the specified number of spaces
SQRT()
Return the square root of the argument
SRID()
Return spatial reference system ID for geometry
StartPoint()
Start Point of LineString
STD()
Return the population standard deviation
STDDEV()
Return the population standard deviation
STDDEV_POP()
Return the population standard deviation
STDDEV_SAMP()
Return the sample standard deviation
STR_TO_DATE()
Convert a string to a date
STRCMP()
Compare two strings
SUBDATE()
Synonym for DATE_SUB() when invoked with three arguments
SUBSTR()
Return the substring as specified
SUBSTRING()
Return the substring as specified
SUBSTRING_INDEX()
Return a substring from a string before the specified number of occurrences of the delimiter
SUBTIME()
Subtract times
SUM()
Return the sum
1139
Function and Operator Reference
Name
Description
SYSDATE()
Return the time at which the function executes
SYSTEM_USER()
Synonym for USER()
TAN()
Return the tangent of the argument
TIME()
Extract the time portion of the expression passed
TIME_FORMAT()
Format as time
TIME_TO_SEC()
Return the argument converted to seconds
TIMEDIFF()
Subtract time
*
Multiplication operator
TIMESTAMP()
With a single argument, this function returns the date or datetime expression; with two arguments, the sum of the arguments
TIMESTAMPADD()
Add an interval to a datetime expression
TIMESTAMPDIFF()
Subtract an interval from a datetime expression
TO_DAYS()
Return the date argument converted to days
TO_SECONDS()
Return the date or datetime argument converted to seconds since Year 0
Touches()
Whether one geometry touches another
TRIM()
Remove leading and trailing spaces
TRUNCATE()
Truncate to specified number of decimal places
UCASE()
Synonym for UPPER()
-
Change the sign of the argument
UNCOMPRESS()
Uncompress a string compressed
UNCOMPRESSED_LENGTH()
Return the length of a string before compression
UNHEX()
Return a string containing hex representation of a number
UNIX_TIMESTAMP()
Return a Unix timestamp
UpdateXML()
Return replaced XML fragment
UPPER()
Convert to uppercase
USER()
The user name and host name provided by the client
UTC_DATE()
Return the current UTC date
UTC_TIME()
Return the current UTC time
UTC_TIMESTAMP()
Return the current UTC date and time
UUID()
Return a Universal Unique Identifier (UUID)
UUID_SHORT()
Return an integer-valued universal identifier
VALUES()
Defines the values to be used during an INSERT
VAR_POP()
Return the population standard variance
VAR_SAMP()
Return the sample variance
VARIANCE()
Return the population standard variance
VERSION()
Return a string that indicates the MySQL server version
WEEK()
Return the week number
WEEKDAY()
Return the weekday index
WEEKOFYEAR()
Return the calendar week of the date (1-53)
Within()
Whether MBR of one geometry is within MBR of another
1140
Type Conversion in Expression Evaluation
Name
Description
X()
Return X coordinate of Point
XOR
Logical XOR
Y()
Return Y coordinate of Point
YEAR()
Return the year
YEARWEEK()
Return the year and week
12.2 Type Conversion in Expression Evaluation When an operator is used with operands of different types, type conversion occurs to make the operands compatible. Some conversions occur implicitly. For example, MySQL automatically converts numbers to strings as necessary, and vice versa. mysql> SELECT 1+'1'; -> 2 mysql> SELECT CONCAT(2,' test'); -> '2 test'
It is also possible to convert a number to a string explicitly using the CAST() function. Conversion occurs implicitly with the CONCAT() function because it expects string arguments. mysql> SELECT 38.8, CAST(38.8 AS CHAR); -> 38.8, '38.8' mysql> SELECT 38.8, CONCAT(38.8); -> 38.8, '38.8'
See later in this section for information about the character set of implicit number-to-string conversions, and for modified rules that apply to CREATE TABLE ... SELECT statements. The following rules describe how conversion occurs for comparison operations: • If one or both arguments are NULL, the result of the comparison is NULL, except for the NULL-safe <=> equality comparison operator. For NULL <=> NULL, the result is true. No conversion is needed. • If both arguments in a comparison operation are strings, they are compared as strings. • If both arguments are integers, they are compared as integers. • Hexadecimal values are treated as binary strings if not compared to a number. •
If one of the arguments is a TIMESTAMP or DATETIME column and the other argument is a constant, the constant is converted to a timestamp before the comparison is performed. This is done to be more ODBC-friendly. Note that this is not done for the arguments to IN()! To be safe, always use complete datetime, date, or time strings when doing comparisons. For example, to achieve best results when using BETWEEN with date or time values, use CAST() to explicitly convert the values to the desired data type.
• If one of the arguments is a decimal value, comparison depends on the other argument. The arguments are compared as decimal values if the other argument is a decimal or integer value, or as floating-point values if the other argument is a floating-point value. • In all other cases, the arguments are compared as floating-point (real) numbers. For information about conversion of values from one temporal type to another, see Section 11.3.7, “Conversion Between Date and Time Types”. The following examples illustrate conversion of strings to numbers for comparison operations: mysql> SELECT 1 > '6x';
1141
Type Conversion in Expression Evaluation
-> 0 mysql> SELECT 7 > '6x'; -> 1 mysql> SELECT 0 > 'x6'; -> 0 mysql> SELECT 0 = 'x6'; -> 1
For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. If str_col is an indexed string column, the index cannot be used when performing the lookup in the following statement: SELECT * FROM tbl_name WHERE str_col=1;
The reason for this is that there are many different strings that may convert to the value 1, such as '1', ' 1', or '1a'. Comparisons that use floating-point numbers (or values that are converted to floating-point numbers) are approximate because such numbers are inexact. This might lead to results that appear inconsistent: mysql> SELECT '18015376320243458' = 18015376320243458; -> 1 mysql> SELECT '18015376320243459' = 18015376320243459; -> 0
Such results can occur because the values are converted to floating-point numbers, which have only 53 bits of precision and are subject to rounding: mysql> SELECT '18015376320243459'+0.0; -> 1.8015376320243e+16
Furthermore, the conversion from string to floating-point and from integer to floating-point do not necessarily occur the same way. The integer may be converted to floating-point by the CPU, whereas the string is converted digit by digit in an operation that involves floating-point multiplications. The results shown will vary on different systems, and can be affected by factors such as computer architecture or the compiler version or optimization level. One way to avoid such problems is to use CAST() so that a value is not converted implicitly to a float-point number: mysql> SELECT CAST('18015376320243459' AS UNSIGNED) = 18015376320243459; -> 1
For more information about floating-point comparisons, see Section B.5.4.8, “Problems with FloatingPoint Values”. As of MySQL 5.5.3, the server includes dtoa, a conversion library that provides the basis for improved conversion between string or DECIMAL values and approximate-value (FLOAT/DOUBLE) numbers: • Consistent conversion results across platforms, which eliminates, for example, Unix versus Windows conversion differences. • Accurate representation of values in cases where results previously did not provide sufficient precision, such as for values close to IEEE limits. • Conversion of numbers to string format with the best possible precision. The precision of dtoa is always the same or better than that of the standard C library functions. Because the conversions produced by this library differ in some cases from non-dtoa results, the potential exists for incompatibilities in applications that rely on previous results. For example, applications that depend on a specific exact result from previous conversions might need adjustment to accommodate additional precision.
1142
Operators
The dtoa library provides conversions with the following properties. D represents a value with a DECIMAL or string representation, and F represents a floating-point number in native binary (IEEE) format. • F -> D conversion is done with the best possible precision, returning D as the shortest string that yields F when read back in and rounded to the nearest value in native binary format as specified by IEEE. • D -> F conversion is done such that F is the nearest native binary number to the input decimal string D. These properties imply that F -> D -> F conversions are lossless unless F is -inf, +inf, or NaN. The latter values are not supported because the SQL standard defines them as invalid values for FLOAT or DOUBLE. For D -> F -> D conversions, a sufficient condition for losslessness is that D uses 15 or fewer digits of precision, is not a denormal value, -inf, +inf, or NaN. In some cases, the conversion is lossless even if D has more than 15 digits of precision, but this is not always the case. As of MySQL 5.5.3, implicit conversion of a numeric or temporal value to string produces a value that has a character set and collation determined by the character_set_connection and collation_connection system variables. (These variables commonly are set with SET NAMES. For information about connection character sets, see Section 10.1.4, “Connection Character Sets and Collations”.) This change means that such a conversion results in a character (nonbinary) string (a CHAR, VARCHAR, or LONGTEXT value), except when the connection character set is set to binary. In that case, the conversion result is a binary string (a BINARY, VARBINARY, or LONGBLOB value). Before MySQL 5.5.3, an implicit conversion always produced a binary string, regardless of the connection character set. Such implicit conversions to string typically occur for functions that are passed numeric or temporal values when string values are more usual, and thus could have effects beyond the type of the converted value. Consider the expression CONCAT(1, 'abc'). The numeric argument 1 was converted to the binary string '1' and the concatenation of that value with the nonbinary string 'abc' produced the binary string '1abc'. Some functions are unaffected by this change in behavior: • CHAR() without a USING clause still returns VARBINARY. • Functions that previously returned utf8 strings still do so. Examples include CHARSET() and COLLATION(). • Encryption and compression functions that expect string arguments and previously returned binary strings are unaffected if the return value can contain non-ASCII characters. Examples include AES_ENCRYPT() and COMPRESS(). If the return value contains only ASCII characters, the function now returns a character string with the connection character set and collation. Examples include MD5() and PASSWORD().
12.3 Operators Table 12.2 Operators Name
Description
AND, &&
Logical AND
=
Assign a value (as part of a SET statement, or as part of the SET clause in an UPDATE statement)
:=
Assign a value
BETWEEN ... AND ...
Check whether a value is within a range of values
BINARY
Cast a string to a binary string
1143
Operator Precedence
Name
Description
&
Bitwise AND
~
Bitwise inversion
|
Bitwise OR
^
Bitwise XOR
CASE
Case operator
DIV
Integer division
/
Division operator
=
Equal operator
<=>
NULL-safe equal to operator
>
Greater than operator
>=
Greater than or equal operator
IS
Test a value against a boolean
IS NOT
Test a value against a boolean
IS NOT NULL
NOT NULL value test
IS NULL
NULL value test
<<
Left shift
<
Less than operator
<=
Less than or equal operator
LIKE
Simple pattern matching
-
Minus operator
%, MOD
Modulo operator
NOT, !
Negates value
NOT BETWEEN ... AND ...
Check whether a value is not within a range of values
!=, <>
Not equal operator
NOT LIKE
Negation of simple pattern matching
NOT REGEXP
Negation of REGEXP
||, OR
Logical OR
+
Addition operator
REGEXP
Pattern matching using regular expressions
>>
Right shift
RLIKE
Synonym for REGEXP
SOUNDS LIKE
Compare sounds
*
Multiplication operator
-
Change the sign of the argument
XOR
Logical XOR
12.3.1 Operator Precedence Operator precedences are shown in the following list, from highest precedence to the lowest. Operators that are shown together on a line have the same precedence. INTERVAL BINARY, COLLATE !
1144
Comparison Functions and Operators
- (unary minus), ~ (unary bit inversion) ^ *, /, DIV, %, MOD -, + <<, >> & | = (comparison), <=>, >=, >, <=, <, <>, !=, IS, LIKE, REGEXP, IN BETWEEN, CASE, WHEN, THEN, ELSE NOT AND, && XOR OR, || = (assignment), :=
The precedence of = depends on whether it is used as a comparison operator (=) or as an assignment operator (=). When used as a comparison operator, it has the same precedence as <=>, >=, >, <=, <, <>, !=, IS, LIKE, REGEXP, and IN. When used as an assignment operator, it has the same precedence as :=. Section 13.7.4.1, “SET Syntax for Variable Assignment”, and Section 9.4, “UserDefined Variables”, explain how MySQL determines which interpretation of = should apply. For operators that occur at the same precedence level within an expression, evaluation proceeds left to right, with the exception that assignments evaluate right to left. The meaning of some operators depends on the SQL mode: • By default, || is a logical OR operator. With PIPES_AS_CONCAT enabled, || is string concatenation, with a precedence between ^ and the unary operators. • By default, ! has a higher precedence than NOT. With HIGH_NOT_PRECEDENCE enabled, ! and NOT have the same precedence. See Section 5.1.8, “Server SQL Modes”. The precedence of operators determines the order of evaluation of terms in an expression. To override this order and group terms explicitly, use parentheses. For example: mysql> SELECT 1+2*3; -> 7 mysql> SELECT (1+2)*3; -> 9
12.3.2 Comparison Functions and Operators Table 12.3 Comparison Operators Name
Description
BETWEEN ... AND ...
Check whether a value is within a range of values
COALESCE()
Return the first non-NULL argument
=
Equal operator
<=>
NULL-safe equal to operator
>
Greater than operator
>=
Greater than or equal operator
GREATEST()
Return the largest argument
IN()
Check whether a value is within a set of values
INTERVAL()
Return the index of the argument that is less than the first argument
IS
Test a value against a boolean
IS NOT
Test a value against a boolean
1145
Comparison Functions and Operators
Name
Description
IS NOT NULL
NOT NULL value test
IS NULL
NULL value test
ISNULL()
Test whether the argument is NULL
LEAST()
Return the smallest argument
<
Less than operator
<=
Less than or equal operator
LIKE
Simple pattern matching
NOT BETWEEN ... AND ...
Check whether a value is not within a range of values
!=, <>
Not equal operator
NOT IN()
Check whether a value is not within a set of values
NOT LIKE
Negation of simple pattern matching
STRCMP()
Compare two strings
Comparison operations result in a value of 1 (TRUE), 0 (FALSE), or NULL. These operations work for both numbers and strings. Strings are automatically converted to numbers and numbers to strings as necessary. The following relational comparison operators can be used to compare not only scalar operands, but row operands: =
>
<
>=
<=
<>
!=
The descriptions for those operators later in this section detail how they work with row operands. For additional examples of row comparisons in the context of row subqueries, see Section 13.2.10.5, “Row Subqueries”. Some of the functions in this section return values other than 1 (TRUE), 0 (FALSE), or NULL. For example, LEAST() and GREATEST(). However, the value they return is based on comparison operations performed according to the rules described in Section 12.2, “Type Conversion in Expression Evaluation”. To convert a value to a specific type for comparison purposes, you can use the CAST() function. String values can be converted to a different character set using CONVERT(). See Section 12.10, “Cast Functions and Operators”. By default, string comparisons are not case sensitive and use the current character set. The default is latin1 (cp1252 West European), which also works well for English. •
= Equal: mysql> SELECT -> 0 mysql> SELECT -> 1 mysql> SELECT -> 1 mysql> SELECT -> 0 mysql> SELECT -> 1
1 = 0; '0' = 0; '0.0' = 0; '0.01' = 0; '.01' = 0.01;
For row comparisons, (a, b) = (x, y) is equivalent to:
1146
Comparison Functions and Operators
(a = x) AND (b = y)
•
<=> NULL-safe equal. This operator performs an equality comparison like the = operator, but returns 1 rather than NULL if both operands are NULL, and 0 rather than NULL if one operand is NULL. The <=> operator is equivalent to the standard SQL IS NOT DISTINCT FROM operator. mysql> SELECT -> 1, mysql> SELECT -> 1,
1 <=> 1, NULL <=> NULL, 1 <=> NULL; 1, 0 1 = 1, NULL = NULL, 1 = NULL; NULL, NULL
For row comparisons, (a, b) <=> (x, y) is equivalent to: (a <=> x) AND (b <=> y)
•
<>, != Not equal: mysql> SELECT '.01' <> '0.01'; -> 1 mysql> SELECT .01 <> '0.01'; -> 0 mysql> SELECT 'zapp' <> 'zappp'; -> 1
For row comparisons, (a, b) <> (x, y) and (a, b) != (x, y) are equivalent to: (a <> x) OR (b <> y)
•
<= Less than or equal: mysql> SELECT 0.1 <= 2; -> 1
For row comparisons, (a, b) <= (x, y) is equivalent to: (a < x) OR ((a = x) AND (b <= y))
•
< Less than: mysql> SELECT 2 < 2; -> 0
For row comparisons, (a, b) < (x, y) is equivalent to: (a < x) OR ((a = x) AND (b < y))
•
>= Greater than or equal:
1147
Comparison Functions and Operators
mysql> SELECT 2 >= 2; -> 1
For row comparisons, (a, b) >= (x, y) is equivalent to: (a > x) OR ((a = x) AND (b >= y))
•
> Greater than: mysql> SELECT 2 > 2; -> 0
For row comparisons, (a, b) > (x, y) is equivalent to: (a > x) OR ((a = x) AND (b > y))
•
IS boolean_value Tests a value against a boolean value, where boolean_value can be TRUE, FALSE, or UNKNOWN. mysql> SELECT 1 IS TRUE, 0 IS FALSE, NULL IS UNKNOWN; -> 1, 1, 1
•
IS NOT boolean_value Tests a value against a boolean value, where boolean_value can be TRUE, FALSE, or UNKNOWN. mysql> SELECT 1 IS NOT UNKNOWN, 0 IS NOT UNKNOWN, NULL IS NOT UNKNOWN; -> 1, 1, 0
•
IS NULL Tests whether a value is NULL. mysql> SELECT 1 IS NULL, 0 IS NULL, NULL IS NULL; -> 0, 0, 1
To work well with ODBC programs, MySQL supports the following extra features when using IS NULL: • If sql_auto_is_null variable is set to 1, then after a statement that successfully inserts an automatically generated AUTO_INCREMENT value, you can find that value by issuing a statement of the following form: SELECT * FROM tbl_name WHERE auto_col IS NULL
If the statement returns a row, the value returned is the same as if you invoked the LAST_INSERT_ID() function. For details, including the return value after a multiple-row insert, see Section 12.14, “Information Functions”. If no AUTO_INCREMENT value was successfully inserted, the SELECT statement returns no row. The behavior of retrieving an AUTO_INCREMENT value by using an IS NULL comparison can be disabled by setting sql_auto_is_null = 0. See Section 5.1.5, “Server System Variables”. The default value of sql_auto_is_null is 0 as of MySQL 5.5.3, and 1 for earlier versions. 1148
Comparison Functions and Operators
• For DATE and DATETIME columns that are declared as NOT NULL, you can find the special date '0000-00-00' by using a statement like this: SELECT * FROM tbl_name WHERE date_column IS NULL
This is needed to get some ODBC applications to work because ODBC does not support a '0000-00-00' date value. See Obtaining Auto-Increment Values, and the description for the FLAG_AUTO_IS_NULL option at Connector/ODBC Connection Parameters. •
IS NOT NULL Tests whether a value is not NULL. mysql> SELECT 1 IS NOT NULL, 0 IS NOT NULL, NULL IS NOT NULL; -> 1, 1, 0
• expr BETWEEN min AND max If expr is greater than or equal to min and expr is less than or equal to max, BETWEEN returns 1, otherwise it returns 0. This is equivalent to the expression (min <= expr AND expr <= max) if all the arguments are of the same type. Otherwise type conversion takes place according to the rules described in Section 12.2, “Type Conversion in Expression Evaluation”, but applied to all the three arguments. mysql> SELECT -> 1, mysql> SELECT -> 0 mysql> SELECT -> 1 mysql> SELECT -> 1 mysql> SELECT -> 0
2 BETWEEN 1 AND 3, 2 BETWEEN 3 and 1; 0 1 BETWEEN 2 AND 3; 'b' BETWEEN 'a' AND 'c'; 2 BETWEEN 2 AND '3'; 2 BETWEEN 2 AND 'x-3';
For best results when using BETWEEN with date or time values, use CAST() to explicitly convert the values to the desired data type. Examples: If you compare a DATETIME to two DATE values, convert the DATE values to DATETIME values. If you use a string constant such as '2001-1-1' in a comparison to a DATE, cast the string to a DATE. • expr NOT BETWEEN min AND max This is the same as NOT (expr BETWEEN min AND max). •
COALESCE(value,...) Returns the first non-NULL value in the list, or NULL if there are no non-NULL values. The return type of COALESCE() is the aggregated type of the argument types. mysql> SELECT COALESCE(NULL,1); -> 1 mysql> SELECT COALESCE(NULL,NULL,NULL); -> NULL
• GREATEST(value1,value2,...) With two or more arguments, returns the largest (maximum-valued) argument. The arguments are compared using the same rules as for LEAST().
1149
Comparison Functions and Operators
mysql> SELECT GREATEST(2,0); -> 2 mysql> SELECT GREATEST(34.0,3.0,5.0,767.0); -> 767.0 mysql> SELECT GREATEST('B','A','C'); -> 'C'
GREATEST() returns NULL if any argument is NULL. • expr IN (value,...) Returns 1 if expr is equal to any of the values in the IN list, else returns 0. If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants. Otherwise, type conversion takes place according to the rules described in Section 12.2, “Type Conversion in Expression Evaluation”, but applied to all the arguments. mysql> SELECT 2 IN (0,3,5,7); -> 0 mysql> SELECT 'wefwf' IN ('wee','wefwf','weg'); -> 1
IN can be used to compare row constructors: mysql> SELECT (3,4) IN ((1,2), (3,4)); -> 1 mysql> SELECT (3,4) IN ((1,2), (3,5)); -> 0
You should never mix quoted and unquoted values in an IN list because the comparison rules for quoted values (such as strings) and unquoted values (such as numbers) differ. Mixing types may therefore lead to inconsistent results. For example, do not write an IN expression like this: SELECT val1 FROM tbl1 WHERE val1 IN (1,2,'a');
Instead, write it like this: SELECT val1 FROM tbl1 WHERE val1 IN ('1','2','a');
The number of values in the IN list is only limited by the max_allowed_packet value. To comply with the SQL standard, IN returns NULL not only if the expression on the left hand side is NULL, but also if no match is found in the list and one of the expressions in the list is NULL. IN() syntax can also be used to write certain types of subqueries. See Section 13.2.10.3, “Subqueries with ANY, IN, or SOME”. • expr NOT IN (value,...) This is the same as NOT (expr IN (value,...)). • ISNULL(expr) If expr is NULL, ISNULL() returns 1, otherwise it returns 0. mysql> SELECT ISNULL(1+1); -> 0 mysql> SELECT ISNULL(1/0); -> 1
1150
Logical Operators
ISNULL() can be used instead of = to test whether a value is NULL. (Comparing a value to NULL using = always yields NULL.) The ISNULL() function shares some special behaviors with the IS NULL comparison operator. See the description of IS NULL. • INTERVAL(N,N1,N2,N3,...) Returns 0 if N < N1, 1 if N < N2 and so on or -1 if N is NULL. All arguments are treated as integers. It is required that N1 < N2 < N3 < ... < Nn for this function to work correctly. This is because a binary search is used (very fast). mysql> SELECT INTERVAL(23, 1, 15, 17, 30, 44, 200); -> 3 mysql> SELECT INTERVAL(10, 1, 10, 100, 1000); -> 2 mysql> SELECT INTERVAL(22, 23, 30, 44, 200); -> 0
• LEAST(value1,value2,...) With two or more arguments, returns the smallest (minimum-valued) argument. The arguments are compared using the following rules: • If any argument is NULL, the result is NULL. No comparison is needed. • If all arguments are integer-valued, they are compared as integers. • If at least one argument is double precision, they are compared as double-precision values. Otherwise, if at least one argument is a DECIMAL value, they are compared as DECIMAL values. • If the arguments comprise a mix of numbers and strings, they are compared as numbers. • If any argument is a nonbinary (character) string, the arguments are compared as nonbinary strings. • In all other cases, the arguments are compared as binary strings. The return type of LEAST() is the aggregated type of the comparison argument types. mysql> SELECT LEAST(2,0); -> 0 mysql> SELECT LEAST(34.0,3.0,5.0,767.0); -> 3.0 mysql> SELECT LEAST('B','A','C'); -> 'A'
12.3.3 Logical Operators Table 12.4 Logical Operators Name
Description
AND, &&
Logical AND
NOT, !
Negates value
||, OR
Logical OR
XOR
Logical XOR
In SQL, all logical operators evaluate to TRUE, FALSE, or NULL (UNKNOWN). In MySQL, these are implemented as 1 (TRUE), 0 (FALSE), and NULL. Most of this is common to different SQL database servers, although some servers may return any nonzero value for TRUE.
1151
Logical Operators
MySQL evaluates any nonzero, non-NULL value to TRUE. For example, the following statements all assess to TRUE: mysql> SELECT 10 IS TRUE; -> 1 mysql> SELECT -10 IS TRUE; -> 1 mysql> SELECT 'string' IS NOT NULL; -> 1
•
NOT, ! Logical NOT. Evaluates to 1 if the operand is 0, to 0 if the operand is nonzero, and NOT NULL returns NULL. mysql> SELECT NOT 10; -> 0 mysql> SELECT NOT 0; -> 1 mysql> SELECT NOT NULL; -> NULL mysql> SELECT ! (1+1); -> 0 mysql> SELECT ! 1+1; -> 1
The last example produces 1 because the expression evaluates the same way as (!1)+1. •
AND, && Logical AND. Evaluates to 1 if all operands are nonzero and not NULL, to 0 if one or more operands are 0, otherwise NULL is returned. mysql> SELECT 1 AND 1; -> 1 mysql> SELECT 1 AND 0; -> 0 mysql> SELECT 1 AND NULL; -> NULL mysql> SELECT 0 AND NULL; -> 0 mysql> SELECT NULL AND 0; -> 0
•
OR, || Logical OR. When both operands are non-NULL, the result is 1 if any operand is nonzero, and 0 otherwise. With a NULL operand, the result is 1 if the other operand is nonzero, and NULL otherwise. If both operands are NULL, the result is NULL. mysql> SELECT 1 -> 1 mysql> SELECT 1 -> 1 mysql> SELECT 0 -> 0 mysql> SELECT 0 -> NULL mysql> SELECT 1 -> 1
OR 1; OR 0; OR 0; OR NULL; OR NULL;
• XOR Logical XOR. Returns NULL if either operand is NULL. For non-NULL operands, evaluates to 1 if an odd number of operands is nonzero, otherwise 0 is returned.
1152
Assignment Operators
mysql> SELECT 1 -> 0 mysql> SELECT 1 -> 1 mysql> SELECT 1 -> NULL mysql> SELECT 1 -> 1
XOR 1; XOR 0; XOR NULL; XOR 1 XOR 1;
a XOR b is mathematically equal to (a AND (NOT b)) OR ((NOT a) and b).
12.3.4 Assignment Operators Table 12.5 Assignment Operators Name
Description
=
Assign a value (as part of a SET statement, or as part of the SET clause in an UPDATE statement)
:=
Assign a value
•
:= Assignment operator. Causes the user variable on the left hand side of the operator to take on the value to its right. The value on the right hand side may be a literal value, another variable storing a value, or any legal expression that yields a scalar value, including the result of a query (provided that this value is a scalar value). You can perform multiple assignments in the same SET statement. You can perform multiple assignments in the same statement. Unlike =, the := operator is never interpreted as a comparison operator. This means you can use := in any valid SQL statement (not just in SET statements) to assign a value to a variable. mysql> SELECT @var1, @var2; -> NULL, NULL mysql> SELECT @var1 := 1, @var2; -> 1, NULL mysql> SELECT @var1, @var2; -> 1, NULL mysql> SELECT @var1, @var2 := @var1; -> 1, 1 mysql> SELECT @var1, @var2; -> 1, 1 mysql> SELECT @var1:=COUNT(*) FROM t1; -> 4 mysql> SELECT @var1; -> 4
You can make value assignments using := in other statements besides SELECT, such as UPDATE, as shown here: mysql> SELECT @var1; -> 4 mysql> SELECT * FROM t1; -> 1, 3, 5, 7 mysql> UPDATE t1 SET c1 = 2 WHERE c1 = @var1:= 1; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> SELECT @var1; -> 1 mysql> SELECT * FROM t1;
1153
Control Flow Functions
-> 2, 3, 5, 7
While it is also possible both to set and to read the value of the same variable in a single SQL statement using the := operator, this is not recommended. Section 9.4, “User-Defined Variables”, explains why you should avoid doing this. •
= This operator is used to perform value assignments in two cases, described in the next two paragraphs. Within a SET statement, = is treated as an assignment operator that causes the user variable on the left hand side of the operator to take on the value to its right. (In other words, when used in a SET statement, = is treated identically to :=.) The value on the right hand side may be a literal value, another variable storing a value, or any legal expression that yields a scalar value, including the result of a query (provided that this value is a scalar value). You can perform multiple assignments in the same SET statement. In the SET clause of an UPDATE statement, = also acts as an assignment operator; in this case, however, it causes the column named on the left hand side of the operator to assume the value given to the right, provided any WHERE conditions that are part of the UPDATE are met. You can make multiple assignments in the same SET clause of an UPDATE statement. In any other context, = is treated as a comparison operator. mysql> SELECT @var1, @var2; -> NULL, NULL mysql> SELECT @var1 := 1, @var2; -> 1, NULL mysql> SELECT @var1, @var2; -> 1, NULL mysql> SELECT @var1, @var2 := @var1; -> 1, 1 mysql> SELECT @var1, @var2; -> 1, 1
For more information, see Section 13.7.4.1, “SET Syntax for Variable Assignment”, Section 13.2.11, “UPDATE Syntax”, and Section 13.2.10, “Subquery Syntax”.
12.4 Control Flow Functions Table 12.6 Flow Control Operators Name
Description
CASE
Case operator
IF()
If/else construct
IFNULL()
Null if/else construct
NULLIF()
Return NULL if expr1 = expr2
• CASE value WHEN [compare_value] THEN result [WHEN [compare_value] THEN result ...] [ELSE result] END CASE WHEN [condition] THEN result [WHEN [condition] THEN result ...] [ELSE result] END The first CASE syntax returns the result for the first value=compare_value comparison that is true. The second syntax returns the result for the first condition that is true. If no comparison or condition is true, the result after ELSE is returned, or NULL if there is no ELSE part. 1154
Control Flow Functions
Note The syntax of the CASE expression described here differs slightly from that of the SQL CASE statement described in Section 13.6.5.1, “CASE Syntax”, for use inside stored programs. The CASE statement cannot have an ELSE NULL clause, and it is terminated with END CASE instead of END. The return type of a CASE expression result is the aggregated type of all result values. mysql> SELECT CASE 1 WHEN 1 THEN 'one' -> WHEN 2 THEN 'two' ELSE 'more' END; -> 'one' mysql> SELECT CASE WHEN 1>0 THEN 'true' ELSE 'false' END; -> 'true' mysql> SELECT CASE BINARY 'B' -> WHEN 'a' THEN 1 WHEN 'b' THEN 2 END; -> NULL
• IF(expr1,expr2,expr3) If expr1 is TRUE (expr1 <> 0 and expr1 <> NULL), IF() returns expr2. Otherwise, it returns expr3. Note There is also an IF statement, which differs from the IF() function described here. See Section 13.6.5.2, “IF Syntax”. If only one of expr2 or expr3 is explicitly NULL, the result type of the IF() function is the type of the non-NULL expression. The default return type of IF() (which may matter when it is stored into a temporary table) is calculated as follows: • If expr2 or expr3 produce a string, the result is a string. If expr2 and expr3 are both strings, the result is case sensitive if either string is case sensitive. • If expr2 or expr3 produce a floating-point value, the result is a floating-point value. • If expr2 or expr3 produce an integer, the result is an integer. mysql> SELECT IF(1>2,2,3); -> 3 mysql> SELECT IF(1<2,'yes','no'); -> 'yes' mysql> SELECT IF(STRCMP('test','test1'),'no','yes'); -> 'no'
•
IFNULL(expr1,expr2) If expr1 is not NULL, IFNULL() returns expr1; otherwise it returns expr2. mysql> SELECT IFNULL(1,0); -> 1 mysql> SELECT IFNULL(NULL,10); -> 10 mysql> SELECT IFNULL(1/0,10); -> 10 mysql> SELECT IFNULL(1/0,'yes'); -> 'yes'
1155
String Functions
The default return type of IFNULL(expr1,expr2) is the more “general” of the two expressions, in the order STRING, REAL, or INTEGER. Consider the case of a table based on expressions or where MySQL must internally store a value returned by IFNULL() in a temporary table:
mysql> CREATE TABLE tmp SELECT IFNULL(1,'test') AS test; mysql> DESCRIBE tmp; +-------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+--------------+------+-----+---------+-------+ | test | varbinary(4) | NO | | | | +-------+--------------+------+-----+---------+-------+
In this example, the type of the test column is VARBINARY(4) (a string type). • NULLIF(expr1,expr2) Returns NULL if expr1 = expr2 is true, otherwise returns expr1. This is the same as CASE WHEN expr1 = expr2 THEN NULL ELSE expr1 END. The return value has the same type as the first argument. mysql> SELECT NULLIF(1,1); -> NULL mysql> SELECT NULLIF(1,2); -> 1
Note MySQL evaluates expr1 twice if the arguments are not equal.
12.5 String Functions Table 12.7 String Operators Name
Description
ASCII()
Return numeric value of left-most character
BIN()
Return a string containing binary representation of a number
BIT_LENGTH()
Return length of argument in bits
CHAR()
Return the character for each integer passed
CHAR_LENGTH()
Return number of characters in argument
CHARACTER_LENGTH()
Synonym for CHAR_LENGTH()
CONCAT()
Return concatenated string
CONCAT_WS()
Return concatenate with separator
ELT()
Return string at index number
EXPORT_SET()
Return a string such that for every bit set in the value bits, you get an on string and for every unset bit, you get an off string
FIELD()
Return the index (position) of the first argument in the subsequent arguments
FIND_IN_SET()
Return the index position of the first argument within the second argument
FORMAT()
Return a number formatted to specified number of decimal places
1156
String Functions
Name
Description
HEX()
Return a hexadecimal representation of a decimal or string value
INSERT()
Insert a substring at the specified position up to the specified number of characters
INSTR()
Return the index of the first occurrence of substring
LCASE()
Synonym for LOWER()
LEFT()
Return the leftmost number of characters as specified
LENGTH()
Return the length of a string in bytes
LIKE
Simple pattern matching
LOAD_FILE()
Load the named file
LOCATE()
Return the position of the first occurrence of substring
LOWER()
Return the argument in lowercase
LPAD()
Return the string argument, left-padded with the specified string
LTRIM()
Remove leading spaces
MAKE_SET()
Return a set of comma-separated strings that have the corresponding bit in bits set
MATCH
Perform full-text search
MID()
Return a substring starting from the specified position
NOT LIKE
Negation of simple pattern matching
NOT REGEXP
Negation of REGEXP
OCT()
Return a string containing octal representation of a number
OCTET_LENGTH()
Synonym for LENGTH()
ORD()
Return character code for leftmost character of the argument
POSITION()
Synonym for LOCATE()
QUOTE()
Escape the argument for use in an SQL statement
REGEXP
Pattern matching using regular expressions
REPEAT()
Repeat a string the specified number of times
REPLACE()
Replace occurrences of a specified string
REVERSE()
Reverse the characters in a string
RIGHT()
Return the specified rightmost number of characters
RLIKE
Synonym for REGEXP
RPAD()
Append string the specified number of times
RTRIM()
Remove trailing spaces
SOUNDEX()
Return a soundex string
SOUNDS LIKE
Compare sounds
SPACE()
Return a string of the specified number of spaces
STRCMP()
Compare two strings
SUBSTR()
Return the substring as specified
SUBSTRING()
Return the substring as specified
SUBSTRING_INDEX()
Return a substring from a string before the specified number of occurrences of the delimiter
1157
String Functions
Name
Description
TRIM()
Remove leading and trailing spaces
UCASE()
Synonym for UPPER()
UNHEX()
Return a string containing hex representation of a number
UPPER()
Convert to uppercase
String-valued functions return NULL if the length of the result would be greater than the value of the max_allowed_packet system variable. See Section 5.1.1, “Configuring the Server”. For functions that operate on string positions, the first position is numbered 1. For functions that take length arguments, noninteger arguments are rounded to the nearest integer. • ASCII(str) Returns the numeric value of the leftmost character of the string str. Returns 0 if str is the empty string. Returns NULL if str is NULL. ASCII() works for 8-bit characters. mysql> SELECT ASCII('2'); -> 50 mysql> SELECT ASCII(2); -> 50 mysql> SELECT ASCII('dx'); -> 100
See also the ORD() function. • BIN(N) Returns a string representation of the binary value of N, where N is a longlong (BIGINT) number. This is equivalent to CONV(N,10,2). Returns NULL if N is NULL. mysql> SELECT BIN(12); -> '1100'
• BIT_LENGTH(str) Returns the length of the string str in bits. mysql> SELECT BIT_LENGTH('text'); -> 32
• CHAR(N,... [USING charset_name]) CHAR() interprets each argument N as an integer and returns a string consisting of the characters given by the code values of those integers. NULL values are skipped. mysql> SELECT CHAR(77,121,83,81,'76'); -> 'MySQL' mysql> SELECT CHAR(77,77.3,'77.3'); -> 'MMM'
CHAR() arguments larger than 255 are converted into multiple result bytes. For example, CHAR(256) is equivalent to CHAR(1,0), and CHAR(256*256) is equivalent to CHAR(1,0,0): mysql> SELECT HEX(CHAR(1,0)), HEX(CHAR(256)); +----------------+----------------+ | HEX(CHAR(1,0)) | HEX(CHAR(256)) | +----------------+----------------+ | 0100 | 0100 |
1158
String Functions
+----------------+----------------+ mysql> SELECT HEX(CHAR(1,0,0)), HEX(CHAR(256*256)); +------------------+--------------------+ | HEX(CHAR(1,0,0)) | HEX(CHAR(256*256)) | +------------------+--------------------+ | 010000 | 010000 | +------------------+--------------------+
By default, CHAR() returns a binary string. To produce a string in a given character set, use the optional USING clause: mysql> SELECT CHARSET(CHAR(X'65')), CHARSET(CHAR(X'65' USING utf8)); +----------------------+---------------------------------+ | CHARSET(CHAR(X'65')) | CHARSET(CHAR(X'65' USING utf8)) | +----------------------+---------------------------------+ | binary | utf8 | +----------------------+---------------------------------+
If USING is given and the result string is illegal for the given character set, a warning is issued. Also, if strict SQL mode is enabled, the result from CHAR() becomes NULL. • CHAR_LENGTH(str) Returns the length of the string str, measured in characters. A multibyte character counts as a single character. This means that for a string containing five 2-byte characters, LENGTH() returns 10, whereas CHAR_LENGTH() returns 5. • CHARACTER_LENGTH(str) CHARACTER_LENGTH() is a synonym for CHAR_LENGTH(). •
CONCAT(str1,str2,...) Returns the string that results from concatenating the arguments. May have one or more arguments. If all arguments are nonbinary strings, the result is a nonbinary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent string form. This is a nonbinary string as of MySQL 5.5.3. Before 5.5.3, it is a binary string; to avoid that and produce a nonbinary string, you can use an explicit type cast, as in this example: SELECT CONCAT(CAST(int_col AS CHAR), char_col);
CONCAT() returns NULL if any argument is NULL. mysql> SELECT CONCAT('My', 'S', 'QL'); -> 'MySQL' mysql> SELECT CONCAT('My', NULL, 'QL'); -> NULL mysql> SELECT CONCAT(14.3); -> '14.3'
For quoted strings, concatenation can be performed by placing the strings next to each other: mysql> SELECT 'My' 'S' 'QL'; -> 'MySQL'
• CONCAT_WS(separator,str1,str2,...) CONCAT_WS() stands for Concatenate With Separator and is a special form of CONCAT(). The first argument is the separator for the rest of the arguments. The separator is added between the strings to be concatenated. The separator can be a string, as can the rest of the arguments. If the separator is NULL, the result is NULL.
1159
String Functions
mysql> SELECT CONCAT_WS(',','First name','Second name','Last Name'); -> 'First name,Second name,Last Name' mysql> SELECT CONCAT_WS(',','First name',NULL,'Last Name'); -> 'First name,Last Name'
CONCAT_WS() does not skip empty strings. However, it does skip any NULL values after the separator argument. • ELT(N,str1,str2,str3,...) ELT() returns the Nth element of the list of strings: str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less than 1 or greater than the number of arguments. ELT() is the complement of FIELD(). mysql> SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo'); -> 'ej' mysql> SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo'); -> 'foo'
• EXPORT_SET(bits,on,off[,separator[,number_of_bits]]) Returns a string such that for every bit set in the value bits, you get an on string and for every bit not set in the value, you get an off string. Bits in bits are examined from right to left (from low-order to high-order bits). Strings are added to the result from left to right, separated by the separator string (the default being the comma character ,). The number of bits examined is given by number_of_bits, which has a default of 64 if not specified. number_of_bits is silently clipped to 64 if larger than 64. It is treated as an unsigned integer, so a value of −1 is effectively the same as 64. mysql> SELECT EXPORT_SET(5,'Y','N',',',4); -> 'Y,N,Y,N' mysql> SELECT EXPORT_SET(6,'1','0',',',10); -> '0,1,1,0,0,0,0,0,0,0'
• FIELD(str,str1,str2,str3,...) Returns the index (position) of str in the str1, str2, str3, ... list. Returns 0 if str is not found. If all arguments to FIELD() are strings, all arguments are compared as strings. If all arguments are numbers, they are compared as numbers. Otherwise, the arguments are compared as double. If str is NULL, the return value is 0 because NULL fails equality comparison with any value. FIELD() is the complement of ELT(). mysql> SELECT FIELD('ej', 'Hej', 'ej', 'Heja', 'hej', 'foo'); -> 2 mysql> SELECT FIELD('fo', 'Hej', 'ej', 'Heja', 'hej', 'foo'); -> 0
• FIND_IN_SET(str,strlist) Returns a value in the range of 1 to N if the string str is in the string list strlist consisting of N substrings. A string list is a string composed of substrings separated by , characters. If the first argument is a constant string and the second is a column of type SET, the FIND_IN_SET() function is optimized to use bit arithmetic. Returns 0 if str is not in strlist or if strlist is the empty string. Returns NULL if either argument is NULL. This function does not work properly if the first argument contains a comma (,) character. mysql> SELECT FIND_IN_SET('b','a,b,c,d'); -> 2
1160
String Functions
• FORMAT(X,D[,locale]) Formats the number X to a format like '#,###,###.##', rounded to D decimal places, and returns the result as a string. If D is 0, the result has no decimal point or fractional part. The optional third parameter enables a locale to be specified to be used for the result number's decimal point, thousands separator, and grouping between separators. Permissible locale values are the same as the legal values for the lc_time_names system variable (see Section 10.7, “MySQL Server Locale Support”). If no locale is specified, the default is 'en_US'. mysql> SELECT FORMAT(12332.123456, 4); -> '12,332.1235' mysql> SELECT FORMAT(12332.1,4); -> '12,332.1000' mysql> SELECT FORMAT(12332.2,0); -> '12,332' mysql> SELECT FORMAT(12332.2,2,'de_DE'); -> '12.332,20'
• HEX(str), HEX(N) For a string argument str, HEX() returns a hexadecimal string representation of str where each byte of each character in str is converted to two hexadecimal digits. (Multibyte characters therefore become more than two digits.) The inverse of this operation is performed by the UNHEX() function. For a numeric argument N, HEX() returns a hexadecimal string representation of the value of N treated as a longlong (BIGINT) number. This is equivalent to CONV(N,10,16). The inverse of this operation is performed by CONV(HEX(N),16,10). mysql> SELECT X'616263', HEX('abc'), UNHEX(HEX('abc')); -> 'abc', 616263, 'abc' mysql> SELECT HEX(255), CONV(HEX(255),16,10); -> 'FF', 255
• INSERT(str,pos,len,newstr) Returns the string str, with the substring beginning at position pos and len characters long replaced by the string newstr. Returns the original string if pos is not within the length of the string. Replaces the rest of the string from position pos if len is not within the length of the rest of the string. Returns NULL if any argument is NULL. mysql> SELECT INSERT('Quadratic', 3, 4, 'What'); -> 'QuWhattic' mysql> SELECT INSERT('Quadratic', -1, 4, 'What'); -> 'Quadratic' mysql> SELECT INSERT('Quadratic', 3, 100, 'What'); -> 'QuWhat'
This function is multibyte safe. • INSTR(str,substr) Returns the position of the first occurrence of substring substr in string str. This is the same as the two-argument form of LOCATE(), except that the order of the arguments is reversed. mysql> SELECT INSTR('foobarbar', 'bar'); -> 4 mysql> SELECT INSTR('xbar', 'foobar'); -> 0
This function is multibyte safe, and is case sensitive only if at least one argument is a binary string. 1161
String Functions
• LCASE(str) LCASE() is a synonym for LOWER(). • LEFT(str,len) Returns the leftmost len characters from the string str, or NULL if any argument is NULL. mysql> SELECT LEFT('foobarbar', 5); -> 'fooba'
This function is multibyte safe. • LENGTH(str) Returns the length of the string str, measured in bytes. A multibyte character counts as multiple bytes. This means that for a string containing five 2-byte characters, LENGTH() returns 10, whereas CHAR_LENGTH() returns 5. mysql> SELECT LENGTH('text'); -> 4
Note The Length() OpenGIS spatial function is named GLength() in MySQL. •
LOAD_FILE(file_name) Reads the file and returns the file contents as a string. To use this function, the file must be located on the server host, you must specify the full path name to the file, and you must have the FILE privilege. The file must be readable by all and its size less than max_allowed_packet bytes. If the secure_file_priv system variable is set to a nonempty directory name, the file to be loaded must be located in that directory. If the file does not exist or cannot be read because one of the preceding conditions is not satisfied, the function returns NULL. The character_set_filesystem system variable controls interpretation of file names that are given as literal strings. mysql> UPDATE t SET blob_col=LOAD_FILE('/tmp/picture') WHERE id=1;
• LOCATE(substr,str), LOCATE(substr,str,pos) The first syntax returns the position of the first occurrence of substring substr in string str. The second syntax returns the position of the first occurrence of substring substr in string str, starting at position pos. Returns 0 if substr is not in str. Returns NULL if substr or str is NULL. mysql> SELECT LOCATE('bar', 'foobarbar'); -> 4 mysql> SELECT LOCATE('xbar', 'foobar'); -> 0 mysql> SELECT LOCATE('bar', 'foobarbar', 5); -> 7
This function is multibyte safe, and is case-sensitive only if at least one argument is a binary string. • LOWER(str) 1162
String Functions
Returns the string str with all characters changed to lowercase according to the current character set mapping. The default is latin1 (cp1252 West European). mysql> SELECT LOWER('QUADRATICALLY'); -> 'quadratically'
LOWER() (and UPPER()) are ineffective when applied to binary strings (BINARY, VARBINARY, BLOB). To perform lettercase conversion, convert the string to a nonbinary string: mysql> SET @str = BINARY 'New York'; mysql> SELECT LOWER(@str), LOWER(CONVERT(@str USING latin1)); +-------------+-----------------------------------+ | LOWER(@str) | LOWER(CONVERT(@str USING latin1)) | +-------------+-----------------------------------+ | New York | new york | +-------------+-----------------------------------+
This function is multibyte safe. • LPAD(str,len,padstr) Returns the string str, left-padded with the string padstr to a length of len characters. If str is longer than len, the return value is shortened to len characters. mysql> SELECT LPAD('hi',4,'??'); -> '??hi' mysql> SELECT LPAD('hi',1,'??'); -> 'h'
• LTRIM(str) Returns the string str with leading space characters removed. mysql> SELECT LTRIM(' -> 'barbar'
barbar');
This function is multibyte safe. • MAKE_SET(bits,str1,str2,...) Returns a set value (a string containing substrings separated by , characters) consisting of the strings that have the corresponding bit in bits set. str1 corresponds to bit 0, str2 to bit 1, and so on. NULL values in str1, str2, ... are not appended to the result. mysql> SELECT MAKE_SET(1,'a','b','c'); -> 'a' mysql> SELECT MAKE_SET(1 | 4,'hello','nice','world'); -> 'hello,world' mysql> SELECT MAKE_SET(1 | 4,'hello','nice',NULL,'world'); -> 'hello' mysql> SELECT MAKE_SET(0,'a','b','c'); -> ''
• MID(str,pos,len) MID(str,pos,len) is a synonym for SUBSTRING(str,pos,len). • OCT(N) Returns a string representation of the octal value of N, where N is a longlong (BIGINT) number. This is equivalent to CONV(N,10,8). Returns NULL if N is NULL.
1163
String Functions
mysql> SELECT OCT(12); -> '14'
• OCTET_LENGTH(str) OCTET_LENGTH() is a synonym for LENGTH(). • ORD(str) If the leftmost character of the string str is a multibyte character, returns the code for that character, calculated from the numeric values of its constituent bytes using this formula: (1st byte code) + (2nd byte code * 256) + (3rd byte code * 2562) ...
If the leftmost character is not a multibyte character, ORD() returns the same value as the ASCII() function. mysql> SELECT ORD('2'); -> 50
• POSITION(substr IN str) POSITION(substr IN str) is a synonym for LOCATE(substr,str). • QUOTE(str) Quotes a string to produce a result that can be used as a properly escaped data value in an SQL statement. The string is returned enclosed by single quotation marks and with each instance of backslash (\), single quote ('), ASCII NUL, and Control+Z preceded by a backslash. If the argument is NULL, the return value is the word “NULL” without enclosing single quotation marks. mysql> SELECT QUOTE('Don\'t!'); -> 'Don\'t!' mysql> SELECT QUOTE(NULL); -> NULL
For comparison, see the quoting rules for literal strings and within the C API in Section 9.1.1, “String Literals”, and Section 23.8.7.53, “mysql_real_escape_string()”. • REPEAT(str,count) Returns a string consisting of the string str repeated count times. If count is less than 1, returns an empty string. Returns NULL if str or count are NULL. mysql> SELECT REPEAT('MySQL', 3); -> 'MySQLMySQLMySQL'
• REPLACE(str,from_str,to_str) Returns the string str with all occurrences of the string from_str replaced by the string to_str. REPLACE() performs a case-sensitive match when searching for from_str. mysql> SELECT REPLACE('www.mysql.com', 'w', 'Ww'); -> 'WwWwWw.mysql.com'
This function is multibyte safe. 1164
String Functions
• REVERSE(str) Returns the string str with the order of the characters reversed. mysql> SELECT REVERSE('abc'); -> 'cba'
This function is multibyte safe. • RIGHT(str,len) Returns the rightmost len characters from the string str, or NULL if any argument is NULL. mysql> SELECT RIGHT('foobarbar', 4); -> 'rbar'
This function is multibyte safe. • RPAD(str,len,padstr) Returns the string str, right-padded with the string padstr to a length of len characters. If str is longer than len, the return value is shortened to len characters. mysql> SELECT RPAD('hi',5,'?'); -> 'hi???' mysql> SELECT RPAD('hi',1,'?'); -> 'h'
This function is multibyte safe. • RTRIM(str) Returns the string str with trailing space characters removed. mysql> SELECT RTRIM('barbar -> 'barbar'
');
This function is multibyte safe. • SOUNDEX(str) Returns a soundex string from str. Two strings that sound almost the same should have identical soundex strings. A standard soundex string is four characters long, but the SOUNDEX() function returns an arbitrarily long string. You can use SUBSTRING() on the result to get a standard soundex string. All nonalphabetic characters in str are ignored. All international alphabetic characters outside the A-Z range are treated as vowels. Important When using SOUNDEX(), you should be aware of the following limitations: • This function, as currently implemented, is intended to work well with strings that are in the English language only. Strings in other languages may not produce reliable results. • This function is not guaranteed to provide consistent results with strings that use multibyte character sets, including utf-8. See Bug #22638 for more information. mysql> SELECT SOUNDEX('Hello'); -> 'H400' mysql> SELECT SOUNDEX('Quadratically');
1165
String Functions
-> 'Q36324'
Note This function implements the original Soundex algorithm, not the more popular enhanced version (also described by D. Knuth). The difference is that original version discards vowels first and duplicates second, whereas the enhanced version discards duplicates first and vowels second. • expr1 SOUNDS LIKE expr2 This is the same as SOUNDEX(expr1) = SOUNDEX(expr2). • SPACE(N) Returns a string consisting of N space characters. mysql> SELECT SPACE(6); -> ' '
• SUBSTR(str,pos), SUBSTR(str FROM pos), SUBSTR(str,pos,len), SUBSTR(str FROM pos FOR len) SUBSTR() is a synonym for SUBSTRING(). • SUBSTRING(str,pos), SUBSTRING(str FROM pos), SUBSTRING(str,pos,len), SUBSTRING(str FROM pos FOR len) The forms without a len argument return a substring from string str starting at position pos. The forms with a len argument return a substring len characters long from string str, starting at position pos. The forms that use FROM are standard SQL syntax. It is also possible to use a negative value for pos. In this case, the beginning of the substring is pos characters from the end of the string, rather than the beginning. A negative value may be used for pos in any of the forms of this function. For all forms of SUBSTRING(), the position of the first character in the string from which the substring is to be extracted is reckoned as 1. mysql> SELECT SUBSTRING('Quadratically',5); -> 'ratically' mysql> SELECT SUBSTRING('foobarbar' FROM 4); -> 'barbar' mysql> SELECT SUBSTRING('Quadratically',5,6); -> 'ratica' mysql> SELECT SUBSTRING('Sakila', -3); -> 'ila' mysql> SELECT SUBSTRING('Sakila', -5, 3); -> 'aki' mysql> SELECT SUBSTRING('Sakila' FROM -4 FOR 2); -> 'ki'
This function is multibyte safe. If len is less than 1, the result is the empty string. • SUBSTRING_INDEX(str,delim,count) Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim.
1166
String Functions
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2); -> 'www.mysql' mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2); -> 'mysql.com'
This function is multibyte safe. • TRIM([{BOTH | LEADING | TRAILING} [remstr] FROM] str), TRIM([remstr FROM] str) Returns the string str with all remstr prefixes or suffixes removed. If none of the specifiers BOTH, LEADING, or TRAILING is given, BOTH is assumed. remstr is optional and, if not specified, spaces are removed. mysql> SELECT TRIM(' bar '); -> 'bar' mysql> SELECT TRIM(LEADING 'x' FROM 'xxxbarxxx'); -> 'barxxx' mysql> SELECT TRIM(BOTH 'x' FROM 'xxxbarxxx'); -> 'bar' mysql> SELECT TRIM(TRAILING 'xyz' FROM 'barxxyz'); -> 'barx'
This function is multibyte safe. • UCASE(str) UCASE() is a synonym for UPPER(). • UNHEX(str) For a string argument str, UNHEX(str) interprets each pair of characters in the argument as a hexadecimal number and converts it to the byte represented by the number. The return value is a binary string. mysql> SELECT UNHEX('4D7953514C'); -> 'MySQL' mysql> SELECT X'4D7953514C'; -> 'MySQL' mysql> SELECT UNHEX(HEX('string')); -> 'string' mysql> SELECT HEX(UNHEX('1267')); -> '1267'
The characters in the argument string must be legal hexadecimal digits: '0' .. '9', 'A' .. 'F', 'a' .. 'f'. If the argument contains any nonhexadecimal digits, the result is NULL: mysql> SELECT UNHEX('GG'); +-------------+ | UNHEX('GG') | +-------------+ | NULL | +-------------+
A NULL result can occur if the argument to UNHEX() is a BINARY column, because values are padded with 0x00 bytes when stored but those bytes are not stripped on retrieval. For example, '41' is stored into a CHAR(3) column as '41 ' and retrieved as '41' (with the trailing pad space stripped), so UNHEX() for the column value returns 'A'. By contrast '41' is stored into a BINARY(3) column as '41\0' and retrieved as '41\0' (with the trailing pad 0x00 byte not stripped). '\0' is not a legal hexadecimal digit, so UNHEX() for the column value returns NULL. For a numeric argument N, the inverse of HEX(N) is not performed by UNHEX(). Use CONV(HEX(N),16,10) instead. See the description of HEX(). 1167
String Comparison Functions
• UPPER(str) Returns the string str with all characters changed to uppercase according to the current character set mapping. The default is latin1 (cp1252 West European). mysql> SELECT UPPER('Hej'); -> 'HEJ'
See the description of LOWER() for information that also applies to UPPER(), such as information about how to perform lettercase conversion of binary strings (BINARY, VARBINARY, BLOB) for which these functions are ineffective. This function is multibyte safe.
12.5.1 String Comparison Functions Table 12.8 String Comparison Operators Name
Description
LIKE
Simple pattern matching
NOT LIKE
Negation of simple pattern matching
STRCMP()
Compare two strings
If a string function is given a binary string as an argument, the resulting string is also a binary string. A number converted to a string is treated as a binary string. This affects only comparisons. Normally, if any expression in a string comparison is case sensitive, the comparison is performed in case-sensitive fashion. • expr LIKE pat [ESCAPE 'escape_char'] Pattern matching using an SQL pattern. Returns 1 (TRUE) or 0 (FALSE). If either expr or pat is NULL, the result is NULL. The pattern need not be a literal string. For example, it can be specified as a string expression or table column. Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator: mysql> SELECT 'ä' LIKE 'ae' COLLATE latin1_german2_ci; +-----------------------------------------+ | 'ä' LIKE 'ae' COLLATE latin1_german2_ci | +-----------------------------------------+ | 0 | +-----------------------------------------+ mysql> SELECT 'ä' = 'ae' COLLATE latin1_german2_ci; +--------------------------------------+ | 'ä' = 'ae' COLLATE latin1_german2_ci | +--------------------------------------+ | 1 | +--------------------------------------+
In particular, trailing spaces are significant, which is not true for CHAR or VARCHAR comparisons performed with the = operator: mysql> SELECT 'a' = 'a ', 'a' LIKE 'a '; +------------+---------------+ | 'a' = 'a ' | 'a' LIKE 'a ' | +------------+---------------+ | 1 | 0 | +------------+---------------+
1168
String Comparison Functions
1 row in set (0.00 sec)
With LIKE you can use the following two wildcard characters in the pattern: • % matches any number of characters, even zero characters. • _ matches exactly one character. mysql> SELECT 'David!' LIKE 'David_'; -> 1 mysql> SELECT 'David!' LIKE '%D%v%'; -> 1
To test for literal instances of a wildcard character, precede it by the escape character. If you do not specify the ESCAPE character, \ is assumed. • \% matches one % character. • \_ matches one _ character. mysql> SELECT 'David!' LIKE 'David\_'; -> 0 mysql> SELECT 'David_' LIKE 'David\_'; -> 1
To specify a different escape character, use the ESCAPE clause: mysql> SELECT 'David_' LIKE 'David|_' ESCAPE '|'; -> 1
The escape sequence should be empty or one character long. The expression must evaluate as a constant at execution time. If the NO_BACKSLASH_ESCAPES SQL mode is enabled, the sequence cannot be empty. The following two statements illustrate that string comparisons are not case sensitive unless one of the operands is a case sensitive (uses a case-sensitive collation or is a binary string): mysql> SELECT -> 1 mysql> SELECT -> 0 mysql> SELECT -> 0 mysql> SELECT -> 0
'abc' LIKE 'ABC'; 'abc' LIKE _latin1 'ABC' COLLATE latin1_general_cs; 'abc' LIKE _latin1 'ABC' COLLATE latin1_bin; 'abc' LIKE BINARY 'ABC';
As an extension to standard SQL, MySQL permits LIKE on numeric expressions. mysql> SELECT 10 LIKE '1%'; -> 1
Note Because MySQL uses C escape syntax in strings (for example, \n to represent a newline character), you must double any \ that you use in LIKE strings. For example, to search for \n, specify it as \\n. To search for \, specify it as \\\\; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.
1169
String Comparison Functions
Exception: At the end of the pattern string, backslash can be specified as \\. At the end of the string, backslash stands for itself because there is nothing following to escape. Suppose that a table contains the following values: mysql> SELECT filename FROM t1; +--------------+ | filename | +--------------+ | C: | | C:\ | | C:\Programs | | C:\Programs\ | +--------------+
To test for values that end with backslash, you can match the values using either of the following patterns: mysql> SELECT filename, filename LIKE '%\\' FROM t1; +--------------+---------------------+ | filename | filename LIKE '%\\' | +--------------+---------------------+ | C: | 0 | | C:\ | 1 | | C:\Programs | 0 | | C:\Programs\ | 1 | +--------------+---------------------+ mysql> SELECT filename, filename LIKE '%\\\\' FROM t1; +--------------+-----------------------+ | filename | filename LIKE '%\\\\' | +--------------+-----------------------+ | C: | 0 | | C:\ | 1 | | C:\Programs | 0 | | C:\Programs\ | 1 | +--------------+-----------------------+
• expr NOT LIKE pat [ESCAPE 'escape_char'] This is the same as NOT (expr LIKE pat [ESCAPE 'escape_char']). Note Aggregate queries involving NOT LIKE comparisons with columns containing NULL may yield unexpected results. For example, consider the following table and data: CREATE TABLE foo (bar VARCHAR(10)); INSERT INTO foo VALUES (NULL), (NULL);
The query SELECT COUNT(*) FROM foo WHERE bar LIKE '%baz%'; returns 0. You might assume that SELECT COUNT(*) FROM foo WHERE bar NOT LIKE '%baz%'; would return 2. However, this is not the case: The second query returns 0. This is because NULL NOT LIKE expr always returns NULL, regardless of the value of expr. The same is true for aggregate queries involving NULL and comparisons using NOT RLIKE or NOT REGEXP. In such cases, you must test explicitly for NOT NULL using OR (and not AND), as shown here: SELECT COUNT(*) FROM foo WHERE bar NOT LIKE '%baz%' OR bar IS NULL;
1170
Regular Expressions
• STRCMP(expr1,expr2) STRCMP() returns 0 if the strings are the same, -1 if the first argument is smaller than the second according to the current sort order, and 1 otherwise. mysql> SELECT STRCMP('text', 'text2'); -> -1 mysql> SELECT STRCMP('text2', 'text'); -> 1 mysql> SELECT STRCMP('text', 'text'); -> 0
STRCMP() performs the comparison using the collation of the arguments. mysql> SET @s1 = _latin1 'x' COLLATE latin1_general_ci; mysql> SET @s2 = _latin1 'X' COLLATE latin1_general_ci; mysql> SET @s3 = _latin1 'x' COLLATE latin1_general_cs; mysql> SET @s4 = _latin1 'X' COLLATE latin1_general_cs; mysql> SELECT STRCMP(@s1, @s2), STRCMP(@s3, @s4); +------------------+------------------+ | STRCMP(@s1, @s2) | STRCMP(@s3, @s4) | +------------------+------------------+ | 0 | 1 | +------------------+------------------+
If the collations are incompatible, one of the arguments must be converted to be compatible with the other. See Section 10.1.8.4, “Collation Coercibility in Expressions”.
mysql> SELECT STRCMP(@s1, @s3); ERROR 1267 (HY000): Illegal mix of collations (latin1_general_ci,IMPLICIT) and (latin1_general_cs,IMPLICIT) for operation 'strcmp' mysql> SELECT STRCMP(@s1, @s3 COLLATE latin1_general_ci); +--------------------------------------------+ | STRCMP(@s1, @s3 COLLATE latin1_general_ci) | +--------------------------------------------+ | 0 | +--------------------------------------------+
12.5.2 Regular Expressions Table 12.9 String Regular Expression Operators Name
Description
NOT REGEXP
Negation of REGEXP
REGEXP
Pattern matching using regular expressions
RLIKE
Synonym for REGEXP
A regular expression is a powerful way of specifying a pattern for a complex search. MySQL uses Henry Spencer's implementation of regular expressions, which is aimed at conformance with POSIX 1003.2. MySQL uses the extended version to support pattern-matching operations performed with the REGEXP operator in SQL statements. This section summarizes, with examples, the special characters and constructs that can be used in MySQL for REGEXP operations. It does not contain all the details that can be found in Henry Spencer's regex(7) manual page. That manual page is included in MySQL source distributions, in the regex.7 file under the regex directory. See also Section 3.3.4.7, “Pattern Matching”.
Regular Expression Operators • expr NOT REGEXP pat, expr NOT RLIKE pat
1171
Regular Expressions
This is the same as NOT (expr REGEXP pat). •
expr REGEXP pat, expr RLIKE pat Performs a pattern match of a string expression expr against a pattern pat. The pattern can be an extended regular expression, the syntax for which is discussed later in this section. Returns 1 if expr matches pat; otherwise it returns 0. If either expr or pat is NULL, the result is NULL. RLIKE is a synonym for REGEXP, provided for mSQL compatibility. The pattern need not be a literal string. For example, it can be specified as a string expression or table column. Note Because MySQL uses the C escape syntax in strings (for example, \n to represent the newline character), you must double any \ that you use in your REGEXP strings. REGEXP is not case sensitive, except when used with binary strings. mysql> SELECT -> 1 mysql> SELECT -> 1 mysql> SELECT -> 1 mysql> SELECT -> 1
'Michael!' REGEXP '.*'; 'new*\n*line' REGEXP 'new\\*.\\*line'; 'a' REGEXP 'A', 'a' REGEXP BINARY 'A'; 0 'a' REGEXP '^[a-d]';
REGEXP and RLIKE use the character set and collations of the arguments when deciding the type of a character and performing the comparison. If the arguments have different character sets or collations, coercibility rules apply as described in Section 10.1.8.4, “Collation Coercibility in Expressions”. Warning The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multibyte safe and may produce unexpected results with multibyte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
Syntax of Regular Expressions A regular expression describes a set of strings. The simplest regular expression is one that has no special characters in it. For example, the regular expression hello matches hello and nothing else. Nontrivial regular expressions use certain special constructs so that they can match more than one string. For example, the regular expression hello|word matches either the string hello or the string word. As a more complex example, the regular expression B[an]*s matches any of the strings Bananas, Baaaaas, Bs, and any other string starting with a B, ending with an s, and containing any number of a or n characters in between. A regular expression for the REGEXP operator may use any of the following special characters and constructs: • ^ Match the beginning of a string.
1172
Regular Expressions
mysql> SELECT 'fo\nfo' REGEXP '^fo$'; mysql> SELECT 'fofo' REGEXP '^fo';
-> 0 -> 1
• $ Match the end of a string. mysql> SELECT 'fo\no' REGEXP '^fo\no$'; mysql> SELECT 'fo\no' REGEXP '^fo$';
-> 1 -> 0
• . Match any character (including carriage return and newline). mysql> SELECT 'fofo' REGEXP '^f.*$'; mysql> SELECT 'fo\r\nfo' REGEXP '^f.*$';
-> 1 -> 1
• a* Match any sequence of zero or more a characters. mysql> SELECT 'Ban' REGEXP '^Ba*n'; mysql> SELECT 'Baaan' REGEXP '^Ba*n'; mysql> SELECT 'Bn' REGEXP '^Ba*n';
-> 1 -> 1 -> 1
• a+ Match any sequence of one or more a characters. mysql> SELECT 'Ban' REGEXP '^Ba+n'; mysql> SELECT 'Bn' REGEXP '^Ba+n';
-> 1 -> 0
• a? Match either zero or one a character. mysql> SELECT 'Bn' REGEXP '^Ba?n'; mysql> SELECT 'Ban' REGEXP '^Ba?n'; mysql> SELECT 'Baan' REGEXP '^Ba?n';
-> 1 -> 1 -> 0
• de|abc Match either of the sequences de or abc. mysql> mysql> mysql> mysql> mysql> mysql>
SELECT SELECT SELECT SELECT SELECT SELECT
'pi' REGEXP 'pi|apa'; 'axe' REGEXP 'pi|apa'; 'apa' REGEXP 'pi|apa'; 'apa' REGEXP '^(pi|apa)$'; 'pi' REGEXP '^(pi|apa)$'; 'pix' REGEXP '^(pi|apa)$';
-> -> -> -> -> ->
1 0 1 1 1 0
• (abc)* Match zero or more instances of the sequence abc. mysql> SELECT 'pi' REGEXP '^(pi)*$'; mysql> SELECT 'pip' REGEXP '^(pi)*$'; mysql> SELECT 'pipi' REGEXP '^(pi)*$';
-> 1 -> 0 -> 1
• {1}, {2,3}
1173
Regular Expressions
{n} or {m,n} notation provides a more general way of writing regular expressions that match many occurrences of the previous atom (or “piece”) of the pattern. m and n are integers. • a* Can be written as a{0,}. • a+ Can be written as a{1,}. • a? Can be written as a{0,1}. To be more precise, a{n} matches exactly n instances of a. a{n,} matches n or more instances of a. a{m,n} matches m through n instances of a, inclusive. m and n must be in the range from 0 to RE_DUP_MAX (default 255), inclusive. If both m and n are given, m must be less than or equal to n. mysql> SELECT 'abcde' REGEXP 'a[bcd]{2}e'; mysql> SELECT 'abcde' REGEXP 'a[bcd]{3}e'; mysql> SELECT 'abcde' REGEXP 'a[bcd]{1,10}e';
-> 0 -> 1 -> 1
• [a-dX], [^a-dX] Matches any character that is (or is not, if ^ is used) either a, b, c, d or X. A - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself. mysql> mysql> mysql> mysql> mysql> mysql>
SELECT SELECT SELECT SELECT SELECT SELECT
'aXbc' REGEXP '[a-dXYZ]'; 'aXbc' REGEXP '^[a-dXYZ]$'; 'aXbc' REGEXP '^[a-dXYZ]+$'; 'aXbc' REGEXP '^[^a-dXYZ]+$'; 'gheis' REGEXP '^[^a-dXYZ]+$'; 'gheisa' REGEXP '^[^a-dXYZ]+$';
-> -> -> -> -> ->
1 0 1 0 1 0
• [.characters.] Within a bracket expression (written using [ and ]), matches the sequence of characters of that collating element. characters is either a single character or a character name like newline. The following table lists the permissible character names. The following table shows the permissible character names and the characters that they match. For characters given as numeric values, the values are represented in octal.
Name
Character
Name
Character
NUL
0
SOH
001
STX
002
ETX
003
EOT
004
ENQ
005
ACK
006
BEL
007
alert
007
BS
010
backspace
'\b'
HT
011
tab
'\t'
LF
012
1174
Regular Expressions
Name
Character
Name
Character
newline
'\n'
VT
013
vertical-tab
'\v'
FF
014
form-feed
'\f'
CR
015
carriage-return
'\r'
SO
016
SI
017
DLE
020
DC1
021
DC2
022
DC3
023
DC4
024
NAK
025
SYN
026
ETB
027
CAN
030
EM
031
SUB
032
ESC
033
IS4
034
FS
034
IS3
035
GS
035
IS2
036
RS
036
IS1
037
US
037
space
' '
exclamation-mark
'!'
quotation-mark
'"'
number-sign
'#'
dollar-sign
'$'
percent-sign
'%'
ampersand
'&'
apostrophe
'\''
left-parenthesis
'('
right-parenthesis ')'
asterisk
'*'
plus-sign
'+'
comma
','
hyphen
'-'
hyphen-minus
'-'
period
'.'
full-stop
'.'
slash
'/'
solidus
'/'
zero
'0'
one
'1'
two
'2'
three
'3'
four
'4'
five
'5'
six
'6'
seven
'7'
eight
'8'
nine
'9'
colon
':'
semicolon
';'
less-than-sign
'<'
equals-sign
'='
greater-than-sign '>'
question-mark
'?'
commercial-at
'@'
left-squarebracket
'['
backslash
'\\'
reverse-solidus
'\\'
right-squarebracket
']'
circumflex
'^'
circumflex-accent '^'
underscore
'_'
low-line
'_'
grave-accent
'`'
left-brace
'{'
left-curlybracket
'{'
vertical-line
'|'
right-brace
'}'
1175
Regular Expressions
Name
Character
Name
Character
right-curlybracket
'}'
tilde
'~'
DEL
177
mysql> SELECT '~' REGEXP '[[.~.]]'; mysql> SELECT '~' REGEXP '[[.tilde.]]';
-> 1 -> 1
• [=character_class=] Within a bracket expression (written using [ and ]), [=character_class=] represents an equivalence class. It matches all characters with the same collation value, including itself. For example, if o and (+) are the members of an equivalence class, [[=o=]], [[=(+)=]], and [o(+)] are all synonymous. An equivalence class may not be used as an endpoint of a range. • [:character_class:] Within a bracket expression (written using [ and ]), [:character_class:] represents a character class that matches all characters belonging to that class. The following table lists the standard class names. These names stand for the character classes defined in the ctype(3) manual page. A particular locale may provide other class names. A character class may not be used as an endpoint of a range. Character Class Name
Meaning
alnum
Alphanumeric characters
alpha
Alphabetic characters
blank
Whitespace characters
cntrl
Control characters
digit
Digit characters
graph
Graphic characters
lower
Lowercase alphabetic characters
print
Graphic or space characters
punct
Punctuation characters
space
Space, tab, newline, and carriage return
upper
Uppercase alphabetic characters
xdigit
Hexadecimal digit characters
mysql> SELECT 'justalnums' REGEXP '[[:alnum:]]+'; mysql> SELECT '!!' REGEXP '[[:alnum:]]+';
-> 1 -> 0
• [[:<:]], [[:>:]] These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_). mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]';
-> 1 -> 0
To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library 1176
Character Set and Collation of Function Results
interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one: mysql> SELECT '1+2' REGEXP '1+2'; mysql> SELECT '1+2' REGEXP '1\+2'; mysql> SELECT '1+2' REGEXP '1\\+2';
-> 0 -> 0 -> 1
12.5.3 Character Set and Collation of Function Results MySQL has many operators and functions that return a string. This section answers the question: What is the character set and collation of such a string? For simple functions that take string input and return a string result as output, the output's character set and collation are the same as those of the principal input value. For example, UPPER(X) returns a string with the same character string and collation as X. The same applies for INSTR(), LCASE(), LOWER(), LTRIM(), MID(), REPEAT(), REPLACE(), REVERSE(), RIGHT(), RPAD(), RTRIM(), SOUNDEX(), SUBSTRING(), TRIM(), UCASE(), and UPPER(). Note The REPLACE() function, unlike all other functions, always ignores the collation of the string input and performs a case-sensitive comparison. If a string input or function result is a binary string, the string has the binary character set and collation. This can be checked by using the CHARSET() and COLLATION() functions, both of which return binary for a binary string argument: mysql> SELECT CHARSET(BINARY 'a'), COLLATION(BINARY 'a'); +---------------------+-----------------------+ | CHARSET(BINARY 'a') | COLLATION(BINARY 'a') | +---------------------+-----------------------+ | binary | binary | +---------------------+-----------------------+
For operations that combine multiple string inputs and return a single string output, the “aggregation rules” of standard SQL apply for determining the collation of the result: • If an explicit COLLATE Y occurs, use Y. • If explicit COLLATE Y and COLLATE Z occur, raise an error. • Otherwise, if all collations are Y, use Y. • Otherwise, the result has no collation. For example, with CASE ... WHEN a THEN b WHEN b THEN c COLLATE X END, the resulting collation is X. The same applies for UNION, ||, CONCAT(), ELT(), GREATEST(), IF(), and LEAST(). For operations that convert to character data, the character set and collation of the strings that result from the operations are defined by the character_set_connection and collation_connection system variables that determine the default connection character set and collation (see Section 10.1.4, “Connection Character Sets and Collations”). This applies only to CAST(), CONV(), FORMAT(), HEX(), and SPACE(). If there is any question about the character set or collation of the result returned by a string function, use the CHARSET() or COLLATION() function to find out: mysql> SELECT USER(), CHARSET(USER()), COLLATION(USER()); +----------------+-----------------+-------------------+
1177
Numeric Functions and Operators
| USER() | CHARSET(USER()) | COLLATION(USER()) | +----------------+-----------------+-------------------+ | test@localhost | utf8 | utf8_general_ci | +----------------+-----------------+-------------------+ mysql> SELECT CHARSET(COMPRESS('abc')), COLLATION(COMPRESS('abc')); +--------------------------+----------------------------+ | CHARSET(COMPRESS('abc')) | COLLATION(COMPRESS('abc')) | +--------------------------+----------------------------+ | binary | binary | +--------------------------+----------------------------+
12.6 Numeric Functions and Operators Table 12.10 Numeric Functions and Operators Name
Description
ABS()
Return the absolute value
ACOS()
Return the arc cosine
ASIN()
Return the arc sine
ATAN()
Return the arc tangent
ATAN2(), ATAN()
Return the arc tangent of the two arguments
CEIL()
Return the smallest integer value not less than the argument
CEILING()
Return the smallest integer value not less than the argument
CONV()
Convert numbers between different number bases
COS()
Return the cosine
COT()
Return the cotangent
CRC32()
Compute a cyclic redundancy check value
DEGREES()
Convert radians to degrees
DIV
Integer division
/
Division operator
EXP()
Raise to the power of
FLOOR()
Return the largest integer value not greater than the argument
LN()
Return the natural logarithm of the argument
LOG()
Return the natural logarithm of the first argument
LOG10()
Return the base-10 logarithm of the argument
LOG2()
Return the base-2 logarithm of the argument
-
Minus operator
MOD()
Return the remainder
%, MOD
Modulo operator
PI()
Return the value of pi
+
Addition operator
POW()
Return the argument raised to the specified power
POWER()
Return the argument raised to the specified power
RADIANS()
Return argument converted to radians
RAND()
Return a random floating-point value
ROUND()
Round the argument
SIGN()
Return the sign of the argument
1178
Arithmetic Operators
Name
Description
SIN()
Return the sine of the argument
SQRT()
Return the square root of the argument
TAN()
Return the tangent of the argument
*
Multiplication operator
TRUNCATE()
Truncate to specified number of decimal places
-
Change the sign of the argument
12.6.1 Arithmetic Operators Table 12.11 Arithmetic Operators Name
Description
DIV
Integer division
/
Division operator
-
Minus operator
%, MOD
Modulo operator
+
Addition operator
*
Multiplication operator
-
Change the sign of the argument
The usual arithmetic operators are available. The result is determined according to the following rules: • In the case of -, +, and *, the result is calculated with BIGINT (64-bit) precision if both operands are integers. • If both operands are integers and any of them are unsigned, the result is an unsigned integer. For subtraction, if the NO_UNSIGNED_SUBTRACTION SQL mode is enabled, the result is signed even if any operand is unsigned. • If any of the operands of a +, -, /, *, % is a real or string value, the precision of the result is the precision of the operand with the maximum precision. • In division performed with /, the scale of the result when using two exact-value operands is the scale of the first operand plus the value of the div_precision_increment system variable (which is 4 by default). For example, the result of the expression 5.05 / 0.014 has a scale of six decimal places (360.714286). These rules are applied for each operation, such that nested calculations imply the precision of each component. Hence, (14620 / 9432456) / (24250 / 9432456), resolves first to (0.0014) / (0.0026), with the final result having 8 decimal places (0.60288653). Because of these rules and the way they are applied, care should be taken to ensure that components and subcomponents of a calculation use the appropriate level of precision. See Section 12.10, “Cast Functions and Operators”. For information about handling of overflow in numeric expression evaluation, see Section 11.2.6, “Outof-Range and Overflow Handling”. Arithmetic operators apply to numbers. For other types of values, alternative operations may be available. For example, to add date values, use DATE_ADD(); see Section 12.7, “Date and Time Functions”. •
+
1179
Arithmetic Operators
Addition: mysql> SELECT 3+5; -> 8
•
Subtraction: mysql> SELECT 3-5; -> -2
•
Unary minus. This operator changes the sign of the operand. mysql> SELECT - 2; -> -2
Note If this operator is used with a BIGINT, the return value is also a BIGINT. This means that you should avoid using - on integers that may have the value of 63 −2 . •
* Multiplication: mysql> SELECT 3*5; -> 15 mysql> SELECT 18014398509481984*18014398509481984.0; -> 324518553658426726783156020576256.0 mysql> SELECT 18014398509481984*18014398509481984; -> out-of-range error
The last expression produces an error because the result of the integer multiplication exceeds the 64-bit range of BIGINT calculations. (See Section 11.2, “Numeric Types”.) •
/ Division: mysql> SELECT 3/5; -> 0.60
Division by zero produces a NULL result: mysql> SELECT 102/(1-1); -> NULL
A division is calculated with BIGINT arithmetic only if performed in a context where its result is converted to an integer. • DIV Integer division. Discards from the division result any fractional part to the right of the decimal point. As of MySQL 5.5.3, if either operand has a noninteger type, the operands are converted to DECIMAL and divided using DECIMAL arithmetic before converting the result to BIGINT. If the result exceeds
1180
Mathematical Functions
BIGINT range, an error occurs. Before MySQL 5.5.3, incorrect results may occur for noninteger operands that exceed BIGINT range. mysql> SELECT 5 DIV 2, -5 DIV 2, 5 DIV -2, -5 DIV -2; -> 2, -2, -2, 2
• N % M, N MOD M Modulo operation. Returns the remainder of N divided by M. For more information, see the description for the MOD() function in Section 12.6.2, “Mathematical Functions”.
12.6.2 Mathematical Functions Table 12.12 Mathematical Functions Name
Description
ABS()
Return the absolute value
ACOS()
Return the arc cosine
ASIN()
Return the arc sine
ATAN()
Return the arc tangent
ATAN2(), ATAN()
Return the arc tangent of the two arguments
CEIL()
Return the smallest integer value not less than the argument
CEILING()
Return the smallest integer value not less than the argument
CONV()
Convert numbers between different number bases
COS()
Return the cosine
COT()
Return the cotangent
CRC32()
Compute a cyclic redundancy check value
DEGREES()
Convert radians to degrees
EXP()
Raise to the power of
FLOOR()
Return the largest integer value not greater than the argument
LN()
Return the natural logarithm of the argument
LOG()
Return the natural logarithm of the first argument
LOG10()
Return the base-10 logarithm of the argument
LOG2()
Return the base-2 logarithm of the argument
MOD()
Return the remainder
PI()
Return the value of pi
POW()
Return the argument raised to the specified power
POWER()
Return the argument raised to the specified power
RADIANS()
Return argument converted to radians
RAND()
Return a random floating-point value
ROUND()
Round the argument
SIGN()
Return the sign of the argument
SIN()
Return the sine of the argument
SQRT()
Return the square root of the argument
TAN()
Return the tangent of the argument
TRUNCATE()
Truncate to specified number of decimal places
1181
Mathematical Functions
All mathematical functions return NULL in the event of an error. • ABS(X) Returns the absolute value of X. mysql> SELECT ABS(2); -> 2 mysql> SELECT ABS(-32); -> 32
This function is safe to use with BIGINT values. • ACOS(X) Returns the arc cosine of X, that is, the value whose cosine is X. Returns NULL if X is not in the range -1 to 1. mysql> SELECT ACOS(1); -> 0 mysql> SELECT ACOS(1.0001); -> NULL mysql> SELECT ACOS(0); -> 1.5707963267949
• ASIN(X) Returns the arc sine of X, that is, the value whose sine is X. Returns NULL if X is not in the range -1 to 1. mysql> SELECT ASIN(0.2); -> 0.20135792079033 mysql> SELECT ASIN('foo'); +-------------+ | ASIN('foo') | +-------------+ | 0 | +-------------+ 1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS; +---------+------+-----------------------------------------+ | Level | Code | Message | +---------+------+-----------------------------------------+ | Warning | 1292 | Truncated incorrect DOUBLE value: 'foo' | +---------+------+-----------------------------------------+
• ATAN(X) Returns the arc tangent of X, that is, the value whose tangent is X. mysql> SELECT ATAN(2); -> 1.1071487177941 mysql> SELECT ATAN(-2); -> -1.1071487177941
• ATAN(Y,X), ATAN2(Y,X) Returns the arc tangent of the two variables X and Y. It is similar to calculating the arc tangent of Y / X, except that the signs of both arguments are used to determine the quadrant of the result. mysql> SELECT ATAN(-2,2); -> -0.78539816339745
1182
Mathematical Functions
mysql> SELECT ATAN2(PI(),0); -> 1.5707963267949
• CEIL(X) CEIL() is a synonym for CEILING(). • CEILING(X) Returns the smallest integer value not less than X. mysql> SELECT CEILING(1.23); -> 2 mysql> SELECT CEILING(-1.23); -> -1
For exact-value numeric arguments, the return value has an exact-value numeric type. For string or floating-point arguments, the return value has a floating-point type. • CONV(N,from_base,to_base) Converts numbers between different number bases. Returns a string representation of the number N, converted from base from_base to base to_base. Returns NULL if any argument is NULL. The argument N is interpreted as an integer, but may be specified as an integer or a string. The minimum base is 2 and the maximum base is 36. If from_base is a negative number, N is regarded as a signed number. Otherwise, N is treated as unsigned. CONV() works with 64-bit precision. mysql> SELECT CONV('a',16,2); -> '1010' mysql> SELECT CONV('6E',18,8); -> '172' mysql> SELECT CONV(-17,10,-18); -> '-H' mysql> SELECT CONV(10+'10'+'10'+X'0a',10,10); -> '40'
• COS(X) Returns the cosine of X, where X is given in radians. mysql> SELECT COS(PI()); -> -1
• COT(X) Returns the cotangent of X. mysql> SELECT COT(12); -> -1.5726734063977 mysql> SELECT COT(0); -> out-of-range error
• CRC32(expr) Computes a cyclic redundancy check value and returns a 32-bit unsigned value. The result is NULL if the argument is NULL. The argument is expected to be a string and (if possible) is treated as one if it is not. mysql> SELECT CRC32('MySQL'); -> 3259397556 mysql> SELECT CRC32('mysql'); -> 2501908538
1183
Mathematical Functions
• DEGREES(X) Returns the argument X, converted from radians to degrees. mysql> SELECT DEGREES(PI()); -> 180 mysql> SELECT DEGREES(PI() / 2); -> 90
• EXP(X) Returns the value of e (the base of natural logarithms) raised to the power of X. The inverse of this function is LOG() (using a single argument only) or LN(). mysql> SELECT EXP(2); -> 7.3890560989307 mysql> SELECT EXP(-2); -> 0.13533528323661 mysql> SELECT EXP(0); -> 1
• FLOOR(X) Returns the largest integer value not greater than X. mysql> SELECT FLOOR(1.23), FLOOR(-1.23); -> 1, -2
For exact-value numeric arguments, the return value has an exact-value numeric type. For string or floating-point arguments, the return value has a floating-point type. • FORMAT(X,D) Formats the number X to a format like '#,###,###.##', rounded to D decimal places, and returns the result as a string. For details, see Section 12.5, “String Functions”. • HEX(N_or_S) This function can be used to obtain a hexadecimal representation of a decimal number or a string; the manner in which it does so varies according to the argument's type. See this function's description in Section 12.5, “String Functions”, for details. • LN(X) Returns the natural logarithm of X; that is, the base-e logarithm of X. If X is less than or equal to 0, then NULL is returned. mysql> SELECT LN(2); -> 0.69314718055995 mysql> SELECT LN(-2); -> NULL
This function is synonymous with LOG(X). The inverse of this function is the EXP() function. • LOG(X), LOG(B,X) If called with one parameter, this function returns the natural logarithm of X. If X is less than or equal to 0, then NULL is returned. The inverse of this function (when called with a single argument) is the EXP() function. mysql> SELECT LOG(2);
1184
Mathematical Functions
-> 0.69314718055995 mysql> SELECT LOG(-2); -> NULL
If called with two parameters, this function returns the logarithm of X to the base B. If X is less than or equal to 0, or if B is less than or equal to 1, then NULL is returned. mysql> SELECT LOG(2,65536); -> 16 mysql> SELECT LOG(10,100); -> 2 mysql> SELECT LOG(1,100); -> NULL
LOG(B,X) is equivalent to LOG(X) / LOG(B). • LOG2(X) Returns the base-2 logarithm of X. mysql> SELECT LOG2(65536); -> 16 mysql> SELECT LOG2(-100); -> NULL
LOG2() is useful for finding out how many bits a number requires for storage. This function is equivalent to the expression LOG(X) / LOG(2). • LOG10(X) Returns the base-10 logarithm of X. mysql> SELECT LOG10(2); -> 0.30102999566398 mysql> SELECT LOG10(100); -> 2 mysql> SELECT LOG10(-100); -> NULL
LOG10(X) is equivalent to LOG(10,X). •
MOD(N,M), N % M, N MOD M Modulo operation. Returns the remainder of N divided by M. mysql> SELECT -> 4 mysql> SELECT -> 1 mysql> SELECT -> 2 mysql> SELECT -> 2
MOD(234, 10); 253 % 7; MOD(29,9); 29 MOD 9;
This function is safe to use with BIGINT values. MOD() also works on values that have a fractional part and returns the exact remainder after division: mysql> SELECT MOD(34.5,3); -> 1.5
MOD(N,0) returns NULL.
1185
Mathematical Functions
• PI() Returns the value of π (pi). The default number of decimal places displayed is seven, but MySQL uses the full double-precision value internally. mysql> SELECT PI(); -> 3.141593 mysql> SELECT PI()+0.000000000000000000; -> 3.141592653589793116
• POW(X,Y) Returns the value of X raised to the power of Y. mysql> SELECT POW(2,2); -> 4 mysql> SELECT POW(2,-2); -> 0.25
• POWER(X,Y) This is a synonym for POW(). • RADIANS(X) Returns the argument X, converted from degrees to radians. (Note that π radians equals 180 degrees.) mysql> SELECT RADIANS(90); -> 1.5707963267949
• RAND([N]) Returns a random floating-point value v in the range 0 <= v < 1.0. To obtain a random integer R in the range i <= R < j, use the expression FLOOR(i + RAND() * (j − i)). For example, to obtain a random integer in the range the range 7 <= R < 12, use the following statement: SELECT FLOOR(7 + (RAND() * 5));
If an integer argument N is specified, it is used as the seed value: • With a constant initializer argument, the seed is initialized once when the statement is prepared, prior to execution. • With a nonconstant initializer argument (such as a column name), the seed is initialized with the value for each invocation of RAND(). One implication of this behavior is that for equal argument values, RAND(N) returns the same value each time, and thus produces a repeatable sequence of column values. In the following example, the sequence of values produced by RAND(3) is the same both places it occurs.
mysql> CREATE TABLE t (i INT); Query OK, 0 rows affected (0.42 sec) mysql> INSERT INTO t VALUES(1),(2),(3); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> SELECT i, RAND() FROM t; +------+------------------+ | i | RAND() |
1186
Mathematical Functions
+------+------------------+ | 1 | 0.61914388706828 | | 2 | 0.93845168309142 | | 3 | 0.83482678498591 | +------+------------------+ 3 rows in set (0.00 sec) mysql> SELECT i, RAND(3) FROM t; +------+------------------+ | i | RAND(3) | +------+------------------+ | 1 | 0.90576975597606 | | 2 | 0.37307905813035 | | 3 | 0.14808605345719 | +------+------------------+ 3 rows in set (0.00 sec) mysql> SELECT i, RAND() FROM t; +------+------------------+ | i | RAND() | +------+------------------+ | 1 | 0.35877890638893 | | 2 | 0.28941420772058 | | 3 | 0.37073435016976 | +------+------------------+ 3 rows in set (0.00 sec) mysql> SELECT i, RAND(3) FROM t; +------+------------------+ | i | RAND(3) | +------+------------------+ | 1 | 0.90576975597606 | | 2 | 0.37307905813035 | | 3 | 0.14808605345719 | +------+------------------+ 3 rows in set (0.01 sec)
RAND() in a WHERE clause is evaluated for every row (when selecting from one table) or combination of rows (when selecting from a multiple-table join). Thus, for optimizer purposes, RAND() is not a constant value and cannot be used for index optimizations. For more information, see Section 8.2.1.14, “Function Call Optimization”. Use of a column with RAND() values in an ORDER BY or GROUP BY clause may yield unexpected results because for either clause a RAND() expression can be evaluated multiple times for the same row, each time returning a different result. If the goal is to retrieve rows in random order, you can use a statement like this: SELECT * FROM tbl_name ORDER BY RAND();
To select a random sample from a set of rows, combine ORDER BY RAND() with LIMIT: SELECT * FROM table1, table2 WHERE a=b AND c
RAND() is not meant to be a perfect random generator. It is a fast way to generate random numbers on demand that is portable between platforms for the same MySQL version. This function is unsafe for statement-based replication. Beginning with MySQL 5.5.2, a warning is logged if you use this function when binlog_format is set to STATEMENT. (Bug #49222) • ROUND(X), ROUND(X,D) Rounds the argument X to D decimal places. The rounding algorithm depends on the data type of X. D defaults to 0 if not specified. D can be negative to cause D digits left of the decimal point of the value X to become zero.
1187
Mathematical Functions
mysql> SELECT ROUND(-1.23); -> -1 mysql> SELECT ROUND(-1.58); -> -2 mysql> SELECT ROUND(1.58); -> 2 mysql> SELECT ROUND(1.298, 1); -> 1.3 mysql> SELECT ROUND(1.298, 0); -> 1 mysql> SELECT ROUND(23.298, -1); -> 20
The return value has the same type as the first argument (assuming that it is integer, double, or decimal). This means that for an integer argument, the result is an integer (no decimal places): mysql> SELECT ROUND(150.000,2), ROUND(150,2); +------------------+--------------+ | ROUND(150.000,2) | ROUND(150,2) | +------------------+--------------+ | 150.00 | 150 | +------------------+--------------+
ROUND() uses the following rules depending on the type of the first argument: • For exact-value numbers, ROUND() uses the “round half away from zero” or “round toward nearest” rule: A value with a fractional part of .5 or greater is rounded up to the next integer if positive or down to the next integer if negative. (In other words, it is rounded away from zero.) A value with a fractional part less than .5 is rounded down to the next integer if positive or up to the next integer if negative. • For approximate-value numbers, the result depends on the C library. On many systems, this means that ROUND() uses the "round to nearest even" rule: A value with any fractional part is rounded to the nearest even integer. The following example shows how rounding differs for exact and approximate values: mysql> SELECT ROUND(2.5), ROUND(25E-1); +------------+--------------+ | ROUND(2.5) | ROUND(25E-1) | +------------+--------------+ | 3 | 2 | +------------+--------------+
For more information, see Section 12.18, “Precision Math”. • SIGN(X) Returns the sign of the argument as -1, 0, or 1, depending on whether X is negative, zero, or positive. mysql> SELECT SIGN(-32); -> -1 mysql> SELECT SIGN(0); -> 0 mysql> SELECT SIGN(234); -> 1
• SIN(X) Returns the sine of X, where X is given in radians. mysql> SELECT SIN(PI()); -> 1.2246063538224e-16
1188
Date and Time Functions
mysql> SELECT ROUND(SIN(PI())); -> 0
• SQRT(X) Returns the square root of a nonnegative number X. mysql> SELECT SQRT(4); -> 2 mysql> SELECT SQRT(20); -> 4.4721359549996 mysql> SELECT SQRT(-16); -> NULL
• TAN(X) Returns the tangent of X, where X is given in radians. mysql> SELECT TAN(PI()); -> -1.2246063538224e-16 mysql> SELECT TAN(PI()+1); -> 1.5574077246549
• TRUNCATE(X,D) Returns the number X, truncated to D decimal places. If D is 0, the result has no decimal point or fractional part. D can be negative to cause D digits left of the decimal point of the value X to become zero. mysql> SELECT TRUNCATE(1.223,1); -> 1.2 mysql> SELECT TRUNCATE(1.999,1); -> 1.9 mysql> SELECT TRUNCATE(1.999,0); -> 1 mysql> SELECT TRUNCATE(-1.999,1); -> -1.9 mysql> SELECT TRUNCATE(122,-2); -> 100 mysql> SELECT TRUNCATE(10.28*100,0); -> 1028
All numbers are rounded toward zero.
12.7 Date and Time Functions This section describes the functions that can be used to manipulate temporal values. See Section 11.3, “Date and Time Types”, for a description of the range of values each date and time type has and the valid formats in which values may be specified. Table 12.13 Date/Time Functions Name
Description
ADDDATE()
Add time values (intervals) to a date value
ADDTIME()
Add time
CONVERT_TZ()
Convert from one time zone to another
CURDATE()
Return the current date
CURRENT_DATE(), CURRENT_DATE
Synonyms for CURDATE()
CURRENT_TIME(), CURRENT_TIME
Synonyms for CURTIME() 1189
Date and Time Functions
Name
Description
CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP
Synonyms for NOW()
CURTIME()
Return the current time
DATE()
Extract the date part of a date or datetime expression
DATE_ADD()
Add time values (intervals) to a date value
DATE_FORMAT()
Format date as specified
DATE_SUB()
Subtract a time value (interval) from a date
DATEDIFF()
Subtract two dates
DAY()
Synonym for DAYOFMONTH()
DAYNAME()
Return the name of the weekday
DAYOFMONTH()
Return the day of the month (0-31)
DAYOFWEEK()
Return the weekday index of the argument
DAYOFYEAR()
Return the day of the year (1-366)
EXTRACT()
Extract part of a date
FROM_DAYS()
Convert a day number to a date
FROM_UNIXTIME()
Format Unix timestamp as a date
GET_FORMAT()
Return a date format string
HOUR()
Extract the hour
LAST_DAY
Return the last day of the month for the argument
LOCALTIME(), LOCALTIME
Synonym for NOW()
LOCALTIMESTAMP, LOCALTIMESTAMP()
Synonym for NOW()
MAKEDATE()
Create a date from the year and day of year
MAKETIME()
Create time from hour, minute, second
MICROSECOND()
Return the microseconds from argument
MINUTE()
Return the minute from the argument
MONTH()
Return the month from the date passed
MONTHNAME()
Return the name of the month
NOW()
Return the current date and time
PERIOD_ADD()
Add a period to a year-month
PERIOD_DIFF()
Return the number of months between periods
QUARTER()
Return the quarter from a date argument
SEC_TO_TIME()
Converts seconds to 'HH:MM:SS' format
SECOND()
Return the second (0-59)
STR_TO_DATE()
Convert a string to a date
SUBDATE()
Synonym for DATE_SUB() when invoked with three arguments
SUBTIME()
Subtract times
SYSDATE()
Return the time at which the function executes
TIME()
Extract the time portion of the expression passed
TIME_FORMAT()
Format as time
TIME_TO_SEC()
Return the argument converted to seconds
1190
Date and Time Functions
Name
Description
TIMEDIFF()
Subtract time
TIMESTAMP()
With a single argument, this function returns the date or datetime expression; with two arguments, the sum of the arguments
TIMESTAMPADD()
Add an interval to a datetime expression
TIMESTAMPDIFF()
Subtract an interval from a datetime expression
TO_DAYS()
Return the date argument converted to days
TO_SECONDS()
Return the date or datetime argument converted to seconds since Year 0
UNIX_TIMESTAMP()
Return a Unix timestamp
UTC_DATE()
Return the current UTC date
UTC_TIME()
Return the current UTC time
UTC_TIMESTAMP()
Return the current UTC date and time
WEEK()
Return the week number
WEEKDAY()
Return the weekday index
WEEKOFYEAR()
Return the calendar week of the date (1-53)
YEAR()
Return the year
YEARWEEK()
Return the year and week
Here is an example that uses date functions. The following query selects all rows with a date_col value from within the last 30 days: mysql> SELECT something FROM tbl_name -> WHERE DATE_SUB(CURDATE(),INTERVAL 30 DAY) <= date_col;
The query also selects rows with dates that lie in the future. Functions that expect date values usually accept datetime values and ignore the time part. Functions that expect time values usually accept datetime values and ignore the date part. Functions that return the current date or time each are evaluated only once per query at the start of query execution. This means that multiple references to a function such as NOW() within a single query always produce the same result. (For our purposes, a single query also includes a call to a stored program (stored routine, trigger, or event) and all subprograms called by that program.) This principle also applies to CURDATE(), CURTIME(), UTC_DATE(), UTC_TIME(), UTC_TIMESTAMP(), and to any of their synonyms. The CURRENT_TIMESTAMP(), CURRENT_TIME(), CURRENT_DATE(), and FROM_UNIXTIME() functions return values in the connection's current time zone, which is available as the value of the time_zone system variable. In addition, UNIX_TIMESTAMP() assumes that its argument is a datetime value in the current time zone. See Section 10.6, “MySQL Server Time Zone Support”. Some date functions can be used with “zero” dates or incomplete dates such as '2001-11-00', whereas others cannot. Functions that extract parts of dates typically work with incomplete dates and thus can return 0 when you might otherwise expect a nonzero value. For example: mysql> SELECT DAYOFMONTH('2001-11-00'), MONTH('2005-00-00'); -> 0, 0
Other functions expect complete dates and return NULL for incomplete dates. These include functions that perform date arithmetic or that map parts of dates to names. For example:
1191
Date and Time Functions
mysql> SELECT DATE_ADD('2006-05-00',INTERVAL 1 DAY); -> NULL mysql> SELECT DAYNAME('2006-05-00'); -> NULL
Note From MySQL 5.5.16 to 5.5.20, a change in handling of a date-related assertion caused several functions to become more strict when passed a DATE() function value as their argument and reject incomplete dates with a day part of zero. These functions are affected: CONVERT_TZ(), DATE_ADD(), DATE_SUB(), DAYOFYEAR(), LAST_DAY(), TIMESTAMPDIFF(), TO_DAYS(), TO_SECONDS(), WEEK(), WEEKDAY(), WEEKOFYEAR(), YEARWEEK(). Because this changes date-handling behavior in General Availability-status series MySQL 5.5, the change was reverted in 5.5.21. • ADDDATE(date,INTERVAL expr unit), ADDDATE(expr,days) When invoked with the INTERVAL form of the second argument, ADDDATE() is a synonym for DATE_ADD(). The related function SUBDATE() is a synonym for DATE_SUB(). For information on the INTERVAL unit argument, see the discussion for DATE_ADD(). mysql> SELECT DATE_ADD('2008-01-02', INTERVAL 31 DAY); -> '2008-02-02' mysql> SELECT ADDDATE('2008-01-02', INTERVAL 31 DAY); -> '2008-02-02'
When invoked with the days form of the second argument, MySQL treats it as an integer number of days to be added to expr. mysql> SELECT ADDDATE('2008-01-02', 31); -> '2008-02-02'
• ADDTIME(expr1,expr2) ADDTIME() adds expr2 to expr1 and returns the result. expr1 is a time or datetime expression, and expr2 is a time expression. mysql> SELECT ADDTIME('2007-12-31 23:59:59.999999', '1 1:1:1.000002'); -> '2008-01-02 01:01:01.000001' mysql> SELECT ADDTIME('01:00:00.999999', '02:00:00.999998'); -> '03:00:01.999997'
• CONVERT_TZ(dt,from_tz,to_tz) CONVERT_TZ() converts a datetime value dt from the time zone given by from_tz to the time zone given by to_tz and returns the resulting value. Time zones are specified as described in Section 10.6, “MySQL Server Time Zone Support”. This function returns NULL if the arguments are invalid. If the value falls out of the supported range of the TIMESTAMP type when converted from from_tz to UTC, no conversion occurs. The TIMESTAMP range is described in Section 11.1.2, “Date and Time Type Overview”. mysql> SELECT CONVERT_TZ('2004-01-01 12:00:00','GMT','MET'); -> '2004-01-01 13:00:00' mysql> SELECT CONVERT_TZ('2004-01-01 12:00:00','+00:00','+10:00'); -> '2004-01-01 22:00:00'
1192
Date and Time Functions
Note To use named time zones such as 'MET' or 'Europe/Moscow', the time zone tables must be properly set up. See Section 10.6, “MySQL Server Time Zone Support”, for instructions. • CURDATE() Returns the current date as a value in 'YYYY-MM-DD' or YYYYMMDD format, depending on whether the function is used in a string or numeric context. mysql> SELECT CURDATE(); -> '2008-06-13' mysql> SELECT CURDATE() + 0; -> 20080613
• CURRENT_DATE, CURRENT_DATE() CURRENT_DATE and CURRENT_DATE() are synonyms for CURDATE(). • CURRENT_TIME, CURRENT_TIME() CURRENT_TIME and CURRENT_TIME() are synonyms for CURTIME(). • CURRENT_TIMESTAMP, CURRENT_TIMESTAMP() CURRENT_TIMESTAMP and CURRENT_TIMESTAMP() are synonyms for NOW(). • CURTIME() Returns the current time as a value in 'HH:MM:SS' or HHMMSS.uuuuuu format, depending on whether the function is used in a string or numeric context. The value is expressed in the current time zone. mysql> SELECT CURTIME(); -> '23:50:26' mysql> SELECT CURTIME() + 0; -> 235026.000000
• DATE(expr) Extracts the date part of the date or datetime expression expr. mysql> SELECT DATE('2003-12-31 01:02:03'); -> '2003-12-31'
• DATEDIFF(expr1,expr2) DATEDIFF() returns expr1 − expr2 expressed as a value in days from one date to the other. expr1 and expr2 are date or date-and-time expressions. Only the date parts of the values are used in the calculation. mysql> SELECT DATEDIFF('2007-12-31 23:59:59','2007-12-30'); -> 1 mysql> SELECT DATEDIFF('2010-11-30 23:59:59','2010-12-31'); -> -31
•
DATE_ADD(date,INTERVAL expr unit), DATE_SUB(date,INTERVAL expr unit) These functions perform date arithmetic. The date argument specifies the starting date or datetime value. expr is an expression specifying the interval value to be added or subtracted from the starting
1193
Date and Time Functions
date. expr is a string; it may start with a - for negative intervals. unit is a keyword indicating the units in which the expression should be interpreted. The INTERVAL keyword and the unit specifier are not case sensitive. The following table shows the expected form of the expr argument for each unit value.
unit Value
Expected expr Format
MICROSECOND
MICROSECONDS
SECOND
SECONDS
MINUTE
MINUTES
HOUR
HOURS
DAY
DAYS
WEEK
WEEKS
MONTH
MONTHS
QUARTER
QUARTERS
YEAR
YEARS
SECOND_MICROSECOND
'SECONDS.MICROSECONDS'
MINUTE_MICROSECOND
'MINUTES:SECONDS.MICROSECONDS'
MINUTE_SECOND
'MINUTES:SECONDS'
HOUR_MICROSECOND
'HOURS:MINUTES:SECONDS.MICROSECONDS'
HOUR_SECOND
'HOURS:MINUTES:SECONDS'
HOUR_MINUTE
'HOURS:MINUTES'
DAY_MICROSECOND
'DAYS HOURS:MINUTES:SECONDS.MICROSECONDS'
DAY_SECOND
'DAYS HOURS:MINUTES:SECONDS'
DAY_MINUTE
'DAYS HOURS:MINUTES'
DAY_HOUR
'DAYS HOURS'
YEAR_MONTH
'YEARS-MONTHS'
The return value depends on the arguments: • DATETIME if the first argument is a DATETIME (or TIMESTAMP) value, or if the first argument is a DATE and the unit value uses HOURS, MINUTES, or SECONDS. • String otherwise. To ensure that the result is DATETIME, you can use CAST() to convert the first argument to DATETIME. MySQL permits any punctuation delimiter in the expr format. Those shown in the table are the suggested delimiters. If the date argument is a DATE value and your calculations involve only YEAR, MONTH, and DAY parts (that is, no time parts), the result is a DATE value. Otherwise, the result is a DATETIME value. Date arithmetic also can be performed using INTERVAL together with the + or - operator: date + INTERVAL expr unit date - INTERVAL expr unit
1194
Date and Time Functions
INTERVAL expr unit is permitted on either side of the + operator if the expression on the other side is a date or datetime value. For the - operator, INTERVAL expr unit is permitted only on the right side, because it makes no sense to subtract a date or datetime value from an interval. mysql> SELECT '2008-12-31 23:59:59' + INTERVAL 1 SECOND; -> '2009-01-01 00:00:00' mysql> SELECT INTERVAL 1 DAY + '2008-12-31'; -> '2009-01-01' mysql> SELECT '2005-01-01' - INTERVAL 1 SECOND; -> '2004-12-31 23:59:59' mysql> SELECT DATE_ADD('2000-12-31 23:59:59', -> INTERVAL 1 SECOND); -> '2001-01-01 00:00:00' mysql> SELECT DATE_ADD('2010-12-31 23:59:59', -> INTERVAL 1 DAY); -> '2011-01-01 23:59:59' mysql> SELECT DATE_ADD('2100-12-31 23:59:59', -> INTERVAL '1:1' MINUTE_SECOND); -> '2101-01-01 00:01:00' mysql> SELECT DATE_SUB('2005-01-01 00:00:00', -> INTERVAL '1 1:1:1' DAY_SECOND); -> '2004-12-30 22:58:59' mysql> SELECT DATE_ADD('1900-01-01 00:00:00', -> INTERVAL '-1 10' DAY_HOUR); -> '1899-12-30 14:00:00' mysql> SELECT DATE_SUB('1998-01-02', INTERVAL 31 DAY); -> '1997-12-02' mysql> SELECT DATE_ADD('1992-12-31 23:59:59.000002', -> INTERVAL '1.999999' SECOND_MICROSECOND); -> '1993-01-01 00:00:01.000001'
If you specify an interval value that is too short (does not include all the interval parts that would be expected from the unit keyword), MySQL assumes that you have left out the leftmost parts of the interval value. For example, if you specify a unit of DAY_SECOND, the value of expr is expected to have days, hours, minutes, and seconds parts. If you specify a value like '1:10', MySQL assumes that the days and hours parts are missing and the value represents minutes and seconds. In other words, '1:10' DAY_SECOND is interpreted in such a way that it is equivalent to '1:10' MINUTE_SECOND. This is analogous to the way that MySQL interprets TIME values as representing elapsed time rather than as a time of day. Because expr is treated as a string, be careful if you specify a nonstring value with INTERVAL. For example, with an interval specifier of HOUR_MINUTE, 6/4 evaluates to 1.5000 and is treated as 1 hour, 5000 minutes: mysql> SELECT 6/4; -> 1.5000 mysql> SELECT DATE_ADD('2009-01-01', INTERVAL 6/4 HOUR_MINUTE); -> '2009-01-04 12:20:00'
To ensure interpretation of the interval value as you expect, a CAST() operation may be used. To treat 6/4 as 1 hour, 5 minutes, cast it to a DECIMAL value with a single fractional digit: mysql> SELECT CAST(6/4 AS DECIMAL(3,1)); -> 1.5 mysql> SELECT DATE_ADD('1970-01-01 12:00:00', -> INTERVAL CAST(6/4 AS DECIMAL(3,1)) HOUR_MINUTE); -> '1970-01-01 13:05:00'
If you add to or subtract from a date value something that contains a time part, the result is automatically converted to a datetime value: mysql> SELECT DATE_ADD('2013-01-01', INTERVAL 1 DAY);
1195
Date and Time Functions
-> '2013-01-02' mysql> SELECT DATE_ADD('2013-01-01', INTERVAL 1 HOUR); -> '2013-01-01 01:00:00'
If you add MONTH, YEAR_MONTH, or YEAR and the resulting date has a day that is larger than the maximum day for the new month, the day is adjusted to the maximum days in the new month: mysql> SELECT DATE_ADD('2009-01-30', INTERVAL 1 MONTH); -> '2009-02-28'
Date arithmetic operations require complete dates and do not work with incomplete dates such as '2006-07-00' or badly malformed dates: mysql> SELECT DATE_ADD('2006-07-00', INTERVAL 1 DAY); -> NULL mysql> SELECT '2005-03-32' + INTERVAL 1 MONTH; -> NULL
• DATE_FORMAT(date,format) Formats the date value according to the format string. The following specifiers may be used in the format string. The % character is required before format specifier characters. Specifier
Description
%a
Abbreviated weekday name (Sun..Sat)
%b
Abbreviated month name (Jan..Dec)
%c
Month, numeric (0..12)
%D
Day of the month with English suffix (0th, 1st, 2nd, 3rd, …)
%d
Day of the month, numeric (00..31)
%e
Day of the month, numeric (0..31)
%f
Microseconds (000000..999999)
%H
Hour (00..23)
%h
Hour (01..12)
%I
Hour (01..12)
%i
Minutes, numeric (00..59)
%j
Day of year (001..366)
%k
Hour (0..23)
%l
Hour (1..12)
%M
Month name (January..December)
%m
Month, numeric (00..12)
%p
AM or PM
%r
Time, 12-hour (hh:mm:ss followed by AM or PM)
%S
Seconds (00..59)
%s
Seconds (00..59)
%T
Time, 24-hour (hh:mm:ss)
%U
Week (00..53), where Sunday is the first day of the week; WEEK() mode 0
%u
Week (00..53), where Monday is the first day of the week; WEEK() mode 1
1196
Date and Time Functions
Specifier
Description
%V
Week (01..53), where Sunday is the first day of the week; WEEK() mode 2; used with %X
%v
Week (01..53), where Monday is the first day of the week; WEEK() mode 3; used with %x
%W
Weekday name (Sunday..Saturday)
%w
Day of the week (0=Sunday..6=Saturday)
%X
Year for the week where Sunday is the first day of the week, numeric, four digits; used with %V
%x
Year for the week, where Monday is the first day of the week, numeric, four digits; used with %v
%Y
Year, numeric, four digits
%y
Year, numeric (two digits)
%%
A literal % character
%x
x, for any “x” not listed above
Ranges for the month and day specifiers begin with zero due to the fact that MySQL permits the storing of incomplete dates such as '2014-00-00'. The language used for day and month names and abbreviations is controlled by the value of the lc_time_names system variable (Section 10.7, “MySQL Server Locale Support”). For the %U, %u, %V, and %v specifiers, see the description of the WEEK() function for information about the mode values. The mode affects how week numbering occurs. DATE_FORMAT() returns a string with a character set and collation given by character_set_connection and collation_connection so that it can return month and weekday names containing non-ASCII characters. mysql> SELECT DATE_FORMAT('2009-10-04 22:23:00', '%W %M %Y'); -> 'Sunday October 2009' mysql> SELECT DATE_FORMAT('2007-10-04 22:23:00', '%H:%i:%s'); -> '22:23:00' mysql> SELECT DATE_FORMAT('1900-10-04 22:23:00', -> '%D %y %a %d %m %b %j'); -> '4th 00 Thu 04 10 Oct 277' mysql> SELECT DATE_FORMAT('1997-10-04 22:23:00', -> '%H %k %I %r %T %S %w'); -> '22 22 10 10:23:00 PM 22:23:00 00 6' mysql> SELECT DATE_FORMAT('1999-01-01', '%X %V'); -> '1998 52' mysql> SELECT DATE_FORMAT('2006-06-00', '%d'); -> '00'
• DATE_SUB(date,INTERVAL expr unit) See the description for DATE_ADD(). • DAY(date) DAY() is a synonym for DAYOFMONTH(). • DAYNAME(date) Returns the name of the weekday for date. The language used for the name is controlled by the value of the lc_time_names system variable (Section 10.7, “MySQL Server Locale Support”). mysql> SELECT DAYNAME('2007-02-03');
1197
Date and Time Functions
-> 'Saturday'
• DAYOFMONTH(date) Returns the day of the month for date, in the range 1 to 31, or 0 for dates such as '0000-00-00' or '2008-00-00' that have a zero day part. mysql> SELECT DAYOFMONTH('2007-02-03'); -> 3
• DAYOFWEEK(date) Returns the weekday index for date (1 = Sunday, 2 = Monday, …, 7 = Saturday). These index values correspond to the ODBC standard. mysql> SELECT DAYOFWEEK('2007-02-03'); -> 7
• DAYOFYEAR(date) Returns the day of the year for date, in the range 1 to 366. mysql> SELECT DAYOFYEAR('2007-02-03'); -> 34
• EXTRACT(unit FROM date) The EXTRACT() function uses the same kinds of unit specifiers as DATE_ADD() or DATE_SUB(), but extracts parts from the date rather than performing date arithmetic. mysql> SELECT EXTRACT(YEAR FROM '2009-07-02'); -> 2009 mysql> SELECT EXTRACT(YEAR_MONTH FROM '2009-07-02 01:02:03'); -> 200907 mysql> SELECT EXTRACT(DAY_MINUTE FROM '2009-07-02 01:02:03'); -> 20102 mysql> SELECT EXTRACT(MICROSECOND -> FROM '2003-01-02 10:30:00.000123'); -> 123
• FROM_DAYS(N) Given a day number N, returns a DATE value. mysql> SELECT FROM_DAYS(730669); -> '2007-07-03'
Use FROM_DAYS() with caution on old dates. It is not intended for use with values that precede the advent of the Gregorian calendar (1582). See Section 12.8, “What Calendar Is Used By MySQL?”. • FROM_UNIXTIME(unix_timestamp), FROM_UNIXTIME(unix_timestamp,format) Returns a representation of the unix_timestamp argument as a value in 'YYYY-MM-DD HH:MM:SS' or YYYYMMDDHHMMSS.uuuuuu format, depending on whether the function is used in a string or numeric context. The value is expressed in the current time zone. unix_timestamp is an internal timestamp value such as is produced by the UNIX_TIMESTAMP() function. If format is given, the result is formatted according to the format string, which is used the same way as listed in the entry for the DATE_FORMAT() function. mysql> SELECT FROM_UNIXTIME(1447430881);
1198
Date and Time Functions
-> '2015-11-13 10:08:01' mysql> SELECT FROM_UNIXTIME(1447430881) + 0; -> 20151113100801 mysql> SELECT FROM_UNIXTIME(UNIX_TIMESTAMP(), -> '%Y %D %M %h:%i:%s %x'); -> '2015 13th November 10:08:01 2015'
Note: If you use UNIX_TIMESTAMP() and FROM_UNIXTIME() to convert between TIMESTAMP values and Unix timestamp values, the conversion is lossy because the mapping is not one-to-one in both directions. For details, see the description of the UNIX_TIMESTAMP() function. • GET_FORMAT({DATE|TIME|DATETIME}, {'EUR'|'USA'|'JIS'|'ISO'|'INTERNAL'}) Returns a format string. This function is useful in combination with the DATE_FORMAT() and the STR_TO_DATE() functions. The possible values for the first and second arguments result in several possible format strings (for the specifiers used, see the table in the DATE_FORMAT() function description). ISO format refers to ISO 9075, not ISO 8601.
Function Call
Result
GET_FORMAT(DATE,'USA')
'%m.%d.%Y'
GET_FORMAT(DATE,'JIS')
'%Y-%m-%d'
GET_FORMAT(DATE,'ISO')
'%Y-%m-%d'
GET_FORMAT(DATE,'EUR')
'%d.%m.%Y'
GET_FORMAT(DATE,'INTERNAL')
'%Y%m%d'
GET_FORMAT(DATETIME,'USA')
'%Y-%m-%d %H.%i.%s'
GET_FORMAT(DATETIME,'JIS')
'%Y-%m-%d %H:%i:%s'
GET_FORMAT(DATETIME,'ISO')
'%Y-%m-%d %H:%i:%s'
GET_FORMAT(DATETIME,'EUR')
'%Y-%m-%d %H.%i.%s'
GET_FORMAT(DATETIME,'INTERNAL')
'%Y%m%d%H%i%s'
GET_FORMAT(TIME,'USA')
'%h:%i:%s %p'
GET_FORMAT(TIME,'JIS')
'%H:%i:%s'
GET_FORMAT(TIME,'ISO')
'%H:%i:%s'
GET_FORMAT(TIME,'EUR')
'%H.%i.%s'
GET_FORMAT(TIME,'INTERNAL')
'%H%i%s'
TIMESTAMP can also be used as the first argument to GET_FORMAT(), in which case the function returns the same values as for DATETIME. mysql> SELECT DATE_FORMAT('2003-10-03',GET_FORMAT(DATE,'EUR')); -> '03.10.2003' mysql> SELECT STR_TO_DATE('10.31.2003',GET_FORMAT(DATE,'USA')); -> '2003-10-31'
• HOUR(time) Returns the hour for time. The range of the return value is 0 to 23 for time-of-day values. However, the range of TIME values actually is much larger, so HOUR can return values greater than 23. mysql> SELECT HOUR('10:05:03'); -> 10 mysql> SELECT HOUR('272:59:59'); -> 272
1199
Date and Time Functions
• LAST_DAY(date) Takes a date or datetime value and returns the corresponding value for the last day of the month. Returns NULL if the argument is invalid. mysql> SELECT LAST_DAY('2003-02-05'); -> '2003-02-28' mysql> SELECT LAST_DAY('2004-02-05'); -> '2004-02-29' mysql> SELECT LAST_DAY('2004-01-01 01:01:01'); -> '2004-01-31' mysql> SELECT LAST_DAY('2003-03-32'); -> NULL
• LOCALTIME, LOCALTIME() LOCALTIME and LOCALTIME() are synonyms for NOW(). • LOCALTIMESTAMP, LOCALTIMESTAMP() LOCALTIMESTAMP and LOCALTIMESTAMP() are synonyms for NOW(). • MAKEDATE(year,dayofyear) Returns a date, given year and day-of-year values. dayofyear must be greater than 0 or the result is NULL. mysql> SELECT MAKEDATE(2011,31), MAKEDATE(2011,32); -> '2011-01-31', '2011-02-01' mysql> SELECT MAKEDATE(2011,365), MAKEDATE(2014,365); -> '2011-12-31', '2014-12-31' mysql> SELECT MAKEDATE(2011,0); -> NULL
• MAKETIME(hour,minute,second) Returns a time value calculated from the hour, minute, and second arguments. mysql> SELECT MAKETIME(12,15,30); -> '12:15:30'
• MICROSECOND(expr) Returns the microseconds from the time or datetime expression expr as a number in the range from 0 to 999999. mysql> SELECT MICROSECOND('12:00:00.123456'); -> 123456 mysql> SELECT MICROSECOND('2009-12-31 23:59:59.000010'); -> 10
• MINUTE(time) Returns the minute for time, in the range 0 to 59. mysql> SELECT MINUTE('2008-02-03 10:05:03'); -> 5
• MONTH(date) Returns the month for date, in the range 1 to 12 for January to December, or 0 for dates such as '0000-00-00' or '2008-00-00' that have a zero month part.
1200
Date and Time Functions
mysql> SELECT MONTH('2008-02-03'); -> 2
• MONTHNAME(date) Returns the full name of the month for date. The language used for the name is controlled by the value of the lc_time_names system variable (Section 10.7, “MySQL Server Locale Support”). mysql> SELECT MONTHNAME('2008-02-03'); -> 'February'
• NOW() Returns the current date and time as a value in 'YYYY-MM-DD HH:MM:SS' or YYYYMMDDHHMMSS.uuuuuu format, depending on whether the function is used in a string or numeric context. The value is expressed in the current time zone. mysql> SELECT NOW(); -> '2007-12-15 23:50:26' mysql> SELECT NOW() + 0; -> 20071215235026.000000
NOW() returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.) This differs from the behavior for SYSDATE(), which returns the exact time at which it executes. mysql> SELECT NOW(), SLEEP(2), NOW(); +---------------------+----------+---------------------+ | NOW() | SLEEP(2) | NOW() | +---------------------+----------+---------------------+ | 2006-04-12 13:47:36 | 0 | 2006-04-12 13:47:36 | +---------------------+----------+---------------------+ mysql> SELECT SYSDATE(), SLEEP(2), SYSDATE(); +---------------------+----------+---------------------+ | SYSDATE() | SLEEP(2) | SYSDATE() | +---------------------+----------+---------------------+ | 2006-04-12 13:47:44 | 0 | 2006-04-12 13:47:46 | +---------------------+----------+---------------------+
In addition, the SET TIMESTAMP statement affects the value returned by NOW() but not by SYSDATE(). This means that timestamp settings in the binary log have no effect on invocations of SYSDATE(). Setting the timestamp to a nonzero value causes each subsequent invocation of NOW() to return that value. Setting the timestamp to zero cancels this effect so that NOW() once again returns the current date and time. See the description for SYSDATE() for additional information about the differences between the two functions. • PERIOD_ADD(P,N) Adds N months to period P (in the format YYMM or YYYYMM). Returns a value in the format YYYYMM. Note that the period argument P is not a date value. mysql> SELECT PERIOD_ADD(200801,2); -> 200803
• PERIOD_DIFF(P1,P2)
1201
Date and Time Functions
Returns the number of months between periods P1 and P2. P1 and P2 should be in the format YYMM or YYYYMM. Note that the period arguments P1 and P2 are not date values. mysql> SELECT PERIOD_DIFF(200802,200703); -> 11
• QUARTER(date) Returns the quarter of the year for date, in the range 1 to 4. mysql> SELECT QUARTER('2008-04-01'); -> 2
• SECOND(time) Returns the second for time, in the range 0 to 59. mysql> SELECT SECOND('10:05:03'); -> 3
• SEC_TO_TIME(seconds) Returns the seconds argument, converted to hours, minutes, and seconds, as a TIME value. The range of the result is constrained to that of the TIME data type. A warning occurs if the argument corresponds to a value outside that range. mysql> SELECT SEC_TO_TIME(2378); -> '00:39:38' mysql> SELECT SEC_TO_TIME(2378) + 0; -> 3938
• STR_TO_DATE(str,format) This is the inverse of the DATE_FORMAT() function. It takes a string str and a format string format. STR_TO_DATE() returns a DATETIME value if the format string contains both date and time parts, or a DATE or TIME value if the string contains only date or time parts. If the date, time, or datetime value extracted from str is illegal, STR_TO_DATE() returns NULL and produces a warning. The server scans str attempting to match format to it. The format string can contain literal characters and format specifiers beginning with %. Literal characters in format must match literally in str. Format specifiers in format must match a date or time part in str. For the specifiers that can be used in format, see the DATE_FORMAT() function description. mysql> SELECT STR_TO_DATE('01,5,2013','%d,%m,%Y'); -> '2013-05-01' mysql> SELECT STR_TO_DATE('May 1, 2013','%M %d,%Y'); -> '2013-05-01'
Scanning starts at the beginning of str and fails if format is found not to match. Extra characters at the end of str are ignored. mysql> SELECT STR_TO_DATE('a09:30:17','a%h:%i:%s'); -> '09:30:17' mysql> SELECT STR_TO_DATE('a09:30:17','%h:%i:%s'); -> NULL mysql> SELECT STR_TO_DATE('09:30:17a','%h:%i:%s'); -> '09:30:17'
1202
Date and Time Functions
Unspecified date or time parts have a value of 0, so incompletely specified values in str produce a result with some or all parts set to 0: mysql> SELECT STR_TO_DATE('abc','abc'); -> '0000-00-00' mysql> SELECT STR_TO_DATE('9','%m'); -> '0000-09-00' mysql> SELECT STR_TO_DATE('9','%s'); -> '00:00:09'
Range checking on the parts of date values is as described in Section 11.3.1, “The DATE, DATETIME, and TIMESTAMP Types”. This means, for example, that “zero” dates or dates with part values of 0 are permitted unless the SQL mode is set to disallow such values. mysql> SELECT STR_TO_DATE('00/00/0000', '%m/%d/%Y'); -> '0000-00-00' mysql> SELECT STR_TO_DATE('04/31/2004', '%m/%d/%Y'); -> '2004-04-31'
If the NO_ZERO_DATE or NO_ZERO_IN_DATE SQL mode is enabled, zero dates or part of dates are disallowed. In that case, STR_TO_DATE() returns NULL and generates a warning: mysql> SET sql_mode = ''; mysql> SELECT STR_TO_DATE('15:35:00', '%H:%i:%s'); +-------------------------------------+ | STR_TO_DATE('15:35:00', '%H:%i:%s') | +-------------------------------------+ | 15:35:00 | +-------------------------------------+ mysql> SET sql_mode = 'NO_ZERO_IN_DATE'; mysql> SELECT STR_TO_DATE('15:35:00', '%h:%i:%s'); +-------------------------------------+ | STR_TO_DATE('15:35:00', '%h:%i:%s') | +-------------------------------------+ | NULL | +-------------------------------------+ mysql> SHOW WARNINGS\G *************************** 1. row *************************** Level: Warning Code: 1411 Message: Incorrect datetime value: '15:35:00' for function str_to_date
Note You cannot use format "%X%V" to convert a year-week string to a date because the combination of a year and week does not uniquely identify a year and month if the week crosses a month boundary. To convert a year-week to a date, you should also specify the weekday: mysql> SELECT STR_TO_DATE('200442 Monday', '%X%V %W'); -> '2004-10-18'
• SUBDATE(date,INTERVAL expr unit), SUBDATE(expr,days) When invoked with the INTERVAL form of the second argument, SUBDATE() is a synonym for DATE_SUB(). For information on the INTERVAL unit argument, see the discussion for DATE_ADD(). mysql> SELECT DATE_SUB('2008-01-02', INTERVAL 31 DAY); -> '2007-12-02' mysql> SELECT SUBDATE('2008-01-02', INTERVAL 31 DAY); -> '2007-12-02'
1203
Date and Time Functions
The second form enables the use of an integer value for days. In such cases, it is interpreted as the number of days to be subtracted from the date or datetime expression expr. mysql> SELECT SUBDATE('2008-01-02 12:00:00', 31); -> '2007-12-02 12:00:00'
• SUBTIME(expr1,expr2) SUBTIME() returns expr1 − expr2 expressed as a value in the same format as expr1. expr1 is a time or datetime expression, and expr2 is a time expression. mysql> SELECT SUBTIME('2007-12-31 23:59:59.999999','1 1:1:1.000002'); -> '2007-12-30 22:58:58.999997' mysql> SELECT SUBTIME('01:00:00.999999', '02:00:00.999998'); -> '-00:59:59.999999'
• SYSDATE() Returns the current date and time as a value in 'YYYY-MM-DD HH:MM:SS' or YYYYMMDDHHMMSS.uuuuuu format, depending on whether the function is used in a string or numeric context. SYSDATE() returns the time at which it executes. This differs from the behavior for NOW(), which returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.) mysql> SELECT NOW(), SLEEP(2), NOW(); +---------------------+----------+---------------------+ | NOW() | SLEEP(2) | NOW() | +---------------------+----------+---------------------+ | 2006-04-12 13:47:36 | 0 | 2006-04-12 13:47:36 | +---------------------+----------+---------------------+ mysql> SELECT SYSDATE(), SLEEP(2), SYSDATE(); +---------------------+----------+---------------------+ | SYSDATE() | SLEEP(2) | SYSDATE() | +---------------------+----------+---------------------+ | 2006-04-12 13:47:44 | 0 | 2006-04-12 13:47:46 | +---------------------+----------+---------------------+
In addition, the SET TIMESTAMP statement affects the value returned by NOW() but not by SYSDATE(). This means that timestamp settings in the binary log have no effect on invocations of SYSDATE(). Because SYSDATE() can return different values even within the same statement, and is not affected by SET TIMESTAMP, it is nondeterministic and therefore unsafe for replication if statement-based binary logging is used. If that is a problem, you can use row-based logging. Alternatively, you can use the --sysdate-is-now option to cause SYSDATE() to be an alias for NOW(). This works if the option is used on both the master and the slave. The nondeterministic nature of SYSDATE() also means that indexes cannot be used for evaluating expressions that refer to it. • TIME(expr) Extracts the time part of the time or datetime expression expr and returns it as a string. This function is unsafe for statement-based replication. Beginning with MySQL 5.5.1, a warning is logged if you use this function when binlog_format is set to STATEMENT. (Bug #47995) 1204
Date and Time Functions
mysql> SELECT TIME('2003-12-31 01:02:03'); -> '01:02:03' mysql> SELECT TIME('2003-12-31 01:02:03.000123'); -> '01:02:03.000123'
• TIMEDIFF(expr1,expr2) TIMEDIFF() returns expr1 − expr2 expressed as a time value. expr1 and expr2 are time or date-and-time expressions, but both must be of the same type. The result returned by TIMEDIFF() is limited to the range allowed for TIME values. Alternatively, you can use either of the functions TIMESTAMPDIFF() and UNIX_TIMESTAMP(), both of which return integers. mysql> SELECT TIMEDIFF('2000:01:01 -> '2000:01:01 -> '-00:00:00.000001' mysql> SELECT TIMEDIFF('2008-12-31 -> '2008-12-30 -> '46:58:57.999999'
00:00:00', 00:00:00.000001'); 23:59:59.000001', 01:01:01.000002');
• TIMESTAMP(expr), TIMESTAMP(expr1,expr2) With a single argument, this function returns the date or datetime expression expr as a datetime value. With two arguments, it adds the time expression expr2 to the date or datetime expression expr1 and returns the result as a datetime value. mysql> SELECT TIMESTAMP('2003-12-31'); -> '2003-12-31 00:00:00' mysql> SELECT TIMESTAMP('2003-12-31 12:00:00','12:00:00'); -> '2004-01-01 00:00:00'
• TIMESTAMPADD(unit,interval,datetime_expr) Adds the integer expression interval to the date or datetime expression datetime_expr. The unit for interval is given by the unit argument, which should be one of the following values: MICROSECOND (microseconds), SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR. It is possible to use FRAC_SECOND in place of MICROSECOND, but FRAC_SECOND is deprecated. FRAC_SECOND was removed in MySQL 5.5.3. The unit value may be specified using one of keywords as shown, or with a prefix of SQL_TSI_. For example, DAY and SQL_TSI_DAY both are legal. mysql> SELECT TIMESTAMPADD(MINUTE,1,'2003-01-02'); -> '2003-01-02 00:01:00' mysql> SELECT TIMESTAMPADD(WEEK,1,'2003-01-02'); -> '2003-01-09'
• TIMESTAMPDIFF(unit,datetime_expr1,datetime_expr2) Returns datetime_expr2 − datetime_expr1, where datetime_expr1 and datetime_expr2 are date or datetime expressions. One expression may be a date and the other a datetime; a date value is treated as a datetime having the time part '00:00:00' where necessary. The unit for the result (an integer) is given by the unit argument. The legal values for unit are the same as those listed in the description of the TIMESTAMPADD() function. mysql> SELECT TIMESTAMPDIFF(MONTH,'2003-02-01','2003-05-01'); -> 3 mysql> SELECT TIMESTAMPDIFF(YEAR,'2002-05-01','2001-01-01');
1205
Date and Time Functions
-> -1 mysql> SELECT TIMESTAMPDIFF(MINUTE,'2003-02-01','2003-05-01 12:05:55'); -> 128885
Note The order of the date or datetime arguments for this function is the opposite of that used with the TIMESTAMP() function when invoked with 2 arguments. • TIME_FORMAT(time,format) This is used like the DATE_FORMAT() function, but the format string may contain format specifiers only for hours, minutes, seconds, and microseconds. Other specifiers produce a NULL value or 0. If the time value contains an hour part that is greater than 23, the %H and %k hour format specifiers produce a value larger than the usual range of 0..23. The other hour format specifiers produce the hour value modulo 12. mysql> SELECT TIME_FORMAT('100:00:00', '%H %k %h %I %l'); -> '100 100 04 04 4'
• TIME_TO_SEC(time) Returns the time argument, converted to seconds. mysql> SELECT TIME_TO_SEC('22:23:00'); -> 80580 mysql> SELECT TIME_TO_SEC('00:39:38'); -> 2378
• TO_DAYS(date) Given a date date, returns a day number (the number of days since year 0). mysql> SELECT TO_DAYS(950501); -> 728779 mysql> SELECT TO_DAYS('2007-10-07'); -> 733321
TO_DAYS() is not intended for use with values that precede the advent of the Gregorian calendar (1582), because it does not take into account the days that were lost when the calendar was changed. For dates before 1582 (and possibly a later year in other locales), results from this function are not reliable. See Section 12.8, “What Calendar Is Used By MySQL?”, for details. Remember that MySQL converts two-digit year values in dates to four-digit form using the rules in Section 11.3, “Date and Time Types”. For example, '2008-10-07' and '08-10-07' are seen as identical dates: mysql> SELECT TO_DAYS('2008-10-07'), TO_DAYS('08-10-07'); -> 733687, 733687
In MySQL, the zero date is defined as '0000-00-00', even though this date is itself considered invalid. This means that, for '0000-00-00' and '0000-01-01', TO_DAYS() returns the values shown here: mysql> SELECT TO_DAYS('0000-00-00'); +-----------------------+ | to_days('0000-00-00') | +-----------------------+ | NULL | +-----------------------+
1206
Date and Time Functions
1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS; +---------+------+----------------------------------------+ | Level | Code | Message | +---------+------+----------------------------------------+ | Warning | 1292 | Incorrect datetime value: '0000-00-00' | +---------+------+----------------------------------------+ 1 row in set (0.00 sec)
mysql> SELECT TO_DAYS('0000-01-01'); +-----------------------+ | to_days('0000-01-01') | +-----------------------+ | 1 | +-----------------------+ 1 row in set (0.00 sec)
This is true whether or not the ALLOW_INVALID_DATES SQL server mode is enabled. • TO_SECONDS(expr) Given a date or datetime expr, returns a the number of seconds since the year 0. If expr is not a valid date or datetime value, returns NULL. mysql> SELECT TO_SECONDS(950501); -> 62966505600 mysql> SELECT TO_SECONDS('2009-11-29'); -> 63426672000 mysql> SELECT TO_SECONDS('2009-11-29 13:43:32'); -> 63426721412 mysql> SELECT TO_SECONDS( NOW() ); -> 63426721458
Like TO_DAYS(), TO_SECONDS() is not intended for use with values that precede the advent of the Gregorian calendar (1582), because it does not take into account the days that were lost when the calendar was changed. For dates before 1582 (and possibly a later year in other locales), results from this function are not reliable. See Section 12.8, “What Calendar Is Used By MySQL?”, for details. Like TO_DAYS(), TO_SECONDS(), converts two-digit year values in dates to four-digit form using the rules in Section 11.3, “Date and Time Types”. TO_SECONDS() is available beginning with MySQL 5.5.0. In MySQL, the zero date is defined as '0000-00-00', even though this date is itself considered invalid. This means that, for '0000-00-00' and '0000-01-01', TO_SECONDS() returns the values shown here: mysql> SELECT TO_SECONDS('0000-00-00'); +--------------------------+ | TO_SECONDS('0000-00-00') | +--------------------------+ | NULL | +--------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS; +---------+------+----------------------------------------+ | Level | Code | Message | +---------+------+----------------------------------------+ | Warning | 1292 | Incorrect datetime value: '0000-00-00' | +---------+------+----------------------------------------+ 1 row in set (0.00 sec)
1207
Date and Time Functions
mysql> SELECT TO_SECONDS('0000-01-01'); +--------------------------+ | TO_SECONDS('0000-01-01') | +--------------------------+ | 86400 | +--------------------------+ 1 row in set (0.00 sec)
This is true whether or not the ALLOW_INVALID_DATES SQL server mode is enabled. • UNIX_TIMESTAMP(), UNIX_TIMESTAMP(date) If called with no argument, returns a Unix timestamp (seconds since '1970-01-01 00:00:00' UTC) as an unsigned integer. If UNIX_TIMESTAMP() is called with a date argument, it returns the value of the argument as seconds since '1970-01-01 00:00:00' UTC. The date argument may be a DATE, DATETIME, or TIMESTAMP string, or a number in YYMMDD, YYMMDDHHMMSS, YYYYMMDD, or YYYYMMDDHHMMSS format. The server interprets date as a value in the current time zone and converts it to an internal value in UTC. Clients can set their time zone as described in Section 10.6, “MySQL Server Time Zone Support”. mysql> SELECT UNIX_TIMESTAMP(); -> 1447431666 mysql> SELECT UNIX_TIMESTAMP('2015-11-13 10:20:19'); -> 1447431619
When UNIX_TIMESTAMP() is used on a TIMESTAMP column, the function returns the internal timestamp value directly, with no implicit “string-to-Unix-timestamp” conversion. If you pass an out-of-range date to UNIX_TIMESTAMP(), it returns 0. The valid range of values is the same as for the TIMESTAMP data type: '1970-01-01 00:00:01.000000' UTC to '2038-01-19 03:14:07.999999' UTC. Note: If you use UNIX_TIMESTAMP() and FROM_UNIXTIME() to convert between TIMESTAMP values and Unix timestamp values, the conversion is lossy because the mapping is not one-toone in both directions. For example, due to conventions for local time zone changes, it is possible for two UNIX_TIMESTAMP() to map two TIMESTAMP values to the same Unix timestamp value. FROM_UNIXTIME() will map that value back to only one of the original TIMESTAMP values. Here is an example, using TIMESTAMP values in the CET time zone:
mysql> SELECT UNIX_TIMESTAMP('2005-03-27 03:00:00'); +---------------------------------------+ | UNIX_TIMESTAMP('2005-03-27 03:00:00') | +---------------------------------------+ | 1111885200 | +---------------------------------------+ mysql> SELECT UNIX_TIMESTAMP('2005-03-27 02:00:00'); +---------------------------------------+ | UNIX_TIMESTAMP('2005-03-27 02:00:00') | +---------------------------------------+ | 1111885200 | +---------------------------------------+ mysql> SELECT FROM_UNIXTIME(1111885200); +---------------------------+ | FROM_UNIXTIME(1111885200) | +---------------------------+ | 2005-03-27 03:00:00 | +---------------------------+
If you want to subtract UNIX_TIMESTAMP() columns, you might want to cast the result to signed integers. See Section 12.10, “Cast Functions and Operators”. • UTC_DATE, UTC_DATE()
1208
Date and Time Functions
Returns the current UTC date as a value in 'YYYY-MM-DD' or YYYYMMDD format, depending on whether the function is used in a string or numeric context. mysql> SELECT UTC_DATE(), UTC_DATE() + 0; -> '2003-08-14', 20030814
• UTC_TIME, UTC_TIME() Returns the current UTC time as a value in 'HH:MM:SS' or HHMMSS.uuuuuu format, depending on whether the function is used in a string or numeric context. mysql> SELECT UTC_TIME(), UTC_TIME() + 0; -> '18:07:53', 180753.000000
• UTC_TIMESTAMP, UTC_TIMESTAMP() Returns the current UTC date and time as a value in 'YYYY-MM-DD HH:MM:SS' or YYYYMMDDHHMMSS.uuuuuu format, depending on whether the function is used in a string or numeric context. mysql> SELECT UTC_TIMESTAMP(), UTC_TIMESTAMP() + 0; -> '2003-08-14 18:08:04', 20030814180804.000000
• WEEK(date[,mode]) This function returns the week number for date. The two-argument form of WEEK() enables you to specify whether the week starts on Sunday or Monday and whether the return value should be in the range from 0 to 53 or from 1 to 53. If the mode argument is omitted, the value of the default_week_format system variable is used. See Section 5.1.5, “Server System Variables”. The following table describes how the mode argument works. Mode
First day of week Range
Week 1 is the first week …
0
Sunday
0-53
with a Sunday in this year
1
Monday
0-53
with 4 or more days this year
2
Sunday
1-53
with a Sunday in this year
3
Monday
1-53
with 4 or more days this year
4
Sunday
0-53
with 4 or more days this year
5
Monday
0-53
with a Monday in this year
6
Sunday
1-53
with 4 or more days this year
7
Monday
1-53
with a Monday in this year
For mode values with a meaning of “with 4 or more days this year,” weeks are numbered according to ISO 8601:1988: • If the week containing January 1 has 4 or more days in the new year, it is week 1. • Otherwise, it is the last week of the previous year, and the next week is week 1. mysql> SELECT -> 7 mysql> SELECT -> 7 mysql> SELECT -> 8 mysql> SELECT
WEEK('2008-02-20'); WEEK('2008-02-20',0); WEEK('2008-02-20',1); WEEK('2008-12-31',1);
1209
Date and Time Functions
-> 53
Note that if a date falls in the last week of the previous year, MySQL returns 0 if you do not use 2, 3, 6, or 7 as the optional mode argument: mysql> SELECT YEAR('2000-01-01'), WEEK('2000-01-01',0); -> 2000, 0
One might argue that WEEK() should return 52 because the given date actually occurs in the 52nd week of 1999. WEEK() returns 0 instead so that the return value is “the week number in the given year.” This makes use of the WEEK() function reliable when combined with other functions that extract a date part from a date. If you prefer a result evaluated with respect to the year that contains the first day of the week for the given date, use 0, 2, 5, or 7 as the optional mode argument. mysql> SELECT WEEK('2000-01-01',2); -> 52
Alternatively, use the YEARWEEK() function: mysql> SELECT YEARWEEK('2000-01-01'); -> 199952 mysql> SELECT MID(YEARWEEK('2000-01-01'),5,2); -> '52'
• WEEKDAY(date) Returns the weekday index for date (0 = Monday, 1 = Tuesday, … 6 = Sunday). mysql> SELECT WEEKDAY('2008-02-03 22:23:00'); -> 6 mysql> SELECT WEEKDAY('2007-11-06'); -> 1
• WEEKOFYEAR(date) Returns the calendar week of the date as a number in the range from 1 to 53. WEEKOFYEAR() is a compatibility function that is equivalent to WEEK(date,3). mysql> SELECT WEEKOFYEAR('2008-02-20'); -> 8
• YEAR(date) Returns the year for date, in the range 1000 to 9999, or 0 for the “zero” date. mysql> SELECT YEAR('1987-01-01'); -> 1987
• YEARWEEK(date), YEARWEEK(date,mode) Returns year and week for a date. The year in the result may be different from the year in the date argument for the first and the last week of the year. The mode argument works exactly like the mode argument to WEEK(). For the single-argument syntax, a mode value of 0 is used. Unlike WEEK(), the value of default_week_format does not influence YEARWEEK(). mysql> SELECT YEARWEEK('1987-01-01');
1210
What Calendar Is Used By MySQL?
-> 198652
Note that the week number is different from what the WEEK() function would return (0) for optional arguments 0 or 1, as WEEK() then returns the week in the context of the given year.
12.8 What Calendar Is Used By MySQL? MySQL uses what is known as a proleptic Gregorian calendar. Every country that has switched from the Julian to the Gregorian calendar has had to discard at least ten days during the switch. To see how this works, consider the month of October 1582, when the first Julian-to-Gregorian switch occurred. Monday
Tuesday
Wednesday Thursday
Friday
Saturday
Sunday
1
2
3
4
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
There are no dates between October 4 and October 15. This discontinuity is called the cutover. Any dates before the cutover are Julian, and any dates following the cutover are Gregorian. Dates during a cutover are nonexistent. A calendar applied to dates when it was not actually in use is called proleptic. Thus, if we assume there was never a cutover and Gregorian rules always rule, we have a proleptic Gregorian calendar. This is what is used by MySQL, as is required by standard SQL. For this reason, dates prior to the cutover stored as MySQL DATE or DATETIME values must be adjusted to compensate for the difference. It is important to realize that the cutover did not occur at the same time in all countries, and that the later it happened, the more days were lost. For example, in Great Britain, it took place in 1752, when Wednesday September 2 was followed by Thursday September 14. Russia remained on the Julian calendar until 1918, losing 13 days in the process, and what is popularly referred to as its “October Revolution” occurred in November according to the Gregorian calendar. For integer expressions, the preceding remarks about expression evaluation apply somewhat differently for expression assignment; for example, in a statement such as this: CREATE TABLE t SELECT integer_expr;
In this case, the table in the column resulting from the expression has type INT or BIGINT depending on the length of the integer expression. If the maximum length of the expression does not fit in an INT, BIGINT is used instead. The length is taken from the max_length value of the SELECT result set metadata (see Section 23.8.5, “C API Data Structures”). This means that you can force a BIGINT rather than INT by use of a sufficiently long expression: CREATE TABLE t SELECT 000000000000000000000;
12.9 Full-Text Search Functions MATCH (col1,col2,...) AGAINST (expr [search_modifier]) search_modifier: { IN NATURAL | IN NATURAL | IN BOOLEAN | WITH QUERY }
LANGUAGE MODE LANGUAGE MODE WITH QUERY EXPANSION MODE EXPANSION
MySQL has support for full-text indexing and searching: • A full-text index in MySQL is an index of type FULLTEXT.
1211
Natural Language Full-Text Searches
• Full-text indexes can be used only with MyISAM tables. (In MySQL 5.6 and up, they can also be used with InnoDB tables.) Full-text indexes can be created only for CHAR, VARCHAR, or TEXT columns. • A FULLTEXT index definition can be given in the CREATE TABLE statement when a table is created, or added later using ALTER TABLE or CREATE INDEX. • For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index after that, than to load data into a table that has an existing FULLTEXT index. Full-text searching is performed using MATCH() ... AGAINST syntax. MATCH() takes a commaseparated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row. There are three types of full-text searches: • A natural language search interprets the search string as a phrase in natural human language (a phrase in free text). There are no special operators, with the exception of double quote (") characters. The stopword list applies. In addition, words that are present in 50% or more of the rows are considered common and do not match. Full-text searches are natural language searches if the IN NATURAL LANGUAGE MODE modifier is given or if no modifier is given. For more information, see Section 12.9.1, “Natural Language FullText Searches”. • A boolean search interprets the search string using the rules of a special query language. The string contains the words to search for. It can also contain operators that specify requirements such that a word must be present or absent in matching rows, or that it should be weighted higher or lower than usual. Common words such as “some” or “then” are stopwords and do not match if present in the search string. The IN BOOLEAN MODE modifier specifies a boolean search. For more information, see Section 12.9.2, “Boolean Full-Text Searches”. • A query expansion search is a modification of a natural language search. The search string is used to perform a natural language search. Then words from the most relevant rows returned by the search are added to the search string and the search is done again. The query returns the rows from the second search. The IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION or WITH QUERY EXPANSION modifier specifies a query expansion search. For more information, see Section 12.9.3, “Full-Text Searches with Query Expansion”. Constraints on full-text searching are listed in Section 12.9.5, “Full-Text Restrictions”. The myisam_ftdump utility can be used to dump the contents of a full-text index. This may be helpful for debugging full-text queries. See Section 4.6.2, “myisam_ftdump — Display Full-Text Index information”.
12.9.1 Natural Language Full-Text Searches By default or with the IN NATURAL LANGUAGE MODE modifier, the MATCH() function performs a natural language search for a string against a text collection. A collection is a set of one or more columns included in a FULLTEXT index. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value; that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list. mysql> CREATE TABLE articles ( id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, title VARCHAR(200), body TEXT, FULLTEXT (title,body) ) ENGINE=InnoDB; Query OK, 0 rows affected (0.08 sec)
1212
Natural Language Full-Text Searches
mysql> INSERT INTO articles (title,body) VALUES ('MySQL Tutorial','DBMS stands for DataBase ...'), ('How To Use MySQL Well','After you went through a ...'), ('Optimizing MySQL','In this tutorial we will show ...'), ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'), ('MySQL vs. YourSQL','In the following database comparison ...'), ('MySQL Security','When configured properly, MySQL ...'); Query OK, 6 rows affected (0.01 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE); +----+-------------------+------------------------------------------+ | id | title | body | +----+-------------------+------------------------------------------+ | 1 | MySQL Tutorial | DBMS stands for DataBase ... | | 5 | MySQL vs. YourSQL | In the following database comparison ... | +----+-------------------+------------------------------------------+ 2 rows in set (0.00 sec)
By default, the search is performed in case-insensitive fashion. However, you can perform a casesensitive full-text search by using a binary collation for the indexed columns. For example, a column that uses the latin1 character set of can be assigned a collation of latin1_bin to make it case sensitive for full-text searches. When MATCH() is used in a WHERE clause, as in the example shown earlier, the rows returned are automatically sorted with the highest relevance first. Relevance values are nonnegative floatingpoint numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row (document), the number of unique words in the row, the total number of words in the collection, and the number of rows that contain a particular word. Note The term “document” may be used interchangeably with the term “row”, and both terms refer to the indexed part of the row. The term “collection” refers to the indexed columns and encompasses all rows. To simply count matches, you could use a query like this: mysql> SELECT COUNT(*) FROM articles -> WHERE MATCH (title,body) -> AGAINST ('database' IN NATURAL LANGUAGE MODE); +----------+ | COUNT(*) | +----------+ | 2 | +----------+ 1 row in set (0.00 sec)
However, you might find it quicker to rewrite the query as follows: mysql> SELECT -> COUNT(IF(MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE), 1, NULL)) -> AS count -> FROM articles; +-------+ | count | +-------+ | 2 | +-------+ 1 row in set (0.03 sec)
The first query sorts the results by relevance whereas the second does not. However, the second query performs a full table scan and the first does not. The first may be faster if the search matches few rows; otherwise, the second may be faster because it would read many rows anyway.
1213
Natural Language Full-Text Searches
For natural-language full-text searches, it is a requirement that the columns named in the MATCH() function be the same columns included in some FULLTEXT index in your table. For the preceding query, note that the columns named in the MATCH() function (title and body) are the same as those named in the definition of the article table's FULLTEXT index. If you wanted to search the title or body separately, you would need to create separate FULLTEXT indexes for each column. It is also possible to perform a boolean search or a search with query expansion. These search types are described in Section 12.9.2, “Boolean Full-Text Searches”, and Section 12.9.3, “Full-Text Searches with Query Expansion”. A full-text search that uses an index can name columns only from a single table in the MATCH() clause because an index cannot span multiple tables. A boolean search can be done in the absence of an index (albeit more slowly), in which case it is possible to name columns from multiple tables. The preceding example is a basic illustration that shows how to use the MATCH() function where rows are returned in order of decreasing relevance. The next example shows how to retrieve the relevance values explicitly. Returned rows are not ordered because the SELECT statement includes neither WHERE nor ORDER BY clauses: mysql> SELECT id, MATCH (title,body) -> AGAINST ('Tutorial' IN NATURAL LANGUAGE MODE) AS score -> FROM articles; +----+------------------+ | id | score | +----+------------------+ | 1 | 0.65545833110809 | | 2 | 0 | | 3 | 0.66266459226608 | | 4 | 0 | | 5 | 0 | | 6 | 0 | +----+------------------+ 6 rows in set (0.00 sec)
The following example is more complex. The query returns the relevance values and it also sorts the rows in order of decreasing relevance. To achieve this result, specify MATCH() twice: once in the SELECT list and once in the WHERE clause. This causes no additional overhead, because the MySQL optimizer notices that the two MATCH() calls are identical and invokes the full-text search code only once. mysql> SELECT id, body, MATCH (title,body) AGAINST -> ('Security implications of running MySQL as root' -> IN NATURAL LANGUAGE MODE) AS score -> FROM articles WHERE MATCH (title,body) AGAINST -> ('Security implications of running MySQL as root' -> IN NATURAL LANGUAGE MODE); +----+-------------------------------------+-----------------+ | id | body | score | +----+-------------------------------------+-----------------+ | 4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 | | 6 | When configured properly, MySQL ... | 1.3114095926285 | +----+-------------------------------------+-----------------+ 2 rows in set (0.00 sec)
A phrase that is enclosed within double quote (") characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase". If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stopwords or shorter than the minimum length of indexed words, the result is empty. The MySQL FULLTEXT implementation regards any sequence of true word characters (letters, digits, and underscores) as a word. That sequence may also contain apostrophes ('), but not more than one
1214
Natural Language Full-Text Searches
in a row. This means that aaa'bbb is regarded as one word, but aaa''bbb is regarded as two words. Apostrophes at the beginning or the end of a word are stripped by the FULLTEXT parser; 'aaa'bbb' would be parsed as aaa'bbb. The FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example, (space), , (comma), and . (period). If words are not separated by delimiters (as in, for example, Chinese), the FULLTEXT parser cannot determine where a word begins or ends. To be able to add words or other indexed terms in such languages to a FULLTEXT index, you must preprocess them so that they are separated by some arbitrary delimiter. It is possible to write a plugin that replaces the built-in full-text parser. For details, see Section 24.2, “The MySQL Plugin API”. For example parser plugin source code, see the plugin/fulltext directory of a MySQL source distribution. Some words are ignored in full-text searches: • Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is four characters. • Words in the stopword list are ignored. A stopword is a word such as “the” or “some” that is so common that it is considered to have zero semantic value. There is a built-in stopword list, but it can be overwritten by a user-defined list. The default stopword list is given in Section 12.9.4, “Full-Text Stopwords”. The default minimum word length and stopword list can be changed as described in Section 12.9.6, “Fine-Tuning MySQL Full-Text Search”. Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Consequently, a word that is present in many documents has a lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row. Such a technique works best with large collections (in fact, it was carefully tuned this way). For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results. For example, although the word “MySQL” is present in every row of the articles table shown earlier, a search for the word produces no results: mysql> SELECT * FROM articles -> WHERE MATCH (title,body) -> AGAINST ('MySQL' IN NATURAL LANGUAGE MODE); Empty set (0.00 sec)
The search result is empty because the word “MySQL” is present in at least 50% of the rows. As such, it is effectively treated as a stopword. For large data sets, this is the most desirable behavior: A natural language query should not return every second row from a 1GB table. For small data sets, it may be less desirable. A word that matches half of the rows in a table is less likely to locate relevant documents. In fact, it most likely finds plenty of irrelevant documents. We all know this happens far too often when we are trying to find something on the Internet with a search engine. It is with this reasoning that rows containing the word are assigned a low semantic value for the particular data set in which they occur. A given word may reach the 50% threshold in one data set but not another. The 50% threshold has a significant implication when you first try full-text searching to see how it works: If you create a table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results. Be sure to insert at least three rows, and preferably many more. Users who need to bypass the 50% limitation can use the boolean search mode; see Section 12.9.2, “Boolean Full-Text Searches”.
1215
Boolean Full-Text Searches
12.9.2 Boolean Full-Text Searches MySQL can perform boolean full-text searches using the IN BOOLEAN MODE modifier. With this modifier, certain characters have special meaning at the beginning or end of words in the search string. In the following query, the + and - operators indicate that a word is required to be present or absent, respectively, for a match to occur. Thus, the query retrieves all the rows that contain the word “MySQL” but that do not contain the word “YourSQL”: mysql> SELECT * FROM articles WHERE MATCH (title,body) -> AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE); +----+-----------------------+-------------------------------------+ | id | title | body | +----+-----------------------+-------------------------------------+ | 1 | MySQL Tutorial | DBMS stands for DataBase ... | | 2 | How To Use MySQL Well | After you went through a ... | | 3 | Optimizing MySQL | In this tutorial we will show ... | | 4 | 1001 MySQL Tricks | 1. Never run mysqld as root. 2. ... | | 6 | MySQL Security | When configured properly, MySQL ... | +----+-----------------------+-------------------------------------+
Note In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which • + stands for AND • - stands for NOT • [no operator] implies OR Boolean full-text searches have these characteristics: • They do not use the 50% threshold. • They do not automatically sort rows in order of decreasing relevance. You can see this from the preceding query result: The row with the highest relevance is the one that contains “MySQL” twice, but it is listed last, not first. • They can work even without a FULLTEXT index, although a search executed in this fashion would be quite slow. • The minimum and maximum word length full-text parameters apply. • The stopword list applies. The boolean full-text search capability supports the following operators: • + A leading plus sign indicates that this word must be present in each row that is returned. • A leading minus sign indicates that this word must not be present in any of the rows that are returned. Note: The - operator acts only to exclude rows that are otherwise matched by other search terms. Thus, a boolean-mode search that contains only terms preceded by - returns an empty result. It does not return “all rows except those containing any of the excluded terms.” • (no operator) 1216
Boolean Full-Text Searches
By default (when neither + nor - is specified) the word is optional, but the rows that contain it are rated higher. This mimics the behavior of MATCH() ... AGAINST() without the IN BOOLEAN MODE modifier. • > < These two operators are used to change a word's contribution to the relevance value that is assigned to a row. The > operator increases the contribution and the < operator decreases it. See the example following this list. • ( ) Parentheses group words into subexpressions. Parenthesized groups can be nested. • ~ A leading tilde acts as a negation operator, causing the word's contribution to the row's relevance to be negative. This is useful for marking “noise” words. A row containing such a word is rated lower than others, but is not excluded altogether, as it would be with the - operator. • * The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator. If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short (as determined from the ft_min_word_len setting) or a stopword. This occurs because the word is not seen as too short or a stopword, but as a prefix that must be present in the document in the form of a word that begins with the prefix. Suppose that ft_min_word_len=4. Then a search for '+word +the*' will likely return fewer rows than a search for '+word +the': • The former query remains as is and requires both word and the* (a word starting with the) to be present in the document. • The latter query is transformed to +word (requiring only word to be present). the is both too short and a stopword, and either condition is enough to cause it to be ignored. • " A phrase that is enclosed within double quote (") characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase". If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stopwords or shorter than the minimum length of indexed words, the result is empty. The following examples demonstrate some search strings that use boolean full-text operators: • 'apple banana' Find rows that contain at least one of the two words. • '+apple +juice' Find rows that contain both words. • '+apple macintosh' Find rows that contain the word “apple”, but rank rows higher if they also contain “macintosh”.
1217
Full-Text Searches with Query Expansion
• '+apple -macintosh' Find rows that contain the word “apple” but not “macintosh”. • '+apple ~macintosh' Find rows that contain the word “apple”, but if the row also contains the word “macintosh”, rate it lower than if row does not. This is “softer” than a search for '+apple -macintosh', for which the presence of “macintosh” causes the row not to be returned at all. • '+apple +(>turnover <strudel)' Find rows that contain the words “apple” and “turnover”, or “apple” and “strudel” (in any order), but rank “apple turnover” higher than “apple strudel”. • 'apple*' Find rows that contain words such as “apple”, “apples”, “applesauce”, or “applet”. • '"some words"' Find rows that contain the exact phrase “some words” (for example, rows that contain “some words of wisdom” but not “some noise words”). Note that the " characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotation marks that enclose the search string itself.
12.9.3 Full-Text Searches with Query Expansion Full-text search supports query expansion (and in particular, its variant “blind query expansion”). This is generally useful when a search phrase is too short, which often means that the user is relying on implied knowledge that the full-text search engine lacks. For example, a user searching for “database” may really mean that “MySQL”, “Oracle”, “DB2”, and “RDBMS” all are phrases that should match “databases” and should be returned, too. This is implied knowledge. Blind query expansion (also known as automatic relevance feedback) is enabled by adding WITH QUERY EXPANSION or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION following the search phrase. It works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant documents from the first search. Thus, if one of these documents contains the word “databases” and the word “MySQL”, the second search finds the documents that contain the word “MySQL” even if they do not contain the word “database”. The following example shows this difference: mysql> SELECT * FROM articles -> WHERE MATCH (title,body) -> AGAINST ('database' IN NATURAL LANGUAGE MODE); +----+-------------------+------------------------------------------+ | id | title | body | +----+-------------------+------------------------------------------+ | 5 | MySQL vs. YourSQL | In the following database comparison ... | | 1 | MySQL Tutorial | DBMS stands for DataBase ... | +----+-------------------+------------------------------------------+ 2 rows in set (0.00 sec) mysql> SELECT * FROM articles -> WHERE MATCH (title,body) -> AGAINST ('database' WITH QUERY EXPANSION); +----+-------------------+------------------------------------------+ | id | title | body | +----+-------------------+------------------------------------------+ | 1 | MySQL Tutorial | DBMS stands for DataBase ... | | 5 | MySQL vs. YourSQL | In the following database comparison ... | | 3 | Optimizing MySQL | In this tutorial we will show ... | +----+-------------------+------------------------------------------+ 3 rows in set (0.00 sec)
1218
Full-Text Stopwords
Another example could be searching for books by Georges Simenon about Maigret, when a user is not sure how to spell “Maigret”. A search for “Megre and the reluctant witnesses” finds only “Maigret and the Reluctant Witnesses” without query expansion. A search with query expansion finds all books with the word “Maigret” on the second pass. Note Because blind query expansion tends to increase noise significantly by returning nonrelevant documents, it is meaningful to use only when a search phrase is rather short.
12.9.4 Full-Text Stopwords The stopword list is loaded and searched for full-text queries using the server character set and collation (the values of the character_set_server and collation_server system variables). False hits or misses may occur for stopword lookups if the stopword file or columns used for full-text indexing or searches have a character set or collation different from character_set_server or collation_server. Case sensitivity of stopword lookups depends on the server collation. For example, lookups are case insensitive if the collation is latin1_swedish_ci, whereas lookups are case sensitive if the collation is latin1_general_cs or latin1_bin. As of MySQL 5.5.6, the stopword file is loaded and searched using latin1 if character_set_server is ucs2, utf16, or utf32. If any table was created with FULLTEXT indexes while the server character set was ucs2, utf16, or utf32, it should be repaired using this statement: REPAIR TABLE tbl_name QUICK;
The following table shows the default list of full-text stopwords. In a MySQL source distribution, you can find this list in the storage/myisam/ft_static.c file. a's
able
about
above
according
accordingly
across
actually
after
afterwards
again
against
ain't
all
allow
allows
almost
alone
along
already
also
although
always
am
among
amongst
an
and
another
any
anybody
anyhow
anyone
anything
anyway
anyways
anywhere
apart
appear
appreciate
appropriate
are
aren't
around
as
aside
ask
asking
associated
at
available
away
awfully
be
became
because
become
becomes
becoming
been
before
beforehand
behind
being
believe
below
beside
besides
best
better
between
beyond
both
brief
but
by
c'mon
c's
came
can
can't
cannot
cant
cause
causes
certain
certainly
changes
clearly
co
com
come
comes
concerning
consequently
1219
Full-Text Stopwords
consider
considering
contain
containing
contains
corresponding
could
couldn't
course
currently
definitely
described
despite
did
didn't
different
do
does
doesn't
doing
don't
done
down
downwards
during
each
edu
eg
eight
either
else
elsewhere
enough
entirely
especially
et
etc
even
ever
every
everybody
everyone
everything
everywhere
ex
exactly
example
except
far
few
fifth
first
five
followed
following
follows
for
former
formerly
forth
four
from
further
furthermore
get
gets
getting
given
gives
go
goes
going
gone
got
gotten
greetings
had
hadn't
happens
hardly
has
hasn't
have
haven't
having
he
he's
hello
help
hence
her
here
here's
hereafter
hereby
herein
hereupon
hers
herself
hi
him
himself
his
hither
hopefully
how
howbeit
however
i'd
i'll
i'm
i've
ie
if
ignored
immediate
in
inasmuch
inc
indeed
indicate
indicated
indicates
inner
insofar
instead
into
inward
is
isn't
it
it'd
it'll
it's
its
itself
just
keep
keeps
kept
know
known
knows
last
lately
later
latter
latterly
least
less
lest
let
let's
like
liked
likely
little
look
looking
looks
ltd
mainly
many
may
maybe
me
mean
meanwhile
merely
might
more
moreover
most
mostly
much
must
my
myself
name
namely
nd
near
nearly
necessary
need
needs
neither
never
nevertheless
new
next
nine
no
nobody
non
none
noone
nor
normally
not
nothing
novel
now
nowhere
obviously
of
off
often
oh
ok
1220
Full-Text Stopwords
okay
old
on
once
one
ones
only
onto
or
other
others
otherwise
ought
our
ours
ourselves
out
outside
over
overall
own
particular
particularly
per
perhaps
placed
please
plus
possible
presumably
probably
provides
que
quite
qv
rather
rd
re
really
reasonably
regarding
regardless
regards
relatively
respectively
right
said
same
saw
say
saying
says
second
secondly
see
seeing
seem
seemed
seeming
seems
seen
self
selves
sensible
sent
serious
seriously
seven
several
shall
she
should
shouldn't
since
six
so
some
somebody
somehow
someone
something
sometime
sometimes
somewhat
somewhere
soon
sorry
specified
specify
specifying
still
sub
such
sup
sure
t's
take
taken
tell
tends
th
than
thank
thanks
thanx
that
that's
thats
the
their
theirs
them
themselves
then
thence
there
there's
thereafter
thereby
therefore
therein
theres
thereupon
these
they
they'd
they'll
they're
they've
think
third
this
thorough
thoroughly
those
though
three
through
throughout
thru
thus
to
together
too
took
toward
towards
tried
tries
truly
try
trying
twice
two
un
under
unfortunately
unless
unlikely
until
unto
up
upon
us
use
used
useful
uses
using
usually
value
various
very
via
viz
vs
want
wants
was
wasn't
way
we
we'd
we'll
we're
we've
welcome
well
went
were
weren't
what
what's
whatever
when
whence
whenever
where
where's
whereafter
whereas
whereby
wherein
whereupon
wherever
whether
which
while
whither
who
1221
Full-Text Restrictions
who's
whoever
whole
whom
whose
why
will
willing
wish
with
within
without
won't
wonder
would
wouldn't
yes
yet
you
you'd
you'll
you're
you've
your
yours
yourself
yourselves
zero
12.9.5 Full-Text Restrictions • Full-text searches are supported for MyISAM tables only. (In MySQL 5.6 and up, they can also be used with InnoDB tables.) • Full-text searches are not supported for partitioned tables. See Section 19.5, “Restrictions and Limitations on Partitioning”. • Full-text searches can be used with most multibyte character sets. The exception is that for Unicode, the utf8 character set can be used, but not the ucs2 character set. However, although FULLTEXT indexes on ucs2 columns cannot be used, you can perform IN BOOLEAN MODE searches on a ucs2 column that has no such index. The remarks for utf8 also apply to utf8mb4, and the remarks for ucs2 also apply to utf16 and utf32. • Ideographic languages such as Chinese and Japanese do not have word delimiters. Therefore, the FULLTEXT parser cannot determine where words begin and end in these and other such languages. The implications of this and some workarounds for the problem are described in Section 12.9, “FullText Search Functions”. • Although the use of multiple character sets within a single table is supported, all columns in a FULLTEXT index must use the same character set and collation. • The MATCH() column list must match exactly the column list in some FULLTEXT index definition for the table, unless this MATCH() is IN BOOLEAN MODE. Boolean-mode searches can be done on nonindexed columns, although they are likely to be slow. • The argument to AGAINST() must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row. • Index hints are more limited for FULLTEXT searches than for non-FULLTEXT searches. See Section 8.9.3, “Index Hints”. • The '%' character is not a supported wildcard character for full-text searches.
12.9.6 Fine-Tuning MySQL Full-Text Search MySQL's full-text search capability has few user-tunable parameters. You can exert more control over full-text searching behavior if you have a MySQL source distribution because some changes require source code modifications. See Section 2.9, “Installing MySQL from Source”. Full-text search is carefully tuned for the most effectiveness. Modifying the default behavior in most cases can actually decrease effectiveness. Do not alter the MySQL sources unless you know what you are doing. Most full-text variables described in this section must be set at server startup time. A server restart is required to change them; they cannot be modified while the server is running. Some variable changes require that you rebuild the FULLTEXT indexes in your tables. Instructions for doing so are given later in this section.
1222
Fine-Tuning MySQL Full-Text Search
• The minimum and maximum lengths of words to be indexed are defined by the ft_min_word_len and ft_max_word_len system variables. (See Section 5.1.5, “Server System Variables”.) The default minimum value is four characters; the default maximum is version dependent. If you change either value, you must rebuild your FULLTEXT indexes. For example, if you want three-character words to be searchable, you can set the ft_min_word_len variable by putting the following lines in an option file: [mysqld] ft_min_word_len=3
Then restart the server and rebuild your FULLTEXT indexes. Note particularly the remarks regarding myisamchk in the instructions following this list. •
To override the default stopword list, set the ft_stopword_file system variable. (See Section 5.1.5, “Server System Variables”.) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes. The stopword list is free-form. That is, you may use any nonalphanumeric character such as newline, space, or comma to separate stopwords. Exceptions are the underscore character (_) and a single apostrophe (') which are treated as part of a word. The character set of the stopword list is the server's default character set; see Section 10.1.3.2, “Server Character Set and Collation”.
• The 50% threshold for natural language searches is determined by the particular weighting scheme chosen. To disable it, look for the following line in storage/myisam/ftdefs.h: #define GWS_IN_USE GWS_PROB
Change that line to this: #define GWS_IN_USE GWS_FREQ
Then recompile MySQL. There is no need to rebuild the indexes in this case. Note By making this change, you severely decrease MySQL's ability to provide adequate relevance values for the MATCH() function. If you really need to search for such common words, it would be better to search using IN BOOLEAN MODE instead, which does not observe the 50% threshold. • To change the operators used for boolean full-text searches, set the ft_boolean_syntax system variable. This variable can be changed while the server is running, but you must have the SUPER privilege to do so. No rebuilding of indexes is necessary in this case. See Section 5.1.5, “Server System Variables”, which describes the rules governing how to set this variable. • If you want to change the set of characters that are considered word characters, you can do so in several ways, as described in the following list. After making the modification, you must rebuild the indexes for each table that contains any FULLTEXT indexes. Suppose that you want to treat the hyphen character ('-') as a word character. Use one of these methods: • Modify the MySQL source: In storage/myisam/ftdefs.h, see the true_word_char() and misc_word_char() macros. Add '-' to one of those macros and recompile MySQL. • Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. . You can edit the contents of the <map> array in one of the character set XML files to specify that '-' is 1223
Adding a Collation for Full-Text Indexing
a “letter.” Then use the given character set for your FULLTEXT indexes. For information about the <map> array format, see Section 10.3.1, “Character Definition Arrays”. • Add a new collation for the character set used by the indexed columns, and alter the columns to use that collation. For general information about adding collations, see Section 10.4, “Adding a Collation to a Character Set”. For an example specific to full-text indexing, see Section 12.9.7, “Adding a Collation for Full-Text Indexing”. If you modify full-text variables that affect indexing (ft_min_word_len, ft_max_word_len, or ft_stopword_file), or if you change the stopword file itself, you must rebuild your FULLTEXT indexes after making the changes and restarting the server. To rebuild the indexes in this case, it is sufficient to do a QUICK repair operation: mysql> REPAIR TABLE tbl_name QUICK;
Alternatively, use ALTER TABLE with the DROP INDEX and ADD INDEX options to drop and re-create each FULLTEXT index. In some cases, this may be faster than a repair operation. Each table that contains any FULLTEXT index must be repaired as just shown. Otherwise, queries for the table may yield incorrect results, and modifications to the table will cause the server to see the table as corrupt and in need of repair. If you use myisamchk to perform an operation that modifies table indexes (such as repair or analyze), the FULLTEXT indexes are rebuilt using the default full-text parameter values for minimum word length, maximum word length, and stopword file unless you specify otherwise. This can result in queries failing. The problem occurs because these parameters are known only by the server. They are not stored in MyISAM index files. To avoid the problem if you have modified the minimum or maximum word length or stopword file values used by the server, specify the same ft_min_word_len, ft_max_word_len, and ft_stopword_file values for myisamchk that you use for mysqld. For example, if you have set the minimum word length to 3, you can repair a table with myisamchk like this: shell> myisamchk --recover --ft_min_word_len=3 tbl_name.MYI
To ensure that myisamchk and the server use the same values for full-text parameters, place each one in both the [mysqld] and [myisamchk] sections of an option file: [mysqld] ft_min_word_len=3 [myisamchk] ft_min_word_len=3
An alternative to using myisamchk for index modification is to use the REPAIR TABLE, ANALYZE TABLE, OPTIMIZE TABLE, or ALTER TABLE statements. These statements are performed by the server, which knows the proper full-text parameter values to use.
12.9.7 Adding a Collation for Full-Text Indexing This section describes how to add a new collation for full-text searches. The sample collation is like latin1_swedish_ci but treats the '-' character as a letter rather than as a punctuation character so that it can be indexed as a word character. General information about adding collations is given in Section 10.4, “Adding a Collation to a Character Set”; it is assumed that you have read it and are familiar with the files involved. To add a collation for full-text indexing, use the following procedure. The instructions here add a collation for a simple character set, which as discussed in Section 10.4, “Adding a Collation to a Character Set”, can be created using a configuration file that describes the character set properties.
1224
Adding a Collation for Full-Text Indexing
For a complex character set such as Unicode, create collations using C source files that describe the character set properties. 1. Add a collation to the Index.xml file. The collation ID must be unused, so choose a value different from 62 if that ID is already taken on your system. ...
2. Declare the sort order for the collation in the latin1.xml file. In this case, the order can be copied from latin1_swedish_ci: <map> 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 60 41 42 43 44 45 46 47 48 49 4A 4B 4C 50 51 52 53 54 55 56 57 58 59 5A 7B 7C 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 90 91 92 93 94 95 96 97 98 99 9A 9B 9C A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC 41 41 41 41 5C 5B 5C 43 45 45 45 45 49 44 4E 4F 4F 4F 4F 5D D7 D8 55 55 55 59 41 41 41 41 5C 5B 5C 43 45 45 45 45 49 44 4E 4F 4F 4F 4F 5D F7 D8 55 55 55 59
0D 1D 2D 3D 4D 5D 4D 7D 8D 9D AD BD 49 59 49 59
0E 1E 2E 3E 4E 5E 4E 7E 8E 9E AE BE 49 DE 49 DE
0F 1F 2F 3F 4F 5F 4F 7F 8F 9F AF BF 49 DF 49 FF
3. Modify the ctype array in latin1.xml. Change the value corresponding to 0x2D (which is the code for the '-' character) from 10 (punctuation) to 01 (small letter). In the following array, this is the element in the fourth row down, third value from the end. <map> 00 20 20 20 20 20 20 48 10 10 84 84 84 10 81 81 01 01 01 10 82 82 02 02 02 10 00 10 00 10 10 48 10 10 10 10 10 01 01 01 01 01 01 02 02 02 02 02 02
20 20 10 84 81 01 82 02 02 10 10 10 01 01 02 02
20 20 10 84 81 01 82 02 10 10 10 10 01 01 02 02
20 20 10 84 81 01 82 02 10 10 10 10 01 01 02 02
20 20 10 84 81 01 82 02 10 10 10 10 01 01 02 02
20 20 10 84 01 01 02 02 10 10 10 10 01 10 02 10
20 20 10 84 01 01 02 02 10 10 10 10 01 01 02 02
28 20 10 84 01 01 02 02 10 10 10 10 01 01 02 02
28 20 10 10 01 01 02 02 01 02 10 10 01 01 02 02
28 20 10 10 01 10 02 10 10 10 10 10 01 01 02 02
28 20 10 10 01 10 02 10 01 02 10 10 01 01 02 02
28 20 01 10 01 10 02 10 00 00 10 10 01 01 02 02
20 20 10 10 01 10 02 10 01 02 10 10 01 01 02 02
20 20 10 10 01 10 02 20 00 01 10 10 01 02 02 02
4. Restart the server. 5. To employ the new collation, include it in the definition of columns that are to use it: mysql> DROP TABLE IF EXISTS t1;
1225
Cast Functions and Operators
Query OK, 0 rows affected (0.13 sec) mysql> CREATE TABLE t1 ( -> a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, -> FULLTEXT INDEX(a) -> ) ENGINE=MyISAM; Query OK, 0 rows affected (0.47 sec)
6. Test the collation to verify that hyphen is considered as a word character: mysql> INSERT INTO t1 VALUEs ('----'),('....'),('abcd'); Query OK, 3 rows affected (0.22 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM t1 WHERE MATCH a AGAINST ('----' IN BOOLEAN MODE); +------+ | a | +------+ | ---- | +------+ 1 row in set (0.00 sec)
12.10 Cast Functions and Operators Table 12.14 Cast Functions and Operators Name
Description
BINARY
Cast a string to a binary string
CAST()
Cast a value as a certain type
CONVERT()
Cast a value as a certain type
Cast functions and operators enable conversion of values from one data type to another. CONVERT() with a USING clause provides a way to convert data between different character sets: CONVERT(expr USING transcoding_name)
In MySQL, transcoding names are the same as the corresponding character set names. Examples: SELECT CONVERT(_latin1'Müller' USING utf8); INSERT INTO utf8_table (utf8_column) SELECT CONVERT(latin1_column USING utf8) FROM latin1_table;
You can also use CONVERT() without USING or CAST() to convert strings between different character sets: CONVERT(string, CHAR[(N)] CHARACTER SET charset_name) CAST(string AS CHAR[(N)] CHARACTER SET charset_name)
Examples: SELECT CONVERT('test', CHAR CHARACTER SET utf8); SELECT CAST('test' AS CHAR CHARACTER SET utf8);
If you specify CHARACTER SET charset_name as just shown, the resulting character set and collation are charset_name and the default collation of charset_name. If you omit CHARACTER SET charset_name, the resulting character set and collation are defined by the character_set_connection and collation_connection system variables that determine the default connection character set and collation (see Section 10.1.4, “Connection Character Sets and Collations”).
1226
Cast Functions and Operators
A COLLATE clause is not permitted within a CONVERT() or CAST() call, but you can apply it to the function result. For example, this is legal: SELECT CAST('test' AS CHAR CHARACTER SET utf8) COLLATE utf8_bin;
But this is illegal: SELECT CAST('test' AS CHAR CHARACTER SET utf8 COLLATE utf8_bin);
Normally, you cannot compare a BLOB value or other binary string in case-insensitive fashion because binary strings use the binary character set, which has no collation with the concept of lettercase. To perform a case-insensitive comparison, use the CONVERT() or CAST() function to convert the value to a nonbinary string. Comparisons of the resulting string use its collation. For example, if the conversion result character set has a case-insensitive collation, a LIKE operation is not case sensitive: SELECT 'A' LIKE CONVERT(blob_col USING latin1) FROM tbl_name;
To use a different character set, substitute its name for latin1 in the preceding statement. To specify a particular collation for the converted string, use a COLLATE clause following the CONVERT() call: SELECT 'A' LIKE CONVERT(blob_col USING latin1) COLLATE latin1_german1_ci FROM tbl_name;
CONVERT() and CAST() can be used more generally for comparing strings that are represented in different character sets. For example, a comparison of these strings results in an error because they have different character sets: mysql> SET @s1 = _latin1 'abc', @s2 = _latin2 'abc'; mysql> SELECT @s1 = @s2; ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (latin2_general_ci,IMPLICIT) for operation '='
Converting one of the strings to a character set compatible with the other enables the comparison to occur without error: mysql> SELECT @s1 = CONVERT(@s2 USING latin1); +---------------------------------+ | @s1 = CONVERT(@s2 USING latin1) | +---------------------------------+ | 1 | +---------------------------------+
For string literals, another way to specify the character set is to use a character set introducer (_latin1 and _latin2 in the preceding example are instances of introducers). Unlike conversion functions such as CAST(), or CONVERT(), which convert a string from one character set to another, an introducer designates a string literal as having a particular character set, with no conversion involved. For more information, see Section 10.1.3.8, “Character Set Introducers”. Character set conversion is also useful preceding lettercase conversion of binary strings. LOWER() and UPPER() are ineffective when applied directly to binary strings because the concept of lettercase does not apply. To perform lettercase conversion of a binary string, first convert it to a nonbinary string: mysql> SET @str = BINARY 'New York'; mysql> SELECT LOWER(@str), LOWER(CONVERT(@str USING latin1)); +-------------+-----------------------------------+ | LOWER(@str) | LOWER(CONVERT(@str USING latin1)) | +-------------+-----------------------------------+ | New York | new york | +-------------+-----------------------------------+
1227
Cast Functions and Operators
If you convert an indexed column using BINARY, CAST(), or CONVERT(), MySQL may not be able to use the index efficiently. The cast functions are useful for creating a column with a specific type in a CREATE TABLE ... SELECT statement:
mysql> CREATE TABLE new_table SELECT CAST('2000-01-01' AS DATE) AS c1; mysql> SHOW CREATE TABLE new_table\G *************************** 1. row *************************** Table: new_table Create Table: CREATE TABLE `new_table` ( `c1` date DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1
The cast functions are useful for sorting ENUM columns in lexical order. Normally, sorting of ENUM columns occurs using the internal numeric values. Casting the values to CHAR results in a lexical sort: SELECT enum_col FROM tbl_name ORDER BY CAST(enum_col AS CHAR);
CAST() also changes the result if you use it as part of a more complex expression such as CONCAT('Date: ',CAST(NOW() AS DATE)). For temporal values, there is little need to use CAST() to extract data in different formats. Instead, use a function such as EXTRACT(), DATE_FORMAT(), or TIME_FORMAT(). See Section 12.7, “Date and Time Functions”. To cast a string to a number, you normally need do nothing other than use the string value in numeric context: mysql> SELECT 1+'1'; -> 2
That is also true for hexadecimal and bit literals, which are binary strings by default: mysql> SELECT X'41', X'41'+0; -> 'A', 65 mysql> SELECT b'1100001', b'1100001'+0; -> 'a', 97
A string used in an arithmetic operation is converted to a floating-point number during expression evaluation. A number used in string context is converted to a string: mysql> SELECT CONCAT('hello you ',2); -> 'hello you 2'
For information about implicit conversion of numbers to strings, see Section 12.2, “Type Conversion in Expression Evaluation”. When using an explicit CAST() on a TIMESTAMP value in a statement that does not select from any tables, the value is treated by MySQL as a string prior to performing any conversion. This results in the value being truncated when casting to a numeric type, as shown here: mysql> SELECT CAST(TIMESTAMP '2014-09-08 18:07:54' AS SIGNED); +-------------------------------------------------+ | CAST(TIMESTAMP '2014-09-08 18:07:54' AS SIGNED) | +-------------------------------------------------+ | 2014 | +-------------------------------------------------+ 1 row in set, 1 warning (0.00 sec)
1228
Cast Functions and Operators
mysql> SHOW WARNINGS; +---------+------+----------------------------------------------------------+ | Level | Code | Message | +---------+------+----------------------------------------------------------+ | Warning | 1292 | Truncated incorrect INTEGER value: '2014-09-08 18:07:54' | +---------+------+----------------------------------------------------------+ 1 row in set (0.00 sec)
This does not apply when selecting rows from a table, as shown here: mysql> USE test; Database changed mysql> CREATE TABLE c_test (col TIMESTAMP); Query OK, 0 rows affected (0.07 sec) mysql> INSERT INTO c_test VALUES ('2014-09-08 18:07:54'); Query OK, 1 row affected (0.05 sec) mysql> SELECT col, CAST(col AS UNSIGNED) AS c_col FROM c_test; +---------------------+----------------+ | col | c_col | +---------------------+----------------+ | 2014-09-08 18:07:54 | 20140908180754 | +---------------------+----------------+ 1 row in set (0.00 sec)
This is a known issue which is resolved in MySQL 5.6. MySQL supports arithmetic with both signed and unsigned 64-bit values. For numeric operators (such as + or -) where one of the operands is an unsigned integer, the result is unsigned by default (see Section 12.6.1, “Arithmetic Operators”). To override this, use the SIGNED or UNSIGNED cast operator to cast a value to a signed or unsigned 64-bit integer, respectively. mysql> SELECT 1 - 2; -> -1 mysql> SELECT CAST(1 - 2 AS UNSIGNED); -> 18446744073709551615 mysql> SELECT CAST(CAST(1 - 2 AS UNSIGNED) AS SIGNED); -> -1
If either operand is a floating-point value, the result is a floating-point value and is not affected by the preceding rule. (In this context, DECIMAL column values are regarded as floating-point values.) mysql> SELECT CAST(1 AS UNSIGNED) - 2.0; -> -1.0
The SQL mode affects the result of conversion operations (see Section 5.1.8, “Server SQL Modes”). Examples: • For conversion of a “zero” date string to a date, CONVERT() and CAST() return NULL and produce a warning when the NO_ZERO_DATE SQL mode is enabled. • For integer subtraction, if the NO_UNSIGNED_SUBTRACTION SQL mode is enabled, the subtraction result is signed even if any operand is unsigned. The following list describes the available cast functions and operators: • BINARY expr The BINARY operator converts the expression to a binary string. A common use for BINARY is to force a character string comparison to be done byte by byte rather than character by character, in effect becoming case sensitive. The BINARY operator also causes trailing spaces in comparisons to be significant.
1229
Cast Functions and Operators
mysql> SELECT -> 1 mysql> SELECT -> 0 mysql> SELECT -> 1 mysql> SELECT -> 0
'a' = 'A'; BINARY 'a' = 'A'; 'a' = 'a '; BINARY 'a' = 'a ';
In a comparison, BINARY affects the entire operation; it can be given before either operand with the same result. For purposes of converting a string expression to a binary string, these constructs are equivalent: BINARY expr CAST(expr AS BINARY) CONVERT(expr USING BINARY)
If a value is a string literal, it can be designated as a binary string without performing any conversion by using the _binary character set introducer: mysql> SELECT 'a' = 'A'; -> 1 mysql> SELECT _binary 'a' = 'A'; -> 0
For information about introducers, see Section 10.1.3.8, “Character Set Introducers”. The BINARY operator in expressions differs in effect from the BINARY attribute in character column definitions. A character column defined with the BINARY attribute is assigned table default character set and the binary (_bin) collation of that character set. Every nonbinary character set has a _bin collation. For example, the binary collation for the utf8 character set is utf8_bin, so if the table default character set is utf8, these two column definitions are equivalent: CHAR(10) BINARY CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin
The use of CHARACTER SET binary in the definition of a CHAR, VARCHAR, or TEXT column causes the column to be treated as the corresponding binary string data type. For example, the following pairs of definitions are equivalent: CHAR(10) CHARACTER SET binary BINARY(10) VARCHAR(10) CHARACTER SET binary VARBINARY(10) TEXT CHARACTER SET binary BLOB
• CAST(expr AS type) The CAST() function takes an expression of any type and produces a result value of the specified type, similar to CONVERT(). For more information, see the description of CONVERT(). CAST() is standard SQL syntax. • CONVERT(expr,type), CONVERT(expr USING transcoding_name) The CONVERT() function takes an expression of any type and produces a result value of the specified type.
1230
Cast Functions and Operators
Discussion of CONVERT(expr, type) syntax here also applies to CAST(expr AS type), which is equivalent. CONVERT(... USING ...) is standard SQL syntax. The non-USING form of CONVERT() is ODBC syntax. CONVERT() with USING converts data between different character sets. In MySQL, transcoding names are the same as the corresponding character set names. For example, this statement converts the string 'abc' in the default character set to the corresponding string in the utf8 character set: SELECT CONVERT('abc' USING utf8);
CONVERT() without USING and CAST() take an expression and a type value specifying the result type. These type values are permitted: • BINARY[(N)] Produces a string with the BINARY data type. See Section 11.4.2, “The BINARY and VARBINARY Types” for a description of how this affects comparisons. If the optional length N is given, BINARY(N) causes the cast to use no more than N bytes of the argument. Values shorter than N bytes are padded with 0x00 bytes to a length of N. • CHAR[(N)] [charset_info] Produces a string with the CHAR data type. If the optional length N is given, CHAR(N) causes the cast to use no more than N characters of the argument. No padding occurs for values shorter than N characters. With no charset_info clause, CHAR produces a string with the default character set. To specify the character set explicitly, these charset_info values are permitted: • CHARACTER SET charset_name: Produces a string with the given character set. • ASCII: Shorthand for CHARACTER SET latin1. • UNICODE: Shorthand for CHARACTER SET ucs2. In all cases, the string has the default collation for the character set. • DATE Produces a DATE value. • DATETIME Produces a DATETIME value. • DECIMAL[(M[,D])] Produces a DECIMAL value. If the optional M and D values are given, they specify the maximum number of digits (the precision) and the number of digits following the decimal point (the scale). • NCHAR[(N)] Like CHAR, but produces a string with the national character set. See Section 10.1.3.7, “The National Character Set”. Unlike CHAR, NCHAR does not permit trailing character set information to be specified.
1231
XML Functions
• SIGNED [INTEGER] Produces a signed integer value. • TIME Produces a TIME value. • UNSIGNED [INTEGER] Produces an unsigned integer value.
12.11 XML Functions Table 12.15 XML Functions Name
Description
ExtractValue()
Extracts a value from an XML string using XPath notation
UpdateXML()
Return replaced XML fragment
This section discusses XML and related functionality in MySQL. Note It is possible to obtain XML-formatted output from MySQL in the mysql and mysqldump clients by invoking them with the --xml option. See Section 4.5.1, “mysql — The MySQL Command-Line Tool”, and Section 4.5.4, “mysqldump — A Database Backup Program”. Two functions providing basic XPath 1.0 (XML Path Language, version 1.0) capabilities are available. Some basic information about XPath syntax and usage is provided later in this section; however, an in-depth discussion of these topics is beyond the scope of this manual, and you should refer to the XML Path Language (XPath) 1.0 standard for definitive information. A useful resource for those new to XPath or who desire a refresher in the basics is the Zvon.org XPath Tutorial, which is available in several languages. Note These functions remain under development. We continue to improve these and other aspects of XML and XPath functionality in MySQL 5.5 and onwards. You may discuss these, ask questions about them, and obtain help from other users with them in the MySQL XML User Forum. XPath expressions used with these functions support user variables and local stored program variables. User variables are weakly checked; variables local to stored programs are strongly checked (see also Bug #26518): • User variables (weak checking). Variables using the syntax $@variable_name (that is, user variables) are not checked. No warnings or errors are issued by the server if a variable has the wrong type or has previously not been assigned a value. This also means the user is fully responsible for any typographical errors, since no warnings will be given if (for example) $@myvariable is used where $@myvariable was intended. Example: mysql> SET @xml = 'XY'; Query OK, 0 rows affected (0.00 sec) mysql> SET @i =1, @j = 2; Query OK, 0 rows affected (0.00 sec)
1232
XML Functions
mysql> SELECT @i, ExtractValue(@xml, '//b[$@i]'); +------+--------------------------------+ | @i | ExtractValue(@xml, '//b[$@i]') | +------+--------------------------------+ | 1 | X | +------+--------------------------------+ 1 row in set (0.00 sec) mysql> SELECT @j, ExtractValue(@xml, '//b[$@j]'); +------+--------------------------------+ | @j | ExtractValue(@xml, '//b[$@j]') | +------+--------------------------------+ | 2 | Y | +------+--------------------------------+ 1 row in set (0.00 sec) mysql> SELECT @k, ExtractValue(@xml, '//b[$@k]'); +------+--------------------------------+ | @k | ExtractValue(@xml, '//b[$@k]') | +------+--------------------------------+ | NULL | | +------+--------------------------------+ 1 row in set (0.00 sec)
• Variables in stored programs (strong checking). Variables using the syntax $variable_name can be declared and used with these functions when they are called inside stored programs. Such variables are local to the stored program in which they are defined, and are strongly checked for type and value. Example: mysql> DELIMITER | mysql> CREATE PROCEDURE myproc () -> BEGIN -> DECLARE i INT DEFAULT 1; -> DECLARE xml VARCHAR(25) DEFAULT 'XYZ'; -> -> WHILE i < 4 DO -> SELECT xml, i, ExtractValue(xml, '//a[$i]'); -> SET i = i+1; -> END WHILE; -> END | Query OK, 0 rows affected (0.01 sec) mysql> DELIMITER ; mysql> CALL myproc(); +--------------------------+---+------------------------------+ | xml | i | ExtractValue(xml, '//a[$i]') | +--------------------------+---+------------------------------+ | XYZ | 1 | X | +--------------------------+---+------------------------------+ 1 row in set (0.00 sec) +--------------------------+---+------------------------------+ | xml | i | ExtractValue(xml, '//a[$i]') | +--------------------------+---+------------------------------+ | XYZ | 2 | Y | +--------------------------+---+------------------------------+ 1 row in set (0.01 sec) +--------------------------+---+------------------------------+ | xml | i | ExtractValue(xml, '//a[$i]') | +--------------------------+---+------------------------------+ | XYZ | 3 | Z | +--------------------------+---+------------------------------+ 1 row in set (0.01 sec)
1233
XML Functions
Parameters. Variables used in XPath expressions inside stored routines that are passed in as parameters are also subject to strong checking. Expressions containing user variables or variables local to stored programs must otherwise (except for notation) conform to the rules for XPath expressions containing variables as given in the XPath 1.0 specification. Note A user variable used to store an XPath expression is treated as an empty string. Because of this, it is not possible to store an XPath expression as a user variable. (Bug #32911) • ExtractValue(xml_frag, xpath_expr) ExtractValue() takes two string arguments, a fragment of XML markup xml_frag and an XPath expression xpath_expr (also known as a locator); it returns the text (CDATA) of the first text node which is a child of the element or elements matched by the XPath expression. In MySQL 5.5, the XPath expression can contain at most 127 characters. (This limitation is lifted in MySQL 5.6.) Using this function is the equivalent of performing a match using the xpath_expr after appending /text(). In other words, ExtractValue('Sakila', '/a/b') and ExtractValue('Sakila', '/a/b/text()') produce the same result. If multiple matches are found, the content of the first child text node of each matching element is returned (in the order matched) as a single, space-delimited string. If no matching text node is found for the expression (including the implicit /text())—for whatever reason, as long as xpath_expr is valid, and xml_frag consists of elements which are properly nested and closed—an empty string is returned. No distinction is made between a match on an empty element and no match at all. This is by design. If you need to determine whether no matching element was found in xml_frag or such an element was found but contained no child text nodes, you should test the result of an expression that uses the XPath count() function. For example, both of these statements return an empty string, as shown here: mysql> SELECT ExtractValue('', '/a/b'); +-------------------------------------+ | ExtractValue('', '/a/b') | +-------------------------------------+ | | +-------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT ExtractValue('', '/a/b'); +-------------------------------------+ | ExtractValue('', '/a/b') | +-------------------------------------+ | | +-------------------------------------+ 1 row in set (0.00 sec)
However, you can determine whether there was actually a matching element using the following: mysql> SELECT ExtractValue('', 'count(/a/b)'); +-------------------------------------+ | ExtractValue('', 'count(/a/b)') | +-------------------------------------+ | 1 | +-------------------------------------+ 1 row in set (0.00 sec)
1234
XML Functions
mysql> SELECT ExtractValue('', 'count(/a/b)'); +-------------------------------------+ | ExtractValue('', 'count(/a/b)') | +-------------------------------------+ | 0 | +-------------------------------------+ 1 row in set (0.01 sec)
Important ExtractValue() returns only CDATA, and does not return any tags that might be contained within a matching tag, nor any of their content (see the result returned as val1 in the following example).
mysql> SELECT -> ExtractValue('cccddd', '/a') AS val1, -> ExtractValue('cccddd', '/a/b') AS val2, -> ExtractValue('cccddd', '//b') AS val3, -> ExtractValue('cccddd', '/b') AS val4, -> ExtractValue('cccdddeee', '//b') AS val5; +------+------+------+------+---------+ | val1 | val2 | val3 | val4 | val5 | +------+------+------+------+---------+ | ccc | ddd | ddd | | ddd eee | +------+------+------+------+---------+
This function uses the current SQL collation for making comparisons with contains(), performing the same collation aggregation as other string functions (such as CONCAT()), in taking into account the collation coercibility of their arguments; see Section 10.1.8.4, “Collation Coercibility in Expressions”, for an explanation of the rules governing this behavior. (Previously, binary—that is, case-sensitive—comparison was always used.) NULL is returned if xml_frag contains elements which are not properly nested or closed, and a warning is generated, as shown in this example: mysql> SELECT ExtractValue('cc