Generalized Inverted Index An Inverted Index Is An Index

  • Uploaded by: Nikolay Samokhvalov
  • 0
  • 0
  • May 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Generalized Inverted Index An Inverted Index Is An Index as PDF for free.

More details

  • Words: 739
  • Pages: 14
Generalized Inverted Index An inverted index is an index structure storing a set of (key, posting list) pairs, where 'posting list' is a set of documents in which the key occurs. ●

Generalized means that the index does not know which operation it accelerates. It works with custom strategies, defined for specific data types. GIN is similar to GiST and differs from B-Tree indices, which have predefined, comparison-based operations. ●

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

GIN Structure

Entry page, level N: keywords abc

bar

foo

Entry page, level 0 (leaf) aaa Pointer to posting tree: B-Tree over ItemPointer to heap

abc Posting list: sorted array of ItemPointer to heap

Entry page, level 0 baa

Posting page, level N: ItemPointer 14:17

218:1

Posting page, level 0 (leaf) 1:33

2:7

14:17

bar

Right link

1021:6

Posting page, level 0 (leaf) Right bound 14:17

Oleg Bartunov, Teodor Sigaev

123:1

158:18

Right bound 218:1

PostgreSQL Summit, Toronto, July 8-9, 2006

GIN features



Concurrency –



WAL –



Lehman and Yao's high-concurrency B-tree management algorithm Recovery

User-defined opclasses –

The scheme is similar to GiST

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

GIN Interface

Four interface functions (pseudocode): ●

● ●



Datum* extractValue(Datum inputValue, uint32* nentries) int compareEntry(Datum a, Datum b) Datum* extractQuery(Datum query, uint32* nentries, StrategyNumber n) bool consistent(bool check[], StrategyNumber n, Datum query)

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

GIN Interface: extractValue

Datum* extractValue(Datum inputValue, uint32* nentries) Returns an array of Datum of entries of the value to be indexed. nentries should contain the number of returned entries. Tsearch2 example: inputValue is tsvector, output is array of text type, containing lexemes. Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

GIN Interface: compareEntry

int compareEntry(Datum a, Datum b) Compares two entries (not the indexing values), returns <0, 0, >0 Tsearch2 example: built-in bttextcmp(), used for built-in B-Tree index over texts.

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

GIN Interface: extractQuery Datum* extractQuery(Datum query, uint32* nentries, StrategyNumber n) Returns an array of Datum of entries of the query to be executed. n is the strategy number of the operation. Depending on n, query can be different type. Tsearch2 example: query is tsquery, output is array of text type, containing lexemes. Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

GIN Interface: consistent bool consistent(bool check[], StrategyNumber n, Datum query) Each element of the check array is true if the indexed value has a corresponding entry in the query: if (check[i] = TRUE) then the i-th entry of the query is present in the indexed value. The function should return true if the indexed value matches by StrategyNumber and the query. Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

GIN Interface: consistent

Tsquery: fat & ( cat | dog ) Posting lists (logically) of ItemPointers 1:7 1:15

1:15 15:7

33:11

bool check[]

Consistent function

T,F,F

T&(F|F) = F

T,F,T

T&(F|T) = T

F,T,F

F&(T|F) = F

33:11

33:11

T,T,T

T&(T|T) = T

34:1

34:1

F,T,T

F&(T|T) = F

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

GIN: create index flow Tuple to index Value (tsvector): cat:2 fat:1 ItemPointer: 17:9

extractValue()

maintenance_work_mem cache: Sorted array of entries

Arrays of ItemPointers

cat

12:3, 14:5, 15:1, 17:9

foo

1:1, 2:5, 15:3

fat

2:3, 17:9

Oleg Bartunov, Teodor Sigaev

HDD

PostgreSQL Summit, Toronto, July 8-9, 2006

Gin opclasses



Built-in support for any one-dimensional array 

&& - overlap



@ - contains



~ - contained



Tsearch2



Intarray – enhanced support for int4[]

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

GIN tips





GUC variable: gin_fuzzy_search_limit - soft upper limit on the returned results for very frequent words Create is much faster than inserts

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

GIN limitations

● ●



No support for multicolumn indices GIN doesn't uses scan->kill_prior_tuple & scan->ignore_killed_tuples GIN searches entries only by equality matching



GIN doesn't support full scans of index



GIN doesn't index NULL values

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

???





Two kinds of NULL 

(NULL = NULL) is NULL



('{NULL}'::int[]='{NULL}') is TRUE

Multidimensional arrays: &&, @, ~ ? 



'{{1,2},{3,4}}' @ '{2,3}' - ?

Recent fillfactor patch – nested B-Tree

Oleg Bartunov, Teodor Sigaev

PostgreSQL Summit, Toronto, July 8-9, 2006

Related Documents

Index
November 2019 41
Index
October 2019 59
Index
October 2019 45
Index
December 2019 39

More Documents from ""

October 2019 15
October 2019 27
October 2019 22