University of Hawassa Faculty of Agriculture Department of Animal & Range Sciences NaRM 326: Remote Sensing (RS) & Geographic Information System (GIS) UNIT 2: COMPUTER REPRESENTATIONS OF GEOGRAPHIC INFORMATION:
(RASTER DATA STRUCTURE / RASTER REPRESENTATIONS)
This lecture note gives an introduction to the raster data structure, which is particularly useful for handling continuous geographic fields (continuous data surfaces) but is often used for other types of data as well. In a raster database, the data is stored in cells arranged in a matrix, and this is a very important difference from the vector data structure.
In this example, a piece of land contains 3 classes (objects): lake, town and forest. To convert this landscape to a raster data structure, a grid (matrix) is overlaid on the landscape and each class is given a unique code (identifier), in this case lake=1, town=2 and forest=3. Each cell in the matrix represents a certain area in the real world, depending on the size of the cell.
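The coding step above can be sketched in a few lines of Python. The class codes are the ones from the text; the 4 x 5 grid layout, the function names and the cell size of 10 units are invented for illustration and are not the grid shown on the slide.

```python
# Class codes taken from the example in the text.
LAKE, TOWN, FOREST = 1, 2, 3

# Hypothetical raster with 4 rows and 5 columns (layout invented for illustration).
raster = [
    [3, 3, 3, 2, 2],
    [3, 1, 1, 2, 2],
    [3, 1, 1, 3, 3],
    [3, 3, 3, 3, 3],
]

def count_cells(grid):
    """Count how many cells carry each class code."""
    counts = {}
    for row in grid:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
    return counts

def class_area(grid, cell_size):
    """Area per class code, in squared units of cell_size (square cells assumed)."""
    cell_area = cell_size * cell_size
    return {code: n * cell_area for code, n in count_cells(grid).items()}

print(count_cells(raster))
print(class_area(raster, 10))
```

Because every cell represents the same ground area, the area of a class is simply its cell count multiplied by the area of one cell.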
A raster database is made of columns and rows. The rows are numbered starting from the upper-left corner of the database, unlike in a vector database, where the origin of the coordinate system is at the lower-left corner. Each cell is identified by an index consisting of the column number and the row number. For example, the index “(7, 5)” corresponds to the cell in column 7 and row 5.
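As a small illustration of this indexing, the sketch below looks up a cell by its (column, row) index. The text does not say whether counting starts at 0 or 1, so counting from 1 at the upper-left cell is an assumption here, and the grid contents and the function name `cell_value` are hypothetical.

```python
def cell_value(grid, column, row):
    """Return the value of the cell at (column, row).

    Assumption: columns and rows are numbered from 1, starting at the
    upper-left cell. The grid is a list of rows, top row first.
    """
    if not (1 <= row <= len(grid) and 1 <= column <= len(grid[0])):
        raise IndexError("cell index lies outside the grid")
    return grid[row - 1][column - 1]

# Hypothetical grid with 6 rows and 8 columns; each cell stores 10*row + column
# so that the stored value reveals which cell was read.
grid = [[10 * r + c for c in range(1, 9)] for r in range(1, 7)]

print(cell_value(grid, 7, 5))  # column 7, row 5
```

Note that the index is written column first, while a list of rows in Python is addressed row first, which is why the two numbers swap places inside the function.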
Knowing the index number of a cell is not enough to know where the cell is located geographically (on Earth), since the index only gives the cell's position in the matrix. To geocode the entire grid, reference points for at least two corners of the grid are necessary. To calculate the location of an individual cell, we also need to know the cell resolution (the area the cell covers in reality).
The true location of each cell (coordinates) can be calculated based on the minimum X and Y coordinates of the grid. Most often, the coordinates represent the location of the CENTER of the cell. Check the grid on the previous slide to make sure that you understand how the computation works.
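The computation can be sketched as follows. This is a minimal example under several assumptions that the text does not state explicitly: the minimum X and Y refer to the outer lower-left corner of the grid, cells are square, and columns and rows are numbered from 1 starting at the upper-left cell. The function name and the coordinate values are hypothetical.

```python
def cell_center(column, row, x_min, y_min, cell_size, n_rows):
    """Coordinates of the center of cell (column, row).

    Assumptions: x_min and y_min are the lower-left corner of the whole grid,
    cells are square, and numbering starts at 1 in the upper-left corner.
    """
    # Columns run left to right, so X grows with the column number.
    x = x_min + (column - 0.5) * cell_size
    # Rows run top to bottom, so Y shrinks as the row number grows.
    y = y_min + (n_rows - row + 0.5) * cell_size
    return x, y

# Hypothetical grid: 10 rows, 30 m cells, lower-left corner at (500000, 780000).
print(cell_center(7, 5, 500000.0, 780000.0, 30.0, 10))
```

The half-cell offset (0.5) is what moves the result from the cell corner to the cell center.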
The value of a cell can represent a specific element or object in a landscape, for example 1 = lake, but it can also be an ID number for that cell. The ID number can then be linked to more complex attribute data (for example, text files, tables, video files, pictures...). In most cases, however, the cell value is of the first type: it represents an object in the real world, which means that each theme is stored in a separate raster database.
The cell values can be either numbers or characters, depending on the needs of the user. Numerical data can be stored in different formats, e.g. as byte (8 bit), integer (16 bit) or real (32 bit) data, depending on whether it is important to have decimals (fractions) and/or negative values, and on the size of the numbers to be stored. The database can also be stored in BINARY format (numbers), in ASCII format (numbers + characters + symbols), or as LOGICAL values (true/false results).
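To make the choice of data type concrete, here is a rough sketch that picks the smallest of the three types named above for a list of cell values. It assumes that "integer (16 bit)" means a signed 16-bit integer and that "real (32 bit)" is a 32-bit floating point number; the function names are hypothetical, and real GIS packages offer more types than these three.

```python
import struct

# Format codes for Python's struct module: unsigned 8-bit, signed 16-bit,
# 32-bit float. The mapping to the names used in the text is an assumption.
FORMATS = {"byte": "<B", "integer": "<h", "real": "<f"}

def bytes_per_cell(data_type):
    """Storage needed for one cell value of the given type."""
    return struct.calcsize(FORMATS[data_type])

def smallest_type(values):
    """Pick the smallest type that can hold all values (simplified rule)."""
    if any(float(v) != int(v) for v in values):
        return "real"      # decimals are needed
    lo, hi = min(values), max(values)
    if 0 <= lo and hi <= 255:
        return "byte"      # fits in 8 bits, no negative values
    if -32768 <= lo and hi <= 32767:
        return "integer"   # fits in a signed 16-bit integer
    return "real"          # too large for 16 bits

for sample in ([1, 2, 3], [-5, 300], [12.5, 13.0]):
    name = smallest_type(sample)
    print(sample, name, bytes_per_cell(name), "byte(s) per cell")
```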
When creating a raster database, the first step is to decide the resolution of the grid (the size of the cells). It is very rare that the resolution is different for the X and Y dimensions, although this is theoretically possible. Normally, the cells are squares with equal X and Y resolutions. All cells must be given a value (so if there is nothing to represent in a cell, the value zero can be given, for example). Raster data therefore requires a lot of storage space on your computer, because the raster structure does not allow for “empty” cells (compare with the example on the slide: the zero cells contain no useful information, but they still have to be stored).
One may ask “what happens if more than one geometrical object is found within the same cell?” One solution is to take the class with the dominant area within the cell. In this example, the forest covers a bigger area than the other classes, so the cell could be coded as “forest”. Another method is to code the cell with the class found at the center of the cell. In this example, the cell would then be coded as “lake”. As you can see in the example, the result may differ considerably depending on which coding method is selected. If the creator of a raster database knows which algorithm was used during the rasterization process, he or she should always add this information to the documentation associated with the raster database!
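The two coding rules can be compared with a small sketch. One cell is described here by a 5 x 5 block of sample points, each holding the class found at that point (lake=1, town=2, forest=3); the sampling approach, the block itself and the function names are assumptions made for illustration, not the procedure of any particular GIS package.

```python
from collections import Counter

# Hypothetical samples inside ONE raster cell (1=lake, 2=town, 3=forest).
samples = [
    [3, 3, 3, 3, 3],
    [3, 3, 1, 1, 2],
    [3, 1, 1, 1, 2],
    [3, 3, 1, 2, 2],
    [3, 3, 3, 3, 3],
]

def dominant_class(block):
    """Class covering the largest share of the cell (most frequent sample)."""
    counts = Counter(value for row in block for value in row)
    return counts.most_common(1)[0][0]

def center_class(block):
    """Class found at the center sample of the cell."""
    return block[len(block) // 2][len(block[0]) // 2]

print("dominant-area rule:", dominant_class(samples))
print("center-point rule: ", center_class(samples))
```

In this block forest covers 15 of the 25 samples while the center sample is lake, so the two rules give different codes for the same cell, as described in the text.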
A very important problem with the raster data structure is that it does not permit the user to know anything about what happens inside a cell. The cell is the smallest unit in the database and anything that is smaller than the cell will not show in the database. The following three slides will illustrate this for different object types.
If point data is stored in a raster structure, it is not possible to know exactly where the points were situated within the cells. To increase the precision, the cell size should be reduced (but more cells = more data = more storage space on your computer!). There is always a “tradeoff” between resolution and memory.
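The tradeoff can be illustrated with a quick calculation of how many cells are needed to cover the same area at different cell sizes. The area of 1000 x 1000 units and the function name are hypothetical.

```python
import math

def cell_count(width, height, cell_size):
    """Number of square cells needed to cover a width x height area."""
    n_cols = math.ceil(width / cell_size)
    n_rows = math.ceil(height / cell_size)
    return n_rows * n_cols

# Hypothetical study area of 1000 x 1000 map units.
for size in (100, 50, 25):
    print(f"cell size {size}: {cell_count(1000, 1000, size)} cells")
```

Each time the cell size is halved, the number of cells (and so the uncompressed storage) grows by a factor of four.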
Information about the exact location of linear objects is lost in a similar way when translating to raster data. In this example, both the red and the black line networks will be represented in exactly the same manner in the raster data structure, despite the fact that they are very different from each other in reality.
An important advantage of raster data is that it can represent continuous surfaces (continuous geographic fields) in a very realistic manner. Topography or temperature, for example, occur everywhere and vary gradually over a surface, and are consequently ideal for storing in a raster database.
To summarize the particularities of the raster data structure: the data structure does not allow for empty cells. This in turn causes raster databases to take up a lot of storage space on the computer's hard disk. However, more sophisticated raster software normally uses different types of data compression, similar to what is used for compressing image files, to reduce the storage space.
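One simple compression idea is run-length encoding, where a run of identical neighbouring cell values is stored as a (value, count) pair. The sketch below is only meant to show the principle; it is not the scheme of any specific GIS package, and the example row and function names are hypothetical.

```python
def rle_encode(row):
    """Compress one raster row into (value, run length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # same value as before: extend the run
        else:
            runs.append([value, 1])   # new value: start a new run
    return [(value, length) for value, length in runs]

def rle_decode(runs):
    """Rebuild the original row from (value, run length) pairs."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row

row = [0, 0, 0, 0, 3, 3, 3, 1, 1, 0, 0, 0]
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

The gain is largest where many neighbouring cells share the same value, such as the long runs of zero cells mentioned earlier.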
Factors influencing the size of a raster database are the following. Number of columns and rows, which in turn is determined by the cell size when the database has to cover a given geographical area. Data type, which depends on the kind of numerical data stored: if only integers between 0 and 255 are to be stored, the data can be stored as Byte data, whereas very small and very large values with several decimals require the Real data type. File type, e.g. whether the data file is stored in the very compact binary format or in the more storage-demanding ASCII format. Data compression type will also influence the space required. In most GIS software handling raster data, the user has at least limited control over how data is stored, and as a general rule a database should not be bigger (in terms of memory) than needed. This means that the user must select data types, etc. that are appropriate for the type of data being stored.
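As a rough guide, the uncompressed size of a binary raster is the number of rows times the number of columns times the bytes per cell. The sketch below applies this to a hypothetical 1000 x 1000 grid, using the byte, integer and real sizes given earlier in these notes (1, 2 and 4 bytes); header information and compression are ignored, and the function name is an assumption.

```python
# Bytes per cell for the data types named in the notes (8, 16 and 32 bit).
BYTES_PER_CELL = {"byte": 1, "integer": 2, "real": 4}

def raster_size_bytes(n_rows, n_cols, data_type):
    """Uncompressed binary size of the cell values, ignoring any file header."""
    return n_rows * n_cols * BYTES_PER_CELL[data_type]

# Hypothetical grid of 1000 rows x 1000 columns.
for data_type in ("byte", "integer", "real"):
    size = raster_size_bytes(1000, 1000, data_type)
    print(f"{data_type:8s} {size / 1_000_000:.0f} million bytes")
```

Storing land cover codes such as 1, 2 and 3 as real numbers instead of bytes would make the file four times larger without adding any information.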
Many GIS software programs offer the possibility to save the data as ASCII or binary data. In a computer, the bit is the smallest possible unit of information and can be considered a sort of “switch” that is either ON (1) or OFF (0). Eight (8) bits form a byte. Here is an example of how a number can be stored using one byte. You start at the RIGHT and ADD the values of all positions where the bit is 1. If all bits have the value 1, the byte equals the number 255, while if they are all 0’s, the byte equals 0. In a byte, the bits, from left to right, have the values $2^7, 2^6, 2^5, 2^4, 2^3, 2^2, 2^1, 2^0$ (note: the base 2 is why it is called BINARY!). If data is stored as ASCII, at least one byte is needed for each single digit, e.g. the number 57 on the slide, which can be represented with one byte when stored as binary data, requires two bytes when stored as ASCII data: one for the character “5” and one for the character “7”.
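The byte arithmetic can be checked with a short sketch. The bit pattern 00111001 used for 57 follows from the powers of two above (32 + 16 + 8 + 1); the function name is hypothetical.

```python
def byte_to_int(bits):
    """Add up the powers of two for every position where the bit is '1'.

    bits is a string of eight characters, leftmost bit = 2^7, rightmost = 2^0.
    """
    total = 0
    # Start at the RIGHT: position 0 is worth 2^0, position 7 is worth 2^7.
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total

print(byte_to_int("00111001"))   # 32 + 16 + 8 + 1
print(byte_to_int("11111111"))   # all bits ON
print(byte_to_int("00000000"))   # all bits OFF

# Storage comparison for the number 57.
print(len(bytes([57])), "byte as binary")
print(len("57".encode("ascii")), "bytes as ASCII")
```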
The main points to consider for the raster data structure are these: the way of storing data is quite simple and easy to understand; data handling can be somewhat slow if the cells are small and large areas are to be represented; raster data is very efficient for representing continuous surfaces; and the data format is particularly suitable for combining with remote sensing data, since that type of data is always stored in raster format.
As a summary, discrete data is ideally represented with a vector database. In this example, a person walks from point A to point B and crosses a number of private properties. The borders of the properties are very accurately defined: they are discrete, and the person passes immediately from one property to the next.
For the same route as on the previous slide, but now considering topography instead of properties, there are no discrete borders. On the contrary, the topography varies continuously during the crossing, and consequently this type of data is better represented with raster data. Today, most GIS software programs can handle both vector and raster data models.
(VECTOR DATA STRUCTURE / VECTOR REPRESENTATIONS)