DATA MODELS & MANAGEMENT - I
Outline: Introduction; Raster Data; Vector Data; Raster and Vector Structures; Raster and Vector Advantages and Disadvantages
Introduction Geographic Data and Information are
the heart of GIS.
Two fundamental components of
geographic data: space (expressed as spatial data) and qualities (attributes).
Both of these are stored in a database.
Data and Information Definitions Information is the primary purpose of
GIS, not just data. Data is the input; information is the
output.
Types of data: spatial and non-spatial
Examples shown: maps, schematic diagrams, images, oblique photographs, videography, films, postcodes/ZIP codes (KT1 2EE, RH8 9AA, SW1P 3AD), financial statements (figures such as £12,000, £23,456, £45,987)
Introduction Spatial data in GIS has two primary data
formats: raster and vector. Raster uses a grid cell structure, whereas
vector is more like a drawn map.
Spatial Data: Vector format Vector data are defined spatially:
Point - a pair of x and y coordinates, e.g. (x1,y1)
Line - a sequence of points
Polygon - a closed set of lines
[Figure labels: vertex, node]
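The three vector primitives can be sketched with plain coordinate tuples. This is only an illustration: the type names, the sample coordinates, and the `is_closed` helper are assumptions made for the example, not part of any GIS package.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # a pair of x and y coordinates
Line = List[Point]            # a sequence of points
Polygon = List[Point]         # a closed outline: the first point is repeated last

well: Point = (2.0, 3.0)
road: Line = [(0.0, 0.0), (4.0, 1.0), (6.0, 5.0)]
field: Polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]


def is_closed(outline: List[Point]) -> bool:
    """An outline is closed when it ends at the point where it started."""
    return len(outline) >= 4 and outline[0] == outline[-1]
```

Here `is_closed(field)` is true and `is_closed(road)` is false, which is the only structural difference between a line and a polygon in this representation.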
Raster and Vector Data
Raster data are described by a cell grid, one value per cell.
[Figure: a point, a line, and a polygon drawn in vector format and in raster format, where the polygon becomes a zone of cells]
Raster and Vector Data Vector format has points, lines, polygons that appear
normal, much like a map. Raster format generalizes the scene into a grid of
cells, each with a code to indicate the feature being depicted. The cell is the minimum mapping unit. Raster has generalized reality: all of the features in
the cell area are reduced to a single cell identity.
Raster and Vector Data Models Raster: because the raster cell’s value or code
represents all of the features within the cell, it does not maintain true size, shape, or location for individual features. Even where “nothing” exists (no data), the cells must be coded.
Vector: vectors are data elements describing
position and direction. In GIS, vector is the map-like drawing of features, without the generalizing effect of a raster grid. Therefore, shape is better retained. Vector is much more spatially accurate than the raster format.
Raster Data: Raster Coding; Resolution; Gridding and Linear Features; Raster Precision and Accuracy
Raster Coding In the data entry process, maps can be digitized
or scanned at a selected cell size and each cell assigned a code or value.
The cell size can be adjusted according to the
grid structure or by ground units, also termed resolution.
There are three basic and one advanced scheme
for assigning cell codes.
Raster Coding
Presence/Absence: the most basic method. A feature is recorded if any of it occurs in the cell space.
Cell Center: reads only the center of the cell and assigns the code accordingly. Not good for points or lines.
Dominant Area: assigns the cell code to the feature with the largest (dominant) share of the cell. This is suitable primarily for polygons.
Percent Coverage: a more advanced method. Each feature is separated into its own theme, and each cell is assigned a value showing that feature's percent cover.
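The four coding schemes can give different answers for the same cell. The sketch below assumes one raster cell sampled on a hypothetical 5 x 5 sub-grid; the feature codes ("F" forest, "W" water, "." nothing mapped) and the function names are invented for the illustration.

```python
from collections import Counter

# One raster cell, sampled on an assumed 5 x 5 sub-grid.
cell = [
    ["F", "F", "F", "F", "."],
    ["F", "F", "F", ".", "."],
    ["F", "F", "W", "W", "."],
    ["F", "W", "W", ".", "."],
    ["F", ".", ".", ".", "."],
]
samples = [code for row in cell for code in row]


def presence_absence(feature):
    """1 if any part of the feature occurs in the cell, else 0."""
    return 1 if feature in samples else 0


def cell_center():
    """Code found at the center sample of the cell."""
    mid = len(cell) // 2
    return cell[mid][mid]


def dominant_area():
    """Code with the largest share of the cell."""
    return Counter(samples).most_common(1)[0][0]


def percent_coverage(feature):
    """Percent of the cell covered by one feature (one theme per feature)."""
    return 100.0 * samples.count(feature) / len(samples)
```

With this cell, the center sample is water while the dominant area is forest, so the choice of scheme changes the stored code.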
Raster Coding Problems Raster coding produces spatial
inaccuracies.
Raster Coding Problems One possible solution is to increase the
resolution by increasing the number of cells, making each one smaller and therefore more sensitive to accurate classification.
Raster Mapping A major problem with the raster structure is that
the shape of features is forced into an artificial grid cell format.
For right-angled features, such as square
agricultural fields or rectangular political districts, this may not present a major problem. However, for many features, size and shape can become undesirably distorted.
Resolution Increasing the number of cells in a data set
increases spatial resolution, which helps to increase spatial accuracy. One advantage of using relatively few cells
is the short processing time and ease of analysis.
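The trade-off between resolution and processing load follows from simple arithmetic: halving the cell size quadruples the number of cells. A minimal sketch, assuming a rectangular study area measured in metres (the function name and the 10 km x 10 km extent are illustrative):

```python
import math


def cell_count(width_m, height_m, cell_size_m):
    """Number of square cells needed to cover a rectangular area."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows * cols


# A 10 km x 10 km area at three cell sizes.
coarse = cell_count(10_000, 10_000, 100)
medium = cell_count(10_000, 10_000, 50)
fine = cell_count(10_000, 10_000, 10)
```

Going from 100 m cells to 10 m cells multiplies the cell count by 100, which is why finer grids cost more storage and processing time.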
Gridding and Linear Features Low-resolution raster results in a rather
generalized and crude shape. High-resolution raster shape appears more
realistic, though still a long way from the vector shape and spatial accuracy.
Raster Precision and Accuracy Questions of raster data precision (the exact location) and
accuracy (maximum spatial truth) are often a problem.
Because the raster cell is the maximum resolution and the
minimum mapping unit, there is no way to know exactly where a small feature occurs within it.
Smaller cells have less spatial error because the area of doubt is
smaller.
Uncertainty becomes greater when measuring across cells. Area measurements are also generalized.
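The effect of cell size on spatial error can be put in numbers. The sketch below takes the area of doubt to be the full area of one square cell (an assumption made here), and uses the geometric fact that a feature inside a cell can lie at most half a diagonal from the cell center. Function names are illustrative.

```python
import math


def area_of_doubt(cell_size):
    """Area within which a small feature may lie: one whole cell."""
    return cell_size ** 2


def max_offset_from_center(cell_size):
    """Largest possible distance between a feature and its cell center."""
    return cell_size * math.sqrt(2) / 2


doubt_30m = area_of_doubt(30)             # square metres for a 30 m cell
offset_30m = max_offset_from_center(30)   # metres for a 30 m cell
offset_10m = max_offset_from_center(10)   # metres for a 10 m cell
```

Shrinking the cell from 30 m to 10 m cuts the worst-case offset to one third and the area of doubt to one ninth.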
Vector Data Vector features appear more realistic than
raster features and have better spatial accuracy. Vector features are defined primarily by their
shapes, more specifically by the outline of their shapes. In GIS, the vector system is a coordinate-based data structure.
Vector Data Shape points are the ends and bends that define the feature’s
outline.
At the beginning and end of every line or polygon feature is a
node.
At each bend (change of direction) is a vertex (plural: vertices). Nodes are end points and vertices lie between them, defining the
shape.
Point features are standalone nodes.
Vector Data Chains connect the shape points to draw the feature’s outline.
Chains are vectors or data structure paths that are not part of the actual stored data elements; they are not real lines, but define and present the connection between shape points.
Vector system data files store only the coordinates of each node
and vertex; the hardware draws the connecting chain segments. The chain is a virtual component.
The vector data structure is also known as an arc-node model
because it uses chains (arcs) and end points (nodes).
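The idea that only shape points are stored, and chains are derived when needed, can be sketched as follows. The coordinates and variable names are assumptions for illustration.

```python
# Stored data: only the coordinates of the shape points, in order.
arc = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0), (8.0, 1.0)]

# Nodes are the end points; vertices are the bends in between.
from_node, to_node = arc[0], arc[-1]
vertices = arc[1:-1]

# Chain segments are not stored. They are derived by pairing each
# shape point with the next one whenever the feature is drawn.
segments = list(zip(arc[:-1], arc[1:]))
```

Four stored coordinate pairs yield three chain segments, none of which takes up space in the data file.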
Raster and Vector Structures Raster and vector structures have different
methods of storing and displaying spatial data. Raster cells are stored and displayed as
cells, but in the vector format only the nodes and vertices are stored. This results in considerable data storage differences.
Raster and Vector Structures A point in a raster system is a single cell, but in a vector
system it is only a node represented by a symbol with its coordinate position noted.
A simple line in a raster system consists of a sequence of
cells. In a vector system, a simple line consists of two nodes and a chain that connects them.
A more complex raster line consists of connected cells,
sometimes in stair-step fashion when they are diagonal. Complex lines in the vector format have vertices to mark changes in direction, with nodes at each end.
Raster and Vector Structures Raster polygons are filled with cells. For
single polygons, the vector format usually has a single node and several vertices to mark the boundary direction changes.
Connected polygons are simply two blocks
of cells in the raster format, but in vector they share a common border and some common nodes.
Raster to Vector Conversion There are at least four basic reasons to convert from
raster to vector: (1) vector features have a better visual appearance; (2) some plotters work only with vector data; (3) comparison with vector data is best when both data files have identical formats; (4) some GIS systems have vectors as the central operating data structure. The reverse conversion, rasterization of vector data, is often called gridding.
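Gridding a vector line can be sketched by sampling points along the line and recording the cell each sample falls in. This is a rough illustration under stated assumptions, not an exact line-traversal algorithm: the function name, the (row, col) convention, and the sampling density are all invented for the example.

```python
def rasterize_line(p0, p1, cell_size, samples_per_cell=10):
    """Return the (row, col) cells touched by samples along a straight line."""
    (x0, y0), (x1, y1) = p0, p1
    span = max(abs(x1 - x0), abs(y1 - y0))
    steps = max(1, int(span / cell_size * samples_per_cell))
    cells = []
    for i in range(steps + 1):
        t = i / steps
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        cell = (int(y // cell_size), int(x // cell_size))
        if cell not in cells:  # keep each cell once, in order of travel
            cells.append(cell)
    return cells


horizontal = rasterize_line((0.5, 0.5), (3.5, 0.5), 1.0)
diagonal = rasterize_line((0.5, 0.5), (2.5, 2.5), 1.0)
```

A horizontal line becomes a straight run of cells, while a diagonal line becomes the stair-step sequence of cells described for raster lines.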
Raster Advantages
• A relatively simple data structure.
• The simple grid structure makes analysis easier.
• The computer platform can be “low tech” and inexpensive.
• Remote sensing imagery is typically obtained in raster format.
• Modeling is the creation of a generalized data file or a set of universal procedures to accomplish a certain GIS task.
Raster Disadvantages
• Spatial inaccuracies.
• Because each cell tends to generalize a landscape, the result is relatively low resolution compared to the vector format.
• Because of spatial inaccuracies caused by data generalization, a raster format cannot tell precisely what exists at a given location.
• Each cell must have a code, even where nothing exists.
Vector Advantages
• In general, vector data is more map-like.
• Vector data is very high resolution.
• The high resolution supports high spatial accuracy.
• Vector formats have storage advantages.
• The general public usually understands what is shown on vector maps.
• Vector data can be topological.
Vector Disadvantages
• May be more difficult to manage than raster formats.
• Requires more powerful, high-tech machines.
• The use of better computers, increased management needs, and other considerations often make the vector format more expensive.
• Learning the technical aspects of vector systems is more difficult than understanding the simplicity of the raster format, particularly when topology is introduced.
GIS Data Characteristics Location, or position, is a major starting point of
spatial measurement. Location can be descriptive, or it can use a “Lat-Lon” system.
Size characteristics: Polygon: area and perimeter;
Lines: length.
Shape: an important descriptive element used in
map and image interpretation. The shape of a feature often indicates its identity and role on the landscape.
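The size characteristics named above (polygon area and perimeter, line length) can be computed directly from stored coordinates. A minimal sketch, assuming planar coordinates and a closed polygon outline whose first point is repeated last; the area uses the shoelace formula, and the function names and sample data are illustrative.

```python
import math


def path_length(points):
    """Sum of straight-line distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def polygon_area(outline):
    """Shoelace formula; the outline must repeat its first point at the end."""
    twice_area = sum(
        x0 * y1 - x1 * y0
        for (x0, y0), (x1, y1) in zip(outline, outline[1:])
    )
    return abs(twice_area) / 2


field = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]
road = [(0.0, 0.0), (3.0, 4.0)]

area = polygon_area(field)        # 4 x 3 rectangle
perimeter = path_length(field)    # length of the closed outline
length = path_length(road)        # a single 3-4-5 segment
```

The perimeter is simply the length of the closed outline, so one helper serves both lines and polygons.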
GIS Data Characteristics Point features have no real shape or spatial
dimension, only the position of objects or occurrences. They are represented by symbols, such as dots, geometric shapes, or icons.
A line feature has length from beginning to end. Polygon features have a wide variety of shapes,
from easily interpreted circles and squares to complicated shapes that defy description.
Spatial Data Relationships Spatial relationships are how features
relate to each other in space. They include distance, distribution,
density, and pattern.
Spatial Data Relationships Distance from one feature to another is an
elementary but important relationship. It is available through simple measurement. Distribution is the collective location of features;
the geographic dispersal or range. There are two basic ways of perceiving distribution: features among themselves and their spatial relationship with other features.
Spatial Data Relationships Density is the number of items per unit
area; how close features are to each other. Pattern is the consistent arrangement of
features, similar to (and can include) distribution and density.
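Distance and density reduce to short calculations on point coordinates. The sketch below assumes planar coordinates and a 10 x 10 study area; the sample points and function names are invented. The mean nearest-neighbour distance is included only as one simple, commonly used indicator of how clustered a pattern is, not as a method named in these slides.

```python
import math

points = [(1.0, 1.0), (2.0, 1.0), (5.0, 5.0), (9.0, 2.0)]
study_area = 10.0 * 10.0  # assumed extent, in square map units


def density(n_features, area):
    """Number of items per unit area."""
    return n_features / area


def nearest_neighbour_distances(pts):
    """For each point, the distance to the closest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
        for i, p in enumerate(pts)
    ]


d = math.dist(points[0], points[2])        # distance between two features
dens = density(len(points), study_area)    # features per unit area
nn = nearest_neighbour_distances(points)
mean_nn = sum(nn) / len(nn)
```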
The Data Model Geographical variation in the real world is
infinitely complex. Therefore, we require a set of rules (‘the data model’) to convert real geographical variation into discrete objects.
‘A set of guidelines for the representation of
the logical organisation of the data in a database … (consisting) of named logical units of data and the relationships between them.’
The GIS Model: example
[Figure: three layers (roads, hydrology, topography), each drawn on latitude and longitude axes]
Here we have three layers or themes: roads, hydrology (water), topography (land elevation). They can be related because precise geographic coordinates are recorded for each theme.
Layers may be represented in two ways: • in vector format as points and lines • in raster (or image) format as pixels
All geographic data has 4 properties: projection, scale, accuracy and resolution
Layers are comprised of two data types: • Spatial data, which describes location (where) • Attribute data, specifying what, how much, when
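One way to picture how layers relate through shared coordinates is to store each theme as a grid on the same cell indices. The sketch below is illustrative only: the 3 x 3 grids, their values, and the `query` helper are invented for the example.

```python
# Three themes on the same grid: the cell position is the spatial data,
# the stored value is the attribute data.
layers = {
    "roads":      [[0, 1, 0],
                   [0, 1, 0],
                   [1, 1, 0]],          # 1 = road present
    "hydrology":  [[0, 0, 1],
                   [0, 1, 1],
                   [0, 0, 0]],          # 1 = water present
    "topography": [[120, 118, 110],
                   [125, 119, 111],
                   [130, 122, 115]],    # elevation in metres
}


def query(row, col):
    """Look up every theme at one location."""
    return {name: grid[row][col] for name, grid in layers.items()}


crossing = query(1, 1)
```

Because all three grids share the same coordinates, a single lookup at row 1, column 1 shows a road and water at the same place, along with its elevation.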
Types of data model The Raster Model Equivalent of a continuous grid covering the surface, whereby each cell in the grid represents a square on the ground.
The Vector Model Attempts to represent objects as exactly and precisely as possible by storing points, lines (arcs) and polygons (areas) in a continuous co-ordinate space
Raster-Vector Data Model
[Figure: the real world represented in raster form and in vector form]
Representing Data with Raster and Vector Models Raster Model
• The area is covered by a grid with (usually) equal-sized, square cells.
• Attributes are recorded by assigning each cell a single value based on the majority feature (attribute) in the cell, such as land use type.
• Image data is a special case of raster data in which the “attribute” is a reflectance value from the electromagnetic spectrum.
• Cells in image data are often called pixels (picture elements).
Representing Data with Raster and Vector Models Vector Model
The fundamental concept of vector GIS is that all geographic features in the real world can be represented as one of:
• points or dots (nodes): trees, poles, fire plugs, airports, cities
• lines (arcs): streams, streets, sewers
• areas (polygons): land parcels, cities, counties, forest, rock type
Vector and Raster Models in GIS: Representation of Lines
[Figure: the same line shown in raster form and in vector form]
TOPOLOGY (for vector data)
• What is topology? Why is it important?
• Three types of topological models in GIS
• Spatial operations of topology: contiguity, connectivity
• Trade-offs of topological structure
• Application model: Triangular Irregular Network (TIN), vector-based GIS
Spatial features and spatial relationships
Spatial features in maps: points, lines and polygons.
Human beings interpret additional information from maps about the spatial relationships between features:
• a route traced from an airport to a house
• land contiguity: parcels adjacent to the streets along which they are located
The definition of Topology
The spatial relationships that can be interpreted from a map include:
• identification of connecting lines along a path
• definition of the areas enclosed within these lines
• identification of contiguous areas
In digital maps, these relationships are depicted using ‘Topology’.
Topology = a mathematical procedure for explicitly defining spatial relationships.
Topology is the description of how spatial objects are related to one another in space.
Topological data models Three types of topological concepts: arc, node and polygon topologies
• Arc: arcs have directions and left and right polygons (= contiguity)
• Node: nodes link arcs with start and end nodes (= connectivity)
• Polygon: arcs that connect to surround an area define a polygon (= area definition)
Terms and concepts: Connectivity - from and to nodes; Contiguity - Polygon Enclosure; Adjacency
[Figure: an arc with its Direction, From Node, To Node, Left Polygon, and Right Polygon labelled]
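The arc, node and polygon topologies can be held in one small table that records, for each arc, its from node, to node, left polygon and right polygon. The sketch below assumes a square split into a west polygon A and an east polygon B by a shared border running from node n1 (bottom) to node n2 (top); all identifiers and helper names are invented for the illustration.

```python
# arc id -> (from_node, to_node, left_polygon, right_polygon)
arcs = {
    "a1": ("n1", "n2", "OUTSIDE", "A"),   # outer boundary of A, traced west
    "a2": ("n1", "n2", "B", "OUTSIDE"),   # outer boundary of B, traced east
    "a3": ("n1", "n2", "A", "B"),         # shared border, traced south to north
}


def adjacent_polygons(table):
    """Contiguity: pairs of polygons on either side of some arc."""
    return {
        frozenset((left, right))
        for _, _, left, right in table.values()
        if left != right
    }


def arcs_at_node(table, node):
    """Connectivity: arcs that start or end at a node."""
    return sorted(a for a, (f, t, _, _) in table.items() if node in (f, t))


def polygon_boundary(table, polygon):
    """Area definition: arcs that surround a polygon."""
    return sorted(a for a, (_, _, l, r) in table.items() if polygon in (l, r))
```

No coordinates are consulted here: adjacency, connection and enclosure are all answered from the table alone, which is the point of storing topology explicitly.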
Spatial operations of topology Connectivity and contiguity (Aronoff, 1989)
Basic but core spatial analysis operations in GIS.
Contiguity
• A biologist might be interested in the habitats that occur next to each other.
• A city planner might be interested in zoning conflicts, such as industrial zones bordering recreation areas.
Connectivity
• Transportation networks, telecommunication systems, river systems
• To find optimum routings, the most efficient delivery routes, or the fastest travel route
• To predict loading at critical points in a river channel
• To estimate the water flow at a bridge crossing that will result from a heavy flood
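Finding the fastest route over a connected network is a typical connectivity operation. The sketch below uses Dijkstra's algorithm, chosen here as a standard shortest-path method rather than one named in these slides. The network, its node names and its travel costs are hypothetical.

```python
import heapq

# Hypothetical road network: node -> {neighbour: travel cost}
network = {
    "airport":  {"junction": 4.0, "bridge": 7.0},
    "junction": {"airport": 4.0, "bridge": 2.0, "house": 6.0},
    "bridge":   {"airport": 7.0, "junction": 2.0, "house": 3.0},
    "house":    {"junction": 6.0, "bridge": 3.0},
}


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total cost, list of nodes on the route)."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph[node].items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []  # goal not reachable from start


cost, route = shortest_route(network, "airport", "house")
```

The search only works because the network records which arcs meet at which nodes; without that connectivity there is nothing to traverse.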
Trade-offs of topology
Advantages
• Spatial data is stored more efficiently.
• Analysis is faster and more efficient for large data sets.
• Topological relationships let us perform spatial analysis functions: modelling flow through the connection of lines in a network (i.e. buffering); combining adjacent polygons with similar characteristics (i.e. spatial merge); overlaying geographical features (i.e. spatial overlay).
Disadvantages
• Creating the topological structure imposes extra cost and time.
• Topology must be updated whenever a new map is added or an existing map is updated.
• Additional batch jobs are needed.
To avoid the extra effort, GIS systems need to run a batch job (i.e. a process that can be run without user interaction); 70% of total GIS costs
• Autoexec.bat in DOS
• Macro languages such as AML (Arc/Info), Avenue (ArcView), MapBasic (MapInfo), etc.
Conclusions of topology When topology is created, for each spatial feature we can:
• know its position
• know what is around it
• understand its geographical characteristics by virtue of recognising its surroundings
• know how to get from A to B
Thank You