Lecture 5 & 6
Space is defined as a relation on a set of objects. In GIS, a metric space refers to a set of locations with coordinates (Xi, Yi), where the coordinates define the position of geographic objects. More formally, a metric space is a set on which a notion of distance (called a metric) between elements of the set is defined. The metric space that most closely corresponds to our intuitive understanding of space is three-dimensional Euclidean space. Euclidean geometry is the study of relationships among distances and angles, first in a plane and then in space; these relationships are known as two- and three-dimensional Euclidean geometry. An n-dimensional space with notions of distance and angle that obey the Euclidean relationships is called an n-dimensional Euclidean space.
2D and 3D Cartesian coordinate systems provide the mechanism for describing the geographic location and shape of features using x and y values (or columns and rows in rasters).
Two axes: •One horizontal (x), representing east-west. •One vertical (y), representing north-south. Origin: the point at which the axes intersect (0,0).
[Figure: 2D coordinate systems, with the axes labelled N, S, E and W]
Locations of geographic objects are defined relative to the origin, using the notation (x,y), where x is the distance along the horizontal axis and y is the distance along the vertical axis. For example, (4, 3) records a point that is 4 units over in x and 3 units up in y from the origin.
In addition to (x,y), a z value can be used to record elevation above or below a reference surface such as mean sea level, giving (x,y,z).
(2,3,4) indicates a point 2 units in x and 3 units in y from (0,0), with an elevation of 4 units above the reference surface (such as 4 meters above mean sea level).
[Figure: 3D coordinate systems]
Euclidean space is more than just a real coordinate space: the distances between points and the angles between lines or vectors can be measured. The Euclidean distance between two n-dimensional points P (P1, P2, ..., Pn) and Q (Q1, Q2, ..., Qn) is:
$$d(P, Q) = \sqrt{\sum_{i=1}^{n} (Q_i - P_i)^2}$$
This distance function is called the Euclidean metric. It can be viewed as a form of the Pythagorean theorem. Real coordinate space together with this Euclidean structure is called Euclidean space and is often denoted E^n. Euclidean space is a metric space; it is also a topological space with the natural topology induced by the metric.
One-dimensional distance: for two 1D points P (Px) and Q (Qx), the distance is computed as:
$$d(P, Q) = \sqrt{(Q_x - P_x)^2} = |Q_x - P_x|$$
Two-dimensional distance: for two 2D points P (Px, Py) and Q (Qx, Qy), the distance is computed as:
$$d(P, Q) = \sqrt{(Q_x - P_x)^2 + (Q_y - P_y)^2}$$
Three-dimensional distance: for two 3D points P (Px, Py, Pz) and Q (Qx, Qy, Qz), the distance is computed as:
$$d(P, Q) = \sqrt{(Q_x - P_x)^2 + (Q_y - P_y)^2 + (Q_z - P_z)^2}$$
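The one-, two- and three-dimensional cases are all instances of the same n-dimensional calculation. The following is a minimal Python sketch of that calculation; the function name `euclidean_distance` and the sample points are illustrative assumptions, not part of the lecture.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points given as equal-length coordinate sequences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


# Illustrative points (assumed values).
d1 = euclidean_distance((2,), (7,))            # 1D
d2 = euclidean_distance((0, 0), (4, 3))        # 2D
d3 = euclidean_distance((0, 0, 0), (2, 3, 6))  # 3D
print(d1, d2, d3)
```

The same function covers every dimension because the only thing that changes is the number of coordinate pairs being summed.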
Geographically referenced data refers to data referenced by location on Earth (e.g., latitude/longitude, northing/easting) in some standard format.
Geographic information contains: • Either an explicit geographic reference (latitude and longitude or national grid co-ordinate), •Or an implicit reference (an address, postal code, census tract name, forest stand identifier, or road name).
[Figure: Geographic data (geographically referenced data, identified according to location) divides into spatial data (raster and vector, shown on the map) and non-spatial data (held in the database)]
Introduction to Spatial Data
•Spatial data represents spatial information in 2, 3 or 4 dimensions.
•Geographic information is a subset of spatial information.
•When objects are rendered on a map, the data that indicates their Earth location (latitude and longitude, or height and depth) is the spatial data.
•When the map is rendered, this spatial data is used to project the locations of the objects onto a two-dimensional piece of paper.
•A GIS is often used to store, retrieve, and render this Earth-relative spatial data.
All spatial features are recorded as geographic primitives with several primary characteristics.
Points (0-D, no length or width) are represented as a single "dot" on the map.
•Points are used to indicate discrete locations.
•They have no length or area at the given scale, only position in space.
•They usually have a single X, Y coordinate.
•They are used to represent a feature that is too small to be displayed as a line or area.
Lines (1-D, length, no width) are ordered sets of points that represent a straight line or a curved arc, depending upon the feature described. Besides having a position in space, they also have a length.
•They are accompanied by a set of coordinates.
•They are used to represent a geographical feature that is too narrow to have area, such as a stream or a road.
Polygons/areas (2-D, length and width / area and perimeter) are closed features whose boundary encloses a homogeneous area. They have not only a position in space and a length but also a width.
•They have an area that is enclosed by the arcs/lines that make up the boundary.
•They are used to represent features that have area (e.g. lakes, large cities and islands).
Surfaces (3-D, areas with a Z dimension) represent continuous values. They represent spatial objects with not only a position in space, a length and a width, but also a depth or height (in other words, they have a volume).
There are 2 basic spatial data types representing the real world: raster and vector.
Raster Data
•Points: single cells with unique/known values;
•Lines: strings of cells with common values;
•Polygons/areas: groups of cells with common values;
•Surfaces: cells represent real or virtual elevations.
Vector Data
•Points •Lines •Polygons •TINs
Raster: a matrix of cells (pixels) referenced by row/column, stored as a matrix or array.
•For geo-referenced rasters, every cell represents a given area on the ground (the resolution). The smaller the area each cell represents, the larger the data set for a given area.
•Raster cell values represent nominal, ordinal, or continuous data. Numbers in cells can be integer or floating point.
•In a raster, the cell values are themselves the attributes: the attributes are the data set.
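The relationship between cell size and data set size, and between a row/column index and a ground position, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: square cells, a raster origin at the upper-left corner, and hypothetical function names and coordinate values.

```python
import math


def cell_count(width_m, height_m, cell_size_m):
    """Number of cells needed to cover an extent at a given (square) cell size."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows * cols


def cell_centre(row, col, x_min, y_max, cell_size):
    """Ground coordinates of a cell centre; row 0 is assumed to be the top (northern) row."""
    x = x_min + (col + 0.5) * cell_size
    y = y_max - (row + 0.5) * cell_size
    return x, y


# A 1 km x 1 km area: a tenfold finer resolution needs a hundred times as many cells.
coarse = cell_count(1000, 1000, 100)
fine = cell_count(1000, 1000, 10)
print(coarse, fine)

# Centre of the upper-left cell of a 30 m raster (assumed origin values).
centre = cell_centre(0, 0, 500000.0, 4000000.0, 30.0)
print(centre)
```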
In the ArcGIS grid data model, data tables can store additional information about nominal/categorical data in the Value Attribute Table (VAT). VATs store information about the categories, not about individual cells:

| Value | Count | Name | Suitability | Type |
|---|---|---|---|---|
| 2 | 30672 | Cropland & Pasture | 4 | Agriculture |
| 3 | 3339 | Urban & Industrial | 5 | Urban |
| 10 | 212 | Clearings & bush fields | 5 | Cleared |
| 21 | 1383 | Cottonwood | 4 | Riparian |
| 463 | 142 | Ash Cottonwood | 3 | Woodland |
| 476 | 7205 | Oak | 3 | Forest |
| 585 | 1112 | Mixed evergreen broadleaf | 2 | Forest |
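To illustrate why a VAT has one record per category rather than one per cell, the sketch below counts the cells of each value in a tiny raster. The 3 x 3 grid is made up for the example, and `build_vat` is a hypothetical helper, not an ArcGIS function; only the category names are taken from the table above.

```python
from collections import Counter

# A tiny categorical raster (assumed values, for illustration only).
grid = [
    [2, 2, 3],
    [2, 476, 476],
    [21, 476, 3],
]

# Category names taken from the VAT in the notes.
names = {2: "Cropland & Pasture", 3: "Urban & Industrial", 21: "Cottonwood", 476: "Oak"}


def build_vat(grid, names):
    """Return one record per distinct cell value, with the number of cells holding it."""
    counts = Counter(value for row in grid for value in row)
    return [
        {"Value": value, "Count": counts[value], "Name": names.get(value, "")}
        for value in sorted(counts)
    ]


vat = build_vat(grid, names)
for record in vat:
    print(record)
```

Nine cells collapse into four records, which is the sense in which the table describes categories and not individual cells.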
Raster data are good at:
•Representing continuous data (e.g., slope, elevation, chemical concentrations).
•Representing multiple feature types (e.g., points, lines, and polygons) as a single feature type (cells).
•Rapid computations ("map algebra") in which raster layers are treated as elements in mathematical expressions.
•Analysis of multilayer or multivariate data (e.g., satellite image processing and analysis).
•Hogging disk space.
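As a rough illustration of map algebra, the sketch below treats two small raster layers as operands of a cell-by-cell expression. It uses plain Python lists to stay self-contained; the layer values, the helper name `local_op` and the formula are assumptions made for the example, and the layers are assumed to share the same rows and columns.

```python
def local_op(func, *layers):
    """Apply func cell by cell to layers that share the same rows and columns."""
    return [
        [func(*cells) for cells in zip(*rows)]
        for rows in zip(*layers)
    ]


# Two aligned 2 x 2 layers (assumed values).
elevation = [[100, 120], [140, 160]]
rainfall = [[10, 20], [30, 40]]

# A made-up expression combining both layers.
index = local_op(lambda e, r: e / 10 + r, elevation, rainfall)

# A single-layer expression: flag cells above 130 units of elevation.
high_ground = local_op(lambda e: 1 if e > 130 else 0, elevation)

print(index)
print(high_ground)
```

Because every layer uses the same cell matrix, no coordinate matching is needed: cells are paired purely by their row and column position.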
Vector is a data structure used to store spatial data as discrete Cartesian x,y coordinates.
•Sizes of lines or areas vary, as they trace surface phenomena.
•Data are stored as pairs of x,y coordinates, usually with ID numbers; attribute data are typically stored in separate data tables.
•In ArcGIS, except in polygon coverages, the data tables contain exactly as many records as there are unique features in the data set.
•Points: id (x, y);
•Lines: id (x1,y1, ... xn,yn);
•Polygons: id (x1,y1, ... xn,yn), where xn=x1, yn=y1 (closed);
•Surfaces: represented by Triangulated Irregular Networks (TINs).
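The id-plus-coordinates layout described here maps directly onto simple data structures. The sketch below is an illustration only: the feature ids and coordinates are invented, and the helper functions are hypothetical names. It shows a line's length computed from its ordered vertices and a polygon's area computed from its closed ring with the shoelace formula.

```python
import math

# (id, coordinates) records following the layout in the notes; values are assumed.
point = (1, (4.0, 3.0))
line = (7, [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)])
polygon = (12, [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)])


def line_length(vertices):
    """Sum of the straight segment lengths between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def is_closed(vertices):
    """A polygon ring is closed when its last vertex repeats its first."""
    return len(vertices) >= 4 and vertices[0] == vertices[-1]


def polygon_area(vertices):
    """Area of a closed ring using the shoelace formula."""
    if not is_closed(vertices):
        raise ValueError("polygon ring must be closed (xn=x1, yn=y1)")
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(vertices, vertices[1:])
    )
    return abs(twice_area) / 2


length = line_length(line[1])
area = polygon_area(polygon[1])
print(length, area)
```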
A vector-based GIS is defined by the vector representation of its geographic data. In this data model, geographic objects are explicitly represented (the spatial component) and the thematic aspects are associated with those spatial characteristics (the thematic component). Vector systems are therefore composed of two components: •One that manages spatial data. •One that manages thematic data. This is known as the hybrid organization system, because it links a relational database for the attributes with a topological one for the spatial data. A key element in this kind of system is the identifier of every object. This identifier is unique to each object and allows the system to connect the two databases.
Vector data are good at: •Accurately representing true shape and size •Representing non-continuous data (e.g., rivers, political boundaries, road lines, mountain peaks) •Creating aesthetically pleasing maps •Conserving disk space
TOPOLOGY Topology refers to the spatial relationships between geographic features. It describes the relationships between connecting or adjacent coverage features. Topological relationships are built from simple elements into complex elements: points (simplest elements), arcs (sets of connected points), areas (sets of connected arcs), and routes (sets of sections, which are arcs or portions of arcs). Topology is useful in GIS because many spatial modeling operations don't require coordinates, only topological information. For example, to find an optimal path between two points requires a list of the arcs that connect to each other and the cost to traverse each arc in each direction. Coordinates are only needed for drawing the path after it is calculated.
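The optimal-path example can be made concrete with a small sketch that uses only topological information: which arcs connect which nodes, and the cost of traversing each arc in each direction. No coordinates appear anywhere. The node names and costs are invented for illustration, and Dijkstra's algorithm is used here as one common way to solve the problem; the notes do not name a specific method.

```python
import heapq

# (from_node, to_node) -> traversal cost; note that costs may differ by direction.
# All names and values are assumed for illustration.
arc_costs = {
    ("n1", "n2"): 4, ("n2", "n1"): 4,
    ("n1", "n3"): 1, ("n3", "n1"): 1,
    ("n3", "n2"): 2, ("n2", "n3"): 5,
    ("n2", "n4"): 1, ("n4", "n2"): 1,
}


def cheapest_path(arc_costs, start, goal):
    """Dijkstra's algorithm over directed arc costs; returns (cost, node list) or None."""
    outgoing = {}
    for (a, b), cost in arc_costs.items():
        outgoing.setdefault(a, []).append((b, cost))

    queue = [(0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, step in outgoing.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return None


forward = cheapest_path(arc_costs, "n1", "n4")
backward = cheapest_path(arc_costs, "n4", "n1")
print(forward)
print(backward)
```

The forward and return routes differ because one arc is more expensive in one direction, which is why the cost must be stored per direction.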
Components of Topology: Topology has three basic components:
I. Connectivity (Arc-Node Topology):
o Points along an arc that define its shape are called vertices.
o Endpoints of the arc are called nodes.
o Arcs join only at the nodes.
II. Area Definition / Containment (Polygon-Arc Topology):
o An enclosed polygon has a measurable area.
o Lists of the arcs that define polygon boundaries and closed areas are maintained.
o Polygons are represented as a series of (x, y) coordinates that connect to define an area.
III. Contiguity:
o Every arc has a direction.
o A GIS maintains a list of the polygons on the left and right side of each arc.
o The computer then uses this information to determine which features are next to one another.

Connectivity

| Node | Arcs |
|---|---|
| 1 | a1, a2, a6 |
| 2 | a2, a3, a5 |
| 3 | a1, a3, a4 |
| 4 | a4, a5, a6 |

Containment / Area Definition

| Polygon | Arcs |
|---|---|
| A | a1, a2, a3 |
| B | a2, a5, a6 |
| C | a3, a4, a5 |
| D | a1, a4, a6 |

Contiguity

| Arc | Left & Right Polygons |
|---|---|
| a1 | A/D |
| a2 | A/B |
| a3 | A/C |
| a4 | C/D |
| a5 | B/C |
| a6 | B/D |

Explanation of Topology
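The contiguity table is enough for a program to work out which polygons are neighbours without looking at any coordinates. A minimal Python sketch, using the arc and polygon labels from the table (the function name `neighbours` is an assumption):

```python
# Left/right polygons of each arc, copied from the contiguity table.
arc_polygons = {
    "a1": ("A", "D"),
    "a2": ("A", "B"),
    "a3": ("A", "C"),
    "a4": ("C", "D"),
    "a5": ("B", "C"),
    "a6": ("B", "D"),
}


def neighbours(arc_polygons):
    """Map each polygon to the set of polygons it shares at least one arc with."""
    adjacency = {}
    for left, right in arc_polygons.values():
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


adjacency = neighbours(arc_polygons)
for polygon in sorted(adjacency):
    print(polygon, sorted(adjacency[polygon]))
```

In this example every polygon borders the other three, since each pair of polygons shares exactly one arc.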
The vector data attributes are also held in database tables. Because the vector data represent both linear and polygonal features, there will be 2 attribute tables (Polygon attribute & Line attribute).
Non-Spatial /Attribute Data: The attributes refer to the properties of spatial entities. They are often referred to as non-spatial data since they do not in themselves represent location information.
Attribute data are mainly database information corresponding to the geographic features under consideration. •This type of data describes characteristics of the spatial features. •These characteristics can be quantitative and/or qualitative in nature. •Attribute data is often referred to as tabular data and linked to the feature by a unique identifier. For example, attributes of a river might include its name, length, and sediment load at a gauging station.
•Non-spatial data can be joined to geocoded files with matching attributes and displayed as regular maps. E.g. census information such as race or income, non-inherently spatial data, can be displayed as maps. •By drawing on cartographic metaphors and representing non-spatial data as maps, or "information maps," the information in non-spatial data can be "spatialized," analyzed, browsed, and processed using GIS and cartographic methods, then shared on the web using internet map servers.
• Non-spatial data often has no corresponding geocoded representation; yet valuable information may still be derived if the right representation can be found. •Non-spatial (Non-graphic) Database: Set of tabular data records, each record containing multiple data fields. In the context of spatial databases, one of these fields is the Unique ID Number of a corresponding map feature.
•Attribute values in a GIS are stored as relational database tables.
•Each feature (point, line, polygon, or raster) within each GIS layer will be represented as a record in a table.
•Each cell has a coordinate representation within the table and a numeric value (i.e., LU_CODE). Each LU_CODE is associated with a full description through a relational join.
GIS Data Formats & Structures
GIS DATA MODELS: A GIS is based on data. A data set may be stored in more than one format to ensure that the data can meet a range of business needs and software access requirements of users. There are two types of standard data model that store GIS data:
1. Spatial Data Models
2. Attribute Data Models
Data formats: Vector, Lattice/Grid/Raster, Image, TIN, ASCII, DWG/DXF, Tabular Databases, GeoDatabase
SPATIAL DATA MODELS:
Spatial data has traditionally been stored and presented in the form of a map. Three basic types of spatial data models have evolved for storing geographic data digitally. These are referred to as:
•Lattice/Grid/Raster •Vector •Image
Lattice and Grid (Raster):
•These terms describe a data format that stores positional (horizontal) location information in a row-column (Cartesian) structure of pixels, a highly efficient format for data storage, access, and manipulation.
•Although some grids may store multiple attributes just like vector data, grids usually store only a single numerical value.
•They store 2D and 3D information.
•Users with a strong demand for analyzing and manipulating grid data will require Spatial Analyst or similar extensions to their GIS software.
Image
•Images are really just a flavor of a grid or raster.
•"Image" usually means orthophotography (i.e., aerial or high-resolution satellite imagery).
•Images store their positional (x, y) location information in a pixel-by-pixel pattern, just like grids.
•The 'Z' value is a number which is interpreted by software as a shade of gray, as in a panchromatic image, or as a Red-Green-Blue color pattern, as in color photography.
•The 'Z' value is just a number, so it can be manipulated as in a grid, allowing image analysis to be performed or imagery color or display characteristics to be modified.
•Imagery provides a key cartographic role serving as an up-to-date background to other vector datasets. •Because of the common usage of imagery in GIS, most software supports a range of image file types such as TIF, IMG, PIX etc., with installed or no-cost extensions.
Data storage in Raster/Image/Grid
•Data are stored in binary format (0, 1).
•The simplest rasters use binary data values, meaning that the possibilities for each pixel are limited to a single binary digit: either 0 or 1. This is an example of a 1-bit raster data file. By contrast, in an 8-bit data file there are 256 possible data values for each pixel.
•The computer "sees" the cells that contain 0 as "turned off" and the cells that contain 1 as "turned on".
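The step from 2 possible values at 1 bit to 256 at 8 bits follows from the rule that n bits can encode 2^n distinct values. The short sketch below shows this together with the corresponding uncompressed storage size; the function names and the 1000 x 1000 raster are assumptions for illustration, and real file formats add headers and often compression.

```python
def possible_values(bits):
    """Number of distinct values a pixel of the given bit depth can hold."""
    return 2 ** bits


def uncompressed_bytes(rows, cols, bits):
    """Storage needed for the pixel values alone, rounded up to whole bytes."""
    total_bits = rows * cols * bits
    return (total_bits + 7) // 8


for bits in (1, 8, 16):
    print(bits, possible_values(bits), uncompressed_bytes(1000, 1000, bits))
```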
Vector:
•Vector data represent roadways as lines, fire stations as points, and lakes and ponds as polygons (areas).
•Vector data is a straightforward digital version of the lines that define the shape or boundary of a map feature.
•In some software packages, vector data can have a more complex structure, e.g. measures along lines (i.e., roads), or areas of polygon overlap such as animal habitat zones.
•Vector data is stored as Geodatabase (GDB) feature classes and as shapefiles. ArcView 3.x users can access only shapefiles, while ArcGIS software can use GDB feature classes and shapefiles.
•Vector data store significant amounts of attribute data or details about features in the data set, providing the real power in using GIS for queries and analyses. •Vector data does not provide any 3-D representation, as this format of data usually describes only the map or 2-D view of the world.
Advantages of Raster data 1. The geographic location of each cell is implied by its position in the cell matrix. 2. Overlaying is easy and efficiently implemented. 3. Due to the nature of the data storage technique data analysis is usually easy to program and quick to perform. 4. The inherent nature of raster maps, e.g. one attribute maps, is ideally suited for mathematical modeling and quantitative analysis. 5. Discrete data, e.g. forestry stands, is accommodated equally well as continuous data, e.g. elevation data, and facilitates the integrating of the two data types.
Disadvantages of Raster Data:
1. The cell size determines the resolution at which the data is represented.
2. It is especially difficult to adequately represent linear features, depending on the cell resolution. Accordingly, network linkages are difficult to establish.
3. Processing of associated attribute data may be cumbersome if large amounts of data exist. Raster maps inherently reflect only one attribute or characteristic for an area.
4. Since most input data is in vector form, data must undergo vector-to-raster conversion. Besides increased processing requirements, this may introduce data integrity concerns due to generalization and choice of inappropriate cell size.
Advantages of Vector Data:
1. Data can be represented at its original resolution without generalization.
2. Graphic output is usually more aesthetically pleasing.
3. Since most data, e.g. hard copy maps, are in vector form, no conversion is required.
4. Accurate geographic location of data is maintained.
5. Allows for efficient encoding of topology and, as a result, more efficient operations that require topological information, e.g. proximity, network analysis.
Disadvantages of Vector Data: 1.The location of each vertex needs to be stored explicitly. 2.Algorithms for manipulative and analysis functions are complex and may be processing intensive. 3.Continuous data, such as elevation data, is not effectively represented in vector form. 4.Spatial analysis and filtering within polygons is impossible.
TIN (Triangulated Irregular Network)
•TINs store 3-D data with x, y and z values.
•They store and display elevation data.
•They are somewhat specialized in that they require 3-D analysis and display software such as ArcView or ArcGIS 3-D Analyst.
•Because there is a continuity relationship between all data formats, TINs can be converted into grids and also into vector equivalents. However, this changes the way the data is modeled and usually involves some interpolation of the data, thus reducing the functionality of the TIN format.
•Even though TINs generally store only a single Z value as an attribute, the TIN format creates very large files as they store the relationship between all the features within the data. •TINs representing thousands or millions of points are not uncommon and their resulting file size limits TINs to a relatively small tile extent covering a limited geographic area.
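Converting a TIN to a grid means estimating a z value at each cell position from the triangle that contains it. The sketch below shows one common way to do that, linear (barycentric) interpolation inside a single triangle. It is an illustration under assumptions: the triangle vertices are invented, `interpolate_z` is a hypothetical helper, and a full conversion would also need to locate the containing triangle for every cell.

```python
def interpolate_z(triangle, x, y):
    """Linearly interpolate z at (x, y) inside a triangle of (x, y, z) vertices.

    Returns None when the point lies outside the triangle.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    # Barycentric weights of the three vertices.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3


# One TIN facet with assumed vertex elevations.
facet = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]

z_inside = interpolate_z(facet, 2.0, 2.0)
z_vertex = interpolate_z(facet, 0.0, 0.0)
z_outside = interpolate_z(facet, 20.0, 20.0)
print(z_inside, z_vertex, z_outside)
```

The interpolated grid value is only an estimate of the surface between the original points, which is one reason the conversion loses some of what the TIN stored.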
ATTRIBUTE DATA MODELS (DBMS Models used in GIS): A separate data model is used to store and maintain attribute data for GIS software. These data models may exist internally within the GIS software, or may be reflected in external commercial Database Management Software (DBMS). A variety of different data models exist for the storage and management of attribute data. The most common are:
•ASCII •DWG/DXF •Tabular Databases •GeoDatabase
ASCII (American Standard Code for Information Interchange)
•Data in this format is simply a line-by-line listing of information in text format that takes on a geographical meaning when the listing contains positional coordinate information.
•Text information can be easily imported into most GIS and CAD-based software programs, and it is this flexibility that drives storing some point data sets in this format.
•When possible, most point data sets are stored as vector datasets to make them more consumable by the ArcView and ArcGIS software packages.
•In the case of the elevation data that originate as very large ASCII files, storage as vector point files is not efficient for display and
DWG/DXF: Drawing files (DWG) and the ASCII export version (DXF)
•Another flavor of vector data, developed for and used extensively in engineering CAD (Computer-Aided Design) software.
•As the line between GIS and traditional CAD software and data types continues to blur, the industry has improved the compatibility, and thus the sharing, of these data types.
•Used to store planimetric linework such as roads, water/sewer infrastructure, and legal description information by public works agencies, survey departments and utility companies.
•For GIS users this data is often converted to GIS-type formats such as vector shapefiles, but DXF and DWG can also be read directly by most GIS software.
•These CAD data types provide a key bridge between GIS and engineering applications. For example the LiDAR-derived elevation contours in the SDW are provided in both vector shapefile and vector DWG format. •Though CAD formats provide accurate and detailed location information they do not store attribute information in the same way as GIS vector data does but rather provide more limited descriptive information in the LAYER and other DWG entity values.
Tabular databases – •Microsoft Access, SQL Server, Oracle and other relational database systems serve as storage and access software for a wide range of tabular data tables. •ASCII data is often moved to a tabular database arranged in a logical integrated manner that emphasizes relationships between the data sets. •Vector data also incorporates this functionality in storing the data as attributes, but large complex business tables such as financial records, census data, etc., are stored and managed as tables in these more efficient databases. •This allows the data to be served up from a central point to a variety of web-based applications and query and reporting applications.
•GIS data, particularly vector data, can also access these databases through connections within the GIS software, establishing a relationship between the spatial location of features and the descriptive information about them.
•Extracts of information from these relational databases are sometimes stored in standalone dBASE-format (dbf) tables that are highly compatible with shapefile format data and can be joined to the shapefile dbf attribute table.
Attribute data structure
Tabular Model: ASCII or other standard format.
Hierarchical Model: The hierarchical database organizes data in a tree structure. Data is structured downward in a hierarchy of tables.
Network Model: The network database organizes data in a network structure. Any column in this structure can be linked to any other.
Relational Model: The relational database organizes data in tables. Each table is identified by a unique table name and is organized by rows and columns. Each column within a table also has a unique name. Columns store the values for a specific attribute, e.g. cover group, tree height. Each row represents one record in the table. In a GIS, each row is usually linked to a separate spatial feature, e.g. a forestry stand. Accordingly, each row comprises several columns, each column containing a specific value for that geographic feature.
Data is often stored in several tables. Tables can be joined or referenced to each other by common columns (relational fields). Usually the common column is an identification number for a selected geographic feature, e.g. a forestry stand polygon number. This identification number acts as the primary key for the table. The ability to join tables through use of a common column is the essence of the relational model. Such relational joins are usually ad hoc in nature and form the basis for querying in a relational GIS product. Unlike the previously discussed database types, relationships are implicit in the character of the data, as opposed to being explicit characteristics of the database setup.
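A relational join through a common column can be tried out with Python's built-in `sqlite3` module. The sketch below is illustrative only: the table names, column names and values are assumptions (the land-use codes echo the VAT shown earlier in these notes), and SQLite simply stands in for whichever relational DBMS a GIS would use.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- One row per spatial feature; stand_id is the primary key.
    CREATE TABLE stands (stand_id INTEGER PRIMARY KEY, lu_code INTEGER, area_ha REAL);
    -- Lookup table describing each land-use code.
    CREATE TABLE land_use (lu_code INTEGER PRIMARY KEY, description TEXT);

    INSERT INTO stands VALUES (1, 476, 12.5), (2, 21, 3.0), (3, 476, 7.25);
    INSERT INTO land_use VALUES (21, 'Cottonwood'), (476, 'Oak');
""")

# Ad hoc join on the common column lu_code.
rows = con.execute("""
    SELECT s.stand_id, l.description, s.area_ha
    FROM stands AS s
    JOIN land_use AS l ON s.lu_code = l.lu_code
    ORDER BY s.stand_id
""").fetchall()

for row in rows:
    print(row)
con.close()
```

Nothing in either table's definition states that the two are related; the relationship exists only because both carry matching `lu_code` values, which is what "implicit in the character of the data" refers to.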
The relational database model is the most widely accepted for managing the attributes of geographic data.
The relational DBMS is attractive because of its:
•Simplicity in organization and data modeling.
•Flexibility: data can be manipulated in an ad hoc manner by joining tables.
•Efficiency of storage: proper design of data tables can reduce redundancy.
•Query independence: queries do not need to take into account the internal organization of the data.
The relational DBMS has emerged as the dominant commercial data management tool in GIS implementation and application.
GeoDatabase
•In a geodatabase, the close association between spatial vector data and relational database tables is taken further, toward a single common format of data storage.
•Beyond enhanced storage efficiencies and improvements in access speeds, geodatabases will help integrate the spatial data of organizations with their extensive business table data. Users will move from accessing their common data types in a file-based model, as is done now, to a design where all GIS data (location and attribute) is accessed from a relational database.
Other types of spatial data that can be stored using the Spatial option besides GIS data include: •Data from computer-aided design (CAD) •Computer-aided manufacturing (CAM) systems. Instead of operating on objects on a geographic scale, CAD/CAM systems work on a smaller scale such as for an automobile engine or printed circuit boards.
Object Oriented Model: The object-oriented database model manages data through objects. An object is a collection of data elements and operations that together are considered a single entity. The object-oriented database is a relatively new model. This approach has the attraction that querying is very natural, as features can be bundled together with attributes at the database administrator's discretion. To date, only a few GIS packages are promoting the use of this attribute data model. However, initial impressions indicate that this approach may hold many operational benefits with respect to geographic data processing. Fulfillment of this promise with a commercial GIS product remains to be seen.
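The idea of an object as data elements and operations bundled into a single entity can be shown with a small Python class. This is only a sketch of the concept, not of any GIS package's object model; the class name `ForestStand`, its fields and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ForestStand:
    """A feature whose geometry, attributes and operations travel together."""
    stand_id: int
    cover_group: str
    boundary: list  # closed ring of (x, y) vertices

    def bounding_box(self):
        """Smallest axis-aligned rectangle enclosing the boundary."""
        xs = [x for x, _ in self.boundary]
        ys = [y for _, y in self.boundary]
        return min(xs), min(ys), max(xs), max(ys)

    def describe(self):
        """Human-readable summary built from the object's own data."""
        return f"Stand {self.stand_id} ({self.cover_group})"


stand = ForestStand(12, "Oak", [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)])
print(stand.describe(), stand.bounding_box())
```

A query against such an object asks the feature itself for its extent or description, instead of joining a geometry table to an attribute table by identifier.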