iiWAS 2008
Proceedings of iiWAS2008
GAINS-BI: Business Intelligent Approach for Greenhouse Gas and Air Pollution Interactions and Synergies Information System
Thanh Binh NGUYEN
Wolfgang SCHOEPP
Fabian WAGNER
International Institute for Applied Systems Analysis (IIASA) Schlossplatz 1 A-2361 Laxenburg, Austria Tel: (+43 2236) 71 327
International Institute for Applied Systems Analysis (IIASA) Schlossplatz 1 A-2361 Laxenburg, Austria Tel: (+43 2236) 71 309
International Institute for Applied Systems Analysis (IIASA) Schlossplatz 1 A-2361 Laxenburg, Austria Tel: (+43 2236) 71 565
[email protected]
[email protected]
[email protected]
ABSTRACT
The Greenhouse Gas and Air Pollution Interactions and Synergies (GAINS) model has been developed to provide a consistent framework for analyzing co-benefit reduction strategies for air pollution and greenhouse gas sources. In this paper we introduce a BI approach, namely GAINS-BI, as a further development of the GAINS model. In this context, the GAINS-BI conceptual model, including the GAINS-BI architecture and concepts, is specified based on sound mathematical models used to calculate emissions and costs. Furthermore, a multidimensional data model, comprising activity, emission and cost data cubes, is introduced to represent the specific multidimensional analysis requirements of the greenhouse gas and air pollution application domains. As a proof of concept, some implementation results are presented.
Categories and Subject Descriptors H.2.8 [Database Applications]
atmospheric dispersion characteristics and environmental sensitivities towards air pollution [1,2,3,5,8,9,14]. In 2005 the model was extended to meet the new needs of “pollution science” as well as modeling pollution through greenhouse gases. The extension of the scientific approach was also reflected in the new name of the model, namely Greenhouse Gas and Air Pollution Interactions and Synergies (GAINS) [8]. These air pollution related problems are considered in a multi-pollutant context (Figure 1), quantifying the contributions of sulphur dioxide (SO2), nitrogen oxides (NOx), ammonia (NH3), non-methane volatile organic compounds (VOC), and primary emissions of fine (PM2.5) and coarse (PM10-PM2.5) particles. The main goal of the model is to estimate, for a given energy and agricultural scenario, the costs and environmental effects of user-specified emission control policies (the “scenario analysis” mode), see Figure 2. Furthermore, a linear optimization mode can be used to identify the cost-minimal combination of emission controls that meets user-supplied air quality targets, taking into account regional differences in emission control costs and atmospheric dispersion characteristics.
General Terms Design
Keywords GAINS, Business Intelligent, Data warehouse, ETL
1. INTRODUCTION
The Regional Air Pollution Information and Simulation (RAINS) model developed by the International Institute for Applied Systems Analysis (IIASA) combines information on economic and energy development, emission control potentials and costs,

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. iiWAS2008, November 24–26, 2008, Linz, Austria. (c) 2008 ACM 978-1-60558-349-5/08/0011 $5.00.
Figure 1. Flow of information in the RAINS/GAINS model

As a next step of the model, the GAINS-BI approach is introduced in this paper. GAINS-BI has been developed based on Business Intelligence (BI) concepts [5,7,11,12,13],
which basically comprise a data warehousing infrastructure and an analysis and reporting environment. Furthermore, Business Intelligence (BI) is the process of gathering enough of the right information in the right manner at the right time, and delivering the right results to the right people for decision-making purposes, so that it can continue to yield real business benefits or have a positive impact on business strategy, tactics, and operations. With its set of methodologies and technologies, BI has been described as a promising technology that helps enterprises transform their legacy systems into integrated, user-centric information systems that support more effective business operations and management and decision-making processes [12]. In this paper, we first introduce the mathematical models used to calculate emissions [8] and costs [14] for a given pollutant, GAINS region, and year within a given GAINS scenario. These mathematically sound concepts make it possible to specify the GAINS-BI conceptual data model and are also used to calculate emissions and costs in the ETL process and in data cube generation. In this context, the GAINS-BI architecture and its concepts are introduced as an application framework. Furthermore, a multidimensional data model is specified, comprising three main fact tables, namely activity_f, emission_f, and cost_f, and six dimensions, namely scenario_d, pollutant_d, region_d, activity_d, sector_d, and time_d, which together define three data cubes: the activity, emission and cost data cubes. To fulfill region-specific requirements, we have developed a class of regional data marts, e.g. GAINS Europe, GAINS Asia, and GAINS World, which collect relevant data from multiple GAINS data sources and are used for regional data analysis and cost optimization. Afterwards, a global GAINS data warehouse has been developed to integrate multidimensional data from the regional data marts for global data analysis and cost optimization purposes.
Some typical examples are presented. The use of BI in GAINS provides a feasible and effective method to improve the speed of reporting, analysis, and information delivery for faster operational decision-making and action-taking, thus making it possible to react rapidly to business problems and to satisfy new requirements.
decisions, create more effective plans and respond more quickly to problems and opportunities [10]. Thus, this approach effectively and efficiently leverages data resources to satisfy requirements for analysis, reporting and decision-making processes. In the context of the GAINS model, there are several specific questions such as “How much would a migration from one technology to another, more effective one cost, and how many emissions would it save?” or “What is the most effective way, in terms of the use of technologies, to save emissions within a given budget?”. Such questions are answered with the help of the GAINS optimization module [14]. This paper focuses on applying mathematical models and BI concepts to improve the data integration (ETL) process with specific calculations for emissions and costs. The GAINS-BI system is then implemented with various data analysis and decision support components to provide efficient ways of obtaining valuable information and knowledge.
3. GAINS-BI CONCEPTUAL MODEL
In this section, we first introduce the GAINS concepts and the scenario-based emission and cost calculations. Furthermore, the GAINS-BI architecture is introduced as a framework for specifying the basic components of the system. In more detail, the ETL process is presented to show how data are collected and calculated in GAINS-BI.
3.1 GAINS concepts
According to [8], the RAINS model has been extended to capture (economic) interactions between the control of conventional air pollutants and greenhouse gases. This GAINS model includes, in addition to the air pollutants covered in RAINS, carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O) and the F-gases [8]. Thereby, the traditional RAINS model constitutes the air pollution-related part of the GAINS model, while the GAINS extensions address the interactions between air pollutants and greenhouse gases.
The rest of this paper is organized as follows: Section 2 introduces approaches related to our work; after a brief introduction of the GAINS concepts, Section 3 presents the GAINS-BI architecture and ETL framework; Section 4 presents our implementation results. Finally, Section 5 summarizes what has been achieved and outlines future work.
2. RELATED WORKS
The characteristics of the proposed approach are rooted in several research areas of BI, including the trends and concepts of BI solutions, the combined use of mathematical models and data warehousing technologies in supporting BI, as well as the utilization of BI in GAINS. With the amount of data generated in an enterprise increasing continuously, delivering the right and sufficient amount of information at the right time to the right business users has become more complicated and critical [7]. More and more enterprise solutions and platforms for Business Intelligence, such as IBM DB2 with Business Intelligence Tools, Microsoft SQL Server, Teradata Warehouse, SAS, iData Analyzer, Oracle, Cognos, Business Objects, etc. [11], have been developed with the aim of empowering businesses by providing direct access to information used to make
Figure 2. Environmental effects of air pollutants and greenhouse gases

Scenario-based Emission and Cost Calculation
Emission Calculation
According to [8], the emissions for a given pollutant, GAINS region, and year within a given GAINS scenario are calculated according to the following equation:
$$E_{p,r,t,y} = \sum_{a,s,t} E_{p,a,s,t,r,y} = \sum_{a,s,t} A_{a,s,r,y} \cdot X_{a,s,t,r,y} \cdot ef_{a,s,p,r} \cdot (1 - \eta_{a,s,t,p,r})$$

where
$p, r, y$: pollutant, GAINS region, year,
$a, s, t$: GAINS activity, sector, abatement technology (option),
$E_{p,r,t,y}$: emissions of the specific pollutant p, GAINS region r, and year y,
$A_{a,s,r,y}$: activity for a given GAINS activity/sector combination (a, s),
$X_{a,s,t,r,y}$: actual implementation rate of the considered abatement option,
$ef_{a,s,p,r}$: “uncontrolled” (“unabated”) emission factor, and
$\eta_{a,s,t,p,r}$: reduction efficiency.

Cost Calculation
Similar to the emissions, also the costs of reducing emissions for a given pollutant, GAINS region, and year can be calculated by the GAINS model according to [14]:

$$C_{p,r,t,y} = \sum_{a,s,t} C_{p,a,s,t,r,y} = \sum_{a,s,t} \delta_{(a,s,t),p} \cdot A_{a,s,r,y} \cdot X_{a,s,t,r,y} \cdot cf_{a,s,t,r}$$

where
$C_{p,r,t,y}$: reduction costs of the specific pollutant p, GAINS region r, and year y,
$cf_{a,s,t,r}$: unit cost factor of the considered abatement option, and
$\delta_{(a,s,t),p}$: Kronecker delta function that returns 1 if p is the primary cost pollutant for abatement option (a, s, t) and 0 otherwise.

3.2 GAINS-BI architecture
GAINS-BI is conceived as a data warehousing system whereby member organizations remain responsible for the generation and providing of their data. The GAINS-BI components could be described as follows:
• ETL Tools: GAINS-BI enables users to provide content (upload) and to access it in terms of GAINS data download. Afterwards, the ETL tool extracts, transforms (emission and cost calculations) (Figure 4), and loads the data into a GAINS data mart.
• Emission and Cost Aggregation Pre-Calculation: In this step, three main data cubes are generated by pre-calculating activity, emission and cost data at multiple levels of data granularity, based on dimension hierarchies and aggregated data values.
• Intranet Report Systems: Jasper Report is used in this case to generate reports, including data tables and charts.
• Web Publisher: In the context of Jasper Report, the output of GAINS-BI can be generated in different formats, e.g. HTML Ajax, PDF, Excel, etc.
• Metadata: GAINS-BI metadata is used to describe all build steps. First, the metadata contains the Calculation Rules based on the mathematical formulations described in the previous section. The second metadata item is information about Pre-calculation Aggregation (data granularity). The Multi-Dimensional Report Management is used to manage a set of generated reports as well as the configuration of each report. To serve multiple purposes, e.g. many kinds of end users and different levels of decision making, Data Selection defines different sub-cubes. These sub-cubes can be accessed via several methods or protocols, e.g. web services, API, Servlet, etc. The report and result presentation could be
Figure 3. GAINS BI architecture
Figure 4. GAINS-BI ETL process
configured based on the Presentation Configuration metadata.
AllTechnologies->TechnologyType->IDTechnology
Time Dimension (time_d)
The Time Dimension is denoted as AllTime->Year
4. MODELLING MULTIDIMENSIONAL DATA MODELS AND IMPLEMENTATION RESULTS
4.1 Dimension Definitions
Scenario Dimension (scenario_d) Emission Scenarios define, for each country, the combination of activity projections and control strategies [14]. This combination determines the level of actual emissions. The scenario dimension is organized as follows:
Region Dimension (region_d)
The GAINS Europe model covers 42 land-based regions in Europe, most of them individual countries and four subnational regions in the European part of Russia. Moreover, there are currently five sea regions represented in the model. These regions are denoted as IDRegion. Furthermore, these regions can be grouped into several region groups, denoted by RegionGroup:
AllRegions->RegionGroup->IDRegion

Sector Dimension (sector_d) and Activity Dimension (activity_d)
GAINS covers a number of sectors, and each sector may be associated with a number of different activities. Hence, in GAINS, activity data are structured by sector-activity combinations. For example, in the sector ‘industrial boilers’ the associated activities are the various fuels that are used in industrial boilers, i.e., coal, oil, etc. Activities may be further subdivided, e.g., hard coal (grade 1), hard coal (grade 2), etc. The sector and activity dimensions covered by GAINS-BI are organized as follows:
AllScenarios->ScenarioGroup->IDScenario
4.2 Fact Table Definitions
Activity Fact Table (activity_f)
Economic activities such as energy consumption, industrial production and agricultural farming cause emissions of air pollutants, which have several negative effects on ecosystems and human health. These variables describe the level of the activity in a sector and a country. The Activity Fact Table is defined based on five dimensions, i.e. Scenario Dimension (scenario_d), Region Dimension (region_d), Sector Dimension (sector_d), Activity Dimension (activity_d), Time Dimension (time_d), and denoted as follows:
activity_f(idscenario,idregion,idactivity,idsector,year,activityvalue)
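The aggregation of the activity fact table along selected dimensions can be illustrated with a minimal Python sketch. Only the column names follow the activity_f definition; the sample rows, the region, activity and sector codes, and the helper name `roll_up` are hypothetical and serve purely as an illustration, not as the GAINS-BI implementation.

```python
from collections import defaultdict

# Dimension columns of activity_f; the measure (activityvalue) comes last.
COLUMNS = ("idscenario", "idregion", "idactivity", "idsector", "year")

# Hypothetical sample rows (all codes and numbers are made up):
# (idscenario, idregion, idactivity, idsector, year, activityvalue)
ACTIVITY_F = [
    ("SCEN_A", "AUST", "HC1", "IN_BO", 2010, 10.0),
    ("SCEN_A", "AUST", "OIL", "IN_BO", 2010, 5.0),
    ("SCEN_A", "GERM", "HC1", "IN_BO", 2010, 20.0),
    ("SCEN_A", "GERM", "HC1", "IN_BO", 2020, 15.0),
]


def roll_up(rows, keep):
    """Sum activityvalue over every dimension that is not listed in `keep`."""
    idx = [COLUMNS.index(name) for name in keep]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


# Activity aggregated by region and year, and by activity only.
by_region_year = roll_up(ACTIVITY_F, ("idregion", "year"))
by_activity = roll_up(ACTIVITY_F, ("idactivity",))
```

Keeping fewer dimensions in `keep` yields a coarser view of the same cube, which is the kind of pre-calculated aggregate a report such as "energy data aggregated by activity" would read.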
AllSectors->SecType->IDSector
AllActivities->ActType->IDActivity

Pollutant Dimension (pollutant_d)
The set of pollutants in GAINS covers both the traditional air pollutants (SO2, NOx, PM2.5, NH3 and VOC) as well as the greenhouse gases CO2, CH4, N2O and FGAS (a GWP-weighted average of HFCs, PFCs, SF6). The pollutant dimension is organized as follows:
AllPollutants->PollutantGroup->Pollutant->IDPollutant_Fraction
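The arrow notation used for the dimension hierarchies maps naturally onto an ordered list of levels, from coarsest to finest. The following Python sketch shows one possible representation; the helper names `parse_hierarchy` and `can_roll_up` and the `HIERARCHIES` dictionary are assumptions introduced for illustration only.

```python
def parse_hierarchy(path):
    """Split 'All->Group->ID' notation into a list of levels, coarsest first."""
    return [level.strip() for level in path.split("->")]


def can_roll_up(levels, from_level, to_level):
    """Return True if data at from_level can be aggregated to the coarser to_level."""
    return levels.index(to_level) < levels.index(from_level)


# Hierarchies as written in the dimension definitions.
HIERARCHIES = {
    "scenario_d": parse_hierarchy("AllScenarios->ScenarioGroup->IDScenario"),
    "region_d": parse_hierarchy("AllRegions->RegionGroup->IDRegion"),
    "sector_d": parse_hierarchy("AllSectors->SecType->IDSector"),
    "activity_d": parse_hierarchy("AllActivities->ActType->IDActivity"),
    "time_d": parse_hierarchy("AllTime->Year"),
}
```

For example, `can_roll_up(HIERARCHIES["region_d"], "IDRegion", "RegionGroup")` is true because RegionGroup is the coarser level, whereas the reverse direction would require data that the aggregate no longer contains.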
Technology Dimension (technology_d) Emissions of pollutants can be controlled with control technologies, but not every technology controls every pollutant.
Figure 5. Activity Data Cube Schema
Figure 7 shows an example of an Activity report generated from the Activity Data Cube.

Emission Fact Table (emission_f)
Emissions of each pollutant are calculated as the product of the activity levels, the “uncontrolled” emission factor in the absence of any emission control measures, the efficiency of emission control measures and the application rate of such measures. The Emission Fact Table is defined based on six dimensions, i.e. Scenario Dimension (scenario_d), Region Dimension (region_d), Sector Dimension (sector_d), Activity Dimension (activity_d), Time Dimension (time_d), Pollutant Dimension (pollutant_d), and denoted as follows:
emission_f(idscenario,idregion,idactivity,idsector,year,idpollutant_fraction,activityvalue,impl_ef,factor_noc_abtd,rem_ef,emiss_calc,emiss,emiss_co2eq)

Cost Fact Table (cost_f)
GAINS neither produces nor uses single-pollutant cost curves in the optimization. However, single-pollutant cost curves can be constructed by GAINS, if so desired [14]. The Cost Fact Table is defined based on seven dimensions, i.e. Scenario Dimension (scenario_d), Region Dimension (region_d), Sector Dimension (sector_d), Activity Dimension (activity_d), Time Dimension (time_d), Pollutant Dimension (pollutant_d), Technology Dimension (technology_d), and denoted as follows:
cost_f(idscenario,idregion,idactivity,idsector,idtechnology,year,idpollutant_fraction,factor,activityvalue,perc,cost,idcostsets,unit)
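The emission and cost equations of Section 3.1 can be turned into a small transform step of the kind the ETL process would apply before loading the fact tables. The Python sketch below is a minimal illustration under stated assumptions: the class `ControlOption`, the functions `emissions` and `costs`, the technology labels and all numbers are hypothetical and are not taken from the GAINS databases. The scenario, region and year are assumed fixed, so the sums run only over activity, sector and technology.

```python
from dataclasses import dataclass, field


@dataclass
class ControlOption:
    """One (activity a, sector s, technology t) combination for a fixed region and year."""
    activity: str            # a
    sector: str              # s
    technology: str          # t
    activity_level: float    # A[a,s,r,y]
    impl_rate: float         # X[a,s,t,r,y], share of the activity using t
    unit_cost: float         # cf[a,s,t,r]
    primary_pollutant: str   # pollutant p for which delta[(a,s,t),p] = 1
    efficiency: dict = field(default_factory=dict)  # eta[a,s,t,p,r] per pollutant


def emissions(options, ef, pollutant):
    """Sum A * X * ef * (1 - eta) over all (a, s, t) for one pollutant."""
    return sum(
        o.activity_level * o.impl_rate
        * ef[(o.activity, o.sector, pollutant)]
        * (1.0 - o.efficiency.get(pollutant, 0.0))
        for o in options
    )


def costs(options, pollutant):
    """Sum A * X * cf over the options whose primary cost pollutant is `pollutant`."""
    return sum(
        o.activity_level * o.impl_rate * o.unit_cost
        for o in options
        if o.primary_pollutant == pollutant  # Kronecker delta
    )


# Hypothetical data: half of the activity is uncontrolled, half uses a
# made-up technology "TECH1" that removes 75% of SO2.
EF = {("HC1", "IN_BO", "SO2"): 0.5}  # uncontrolled emission factor ef[a,s,p,r]
OPTIONS = [
    ControlOption("HC1", "IN_BO", "NO_CONTROL", 100.0, 0.5, 0.0, "SO2"),
    ControlOption("HC1", "IN_BO", "TECH1", 100.0, 0.5, 2.0, "SO2", {"SO2": 0.75}),
]

so2_emissions = emissions(OPTIONS, EF, "SO2")
so2_costs = costs(OPTIONS, "SO2")
```

With these made-up inputs the uncontrolled half contributes 100 · 0.5 · 0.5 = 25 units and the controlled half 100 · 0.5 · 0.5 · 0.25 = 6.25 units of SO2, while only the controlled half incurs costs. Attributing costs through a primary pollutant avoids counting the cost of a multi-pollutant technology more than once.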
Figure 5. Emission Data Cube Schema
Figure 6. Cost Data Cube Schema
Figure 7. An example of using Activity Data Cube to generate Energy Data aggregated by Activity
Figure 8. An example of using three data cubes to generate multi reports
5. CONCLUSIONS AND FUTURE WORKS
As the process of turning data into information and then into knowledge, Business Intelligence has been emerging as a potential solution that opens new perspectives and areas for improving decision-making processes and operations. In this paper, a BI solution, with integrated repositories and a data warehouse as the central components, has been introduced to support GAINS experts in making better business decisions. Moreover, we have presented how to apply the mathematical formulation to specify the GAINS-BI conceptual data model as well as to calculate emissions and costs in the ETL process and data cube generation. The GAINS-BI architecture and its concepts have also been introduced, and the multidimensional data model has been defined. In the near future, semantic technologies will be used to enhance the efficiency and agility of the GAINS-BI solution, e.g. for the representation of data combinations and constraints. Moreover, mathematical optimization and data mining algorithms will be adapted for multidimensional analysis of integrated data from heterogeneous sources in a university environment. Thus, the proposed BI tools can provide improved analytic capabilities for multi-level information delivery across an enterprise, which would improve the visibility of strategic decisions.

6. REFERENCES
[1] Amann, M., Asman W., Bertok I., Cofala, J., Heyes, C., Klimont, Z., R., Posch, M. and Schöpp, W. (2007), Cost-optimized reductions of air pollutant emissions in the EU Member States to meet the environmental targets of the Thematic Strategy on Air Pollution. NEC Scenario Analysis Report No. 3. International Institute for Applied Systems Analysis (IIASA), Laxenburg, Austria.
[2] Amann, M., Bertok I., Cofala, J., Heyes, C., Klimont, Posch, M. Schöpp, W. and Wagner F. (2006), Baseline Scenarios for the Revision of the NEC Emission Ceilings Directive. Part 1: emission projections. NEC Scenario Analysis Report Nr.1. International Institute for Applied Systems Analysis (IIASA), Laxenburg, Austria, http://www.iiasa.ac.at/rains/CAFE_files/NEC-BL-p1v21.pdf.
[3] Amann, M., Bertok, I., Cabala, R., Cofala, J., Heyes, C., Gyarfas, F., Klimont, Z., Schöpp, W. and Wagner, F. (2005), Analysis for the final CAFE scenario. CAFE Report No. 6. International Institute for Applied Systems Analysis (IIASA), http://www.iiasa.ac.at/rains/CAFE_files/CAFED3.pdf.
[4] Amann, M., Cofala, J., Heyes, C., Klimont, Z., Mechler, R., Posch, M. and Schöpp, W. (2004), The RAINS model. Documentation of the model approach prepared for the RAINS review. International Institute for Applied Systems Analysis (IIASA), Laxenburg, Austria, www.iiasa.ac.at/rains/review/index.html.
[5] G. R. Gangadharan, S. N. Swami, "Business Intelligence Systems: Design and Implementation Strategies," in Proc of the 26th International Conference Information Technology Interfaces ITI 2004, Croatia, 2004, pp. 139-144.
[6] Grant A. J., Luqi, "Intranet Portal Model and Metrics: A Strategic Management Perspective," IT Professional, vol. 7, pp. 37-44, 2005.
[7] Hugh J. W., Barbara H. W., "The Current State of Business Intelligence," Computer, vol. 40, pp. 96-99, 2007.
[8] Klaassen, G., Amann, M., Berglund, C., Cofala, J., Höglund-Isaksson, L., Heyes, C., Mechler, R., Tohka, A., Schöpp, W., Winiwarter, W. (2004) The Extension of the RAINS Model to Greenhouse Gases. An interim report describing the state of work as of April 2004. IIASA IR-04-015.
[9] Makowski M.P. Data Cleaning and Performance Tuning in the GAINS Model. Thesis at the Database and Artificial Intelligence Group (DBAI) of the Technical University of Vienna, 2008.
[10] Ta’a A., Bakar M. S. A., Saleh A. R., “Academic business intelligence system development using SAS® tools”, in Online Proc of the SAS Global Forum, 2008.
[11] Tvrdikova M., "Support of Decision Making by Business Intelligence Tools," in Proc of the 6th International Conference on Computer Information Systems and Industrial Management Applications, 2007, pp. 364-368.
[12] Wei X., Xiaofe X., Lei S., Quanlong L., Hao L, “Business intelligence based group decision support system”, in Proc of the International Conferences on Info-tech and Info-net ICII 2001, Beijing, China, 2001, pp. 295-300.
[13] Zeng L., Z. Shi, M. Wang, W. Wu, "Techniques, Process, and Enterprise Solutions of Business Intelligence," in Proc of the IEEE Conference on Systems, Man, and Cybernetics, Taipei, Taiwan, 2006, pp. 4722-4726.
[14] Wagner, F., W. Schoepp and C. Heyes. The RAINS optimization module for the Clean Air For Europe (CAFE) Programme, Interim Report IR-06-029, International Institute for Applied Systems Analysis (IIASA), September 2006.
[15] www: http://www.iiasa.ac.at/web-apps/apd/gains/