BIG DATA
- 3Vs of Big Data
  o Volume: Companies gather huge amounts of data from business transactions, social media, biometrics, and other sources. This data must be stored properly. Tech focus: Apache Hadoop
  o Velocity: Data streams in at an unprecedented speed and must be dealt with in a timely manner.
  o Variety: Data comes in different forms: numeric, integer, alphanumeric, photo, video, etc.
- Data Extraction
- Data Storage
- Data Cleaning
- Data Mining

AI vs ML vs DL
Artificial Intelligence
- AI = any code, technique, or algorithm that enables machines to mimic, develop, or demonstrate human behavior or cognition.
- Currently we are at the Weak AI stage, as opposed to Strong AI, in which machines can do anything humans can do.
- To work toward Strong AI, we use Machine Learning.
Machine Learning
- Supervised Learning
  o Learns from labeled examples prepared with help from data scientists
  o Classification examples: Hotdog / Not a hotdog; Wolf or Husky
  o Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA)
- Unsupervised Learning
  o Learns patterns from unlabeled data on its own, without predefined outcomes
  o Clustering
Deep Learning
- Draws meaningful inferences from large volumes of data
- Requires Artificial Neural Networks (ANNs)

Data to Information
- Data
  o Internet speed is 4 Mbps
- Information
  o The average internet speed is 5 Mbps over the period of Q2 2016
- The way information is structured provides context and meaning to the data collected.
- Flow: Database → Report extraction → Report
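To make the data-versus-information distinction concrete, here is a minimal Python sketch. It assumes a hypothetical list of dated internet-speed readings: a single reading is data, while the average computed over a defined period is information. The readings and the `quarter_of` and `average_speed_for_quarter` helpers are illustrative assumptions, not real measurements or an established API.

```python
from datetime import date
from statistics import mean

# Hypothetical raw data: individual internet-speed readings in Mbps.
# These values are made up for illustration only.
readings = [
    (date(2016, 4, 3), 4.0),
    (date(2016, 5, 12), 5.5),
    (date(2016, 6, 20), 5.5),
    (date(2016, 7, 1), 6.0),  # falls in Q3, so it is excluded from Q2 below
]

def quarter_of(d: date) -> int:
    """Return the calendar quarter (1-4) that a date falls in."""
    return (d.month - 1) // 3 + 1

def average_speed_for_quarter(data, year: int, quarter: int) -> float:
    """Turn raw readings into information: the mean speed for one quarter."""
    in_period = [mbps for d, mbps in data if d.year == year and quarter_of(d) == quarter]
    if not in_period:
        raise ValueError("no readings in the requested period")
    return mean(in_period)

# A single reading on its own is just data.
print(f"Internet speed is {readings[0][1]:g} Mbps")
# The aggregate, placed in a time context, is information.
print(f"The average internet speed is {average_speed_for_quarter(readings, 2016, 2):g} Mbps over Q2 2016")
```

The context (a named period and an aggregate) is what turns the bare numbers into something a reader can act on.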
Uses of Information
- Planning
- Recording
- Controlling
- Measuring
- Decision-making
  o Strategic Information: Used to help plan the objectives of the business as a whole and to measure how well those objectives are being achieved. Ex: profitability, size, growth, etc.
  o Tactical Information: Used to decide how the resources of the business should be employed. Ex: productivity information, pricing information, etc.
  o Operational Information: Used to make sure that specific operational tasks are carried out as planned. Ex: checking that tasks are done properly.

Special Topics on Information
- Tables
  o To compare or look up individual values
  o When you require precise values
  o When values involve multiple units of measure
  o When the data has to communicate quantitative information, but not trends
- Charts
  o Used to convey a message that is contained in the shape of the data
- Column Charts
  o For a small number of categories
- Stacked Columns
  o Composition
  o Best with 3 to 4 parts
- Bar Graphs
  o Use when displaying negative quantities
- Line Charts
  o Continuous data set without breaks in between
  o Trend-based visualizations
- Area Charts
- Pie and Donut Charts
  o Represent numbers as percentages of a whole
- Scatter Chart
  o For correlation and distribution analysis
  o Good for showing the relationship between two different variables where one correlates with the other (or doesn't)
  o Shows data distribution or clustering trends and helps you spot anomalies
- Bubble Chart
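The rules of thumb above for choosing between a table and the various chart types can be read as a small decision procedure. The following Python sketch encodes one such reading. The `Goal` categories and the `suggest_chart` function are hypothetical, and falling back to a table when a composition has more or fewer than 3 to 4 parts is an assumption, not a rule stated in the notes. Area and bubble charts are left out because the notes give no criteria for them.

```python
from enum import Enum, auto

class Goal(Enum):
    """Hypothetical categories for what a display needs to communicate."""
    PRECISE_VALUES = auto()  # look up or compare individual values
    COMPARE = auto()         # compare a small number of categories
    COMPOSITION = auto()     # show the parts that make up a whole
    PERCENTAGES = auto()     # show shares as percentages
    TREND = auto()           # continuous data without breaks
    CORRELATION = auto()     # relationship between two variables

def suggest_chart(goal: Goal, *, has_negatives: bool = False, parts: int = 0) -> str:
    """Suggest a display type following the rules of thumb in these notes."""
    if goal is Goal.PRECISE_VALUES:
        return "table"
    if goal is Goal.COMPARE:
        # Bar graphs are recommended for negative quantities;
        # column charts suit a small number of categories.
        return "bar graph" if has_negatives else "column chart"
    if goal is Goal.COMPOSITION:
        # Stacked columns work best with 3 to 4 parts.
        # Falling back to a table otherwise is an assumption of this sketch.
        return "stacked column chart" if 3 <= parts <= 4 else "table"
    if goal is Goal.PERCENTAGES:
        return "pie or donut chart"
    if goal is Goal.TREND:
        return "line chart"
    if goal is Goal.CORRELATION:
        return "scatter chart"
    raise ValueError(f"unsupported goal: {goal}")

print(suggest_chart(Goal.COMPARE, has_negatives=True))
print(suggest_chart(Goal.COMPOSITION, parts=3))
```

Keeping the rules in one function makes it easy to see which criteria drive each choice and to adjust them if a different set of guidelines is preferred.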
Data Processing
- Collection
- Preparation
- Input
- Processing
- Output and Interpretation
- Storage

DFD (Data Flow Diagram) Rules
- Data must be moved by a process
- Data changes when it moves through a process
- All processes should have unique names
- A process's inputs must differ from its outputs
- No process can have only inputs
- No process can have only outputs

Gathering Techniques
- One-on-one interviews
- Group interviews
- Facilitated sessions
- Joint Application Development (JAD)
- Questionnaires
- Prototyping
- Use Cases
- Following people around (shadowing)
- Requests for Proposals (RFPs)
- Brainstorming
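The DFD rules listed earlier can be checked mechanically. The Python sketch below is a hypothetical model: the `Node`, `Flow`, and `check_dfd` names and the three node kinds ("process", "entity" for external entities, "store" for data stores) are assumptions, not part of any standard tool. It reports violations of unique process names, data moved only by a process, processes with only inputs or only outputs, and processes whose outputs equal their inputs. Treating "data changes when it moves through a process" as "the set of outgoing data labels differs from the incoming set" is a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # "process", "entity" (external entity), or "store" (data store)

@dataclass(frozen=True)
class Flow:
    source: Node
    target: Node
    data: str  # label of the data moving along this flow

def check_dfd(nodes: list[Node], flows: list[Flow]) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []

    # All processes should have unique names.
    seen = set()
    for n in nodes:
        if n.kind == "process":
            if n.name in seen:
                problems.append(f"duplicate process name: {n.name}")
            seen.add(n.name)

    # Data must be moved by a process: at least one end of every flow is a process.
    for f in flows:
        if f.source.kind != "process" and f.target.kind != "process":
            problems.append(f"flow '{f.data}' from {f.source.name} to {f.target.name} bypasses a process")

    # Collect the data labels entering and leaving each process.
    inputs = defaultdict(set)
    outputs = defaultdict(set)
    for f in flows:
        if f.target.kind == "process":
            inputs[f.target].add(f.data)
        if f.source.kind == "process":
            outputs[f.source].add(f.data)

    for n in nodes:
        if n.kind != "process":
            continue
        if inputs[n] and not outputs[n]:
            problems.append(f"process {n.name} has only inputs")
        elif outputs[n] and not inputs[n]:
            problems.append(f"process {n.name} has only outputs")
        elif inputs[n] and inputs[n] == outputs[n]:
            problems.append(f"process {n.name} does not change its data (inputs equal outputs)")
    return problems

# Hypothetical example diagram with two deliberate mistakes.
customer = Node("Customer", "entity")
orders = Node("Orders", "store")
take_order = Node("Take order", "process")
audit = Node("Audit", "process")  # receives data but produces nothing

flows = [
    Flow(customer, take_order, "order request"),
    Flow(take_order, orders, "order record"),
    Flow(orders, audit, "order record"),
    Flow(customer, orders, "payment"),  # no process moves this data
]

for problem in check_dfd([customer, orders, take_order, audit], flows):
    print(problem)
```

For this example, the checker flags the "payment" flow that goes straight from an external entity to a data store, and the "Audit" process that has inputs but no outputs.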