Audit logs for security and compliance

Anton Chuvakin, Ph.D., GCIA, GCIH, GCFA

WRITTEN: 2004

DISCLAIMER: Security is a rapidly changing field of human endeavor. Threats we face literally change every day; moreover, many security professionals consider the rate of change to be accelerating. On top of that, to stay in touch with such an ever-changing reality, one has to evolve with the space as well. Thus, even though I hope that this document will be useful to my readers, please keep in mind that it was possibly written years ago. Also, keep in mind that some of the URLs might have gone 404; please Google around.

A beaten maxim proclaims that “knowledge is power”, but where do we get our knowledge about IT resources? The richest source of such information is logs and audit trails. Through logs and alerts (which we treat similarly to logs and audit trails), information systems often give signs that something is amiss, or even will be amiss soon. What are some examples of log files and audit trails? We can classify log files by the source that produced them, since the source usually determines the type of information contained in the files. For example, host log files, produced by UNIX, Linux and Windows systems, are different from network device logs, produced by Cisco, Nortel, and Lucent routers, switches, and other network gear. Similarly, security appliance logs, produced by firewalls, intrusion detection systems, and intrusion “prevention” systems, are very different from both host and network logs. In fact, security devices display a wide diversity in what they log and the format in which they do it. Ranging in function from simply recording suspicious IP addresses all the way to full network traffic capture, security devices produce an amazing wealth of information, both relevant and totally irrelevant to the situation at hand. Thus, logs present unique challenges. Some of the questions that we ask are:
• How do we find what is relevant for the situation at hand?
• How can we learn about intrusions, past, present and maybe even future, from the logs?
• Is it realistic to expect to surf through gigabytes of log files in search of evidence that might not even be there, since the attacker was careful not to leave any traces?
• How do we use logs to come up with high-level metrics indicating the health of our enterprise?
• Can compliance auditors use the logs to prove or disprove regulatory compliance in the organization?
Let us briefly demonstrate some common log examples. UNIX and Linux installations produce a flood of messages via syslog, the “system logger” daemon, in plain text. Such messages can indicate:
• There is a problem with a secondary DNS server.
• A user has logged in to the machine.
• A forbidden DNS access has occurred.
• A user has provided a password to the Secure Shell daemon for remote login.

Similarly, newer Windows versions also provide extensive system logging. Windows uses a proprietary binary format to record three types of log files: system, application, and security. For example, the system log contains various records related to the normal - and not so normal - operation of the computer.

In many cases, the log files do not simply give up clear answers; the answers need to be extracted - sometimes forcefully - from them. This is accomplished by performing “log analysis”. Log analysis is the science and art of extracting answers from computer-generated audit records. Often, even seemingly straightforward logs need analysis and correlation with other information sources. Correlation means the manual or automated process of establishing relationships between seemingly unrelated events happening on the network. Events that happen on different machines at different times could have some sort of relationship relevant to the situation. Such relationships need to be discovered and disclosed.

Why analyze the logs? The answer differs for different environments. For example, for a home or small office (SoHo) computer system, logs are only useful in the case of major system trouble (such as hardware or operating system failures) or security breaches, which are easier to prevent since you only have to watch a single system or a small number of systems. Often, your time is better spent reinstalling your Windows operating system and keeping up with patches and updates.
Poring over logs for signs of potential intrusions is not advisable for most users, with the possible exception of hard-core log analysis addicts. Only the minimum amount of logging should thus be enabled, and the analysis boils down to firing up the Windows event log viewer after something goes wrong.

Next, let us consider a small to medium business with no full-time security staff. In this sense, it is similar to a home system, with a few important differences. This environment often has people who astonish security professionals with comments such as "Why would somebody want to hack us? We have nothing that they need." Now more and more people understand that disk storage, processor cycles, and high-speed network connections have a lot of value for attackers. Log analysis for such an organization focuses on discovering, detecting and dealing with high-severity threats. While it is well known that many low-severity threats reflected in logs might be precursors of a more serious attack, a small company rarely has the resources to investigate them.
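Such a high-severity focus can begin with something as simple as keyword filtering of syslog messages. The sketch below illustrates the idea; the severity keywords and the sample log lines are assumptions made for the example, not taken from any particular system:

```python
# Keyword filter for high-severity syslog lines.
# The keyword list is an illustrative assumption, not a standard.
HIGH_SEVERITY = ("emerg", "alert", "crit", "error", "failed", "refused")

def is_high_severity(line: str) -> bool:
    """Return True if a log line contains any high-severity keyword."""
    lowered = line.lower()
    return any(word in lowered for word in HIGH_SEVERITY)

# Invented sample messages in classic syslog style.
sample_log = [
    "Jun 12 10:01:22 host sshd[411]: Accepted password for alice from 10.0.0.5",
    "Jun 12 10:02:37 host sshd[415]: Failed password for root from 203.0.113.9",
    "Jun 12 10:03:01 host named[99]: refused query from 203.0.113.9",
]

alerts = [line for line in sample_log if is_high_severity(line)]
# Only the "Failed password" and "refused query" lines survive the filter.
```

Crude keyword matching like this will miss cleverly worded messages and flag benign ones, which is exactly why a small organization usually stops at high-severity triage rather than attempting full analysis.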
A large corporation is subject to far more administrative requirements than an individual. Among these are responsibility to shareholders, fear of litigation and other liability. Due to the above, the level of security and accountability is higher. Most organizations connected to the Internet now have at least a firewall and some sort of dedicated network for public servers exposed to the Internet. Many have also deployed spam filters, intrusion detection systems (IDS), intrusion prevention systems (IPS) and Virtual Private Networks (VPNs), and are looking at more novel solutions such as anti-spyware. All these technologies raise concerns about what to do with the logs coming from them, as companies rarely hire new security staff just to handle the logs. In such an environment, log analysis is of crucial importance. The logs present one of the best ways of detecting the threats flowing from the hostile Internet as well as from the inside of the network.

Overall, do you have to do log analysis? The answer to this question ranges from "not likely" for a small business to an unquestionable "Yes!" for a larger organization. By now, we have hopefully convinced you that the information in logs can be tremendously important; we also stated that such information will often be extremely voluminous. However, such a log analysis and review program needs to be consistent. Imagine you work for one of those companies where information security is taken seriously, senior management support is taken for granted, the appropriate IT defenses are deployed and users are educated on the security policy. Firewalls are humming along, intrusion detection systems are installed and the incident response team is ready for action. This will probably go a long way towards creating a more secure enterprise computing environment. Let's look at it from the prevention-detection-response model. The above solutions provide the technical side of prevention, detection and response.
The complex interplay between prevention, detection and response is further complicated by the continuous decision-making process: 'What to respond to?', 'How to prevent an event?', and so on. Such decisions are based on the information provided by the security infrastructure components. Paradoxically, the more security devices one deploys, the more traffic the firewalls block and log, the more alerts the detection systems send, the more messages the servers spew, and the harder it is to make the right decisions about how to react. Logs from all of the above devices need to be consistently and diligently analyzed to arrive at the right security decisions.

What are the common options for optimizing the security decisions made by the company executives? The security information flow needs to be converted from logs and alerts into a decision. The attempts to create a fully automated solution for making such a decision, some even based on artificial intelligence, have not
yet reached a commercially viable stage. The problem is thus to create a system that reduces the information flow sufficiently and then provides some guidance to the system's human operators in order to make the right security decision. In addition to facilitating decision-making in the case of a security-related log or other event indication (defined as a single communication instance from a security device) or an incident (defined as a confirmed attempted intrusion or other attack), reducing the information flow is required for implementing security benchmarks. Assessing the effectiveness of deployed security controls is an extremely valuable part of an organization's security program. Such an assessment can be used to calculate a security Return On Investment (ROI) and to enable other methods for marrying security and business needs.

The commonly utilized scenarios can be loosely categorized into install-and-forget (unfortunately, all too common) with no log analysis in sight, manual log data reduction (that is, reliance on a particular person to extract and analyze the meaningful audit records) and in-house automation tools (such as scripts and utilities aimed at processing the information flow). Let us briefly look at the advantages and disadvantages of the above methods.

Is there a chance that the first approach - deploying the security infrastructure and leaving it unsupervised, with no log review - has a business justification anywhere outside of a very small environment such as the one described above? Indeed, some people do drive their cars without mandatory car insurance, but companies are unlikely to be moved by the same reasons that motivate the reckless drivers. Most readers have probably heard 'Having a firewall does not provide 100% security' many times. In fact, it is often stated that 0-day (i.e., previously unknown) exploits and new vulnerabilities are less of a threat to security than the company's own employees.
Technology solutions are rarely effective against social and human problems. Advanced firewalls can probably be made to mitigate the threat from new exploits, but not from the firewall administrators' mistakes and deliberate tampering from the inside of the protected perimeter. In addition, a total lack of feedback and awareness of security technology performance, of the kind a log collection and review program provides, will prevent a company from taking a proactive stance against new threats and from adjusting its defenses against the flood of attacks hitting its bastions.

The next possibility is where no consistent log review program is present but some employees are dedicated to the task. Does relying on human experts to understand your log information and to provide effective response guidelines based on the gathered evidence constitute a viable alternative to doing nothing? Two approaches to the problem are common. First, a security professional can study the logs only after a security incident. Careful examination of log evidence collected by various security devices will certainly shed light on the incident and will likely help to prevent recurrence and further loss. However,
in cases where extensive damage is done, it is already too late: preventing future incidents of the same kind will not return the stolen intellectual property or appease the disappointed business partners. After-the-fact expert response also has a good chance of being too slow in the age of fast, automated attack tools.

The second option is to review the accumulated audit trail data periodically, such as on a daily or weekly basis. A simple calculation is in order. A single border router will produce several hundred messages per second on a busy network, and so will the firewall. Adding host messages from hundreds of servers will increase the flow to possibly thousands per second. Now, if one is to scale this to a global company network infrastructure, the information flow will increase a hundredfold. No human expert or team will be able to review, let alone analyze, the incoming flood of signals.

But what if a security professional chooses to automate the task by writing a script or a program to alert him or her on the significant alerts and log records? Such a technical approach to a log review program may help with data collection (a centralized syslog server or a database) and alerting (email, pager, voice mail). However, a series of important questions arises. Collected log and audit data will greatly help with an incident investigation, but what about the timeliness of the response? Separating meaningful events from mere chaff is not a trivial task, especially in a globally distributed, multi-vendor environment. Moreover, even devices sold by a single vendor might have various event logging and prioritization schemes. Thus, designing the right data reduction and analysis scheme that optimizes the security decision process might require significant time and capital investment, and still not reach the set goals due to a lack of the specific analysis expertise.
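A home-grown alerting script of the kind just described might look like the sketch below. The patterns, sample lines and alert format are invented for illustration; a real script would tail the centralized syslog file and hand the result to smtplib or a pager gateway:

```python
import re

# Patterns an administrator might deem "significant"; purely illustrative.
ALERT_PATTERNS = [
    re.compile(r"Failed password for (root|admin)"),
    re.compile(r"POSSIBLE BREAK-IN ATTEMPT"),
]

def significant(line: str) -> bool:
    """Return True if any alert pattern matches the log line."""
    return any(p.search(line) for p in ALERT_PATTERNS)

def build_alert(line: str) -> str:
    """Format a matching log line into an e-mail or pager message body."""
    return "SECURITY ALERT: " + line.strip()

# Invented sample input; in production this would be a live log stream.
log_lines = [
    "sshd[512]: Failed password for root from 198.51.100.7 port 4022",
    "cron[733]: (root) CMD (run-parts /etc/cron.hourly)",
]
alerts = [build_alert(line) for line in log_lines if significant(line)]
```

The weakness of this approach is that every "significant" pattern must be anticipated in advance, and a single noisy pattern quickly floods the recipient.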
In addition, escalating alerts on raw event data (such as 'if you see a specific bad IDS signature, send me an email') will quickly turn into the "boy who cried wolf" story, with pagers screaming for attention and not getting it. In light of the above problems with prioritization, simply alerting on "high-priority" events is not a solution. Indeed, IDS systems can be tuned to produce fewer alerts, but to effectively tune the system one needs access to the full feedback provided by the security infrastructure, and not just to raw IDS logs. For example, outside and inside firewall logs are very useful for tuning the IDS deployed in the DMZ.

Overall, it appears that simply investing in more and more security devices without a consistent program to analyze and review their logs will not create more security. One needs to keep in close touch with the deployed devices, and the only way to do it is by using special-purpose automated tools to analyze all the information they produce and to draw meaningful conclusions aimed at optimizing the effectiveness of the IT defenses. While having internal staff write code to help accumulate and map the data might be acceptable in immediate-term situations in small environments, the maintenance, scalability and continued justification for such systems likely has a very low ROI. In fact, this has caused the birth of Security Information Management (SIM) products that have, as their only
focus, the collection and correlation of this data as well as the creation of executive-level metrics from logs.

Logs are also immensely valuable for compliance programs. Many recent US regulations, such as HIPAA, GLBA, Sarbanes-Oxley and others, have items related to audit logging and the handling of those logs. For example, a detailed analysis of the security requirements and specifications outlined in the HIPAA Security Rule sections §164.306, §164.308, and §164.312 shows some items relevant to auditing and logging. Specifically, section §164.312(b), “Audit controls”, covers audit, logging and monitoring controls for systems that contain patient information. Similarly, the Gramm-Leach-Bliley Act (GLBA) section 501 and others have items that indirectly address the collection and review of audit logs. Centralized logging of security events across a variety of devices, together with analysis, reporting and risk analysis, provides information to demonstrate the presence and effectiveness of the security controls implemented by the organization, and helps identify, reduce the impact of, and remedy a variety of security breaches. The importance of logs for regulatory compliance will only grow as standards (such as ISO 17799) become the foundations of new regulations.

Common mistakes of log analysis

We covered the need to collect logs and review them via a carefully planned program. However, when planning and implementing a log collection and analysis infrastructure, organizations often discover that they aren't realizing the full promise of such a system. This happens due to some common log-analysis mistakes. We cover such typical mistakes organizations make when analyzing audit logs and other security-related records produced by security infrastructure components.

No. 1: Not looking at the logs

Let's start with an obvious but critical one.
While collecting and storing logs is important, it's only a means to an end: knowing what's going on in your environment and responding to it. Thus, once technology is in place and logs are collected, there needs to be a process of ongoing monitoring and review that hooks into actions and possible escalation.

It's worthwhile to note that some organizations take a half-step in the right direction: they review logs only after a major incident. This gives them the reactive benefit of log analysis but fails to realize the proactive one: knowing when bad stuff is about to happen. Looking at logs proactively helps organizations better realize the value of their security infrastructures. For example, many complain that their network intrusion-detection systems (NIDS) don't give them their money's worth. A big reason for that is that such systems often produce false alarms, which leads to decreased
reliability of their output and an inability to act on it. Comprehensive correlation of NIDS logs with other records, such as firewall logs and server audit trails, as well as with vulnerability and network service information about the target, allows companies to "make NIDS perform" and gain new detection capabilities. Some organizations also have to look at log files and audit trails due to regulatory pressure.

No. 2: Storing logs for too short a time

This mistake makes the security team think they have all the logs needed for monitoring and investigation (while saving money on storage hardware), leading to the horrible realization after an incident that all the logs are gone due to the retention policy. The incident is often discovered a long time after the crime or abuse has been committed. If cost is critical, the solution is to split the retention into two parts: short-term online storage and long-term off-line storage. For example, archiving old logs on tape allows for cost-effective off-line storage while still enabling future analysis.

No. 3: Not normalizing logs

What do we mean by "normalization"? It means converting the logs into a universal format that contains all the details of the original message but also allows us to compare and correlate different log data sources, such as Unix and Windows logs. Across different applications and security solutions, log format confusion reigns: some prefer Simple Network Management Protocol traps, others favor classic Unix syslog. Proprietary methods are also common. The lack of a standard logging format means that companies need different expertise to analyze the logs. Not all skilled Unix administrators who understand syslog format will be able to make sense of an obscure Windows event log record, and vice versa. The situation is even worse with security systems, because people commonly have experience with a limited number of systems and thus will be lost in the log pile spewed out by a different device.
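To make the idea of normalization concrete, here is a minimal sketch. The syslog regular expression is simplified, and the Windows record fields are a hypothetical stand-in for the binary event-log format, not any vendor's actual schema:

```python
import re

# Target "universal" event schema: time, host, source, message.
SYSLOG_RE = re.compile(r"(\w{3} +\d+ [\d:]+) (\S+) ([^:\[]+)(?:\[\d+\])?: (.*)")

def normalize_syslog(line):
    """Parse a classic BSD-style syslog line (format simplified here)."""
    m = SYSLOG_RE.match(line)
    if not m:
        return None
    ts, host, program, msg = m.groups()
    return {"time": ts, "host": host, "source": program.strip(), "message": msg}

def normalize_winevent(record):
    """Map a simplified Windows event-log record into the same schema."""
    return {
        "time": record["TimeGenerated"],
        "host": record["ComputerName"],
        "source": record["SourceName"],
        "message": record["Message"],
    }

unix_event = normalize_syslog(
    "Jun 12 10:02:37 web01 sshd[415]: Failed password for root from 203.0.113.9")
win_event = normalize_winevent({
    "TimeGenerated": "Jun 12 10:02:40",
    "ComputerName": "dc01",
    "SourceName": "Security",
    "Message": "Logon failure for user ADMIN",
})
# Both events now share one schema and can be compared and correlated.
```

Once every source is reduced to one schema, cross-device comparison and correlation become a matter of querying one event stream instead of many.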
As a result, a common format that can encompass all the possible messages from security-related devices is essential for analysis, correlation and, ultimately, decision-making.

No. 4: Failing to prioritize log records

Assuming that logs are collected, stored for a sufficiently long time and normalized, what else lurks in the muddy sea of log analysis? The logs are there, but where do we start? Should we go for a high-level summary, look at the most
recent events, or something else? The fourth error is not prioritizing log records. Some system analysts may get overwhelmed and give up after trying to chew through a king-size chunk of log data without getting any real sense of priority. Thus, effective prioritization starts with defining a strategy. Answering questions such as "What do we care about most?", "Has this attack succeeded?" and "Has this ever happened before?" helps to formulate it. Consider these questions to help you get started on a prioritization strategy that will ease the burden of the gigabytes of log data collected every day.

No. 5: Looking for only the bad stuff

Even the most advanced and security-conscious organizations can sometimes get tripped up by this pitfall. It's sneaky and insidious and can severely reduce the value of a log-analysis project. It occurs when an organization looks only at what it knows is bad. Indeed, the vast majority of open-source tools and some commercial ones are set up to filter and look for bad log lines, attack signatures and critical events, among other things. For example, Swatch is a classic free log-analysis tool that's powerful, but only at one thing: looking for defined bad things in log files. However, to fully realize the value of log data, it needs to be taken to the next level: log mining. In this step, you can discover things of interest in log files without having any preconceived notion of what you need to find. Some examples include compromised or infected systems, novel attacks, insider abuse and intellectual property theft. It sounds obvious: how can we be sure we know of all the possible malicious behavior in advance? One option is to list all the known good things and then look for the rest. It sounds like a solution, but such a task is not only onerous but also thankless. It's usually even harder to list all the good things than it is to list all the bad things that might happen on a system or network.
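As a toy illustration of looking beyond "known bad", one can flag message types that never occurred during a baseline period. The log lines are invented, and the digit-stripping "templating" is a deliberately crude stand-in for real log mining techniques:

```python
from collections import Counter

def message_type(line: str) -> str:
    """Reduce a line to a rough 'type' by stripping digits
    (a crude stand-in for real log templating)."""
    return "".join(ch for ch in line if not ch.isdigit())

# Baseline: message types observed during normal operation (invented).
baseline = Counter(message_type(line) for line in [
    "sshd: Accepted password from 10.0.0.5",
    "sshd: Accepted password from 10.0.0.6",
    "cron: job finished in 12s",
])

def is_novel(line: str) -> bool:
    """A line is 'novel' if its type never appeared in the baseline."""
    return message_type(line) not in baseline

new_lines = [
    "sshd: Accepted password from 10.0.0.7",               # known type
    "ftpd: anonymous upload of rootkit.tar to /incoming",  # never seen
]
novel = [line for line in new_lines if is_novel(line)]
# Only the never-before-seen ftpd message is flagged.
```

Note that no signature for the ftpd abuse was ever written; the line stands out simply because nothing like it appeared before, which is the essence of anomaly-oriented log mining.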
So many different events occur that weeding out attack traces just by listing all the possibilities is ineffective. A more intelligent approach is needed. Some of the data mining (also called "knowledge discovery in databases") and visualization methods actually work on log data with great success. They allow organizations to look for real anomalies in log data, beyond "known bad" and "not known good." Avoiding these mistakes will take your log-analysis program to the next level and enhance the value of your company's security and logging infrastructures.

To conclude, logs might be the untapped treasures of security, allowing organizations to gain security benefits from an existing security infrastructure.
To realize them, however, the log collection and review program needs to be carefully planned and common mistakes need to be avoided.

ABOUT THE AUTHOR: This is an updated author bio, added to the paper at the time of reposting in 2009.

Dr. Anton Chuvakin (http://www.chuvakin.org) is a recognized security expert in the field of log management and PCI DSS compliance. He is an author of the books "Security Warrior" and "PCI Compliance" and a contributor to "Know Your Enemy II", "Information Security Management Handbook" and others. Anton has published dozens of papers on log management, correlation, data analysis, PCI DSS and security management (see the list at www.info-secure.org). His blog, http://www.securitywarrior.org, is one of the most popular in the industry. In addition, Anton teaches classes and presents at many security conferences across the world; he recently addressed audiences in the United States, the UK, Singapore, Spain, Russia and other countries. He works on emerging security standards and serves on the advisory boards of several security start-ups. Currently, Anton is developing his security consulting practice, focusing on logging and PCI DSS compliance for security vendors and Fortune 500 organizations. Dr. Anton Chuvakin was formerly a Director of PCI Compliance Solutions at Qualys. Previously, Anton worked at LogLogic as a Chief Logging Evangelist, tasked with educating the world about the importance of logging for security, compliance and operations. Before LogLogic, Anton was employed by a security vendor in a strategic product management role. Anton earned his Ph.D. degree from Stony Brook University.