Xuw/LogMiningOverview

From RAD Lab

Revision as of 00:40, 6 November 2009

Introduction

When a datacenter-scale service consisting of hundreds of software components running on thousands of computers misbehaves, developer-operators need every tool at their disposal to troubleshoot and diagnose operational problems. Ironically, one source of information is built into almost every piece of software and records in detail what the original developers considered noteworthy or unusual events, yet it is typically ignored: the humble console log.

Since the dawn of programming, developers have used everything from printf to complex logging and monitoring libraries to record program variable values, trace execution, report runtime statistics, and even print out full-sentence messages designed to be read by a human—usually by the developer herself.

We show that we can automatically discover abnormal and potentially interesting messages in vast amounts of free-text logs. Unlike existing solutions, we analyze these messages, especially program trace messages, in a fully automatic way.

We have implemented a set of tools using a combination of program analysis, information retrieval and machine learning techniques to perform the analysis:

  • 1) Parse textual logs and extract semi-structured information;
  • 2) Automatically group related log messages to reconstruct execution traces;
  • 3) Use machine learning methods to detect abnormal traces.
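Steps 2 and 3 above can be illustrated with a small sketch. It assumes parsing has already produced (identifier, message type) pairs. The names `group_by_identifier`, `flag_rare_traces`, and `min_support`, as well as the sample events, are hypothetical. The rarity check is only a simple frequency-based stand-in for the machine learning methods mentioned above, not the detector used in this project.

```python
from collections import Counter, defaultdict


def group_by_identifier(events):
    """Group parsed messages into traces keyed by a shared identifier.

    events: iterable of (identifier, message_type) pairs (assumed input shape).
    Returns {identifier: Counter of message types}.
    """
    traces = defaultdict(Counter)
    for ident, msg_type in events:
        traces[ident][msg_type] += 1
    return dict(traces)


def flag_rare_traces(traces, min_support=0.2):
    """Flag traces whose message-count signature is shared by fewer than
    min_support (a fraction) of all traces. A stand-in for a real detector."""
    signatures = {ident: tuple(sorted(c.items())) for ident, c in traces.items()}
    freq = Counter(signatures.values())
    total = len(signatures)
    return sorted(
        ident for ident, sig in signatures.items() if freq[sig] / total < min_support
    )


# Hypothetical sample: five traces complete normally, one never finishes.
events = []
for i in range(1, 6):
    events += [(f"blk_{i}", "recv_block"), (f"blk_{i}", "write_done")]
events.append(("blk_6", "recv_block"))

suspicious = flag_rare_traces(group_by_identifier(events))
```

Here the trace for `blk_6` has a signature no other trace shares, so it is the only one flagged. A real detector would work on richer feature vectors than exact signature matches.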

The following figure illustrates our general approach.

One key observation is that the typical console message is much more structured than it appears: the definition of its "schema" is implicit in the log printing statements, which can be recovered from the program source code. Our log parsing approach builds on this observation, and it yields detailed and accurate message structure recovery, feature construction, and problem detection. Parsing makes it easy to create a variety of generic or application-specific features, so that powerful machine learning methods can be applied for high-quality pattern mining and accurate problem detection.
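To make the idea of an implicit schema concrete, the sketch below turns printf-style format strings into regular expressions and uses them to parse log lines. It is a minimal illustration under stated assumptions: the two templates are invented for this example, only `%s` and `%d` placeholders are handled, and the templates are written by hand rather than recovered from source code as our tools do.

```python
import re


def template_to_regex(fmt):
    """Turn a printf-style format string into a regex with one capture
    group per placeholder (only %s and %d are supported in this sketch)."""
    pattern = ""
    for part in re.split(r"(%[sd])", fmt):
        if part == "%d":
            pattern += r"(-?\d+)"
        elif part == "%s":
            pattern += r"(\S+)"
        else:
            pattern += re.escape(part)  # literal text from the print statement
    return re.compile("^" + pattern + "$")


# Hypothetical templates, standing in for format strings found in source code.
TEMPLATES = {
    "recv_block": "Receiving block %s from %s",
    "write_done": "Wrote block %s in %d ms",
}
COMPILED = {name: template_to_regex(fmt) for name, fmt in TEMPLATES.items()}


def parse_line(line):
    """Return (message_type, variable_values), or (None, ()) if nothing matches."""
    for name, regex in COMPILED.items():
        match = regex.match(line)
        if match:
            return name, match.groups()
    return None, ()


parsed = parse_line("Wrote block blk_42 in 17 ms")
```

Each parsed line yields a message type (which print statement produced it) and the variable values it printed, which is the semi-structured form that later feature construction relies on.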

Our approach can run either online, to provide near-real-time log analysis, or in batch mode, to analyze large log archives in a short time.

We studied logs from several real-world systems: Sun's Project Darkstar online game server, the Nutch web crawler, and Hadoop. Our preliminary results show that we can not only detect a large portion of runtime anomalies, but also provide easy-to-understand explanations to system operators.

Publications

  • Online system problem detection by mining patterns of console logs, Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan. To appear in the IEEE International Conference on Data Mining (ICDM '09), Miami, FL, December 2009 (pdf)
  • Large-scale system problem detection by mining console logs, Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP '09), Big Sky, MT, October 2009 (pdf)

A shorter workshop version of the paper is also available:

  • Mining console logs for large-scale system problem detection, Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan. In Proceedings of the 3rd Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML '08), San Diego, CA, December 2008 (pdf)

Talks

  • SOSP Talk (Oct 2009) slides: http://www.sigops.org/sosp/sosp09/slides/xu-slides-sosp09.pdf
  • Online console log mining (May 2009 RAD Lab Retreat) slides
  • Offline console log mining (May 2009 at Microsoft) slides

People

Call for undergrads

If you are a current Berkeley undergraduate interested in helping with this project while learning cool technologies, please refer to this page for potential Project Ideas.

Code

Coming soon.