
Xuw/LogMiningIdeas

From RAD Lab



Console log mining project ideas for undergrads

Here are suggested projects for Berkeley undergrads. Feel free to drop by, and we can work out the details of the project that best fits your background, interests, and schedule. For more information on the projects, please see the project overview.

Collecting and analyzing logs from data center applications

Goal

  • deploy two real applications (the Tomcat application server and the Cassandra distributed storage system) on EC2
  • deploy a simple application
  • write a workload generator to provide load
  • collect and parse the console logs using existing analysis tools

Preferred Skills

  • experience using Linux/Unix and writing scripts
  • Java

You will learn

  • deploying and maintaining real-world applications in a cloud computing environment
  • maintaining and debugging complex distributed systems

hours/week

8-10


Collecting and analyzing Web 2.0 application logs

Goal

  • instrument the client-side and server-side code of an existing Web 2.0 application (we will suggest one, or you can use your own) by adding logging statements
  • collect the logs on a server running in the cloud
  • analyze the collected logs for either problem detection or user-behavior studies

Preferred Skills

  • knowledge of (or willingness to learn) JavaScript or ActionScript (Flash), and simple server-side scripting (e.g. JSP, PHP, Ruby, or Python)
  • enjoy hacking other people's code

You will learn

  • operating and maintaining a browser-based Web 2.0 application
  • essential tools for cloud computing (e.g. EC2 or Google AppEngine)

hours/week

10-15

Creating open source log mining toolkit

Goal

  • refactor and clean up the current research prototype code for log parsing and release it as an open source project
  • integrate the multiple parsers, the log analyzer, and the machine learning components
  • add unit tests and documentation

Skills preferred

  • experience hacking other people's code
  • strong Java or other object-oriented programming skills
  • familiarity with development tools such as SVN

You will learn

  • managing a medium-sized open source project
  • essential software engineering concepts and tools
  • exposure to popular open source tools for information retrieval (search)
  • scaling data analysis with cloud computing

hours/week

10-20 (need multiple people on this)

Refine automatic detection of log printing statements

Goal

  • previously, we guessed which statements in the source code are log printing statements
  • given a sample of printed logs, can we determine more reliably which functions are used for log printing?
  • experiment with the scheme on several Linux-based systems
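As a rough illustration of the second goal, one hypothetical way to use a sample of printed logs is to turn each candidate format string into a regular expression and count how many sample log lines it fully matches. Candidates that match nothing are less likely to be log printing statements. This is a minimal sketch, not the project's existing method. It assumes printf-style placeholders (%s, %d, %f, %x) and uses made-up example strings. `format_to_regex` and `score_candidates` are illustrative names, not part of the current prototype.

```python
import re

# Hypothetical sketch: assumes candidate log statements use printf-style
# placeholders (%s, %d, %f, %x); other placeholder syntaxes are not handled.
PLACEHOLDER = re.compile(r"%[sdfx]")


def format_to_regex(fmt):
    """Turn a candidate format string into a regex whose groups capture the variable parts."""
    constant_parts = PLACEHOLDER.split(fmt)
    # Escape the constant text and put a lazy capture group where each placeholder was.
    return re.compile("(.+?)".join(re.escape(part) for part in constant_parts))


def score_candidates(candidates, log_lines):
    """Count how many sample log lines each candidate format string fully matches."""
    patterns = {fmt: format_to_regex(fmt) for fmt in candidates}
    counts = {fmt: 0 for fmt in candidates}
    for line in log_lines:
        for fmt, pattern in patterns.items():
            if pattern.fullmatch(line):
                counts[fmt] += 1
    return counts


# Made-up example data, for illustration only.
candidates = ["Receiving block %s src: %s", "Usage: %s [options]"]
sample_logs = [
    "Receiving block blk_1 src: /10.0.0.1:50010",
    "Receiving block blk_2 src: /10.0.0.2:50010",
]
counts = score_candidates(candidates, sample_logs)
print(counts)
```

In this sketch a very generic candidate such as "%s" would match every line, so match counts alone would not be a sufficient signal.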

Skills preferred

  • Linux systems and scripting
  • Java
  • hacking existing code

You will learn

  • text processing/search and statistical algorithms

hours/week

8-10

Console log analysis integration with Eclipse IDE

Goal

  • provide an Eclipse-based GUI for the current online log mining approach
  • automatically display variable values from log messages while users are reading the code
  • when an error is detected, relate the error to the relevant source code

Skills preferred

  • experience using Eclipse
  • knowledge of (or willingness to learn) Eclipse plugin development

You will learn

  • modern plugin-based software architecture
  • GUI design

hours/week

10-15

Source code analyzer for Perl/Ruby

Goal

  • port the parser built into Eclipse to parse Perl and Ruby programs
  • find all log printing statements in a program

Skills preferred

  • compilers
  • a scripting language (Perl or Ruby) and Java

You will learn

  • a deep understanding of how scripting language interpreters/compilers work

hours/week

8-10


Comparing the source-analysis-based log analysis with other log parsing schemes

Goal

  • learn how to use and tune the many open source tools for source code analysis
  • port the input/output data structures of these tools so that they plug into our analysis framework
  • compare the problem detection results

Skills preferred

  • Linux and scripting

You will learn

  • many important methods and concepts in data mining and information retrieval

hours/week

8-10

Comparing the console log approach with the X-Trace-based approach

Goal

  • take an X-Trace-instrumented Hadoop
  • port the X-Trace output to our analysis scheme
  • compare the results with ours

Skills preferred

  • Linux and scripting
  • Java

You will learn

  • operating large-scale applications in a cloud computing environment
  • large-scale system monitoring and debugging

hours/week

8-10

Console log visualization

If you are taking courses related to visualization or graphic design and are interested, please let me know.
