Xuw/LogMiningIdeas
From RAD Lab
Console log mining project ideas for undergrads
Here are suggested projects for Berkeley undergrads. Feel free to drop by, and we can work out the details of the project that best fits your background, interests and schedule. For more information about the projects, please see the overview of the project.
Collecting and analyzing logs from data center applications
Goal
- deploy two real applications (the Tomcat application server and the Cassandra distributed storage system) on EC2
- deploy a simple application
- write a workload generator to provide load
- collect and parse the console logs using existing analysis tools
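Before any analysis, each console log line has to be split into structured fields. The following is a minimal sketch of that first parsing step, assuming a hypothetical log4j-style layout (`<date> <time>,<millis> <LEVEL> [<thread>] <logger> - <message>`); the real Tomcat and Cassandra layouts depend on how logging is configured, so the regex `LINE_RE` and the helpers `parse_line`/`parse_lines` are illustrative names, not part of any existing analysis tool.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# ASSUMPTION: hypothetical log4j-style layout; adjust to the layout your
# Tomcat/Cassandra deployment is actually configured with:
#   <yyyy-MM-dd> <HH:mm:ss,SSS> <LEVEL> [<thread>] <logger> - <message>
LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]*)\]\s+"
    r"(?P<logger>\S+) - (?P<message>.*)"
)


class LogRecord(NamedTuple):
    timestamp: str
    level: str
    thread: str
    logger: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one console log line; return None if it does not fit the layout."""
    m = LINE_RE.fullmatch(line.rstrip("\n"))
    if m is None:
        return None
    return LogRecord(
        timestamp=m.group("date") + " " + m.group("time"),
        level=m.group("level"),
        thread=m.group("thread"),
        logger=m.group("logger"),
        message=m.group("message"),
    )


def parse_lines(lines: List[str]) -> Tuple[List[LogRecord], List[str]]:
    """Split lines into parsed records and unparsed lines (e.g. stack traces)."""
    records: List[LogRecord] = []
    unparsed: List[str] = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            unparsed.append(line)
        else:
            records.append(record)
    return records, unparsed
```

Lines that end up in `unparsed` (stack-trace continuations, multi-line messages) are a quick signal that the assumed layout needs adjusting before the logs are handed to the analysis tools.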
Preferred Skills
- experience using Linux/Unix and writing scripts
- Java
You will learn
- deploying and maintaining real-world applications in a cloud computing environment
- maintaining and debugging complex distributed systems
hours/week
8-10
Collecting and analyzing Web 2.0 application logs
Goal
- instrument the client-side and server-side code of an existing Web 2.0 application (we will suggest one, or you can bring your own) by adding logging statements
- collect the logs on a server running in the cloud
- analyze the collected logs for either problem detection or user-behavior studies
Preferred Skills
- know or are willing to learn JavaScript or ActionScript (Flash), and simple server-side scripting (e.g. JSP, PHP, Ruby or Python)
- like hacking other people's code
You will learn
- operating and maintaining a browser-based Web 2.0 application
- essential tools for cloud computing (e.g. EC2 or Google AppEngine)
hours/week
10-15
Creating open source log mining toolkit
Goal
- refactor and clean up the current research prototype code for log parsing and release it as an open source project
- integrate the multiple parsers, the log analyzer and the machine learning components
- add unit tests and documentation
Skills preferred
- experience hacking other people's code
- strong Java or other object-oriented programming skills
- familiarity with development tools such as SVN
You will learn
- managing a medium-sized open source project
- essential software engineering concepts and tools
- exposure to popular open source tools for information retrieval (search)
- scaling data analysis with cloud computing
hours/week
10-20 (we need multiple people on this project)
Refine automatic detection of log printing statements
Goal
- previously, we guessed which strings are log printing statements
- given a sample of printed logs, can we determine more reliably which functions are used for log printing?
- experiment with the scheme on several Linux-based systems
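One simple way to use a log sample, sketched here purely as an illustration and not as the project's actual scheme: turn the format strings passed to each candidate function into regexes, then score each function by how many of its format strings actually show up in the printed logs. The candidate function names, format strings and helpers `template_to_regex`/`score_functions` below are hypothetical, and only `%d` and `%s` placeholders are handled.

```python
import re
from typing import Dict, List


def template_to_regex(fmt: str) -> "re.Pattern[str]":
    """Turn a printf-style format string into a regex.

    Only %d (integer) and %s (any text) placeholders are handled; everything
    else is matched literally.
    """
    pieces = []
    for part in re.split(r"(%[sd])", fmt):
        if part == "%d":
            pieces.append(r"-?\d+")
        elif part == "%s":
            pieces.append(r".*?")
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def score_functions(candidates: Dict[str, List[str]],
                    sample: List[str]) -> Dict[str, float]:
    """Score each candidate function by the fraction of its format strings
    that fully match at least one line in the sampled log output."""
    scores: Dict[str, float] = {}
    for func, formats in candidates.items():
        if not formats:
            scores[func] = 0.0
            continue
        patterns = [template_to_regex(f) for f in formats]
        hits = sum(1 for p in patterns
                   if any(p.fullmatch(line) for line in sample))
        scores[func] = hits / len(formats)
    return scores
```

A high score is fairly strong evidence that a function prints logs, while a low score is weaker evidence: rare messages may simply be missing from a small sample, so the threshold should be tuned on the systems being tested.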
Skills preferred
- Linux systems, scripting
- Java
- hacking existing code
You will learn
- text processing/search and statistical algorithms
hours/week
8-10
Console log analysis integration with Eclipse IDE
Goal
- provide an Eclipse-based GUI for our current online log mining approach
- automatically display variable values in log messages while users are reading the code
- when an error is detected, relate the error to the source code
Skills preferred
- have used Eclipse before
- know or are willing to learn about developing Eclipse plugins
You will learn
- modern plugin-based software architecture
- GUI design
hours/week
10-15
Source code analyzer for Perl/Ruby
Goal
- port the Eclipse built-in parser to parse Perl and Ruby programs
- find all log printing statements in a program
Skills preferred
- compilers
- a scripting language (Perl or Ruby) and Java
You will learn
- a deep understanding of how scripting language interpreters/compilers work
hours/week
8-10
Comparing the source-analysis-based log analysis with other log parsing schemes
Goal
- there are many open source tools for source code analysis; learn how to use and tune them
- port the input/output data structures of these tools so they plug into our analysis framework
- compare the problem detection results
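To compare problem detection results, each scheme's output has to be scored against the same labeled ground truth. Here is a minimal sketch, assuming (as an illustration only) that every scheme reports a set of flagged identifiers such as HDFS block IDs and that precision and recall are the metrics of interest; the function names and example identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple


def precision_recall(flagged: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Precision and recall of one detector's flagged items against labeled problems."""
    true_positives = len(flagged & truth)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def compare(results: Dict[str, Set[str]],
            truth: Set[str]) -> Dict[str, Tuple[float, float]]:
    """Score every scheme (name -> flagged items) against the same ground truth."""
    return {name: precision_recall(flagged, truth)
            for name, flagged in results.items()}
```

Scoring all schemes against one shared label set keeps the comparison fair; differences in how each tool groups messages (per block, per request, per time window) need to be reconciled before this step.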
Skills preferred
- Linux and scripting
You will learn
- many important methods and concepts in data mining and information retrieval
hours/week
8-10
Comparing console log approaches to the X-trace-based approach
Goal
- take the X-trace-instrumented Hadoop
- port the X-trace output to our analysis scheme
- compare the results with those of our approach
Skills preferred
- Linux and scripting
- Java
You will learn
- operating large-scale applications in a cloud computing environment
- large-scale system monitoring and debugging
hours/week
8-10
Console log visualization
If you are taking courses related to visualization or graphic design and are interested, please let me know.
