The Apache Software Foundation Blog
The ASF asks: Have you met Apache Mahout?
Quick peek: With so much data now available in digital form to so many businesses, machine learning helps you make sense of your data and provide better service to your customers:
- Given the interaction logs of your web shop, Mahout helps you come up with good recommendations for products that customers might be interested in buying.
- When faced with an ever-increasing stream of news articles, Mahout helps you reduce that information load to a manageable number of groups of topically related articles.
Apache Mahout provides stable, industry-ready implementations of machine learning algorithms that help you make more of your product. The project combines support for efficient standalone deployments with the option of scaling to a distributed Apache Hadoop cluster, making it easy to scale with your business needs.
Supported use cases include:
- Clustering, that is, grouping items based only on their similarity;
- Classification, that is, assigning items to pre-defined categories;
- Recommendation, that is, identifying items a user might like based on their behaviour;
- Frequent Itemset Mining, that is, identifying items that usually appear together, e.g. in a customer purchase.
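To make the last use case concrete, here is a minimal sketch of frequent itemset mining restricted to pairs of items. It is plain Python and does not use Mahout's API or its algorithms; the function name `frequent_pairs`, the `min_support` parameter, and the sample baskets are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support baskets.

    Hypothetical helper for illustration only, not part of Mahout.
    """
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same (a, b) order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Made-up purchase data: each inner list is one customer purchase.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "tea"],
    ["bread", "butter", "tea"],
]

pairs = frequent_pairs(baskets, min_support=2)
print(pairs)  # {('bread', 'butter'): 3}
```

Counting every pair like this is only practical for small data sets; it is meant to show what "items that usually appear together" means, not how to do it at scale.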
Apache Mahout offers:
- a permissive open source license that supports almost any business use case you can think of;
- a very active community that responds to user requests and helps analyse your specific data problems;
- production-ready implementations of algorithms that cover most of the sophisticated analysis jobs you would want to run on your data, while remaining open and easy to adjust to your specific needs.
What's new:
- Model refactoring and CLI changes to improve integration and consistency
- New ClusterEvaluator and CDbwClusterEvaluator offer new ways to evaluate clustering effectiveness
- New VectorModelClassifier allows any set of clusters to be used for classification
- RecommenderJob has evolved into a fully distributed item-based recommender
- More algorithms supported: Spectral Clustering and MinHash Clustering (still experimental); HMM-based sequence classification from GSoC (currently available as a sequential version only and still experimental); a new type of Naive Bayes (NB) classifier, plus feature reduction options for the existing one; a new sequential logistic regression training framework; and a new SGD classifier
- New vector encoding framework for high speed vectorization without a pre-built dictionary
- Promoted several pieces of the old Colt framework to tested status (QR decomposition, in particular)
- Distributed Lanczos SVD implementation
- Many, many small fixes, improvements, refactorings and cleanup
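Vectorization without a pre-built dictionary, as mentioned in the list above, is commonly achieved with feature hashing: each token is hashed straight to a vector index, so no token-to-index table has to be built first. The following is a plain-Python sketch of that general idea, not Mahout's encoding framework; the function name `hashed_vector`, the choice of MD5, and the default vector size are assumptions made for illustration.

```python
import hashlib


def hashed_vector(tokens, num_features=16):
    """Encode tokens as a fixed-size count vector using feature hashing.

    Hypothetical helper for illustration only, not part of Mahout.
    """
    vec = [0.0] * num_features
    for token in tokens:
        # A stable hash (unlike Python's built-in hash() for strings)
        # gives the same index for the same token on every run.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % num_features
        vec[index] += 1.0
    return vec


doc = "mahout scales machine learning with mahout".split()
vector = hashed_vector(doc)
print(len(vector), sum(vector))  # 16 6.0
```

The trade-off is that different tokens can collide on the same index. A larger `num_features` makes collisions less likely at the cost of bigger vectors.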
Posted at 01:59PM Nov 03, 2010 by Sally in General