The MetaSieve Blog

April 15, 2010

Using Groovy for Measuring Statistical Dependence – How to make predictions about the relatedness of statistical events

Filed under: Uncategorized — Björn Wilmsmann @ 10:04 pm

The April issue of GroovyMag, an online-only magazine for everything Groovy and Grails, was published today.

I’ve contributed an article about measuring statistical dependence with Groovy. Here’s a teaser:

Statistical dependence is all about finding out which events in a statistical sample are likely to co-occur. That is, if one event occurs, it can be predicted with a certain probability that another event will occur as well. Using simple measures of statistical dependence, I’d like to show how Groovy can be used to make such predictions.
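
As a minimal sketch of this idea (not the code from the GroovyMag article), the probability that one event occurs given that another has occurred can be estimated from a sample as the share of observations containing the first event that also contain the second. The sample data and the name `conditionalProbability` are made up for illustration.

```groovy
// Hypothetical sample: each inner list holds the events seen in one observation.
def observations = [
    ['bread', 'butter'],
    ['bread', 'butter', 'jam'],
    ['bread'],
    ['jam'],
]

// Estimate P(b | a): of all observations containing a, the fraction that also contain b.
def conditionalProbability = { List<List<String>> sample, String a, String b ->
    def withA = sample.findAll { a in it }
    if (withA.isEmpty()) {
        return 0.0d  // a never occurred; returning 0 here is an arbitrary choice
    }
    withA.count { b in it } / (double) withA.size()
}

println conditionalProbability(observations, 'bread', 'butter')
```

An estimate like this only says how often the two events appeared together in the sample. Whether that reflects a real dependence is a question for the measures discussed in the full article.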

Processing and reporting statistical data is commonplace in software development businesses today. It’s used for all sorts of things, including business performance indicators, website or user tracking statistics, and searching and indexing textual content on a website or in any other content repository.

Each of these applications, and many more for that matter, requires to some extent that statistical data be collected and that possible relations between individual events be identified.
Common examples of statistical events are purchases made by a customer, actions taken by a website user or word occurrences in textual content.

Potentially useful information can be derived from these kinds of events:

  • products which are likely to be purchased together and thus can be provisioned and stocked accordingly
  • the click stream users will probably take on a website
  • related words that can be suggested to the user in an auto-complete feature
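
To make the first of these examples concrete, here is a rough sketch of counting how often two products show up in the same order. The order data and the `pairCounts` map are assumptions for illustration, and raw counts are only a starting point rather than a proper measure of dependence.

```groovy
// Hypothetical purchase data: one list of product names per order.
def orders = [
    ['tent', 'sleeping bag', 'stove'],
    ['tent', 'sleeping bag'],
    ['stove', 'gas'],
    ['tent', 'stove'],
]

// Count how many orders contain each unordered pair of products.
def pairCounts = [:].withDefault { 0 }
orders.each { order ->
    // De-duplicate and sort so every pair gets exactly one canonical key.
    def items = order.toSet().sort()
    items.eachWithIndex { first, i ->
        items.drop(i + 1).each { second ->
            pairCounts[[first, second]] += 1
        }
    }
}

// Print the pairs, most frequently co-purchased first.
pairCounts.sort { -it.value }.each { pair, count ->
    println "${pair.join(' + ')}: ${count}"
}
```

The same counting works for clicks within a session or words within a sentence, as long as each observation is given as a list of events.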

I’ll cover the basic steps for making such predictions with Groovy. First things first, we’ll start by gathering the necessary data.

[ … ]

Read the rest of this article at GroovyMag.
