The MetaSieve Blog

May 15, 2010

Natural Language Processing in Groovy: A Primer – Using Groovy for processing and analysing textual data

Filed under: Uncategorized, by Björn Wilmsmann @ 1:00 am

The May issue of GroovyMag, an online-only magazine for everything Groovy and Grails, has been published today.

I’ve contributed an article about doing natural language processing with Groovy. Here’s a teaser:

Given that most Internet content in one way or another contains natural language, it’s no surprise that natural language processing (NLP) and text mining have become a vital aspect of many Internet applications, from large-scale search engines to social media apps. Making sense of the content that is stored in your application can make all the difference. Groovy’s regular expressions, its text processing abilities and the readily available NLP libraries for Java make it a natural match for processing large amounts of text.

Since its inception, Perl, arguably the first scripting language in the modern sense, has had the features necessary to conveniently process large amounts of text built right into the language. Fortunately, most scripting languages that followed Perl borrowed and inherited many of these useful features. This is why things like methods for reading texts and regular expressions are commonplace in Python, Ruby and, of course, Groovy too.
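To give a rough idea of what these built-in features look like in practice, here’s a minimal sketch that counts word frequencies with a regular expression. It isn’t taken from the GroovyMag article; the sample text and variable names are made up for illustration, and the script assumes Groovy 1.8 or later (for countBy and take).

```groovy
// Hypothetical sample text; in a real script this might come from new File(...).text
def text = '''Groovy makes text processing easy.
Text processing in Groovy builds on Java and on regular expressions.'''

// String.findAll(regex) returns every match of the pattern as a list of strings
def words = text.toLowerCase().findAll(/\w+/)

// countBy groups equal words and maps each word to its number of occurrences
def frequencies = words.countBy { it }

// Print the three most frequent words, highest count first
frequencies.sort { -it.value }.take(3).each { word, count ->
    println "$word: $count"
}
```

The slashy string /\w+/ saves the double escaping that a regular Java string literal would need, which is a large part of what makes regular expressions pleasant to use in Groovy.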

[ … ]

Read the rest of this article at GroovyMag.


April 15, 2010

Using Groovy for Measuring Statistical Dependence – How to make predictions about the relatedness of statistical events

Filed under: Uncategorized, by Björn Wilmsmann @ 10:04 pm

The April issue of GroovyMag, an online-only magazine for everything Groovy and Grails, has been published today.

I’ve contributed an article about measuring statistical dependence with Groovy. Here’s a teaser:

Statistical dependence is all about finding out which events in a statistical sample are likely to co-occur. That is, if one event occurs, it can be predicted with a certain probability that another event will occur as well. Using simple measures of statistical dependence, I’d like to show how Groovy can be used to make such predictions.
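As a simple illustration of the idea (not necessarily the measure used in the full article), the conditional probability P(b | a) = P(a and b) / P(a) tells us how likely event b is once event a has been observed. The sketch below estimates it from relative frequencies; the observations and the closure names probability and conditional are hypothetical.

```groovy
// Hypothetical sample: each inner list holds the events observed together
def observations = [
    ['rain', 'umbrella'],
    ['rain', 'umbrella'],
    ['rain'],
    ['sun'],
    ['sun', 'umbrella']
]

// Relative frequency of observations that contain all of the given events
def probability = { List events ->
    observations.findAll { it.containsAll(events) }.size() / observations.size()
}

// P(b | a) = P(a and b) / P(a)
def conditional = { String a, String b ->
    probability([a, b]) / probability([a])
}

println "P(umbrella) = ${probability(['umbrella'])}"
println "P(umbrella | rain) = ${conditional('rain', 'umbrella')}"
```

If P(umbrella | rain) differs from P(umbrella), the two events are statistically dependent in this sample. Note that Groovy divides integers into a BigDecimal here, so no explicit casts are needed.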

Processing and reporting statistical data is commonplace in the software business today. It’s used for all sorts of things, including business performance indicators, website or user tracking statistics, and searching and indexing textual content on a website or in any other content repository.

Each of these applications, and many more for that matter, to some extent requires that statistical data be collected and possible relations between single events be identified. Common examples of statistical events are purchases made by a customer, actions taken by a website user or word occurrences in textual content.

From these kinds of events, potentially useful information can be derived:

  • products which are likely to be purchased together and thus can be provisioned and stocked accordingly
  • the click stream users will probably take on a website
  • related words which can be suggested to the user in an auto-complete feature
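To make the first item in the list above a bit more concrete, here’s a hedged sketch that counts how often pairs of products appear in the same order. The order data is invented, and the approach (plain pair counting) is only one simple way of doing this; it assumes Groovy 1.8.1 or later (for drop).

```groovy
// Hypothetical purchase data: one list of products per order
def orders = [
    ['bread', 'butter', 'jam'],
    ['bread', 'butter'],
    ['bread', 'jam'],
    ['milk', 'butter', 'bread']
]

// Count how often each unordered pair of products shares an order
def pairCounts = [:].withDefault { 0 }
orders.each { order ->
    // De-duplicate and sort so that every pair gets one canonical key
    def items = (order as Set).sort()
    items.eachWithIndex { first, i ->
        items.drop(i + 1).each { second ->
            pairCounts[[first, second]] += 1
        }
    }
}

// Most frequent pairs first
def ranked = pairCounts.sort { -it.value }
ranked.each { pair, count -> println "${pair.join(' + ')}: $count" }
```

The same counting scheme carries over to the other two items: replace orders with user sessions to look at click streams, or with sentences to find words that tend to occur together.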

I’ll cover the basic steps for making such predictions with Groovy. First things first, we’ll start by gathering the necessary data.

[ … ]

Read the rest of this article at GroovyMag.
