Groovy  Apache Groovy

Apache Groovy 2022 Year In Review

by paulk


Posted on Thursday December 29, 2022 at 02:28PM in Technology


The year 2022 has been a reasonably good one for the Groovy Programming Language. Here are just a few of the highlights.

Releases & Contributions

In 2022, Groovy had 18 releases starting with Groovy 4 in January:

Tweet for Groovy 4 release

The latest release of Groovy 4 is 4.0.7 which includes over 300 improvements and bug fixes since 4.0.0. For more details about Groovy 4, you can read the release notes. There have also been bug fix releases for earlier Groovy versions.

For our main branch of our main source code repo, there were 820 commits from 28 contributors. This is the branch which corresponds to Groovy 5 but many fixes were also back-ported to earlier Groovy versions. Just on Groovy 5, we expect to have alpha versions available for review in the first quarter of 2023. Also, while discussing commit counts, we should note that Eric Milles became the 5th person to overtake James Strachan in numbers of commits to the project. Congrats Eric!

There were also many contributions outside those code commits. We thank all those involved in contributing to or promoting Groovy in 2022!

Downloads

In early 2021, Groovy surpassed the 1 billion artifacts downloaded mark. This is downloads of artifacts from repositories like Maven Central and (at least until recently) Bintray. We now only collect stats from Maven Central. We don't collect stats on downloads of the zip releases nor attempt to account for the many downloads where Groovy is bundled within other products, so the stats are no doubt much higher. Well, the good news is that the rate of downloads is still increasing, so interest in Groovy remains high. While the figures for December are not yet finalised, it looks like 2022 will be the first year Groovy surpasses 1 Billion downloads in a single calendar year!

Groovy download stats

Blogs

We also started increasing the number of posts in the official ASF blog platform: blogs.apache.org/groovy. There were nearly 30 posts for you to peruse from this year. We try to show off Groovy features and also have some fun.

blog_collage.jpg

You might also like to check out the Groovy and Data Science blog post from the JVM Advent folks. It summaries a handful of the above mentioned blog posts.

ApacheCon

Several folks from the project and many friends of Groovy participated in the sold out ApacheCon conference in New Orleans in October. We thank the conference organisers, speakers and attendees for the wonderful conference.

apachecon_collage.jpg

We have plenty more in store for 2023. We invite you to come on the journey with us!

Social media: @ApacheGroovy @ApacheGroovy@fosstodon.org


JVM Hello World with Groovy

by paulk


Posted on Thursday December 22, 2022 at 02:24PM in Technology


For those that haven't seen it yet, the JVM Advent folks posted a great Groovy and Data Science blog post several days ago as part of the 2022 JVM Advent series. If you have an interest in Data Science, we recommend you check that out before continuing with this post.

Today's post in the JVM Advent series is looking at the world of bytecode libraries on the JVM. Let's look at creating the same hello world example from that post using Groovy with the ProGuardCORE, ASM, and Byte Buddy libraries.

First, we highly recommend you read the previously mentioned JVM Advent post first for more background. After all, it's easy to create a simple hello world class file example directly as a Java source file (as that post shows) or as a Groovy source file like this:

println 'Hello world'

The examples shown in this blog post illustrate how you could create the equivalent class file using libraries which let you manipulate the generated bytecode directly. It's a bit of a deep dive if you want to know more about JVM internals and can also be handy for numerous use cases like building tools or modifying Java classes on the fly. I suggest you read the web sites for those libraries if you want further details or additional motivation.

ProGuardCORE

The ProGuardCORE library lets you read, analyze, modify, and write Java class files. Here's how we could use it to write a hello world class file:

var name = 'HelloProGuardCORE'
var superclass = 'java/lang/Object'
var classBuilder = new ClassBuilder(CLASS_VERSION_1_8, PUBLIC, name, superclass).tap {
addMethod(PUBLIC | STATIC, 'main', '([Ljava/lang/String;)V', 100, builder ->
builder
.getstatic('java/lang/System', 'out', 'Ljava/io/PrintStream;')
.ldc("Hello from $name")
.invokevirtual('java/io/PrintStream', 'println', '(Ljava/lang/String;)V')
.return_()
)
}

new File("${name}.class").withDataOutputStream { dos ->
classBuilder.programClass.accept(new ProgramClassWriter(dos))
}

This is essentially the "Groovified" version of the example in the JVM Advent blog post. We are using the libraries ClassBuilder class, adding a method, then adding four bytecode statements as the body of the method. If you haven't seen method and type descriptor syntax before, a few parts might seem a little strange but you possibly won't be surprised that it seems to be referencing a System.out.println call and passing it a constant String.

When we run this script, a HelloProGuardCORE class file is produced. We can invoke that class file in the normal way:

$ java HelloProGuardCORE
Hello from HelloProGuardCORE

We encourage you to read the JVM Advent post or the library documentation if you want more details.

ASM

ASM is an all purpose Java bytecode manipulation and analysis framework. In fact, it's the one that Groovy uses in its parser and some of its tools. Here is how to use it to generate more or less the same class as previously:

var name = 'HelloASM'
var superclass = 'java/lang/Object'
var cw = new ClassWriter(0)
cw.visit(V1_8, ACC_PUBLIC + ACC_SUPER, name, null, superclass, null)
cw.visitMethod(ACC_PUBLIC + ACC_STATIC, 'main', '([Ljava/lang/String;)V', null, null).with {
visitCode()
visitFieldInsn(GETSTATIC, 'java/lang/System', 'out', 'Ljava/io/PrintStream;')
visitLdcInsn('Hello from ' + name)
visitMethodInsn(INVOKEVIRTUAL, 'java/io/PrintStream', 'println', '(Ljava/lang/String;)V', false)
visitInsn(RETURN)
visitMaxs(3, 3)
visitEnd()
}
cw.visitEnd()

new File("${name}.class").withDataOutputStream { dos ->
dos.write(cw.toByteArray())
}

After running this script, a HelloASM class file is produced, and here is the output when running that class file:

$ java HelloASM
Hello from HelloASM

Parts of the code should look familiar to the previous example.

Byte Buddy

Byte Buddy is a code generation and manipulation library for creating and modifying Java classes. It's strengths lie in its ability to create and modify classes dynamically. So, its power is perhaps not needed for our simple example. A nice aspect of this library however, is that it hides some of the low-level details like type and method descriptors behind its fluent API. Here is our example:

var name = 'HelloByteBuddy'
new ByteBuddy()
.subclass(Object)
.name(name)
.defineMethod('main', Void.TYPE, PUBLIC | STATIC)
.withParameter(String[])
.intercept(MethodCall.invoke(
PrintStream.getMethod('println', String))
.onField(System.getField('out'))
.with('Hello from ' + name))
.make()
.saveIn('.' as File)

Like the other scripts, this also produces a class file which we can invoke as shown here:

$ java HelloByteBuddy
Hello from HelloByteBuddy

That wraps up our examples using the three libraries, but we have one more fun alternative to cover!

Using Groovy ASTs

Groovy is a very extensible language. It provides among other things, a compile-time metaprogramming mechanism called AST Transforms (Abstract Syntax Tree Transformations). This mechanism uses annotations to indicate to the compiler that special processing is required during compilation. A now somewhat outdated AST transform, @Bytecode, experimented with allowing you to write bytecode instructions directly in your Groovy code. Let's look at using that AST transform here:

@CompileStatic @POJO
class HelloAST {
@Bytecode
static void main(args) {
getstatic 'java/lang/System.out', 'Ljava/io/PrintStream;'
ldc 'Hello from HelloAST'
invokevirtual 'java/io/PrintStream.println', '(Ljava/lang/String;)V'
return
}
}

We are writing directly the instructions that the Java or Groovy compiler (with static compilation enabled) would produce. For this example, we don't run the script to produce the class file, we just compile it using the Groovy compiler.

We definitely don't recommend relying on the @Bytecode AST transform for any production code but it can be fun to play with. We've also used the @CompileStatic and @POJO AST transforms to tell the compiler that we aren't using any Groovy dynamic features, so that it should write Java-like code whenever possible and avoid calling the Groovy runtime.

We can examine the bytecode using javap and indeed it has bytecode similar to that produced by the other libraries:

  public static void main(java.lang.String...);
    descriptor: ([Ljava/lang/String;)V
    flags: (0x0089) ACC_PUBLIC, ACC_STATIC, ACC_VARARGS
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #21                 // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #23                 // String Hello from HelloAST
         5: invokevirtual #29                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       9     0  args   Ljava/lang/Object;

Because the code is not calling the Groovy runtime, we can invoke it directly without the Groovy jar:

$ java HelloAST
Hello from HelloAST

That wraps up our little tour of bytecode libraries. I hope you have learnt some additional JVM details!

Further information


Adventures with GroovyFX

by paulk


Posted on Monday December 12, 2022 at 02:22PM in Technology


This blog looks at a GroovyFX version of a ToDo application originally written in JavaFX. First we start with a ToDoCategory enum of our ToDo categories:

TodoEnum.png

We will have a ToDoItem class containing the todo task, the previously mentioned category and the due date.

@Canonical
@JsonIncludeProperties(['task', 'category', 'date'])
@FXBindable
class ToDoItem {
String task
ToDoCategory category
LocalDate date
}

It's annotated with @JsonIncludeProperties to allow easy serialization to/from JSON format, to provide easy persistence, and @FXBindable which eliminates the boilerplate required to define JavaFX properties.

Next, we'll define some helper variables:

var file = 'todolist.json' as File
var mapper = new ObjectMapper().registerModule(new JavaTimeModule())
var open = { mapper.readValue(it, new TypeReference<List<ToDoItem>>() {}) }
var init = file.exists() ? open(file) : []
var items = FXCollections.observableList(init)
var close = { mapper.writeValue(file, items) }
var table, task, category, date, images = [:]
var urls = ToDoCategory.values().collectEntries {
[it, "emoji/${Integer.toHexString(it.emoji.codePointAt(0))}.png"]
}

Here, mapper serializes and deserializes our top-level domain object (the ToDo list) into JSON using the Jackson library. The open and close Closures do the reading and writing respectively.

For a bit of fun and only slightly more complexity, we have included some slightly nicer images in our application. JavaFX's default emoji font rendering is a little sketchy on some platforms and it's not much work to have nice multi-colored images. This is achieved using the icons from https://github.com/pavlobu/emoji-text-flow-javafx. The application is perfectly functional without them (and the approximately 20 lines for the cellFactory and cellValueFactory definitions could be elided) but is prettier with the nicer images. We shrunk them to 1/3 their original size but we could certainly make them larger if we felt inclined.

Our application will have a combo box for selecting a ToDo item's category. We'll create a factory for the combo box so that each selection will be a label with both graphic and text components.

def graphicLabelFactory = {
new ListCell<ToDoCategory>() {
void updateItem(ToDoCategory cat, boolean empty) {
super.updateItem(cat, empty)
if (!empty) {
graphic = new Label(cat.name()).tap {
graphic = new ImageView(images[cat])
}
}
}
}
}

When displaying our ToDo list, we'll use a table view. So, let's create a factory for table cells that will use the pretty images as a centered graphic.

def graphicCellFactory = {
new TableCell<ToDoItem, ToDoItem>() {
void updateItem(ToDoItem item, boolean empty) {
graphic = empty ? null : new ImageView(images[item.category])
alignment = Pos.CENTER
}
}
}

Finally, with these definitions out of the way, we can define our GroovyFX application for manipulating our ToDo list:

start {
stage(title: 'GroovyFX ToDo Demo', show: true, onCloseRequest: close) {
urls.each { k, v -> images[k] = image(url: v, width: 24, height: 24) }
scene {
gridPane(hgap: 10, vgap: 10, padding: 20) {
columnConstraints(minWidth: 80, halignment: 'right')
columnConstraints(prefWidth: 250)

label('Task:', row: 1, column: 0)
task = textField(row: 1, column: 1, hgrow: 'always')

label('Category:', row: 2, column: 0)
category = comboBox(items: ToDoCategory.values().toList(),
cellFactory: graphicLabelFactory, row: 2, column: 1)

label('Date:', row: 3, column: 0)
date = datePicker(row: 3, column: 1)

table = tableView(items: items, row: 4, columnSpan: REMAINING,
onMouseClicked: {
var item = items[table.selectionModel.selectedIndex.value]
task.text = item.task
category.value = item.category
date.value = item.date
}) {
tableColumn(property: 'task', text: 'Task', prefWidth: 200)
tableColumn(property: 'category', text: 'Category', prefWidth: 80,
cellValueFactory: { new ReadOnlyObjectWrapper(it.value) },
cellFactory: graphicCellFactory)
tableColumn(property: 'date', text: 'Date', prefWidth: 90, type: Date)
}

hbox(row: 5, columnSpan: REMAINING, alignment: CENTER, spacing: 10) {
button('Add', onAction: {
if (task.text && category.value && date.value) {
items << new ToDoItem(task.text, category.value, date.value)
}
})
button('Update', onAction: { if (task.text && category.value && date.value && !table.selectionModel.empty) { var item = items[table.selectionModel.selectedIndex.value]
item.task = task.text
item.category = category.value
item.date = date.value
}
})
button('Remove', onAction: {
if (!table.selectionModel.empty)
items.removeAt(table.selectionModel.selectedIndex.value)
})
}
}
}
}
}

We could have somewhat separated the concerns of application logic and display logic by placing the GUI part of this app in an fxml file. For our purposes however, we'll keep the whole application in one source file and use Groovy's declarative builder style.

Here is the application in use:

TodoScreenshot.png

Further information

The code for this application can be found here: https://github.com/paulk-asert/groovyfx-todo.

It's a Groovy 3 and JDK 8 application but see this blog if you want to see Jackson deserialization of classes and records (and Groovy's emulated records) from CSV files using the most recent Groovy and JDK versions.


Fun with obfuscated Groovy

by paulk


Posted on Thursday December 08, 2022 at 12:40AM in Technology


An interesting tweet appeared in my feed this morning:

ObfuscatedJava@joshbloch.png


And of course it prints the same thing in Groovy:

char p(int i) {
(char) (72.5
+ i * (17488.589319014318
+ i * (-54923.96120078333
+ i * (72666.96791801952
+ i * (-54398.97479321991
+ i * (25980.955221285272
+ i * (-8426.37914599868
+ i * (1921.5090334614745
+ i * (-313.59919165032926
+ i * (36.799215753141524
+ i * (-3.0787816564704586
+ i * (0.17913536718875267
+ i * (-0.0068850803421115925
+ i * (1.5709912194287188E-4
+ i * (-1.6109400409995646E-6
)))))))))))))))
}

var i = 0
while (p(i)) print p(i++)


STOP reading now and try it out for yourself ... or browse the possible origin.


SPOILER ALERT okay, if you didn't stop, I guess it's still okay to scroll down to see what it prints out and how to create an obfuscated script yourself...





                                 ↓





                                 ↓





                                 ↓





Let's Groovify the reply from Alexey Nikitin which uses Apache Commons Math to replicate the problem:

@Grab('org.apache.commons:commons-math3:3.6.1')
import org.apache.commons.math3.analysis.interpolation.NevilleInterpolator

var text = 'Hello, world!\n'
var size = text.size()
var x = new double[size + 1]
var y = new double[size + 1]
for(i in 0..<size) {
x[i] = i
y[i] = (int) text[i]
}
x[size] = size
y[size] = 0

var lines = []
var interpolator = new NevilleInterpolator()
var function = interpolator.interpolate(x, y)
var coeff = function.coefficients
lines << 'char p(int i) {'
lines << " (char) (${coeff[0]} + 0.5"
for(i in 1..<coeff.length) {
lines << ' + i * (' + coeff[i]
}
lines << ' ' + ')' * coeff.length
lines << '''}
var i = 0
var out = ''
while(p(i)) out += p(i++)
out
'''
var script = lines.join('\n')
println script
assert text == Eval.me(script)

This generates the script, prints it out, and then runs it to make sure it produces what we intended. It only differs from above in that instead of printing out each character, it builds up and returns a String so that we can assert our expectations. It was simpler than capturing stdout by other means.

Enjoy!


Zipping Collections with Groovy

by paulk


Posted on Thursday November 17, 2022 at 12:50PM in Technology


In computer science, zipping translates sequences into sequences where, if visualized in two dimensions, the rows and columns are swapped. So the zip of:

[['a', 'b', 'c'],
 [ 1 ,  2 ,  3 ]]

would be:

[['a', 1],
 ['b', 2],
 ['c', 3]]

It's a very handy operation and depending on the language, may be supported for tuples, lists, streams and other sequences or aggregates.

Java collections and streams don't currently support such functionality out-of-the-box with various workarounds discussed here. The summary: language and library design is hard; any zip implementation that Java provides would have some limitations baked in, so they instead provide the necessary primitives to allow folks to build their own implementations based on their specific requirements.

We'll look at what Groovy provides and some of the available Java libraries that you can also use. The same caveats apply to these libraries, each will have its own implementation strengths and weaknesses.

We'll use an example inspired by this Donald Raab blog post. It looks at zipping (and formatting) lists of strings containing "fall"-inspired emoji. Yes, it's late spring for the southern hemisphere who also mostly call fall "autumn", but hopefully everyone will appreciate the inspiration.


fall.png


Groovy

Groovy uses the transpose method for zipping:

ZippingCollectionsGroovy.png

Eclipse Collections

Eclipse Collections has a zip method on its list classes:

ZippingCollectionsEC.png

Guava

Guava has a streams utility class with a zip method:

ZippingCollectionsGuava.png

StreamEx

StreamEx provides an enhanced stream library which supports zipWith:

ZippingCollectionsStreamEx.png

Vavr

Vavr has a zipWith method on its list class:

ZippingCollectionsVavr.png

jOOλ

jOOλ has a zip method for its sequences:

ZippingCollectionsJool.png

Groovy GQuery

If you are a fan of query-like DSLs, Groovy's language integrated query, gquery, can also be used:

ZippingCollectionsGQ.png

This uses a special _rn "row number" pre-defined variable in GQ expressions. It follows the same strategy as the IntStream "workaround" for Java mentioned in this blog.

More information

  • The code examples can be found in the repo


Whiskey Clustering with Groovy and Apache Ignite

by paulk


Posted on Thursday October 27, 2022 at 11:13AM in Technology


In a previous blog post, we looked at using Apache Wayang (incubating) and Apache Spark to scale up the k-means clustering algorithm. Let's look at another useful technology for scaling up this problem, Apache Ignite. They recently released a new version, but earlier versions are also fine for our example. Before we start, a quick reminder of the problem.

Whiskey Clustering

groovy.pngThis problem looks at the quest of finding the perfect single-malt Scotch whiskey. The whiskies produced from 86 distilleries have been ranked by expert tasters according to 12 criteria (Body, Sweetness, Malty, Smoky, Fruity, etc.). We'll use a K-means algorithm to calculate the centroids.

whiskey_bottles.jpg

K-means is a standard data-science clustering technique. In our case, it groups whiskies with similar characteristics (according to the 12 criteria) into clusters. If we have a favourite whiskey, chances are we can find something similar by looking at other instances in the same cluster. If we are feeling like a change, we can look for a whiskey in some other cluster. The centroid is the notional "point" in the middle of the cluster. For us it reflects the typical measure of each criteria for a whiskey in that cluster.

Apache Ignite

Apache Ignite is a distributed database for high-performance computing with in-memory speed. It makes a cluster (or grid) of nodes appear like an in-memory cache.

This explanation drastically simplifies Ignite's feature set. Ignite can be used as:

  • an in-memory cache with special features like SQL querying and transactional properties
  • an in-memory data-grid with advanced read-through & write-through capabilities on top of one or more distributed databases
  • an ultra-fast and horizontally scalable in-memory database
  • a high-performance computing engine for custom or built-in tasks including machine learning

It is mostly this last capability that we will use. Ignite's Machine Learning API has purpose built, cluster-aware machine learning and deep learning algorithms for Classification, Regression, Clustering, and Recommendation among others. We'll use the distributed K-means Clustering algorithm from their library.

2022-10-27 21_17_19-Machine Learning _ Ignite Documentation.png

Implementation Details

Apache Ignite has special capabilities for reading data into the cache. We could use IgniteDataStreamer or IgniteCache.loadCache() and load data from files, stream sources, various database sources and so forth. This is particularly relevant when using a cluster.

For our little example, our data is in a relatively small CSV file and we will be using a single node, so we'll just read our data using Apache Commons CSV:

var file = getClass().classLoader.getResource('whiskey.csv').file as File
var rows = file.withReader {r -> RFC4180.parse(r).records*.toList() }
var data = rows[1..-1].collect{ it[2..-1]*.toDouble() } as double[][]

We'll configure our single node Ignite data cache using code (but we could place the details in a configuration file in more complex scenarios):

var cfg = new IgniteConfiguration(
peerClassLoadingEnabled: true,
discoverySpi: new TcpDiscoverySpi(
ipFinder: new TcpDiscoveryMulticastIpFinder(
addresses: ['127.0.0.1:47500..47509']
)
)
)

We'll create a few helper variables:

var features = ['Body', 'Sweetness', 'Smoky', 'Medicinal', 'Tobacco',
'Honey', 'Spicy', 'Winey', 'Nutty', 'Malty', 'Fruity', 'Floral']
var pretty = this.&sprintf.curry('%.4f')
var dist = new EuclideanDistance()
var vectorizer = new DoubleArrayVectorizer().labeled(FIRST)

Now we start the node, populate the cache, run our k-means algorithm, and print the result.

Ignition.start(cfg).withCloseable { ignite ->
println ">>> Ignite grid started for data: ${data.size()} rows X ${data[0].size()} cols"
var dataCache = ignite.createCache(new CacheConfiguration<Integer, double[]>(
name: "TEST_${UUID.randomUUID()}",
affinity: new RendezvousAffinityFunction(false, 10)))
data.indices.each { int i -> dataCache.put(i, data[i]) }
var trainer = new KMeansTrainer().withDistance(dist).withAmountOfClusters(5)
var mdl = trainer.fit(ignite, dataCache, vectorizer)
println ">>> KMeans centroids:\n${features.join(', ')}"
var centroids = mdl.centers*.all()
centroids.each { c -> println c*.get().collect(pretty).join(', ') }
dataCache.destroy()
}

Results

Here is the output:

[18:13:11]    __________  ________________
[18:13:11]   /  _/ ___/ |/ /  _/_  __/ __/
[18:13:11]  _/ // (7 7    // /  / / / _/
[18:13:11] /___/\___/_/|_/___/ /_/ /x___/
[18:13:11]
[18:13:11] ver. 2.14.0#20220929-sha1:951e8deb
[18:13:11] 2022 Copyright(C) Apache Software Foundation
...
[18:13:11] Configured plugins:
[18:13:11]   ^-- ml-inference-plugin 1.0.0
[18:13:14] Ignite node started OK (id=f731e4ab)
...
>>> Ignite grid started for data: 86 rows X 13 cols
>>> KMeans centroids
Body, Sweetness, Smoky, Medicinal, Tobacco, Honey, Spicy, Winey, Nutty, Malty, Fruity, Floral
2.7037, 2.4444, 1.4074, 0.0370, 0.0000, 1.8519, 1.6667, 1.8519, 1.8889, 2.0370, 2.1481, 1.6667
1.8500, 1.9000, 2.0000, 0.9500, 0.1500, 1.1000, 1.5000, 0.6000, 1.5500, 1.7000, 1.3000, 1.5000
1.2667, 2.1333, 0.9333, 0.1333, 0.0000, 1.0667, 0.8000, 0.5333, 1.8000, 1.7333, 2.2667, 2.2667
3.6667, 1.5000, 3.6667, 3.3333, 0.6667, 0.1667, 1.6667, 0.5000, 1.1667, 1.3333, 1.1667, 0.1667
1.5000, 2.8889, 1.0000, 0.2778, 0.1667, 1.0000, 1.2222, 0.6111, 0.5556, 1.7778, 1.6667, 2.0000
[18:13:15] Ignite node stopped OK [uptime=00:00:00.663]

We can plot the centroid characteristics in a spider plot.

2022-10-27 20_42_01-Whiskey clusters with Apache Ignite.png


More Information

  • Repo containing the source code: WhiskeyIgnite
  • Repo containing similar examples using a variety of libraries including Apache Commons CSV, Weka, Smile, Tribuo and others: Whiskey
  • A similar example using Apache Spark directly but with a built-in parallelized k-means from the spark-mllib library rather than a hand-crafted algorithm: WhiskeySpark


Groovy Dates And Times Cheat Sheet

by paulk


Posted on Monday October 24, 2022 at 07:27AM in Technology


Java has had a Date class from the very beginning and Groovy supports using it and several related classes like Calendar. Throughout this blog post we refer to those classes as the legacy date classes. Groovy enhances the experience of using the legacy date classes with simpler mechanisms for formatting, parsing and extracting fields from the related classes.

Since Java 8, the JDK has included the JSR-310 Date Time API. We refer to these classes as the new date classes. The new date classes remove many limitations of the legacy date classes and bring some greatly appreciated additional consistency. Groovy provides similar enhancements for the new date classes too.

Groovy's enhancements for the legacy date classes are in the groovy-dateutil module (prior to Groovy 2.5, this functionality was built in to the core module). The groovy-datetime module has the enhancements for the new date classes. You can include a dependency to this module in your build file or reference the groovy-all pom dependency. Both modules are part of a standard Groovy install.

The next few sections illustrate common date and time tasks and the code to perform them using the new and legacy classes with Groovy enhancements in numerous places.

Please note: that some of the formatting commands are Locale dependent and the output may vary slightly if you run these examples yourself.

Representing the current date/time

The legacy date classes have an abstraction which includes date and time. If you are only interested in one of those two aspects, you simply ignore the other aspect. The new date classes allow you to have date-only, time-only and date-time representations.

The examples create instances representing the current date and/or time. Various information is extracted from the instances and they are printed in various ways. Some of the examples use the SV macro which prints the name and string value of one or more variables.

taskjava.timelegacy

current date and time

println LocalDateTime.now()      
println Instant.now()
2022-10-24T12:40:02.218130200
2022-10-24T02:40:02.223131Z
println new Date()               
println Calendar.instance.time
Mon Oct 24 12:40:02 AEST 2022
Mon Oct 24 12:40:02 AEST 2022
day of current year &
day of current month
println LocalDateTime.now().dayOfYear
println LocalDateTime.now().dayOfMonth
297
24
println Calendar.instance[DAY_OF_YEAR]
println Calendar.instance[DAY_OF_MONTH]
297
24
extract today's
year, month & day
var now = LocalDate.now()     // or LocalDateTime

println SV(now.year, now.monthValue, now.dayOfMonth)

(Y, M, D) = now[YEAR, MONTH_OF_YEAR, DAY_OF_MONTH]
println "Today is $Y $M $D"
now.year=2022, now.monthValue=10, now.dayOfMonth=24
Today is 2022 10 24
var now = Calendar.instance
(_E, Y, M, _WY, _WM, D) = now
println "Today is $Y ${M+1} $D"

(Y, M, D) = now[YEAR, MONTH, DAY_OF_MONTH]
println "Today is $Y ${M+1} $D"
Today is 2022 10 24
Today is 2022 10 24
alternatives to print today
println now.format("'Today is 'YYYY-MM-dd")
printf 'Today is %1$tY-%1$tm-%1$te%n', now
Today is 2022-10-24
Today is 2022-10-24
println now.format("'Today is 'YYYY-MM-dd")
printf 'Today is %1$tY-%1$tm-%1$te%n', now
Today is 2022-10-24
Today is 2022-10-24

extract parts of current time

now = LocalTime.now() // or LocalDateTime
println SV(now.hour, now.minute, now.second)
(H, M, S) = now[HOUR_OF_DAY, MINUTE_OF_HOUR,
SECOND_OF_MINUTE]
printf 'The time is %02d:%02d:%02d\n', H, M, S
now.hour=12, now.minute=40, now.second=2
The time is 12:40:02
(H, M, S) = now[HOUR_OF_DAY, MINUTE, SECOND]

println SV(H, M, S)
printf 'The time is %02d:%02d:%02d%n', H, M, S
H=12, M=40, S=2
The time is 12:40:02
alternatives to print time
println now.format("'The time is 'HH:mm:ss")
printf 'The time is %1$tH:%1$tM:%1$tS%n', now
The time is 12:40:02
The time is 12:40:02
println now.format("'The time is 'HH:mm:ss")
printf 'The time is %1$tH:%1$tM:%1$tS%n', now
The time is 12:40:02
The time is 12:40:02

Processing times

The new date classes have a LocalTime class specifically for representing time-only quantities. The legacy date classes don't have such a purpose-built abstraction; you essentially just ignore the day, month, and year parts of a date. The java.sql.Time class could be used as an alterative but rarely is. The Java documentation comparing the new date classes to their legacy equivalents, talks about using GregorianCalendar with the date set to the epoch value of 1970-01-01 as an approximation of the LocalTime class. We'll follow that approach here to provide a comparison but we strongly recommend upgrading to the new classes if you need to represent time-only values or use the Joda-Time library on JDK versions prior to 8.

The examples look at representing a minute before and after midnight, and some times at which you might eat your meals. For the meals, as well as printing various values, we might be interested in calculating new times in terms of existing times, e.g. lunch and dinner are 7 hours apart.
taskjava.timelegacy
one min after midnight
LocalTime.of(0, 1).with {
println format('HH:mm')
println format('hh:mm a')
println format('K:mm a')
}
00:01
12:01 am
0:01 am
Calendar.instance.with {
clear()
set(MINUTE, 1)
println format('HH:mm')
println format('hh:mm a')
println format('K:mm a')
}
00:01
12:01 am
0:01 am
one min before midnight
LocalTime.of(23, 59).with {
println format('HH:mm')
println format('hh:mm a')
println format('K:mm a')
}
23:59
11:59 pm
11:59 pm
Calendar.instance.with {
clear()
set(hourOfDay: 23, minute: 59)
println format('HH:mm')
println format('hh:mm a')
println format('K:mm a')
}
23:59
11:59 pm
11:59 pm
meal times
var breakfast = LocalTime.of(7, 30)
var lunch = LocalTime.parse('12:30')
assert lunch == LocalTime.parse('12:30.00 pm', 'hh:mm.ss a')
lunch.with { assert hour == 12 && minute == 30 }
var dinner = lunch.plusHours(7)
assert dinner == lunch.plus(7, ChronoUnit.HOURS)
assert Duration.between(lunch, dinner).toHours() == 7
assert breakfast.isBefore(lunch) // Java API
assert lunch < dinner // Groovy shorthand
assert lunch in breakfast..dinner
assert dinner.format('hh:mm a') == '07:30 pm'
assert dinner.format('k:mm') == '19:30'
assert dinner.format(FormatStyle.MEDIUM) == '7:30:00 pm'
assert dinner.timeString == '19:30:00'
var breakfast = Date.parse('hh:mm', '07:30')
var lunch = Calendar.instance.tap {
clear()
set(hourOfDay: 12, minute: 30)
}
assert lunch[HOUR_OF_DAY, MINUTE] == [12, 30]
var dinner = lunch.clone().tap { it[HOUR_OF_DAY] += 7 }
assert dinner == lunch.copyWith(hourOfDay: 19)
assert dinner.format('hh:mm a') == '07:30 pm'
assert dinner.format('k:mm') == '19:30'
assert dinner.time.timeString == '7:30:00 pm'
assert breakfast.before(lunch.time) // Java API
assert lunch < dinner // Groovy shorthand

Processing dates

To represent date-only information with the legacy date classes, you set the time aspects to zero, or simply ignore them. Alternatively, you could consider the less commonly used java.sql.Date class. The new date classes have the special LocalDate class for this purpose which we highly recommend.

The examples create dates for Halloween and Melbourne Cup day (a public holiday in the Australia state of Victoria). We look at various properties of those two dates.

taskjava.timelegacy
holidays
var halloween22 = LocalDate.of(2022, 10, 31)
var halloween23 = LocalDate.parse('2023-Oct-31', 'yyyy-LLL-dd')
assert halloween22 == halloween23 - 365
assert halloween23 == halloween22.plusYears(1)
var melbourneCup22 = LocalDate.of(2022, 11, 1)
assert halloween22 < melbourneCup22
assert melbourneCup22 - halloween22 == 1
assert Period.between(halloween22, melbourneCup22).days == 1
assert ChronoUnit.DAYS.between(halloween22, melbourneCup22) == 1L
var days = []
halloween22.upto(melbourneCup22) {days << "$it.dayOfWeek" }
assert days == ['MONDAY', 'TUESDAY']
var hols = halloween22..melbourneCup22
assert hols.size() == 2
var halloween21 = Date.parse('dd/MM/yyyy', '31/10/2021')
var halloween22 = Date.parse('yyyy-MMM-dd', '2022-Oct-31')
assert halloween21 + 365 == halloween22
var melbourneCup22 = new GregorianCalendar(2022, 10, 1).time
assert melbourneCup22.dateString == '1/11/22' // AU Locale
assert halloween22 < melbourneCup22
assert melbourneCup22 - halloween22 == 1
assert melbourneCup22 == halloween22.copyWith(month: 10, date: 1)
var days = []
halloween22.upto(melbourneCup22) { days << it.format('EEEEE') }
assert days == ['Monday', 'Tuesday']
var hols = halloween22..melbourneCup22
assert hols.size() == 2

Processing date and time combinations

The new date classes use LocalDateTime to represent a quantity with both date and time aspects. Many of the methods seen earlier will also be applicable here.

The examples show creating and printing a representation of lunch on Melbourne Cup day.

taskjava.timelegacy
holidays
var melbourneCupLunch = LocalDateTime.of(2022, 11, 1, 12, 30)
assert melbourneCupLunch.timeString == '12:30:00'
assert melbourneCupLunch.dateString == '2022-11-01'
assert melbourneCupLunch.dateTimeString == '2022-11-01T12:30:00'
assert melbourneCupLunch.toLocalDate() == melbourneCup22
assert melbourneCupLunch.toLocalTime() == lunch
assert melbourneCupLunch == melbourneCup22 << lunch
var melbourneCupLunch = new GregorianCalendar(2022, 10, 1, 12, 30).time
assert melbourneCupLunch.timeString == '12:30:00 pm' // Locale specific
assert melbourneCupLunch.dateString == '1/11/22' // Locale specific
assert melbourneCupLunch.dateTimeString == '1/11/22, 12:30:00 pm' // Locale specific
assert melbourneCupLunch.clearTime() == melbourneCup22

Processing zoned date and times

The legacy date classes have the concept of a TimeZone, predominantly used by the Calendar class. The new date classes has a similar concept but uses the ZoneId, ZoneOffset, and ZonedDateTime classes (among others).

The examples show various properties of zones and show that during the Melbourne cup breakfast, it would still be the night before (Halloween) in Los Angeles. They also show that those two zones are 18 hours apart at that time of the year.

taskjava.timelegacy
holidays
var aet = ZoneId.of('Australia/Sydney')
assert aet.fullName == 'Australian Eastern Time' && aet.shortName == 'AET'
assert aet.offset == ZoneOffset.of('+11:00')
var melbCupBreakfastInAU = ZonedDateTime.of(melbourneCup22, breakfast, aet)
var melbCupBreakfast = LocalDateTime.of(melbourneCup22, breakfast)
assert melbCupBreakfastInAU == melbCupBreakfast << aet
var pst = ZoneId.of('America/Los_Angeles')
assert pst.fullName == 'Pacific Time' && pst.shortName == 'GMT-08:00'
var meanwhileInLA = melbCupBreakfastInAU.withZoneSameInstant(pst)
assert halloween22 == meanwhileInLA.toLocalDate()
assert aet.offset.hours - pst.offset.hours == 18
var aet = TimeZone.getTimeZone('Australia/Sydney')
assert aet.displayName == 'Australian Eastern Standard Time'
assert aet.observesDaylightTime()
var melbourneCupBreakfast = new GregorianCalendar(aet).tap {
set(year: 2022, month: 10, date: 1, hourOfDay: 7, minute: 30)
}
var pst = TimeZone.getTimeZone('America/Los_Angeles')
assert pst.displayName == 'Pacific Standard Time'
var meanwhileInLA = new GregorianCalendar(pst).tap {
setTimeInMillis(melbourneCupBreakfast.timeInMillis)
}
assert meanwhileInLA.time.format('MMM dd', pst) == halloween22.format('MMM dd')
assert aet.rawOffset / 3600000 - pst.rawOffset / 3600000 == 18

Other useful classes

The new date classes offer a few more useful classes. Here are some of the common ones:

  • OffsetDateTime - like ZonedDateTime but with just an offset from UTC rather than a full time-zone
  • Instant - like OffsetDateTime but tied to UTC
  • YearMonth - like a LocalDate but with no day component
  • MonthDay - like a LocalDate but with no year component
  • Period - used to represent periods of time, e.g. Period.ofDays(14), Period.ofYears(2); see also the LocalDate example above.
  • Duration - a time-based amount of time, e.g. Duration.ofSeconds(30), Duration.ofHours(7); see also the LocalTime example above.

Conversions

It is useful to convert between the new and legacy classes. Some useful conversion methods are shown below with Groovy enhancements shown in blue.

FromConversion method/property
GregorianCalendar 
toInstant()
toZonedDateTime()
from(ZonedDateTime)
Calendar
toInstant()
toZonedDateTime()
toOffsetDateTime()
toLocalDateTime()
toLocalDate()
toLocalTime()
toOffsetTime()
toDayOfWeek()
toYear()
toYearMonth()
toMonth()
toMonthDay()
zoneOffset
zoneId
Date
toInstant()
from(Instant)
toZonedDateTime()
toOffsetDateTime()
toLocalDateTime()
toLocalDate()
toLocalTime()
toOffsetTime()
toDayOfWeek()
toYear()
toYearMonth()
toMonth()
toMonthDay()
zoneOffset
zoneId
ZonedDateTime
OffsetDateTime
LocalDateTime
LocalDate
LocalTime
toDate()
toCalendar()

SimpleDateFormat patterns

We saw several examples above using the format and parse methods. For the legacy date classes, numerous Groovy enhancements delegate to SimpleDateFormat. This class represents date/time formats using pattern strings. These are special letters to represent some time or date component mixed with escaped literal strings. The special letters are often repeated to represent the minimum size field for number components and whether the full or an abbreviated form is used for other components.

As an example, for the U.S. locale and U.S. Pacific Time time zone, the following pattern:

yyyy.MM.dd G 'at' HH:mm:ss z
would apply to the following text:
2001.07.04 AD at 12:08:56 PDT

Letter Description
GEra designator AD
yYear 1996; 96
YWeek year (similar to year but allotted by weeks; the first/last few days of a year might be allotted to finish/start the last/previous week)
MMonth in year (context sensitive) July; Jul; 07
LMonth in year (standalone form) July; Jul; 07
wWeek in year 27
WWeek in month 2
DDay in year 189
dDay in month 10
FDay of week in month 2
EDay name in week Tuesday; Tue
uDay number of week (1 = Monday, ..., 7 = Sunday)
aAm/pm marker PM
HHour in day (0-23) 0
kHour in day (1-24) 24
KHour in am/pm (0-11) 0
hHour in am/pm (1-12) 12
mMinute in hour 30
sSecond in minute 55
SMillisecond 978
zTime zone Pacific Standard Time; PST; GMT-08:00
ZTime zone (RFC 822) -0800
XTime zone (ISO 8601) -08; -0800; -08:00
'to escape text put a single quote on either side
''two single quotes for a literal single quote '


DateTimeFormatter patterns

Groovy's format and parse enhancements for the new date classes delegate to the DateTimeFormatter class. It's behavior is similar to what we saw for SimpleDateFormat but with slightly different conversion letters:

Conversion suffix Description
Gera AD
uyear 2004; 04
yyear-of-era 2004; 04
Dday-of-year 189
M/Lmonth-of-year 7; 07; Jul; July; J
dday-of-month 10
Q/qquarter-of-year 3; 03; Q3; 3rd quarter
Yweek-based-year 1996; 96
wweek-of-week-based-year 27
Wweek-of-month 4
Eday-of-week Tue; Tuesday; T
e/clocalized day-of-week 2; 02; Tue; Tuesday; T
Fweek-of-month 3
aam-pm-of-day PM
hclock-hour-of-am-pm (1-12) 12
Khour-of-am-pm (0-11) 0
kclock-hour-of-am-pm (1-24) 0
Hhour-of-day (0-23) 0
mminute-of-hour 30
ssecond-of-minute 55
Sfraction-of-second 978
Amilli-of-day 1234
nnano-of-second 987654321
Nnano-of-day 1234000000
Vtime-zone ID America/Los_Angeles; Z; -08:30
ztime-zone name Pacific Standard Time; PST
Olocalized zone-offset GMT+8; GMT+08:00; UTC-08:00;
Xzone-offset 'Z' for zero Z; -08; -0830; -08:30; -083015; -08:30:15;
xzone-offset +0000; -08; -0830; -08:30; -083015; -08:30:15;
Zzone-offset +0000; -0800; -08:00;
ppad next
'to escape text put a single quote on either side
''two single quotes for a literal single quote '

Localized Patterns

JDK19 adds the ofLocalizedPattern(String requestedTemplate) method. The requested template is one or more regular expression pattern symbols ordered from the largest to the smallest unit, and consisting of the following patterns:

     "G{0,5}" +        // Era
     "y*" +            // Year
     "Q{0,5}" +        // Quarter
     "M{0,5}" +        // Month
     "w*" +            // Week of Week Based Year
     "E{0,5}" +        // Day of Week
     "d{0,2}" +        // Day of Month
     "B{0,5}" +        // Period/AmPm of Day
     "[hHjC]{0,2}" +   // Hour of Day/AmPm (refer to LDML for 'j' and 'C')
     "m{0,2}" +        // Minute of Hour
     "s{0,2}" +        // Second of Minute
     "[vz]{0,4}"       // Zone

The requested template is mapped to the closest of available localized format as defined by the Unicode LDML specification. Here is an example of usage:

var now = ZonedDateTime.now()
var columns = '%7s | %10s | %10s | %10s | %14s%n'
printf columns, 'locale', 'GDK', 'custom', 'local', 'both'
[locale('en', 'US'),
locale('ro', 'RO'),
locale('vi', 'VN')].each { locale ->
Locale.default = locale
var gdk = now.format('y-MM-dd')
var custom = now.format(ofPattern('y-MM-dd'))
var local = now.format(ofLocalizedDate(SHORT))
var both = now.format(ofLocalizedPattern('yMM'))
printf columns, locale, gdk, custom, local, both
}

Which has this output:

locale |        GDK |     custom |      local |           both
en_US | 2022-12-18 | 2022-12-18 | 12/18/22 | 12/2022
ro_RO | 2022-12-18 | 2022-12-18 | 18.12.2022 | 12.2022
vi_VN | 2022-12-18 | 2022-12-18 | 18/12/2022 | tháng 12, 2022

Example credit: this example from Nicolai Parlog.

Formatter formats

The java.util.Formatter class is a base class in Java for various kinds of formatting. It can be used directly, via String.format, parse, printf, or Groovy's sprintf. We saw several examples of using printf and parse formatting in the above examples.

The Formatter class has methods which take a format string as its first argument and zero or more additional arguments. The format string typically has one or more format specifiers (starting with a percent character) which indicate that a formatted version of one of the additional arguments should be placed into the string at that point. The general form of a format specifier is:

%[argument_index$][flag][width][.precision]conversion

Most of the parts are optional. The argument_index part is only used when referencing one of the additional arguments more than once (or out of order). The precision part is only used for floating point numbers. The flag part is used to indicate always include sign(+), zero-padding(0), locale-specific comma delimiters(,), and left justification(-). The width indicates the minimum number of characters for the output. The conversion indicates how the argument should be processed, e.g. as a numeric field, a date, a special character, or some other special processing. Upper and lowercase variants exist for most conversions which, for the uppercase variant, will call toUpperCase after the conversion is complete.

Conversion Description
'b', 'B'Treat as a boolean or false if null
'h', 'H'Output the arguments hashcode as a hex string
's', 'S'Treat as a String
'c', 'C'Treat as a Unicode character
'd'Treat as a decimal integer
'o'Treat as an octal integer
'x', 'X'Treat as a hexadecimal integer
'e', 'E'Treat as a decimal number in scientific notation
'f'Treat as a floating point number
'g', 'G'Treat as a floating point in either decimal or scientific notation
'a', 'A'Treat as a hexadecimal floating-point number
't', 'T'Treat as the prefix for a date/time conversion
'%'A literal percent
'n'A line separator

When the date/time prefix is used, additional suffixes are applicable.

For formatting times:

Conversion suffix Description
'H'Hour of the day for the 24-hour clock as two digits 00 - 23
'I'Hour for the 12-hour clock as two digits 01 - 12
'k'Hour of the day for the 24-hour clock 0 - 23
'l'Hour for the 12-hour clock 1 - 12
'M'Minute within the hour as two digits 00 - 59
'S'Seconds within the minute as two digits 00 - 60
("60" is used for leap seconds)
'L'Millisecond within the second as three digits 000 - 999
'N'Nanosecond within the second as nine digits 000000000 - 999999999
'p'Locale-specific morning or afternoon marker in lower case, am or pm
(The conversion prefix 'T' forces this output to upper case)
'z'RFC 822 style numeric time zone offset from GMT -0800
(Adjusted as needed for Daylight Saving Time)
'Z'Abbreviated time zone
's'Seconds since the beginning of the epoch starting at 1 January 1970 00:00:00 UTC
'Q'Milliseconds since the beginning of the epoch starting at 1 January 1970 00:00:00 UTC


For formatting dates:

Conversion suffix Description
'B'Locale-specific full month name January
'b', 'h'Locale-specific abbreviated month name Jan
'A'Locale-specific full name of the day of the week Sunday
'a'Locale-specific short name of the day of the week Sun
'C'First two digits of four-digit year 00 - 99
'Y'Year as four digits 0092
'y'Last two digits of the year 00 - 99
'j'Day of year as three digits 001 - 366
'm'Month as two digits 01 - 13
'd'Day of month as two digits 01 - 31
'e'Day of month 1 - 31


For formatting date/time compositions:

Conversion suffix Description
'R'Time formatted for the 24-hour clock as "%tH:%tM"
'T'Time formatted for the 24-hour clock as "%tH:%tM:%tS"
'r'Time formatted for the 12-hour clock as "%tI:%tM:%tS %Tp"
The location of the morning or afternoon marker ('%Tp') may be locale-dependent.
'D'Date formatted as "%tm/%td/%ty"
'F'ISO 8601 date formatted as "%tY-%tm-%td"
'c'Date and time formatted as "%ta %tb %td %tT %tZ %tY" Sun Jul 21 15:17:00 EDT 1973


Further information


Fruity Eclipse Collections

by paulk


Posted on Thursday October 13, 2022 at 11:05AM in Technology


This blog post continues on to some degree from the previous post, but instead of deep learning, we'll look at clustering using k-means after first exploring some top methods of Eclipse Collections with fruit emoji examples.

Eclipse Collections Fruit Salad

First, we'll define a Fruit enum (it adds one additional fruit compared to the related Eclipse Collections kata):

code for fruit enum

We can use this enum in the following examples:

usage.png

The last example calculates red fruit in parallel threads. As coded, it uses virtual threads when run on JDK19 with preview features enabled. You can follow the suggestion in the comment to run on other JDK versions or with normal threads. In addition to Eclipse Collections, we have the GPars library on our classpath. Here we are only using one method which is managing pool lifecycle for us.

Exploring emoji colors

For some fun, let's look at whether the nominated color of each fruit matches the color of the related emoji. As in the previous blog, we'll use the slightly nicer Noto Color Emoji fonts for our fruit as shown here:

2022-10-12 14_16_42-Noto Color Emoji - Google Fonts.png

We'll use an Eclipse Collection BiMap to switch back and forth between the color names and java.awt colors:

@Field public static COLOR_OF = BiMaps.immutable.ofAll([
WHITE: WHITE, RED: RED, GREEN: GREEN, BLUE: BLUE,
ORANGE: ORANGE, YELLOW: YELLOW, MAGENTA: MAGENTA
])
@Field public static NAME_OF = COLOR_OF.inverse()

We are also going to use some helper functions to switch between RGB and HSB color values:

static hsb(int r, int g, int b) {
float[] hsb = new float[3]
RGBtoHSB(r, g, b, hsb)
hsb
}

static rgb(BufferedImage image, int x, int y) {
int rgb = image.getRGB(x, y)
int r = (rgb >> 16) & 0xFF
int g = (rgb >> 8) & 0xFF
int b = rgb & 0xFF
[r, g, b]
}

The HSB color space represents colors in a spectrum from 0 to 360 degrees:

Color Circle, Credit: https://nycdoe-cs4all.github.io/units/1/lessons/lesson_3.2

Image credit: https://nycdoe-cs4all.github.io/units/1/lessons/lesson_3.2

We have two helper methods to assist with colors. The first picks out "mostly black" and "mostly white" colors while the second uses a switch expression to carve out some regions of the color space for our colors of interest:

static range(float[] hsb) {
if (hsb[1] < 0.1 && hsb[2] > 0.9) return [0, WHITE]
if (hsb[2] < 0.1) return [0, BLACK]
int deg = (hsb[0] * 360).round()
return [deg, range(deg)]
}

static range(int deg) {
switch (deg) {
case 0..<16 -> RED
case 16..<35 -> ORANGE
case 35..<75 -> YELLOW
case 75..<160 -> GREEN
case 160..<250 -> BLUE
case 250..<330 -> MAGENTA
default -> RED
}
}

Note that the JDK doesn't have a standard color of PURPLE, so we combine purple with magenta by choosing an appropriate broad spectrum for MAGENTA.

We used a Plotly 3D interactive scatterplot (as supported by the Tablesaw Java dataframe and visualization library) to visualize our emoji colors (as degrees on the color spectrum) vs the XY coordinates:

2022-10-13 20_04_10-Color vs xy.png

We are going to try out 3 approaches for determining the predominant color of each emoji:

  1. Most common color: We find the color spectrum value for each point and count up the number of points of each color. The color with the most points will be selected. This is simple and works in many scenarios but if an apple or cherry has 100 shades of red but only one shade of green for the stalk or a leaf, green may be selected.
  2. Most common range: We group each point into a color range. The range with the most points will be selected.
  3. Centroid of biggest cluster: We divide our emoji image into a grid of sub-images. We will perform k-means clustering of the RGB values for each point in the sub-image. This will cluster similar colored points together in a cluster. The cluster with the most points will be selected and its centroid will be chosen as the selected pre-dominant color. This approach has the affect of pixelating our sub-image by color. This approach is inspired by this python article.

Most Common Color

Ignoring the background white color, the most common color for our PEACH emoji is a shade of orange. The graph below shows the count of each color:

2022-10-17 15_57_40-Color histogram for PEACH.png

Most Common Range

If instead of counting each color, we group colors into their range and count the numbers in each range, we get the following graph for PEACH:

2022-10-17 15_56_58-Range histogram for PEACH.png


K-Means

K-Means is an algorithm for finding cluster centroids. For k=3, we would start by picking 3 random points as our starting centroids.

kmeans_step1.png

We allocate all points to their closest centroid:

kmeans_step2.png

Given this allocation, we re-calculate each centroid from all of its points:

kmeans_step3.png

We repeat this process until either a stable centroid selection is found, or we have reached a certain number of iterations.

We used the K-Means algorithm from Apache Commons Math.

Here is the kind of result we would expect if run on the complete set of points for the PEACH emoji. The black dots are the centroids. It has found one green, one orange and one red centroid. The centroid with the most points allocated to it should be the most predominant color. (This is another interactive 3D scatterplot.)


RGB_3D_PEACH.png


We can plot the number of points allocated to each cluster as a bar chart. (We used a Scala plotting library to show Groovy integration with Scala.)

2022-10-17 16_56_28-Centroid sizes for PEACH.png

The code for drawing the above chart looks like this:

var trace = new Bar(intSeq([1, 2, 3]), intSeq(sizes))
.withMarker(new Marker().withColor(oneOrSeq(colors)))

var traces = asScala([trace]).toSeq()

var layout = new Layout()
.withTitle("Centroid sizes for $fruit")
.withShowlegend(false)
.withHeight(600)
.withWidth(800)

Plotly.plot(path, traces, layout, defaultConfig, false, false, true)

K-Means with subimages

The approach we will take for our third option enhances K-Means. Instead of finding centroids for the whole image as the graphs just shown do, we divide the image into subimages and perform the K-Means on each subimage. Our overall pre-dominant color is determined to be the most common color predicated across all of our subimages.

Putting it all together

Here is the final code covering all three approaches (including printing some pretty images highlighting the third approach and the Plotly 3D scatter plots):

var results = Fruit.ALL.collect { fruit ->
var file = getClass().classLoader.getResource("${fruit.name()}.png").file as File
var image = ImageIO.read(file)

var colors = [:].withDefault { 0 }
var ranges = [:].withDefault { 0 }
for (x in 0..<image.width) {
for (y in 0..<image.height) {
def (int r, int g, int b) = rgb(image, x, y)
float[] hsb = hsb(r, g, b)
def (deg, range) = range(hsb)
if (range != WHITE) { // ignore white background
ranges[range]++
colors[deg]++
}
}
}
var maxRange = ranges.max { e -> e.value }.key
var maxColor = range(colors.max { e -> e.value }.key)

int cols = 8, rows = 8
int grid = 5 // thickness of black "grid" between subimages
int stepX = image.width / cols
int stepY = image.height / rows
var splitImage = new BufferedImage(image.width + (cols - 1) * grid, image.height + (rows - 1) * grid, image.type)
var g2a = splitImage.createGraphics()
var pixelated = new BufferedImage(image.width + (cols - 1) * grid, image.height + (rows - 1) * grid, image.type)
var g2b = pixelated.createGraphics()

ranges = [:].withDefault { 0 }
for (i in 0..<rows) {
for (j in 0..<cols) {
def clusterer = new KMeansPlusPlusClusterer(5, 100)
List<DoublePoint> data = []
for (x in 0..<stepX) {
for (y in 0..<stepY) {
def (int r, int g, int b) = rgb(image, stepX * j + x, stepY * i + y)
var dp = new DoublePoint([r, g, b] as int[])
var hsb = hsb(r, g, b)
def (deg, col) = range(hsb)
data << dp
}
}
var centroids = clusterer.cluster(data)
var biggestCluster = centroids.max { ctrd -> ctrd.points.size() }
var ctr = biggestCluster.center.point*.intValue()
var hsb = hsb(*ctr)
def (_, range) = range(hsb)
if (range != WHITE) ranges[range]++
g2a.drawImage(image, (stepX + grid) * j, (stepY + grid) * i, stepX * (j + 1) + grid * j, stepY * (i + 1) + grid * i,
stepX * j, stepY * i, stepX * (j + 1), stepY * (i + 1), null)
g2b.color = new Color(*ctr)
g2b.fillRect((stepX + grid) * j, (stepY + grid) * i, stepX, stepY)
}
}
g2a.dispose()
g2b.dispose()

var swing = new SwingBuilder()
var maxCentroid = ranges.max { e -> e.value }.key
swing.edt {
frame(title: 'Original vs Subimages vs K-Means',
defaultCloseOperation: DISPOSE_ON_CLOSE, pack: true, show: true) {
flowLayout()
label(icon: imageIcon(image))
label(icon: imageIcon(splitImage))
label(icon: imageIcon(pixelated))
}
}

[fruit, maxRange, maxColor, maxCentroid]
}

println "Fruit Expected By max color By max range By k-means"
results.each { fruit, maxRange, maxColor, maxCentroid ->
def colors = [fruit.color, maxColor, maxRange, maxCentroid].collect {
NAME_OF[it].padRight(14)
}.join().trim()
println "${fruit.emoji.padRight(6)} $colors"
}

Here are the resulting images:

2022-10-13 20_37_25-Original.png

2022-10-13 20_37_08-Original.png

2022-10-13 20_36_49-Original.png

2022-10-13 20_36_27-Original.png

2022-10-13 20_36_07-Original.png

2022-10-13 20_35_21-Original.png

And, here are the final results:

final results

In our case, all three approaches yielded the same results. Results for other emojis may vary.

Further information



Deep Learning and Eclipse Collections

by paulk


Posted on Tuesday October 11, 2022 at 10:41AM in Technology


DeepLearning4J and Eclipse Collections revisited

In previous blogs, we have covered Eclipse Collections and Deep Learning. Recently, a couple of the highly recommended katas for Eclipse Collections have been revamped to include "pet" and "fruit" emojis for a little bit of extra fun. What could be better than Learning Eclipse Collections? Deep Learning and Eclipse Collections of course!

First, we create a PetType enum with the emoji toString, and then Pet and Person records. We'll populate a people list as is done in the kata. The full details are in the repo.

Let's use a GQuery expression to explore the pre-populated list:

println GQ {
from p in people
select p.fullName, p.pets
}

The result is:

deep-learning-eclipse-collections pre-populated field

Now let's duplicate the assertion from the getCountsByPetType test in exercise3 which checks pet counts:

2022-10-11 20_06_12-Groovy web console.png

As we expect, it passes.

Now, for a bit of fun, we will use a neural network trained to detect cat and dog images and apply it to our emojis. We'll follow the process described here. It uses DeepLearning4J to train and then use a model. The images used to train the model were real cat and dog images, not emojis, so we aren't expecting our model to be super accurate.

The first attempt was to write the emojis into swing JLabel components and then save using a buffered image. This lead to poor looking images:

PetAsFonts.jpg

And consequently, poor image inference. Recent JDK versions on some platforms might do better but we gave up on this approach.

Instead, emoji image files from the Noto Color Emoji font were used and saved under the pet type in the resources folder. These look much nicer:

2022-10-11 18_24_38-Noto Color Emoji - Google Fonts.png

Here is the code which makes use of those saved images to detect the animal types (note the use of type aliasing since we have two PetType classes; we rename one to PT):

import ramo.klevis.ml.vg16.PetType as PT
import ramo.klevis.ml.vg16.VG16ForCat

var vg16ForCat = new VG16ForCat().tap{ loadModel() }
var results = []
people.each{ p ->
results << p.pets.collect { pet ->
var file = new File("resources/${pet.type.name()}.png")
PT petType = vg16ForCat.detectCat(file, 0.675d)
var desc = switch(petType) {
case PT.CAT -> 'is a cat'
case PT.DOG -> 'is a dog'
default -> 'is unknown'
}
"$pet.name $desc"
}
}
println results.flatten().join('\n')

Note that the model exceeds the maximum allowable size for normal github repos, so you should create it following the original repo instructions and then store the resulting model.zip in the resources folder.

When we run the script, we get the following output:


[main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [CpuBackend] backend
...
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [OPENBLAS]
...
============================================================================================================================================
VertexName (VertexType)                 nIn,nOut       TotalParams    ParamsShape                    Vertex Inputs
============================================================================================================================================
input_1 (InputVertex)                   -,-            -              -                              -
block1_conv1 (Frozen ConvolutionLayer)  3,64           1792           b:{1,64}, W:{64,3,3,3}         [input_1]
block1_conv2 (Frozen ConvolutionLayer)  64,64          36928          b:{1,64}, W:{64,64,3,3}        [block1_conv1]
block1_pool (Frozen SubsamplingLayer)   -,-            0              -                              [block1_conv2]
block2_conv1 (Frozen ConvolutionLayer)  64,128         73856          b:{1,128}, W:{128,64,3,3}      [block1_pool]
block2_conv2 (Frozen ConvolutionLayer)  128,128        147584         b:{1,128}, W:{128,128,3,3}     [block2_conv1]
block2_pool (Frozen SubsamplingLayer)   -,-            0              -                              [block2_conv2]
block3_conv1 (Frozen ConvolutionLayer)  128,256        295168         b:{1,256}, W:{256,128,3,3}     [block2_pool]
block3_conv2 (Frozen ConvolutionLayer)  256,256        590080         b:{1,256}, W:{256,256,3,3}     [block3_conv1]
block3_conv3 (Frozen ConvolutionLayer)  256,256        590080         b:{1,256}, W:{256,256,3,3}     [block3_conv2]
block3_pool (Frozen SubsamplingLayer)   -,-            0              -                              [block3_conv3]
block4_conv1 (Frozen ConvolutionLayer)  256,512        1180160        b:{1,512}, W:{512,256,3,3}     [block3_pool]
block4_conv2 (Frozen ConvolutionLayer)  512,512        2359808        b:{1,512}, W:{512,512,3,3}     [block4_conv1]
block4_conv3 (Frozen ConvolutionLayer)  512,512        2359808        b:{1,512}, W:{512,512,3,3}     [block4_conv2]
block4_pool (Frozen SubsamplingLayer)   -,-            0              -                              [block4_conv3]
block5_conv1 (Frozen ConvolutionLayer)  512,512        2359808        b:{1,512}, W:{512,512,3,3}     [block4_pool]
block5_conv2 (Frozen ConvolutionLayer)  512,512        2359808        b:{1,512}, W:{512,512,3,3}     [block5_conv1]
block5_conv3 (Frozen ConvolutionLayer)  512,512        2359808        b:{1,512}, W:{512,512,3,3}     [block5_conv2]
block5_pool (Frozen SubsamplingLayer)   -,-            0              -                              [block5_conv3]
flatten (PreprocessorVertex)            -,-            -              -                              [block5_pool]
fc1 (Frozen DenseLayer)                 25088,4096     102764544      b:{1,4096}, W:{25088,4096}     [flatten]
fc2 (Frozen DenseLayer)                 4096,4096      16781312       b:{1,4096}, W:{4096,4096}      [fc1]
predictions (OutputLayer)               4096,2         8194           b:{1,2}, W:{4096,2}            [fc2]
--------------------------------------------------------------------------------------------------------------------------------------------
            Total Parameters:  134268738
        Trainable Parameters:  8194
           Frozen Parameters:  134260544
============================================================================================================================================
...
Tabby is a cat
Dolly is a cat
Spot is a dog
Spike is a dog
Serpy is a cat
Tweety is unknown
Speedy is a dog
Fuzzy is unknown
Wuzzy is unknown

As we can see, it correctly predicted the cats (Tabby and Dolly) and dogs (Spot and Spike) but incorrectly thought a snake (Serpy) was a cat and a turtle (Speedy) was a dog. Given the lack of detail in the emoji images compared to the training images, this lack of accuracy isn't unexpected. We could certainly use better images or train our model differently if we wanted better results but it is fun to see our model not doing too badly even with emojis!


Deck of cards with Groovy, JDK collections and Eclipse Collections

by paulk


Posted on Friday September 23, 2022 at 10:18AM in Technology


Once again, Donald Raab has produced an interesting blog post on Eclipse Collections; this one shows some code for modelling and manipulating cards with Java 17 and Eclipse Collections. His related katas are highly recommended.

Here is the same example in Groovy (4.0.5 was used here) with just a few tweaks:

enum Rank { ACE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, JACK, QUEEN, KING }

enum Suit { SPADES, DIAMONDS, HEARTS, CLUBS }

@Sortable(includes='suit,rank')
record Card(Rank rank, Suit suit) { }

var cards = Sets.cartesianProduct(EnumSet.allOf(Rank), EnumSet.allOf(Suit), Card::new)
var cardsBySuit = cards.groupBy(Card::suit)
var cardsByRank = cards.groupBy(Card::rank)

assert [cards, cardsByRank, cardsBySuit]*.size() == [52, 13 ,4]

var random = new Random(42L)
var deck = cards.toList().shuffleThis(random).shuffleThis(random).shuffleThis(random).toStack()
(1..5).collect(i -> deck.pop(5).toSortedList()).each(this::println)

And here is the output:

[Card[rank=FOUR, suit=SPADES], Card[rank=FOUR, suit=DIAMONDS], Card[rank=SIX, suit=HEARTS], Card[rank=NINE, suit=CLUBS], Card[rank=JACK, suit=CLUBS]]
[Card[rank=FIVE, suit=DIAMONDS], Card[rank=TWO, suit=HEARTS], Card[rank=FIVE, suit=HEARTS], Card[rank=TEN, suit=CLUBS], Card[rank=QUEEN, suit=CLUBS]]
[Card[rank=FIVE, suit=SPADES], Card[rank=NINE, suit=SPADES], Card[rank=QUEEN, suit=SPADES], Card[rank=THREE, suit=DIAMONDS], Card[rank=TWO, suit=CLUBS]]
[Card[rank=EIGHT, suit=SPADES], Card[rank=TWO, suit=DIAMONDS], Card[rank=EIGHT, suit=DIAMONDS], Card[rank=KING, suit=HEARTS], Card[rank=FIVE, suit=CLUBS]]
[Card[rank=SIX, suit=SPADES], Card[rank=KING, suit=DIAMONDS], Card[rank=THREE, suit=HEARTS], Card[rank=TEN, suit=HEARTS], Card[rank=QUEEN, suit=HEARTS]]

We can do a similar example with the built-in JDK collections and make some additional tweaks for nicer output:

enum Rank {
ACE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, JACK, QUEEN, KING
String toString() { ['A', *'2'..'9', '10', 'J', 'Q', 'K'][ordinal()] }
}

enum Suit {
SPADES, DIAMONDS, HEARTS, CLUBS
String toString() { '♠♦♥♣'[ordinal()] }
}

@Sortable(includes='suit,rank')
record Card(Rank rank, Suit suit) {
Card(List srPair) { this(srPair[1], srPair[0]) }
String toString() { "${rank}${suit}" }
}

var cards = [Suit.values(), Rank.values()].combinations().collect(Card::new)
var cardsBySuit = cards.groupBy(Card::suit)
var cardsByRank = cards.groupBy(Card::rank)

assert [cards, cardsByRank, cardsBySuit]*.size() == [52, 13, 4]

var random = new Random(42L)
3.times { cards.shuffle(random) }
var deck = cards as Stack
5.times { println "Hand ${it+1}: ${(0..5).collect{ deck.pop() }.sort()}" }

println "Remaining cards sorted:\n${deck.sort()}"

Which has this output:

Cards3.png

Both these Groovy examples run on JDK versions from 8 to 19
(emulated records are used in JDK versions 8 to 15).

Updated 25/Sep/2022: Use same ordering for the JDK collections version.


Encryption and decryption with Groovy

by paulk


Posted on Monday September 19, 2022 at 02:34PM in Technology


Inspired by this recent blog entry, here is an example showing how to encrypt and decrypt with Groovy.

Using the JDK crypto classes

First, we need some text to encrypt. We'll use an excerpt of the one from the aforementioned blog post:

var text = 'Contrary to popular belief, Lorem Ipsum is not simply random text.\
It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.'

Next, we'll create a factory for our cipher instance, generate a key, and set up an initialization vector.

First, the cipher factory:

var factory = { Cipher.getInstance('AES/CBC/PKCS5Padding') }

For our cipher algorithm, we are using the Advanced Encryption Standard (AES) algorithm, in Cipher Block Chaining (CBC) mode, with PKCS5 padding. We'll look at other options later.

Next we generate our secret key. Our secret key is our password. Only someone who has the password will be able to decrypt the encrypted message. We could use any random bits for our key, but like passwords, we want to choose a strong key rather than a weak one. Cryptographic libraries provide classes to generate such keys. We just need to provide the key size. AES supports 128, 192 and 256 bit keys. We'll choose 192 here:

var key = generateKey('AES', 192)

Our code uses this helper method:

def generateKey(String algorithm, Integer size) {
var generator = KeyGenerator.getInstance(algorithm)
generator.init(size)
generator.generateKey()
}

Next, we generate an initialization vector:

var ivParameterSpec = randomParameterSpec(factory)

It uses this helper method (we're using the algorithm block size for our initialization vector size):

def randomParameterSpec(Closure<Cipher> factory) {
var block = new byte[factory().blockSize]
SecureRandom.instanceStrong.nextBytes(block)
new IvParameterSpec(block)
}

An initialization vector is used to introduce some additional randomness to avoid repeating patterns in the input leading to repeating patterns in the encrypted bytes.

With all these things in place, we are almost ready to encrypt or decrypt, but first, let's define two more helper methods:
def encrypt(byte[] bytes, Key key, IvParameterSpec spec, Closure<Cipher> factory) {
var cipher = factory()
cipher.init(ENCRYPT_MODE, key, spec)
cipher.doFinal(bytes)
}

def decrypt(byte[] bytes, Key key, IvParameterSpec spec, Closure<Cipher> factory) {
var cipher = factory()
cipher.init(DECRYPT_MODE, key, spec)
cipher.doFinal(bytes)
}
And here is how we encrypt and decrypt:
var encrypted = encrypt(text.bytes, key, ivParameterSpec, factory)
println "Encrypted bytes : $encrypted"
println "Encrypted text : ${new String(encrypted)}"

var decrypted = decrypt(encrypted, key, ivParameterSpec, factory)
println "Decrypted bytes : $decrypted"
println "Decrypted text : ${new String(decrypted)}"

Which has this output:

Encrypted bytes : [-117, 36, 18, 69, -101, -8, 35, 93, -102, -49, -12, ..., -19, -100]
Encrypted text : ‹$E›ø#]šÏôæ”Á˜çp^µ³=L(Ö^_ŒC>CIË„ö,1É8ÆŸ.Š?vßG,Èw‰å¼zÜf>?µ›D¹éÆk€	°˜2êÔ}í©àhl$>?¹¡Kå3ÔO?±&…êî¶Ê–¾°®q®à—0ú‘ÔhO<H¦ç®Ç”ÈhAëjó QPyƒy6Ĥ*´un¼ï¯m¨´ÙjeJtëº\ó6ƪKªœíœ
Decrypted bytes : [67, 111, 110, 116, 114, 97, 114, 121, 32, 116, 111, 32, ..., 100, 46]
Decrypted text : Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.

We can see that everything worked as expected, since the final output matches our original input text.

Using the Bouncy Castle library

We can alternatively, swap algorithms. There are numerous algorithms and modes supported by the JDK and others supported by third-party libraries. A nice summary can be found here.

We'll swap to use the CAST5 (CAST-128) algorithm which supports up to a 128-bit key. We'll use HMAC-SHA1 to generate our key.

import org.bouncycastle.jce.provider.BouncyCastleProvider
var bc = new BouncyCastleProvider()
factory = { Cipher.getInstance('CAST5', bc) }
key = generateKey('HmacSHA1', 128)
ivParameterSpec = randomParameterSpec(factory)

CAST5 is the default algorithm used in some versions of GPG and PGP. It isn't included by default in the JDK, so for this we'll use the Bouncy Castle library.

[Just as a note. If you are wanting to encrypt or decrypt GPG/PGP files, don't use the above code. Libraries like Bouncy Castle have dedicated classes for such scenarios.]

We now encrypt and decrypt as before:

encrypted = encrypt(text.bytes, key, ivParameterSpec, factory)
println "Encrypted text : ${new String(encrypted)}"
decrypted = decrypt(encrypted, key, ivParameterSpec, factory)
println "Decrypted text : ${new String(decrypted)}"

Which has this output:

Encrypted text : Mªá?r?v9£÷~4µT'›ÙÝÁl¿Þg¾0ñŽ¡?Ü=³9Q¬»3«ÖÁ¡µ ¾@4÷`FñÙŠfø7¥#›v¤Í–‰¼Ü¢ƒE6ôŽTÙlæÏz>o?àL›¡¢z1nÖo9]šOÔ¼SÔOÍ#Ý7LœÀî}ó5m%q•»l%/AWT´¢zH#t솱l¶£—Œ«©wˆÃ®>®Ü6ër-E
Decrypted text : Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.

Other useful functionality

Passing around binary data like our secret key or the encrypted data, has its own problems. Groovy provides extension methods to encode such data (and corresponding decode methods). For example, we can encode our secret key in various ways:

var keyBytes = key.encoded
println keyBytes.encodeHex()
println keyBytes.encodeBase64()
println keyBytes.encodeBase64Url()

Which has this output (the key is random, so the output will differ for each run):

85a0d3f0ce0cbe6402dc9579fbffcf1d
haDT8M4MvmQC3JV5+//PHQ==
haDT8M4MvmQC3JV5-__PHQ

Groovy also provides extension methods for various checksums (but you might want to look at stronger checksum algorithms in security sensitive scenarios):

println "SHA256 : ${text.sha256()}"
println "MD5 : ${text.md5()}"

Which has this output:

SHA256 : ccb184e35e4c32bafc730d84ec924ea2980035ea5fadb012e3b2b31abf4323c9
MD5 : 46c61a174c2dc99204521ca89f09f63c

If you are encrypting and decrypting entire files, the JDK has special classes for that too which are also easy to use from Groovy. That's all for now.

References

Conclusion

We have taken a brief look at encrypting and decrypting with Apache Groovy.


Calculating Fibonacci with Groovy revisited

by paulk


Posted on Thursday September 08, 2022 at 10:59AM in Technology


In a recent post, we looked at using Matrices with Groovy including using matrices to calculate Fibonacci terms. But do you need that complexity to calculate Fibonacci? Not, not at all. You can do various one-liners for that scenario (to repeat the calculation from that post):

Stream.iterate([1, 1], f -> [f[1], f.sum()]).limit(8).toList()*.head()

The previous post was more about using matrices than Fibonacci per se, though hopefully learning that the Fibonacci matrix was a specialisation of the Leslie matrix was an added bonus.

Let's have a look at a few other options to write Fibonacci methods in Groovy.

Iterative style

Unless you learned a functional programming language as your first language, you may have written an iterative Factorial or Fibonacci as one of your first programming learning exercises. Such an algorithm for Fibonacci could look something like this:

def fib(n) {
if (n <= 1) return n

def previous = n.valueOf(1), next = previous, sum
(n-2).times {
sum = previous
previous = next
next = sum + previous
}
next
}

assert fib(10) == 55
assert fib(50L) == 12586269025L
assert fib(100G) == 354224848179261915075G

The only interesting part to this solution is the use of dynamic idioms. We didn't provide an explicit type for n, so duck-typing means the method works fine for Integer, Long and BigInteger values. This implementation does all calculations using the type of the supplied n, so the user of the method controls that aspect.

Groovy gives the option to also specify an explicit type like Number or use TypeChecked or CompileStatic for type inference if you wanted. We'll see an example of those options later.

Recursive

Once you mastered iterative programming, your next programming learning exercise may have been the recursive version of Factorial or Fibonacci. For Fibonacci, you may have coded something like this:

def fib(n) {
if (n <= 1) return n
fib(n - 1) + fib(n - 2)
}

assert fib(10) == 55
assert fib(50L) == 12586269025L
assert fib(100G) == 354224848179261915075G

This naïve version is incredibly inefficient. Calling fib(6) ends up calculating fib(2) five times for instance:

There are several ways to avoid that repetition. One option is to use the @Memoized annotation. Adding this annotation on a method causes the compiler to inject appropriate code for caching results into the method. This is ideal for pure functions like Fibonacci since they always return the same output for a given input. There are annotation attributes to adjust how big such a cache might be, but that sophistication isn't needed here.

@Memoized
def fib(n) {
if (n <= 1) return n
fib(n - 1) + fib(n - 2)
}

assert fib(10) == 55
assert fib(50L) == 12586269025L
assert fib(100G) == 354224848179261915075G

This now runs just as quickly as the iterative method. If you happened to use a Closure instead of a method, you can call one of the memoize methods on Closure.

A problem with this approach (in fact recursion in general) is that you hit a stack overflow exception for larger values of n, e.g. fib(500G). Groovy supports tail call elimination with the inclusion of the TailRecursive annotation. In this case the compiler injects an "unrolled" non-recursive version of the algorithm. In order for the "unrolling" to succeed, the algorithm needs to be re-worked so that at most one call to fib occurs in any return statement. Here is a version of the algorithm re-worked in this way:

@TailRecursive
static fib(n, a, b) {
if (n == 0) return a
if (n == 1) return b
fib(n - 1, b, a + b)
}

assert fib(10, 0, 1) == 55
assert fib(50L, 0L, 1L) == 12586269025L
assert fib(100G, 0G, 1G) == 354224848179261915075G
assert fib(500G, 0G, 1G) == 139423224561697880139724382870407283950070256587697307264108962948325571622863290691557658876222521294125G

This is slightly more complicated to understand than the original if you haven't seen it before but now is both efficient and handles large values of n. We can compile statically for even faster speed like this:

@TailRecursive
@CompileStatic
static fib(Number n, Number a, Number b) {
if (n == 0) return a
if (n == 1) return b
fib(n - 1, b, a + b)
}

assert fib(10, 0, 1) == 55
assert fib(50L, 0L, 1L) == 12586269025L
assert fib(100G, 0G, 1G) == 354224848179261915075G
assert fib(500G, 0G, 1G) == 139423224561697880139724382870407283950070256587697307264108962948325571622863290691557658876222521294125G

If you are using a Closure, you would look at using the trampoline method on Closure to achieve a similar result.

Streams

We saw the Stream based "one-liner" solution at the start of this blog post. Let's adopt the duck-typing idioms we have used so far and define a fib method. It could look like this:

def fib(n) {
def zero = n.valueOf(0)
def one = n.valueOf(1)
Stream.iterate([zero, one], t -> [t[1], t.sum()])
.skip(n.longValue())
.findFirst().get()[0]
}

assert fib(10) == 55
assert fib(50L) == 12586269025L
assert fib(100G) == 354224848179261915075G

Bytecode and AST transforms

Finally, just so you know all your options, here is a version using the @Bytecode AST transform which lets you write JVM bytecode directly in your Groovy! Note this falls into the category of "don't ever ever do this" but just so you know you can, it is included here:

@Bytecode
int fib(int i) {
l0
iload 1
iconst_2
if_icmpgt l1
iconst_1
_goto l2
l1
frame SAME
aload 0
iload 1
iconst_2
isub
invokevirtual '.fib','(I)I'
aload 0
iload 1
iconst_1
isub
invokevirtual '.fib', '(I)I'
iadd
l2
frame same1,'I'
ireturn
}

assert fib(10) == 55

Please read the caveats for that transform before considering using it for anything but extreme situations. It's meant more as a fun thing to try than something anyone would want to do in production.

Have fun writing your own algorithms!



Solving cryptarithmetic puzzles with Groovy and constraint programming using Choco, JaCoP, and OR-Tools

by paulk


Posted on Monday September 05, 2022 at 01:43PM in Technology



Introduction

When writing solutions to problems, we frequently strive to hide away implementation details. In Object-oriented (OO) programming, we might build a rich hierarchy of classes with well-thought out methods so that our final solution can be expressed in terms of simple nouns and verbs (methods and class instances) in our domain model. When applying functional programming idioms, we will strive to emphasise the relationship between inputs and outputs and hide away side effects and iterative steps. Constraint programming (within the same family as logic programming) also strives to hide away details. Instead of expressing an iterative implementation, it focuses on expressing declarative properties of a solution. A solver is responsible for working out the exact implementation steps.

When using constraint programming, we develop a model consisting of variables, the domain of values each variable may hold, and additional constraints between the variables. A solver does all the work. It may employ heuristic searching, inference, propagation, symmetry and backtracking to find possible solutions. We may be able to (or want to, or need to) guide the solver as to which techniques and strategies it should use. Constraint programming has been used to solve various kinds of problems including scheduling problems, and excels at problems with combinatorial possibilities that are too irregular for other mathematical optimisations. Frequently used illustrative problems are Sudoku and Wordle solvers, N-Queen problems, and other kinds of puzzles. We'll just look at cryptarithmetic puzzles.

Cryptarithmetic Problems

Cryptarithmetic problems (also known as alphametics, verbal arithmetic, cryptarithm, and word addition) are a type of mathematical game where a mathematical equation is presented where digits in the equation are replaced by letters. Traditionally, each letter usually represents a unique number, and numbers don't start with the digit zero. If we look at one sample problem:

TO
+GO
=OUT

We can reason about what the solution can be by hand:

  • T, O, U, and G must be all different (game rule)
  • T, G, and O will be between 1 and 9, and U is between 0 and 9 (game rule)
  • If we added the two biggest 2-digit numbers, (99 + 99) we'd get 198, so O must be 1
  • Looking at the right-most "units" column, 1 + 1 equals 2, so T must be 2
  • Looking at the "tens" column, we know there is a carry of 1 (since O is 1) and we know T is 2, so G must be 8 or 9. If G was 9, U would be 1 but it can't be the same as O, so G must be 8 and U must be 0.

When solving by hand, we typically reason about individual columns and account for the "carry" to the next column. We'll come back to this point later but first, let's look at a slightly bigger problem:

SEND
+MORE
=MONEY

Solving with Brute Force

This problem isn't huge, so we can solve with brute force. We simply try all possible values for the letters in the puzzle:

for (s in 1..9)
for (e in 0..9)
for (n in 0..9)
for (d in 0..9)
for (m in 1..9)
for (o in 0..9)
for (r in 0..9)
for (y in 0..9)
if ([s, e, n, d, m, o, r, y].toSet().size() == 8) {
def send = 1000 * s + 100 * e + 10 * n + d
def more = 1000 * m + 100 * o + 10 * r + e
def money = 10000 * m + 1000 * o + 100 * n + 10 * e + y
if (send + more == money) {
println "s = $s, e = $e, n = $n, d = $d"
println "m = $m, o = $o, r = $r, y = $y"
}
}

This isn't very efficient though. It calculates 81 million combinations for the variables before skipping all but 1.5 million of them (since most won't be unique). All up it might execute in the low tens of seconds.

Alternatively, Groovy supports calculating permutations, so we can simplify our solution to a single for loop (with some tests to eliminate unhelpful iterations):

def digits = 0..9
for (p in digits.permutations()) {
if (p[-1] < p[-2]) continue
def (s, e, n, d, m, o, r, y) = p
if (s == 0 || m == 0) continue
def send = 1000 * s + 100 * e + 10 * n + d
def more = 1000 * m + 100 * o + 10 * r + e
def money = 10000 * m + 1000 * o + 100 * n + 10 * e + y
if (send + more == money) {
println "s = $s, e = $e, n = $n, d = $d"
println "m = $m, o = $o, r = $r, y = $y"
}
}

This has the advantage of only generating unique combinations. It will execute in seconds.

Running either of these solutions yields:

s = 9, e = 5, n = 6, d = 7
m = 1, o = 0, r = 8, y = 2

Using Constraint Programming

For the brute force approaches, we had a condition which checked any potential candidate answer to see if it was a correct solution. We had to be very explicit in how we wanted the potential candidates to be created. For constraint programming, we instead define variables to represent the problem, any known bounds on those variables, and we specify any other known properties of the solution, which in our case will be something similar to the condition we had to check if the answer was correct previously. Let's examine how to do that with three libraries, one with a variation.

Choco

Here is the code using the Choco library:

new Model("SEND+MORE=MONEY").with {
def (S, M) = ['S', 'M'].collect { intVar(it, 1, 9) }
def (E, N, D, O, R, Y) = ['E', 'N', 'D', 'O', 'R', 'Y'].collect { intVar(it, 0, 9) }

allDifferent(S, E, N, D, M, O, R, Y).post()

IntVar[] ALL = [
S, E, N, D,
M, O, R, E,
M, O, N, E, Y ]
int[] COEFFS = [
1000, 100, 10, 1,
1000, 100, 10, 1,
-10000, -1000, -100, -10, -1 ]

scalar(ALL, COEFFS, "=", 0).post()

println solver.findSolution()
}

We define our variables and their bounds (domain). We use an allDifferent global constraint to specify the uniqueness requirement and a scalar constraint that ensures that our variables multiplied by their respective scalar coefficients equal 0. This lets us factor in whether the particular variable is representing the "units" column, the "10s" column, the "100s" column etc. This captures the "puzzle addition" constraint. We then ask the solver to find the solution. We could just as easily have asked for all solutions (if more than one existed).

This is typical of how we solve such problems. We either define constraints directly between one or more variables or use whatever global constraints our library might support. If our library doesn't support the constraint we need, we find a way to express it using multiple simpler constraints.

The end result is that our code is more declarative than our brute force approaches, and the solution is found in tens of milliseconds. The solver has very efficient strategies for solving such puzzles.

JaCoP

We can solve the same problem using JaCoP:

def store = new Store()
def (S, M) = ['S', 'M'].collect { new IntVar(store, it, 1, 9) }
def (E, N, D, O, R, Y) = ['E', 'N', 'D', 'O', 'R', 'Y'].collect { new IntVar(store, it, 0, 9) }
var ctr = new Alldifferent(S, E, N, D, M, O, R, Y)
store.impose(ctr)

IntVar[] ALL = [
S, E, N, D,
M, O, R, E,
M, O, N, E, Y ]
int[] COEFFS = [
1000, 100, 10, 1,
1000, 100, 10, 1,
-10000, -1000, -100, -10, -1 ]
var lin = new LinearInt(ALL, COEFFS, "==", 0)
store.impose(lin)

var label = new DepthFirstSearch()
var select = new InputOrderSelect(store, ALL, new IndomainMin())
label.labeling(store, select)

There are some slight differences in this API, but nearly everything has a one-to-one correspondence to what we saw earlier. We are explicitly selecting search strategies and selection strategies here whereas with Choco, defaults were chosen for us. In both cases, explicit creation of such classes allows the strategies to be altered for particular scenarios if needed.

When run, the output looks like this:

Labeling has finished with return value of true
DFS1: DFS([S = 9, E = 5, N = 6, D = 7, M = 1, O = 0, R = 8, Y = 2], InputOrder, (org.jacop.search.IndomainMin@45394b31))

We can see here the code is very similar as is the execution time.

OR-Tools

We can repeat the solution using OR-Tools. Here is the code:

Loader.loadNativeLibraries()

new Solver('Send+More=Money').with {
def s = makeIntVar(1, 9, 's')
def e = makeIntVar(0, 9, 'e')
def n = makeIntVar(0, 9, 'n')
def d = makeIntVar(0, 9, 'd')
def m = makeIntVar(1, 9, 'm')
def o = makeIntVar(0, 9, 'o')
def r = makeIntVar(0, 9, 'r')
def y = makeIntVar(0, 9, 'y')

IntVar[] all = [s, e, n, d, m, o, r, y]
IntVar[] scalar = [s, e, n, d, m, o, r, e, m, o, n, e, y]
int[] coeffs = [
1000, 100, 10, 1, // S E N D +
1000, 100, 10, 1, // M O R E =
-10000, -1000, -100, -10, -1 // M O N E Y
]

addConstraint(makeScalProdEquality(scalar, coeffs, 0))
addConstraint(makeAllDifferent(all))

def db = makePhase(all, INT_VAR_DEFAULT, INT_VALUE_DEFAULT)
newSearch(db)
while (nextSolution()) {
println all.join(' ')
}
endSearch()

// Statistics
println "Solutions: ${solutions()}"
println "Failures: ${failures()}"
println "Branches: ${branches()}"
println "Wall time: ${wallTime()}ms"
}

which has this output:

s(9) e(5) n(6) d(7) m(1) o(0) r(8) y(2)
Solutions: 1
Failures: 5
Branches: 10
Wall time: 60ms

OR-Tools is written in C++ but has interfaces for numerous languages including Java - which is perfect for Groovy use.

Choco with JSR331

It is great to have multiple libraries to pick from but having a standard API can help switching between such libraries. This is where JSR331 comes in. It defines a standard API for interacting with constraint solvers and linear solves. Here we use a JSR331 implementation backed by an earlier version of the Choco library. The code looks like this:

import javax.constraints.*

ProblemFactory.newProblem('SEND+MORE=MONEY').with {
def (S, M) = ['S', 'M'].collect { variable(it, 1, 9) }
def (E, N, D, O, R, Y) = ['E', 'N', 'D', 'O', 'R', 'Y'].collect { variable(it, 0, 9) }

postAllDifferent(S, E, N, D, M, O, R, Y)

Var[] ALL = [
S, E, N, D,
M, O, R, E,
M, O, N, E, Y]
int[] COEFFS = [
1000, 100, 10, 1,
1000, 100, 10, 1,
-10000, -1000, -100, -10, -1]

post(COEFFS, ALL, '=', 0)

def solver = getSolver()
def solution = solver.findSolution()
println solution ?: 'No solution'
solver.logStats()
}
It is quite similar to earlier examples but now exclusively uses the JSR331 classes in the javax.constraint package. There are implementations of those classes backed by several implementations. So, indeed it would be possible to swap between them. When run, the output is:
Solution #1:
	 S[9] M[1] E[5] N[6] D[7] O[0] R[8] Y[2]

Having said that, at the time of writing, JSR331 popularity doesn't appear to be on the rise. Most folks using constraint programming libraries seem to be using the direct library classes. Indeed, the version of the Choco implementation used by the JSR331 implementation is over 10 years old.

Incorporating Carry

The scalar product global constraint we have used in the previous examples is very powerful and probably would be our first choice for this problem. We can, however, model constraint programming problems in multiple ways, so let's look at a solution that avoids that global constraint.

Instead, we will develop a model that mirrors how we reasoned about the original TO + GO = OUT problem that we solved by hand. For that, we just considered a column at a time and accounted for the carry. We'll explicitly introduce variables to hold the carry (0 if no carry, or 1 if there is a carry) into our model. Then we'll express the mathematical constraints that are applicable for each column.

Here is the code:

new Model("SEND+MORE=MONEY").with {
def (S, M) = ['S', 'M'].collect { intVar(it, 1, 9) }
def (E, N, D, O, R, Y) = ['E', 'N', 'D', 'O', 'R', 'Y'].collect { intVar(it, 0, 9) }
def C = (0..3).collect{ intVar("C$it", 0, 9) }

allDifferent(S, E, N, D, M, O, R, Y).post()
C[3] .eq(M).post() // C3 C2 C1 C0
C[2].add(S).add(M).eq(O.add(C[3].mul(10))).post() // S E N D
C[1].add(E).add(O).eq(N.add(C[2].mul(10))).post() // M O R E
C[0].add(N).add(R).eq(E.add(C[1].mul(10))).post() // -------------
D .add(E).eq(Y.add(C[0].mul(10))).post() // M O N E Y

println solver.findSolution()
}

We can see that there is now no scalar product global constraint any more but instead the constraints for each column.

When run, the output looks like this:

Solution: S=9, M=1, E=5, N=6, D=7, O=0, R=8, Y=2, C0=1, C1=1, C2=0, C3=1, sum_exp_1=9,
sum_exp_2=10, (C3*10)=10, sum_exp_3=10, sum_exp_4=6, sum_exp_5=6, (C2*10)=0, sum_exp_6=6,
sum_exp_7=7, sum_exp_8=15, (C1*10)=10, sum_exp_9=15, sum_exp_10=12, (C0*10)=10, sum_exp_11=12,

We can see that as we were defining our constraints for each column, subexpressions were being created in the model which are reflected in the solution. They are if you like, temporary calculations along the way to getting the answer - or more accurately a snapshot of ever changing temporary calculations. They don't form part of the answer that interests us, so we would be free to just print out the part of the solution which interests us if we wanted.

Creating a DSL

The previous example has lots of calls to add and mul methods. We can create a little bit of a DSL to provide some syntactic sugar to our previous examples to allow use of Groovy's operator overloading, support ranges when specifying the domain of a variable, and a few other niceties. Our code becomes:

model("SEND+MORE=MONEY") {
def (S, M) = ['S', 'M'].collect { intVar(it, 1..9) }
def (E, N, D, O, R, Y) = ['E', 'N', 'D', 'O', 'R', 'Y'].collect { intVar(it, 0..9) }
def C = intVarArray(4, 0..1)

[allDifferent(S, E, N, D, M, O, R, Y), // C3 C2 C1 C0
C[3] .eq(M), // S E N D
(C[2] + S + M).eq(O + C[3] * 10), // M O R E
(C[1] + E + O).eq(N + C[2] * 10), // -------------
(C[0] + N + R).eq(E + C[1] * 10), // M O N E Y
(D + E).eq(Y + C[0] * 10)]*.post()

println solver.findSolution()
}

It has the same output as previously.

You might wonder how the solver finds the solution. You can watch the variables in the debugger and use tools like choco-cpviz but it is a quite convoluted process until you are used to it. We'll try to give you a flavor of what is going on here. Basically, there will be various steps of pruning wherever possible and branching with possible backtracking. Below are some snapshots for our example above.

To start with, we have nearly 90 light green squares which represents our problem search space. We walk our way through the rules looking for ways to prune the search space:

choco_step1.png

 choco_step2.png

 choco_step3.png

choco_step4.png

choco_step5.png

choco_step6.png

choco_step7.png

choco_step8.png

choco_step9.png

choco_step10.png

choco_step11.png

choco_step12.png

choco_step13.png

choco_step14.png

As we are locking in the value of variables, we can substitute them into and simplify our constraints. When we reapply them, they will be quicker to evaluate and may reveal more information.

At this point we only have 2 of our variables locked down but our search space is nearly half what we started with and we have simplified some of our constraints. We would continue branching and solving at this point until we find our solution or determine that no solution is possible.

Looking at other languages

The example repo also contains solutions for this problem in other languages so you can compare and contrast including Clojure, Haskell (Frege), Java, JavaScript (Nashorn), Ruby (JRuby), Python (Jython), Kotlin, Lua (Luaj), Prolog (tuprolog), and Scala.

other language logos

Other examples

To wrap up, let's look at solving a few more examples (using Choco). We'll solve some of the examples from an interesting blog on the history of Cryptarithmetic problems:

  • ABCD * 4 = DCBA
  • AA + BB + CC = ABC
  • HALF + HALF = WHOLE
  • HALF + FIFTH + TENTH + TENTH + TENTH = WHOLE

Here is the code:

new Model("ABCD*4=DCBA").with {
def (A, D) = ['A', 'D'].collect { intVar(it, 1, 9) }
def (B, C) = ['B', 'C'].collect { intVar(it, 0, 9) }
def R = (0..2).collect { intVar(0, 9) }

allDifferent(A, B, C, D).post()
R[2].add(A.mul(4)).eq(D).post()
R[1].add(B.mul(4)).eq(C.add(R[2].mul(10))).post()
R[0].add(C.mul(4)).eq(B.add(R[1].mul(10))).post()
D.mul(4).eq(A.add(R[0].mul(10))).post()
solver.findAllSolutions().each {
println "$name: ${pretty(it, [A, B, C, D, ' * 4 = ', D, C, B, A])}\n$it\n"
}
}

new Model("AA+BB+CC=ABC").with {
def (A, B, C) = ['A', 'B', 'C'].collect { intVar(it, 1, 9) }
allDifferent(A, B, C).post()
A.mul(11).add(B.mul(11).add(C.mul(11))).eq(A.mul(100).add(B.mul(10)).add(C)).post()
solver.findAllSolutions().each {
println "$name: ${pretty(it, [A, A, ' + ', B, B, ' + ', C, C, ' = ', A, B, C])}\n$it\n"
}
}

new Model("HALF+HALF=WHOLE").with {
def (H, W) = ['H', 'W'].collect { intVar(it, 1, 9) }
def (A, E, F, L, O) = ['A', 'E', 'F', 'L', 'O'].collect { intVar(it, 0, 9) }
allDifferent(H, W, A, E, F, L, O).post()
IntVar[] ALL = [
H, A, L, F,
W, H, O, L, E]
int[] COEFFS = [
2000, 200, 20, 2,
-10000, -1000, -100, -10, -1]
scalar(ALL, COEFFS, "=", 0).post()
solver.findAllSolutions().each {
println "$name: ${pretty(it, [H, A, L, F, ' + ', H, A, L, F, ' = ', W, H, O, L, E])}\n$it\n"
}
}

new Model("HALF+FIFTH+TENTH+TENTH+TENTH=WHOLE").with {
def (H, F, T, W) = ['H', 'F', 'T', 'W'].collect { intVar(it, 1, 9) }
def (A, L, I, E, N, O) = ['A', 'L', 'I', 'E', 'N', 'O'].collect { intVar(it, 0, 9) }
allDifferent(H, F, T, W, A, L, I, E, N, O).post()
IntVar[] ALL = [
H, A, L, F,
F, I, F, T, H,
T, E, N, T, H,
T, E, N, T, H,
T, E, N, T, H,
W, H, O, L, E]
int[] COEFFS = [
1000, 100, 10, 1,
10000, 1000, 100, 10, 1,
10000, 1000, 100, 10, 1,
10000, 1000, 100, 10, 1,
10000, 1000, 100, 10, 1,
-10000, -1000, -100, -10, -1]
scalar(ALL, COEFFS, "=", 0).post()
solver.findAllSolutions().each {
def parts = [H, A, L, F, '+', F, I, F, T, H, '+', T, E, N, T, H, '+',
T, E, N, T, H, '+', T, E, N, T, H, '=', W, H, O, L, E]
println "$name: ${pretty(it, parts)}\n$it\n"
}
}

// helper method to print solutions
def pretty(model, parts) {
parts.collect { p -> p instanceof IntVar ? model.getIntVal(p) : p }.join()
}

which has this output:

ABCD*4=DCBA: 2178 * 4 = 8712
Solution: A=2, D=8, B=1, C=7, IV_1=3, IV_2=3, IV_3=0, (A*4)=8, sum_exp_4=8, (B*4)=4, ..., 

AA+BB+CC=ABC: 11 + 99 + 88 = 198
Solution: A=1, B=9, C=8, (A*11)=11, (B*11)=99, (C*11)=88, ..., 

HALF+HALF=WHOLE: 9604 + 9604 = 19208
Solution: H=9, W=1, A=6, E=8, F=4, L=0, O=2, 

HALF+HALF=WHOLE: 9703 + 9703 = 19406
Solution: H=9, W=1, A=7, E=6, F=3, L=0, O=4, 

HALF+HALF=WHOLE: 9802 + 9802 = 19604
Solution: H=9, W=1, A=8, E=4, F=2, L=0, O=6, 

HALF+FIFTH+TENTH+TENTH+TENTH=WHOLE: 6701+14126+25326+25326+25326=96805
Solution: H=6, F=1, T=2, W=9, A=7, L=0, I=4, E=5, N=3, O=8, 

You should see the common patterns used for solving these puzzles.

Further Information

Conclusion

We have looked at using Groovy and a few constraint programming libraries to solve a cryptarithmetic puzzles. Why not try solving some of your own puzzles.


Groovy List Processing Cheat Sheet

by paulk


Posted on Sunday August 28, 2022 at 08:46AM in Technology


Declaring lists

Several styles are supported for declaring lists:

var pets    = ['cat', 'canary', 'dog', 'fish', 'gerbil']  // idiomatic Groovy
var nums = [1, 2, 3, 4] as LinkedList // use 'as' for other kinds of list
var primes = List.of(2, 3, 5, 7) // Java 9+ style (immutable)
var range = 1..4 // a range is also a list (immutable)
var bigNums = [1000, 2000].asUnmodifiable() // unmodifiable (backed by original)
var negNums = [-100, -200].asImmutable() // immutable (backed by copy)

List elements and properties

Java methods for accessing list elements and list properties:

assert !pets.isEmpty()
assert pets.size() == 5
assert pets.get(0) == 'cat'
assert pets.contains('dog')
assert pets.containsAll('cat', 'dog')
pets.forEach { assert it.size() > 2 }
assert ['a', 'b', 'a'].indexOf('a') == 0
assert ['a', 'b', 'a'].lastIndexOf('a') == 2

Groovy extensions for accessing list elements and list properties:

assert pets[0] == 'cat'
assert pets?[0] == 'cat' // safe indexing returns null if pets was null instead of NPE
assert pets.first() == 'cat'
assert pets.head() == 'cat'
assert pets[-1] == 'gerbil'
assert pets[1..3] == ['canary', 'dog', 'fish']
assert pets[3..1] == ['fish', 'dog', 'canary'] // reverse range
assert pets[1, 3, 3] == ['canary', 'fish', 'fish'] // arbitrary collection
assert pets[0..1, [3, 3]] == ['cat', 'canary', 'fish', 'fish'] // nested collections
assert [1, 2, 3, 1].count(1) == 2
assert [1, 2, 3, 4].min() == 1
assert [1, 2, 3, 4].max() == 4
assert [1, 2, 3, 4].sum() == 10
assert [1, 2, 3, 4].average() == 2.5
[1, 2, 3, 4].eachWithIndex{ val, idx -> assert val == idx + 1 }
def cpets = pets[0..1]
assert cpets == ['cat', 'canary']
assert pets.findAll { it =~ /c.*/ } == cpets
assert pets.find { it =~ /c.*/ } == cpets[0]
assert pets.grep(~/c.*/) == cpets
assert cpets.min { it.size() } == 'cat' // cpet with smallest size name
assert cpets.max { it.size() } == 'canary' // cpet with largest name
assert pets.groupBy { it.size() } == [3: ['cat', 'dog'], 4: ['fish'], 6: ['canary', 'gerbil']]
assert pets.countBy { it.size() } == [3: 2, 4: 1, 6: 2]
assert pets.sum{ it.size() } == 22 // total size of all pet names
assert pets.average{ it.size() } == 4.4 // average size of pet names
assert pets.indices == 0..4
assert pets.findIndexValues { it.size() == 6 } == [1, 4]
assert cpets.indexed() == [0: 'cat', 1: 'canary']
assert cpets.withIndex()*.toList() == [['cat', 0], ['canary', 1]]

Modifying mutable lists

Methods from Java for modifying lists:

pets.remove(2);                        assert pets == ['cat', 'canary', 'fish', 'gerbil']  // remove by index
pets.remove('fish'); assert pets == ['cat', 'canary', 'gerbil'] // remove by element
pets.removeIf(p -> p.startsWith("c")); assert pets == ['gerbil'] // remove by condition
pets.clear(); assert pets.isEmpty() // make empty
pets.add("kangaroo"); assert pets == ['kangaroo'] // add element
pets.add(0, "koala"); assert pets == ['koala', 'kangaroo'] // add element at index position
pets.addAll(['ant', 'bee']); assert pets == ['koala', 'kangaroo', 'ant', 'bee'] // add collection
pets.addAll(1, ['ant']); assert pets == ['koala', 'ant', 'kangaroo', 'ant', 'bee'] // add collection at index
pets.removeAll(['ant', 'flea']); assert pets == ['koala', 'kangaroo', 'bee'] // remove from collection
pets.retainAll(['bee', 'koala']); assert pets == ['koala', 'bee'] // retain from collection
pets.set(0, "zebra"); assert pets == ['zebra', 'bee'] // set element at index
pets.replaceAll(String::toUpperCase); assert pets == ['ZEBRA', 'BEE'] // replace elements by unary operator
pets.sort(Comparator.naturalOrder()); assert pets == ['BEE', 'ZEBRA'] // sort

Groovy extensions for modifying lists:

pets << 'rock';                        assert pets == ['BEE', 'ZEBRA', 'rock']  // leftShift append
pets += ['rabbit', 'rock', 'hare']; assert pets == ['BEE', 'ZEBRA', 'rock', 'rabbit', 'rock', 'hare'] // append collection
pets.unique(); assert pets == ['BEE', 'ZEBRA', 'rock', 'rabbit', 'hare'] // remove duplicates
pets.sort{ it.size() }; assert pets == ['BEE', 'rock', 'hare', 'ZEBRA', 'rabbit'] // sort by size
pets[0] = 'ant'; assert pets == ['ant', 'rock', 'hare', 'ZEBRA', 'rabbit'] // replace element by index
pets[1..3] = ['snake', 'SNAKE']; assert pets == ['ant', 'snake', 'SNAKE', 'rabbit'] // replace range of elements
pets.unique{it.toLowerCase() }; assert pets == ['ant', 'snake', 'rabbit'] // remove duplicates ignoring case
pets.reverse(true); assert pets == ['rabbit', 'snake', 'ant'] // flip
pets.shuffle(); assert pets.size() == 3; // shuffle elements; resulting order will vary
def dice = [1, 2, 3, 4, 5, 6]
dice.removeAt(2); assert dice == [1, 2, 4, 5, 6]
dice.removeElement(2); assert dice == [1, 4, 5, 6]
dice.removeLast(); assert dice == [1, 4, 5]
dice.swap(0, 2); assert dice == [5, 4, 1]

Additional list functionality

Methods and operators which return new lists or values:

assert [1, 2, 3] + [1] == [1, 2, 3, 1]
assert [1, 2, 3, 1] - [1] == [2, 3]
assert [1, 2, 3] * 2 == [1, 2, 3, 1, 2, 3]
assert [1, [2, 3]].flatten() == [1, 2, 3]
assert [1, 2, 3].disjoint([4, 5, 6])
assert [1, 2, 3].intersect([4, 3, 1]) == [1, 3]
assert [1, 2, 3].collect { it + 3 } == [4, 5, 6]
assert [1, 2, 3, 4].collect { 1 } == [1, 1, 1, 1]
assert [4, 2, 1, 3].findAll { it % 2 == 0 } == [4, 2]
assert [1, 2, 3, 4].take(3) == [1, 2, 3]
assert [1, 2, 3, 4].takeRight(3) == [2, 3, 4]
assert [1, 2, 3, 4].takeWhile{ it < 3 } == [1, 2]
assert [1, 2, 3, 4].drop(2) == [3, 4]
assert [1, 2, 3, 4].dropRight(2) == [1, 2]
assert [1, 2, 3, 4].dropWhile{it < 3 } == [3, 4]
assert [1, 2, 3, 4].join('-') == '1-2-3-4'
assert [1, 2, 3, 4].tail() == [2, 3, 4]
assert [1, 2, 3, 4].init() == [1, 2, 3]
assert [1, 2, 3, 4].tails() == [[1, 2, 3, 4], [2, 3, 4], [3, 4], [4], []]
assert [1, 2, 3, 4].inits() == [[1, 2, 3, 4], [1, 2, 3], [1, 2], [1], []]
assert [1, 2, 3, 4].reverse() == [4, 3, 2, 1]
assert [1, 2, 3, 1].toUnique() == [1, 2, 3]
assert [1, 2, 3, 1].toSorted() == [1, 1, 2, 3]
assert [1, 2, 3, 4].collect { it * 2 } == [2, 4, 6, 8]
assert [[1, 2], [3, 4]].collectNested { it * 2 } == [[2, 4], [6, 8]]
def squaresAndCubesOfEvens = { it % 2 ? [] : [it**2, it**3] }
assert [1, 2, 3, 4].collectMany(squaresAndCubesOfEvens) == [4, 8, 16, 64]
assert [1, 2, 3, 4].any { it > 3 }
assert [1, 2, 3, 4].every { it < 5 }
assert ![1, 2, 3, 4].every { it > 3 }
assert [1, 2, 3, 4].find { it > 2 } == 3
assert [1, 2, 3, 4].findAll { it > 2 } == [3, 4]
assert [1, 2, 3, 4].findIndexOf { it > 2 } == 2
assert [1, 2, 3, 1].findLastIndexOf { it > 2 } == 2
assert [1, 2, 3, 4].inject { acc, i -> acc + i } == 10
assert (1..10).collate(3) == [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
assert (1..10).chop(1, 3, 2, -1) == [[1], [2, 3, 4], [5, 6], [7, 8, 9, 10]]
assert [1,2,3].permutations().toList() == [
[1, 2, 3], [3, 2, 1], [2, 1, 3], [3, 1, 2], [1, 3, 2], [2, 3, 1]
]
def matrix = [['a', 'b'], [ 1 , 2 ]]
assert matrix.transpose() == [ ['a', 1], ['b', 2] ]
assert matrix.combinations() == [ ['a', 1], ['b', 1], ['a', 2], ['b', 2] ]
assert [1, 2, 3].subsequences()*.toList() == [[1], [1, 2, 3], [2], [2, 3], [1, 2], [3], [1, 3]]
def answers = [1, 2, 3].withDefault{ 42 }
assert answers[2] == 3 && answers[99] == 42

GINQ processing

Groovy also supports language integrated query support to process lists:

// squares of odd numbers between 1 and 5
assert [1, 9, 25] == GQL {
from n in 1..5 where n % 2 != 0 select n ** 2
}

// group pets by name size
assert ["3:[cat, dog]", "4:[fish]", "6:[canary, gerbil]"] == GQL {
from p in pets
groupby p.size() as size
select size, agg(p) as names
}*.with{ "$it.size:$it.names" }

Stream methods

Useful stream methods (it is worthwhile comparing some of the examples to earlier non-stream variants):

pets = ['cat', 'canary', 'dog', 'fish', 'gerbil']
assert pets.stream().filter(p -> p.size() == 3).map(String::toUpperCase).toList() == ['CAT', 'DOG']
assert pets.stream().map(p -> p.size()).distinct().sorted().toList() == [3, 4, 6] // ordered pet name sizes
assert nums.stream().reduce{ a, b -> a + b }.get() == 10
assert (1..10).stream().skip(3).limit(5).filter(i -> i % 2 == 0).map(i -> i ** 2).toList() == [16, 36, 64]
assert [1, 2, 3, 4].stream().flatMap(i -> i % 2 ? Stream.empty() : Stream.of(i**2, i**3)).toList() == [4, 8, 16, 64]
assert pets.stream().collect(Collectors.groupingBy(p -> p.size())) == [3:['cat', 'dog'], 4:['fish'], 6:['canary', 'gerbil']]
assert [1, 2, 3, 4].stream().map(Integer::toString).collect(Collectors.joining('-')) == '1-2-3-4'
Arrays.stream(0..9 as int[]).summaryStatistics().with {
assert sum == 45 && min == 0 && max == 9 && average == 4.5 && count == 10
}
assert pets.stream().allMatch(w -> w ==~ /.*[aeiou].*/) // all pet names contain a vowel

GPars

Before looking at GPars, it is worth looking at parallel stream processing:

// calculate squares of odd numbers from input list
assert (1..5).parallelStream().filter{ it % 2 != 0 }.map(n -> n ** 2).toList() == [1, 9, 25]

GPars was designed to provide similar functionality long before streams processing was available. It still has some useful features.

Groovy has several tricks for removing the outer "withPool" clauses but we'll do the longhand here. Two GPars variations of above streams example:

GParsPool.withPool {
assert (1..5).findAllParallel{ it % 2 }.collectParallel{ it ** 2 } == [1, 9, 25]
assert (1..5).parallel.filter{ it % 2 }.map{ it ** 2 }.collection == [1, 9, 25]
}

Or using (JEP 425) virtual threads:

GParsExecutorsPool.withExistingPool(Executors.newVirtualThreadPerTaskExecutor()) {
assert (1..5).findAllParallel{ it % 2 }.collectParallel{ it ** 2 } == [1, 9, 25]
}

Other libraries

There are numerous list-related libraries on the JVM. We'll look at just a few.

Eclipse collections

Eclipse collections comes with many container types including immutable collections, primitive collections, bimaps, multimaps and bags as well as numerous utility methods. It focuses on reduced memory footprint and efficient containers. It might be particularly of interest if you need primitive collections, immutable collections or some more exotic collection types like bag or bidirectional maps. Here are just a few examples:

var certainties = Lists.immutable.of('death', 'taxes')
assert certainties.reduce{ a, b -> "$a & $b" }.get() == 'death & taxes'
var numBag = Bags.immutable.with('One', 'One', 'Two', 'Three')
assert numBag.toMapOfItemToCount() == [One:2, Two:1, Three:1]
var biMap = BiMaps.immutable.with(6, "six", 2, "two")
assert biMap.inverse().six == 6

Guava

Guava provides a number of extensions to the JDK collections ecosystem. In particular, it has immutable collections, new collection types like multisets and bidirectional maps and various powerful extensions and utilities. Here are a few examples:

var set = TreeMultiset.create([1, 2, 3])
assert set == TreeMultiset.create([3, 2, 1])
set.addAll([1, 3, 5])
assert set.size() == 6 && set.elementSet().size() == 4
assert set.toList() == [1, 1, 2, 3, 3, 5]
var bimap = HashBiMap.create()
bimap.five = 5
assert bimap.inverse()[5] == 'five'

Apache Commons Collections

The Apache Commons Collections library extends upon the JDK collections framework adding some new types like bidirectional maps and bags as well as providing many comparator and iterator implementations. The library was designed to fill gaps in the JDK offerings and while some of those gaps in the JDK have now been filled by the JDK itself, Commons Collections still contains much useful functionality. Here are a few examples:

var six = [six: 6] as TreeBidiMap
assert six.inverseBidiMap() == [6: 'six']
var bag = new HashBag(['one'] * 6)
bag.remove('one', 2)
assert bag.getCount('one') == 4

Further Information

Conclusion

We have looked at the more common methods for list processing with Groovy and a few other useful libraries.



Matrix calculations with Groovy, Apache Commons Math, ojAlgo, Nd4j and EJML

by paulk


Posted on Thursday August 18, 2022 at 01:41PM in Technology


This blogs looks at performing matrix calculations with Groovy using various libraries: Apache Commons Math, ojAlgo, EJML, and Nd4j (part of Eclipse Deeplearning4j). We'll also take a quick look at using the incubating Vector API for matrix calculations (JEPs 338, 414417426).

Fibonacci

The Fibonacci sequence has origins in India centuries earlier but is named after the Italian author of the publication, The Book of Calculation, published in 1202. In that publication, Fibonacci proposed the sequence as a means to calculate the growth of idealized (biologically unrealistic) rabbit populations. He proposed that a newly born breeding pair of rabbits are put in a field; each breeding pair mates at the age of one month, and at the end of their second month they always produce another pair of rabbits; and rabbits never die, but continue breeding forever. Fibonacci posed the puzzle: how many pairs will there be in one year? The sequence goes like this:

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233

We can solve this problem using matrices. If we multiply the matrix 2022-08-18 23_42_31-ASF Blogs_ Edit Entry.png by itself n times we get 2022-08-18 23_42_48-ASF Blogs_ Edit Entry.png. This is an operation known as matrix exponentiation. Let's explore this problem using four of the most popular and maintained matrix libraries.

Apache Commons Math

Let's explore solving this problem using Apache Commons Math. Apache Commons Math is a library of lightweight self-contained mathematics and statics components. Matrices are part of the linear algebra part of this library and for that context, matrices of double values are relevant. So, we'll represent our Fibonacci numbers as double values.

double[][] data = [[1d, 1d], [1d, 0d]]
def m = MatrixUtils.createRealMatrix(data)
println m * m * m
println m**6

Commons math has a factory method for creating matrixes from double arrays. The names of the methods for multiplication and exponentiation happen to align with Groovy's methods available for operator overloading, namely multiply and power, so we can use Groovy's convenient shorthands.

When we run the script, the output looks like this:

Array2DRowRealMatrix{{3.0,2.0},{2.0,1.0}}
Array2DRowRealMatrix{{13.0,8.0},{8.0,5.0}}

We could go a little further and print the values from the matrix, but the result is clear enough. We see the values in the Fibonacci sequence appearing in the output.

EJML

EJML (Efficient Java Matrix Library) is a linear algebra library for manipulating real, complex, dense, and sparse matrices. It is also a 100% Java solution. It has some novel features including import from Matlab and support for semirings (GraphBLAS) which can be used for graph algorithms where a sparse matrix may be used to represent a graph as an adjacency matrix or incidence matrix.

We can do the same calculation using EJML:

def m = new SimpleMatrix([[1d, 1d], [1d, 0d]] as double[][])
def ans = m.mult(m).mult(m)
println ans
6.times { ans = ans.mult(m) }
println ans

The name of the multiply method differs from the one where we can use automatic operator overloading shorthands, so we just call the method that EJML provides. See a little later under language extensibility on how we could in fact add support to use the same shorthands as we saw for Commons Math.

EJML doesn't have an exponentiation method but we just call multiply the requisite number of times to achieve the same effect. Note that we bumped the number of iterations called for the second matrix, to reveal the next bunch of elements from the Fibonacci sequence.

Running this script has this output:

Type = DDRM , rows = 2 , cols = 2
 3.0000E+00  2.0000E+00 
 2.0000E+00  1.0000E+00 

Type = DDRM , rows = 2 , cols = 2
 5.5000E+01  3.4000E+01 
 3.4000E+01  2.1000E+01

The first matrix has the same number as previously, and the second reveals the next numbers in the Fibonacci sequence (21, 34, and 55).

Nd4j

Nd4j provides functionality available on Python to Java users. It contains a mix of numpy operations and tensorflow/pytorch operations. Nd4l makes use of native backends to allow it to work on different platforms and provide efficient operation when scaling up.

The code for our Fibonacci solution is very similar to EJML:

def m = Nd4j.create([[1, 1], [1, 0]] as int[][])
def ans = m.mmul(m).mmul(m)
println ans
9.times { ans = ans.mmul(m) }
println ans

One feature that is different to the previous two libraries is that Nd4j supports integer matrices (as well as doubles and numerous other numerical and related types).

Running the script gives the following output:

...
[main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [CpuBackend] backend
...
[main] INFO org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory - Binary level Generic x86 optimization level AVX/AVX2
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [OPENBLAS]
...
[[         3,         2], 
 [         2,         1]]
[[       233,       144], 
 [       144,        89]]

Again, the first matrix is the same as we have seen previously, the second has been bumped along the Fibonacci sequence by three more elements.

ojAlgo

The next library we'll look at is ojAlgo (oj! Algorithms). It is an open source all-Java offering for mathematics, linear algebra and optimisation, supporting data science, machine learning and scientific computing. It claims to be the fastest pure-Java linear algebra library available and the project website provide links to numerous benchmarks backing that claim.

Here is the code for our Fibonacci example:

def factory = Primitive64Matrix.FACTORY
def m = factory.rows([[1d, 1d], [1d, 0d]] as double[][])
println m * m * m
println m**15

We can see it supports the same operator overloading features we saw for Commons Math.

When we run the script, it has the following output:

org.ojalgo.matrix.Primitive64Matrix < 2 x 2 >
{ { 3.0,	2.0 },
{ 2.0,	1.0 } }
org.ojalgo.matrix.Primitive64Matrix < 2 x 2 >
{ { 987.0,	610.0 },
{ 610.0,	377.0 } }

As expected, the first matrix is as we've seen before, while the second reveals the next three numbers in the sequence.

Exploring the Vector API and EJML

From JDK16, various versions (JEPs 338414417426) of the Vector API have been available as an incubating preview feature. The HotSpot compiler has previously already had some minimal optimisations that can leverage vector hardware instructions but the Vector API expands the scope of possible optimisations considerably. We could look at writing our own code that might make use of the Vector API and perhaps perform our matrix multiplications ourselves. But, one of the libraries has already done just that, so we'll explore that path.

The main contributor to the EJML library has published a repo for the purposes of very early prototyping and benchmarking. We'll use the methods from one of its classes to explore use of the vector API for our Fibonacci example. The MatrixMultiplication class has three methods: mult_ikj_simple is coded in the way any of us might write a multiplication method as a first pass from its definition without any attempts at optimisation, mult_ikj is coded in a highly-optimised fashion and corresponds to the code EJML would normally use, and mult_ikj_vector uses the Vector API. Note, you can think of these methods as "one layer down" from the mult method we called in the previous example, i.e. the mult method we used previously would be calling one of these under the covers. That's why we pass the internal "matrix" representation instead of our SimpleMatrix instance.

For our little calculations, the optimisations offered by the Vector API would not be expected to be huge. However, we'll do our calculation for generating the matrix we did as a first step for all of the libraries and we'll do it in a loop with 1000 iterations for each of the three methods (mult_ikj_simple, mult_ikj, and mult_ikj_vector). The code looks like this:

def m = new SimpleMatrix([[1, 1], [1, 0]] as double[][])
double[] expected = [3, 2, 2, 1]
def step1, result
long t0 = System.nanoTime()
1000.times {
step1 = new SimpleMatrix(2, 2)
result = new SimpleMatrix(2, 2)
MatrixMultiplication.mult_ikj_simple(m.matrix, m.matrix, step1.matrix)
MatrixMultiplication.mult_ikj_simple(step1.matrix, m.matrix, result.matrix)
assert result.matrix.data == expected
}
long t1 = System.nanoTime()
1000.times {
step1 = new SimpleMatrix(2, 2)
result = new SimpleMatrix(2, 2)
MatrixMultiplication.mult_ikj(m.matrix, m.matrix, step1.matrix)
MatrixMultiplication.mult_ikj(step1.matrix, m.matrix, result.matrix)
assert result.matrix.data == expected
}
long t2 = System.nanoTime()
1000.times {
step1 = new SimpleMatrix(2, 2)
result = new SimpleMatrix(2, 2)
MatrixMultiplication.mult_ikj_vector(m.matrix, m.matrix, step1.matrix)
MatrixMultiplication.mult_ikj_vector(step1.matrix, m.matrix, result.matrix)
assert result.matrix.data == expected
}
long t3 = System.nanoTime()
printf "Simple: %6.2f ms\n", (t1 - t0)/1000_000
printf "Optimized: %6.2f ms\n", (t2 - t1)/1000_000
printf "Vector: %6.2f ms\n", (t3 - t2)/1000_000

This example was run on JDK16 with the following VM options: "--enable-preview --add-modules jdk.incubator.vector".

The output looks like this:

WARNING: Using incubator modules: jdk.incubator.vector
Simple:    116.34 ms
Optimized:  34.91 ms
Vector:     21.94 ms

We can see here that we have some improvement even for our trivial little calculation. Certainly, for biggest problems, the benefit of using the Vector API could be quite substantial.

We should give a big disclaimer here. This little microbenchmark using a loop of 1000 will give highly variable results and was just done to give a very simple performance comparison. For a more predictable comparison, consider running the jmh benchmark in the aforementioned repo. And you may wish to wait until the Vector API is out of preview before relying on it for any production code - but by all means, consider trying it out now.

Leslie Matrices

Earlier, we described the Fibonacci sequence as being for unrealistic rabbit populations, where rabbits never died and continued breeding forever. It turns out that Fibonacci matrices are a special case of a more generalised model which can model realistic rabbit populations (among other things). These are Leslie matrices. For Leslie matrices, populations are divided into classes, and we keep track of birth rates and survival rates over a particular period for each class. We store this information in a matrix in a special form. The populations for each class for the next period can be calculated from those for the current period through multiplication by the Leslie matrix.

This technique can be used for animal populations or human population calculations. A Leslie matrix can help you find out if there will be enough GenX, Millenials, and GenZ tax payers to support an aging and soon retiring baby boomer population. Sophisticated animal models might track populations for an animal and for its predators or its prey. The survival and birth rates might be adjusted based on such information. Given that only females give birth, Leslie models will often be done only in terms of the female population, with the total population extrapolated from that.

We'll show an example for kangaroo population based on this video tutorial for Leslie matrices. It can help us find out if the kangaroo population is under threat (perhaps drought, fires or floods have impacted their feeding habitat) or if favorable conditions are leading to overpopulation.

Following that example, we divide kangaroos into 3 population classes: ages 0 to 3, 3 to 6, and 6 to 9. We are going to look at the population every three years. The 0-3 year olds birthrate (B1) is 0 since they are pre-fertile. The most fertile 3-6 year olds birthrate (B2) is 2.3. The old roos (6-9) have a birthrate (B3) of 0.4. We assume no kangaroos survive past 9 years old. 60% (S1) of the young kangaroos survive to move into the next age group. 30% (S2) of the middle-aged kangaroos survive into old age. Initially, we have 400 kangaroos in each age group.

Here is what the code looks like for this model:

double[] init = [400,   // 0..<3
400, // 3..<6
400] // 6..9
def p0 = MatrixUtils.createRealVector(init)
println "Initial populations: $p0"

double[][] data = [
[0 , 2.3, 0.4], // B1 B2 B3
[0.6, 0, 0 ], // S1 0 0
[0 , 0.3, 0 ] // 0 S2 0
]
def L = MatrixUtils.createRealMatrix(data)
def p1 = L.operate(p0)
println "Population after 1 round: $p1"

def p2 = L.operate(p1)
println "Population after 2 rounds: $p2"

def L10 = L ** 10
println "Population after 10 rounds: ${L10.operate(p0).toArray()*.round()}"

This code produces the following output:

Initial populations: {400; 400; 400}
Population after 1 round: {1,080; 240; 120}
Population after 2 rounds: {600; 648; 72}
Population after 10 rounds: [3019, 2558, 365]

After the first round, we see many young roos but a worrying drop off in the older age groups. After the second round, only the oldest age group looks worryingly small. However, with the healthy numbers in the young generation, we can see that after 10 generations that indeed, the overall population is not at risk. In fact, overpopulation might become a problem.

Encryption with matrices

An early technique to encrypt a message was the Caesar cipher. It substitutes letters in the alphabet by the letter shifted a certain amount along in the alphabet, e.g. "IBM" becomes "HAL" if shifting to the previous letter and "VMS" becomes "WNT" if shifting one letter forward in the alphabet. This kind of cipher can be broken by looking at frequency analysis of letters or pattern words.

The Hill cipher improves upon the Caesar cipher by factoring multiple letters into each letter of the encrypted text. Using matrices made it practical to look at three or more symbols at once. In general, an N-by-N matrix (the key) is multiplied by an encoding of the message in matrix form. The result is a matrix representation (encoded form) of the encrypted text. We use the inverse matrix of our key to decrypt or message.

We need to have a way to convert our text message to and from a matrix. A common scheme is to encode A as 1, B as 2, and so on. We'll just use the ascii value for each character. We define encode and decode methods to do this:

double[][] encode(String s) { s.bytes*.intValue().collate(3) as double[][] }
String decode(double[] data) { data*.round() as char[] }

We'll define a 2-by-2 matrix as our key and use it to encrypt. We'll find the inverse of our key and use that to decrypt. If we wanted to, we could use a 3-by-3 key for improved security at the cost of more processing time.

Our code looks like this:

def message = 'GROOVY'
def m = new SimpleMatrix(encode(message))
println "Original: $message"

def enKey = new SimpleMatrix([[1, 3], [-1, 2]] as double[][])
def encrypted = enKey.mult(m)
println "Encrypted: ${decode(encrypted.matrix.data)}"

def deKey = enKey.invert()
def decrypted = deKey.mult(encrypted)
println "Decrypted: ${decode(decrypted.matrix.data)}"

When run, it has the following output:

Original: GROOVY
Encrypted: ĴŔŚWZc
Decrypted: GROOVY

This offers far more security than the Caesar cipher, however, given today's computing availability, Hill ciphers can still eventually be broken with sufficient brute force. For this reason, Hill ciphers are seldom used on their own for encryption but they are often used in combination with other techniques to add diffusion - strengthening the security offered by the other techniques.

Shape manipulation

Our final example looks at geometrically transforming shapes. To do this, we represent the points of the shape as vectors and multiply them using transforms represented as matrices. We need only worry about the corners, since we'll use Swing to draw our shape and it has a method for drawing a polygon by giving its corners.

First we'll use Groovy's SwingBuilder to set up our frame:

new SwingBuilder().edt {
def frame = frame(title: 'Shapes', size: [420, 440], show: true, defaultCloseOperation: DISPOSE_ON_CLOSE) {
//contentPane.background = Color.WHITE
widget(new CustomPaintComponent())
}
frame.contentPane.background = Color.WHITE
}

We aren't really use much of SwingBuilder's functionality here but if we wanted to add more functionality, SwingBuilder would make that task easier.

We will actually draw our shapes within a custom component. We'll define a few color constants, a drawPolygon method which given a matrix of points will draw those points as a polygon. We'll also define a vectors method to convert a list of points (the corners) into vectors, and a transform method which is a factory method for creating a transform matrix.

Here is the code:

class CustomPaintComponent extends Component {
static final Color violet = new Color(0x67, 0x27, 0x7A, 127)
static final Color seaGreen = new Color(0x69, 0xCC, 0x67, 127)
static final Color crystalBlue = new Color(0x06, 0x4B, 0x93, 127)
static drawPolygon(Graphics g, List pts, boolean fill) {
def poly = new Polygon().tap {
pts.each {
addPoint(*it.toRawCopy1D()*.round()*.intValue().collect { it + 200 })
}
}
fill ? g.fillPolygon(poly) : g.drawPolygon(poly)
}

static List<Primitive64Matrix> vectors(List<Integer>... pts) {
pts.collect{ factory.column(*it) }
}

static transform(List<Number>... lists) {
factory.rows(lists as double[][])
}

void paint(Graphics g) {
g.drawLine(0, 200, 400, 200)
g.drawLine(200, 0, 200, 400)
g.stroke = new BasicStroke(2)

def triangle = vectors([-85, -150], [-145, -30], [-25, -30])
g.color = seaGreen
drawPolygon(g, triangle, true)
// transform triangle ...

def rectangle = vectors([0, -110], [0, -45], [95, -45], [95, -110])
g.color = crystalBlue
drawPolygon(g, rectangle, true)
// transform rectangle ...

def trapezoid = vectors([50, 50], [70, 100], [100, 100], [120, 50])
g.color = violet
drawPolygon(g, trapezoid, true)
// transform trapezoid ...
}
}

When we run this code we see our three shapes:

2022-08-18 00_02_11-Shapes.png

We can now add our transforms. We'll have one which rotate by 90 degrees anti-clockwise. Another which enlarges a shape by 25% and one that shrinks a shape by 25%. We can combine transforms simply by multiplying them together. We'll make two transformations of our triangle. We'll rotate in both cases but we'll shrink one and enlarge the other. We apply the transform simply by multiplying each point by the transform matrix. Then we'll draw both transformed shapes as an outline rather than filled (to make it easier to distinguish the original and transformed versions). Here is the code:

def rot90 = transform([0, 1], [-1, 0])
def bigger = transform([1.25, 0], [0, 1.25])
def smaller = transform([0.75, 0], [0, 0.75])
def triangle_ = triangle.collect{ rot90 * bigger * it }
drawPolygon(g, triangle_, false)
triangle_ = triangle.collect{ rot90 * smaller * it }
drawPolygon(g, triangle_, false)

For our rectangle, we'll have one simple transform which flips the shape in the vertical axis. A second transform combines multiple changes in one transform. We could have split this into smaller transforms and the multiplied them together - but here they are all in one. We flip in the horizontal access and then apply a shear. We then draw the transformed shapes as outlines:

def flipV = transform([1, 0], [0, -1])
def rectangle_ = rectangle.collect{ flipV * it }
drawPolygon(g, rectangle_, false)
def flipH_shear = transform([-1, 0.5], [0, 1])
rectangle_ = rectangle.collect{ flipH_shear * it }
drawPolygon(g, rectangle_, false)

For our trapezoid, we create a transform which rotates 45 degrees anti-clockwise (recall sin 45° = cos 45° = 0.707). Then we create 6 transforms rotating at 45, 90, 135 and so forth. We draw each transformed shape as an outline:

def rot45 = transform([0.707, 0.707], [-0.707, 0.707])
def trapezoid_
(1..6).each { z ->
trapezoid_ = trapezoid.collect{ rot45 ** z * it }
drawPolygon(g, trapezoid_, false)
}

When we run the entire example, here is what the output looks like:

2022-08-18 00_00_48-Shapes.png

We can see here that matrix transforms give us powerful ways to manipulate images. We have used 2D images here, but the same techniques would apply to 3D shapes.

Language and tool extensibility

We saw earlier that some of the examples could make use of Groovy operator shorthand syntax, while others couldn't. Here is a summary of some common methods in the libraries:

Groovy operator+-***
Groovy methodplusminusmultiplypower
Commons mathaddsubtractmultiplypower
EJMLplusminusmult-
Nd4jaddsubmmul-
ojAlgoaddsubtractmultiplypower

Where the library used the same name as Groovy's method, the shorthand could be used.

Groovy offers numerous extensibility features. We won't go into all of the details but instead refer readers to the History of Groovy paper which gives more details.

In summary, that paper defined the following extensions for Commons Math matrices:

RealMatrix.metaClass {
plus << { RealMatrix ma -> delegate.add(ma) }
plus << { double d -> delegate.scalarAdd(d) }
multiply { double d -> delegate.scalarMultiply(d) }
bitwiseNegate { -> new LUDecomposition(delegate).solver.inverse }
constructor = { List l -> MatrixUtils.createRealMatrix(l as double[][]) }
}

This fixes some of the method name discrepancies in the above table. We can now use the operator shorthand for matrix and scalar addition as well as scalar multiply. We can also use the ~ (bitwiseNegate) operator when finding the inverse. The mention of double[][] during matrix construction is now also gone.

The paper goes on to discuss how to automatically add the necessary imports for a matrix library and provide aliases if needed. The imports aren't shown for the code listings in this blog but are in the listings in the sample code repo. In any case, the imports can be "configured away" as the paper discusses. The end result is our code in its entirety can look like this:

var m = [[1, 1], [1, 0]] as Matrix
m**6

The paper also discusses tooling extensibility, in particular the visualisation aspects of the GroovyConsole. It shows how to define an output transform which renders any matrix result using a jlatexmath library. So instead of seeing "Array2DRowRealMatrix{{13.0,8.0},{8.0,5.0}}", they will see a graphical rendition of the matrix. So, the final end-user experience when using the GroovyConsole looks like this:

2022-08-18 00_28_04-bin.png

When using in Jupyter style environments, other pretty output styles may be supported.

Further information

Conclusion

We have examined a number of simple applications of matrices using the Groovy programming language and the Apache Commons Math, ojAlgo, Nd4j, and JEML libraries. You should be convinced that using matrices on the JVM isn't hard and you have numerous options. We also saw a brief glimpse at the Vector API which looks like an exciting addition to the JVM (hopefully arriving soon in non-preview mode).