Scala: parsing a query string, JSON or XML into a Map

monad-transformations

Hello there. In this post, I would like to share examples of parsing different formats into a Map data structure. I will only cover the formats I have been working with so far: query string, JSON, and XML. Some of the code is copied and adapted from other sources, including Stack Overflow. My goal is mostly to go through it again for myself and to assemble a complete example.

One possible use case for the following examples is parsing the body of a Request or Response object, for example to extract data from a Response or to test that specific fields exist in a Request payload.

I have set up a simple Scala project in IntelliJ Community Edition with the Scala plugin installed. To build the project I use sbt, the standard build tool for Scala projects, but it is also possible to build it with Maven (as we do at my current job) or with any other build tool. Personally, I prefer sbt for Scala and Maven for Java. sbt also supports good plugins for dependency management, such as Coursier (which has been sbt's default dependency resolver since sbt 1.3).

This is my build.sbt dependencies list.
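
As a rough sketch, assuming sbt 1.x and Scala 2.13, a dependency list covering the libraries mentioned in this post might look like this. The group and artifact IDs are the published ones for Apache HttpClient, jackson-module-scala, scala-xml and ScalaTest; the version numbers are only examples to check against current releases, not necessarily the ones used in the original project.

```scala
// build.sbt: example versions only, adjust to current releases
scalaVersion := "2.13.10"

libraryDependencies ++= Seq(
  // URLEncodedUtils / NameValuePair for query string parsing
  "org.apache.httpcomponents" % "httpclient" % "4.5.13",
  // Jackson plus DefaultScalaModule for JSON -> Scala Map
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.13.4",
  // scala.xml has been a separate module since Scala 2.11
  "org.scala-lang.modules" %% "scala-xml" % "2.1.0",
  // ScalaTest, only on the test classpath
  "org.scalatest" %% "scalatest" % "3.2.15" % Test
)
```
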

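I use ScalaTest with MustMatchers to test my parsing methods (in ScalaTest 3.1 and later this trait lives in org.scalatest.matchers.must.Matchers). For XML parsing I did not need any extra dependencies, because XML literals are part of the Scala language syntax. Note, though, that since Scala 2.11 the scala.xml library itself is a separate module (scala-xml), so on newer setups it may have to be added explicitly unless it already arrives transitively.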

This is my Request object example, with a field of type String. The request has a method that returns a Map[String, String] built from Request.body. Let's imagine we don't know in advance which format the body uses, or we simply want one universal method.
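
A minimal sketch of such a universal method, assuming the format can be guessed from the first non-whitespace character of the body ('{' for JSON, '<' for XML, anything else treated as a query string). The names RequestSketch, BodyFormat, detectFormat and bodyToMap are hypothetical, and the concrete parsers are passed in as functions so the sketch compiles without Jackson or scala-xml.

```scala
object RequestSketch {
  // The formats the universal method knows about
  sealed trait BodyFormat
  case object Json        extends BodyFormat
  case object Xml         extends BodyFormat
  case object QueryString extends BodyFormat

  // Heuristic only: inspects the first non-whitespace character, does not validate the body
  def detectFormat(body: String): BodyFormat = body.trim.headOption match {
    case Some('{') => Json
    case Some('<') => Xml
    case _         => QueryString
  }

  // Hypothetical Request with a String body; the concrete parsers are injected as functions
  final case class Request(body: String) {
    def bodyToMap(
        fromJson: String => Map[String, String],
        fromXml: String => Map[String, String],
        fromQuery: String => Map[String, String]
    ): Map[String, String] = detectFormat(body) match {
      case Json        => fromJson(body)
      case Xml         => fromXml(body)
      case QueryString => fromQuery(body)
    }
  }
}
```

Because BodyFormat is sealed, the compiler can warn when a new format is added but not handled in the match.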

Two other helpers, UrlUtils and TextUtils, contain the concrete parsing implementations. Keeping them separate is better because the methods can be reused elsewhere. They are static: plain helpers without any state.

I would like to add a note that these examples are far from perfect. Even now, when I look at them, I see that I could make all parsing methods return Option[Map[String, String]] and reorganize the code in a cleaner way. But I want to keep it as it is here for simplicity.
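
One way to get that Option-returning variant, sketched here with scala.util.Try: wrap any parser that may throw and turn failures into None. Both safely and strictQueryToMap are hypothetical helpers for illustration, not part of the original code.

```scala
import scala.util.Try

object SafeParsingSketch {
  // Runs any parser that may throw and turns a failure into None
  def safely(parse: String => Map[String, String])(input: String): Option[Map[String, String]] =
    Try(parse(input)).toOption

  // Hypothetical strict parser: throws on a pair without "=" (no URL decoding, illustration only)
  def strictQueryToMap(query: String): Map[String, String] =
    query.split("&").map { pair =>
      pair.split("=", 2) match {
        case Array(key, value) => key -> value
        case _                 => throw new IllegalArgumentException(s"Malformed pair: $pair")
      }
    }.toMap
}
```

Callers then pattern match on Some/None instead of catching exceptions.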

Let's move to the UrlUtils helper first. This is where the Apache dependencies included earlier are used.

There are some interesting parts. First, we add a question mark to the query string if it is missing, so that the parser treats the text as a query. Second, the included dependencies are actually Java libraries, so they return Java collection types rather than Scala ones. That's why we need to convert the resulting collection to a Scala Seq with the help of the built-in JavaConverters (deprecated since Scala 2.13 in favor of scala.jdk.CollectionConverters). However, that gives us a Seq of another Java type, NameValuePair, while we need a Map[String, String] in the end. For that, we map over the Seq, turning each pair into a tuple of strings, and a Seq of tuples converts easily to a Map.
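
The steps above can be sketched with JDK classes only. This is not the author's UrlUtils: it swaps Apache's URLEncodedUtils for a hand-rolled parser built on java.net.URI and java.net.URLDecoder, and the hypothetical parseToJavaList stands in for a Java API returning a java.util.List of pairs (java.util.Map.Entry here instead of NameValuePair). It assumes Scala 2.13 or later for scala.jdk.CollectionConverters; on Scala 2.12, import scala.collection.JavaConverters._ instead.

```scala
import java.net.{URI, URLDecoder}
import java.nio.charset.StandardCharsets
import java.util.AbstractMap.SimpleEntry
import scala.jdk.CollectionConverters._

object QueryStringSketch {
  private val Utf8 = StandardCharsets.UTF_8.name()

  // Hypothetical stand-in for a Java API: returns a java.util.List of name/value pairs,
  // the way URLEncodedUtils.parse returns a java.util.List[NameValuePair]
  def parseToJavaList(rawQuery: String): java.util.List[java.util.Map.Entry[String, String]] = {
    val pairs = new java.util.ArrayList[java.util.Map.Entry[String, String]]()
    rawQuery.split("&").filter(_.nonEmpty).foreach { pair =>
      val parts = pair.split("=", 2)
      val name  = URLDecoder.decode(parts(0), Utf8)
      val value = if (parts.length > 1) URLDecoder.decode(parts(1), Utf8) else ""
      pairs.add(new SimpleEntry[String, String](name, value))
    }
    pairs
  }

  def queryToMap(query: String): Map[String, String] = {
    // Without a leading "?", java.net.URI reads "a=1&b=2" as a path and getRawQuery returns null
    val uri      = new URI(if (query.startsWith("?")) query else "?" + query)
    val rawQuery = Option(uri.getRawQuery).getOrElse("")
    parseToJavaList(rawQuery).asScala.toSeq         // Java list -> Scala Seq
      .map(entry => entry.getKey -> entry.getValue) // Java pair objects -> Scala tuples
      .toMap                                        // Seq of tuples -> Map
  }
}
```

If a key appears more than once, toMap keeps the last value.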

And here is the test for this helper.

Now, let's move to the TextUtils object. Objects in Scala are not like classes: an object defines exactly one instance, so it always acts as a singleton. Objects are used for static-like methods, or as companions of a class to simplify creating and using that class, and they can also serve as factories. So, here is another helper object.
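
To illustrate the difference, here is a small sketch with hypothetical names: a class whose companion object acts as a factory (the companion may call the class's private constructor), and a standalone object used as a stateless helper, much like a class of static methods in Java.

```scala
object ObjectsSketch {
  // A class with a private constructor: instances can only be created through the companion
  final class Greeting private (val text: String)

  // Companion object (same name, same scope): acts as a factory for Greeting
  object Greeting {
    def apply(name: String): Greeting = new Greeting(s"Hello, $name")
  }

  // Standalone object: a singleton holding stateless helper methods
  object TextHelpers {
    def isBlank(s: String): Boolean = s.trim.isEmpty
  }
}
```

Calling Greeting("Scala") goes through the companion's apply, which is why companions are the usual place for factory methods.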

Don't be confused by the dependency called fasterxml.jackson: in this case it has nothing to do with XML parsing, we use it to parse JSON. :) It provides really good methods for parsing JSON into different data structures, and it even integrates with Scala through a dedicated module, jackson-module-scala. In the jsonToMap method, DefaultScalaModule is used so that JSON is parsed directly into a Scala Map.
That’s it, it was easy. :)

Now we are going to parse an XML string into the same Map collection. That is not as easy as with JSON, because of the nature of the XML format. In XML, data is stored in tags, which can have any name and any nesting level, and data can be stored in tag attributes as well. So there is no universal way to transform XML into the desired Map: for every given XML document we need to write a very specific parser.

To make this task more standardized, I have created the XmlExtractor trait, which in Scala can act as an interface. The xmlToMap method above uses this trait.
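
A minimal sketch of the trait-as-interface idea, under stated assumptions: the XML layout (<params><param name="...">value</param></params>) and the names XmlExtractorSketch and ParamExtractor are hypothetical, and to keep it runnable without the scala-xml module the concrete extractor uses the JDK's DOM parser (javax.xml.parsers) instead of Scala's XML operators. The point is only the shape: one generic xmlToMap entry point, and one extractor per document layout.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

object XmlExtractorSketch {
  // The "interface": each implementation knows the layout of one specific XML document
  trait XmlExtractor {
    def extract(root: Element): Map[String, String]
  }

  // Generic entry point: parses the string, then delegates the layout-specific work
  def xmlToMap(xml: String, extractor: XmlExtractor): Map[String, String] = {
    val builder  = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val document = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    extractor.extract(document.getDocumentElement)
  }

  // Hypothetical extractor for <params><param name="key">value</param>...</params>
  object ParamExtractor extends XmlExtractor {
    def extract(root: Element): Map[String, String] = {
      val params = root.getElementsByTagName("param")
      (0 until params.getLength).map { i =>
        val param = params.item(i).asInstanceOf[Element]
        param.getAttribute("name") -> param.getTextContent
      }.toMap
    }
  }
}
```

Supporting a new XML layout then means adding another XmlExtractor implementation, without touching xmlToMap.
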

I also have a test for the Request object.

The test covers all request body formats supported so far, and I provide some dummy XML for the XML parsing test. To transform this XML into a Map we need to write our own parser, and here Scala comes to the rescue: it has built-in functionality for working with XML.
I will not describe here how to work with XML in Scala; there are already some great articles on the internet about that, in addition to the official documentation.
So I will just provide an example, followed by some links.

Here is the interface:

And here is the implementation for the test case:

As can be seen from the example, there are special operators for working with XML (\ selects direct children, \\ searches all descendants), and the XML types we need are imported from the scala.xml package.
The best article about XML I have found is Basic XML processing with Scala.
Alvin Alexander's blog posts are also worth reading; they are mostly online adaptations of recipes from the Scala Cookbook.
https://alvinalexander.com/scala/scala-xml-examples-xml-literals-source-code-searching-xpath
https://alvinalexander.com/scala/serializing-deserializing-xml-scala-classes
https://alvinalexander.com/scala/how-to-extract-data-from-xml-nodes-in-scala

One more thing I want to talk about is why I provided two different implementation methods in the XML extractor. I wanted to show how the same result can be achieved in different ways in Scala, and how one feature complements another. The extractOld method is a straightforward implementation that searches the XML and loops through the result. Later I changed it to a more functional style. To understand this transformation, you need to understand how yield works in Scala.
This documentation page explains it very well. Put simply, for/yield produces a collection of tuples from the for loop, which can then be converted to a Map (a Map in Scala is essentially a collection of key/value tuples). Before that, though, we have a collection of found XML pairs, and we want to iterate through the pairs, not through every member of each pair. That is what flatMap is for, and inside it a map() call iterates through the second search and produces the desired tuples. I think the code is more descriptive than my explanation. :D
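
The following sketch shows the same transformation on plain collections. To keep it runnable without scala-xml, a hypothetical Found case class stands in for the XML search results, and extractLoop, extractYield and extractDesugared are illustrative names, not the original extractOld and its successor. All three produce the same Map.

```scala
object YieldSketch {
  // Hypothetical stand-in for one found XML "pair": a key plus the values found under it
  final case class Found(key: String, values: Seq[String])

  // Imperative style: nested loop filling a mutable map
  def extractLoop(items: Seq[Found]): Map[String, String] = {
    val result = scala.collection.mutable.Map.empty[String, String]
    for (item <- items; value <- item.values) result(item.key) = value
    result.toMap
  }

  // for/yield: yields a Seq of (key, value) tuples, and toMap turns them into a Map
  def extractYield(items: Seq[Found]): Map[String, String] =
    (for {
      item  <- items       // iterate over the pairs...
      value <- item.values // ...and over the inner search for each pair
    } yield item.key -> value).toMap

  // What the compiler makes of the for/yield above: flatMap outside, map inside
  def extractDesugared(items: Seq[Found]): Map[String, String] =
    items.flatMap(item => item.values.map(value => item.key -> value)).toMap
}
```

The last generator of a for/yield becomes map, and every generator before it becomes flatMap, which is why the outer collection is flattened rather than nested.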

That's it, the example is complete. It could easily be extended with other formats, such as CSV or YAML.