Hello there. In this post I would like to share examples of parsing different formats into a Scala Map. I cover only the formats I have worked with so far: query strings, JSON, and XML. Some of the code is copied and adapted from other sources, including Stack Overflow. My goal is mostly to repeat it for myself and assemble a complete example.
One possible use case for the following examples is parsing the body of a Request or Response object, for instance to extract data from a Response or to test that specific fields exist in a Request payload.
I have put together a simple Scala project in IntelliJ IDEA Community Edition with the Scala plugin installed. To build the project I use sbt, the standard build tool for Scala projects. It is still possible to build the project with Maven (as is done at my current job) or with any other Java build tool. Personally, I prefer sbt for Scala and Maven for Java. With sbt it is also possible to use good plugins for dependency management, such as Coursier.
This is my build.sbt with the dependency list.

    name := "scala_to_map"

    version := "0.1"

    scalaVersion := "2.12.4"

    libraryDependencies ++= Seq(
      // core
      "org.scala-lang" % "scala-reflect" % "2.12.4",
      "org.scala-lang" % "scala-library" % "2.12.4",
      "org.scala-lang" % "scala-compiler" % "2.12.4",

      // for tests
      "org.scalatest" % "scalatest_2.12" % "3.0.4" % "test",
      "com.novocode" % "junit-interface" % "0.11" % "test",

      // for query string
      "org.apache.httpcomponents" % "httpcore" % "4.4.9",
      "org.apache.httpcomponents" % "httpclient" % "4.5.5",

      // for json
      "com.typesafe.play" % "play-json_2.12" % "2.6.9",
      "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.4",
      "com.fasterxml.jackson.core" % "jackson-annotations" % "2.9.4",
      "com.fasterxml.jackson.module" % "jackson-module-scala_2.12" % "2.9.4"
    )

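I use ScalaTest with MustMatchers to test my parsing methods. XML parsing needs no extra entries in this list: XML literals are part of the Scala 2 syntax, and the scala.xml library, which has shipped as the separate scala-xml module since Scala 2.11, is pulled in here transitively by scala-compiler. In a build without scala-compiler, scala-xml would have to be added explicitly.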
This is my example Request class, with a body field of type String. It contains a method that builds a Map[String, String] from Request.body. Let’s imagine we don’t know in advance which format the body contains, or we simply want one universal method.

    import java.net.URL

    import scala.collection.mutable

    case class Request(
      var url: URL,
      method: String = "GET",
      body: String = null,
      headers: mutable.Map[String, String] = mutable.Map()
    ) {

      // The extractor is optional: XML has no universal layout,
      // so the caller supplies the parser that matches the payload.
      def bodyMap(xmlExtractor: XmlExtractor = null): Map[String, String] = {
        var result = Map[String, String]()
        if (body != null && method == "POST") {
          // try the formats one by one and stop at the first non-empty result
          result = UrlUtils.parseEncodedQueryString(body)
          if (result.nonEmpty) {
            return result
          }
          result = TextUtils.jsonToMap(body)
          if (result.nonEmpty) {
            return result
          }
          if (xmlExtractor != null) {
            result = TextUtils.xmlToMap(body, xmlExtractor)
          }
        }
        result
      }
    }

Two other objects, UrlUtils and TextUtils, contain the concrete parsing implementations. That is better because the methods can be reused elsewhere. They are static and stateless, just helpers.
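To make the "static helper" idea concrete, here is a minimal sketch of the two common roles of a Scala object: a standalone singleton with utility methods, and a companion object acting as a small factory. The names Slug and Endpoint are made up for illustration and are not part of the project.

```scala
// Hypothetical names (Slug, Endpoint); they only illustrate the pattern.

// A standalone object: a single instance whose methods are called like Java statics.
object Slug {
  def normalize(s: String): String = s.trim.toLowerCase.replace(' ', '-')
}

// A class with a private constructor plus its companion object used as a factory.
class Endpoint private (val host: String, val port: Int)

object Endpoint {
  // apply lets callers write Endpoint("t.com:8080") without `new`
  def apply(hostAndPort: String): Endpoint = {
    val Array(host, port) = hostAndPort.split(":", 2)
    new Endpoint(host, port.toInt)
  }
}
```

Callers never instantiate Slug, and Endpoint can only be created through the companion, which is the same style UrlUtils and TextUtils follow for their stateless methods.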
I would like to add a note that these examples are not 100% functional in style. Even now, when I look at them, I see that I could make all parsing methods return Option[Map[String, String]] and reorganize the code in a more functional way. But I want to keep it as it is here for simplicity.
Let’s move to the UrlUtils object first. This is where the Apache dependencies included earlier are used.
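For the curious, the Option-based variant mentioned above could look roughly like the sketch below. It is not the code used in this post: BodyParsers, queryString and flatJson are hypothetical names, and both parsers are deliberately naive standard-library stand-ins for the real helpers so that the snippet compiles on its own.

```scala
import java.net.URLDecoder
import scala.util.Try

// Hypothetical sketch; the parsers are toy stand-ins for UrlUtils and TextUtils.
object BodyParsers {
  type Parser = String => Option[Map[String, String]]

  // Accepts only key=value pairs; a pair without '=' fails the pattern match,
  // Try turns that failure into None.
  val queryString: Parser = body => Try {
    body.stripPrefix("?").split("&").map { pair =>
      val Array(k, v) = pair.split("=", 2)
      URLDecoder.decode(k, "UTF-8") -> URLDecoder.decode(v, "UTF-8")
    }.toMap
  }.toOption.filter(_.nonEmpty)

  // Handles flat JSON objects with string values only.
  private val JsonPair = "\"([^\"]*)\"\\s*:\\s*\"([^\"]*)\"".r
  val flatJson: Parser = body =>
    if (body.trim.startsWith("{"))
      Some(JsonPair.findAllMatchIn(body).map(m => m.group(1) -> m.group(2)).toMap)
        .filter(_.nonEmpty)
    else None

  // The iterator is lazy, so parsers after the first successful one never run.
  def bodyMap(body: String,
              parsers: Seq[Parser] = Seq(queryString, flatJson)): Option[Map[String, String]] =
    Option(body).flatMap(b => parsers.iterator.map(p => p(b)).collectFirst { case Some(m) => m })
}
```

With this shape there are no early returns and no null checks at the call site: a missing or unrecognized body is simply None, and supporting a new format means adding one more function to the list.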

    import java.net.{URI, URISyntaxException}
    import java.nio.charset.StandardCharsets

    import org.apache.http.NameValuePair
    import org.apache.http.client.utils.URLEncodedUtils

    import scala.collection.JavaConverters.iterableAsScalaIterableConverter

    object UrlUtils {

      def parseEncodedQueryString(queryString: String): Map[String, String] = {
        // startsWith also copes with an empty string
        val query = if (queryString.startsWith("?")) queryString else "?" + queryString
        try {
          // the Java API returns a java.util.List[NameValuePair]
          val params = URLEncodedUtils.parse(new URI(query), StandardCharsets.UTF_8)
          val convertedParams: Seq[NameValuePair] = params.asScala.toSeq
          val scalaParams: Seq[(String, String)] =
            convertedParams.map(pair => pair.getName -> pair.getValue)
          scalaParams.toMap
        } catch {
          // not a valid URI, so not a query string either
          case _: URISyntaxException => Map()
        }
      }
    }

There are some interesting parts. First, we add the question mark to the query string if it is missing, so that the query can be parsed as a URI. Second, the included dependencies are Java libraries, so they return Java collection types rather than Scala ones. That is why we convert the resulting collection to a Scala Seq with the help of the built-in JavaConverters. That still leaves us with a Seq of a Java class, NameValuePair, while we need a Map[String, String] in the end. So we map over the Seq and transform it into a Seq of string pairs (tuples), which converts easily to a Map.
And here is the test for this helper.

    import org.scalatest.{FunSpec, MustMatchers}

    class UrlUtilsTestSpec extends FunSpec with MustMatchers {

      describe("parseEncodedQueryString") {
        it("Should parse query string with or without '?'") {
          val query1 = "?key1=val1&key2=val2&key3=val3"
          UrlUtils.parseEncodedQueryString(query1)("key1") mustBe "val1"

          val query2 = "key1=val1&key2=val2&key3=val3"
          UrlUtils.parseEncodedQueryString(query2)("key2") mustBe "val2"

          // With a full URL the part before the first '&' becomes one long key,
          // so only the later keys can be looked up by their plain names.
          val query3 = "http://test.com?key1=val1&key2=val2&key3=val3"
          UrlUtils.parseEncodedQueryString(query3)("key3") mustBe "val3"
        }
      }
    }

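The Java-to-Scala conversion used in parseEncodedQueryString can be tried in isolation. The sketch below assumes nothing from the Apache libraries: java.util.Map.Entry stands in for NameValuePair, and JavaPairsDemo is a made-up name.

```scala
import java.util.{AbstractMap, Arrays, List => JList, Map => JMap}
import scala.collection.JavaConverters._

// Hypothetical demo object; Map.Entry plays the role of NameValuePair.
object JavaPairsDemo {
  // Stand-in for the java.util.List returned by a Java parsing API
  val javaPairs: JList[JMap.Entry[String, String]] =
    Arrays.asList[JMap.Entry[String, String]](
      new AbstractMap.SimpleEntry[String, String]("key1", "val1"),
      new AbstractMap.SimpleEntry[String, String]("key2", "val2")
    )

  // asScala wraps the Java list as a Scala collection;
  // map turns every entry into a tuple and toMap builds an immutable Map.
  val scalaMap: Map[String, String] =
    javaPairs.asScala.map(e => e.getKey -> e.getValue).toMap
}
```

One thing to keep in mind with toMap: if the same key appears several times, as query strings allow, only the last value survives.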
Now, let’s move to the TextUtils object. Objects in Scala are not quite like classes. They hold what would be static methods in Java, or serve as companions of a class to simplify creating and using it. They can also be used as factories. An object always acts as a singleton. So, here is another helper object.

    import com.fasterxml.jackson.core.JsonProcessingException
    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule

    object TextUtils {

      def jsonToMap(json: String): Map[String, String] = {
        try {
          val mapper = new ObjectMapper()
          // the Scala module teaches Jackson about Scala collection types
          mapper.registerModule(DefaultScalaModule)
          mapper.readValue(json, classOf[Map[String, String]])
        } catch {
          // covers malformed JSON as well as valid JSON that is not an object
          case _: JsonProcessingException => Map()
        }
      }

      def xmlToMap(xmlString: String, xmlExtractor: XmlExtractor): Map[String, String] = {
        xmlExtractor.extract(xmlString)
      }
    }

Don’t be confused by the dependency name fasterxml.jackson. It has nothing to do with XML parsing; in this case we use it for parsing JSON. :) It provides really good methods for parsing JSON into different data structures, and it even integrates with Scala through a dedicated module. In the jsonToMap method, DefaultScalaModule is what lets Jackson produce a Scala Map.
That’s it, it was easy. :)
Now we are going to parse an XML string into the same Map collection. That is not as easy as with JSON because of the nature of the XML format. In XML the data is stored in tags, which can have any name and any nesting level, and data can be stored in tag attributes as well. So there is no universal way to transform XML into the desired Map: every XML layout needs its own specific parser.
To make this task more standardized, I have created the trait XmlExtractor, which in Scala can act as an interface. This trait is what the xmlToMap method above accepts.
I have a test for the Request class.

    import java.net.URL

    import org.scalatest.{FunSpec, MustMatchers}

    class RequestTestSpec extends FunSpec with MustMatchers {

      describe("bodyMap") {
        it("Should parse encoded query string") {
          val request = Request(new URL("http://t.com"), "POST", "key1=val1&key2=val2")
          request.bodyMap()("key1") mustBe "val1"
        }

        it("Should parse json") {
          val request = Request(new URL("http://t.com"), "POST", "{\"key1\":\"val1\", \"key2\":\"val2\"}")
          request.bodyMap()("key1") mustBe "val1"
        }

        it("Should parse xml") {
          val request = Request(new URL("http://t.com"), "POST",
            "<xml><pairs>" +
              "<pair><someTag>key1</someTag><anotherTag>value1</anotherTag></pair>" +
              "<pair><someTag>key2</someTag><anotherTag>value2</anotherTag></pair>" +
              "</pairs></xml>")
          request.bodyMap(new ConcreteXmlExtractor)("key1") mustBe "value1"
          request.bodyMap(new ConcreteXmlExtractor)("key2") mustBe "value2"
        }

        it("should work like that") {
          val propertyId = 12345
          val page = 1
          val request = Request(new URL("http://t.com"), "POST", s"view_args=${propertyId}&page=${page}")

          // test body on id and page
          val requestBodyMap = request.bodyMap()
          requestBodyMap("view_args") must be(propertyId.toString)
          requestBodyMap("page") must be(page.toString)
        }
      }
    }

The test covers all request body formats supported so far, and it feeds some dummy XML to the XML parsing case. To transform this XML into a Map we need to write our own parser, and here Scala's XML support (the scala.xml package) comes to the rescue.
I will not describe here how to work with XML in Scala. There are already some great articles about that on the internet, in addition to the official documentation.
So I will just provide an example and some links after that.
Here is an interface

    trait XmlExtractor {
      def extract(xmlString: String): Map[String, String]
    }

And the implementation of the test case

    import scala.xml.{Elem, XML}

    class ConcreteXmlExtractor extends XmlExtractor {

      override def extract(xmlString: String): Map[String, String] = {
        (XML.loadString(xmlString) \ "pairs")
          .flatMap(pairs => pairs \ "pair")
          .map(x => ((x \ "someTag").text, (x \ "anotherTag").text)).toMap
      }

      private def extractOld(xmlString: String): Map[String, String] = {
        val xml: Elem = XML.loadString(xmlString)
        val pairs = xml \ "pairs"
        (for {x <- pairs \ "pair"} yield ((x \ "someTag").text, (x \ "anotherTag").text)).toMap
      }
    }

As the example shows, there are special operators for working with XML, such as \ for selecting child elements, and the types they work on are imported from the scala.xml package.
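Because every XML layout needs its own extractor, here is a second, hypothetical implementation for payloads that keep the data in attributes instead of nested tags. AttributeXmlExtractor and the key/value attribute names are assumptions made for this sketch; the trait is repeated only so that the snippet compiles on its own.

```scala
import scala.xml.XML

// Same shape as the XmlExtractor trait from this post, repeated for a standalone sketch.
trait XmlExtractor {
  def extract(xmlString: String): Map[String, String]
}

// Hypothetical extractor for a layout like <xml><pair key="k" value="v"/></xml>
class AttributeXmlExtractor extends XmlExtractor {
  override def extract(xmlString: String): Map[String, String] =
    (XML.loadString(xmlString) \\ "pair")                  // \\ searches all descendants
      .map(p => (p \ "@key").text -> (p \ "@value").text)  // "@name" selects an attribute
      .toMap
}
```

Such an extractor would be passed to bodyMap exactly like ConcreteXmlExtractor in the test above, which is the reason the parsing logic sits behind a trait.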
The best article about XML I have found is Basic XML processing with Scala.
Alvin Alexander's blog posts are also worth reading. They are mostly online adaptations of recipes from the Scala Cookbook.
https://alvinalexander.com/scala/scala-xml-examples-xml-literals-source-code-searching-xpath
https://alvinalexander.com/scala/serializing-deserializing-xml-scala-classes
https://alvinalexander.com/scala/how-to-extract-data-from-xml-nodes-in-scala
One more thing I want to talk about is why I have provided two different implementations in the XML extractor. I wanted to show how the same result can be achieved in different ways in Scala and how its features complement each other. The extractOld method is a straightforward implementation: search the XML, then loop through the result. Later I changed it to a more functional style. To understand this transformation you need to understand how yield works in Scala.
This documentation page explains it very well. Simply put, yield produces a collection of tuples from the for loop (a Map in Scala is basically a collection of key-value tuples), and toMap turns that collection into the Map. In the newer version we start from the collection of found pairs elements. We want to iterate over the individual pair elements inside them, not over the pairs containers themselves, so flatMap runs the second search for each container and flattens the results into one sequence. The map() call that follows then turns every pair into a tuple. I think the code is more descriptive than my explanation. :D
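The relationship between for/yield and flatMap/map can also be seen without any XML. In the sketch below plain lists stand in for the XML nodes; YieldDemo and its values are made up for illustration.

```scala
// Hypothetical demo; each inner list plays the role of one <pairs> element.
object YieldDemo {
  val pairsGroups: List[List[(String, String)]] =
    List(List("key1" -> "value1", "key2" -> "value2"), List("key3" -> "value3"))

  // A for comprehension with two generators ...
  val viaFor: Map[String, String] =
    (for {
      group  <- pairsGroups
      (k, v) <- group
    } yield k -> v).toMap

  // ... is roughly what the compiler rewrites into flatMap for the outer
  // generator and map for the innermost one.
  val viaFlatMap: Map[String, String] =
    pairsGroups.flatMap(group => group.map { case (k, v) => k -> v }).toMap
}
```

A for with a single generator, like the one in extractOld, corresponds to a plain map call; flatMap only comes into play when there is more than one level to iterate over.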
That’s it, the example has come to an end. It could easily be extended with other formats, such as CSV or YAML.