JSON to XML: the probably a tiny bit over engineered way
The complete code of the article. You need Cats and Play-Json in order to run it.
It happens regularly in software development that we have to connect systems speaking different languages. JSON is nowadays ubiquitous in service communication, especially in web development but XML still has its fair amount of bastions. Imagine you need to pass information provided by a JSON API through an XML layer, you need a converter.
The easy way
This translation is actually pretty trivial, it takes essentially 6 lines of simple pattern-matching code in Scala:
import play.api.libs.json._
import scala.xml._
def json2xml(json: JsValue, rootLabel: String): Elem = {
// XML node creation helper
def mkElem(jsType: String, children: Node*): Elem =
Elem(null, rootLabel,
new UnprefixedAttribute("type", jsType, scala.xml.Null),
TopScope, true, children: _*
)
// The real translation
json match {
case JsNull =>
mkElem("null")
case JsString(s) =>
mkElem("string", PCData(s))
case JsNumber(n) =>
mkElem("number", Text(n.toString))
case JsBoolean(b) =>
mkElem("boolean", Text(b.toString))
case JsArray(l) =>
mkElem("array", l.map(json2xml(_, s"${rootLabel}Item")):_*)
case JsObject(m) =>
mkElem("object", m.toList.map { case (k,v) => json2xml(v, k) }: _*)
}
}
The trickiest part of this example is figuring out how to build XML nodes in Scala. It translates the following JSON:
[
{ "title": "2001 : A Space Odyssey",
"release":
{ "day": 27,
"month": 9,
"year": 1968
},
"genres" : [ "Science fiction" ],
"actors": [
{ "lastName": "Dullea",
"firstName": "Keir",
"role": "Dr. David Bowman"
}
],
"directors": [
{ "lastName": "Kubrick",
"firstName": "Stanley"
}
]
}
]
into
<films type="array">
<filmsItem type="object">
<title type="string"><![CDATA[2001 : A Space Odyssey]]></title>
<release type="object">
<day type="number">27</day>
<month type="number">9</month>
<year type="number">1968</year>
</release>
<genres type="array">
<genresItem type="string"><![CDATA[Science fiction]]></genresItem>
</genres>
<actors type="array">
<actorsItem type="object">
<lastName type="string"><![CDATA[Dullea]]></lastName>
<firstName type="string"><![CDATA[Keir]]></firstName>
<role type="string"><![CDATA[Dr. David Bowman]]></role>
</actorsItem>
</actors>
<directors type="array">
<directorsItem type="object">
<lastName type="string"><![CDATA[Kubrick]]></lastName>
<firstName type="string"><![CDATA[Stanley]]></firstName>
</directorsItem>
</directors>
</filmsItem>
</films>
Note that, unlike JSON, XML have no notion of booleans, number or null, so we add type information as attribute on each node. This has the benefit of enabling us to convert such XML back to their former JSON form. Also note that, we need CDATA sections to preserve spaces.
Problem solved? Yes! But we can go much much further on this subject…
The Rocket Science way
There much more thing to say about this example, first let’s expose some properties of JSON values.
Inviting (Co)Algebras to the Party
JSON values can be modelled with an Algebraic Data Type or ADT for short. Play-Json represents them by the type JsValue
:
sealed abstract class JsValue
final case object JsNull extends JsValue
final case class JsNumber(value: BigDecimal) extends JsValue
final case class JsBoolean(value: Boolean) extends JsValue
final case class JsString(value: String) extends JsValue
final case class JsArray(value: List[JsValue]) extends JsValue
final case class JsObject(value: Map[String, JsValue]) extends JsValue
But in order to simplify the presentation, we will use slightly different, but equivalent, definition of JSON values:
sealed abstract class Atomic
final case object Null extends Atomic
final case class Bool(value: Boolean) extends Atomic
final case class Number(value: BigDecimal) extends Atomic
final case class Str(value: String) extends Atomic
sealed abstract class JsValue
final case class JsAtom(value: Atomic) extends JsValue
final case class JsArray(value: List[JsValue]) extends JsValue
final case class JsObject(value: Map[String, JsValue]) extends JsValue
Like in any Algebraic Data Type, the constructors of JsValues
can be seen as operations on it. JsAtom
informs us that every number, boolean, string and null
give rise to a distinct JSON value. JsArray
and JsObject
tells us that each (qualified) list of JSON values forms a distinct JSON value itself. Considering that JSON values are defined in terms of these operations, and that we want to translate JSON into XML, it would make sense to define them on XML as well. First, let’s explicit these operations:
sealed abstract class JsLike[+R]
final case class Atom(value: Atomic) extends JsLike[Nothing]
final case class Arr[+R](value: List[R]) extends JsLike[R]
final case class Obj[+R](value: Map[String, R]) extends JsLike[R]
The interesting point here is we can translate back and forth between JsValue
and JsLike[JsValue]
. These translations are even the inverse of each other, meaning both types are totally equivalent!
val jsLike2JsValue: JsLike[JsValue] => JsValue = {
case Atom(a) => JsAtom(a)
case Arr(a) => JsArray(a)
case Obj(m) => JsObject(m)
}
val jsValue2JsLike: JsValue => JsLike[JsValue] = {
case JsAtom(a) => Atom(a)
case JsArray(a) => Arr(a.toList)
case JsObject(m) => Obj(m)
}
jsLike2JsValue
is called a JsLike
-Algebra because it has the form JsLike[X] => X
. It means jsLike2JsValue
is a way “compute” JsLike
operation, i.e. it composes values to form new ones. On the opposite, jsValue2JsLike
is called a JsLike
-CoAlgebra because it has the form X => JsLike[X]
. It is a way to expose how a value is built, i.e. it deconstructs values to expose their structure.
Can we find such functions for XML values? We are looking for two functions:
val jsLike2Elem: JsLike[Elem] => Elem = ???
val elem2JsLike: Elem => JsLike[Elem] = ???
It would certainly be nice, but unfortunately this is not that simple! 5
, true
and null
are valid JSON values, So jsLike2Elem(Atom(Number(5)))
, jsLike2Elem(Atom(Bool(true)))
and jsLike2Elem(Atom(Null)))
should be valid XML value! But what should be the root tag of the resulting elements? How to translate 5
into a valid XML? We know that it would have the form:
<someRootTag type="number">5</someRootTag>
But what someRootTag
should be? We could pick an arbitrary one, but it would break composability (try it, you’ll see!). There’s no escape, all XML values need tags but not every JSON value have some! The situation suggest JSON values are closer to “XML values with unknown root tags” <X type="number">5</X>
where X
as the unknown, i.e. the functional space String => Elem
:
val _5: String => Elem =
(someRootTag: String) => <someRootTag type="number">5</someRootTag>
Do you think we can define meaningful functions?
val jsLike2xml: JsLike[String => Elem] => (String => Elem) = ???
val xml2JsLike: (String => Elem) => JsLike[String => Elem] = ???
Yes we can … partially. We can define jsLike2xml
:
val jsLike2xml: JsLike[String => Elem] => (String => Elem) = {
def mkRoot(jsType: String, children: Node*): String => Elem =
(someRootTag: String) =>
Elem(null,
someRootTag,
new UnprefixedAttribute("type", jsType, scala.xml.Null),
TopScope,
true,
children: _*
)
(j: JsLike[String => Elem]) =>
j match {
case Atom(Null) =>
mkRoot("null")
case Atom(Str(s)) =>
mkRoot("string", PCData(s))
case Atom(Bool(b)) =>
mkRoot("boolean", Text(b.toString))
case Atom(Number(n)) =>
mkRoot("number", Text(n.toString))
case Arr(a) =>
(root: String) => {
mkRoot("array", a.map(_(s"${root}Item")): _*)(root)
}
case Obj(m) =>
mkRoot("object", m.toList.map { case (k, v) => v(k) }: _*)
}
}
but for xml2JsLike
, we’re facing two not-that-small issues:
-
First, unlike
jsValue2JsLike
, we can not pattern-match on functions. We have no sane way to know that(someRootTag: String) => <someRootTag type="number">5</someRootTag>
is built from
Atom(Number(5))
. -
Even if we could pattern-match on functions,
jsLike2xml
is not surjective, i.e. not every XML element is the result ofjsLike2xml(f)
for somef
. To deal with invalid input, the return type ofxml2JsLike
can not beJsLike[String => Elem]
butF[JsLike[String => Elem]]
for some functorF
able to deal with errors likeOption
,Either
, etc. For simplicity’s sake, let’s considerF
to beOption
.
Let’s once again take a step back. We want to decompose a function (f: String => Elem)
into an Option[JsLike[String => Elem]]
without pattern-matching it. The only reasonable thing we can do with functions is pass them some arguments:
def xml2JsLike(f: (String => Elem)): String => Option[JsLike[Elem => String]] =
(someRootTag: String) => ... f(someRootTag) ...
The type String => Option[A]
is actually a monad, known as a ReaderT[Option, String, A]
. Which makes xml2JsLike
a monadic coalgebra. Let’s give it a name:
import cats.data.ReaderT
type TagOpt[A] = ReaderT[Option, String, A]
As an exercise try to implement xml2JsLike
. *To that end, it may be useful to notice that JsLike
is a Traverse
, i.e. that an instance of Traverse[JsLike]
can be defined. Such an instance defines a function:
def `traverse[G[_]: Applicative, A, B](ja: JsLike[A])(f: A => G[B]): G[JsLike[B]]`
To summarize this part, we have these four functions:
val jsLike2JsValue: JsLike[JsValue] => JsValue
val jsValue2JsLike: JsValue => JsLike[JsValue]
val jsLike2xml: JsLike[String => Elem] => (String => Elem)
val xml2JsLike: (String => Elem) => TagOpt[JsLike[String => Elem]]
Now we want to convert JsValue
from/into String => Elem
.
Converting back and forth
Now that we know how to compose and decompose both JSON and XML values. How do we write converters? For simplify’s sake, let’s be a bit more abstract. Let A
and B
be to types (like JsValue
and String => Elem
) and F[_]
a type constructor (like JsLike
) that have the nice property of being a functor (i.e. it has function map: F[A] => (A => B) => F[B]
). In addition, let decomposeA: A => F[A]
and recomposeB: F[B] => B
(like jsValue2JsLike
and jsLike2xml
). We want a function convert: A => B
:
trait Direct {
import cats.Functor
import cats.syntax.functor._
type A
type B
type F[_]
implicit val fHasMap: Functor[F]
val decomposeA: A => F[A]
val recomposeB: F[B] => B
final def convert(a: A): B = {
val fa: F[A] = decomposeA(a)
val fb: F[B] = fa.map(convert)
recomposeB(fb): B
}
}
Or in a more compact way:
def hylo[A,B, F[_]: Functor](decompose: A => F[A], recompose: F[B] => B): A => B = {
def convert(a: A): B =
recompose(decompose(a).map(convert))
convert _
}
And voila, a converter in just 1 lines of code:
def json2xml(json: JsValue): String => Elem =
hylo(jsValue2JsLike, jsLike2xml).apply(json)
The way back is only a bit more involving. This time we require F
to be Traverse
and the function decomposeA
to be of type A => M[F[A]]
for some monad M
:
trait WayBack {
type A
type B
type F[_]
implicit val fHasTraverse: Traverse[F]
type M[_]
implicit val mIsAMonad: Monad[M]
val decomposeA: A => M[F[A]]
val recomposeB: F[B] => B
final def convert(a: A): M[B] =
for {
fa <- decomposeA(a)
fb <- fa.traverse(convert)
} yield recomposeB(fb)
}
Again, in a more compact way:
def hyloish[A,B, F[_]: Traverse, M[_]: Monad](decompose: A => M[F[A]], recompose: F[B] => B): A => M[B] = {
def convert(a: A): M[B] =
for {
fa <- decompose(a)
fb <- fa.traverse(convert)
} yield recompose(fb)
convert _
}
which gives the way back as the oneliner:
def xml2json(f: String => Elem): TagOpt[JsValue] =
hyloish(xml2JsLike, jsLike2JsValue).apply(f)
Reorganizing a bit, it leads to the two conversion functions between (String, JsValue)
and Elem
:
val json2xmlBetter: ((String, JsValue)) => Elem =
(jsonPlusTag: (String, JsValue)) => json2xml(jsonPlusTag._2)(jsonPlusTag._1)
val xml2jsonBetter: Elem => TagOpt[(String, JsValue)] =
(e: Elem) => xml2json((s: String) => e.copy(label = s)).map(e.label -> _)
What’s the point?
Apart from being so much more complicated that the trivial approach, is there some benefits? Actually yes.
- Firstly, given
n
formats, there aren²
converters. Writing and testingn²
functions is a lot of tedious and error-prone work. But if you find some common operationsF[_]
, you only need2n
functions (oneX => F[X]
and oneF[X] => X
for each formatX
) to achieve the same goal. Furthermore, each of those functions will be easier to test, which is not to neglect. - Secondly, algebras (functions
X => F[X]
) and coalgebras (functionsF[X] => X
) operate one level at a time. They enable to treat formatX
as if it was an algebraic data type over operationsF
. Pattern-matching is such a nice feature! - Thirdly, you can write generic functions taking any type
X
for which you can provide functionsX => F[X]
andF[X] => X
. These functions also have higher chances of being correct because there is less space for unexpected behaviour.
If want to dive deeper in this subject, you can look at Matryoshka, read Functional programming with bananas, lenses, envelopes and barbed wire or any resource on F-Algebras and recursion schemes.
Solution to exercises
JsLike
instance for Traverse
implicit val jsLikeInstances: Traverse[JsLike] =
new Traverse[JsLike] {
import cats.Eval
def traverse[G[_], A, B](fa: JsLike[A])(f: A => G[B])(
implicit G: Applicative[G]): G[JsLike[B]] =
fa match {
case Atom(a) => G.point(Atom(a))
case Arr(a) => a.traverse[G, B](f).map(Arr(_))
case Obj(m) =>
m.toList
.traverse[G, (String, B)] { case (s, a) => f(a).map(s -> _) }
.map(i => Obj(i.toMap))
}
def foldLeft[A, B](fa: JsLike[A], b: B)(f: (B, A) => B): B =
fa match {
case Atom(_) => b
case Arr(a) => a.foldLeft(b)(f)
case Obj(m) => m.values.foldLeft(b)(f)
}
def foldRight[A, B](fa: JsLike[A], lb: Eval[B])(
f: (A, Eval[B]) => Eval[B]): Eval[B] =
fa match {
case Atom(_) => lb
case Arr(a) => a.foldRight(lb)(f)
case Obj(m) => m.values.foldRight(lb)(f)
}
}
xml2JsLike
def xml2JsLike(f: String => Elem): TagOpt[JsLike[String => Elem]] =
ReaderT[Option, String, JsLike[String => Elem]] { (s: String) =>
val elem: Elem = f(s)
elem
.attributes
.asAttrMap
.get("type")
.flatMap[JsLike[String => Elem]] {
case "null" =>
Some(Atom(Null))
case "boolean" =>
elem.text match {
case "true" => Some(Atom(Bool(true)))
case "false" => Some(Atom(Bool(false)))
case _ => None
}
case "number" =>
import scala.util.Try
Try(BigDecimal(elem.text))
.toOption
.map(n => Atom(Number(n)))
case "string" =>
Some(Atom(Str(elem.text)))
case "array" =>
Some(Arr(
elem
.child
.toList
.flatMap {
case e: Elem => List((s: String) => e.copy(label = s))
case _ => Nil
}
))
case "object" =>
Some(Obj(
elem
.child
.toList
.flatMap {
case e: Elem => List(e.label -> ((s: String) => e.copy(label = s)))
case _ => Nil
}.toMap
))
case _ =>
None
}
}