Spice up your tests using Faker

Oleksandr Shynkariuk - Jun 20, 2022

In modern JVM backend development we have many great frameworks and libraries at our disposal for writing high quality tests. For example, there is the well-known JUnit for unit tests, Spring libraries for integration and contract testing, and Locust for performance testing. There are also Mockito and MockK, which help us mock program behavior and reproduce the flow we want to test. All these tools let developers focus more on the business cases to cover and less on the boilerplate needed to run the tests. However, none of these frameworks focuses on an important aspect of every test: data quality.

How not to create test data

I am convinced that high quality tests, apart from covering all necessary happy, error and corner cases, should also use high quality test data. What is "high quality test data" then? In my opinion, it is data that is as close to production data as possible in its nature. Having it available in our tests helps us identify and handle most of the edge cases that are specific to our use case. Additionally, real-looking data is more pleasant to work with. Obviously, we cannot feed our tests with production data, so many people find a way out by synthesizing their own test data. I admit to doing the same, and as a result ending up with hardcoded values in tests similar to this:

    val name = "Test User"
    val phoneNumber = "+31234567890"
    val email = "test@email.com"
    val city = "Test city"
    val streetName = "Test street"
    val houseNumber = 42

The issue with this test data is that it looks very "testy": it does not reflect the nature and diversity of production data. Tests using this data will process the same person with the phone number "+31234567890" and the email "test@email.com" over and over again, which is not what our production code was designed to do, right?

Generate test data with Kotlin Faker

Luckily for us, we can do better with the help of Kotlin Faker! Similar Java Faker and Scala Faker libraries are also available, but in this article we will focus solely on Kotlin Faker. We will skip the setup part, as it is well described in the official documentation. To start off with something simple, let's rewrite the user details example from above with the help of Kotlin Faker and see what we get out of it:

    import io.github.serpro69.kfaker.Faker

    val faker = Faker()

    val name = faker.name.firstName()
    val phone = faker.phoneNumber.phoneNumber()
    val email = faker.internet.safeEmail()
    val city = faker.address.city()
    val streetName = faker.address.streetName()
    val houseNumber = faker.address.buildingNumber()

    fun main() {
        println("Hello, I am $name living in $city, $streetName $houseNumber." +
                "\nMy email is $email and my phone number is $phone.")
    }

Let's run this code and see what test data was created for us:

    Hello, I am Ronald living in Reynoldshaven, Aja Hollow 8224.
    My email is dorsey.hahn@gmail.test and my phone number is 081-441-2161.

Impressive, right? This data looks much closer to what we can expect in our production environment, and we did not have to expose any production data to get it. All you need is to create an instance of the Faker class and use it to obtain the test data. If we run this code once again, we will very likely get new data:

    Hello, I am Rebecca living in Laurindahaven, Meri Brooks 1849.
    My email is chassidy.kirlin@yahoo.test and my phone number is 038-247-1644.

The explanation for this is that the library provides randomized (not random!) data every time we ask for it. In practice, every test run will get a new combination of input data, just like in the production environment! This comes at a cost though: since the input data changes on every run, our tests might fail intermittently. To handle this, we have to provide meaningful assertion error messages that make it clear which input data caused the failure.
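One way to make such failures easier to investigate is to control the seed of the generator and to include the generated input in the assertion message. The following is only a minimal sketch: it assumes that the `fakerConfig` builder accepts a `java.util.Random` instance through its `random` property (as I understand the library documentation), and both the `normalizeEmail` function and the `test.seed` system property are hypothetical names invented for this illustration.

```kotlin
import io.github.serpro69.kfaker.Faker
import io.github.serpro69.kfaker.fakerConfig
import java.util.Random

// Hypothetical function under test, not part of the article's code.
fun normalizeEmail(email: String): String = email.trim().lowercase()

fun main() {
    // Read the seed from a (hypothetical) system property so a failing run can be replayed;
    // otherwise pick a fresh seed so every run still gets different data.
    val seed = System.getProperty("test.seed")?.toLongOrNull() ?: System.nanoTime()

    // Assumption: the `random` config property seeds all data generated by this Faker.
    val faker = Faker(fakerConfig { random = Random(seed) })

    val email = faker.internet.safeEmail()
    val normalized = normalizeEmail(email)

    // The failure message names both the generated input and the seed that produced it.
    check(normalized.count { it == '@' } == 1) {
        "normalizeEmail returned '$normalized' for input '$email' (faker seed: $seed)"
    }
    println("Checked '$email' with seed $seed")
}
```

If a run fails, passing the reported seed back in (for example with `-Dtest.seed=<seed>`) should regenerate the same input, so the failure can be reproduced locally.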

Real world example

Let's dive a bit deeper with an example close to a real world problem. Imagine we want to test a flow that sends shipments from the Netherlands to Spain. We begin by setting up our domain model. In order to register a shipment, we have to create a TransportRequest, which contains the sender and receiver address details as well as other transportation information, such as a track and trace code and additional comments:

    data class TransportRequest(
        val sender: Addressee,
        val receiver: Addressee,
        val trackTrace: String,
        val comments: String? = null
    )

    data class Addressee(
        val firstName: String,
        val lastName: String,
        val email: String,
        val phone: String,
        val address: Address
    )

    data class Address(
        val street: String,
        val houseNumber: String,
        val city: String,
        val country: String
    )

Having the two parties in different countries places requirements on the test data: for example, the address of a person living in Spain should look like a Spanish one. We can tackle this requirement by initializing a separate faker for each country and setting its locale:

    val dutchFaker = Faker(fakerConfig { locale = "nl-NL" })
    val spanishFaker = Faker(fakerConfig { locale = "es" })

Now let's initialize our TransportRequest using both fakers and see what we get:

    val sender = Addressee(
        firstName = dutchFaker.name.firstName(),
        lastName = dutchFaker.name.lastName(),
        email = dutchFaker.internet.safeEmail(),
        phone = dutchFaker.phoneNumber.phoneNumber(),
        address = Address(
            street = dutchFaker.address.streetName(),
            houseNumber = dutchFaker.address.buildingNumber(),
            city = dutchFaker.address.city(),
            country = "NL"
        )
    )

    val receiver = Addressee(
        firstName = spanishFaker.name.firstName(),
        lastName = spanishFaker.name.lastName(),
        email = spanishFaker.internet.email(),
        phone = spanishFaker.phoneNumber.phoneNumber(),
        address = Address(
            street = spanishFaker.address.streetName(),
            houseNumber = spanishFaker.address.buildingNumber(),
            city = spanishFaker.address.city(),
            country = "ES"
        )
    )

    val request = TransportRequest(
        sender = sender,
        receiver = receiver,
        trackTrace = dutchFaker.code.asin(),
        comments = dutchFaker.yoda.quotes()
    )

    println(GsonBuilder().setPrettyPrinting().create().toJson(request))

And here is the test data that gets generated for us:

    {
      "sender": {
        "firstName": "Chenise",
        "lastName": "Reurink",
        "email": "msc.amerentia.driel@yahoo.test",
        "phone": "06 6581 2294",
        "address": {
          "street": "Gioweg",
          "houseNumber": "678a",
          "city": "Oppelaarstroom",
          "country": "NL"
        }
      },
      "receiver": {
        "firstName": "Emilio",
        "lastName": "Vázquez",
        "email": "francisco.ceja.báez@hotmail.com",
        "phone": "956-591-111",
        "address": {
          "street": "Plaza Adriana Delapaz",
          "houseNumber": "3",
          "city": "Marbella",
          "country": "ES"
        }
      },
      "trackTrace": "B000Q6Y34W",
      "comments": "Strong is Vader. Mind what you have learned. Save you it can."
    }

Looks pretty neat! If I did not know that this was generated by Kotlin Faker, I would definitely think it was production data, except for the comments field of course ;-)
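The sender and receiver in the example differ only in the faker instance and the country code, so in a larger test suite the construction could be folded into a small helper. The sketch below is one possible shape, not code from the article's repository: `fakeAddressee` is a hypothetical name, and for simplicity it uses `safeEmail()` for both parties, whereas the example above uses `email()` for the receiver.

```kotlin
import io.github.serpro69.kfaker.Faker
import io.github.serpro69.kfaker.fakerConfig

data class Address(
    val street: String,
    val houseNumber: String,
    val city: String,
    val country: String
)

data class Addressee(
    val firstName: String,
    val lastName: String,
    val email: String,
    val phone: String,
    val address: Address
)

// Hypothetical helper: builds an Addressee from whichever localized faker is passed in.
fun fakeAddressee(faker: Faker, country: String): Addressee = Addressee(
    firstName = faker.name.firstName(),
    lastName = faker.name.lastName(),
    email = faker.internet.safeEmail(),
    phone = faker.phoneNumber.phoneNumber(),
    address = Address(
        street = faker.address.streetName(),
        houseNumber = faker.address.buildingNumber(),
        city = faker.address.city(),
        country = country
    )
)

fun main() {
    val dutchFaker = Faker(fakerConfig { locale = "nl-NL" })
    val spanishFaker = Faker(fakerConfig { locale = "es" })

    val sender = fakeAddressee(dutchFaker, country = "NL")
    val receiver = fakeAddressee(spanishFaker, country = "ES")

    // A test that depends on one specific value can pin just that field with copy()
    // and leave everything else randomized.
    val receiverInMadrid = receiver.copy(address = receiver.address.copy(city = "Madrid"))

    println(sender)
    println(receiverInMadrid)
}
```

Because the domain classes are data classes, `copy()` keeps the intent of each test visible: only the values the test actually cares about are spelled out.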


Hopefully, after reading this article you will give Kotlin Faker (or any other faker library available in your programming language) a try. Introducing such a library into your tests requires minimal effort and will help you bring them to the next level.

Useful links

Code used in this article - https://gitlab.com/code-foundry/kotlin-faker-blog-post

Kotlin Faker - https://serpro69.github.io/kotlin-faker/

Java Faker - https://github.com/DiUS/java-faker

Scala Faker - https://index.scala-lang.org/bitblitconsulting/scala-faker