How to config?

Dvir Reingewirtz
4 min read · Mar 10, 2019

Configuration. Overall it sounds pretty simple: you dump your parameters into a file and import it into your class. So where does it get complicated?
The traditional config file doesn't fit every case, and it can give you a hard time when it needs to be updated. So how do you do it properly?

First, let's clear something up: there is no magic solution that solves every problem, and every method has pros and cons. The trick is to pick the most suitable solution for each case. We will therefore go through each method and rate it on three criteria:
- Easy to update
- Good for teamwork
- Minimum points of failure

Here we go!

Config file with parameters

The most common way of storing configuration: many constant values flying around in a file.
Pros:
Easy import: everything is under the same project.
Keep it local: since the file lives in the same project, it doesn't depend on any other platform being up.
Cons:
Hard to update: every change means pulling and redeploying the whole project.
Recipe for conflicts: when more than one person works on the same config file, there is always the danger of overriding a parameter with the same name.
Privacy problem: in some cases the config contains passwords to servers or even, god forbid, your own password. Normally you do not want that information exposed in your repository.

  • Easy to update: 🐧
  • Good for teamwork: 🐧🐧
  • Minimum points of failure: 🐧🐧🐧🐧🐧

*It's supposed to be Tux; that's the closest I could find.*

Good for:
Configuration that is relevant only to your class/module and shouldn't change in the near future, such as the port for Elasticsearch servers, a path on HDFS, etc.
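The sketch below shows the conflict risk with a file like this, written as a Python module. The names and values are hypothetical, and some are borrowed from the JSON example in the next section. In a real project the constants would live in their own `config.py` and be pulled in with `from config import ...`. Here they are inlined so the snippet runs on its own.

```python
# config.py: a flat "config file with parameters" (hypothetical names and values)
ELASTIC_PORT = 100
ELASTIC_SERVER_NAME = "reception"
HDFS_INPUT_PATH = "/data/clients/raw"  # hypothetical HDFS path

# Teammate A adds the Hive table to read from...
TABLE_NAME = "clients_raw"
# ...and, further down the file, teammate B adds the table to write to.
# Python raises no error here: this silently rebinds TABLE_NAME for everyone.
TABLE_NAME = "clients_popular_purchases"


class PopularPurchasesJob:
    """A consumer that would normally do `from config import TABLE_NAME`."""

    def __init__(self):
        # Intended to be "clients_raw", but it gets whatever was bound last.
        self.read_table = TABLE_NAME
        self.elastic_port = ELASTIC_PORT
```

Nothing fails at import time. The job simply reads from the wrong table, which is the kind of silent override described above.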

Config file with data structures

This method is an improved version of the one above.
Pros:
Easy import: pretty, readable hierarchical imports.
Keep it local: once again, we do not rely on external platforms.
Safe from conflicts: each section has its own space, and thanks to the hierarchy, parameters are safe from being overridden.

Cons:
Hard to update: it's still not a stand-alone file, so every change forces another deployment.
Privacy problem: still exists.

  • Easy to update: 🐧
  • Good for teamwork: 🐧🐧🐧🐧🐧
  • Minimum points of failure: 🐧🐧🐧🐧🐧
```json
{
  "project_config": [
    {
      "project_name": "clients_popular_purchases"
    },
    {
      "elastic": {
        "write": {
          "port": 100,
          "server_name": "reception"
        }
      }
    },
    {
      "hive": [
        {
          "read": {
            "table_name": "clients_raw"
          }
        },
        {
          "write": {
            "table_name": "clients_popular_purchases"
          }
        }
      ]
    }
  ]
}
```
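Here is a minimal Python sketch of reading the JSON above with the standard `json` module. The config is inlined as a string so the example is self-contained. In practice it would be loaded from a file shipped with the project, and the filename in the comment is hypothetical. Because `project_config` and `hive` are lists of single-key objects, the hypothetical `merge_sections` helper flattens each of them into a dict first.

```python
import json

# The article's JSON, inlined so the sketch runs on its own. In a real project
# it would ship with the code, e.g. json.load(open("config.json")) (hypothetical filename).
RAW_CONFIG = """
{
  "project_config": [
    {"project_name": "clients_popular_purchases"},
    {"elastic": {"write": {"port": 100, "server_name": "reception"}}},
    {"hive": [
      {"read": {"table_name": "clients_raw"}},
      {"write": {"table_name": "clients_popular_purchases"}}
    ]}
  ]
}
"""


def merge_sections(sections):
    """Flatten a list of single-key dicts: [{"a": 1}, {"b": 2}] -> {"a": 1, "b": 2}."""
    merged = {}
    for section in sections:
        merged.update(section)
    return merged


def load_config(text):
    """Parse the config and turn its list-of-sections layout into nested dicts."""
    project = merge_sections(json.loads(text)["project_config"])
    # "hive" is also a list of single-key dicts, so flatten it the same way.
    project["hive"] = merge_sections(project["hive"])
    return project


config = load_config(RAW_CONFIG)
elastic_port = config["elastic"]["write"]["port"]
read_table = config["hive"]["read"]["table_name"]
write_table = config["hive"]["write"]["table_name"]
```

Both Hive sections contain a `table_name` key, yet they never collide, because each one lives under its own `read` or `write` parent.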

External data structure config file

In this method, the config file is no longer part of the project; it sits on HDFS or a server.
What do we earn from it?

Pros:
Easy import: since it is still a structured config, the imports remain easy to read.
Easy to update: no need to redeploy; just update the file and you are good to go.
Privacy: since the file lives on the file system, we can set its permissions and hide our secrets.
Multiple project management: you can keep all your configurations under the same path and manage them in an organized, effective way.

Cons:
Local no more: one could argue that the server holding our configurations might go down. However, when the file is kept in HDFS there are three replicas (HDFS's default replication factor), so the chance of the file being unavailable is low.
No version control: since the file is no longer part of the project, rolling back becomes risky when it is needed.

  • Easy to update: 🐧🐧🐧🐧🐧
  • Good for teamwork: 🐧🐧🐧🐧🐧
  • Minimum points of failure: 🐧🐧🐧🐧
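Below is a minimal sketch of loading an external config at run time. It assumes the file is reachable as a regular path: a local file stands in for HDFS here, and reading from HDFS itself would go through whatever client your cluster uses. The environment variable name, the layout of one `<project_name>.json` per project under a shared root, and the permission check are all illustrative assumptions, not part of the method as described.

```python
import json
import os
import stat

# Hypothetical: a shared directory holding one "<project_name>.json" per project.
CONFIG_ROOT_ENV = "CONFIG_ROOT"


def project_config_path(project_name, root=None):
    """Build the path of a project's config under the shared config root."""
    root = root or os.environ[CONFIG_ROOT_ENV]
    return os.path.join(root, f"{project_name}.json")


def is_private(path):
    """True if group and others have no permission bits on the file (POSIX)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def load_external_config(path):
    """Read the config when the job starts, so editing the file needs no redeploy."""
    if not is_private(path):
        raise PermissionError(f"{path} is accessible to other users; restrict it (e.g. chmod 600)")
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Because the file is read on every run, editing it changes the next run's behavior without a deployment. The flip side, as noted above, is that nothing records the previous version unless you keep copies yourself.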

External database config

This method is rarely used but worth mentioning.
Let's say we have dirty input data and we want to fix it with a key-value mapping. We could, of course, keep a dictionary in our config file, but another option is a mapping table.

Pros:
Easy to view: once there are a large number of values, a table makes them easier to inspect.
Classic for joins: when values need to be replaced, joining the data with the mapping table is easy.

Cons:
- No version control
- Relying on the external database's uptime
- Updating and inserting new mappings is done through queries

  • Easy to update: 🐧🐧🐧
  • Good for teamwork: 🐧🐧🐧🐧🐧
  • Minimum points of failure: 🐧🐧
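To show the join-based replacement, here is a minimal sketch that uses an in-memory SQLite database (Python's built-in `sqlite3`) as a stand-in for the external database. The table names, columns, and city values are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical mapping table and a table of dirty input data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE city_mapping (raw_value TEXT PRIMARY KEY, clean_value TEXT NOT NULL);
        CREATE TABLE purchases (client_id INTEGER, city TEXT);
        INSERT INTO city_mapping VALUES ('NYC', 'New York'), ('new york', 'New York'), ('TLV', 'Tel Aviv');
        INSERT INTO purchases VALUES (1, 'NYC'), (2, 'new york'), (3, 'TLV'), (4, 'Haifa');
    """)
    return con


def cleaned_purchases(con):
    """Replace raw values via the mapping table; unmapped values pass through unchanged."""
    # LEFT JOIN keeps rows without a mapping; COALESCE falls back to the raw value.
    return con.execute("""
        SELECT p.client_id, COALESCE(m.clean_value, p.city)
        FROM purchases AS p
        LEFT JOIN city_mapping AS m ON p.city = m.raw_value
        ORDER BY p.client_id
    """).fetchall()
```

Fixing or adding a mapping is an `UPDATE` or `INSERT` against the table rather than a code change. That is both the convenience and the lack of version control listed above.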

In conclusion, configuration management is important and requires some thought. Choose yours wisely, and keep it maintained.

Good Luck!
