zen of coding

Key/value tables and how to use them in CakePHP 1.3

The key/value tables (EAV model) are a nice a approach to database modeling, when we need to store some arbitrary data about another model.

For example, let’s take a User model.
We could create a single users table to hold all of the potential data or even create an additional table such as user_details with all additional fields, which “might” be needed.

However, what happens when we don’t exactly know how much or what data will be eventually required by the application?

  • How many phones does the user have? Should we create a table with fields phone1, phone2, phone3, etc.?
  • What about emails? (Some users have 1 some have 5)
  • What happens if the business later on decides to ask for a fax number?.. and person’s height or favorite color?

If we attempt to predefine all the place-holders (columns or fields in a table) for the potential data, the table can grow horizontally into an unmanageable monster.

This is the perfect time to enter key/value tables.

Let’s consider our User model and, therefore the users table:
[sourcecode language=”sql”]
CREATE TABLE `users` (
`id` char(36) COLLATE utf8_unicode_ci NOT NULL DEFAULT ”,
`username` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`password` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created` datetime DEFAULT NULL,
`modified` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
[/cc]

The table couldn’t be simpler. All we’ll store here is the user’s login information. The rest will be handled by the user_details table…
It’s worth to mention that our models will have a pretty standard association:
User hasMany UserDetail and on the flip side UserDetail belongsTo User

Alright, next, we have the user_details table:
[sourcecode language=”sql”]
CREATE TABLE `user_details` (
`id` char(36) COLLATE utf8_unicode_ci NOT NULL DEFAULT ”,
`user_id` char(36) COLLATE utf8_unicode_ci DEFAULT NULL,
`field` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`value` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
[/cc]

The columns field and value will actually store the user details. Because this table is going to grow vertically, potentially we can store an unlimited and completely arbitrary information about a user. As a matter of fact two different parts of your app can handle storing the user details completely independent of each other. For example, when a user plays the role of a “Guest” versus a user who represents a “Member”, or better yet… a user who is not event a human being, but rather a corporate entity, in which case typical fields like first_name and last_name, may become completely irrelevant.

Hopefully you can start seeing how the key/value approach can help our modeling be a lot more flexible.

… but let’s look closer at some details… how do we deal with this in cake-like-ways?

Actually, it is not that different from what you are already accustomed to.

Let’s see our UsersController:

class UsersController extends AppController {
 
  public function add () {
    if(!empty($this->data)) {
      $this->User->saveAll($this->data, array('validate' => 'first'));
    }
  }

}

Nothing unusual so far…

Next comes our add.ctp view:

echo $this->Form->create('User');
echo $this->Form->input('User.username');
echo $this->Form->input('UserDetail.0.dob');
echo $this->Form->input('UserDetail.1.gender', array(
                        'type' => 'radio',
                        'legend' => FALSE,
                        'options' => array(
                                  'M' => 'M',
                                  'F' => 'F'
                       )));
echo $this->Form->input('UserDetail.2.phone');
echo $this->Form->input('UserDetail.3.email');
echo $this->Form->end('Save');

Nothing unusual so far, yet again.
We are going to save a User with some username as well as some User details: date of birth, gender, phone and an email.
Thinking back, once the save happens, our user_details table will have four records for a single user ID.

Let’s see the models then:

User:

class User extends AppModel {
  public $hasMany = array('UserDetail');

  public $validate = array(
    'username' => array(
        'rule' => 'isUnique',
        'message' => 'Username already exists'
    )
  );
}

UserDetail:

class UserDetail extends AppModel {
  public $belongsTo = array('User');

  public $validate = array(
     'dob' => array(
        'rule' => 'notEmpty',
        'message' => 'Please select a date of birth'
     ),
     'gender' => array(
        'rule' => 'notEmpty',
        'message' => 'Please select a gender'
     ),
     'phone' => array(
        'rule' => 'phone',
        'message' => 'Please enter a phone'
     ),
     'email' => array(
        'rule' => 'email',
        'message' => 'Please enter a valid email'
     )
  );

  public function beforeSave() {
     foreach($this->data[$this->alias] as $field => $value) {
        if($field !== 'user_id') {
          $this->data[$this->alias]['field'] = $field;
          $this->data[$this->alias]['value'] = $value;
        }
     }
     return true;
  }
}

So the validation is handled “as always”.

The only little bit of trickery is in our beforeSave() method…
Here we take our existing fields from the data array (dob, gender, phone…), and convert them to the expected key/value pair (or field/value based on the column names). We are skipping the user_id which is automatically injected by saveAll() just as needed.

Well, this should be a good starting point…

The more extravagant example would be showing the ability to store validation rules and field types in the DB as well. This is helpful if you would like an admin back-end to manage the requirements for user details. As of this example, we’d have to manually adjust the views and validation rules to allow for any new fields. If we store the rules (as serialized array for example) as well as field type(s), the form building and validation would be done on the fly and would allow business users to manage the specifics of the required data.

… but that’s a topic for another day.

  • Nice article.

  • teknoid

    @dogmatic69

    Glad you liked ;)

  • red

    Yep, nice one – thanks!

  • Pingback: Tweets that mention Key/value tables and how to use them in CakePHP 1.3 | nuts and bolts of cakephp -- Topsy.com()

  • This is an awesome approach. I’m going to be experimenting with it. I think it’s something that can be used for many projects. Great article!

  • Thats fantastic, I have never thought about using something like this. I have had all the problems you listed.

    How would you display these in the view though?

  • teknoid

    @Josh

    Displaying in the view is easy, assuming you know the User.id.
    Get all the records and loop through them as usual.

    (I might do another write-up to show index and edit approaches).

  • Nice one. Thanks for the insight.

    However, reporting would be a pain. Need to choose when to use this technique carefully.

  • teknoid

    @Rachman Chavik

    I’d say… depends on the report :)

  • keymaster

    Besides the aribtirary data applicablity you mentioned, EAV modelling (what you are referring to as key-value tables) is primarily used for sparse data sets.

    Really though, IMHO, it’s applicability is limited.

    It kills performance for any kind of complex querying, and to do properly requires a medley of associated metadata tables.

    Obviously it is quicker to set up, hence the temptation, but… what are you really saving – adding a few fields to a table? Doesn’t that take all of 60 seconds?

    In the multiple emails example, your standard alternative is to make a User hasMany Emails association. Then you must create the Emails table, as well as it’s model, controller, etc. Thats’ true. But how long does it really take to do all that, especially with bake – 15 minutes? What percent of your total project’s time is 15 minutes?

    And what are you doing to your ability to query your data later on, in unforseen ways?

    I don’t know, I hear lots of people talking about EAV tables and how easy they are, but really, they aren’t that much easier, and you sacrifice alot in terms of future querying ability, modelling relationships, etc.

  • @teknoid: yup

    Imagine doing a report to get latest registered female users, whose was born on January 1993, and whose email is hosted in yahoo.com

    select *
    from users u, user_details ud
    where u.id = ud.user_id
    and (ud.field = ‘dob’ and ud.value like ‘1993-01%’)
    and (ud.field = ‘gender’ and ud.value = ‘F’)
    and (ud.field = ’email’ and ud.value like ‘%yahoo.com’);

    Also, if you have many custom fields, the number of rows involved with this query will skyrocket.

    But this is very handy if used properly.

    Hmm, wonder how to do this in cakephp speak.

  • I agree with Keymaster. EAV model is very alluring and seems very flexible, but you have to think twice before using it, all the more since it’s so easy to add a field in a model with CakePHP capabilities (ORM and bake).
    You’ll have more difficulties then to build queries (by the way, Rachman’s query is wrong ! you need to alias the user_details once for each of the field you want to get).
    EAV modelling is used for instance in Magento (eCommerce solution) and ez Publish (CMS). You can have products in the same site with many specific attributes that way (sparse data set, as Kemayster points out). In Magento, it creates big performance problems (imagine the query to get a product and, say, 40 attributes to display a simple product list page!) and to improve the speed of the solution, all “details” are eventually denormalized. But the great thing in the Magento API is that the complexity of the query is hidden from the developper (no more complex than doing a $this->model->find to get the extra attributes).

    As for as CakePHP is concerned, I remember a post on the bakery about this… but i’m not sure. By googling, I found :
    http://github.com/finsterdexter/cakephp-eav-behavior
    http://github.com/morrislaptop/eav

  • teknoid

    @All

    Thanks for your comments (and thanks for links as well).
    The performance sacrifice will usually come from improper usage. For example, I’d never query this table directly for reporting (for any serious reporting querying the live DB for read-only purposes is just not a great idea).

    @keymaster

    Like mentioned adding a single field to the table is not an issue, as long as you know exactly what field you need to add. In a case of a CMS this may not be applicable.
    I don’t know about using User hasMany Email as an alternative, if I start making these associations to any type of data that might require multiple records (who knows there could be dozens) the key/value table will be no slower both in terms of development and querying.

  • jarrod

    I also like the flexibility of the k/v approach but besides the reporting performance issues there is also the lack of support for database field types and lengths. I hate the idea of storing dates as varchar!

  • teknoid

    @jarrod

    Proper modeling would include meta-data, but that would be too much for a simple example.

  • keymaster

    This is easily the number #1 blog in the cake blogosphere.

    So many interesting topics are brought up, and teknoid always has an original approach to often subtle problems.

    This blog is more fun than reading the morning newspaper :-))

    Keep up the great work!

  • teknoid

    @keymaster

    Thank you. Nice to hear.

  • (i agree with keymaster about this blog, by the way, and i wish there would be more blogs like this in the cakephp blogosphere)

    @jarrod to quote again magento’s example, split value tables are used, according to types : one for varchar, one for datetime, one for int…

  • Hello all,

    I used Felix solution for this earlier this year:
    http://debuggable.com/posts/unlimited-model-fields-expandable-behavior:48428c2e-9a88-47ec-ae8e-77a64834cda3

    Its pretty handy, but i agree with above comments about reporting / querying etc. It makes that stuff so much more complicated, you only want to do it for specific usages.

    Eg, when something isn’t going to have to be searched by one of the expanded fields.

    Still, handy to have in the toolkit.

    Will

  • teknoid

    @will

    Thanks for sharing the link.

  • gabrielr

    Hi, nice article. But I have a question and I Hope, you help me.

    In the case create new record, there is no problem with de method saveAll() and the example, but what happend when I try to edit that records using saveAll(), cakephp Inserts new records in the Deteail’s Table, not Update that, What Im doing wrong?

  • teknoid

    @gabrielr

    You need to ensure that you are passing an ID with each record you are editing. Only then an update will be triggered.

    I would love to explain in more detail, but that would really take a while.

    (I promise to do a follow-up very soon, which will show how to edit existing records for this scenario).

    • gabrielr

      Thanks for the answer, I try it …

  • vela

    whenever I open this blog to look for some decision, I see you just have posted an article about my problem :)))
    Thanks! :)

  • teknoid

    @vela

    Thanks. Good to hear.

  • I’ve been struggling with this very problem, for a realty website. It would be great if the admin could add new property fields, but of course they’d have to be searched…. I like this technique and plan on using it for site settings, but what other ways are there (that hopefully wouldn’t hurt search capabilities)?

    As always tho, excellent article and topic, Please sir, can’t I some more?

  • teknoid

    @chris

    There is hardly a simple answer for this, but my recommendation would be to create a view (or a cache of sorts)… which would allow you to query the needed data.

    As mentioned above, EAV or key/value system is not the best approach for intensive read operations, but very flexible for storage.

  • Ross

    The above works perfectly for adding, but doesn’t quite work with editing. Here’s my attempt:

    Form->input(‘ContentDetail.0.id’);
    echo $this->Form->input(‘ContentDetail.0.website’, array(‘value’=>$this->data[‘ContentDetail’][0][‘value’]));
    echo $this->Form->input(‘ContentDetail.1.id’);
    echo $this->Form->input(‘ContentDetail.1.contact_email’, array(‘value’=>$this->data[‘ContentDetail’][1][‘value’]));
    echo $this->Form->input(‘ContentDetail.2.id’);
    echo $this->Form->input(‘ContentDetail.2.contact_phone’, array(‘value’=>$this->data[‘ContentDetail’][2][‘value’]));
    echo $this->Form->input(‘ContentDetail.3.id’);
    echo $this->Form->input(‘ContentDetail.3.address’, array(‘type’=>’textarea’, ‘value’=>$this->data[‘ContentDetail’][3][‘value’]));
    ?>

    But it doesn’t feel very cakey?

    ContentDetail is my `UserDetail` table.

    This does appear to be work, but again, I don’t think it’s the best approach.

    Have I missed something glaringly obvious? Thanks.

%d bloggers like this: