zen of coding

HABTM and JOIN trickery with CakePHP

Update (10/7/2009)… this functionality has been rolled into Habtamable behavior

An interesting question came up on IRC today…

If I have Post HABTM Tag, how do I select only Posts that have Tags “new” AND “cakephp”?

A seemingly simple problem, actually required a pretty tricky solution. I do have to say that it’s not because of some cake limitation, but really (at least for me) it’s not at all straight forward to do that type of query in good ol’ SQL (yeah, I don’t really know SQL all that well… good thing we’ve got google).

Before going further, I’ll assume that you have a solid understanding of the way HABTM works, concepts of join tables, auto “with” models and cake conventions. If not, you should probably refer to my previous posts about HABTM and/or read up in the manual.

Alright, let’s analyze the problem for a second… basically we need to grab all Posts where in the join table a single post_id would match two tag_id’s (i.e. the tag_id’s of “new” and “cakephp”). Note, that we cannot match just one or the other, we must have both tag_id’s matching the same post_id. Not only that, we don’t know tag_id’s as we only know the search terms (“new” and “cakephp”).

Looks like we’ll need some creative SQL and JOINs to get this working right…

Surely, we could rely on Model::query(), but let’s see if we can get this working with find() instead.

First of all we have to force cake to build a join query.

So let’s do something like this:

$searchTerms = array('cakephp', 'new');

$this->Post->unbindModel(array('hasAndBelongsToMany'=>array('Tag')));

$this->Post->bindModel(array('hasOne'=>array(
                                                     'PostsTag'=>array(
                                                         'foreignKey'=>false,
                                                         'type'=>'INNER',
                                                         'conditions'=>array('PostsTag.post_id = Post.id')
                                                     ),
                                                    'Tag'=>array(
                                                          'foreignKey'=>false,
                                                          'type'=>'INNER',
                                                          'conditions'=>array(
                                                              'Tag.id = PostsTag.tag_id',
                                                              'Tag.tag'=>$searchTerms
)))));

I will briefly explain, what’s going on here (if you need more details, see the post linked above on how to force CakePHP to do a JOIN)…

We are telling cake to JOIN our Post model with the PostsTag model (join table: posts_tags) and then JOIN our PostTag model with our Tag model (tag table: tags).
The JOIN conditions are pretty simple, we ensure that Post.id matches the PostsTag.post_id and Tag.id matches PostsTag.tag_id. Of course we need to also ensure that we only grab the tags where Tag.tag is IN our search terms (see the $searchTerms array).

Once all of that is accomplished, we build our find() method:

$this->Post->find('all', array(
                              'group' => array('Post.id','Post.title HAVING COUNT(*) = '.$numCount)))

What?

Let’s break it down…

If we were to do a simple find(‘all’) we’d get all Posts that happen to have Tag.id’s matching either one of our search terms. This is not what we need.

By adding the GROUP BY and HAVING COUNT(*) = $numCount, we ensure that we match both of our Tag ids and not just one or the other. In other words, COUNT (*) must equal to the number of search terms.

OK, so what is $numCount?
In our example we know that we searched for two terms ($searchTerms array), therefore we could have done:

HAVING COUNT(*) = 2.

However, if our $searchTerms array had an unknown number of items, we’d do something like this prior to our find() call, to determine how many Tag.id’s must be matched in order for our query to be correct:

$numCount = count($searchTerms);

I can imagine that this is probably a bit over the top, but it certainly taught me a few things and reinforced some others, hopefully you’ll learn a thing or two from this as well ;)

  • wargoth

    Post->PostsTag->bindModel(
    ‘belongsTo’ => array(‘Post’)
    );

    $this->Post->PostsTag->find(
    ‘all’, array(
    ‘conditions’ => array(
    ‘PostsTag.tag_id’ => $tag_ids_array
    )
    )
    );

    ?>

    But first you should find the $tag_ids_array.

  • teknoid

    @wargoth

    That’s not going to work for this case, because array(1,2,3,4) produces an IN clause, which is essentially an OR. We need to ensure that for a given post_id there is matching tag_id=1 AND tag_id=2

  • Ugh… there’s gotta be a simpler way of doing this. I mean, this works and great job figuring it out, but it seems to lack… elegance.

    @wargoth, that also won’t work where you’re trying to do a MATCH() AGAINST() fulltext search.

  • teknoid

    @Finster

    It doesn’t even look all that elegant in plain SQL ;)

    That being said, I don’t know of any other way to force a JOIN in cake … and generally speaking this post was more of an experiment and a learning experience.

  • Oh, I wasn’t trying to criticize your method at all, more I was criticizing the fact that CakePHP still requires this level of “hackery” for HABTM. This is a very helpful post, all around!

  • teknoid

    @Finster

    I agree, hopefully in 2.0 we’ll see a more elegant way to manage JOINs.

  • Oh finally, i’ve been looking for this solution for a long time..

    But, then how can I add pagination for this result..?

  • teknoid

    @labanux

    Just use $this->paginate(‘Post’);

  • I’m sorry, I’m still a little bit confused with pagination..

    Consider I have Posts Controller, so is it will be :

    some_function()
    {
    $this->Post->find(‘all’);

    $this->set(‘posts’, $this->paginate(‘Post’);
    }

    So is that mean everytime I called $this->paginate->(‘Post’), the result will automatically taken from find() function before ?

  • Oh sorry, I mean

    $this->Post->find(’all’); <-- replaced with your code I've tried that, and it didn't work.

  • teknoid

    @labanux

    You don’t need find(‘all’) if you use paginate() because paginate() does really the same thing (i.e. finds some data) except it applies the limit and order to your query. Well, that’s not fair… it does a lot of other great things, but on the basic level it’s just a way to find data for a given model.

    The technique should work (don’t remember if I’ve tested it), since paginate() actually calls find() internally and applies the same JOINs as find() would.

  • $searchTerms = array(‘cakephp’, ‘new’);

    $this->Post->unbindModel(array(‘hasAndBelongsToMany’=>array(‘Tag’)));

    $this->Post->bindModel(array(‘hasOne’=>array(
    ‘PostsTag’=>array(
    ‘foreignKey’=>false,
    ‘type’=>’INNER’,
    ‘conditions’=>array(‘PostsTag.post_id = Post.id’)
    ),
    ‘Tag’=>array(
    ‘foreignKey’=>false,
    ‘type’=>’INNER’,
    ‘conditions’=>array(
    ‘Tag.id = PostsTag.tag_id’,
    ‘Tag.tag’=>$searchTerms
    )))));

    $this->set(‘posts’, $this->paginate(‘Post’)); Post->find(‘all’, array(‘group’ => array(‘Post.id’,’Post.title HAVING COUNT(*) = ‘.$numCount))) <-- return the right value.. help me please.. :( I'm dying solving this..

  • teknoid

    @labanux

    You never explained what the actual problem is… you should really post it at the google group as you’ll get better help there.

    But take a look at the queries generated, are you getting expected query for your paginate()?

  • Pingback: Has not and belongs to many | Cherry on the...()

  • Gaurav

    Hi

    I gone through your article and i am currently user the following code inside my controller action

    $this->User->unbindModel(array(‘hasAndBelongsToMany’=>array(‘Plan’)));
    $this->User->bindModel(array(‘hasOne’=>array(
    ‘userPlan’=>array(
    ‘foreignKey’=>false,
    ‘type’=>’INNER’,
    ‘conditions’=>array(‘userPlan.UserId = User.UserId’)
    ),
    ‘Plan’=>array(
    ‘foreignKey’=>false,
    ‘type’=>’INNER’,
    ‘conditions’=>array(
    ‘Plan.PlanId = userPlan.PlanId’
    )))));

    Here my User table and Plan tables are not directly related to each other. UserPlan table having the userid and the planid.

    But when i execute this gives me the following error

    Query: SELECT `User`.`UserId`, `User`.`UserCode`, `User`.`Title`, `User`.`FirstName`, `User`.`LastName`, `User`.`Email`, `User`.`Password`, `User`.`Phone`, `User`.`Address1`, `User`.`Address2`, `User`.`City`, `User`.`State`, `User`.`PostalCode`, `User`.`OrganizationName`, `User`.`OrganizationUrl`, `User`.`UserStatus`, `User`.`UserIdentificationFlag`, `User`.`SupervisorAccess`, `User`.`CreateDate`, `User`.`ModifyDate`, `userPlan`.`Id`, `userPlan`.`UserId`, `userPlan`.`PlanId`, `userPlan`.`ActivationDate`, `userPlan`.`ExpirationDate`, `userPlan`.`ModifyDate`, `Plan`.`PlanId`, `Plan`.`PlanName`, `Plan`.`PlanDescription`, `Plan`.`PlanType`, `Plan`.`PlanFee`, `Plan`.`PlanPeriod`, `Plan`.`PlanStatus`, `Plan`.`PlanCreateDate`, `Plan`.`PlanModifyDate` FROM `MTS_User` AS `User` LEFT JOIN `MTS_UserPlan` AS `userPlan` ON (`userPlan`.`MTS_User_id` = `User`.`UserId` AND `userPlan`.`UserId` = `User`.`UserId`) LEFT JOIN `MTS_Plan` AS `Plan` ON (`Plan`.`MTS_User_id` = `User`.`UserId` AND `Plan`.`PlanId` = `userPlan`.`PlanId`) WHERE 1 = 1
    Warning: SQL Error: 1054: Unknown column ‘userPlan.MTS_User_id’ in ‘on clause’ in C:\wamp\www\WMT\cake\libs\model\datasources\dbo_source.php on line 439

    Please suggest me how to put the innter join in my case and get the desired result.

    Any help would be greatly appriciated. I am really banging my head for this issue from last 3 days.

    Thanks

  • @Gaurav

    What version of cake are you using?
    ‘type’=>’INNER’ and ‘foriegnKey’=>false were introduced somewhere around RC1 release.

  • Gaurav

    Hi

    I am using CakePHP 1.1.

    Thanks

  • @Gaurav

    Yep, none of this is going to work on 1.1… so your best hope is custom SQL or upgrade, if you can (should).

  • I am getting an error if I put in ‘recursive’ =>2 in the find statement. I think it’s because we unbinded the relationship and the ‘recursive’ is trying to go back and use that relationship.

    Do you have this problem?

  • @Jesse

    The bindModel() plays the role of the recursive, so you don’t need to include recursion level, you just manually bind all the required Models in your JOIN.

    Then again, you didn’t specify what error you are getting, so it’s a bit hard to guess ;)

  • I’ll include the SQL statement as well as the error:

    SQL:
    SELECT `OnsaleCoupon`.`id`, `OnsaleCoupon`.`onsale_merchant_id`, `OnsaleCoupon`.`title`, `OnsaleCoupon`.`content`, `OnsaleCoupon`.`datestart`, `OnsaleCoupon`.`dateend`, `OnsaleCoupon`.`retail_price`, `OnsaleCoupon`.`sale_price`, `OnsaleCouponsCategory`.`title`, `OnsaleCouponsCategory`.`onsale_coupons_categories_division_id` FROM `onsale_coupons` AS `OnsaleCoupon` INNER JOIN `onsale_coupons_categories` AS `OnsaleCouponsCategory` ON (`OnsaleCouponsCategory`.`id` = `OnsaleCouponsOnsaleCouponsCategory`.`onsale_coupons_category_id` AND `OnsaleCouponsCategory`.`id` IN (6, 11)) WHERE `OnsaleCoupon`.`datestart` NOW() GROUP BY `OnsaleCoupon`.`id`, `OnsaleCoupon`.`title` HAVING COUNT(*) = 2

    Error:
    1054: Unknown column ‘OnsaleCouponsOnsaleCouponsCategory.onsale_coupons_category_id’ in ‘on clause’

    I’ve got my problems outlined here: http://groups.google.com/group/cake-php/browse_thread/thread/9a985e04ddf076bf/315c5a809af3c61b?lnk=gst&q=HABTM+searcg#315c5a809af3c61b

    Any suggestions?

  • @Jesse

    I haven’t looked at the issue very closely, as I am a bit short on time… but here are my suggestions:

    1. Bind the Merchant model as part of your JOIN.
    OR
    2. Grab the required id’s in an array, perhaps using Set::extract() and run a separate query to grab the Merchant data.

    … well I hope it sets you on the right track, if anything. Good luck ;)

  • I had to solve a similar problem, so here’s my solution.

    I’m new to CakePHP, so i might have done something really horrible. On the other hand, my SQL statement should be good. I just tried to translate that to CakePHP.

    /* SQL ***********************************************************************/

    SELECT posts.*
    FROM posts
    WHERE posts.id IN ( SELECT t1.post_id
    FROM posts_tags AS t1
    JOIN tags ON tags.id = t1.tag_id
    WHERE tags.name = “cakephp” )
    AND posts.id IN ( SELECT t2.post_id
    FROM posts_tags AS t2
    JOIN tags ON tags.id = t2.tag_id
    WHERE tags.name = “new” );

    /* CakePHP *******************************************************************/

    $args = array();
    $args[‘tags’] = ‘cakephp, new’;
    $filters = array();

    if( isset( $args[‘tags’] ) && trim( $args[‘tags’] ) != ” ) {
    $i = 0;
    foreach( explode( ‘,’, $args[‘tags’] ) as $tag ) {
    $i++;
    $tag = mb_trim( $tag );
    $filters[] = ‘Post.id IN ( SELECT t’.$i.’.post_id FROM posts_tags AS t’.$i.’ JOIN tags ON tags.id = t’.$i.’.tag_id WHERE tags.name = “‘.Sanitize::clean( $tag, array( ‘encode’ => false ) ).'” )’;
    }
    }

    $posts = $this->paginate( ‘Post’, $filters );

  • @jmjjg

    Thank you for sharing your approach. The sub-select, I’m personally not a big fan of… but if it does the trick for you, then that’s fine. That being said, the goal of the experiment was to avoid any straight SQL and I don’t see anyway to do so with sub-select.

  • @teknoid

    Well, the sub-select is what naturally comes to my mind, and I have trouble (at the moment) “thinking in CakePHP”. So I’ll try your approach soon.
    Also, I’m quite happy that you say “… avoid any straight SQL and I don’t see anyway to do so with sub-select [with CakePHP]”, which means that I at least searched correctly for a solution to that.

  • @jmjjg

    Yep, well… I hope it works out for ya.

  • Munks

    Hey All,

    I’ve only started using cakephp recently, but have already learned a great deal in the short time I’ve been using it. To respond to earlier concerns of pagination, the reason this won’t work with the paginate function is because it calls find() twice. The bind and unbind functions will only apply your changes until the search has been performed. Unfortunately, I have no good advice on how to remedy this.

    Cheers!

  • @Munks

    The remedy is quite simple, you need to supply the false param to the bindModel() method, which makes the bind persistent. Please refer to the API for more detailed info.

  • poornima

    i am new to cake php i get select query problem when i create my login page i wanted to clear it so please suggest some answers

  • @teknoid: I have been using your code fine up until I needed to get events created by Causes OR EventOrganisers and my inner join is on the Causes table so instantly discounts any events from EventOrganisers.

    I’ve added a topic on Google Groups if you have the time to help me out:

    http://groups.google.com/group/cake-php/browse_thread/thread/1222420fd151fefe

    Thanks,

    Paul.

  • The answer was a simple as changing the JOIN type to LEFT which I had tried earlier but at a time when I had tried a few other things at the same time which were the real issues.

    I take it your use of INNER was a personal preference in a situation where you only wanted those records that had matching associations?

  • @Paul Gardner

    INNER type, was indeed the situation during the write up.
    While the manual approach is good, and I’m glad you’ve got it sorted out, I would totally love to hear any feedback (experience) with the habtamable behavior linked above ;)

  • I may just have to have a play with your habtamable behaviour. Would it resolve the following issue I am having with what I posted above?

    Page: http://www.localcause.org.uk/causes

    In the right column you can set Content Filters to limit the causes shown by one or more regions and categories. My categories are HABTM associated and selecting two categories that both match a cause (i.e. the cause says they work in both categories) results in the cause appearing twice in the results as the INNER JOIN to category created a new non-distinct data row.

    Am I missing something obvious or is this really an issue that maybe your behaviour automatically filter this out? At present I am resolving it by specifying my fields as follows to force DISTINCT, but seems a little dirty to not be setting my fields as a true array:

    ‘fields’=>array(
    ‘DISTINCT Model.field1, Model.field2, Model.field3, Model.field3’
    )

  • OK, so to get the values as I need them I only needed to add DISTINCT to Model.id rather than all fields, but that’s when I ran into the fun issue of paginate not using DISTINCT for it’s COUNT.

    I have spent a few hours on this and came up with a little core hack (I hate doing this, but generally leads to a suggestion as to how to un-hack and achieve same result).

    /cake/libs/controller/controller.php: #1066 (Edit)
    Change $parameters = compact(‘conditions’); to
    $parameters = compact(‘conditions’, ‘fields’);

    /cake/libs/model/model.php #2042 (New line)
    if (is_array($query[‘fields’]) && stristr($query[‘fields’][0], ‘distinct ‘)) $query[‘fields’] = $query[‘fields’][0];

    The first hack passes your fields array to the find(‘count’) call and the second hack then checks to see if the first field array value (normally your main models primary id) is DISTINCT.

    If it is, it overwrites $query[‘fields’] to with $query[‘fields’][0] to satisfy the elseif(is_string()) check that already exists. This makes $db->calculate generate a query with COUNT(Model.primary_id) rather than COUNT(*) giving me the right paginate count INT.

    Anyone know if we can implement this change without hacking the core?

  • @Paul Gardner

    In order to make the “fake” bind work with pagination, you need to pass a false param. That makes the bind permanent and will be used with both the first (count) query and the second query, which actually fetches the results.

    I don’t have that feature built-in in he behavior, but it should be extremely easy to add.
    Hacking the core, is just not a good idea :)

  • @teknoid: that is true for most cases, but as paginate’s core Model::_findCount() method always does its counts as COUNT(*) when you are searching for Posts using say 3 tags and one post matches all 3 tags, then it counts the Post 3 times even if you have DISTINCT Post.id in your field list.

    My hack, which I hate doing and don’t recommend to anyone, checks to see if my field list has DISTINCT Post.id as the first value in my fields array and if so use COUNT(Post.id).

    • Carlos

      I believe the problem to your solution would be to add a custom paginateCount field to your model (or app_model) and fix your count query to get only the correct number of distinct post without recounting posts. What i have done is add this function to my app_model:

      function paginateCount($conditions, $recursive, $extra){
      $parameters = compact(‘conditions’);
      if ($recursive != $this->recursive) {
      $parameters[‘recursive’] = $recursive;
      }
      //only diff from original is adding the count_fields
      if(isset($extra[‘count_fields’])){
      $extra[‘fields’]=$extra[‘count_fields’];
      unset($extra[‘count_fields’]);
      }
      $count = $this->find(‘count’, array_merge($parameters, $extra));

      return $count;
      }

      Basically in the $this->paginate array in your controler you just add $this->paginate[‘count_fields’]=’DISTINCT Yourmodel.id’; So when you do that the count query will use count(DISTINCT YUourmodel.id) instead of count(*) thus giving you the correct count so pagination works correctly.

  • Karl

    Thanks for this. I wonder if there is a way to do this with multiple HABTM relationships though?

    For example, I’m building a search for a Publication model. A Publication HABTM Publisher and HABTM Author.

    Your technique works great for searches where I want to find all Publicatrions with all of the specified Publishers, but if I want to find Publications with all of the specified Publishers AND all of the specified Authors I get stuck.

  • @Karl

    Should be the same thing, just add another model to the chain.

  • Karl

    What would the GROUP BY look like though? With a single association it is:

    ‘Post.title HAVING COUNT(*) = ‘ . $numCount

    But if I’m searching >1 association I don’t see how this could work. What if I search for 3 publishers and 5 authors – is $numCount 3 or 5, or does the group option need to be different all together?

  • FierceHawk

    Thanks for this post, it was helpful for me.

    I’d like to point out that if you want to use this technique with pagination, you must supply false as the second parameter to bindModel(). Viz:

    $this->Post->bindModel(array(‘hasOne’=>array(
    ‘PostsTag’=>array(
    ‘foreignKey’=>false,
    ‘type’=>’INNER’,
    ‘conditions’=>array(‘PostsTag.post_id = Post.id’)
    ),
    ‘Tag’=>array(
    ‘foreignKey’=>false,
    ‘type’=>’INNER’,
    ‘conditions’=>array(
    ‘Tag.id = PostsTag.tag_id’,
    ‘Tag.tag’=>$searchTerms
    )))), false);

    This is because bindModel() is by default not persistent but it needs to be persistent in order to maintain the binding across the multiple queries performed by paginate(). Specifying false as the second argument tells Cake to keep the binding persistent across queries.

    Hope this helps someone out.
    -FierceHawk

  • Thanks for this. It helped jump start me on my own solution for multiple tags to which I blogged about (citing reference) here:

    http://www.webtechnick.com/blogs/view/237/HABTM_Search_By_Multiple_Tags_in_CakePHP

  • @Nick

    No problem, glad it helped.

  • I think I love you for this post. Seriously, I’ve been pulling out my hair (and other people’s hair) trying to apply filters to a Model with tons of HABTM relationships. I’m still relatively new to CakePHP, so this solution wasn’t very obvious to me, but glad Google led me here. Thanks.

  • teknoid

    @Matt

    LOL, thanks.
    By the way I know an excellent barber who makes great wigs. (You know keeping a good hair style is important, after all) :)

  • I too faced the issue that Paul faced related to pagination.

    Was there any solution to make findCount do a distinct ?

  • Mosin

    @teknoid, Thank you for the wonderful post.
    I was searching for something similar for more than a week.
    I used hasOne join trick as you explained.

    Search parameters are working great both on Post and Tag model.
    But my output seems to be little different.

    Here is my find:
    $output=$this->Post->find(‘all’, array(
    ‘group’ => array(‘Post.id’),
    ));

    My output looks like below:
    Array
    (
    [0] => Array
    (
    [Post] => Array
    (
    [id] => 1
    [title] => test post1
    )

    [PostsTag] => Array
    (
    [id] => 1
    [post_id] => 1
    [tag_id] => 1
    )

    [Tag] => Array
    (
    [id] => 1
    [name] => php
    )

    )

    [1] => Array
    (
    [Post] => Array
    (
    [id] => 2
    [title] => test post2
    )

    [PostsTag] => Array
    (
    [id] => 6
    [post_id] => 2
    [tag_id] => 3
    )

    [Tag] => Array
    (
    [id] => 3
    [name] => xml
    )

    )

    )

    I was expecting something like below:
    Array
    (
    0 => Array
    {
    [Post] => Array
    (
    [id] => 1
    [title] => test post 1
    )
    [Tag] => Array
    (
    [0] => Array
    (
    [id] => 1
    [name] => php
    )
    [1] => Array
    (
    [id] => 2
    [name] => javascript
    )
    [2] => Array
    (
    [id] => 3
    [name] => xml
    )
    )
    1 => Array
    {
    [Post] => Array
    (
    [id] => 2
    [title] => test post 2
    )
    [Tag] => Array
    (
    [0] => Array
    (
    [id] => 3
    [name] => xml
    )
    [1] => Array
    (
    [id] => 4
    [name] => html
    )
    [2] => Array
    (
    [id] => 5
    [name] => vb
    )
    )
    }

    Any hints on how to get HABTM kind of output while using hasOne join trick?

  • Tnxx

%d bloggers like this: