Friday, February 24, 2012

Celko nested sets vs. linked tables

Hi,

I am trying to revamp our product database with a view to making it search-optmised and would like some guidance (or confirmation of method, if you will!!). We currently use a three table structure (Product, Brand, Cat(egory)) along the lines of :

create table product
(prod_id int not null,
brand_id int not null,
cat_id int not null,
other stuff e.g. tech. specs, displayed text on web page, etc...
)

with corresponding brand_id and cat_id in the other tables. While this seems relationally sound I see it as being inefficient for searching, particularly after reading the theory behind nested sets.

A new function I am building will enable users to drill down through the product list or runs searches against all or part of the db :

e.g. all products from one category,
all products fitting certain search criteria,
products from several selected brands fitting certain criteria,
and any combination of the above you can think of!

The problem is, not all products have the same criteria list (in fact I would be surprised if any did) and may also be of more than one category (a digital camera with movie mode might easily fit into the digital camcorder search). I think I am correct in that a nested set would make the structure fit the requirement - things like criteria, displayable text, etc. could be nodes in their own right and each logical level might have its own criteria. For example, if a category is selected then certain text must be displayed and could list further categories or products. The next level down would then require its own displayable text - I am mainly thinking about SEO tags here. Also, I am not precious about retaining the current table structure and would like an open ended solution where I can add further data/functionality in a dynamic fashion, which nested sets seem to embody.

Does this make sense to anybody coz I think I've confused myself even more!!

Thanks in advance,

Gok...a collective ...huh?|||Originally posted by hoyleg
Does this make sense to anybody coz I think I've confused myself even more!!

<voice type="curly">
I'm tryna think but nothing happens!
</voice>|||LOL

Why I outta...

http://www.threestoogesmovies.com/threestoogesquotes.html|||hehe ok my sides are hurting from the 3-stooge-fest but I still don't feel I'm much closer to an answer ;-)

Collectives? That sounds like a reasonable term but not one I've ever read about or heard mentioned (but then my db training is around 10 yrs old). Care to elaborate (I wonder if it's an american term for something I've previously heard under a dif. name)?

Cheers,

G|||okay, let's start with your requirement to "drill down through" the products

this implies a bill of materials structure, product components within products within product assemblies, right?

do a search for adjacency model, you will get plenty of hits in this site here|||It's not BoM I'm trying to achieve here, more a dynamic category, subcategory relationship (where subcategory may or may not exist) and I've a good idea how to achieve this with sets. The slightly more complicated problem is how to enable search by brand - if product is the last node in the tree (say) and brand is specific to product then a query to determine which categories to display given a specific brand becomes quite hairy. I guess I'm trying to combine two ways of looking at my data - bottom-up and top-down in one method.|||definitely if category belongs to brand then it's bottom up

and what do you do if a category at a higher level belongs to a different brand than categories at lower levels in its subtree?|||Originally posted by hoyleg
It's not BoM I'm trying to achieve here, more a dynamic category, subcategory relationship (where subcategory may or may not exist) and I've a good idea how to achieve this with sets. The slightly more complicated problem is how to enable search by brand - if product is the last node in the tree (say) and brand is specific to product then a query to determine which categories to display given a specific brand becomes quite hairy. I guess I'm trying to combine two ways of looking at my data - bottom-up and top-down in one method.

Exactly...Bottom Up...And it sounds like the data is fairly static no?

It's like code tables...

Have a nightl process to rebuild to data. Have the element column 1, column 2 the level, column 3 the lineage of the data...then you'll have it all in one reference...

you can even have the grandady of all in one seoarate column...

granted this will take iterations and a cursor, but it'll be done overinight...

oh, and collectivley...meant speaking on behalf of (which I have no right to do) the entire forum...|||Oh and hey...check this out...

http://www.sqlteam.com/item.asp?ItemID=8866|||that robvolk method (depth and lineage) is but one way to skin the cat

here's another: Multi-Threaded Message Board Solution -- Hierarchical Data without the Adjacency Model (http://morgankelsey.com/code/multi-threaded_board/)|||Very cool...but rank is already associated with the data...without it, won't it be a problem?|||May be a tad out of my depth here, so feel free to give me change for my .02 USD.

I don't see rank as having anything to do with the data. The data could be ordered in any number of ways (price, number of buttons, shininess of casing material, etc.). Because of that, I am not sure a Hierarchical model is going to be so good for the problem at hand. Would it be better to just have a mapping table between the category table and the product table, so you can have different products be members of multiple different categories (as in the digital camcorder example above)?

Not too sure how you mean drill down. Do you mean from category to product? Or are the products grouped in other ways?|||Originally posted by MCrowley
May be a tad out of my depth here, so feel free to give me change for my .02 USD.

I don't see rank as having anything to do with the data. The data could be ordered in any number of ways (price, number of buttons, shininess of casing material, etc.). Because of that, I am not sure a Hierarchical model is going to be so good for the problem at hand. Would it be better to just have a mapping table between the category table and the product table, so you can have different products be members of multiple different categories (as in the digital camcorder example above)?

Not too sure how you mean drill down. Do you mean from category to product? Or are the products grouped in other ways?

No I think you're correct...it's just that I still don't understand...

What's the goal here?|||ah! a collective ... d'oh!

some good input here - I'm just applying my model to those suggested to work out which is best. my brain really hurts!

I'll try and fill in a few gaps too -

Yes, the data could simply be described in a couple of look-ups but I'm trying to optimise for selects. It might not be a problem initially but when I'm fielding 100k+ records against ~100 brands and ~100 categories (some with subcats) I am worried response times will be poor, especially when I'm specifiying lots of 'where' clauses on top of two joins (+ non-dedicated server). What initially attracted me to nested sets was the fact that because you already know the first and last id for every child/grandchild of a node a select is a one-liner and wheres can be piled on afterward without much worry.

That and the products do appear to be grouped in two ways - by category and by brand. I see category as being the most important as users are more likely to be comparing products of the same type, which would mean brand could then simply be a criteria to search under. OK, I think that's one question answered!

To recap I see a table (nested set) of categories (an example traversal would be: Digital Products -> DVD Equipment -> DVD Recorders -> Products, another: Digital Products -> Digital Cameras -> Products). Everything on a last node is then a product, all others are category and will be used for most display purposes (breadcrumb trails, etc.). Every category/subcategory/product would have displayed text as additional fields (h1, h2, meta tags, etc.) and only product would have manufacturer as an ID against a second, look-up table (to avoid any major data redundancy). Given this setup, where are criteria lists best placed? This would essentially be a list of tech. features a user could specify to narrow their search down a bit and would change with every category/subcategory/product.

Have a look at www.dealtime.co.uk as a nice indication of where I am trying to go with this, only with an additional emphasis on a really powerful search facility to complement. Drill down through the categories and have a look what pops out - products and subcats on the same page, etc. I don't suppose anybody reading this designed this site?? No? Bugger...|||Brett's link to SQLteam.com is informative, but I think their opinions are a little out of date. I've found it easy to navigate adjacency models now that table functions have been implements in SQL Server 2000. Here is a skeleton table function that returns all the progeny for a parent ID, plus their nesting level:

-------------------
create function GetChildren(@.ParentID as int)
returns @.OutputTable Table (ParentID int, ChildID int, NestLevel int)
as

begin
--First ensure the specified ID exists in the table
insert into @.OutputTable (ChildID, NestLevel) select RecordID, 0 from RecordTable where RecordID = @.ParentID

--Then iterate until done
while @.@.RowCount > 0
insert into @.OutputTable (ParentID, ChildID, NestLevel)
select RecordTable.ParentID, RecordTable.RecordID, OutputTable.NestLevel + 1
from RecordTable
inner join @.OutputTable OutputTable on RecordTable.ParentID = OutputTable.RecordID
where not exists(select * from @.OutputTable CurrentIDs where CurrentIDs.ParentID = RecordTable.ParentID and CurrentIDs.ChildID = RecordTable.RecordID)
return
end

------------
I've joined this type of function into statements in procedures many times, and it has always worked effectively.

I've never implemented the Nested Set model. It looks like it might be interesting to play around with, but I've had a hard enough time explaining the adjancency model to beginning-to-midlevel dbas. I don't know that you could ever count on finding a designer to support the nested set model.|||No...but this guy...

http://www.sqlteam.com/forums/pop_profile.asp?mode=display&id=100

Maintains this one...

http://www.sqlteam.com/default.asp

And blindamn may be correct, however it is definetly not set based like Rob is trying to suggest.

Also, YUKON will have this functionality built in...don't know what it's called, but the recursive query paramaters in Oracle are called by using CONNECT BY...

How often is the (category) data updated? Isn't this like an admin function?|||hehe my developer will support nested sets if I tell him to ;)

There's only myself and one other guy to worry about (at the mo!) and by docs/comments I hope to make the code operation obvious.

And yes, the data is largely static and admin functions will handle updates but they don't have to be the slickest code possible - it's just viewer performance I am worried about.|||That's why I guess I'm hinting at a Noramalized set of data for transactional purposes and then a "viewer" set of denormalized data that get's built nightly...|||Ahh, I think I am beginning to see. I would lean toward the nested set only oon the categories table, then the leaf nodes of the categories are mapped to products in a standard mapping table. Yes, you would have a 100k+ map table joined to a 100k+ products table, but the indexes *should* be able to take care of that problem.

But this assumes that the only reason you want the full hierarchy is to display the breadcrumbs on the top of the page? No, you might want to search on a leaf node category right off from the get-go. But that is just as easy with a cat_id, name, parent_id scheme...Maybe I should just let this thread pass by...

I must be missing something here. Just can't tell what it is. What part of the searching is it that concerns you the most?|||The main advantages i can see by using sets would be the lightweight query for page content and link generation (and the extra info would be for all SEO data aswell as breadcrumbs e.g.page title, metas, header tags), nice clean selects (inserts and updates look ugly no matter what the implementation) and easily resizeable node depths (other methods all seem to use recursion which I'm trying to avoid).

The categories do seem ideal for this method and what worries me about the search is attaching search criteria for categories. I need generic criteria for categories (RRP, dimensions, power supply, etc.) and then add-on additional criteria for subcategories (screen size for tvs, zoom for cameras, etc.). As selecting from a set includes all progeny (love that word!) easier than not it again seems to fit the model.

Now I think about it, the bottom node could contain two table refs, one a table of criteria ids, again referencing a look-up of criteria details such as display name and the other a table of productids, each with a manufacturer id ... you can guess the rest. Thus I get no recursion (in select) and a nice clean generic browse along with a quick criteria lookup. Can't avoid the joins but they don't affect a huge chunk of record selection.

Thanks for all the input guys but I think that's it sortd (let me know if there's any glaring holes tho!!). Now I've just got to build the sodding thing.

No comments:

Post a Comment