Friday, February 24, 2012
Celko nested sets vs. linked tables
I am trying to revamp our product database with a view to making it search-optmised and would like some guidance (or confirmation of method, if you will!!). We currently use a three table structure (Product, Brand, Cat(egory)) along the lines of :
create table product
(prod_id int not null,
brand_id int not null,
cat_id int not null,
other stuff e.g. tech. specs, displayed text on web page, etc...
)
with corresponding brand_id and cat_id in the other tables. While this seems relationally sound I see it as being inefficient for searching, particularly after reading the theory behind nested sets.
A new function I am building will enable users to drill down through the product list or runs searches against all or part of the db :
e.g. all products from one category,
all products fitting certain search criteria,
products from several selected brands fitting certain criteria,
and any combination of the above you can think of!
The problem is, not all products have the same criteria list (in fact I would be surprised if any did) and may also be of more than one category (a digital camera with movie mode might easily fit into the digital camcorder search). I think I am correct in that a nested set would make the structure fit the requirement - things like criteria, displayable text, etc. could be nodes in their own right and each logical level might have its own criteria. For example, if a category is selected then certain text must be displayed and could list further categories or products. The next level down would then require its own displayable text - I am mainly thinking about SEO tags here. Also, I am not precious about retaining the current table structure and would like an open ended solution where I can add further data/functionality in a dynamic fashion, which nested sets seem to embody.
Does this make sense to anybody coz I think I've confused myself even more!!
Thanks in advance,
Gok...a collective ...huh?|||Originally posted by hoyleg
Does this make sense to anybody coz I think I've confused myself even more!!
<voice type="curly">
I'm tryna think but nothing happens!
</voice>|||LOL
Why I outta...
http://www.threestoogesmovies.com/threestoogesquotes.html|||hehe ok my sides are hurting from the 3-stooge-fest but I still don't feel I'm much closer to an answer ;-)
Collectives? That sounds like a reasonable term but not one I've ever read about or heard mentioned (but then my db training is around 10 yrs old). Care to elaborate (I wonder if it's an american term for something I've previously heard under a dif. name)?
Cheers,
G|||okay, let's start with your requirement to "drill down through" the products
this implies a bill of materials structure, product components within products within product assemblies, right?
do a search for adjacency model, you will get plenty of hits in this site here|||It's not BoM I'm trying to achieve here, more a dynamic category, subcategory relationship (where subcategory may or may not exist) and I've a good idea how to achieve this with sets. The slightly more complicated problem is how to enable search by brand - if product is the last node in the tree (say) and brand is specific to product then a query to determine which categories to display given a specific brand becomes quite hairy. I guess I'm trying to combine two ways of looking at my data - bottom-up and top-down in one method.|||definitely if category belongs to brand then it's bottom up
and what do you do if a category at a higher level belongs to a different brand than categories at lower levels in its subtree?|||Originally posted by hoyleg
It's not BoM I'm trying to achieve here, more a dynamic category, subcategory relationship (where subcategory may or may not exist) and I've a good idea how to achieve this with sets. The slightly more complicated problem is how to enable search by brand - if product is the last node in the tree (say) and brand is specific to product then a query to determine which categories to display given a specific brand becomes quite hairy. I guess I'm trying to combine two ways of looking at my data - bottom-up and top-down in one method.
Exactly...Bottom Up...And it sounds like the data is fairly static no?
It's like code tables...
Have a nightl process to rebuild to data. Have the element column 1, column 2 the level, column 3 the lineage of the data...then you'll have it all in one reference...
you can even have the grandady of all in one seoarate column...
granted this will take iterations and a cursor, but it'll be done overinight...
oh, and collectivley...meant speaking on behalf of (which I have no right to do) the entire forum...|||Oh and hey...check this out...
http://www.sqlteam.com/item.asp?ItemID=8866|||that robvolk method (depth and lineage) is but one way to skin the cat
here's another: Multi-Threaded Message Board Solution -- Hierarchical Data without the Adjacency Model (http://morgankelsey.com/code/multi-threaded_board/)|||Very cool...but rank is already associated with the data...without it, won't it be a problem?|||May be a tad out of my depth here, so feel free to give me change for my .02 USD.
I don't see rank as having anything to do with the data. The data could be ordered in any number of ways (price, number of buttons, shininess of casing material, etc.). Because of that, I am not sure a Hierarchical model is going to be so good for the problem at hand. Would it be better to just have a mapping table between the category table and the product table, so you can have different products be members of multiple different categories (as in the digital camcorder example above)?
Not too sure how you mean drill down. Do you mean from category to product? Or are the products grouped in other ways?|||Originally posted by MCrowley
May be a tad out of my depth here, so feel free to give me change for my .02 USD.
I don't see rank as having anything to do with the data. The data could be ordered in any number of ways (price, number of buttons, shininess of casing material, etc.). Because of that, I am not sure a Hierarchical model is going to be so good for the problem at hand. Would it be better to just have a mapping table between the category table and the product table, so you can have different products be members of multiple different categories (as in the digital camcorder example above)?
Not too sure how you mean drill down. Do you mean from category to product? Or are the products grouped in other ways?
No I think you're correct...it's just that I still don't understand...
What's the goal here?|||ah! a collective ... d'oh!
some good input here - I'm just applying my model to those suggested to work out which is best. my brain really hurts!
I'll try and fill in a few gaps too -
Yes, the data could simply be described in a couple of look-ups but I'm trying to optimise for selects. It might not be a problem initially but when I'm fielding 100k+ records against ~100 brands and ~100 categories (some with subcats) I am worried response times will be poor, especially when I'm specifiying lots of 'where' clauses on top of two joins (+ non-dedicated server). What initially attracted me to nested sets was the fact that because you already know the first and last id for every child/grandchild of a node a select is a one-liner and wheres can be piled on afterward without much worry.
That and the products do appear to be grouped in two ways - by category and by brand. I see category as being the most important as users are more likely to be comparing products of the same type, which would mean brand could then simply be a criteria to search under. OK, I think that's one question answered!
To recap I see a table (nested set) of categories (an example traversal would be: Digital Products -> DVD Equipment -> DVD Recorders -> Products, another: Digital Products -> Digital Cameras -> Products). Everything on a last node is then a product, all others are category and will be used for most display purposes (breadcrumb trails, etc.). Every category/subcategory/product would have displayed text as additional fields (h1, h2, meta tags, etc.) and only product would have manufacturer as an ID against a second, look-up table (to avoid any major data redundancy). Given this setup, where are criteria lists best placed? This would essentially be a list of tech. features a user could specify to narrow their search down a bit and would change with every category/subcategory/product.
Have a look at www.dealtime.co.uk as a nice indication of where I am trying to go with this, only with an additional emphasis on a really powerful search facility to complement. Drill down through the categories and have a look what pops out - products and subcats on the same page, etc. I don't suppose anybody reading this designed this site?? No? Bugger...|||Brett's link to SQLteam.com is informative, but I think their opinions are a little out of date. I've found it easy to navigate adjacency models now that table functions have been implements in SQL Server 2000. Here is a skeleton table function that returns all the progeny for a parent ID, plus their nesting level:
-------------------
create function GetChildren(@.ParentID as int)
returns @.OutputTable Table (ParentID int, ChildID int, NestLevel int)
as
begin
--First ensure the specified ID exists in the table
insert into @.OutputTable (ChildID, NestLevel) select RecordID, 0 from RecordTable where RecordID = @.ParentID
--Then iterate until done
while @.@.RowCount > 0
insert into @.OutputTable (ParentID, ChildID, NestLevel)
select RecordTable.ParentID, RecordTable.RecordID, OutputTable.NestLevel + 1
from RecordTable
inner join @.OutputTable OutputTable on RecordTable.ParentID = OutputTable.RecordID
where not exists(select * from @.OutputTable CurrentIDs where CurrentIDs.ParentID = RecordTable.ParentID and CurrentIDs.ChildID = RecordTable.RecordID)
return
end
------------
I've joined this type of function into statements in procedures many times, and it has always worked effectively.
I've never implemented the Nested Set model. It looks like it might be interesting to play around with, but I've had a hard enough time explaining the adjancency model to beginning-to-midlevel dbas. I don't know that you could ever count on finding a designer to support the nested set model.|||No...but this guy...
http://www.sqlteam.com/forums/pop_profile.asp?mode=display&id=100
Maintains this one...
http://www.sqlteam.com/default.asp
And blindamn may be correct, however it is definetly not set based like Rob is trying to suggest.
Also, YUKON will have this functionality built in...don't know what it's called, but the recursive query paramaters in Oracle are called by using CONNECT BY...
How often is the (category) data updated? Isn't this like an admin function?|||hehe my developer will support nested sets if I tell him to ;)
There's only myself and one other guy to worry about (at the mo!) and by docs/comments I hope to make the code operation obvious.
And yes, the data is largely static and admin functions will handle updates but they don't have to be the slickest code possible - it's just viewer performance I am worried about.|||That's why I guess I'm hinting at a Noramalized set of data for transactional purposes and then a "viewer" set of denormalized data that get's built nightly...|||Ahh, I think I am beginning to see. I would lean toward the nested set only oon the categories table, then the leaf nodes of the categories are mapped to products in a standard mapping table. Yes, you would have a 100k+ map table joined to a 100k+ products table, but the indexes *should* be able to take care of that problem.
But this assumes that the only reason you want the full hierarchy is to display the breadcrumbs on the top of the page? No, you might want to search on a leaf node category right off from the get-go. But that is just as easy with a cat_id, name, parent_id scheme...Maybe I should just let this thread pass by...
I must be missing something here. Just can't tell what it is. What part of the searching is it that concerns you the most?|||The main advantages i can see by using sets would be the lightweight query for page content and link generation (and the extra info would be for all SEO data aswell as breadcrumbs e.g.page title, metas, header tags), nice clean selects (inserts and updates look ugly no matter what the implementation) and easily resizeable node depths (other methods all seem to use recursion which I'm trying to avoid).
The categories do seem ideal for this method and what worries me about the search is attaching search criteria for categories. I need generic criteria for categories (RRP, dimensions, power supply, etc.) and then add-on additional criteria for subcategories (screen size for tvs, zoom for cameras, etc.). As selecting from a set includes all progeny (love that word!) easier than not it again seems to fit the model.
Now I think about it, the bottom node could contain two table refs, one a table of criteria ids, again referencing a look-up of criteria details such as display name and the other a table of productids, each with a manufacturer id ... you can guess the rest. Thus I get no recursion (in select) and a nice clean generic browse along with a quick criteria lookup. Can't avoid the joins but they don't affect a huge chunk of record selection.
Thanks for all the input guys but I think that's it sortd (let me know if there's any glaring holes tho!!). Now I've just got to build the sodding thing.
Tuesday, February 14, 2012
Catastrophic Failure, MSDASQL, Linked Servers and MySQL
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "MSDASQL" for linked server "foo" reported an error. The provider reported an unexpected catastrophic failure.
Msg 7303, Level 16, State 1, Line 1
Cannot initialize the data source object of OLE DB provider "MSDASQL" for linked server "foo".
This is using SQL Server 2005 with SP2, and MyODBC version 3.51.16 (although previous versions also seem to have the same behavior).
Once the error occurs any subsequent query against the linked server (even something as simple as select top 3 * From foo...users) fails with the same error message. Queries using the OpenQuery syntax also fail with this error.
I've tried to determine if there are some error logs that might give me more insight into what's going on, but the above events don't seem to log to the ODBC trace logs. And I can't seem to figure out a way to trace the MSDASQL (i.e., oledb) events.
I initially thought there might be a problem with the MySQL myODBC driver, but the ODBC layer itself does not seem to have a problem. But the following steps indicate otherwise:
1. Create a new Linked Server to MySQL. Same error as above
2. Execute a query against the ODBC DSN directly, using a shell program. This works fine
I'm trying to clearly identify if the problem is with the MySQL driver or with the MSDASQL provider. It would appear that the latter is at fault, given the error message, and the lack of any ODBC logging.
Would anyone have any insights about what might be going on here, and if there is any possible resolution?
Here is a link to the tracing article : http://msdn2.microsoft.com/en-us/library/aa964124.aspx
MSDASQL does not produce sufficient trace information, since it was not extensively instrumented, but you might get some traces from other layers and modules - ODBC driver manager, MSDART, MSDATL3.
The ODBC driver manager contains much more ETW-enabled tracing in Vista than in the previous versions.|||
Hello Cybertoast:
Can you resolve this problem ?
I have the same error.
Thanks a lot
Daniel Hefez
Catastrophic Failure, MSDASQL, Linked Servers and MySQL
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "MSDASQL" for linked server "foo" reported an error. The provider reported an unexpected catastrophic failure.
Msg 7303, Level 16, State 1, Line 1
Cannot initialize the data source object of OLE DB provider "MSDASQL" for linked server "foo".
This is using SQL Server 2005 with SP2, and MyODBC version 3.51.16 (although previous versions also seem to have the same behavior).
Once the error occurs any subsequent query against the linked server (even something as simple as select top 3 * From foo...users) fails with the same error message. Queries using the OpenQuery syntax also fail with this error.
I've tried to determine if there are some error logs that might give me more insight into what's going on, but the above events don't seem to log to the ODBC trace logs. And I can't seem to figure out a way to trace the MSDASQL (i.e., oledb) events.
I initially thought there might be a problem with the MySQL myODBC driver, but the ODBC layer itself does not seem to have a problem. But the following steps indicate otherwise:
1. Create a new Linked Server to MySQL. Same error as above
2. Execute a query against the ODBC DSN directly, using a shell program. This works fine
I'm trying to clearly identify if the problem is with the MySQL driver or with the MSDASQL provider. It would appear that the latter is at fault, given the error message, and the lack of any ODBC logging.
Would anyone have any insights about what might be going on here, and if there is any possible resolution?
Here is a link to the tracing article : http://msdn2.microsoft.com/en-us/library/aa964124.aspx
MSDASQL does not produce sufficient trace information, since it was not extensively instrumented, but you might get some traces from other layers and modules - ODBC driver manager, MSDART, MSDATL3.
The ODBC driver manager contains much more ETW-enabled tracing in Vista than in the previous versions.|||
Hello Cybertoast:
Can you resolve this problem ?
I have the same error.
Thanks a lot
Daniel Hefez
Friday, February 10, 2012
Cast COM object error on OleDb Destination (Access 2003)
Trying to do a update/insert from SQL 2005 query to Access 2003 linked table.
In the Script Transformation I get this error.
Unable to cast COM object of type 'System.__ComObject' to class type 'System.Data.OleDb.OleDbConnection'. Instances of types that represent COM components cannot be cast to types that do not represent COM components; however they can be cast to interfaces as long as the underlying COM component supports QueryInterface calls for the IID of the interface.
Destinatoin Oledb connection is Native OLEDB Jet 4 to Access 2003 database.
Private sqlConn As OleDb.OleDbConnection
Private sqlCmd As OleDb.OleDbCommand
Private sqlParam As OleDb.OleDbParameter
Private connstring As String
Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
connMgr = Me.Connections.ConnectionOLE
'sqlConn = CType(connMgr.AcquireConnection(Nothing), SqlConnection)
connstring = connMgr.ConnectionString
sqlConn = CType(connMgr.AcquireConnection(Nothing), OleDb.OleDbConnection)
End Sub
Any help would be appreciated.
The OLEDB connection manager is not using managed OleDb classes but the native OleBD interfaces. AcquireConnection returns a COM object that is created from IDBCreateSession::CreateSession call.
HTH,
Ovidiu Burlacu
You are correct. The destination connection was using the Native OLE DB for Jet 4.
I created a Data Source using .NET OLE DB provider for Jet 4 and I got connected in the Script Component OK.
New Problem.
My insert OLE DB command worked fine which uses the original destination connection (Native), But my update OLE DB command using same Native connection is erroring out on each input row of the OLE DB Command component.
[UpdateRow [2727]] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft JET Database Engine" Hresult: 0x80004005 Description: "Numeric field overflow.".
Connection string on Destination OLE DB connection is:
Data Source=C:\Projects\Data\TSLists.mdb;Provider=Microsoft.Jet.OLEDB.4.0;Persist Security Info=False;OLE DB Services=-1;
UPDATE: The table in the Access database I am trying to update is a linked table representing a SharePoint 2003 list. What is interesting is you can push (insert) data into this table, but you can not update data in the same linked table from the same OLE DB connection in SSIS.
Cast as Decimal
database. myTable contains a field called myDecimalField as well as several
others exactly like it and are of Decimal (9,1) not null type.
All the other fields select fine in SQL, but myDecimalField gives the ERROR
below when I SELECT it.
Just for a test, I CASTed myDecimalField in CODE 1 below as a VarChar type
and SQL returns it fine. So, I tried CODE 2 below and tried to CONVERT the
VarChar CAST and I get the same ERROR below.
Can someone help me with syntax in CODE 2 and convert myDecimalField into a
DECIMAL format so I can retain the fields decimals and numberic type?
**********************
CODE 1 (works):
SELECT myIdField, CAST(myDecimalField AS
VARCHAR(55)) AS myDecimalField FROM myTable
CODE 2 ( doesn't work):
SELECT myIdField, CONVERT(DECIMAL(18, 4),
CAST(myDecimalField AS VARCHAR(55))) AS myDecimalField FROM myTable
ERROR:
OLE DB error trace [OLE/DB Provider 'VFPOLEDB'
IRowset::GetData returned 0x80040e21: Data status returned from the
provider: [COLUMN_NAME=myDecimalField
STATUS=DBSTATUS_E_UNAVAILABLE]].
Msg 7341, Level 16, State 2, Line 1
Could not get the current row value of column
'[VFPOLEDB].myDecimalField' from the OLE DB provider 'VFPOLEDB'. The
provider cannot determine the value for this column.Scott Bailey (sbailey@.mileslumber.com) writes:
> myTable in the below code examples resides in a linked Visual FoxPro
> database. myTable contains a field called myDecimalField as well as
> several others exactly like it and are of Decimal (9,1) not null type.
> All the other fields select fine in SQL, but myDecimalField gives the
> ERROR below when I SELECT it.
> Just for a test, I CASTed myDecimalField in CODE 1 below as a VarChar type
> and SQL returns it fine. So, I tried CODE 2 below and tried to CONVERT the
> VarChar CAST and I get the same ERROR below.
> Can someone help me with syntax in CODE 2 and convert myDecimalField
> into a DECIMAL format so I can retain the fields decimals and numberic
> type?
The situation certainly looks spooky, but the root problem is obviously
a problem with FoxPro, or the FoxPro provider. I would guess that there
some rows where myDecimalField has some illegal value.
Assuming that myTable has an column called id, of which the lowest value
is 1, and the highest is 100, you could do
SELECT ... FROM myTable WHERE id BETWEEN 1 AND 50
If that gives the error, narrow down the interval to 1 AND 25 and so on.
Of course it's a good idea to look at the data from FoxPro as well.
If you want an SQL Server solution, you would have to bounce the data
over a temp table, so the conversion from varchar takes place in
SQL Server.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Couple of things:
1. The error occurs on all records so i know it's not bad data, plus there's
another field with same problem.
2. Could you provide some syntax example of creating a temp table with the
varchar conversion and transferring it as you suggested? I've never used a
temp table before.
"Erland Sommarskog" <esquel@.sommarskog.se> wrote in message
news:Xns97A9BCADA85Yazorman@.127.0.0.1...
> Scott Bailey (sbailey@.mileslumber.com) writes:
> The situation certainly looks spooky, but the root problem is obviously
> a problem with FoxPro, or the FoxPro provider. I would guess that there
> some rows where myDecimalField has some illegal value.
> Assuming that myTable has an column called id, of which the lowest value
> is 1, and the highest is 100, you could do
> SELECT ... FROM myTable WHERE id BETWEEN 1 AND 50
> If that gives the error, narrow down the interval to 1 AND 25 and so on.
> Of course it's a good idea to look at the data from FoxPro as well.
> If you want an SQL Server solution, you would have to bounce the data
> over a temp table, so the conversion from varchar takes place in
> SQL Server.
>
> --
> Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/pr...oads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodin...ions/books.mspx|||scott (sbailey@.mileslumber.com) writes:
> 1. The error occurs on all records so i know it's not bad data, plus
> there's another field with same problem.
Weird. But I'm not a Foxpro person, so I have no idea of what could
be going on.
> 2. Could you provide some syntax example of creating a temp table with the
> varchar conversion and transferring it as you suggested? I've never used a
> temp table before.
CREATE TABLE #spookydecimal (id int NOT NULL,
decvalue varchar(55) NULL)
-- Add other columns as needed.
INSERT #spookydecimal(id, decvalue)
SELECT myIdField, CAST(myDecimalField AS
VARCHAR(55)) AS myDecimalField
FROM myTable
SELECT id, decvalue, case(decvalue as decimal(18,4))
FROM #spoookydecimal
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||thanks.
"Erland Sommarskog" <esquel@.sommarskog.se> wrote in message
news:Xns97A9DB71F9EE0Yazorman@.127.0.0.1...
> scott (sbailey@.mileslumber.com) writes:
> Weird. But I'm not a Foxpro person, so I have no idea of what could
> be going on.
>
> CREATE TABLE #spookydecimal (id int NOT NULL,
> decvalue varchar(55) NULL)
> -- Add other columns as needed.
> INSERT #spookydecimal(id, decvalue)
> SELECT myIdField, CAST(myDecimalField AS
> VARCHAR(55)) AS myDecimalField
> FROM myTable
> SELECT id, decvalue, case(decvalue as decimal(18,4))
> FROM #spoookydecimal
> --
> Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/pr...oads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodin...ions/books.mspx