Wednesday, March 7, 2012

Challenge: Can you optimize this?

This code is attempting to find records that have a RegJrnID that does
not occur more than one time in the table.

The reason that I want to find records with non-duplicated RegJrnID
values is to create "reversal" records for these such that the reversal
record has identical values for every column except the TaxableAmount
which will contain a negative amount. (see: example data below).
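
For context, here is a rough sketch of the reversal insert this query is meant to
feed, assuming the reversal simply copies each un-reversed row and negates
TaxableAmount (note that in the sample data the reversal rows carry a later
InvoiceDate; how that date gets chosen is not shown here):

/* Hedged sketch only - not the product's actual reversal code */
INSERT INTO t1 (RegJrnID, InvoiceDate, InvoiceNumber, TaxableAmount)
SELECT a.RegJrnID, a.InvoiceDate, a.InvoiceNumber, -a.TaxableAmount
FROM t1 a
WHERE a.RegJrnID NOT IN
(
SELECT b.RegJrnID
FROM t1 b
GROUP BY b.RegJrnID
HAVING COUNT(*) > 1
)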

/* Set up */

CREATE TABLE t1(RegJrnID INTEGER, InvoiceDate VARCHAR(8), InvoiceNumber
VARCHAR(20), TaxableAmount DECIMAL(32,8))

/* Example data */

INSERT INTO t1 VALUES (1, '20060101', '2321323', 100.00)
INSERT INTO t1 VALUES (9, '20060213', '2130009', 40.01)
INSERT INTO t1 VALUES (3, '20060101', '9402293', 512.44)
INSERT INTO t1 VALUES (1, '20060104', '2321323', -100.00)
INSERT INTO t1 VALUES (4, '20060105', '9302221', 612.12)
INSERT INTO t1 VALUES (5, '20060105', '0003235', 18.11)
INSERT INTO t1 VALUES (6, '20060111', '5953432', 2101.21)
INSERT INTO t1 VALUES (3, '20060111', '9402293', -512.44)
INSERT INTO t1 VALUES (7, '20060115', '4234444', 44.52)
INSERT INTO t1 VALUES (8, '20060115', '0342222', 95.21)
INSERT INTO t1 VALUES (6, '20060119', '5953432', -2101.21)
INSERT INTO t1 VALUES (2, '20060101', '5440033', 231.01)

/* Show what's in the table - just because */

SELECT * FROM t1 ORDER BY RegJrnID, InvoiceDate

/* Query for records to reverse */

SELECT *
FROM t1 a

/* Ignore records that have already been reversed */

WHERE a.RegJrnID != ALL

/* This subselect finds reversed records (i.e. those that have a
duplicate RegJrnID) */

(
SELECT b.RegJrnID
FROM t1 b
GROUP BY b.RegJrnID
HAVING COUNT(*) > 1
)

/* User selection criteria are appended here */

/* AND InvoiceNumber >= '5000000' AND InvoiceNumber <= '7500000' */

/* Make the results look pretty (optional) */

ORDER BY RegJrnID

/* Housekeeping */

DROP TABLE t1

|||

There are many ways to accomplish that. I would start with something like
this (untested):

select pos.RegJrnID
from(
select * from t1 where TaxableAmount >0
) pos
left outer join
(
select * from t1 where TaxableAmount <0
) neg
on pos.RegJrnID = neg.RegJrnID
where neg.RegJrnID is null

|||

Here's the tested (and slightly modified) version of your code...

SELECT pos.*

FROM
(
SELECT * FROM t1 WHERE TaxableAmount > 0
) pos
LEFT OUTER JOIN
(
SELECT * FROM t1 WHERE TaxableAmount < 0
) neg
ON pos.RegJrnID = neg.RegJrnID

WHERE neg.RegJrnID IS NULL

/* Make the results look pretty (optional) */

ORDER BY pos.RegJrnID

|||

According to the SQL Query Analyzer, your query comes out ahead going
head to head with the original: yours represents 43.43% of the batch
and the original 56.57%.

A 13% improvement!

Thanks!

|||

On 6 Jun 2006 13:24:54 -0700, octangle wrote:

>This code is attempting to find records that have a RegJrnID that does
>not occur more than one time in the table.
>The reason that I want to find records with non-duplicated RegJrnID
>values is to create "reversal" records for these such that the reversal
>record has identical values for every column except the TaxableAmount
>which will contain a negative amount. (see: example data below).

Hi octangle,

Thanks for providing CREATE TABLE and INSERT statements. This made it
very easy to set up a test DB and fun to find an answer.

What worries me is that there's no primary key in your table. I hope
that you just forgot to include it in the script and that your real
table does have a key!

Here's a much quicker way. Running both your version and my version with
execution plan displayed, yours took 72% and mine 28%. Removing the
ORDER BY changed this to 64% / 36%. Still a nice gain.

SELECT RegJrnID, MAX(InvoiceDate),
MAX(InvoiceNumber), MAX(TaxableAmount)
FROM t1
GROUP BY RegJrnID
HAVING COUNT(*) = 1
--ORDER BY RegJrnID

And here's another one, but its correctness depends on some assumptions
I had to make because you forgot to include the primary key. With ORDER
BY, it's slightly more expensive than the previous version. With the
ORDER BY commented out, it only costs half as much!

SELECT RegJrnID, InvoiceDate,
InvoiceNumber, TaxableAmount
FROM t1 AS a
WHERE NOT EXISTS
(SELECT *
FROM t1 AS b
WHERE a.RegJrnID = b.RegJrnID
AND a.InvoiceDate <> b.InvoiceDate)
--ORDER BY RegJrnID

(Note - I have compared these queries using the sample data you provided
on a SQL Server 2005 database on my computer. Results will probably vary
on your database, especially if your table has indexes, your data
distribution is not like the sample data, and/or you are running another
version of SQL Server. I recommend that you test out the various
suggestions yourself before deciding.)

--
Hugo Kornelis, SQL Server MVP

|||

Upon further review...

As written in the previous post, records with a TaxableAmount of 0 will
not be found to be reversed... so in an attempt to remedy this I
modified the LEFT OUTER JOIN as follows:

FROM
(
SELECT * FROM t1 WHERE TaxableAmount >= 0
) pos
LEFT OUTER JOIN
(
SELECT * FROM t1 WHERE TaxableAmount < 0
) neg
ON pos.RegJrnID = neg.RegJrnID

This succeeds at finding the 0 TaxableAmount records... but once a
companion reversal record is inserted into the database, both records
(the original and the reversal) are found on subsequent queries using
this technique... since these records are retrieved by the query as
records to reverse, they get reversed again (thus making a total of 4
instances of the original record - instead of the 2 that are needed).
And if we repeat the process, a 0 TaxableAmount record will double its
instances every time the process is run...

Now the questions are...

Can the above LEFT OUTER JOIN be fixed?

OR

Should 0 TaxableAmounts be processed in their own pass with their own
query (yuck)?

OR

Is the original query really better because it works for all
TaxableAmounts, despite the fact that it's 13% slower...

OR

Is there another option??

|||

one more question: what if there is only one row with negative amount?

INSERT INTO t1 VALUES (11, '20060101', '2321323', -100.00)

and there is no corresponding row with positive amount? Nothing in the
posted DDL prevents you from that. In fact, originally I was
considering the query posted by Hugo, but realized it would return that
single row with negative amount and assumed it incorrect. It looks like
there might be no basis for my assumption.

|||

Alexander Kuznetsov (AK_TIREDOFSPAM@.hotmail.COM) writes:
> one more question: what if there is only one row with negative amount?
> INSERT INTO t1 VALUES (11, '20060101', '2321323', -100.00)
> and there is no corresponding row with positive amount? Nothing in the
> posted DDL prevents you from that. In fact, originally I was
> considering the query posted by Hugo, but realized it would return that
> single row with negative amount and assumed it incorrect. It looks like
> there might be no basis for my assumption.

Having watched the thread from the sidelines, I think the real problem is that
the data model needs improvement. I would add a bit column "isbalanceentry"
or some such. And of course add a primary key. (InvoiceNumber,
isbalanceentry) looks like a candidate.

Better get the data model in order, before looking at smart queries.
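
(A rough T-SQL sketch of what that suggestion could look like - the column name
"isbalanceentry" and the key choice come from the post above, while the NOT NULL
change, the default, and the constraint name are assumptions:)

ALTER TABLE t1 ALTER COLUMN InvoiceNumber VARCHAR(20) NOT NULL

ALTER TABLE t1 ADD isbalanceentry BIT NOT NULL DEFAULT 0

ALTER TABLE t1 ADD CONSTRAINT PK_t1 PRIMARY KEY (InvoiceNumber, isbalanceentry)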

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

|||

On 6 Jun 2006 17:45:58 -0700, Alexander Kuznetsov wrote:

>one more question: what if there is only one row with negative amount?
>INSERT INTO t1 VALUES (11, '20060101', '2321323', -100.00)
>and there is no corresponding row with positive amount? Nothing in the
>posted DDL prevents you from that. In fact, originally I was
>considering the query posted by Hugo, but realized it would return that
>single row with negative amount and assumed it incorrect. It looks like
>there might be no basis for my assumption.

Hi Alexander,

I've seen nothing in the original post that justifies special treatment
of negative amounts. If these should be excluded, then my version can
still be used - just add AND MAX(TaxableAmount) >= 0 to the HAVING
clause.
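
Spelled out against the earlier query, that would read:

SELECT RegJrnID, MAX(InvoiceDate),
MAX(InvoiceNumber), MAX(TaxableAmount)
FROM t1
GROUP BY RegJrnID
HAVING COUNT(*) = 1
AND MAX(TaxableAmount) >= 0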

However, I agree with Erland that a redesign might be a better choice if
something like that is the case.

--
Hugo Kornelis, SQL Server MVP

|||

Erland Sommarskog wrote:
> Having watched the thread from the sidelines, I think the real problem is that
> the data model needs improvement. I would add a bit column "isbalanceentry"
> or some such. And of course add a primary key. (InvoiceNumber,
> isbalanceentry) looks like a candidate.

I concur that the schema might need some work. I was thinking that a
single nullable column negated_date might be sufficient, so that
instead of inserting one more row one just needs to update negated_date.

|||

Below is an aggregate script that includes everyone's suggested queries
so far...

Based upon feedback I have beefed up the test records to more
accurately reflect all of the potential scenarios that need to be
handled by this query.

The original query (Query attempt #1 (Octangle)) generates the desired
result set and therefore is the benchmark of correctness for my
purposes.

MWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMW

/* Set up */

CREATE TABLE t1(RegJrnID INTEGER, InvoiceDate VARCHAR(8), InvoiceNumber
VARCHAR(20), TaxableAmount DECIMAL(32,8))

INSERT INTO t1 VALUES (0, '20060120', '0000033', 0.00)
INSERT INTO t1 VALUES (1, '20060101', '2321323', 100.00)
INSERT INTO t1 VALUES (9, '20060213', '2130009', 40.01)
INSERT INTO t1 VALUES (11, '20060324', '3321110', -1200.16)
INSERT INTO t1 VALUES (3, '20060101', '9402293', 512.44)
INSERT INTO t1 VALUES (1, '20060104', '2321323', -100.00)
INSERT INTO t1 VALUES (13, '20051127', '1034501', -77.50)
INSERT INTO t1 VALUES (4, '20060105', '9302221', 612.12)
INSERT INTO t1 VALUES (5, '20060105', '0003235', 18.11)
INSERT INTO t1 VALUES (10, '20060421', '0000033', 0.00)
INSERT INTO t1 VALUES (6, '20060111', '5953432', 2101.21)
INSERT INTO t1 VALUES (3, '20060111', '9402293', -512.44)
INSERT INTO t1 VALUES (12, '20060606', '0000001', 4431.55)
INSERT INTO t1 VALUES (7, '20060115', '4234444', 44.52)
INSERT INTO t1 VALUES (8, '20060115', '0342222', 95.21)
INSERT INTO t1 VALUES (6, '20060119', '5953432', -2101.21)
INSERT INTO t1 VALUES (2, '20060101', '5440033', 231.01)
INSERT INTO t1 VALUES (10, '20060517', '0000033', 0.00)
INSERT INTO t1 VALUES (11, '20060324', '3321110', 1200.16)
INSERT INTO t1 VALUES (12, '20060606', '0000001', -4431.55)

/* Show what's in the table */

SELECT * FROM t1 ORDER BY RegJrnID, InvoiceDate

/* Query for records to reverse */

/* Query attempt #1 (Octangle) */

/* Pros: correct */
/* Cons: slow */

SELECT *
FROM t1 a

/* Ignore records that have already been reversed */

WHERE a.RegJrnID != ALL

/* This subselect finds reversed records (i.e. those that have a
duplicate RegJrnID) */

(
SELECT b.RegJrnID
FROM t1 b
GROUP BY b.RegJrnID
HAVING COUNT(*) > 1
)

/* User selection criteria are appended here */

/* AND InvoiceNumber >= '5000000' AND InvoiceNumber <= '7500000' */

/*ORDER BY RegJrnID; * Make the results look pretty (optional) */

/* Query attempt #2 (Alexander) */

/* Pros: faster */
/* Cons: misses 0 TaxableAmounts */

SELECT pos.*
FROM
(
SELECT * FROM t1 WHERE TaxableAmount > 0
) pos
LEFT OUTER JOIN
(
SELECT * FROM t1 WHERE TaxableAmount < 0
) neg
ON pos.RegJrnID = neg.RegJrnID
WHERE neg.RegJrnID IS NULL
/*ORDER BY pos.RegJrnID * Make the results look pretty (optional) */

/* Query attempt #3 (Alexander - tweaked by Octangle) */

/* Pros: faster */
/* Cons: finds too many 0 TaxableAmounts */

SELECT pos.*
FROM
(
SELECT * FROM t1 WHERE TaxableAmount >= 0
) pos
LEFT OUTER JOIN
(
SELECT * FROM t1 WHERE TaxableAmount < 0
) neg
ON pos.RegJrnID = neg.RegJrnID
WHERE neg.RegJrnID IS NULL
/*ORDER BY pos.RegJrnID * Make the results look pretty (optional) */

/* Query attempt #4 (Hugo) */

/* Pros: correct , fastest, returns results in RegJrnID order with
ORDER BY clause */

SELECT RegJrnID, MAX(InvoiceDate) as "InvoiceDate",
MAX(InvoiceNumber) as "InvoiceNumber", MAX(TaxableAmount) as
"TaxableAmount"
FROM t1
GROUP BY RegJrnID
HAVING COUNT(*) = 1

/* Query attempt #5 (Hugo) */

/* Pros: fast */
/* Cons: not correct */

SELECT RegJrnID, InvoiceDate, InvoiceNumber, TaxableAmount
FROM t1 AS a
WHERE NOT EXISTS
(
SELECT *
FROM t1 AS b
WHERE a.RegJrnID = b.RegJrnID
AND a.InvoiceDate <> b.InvoiceDate
)
/*ORDER BY RegJrnID * Make the results look pretty (optional) */

/* Housekeeping */

DROP TABLE t1

MWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMW

Queries as percent of batch (when just executing the queries in the
above script)

Query #1: 22.66% - Correct
Query #2: 19.77%
Query #3: 20.27%
Query #4: 12.63% - Correct
Query #5: 20.67%

Queries as percent when compared to only the original query (Query #1)

Query #1: 50.00% - Correct
Query #2: 42.58%
Query #3: 43.19%
Query #4: 32.14% - Correct
Query #5: 43.67%

At this point it looks like the clear winner is Query #4 by Hugo!

MWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMW

To address some of the observations/comments:

1. Negative transactions are possible - I augmented the test data to
include this case.
2. This is for a commercial product that has numerous existing
customers; I inherited the data model that this table is based upon...
my coding constraints are:
- I cannot add any columns (due to how we version, a column change would
force this release to be considered a major release and not a minor
release as desired)
- I should not add any indexes/primary keys/uniqueness constraints for
performance reasons (see below)

The purpose of this table is to store processed transaction results. It
needs to be as efficient as possible for insertions, so as to not slow
down the transaction processing engine. Reporting (and reversing groups
of transactions) are secondary concerns and it is acceptable for these
functions to be slower.

I sincerely want to thank everyone who chipped in a comment or
suggestion on this...

|||

1. If negative transactions are possible, then my query is incorrect.
2. You probably need a much larger set of test data to test different
approaches against (a rough generator sketch follows below). Also I would
say there are at least 2 possible situations:
- most transactions are already negated.
- most transactions have not been negated yet.
Different queries may be the best in different situations. I would try
both and see whether one and the same query comes out on top in each.
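
(A rough sketch of how a larger test set could be generated - the row count, the
dates, and the "reverse every second transaction" split are arbitrary assumptions,
and the loop is written for clarity rather than speed:)

DECLARE @i INT
SET @i = 1
WHILE @i <= 10000
BEGIN
    /* one original transaction per RegJrnID */
    INSERT INTO t1 VALUES (@i, '20060101',
        RIGHT('0000000' + CAST(@i AS VARCHAR(7)), 7), 100.00)

    /* negate every second one, so half the set is already reversed */
    IF @i % 2 = 0
        INSERT INTO t1 VALUES (@i, '20060102',
            RIGHT('0000000' + CAST(@i AS VARCHAR(7)), 7), -100.00)

    SET @i = @i + 1
END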

Good luck!

|||

Alexander Kuznetsov wrote:
> 1. If negative transactions are possible, then my query is incorrect.
> 2. You probably need a much larger set of test data to test different
> approaches against. Also I would say there are at least 2 possible
> situations:
> - most transactions are already negated.
> - most transactions have not been negated yet.
> Different queries may be the best in different situations. I would try
> both and see whether one and the same query comes out on top in each.
> Good luck!

FYI

The normal situation would be that most transactions are not negated in
this table. Negation would only occur if the billing system was
un-doing a billing run for some technical or business reason...
Therefore negating transactions would be rare, theoretically...

I appreciate the notion of getting better test data - this will
naturally accrue as I implement and test this solution. I will make
sure to compare performance as this project matures...

Thanks again...!

|||

/* Query attempt #4 (Hugo) */

SELECT RegJrnID, MAX(InvoiceDate) as "InvoiceDate", MAX(InvoiceNumber)
as "InvoiceNumber", MAX(TaxableAmount) as "TaxableAmount"
FROM t1
GROUP BY RegJrnID
HAVING COUNT(*) = 1

I talked to a few folks around the office and none of us had ever
thought to use MAX() to force values out of a query using a GROUP BY
clause...

e.g. if the query were changed to look like this:

SELECT *
FROM t1
GROUP BY RegJrnID
HAVING COUNT(*) = 1

The following error occurs for each column not mentioned in the GROUP
BY clause: "Column 't1.InvoiceDate' is invalid in the select list
because it is not contained in either an aggregate function or the
GROUP BY clause." So MAX() forces these values to participate in the
result set generated by this query...

My question with this is, "Is this technique safe for all major DBs
(Oracle, SQL Server, DB2 and MySQL) and will it work with all column
types?"|||octangle (idea.vortex@.gmail.com) writes:
> /* Query attempt #4 (Hugo) */
> SELECT RegJrnID, MAX(InvoiceDate) as "InvoiceDate", MAX(InvoiceNumber)
> as "InvoiceNumber", MAX(TaxableAmount) as "TaxableAmount"
> FROM t1
> GROUP BY RegJrnID
> HAVING COUNT(*) = 1
> I talked to a few folks around the office and none of us had ever
> thought to use MAX() to force values out of a query using a GROUP BY
> clause...
>...
> My question with this is, "Is this technique safe for all major DBs
> (Oracle, SQL Server, DB2 and MySQL) and will it work with all column
> types?"

Yes, it should work on any RDBMS worth the name, as it is very plain standard
SQL.

Then again, MySQL has so many funny quirks, I suspect that one should
never take anything for granted with that engine.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

|||

octangle (idea.vortex@.gmail.com) writes:
> - I should not add any indexes/primary keys/uniqueness constraints for
> performance reasons (see below)
> The purpose of this table is to store processed transaction results. It
> needs to be as efficient as possible for insertions, so as to not slow
> down the transaction processing engine. Reporting (and reversing groups
> of transactions) are secondary concerns and it is acceptable for these
> functions to be slower.

There are good chances that a well-considered clustered index can improve
the performance, not least because you can handle fragmentation better.
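
(For illustration only, such an index might be created like this - the index name
and the choice of RegJrnID as the key are assumptions, not something from the
thread:)

CREATE CLUSTERED INDEX IX_t1_RegJrnID ON t1 (RegJrnID)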

In any case, having a table without a primary key in order to save some
cycles on insertion is about criminal in my opinion. What do you do when
the same data gets inserted twice? (Don't tell me that it never happens!)
And why InvoiceDate as varchar(8)? That's 10 bytes per date, instead of
8 with datetime or 4 with smalldatetime. Here is a second risk of
errors. Wonder how many entries for 20060230 you have...
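
(As an aside, a quick way to count such impossible dates on SQL Server - assuming
the strings are in yyyymmdd form like the sample data - would be:)

SELECT COUNT(*) FROM t1 WHERE ISDATE(InvoiceDate) = 0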

As for the actual challenge, I prefer to stay out. I don't really want
to contribute to something which is obviously flawed.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

|||

Erland Sommarskog wrote:
> There are good chances that a well-considered clustered index can improve
> the performance, not least because you can handle fragmentation better.
> In any case, having a table without a primary key in order to save some
> cycles on insertion is about criminal in my opinion. What do you do when
> the same data gets inserted twice? (Don't tell me that it never happens!)
> And why InvoiceDate as varchar(8)? That's 10 bytes per date, instead of
> 8 with datetime or 4 with smalldatetime. Here is a second risk of
> errors. Wonder how many entries for 20060230 you have...
> As for the actual challenge, I prefer to stay out. I don't really want
> to contribute to something which is obviously flawed.

Thanks for the tip on clustered index usage.

Criminal? A little severe - I'd call it a trade-off - a trade-off made
based upon the project requirements. Our process is a small part in a
larger customer billing cycle and everything we can do to be as small a
percentage of the overall work effort in the processing of every bill
is critical...

As it turns out the same RegJrnID must not be inserted twice. We have
programmatic control over this and if it were to occur there would be a
bug in the software or a serious procedural issue on the part of the
user. In either case these represent bigger problems than the structure
of this table.

OK, OK - putting a date in a varchar is kludgy... again I simply
inherited this design... I think the big issue here was compatibility
with other RDBMS... We need to support DB2, Oracle, SQL Server and
MySQL with the same code base... So they all support character data
roughly the same... chalk it up as a rookie mistake... but oddly your
example that the table will allow Feb 30th could be seen as a feature
(i.e. we are fault tolerant of bad dates) - ultimately the generation
of the invoice date is controlled by the external application calling
our API - so allowing a bad date to float through to the DB isn't a big
deal to us...

Flawed? Again trade-offs have been made by SQL beginners... Funny thing
is that it seems to work very well for our customers and they seem
happy, so I'd say that flaws are in the eye of the beholder...

Thanks for holding us to a higher standard...

:-)

|||

octangle (idea.vortex@.gmail.com) writes:
> Criminal? A little severe - I'd call it a trade-off - a trade-off made
> based upon the project requirements.

I would not call it a trade-off, only off.

Then again, I'm the cowardly kind of guy that always wears a safety belt
when I'm driving(*) and all that.

(*) OK, so it happened once that I didn't put it on. That was when I
drove up for my driving license!

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

|||

On 7 Jun 2006 10:54:44 -0700, octangle wrote:

>/* Query attempt #4 (Hugo) */
>/* Pros: correct , fastest, returns results in RegJrnID order with
>ORDER BY clause */
>SELECT RegJrnID, MAX(InvoiceDate) as "InvoiceDate",
>MAX(InvoiceNumber) as "InvoiceNumber", MAX(TaxableAmount) as
>"TaxableAmount"
>FROM t1
>GROUP BY RegJrnID
>HAVING COUNT(*) = 1

Hi octangle,

I don't know what the remark about returning results in RegJrnID order
with ORDER BY means - all queries will return results in that order if
you include an ORDER BY. And without the ORDER BY, some of the queries
might return the results in that order some of the time, maybe even
every time during testing, but there's no guarantee that it will remain
so in production. In short - if you need a specific order, use an ORDER
BY - always!

>/* Query attempt #5 (Hugo) */
>/* Pros: fast */
>/* Cons: not correct */

Only because the data you originally provided was not enough to show
what column or combination of columns makes a row unique. And your
reply still doesn't show it, so my next attempt might well be wrong
again :-((

SELECT RegJrnID, InvoiceDate, InvoiceNumber, TaxableAmount
FROM t1 AS a
WHERE NOT EXISTS
(
SELECT *
FROM t1 AS b
WHERE a.RegJrnID = b.RegJrnID
AND ( a.TaxableAmount <> b.TaxableAmount
OR a.InvoiceDate <> b.InvoiceDate )
)

Without the ORDER BY, this is significantly faster than attempt #4 when
tested with your test data. But with a much larger set of test data,
attempt #4 is faster (though this might be different with your data, as
it might be distributed differently).

>2. This is for a commercial product that has numerous existing
>customers; I inherited the data model that this table is based upon...
>my coding constraints are:
>- I cannot add any columns (due to how we version, a column change would
>force this release to be considered a major release and not a minor
>release as desired)
>- I should not add any indexes/primary keys/uniqueness constraints for
>performance reasons (see below)

I concur with everything Erland says about this. And I'd still like to
know which (combination of) column(s) you can use to uniquely identify a
single row.

--
Hugo Kornelis, SQL Server MVP

|||

Hi There,
If the following is your requirement.
"
This code is attempting to find records that have a RegJrnID that does
not occur more than one time in the table.
"
Then

Select * from Yourtable where ID in (Select ID from Yourtable group by
ID having count(1)=1)
-- Give some order by etc as per your requirement
Output:--
9, '20060213', '2130009', 40.01
4, '20060105', '9302221', 612.12
5, '20060105', '0003235', 18.11
7, '20060115', '4234444', 44.52
8, '20060115', '0342222', 95.21
2, '20060101', '5440033', 231.01

With Warm regards
Jatinder Singh
http://jatindersingh.blogspot.com

octangle wrote:
> This code is attempting to find records that have a RegJrnID that does
> not occur more than one time in the table.
> The reason that I want to find records with non-duplicated RegJrnID
> values is to create "reversal" records for these such that the reversal
> record has identical values for every column except the TaxableAmount
> which will contain a negative amount. (see: example data below).
> /* Set up */
> CREATE TABLE t1(RegJrnID INTEGER, InvoiceDate VARCHAR(8), InvoiceNumber
> VARCHAR(20), TaxableAmount DECIMAL(32,8))
> /* Example data */
> INSERT INTO t1 VALUES (1, '20060101', '2321323', 100.00)
> INSERT INTO t1 VALUES (9, '20060213', '2130009', 40.01)
> INSERT INTO t1 VALUES (3, '20060101', '9402293', 512.44)
> INSERT INTO t1 VALUES (1, '20060104', '2321323', -100.00)
> INSERT INTO t1 VALUES (4, '20060105', '9302221', 612.12)
> INSERT INTO t1 VALUES (5, '20060105', '0003235', 18.11)
> INSERT INTO t1 VALUES (6, '20060111', '5953432', 2101.21)
> INSERT INTO t1 VALUES (3, '20060111', '9402293', -512.44)
> INSERT INTO t1 VALUES (7, '20060115', '4234444', 44.52)
> INSERT INTO t1 VALUES (8, '20060115', '0342222', 95.21)
> INSERT INTO t1 VALUES (6, '20060119', '5953432', -2101.21)
> INSERT INTO t1 VALUES (2, '20060101', '5440033', 231.01)
> /* Show what's in the table - just because */
> SELECT * FROM t1 ORDER BY RegJrnID, InvoiceDate
> /* Query for records to reverse */
> SELECT *
> FROM t1 a
> /* Ignore records that have already been reversed */
> WHERE a.RegJrnID != ALL
> /* This subselect finds reversed records (i.e. those that have a
> duplicate RegJrnID) */
> (
> SELECT b.RegJrnID
> FROM t1 b
> GROUP BY b.RegJrnID
> HAVING COUNT(*) > 1
> )
> /* User selection criteria are appended here */
> /* AND InvoiceNumber >= '5000000' AND InvoiceNumber <= '7500000' */
> /* Make the results look pretty (optional) */
> ORDER BY RegJrnID
> /* Housekeeping */
> DROP TABLE t1
