
I am trying to compare certain objects serialized to a database. For any such comparison, I know the IDs of two such objects. Let's call them o1 and o2.

A table X with columns A, B, and C contains rows that are linked to either o1 or o2. Thus, each row in X is identified by its parent (o1 or o2) and an ID column that is unique among all rows belonging to the same parent.

The comparison of o1 and o2 is done by finding items in X that exist only for o1 or only for o2 (items are matched by their local ID), and by finding pairs of items for which at least one of the columns A, B, or C has different values. For each difference, a row needs to be added to a Diff table.

My goal is to use a stored procedure to conduct this comparison.

For each item in X, the Diff table can receive any number of rows. Therefore, I think an INSERT ... SELECT ... statement is not the right way to go. (At least I do not see how to proceed using one.)

Instead, it seems I have to implement this comparison somehow "procedurally", by "manually" comparing column by column, after an outer join. Thus, I have started like this:

CREATE PROCEDURE Compare (@o1 INT, @o2 INT) AS
BEGIN
    DECLARE @o1x INT, @o2x INT;
    DECLARE @o1A INT, @o2A INT;
    DECLARE @o1B INT, @o2B INT;
    DECLARE @o1C INT, @o2C INT;

    DECLARE XCursor CURSOR FOR
        SELECT x1.ID,
               x1.A,
               x1.B,
               x1.C,
               x2.ID,
               x2.A,
               x2.B,
               x2.C
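        -- Filter each side by parent before joining; a WHERE on x1/x2 after the
        -- FULL OUTER JOIN would drop unmatched rows and turn it into an inner join.
        FROM (SELECT * FROM X WHERE PARENT = @o1) x1
             FULL OUTER JOIN (SELECT * FROM X WHERE PARENT = @o2) x2
             ON x1.ID = x2.ID;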

    OPEN XCursor

    WHILE 1=1
    BEGIN
        FETCH NEXT FROM XCursor INTO @o1x,
                                     @o1A,
                                     @o1B,
                                     @o1C,
                                     @o2x,
                                     @o2A,
                                     @o2B,
                                     @o2C
        IF @@fetch_status <> 0 BREAK

        IF @o1x IS NULL
            INSERT INTO Diff (Object, DiffDescription) VALUES (@o2x, 'O1 X is missing');

        IF @o2x IS NULL
            INSERT INTO Diff (Object, DiffDescription) VALUES (@o1x, 'O2 X is missing');

        IF @o1A <> @o2A
            INSERT INTO Diff (Object, DiffDescription) VALUES (@o1x, 'A is different.');

        IF @o1B <> @o2B
            INSERT INTO Diff (Object, DiffDescription) VALUES (@o1x, 'B is different.');

        IF @o1C <> @o2C
            INSERT INTO Diff (Object, DiffDescription) VALUES (@o1x, 'C is different.');
    END

    CLOSE XCursor

    DEALLOCATE XCursor
END

(This is not guaranteed to be 100% syntactically correct already. I am trying to illustrate the general idea.)

My question is: this seems really cumbersome and hard to maintain [1]. Is there a more concise way of conducting this comparison, in particular a typical pattern normally used for such cases?

Obviously I have heavily anonymized and generalized my example DB schema and code. In the actual code, there are not just three columns A, B, and C, but more like 20; there is not just one table X, but further sub-elements linked from X, and the Diff table is quite a bit more elaborate and requires more detailed data. The actual code that I have for now is accordingly more complex.

[1] For instance, more columns may be added to X later on, and the structure of Diff may be revised in the future. Furthermore, all of this might need to be translated to a different SQL dialect once other DB vendors are targeted.
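Regarding footnote 1, one way to keep the comparison in step with X's columns is to generate the per-column checks from the catalog with dynamic SQL, so a new column needs no code change. This is a minimal sketch assuming SQL Server: `tempdb.sys.columns`, `FOR XML PATH`, and `sp_executesql` are SQL Server-specific (other vendors expose `INFORMATION_SCHEMA.COLUMNS`, but string aggregation and dynamic SQL differ). `#X`/`#Diff` are temp-table stand-ins, the sample data is made up, and treating every column except `PARENT` and `ID` as a data column is an assumption.

```sql
-- Stand-ins for X and Diff; for a real table, read the catalog of X itself.
IF OBJECT_ID('tempdb..#X') IS NOT NULL DROP TABLE #X;
IF OBJECT_ID('tempdb..#Diff') IS NOT NULL DROP TABLE #Diff;
CREATE TABLE #X (PARENT INT, ID INT, A INT, B INT, C INT);
CREATE TABLE #Diff ([Object] INT, DiffDescription VARCHAR(50));
INSERT INTO #X (PARENT, ID, A, B, C) VALUES
    (1, 1, 10, 20, 30),
    (2, 1, 11, 20, 31);   -- A and C differ, B matches

DECLARE @checks NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build " UNION ALL SELECT N'<col> is different.' WHERE x1.[<col>] <> x2.[<col>]"
-- for every column that is not part of the (assumed) key PARENT, ID.
SELECT @checks = (
    SELECT N' UNION ALL SELECT N''' + REPLACE(c.name, N'''', N'''''')
           + N' is different.'' WHERE x1.' + QUOTENAME(c.name)
           + N' <> x2.' + QUOTENAME(c.name)
    FROM tempdb.sys.columns AS c
    WHERE c.object_id = OBJECT_ID('tempdb..#X')
      AND c.name NOT IN (N'PARENT', N'ID')
    ORDER BY c.column_id
    FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)');

-- STUFF strips the leading " UNION ALL " (11 characters) from the first check.
SET @sql = N'
INSERT INTO #Diff ([Object], DiffDescription)
SELECT x1.ID, d.DiffDescription
FROM #X AS x1
JOIN #X AS x2 ON x2.ID = x1.ID AND x2.PARENT = @o2
CROSS APPLY (' + STUFF(@checks, 1, 11, N'') + N') AS d(DiffDescription)
WHERE x1.PARENT = @o1;';

-- Temp tables created in this batch are visible inside sp_executesql.
EXEC sp_executesql @sql, N'@o1 INT, @o2 INT', @o1 = 1, @o2 = 2;
```

Only matched pairs are compared here; unmatched rows would be handled separately. Printing `@sql` before executing it is a convenient way to review what was generated.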

EDIT: As an alternative to the full outer join, I could also use an inner join there and separately detect X rows that are present for only o1 or only o2. I am not sure whether that would be better performance-wise; at least, it would remove the need to suppress the column-based comparisons when only one object was found, which the source code I posted does not do yet.
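The alternative described in this EDIT could be sketched like this: unmatched X rows are found with two `NOT EXISTS` queries, so the column comparisons can use a plain inner join and only ever see matched pairs. A sketch assuming SQL Server, with `#X`/`#Diff` as temp-table stand-ins and made-up sample data; it makes no claim about which variant performs better.

```sql
IF OBJECT_ID('tempdb..#X') IS NOT NULL DROP TABLE #X;
IF OBJECT_ID('tempdb..#Diff') IS NOT NULL DROP TABLE #Diff;
CREATE TABLE #X (PARENT INT, ID INT, A INT, B INT, C INT);
CREATE TABLE #Diff ([Object] INT, DiffDescription VARCHAR(50));
INSERT INTO #X (PARENT, ID, A, B, C) VALUES
    (1, 1, 10, 20, 30),
    (2, 1, 10, 20, 30),
    (1, 2,  5,  5,  5),   -- only o1 has ID 2
    (2, 3,  7,  7,  7);   -- only o2 has ID 3

DECLARE @o1 INT = 1, @o2 INT = 2;

-- X rows of o1 without a counterpart under o2
INSERT INTO #Diff ([Object], DiffDescription)
SELECT x1.ID, 'O2 X is missing'
FROM #X AS x1
WHERE x1.PARENT = @o1
  AND NOT EXISTS (SELECT 1 FROM #X AS x2 WHERE x2.PARENT = @o2 AND x2.ID = x1.ID);

-- X rows of o2 without a counterpart under o1
INSERT INTO #Diff ([Object], DiffDescription)
SELECT x2.ID, 'O1 X is missing'
FROM #X AS x2
WHERE x2.PARENT = @o2
  AND NOT EXISTS (SELECT 1 FROM #X AS x1 WHERE x1.PARENT = @o1 AND x1.ID = x2.ID);

-- The column checks then only need matched pairs, e.g.
--   FROM #X AS x1 JOIN #X AS x2 ON x2.ID = x1.ID AND x2.PARENT = @o2
--   WHERE x1.PARENT = @o1
```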

EDIT2: To repeat and clarify: the Diff table must receive a row for each difference between a pair of equivalent X rows. Not just one row for the first detected difference, but one row for each detected difference. If the two X rows have different values in columns A and C, two rows must be created in Diff with the appropriate messages. This is the really tricky part, and the one aspect that means I cannot just filter the set of X pairs.
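One set-based pattern for the one-row-per-difference requirement is to pair the rows with a `FULL OUTER JOIN` and let a `CROSS APPLY` emit zero or more diff rows per pair, so a single `INSERT ... SELECT` can still produce several Diff rows for one X pair. A minimal sketch assuming SQL Server; `#X`/`#Diff` are temp-table stand-ins for X and Diff and the sample data is made up. Each additional column costs one line inside the `APPLY`.

```sql
IF OBJECT_ID('tempdb..#X') IS NOT NULL DROP TABLE #X;
IF OBJECT_ID('tempdb..#Diff') IS NOT NULL DROP TABLE #Diff;
CREATE TABLE #X (PARENT INT, ID INT, A INT, B INT, C INT);
CREATE TABLE #Diff ([Object] INT, DiffDescription VARCHAR(50));

INSERT INTO #X (PARENT, ID, A, B, C) VALUES
    (1, 1, 10, 20, 30),   -- ID 1 exists for both; A and C differ
    (2, 1, 11, 20, 31),
    (1, 2,  5,  5,  5),   -- ID 2 exists only for o1
    (2, 3,  7,  7,  7);   -- ID 3 exists only for o2

DECLARE @o1 INT = 1, @o2 INT = 2;

INSERT INTO #Diff ([Object], DiffDescription)
SELECT COALESCE(x1.ID, x2.ID), d.DiffDescription
FROM (SELECT * FROM #X WHERE PARENT = @o1) AS x1
     FULL OUTER JOIN (SELECT * FROM #X WHERE PARENT = @o2) AS x2
     ON x1.ID = x2.ID
-- Each branch yields one row if its check fires and no row otherwise,
-- so one pair can produce zero, one or several Diff rows.
CROSS APPLY (
              SELECT 'O1 X is missing' WHERE x1.ID IS NULL
    UNION ALL SELECT 'O2 X is missing' WHERE x2.ID IS NULL
    -- <> is UNKNOWN when either side is NULL, so unmatched rows skip these checks
    -- (note: it also misses NULL-vs-value differences in nullable columns).
    UNION ALL SELECT 'A is different.' WHERE x1.A <> x2.A
    UNION ALL SELECT 'B is different.' WHERE x1.B <> x2.B
    UNION ALL SELECT 'C is different.' WHERE x1.C <> x2.C
) AS d(DiffDescription);
```

With this data the statement inserts four rows: A and C for ID 1, 'O2 X is missing' for ID 2, and 'O1 X is missing' for ID 3.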


2 answers in total

CeOnSql Posted at 2017-01-12 11:17:33Z

My test data:

CREATE TABLE #X
(
    PARENT  TINYINT,
    ID TINYINT,
    A VARCHAR(10),
    B VARCHAR(10),
    C VARCHAR(10)
)

CREATE TABLE #DIFF
(
    OBJ TINYINT,
    DIFF_DESC VARCHAR(50)
)

INSERT INTO #X (PARENT,ID,A,B,C)
VALUES
(1,1,'ValueA','ValueB','ValueC'),
(2,1,'Value A','Value B','Value C')

Solution: I created a temp table for each "object" and then compared them:

SELECT  ID, A, B, C
INTO    #o1
FROM    #X
WHERE   PARENT = 1

SELECT  ID, A, B, C
INTO    #o2
FROM    #X
WHERE   PARENT = 2

-- Record X rows that exist for only one of the two objects
INSERT INTO #DIFF (OBJ,DIFF_DESC)
SELECT  ISNULL(#o1.ID,#o2.ID), 
        CASE WHEN #o1.ID IS NULL
            THEN 'O1 X is missing'
            ELSE 'O2 X is missing'
        END
FROM    #o1 
        FULL JOIN #o2 ON #o1.ID = #o2.ID
WHERE   #o1.ID IS NULL
        OR #o2.ID IS NULL

-- If the previous INSERT found any X row that exists for only one of the
-- two objects (@@ROWCOUNT > 0), the A, B, C checks below are skipped.
IF @@ROWCOUNT = 0
BEGIN
    INSERT INTO #DIFF (OBJ,DIFF_DESC)
    SELECT  ISNULL(#o1.ID,#o2.ID),
            'A is different'
    FROM    #o1 
            INNER JOIN #o2 ON #o1.ID = #o2.ID 
    WHERE   #o1.A <> #o2.A
    UNION ALL
    SELECT  ISNULL(#o1.ID,#o2.ID),
            'B is different'
    FROM    #o1 
            INNER JOIN #o2 ON #o1.ID = #o2.ID
    WHERE   #o1.B <> #o2.B
    UNION ALL
    SELECT  ISNULL(#o1.ID,#o2.ID),
            'C is different'
    FROM    #o1 
            INNER JOIN #o2 ON #o1.ID = #o2.ID 
    WHERE   #o1.C <> #o2.C
END

Result: (screenshot of the resulting #DIFF rows, not reproduced here)

Please be aware of this:

  • A, B, C are NOT checked at all if any X row exists for only one of the two objects
  • you can add a check: if #o1 and #o2 are both empty, your table does not contain any data for these PARENTs
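The empty-input check from the second bullet could be sketched as below. Recording it as a `#DIFF` row with a `NULL` OBJ and this message text are hypothetical choices, not part of the answer; raising an error would work as well. Parents 3 and 4 are made-up IDs with no rows in `#X`.

```sql
IF OBJECT_ID('tempdb..#X') IS NOT NULL DROP TABLE #X;
IF OBJECT_ID('tempdb..#o1') IS NOT NULL DROP TABLE #o1;
IF OBJECT_ID('tempdb..#o2') IS NOT NULL DROP TABLE #o2;
IF OBJECT_ID('tempdb..#DIFF') IS NOT NULL DROP TABLE #DIFF;
CREATE TABLE #X (PARENT TINYINT, ID TINYINT, A VARCHAR(10), B VARCHAR(10), C VARCHAR(10));
CREATE TABLE #DIFF (OBJ TINYINT NULL, DIFF_DESC VARCHAR(50));
INSERT INTO #X (PARENT, ID, A, B, C) VALUES (1, 1, 'ValueA', 'ValueB', 'ValueC');

-- Parents 3 and 4 have no rows, so both temp tables end up empty.
SELECT ID, A, B, C INTO #o1 FROM #X WHERE PARENT = 3;
SELECT ID, A, B, C INTO #o2 FROM #X WHERE PARENT = 4;

-- Without this, "no rows in #DIFF" would be indistinguishable from "no differences".
IF NOT EXISTS (SELECT 1 FROM #o1) AND NOT EXISTS (SELECT 1 FROM #o2)
    INSERT INTO #DIFF (OBJ, DIFF_DESC) VALUES (NULL, 'No X rows for either parent');
```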
Hogan Posted at 2017-01-11 18:09:37Z

It is simpler if there is a value for the column that will never occur. In the example below I use -1; for strings you could use '<null>'. You don't say which columns are IDs and which are strings.

SELECT
  x1.ID as ID,
  x1.A as x1_A, x1.B as x1_B, x1.C as x1_C,
  x2.A as x2_A, x2.B as x2_B, x2.C as x2_C,
  CASE
    -- I don't bother to check for the NULLS as a specific case but you could add in those 
    -- before these tests if you wanted to show them.
    WHEN ISNULL(x1.A, -1) <> ISNULL(x2.A, -1) THEN 'A column does not match'
    WHEN ISNULL(x1.B, -1) <> ISNULL(x2.B, -1) THEN 'B column does not match'
    WHEN ISNULL(x1.C, -1) <> ISNULL(x2.C, -1) THEN 'C column does not match'
  END as REASON
FROM X x1 
LEFT JOIN X x2 ON x1.ID = x2.ID
              AND x2.PARENT = @o2  -- filter here, not in WHERE, so unmatched x1 rows are kept
WHERE 
      x1.PARENT = @o1 
  AND (
       ISNULL(x1.A, -1) <> ISNULL(x2.A, -1)
    OR ISNULL(x1.B, -1) <> ISNULL(x2.B, -1)
    OR ISNULL(x1.C, -1) <> ISNULL(x2.C, -1)
  )
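As an alternative to the `ISNULL(..., -1)` sentinel used above (which silently breaks if the sentinel ever occurs as a real value, and needs a different sentinel per data type), SQL Server allows a NULL-safe comparison with `EXISTS (SELECT a EXCEPT SELECT b)`: `EXCEPT` treats two NULLs as equal, so the `EXISTS` is true exactly when the values differ. A sketch with a made-up `#X` and sample values; standard SQL's `IS DISTINCT FROM` (PostgreSQL, SQL Server 2022) expresses the same test.

```sql
IF OBJECT_ID('tempdb..#X') IS NOT NULL DROP TABLE #X;
IF OBJECT_ID('tempdb..#Result') IS NOT NULL DROP TABLE #Result;
CREATE TABLE #X (PARENT INT, ID INT, A VARCHAR(10) NULL, B INT NULL);
INSERT INTO #X (PARENT, ID, A, B) VALUES
    (1, 1, NULL, 5), (2, 1, NULL, 6),   -- A: NULL vs NULL (equal), B differs
    (1, 2, 'x',  1), (2, 2, NULL, 1);   -- A: 'x' vs NULL (differs), B equal

DECLARE @o1 INT = 1, @o2 INT = 2;

-- No sentinel needed, and the same expression works for any column type.
SELECT x1.ID,
       CASE WHEN EXISTS (SELECT x1.A EXCEPT SELECT x2.A) THEN 1 ELSE 0 END AS A_differs,
       CASE WHEN EXISTS (SELECT x1.B EXCEPT SELECT x2.B) THEN 1 ELSE 0 END AS B_differs
INTO #Result
FROM #X AS x1
JOIN #X AS x2 ON x2.ID = x1.ID AND x2.PARENT = @o2
WHERE x1.PARENT = @o1;
```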
