JESS: Bad size in TokenTree causes unbounded memory growth

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

JESS: Bad size in TokenTree causes unbounded memory growth

Steven Goncalo
I have been fighting with what appeared to be a memory leak in ny Jess
application.
When I ran java with the heap profiler turned on (
http://www.javaworld.com/javaworld/jw-12-2001/jw-1207-hprof.html )
It pointed me towards some very large arrays allocated in TokenTree's rehash
method.
This had also frequently been the site of the OutOfMemoryExceptions I was
getting.
When I peppered the code up with some log4j debug printouts I saw that I was
frequently hitting the threshold for resizing the hash array.
At first I was worried that rehash somehow might have gotten called
recursively (it didn't) so I instrumented before and after the rehash.
What I saw was that, at entry, the TokenTree.size variable claimed to have a
lot more Tokens in the tree than the sum of the individual TokenVectors.
To my understanding, if the size is different before and after the hash,
something has gone wrong.
Based on some of the email traffic I had been reading, I thought at first I
must be doing something evil with the hashCodes returned by some of my
javaBean shadowfacts.
The add/remove code in TokenTree looked correct and did not give me any
clues for something to look for in my code, but I noticed it was possible to
access the vectors directly from outside of the class.
I only found the direct access of the vector used in your Junit code. When I
added to your simple add/remove test case, I was able to see the TokenTree
size drift off with none of my application ode involved.
I don't know exactly where the problem is, but something seems off in Jess's
internal use of this hash tree.

I am currently working with 70b1, and saw the problem in 70a6 also, which is
the oldest non-demo copy I had.
I really did not think I had any memory problem with earlier 7x releases,
but I was still running a 30 day demo then and can no longer verify this.

    //portion of Node2Test.java modified
    public void testLeakIfChangeBehindTheScenes() throws JessException {
        Rete engine = new Rete();
        engine.getCompiler().setHashKey(1);
        engine.executeCommand("(defrule rule" +
                "(A ?X)" +
                "(B ?Y)" +
                "(test (eq (?Y foo) ?X))" +
                "(C)" +
                "=>)");
        Object x = new Object();
        Y y = new Y();
        y.m_data = x;
        engine.store("X", x);
        engine.store("Y", y);
        engine.executeCommand("(assert (A (fetch X)))");
        engine.executeCommand("(assert (B (fetch Y)))");

        HasLHS rule = engine.findDefrule("rule");
        Node2 node = lastJoinNode(rule);
        TokenTree tree = node.m_left;

        assertEquals(1,tree.size);                      //<-new test
(passes)
        assertEquals(1, tree.getHash());
        TokenVector vector = tree.getTokenVector(0);
        assertNotNull(vector);
        assertEquals(1, vector.size());

        // Now invalidate the match
        y.m_data = new Object();
        engine.executeCommand("(retract 1)");

        // The contents of the left memory should be gone now,
        // even though the test wouldn't pass.
        assertEquals(0, vector.size());
        assertEquals(0,tree.size);                      //<-new test (fails)
        //I think something is updating the vectors w/o updating the tree
size to match
    }


Suggestions:

1) To keep my code going, I implemented a max size to the tree, above which
it no longer performs a rehash. If you let the user set a max hash table
size like they currently do the initial size, it would give the application
come control over this potential issue.

2) If the cause of the size missmatch can not be isolated quickly, it might
be worth temporarily adding some sort of recount function before the actual
resizing of the array. I know this would  just be a band-aid, but my
application is supposed to be continuously running hardware diagnostics in
the background of normal system activity, and running out of memory after a
few hours is a real show-stopper.

3) Based upon what I have seen of the internals of TokenTree, my first guess
is that the code there is correct and the problem may be in some other class
that breaks the encapsulation TokenTree nominally provides. You might want
to close down unrestricted access to the TokenTree internals and add a few
more read-only accessors for the Junit code to function. Given the name of
the test involved was "testLeakIfChangeBehindTheScenes", I suspect you
already share this concern. TokenTree.size is accessable outside of the
class, and I certainly wouldn't want to search for improper use of a
variable called "size" in 18K lines of code.

4) A minor point: rather than test for (size / m_hash)>THREASHOLD after
every addition, you could precompute limit=THREASHOLD*m_hash, and then test
size>limit to avoid the cost of a divide every time through the add method.
This looks like code that gets a lot of traffic, so even a minor tweak will
have som eimpact.


--------------------------------------------------------------------
To unsubscribe, send the words 'unsubscribe jess-users [hidden email]'
in the BODY of a message to [hidden email], NOT to the list
(use your own address!) List problems? Notify [hidden email].
--------------------------------------------------------------------

Reply | Threaded
Open this post in threaded view
|

Re: JESS: Bad size in TokenTree causes unbounded memory growth

friedman_hill ernest j
I think Steven Goncalo wrote:
> I have been fighting with what appeared to be a memory leak in ny Jess
> application.

I'm glad you did -- you found a bug in some relatively new code. The
rehash-able node memories were just introduced in 7.0a5 . Before that,
they used a fixed size "backbone" with buckets that could only
grow. That architecture made sense when it was first implemented, but
JVM memory managers have improved a lot since then!

> I only found the direct access of the vector used in your Junit code. When I
> added to your simple add/remove test case, I was able to see the TokenTree
> size drift off with none of my application ode involved.
> I don't know exactly where the problem is, but something seems off in Jess's
> internal use of this hash tree.
>
...
>         assertEquals(0, vector.size());
>         assertEquals(0,tree.size);                      //<-new test (fails)


Thanks very much for this test case -- given this hint, it took me
only a moment to see the error in the code. Do you see anything funny
about that member variable "size"? That's right -- it doesn't have the
m_ "wart" on it that I typically use for members. And guess what?
Without that wart, the predictable happened: a local variable "size"
shadowed this member and prevented the tree size from being
decremented. Look at TokenTree.remove() at line 124 -- you see the
local "size" being decremented, but it should be "m_size." So the
patch: rename that member "m_size". The only place it's used outside
of TokenTree is in the test you added. Make sure you change line 124
to m_size as well!

>
> 3) Based upon what I have seen of the internals of TokenTree, my first guess
> is that the code there is correct and the problem may be in some other class
> that breaks the encapsulation TokenTree nominally provides. You might want
> to close down unrestricted access to the TokenTree internals and add a few
> more read-only accessors for the Junit code to function. Given the name of
> the test involved was "testLeakIfChangeBehindTheScenes", I suspect you
> already share this concern. TokenTree.size is accessable outside of the
> class, and I certainly wouldn't want to search for improper use of a
> variable called "size" in 18K lines of code.

The variables aren't private because that region of the code is on the
downhill side of a large refactoring. The TokenTree class used to be
much more intertwined with Node2; now it's pretty much completely
independent. With a little bit more teasing and the introduction of
some interfaces, it's going to be possible to use pluggable
independent implementations of TokenTree, which is the goal. This will
essentially allow Jess to directly use external entities as part of
working memory -- i.e., database tables and the like. It's going to be
very nice!

>
> 4) A minor point: rather than test for (size / m_hash)>THREASHOLD after
> every addition, you could precompute limit=THREASHOLD*m_hash, and then test
> size>limit to avoid the cost of a divide every time through the add method.
> This looks like code that gets a lot of traffic, so even a minor tweak will
> have som eimpact.

That's an excellent point. Thanks -- and thanks for the bug report!




---------------------------------------------------------
Ernest Friedman-Hill  
Advanced Software Research          Phone: (925) 294-2154
Sandia National Labs                FAX:   (925) 294-2234
PO Box 969, MS 9012                 [hidden email]
Livermore, CA 94550         http://herzberg.ca.sandia.gov

--------------------------------------------------------------------
To unsubscribe, send the words 'unsubscribe jess-users [hidden email]'
in the BODY of a message to [hidden email], NOT to the list
(use your own address!) List problems? Notify [hidden email].
--------------------------------------------------------------------