We know this to be an issue if the database and the Metastorm server are not at identical times. There seems to be a problem because one is used to set the time, and the other to execute it, so one thinks it has not yet passed and sets the timer again.
More details <a data-ignore="a/b" target="_blank" href="http://metastorm.processmapping.com.au/post?id=4071467">>here<</a>
Yeah Jerome, I was aware of that issue, but this situation seemed to me a little bit different, first of all because times on oracle db server and on 2 engines are identical and also because engines write always 3 records on eevents table…. why 3 records?
I would suspect the multi-engine environment then. Try with one engine turned off and see what happens.
I have seen this exact same issue on various VM's where there was only 1 Engine and 1 Database, all on the same machine. In fact I just saw this issue last week when using v9 SR3 HF2. I was using a very short timeframe (.1 minutes after entry time)…adjusting it to .3 minutes seemed to make the issue dis-appear. Not sure why and didn't have time to investigate it further at the time.
Jerome, any ideas?
I've not seen it myself on single machine environments. I'll set up a test and keep it running in a few places and see what we get.
It's Funny!! At the beginning I agreed with Jerome:
1) dev environment 1 Engine (VM) + 1 Oracle Db (Unix) v9 SR3 HF3 : 1 row in eEvent table
2) test environment 1 Engine (VM) + 1 Oracle Db (Unix) v9 SR3 HF3 : 1 row in eEvent table
3) pre live environment 2 Engines (VM) + 1 Oracle Db (Unix) v9 SR3 HF2 : 3 rows in eEvent table
4) Live environment 2 Engines (phisical) + 1 Oracle Db (Unix) v9 SR3 HF1 : 3 rows in eEvent table
As suggested I've tried turning off one engine on pre live environment and result is 1 row in eEvent
1) test environment 1 Engine (VM) + 1 Oracle Db (Unix) v9 SR3 HF3 : 3 rows in eEvent table
This issue happens not only with short but also with long (days after entry time) timeframe….
It's a rounding error.
My test (attached) gives you varying results. Oddly enough, if you look in SQL Server, where the times are shown to the 1000th of a second, the LAST time the loopback timer occurs is x.998 of a second, eg:
2011-04-29 12:54:38.990, or 2011-04-29 12:53:44.980
This is when the problem happens. When it does not, the fraction of a second could be anything. For example I have one where the two identical actions (the second is spurious) are at: 2011-04-29 12:56:38.733 and 2011-04-29 12:56:38.873
In all cases where more than one action occurs, however, the seconds are identical, just not the fraction.
Timed loop-backs should not repeat if the time is in the past, in order to prevent infinite loop-backs. I suspect the engine is looking at a time of 01:01:15.33 and checking if it is later than 01:01:15 (ignoring the fractions of a second), and coming back with the result "yes, it is in the future' and setting the timed action again.
If the engine is under any kind of load, the event manager will not look at it for a few milliseconds, and the time is passed, so the fault is both random (depending on the current fraction of a second and dependant on load (how many system actions are queued up).
Wow, I would have never figured that out! Sooooo…given these findings, would you consider this to be a bug then? If so has it been reported?
'Tis one of my areas of exertise, figuring out these bugs in the product. Has been for 14 years now. Metastorm still do not give us a way to report issues however (we have repeatedly asked), so no, it has not been reported.
I don't understand. You can submit support requests simply by emailing firstname.lastname@example.org, or using the online form, right?
<a data-ignore="a/b" target="_blank" href="http://ework.metastorm.com/web/support/online_support.asp">http://ework.metastorm.com/web/support/online_support.asp</a>
If we had an account to do so, yes. So far, none has been providfed, despite several requests through the formal channels.
Ok! I've just submitted a support request.
In your opinion, what about switching the event manager by setting a pool interval in the registry key? It could be useful?
I know this is old, and I raised it on the other forums as well, but with the 184.108.40.206 upgrade, I have timed loop back actions running hundreds of times. That's bad enough, but each is sending an email message. Thankfully, we haven't implemented 9.2.1 in production yet it is still 220.127.116.11.
A quick check yesterday showed my engine server was less than 30 seconds ahead of the database server, so that may be the factor .. but it shouldn't be. Going to be raising a ticket. Looking further, the fact that 2 of the datetime types have a time zone adjustment while the system doesn't has presented "date difference" issues in the past, but that's another topic.
This is a known issue if there is a time difference between the engine and database. I'd recommend always syncing your servers to the same time server such as time.nist.gov.
They do get synced on a regular basis, but time creep can still happen between syncs (in particualr in development enviroments). It still doesn't make sence to me that the engine uses two different sources to decide whether to run something.
To get around this issue since time creep can happen, I used Jerome's suggestion of creating a variable that gets set in the first loop so it won't run repeat [potentially hundreds] due to this "feature". I haven't had to worry about clearing said variable, quite yet, so things do run normally as they should when the timers do run properly.
I know we've beaten this one down … but since we've been burned by this again in production recently (resulting it way too many emails) I thought I'd bring up again.
Obviously, in this solution I didn't put in the "I ran" flag. Guess what I'm doing today? (In addition to contacting the server admins about the clocks not being resynced.)
You need to sign in before voting.
You need to sign in before voting.