-.-
The 1st intelligent bug-search engine for model checking by APPROVER
= the 1st Algorithmic & Proven Protocol Verifier
Copyright (C) 2002 - 2004, Jan Hajek, Netherlands
Version 1.08 of April 28, 2004, 654 lines of < 78+CrLf chars in ASCII,
written with "CMfiler 6.06f" from WWW; submitted to the webmaster of
http://www.matheory.info aka www.matheory.com

This epaper has facilities for fast finding and browsing. Please save the
last copy and use a file differencer to see only how versions differ.
Your comments (preferably interlaced into this .txt file) are welcome.

NO part of this document may be published, implemented, programmed, copied
or communicated by any means without an explicit & full reference to this
author together with the full title and the website WWW.MATHEORY.INFO plus
the COPYRIGHT note in the texts and in ALL references to this. An implicit,
incomplete, indirect, disconnected or unlinked reference (in the text
and/or on www) does NOT suffice. New terms: PQ-search , IQ-search ,
intelligent queueing search , progress/stagnation graph , P/S-graph ,
tempoblocking , syslock , diffector , bugheat , datalgorithm , datalgo ,
NAKrace , and ALL inflections of ALL these new terms are also subject to
my conditions above. All based on experience. ALL rights reserved.

Browsers may like to search here for the following markers :
!!! !! ! ?? ?  { refs }
-.-       separates sections
|- ------ framed stuff

+Abstract: Constructive descriptions of synergistically designed datalgos
(= data structures + algorithmics) for model checking of protocols are
provided & richly documented with flowcharts, diagrams and P/S-graphs
( by January 19, 1995, no typo, I have made 21 of my viewgraphs available
to Gerard Holzmann at Bell Labs, and now everybody else can also make use
of them ).
-.-
+Contents :
 +Mottos
 +Basic & new terms, abbreviations & symbols
 +Executive summary = the beef at a model checking party
 +APPROVER's opeRational aspects
 +AntiXplosion techniques in APPROVER
 +APPROVER's perfect data structure
 +IQ-search = intelligently queued search in model checking
 +APPROVER's advanced pattern recognition of liveness bugs
 +Efficiency of APPROVER's liveness-checking phase
 +References
-.-
+Mottos :
 I leave to the various futures (not to all) my garden of forking paths
 ... this network of times which approached one another, forked, broke
 off, or were unaware of one another for centuries, embraces all
 possibilities of time.
 { Jorge Luis Borges, The Garden of the Forking Paths, The Labyrinths }
 earlier quoted in {HJ79}

 One ounce of insight is worth one megaton of hardware.
 Make'em, break'em, store'em, dead or alive. If in doubt, take'em out.
 Only the collaborators, meek-type opportunists and dead fish swim with
 the stream.
-.-
+Basic & new terms, abbreviations & symbols :
:= an assignment as in LHS := RHS ie left-hand side := right-hand side
& is a concatenation, or "and" (depends on the context)
asap = as soon as possible (our goal is to find bugs asap)
B7700 = Burroughs mainframe running Burroughs Extended Algol (BEA) in
   1977; BEA may be characterized as a Pascalized C; word size 48 bits.
var = a variable
modeling (in USese) = modelling (in UKese) = specifying the algorithmic
   essentials of (a set of) computations or communications processes;
   a model is a simplified ie less hairy hence manageable reality.
LIST = Setree on the vugraph of Combined Data Structures, stores all
   states, DEAD or LIVE; the basic operation on LIST is the set
   membership testing ie a state present/absent. The 2ndary operation
   is the backtracing of the path from the root of the genealogical
   tree to an error-state, if found.
LIVE = a state to be expanded later; or LIVE on the vugraph of Combined
   Data Structures
DEAD = a state which has been expanded already (must be remembered
   anyway), which holds also on the vugraph of Combined Data Structures,
!!!! but on the vugraph of the Liveness Verification phase, when all
   states in LIST were expanded ie DEAD, there DEAD is a Boolean
   variable for the detection of deadlocks of all kinds; please read
   DEAD as LOCKUP on the vugraph for Liveness Verification.
envelope = all control fields in a message (eg a msg nr, an ACK nr,
   CRC, etc)
ultimate effect = an irreversible, irrevocable effect on an effector
   variable which in general is NOT a control variable of a program or
   of a communicating process and is not a part of a state vector.
EIR = effective invariant relation (concerns the ultimate effects only)
EPF = effective progress function (concerns the ultimate effects only)
PTC = proper termination condition (if any specified)
media = communication channels
systematic search = not skipping any possibility (of interleaving of
   atomic actions [ a-tomos = indivisible in Greek ]).
event = an external event (eg an incoming message) or an internal event
   (eg a timeout in a process, or just an advancement of what used to
   be called an [implicit] "instruction counter").
expansion of a state = by application of several events to an
   event-driven algorithm, several states (some new, some old) will be
   generated.
1st-out = to be expanded next (while other states wait to be expanded
   later; waiting means delay ie stacking or queueing in one way or
   another).
priority = a heuristic evaluation function (like for the evaluation of
   a position in chess ie of a situation on a chessboard)
FIFO = 1st-in-1st-out ie a queueing discipline or order of expansion.
LIFO = last-in-1st-out ie a stacking discipline or order of expansion.
PQ = a priority queue (FIFO and LIFO are 2 extreme cases of the 2
   simplest PQs)
PQ-search = a priority queue driven search in general
IQ-search = intelligent queueing search for model checking a la
   APPROVER.
P/S-graph = effective progress/stagnation graph; effective means that
   a mere activity (like eg thrashing) is not enough, some meaningful
   effects must be effectuated (eg a mere transmission of a message
   means nothing if it is thrown away as a [pseudo]duplicate; a message
   must be finally and irrevocably ie irreversibly accepted).
SCB = system control block = a state vector & shadow variables &
   auxiliary variables (here the & means a concatenation), where :
state = state vector = control variables (CV) local to & shared by the
   interacting processes, eg :
   - process control blocks (PCBs) including STATUS-variables (if a
     model is specified as procedures mixed with some FSA like TCP
     was),
   - BREAKpointers, plus
   - the message envelopes stored in the media, and also
   - diffectors of EIR.
   A state contains bounded variables only (ie vars of a limited
   range). SCB's name in APPROVER's code was ALLV = all variables.
SHV = shadows = shadow vars = variables not belonging to a state,
   typically:
   - effectors = MONOTONICALLY NONDECREASING variables used only in
     EPF (effective progress must be measured by nondecreasing
     functions only), and in EIR.
   - data/text parts in the media, if any
auxies = auxiliary vars = BackTurn, BackEventGen, BackLqual, BackSeqNr,
   ParcOut (for no syslock), TurnField (for tempoblock), HashKey
   eg ParcOut = P-arc outgoing from ... gotta see the code :-)

The rules of the LHS := RHS game:
Rule1: Assertions may contain controls CV and shadows SHV.
Rule2: SHV := fun( CV, SHV );
Rule3: CV  := fun( CV );
   a shadow or an auxie must never influence a CV, neither directly
   nor indirectly; hence in CONSTRUCTIVE terms a shadow SHV must never
   appear:
   - on an RHS of :=
   - in a condition (of a loop, or of an if-then-[else] )
   - as an actual parameter or its value in procedures or functions
     which contain control variables CV on the LHS of := ;
   If in doubt, this is quite easily checked syntactically (eg via
   regexps or decent xrefs).
-.-
+Executive summary :
At the occasion of the 25th anniversary of the 1st public introduction
of APPROVER at the 16th Lake Arrowhead Workshop on Current Issues in
Computer Networking and Distributed Computing, August 31, 1977, at the
UCLA Conference Center, I have decided to finally disclose the full
truth and nothing but the truth about APPROVER, which eg in February
1978 found design bugs in the basic algorithm of TCP {HJ78b}, {SC78},
on which ARPANET was ticking then; 25 years later our InterNet is
still TCP-driven. This epaper reveals for the 1st time certain
datalgorithmic secrets which were beating in 1977 (no typo, not in
1877 :-) at the heart of APPROVER = the 1st Algorithmic & Proven
Protocol Verifier. My new word "datalgorithm" expresses the
synergistic union of a perfectly designed data structure with
algorithms, ultimately data-driven by a priority heuristic. The only
change I will make to my description is the change of the
interpretation of the name APPROVER from Automated ... to Algorithmic
... in order to stress the little known fact that from early on
APPROVER has accepted models specified as algorithmic procedures in
Burroughs Algol {HJ78b} and not just as some simplistic micromodels of
finite state automata like eg an FSA-matrix.
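The syntactic check of the LHS := RHS rules above (a shadow must never leak into a control variable, neither via an RHS nor via a condition) can be sketched roughly as follows. This is my illustrative Python sketch: the variable names, the Algol-ish demo lines and the regexps are assumptions for the example, not APPROVER's actual xref machinery.

```python
import re

# Hypothetical sketch of the syntactic check for the LHS := RHS rules:
# a shadow variable (SHV) must never appear on the RHS of an assignment
# to a control variable (CV), nor inside a condition. The names below
# are illustrative stand-ins.
SHADOWS  = {"Nfh", "Nac"}          # shadow counters (effectors)
CONTROLS = {"Status", "SeqNr"}     # control variables

def rule_violations(source_lines):
    """Return (line_no, reason) pairs where a shadow leaks into controls."""
    found = []
    for no, line in enumerate(source_lines, 1):
        m = re.match(r"\s*(\w+)\s*:=\s*(.*)", line)
        if m:
            lhs, rhs = m.group(1), m.group(2)
            rhs_ids = set(re.findall(r"\w+", rhs))
            if lhs in CONTROLS and rhs_ids & SHADOWS:
                found.append((no, "shadow on RHS of a CV assignment"))
        for cond in re.findall(r"\b(?:if|while)\s*\(([^)]*)\)", line):
            if set(re.findall(r"\w+", cond)) & SHADOWS:
                found.append((no, "shadow inside a condition"))
    return found

demo = ["Nfh := Nfh + 1;",                # ok: Rule2, SHV := fun(SHV)
        "SeqNr := (SeqNr + Nac) mod M;",  # violation: shadow on a CV's RHS
        "if (Nfh > 0) then Send;"]        # violation: shadow in a condition
print(rule_violations(demo))
```

A real check would also have to follow shadows through procedure parameters, as the third bullet above demands; the sketch only shows the two cheap cases.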
APPROVER has performed a reachability analysis of protocols as a
systematic guided ie informed search for errors in the models of
algorithms which are traditionally and properly classified as
protocols (for interprocess interaction, data communication,
networking, mutual exclusion, etc). The search for bugs was:
- systematic ie all possible specified interleavings of parallel,
  distributed processes communicating via (un)reliable media channels
  were checked for safety, liveness & proper termination, if any
  specified.
  - Safety was specified by assertions of the EIR kind ie as effective
    invariant relations, while:
  - Liveness was specified in terms of EPF = effective progress
    function, which was automatically translated (see ** ) into & was
    checked via the then new P/S-graphs = progress/stagnation graphs,
    introduced for the 1st time in {HJ77a}, {HJ78b}. The liveness
    phase searched for :
    - lockups :
      - dynamic lockups (I called them deadlocks too, see Fig.3b in
        {HJ78b} p. 754, where a lockup is visualized )
      - static lockups = classical deadlocks
      Lockups were detected as S-cycles from which no path containing
      a P-arc ie a P-transition leads, meaning that effective progress
      was not reachable from such an S-cycle anymore.
    - tempo-blockings (coined by me in early 1977) = temporary lockups
      - STB = system tempo-blockings
      - ITB = individual tempo-blockings
      See {HJ78b} for Fig.1b on p.753 where a TB is visualized
      clearly. Tempo-blockings were detected as S-cycles ie stagnation
      cycles consisting of S-arcs only, from which a path containing a
      P-arc leads ie effective progress was reachable.
    - Proper termination = (if specified) an ability to terminate
      properly from all reachable compound states. "Properness" was
      specified with assertions called PTC = proper termination
      condition. An example of a liveness bug is eg a NAK-race or
      NAKrace, which is my term for processes which are busily NAKing
      each other's outputs.
- The search was a directed search ie a guided search ie a heuristic
  search of the best-1st-out kind or BFO. The classical searches FIFO
  and LIFO are naive because they use the sequential number of the
  state generation as a naive search-ordering criterion :
  - the breadth-1st search = queue-driven = FIFO = 1st-in-1st-out,
    with the following properties: simple, raw speed, voracious use
    of memory, but guaranteeing the absolutely shortest paths to
    bugs/errors, if any.
  - the depth-1st search = stack-driven = LIFO = last-in-1st-out,
    with the following properties: simplicity, raw speed, uses much
    less memory than the FIFO-search does, but too looong paths to
    errors, if any found. The longer the path to a bug, the more
    (wo)manhours it takes to interpret it. LIFO might seem ok iff no
    bugs are present, but aside from the sad fact that there are
    almost always some bugs, the real catch is that we can never be
    100% sure that there are no bugs if the model's combinatorially
    exploding reachable state space does not fit into the available
    memory, which happens more often than not with most of the
    realistic models which were not oversimplified.
  Hence the urgent need to use a smart, informed, intelligent
  IQ-search to find bugs asap, which is not just a matter of
  technocratic & economic efficiency, but rather a matter of life or
  death (for bugs :-) Even if no bugs are present, it makes a huge
  difference between wishing vs knowing there are none. Confidence
  has its price-tag and it is worth that price.
-.-
+APPROVER's opeRational aspects :
- a batch program ie fully automated, no interactions, no messing;
  when Dr. Carl Sunshine, then with Rand Corp., visited me in February
  1978, he was surprised about the 100% automated batch runs which
  were spitting out deep design bugs in TCP, which he had been
  authorized to fix.
- the model to be checked was specified as procedures, and channels
  were prepared as "defines" in BEA ie as macros;
- there was no precompiler, the algolic specs were inserted at fixed
  places between two fixed kernel-parts of APPROVER and compiled as a
  single fast program (no slow interpreter, no precompiler to program
  & to maintain :-)
- Except for the very 1st versions in early 1977 which used a
  FIFO-search, APPROVER has used searches which can be named as:
  intelligent search, smart search, informed search, heuristic
  search, best-1st search, MaxFO search = Maximal-priority-1st-out
  search. I will call it shortly IQ-search = intelligent queue(ing)
  search, or informed queue search, or a PQ-search = priority queue
  search. Btw, a MaxFO-heap PQ can be turned into a MinFO-heap PQ
  (and vice versa) by simply changing the sign of the priority.
-.-
+AntiXplosion techniques in APPROVER :
1. Few bits were used per variable (many of which changed modulo a
   small m), so that a dozen small-ranged variables were packed into
   one 48-bit word. Eg a small modulus M of message numbering (hence
   also of ACK-nrs) was used. Packing saved RAM and made comparisons
   and copying faster, but overflow checking of these partial words
   did cost some time, so not much speedup.
2. Breaks : by inserting home-made Breaks into the models of
   protocols, 3 effects were achieved :
   - the necessary atomic ie indivisible actions were specified,
     hence also:
   - all necessary points of interleaving; but by specifying only the
     necessary breakpoints (like eg message transmitted or received)
     I got :
   - only a minimal necessary number of states generated !
3. In the media ie channels:
   - CRC was represented by only one bit and the rest of such an
     invalid message was cleared ie its bits were set to 000...000,
     so that there was only one kind of a CRC-invalid message.
   - Nonsequential channels were modelled as sets of messages, thus
     avoiding all the possible permutations of sequences.
   Nevertheless all of these messages in a set would be read ie
   received. Hence all possible I/O sequences were systematically
   tried out without the huge combinatorial explosions which would
   otherwise have taken place due to the permuted contents of the
   media.
4. Shadow variables were not a part of the SCB. They were used only
   within EIR and EPF and PTC.
-.-
+APPROVER's perfectly designed data structure :
Three synergistic data structures jointly supported highly efficient
storage & execution of all frequent operations (keyops):
- HashTable HT for instant LIST-membership testing of a state, DEAD
  or aLIVE.
- LIST of all unique states with 2 kinds of independent explicit
  links:
  - chains of hashing synonyms. Alternatively, an open addressed
    hashing could do without these particular links (then the LIST
    becomes an open addressed hashtable). Hence no unresolved
    collisions, no losses, no blue-haired SPINsters gambling a la
    Vegas or Reno.
  - a genealogical tree with one link from each state to its 1st
    predecessor (from which it has been expanded ie generated by an
    application of an event). Such genealogical links are not
    strictly necessary for bugfinding, but they are necessary for
    bugtracing ie for backtracing (no backtracking involved) a bug ie
    for the reconstruction and human interpretation of a path to a
    bug, if any found. LIST could be a linkless implementation of a
    set of states (dead or alive) as an open addressed hashtable, if
    no bug was expected anymore, thus saving memory.
- Priority queue PQ for an IQ-search: each element in a PQ contained
  two variables : the priority and the index (or pointer) to an alive
  state on the LIST. So the PQ contained only the Prio-weights of &
  pointers to the alive states waiting to be expanded and thus become
  dead, while staying on the LIST until the end of a model checking
  run.
  A dead state was immediately deleted from the PQ [ this took time
  O(log(H)) with H elements in the PQ ] including the restoration of
  the heap's top, from which the just expanded (hence now dead) state
  has been deleted by replacing it with the very last state in the
  heap and sifting it down until its proper position, determined by
  its priority, was reached. Sifting is "hole moving", which is about
  twice as fast as swapping (similarly in a non-naive insertion
  sort). An alternative theoreticians' view is that a deletion of the
  top element from a heaPQ splits it into two heaps which have to be
  merged into one heap. Q: which description do you prefer ?
-.-
+IQ-search = intelligently queued search for model checking :
Strictly speaking I have told everything about APPROVER, but not
always all of it to everybody. For example, by January 19, 1995,
Gerard Holzmann at Bell Labs has received from me 21 viewgraphs
including all flowcharts of APPROVER's key datalgorithms. Even my
IQ-search engine has been explained in my courses, but nobody from
the wannabe structured programmers has paid any attention to it. But
the situation has changed since 1977; there is more code, less
structure, more bugs, more mess, so maybe also more attention to
non-naive solutions :-)
Searches like LIFO or FIFO (= depth-1st or breadth-1st) are naive
brute force searches which seem to run fast, but only to naive junior
programmers. APPROVER has used a priority queue PQ to find safety
bugs asap. The PQ was implemented as a heap (hence I called it a
heaPQ). A heap is a beautiful linkless implementation of a PQ which
always has its most important member at its top, hence instantly
accessible. An insertion into a heap or a deletion of the top element
from a heap takes O(log(H)) time where H is the number of currently
heaped elements (ie not the number of states on a LIST).
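The heaPQ mechanics just described (top deletion by dropping the very last heap element into the hole and sifting it down, and MaxFO obtained from min-ordering by a sign change) can be sketched as follows. This is a minimal Python reconstruction of the technique, not APPROVER's Algol code; the priorities and state indices are made up for the demo.

```python
# Minimal sketch (mine, not APPROVER's Algol) of a MaxFO heaPQ: the
# priority is stored negated, so min-ordering yields
# Maximal-priority-1st-out; the top is deleted by moving the very last
# element into the hole at the top and sifting it down ("hole moving",
# no swapping).
class HeaPQ:
    def __init__(self):
        self.a = []                      # list of (negated prio, state index)

    def push(self, prio, state_ix):
        self.a.append((-prio, state_ix)) # sign change: MaxFO via a min-heap
        i = len(self.a) - 1
        item = self.a[i]
        while i > 0 and item < self.a[(i - 1) // 2]:   # sift the hole up
            self.a[i] = self.a[(i - 1) // 2]
            i = (i - 1) // 2
        self.a[i] = item

    def pop_top(self):
        """Delete & return the best-1st state index in O(log(H))."""
        top = self.a[0][1]
        last = self.a.pop()              # the very last state in the heap
        if self.a:
            i, n = 0, len(self.a)
            while True:                  # sift the hole down
                c = 2 * i + 1
                if c >= n:
                    break
                if c + 1 < n and self.a[c + 1] < self.a[c]:
                    c += 1               # pick the smaller (better) child
                if self.a[c] < last:
                    self.a[i] = self.a[c]
                    i = c
                else:
                    break
            self.a[i] = last
        return top

pq = HeaPQ()
for prio, ix in [(2, 10), (7, 11), (5, 12)]:
    pq.push(prio, ix)
print(pq.pop_top())   # 11 : the state with the greatest priority comes 1st
```

Pushing the negated priority is exactly the sign trick mentioned under the opeRational aspects; with a bugheat priority such as Nfh - Nac plugged in, the pop order becomes the best-1st-out expansion order.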
Hence we know how to mem-efficiently (because linklessly) implement a
PQ working at a reasonable raw speed, slower than a simplistic LIFO,
but, due to being a "PQ with IQ", with an incomparably higher chance
to find a safety bug, if any, in no time. The key question was (and
is, until you read this) this:
Q: How to formulate a priority criterion ie a heuristic which will
   feel the heat of a bug, if any, and ferret it out much faster than
   a naive, though seemingly fast, brute force LIFO-search would ?
A: My solution for APPROVER's safety phase was based on an analogy
with heat-seeking missiles, which occurred to me while watching (on
TV) I guess Israeli jets throwing away burning aluminium decoys to
mislead infrared heat-seekers. Roughly speaking, a heat-seeking
missile glides along the steepest gradient of heat towards its
target. On one hand the design of such a missile is easy, because
heat is a well understood concept, while bugs are not, otherwise they
would not hit us all the time. On the other hand, a missile must hit
a moving target, while bugs are sitting ducks just waiting to be
debugged (unless their makers are too fast re-typists :-) so I did
not have to invent a logical analogon of a Kalman filter to predict a
bug's trajectory. But the problem with bugs was and is that they do
not emanate any radiation at anybody but (un)lucky me, who has felt
their heat back in 1977. Anyway, in my view since 1977, model
checkers' targets are the potential bugs to be found in a haystack
called the reachable statespace, and the $1M IQuestion proper was &
is how to make a bug (if any) emanate some logical equivalent of a
radiation, or shortly heat, so that APPROVER could guide its search
towards such a source of bugheat, which was APPROVER's goal.
For checking of the models of data communication protocols I have
invented and implemented in 1977 the following basic model of a
model's bugheat :

          The enigma of the IQ-search revealed  - or -
|------ Bugs are our goals, and bugheat is our leading light to them :
|
! Every orderly interaction is somehow constrained in its ultimate
! effects. Eg a data transfer goes via at least one bounded buffer or
! queue, eg via a communication channel of a limited LOGICAL capacity
! to transfer and/or physical capacity to store at most Nmax
! datachunks or messages, hence the effective invariant relation EIR
! (involving the ultimate ie irrevocable effects only, ie no control
! variables CV in principle) is eg this archetypal EIR :
!
!    EIR := 0 <= Nfh - Nac <= Nmax
!
!    Shadows: Nfh = nr of unique messages fetched for transmission
!             Nac = nr of unique messages accepted (ie received-and-
!                   also-accepted as valid nonduplicate irrevocable
!                   finals)
!             (unique = distinct = nonduplicate = new = fresh)
!
!!   Priority := Nfh - Nac ;  Quiz: do you already feel any bugheat ?
!
! While the ideal of the correctness guys & gals is to (make others
! believe that they) design algorithms so that an invariant is
! maintained ie not violated, my key idea was to make them stand on
! their heads, so to speak, by actively pushing towards the violation
! of an EIR. An opeRationalization of the meta-metaphor of "making
! correctness czars stand on their heads" (which has not made me
! particularly popular in their ego-centric circles) was to create a
! POSITIVE FEEDBACK trying hard to "destabilize" asap the invariant
! EIR as much as feasible, so that it will be VIOLATED asap, if
! possible. A negative feedback tries to minimize a difference, while
! my positive feedback has tried to increase it, and the above EIR
! offers a readily available difference in particular. The greater
! the difference, the greater the "bugheat" ie the greater the
! chances that a protocol will
! go wrong, eg will irrevocably lose a message or will irrevocably
! accept the same message twice. Hence during the safety verification
! phase, my priority queue PQ has used the shadow variable
! Ndif = Nfh - Nac as the priority ie the heuristic evaluation
! function. This way safety bugs were found in no time, finally in
! less than 2 seconds in sliding window protocols on a B7700 in 1977
! (= now an old Pentium ?). So far for the basic version of my key
! idea to steer or push a protocol towards its constraints expressed
! as an EIR. Needless to say, the generality of a PQ lends itself to
! literally infinite flexibility (and potential creativity) in
! formulating all kinds of priorities between both naive extremes of
! a FIFO = breadth-1st search and a LIFO = depth-1st search, so that
! various tradeoffs between the use of available memory and the time
! (in which safety bugs are found by steering towards violations of
! EIR) are possible.
!
! Example1: EIR, EPF and Priority for a mutual exclusion algorithm
!    with two critical sections A, B :
!    EIR := not ( inCsA and inCsB )
!    EPF := NcsA & NcsB ;  where NcsA, NcsB are the shadow
!           counters of entries into the critical sections.
!    Priority := NcsA - NcsB;  or a similar heuristic for bugheat
!
! Example2: EIR, EPF and Priority for a stop-&-wait protocol for a
!    bidirectional data transfer between 2 processes A and B :
!    here nr means a shadow count (ie not any numbering in messages'
!    envelopes, which belong to the SCB, hence not to the effectors )
!    AfhA = nr of unique messages fetched by A and transmitted to B
!    AdeA = nr of unique messages (originally fetched & transmitted
!           by A ) finally deleted by A so that they cannot be
!           retransmitted.
!    AacA = nr of unique messages effectively accepted by A from B
!
!    EIR := ( 0 <= AfhA - BacB <= 1 ) and
!           ( 0 <= AfhA - AdeA <= 1 ) and
!           ( 0 <= BfhB - AacA <= 1 ) and
!           ( 0 <= BfhB - BdeB <= 1 ) and
!           ( all XacX accepted in proper sequence )
!
!    EPF := AfhA & AdeA & AacA & BfhB & BdeB & BacB;  (all are
!           shadows )
!
!    Priority := AfhA - BacB;  or a similar heuristic for bugheat
!
! Note that if these shadow counters are designed to change in a
! NONDECREASING fashion only, then any change of an EPF during a
! state-transition from a predecessor to its immediate successor
!! ** is necessarily an EFFECTIVE PROGRESS (of a successor wrt **
! its predecessor) and the & is indeed literally just a
! concatenation. Hence the slightest inequality of their
! EPF-subvectors is sufficient to decide that a transition is a
! P-transition, while their equality means an S-transition ie a
! stagnation (though not necessarily a nonactivity) ie no effective
! progress wrt the predecessor. Hence my
!! constructively opeRational trick was to bytewise compare the
!! EPF-subvectors ie the concatenated EPF variables of a successor
!! with its predecessor's, and iff they differed, then mark the
!! P/S-bit as 1 ie P, else as 0 ie S. That was how the P/S-arcs were
!! created already during the safety phase. It is obvious that when
!! messages are lost or become CRC-invalid, they will necessarily
!! cause effective stagnations (despite busy activity). Therefore
!! during the liveness phase no events invalidating or losing
!! messages were applied. After the safety-checking phase was
!! finished without any safety bugs found, the liveness-checking
!! phase was run by always copying the single next state from the
!! LIST into another array ALLV (similar to LIST) of states to be
!! checked for liveness properties, and then expanding this state ST
!! by applying to it, and to its successors (to be) stored in ALLV,
!! all events except those which would invalidate or lose a message
!! (as explained above). In contrast with ALLV, on the
!!! LIST there were all states reachable under all events possible,
!!! including the LQUAL-events making media unreliable ( LQUAL
!!! stands for "line's quality").
!!! If a P-arc was made, then ST could not belong to a lockup cycle
!!! ( DEAD := FALSE ), so the next state ST was copied into ALLV[0]
!! and the just described search-cycle started again. See the
!! flowcharts enclosed. The number of nested loops which apply events
!! to the state ST is given by the number of subsets of events which
!! are NOT mutually exclusive; eg
!!
!! 0. Expand the best-1st state ST at the top of a priority queue PQ
!!    (this is not an event, just the outermost driving loop)
!! 1. a set of a process's turns towards its next Break
!! 2. a set of events (eg a timeout, message propagation in a
!!    channel)
!! 3. a set of line-ok/KO events (eg message unharmed, lost,
!!    CRC-invalidated)
!! Note that eg a message-propagation step is independent from the
!! line quality event (ok/KO), hence to cover all possible
!! combinations they must be in separate nested loops to provide for
!! mutual nonexclusivity.
!
! By now only junior programmers will not know how APPROVER's
! liveness phase was programmed: simple, fast, few parts, easy
! maintenance :-)
!
! Example3: EIR, EPF and Priority for a Go-back-N unidirectional
!    B-to-A transfer:
!    L = M-1, where M = the modulus of message numbers, hence L is a
!    LOGICAL "buffering capacity" of the protocol; here LOGICAL means
!    the designed "effective window width". The physical buffering
!    capacity would be L+1 messages. The Internet has a virtually
!    unlimited physical capacity wrt a single pair of communicating
!    processes, which nevertheless has a limited logical capacity due
!    to a numbering limited by M.
!
!    EIR := ( 0 <= BfhB - AacA <= L ) and
!           ( 0 <= BfhB - BdeB <= L ) and
!           ( 0 <= AacA - BdeB <= L ) and
!           ( all XacX accepted in proper sequence )
!
!    EPF := AacA & BfhB & BdeB
!
!    Priority := BfhB - AacA;  or a similar heuristic for bugheat
!
! To my best knowledge, such a deadly effective systematic search for
! bugs, based on my idea of intentionally trying to violate a crystal
! clear EIR
! in a PQ-driven reachability analysis of protocols has not only
! never been published before 1977, but also not until now, here. The
! idea of a positive feedback makes it reasonable to call it
! "bugfinding cybernetics", because kubernetes means a steersman in
! Greek, and the art & science of bugfinding is to steer our
! (wo)man-(wo)machine searches towards likely bugs asap.
|-------------
-.-
+Efficiency of APPROVER's liveness-checking phase:
Q: asked by Gerard Holzmann (Bell Labs): Is the complexity of the
   liveness verification phase quadratic in the number of states in
   the state graph ?
A: Fortunately not :-)
- Let N be the nr of distinct states reached (= generated & stored in
  LIST ) during the 1st phase = the safety-checking phase.
- As said here above and as the flowcharts suggest, during the 2nd
  phase = the liveness-checking phase, a single next root-state ST
  was copied to ALLV[0] (actually to ALLV[1], the code tells me), and
  expanded ie its successors generated and expanded by applying to it
  all but the bad-line events. When an effective progress was made ie
  a P-transition ie a P-arc, the root-state ST was no longer a part
  of an inescapable lockup (= DEADlock in my terminology of 1977),
  but ST could still be a part of an escapable tempo-blocking cycle
  ( TB ).
Now watch out: QuickSort is theoretically a quadratic sorter, but
this has no practical significance. Similarly, it makes little sense
to meditate about the worst case scenario for APPROVER's liveness
phase because :
1. for the overwhelming nr of states ST copied from LIST to ALLV[0]
   as roots for liveness-checking, the P-arc has occurred after a
   small nr of transitions.
2. If an ST was a part of a lockup (= DEADLOCK) then such a
   stagnation-cycle of inescapable S-arcs was found during the
   expansions of this ST and its successors, all happily listed
   together in ALLV. To find a lockup did take more time than to
   reach a P-arc in 1.
   here, but then we were finished, because: One lockup a day keeps
   the users away anyway :-) ie when I find a lockup, I either fix it
   before I go on, or proclaim the protocol to be no good.
   Summarizing 1 & 2: finding a lockup, if any, didn't take much
   time, and if found, we were done anyway. In other words, finding a
   lockup-cycle when the current ST is part of it might have cost a
   few seconds extra in 1977, but that didn't matter, coz then we
   could stop anyway.
3. Remains the situation when no lockup was found ie we are in the
   happy cycle of 1. here, when a small nr of states are expanded per
   each state loaded from LIST to ALLV[0] as the next root for
   liveness checking. DEADlock detection causes no slowdown. What
   remains is APPROVER's search for tempoblockings aka TB (kinda
   tuberculosis :-). Do they create a slowdown ? No!
   Q: why not ?
   A: because they were always present in large numbers (they are
   unavoidable, methinks, except perhaps in mutex = mutual exclusion
   algos), and no one wants to see them all, so I always set a couple
   of run parameters (APPROVER was not an interactive prog, it ran in
   batch mode and therefore it had to be fully automated) telling how
   many DEADlockups and TBs I wanted to see. As I said, 1, 2, 3
   LOCKUPs and 1-5 TBs is enough to start fixing. And then a new run
   must begin from scratch anyway.
Hope your thirst for arcane algorYthmic knowledge is quenched with my
no-cheapo answer, so that you don't have to fight with windmills
while SPINning. This is NOT a joke! What I want to say is that one
can keep himself busy with solving nonproblems, and many
theoreticians do so their whole life. I was done with APPROVER
essentially in one year. Now I'm finished by this Total Recall.
-.-
+APPROVER's advanced pattern recognition of liveness bugs :
So far we were concerned about bugfinding and backtracing, but what
about bug interpretation, which must be performed by the fragile
human mind ?
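One key ingredient of the answer, explained in the next paragraph, is a COMMUTATIVE hashfun over the states of a stagnation subgraph, so that any permutation of the same states hashes alike. A minimal sketch of that idea (my Python illustration with made-up packed state words, not APPROVER's actual hashfun):

```python
# Sketch: a COMMUTATIVE hashfun over the states of an S-graph, so that
# any permutation of the same states yields the same signature, which
# an ordinary sequential (non-commutative) hash would not. The state
# encodings below are illustrative stand-ins, not APPROVER's.
MASK = (1 << 48) - 1   # pretend 48-bit words a la the B7700

def state_hash(state_word):
    """An ordinary scrambling hash of one packed state word."""
    h = (state_word * 0x9E3779B97F4A7C15) & MASK
    return h ^ (h >> 24)

def sgraph_signature(states):
    """Commutative: combine per-state hashes with +, order-independently."""
    sig = 0
    for s in states:
        sig = (sig + state_hash(s)) & MASK   # + is commutative (xor would do too)
    return sig

cycle_a = [0x1A2B, 0x3C4D, 0x5E6F]       # one S-cycle ...
cycle_b = [0x5E6F, 0x1A2B, 0x3C4D]       # ... the same states, permuted
print(sgraph_signature(cycle_a) == sgraph_signature(cycle_b))   # True
```

Two S-graphs over the same state set thus get equal signatures regardless of the order in which their states were collected, which is exactly what makes duplicate bug patterns cheap to recognize.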
Liveness violations are particularly funless to interpret because,
unlike the safety bugs formed by a single state + a path to it, a
liveness bug consists of a multi-state stagnation-subgraph
( S-graph, or at least an S-cycle ) + a path. The only single-state
liveness bug is a classical deadlock. This strenuous human task of
interpreting the liveness bugs was made easier by APPROVER's
liveness-bug-pattern recognition algorithmics, which was at the
cutting edge in 1977, and probably is still advanced 25 years later
in 2002. Here the key idea was to apply a COMMUTATIVE hashfun to a
stagnation subgraph which represented a lockup or a tempo-blocking.
Normally a hashfun should be computed by means of non-commutative
operations. But if we want to recognize various S-graphs as
equivalents when they are in fact permutations of the same states,
we have to use a commutative hashfun. When done with this bugpattern
recognition, it was easy to backtrace a path from any of the states
in an S-graph to the root. Backtracing could be done from all the
states in an S-graph and the shortest path used for display. This
filtering can be combined with displaying only the small(est)
S-graphs. The length of a path to the to-be-displayed S-graph can be
taken into the total evaluation, so that the human effort spent on
the interpretation of a liveness bug can be greatly reduced.
Happy bug-reading :-))
-.-
+References { refs } :
HJ77a = Hajek Jan: Self-synchronization and blocking in data transfer
   protocols, report THE-RC29286, April 1977; this was the 1st
   version of my Kyoto paper HJ78b, which had to be submitted a year
   before the ICCC-78 was scheduled. The local pound-foolish &
   penny-wise Calvinists have not seen much use in my strange
   activities, so they wisely let me pay 1/2 of the costs of the trip
   myself :-( Btw, it was clear to me at that time that Pentagon's
   ARPANET would put the squeeze on the Kommies, and I was eager to
   help the beloved DoD to do just that.
   And history has proven me right: ARPANET's successor, the
   Internet, has become the most powerful freedom fighting machine of
   the Free West.
HJ77b = Hajek Jan: Automatic verification and computer aided design
   of communication protocols, minireport THE-RC29458, May 12, 1977;
   this was a one page abstract of my talk given at the Lake
   Arrowhead Workshop, see HJ77c.
HJ77c = Hajek Jan: Automatic verification and computer aided design
   of communication protocols, minireport THE-RC29458a, Sept 22,
   1977; this was the final abstract for my presentation of APPROVER
   = Automated & Proven PROtocol VERifier at the Lake Arrowhead
   Workshop, California, August 1977. From its very beginning,
   APPROVER was sufficiently advanced to perform reachability
   analysis of protocols specified as procedures in Burroughs Algol,
   so that the much more primitive finite state models (FSA) had to
   be translated into Algol and not vice versa. ARPA's (later: DARPA)
   TCP = Transmission Control Program (later: Protocol) has been
   modelled by Dr. Carl Sunshine, Rand Corp., and design bugs were
   found in TCP by APPROVER {SC78}. ARPANET was the mother of the
   Internet, which is still running on TCP.
HJ78a = Hajek Jan: Progress report on the Automatic and Proven
   Protocol Verifier, Computer Communication Review, ACM SigComm 8/1,
   January 1978, 15-16.
HJ78b = Hajek Jan: Automatically verified data transfer protocols,
   Proc. 4th Intl. Computer Communications Conference, Kyoto, Sept.
   1978, 749-756, with 4 protocols and their 4 Effective
   Progress/Stagnation graphs aka P/S-graphs. On p. 752, left column,
   2nd line from below: please fix the otherwise nonexistent
   ":= phaseA" into the correct ":= phasesA".
HJ79 = Hajek Jan: Protocols verified by APPROVER, Computer
   Communication Review, ACM SigComm 9/1, January 1979, 32-34
   ( originally INWG note 169, Sept. 1978 ).
SC78 = Sunshine Carl A.: Survey of protocol definition and
   verification techniques, Computer Communication Review, ACM
   SigComm, vol. 8, nr. 3, July 1978, 35-41; see Table 1 for APPROVER
   & Hajek & ARPA's TCP on p. 39.
TA81 = Tanenbaum Andrew: Computer Networks, 1981; this 1st edition
   does mention APPROVER on p. 182, without saying that APPROVER had
   been able to verify all the protocols (sliding windows too) in
   this book before it went to print.
-.-