Description of large scale optimization package, Version 3.0
##############################################################
Patrick Heimbach, MIT/EAPS, 02-Mar-2000
Benny Cheng, NASA/JPL, 13-Apr-2010 (parallel version)

reference:
#########

J.C. Gilbert & C. Lemarechal
Some numerical experiments with variable-storage quasi-Newton algorithms
Mathematical Programming 45 (1989), pp. 407-435

flow chart
##########

      lsopt_top
          |
          |---- check arguments
          |---- CALL INSTORE
          |       |
          |       |---- determine whether OPWARMI available:
          |                * if no:  cold start: create OPWARMI
          |                * if yes: warm start: read from OPWARMI
          |             create or open OPWARMD
          |
          |---- check consistency between OPWARMI and model parameters
          | 
          |---- >>> if COLD start: <<<
          |      |  first simulation with f.g. xx_0; output: first ff_0, gg_0
          |      |  set first preconditioner value xdiff_0 to 1
          |      |  store xx(0), gg(0), xdiff(0) to OPWARMD (first 3 entries)
          |      |
          |     >>> else: WARM start: <<<
          |         read xx(i), gg(i) from OPWARMD (first 2 entries)
          |         for first warm start after cold start, i=0
          |
          |
          |
          |---- /// if ITMAX > 0: perform optimization (increment loop index i)
          |      (
          |      )---- save current values of gg(i-1) -> gold(i-1), ff -> fold(i-1)
          |      (---- CALL LSUPDXX
          |      )       |
          |      (       |---- >>> if jmax=0 <<<
          |      )       |      |  first optimization after cold start:
          |      (       |      |  preconditioner estimated via .01*ff_0 (first guess)
          |      )       |      |  dd(i-1) = -gg(i-1)*preco
          |      (       |      |  
          |      )       |     >>> if jmax > 0 <<<
          |      (       |         dd(i-1) = -gg(i-1)
          |      )       |         CALL HESSUPD
          |      (       |           |
          |      )       |           |---- dd(i-1) modified via Hessian approx.
          |      (       |
          |      )       |---- >>> if <dd,gg> >= 0 <<<
          |      (       |         ifail = 4
          |      )       |
          |      (       |---- compute step size: tact(i-1)
          |      )       |---- compute update: xdiff(i) = xx(i-1) + tact(i-1)*dd(i-1)
          |      (
          |      )---- >>> if ifail = 4 <<<
          |      (         goto 1000
          |      )
          |      (---- if (warm start)
          |      (---- CALL OPTLINE / LSLINE
          |      )       |
          |      (       |
          |      )       |
          |      (       |---- /// loop over simulations
          |      )              (  
          |      (              if (linesearch ) 
          |      (              )---- CALL SIMUL
          |      )              (       |
          |      (              )       |---- input: xdiff(i)
          |      )              (       |---- output: ff(i)
          |      )              (                 reads those values from file
          |      (              )
          |      (              (       |---- perform quadratic line search
          |      )              )
          |      (              else
          |      )              )---- CALL SIMUL
          |      )              (       |
          |      (              )       |---- input: xdiff(i)
          |      )              (       |---- output: ff(i),gg(i)
          |      )              (                 reads those values from file
          |      (              )
          |      )              (---- 1st Wolfe test:
          |      (              )     ff(i) <= tact*xpara1*<gg(i-1),dd(i-1)>
          |      )              (
          |      (              )---- 2nd Wolfe test:
          |      )              (     <gg(i),dd(i-1)> >= xpara2*<gg(i-1),dd(i-1)>
          |      (              )
          |      )              (---- >>> if 1st and 2nd Wolfe tests ok <<<
          |      (              )      |  320: update xx: xx(i) = xdiff(i)
          |      )              (      |
          |      (              )     >>> else 
          |      )              )      |  ifail = 7
          |      )              (      |  set stepsize equal half of previous
          |      (              )      |  value 
          |      (              )      |
          |      )              (         update xdiff for new simulation
          |      (              )
          |      (        
          |      )        
          |      (---- store new values xx(i), gg(i) to OPWARMD (first 2 entries)
          |      )---- >>> if ifail = 7,8,9 <<<
          |      (         goto 1000
          |      )
          |      (---- compute new pointers jmin, jmax to include latest values
          |      )     gg(i)-gg(i-1), xx(i)-xx(i-1) to Hessian matrix estimate
          |      (---- store gg(i)-gg(i-1), xx(i)-xx(i-1) to OPWARMD
          |      )     (entries 2*jmax+2, 2*jmax+3)
          |      (
          |      )---- CALL DGSCALE
          |      (       |
          |      )       |---- call dostore
          |      (       |       |
          |      )       |       |---- read preconditioner of previous iteration diag(i-1)
          |      (       |             from OPWARMD (3rd entry)
          |      )       |
          |      (       |---- compute new preconditioner diag(i), based upon diag(i-1),
          |      )       |     gg(i)-gg(i-1), xx(i)-xx(i-1)
          |      (       |
          |      )       |---- call dostore
          |      (               |
          |      )               |---- write new preconditioner diag(i) to OPWARMD (3rd entry)
          |      (
          |---- \\\ end of optimization iteration loop
          |
          |
          |
          |---- CALL OUTSTORE
          |       |
          |       |---- toggle linesearch flag
          |       |---- store gnorm0, ff(i), current pointers jmin, jmax, iterabs, 
          |       |---- searchflags to OPWARMI
          |
          |         xx(i+1) needs to be computed as input for offline optimization
          |          |
          |          |---- CALL LSUPDXX
          |          |       |
          |          |       |---- compute dd(i), tact(i) -> xdiff(i+1) = x(i) + tact(i)*dd(i)
          |          |
          |          |---- CALL WRITE_CONTROL
          |          |       |
          |          |       |---- write xdiff(i+1) to special file for offline optim.
          |
          |---- print final information
          |
          O


Remarks:
#######

-  Every call to simul refers to a read procedure which
   reads the result of an offline forward run and/or the adjoint run
   itmax = 0, for cold start
   itmax = 1, for warm start
   Also, at the end, x(i+1) needs to be computed and saved
   to be available for the offline model and adjoint run

In order to achieve minimum difference between the online and offline code
xdiff(i+1) is stored to file at the end of an (offline) iteration,
but recomputed identically at the beginning of the next iteration.

2. Number of simulations
-------------------------------------------------

- nfunc: controls the maximum number of safeguarded simulations 

Summary: From one iteration to the next the descent direction changes.
         The updated control used as input for these safeguarded simulations uses the same
         descent direction, but different step sizes.

In detail:
From one iteration to the next the descent direction dd changes using
the result for the adjoint vector gg of the previous iteration.
In lsline the updated control xdiff(i,1) = xx(i-1) + tact(i-1,1)*dd(i-1) serves as input for
a forward and adjoint model run yielding a new gg(i,1).
In general, the new solution passes the 1st and 2nd Wolfe tests 
so xdiff(i,1) represents the solution sought: xx(i) = xdiff(i,1).
If one of the two tests fails, stepsize halving is invoked to determine
a new trial step aize tact(i-1,2).
If more than one function call is permitted, the new step size is used together
with the "old" descent direction dd(i-1) (i.e. dd is not updated using the new gg(i)),
to compute a new xdiff(i,2) = xx(i-1) + tact(i-1,2)*dd(i-1) that serves as input
in a new forward and adjoint run, yielding gg(i,2).
If now, both Wolfe tests are successfull, the updated solution is given by
xx(i) = xdiff(i,2) = xx(i-1) + tact(i-1,2)*dd(i-1).

3. Double-usage of fields dd and xdiff
--------------------------------------

In order to save memory both the fields dd and xdiff have a double usage.

- xdiff: in lsopt_top: used as x(i) - x(i-1) for Hessian update
         in lsline:    intermediate result for control update x = x + tact*dd

- dd   : in lsopt_top, lsline: descent vector, dd = -gg   & hessupd
         in dgscale:           intermediate result to compute new preconditioner

4. Notice for user of old code
------------------------------

Three relevant changes needed to switch to new version:
  
  (i): OPWARMI file: two variables added:
               gnorm0  : norm of first (cold start) gradient
               iabsiter: total number of iterations with respect to cold start

 (ii): routine names that are referenced by main_lsopt.f
       lsoptv1 -> lsopt_top
       lsline1 -> lsline

(iii): parameter list of lsopt_top
       logical loffline included

parameter file data.optim
########################

The optimization is controlled by a set of parameters
provided through the standard input file data.optim,
which is generated within the job script.

  NUPDATE  : max. no. of update pairs (gg(i)-gg(i-1), xx(i)-xx(i-1))
             to be stored in OPWARMD to estimate Hessian
             [pair of current iter. is stored in (2*jmax+2, 2*jmax+3)
             jmax must be > 0 to access these entries]
             Presently NUPDATE must be > 0 
             (i.e. iteration without reference to previous
              iterations through OPWARMD has not been tested)
  EPSX     : relative precision on xx bellow which xx should not be improved
  EPSG     : relative precision on gg below which optimization is considered successful
  IPRINT   : controls verbose (>=1) or non-verbose output
  NUMITER  : always 1
  ITER_NUM : index of new restart file to be created (not necessarily = NUMITER!)
  NFUNC    : max. no. of safeguarded simulations 
             (must be > 0)
             is used if step size tact is interpolated;
             in this case, if NFUNC > 1, a new simulation is performed with
             same gradient but "improved" step size
  FC       : first guess cost function value
  FMIN     : not used

OPWARMI, OPWARMD files
######################

Two files retain values of previous iterations which are
used in latest iteration to update Hessian.
OPWARMI: contains index settings and scalar variables
OPWARMD: contains vectors

Structure of OPWARMI file:
-------------------------
    n, fc, gnorm, m, jmin, jmax, sflag, tflag, safe_iter, stepsize 

    n  = nn      : no. of control variables per processor
    fc = ff      : cost value of last iteration
    m = nupdate  : max. no. of updates for Hessian
    jmin, jmax   : pointer indices for OPWARMD file (cf. below)
    gnorm        : norm of gradient gg
    sflag        : true if linesearch is applied to next iteration
    tflag        : true if next iteration is a safeguarded simulation 
    safe_iter    : number of safeguarded simulations made so far
    stepsize     : value of tact stepsize

Structure of OPWARMD file:
-------------------------
   entry
     1    : xx(i)         : control vector of latest iteration
     2    : gg(i)         : gradient of latest iteration
     3    : xdiff(i), diag: preconditioning vector; (1,...,1) for cold start
    ---
 2*jmax+2 : gold = g(i) - g(i-1) for last update (jmax)
 2*jmax+3 : xdiff = tact * d = xx(i) - xx(i-1) for last update (jmax)

if jmax = 0: cold start; no Hessian update used to compute dd
if jmax > nupdate, old positions are overwritten, starting
                with position pair (4,5)

Example 1: jmin = 1, jmax = 3, mupd = 5

  1   2   3   |   4   5     6   7     8   9     empty     empty
|___|___|___| | |___|___| |___|___| |___|___| |___|___| |___|___|
      0       |     1         2         3

Example 2: jmin = 3, jmax = 7, mupd = 5   ---> jmax = 2

  1   2   3   |  
|___|___|___| | |___|___| |___|___| |___|___| |___|___| |___|___|
              |     6         7         3         4         5


Error handling
##############

  ifail |   description
--------+----------------------------------------------------------
   < 0  | should not appear (flag indic in simul.F not used)
     0  | normal mode during execution
     1  | an input argument is wrong
     2  | warm start file is corrupted
     3  | the initial gradient is too small
     4  | the search direction is not a descent one
     5  | maximal number of iterations reached
     6  | maximal number of simulations reached (handled passively)
     7  | the linesearch failed
     8  | the function could not be improved
     9  | optline parameters wrong
    10  | cold start, no optimization done
    11  | convergence achieved within precision