• Dear Apple Developer world, is there documentation of the “plj” (Property List Journal) files Apple Photos uses as its reference for photo information? (This is not the sqlite database many others have documented.)

  • Based on the Google News snippets, you wouldn’t think these headlines describe the same article! www.nature.com/articles/…

    A description of a nature medicine article from Google news.

  • The FDA (regulations.gov) got 40,000 comments yesterday on the EUA for 5-11! (Their total is 55k now, but was 15k the day before…). Seems like the majority are against approval from my non-statistical sampling!

  • Oh wow. MS @powerpoint finally fixed my bug with PT Sans Bold Italic on MacOS Big Sur… (and also Catalina, although something else happened there) Just in time for Monterey! Thanks @powerpoint team! (Took ~6 mo-1 year).

  • Anyone know what the carbon footprint of me spending 5 minutes answering an email is? I don’t mean the computer and network etc, I mean, the carbon footprint of a human spending 5 minutes on a sedentary activity.

  • Tried to take a survey of academic climate. Sadly had to give up because it was total nonsense. By my own estimation, my answers were basically random.

  • Google thinks the Lucid Air has some pretty incredible range.

    Google search for lucid air dimensions, first link that mentions lucid air range in the caption

  • n=1 observation for me regarding class… I’m finding this semester more time-consuming because things are not yet entirely ‘back to normal’, although there is a push that way (which then triggers the time-consuming issues…). Could be random luck, though!

  • If anyone sees a leather case for an iPhone 13 Pro without a lip around the camera region, can you let me know please? Pretty please? I like cameras, but I also like “phones” that lie flat and don’t wobble.

  • I find it curious how surveys of academia believe we have perfect memory of how many things like ‘reviews’ we do. Or how many we ‘declined’ to do! This is a hard question. I have no idea how many things I say no to!

  • Need data (networks, hypergraphs, tensors, ML, AI)? Here’s a timestamped and linked digest of Anthony Fauci’s emails arxiv.org/abs/2108…. from @JasonLeopold FOIA request. With @austinbenson and @n_veldt Data and Code github.com/nveldt/fa…

  • LAPACK's random number generators in Julia (part ... blah ... in work to port ARPACK to Julia.)

    ARPACK depends on LAPACK’s internal random number generators

    • dlarnv
    • slarnv
    • zlarnv
    • clarnv

    which use the raw routines

    • dlaruv
    • slaruv

    This is true even if you use Julia to initialize the starting vector with Julia’s own random number generator, because ARPACK needs to be able to get fresh random data to restart the process if it detects an invariant subspace. This suggests two approaches: call LAPACK’s RNG from Julia, or port LAPACK’s RNG to Julia. Why do the simple thing when the more complicated thing is so much more fun?

    Aside, best guess on LAPACK names: _larnv = lapack random normal vector and _laruv = lapack random uniform vector

    Calling LAPACK’s random number generator from Julia

    The tricky bit about calling LAPACK’s RNG from Julia is the seed value. This is just 4 ints, which we’d like to be stack allocated. Unfortunately, we won’t be able to get that for reasons that are internal to Julia. (Blah blah, stack vs. heap and gc and guarantees…)

    Here’s some code that works though.

    import LinearAlgebra
    
    _dlarnv_blas!(idist::Int,
           iseed::Ref{NTuple{4,Int}},
          n::Int,
          x::Vector{Float64}) =
      ccall((LinearAlgebra.BLAS.@blasfunc("dlarnv_"), LinearAlgebra.BLAS.libblas), Cvoid,
        (Ref{LinearAlgebra.BlasInt}, Ptr{LinearAlgebra.BlasInt}, Ref{LinearAlgebra.BlasInt}, Ptr{Float64}),
        idist, iseed, n, x)
    
    x = zeros(5); _dlarnv_blas!(1, Base.Ref(tuple(1,2,3,4)), length(x), x)
    
    using BenchmarkTools
    @btime _dlarnv_blas!(1, tseed, length(z), z) setup=begin
        z=zeros(128);     
        tseed=Base.Ref(tuple(1,3,5,7));
    end
    

    This uses NTuple as the holder for the 4 ints we use as the seed. At this point, I thought the following should work…

    mutable struct MySeedState
      seed::NTuple{4,Int}
    end
    
    function myrand!(x::Vector{Float64}, state::MySeedState)
      iseed = Base.Ref(state.seed) # this will allocate :(
    
      _dlarnv_blas!(2, iseed, length(x), x)
      state.seed = iseed[]
    end
    
    using BenchmarkTools
    @btime myrand!(z, tstate) setup=begin z=zeros(128); tstate=MySeedState(tuple(1,3,5,7)); end
    

    But – see above – I learned when we went to use this that creating a Ref can actually allocate in Julia, even when it wraps an isbits type.

    julia> isbitstype(NTuple{4,Int})
    true
    

    To avoid this, we can just allocate the ref as part of the mutable struct.

    mutable struct MyRefSeedState
      seed::Ref{NTuple{4,Int}}
    end
    
    function myrand!(x::Vector{Float64}, state::MyRefSeedState)
      _dlarnv_blas!(2, state.seed, length(x), x)
    end
    
    using BenchmarkTools
    @btime myrand!(z, tstate) setup=begin
        z=zeros(128);
        tstate=MyRefSeedState(Base.Ref(tuple(1,3,5,7)));
    end
    

    Either way, it probably isn’t that big of a deal, but I’d like to be clear on where the allocations are happening. e.g. one-time thread local setup is okay. Random internal allocations are not okay.

    It’s puzzling to me why this causes an allocation: it seems like it could safely be done on the stack, and the same allocation doesn’t occur for types like Int.

    Porting LAPACK’s random number generator to Julia

    Since the routine is sufficiently simple, we can actually just port the random number generator directly.

    Aside – this is actually a weakness, as Julia’s RNGs are much better. On the other hand, these values are only used to generate data that is not orthogonal to a desired invariant subspace (with very, very high probability), so the quality of the RNG does not need to be statistically strong.

    The underlying double-precision random number generator is dlaruv not dlarnv. The second function dlarnv is a wrapper that does seemingly odd things like work in 128-length segments, until you realize that dlaruv only works in 128-length segments. (Weird!) Anyhoo.
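    To make the segmenting concrete, here is a small sketch (my own illustration, not LAPACK’s actual code) of how a dlarnv-style wrapper drives a generator that only fills up to 128 values per call:

```julia
# Sketch: consume an arbitrary-length output array via a generator that,
# like dlaruv, only produces up to 128 values per invocation.
function chunked_fill!(gen!, state, x::Vector{Float64})
  buf = zeros(128)
  n = length(x)
  for start in 1:128:n
    len = min(128, n - start + 1)
    gen!(state, len, buf)            # fills buf[1:len], advances state
    copyto!(x, start, buf, 1, len)   # copy this chunk into the output
  end
  return x
end
```

    This is essentially all dlarnv’s outer loop is doing (plus the distribution transform on each chunk).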

    Here’s a link to the double precision generator dlaruv.f in OpenBLAS

    The routine is fairly simple: take a set of fixed values and do some integer operations to get pseudorandom values out. Specifically:

    > This routine uses a multiplicative congruential method with modulus 2**48 and multiplier 33952834046453 (see G.S. Fishman, ‘Multiplicative congruential random number generators with modulus 2**b: an exhaustive analysis for b = 32 and a partial analysis for b = 48’, Math. Comp. 189, pp 331-344, 1990).
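    As a sanity check on those constants (my own computation, not from dlaruv.f’s comments): row i of the MM table holds the i-th power of the multiplier mod 2**48 as four base-4096 digits, so the first row should be the digits of the multiplier itself:

```julia
# Split Fishman's multiplier into four base-4096 digits, most significant
# first; these match MM row 1 from dlaruv.f: (494, 322, 2508, 2549).
m = 33952834046453
digits_4096 = (m ÷ 4096^3, (m ÷ 4096^2) % 4096, (m ÷ 4096) % 4096, m % 4096)
```

    This also explains the 12-bit limb arithmetic in the routine: IPW2 = 4096 is the base, and the IT1…IT4 updates are schoolbook multiplication in that base, truncated to four digits (i.e. mod 2**48).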

    See the appendix for code to extract this fixed array of values.

    We set these values into a constant global array

    const _dlaruv_mm=tuple(tuple(494,322,2508,2549), ...
    
    julia> typeof(_dlaruv_mm)
    NTuple{128, NTuple{4, Int64}}
    

    Next up, we port the actual RNG. In this case, I was actually able to copy/paste much of the FORTRAN code.

    function _dlaruv!(iseed::Base.RefValue{NTuple{4,Int}}, n::Int, x::Vector{Float64})
      IPW2=4096
      R = 1/IPW2
      i1 = iseed[][1]
      i2 = iseed[][2]
      i3 = iseed[][3]
      i4 = iseed[][4]
      # setup scope for final call in julia
      IT1 = i1
      IT2 = i2
      IT3 = i3
      IT4 = i4
      @inbounds for i=1:min(n,length(_dlaruv_mm))
        # directly copied/pasted from fortran, hence the caps
        while true
          # Multiply the seed by i-th power of the multiplier modulo 2**48
          IT4 = i4*_dlaruv_mm[i][4]
          IT3 = IT4 ÷ IPW2
          IT4 = IT4 - IPW2*IT3
          IT3 = IT3 + i3*_dlaruv_mm[i][4] + i4*_dlaruv_mm[i][3]
          IT2 = IT3 ÷ IPW2
          IT3 = IT3 - IPW2*IT2
          IT2 = IT2 + i2*_dlaruv_mm[i][4] + i3*_dlaruv_mm[i][3] + i4*_dlaruv_mm[i][2]
          IT1 = IT2 ÷ IPW2
          IT2 = IT2 - IPW2*IT1
          IT1 = IT1 + i1*_dlaruv_mm[i][4] + i2*_dlaruv_mm[i][3] + i3*_dlaruv_mm[i][2] + i4*_dlaruv_mm[i][1]
          IT1 = mod( IT1, IPW2 )
    
          # Convert 48-bit integer to a real number in the interval (0,1)
          x[i] = R*( Float64( IT1 )+R*( Float64( IT2 )+R*( Float64( IT3 )+R*
                     Float64( IT4 ) ) ) )
          if x[i] == 1
            # the value rounded to exactly 1.0; bump the seed and generate again
            i1 += 2
            i2 += 2
            i3 += 2
            i4 += 2
          else
            break
          end
        end
      end
      iseed[] = tuple(IT1,IT2,IT3,IT4) # update the seed
    end
    

    This uses the global constant array to avoid dealing with that big array of values on each function call; otherwise, the only difference is that we use a while loop instead of Fortran’s GOTO loop.

    This gives exactly the same results as the FORTRAN call

    _dlaruv_blas!(
          iseed::Ref{NTuple{4,Int}},
          n::Int,
          x::Vector{Float64}) =
      ccall((LinearAlgebra.BLAS.@blasfunc("dlaruv_"), LinearAlgebra.BLAS.libblas), Cvoid,
        (Ptr{LinearAlgebra.BlasInt}, Ref{LinearAlgebra.BlasInt}, Ptr{Float64}),
        iseed, n, x)
    
    seed1 = Base.RefValue(tuple(1,3,5,7))
    seed2 = Base.RefValue(tuple(1,3,5,7))
    
    x1 = zeros(97); _dlaruv_blas!(seed1, length(x1), x1)
    x2 = similar(x1); _dlaruv!(seed2, length(x2), x2)
    
    @assert x1 == x2
    @assert seed1[] == seed2[]
    
    x3 = zeros(143); _dlaruv_blas!(seed1, length(x3), x3)
    x4 = zeros(axes(x3)...); _dlaruv!(seed2, length(x4), x4)
    
    @assert x3 == x4
    @assert seed1[] == seed2[]
    

    Awesome. But if you look at dlarnv, it also has a statically allocated length 128 array that is not going to be fun in Julia. Doable, but not fun… nor at all necessary. We can actually streamline things by just manually inlining the RNG directly into dlarnv. The only thing we have to remember is that we need to “refresh” the seed every 128 RV generations.

    Since ARPACK only ever calls dlarnv with idist=2, we are just going to port that code.

    function _dlarnv_idist_2!(iseed::Base.RefValue{NTuple{4,Int}}, n::Int, x::Vector{Float64})
      # manually inline dlaruv to avoid excess cruft...
      IPW2=4096
      R = 1/IPW2
      i1 = iseed[][1]
      i2 = iseed[][2]
      i3 = iseed[][3]
      i4 = iseed[][4]
      # setup scope for final call in julia
      IT1 = i1
      IT2 = i2
      IT3 = i3
      IT4 = i4
    
      nrv = 0
    
      @inbounds for i=1:n
        rv = 1.0
        nrv += 1
        if nrv > length(_dlaruv_mm)
          nrv = 1  # update seed information
          i1 = IT1
          i2 = IT2
          i3 = IT3
          i4 = IT4
        end
    
        while true
          # Multiply the seed by i-th power of the multiplier modulo 2**48
          IT4 = i4*_dlaruv_mm[nrv][4]
          IT3 = IT4 ÷ IPW2
          IT4 = IT4 - IPW2*IT3
          IT3 = IT3 + i3*_dlaruv_mm[nrv][4] + i4*_dlaruv_mm[nrv][3]
          IT2 = IT3 ÷ IPW2
          IT3 = IT3 - IPW2*IT2
          IT2 = IT2 + i2*_dlaruv_mm[nrv][4] + i3*_dlaruv_mm[nrv][3] + i4*_dlaruv_mm[nrv][2]
          IT1 = IT2 ÷ IPW2
          IT2 = IT2 - IPW2*IT1
          IT1 = IT1 + i1*_dlaruv_mm[nrv][4] + i2*_dlaruv_mm[nrv][3] + i3*_dlaruv_mm[nrv][2] + i4*_dlaruv_mm[nrv][1]
          IT1 = mod( IT1, IPW2 )
    
          # Convert 48-bit integer to a real number in the interval (0,1)
          rv = R*( Float64( IT1 )+R*( Float64( IT2 )+R*( Float64( IT3 )+R*
                   Float64( IT4 ) ) ) )
          if rv == 1
            # the value rounded to exactly 1.0; bump the seed and generate again
            i1 += 2
            i2 += 2
            i3 += 2
            i4 += 2
          else
            break
          end
        end
        # now, rv is a random value
        # port the line from `dlarnv` for idist=2
        x[i] = 2*rv - 1
      end
      iseed[] = tuple(IT1,IT2,IT3,IT4) # update the seed
    end
    

    This works great!

    seed1 = Base.RefValue(tuple(1,3,5,7))
    seed2 = Base.RefValue(tuple(1,3,5,7))
    
    x1 = zeros(501); _dlarnv_blas!(2, seed1, length(x1), x1)
    x2 = similar(x1); _dlarnv_idist_2!(seed2, length(x2), x2)
    
    @assert x1 == x2
    @assert seed1[] == seed2[]
    
    ##
    x3 = zeros(143); _dlarnv_blas!(2, seed1, length(x3), x3)
    x4 = zeros(axes(x3)...); _dlarnv_idist_2!(seed2, length(x4), x4)
    
    @assert x3 == x4
    @assert seed1[] == seed2[]    
    

    Exactly the same values and exactly the same seeds.

    Things that might happen in the future… I may pull out the key piece of the random number generator into a separate function…

    rv,IT1,IT2,IT3,IT4,i1,i2,i3,i4 = _dlaruv_rng_step(
        nrv,IT1,IT2,IT3,IT4,i1,i2,i3,i4)
    

    Well… I tried that, and it turns out slower, which brings us to …

    Julia is faster than Fortran.

    What’s neat about inlining that code is that we get a small performance boost.

    using BenchmarkTools
    @btime _dlarnv_idist_2!(tseed, length(z), z) setup=begin
      z=zeros(501);
      tseed=Base.Ref(tuple(1,3,5,7)); end
    @btime _dlarnv_blas!(2, tseed, length(z), z) setup=begin
      z=zeros(501);
      tseed=Base.Ref(tuple(1,3,5,7)); end
    

    Gives …

    Julia: 6.686 μs (0 allocations: 0 bytes)

    Fortran: 7.333 μs (0 allocations: 0 bytes)

    So Julia for the win!

    (For reference, with the _dlaruv_rng_step idea, it was 7.0 microseconds, so just a nudge slower. Not sure why, as it would seem to inline to exactly the same code.)

    Appendix

    Getting the values from the Fortran code

    Here is how we extracted that big array of constant values from the FORTRAN code for dlaruv. I believe it’s the same array for slaruv too, but I’d need to double-check that; if so, this code would be useful there as well.

    code = raw"""
    DATA               ( MM( 1, J ), J = 1, 4 ) / 494, 322, 2508,
    $                   2549 /
    DATA               ( MM( 2, J ), J = 1, 4 ) / 2637, 789, 3754,
    $                   1145 /
    DATA               ( MM( 3, J ), J = 1, 4 ) / 255, 1440, 1766,
    $                   2253 /
    DATA               ( MM( 4, J ), J = 1, 4 ) / 2008, 752, 3572,
    $                   305 /
    DATA               ( MM( 5, J ), J = 1, 4 ) / 1253, 2859, 2893,
    $                   3301 /
    DATA               ( MM( 6, J ), J = 1, 4 ) / 3344, 123, 307,
    $                   1065 /
    DATA               ( MM( 7, J ), J = 1, 4 ) / 4084, 1848, 1297,
    $                   3133 /
    DATA               ( MM( 8, J ), J = 1, 4 ) / 1739, 643, 3966,
    $                   2913 /
    DATA               ( MM( 9, J ), J = 1, 4 ) / 3143, 2405, 758,
    $                   3285 /
    DATA               ( MM( 10, J ), J = 1, 4 ) / 3468, 2638, 2598,
    $                   1241 /
    DATA               ( MM( 11, J ), J = 1, 4 ) / 688, 2344, 3406,
    $                   1197 /
    DATA               ( MM( 12, J ), J = 1, 4 ) / 1657, 46, 2922,
    $                   3729 /
    DATA               ( MM( 13, J ), J = 1, 4 ) / 1238, 3814, 1038,
    $                   2501 /
    DATA               ( MM( 14, J ), J = 1, 4 ) / 3166, 913, 2934,
    $                   1673 /
    DATA               ( MM( 15, J ), J = 1, 4 ) / 1292, 3649, 2091,
    $                   541 /
    DATA               ( MM( 16, J ), J = 1, 4 ) / 3422, 339, 2451,
    $                   2753 /
    DATA               ( MM( 17, J ), J = 1, 4 ) / 1270, 3808, 1580,
    $                   949 /
    DATA               ( MM( 18, J ), J = 1, 4 ) / 2016, 822, 1958,
    $                   2361 /
    DATA               ( MM( 19, J ), J = 1, 4 ) / 154, 2832, 2055,
    $                   1165 /
    DATA               ( MM( 20, J ), J = 1, 4 ) / 2862, 3078, 1507,
    $                   4081 /
    DATA               ( MM( 21, J ), J = 1, 4 ) / 697, 3633, 1078,
    $                   2725 /
    DATA               ( MM( 22, J ), J = 1, 4 ) / 1706, 2970, 3273,
    $                   3305 /
    DATA               ( MM( 23, J ), J = 1, 4 ) / 491, 637, 17,
    $                   3069 /
    DATA               ( MM( 24, J ), J = 1, 4 ) / 931, 2249, 854,
    $                   3617 /
    DATA               ( MM( 25, J ), J = 1, 4 ) / 1444, 2081, 2916,
    $                   3733 /
    DATA               ( MM( 26, J ), J = 1, 4 ) / 444, 4019, 3971,
    $                   409 /
    DATA               ( MM( 27, J ), J = 1, 4 ) / 3577, 1478, 2889,
    $                   2157 /
    DATA               ( MM( 28, J ), J = 1, 4 ) / 3944, 242, 3831,
    $                   1361 /
    DATA               ( MM( 29, J ), J = 1, 4 ) / 2184, 481, 2621,
    $                   3973 /
    DATA               ( MM( 30, J ), J = 1, 4 ) / 1661, 2075, 1541,
    $                   1865 /
    DATA               ( MM( 31, J ), J = 1, 4 ) / 3482, 4058, 893,
    $                   2525 /
    DATA               ( MM( 32, J ), J = 1, 4 ) / 657, 622, 736,
    $                   1409 /
    DATA               ( MM( 33, J ), J = 1, 4 ) / 3023, 3376, 3992,
    $                   3445 /
    DATA               ( MM( 34, J ), J = 1, 4 ) / 3618, 812, 787,
    $                   3577 /
    DATA               ( MM( 35, J ), J = 1, 4 ) / 1267, 234, 2125,
    $                   77 /
    DATA               ( MM( 36, J ), J = 1, 4 ) / 1828, 641, 2364,
    $                   3761 /
    DATA               ( MM( 37, J ), J = 1, 4 ) / 164, 4005, 2460,
    $                   2149 /
    DATA               ( MM( 38, J ), J = 1, 4 ) / 3798, 1122, 257,
    $                   1449 /
    DATA               ( MM( 39, J ), J = 1, 4 ) / 3087, 3135, 1574,
    $                   3005 /
    DATA               ( MM( 40, J ), J = 1, 4 ) / 2400, 2640, 3912,
    $                   225 /
    DATA               ( MM( 41, J ), J = 1, 4 ) / 2870, 2302, 1216,
    $                   85 /
    DATA               ( MM( 42, J ), J = 1, 4 ) / 3876, 40, 3248,
    $                   3673 /
    DATA               ( MM( 43, J ), J = 1, 4 ) / 1905, 1832, 3401,
    $                   3117 /
    DATA               ( MM( 44, J ), J = 1, 4 ) / 1593, 2247, 2124,
    $                   3089 /
    DATA               ( MM( 45, J ), J = 1, 4 ) / 1797, 2034, 2762,
    $                   1349 /
    DATA               ( MM( 46, J ), J = 1, 4 ) / 1234, 2637, 149,
    $                   2057 /
    DATA               ( MM( 47, J ), J = 1, 4 ) / 3460, 1287, 2245,
    $                   413 /
    DATA               ( MM( 48, J ), J = 1, 4 ) / 328, 1691, 166,
    $                   65 /
    DATA               ( MM( 49, J ), J = 1, 4 ) / 2861, 496, 466,
    $                   1845 /
    DATA               ( MM( 50, J ), J = 1, 4 ) / 1950, 1597, 4018,
    $                   697 /
    DATA               ( MM( 51, J ), J = 1, 4 ) / 617, 2394, 1399,
    $                   3085 /
    DATA               ( MM( 52, J ), J = 1, 4 ) / 2070, 2584, 190,
    $                   3441 /
    DATA               ( MM( 53, J ), J = 1, 4 ) / 3331, 1843, 2879,
    $                   1573 /
    DATA               ( MM( 54, J ), J = 1, 4 ) / 769, 336, 153,
    $                   3689 /
    DATA               ( MM( 55, J ), J = 1, 4 ) / 1558, 1472, 2320,
    $                   2941 /
    DATA               ( MM( 56, J ), J = 1, 4 ) / 2412, 2407, 18,
    $                   929 /
    DATA               ( MM( 57, J ), J = 1, 4 ) / 2800, 433, 712,
    $                   533 /
    DATA               ( MM( 58, J ), J = 1, 4 ) / 189, 2096, 2159,
    $                   2841 /
    DATA               ( MM( 59, J ), J = 1, 4 ) / 287, 1761, 2318,
    $                   4077 /
    DATA               ( MM( 60, J ), J = 1, 4 ) / 2045, 2810, 2091,
    $                   721 /
    DATA               ( MM( 61, J ), J = 1, 4 ) / 1227, 566, 3443,
    $                   2821 /
    DATA               ( MM( 62, J ), J = 1, 4 ) / 2838, 442, 1510,
    $                   2249 /
    DATA               ( MM( 63, J ), J = 1, 4 ) / 209, 41, 449,
    $                   2397 /
    DATA               ( MM( 64, J ), J = 1, 4 ) / 2770, 1238, 1956,
    $                   2817 /
    DATA               ( MM( 65, J ), J = 1, 4 ) / 3654, 1086, 2201,
    $                   245 /
    DATA               ( MM( 66, J ), J = 1, 4 ) / 3993, 603, 3137,
    $                   1913 /
    DATA               ( MM( 67, J ), J = 1, 4 ) / 192, 840, 3399,
    $                   1997 /
    DATA               ( MM( 68, J ), J = 1, 4 ) / 2253, 3168, 1321,
    $                   3121 /
    DATA               ( MM( 69, J ), J = 1, 4 ) / 3491, 1499, 2271,
    $                   997 /
    DATA               ( MM( 70, J ), J = 1, 4 ) / 2889, 1084, 3667,
    $                   1833 /
    DATA               ( MM( 71, J ), J = 1, 4 ) / 2857, 3438, 2703,
    $                   2877 /
    DATA               ( MM( 72, J ), J = 1, 4 ) / 2094, 2408, 629,
    $                   1633 /
    DATA               ( MM( 73, J ), J = 1, 4 ) / 1818, 1589, 2365,
    $                   981 /
    DATA               ( MM( 74, J ), J = 1, 4 ) / 688, 2391, 2431,
    $                   2009 /
    DATA               ( MM( 75, J ), J = 1, 4 ) / 1407, 288, 1113,
    $                   941 /
    DATA               ( MM( 76, J ), J = 1, 4 ) / 634, 26, 3922,
    $                   2449 /
    DATA               ( MM( 77, J ), J = 1, 4 ) / 3231, 512, 2554,
    $                   197 /
    DATA               ( MM( 78, J ), J = 1, 4 ) / 815, 1456, 184,
    $                   2441 /
    DATA               ( MM( 79, J ), J = 1, 4 ) / 3524, 171, 2099,
    $                   285 /
    DATA               ( MM( 80, J ), J = 1, 4 ) / 1914, 1677, 3228,
    $                   1473 /
    DATA               ( MM( 81, J ), J = 1, 4 ) / 516, 2657, 4012,
    $                   2741 /
    DATA               ( MM( 82, J ), J = 1, 4 ) / 164, 2270, 1921,
    $                   3129 /
    DATA               ( MM( 83, J ), J = 1, 4 ) / 303, 2587, 3452,
    $                   909 /
    DATA               ( MM( 84, J ), J = 1, 4 ) / 2144, 2961, 3901,
    $                   2801 /
    DATA               ( MM( 85, J ), J = 1, 4 ) / 3480, 1970, 572,
    $                   421 /
    DATA               ( MM( 86, J ), J = 1, 4 ) / 119, 1817, 3309,
    $                   4073 /
    DATA               ( MM( 87, J ), J = 1, 4 ) / 3357, 676, 3171,
    $                   2813 /
    DATA               ( MM( 88, J ), J = 1, 4 ) / 837, 1410, 817,
    $                   2337 /
    DATA               ( MM( 89, J ), J = 1, 4 ) / 2826, 3723, 3039,
    $                   1429 /
    DATA               ( MM( 90, J ), J = 1, 4 ) / 2332, 2803, 1696,
    $                   1177 /
    DATA               ( MM( 91, J ), J = 1, 4 ) / 2089, 3185, 1256,
    $                   1901 /
    DATA               ( MM( 92, J ), J = 1, 4 ) / 3780, 184, 3715,
    $                   81 /
    DATA               ( MM( 93, J ), J = 1, 4 ) / 1700, 663, 2077,
    $                   1669 /
    DATA               ( MM( 94, J ), J = 1, 4 ) / 3712, 499, 3019,
    $                   2633 /
    DATA               ( MM( 95, J ), J = 1, 4 ) / 150, 3784, 1497,
    $                   2269 /
    DATA               ( MM( 96, J ), J = 1, 4 ) / 2000, 1631, 1101,
    $                   129 /
    DATA               ( MM( 97, J ), J = 1, 4 ) / 3375, 1925, 717,
    $                   1141 /
    DATA               ( MM( 98, J ), J = 1, 4 ) / 1621, 3912, 51,
    $                   249 /
    DATA               ( MM( 99, J ), J = 1, 4 ) / 3090, 1398, 981,
    $                   3917 /
    DATA               ( MM( 100, J ), J = 1, 4 ) / 3765, 1349, 1978,
    $                   2481 /
    DATA               ( MM( 101, J ), J = 1, 4 ) / 1149, 1441, 1813,
    $                   3941 /
    DATA               ( MM( 102, J ), J = 1, 4 ) / 3146, 2224, 3881,
    $                   2217 /
    DATA               ( MM( 103, J ), J = 1, 4 ) / 33, 2411, 76,
    $                   2749 /
    DATA               ( MM( 104, J ), J = 1, 4 ) / 3082, 1907, 3846,
    $                   3041 /
    DATA               ( MM( 105, J ), J = 1, 4 ) / 2741, 3192, 3694,
    $                   1877 /
    DATA               ( MM( 106, J ), J = 1, 4 ) / 359, 2786, 1682,
    $                   345 /
    DATA               ( MM( 107, J ), J = 1, 4 ) / 3316, 382, 124,
    $                   2861 /
    DATA               ( MM( 108, J ), J = 1, 4 ) / 1749, 37, 1660,
    $                   1809 /
    DATA               ( MM( 109, J ), J = 1, 4 ) / 185, 759, 3997,
    $                   3141 /
    DATA               ( MM( 110, J ), J = 1, 4 ) / 2784, 2948, 479,
    $                   2825 /
    DATA               ( MM( 111, J ), J = 1, 4 ) / 2202, 1862, 1141,
    $                   157 /
    DATA               ( MM( 112, J ), J = 1, 4 ) / 2199, 3802, 886,
    $                   2881 /
    DATA               ( MM( 113, J ), J = 1, 4 ) / 1364, 2423, 3514,
    $                   3637 /
    DATA               ( MM( 114, J ), J = 1, 4 ) / 1244, 2051, 1301,
    $                   1465 /
    DATA               ( MM( 115, J ), J = 1, 4 ) / 2020, 2295, 3604,
    $                   2829 /
    DATA               ( MM( 116, J ), J = 1, 4 ) / 3160, 1332, 1888,
    $                   2161 /
    DATA               ( MM( 117, J ), J = 1, 4 ) / 2785, 1832, 1836,
    $                   3365 /
    DATA               ( MM( 118, J ), J = 1, 4 ) / 2772, 2405, 1990,
    $                   361 /
    DATA               ( MM( 119, J ), J = 1, 4 ) / 1217, 3638, 2058,
    $                   2685 /
    DATA               ( MM( 120, J ), J = 1, 4 ) / 1822, 3661, 692,
    $                   3745 /
    DATA               ( MM( 121, J ), J = 1, 4 ) / 1245, 327, 1194,
    $                   2325 /
    DATA               ( MM( 122, J ), J = 1, 4 ) / 2252, 3660, 20,
    $                   3609 /
    DATA               ( MM( 123, J ), J = 1, 4 ) / 3904, 716, 3285,
    $                   3821 /
    DATA               ( MM( 124, J ), J = 1, 4 ) / 2774, 1842, 2046,
    $                   3537 /
    DATA               ( MM( 125, J ), J = 1, 4 ) / 997, 3987, 2107,
    $                   517 /
    DATA               ( MM( 126, J ), J = 1, 4 ) / 2573, 1368, 3508,
    $                   3017 /
    DATA               ( MM( 127, J ), J = 1, 4 ) / 1148, 1848, 3525,
    $                   2141 /
    DATA               ( MM( 128, J ), J = 1, 4 ) / 545, 2366, 3801,
    $                   1537 /
    """
    
    lines = split(code, "\n"; keepempty=false)
    for i = 1:2:length(lines)
      l1 = lines[i]
      l2 = lines[i+1] # next line  with last piece
      vals13str = split(l1, "/")[2] # values 1 to 3
      vals13 = parse.(Int, split(vals13str, ","; keepempty=false))
      vals14 = push!(vals13, parse(Int, split(l2[2:end])[1]))
      println("tuple(", join(vals14, ","), "),")
    end
    
  • ARPACK Sorting Take 2 (Part ... blah in the ongoing series...)

    Last post on sorting was a disaster. I tried to do something quick and simple in Julia and ran into all sorts of issues with things I didn’t quite understand and breaks in my mental model of how Julia works. That was all just to write real Julia code for the following task: sort an array and take another along for the ride, as is done in the following code.

    This post is straightforward. We are simply going to port this code.

      c-----------------------------------------------------------------------
      c\BeginDoc
      c
      c\Name: dsortr
      c
      c\Description:
      c  Sort the array X1 in the order specified by WHICH and optionally
      c  applies the permutation to the array X2.
      c
      c\Usage:
      c  call dsortr
      c     ( WHICH, APPLY, N, X1, X2 )
      c
      c\Arguments
      c  WHICH   Character*2.  (Input)
      c          'LM' -> X1 is sorted into increasing order of magnitude.
      c          'SM' -> X1 is sorted into decreasing order of magnitude.
      c          'LA' -> X1 is sorted into increasing order of algebraic.
      c          'SA' -> X1 is sorted into decreasing order of algebraic.
      c
      c  APPLY   Logical.  (Input)
      c          APPLY = .TRUE.  -> apply the sorted order to X2.
      c          APPLY = .FALSE. -> do not apply the sorted order to X2.
      c
      c  N       Integer.  (INPUT)
      c          Size of the arrays.
      c
      c  X1      Double precision array of length N.  (INPUT/OUTPUT)
      c          The array to be sorted.
      c
      c  X2      Double precision array of length N.  (INPUT/OUTPUT)
      c          Only referenced if APPLY = .TRUE.
      c
      c\EndDoc
      c
      c-----------------------------------------------------------------------
      c
      c\BeginLib
      c
      c\Author
      c     Danny Sorensen               Phuong Vu
      c     Richard Lehoucq              CRPC / Rice University
      c     Dept. of Computational &     Houston, Texas
      c     Applied Mathematics
      c     Rice University
      c     Houston, Texas
      c
      c\Revision history:
      c     12/16/93: Version ' 2.1'.
      c               Adapted from the sort routine in LANSO.
      c
      c\SCCS Information: @(#)
      c FILE: sortr.F   SID: 2.3   DATE OF SID: 4/19/96   RELEASE: 2
      c
      c\EndLib
      c
      c-----------------------------------------------------------------------
      c
            subroutine dsortr (which, apply, n, x1, x2)
      c
      c     %------------------%
      c     | Scalar Arguments |
      c     %------------------%
      c
            character*2 which
            logical    apply
            integer    n
      c
      c     %-----------------%
      c     | Array Arguments |
      c     %-----------------%
      c
            Double precision
           &           x1(0:n-1), x2(0:n-1)
      c
      c     %---------------%
      c     | Local Scalars |
      c     %---------------%
      c
            integer    i, igap, j
            Double precision
           &           temp
      c
      c     %-----------------------%
      c     | Executable Statements |
      c     %-----------------------%
      c
            igap = n / 2
      c
             if (which .eq. 'SA') then
      c
      c        X1 is sorted into decreasing order of algebraic.
      c
         10    continue
               if (igap .eq. 0) go to 9000
               do 30 i = igap, n-1
                  j = i-igap
         20       continue
      c
                  if (j.lt.0) go to 30
      c
                  if (x1(j).lt.x1(j+igap)) then
                     temp = x1(j)
                     x1(j) = x1(j+igap)
                     x1(j+igap) = temp
                     if (apply) then
                        temp = x2(j)
                        x2(j) = x2(j+igap)
                        x2(j+igap) = temp
                     end if
                  else
                     go to 30
                  endif
                  j = j-igap
                  go to 20
         30    continue
               igap = igap / 2
               go to 10
      c
            else if (which .eq. 'SM') then
      c
      c        X1 is sorted into decreasing order of magnitude.
      c
         40    continue
               if (igap .eq. 0) go to 9000
               do 60 i = igap, n-1
                  j = i-igap
         50       continue
      c
                  if (j.lt.0) go to 60
      c
                  if (abs(x1(j)).lt.abs(x1(j+igap))) then
                     temp = x1(j)
                     x1(j) = x1(j+igap)
                     x1(j+igap) = temp
                     if (apply) then
                        temp = x2(j)
                        x2(j) = x2(j+igap)
                        x2(j+igap) = temp
                     end if
                  else
                     go to 60
                  endif
                  j = j-igap
                  go to 50
         60    continue
               igap = igap / 2
               go to 40
      c
            else if (which .eq. 'LA') then
      c
      c        X1 is sorted into increasing order of algebraic.
      c
         70    continue
               if (igap .eq. 0) go to 9000
               do 90 i = igap, n-1
                  j = i-igap
         80       continue
      c
                  if (j.lt.0) go to 90
      c
                  if (x1(j).gt.x1(j+igap)) then
                     temp = x1(j)
                     x1(j) = x1(j+igap)
                     x1(j+igap) = temp
                     if (apply) then
                        temp = x2(j)
                        x2(j) = x2(j+igap)
                        x2(j+igap) = temp
                     end if
                  else
                     go to 90
                  endif
                  j = j-igap
                  go to 80
         90    continue
               igap = igap / 2
               go to 70
      c
            else if (which .eq. 'LM') then
      c
      c        X1 is sorted into increasing order of magnitude.
      c
        100    continue
               if (igap .eq. 0) go to 9000
               do 120 i = igap, n-1
                  j = i-igap
        110       continue
      c
                  if (j.lt.0) go to 120
      c
                  if (abs(x1(j)).gt.abs(x1(j+igap))) then
                     temp = x1(j)
                     x1(j) = x1(j+igap)
                     x1(j+igap) = temp
                     if (apply) then
                        temp = x2(j)
                        x2(j) = x2(j+igap)
                        x2(j+igap) = temp
                     end if
                  else
                     go to 120
                  endif
                  j = j-igap
                  go to 110
        120    continue
               igap = igap / 2
               go to 100
            end if
      c
       9000 continue
            return
      c
      c     %---------------%
      c     | End of dsortr |
      c     %---------------%
      c
            end
    

    This code has the same routine four times with different compare functions. Let’s pull out one routine.

               if (which .eq. 'SA') then
      c
      c        X1 is sorted into decreasing order of algebraic.
      c
         10    continue
               if (igap .eq. 0) go to 9000
               do 30 i = igap, n-1
                  j = i-igap
         20       continue
      c
                  if (j.lt.0) go to 30
      c
                  if (x1(j).lt.x1(j+igap)) then
                     temp = x1(j)
                     x1(j) = x1(j+igap)
                     x1(j+igap) = temp
                     if (apply) then
                        temp = x2(j)
                        x2(j) = x2(j+igap)
                        x2(j+igap) = temp
                     end if
                  else
                     go to 30
                  endif
                  j = j-igap
                  go to 20
         30    continue
               igap = igap / 2
               go to 10
      c
            else if (which .eq. 'SM') then
    

    Things to note. First, the Fortran code uses Fortran’s flexible array indexing to declare the arrays with 0-based (C-style) indices. While this is possible in Julia, we are just going to handle the index shift manually. Again, we are trying to keep this simple. (Despite the last blog post.)

    The first step is just to unwrap the loops and GOTOs. In this case, I find an outer-to-inner strategy helpful.

    The outer loop (statement label 10) seems to be the following

    while igap != 0
      # stuff
      igap = div(igap,2)
    end
    

    The next loops (labels 20 and 30) seem to encode the pattern.

    for i=igap:(n-1)
      j = i-igap
  while j >= 0
        if cmp(j,j+igap)
          # swap
        else
          break # out of while loop
        end
        j = j-igap
      end
    end
    

    Now, we assemble these into a simple function. Other changes made are:

    * Predefined strings become symbols.
    * Rather than repeating the code four times, we use functions to handle the comparison.
    * Add a new macro to handle checking the length of input vectors.

      """
      Sort the array X1 in the order specified by WHICH and optionally
      applies the permutation to the array X2.
    
      Usage:
      call dsortr
         ( WHICH, APPLY, N, X1, X2 )
    
      Arguments
      WHICH   Symbol  (Input)
              :LM -> X1 is sorted into increasing order of magnitude.
              :SM -> X1 is sorted into decreasing order of magnitude.
              :LA -> X1 is sorted into increasing order of algebraic.
              :SA -> X1 is sorted into decreasing order of algebraic.
    
      APPLY   Boolean.  (Input)
              APPLY = .TRUE.  -> apply the sorted order to X2.
              APPLY = .FALSE. -> do not apply the sorted order to X2.
    
      N       Integer.  (INPUT)
              Size of the arrays.
    
      X1      Double precision array of length N.  (INPUT/OUTPUT)
              The array to be sorted.
    
      X2      Double precision array of length N.  (INPUT/OUTPUT)
              Only referenced if APPLY = .TRUE.
    
      Returns: nothing
    
      """
      function dsortr(
        which::Symbol, # Input
        apply::Bool, # Input
        n::Int, # Input
        x1::Vector{Float64}, # Input/Output
        x2::Vector{Float64}, # Input/Output
        )
    
        @jl_arpack_check_length(x1, n)
        if apply
          @jl_arpack_check_length(x2, n)
        end
    
        igap = div(n,2) # integer division
    
        # the ARPACK sorting routine is all the same, just changes the
        # comparison, Julia gives us a way of stating that very easily.
        if which == :SA
          cmp = (x::Float64, y::Float64) -> (x < y)
        elseif which == :SM
          cmp = (x::Float64, y::Float64) -> (abs(x) < abs(y))
        elseif which == :LA
          cmp = (x::Float64, y::Float64) -> (x > y)
        elseif which == :LM
          cmp = (x::Float64, y::Float64) -> (abs(x) > abs(y))
        end
    
        while igap != 0
          for i=igap:n-1
            j = i-igap
            while j >= 0
              # in the fortran code, it uses indices 0, n-1
              if cmp(x1[j+1],x1[j+igap+1])
                x1[j+1],x1[j+igap+1] = x1[j+igap+1],x1[j+1]
                if apply
                  x2[j+1],x2[j+igap+1] = x2[j+igap+1],x2[j+1]
                end
              else
                break # go to 30, which means to exit while loop
              end
              j = j - igap
            end
          end
          igap = div(igap, 2)
        end
      end
    

    The macro for parameter checking is pretty simple, although it took me about an hour to get all the various options correct.

    macro jl_arpack_check_length(var,len)
      varstr = string(var)
      lenstr = string(len)
      errstr1 = "range 1:$lenstr="
      errstr2 = " out of bounds for $varstr of length "
    
      return esc( quote
        if $len > length($var)
          throw(ArgumentError(string($errstr1, $len, $errstr2, length($var))))
        end
      end)
    end
    
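The `esc(quote … end)` shape generalizes. Here is a minimal, self-contained macro with the same structure — the name `@check_positive` is just for illustration, not part of the port:

```julia
# Minimal macro with the same esc(quote ... end) shape as the
# length-check macro: interpolate the variable *name* as a string
# for the error message, and escape the body so $var resolves in
# the caller's scope.
macro check_positive(var)
    varstr = string(var)
    return esc(quote
        if $var <= 0
            throw(ArgumentError(string($varstr, " must be positive, got ", $var)))
        end
    end)
end

n = 5
@check_positive(n)  # passes silently; n <= 0 would throw an ArgumentError
```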

    To figure out the `string` call, I had to look at

    x = "Hi"
    mytest(x) = "My string $x"
    @code_lowered mytest(x)
    

    to find how string interpolation actually gets lowered to the `string` function.
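
For reference, here is a self-contained check that interpolation is just sugar for a `string` call:

```julia
# String interpolation lowers to a call to the string function.
x = "Hi"
s1 = "My string $x"
s2 = string("My string ", x)  # the lowered form
s1 == s2  # true
```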

    Next up was testing. I wanted to run some tests on all permutation vectors.

    While I could add a dependency on Combinatorics.jl to the testing repository, I wanted to keep this simple. We don’t need it to be terribly efficient. Here is what I came up with.

      function all_permutations(v::Vector{Float64};
          all::Vector{Vector{Float64}}=Vector{Vector{Float64}}(),
          prefix::Vector{Float64}=zeros(0))
    
        if length(v) == 0
          push!(all, prefix)
        end
    
        for i in eachindex(v)
          all_permutations(deleteat!(copy(v), i); prefix=push!(copy(prefix), v[i]), all)
        end
        return all
      end
    

    Then our testing code runs everything in that set to test and make sure we get the correct sort.
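
Here is a sketch of the kind of oracle such a test can use — my reconstruction with `Base.sort`, not the repository's actual test code:

```julia
# Reference orderings for the four `which` modes via Base.sort:
#   :LA increasing algebraic, :SA decreasing algebraic,
#   :LM increasing magnitude, :SM decreasing magnitude.
reference_sort(which::Symbol, x::Vector{Float64}) =
    which == :LA ? sort(x) :
    which == :SA ? sort(x; rev=true) :
    which == :LM ? sort(x; by=abs) :
    which == :SM ? sort(x; by=abs, rev=true) :
    throw(ArgumentError("unknown which=$which"))

reference_sort(:LM, [3.0, -5.0, 1.0])  # == [1.0, 3.0, -5.0]
```

Running `dsortr` on every permutation of a small vector and comparing `x1` against `reference_sort(which, x1)` exercises all four branches.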

    A question

    What is this sorting algorithm doing? I just ported the code. It seems to sort elements separated by a gap (igap), and then, as the gap halves, merge these gapped sorted lists into longer sorted lists. It looks like a textbook Shellsort.
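
For comparison, here is a minimal, self-contained Shellsort with the same n/2, n/4, …, 1 gap sequence and the same gapped-swap inner loop (1-based indices, increasing order; this is the classic algorithm, not the ARPACK code):

```julia
# Minimal Shellsort: sort within gapped sublists, then halve the gap.
function shellsort!(a::Vector{Float64})
    gap = div(length(a), 2)
    while gap >= 1
        for i in gap+1:length(a)
            j = i
            # bubble a[j] back through its gapped sublist
            while j > gap && a[j-gap] > a[j]
                a[j-gap], a[j] = a[j], a[j-gap]
                j -= gap
            end
        end
        gap = div(gap, 2)
    end
    return a
end

shellsort!([3.0, 1.0, 2.0, -5.0])  # == [-5.0, 1.0, 2.0, 3.0]
```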

    Some history

    There is a comment in the code that references LANSO. Here is the original routine from LANSO (found in svdpack by Mike Berry). So we might have to chat with Beresford Parlett for more information on this algorithm.

      C
      C @(#)DSORT2.F    3.2 (BNP) 12/9/88
      C
            SUBROUTINE DSORT2(N,ARRAY1,ARRAY2)
            INTEGER N
            REAL*8 ARRAY1(0:N-1),ARRAY2(0:N-1)
      C
      C.... SORT ARRAY1 AND ARRAY2 INTO INCREASING ORDER FOR ARRAY1
      C
            INTEGER IGAP,I,J
            REAL*8 TEMP
      C
            IGAP = N/2
       10   IF (IGAP.GT.0) THEN
              DO 200 I = IGAP,N-1
                J = I-IGAP
       50       IF (J.GE.0.AND.ARRAY1(J).GT.ARRAY1(J+IGAP)) THEN
                    TEMP = ARRAY1(J)
                    ARRAY1(J) = ARRAY1(J+IGAP)
                    ARRAY1(J+IGAP) = TEMP
                    TEMP = ARRAY2(J)
                    ARRAY2(J) = ARRAY2(J+IGAP)
                    ARRAY2(J+IGAP) = TEMP
                ELSE
                  GO TO 200
                ENDIF
                J = J-IGAP
                GO TO 50
       200    CONTINUE
            ELSE
              RETURN
            ENDIF
            IGAP = IGAP/2
            GO TO 10
            END
    
  • Me on learning about `save` variables in Fortran and accessing them from Julia. (Part ... something ... in my ongoing segment on porting ARPACK to Julia)

    Huh, what are these things in a FORTRAN code marked `save`? Oh… jeez, is that persistent state in a subroutine? i.e., like global variables? Ugh, that is going to be annoying to port. (More on a strategy in a bit.)

    e.g. dgetv0.f in ARPACK

      save       first, iseed, inits, iter, msglvl, orth, rnorm0
    
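(As an aside on the porting strategy: one sketch is to hold the persistent state in a module-level `Ref`; the names `fake_getv0_init` and `_fake_getv0_first` below are hypothetical, purely for illustration.)

```julia
# Sketch: emulate a Fortran `save` variable with a module-level Ref,
# so the state persists across calls like a C static.
const _fake_getv0_first = Ref(true)   # stands in for `save first`

function fake_getv0_init()
    if _fake_getv0_first[]   # mirrors the one-time `if (first)` guard
        _fake_getv0_first[] = false
        return :initialized
    end
    return :already_initialized
end
```

The first call returns `:initialized`; every later call returns `:already_initialized`, just like the one-time setup guarded by `first` in dgetv0.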

    How could these possibly be implemented in a shared library? Well, most likely they are just fixed offsets in memory somewhere that the subroutine knows about and manipulates. So can I see them from Julia?

    • Try dumping the dylib/so file to find if these are listed in there. (They ought to be as they should take up persistent locations in memory…) (Any of the following commands show them…)

      nm -a libarpack.2.0.0.dylib
      objdump -p libarpack.2.0.0.dylib
      dsymutil -s libarpack.2.0.0.dylib
      nm -nm libarpack.2.0.0.dylib
      
    • They are! For example, in this case I picked `iseed`, which is the ARPACK random number seed/state.

      dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep iseed
      000000000005d060 (__DATA,__bss5) non-external _iseed.3637
      000000000005d080 (__DATA,__bss5) non-external _iseed.3636
      000000000005d0a0 (__DATA,__bss5) non-external _iseed.3636
      000000000005d0c0 (__DATA,__bss5) non-external _iseed.3637
      

    Well, there are a few things listed here… but probably something we can resolve via code.

    • Try and lookup symbols via dlsym

      using Arpack
      using Arpack_jll
      using LinearAlgebra
      libar = Base.Libc.dlopen(Arpack_jll.libarpack)
      ## Try and get offset of private symbol directly
      offset = Base.Libc.dlsym(libar, "iseed.3637")
      offset = Base.Libc.dlsym(libar, "_iseed.3637")
      offset = Base.Libc.dlsym(libar, "iseed")
      

      This doesn’t work because `dlsym` only has exported/public symbols accessible. (Why, I don’t know, because they are in the darn file, so it’s like they want to make it harder to access things… which I guess is a good thing?)

    • Try and look at offsets from the load base. e.g. if the dylib/so file says that function dgetv0 is at address 0x0010f00 and I load that at address 0x5020e00, then the offset to other addresses should be the same… which it is for functions (easy to verify)

      dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep getv0
      nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep getv0
      
      000000000001f2a0 (__TEXT,__text) external _dgetv0_
      
      julia> offset = Base.Libc.dlsym(libar, "dgetv0_")
      Ptr{Nothing} @0x0000000170bac2a0
      
      julia> base_offset = offset - 0x000000000001f2a0
      Ptr{Nothing} @0x0000000170b8d000
      

    So we have a base offset, which means that other functions and symbols should respect that…

        dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep arscnd
        0000000000008a00 (__TEXT,__text) external _arscnd_
    
        julia> arscnd_offset = Base.Libc.dlsym(libar, "arscnd_")
        Ptr{Nothing} @0x0000000170b95a00
    
        julia> arscnd_offset == 0x0000000000008a00 + base_offset
        true
    
    Awesome, so let's check out some of those iseed variables...
    
        julia> iseed_offsets = [0x5d060, 0x5d080, 0x5d0a0, 0x5d0c0]
        julia> iseeds = Ptr{Int64}.(base_offset .+ iseed_offsets)
    
        julia> unsafe_load.(iseeds)
        4-element Vector{Int64}:
         0
         0
         0
         0
    
        julia> unsafe_load.(iseeds .+ 8)
        4-element Vector{Int64}:
         0
         0
         0
         0
    
    Okay, so these iseed variables are currently 0, because we haven't
    run anything yet. (Internally in `dgetv0` they are used to track
    the state of the random number generator... )
    
        julia> using Arpack; eigs(randn(50,50))
        julia> unsafe_load.(iseeds)
    
    Hmm. That shows the same values. Maybe another set? ARPACK also uses
    `first` for one-time routine initialization, tracked with `save`
    variables.
    
        $ nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep first
    
        julia> first_offsets="""000000000005c398 (__DATA,__data) non-external _first.3636
        000000000005c3a0 (__DATA,__data) non-external _first.3637
        000000000005c3b0 (__DATA,__data) non-external _first.3635
        000000000005c3b8 (__DATA,__data) non-external _first.3640
        000000000005c3c0 (__DATA,__data) non-external _first.3634
        000000000005c3c8 (__DATA,__data) non-external _first.3639
        000000000005c3d8 (__DATA,__data) non-external _first.3635
        000000000005c3e0 (__DATA,__data) non-external _first.3640
        000000000005c3e8 (__DATA,__data) non-external _first.3634
        000000000005c3f0 (__DATA,__data) non-external _first.3639
        000000000005c400 (__DATA,__data) non-external _first.3636
        000000000005c408 (__DATA,__data) non-external _first.3637
        000000000005c780 (__DATA,__bss3) non-external _first.3633
        000000000005c8d8 (__DATA,__bss3) non-external _first.3632
        000000000005cc08 (__DATA,__bss3) non-external _first.3632
        000000000005cea8 (__DATA,__bss3) non-external _first.3633""" |>
          x->split(x,"\n") |>
          x->map(y->y[1:16],x) |>
          x->parse.(UInt64, x[1:16];base=16)
    
    (Sorry for the syntax fanciness; `"0x001" |> x->parse(Int,x)` just creates a little
    mini-function and chains a set of function evaluations. Handy for little
    sequences like that where I split, then break apart, then parse each.)
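
The same chaining shape on a tiny self-contained example:

```julia
# Split a multi-line string, then parse each line as hex.
vals = """0a
10
ff""" |>
    x -> split(x, "\n") |>
    x -> parse.(Int, x; base=16)
# vals == [10, 16, 255]
```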
    
        julia> firsts = Ptr{Int64}.(base_offset .+ first_offsets)
        julia> unsafe_load.(firsts)
        16-element Vector{Int64}:
         1
         1
         1
         1
         1
         1
         1
         1
         1
         1
         1
         1
         0
         0
         0
         0
    
    Which means that everything hasn't really been run yet. (This is the
    same right after loading the library. The 0's at the end
    are there as well.)
    
    • So, hmm… that doesn’t seem to work. (Or at least I don’t see changes I expect to see…) But still the function addresses are exactly where I predict them.

    • Try and check out disassembled routines for help. Learn how to get disassembled code from objdump.

      dgleich@circulant test-arpack-state % objdump -D /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | less    
      
      000000000001f2a0 _dgetv0_:
         1f2a0: 41 57                         pushq   %r15
         1f2a2: 41 56                         pushq   %r14
         1f2a4: 49 89 d7                      movq    %rdx, %r15
         1f2a7: 41 55                         pushq   %r13
         1f2a9: 41 54                         pushq   %r12
         1f2ab: 49 89 fd                      movq    %rdi, %r13
         1f2ae: 55                            pushq   %rbp
         1f2af: 53                            pushq   %rbx
         1f2b0: 49 89 f4                      movq    %rsi, %r12
         1f2b3: 4c 89 c3                      movq    %r8, %rbx
         1f2b6: 4d 89 ce                      movq    %r9, %r14
         1f2b9: 48 83 ec 28                   subq    $40, %rsp
         1f2bd: 48 83 3d e3 d0 03 00 00       cmpq    $0, 250083(%rip)
         1f2c5: 74 29                         je      41 <_dgetv0_+0x50>
         1f2c7: 66 0f 6f 05 31 f4 02 00       movdqa  193585(%rip), %xmm0
         1f2cf: 48 c7 05 ce d0 03 00 00 00 00 00      movq    $0, 250062(%rip)
         1f2da: 0f 29 05 9f dd 03 00          movaps  %xmm0, 253343(%rip)
         1f2e1: 66 0f 6f 05 27 f4 02 00       movdqa  193575(%rip), %xmm0
         1f2e9: 0f 29 05 a0 dd 03 00          movaps  %xmm0, 253344(%rip)
         1f2f0: 49 83 7d 00 00                cmpq    $0, (%r13)
         1f2f5: 0f 84 85 02 00 00             je      645 <_dgetv0_+0x2e0>
         1f2fb: 41 0f b6 04 24                movzbl  (%r12), %eax
      

      In case you want a spoiler, it’s cmpq $0, 250083(%rip) that checks “if first == 1” on the first subroutine call and so the address of first is 0x1f2c5 + 250083 (the NEXT instruction offset + %rip offset) so the offset is actually 0x0005c3a8 (just add the numbers and write in hex).

      (This is actually off by just a bit because of how Fortran seems to store logicals. There are a few details I don’t fully understand but it’s close enough I’m not worried. )
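
The address arithmetic is easy to double-check (values taken from the disassembly above):

```julia
# %rip-relative addressing: effective address = address of the NEXT
# instruction plus the signed displacement encoded in the instruction.
next_insn = 0x1f2c5   # instruction after the cmpq at 0x1f2bd
disp      = 250083    # displacement from `cmpq $0, 250083(%rip)`
target    = next_insn + disp
string(target, base = 16)  # "5c3a8"
```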

    • Next up: check the code on Linux. The Julia code I worked on above seems to work there????!??!? Everything I was doing above should really have worked?? Amazing. (Just had to pull new offsets from that libarpack…)

    • But wait, gosh, why doesn’t this work on the mac I was sitting in front of? Is this some weirdo Mac dlsym loader/protection system that randomizes offsets… and keeps it from working on the mac?

    • Develop C/C++ code to do the same, verify that the addresses I’m getting for functions seem correct. (Meanwhile, learn how to use lldb…!) Okay, everything lldb seems to be doing is exactly what I think ought to work. (See the C/C++ code below.)

    • Check that I can use the C/C++ code and pointer offsets, as I’m doing in Julia, on the mac in order to verify the approach outside of Julia.

    • Learn more details about position independent code and why everything is in terms of offsets from the instruction pointer %rip. (See that disassembled code above for an example)

    • Start learning about ccall, julia library loading. (Hoping to find some pointer as to why there would be this weirdness… )

    • Run into this thread on dlsym,dlopen

    • Double check to see that I’ve gotten the right library loaded…

      julia> Base.Libc.dllist()
      152-element Vector{String}:
       "/usr/lib/libSystem.B.dylib"
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/libjulia.1.6.dylib"
       "/usr/lib/system/libcache.dylib"
       "/usr/lib/system/libcommonCrypto.dylib"
       "/usr/lib/system/libcompiler_rt.dylib"
       "/usr/lib/system/libcopyfile.dylib"
       "/usr/lib/system/libcorecrypto.dylib"
       "/usr/lib/system/libdispatch.dylib"
       "/usr/lib/system/libdyld.dylib"
       "/usr/lib/system/libkeymgr.dylib"
       ⋮
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libmbedx509.2.24.0.dylib"
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libmbedcrypto.2.24.0.dylib"
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libdSFMT.dylib"
       "/System/Library/PrivateFrameworks/DebugSymbols.framework/Versions/A/DebugSymbols"
       "/System/Library/PrivateFrameworks/CoreServicesInternal.framework/Versions/A/CoreServicesInternal"
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libstdc++.6.dylib"
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libgomp.1.dylib"
       "/Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib"
       "/Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib"
      
    • Oh…

    • You see that? In the bottom there? There are two libarpacks listed.

    • I was looking at the wrong one. :(

    • 🤦

    • Just for kicks now…

      dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib | grep iseed
      000000000005afa0 (__DATA,__bss5) non-external _iseed.3637
      000000000005afc0 (__DATA,__bss5) non-external _iseed.3636
      000000000005afe0 (__DATA,__bss5) non-external _iseed.3636
      000000000005b000 (__DATA,__bss5) non-external _iseed.3637
      
      julia> libar = Base.Libc.dlopen("/Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib")
      julia> offset = Base.Libc.dlsym(libar, "dgetv0_")
      Ptr{Nothing} @0x000000017596f730
      
      dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib | grep dgetv0
      000000000001e730 (__TEXT,__text) external _dgetv0_
      
      julia> base_offset = offset - 0x000000000001e730
      Ptr{Nothing} @0x0000000175951000
      
      julia> libar = Base.Libc.dlopen(Arpack_jll.libarpack)
      julia> iseed_offsets = [0x5afa0, 0x5afc0, 0x5afe0, 0x5b000]
      julia> iseeds = Ptr{Int64}.(base_offset .+ iseed_offsets)
      
      julia> unsafe_load.(iseeds)
      
      julia> unsafe_load.(iseeds)
      4-element Vector{Int64}:
          0
       1626
          0
          0
      
      julia> eigs(randn(50,50))
      julia>  unsafe_load.(iseeds)
      4-element Vector{Int64}:
         0
       561
         0
         0
      
    • 🤦 indeed. (This took way more time than I would have liked. Approx. a full day.)

    • Should have trusted the fact that this all worked on Linux, where somehow Arpack is using the same library as Arpack_jll.

    • One other thing I learned: on the macOS build of ARPACK, there is no timing information collected. The arscnd function (which should return some time value in seconds) is just a no-op.

      0000000000008a00 _arscnd_:
          8a00: c7 07 00 00 00 00             movl    $0, (%rdi)
          8a06: c3                            retq
          8a07: 90                            nop
          8a08: 90                            nop
          8a09: 90                            nop
          8a0a: 90                            nop
          8a0b: 90                            nop
          8a0c: 90                            nop
          8a0d: 90                            nop
          8a0e: 90                            nop
          8a0f: 90                            nop
      

      (Or rather, this seems to write 0 to the location passed into arscnd!)

      I learned this, of course, because one of the things ARPACK does is collect timing information that persists between function calls, so it can time user code via the call-backs and such.

      So in debugging, I was often looking at locations where time information ought to be stored. Finding it continually empty was … surprising.

    • Another thing I learned: you can access the common blocks of FORTRAN code via `cglobal` in Julia.

      julia> debug = cglobal(("debug_", "/Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib"),Int64)
      Ptr{Int64} @0x00000001759ac020
      

      Documentation and structure of debug block here. But we can use it to turn on lots of fun ARPACK debug information!

      julia> unsafe_store!(debug, 2, 11) # mnaupd turn on debugging for mnaupd (non-symmetric solver)
      julia> unsafe_store!(debug, -14, 2) # ndigit - use 14 digits of precision (ndigit)
      julia> unsafe_store!(debug, 4, 3) # mgetv0 show lots of information from dgetv0
      julia> unsafe_store!(debug, 6, 1) # logfile set logfil to 6, the default stdout
      
      julia> eigs(randn(50,50))
      
       _getv0: B-norm of initial / restarted starting vector
       -----------------------------------------------------
          1 -    1:      4.1596362767673D+00
      
      
       _getv0: initial / restarted starting vector
       -------------------------------------------
          1 -    2:     -8.3153140665625D-01     8.3995344740674D-01
          3 -    4:      9.6704039685513D-01     3.6866117348409D-01
          5 -    6:      6.0284542702433D-03     2.0318894267245D-01
          7 -    8:     -1.3495066933914D-01     4.1078102410386D-01
          9 -   10:      4.3888096465232D-01    -5.7522033107717D-01
         11 -   12:     -8.7293721104508D-01     9.8788590316195D-01
         13 -   14:     -1.2282299641820D-01     7.4369927039981D-01
         15 -   16:      8.7906957254355D-01    -4.7762629697247D-01
         17 -   18:     -4.7162805784843D-02     1.1706962983657D-01
         19 -   20:      7.8492285203887D-01     2.6127819008416D-01
         21 -   22:     -8.2777335418346D-01    -6.5745389045117D-01
         23 -   24:      4.4467420164616D-01     9.9532171436879D-01
         25 -   26:      6.9740537844088D-01    -9.2167903292103D-01
         27 -   28:     -9.1859980666224D-01    -7.1191753019861D-01
         29 -   30:     -1.5899117215995D-01     1.5815438448009D-01
         31 -   32:     -5.1806821604240D-01     3.8970905952163D-01
         33 -   34:     -8.7740030659049D-01    -5.7437967251424D-02
         35 -   36:      7.9785742115958D-01     3.1842249389620D-01
         37 -   38:     -6.1567371152282D-01    -2.1826976841397D-01
         39 -   40:      8.0618226869270D-01     2.7929452805704D-01
         41 -   42:     -7.8813902002933D-01     1.7058205959301D-01
         43 -   44:     -4.0247429259527D-01    -6.0433106862224D-01
         45 -   46:     -4.1118723651662D-01    -3.4363227883103D-01
         47 -   48:     -4.4783519455385D-01     2.1286373092316D-02
         49 -   50:      2.9004800999430D-01     4.4388758381478D-01
      
      
       _naupd: Number of update iterations taken
       -----------------------------------------
          1 -    1:              17
      
      
       _naupd: Number of wanted "converged" Ritz values
       ------------------------------------------------
          1 -    1:               6
      
      
       _naupd: Real part of the final Ritz values
       ------------------------------------------
          1 -    2:     -5.7794067256680D+00    -5.7794067256680D+00
          3 -    4:     -7.2262688525223D+00     6.8333693387731D+00
          5 -    6:      6.8333693387731D+00     7.6197283464233D+00
      
      
       _naupd: Imaginary part of the final Ritz values
       -----------------------------------------------
          1 -    2:      2.7424220041744D+00    -2.7424220041744D+00
          3 -    4:      0.0000000000000D+00     2.3888574879058D+00
          5 -    6:     -2.3888574879058D+00     0.0000000000000D+00
      
      
       _naupd: Associated Ritz estimates
       ---------------------------------
          1 -    2:      5.0738750651244D-16     5.0738750651244D-16
          3 -    4:      2.5131234631038D-25     1.2315725733964D-30
          5 -    6:      1.2315725733964D-30     1.0152776765181D-35
      
      
      
           =============================================
           = Nonsymmetric implicit Arnoldi update code =
           = Version Number:  2.4                      =
           = Version Date:    07/31/96                 =
           =============================================
           = Summary of timing statistics              =
           =============================================
      
      
           Total number update iterations             =    17
           Total number of OP*x operations            =   206
           Total number of B*x operations             =     0
           Total number of reorthogonalization steps  =    35
           Total number of iterative refinement steps =     0
           Total number of restart steps              =     0
           Total time in user OP*x operation          =     0.000000
           Total time in user B*x operation           =     0.000000
           Total time in Arnoldi update routine       =     0.000000
           Total time in naup2 routine                =     0.000000
           Total time in basic Arnoldi iteration loop =     0.000000
           Total time in reorthogonalization phase    =     0.000000
           Total time in (re)start vector generation  =     0.000000
           Total time in Hessenberg eig. subproblem   =     0.000000
           Total time in getting the shifts           =     0.000000
           Total time in applying the shifts          =     0.000000
           Total time in convergence testing          =     0.000000
           Total time in computing final Ritz vectors =     0.000000
      

    Other references

    - https://github.com/conan-io/conan-center-index/issues/4696
    - https://stackoverflow.com/questions/22769246/how-to-disassemble-one-single-function-using-objdump
    - https://eli.thegreenplace.net/2011/11/11/position-independent-code-pic-in-shared-libraries-on-x64

    C/C++ Code to check

    Compile with (edit to your own paths)

    clang -g -fPIC test-arpack-state-dlopen.c \
    -Wl,-rpath /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib \
    -Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia \
        -o mytest-dlopen
    

    And here’s the code. This just calls ARPACK once to see what happens. In this case, I figured out exactly which offset of `first` to use (it was the first one, hah!)

    /* test_arpack_state */
    #include <inttypes.h>
    typedef int64_t blasint;
    typedef void (*dsaupd_ptr)(
      blasint* ido,
      char* bmat,
      blasint* n,
      char* whichstr,
      blasint* nev,
      double* tol,
      double* resid,
      blasint* ncv,
      double* v,
      blasint* ldv,
      blasint* iparam,
      blasint* ipntr,
      double* workd,
      double* workl,
      blasint* lworkl,
      blasint* info);
    
    #include <printf.h>
    #include <dlfcn.h>
    
    int main(int argc, char** argv) {
    
      blasint nev = 3;
      double tol = -1.0;
      blasint maxiter = 500;
    
      blasint n = 25;
      double v0[25] = {0};
    
      blasint ncv = 20;
      double v[25*20];
      double workd[25*3];
      double workl[20*(20+8)];
      double resid[25];
      blasint lworkl = 20*(20+8);
    
      blasint info = 0;
    
      blasint iparam[11] = {0};
      blasint ipntr[11] = {0};
      blasint ido = 0;
    
      iparam[0] = 1;
      iparam[2] = maxiter;
      iparam[6] = 1;
    
      char bmat[] = "I";
      char whichstr[] = "SA";
    
      blasint ldv = 25;
    
      void* libar = dlopen("/Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib", RTLD_LAZY|RTLD_GLOBAL);
    
      if (libar) {
        printf("Loaded libarpack\n");
      }
    
      dsaupd_ptr pfun = dlsym(libar, "dsaupd_");
    
      if (pfun) {
        printf("Loaded dsaupd_\n");
      }
    
      void* base_offset = pfun - 0x000000000002acb0; // offset from objdump.
      double* dsaupd_t0 = base_offset + 0x000000000005c604;  // offset from objdump.
      blasint* logfil = base_offset + 0x000000000005c420;
      blasint* ndig = base_offset + 0x000000000005c420+8;
      blasint* mgetv0 = base_offset + 0x000000000005c420+16;
      blasint* msaupd = base_offset + 0x000000000005c420+24;
      *mgetv0 = 2;
      *msaupd = 2;
      blasint* dgetv0_first1 = base_offset + 0x000000000005c3a0+8;
      blasint* dgetv0_first2 = base_offset + 0x000000000005c408+8;
      printf("logfil = %i\n", (int)*logfil);
      printf("ndigit = %i\n", (int)*ndig);
      printf("dgetv0_first1 = %i\n", (int)*dgetv0_first1);
      printf("dgetv0_first2 = %i\n", (int)*dgetv0_first2);
    
      pfun(&ido, bmat, &n, whichstr, &nev, &tol, resid, &ncv, v, &ldv, iparam,
        ipntr, workd, workl, &lworkl, &info );
    
      printf("base_offset = %p\n", base_offset);
      printf("dsaupd = %p\n", pfun);
      printf("dsaupd_t0 = %p\n", dsaupd_t0);
      printf("info = %i\n", (int)info);
      printf("ido = %i\n", (int)ido);
      printf("dsaupd_t0 = %lf\n", (double)*dsaupd_t0);
    
      printf("dgetv0_first1 = %i\n", (int)*dgetv0_first1);
      printf("dgetv0_first2 = %i\n", (int)*dgetv0_first2);
    
      //dlclose(libar);
    
      return 0;
    }
    
  • "In-place" sorting allocates half-sized temporary arrays in Julia for non-numeric types.

    I hope my confusion at the difference between the memory allocation behavior of the following codes can be forgiven.

    julia> using BenchmarkTools
    julia> @btime sort!(v) setup=(v=randn(100));
      268.868 ns (0 allocations: 0 bytes)
    julia> @btime sort!(v) setup=(v=tuple.(randn(100)));
      714.639 ns (2 allocations: 528 bytes)
    

    Julia often uses the ! symbol on a function to indicate “in-place”, but this is also usually a hint that it’s “non-allocating” – which is great if I want to sort things in place without extra memory allocation. (See my previous post.)

    (Strictly speaking, this ! simply means ‘mutating’ which means that the function changes the input arguments.)

    Neither of these is a contract. (E.g. something in-place might be allocating or non-allocating depending on the type of the array elements.)

    The documentation of sort! simply mentions in-place: “Sort the vector v in place.”

    However, in this case, tuple elements can be moved around without allocating. So I, for one, would expect the two codes to have the same memory traffic and allocation behavior.

    After much debugging, what I found is that julia defaults to QuickSort for numeric types, which is strictly non-allocating whereas it uses MergeSort for non-numeric types. (Yes, the change in algorithms is in the documentation for sort!, yay! What isn’t there is the implication on memory allocations.)

    julia> @btime sort!(v;alg=QuickSort) setup=(v=tuple.(randn(100)));
      947.789 ns (0 allocations: 0 bytes)
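    The allocation makes sense once you recall how a textbook merge sort works: the merge step needs a scratch buffer (Julia’s implementation uses a half-length one), while quicksort partitions in place. Here’s a minimal C sketch of that scratch-buffer requirement; this is my own illustration, not Julia’s actual sort.jl code:

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted halves a[lo..mid) and a[mid..hi), using tmp as scratch. */
static void merge(int *a, int *tmp, size_t lo, size_t mid, size_t hi) {
  size_t i = lo, j = mid, k = lo;
  memcpy(tmp + lo, a + lo, (hi - lo) * sizeof *a);
  while (i < mid && j < hi) a[k++] = (tmp[i] <= tmp[j]) ? tmp[i++] : tmp[j++];
  while (i < mid) a[k++] = tmp[i++];
  while (j < hi)  a[k++] = tmp[j++];
}

static void msort(int *a, int *tmp, size_t lo, size_t hi) {
  if (hi - lo < 2) return;
  size_t mid = lo + (hi - lo) / 2;
  msort(a, tmp, lo, mid);
  msort(a, tmp, mid, hi);
  merge(a, tmp, lo, mid, hi);
}

/* The allocation a caller of an "in-place" merge sort never sees. */
void mergesort_alloc(int *a, size_t n) {
  int *tmp = malloc(n * sizeof *a);
  if (!tmp) return;
  msort(a, tmp, 0, n);
  free(tmp);
}
```

    A quicksort, by contrast, only ever swaps elements inside the array itself, which is why the QuickSort path can promise zero allocations.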

    Moreover, this behavior depends only on the element type of the vector, not on the actual values sorted.

    julia> @btime sort!(v,by=first) setup=(v=tuple.(randn(100)));
      790.624 ns (2 allocations: 528 bytes)
    julia> @btime sort!(v;alg=QuickSort,by=first) setup=(v=tuple.(randn(100)));
      947.789 ns (0 allocations: 0 bytes)
    

    (In this case, by=first means we should sort by the first element of each tuple, which means Julia is conceptually sorting a numeric type.)

    I think this particular example breaks an implicit user contract. By calling sort! I’m asking Julia not to allocate extra memory in its algorithm. To have a default algorithm choice that will allocate is wrong from a working-mental-model point of view. Moreover, if this isn’t the case, then the docs for sort! ought to be very strongly worded that sort! can only be recommended for numeric types unless you change the default algorithm. (But then why is that the default choice for sort?)

    Finally, to the extent that the algorithm choice is influenced by the element type, it should apply to the element type actually being compared (here, the numbers extracted by by=first), not merely the element type stored in memory!

    I realize these are tricky issues, e.g. MergeSort is tough to do in-place.

    Many would likely say “premature optimization”, so why am I fussed about this difference?

    Here’s the issue. I need a mental model for how code is translated, or I can’t possibly ever be concerned with “optimization” and I must give up. In this case, a simplified mental model of what the tool / language is doing breaks. So making sure the “compiler / language / runtime” matches my mental model of what is happening, when I’m designing for exactly that setting, is not premature optimization; it’s simply part of the spec!

    Moreover, it seems this issue has already come up: see this discussion for more.

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 6 - Sorting Take 1

    Next up on our list of routines to convert is the sorting routine dsortr. Again, this could be replaced with internal Julia stuff. However, this sorting routine does something that shows up quite often in practice. You want to sort array x, and take array y along for the ride!

    The pseudocode is simple:

      function sort_permute(x,y)
         p = sortperm(x)
         return x[p], y[p]
      end
      a = [5, 8, 9, 1, 4, 3, 2]
      b = [3., 4., 5., 2., 1., 9., 8.]
      sort_permute(a,b)
      ([1, 2, 3, 4, 5, 8, 9], [2.0, 8.0, 9.0, 1.0, 3.0, 4.0, 5.0])
    

    This is a recurring pattern. Here’s a link to a question on stack overflow. That post references an old blog post (updated to a Wayback machine link). And hey, I think I know that fellow. Good to know that things haven’t changed too much in 15 years! (And that I’m still working on the same old stuff… hmm… maybe that isn’t so good.)
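    One way to see why this pattern is so natural: in C you might sort an array of pairs with qsort, so each y travels with its x automatically. This is a sketch of the pattern only; ARPACK itself keeps the two arrays separate and swaps their entries in lockstep:

```c
#include <stdlib.h>

typedef struct { int x; double y; } pair_t;

/* qsort comparator: order pairs by the x field. */
static int cmp_pair(const void *pa, const void *pb) {
  const pair_t *a = pa, *b = pb;
  return (a->x > b->x) - (a->x < b->x);
}

/* Sort the pairs by x; each y rides along with its x for free. */
void sort_permute(pair_t *pairs, size_t n) {
  qsort(pairs, n, sizeof *pairs, cmp_pair);
}
```

    The trade-off is the memory layout: an array of structs interleaves x and y, whereas the ARPACK routines (and the Julia wrapper type below) must work with two separate contiguous arrays.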

    So let’s solve this problem in Julia again. Or wait, let’s not. Off the top of my head, I don’t know how to do this in Julia without allocating the extra permutation vector. After glancing around sort.jl in base, it seems like you could probably do this by… something complicated like creating a new array type that wraps the pair of arrays.

    Uh oh, the problem with having an idea…

    struct SortPermutePair{U,V} <: AbstractArray{Tuple{U,V},1}
      x::Vector{U}
      y::Vector{V}
    end
    

    … trying to resist urge to test it …

    Base.size(z::SortPermutePair{U,V}) where {U,V} = (min(length(z.x),length(z.y)),)
    Base.getindex(z::SortPermutePair{U,V}, i::Integer) where {U,V} = (z.x[i], z.y[i])
    Base.setindex!(z::SortPermutePair{U,V}, v::Tuple, i::Integer) where {U,V} = begin z.x[i] = v[1];  z.y[i] = v[2] end
    

    … yeah, resisting is not gonna work, so we’ll have to see how far I get here. Let’s see how to do this in Julia before actually porting the ARPACK routine.

    a = [5, 8, 9, 1, 4, 3, 2]
    b = [3., 4., 5., 2., 1., 9., 8.]
    sort!(SortPermutePair(a,b), by=x->x[1])
    

    So that wasn’t so bad! No extra memory allocated… (right?!?) This shouldn’t allocate extra memory…

    a = [5, 8, 9, 1, 4, 3, 2]
    b = [3., 4., 5., 2., 1., 9., 8.]
    @time sort!(SortPermutePair(a,b), by=x->x[1]);
    @time sort!(SortPermutePair(a,b), by=x->x[1]);
    
    julia> @time sort!(SortPermutePair(a,b), by=x->x[1]);
      0.050316 seconds (58.15 k allocations: 3.460 MiB, 99.04% compilation time)
    
    julia> @time sort!(SortPermutePair(a,b), by=x->x[1]);
      0.050027 seconds (58.15 k allocations: 3.460 MiB, 99.08% compilation time)
    

    Hmm. Well, that’ll have to wait for a future blog post to debug. My guess is that I need type annotations in all the functions, which is something we’ll return to in a future post. (Okay – I had to at least try @btime…) I’m very bad at tangents here! That’s the fun of this post series. I’m going to go into all of those and not edit them out.

    using BenchmarkTools
    @btime sort!(SortPermutePair(a,b), by=x->x[1]) setup=begin
      a=[5, 8, 9, 1, 4, 3, 2]
      b=[3., 4., 5., 2., 1., 9., 8.]
    end
    

    This gives

    52.661 ns (1 allocation: 80 bytes)
    7-element Vector{Float64}:
    ...
    

    Well, that is slightly better. I’m not sure why we see one allocation. This should be doable with zero allocations. (I think it might somewhere be allocating that last output vector???)

    @btime sort!(a) setup=(a=randn(10))
      44.800 ns (0 allocations: 0 bytes)
    

    Maybe it’s a function boundary thing?

    function sort_permute!(x,y;kwargs...)
      sort!(SortPermutePair(x,y);kwargs...,by=first)
      x,y
    end   
    
    julia> @time sort_permute!(a,b)
      0.000006 seconds (2 allocations: 112 bytes)
    ([1, 2, 3, 4, 5, 8, 9], [2.0, 8.0, 9.0, 1.0, 3.0, 4.0, 5.0])   
    

    And this clearly has a problem with allocating extra memory somewhere

    julia> a = randn(10^5); b=randn(10^5); @time sort_permute!(a,b)
      0.014885 seconds (3 allocations: 781.422 KiB)   
    

    And now you have one of my least favorite things in Julia: no mental model for where/how these decisions/impacts get made. This is all designed to be allocated on the stack, so there should be no need to go into the weeds. Anyway, a few iterations later, we end up with one fix. Using the anonymous function x->x[1] is what causes the huge hit to the compile time.

    # this occurs no matter how many times you run it, as I'm running from
    # the REPL and it seems to need to recompile the function x->x[1] every time.
    julia> @time sort!(SortPermutePair(a,b), by=x->x[1]);
      0.050027 seconds (58.15 k allocations: 3.460 MiB, 99.08% compilation time)
    
    julia> @time sort!(SortPermutePair(a,b), by=first);
      0.004317 seconds (152 allocations: 7.844 KiB, 99.72% compilation time)
    
    julia> @time sort!(SortPermutePair(a,b), by=first);
      0.000011 seconds (3 allocations: 144 bytes)
    

    Big long diversion before we get to the actual Arpack Fortran sorting routine I wanted to get to. That’ll have to wait for another day! So where are we? We demonstrated it’s fairly easy to do this in Julia, although there is one allocation I’d like to get rid of that still seems to be there. Maybe I should use a code linter or something like that. (We’ll see how to track allocations in a future post.)

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 5 - Understanding Ref

    Here, we pick up with debugging the call to dsconv within the ARPACK library itself from the last post. This is when the repo was at this commit

    When we run the tests against the Arpack wrapped function directly, it errors.

    using Pkg; Pkg.test("ArpackInJulia"; test_args=["arpackjll"])
    

    In particular, the following code always returns “-1”

    import Arpack_jll, LinearAlgebra
    
    function arpack_dsconv(n::Int, ritz::Vector{Float64}, bounds::Vector{Float64},
                          tol::Float64)
      nconv::LinearAlgebra.BlasInt = -1
      ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
        (Ref{LinearAlgebra.BlasInt},
         Ptr{Float64},
         Ptr{Float64},
         Ref{Float64},
         Ref{LinearAlgebra.BlasInt}),
        n, ritz, bounds, tol, nconv)
      return nconv
    end
    

    Ugh. This is tough to debug. It seems like nconv just isn’t getting updated. One of my common refrains is that problems occur at boundaries between code. Interfaces are very hard to get right because slight differences in models can manifest as bugs or just outright incorrect answers. This is exactly what we are seeing here.

    Now, I could just jump to the solution, which is blindingly obvious in retrospect. On the other hand, there’s value in walking through the process I used. (Or so I tell myself.)

    Given that the code above doesn’t work, what were my working hypotheses about the causes?

    • So maybe I don’t understand how the fortran calling stuff works and I shouldn’t have expected this to get updated.
    • Maybe Julia is doing something I don’t understand?
    • There is some odd interaction for this particular piece of code?
    • The Fortran / Arpack code doesn’t do what I think it does.

    Well, the last one is the most important, as everything follows from my assumption that the fortran code works. So that seems like a good place to start. (In truth, I did futz around with some of the others, but then quickly settled on this one.)

    Step 1. Do something you know or strongly suspect will work.

    Arpack is designed to work with C and C++. There are just fewer places for bugs to hide there. So I wanted to make sure I could call this from C and get the response I expected.

    This gave me the following C program.

    #include <inttypes.h>
    typedef int64_t blasint;
    void dsconv_(blasint* n, double* ritz, double* bounds, double* tol, blasint *nconv);
    
    #include <printf.h>
    
    int main(int argc, char** argv) {
      double ritz[] = {1.0, 1.0, 1.0};
      double bounds[] = {0.0, 0.0, 0.0};
      double tol = 1e-8;
      blasint n = 3;
      blasint nconv = 4;
    
      dsconv_(&n, ritz, bounds, &tol, &nconv);
    
      printf("nconv = %i\n", (int)nconv);
      return 0;
    }
    

    Which I saved into a file: test_dsconv.c. This should print 3 if I’ve understood this correctly. (And the other Julia code is correct.)

    I wanted to compile this against the Julia libraries. Here’s how I got it working. Get the path to libarpack from Julia

    @show Arpack_jll.libarpack_path
    /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib    
    

    (Ugh, security breach… yes, my local username is dgleich. Entirely surprising!) After much fussing with library paths, etc. here’s how to get this all compiled.

    gcc test_dsconv.c \
      /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib \
      -Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/ \
      -Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia
    

    Let’s pick this apart.

    gcc test_dsconv.c
    # above is the call to compile this piece of code
    
    /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib
    # above is telling it we want to use this library.
    
    -Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/
    # above is telling gcc to pass an option to the linker command. Specifically,
    # rpath tells the runtime loader where to find libraries ("runtime path").
    # this can also be set with LD_LIBRARY_PATH, but that's... controversial
    

    Let’s try without the last link to Julia’s libraries to see what happens!

    dgleich@circulant test_dsconv %     gcc test_dsconv.c \
      /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib \
      -Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/ \
    
    dgleich@circulant test_dsconv % ./a.out
    dyld: Library not loaded: @rpath/libopenblas64_.dylib
    Referenced from: /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib
    Reason: image not found
    zsh: abort      ./a.out
    

    Which is because it can’t find libopenblas, which is referenced by libarpack. So we need to tell it about all the other common Julia libraries. That’s what the final

    -Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia
    # tell it about Julia's libraries.
    

    And now it works great.

    dgleich@circulant test_dsconv % gcc test_dsconv.c \
          /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib \
          -Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/ \
          -Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia
    dgleich@circulant test_dsconv % ./a.out
    nconv = 3
    

    Okay, so my understanding of the Arpack function was correct. Ugh. That means I need to learn something about Julia.

    Step 2: Better understanding Julia.

    Okay, let’s go back to Julia because what we are doing ought to work. Here’s where @code_native and some hints about assembly code come in handy.

    julia> @code_native arpack_dsconv(3, ones(3), zeros(3), 1e-8)
        .section    __TEXT,__text,regular,pure_instructions
    ; ┌ @ none:1 within `arpack_dsconv'
        subq    $40, %rsp
    ; │ @ none:4 within `arpack_dsconv'
    ; │┌ @ essentials.jl:396 within `cconvert'
    ; ││┌ @ refpointer.jl:104 within `convert'
    ; │││┌ @ refvalue.jl:8 within `RefValue'
        movq    %rdi, 32(%rsp)
        vmovsd  %xmm0, 16(%rsp)
        movq    $-1, (%rsp)
    ; │└└└
    ; │┌ @ pointer.jl:65 within `unsafe_convert'
        movq    (%rsi), %rsi
        movq    (%rdx), %rdx
        leaq    32(%rsp), %rdi
        leaq    16(%rsp), %rcx
        movq    %rsp, %r8
        movabsq $dsconv_, %rax
    ; │└
        callq   *%rax
    ; │ @ none:11 within `arpack_dsconv'
        movq    $-1, %rax
        addq    $40, %rsp
        retq
        nopw    %cs:(%rax,%rax)
    ; └
    

    If you look at this code long enough and google around about how amd64 assembly works (or x86_64 if you prefer, although it’s really amd64…), you’ll see that the return value of the function is in register rax. This code

    movq    $-1, %rax
    

    sets the return register to -1 right before returning (retq)

    If you want to learn more about calling conventions and register conventions on amd64, see this powerpoint from cs217 at Princeton

    Okay, so what’s happening is that we are just always returning -1, independent of the function call. This means that the Julia compiler is treating the value of nconv as a constant and optimizing the code.

    Another debugging tenet is to try things you think will likely work. Here’s something I strongly suspect will work. And just for fun, I tried the following code.

    import Arpack_jll, LinearAlgebra
    
    function arpack_dsconv(n::Int, ritz::Vector{Float64}, bounds::Vector{Float64},
                          tol::Float64)
      nconv::Vector{LinearAlgebra.BlasInt} = [-1]
      ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
        (Ref{LinearAlgebra.BlasInt},
         Ptr{Float64},
         Ptr{Float64},
         Ref{Float64},
         Ref{LinearAlgebra.BlasInt}),
        n, ritz, bounds, tol, nconv)
      return nconv[1]
    end
    

    Why do I think this will work? Because I’m passing an array and then explicitly looking at an entry. This is what I want to do, just without the array allocation that creating the vector causes. (Julia’s memory allocator is good, but not that good.)

    julia> arpack_dsconv(3, ones(3), zeros(3), 1e-8)
    3
    

    Hmm… so I need to learn more about Ref to understand why the other call didn’t work.

    This line stuck out to me.

    In Julia, Ref objects are dereferenced (loaded or stored) with [].

    Anyway, at this point, I realized my mental model was wrong and got to …

    The fix

    Now, here’s the fix. The issue is that ccall will auto-wrap an intrinsic type (i.e. int, float, etc.) with a Ref if you tell it the argument is a Ref type. But this doesn’t refer to the original value; it makes a new copy of it and refers to that. So here’s working code.

    import Arpack_jll, LinearAlgebra
    
    function arpack_dsconv(n::Int, ritz::Vector{Float64}, bounds::Vector{Float64},
                          tol::Float64)
      nconv = Ref{LinearAlgebra.BlasInt}(-1)
      ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
        (Ref{LinearAlgebra.BlasInt},
         Ptr{Float64},
         Ptr{Float64},
         Ref{Float64},
         Ref{LinearAlgebra.BlasInt}),
        n, ritz, bounds, tol, nconv)
      return nconv[]
    end
    

    The error in my mental model

    As I said above, bugs exist on boundaries. Here’s the origin of this bug and my incorrect mental model of the code.

    My mental model based on C/C++

    nconv::Int = -1
    ccall((:func, libfunc), Cvoid, (Ref{Int},), nconv) # <-> func(&nconv) in C
    

    But that’s not right! Here’s what the code actually does.

    nconv::Int = -1
    ccall((:func, libfunc), Cvoid, (Ref{Int},), nconv) # <-> int newint = nconv; func(&newint) in C
    

    Which of course explains why nconv isn’t updated. It wasn’t even passed in!
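    Here is the same pair of mental models as a toy C program (my illustration, not what Julia actually emits):

```c
/* A callee that writes through a pointer, like dsconv_ writes nconv. */
static void set_to_three(int *p) { *p = 3; }

/* Mental model I had: pass the address of the caller's variable. */
int call_with_address(void) {
  int nconv = -1;
  set_to_three(&nconv);
  return nconv;            /* the callee updated our variable */
}

/* What ccall actually does with a plain Int: copy first, then pass
 * the address of the temporary copy. */
int call_with_copy(void) {
  int nconv = -1;
  int tmp = nconv;         /* the hidden copy */
  set_to_three(&tmp);
  return nconv;            /* still -1: only the copy was updated */
}
```

    Creating the Ref yourself, as in the fix below, is the C-style pass-the-address version; letting ccall wrap a plain Int is the copy version.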

    Suggestion

    • Add a note to Ref documentation on this
    • Add a note to ccall documentation on this

    Is it worth it for one person with an incorrect mental model? I think yes, because the ccall area is likely to be a source of similar bugs and it’s important to document the assumptions! This is already partially done in this paragraph:

    When passed as a ccall argument (either as a Ptr or Ref type), a Ref object will be converted to a native pointer to the data it references. For most T, or when converted to a Ptr{Cvoid}, this is a pointer to the object data. When T is an isbits type, this value may be safely mutated, otherwise mutation is strictly undefined behavior.

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 4 - Checking against libarpack

    See Overview for a high-level picture, also see ArpackInJulia.jl on Github

    One of the things I wanted to do was check against the true ARPACK functions. The idea is that we can swap out true ARPACK routines for our routines easily. So I wanted to be able to call the true ARPACK routines individually myself.

    This is remarkably easy, and it is how Arpack.jl works with the functions aupd and eupd in ARPACK to implement eigs. So this isn’t surprising. It’s still fun to see that ease of use translate into new codes.

    The idea is that Arpack.jl uses Arpack_jll to access the compiled FORTRAN code as a dynamic library. All the binary compiling, etc. is handled by Arpack_jll and Arpack.jl can just use that. So we are going to

    import Arpack_jll
    

    so we can get access to

    @show Arpack_jll.libarpack
    

    which gives the path to the ARPACK library in a system independent fashion. (At some point, this will work on the Apple M1 too, but not yet.) [Probably will be about the same time as I finish posting these notes though and getting all of the routines ported.]

    Once we have that, we can simply setup a ccall to get what we need! See the blas.jl functions for many examples of how to do this. Also see Arpack.jl for even more examples that are specific to ARPACK.

    import Arpack_jll, LinearAlgebra
    
    function arpack_dsconv(n::Int, ritz::Vector{Float64}, bounds::Vector{Float64},
                          tol::Float64)
      nconv::Int = 0
      ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
        (Ref{LinearAlgebra.BlasInt},
         Ptr{Float64},
         Ptr{Float64},
         Ref{Float64},
         Ref{LinearAlgebra.BlasInt}),
        n, ritz, bounds, tol, nconv)
      return nconv
    end
    

    Of course, I only want to depend on Arpack_jll in the test routines. (Actually, I’d really only like to depend on it in a subset of the test routines. But I don’t see how to do that right now and it isn’t relevant.)

    Put another way, I want the tests to be able to run on platforms where Arpack_jll doesn’t work, in which case we simply skip the tests that need it.

    To control which tests are run, we can check ARGS as described in this pull request https://github.com/JuliaLang/Pkg.jl/pull/1226 where this functionality was implemented in Julia.

    if "arpackjll" in ARGS
      include("arpackjll.jl") # get the helpful routines to call stuff in "arpackjll"
      @testset "arpackjll" begin
        # tests that check against libarpack directly
      end
    end    
    

    So I stuck the call to dsconv into arpackjll.jl

    import Arpack_jll, LinearAlgebra
    
    function arpack_dsconv(n::Int, ritz::Vector{Float64}, bounds::Vector{Float64},
                          tol::Float64)
      nconv::LinearAlgebra.BlasInt = -1
      ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
        (Ref{LinearAlgebra.BlasInt},
         Ptr{Float64},
         Ptr{Float64},
         Ref{Float64},
         Ref{LinearAlgebra.BlasInt}),
        n, ritz, bounds, tol, nconv)
      return nconv
    end
    

    This does require me to add Arpack_jll and LinearAlgebra to the Test package itself. We can do this by activating the test package and then adding them.

    # change to the test subdirectory
    ] activate .
    add LinearAlgebra
    add Arpack_jll
    

    And our full piece of new test code

    if "arpackjll" in ARGS
      include("arpackjll.jl") # get the helpful routines to call stuff in "arpackjll"
      @testset "arpackjll" begin
        @testset "dsconv" begin
          soln = arpack_dsconv(10, ones(10), zeros(10), 1e-8)
      @test ArpackInJulia.dsconv(10, ones(10), zeros(10), 1e-8)==soln
    
          soln = arpack_dsconv(10, zeros(10), ones(10), 1e-8)
      @test ArpackInJulia.dsconv(10, zeros(10), ones(10), 1e-8)==soln
        end
      end
    end
    

    This flips the ones and zeros, which should show convergence in one of the two cases.

    And what’s weird, it fails. I always get back arpack_dsconv==-1. Hmm… will debug in the next post!

  • A funny thing in @JuliaLanguage. I often illustrate that finite differences in floating point are inaccurate with small changes. But this Julia code works and it’s fun to understand why!

    xx = range(0,1,length=100)
    F(x) = 5.0.*x .+ 1/3
    h = eps(1.0)
    z = (F(xx.+h)-F(xx))./h

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 3 - First routine

    See Overview for a high-level picture, also see ArpackInJulia.jl on Github

    We are going to start our ARPACK conversion with about the simplest routine: dsconv. This routine for a double-precision, symmetric eigenvalue problem just checks if the eigenvalue estimates have converged at all.

      c-----------------------------------------------------------------------
      c\BeginDoc
      c
      c\Name: dsconv
      c
      c\Description:
      c  Convergence testing for the symmetric Arnoldi eigenvalue routine.
      c
      c\Usage:
      c  call dsconv
      c     ( N, RITZ, BOUNDS, TOL, NCONV )
      c
      c\Arguments
      c  N       Integer.  (INPUT)
      c          Number of Ritz values to check for convergence.
      c
      c  RITZ    Double precision array of length N.  (INPUT)
      c          The Ritz values to be checked for convergence.
      c
      c  BOUNDS  Double precision array of length N.  (INPUT)
      c          Ritz estimates associated with the Ritz values in RITZ.
      c
      c  TOL     Double precision scalar.  (INPUT)
      c          Desired relative accuracy for a Ritz value to be considered
      c          "converged".
      c
      c  NCONV   Integer scalar.  (OUTPUT)
      c          Number of "converged" Ritz values.
      c
      c\EndDoc
      c
      c-----------------------------------------------------------------------
      c
      c\BeginLib
      c
      c\Routines called:
      c     arscnd  ARPACK utility routine for timing.
      c     dlamch  LAPACK routine that determines machine constants.
      c
      c\Author
      c     Danny Sorensen               Phuong Vu
      c     Richard Lehoucq              CRPC / Rice University
      c     Dept. of Computational &     Houston, Texas
      c     Applied Mathematics
      c     Rice University
      c     Houston, Texas
      c
      c\SCCS Information: @(#)
      c FILE: sconv.F   SID: 2.4   DATE OF SID: 4/19/96   RELEASE: 2
      c
      c\Remarks
      c     1. Starting with version 2.4, this routine no longer uses the
      c        Parlett strategy using the gap conditions.
      c
      c\EndLib
      c
      c-----------------------------------------------------------------------
      c
            subroutine dsconv (n, ritz, bounds, tol, nconv)
      c
      c     %----------------------------------------------------%
      c     | Include files for debugging and timing information |
      c     %----------------------------------------------------%
      c
            include   'debug.h'
            include   'stat.h'
      c
      c     %------------------%
      c     | Scalar Arguments |
      c     %------------------%
      c
            integer    n, nconv
            Double precision
           &           tol
      c
      c     %-----------------%
      c     | Array Arguments |
      c     %-----------------%
      c
            Double precision
           &           ritz(n), bounds(n)
      c
      c     %---------------%
      c     | Local Scalars |
      c     %---------------%
      c
            integer    i
            Double precision
           &           temp, eps23
      c
      c     %-------------------%
      c     | External routines |
      c     %-------------------%
      c
            Double precision
           &           dlamch
            external   dlamch
    
      c     %---------------------%
      c     | Intrinsic Functions |
      c     %---------------------%
      c
            intrinsic    abs
      c
      c     %-----------------------%
      c     | Executable Statements |
      c     %-----------------------%
      c
            call arscnd (t0)
      c
            eps23 = dlamch('Epsilon-Machine')
            eps23 = eps23**(2.0D+0 / 3.0D+0)
      c
            nconv  = 0
            do 10 i = 1, n
      c
      c        %-----------------------------------------------------%
      c        | The i-th Ritz value is considered "converged"       |
      c        | when: bounds(i) .le. TOL*max(eps23, abs(ritz(i)))   |
      c        %-----------------------------------------------------%
      c
               temp = max( eps23, abs(ritz(i)) )
               if ( bounds(i) .le. tol*temp ) then
                  nconv = nconv + 1
               end if
      c
         10 continue
      c
            call arscnd (t1)
            tsconv = tsconv + (t1 - t0)
      c
            return
      c
      c     %---------------%
      c     | End of dsconv |
      c     %---------------%
      c
            end
    

    In Julia, this routine is basically a two-liner (could likely do in one) that computes:

      eps23 = eps(1.0)^(2/3)
      return sum(
          ((ritzi, boundsi),)->boundsi <= max(eps23, abs(ritzi))*tol ? 1 : 0,
            zip(ritz, bounds))
    

    (Full disclosure, I didn’t test this, so there could be a subtle error, but I hope the idea is clear that this COULD be done very easily.)

    On the other hand, the idea was to port this as directly as possible, so let’s write out the loops, etc.

      function dsconv(
                  n::Int,
                  ritz::Vector{Float64},
                  bounds::Vector{Float64},
                  tol::Float64
        )
        eps23::Float64 = eps(1.0)^(2.0/3.0)
        nconv::Int = 0
        if n > length(ritz)
          throw(ArgumentError("range 1:n=$n out of bounds for ritz, of length $(length(ritz))"))
        end
        if n > length(bounds)
          throw(ArgumentError("range 1:n=$n out of bounds for bounds, of length $(length(bounds))"))
        end
        @inbounds for i=1:n
          tmp = max( eps23, abs(ritz[i]) )
          if ( bounds[i] <= tol*tmp )
            nconv += 1
          end
        end
    
        return nconv
      end
    

    Here, we have borrowed some of the out of bounds reporting from blas.jl in order to try and be somewhat standard.

    Of course, the issue is that the ARPACK code also includes some cool timing information:

      c
      c     %----------------------------------------------------%
      c     | Include files for debugging and timing information |
      c     %----------------------------------------------------%
      c
            include   'debug.h'
            include   'stat.h'
        call arscnd (t0)
  c     ... many lines ...
        call arscnd (t1)
        tsconv = tsconv + (t1 - t0)
    

    The function arscnd is the ARPACK second function, which gives the current time in seconds. So this simply tracks the seconds used in this function.
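    In Julia, the same pattern falls out of `Base.time()`, which returns wall-clock seconds as a Float64; a minimal sketch:

    ```julia
    # Sketch: time() plays the role of arscnd in the Fortran code.
    t0 = time()
    s = sum(abs2, randn(1000))   # stand-in for the work being timed
    tsconv = time() - t0         # elapsed seconds spent in the "routine"
    ```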

    To store the information, this code uses a Fortran common block, which looks like a set of global variables, and updates them based on the time used in this routine. In the port, I’d like to have this functionality as well. However, while the FORTRAN common block could be replaced with a set of global variables, I wanted to use a more modern construct, enabled by Julia’s macros, that gives thread safety and also compiler-optimized code removal when the information isn’t needed.

    Here’s a better design. Take in a keyword argument `stats` that is a Julia structure.

      Base.@kwdef mutable struct ArpackStats
        tconv::Float64 = 0.0 # the time spent checking convergence
        # ... there will be more!
      end
    

    We use Base.@kwdef to auto-initialize all the times to zero so we don’t have to write a constructor and can just call ArpackStats(). (Check out help on Base.@kwdef for more!)
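    For example (note that the struct must be declared mutable for the in-place timing updates to work; Julia’s plain structs are immutable):

    ```julia
    # Base.@kwdef generates a keyword constructor with defaults filled in;
    # mutable so the timing fields can be accumulated in place.
    Base.@kwdef mutable struct ArpackStats
      tconv::Float64 = 0.0   # time spent checking convergence
    end

    stats = ArpackStats()    # no arguments needed; tconv starts at 0.0
    stats.tconv += 0.25      # fields can be updated directly
    ```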

    Then we can simply use the following construct to update time:

      function dsconv(
                  n::Int,
                  ritz::Vector{Float64},
                  bounds::Vector{Float64},
                  tol::Float64;
                  stats=nothing
       )
       t0 = stats != nothing ? time() : 0.0 ## NEW
       eps23::Float64 = eps(1.0)^(2.0/3.0)
       nconv::Int = 0
       if n > length(ritz)
          throw(ArgumentError("range 1:n=$n out of bounds for ritz, of length $(length(ritz))"))
       end
       if n > length(bounds)
          throw(ArgumentError("range 1:n=$n out of bounds for bounds, of length $(length(bounds))"))
       end
       @inbounds for i=1:n
          tmp = max( eps23, abs(ritz[i]) )
          if ( bounds[i] <= tol*tmp )
            nconv += 1
          end
       end
       if stats != nothing; stats.tconv += time() - t0; end ## NEW
       return nconv
      end
    

    Of course, these if statements get a bit tedious. So let’s use Julia’s macros to handle this more seamlessly and get the code I actually want to write.

      """
      Convergence testing for the symmetric Arnoldi eigenvalue routine.
    
      In Julia, this can be replaced with a single sum over zip(ritz, bounds);
      see the discussion above.
    
      Julia port of ARPACK dsconv routine.
      Usage:
      call dsconv
      ( N, RITZ, BOUNDS, TOL; debug, stats )
    
      Arguments
      N       Integer.  (INPUT)
         Number of Ritz values to check for convergence.
    
      RITZ    Double precision array of length N.  (INPUT)
         The Ritz values to be checked for convergence.
    
      BOUNDS  Double precision array of length N.  (INPUT)
         Ritz estimates associated with the Ritz values in RITZ.
    
      TOL     Double precision scalar.  (INPUT)
         Desired relative accuracy for a Ritz value to be considered
         "converged".
      Optional values (they default to nothing)
      debug (INPUT/OUTPUT) nothing or a debugging output structure
      stats (INPUT/OUTPUT) nothing or an ArpackStats struct
      These are used to collect debugging and performance stats and are
      mostly internal.
    
    
      Return value
      NCONV   Integer scalar.  (RETURN)
         Number of "converged" Ritz values.
      """
      function dsconv(
            n::Int,
            ritz::Vector{Float64},
            bounds::Vector{Float64},
            tol::Float64;
            stats=nothing,
            debug=nothing
      )
      #=
      c\Author
      c     Danny Sorensen               Phuong Vu
      c     Richard Lehoucq              CRPC / Rice University
      c     Dept. of Computational &     Houston, Texas
      c     Applied Mathematics
      c     Rice University
      c     Houston, Texas
    
      Julia port by David F. Gleich, Purdue University
      =#
    
      #=
      c     | Executable Statements |
      =#
      t0 = @jl_arpack_time
    
      if n > length(ritz)
        throw(ArgumentError("range 1:n=$n out of bounds for ritz, of length $(length(ritz))"))
      end
      if n > length(bounds)
        throw(ArgumentError("range 1:n=$n out of bounds for bounds, of length $(length(bounds))"))
      end
    
      eps23::Float64 = eps(1.0)^(2.0/3.0)
      nconv::Int = 0
      @inbounds for i=1:n
        #=
        c        | The i-th Ritz value is considered "converged"       |
        c        | when: bounds(i) .le. TOL*max(eps23, abs(ritz(i)))   |
        =#
        tmp = max( eps23, abs(ritz[i]) )
        if ( bounds[i] <= tol*tmp )
          nconv += 1
        end
      end
    
      #  call arscnd (t1)
      #  tsconv = tsconv + (t1 - t0)
      @jl_update_time(tconv, t0)
    
      return nconv
      end
    

    Here, we can use the following macros to accomplish this.

      macro jl_arpack_time()
        return esc(:( stats == nothing ? zero(ArpackTime) : time() ))
      end
    
      macro jl_update_time(field, t0)
        return esc( quote
          if stats != nothing
            stats.$field += time() - $t0
          end
        end )
      end
    

    And thus, we have our actual strategy. Note that some of the little code snippets here were taken from development versions, so some of the variable names may have changed (e.g. tconv vs. tsconv); there is no reason to track symmetric vs. nonsymmetric time separately. (I mean, I guess we could, but that seems unlike Julia language stuff.) Could be revisited in the future!
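    To see the macros end to end, here is a small self-contained sketch. The `ArpackTime` alias and the toy `timed_work` function are my own illustrative assumptions, not ARPACK names; the two macros are as defined above.

    ```julia
    const ArpackTime = Float64   # assumption: times stored as Float64 seconds

    Base.@kwdef mutable struct ArpackStats
      tconv::Float64 = 0.0
    end

    # esc(...) splices the expression into the caller, so the macros see
    # the caller's `stats` variable.
    macro jl_arpack_time()
      return esc(:( stats == nothing ? zero(ArpackTime) : time() ))
    end

    macro jl_update_time(field, t0)
      return esc( quote
        if stats != nothing
          stats.$field += time() - $t0
        end
      end )
    end

    # A toy routine (illustrative name) using the same pattern as dsconv.
    function timed_work(; stats=nothing)
      t0 = @jl_arpack_time
      s = sum(abs2, randn(100))   # stand-in for real work
      @jl_update_time(tconv, t0)
      return s
    end

    timed_work()                  # stats == nothing: no timing recorded
    st = ArpackStats()
    timed_work(stats=st)          # st.tconv now holds the elapsed seconds
    ```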

    At this point, we also put a bunch of stuff on GitHub to keep the project moving. See the ArpackInJulia.jl GitHub repo.

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 2 - Overview

    See Overview for a high-level picture.

    Here is a quick call graph of the various ARPACK functions for the symmetric eigenvalue/eigenvector problem dsaupd

    • d = double precision
    • s = symmetric
    • aupd = Arnoldi Update (since we’ll call this in a loop to update the internal Arnoldi information).

    Note to the pedantic. Yes, I know the Arnoldi of a symmetric matrix is usually called the Lanczos process. So I should be saying Lanczos everywhere here. Except, maybe I really shouldn’t be saying either and we could use the phrase: iteratively orthogonalized power basis instead of Arnoldi or three-term recurrent power basis instead of Lanczos? Either way, since the package is called ARPACK and I’m writing about that, I’ll stick with Arnoldi here.

    Back to the call graph.

    dsaupd calls
      dsaup2 calls
        dgetv0 calls printing, reverse communication
        dsaitr calls printing, blas, lapack
        dsapps calls printing, blas, lapack
        dsconv
        dseigt
        dsgets
        dsortr
    
        ivout - write out an integer vector
        arscnd - internal wrapper avoiding LAPACK second function
        dvout - write out a double vector
    

    Then we also need to handle getting the eigenvector information out of the computation.

    dseupd calls
      dsesrt
      dsortr - same as above
    

    This suggests a plan.

    Port the low-level/internal routines first. Then port the high-level routines.

    Wait, what’s that, you say? Why not do it the opposite way? Well, I started doing it that way and realized it’s just harder ;). So we are going to switch and do the easy stuff first!

    Just as a quick overview of what the code actually does.

    dsaupd - high level interface
      dsaup2 - medium level interface, manages the overall process
        dgetv0 - gets the starting vector, or returns control to user to get it.
        dsaitr - haven't looked yet...
        dsapps - haven't looked yet...
        dsconv - check what has converged (easiest routine!)
        dseigt - haven't looked yet...
        dsgets - get shifts based on eigenvalue estimates
        dsortr - sorts a vector and/or applies the sort to an aux vector too
    

    Also, let me note that the ARPACK documentation is stellar. Stellar. So a few clicks and I could easily figure out what those other functions are doing.

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 1 - Goals

    I’m working on a small “me-time” project to port the ARPACK symmetric solver to Julia.

    My goal is to port the double-precision ARPACK routines for symmetric matrices to Julia, including all of the supporting ARPACK machinery. So this should give “exactly” what ARPACK does, but as a pure Julia implementation. (Where “exactly” means it should execute roughly the same sequence of floating point operations and can differ at levels that would be expected for different compilers compiling the same code.)

    • not a goal to “Julia-ize” the package; I want to keep as close to the FORTRAN as possible so that I might be able to replace calls to Julia’s Arpack.saupd / Arpack.seupd (which call the Fortran library) with this code.
    • small internal function changes are okay, e.g. ARPACK has various debugging and timing stuff that would need to be done differently in Julia.
    • small simplifications, e.g. if a function computes a single Int, we can rewrite that to return the Int rather than writing it into an array like in FORTRAN.
    • Why? Why not implement my own ImplicitRestart/Eigensolver/Etc.? Simple: I trust ARPACK. Also, I want to understand exactly what the symmetric ARPACK solver is doing.
    • Why not use a Fortran->Julia compiler? Well, I could. But I could also do this and learn intimate details of how it works to help out in teaching :)
    • I want to document some of the cool stuff in ARPACK!

    For context, I spent about a semester studying these solvers with Gene Golub to see if there were obvious ideas to make computing the Fiedler vectors faster, but beyond the high level algorithmics and lots of experiments, nothing came of it. This project is different in that the end goal is easy – port the code, with a focus on explaining in detail what each step does. (hah, how about that name-dropping?)

    Since it’s good to have a driving use-case, the idea is to make the following MatrixNetworks.jl code work without the ARPACK dependency.

      """
        `_symeigs_smallest_arpack`
        ---
        Compute eigenvalues and vectors using direct calls to the ARPACK wrappers
        to get type-stability. This function works for symmetric matrices.
        This function works on Float32 and Float64-valued sparse matrices.
        It returns the smallest set of eigenvalues and vectors.
        It only works on matrices with more than 21 rows and columns.
        (Use a dense eigensolver for smaller problems.)
        Functions
        ---------
        - `(evals,evecs) = _symeigs_smallest_arpack(A::SparseMatrixCSC{V,Int},
                                nev::Int,tol::V,maxiter::Int, v0::Vector{V})`
        Inputs
        ------
        - `A`: the sparse matrix, must be symmetric
        - `nev`: the number of eigenvectors requested
        - `tol`: the relative tolerance of the eigenvalue computation
        - `maxiter`: the maximum number of restarts
        - `v0`: the initial vector for the Lanczos process
        Example
        -------
        This is an internal function.
        """
        function _symeigs_smallest_arpack(
                    A::SparseMatrixCSC{V,Int},nev::Int,tol::V,maxiter::Int,
                    v0::Vector{V}) where V
    
            n::Int = checksquare(A) # get the size
            @assert n >= 21
    
            # setup options
            mode = 1
            sym = true
            iscmplx = false
            bmat = String("I")
            ncv = min(max(2*nev,20),n-1)
    
            whichstr = String("SA")
            ritzvec = true
            sigma = 0.
    
            #TOL = Array{V}(undef,1)
            #TOL[1] = tol
            TOL = Ref(tol)
            lworkl = ncv*(ncv + 8)
            v = Array{V}(undef, n, ncv)
            workd = Array{V}(undef, 3*n)
            workl = Array{V}(undef, lworkl)
            resid = Array{V}(undef, n)
    
            resid[:] = v0[:]
    
            info = zeros(BlasInt, 1)
            info[1] = 1
    
            iparam = zeros(BlasInt, 11)
            ipntr = zeros(BlasInt, 11)
            ido = zeros(BlasInt, 1)
    
            iparam[1] = BlasInt(1)
            iparam[3] = BlasInt(maxiter)
            iparam[7] = BlasInt(mode)
    
            # this is a helpful indexing vector
            zernm1 = 0:(n-1)
    
            while true
                # This is the reverse communication step that ARPACK does
                # we need to extract the desired vector and multiply it by A
                # unless the code says to stop
                Arpack.saupd(
                    ido, bmat, n, whichstr, nev, TOL, resid, ncv, v, n,
                    iparam, ipntr, workd, workl, lworkl, info)
    
                load_idx = ipntr[1] .+ zernm1
                store_idx = ipntr[2] .+ zernm1
    
                x = workd[load_idx]
    
                if ido[1] == 1
                    workd[store_idx] = A*x
                elseif ido[1] == 99
                    break
                else
                    error("unexpected ARPACK behavior")
                end
            end
    
            # Once the process terminates, we need to extract the
            # eigenvectors.
    
            # calls to eupd
            howmny = String("A")
            select = Array{BlasInt}(undef, ncv)
    
            d = Array{V}(undef, nev)
            sigmar = ones(V,1)*sigma
            ldv = n
            Arpack.seupd(ritzvec, howmny, select, d, v, ldv, sigmar,
                bmat, n, whichstr, nev, TOL, resid, ncv, v, ldv,
                iparam, ipntr, workd, workl, lworkl, info)
            if info[1] != 0
                error("unexpected ARPACK exception")
            end
    
            # Now we want to return them in sorted order (smallest first)
            p = sortperm(d)
    
            d = d[p]
            vectors = v[1:n,p]
    
            return (d,vectors,iparam[5])
        end
    
  • Anyone else find the documentation in old Fortran codes amazingly beautiful? e.g. look at the documentation in ARPACK. github.com/opencolla… Wonderfully annotated code.

  • Julia (@JuliaComputing) and VSCode on Native M1.

    1. Install M1 Native VSCode.

    2. Install Xcode & Command Line tools.

    3. Clone Julia master.

    4. make

    5. ./julia

    6. ENV["JULIA_GR_PROVIDER"] = "GR"

    7. ]add Plots

    8. using Plots

    9. plot(randn(10))

    10. Point VSCode at new julia and 🍾🤩🥳

  • My thoughts on getting a COVID vaccine today.

    I’m of mixed feelings about this because, while vaccine!, strictly speaking, the eligibility of university faculty (as I understand it) is unclear. I’m also not particularly old, nor high-risk.

    Meijer (and Kroger, and others) are vaccinating university faculty as educators. (Which I definitely am!) This fact has been posted on Twitter by others, and announced on a university mailing list. While I’d like to think of myself as an as-scrupulous-as-reasonable rule-follower, this is a murky area.

    I never lied to get this. Also, remember – any type of medical/healthcare employer was vaccinating nearly everyone when vaccines were really scarce, even those capable of work at home or not doing patient facing care. Here’s part of my rationalization.

    -The US has a somewhat inefficient allocation of vaccine supply to where it is needed. I booked this appointment with less than 24 (sleepless) hours’ notice. I cannot believe this kept anyone who ought to be eligible from getting a vaccine. (The pharmacy still had same-day availability when I got my vaccine.)

    -I have the capacity/flexibility/ability to exchange my time/energy to get a vaccine when/where others can’t. Should I not use this to help address the inefficient allocation above?

    -I have a responsibility to help keep the daycare we participate in safe. I struggled with how I would feel if I were to get COVID at this point (knowing that I had passed up a vaccine opportunity) and pass it on to anyone at the daycare.

    I’m posting this note for a few reasons.

    • We need to remind people that vaccines help everyone! There is some social signaling and sociology at work here: even if you don’t need it, it will help others by curtailing possible spread, and others may need some encouragement to get it.

    • By trying to understand the appointment landscape, I helped a number of people get the vaccine weeks earlier than they were originally scheduled.

    • If you, for any reason, want to get the vaccine, realize that there is capacity in the system in some places.

    • And use this information to help others too.

    • If people want to know my thoughts on the current state of getting a vaccine in Indiana, I’m happy to share more from looking at appointment availability for an elementary teacher I work with. My sense is that appointments are reasonably highly available with <= 24 hours notice if you are willing to travel <= 100 miles.

subscribe via RSS