• Dear Apple Developer world, is there documentation of the “plj” (Property List Journal) files Apple Photos uses as its reference for photo information? (This is not the sqlite database many others have documented.)

  • Based on the Google News snippets, you wouldn’t think these headlines describe the same article! www.nature.com/articles/…

    A description of a nature medicine article from Google news.

  • The FDA (regulations.gov) got 40,000 comments yesterday on the EUA for 5-11! (Their total is 55k now, but was 15k the day before…). Seems like the majority are against approval from my non-statistical sampling!

  • Oh wow. MS @powerpoint finally fixed my bug with PT Sans Bold Italic on MacOS Big Sur… (and also Catalina, although something else happened there) Just in time for Monterey! Thanks @powerpoint team! (Took ~6 mo-1 year).

  • Anyone know what the carbon footprint of me spending 5 minutes answering an email is? I don’t mean the computer and network etc, I mean, the carbon footprint of a human spending 5 minutes on a sedentary activity.

  • Tried to take a survey of academic climate. Sadly had to give up because it was total nonsense. By my own estimation, my answers were basically random.

  • Google thinks the Lucid Air has some pretty incredible range.

    Google search for lucid air dimensions, first link that mentions lucid air range in the caption

  • n=1 observation for me regarding class… I’m finding this semester more time-consuming because things are not yet entirely ‘back to normal’, although there is a push that way (which then triggers the time-consuming issues…). Could be random luck, though!

  • If anyone sees a leather case for an iPhone 13 Pro without a lip around the camera region, can you let me know please? Pretty please? I like cameras, but I also like “phones” that lie flat and don’t wobble.

  • I find it curious how surveys of academia believe we have perfect memory of how many things like ‘reviews’ we do. Or how many we ‘declined’ to do! This is a hard question. I have no idea how many things I say no to!

  • Need data (networks, hypergraphs, tensors, ML, AI)? Here’s a timestamped and linked digest of Anthony Fauci’s emails arxiv.org/abs/2108…. from @JasonLeopold FOIA request. With @austinbenson and @n_veldt Data and Code github.com/nveldt/fa…

  • LAPACK's random number generators in Julia (part ... blah ... in work to port ARPACK to Julia.)

    ARPACK depends on LAPACK’s internal random number generators

    • dlarnv
    • slarnv
    • zlarnv
    • clarnv

    which use the raw routines

    • dlaruv
    • slaruv

    This is true even if you use Julia to initialize the starting vector with Julia’s own random number generator, because ARPACK needs to be able to get fresh random data to restart the process if it detects an invariant subspace. This suggests two approaches: call LAPACK’s RNG from Julia, or port LAPACK’s RNG to Julia. Why do the simple thing when the more complicated thing is so much more fun?

    Aside, best guess on LAPACK names: _larnv = lapack random normal vector and _laruv = lapack random uniform vector

    Calling LAPACK’s random number generator from Julia

    The tricky bit about calling LAPACK’s RNG from Julia is the seed value. This is just 4 ints, which we’d like to be stack allocated. Unfortunately, we won’t be able to get that for reasons that are internal to Julia. (Blah blah, stack vs. heap and gc and guarantees…)

    Here’s some code that works though.

    import LinearAlgebra
    
    _dlarnv_blas!(idist::Int,
           iseed::Ref{NTuple{4,Int}},
          n::Int,
          x::Vector{Float64}) =
      ccall((LinearAlgebra.BLAS.@blasfunc("dlarnv_"), LinearAlgebra.BLAS.libblas), Cvoid,
        (Ref{LinearAlgebra.BlasInt}, Ptr{LinearAlgebra.BlasInt}, Ref{LinearAlgebra.BlasInt}, Ptr{Float64}),
        idist, iseed, n, x)
    
    x = zeros(5); _dlarnv_blas!(1, Base.Ref(tuple(1,2,3,4)), length(x), x)
    
    using BenchmarkTools
    @btime _dlarnv_blas!(1, tseed, length(z), z) setup=begin
        z=zeros(128);     
        tseed=Base.Ref(tuple(1,3,5,7));
    end
    

    This uses NTuple as the holder for the 4 ints we use as the seed. At this point, I thought the following should work…

    mutable struct MySeedState
      seed::NTuple{4,Int}
    end
    
    function myrand!(x::Vector{Float64}, state::MySeedState)
      iseed = Base.Ref(state.seed) # this will allocate :(
    
      _dlarnv_blas!(2, iseed, length(x), x)
      state.seed = iseed[]
    end
    
    using BenchmarkTools
    @btime myrand!(z, tstate) setup=begin z=zeros(128); tstate=MySeedState(tuple(1,3,5,7)); end
    

    But – see above – I learned when we went to use this that creating a Ref can actually allocate in Julia, even when it wraps an isbits type.

    julia> isbitstype(NTuple{4,Int})
    true
    

    To avoid this, we can just allocate the ref as part of the mutable struct.

    mutable struct MyRefSeedState
      seed::Ref{NTuple{4,Int}}
    end
    
    function myrand!(x::Vector{Float64}, state::MyRefSeedState)
      _dlarnv_blas!(2, state.seed, length(x), x)
    end
    
    using BenchmarkTools
    @btime myrand!(z, tstate) setup=begin
        z=zeros(128);
        tstate=MyRefSeedState(Base.Ref(tuple(1,3,5,7)));
    end
    

    Either way, it probably isn’t that big of a deal, but I’d like to be clear on where the allocations are happening. e.g. one-time thread local setup is okay. Random internal allocations are not okay.

    It’s puzzling to me why this causes an allocation: it seems like it could safely be done on the stack, and the same allocation doesn’t occur for types like Int.

    Porting LAPACK’s random number generator to Julia

    Since the routine is sufficiently simple, we can actually just port the random number generator directly.

    Aside – this is actually a weakness, as Julia’s RNGs are much better. On the other hand, these values are only used to generate data that is not orthogonal to a desired invariant subspace (with very, very high probability), so the quality of the RNG does not need to be statistically strong.

    The underlying double-precision random number generator is dlaruv not dlarnv. The second function dlarnv is a wrapper that does seemingly odd things like work in 128-length segments, until you realize that dlaruv only works in 128-length segments. (Weird!) Anyhoo.
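    To make the segmenting concrete, here is a small sketch (my own illustration, not LAPACK’s actual code) of how a dlarnv-style wrapper drives a generator that only fills up to 128 values per call:

```julia
# Sketch: consume an arbitrary-length output array via a generator that,
# like dlaruv, only produces up to 128 values per invocation.
function chunked_fill!(gen!, state, x::Vector{Float64})
  buf = zeros(128)
  n = length(x)
  for start in 1:128:n
    len = min(128, n - start + 1)
    gen!(state, len, buf)            # fills buf[1:len], advances state
    copyto!(x, start, buf, 1, len)   # copy this chunk into the output
  end
  return x
end
```

    This is essentially all dlarnv’s outer loop is doing (plus the distribution transform on each chunk).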

    Here’s a link to the double precision generator dlaruv.f in OpenBLAS

    The routine is fairly simple: take a set of fixed values and do some integer operations to get pseudorandom values out. Specifically:

    > This routine uses a multiplicative congruential method with modulus 2**48 and multiplier 33952834046453 (see G.S. Fishman, ‘Multiplicative congruential random number generators with modulus 2**b: an exhaustive analysis for b = 32 and a partial analysis for b = 48’, Math. Comp. 189, pp 331-344, 1990).
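    As a sanity check on those constants (my own computation, not from dlaruv.f’s comments): row i of the MM table holds the i-th power of the multiplier mod 2**48 as four base-4096 digits, so the first row should be the digits of the multiplier itself:

```julia
# Split Fishman's multiplier into four base-4096 digits, most significant
# first; these match MM row 1 from dlaruv.f: (494, 322, 2508, 2549).
m = 33952834046453
digits_4096 = (m ÷ 4096^3, (m ÷ 4096^2) % 4096, (m ÷ 4096) % 4096, m % 4096)
```

    This also explains the 12-bit limb arithmetic in the routine: IPW2 = 4096 is the base, and the IT1…IT4 updates are schoolbook multiplication in that base, truncated to four digits (i.e. mod 2**48).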

    See the appendix for code to extract this fixed array of values.

    We set these values into a constant global array

    const _dlaruv_mm=tuple(tuple(494,322,2508,2549), ...
    
    julia> typeof(_dlaruv_mm)
    NTuple{128, NTuple{4, Int64}}
    

    Next up, we port the actual RNG. In this case, I was actually able to copy/paste much of the FORTRAN code.

    function _dlaruv!(iseed::Base.RefValue{NTuple{4,Int}}, n::Int, x::Vector{Float64})
      IPW2=4096
      R = 1/IPW2
      i1 = iseed[][1]
      i2 = iseed[][2]
      i3 = iseed[][3]
      i4 = iseed[][4]
      # setup scope for final call in julia
      IT1 = i1
      IT2 = i2
      IT3 = i3
      IT4 = i4
      @inbounds for i=1:min(n,length(_dlaruv_mm))
        # directly copied/pasted from fortran, hence the caps
        while true
          # Multiply the seed by i-th power of the multiplier modulo 2**48
          IT4 = i4*_dlaruv_mm[i][4]
          IT3 = IT4 ÷ IPW2
          IT4 = IT4 - IPW2*IT3
          IT3 = IT3 + i3*_dlaruv_mm[i][4] + i4*_dlaruv_mm[i][3]
          IT2 = IT3 ÷ IPW2
          IT3 = IT3 - IPW2*IT2
          IT2 = IT2 + i2*_dlaruv_mm[i][4] + i3*_dlaruv_mm[i][3] + i4*_dlaruv_mm[i][2]
          IT1 = IT2 ÷ IPW2
          IT2 = IT2 - IPW2*IT1
          IT1 = IT1 + i1*_dlaruv_mm[i][4] + i2*_dlaruv_mm[i][3] + i3*_dlaruv_mm[i][2] + i4*_dlaruv_mm[i][1]
          IT1 = mod( IT1, IPW2 )
    
          # Convert 48-bit integer to a real number in the interval (0,1)
          x[i] = R*( Float64( IT1 )+R*( Float64( IT2 )+R*( Float64( IT3 )+R*
                     Float64( IT4 ) ) ) )
          if x[i] == 1
            # the value rounded to exactly 1.0; bump the seed and generate again
            i1 += 2
            i2 += 2
            i3 += 2
            i4 += 2
          else
            break
          end
        end
      end
      iseed[] = tuple(IT1,IT2,IT3,IT4) # update the seed
    end
    

    This uses the global constant array to avoid dealing with that big array of values on each function call; otherwise, the only difference is that we use a while loop instead of Fortran’s GOTO loop.

    This gives exactly the same results as the FORTRAN call

    _dlaruv_blas!(
          iseed::Ref{NTuple{4,Int}},
          n::Int,
          x::Vector{Float64}) =
      ccall((LinearAlgebra.BLAS.@blasfunc("dlaruv_"), LinearAlgebra.BLAS.libblas), Cvoid,
        (Ptr{LinearAlgebra.BlasInt}, Ref{LinearAlgebra.BlasInt}, Ptr{Float64}),
        iseed, n, x)
    
    seed1 = Base.RefValue(tuple(1,3,5,7))
    seed2 = Base.RefValue(tuple(1,3,5,7))
    
    x1 = zeros(97); _dlaruv_blas!(seed1, length(x1), x1)
    x2 = similar(x1); _dlaruv!(seed2, length(x2), x2)
    
    @assert x1 == x2
    @assert seed1[] == seed2[]
    
    x3 = zeros(143); _dlaruv_blas!(seed1, length(x3), x3)
    x4 = zeros(axes(x3)...); _dlaruv!(seed2, length(x4), x4)
    
    @assert x3 == x4
    @assert seed1[] == seed2[]
    

    Awesome. But if you look at dlarnv, it also has a statically allocated length 128 array that is not going to be fun in Julia. Doable, but not fun… nor at all necessary. We can actually streamline things by just manually inlining the RNG directly into dlarnv. The only thing we have to remember is that we need to “refresh” the seed every 128 RV generations.

    Since ARPACK only ever calls dlarnv with idist=2, we are just going to port that code.

    function _dlarnv_idist_2!(iseed::Base.RefValue{NTuple{4,Int}}, n::Int, x::Vector{Float64})
      # manually inline dlaruv to avoid excess cruft...
      IPW2=4096
      R = 1/IPW2
      i1 = iseed[][1]
      i2 = iseed[][2]
      i3 = iseed[][3]
      i4 = iseed[][4]
      # setup scope for final call in julia
      IT1 = i1
      IT2 = i2
      IT3 = i3
      IT4 = i4
    
      nrv = 0
    
      @inbounds for i=1:n
        rv = 1.0
        nrv += 1
        if nrv > length(_dlaruv_mm)
          nrv = 1  # update seed information
          i1 = IT1
          i2 = IT2
          i3 = IT3
          i4 = IT4
        end
    
        while true
          # Multiply the seed by i-th power of the multiplier modulo 2**48
          IT4 = i4*_dlaruv_mm[nrv][4]
          IT3 = IT4 ÷ IPW2
          IT4 = IT4 - IPW2*IT3
          IT3 = IT3 + i3*_dlaruv_mm[nrv][4] + i4*_dlaruv_mm[nrv][3]
          IT2 = IT3 ÷ IPW2
          IT3 = IT3 - IPW2*IT2
          IT2 = IT2 + i2*_dlaruv_mm[nrv][4] + i3*_dlaruv_mm[nrv][3] + i4*_dlaruv_mm[nrv][2]
          IT1 = IT2 ÷ IPW2
          IT2 = IT2 - IPW2*IT1
          IT1 = IT1 + i1*_dlaruv_mm[nrv][4] + i2*_dlaruv_mm[nrv][3] + i3*_dlaruv_mm[nrv][2] + i4*_dlaruv_mm[nrv][1]
          IT1 = mod( IT1, IPW2 )
    
          # Convert 48-bit integer to a real number in the interval (0,1)
          rv = R*( Float64( IT1 )+R*( Float64( IT2 )+R*( Float64( IT3 )+R*
                   Float64( IT4 ) ) ) )
          if rv == 1
            # the value rounded to exactly 1.0; bump the seed and generate again
            i1 += 2
            i2 += 2
            i3 += 2
            i4 += 2
          else
            break
          end
        end
        # now, rv is a random value
        # port the line from `dlarnv` for idist=2
        x[i] = 2*rv - 1
      end
      iseed[] = tuple(IT1,IT2,IT3,IT4) # update the seed
    end
    

    This works great!

    seed1 = Base.RefValue(tuple(1,3,5,7))
    seed2 = Base.RefValue(tuple(1,3,5,7))
    
    x1 = zeros(501); _dlarnv_blas!(2, seed1, length(x1), x1)
    x2 = similar(x1); _dlarnv_idist_2!(seed2, length(x2), x2)
    
    @assert x1 == x2
    @assert seed1[] == seed2[]
    
    ##
    x3 = zeros(143); _dlarnv_blas!(2, seed1, length(x3), x3)
    x4 = zeros(axes(x3)...); _dlarnv_idist_2!(seed2, length(x4), x4)
    
    @assert x3 == x4
    @assert seed1[] == seed2[]    
    

    Exactly the same values and exactly the same seeds.

    Things that might happen in the future… I may pull out the key piece of the random number generator into a separate function…

    rv,IT1,IT2,IT3,IT4,i1,i2,i3,i4 = _dlaruv_rng_step(
        nrv,IT1,IT2,IT3,IT4,i1,i2,i3,i4)
    

    Well… I tried that, and it turns out slower, which brings us to …

    Julia is faster than Fortran.

    What’s neat about inlining that code is that we get a small performance boost.

    using BenchmarkTools
    @btime _dlarnv_idist_2!(tseed, length(z), z) setup=begin
      z=zeros(501);
      tseed=Base.Ref(tuple(1,3,5,7)); end
    @btime _dlarnv_blas!(2, tseed, length(z), z) setup=begin
      z=zeros(501);
      tseed=Base.Ref(tuple(1,3,5,7)); end
    

    Gives …

    Julia: 6.686 μs (0 allocations: 0 bytes)

    Fortran: 7.333 μs (0 allocations: 0 bytes)

    So Julia for the win!

    (For reference, with the _dlaruv_rng_step idea, it was 7.0 microseconds, so just a nudge slower. Not sure why, as it would seem to inline to exactly the same code.)

    Appendix

    Getting the values from the Fortran code

    Here is how we extracted that big array of constant values from the FORTRAN code for dlaruv. I believe it’s the same array for slaruv too, but I’d need to double-check that; if so, this code would be useful there as well.

    code = raw"""
    DATA               ( MM( 1, J ), J = 1, 4 ) / 494, 322, 2508,
    $                   2549 /
    DATA               ( MM( 2, J ), J = 1, 4 ) / 2637, 789, 3754,
    $                   1145 /
    DATA               ( MM( 3, J ), J = 1, 4 ) / 255, 1440, 1766,
    $                   2253 /
    DATA               ( MM( 4, J ), J = 1, 4 ) / 2008, 752, 3572,
    $                   305 /
    DATA               ( MM( 5, J ), J = 1, 4 ) / 1253, 2859, 2893,
    $                   3301 /
    DATA               ( MM( 6, J ), J = 1, 4 ) / 3344, 123, 307,
    $                   1065 /
    DATA               ( MM( 7, J ), J = 1, 4 ) / 4084, 1848, 1297,
    $                   3133 /
    DATA               ( MM( 8, J ), J = 1, 4 ) / 1739, 643, 3966,
    $                   2913 /
    DATA               ( MM( 9, J ), J = 1, 4 ) / 3143, 2405, 758,
    $                   3285 /
    DATA               ( MM( 10, J ), J = 1, 4 ) / 3468, 2638, 2598,
    $                   1241 /
    DATA               ( MM( 11, J ), J = 1, 4 ) / 688, 2344, 3406,
    $                   1197 /
    DATA               ( MM( 12, J ), J = 1, 4 ) / 1657, 46, 2922,
    $                   3729 /
    DATA               ( MM( 13, J ), J = 1, 4 ) / 1238, 3814, 1038,
    $                   2501 /
    DATA               ( MM( 14, J ), J = 1, 4 ) / 3166, 913, 2934,
    $                   1673 /
    DATA               ( MM( 15, J ), J = 1, 4 ) / 1292, 3649, 2091,
    $                   541 /
    DATA               ( MM( 16, J ), J = 1, 4 ) / 3422, 339, 2451,
    $                   2753 /
    DATA               ( MM( 17, J ), J = 1, 4 ) / 1270, 3808, 1580,
    $                   949 /
    DATA               ( MM( 18, J ), J = 1, 4 ) / 2016, 822, 1958,
    $                   2361 /
    DATA               ( MM( 19, J ), J = 1, 4 ) / 154, 2832, 2055,
    $                   1165 /
    DATA               ( MM( 20, J ), J = 1, 4 ) / 2862, 3078, 1507,
    $                   4081 /
    DATA               ( MM( 21, J ), J = 1, 4 ) / 697, 3633, 1078,
    $                   2725 /
    DATA               ( MM( 22, J ), J = 1, 4 ) / 1706, 2970, 3273,
    $                   3305 /
    DATA               ( MM( 23, J ), J = 1, 4 ) / 491, 637, 17,
    $                   3069 /
    DATA               ( MM( 24, J ), J = 1, 4 ) / 931, 2249, 854,
    $                   3617 /
    DATA               ( MM( 25, J ), J = 1, 4 ) / 1444, 2081, 2916,
    $                   3733 /
    DATA               ( MM( 26, J ), J = 1, 4 ) / 444, 4019, 3971,
    $                   409 /
    DATA               ( MM( 27, J ), J = 1, 4 ) / 3577, 1478, 2889,
    $                   2157 /
    DATA               ( MM( 28, J ), J = 1, 4 ) / 3944, 242, 3831,
    $                   1361 /
    DATA               ( MM( 29, J ), J = 1, 4 ) / 2184, 481, 2621,
    $                   3973 /
    DATA               ( MM( 30, J ), J = 1, 4 ) / 1661, 2075, 1541,
    $                   1865 /
    DATA               ( MM( 31, J ), J = 1, 4 ) / 3482, 4058, 893,
    $                   2525 /
    DATA               ( MM( 32, J ), J = 1, 4 ) / 657, 622, 736,
    $                   1409 /
    DATA               ( MM( 33, J ), J = 1, 4 ) / 3023, 3376, 3992,
    $                   3445 /
    DATA               ( MM( 34, J ), J = 1, 4 ) / 3618, 812, 787,
    $                   3577 /
    DATA               ( MM( 35, J ), J = 1, 4 ) / 1267, 234, 2125,
    $                   77 /
    DATA               ( MM( 36, J ), J = 1, 4 ) / 1828, 641, 2364,
    $                   3761 /
    DATA               ( MM( 37, J ), J = 1, 4 ) / 164, 4005, 2460,
    $                   2149 /
    DATA               ( MM( 38, J ), J = 1, 4 ) / 3798, 1122, 257,
    $                   1449 /
    DATA               ( MM( 39, J ), J = 1, 4 ) / 3087, 3135, 1574,
    $                   3005 /
    DATA               ( MM( 40, J ), J = 1, 4 ) / 2400, 2640, 3912,
    $                   225 /
    DATA               ( MM( 41, J ), J = 1, 4 ) / 2870, 2302, 1216,
    $                   85 /
    DATA               ( MM( 42, J ), J = 1, 4 ) / 3876, 40, 3248,
    $                   3673 /
    DATA               ( MM( 43, J ), J = 1, 4 ) / 1905, 1832, 3401,
    $                   3117 /
    DATA               ( MM( 44, J ), J = 1, 4 ) / 1593, 2247, 2124,
    $                   3089 /
    DATA               ( MM( 45, J ), J = 1, 4 ) / 1797, 2034, 2762,
    $                   1349 /
    DATA               ( MM( 46, J ), J = 1, 4 ) / 1234, 2637, 149,
    $                   2057 /
    DATA               ( MM( 47, J ), J = 1, 4 ) / 3460, 1287, 2245,
    $                   413 /
    DATA               ( MM( 48, J ), J = 1, 4 ) / 328, 1691, 166,
    $                   65 /
    DATA               ( MM( 49, J ), J = 1, 4 ) / 2861, 496, 466,
    $                   1845 /
    DATA               ( MM( 50, J ), J = 1, 4 ) / 1950, 1597, 4018,
    $                   697 /
    DATA               ( MM( 51, J ), J = 1, 4 ) / 617, 2394, 1399,
    $                   3085 /
    DATA               ( MM( 52, J ), J = 1, 4 ) / 2070, 2584, 190,
    $                   3441 /
    DATA               ( MM( 53, J ), J = 1, 4 ) / 3331, 1843, 2879,
    $                   1573 /
    DATA               ( MM( 54, J ), J = 1, 4 ) / 769, 336, 153,
    $                   3689 /
    DATA               ( MM( 55, J ), J = 1, 4 ) / 1558, 1472, 2320,
    $                   2941 /
    DATA               ( MM( 56, J ), J = 1, 4 ) / 2412, 2407, 18,
    $                   929 /
    DATA               ( MM( 57, J ), J = 1, 4 ) / 2800, 433, 712,
    $                   533 /
    DATA               ( MM( 58, J ), J = 1, 4 ) / 189, 2096, 2159,
    $                   2841 /
    DATA               ( MM( 59, J ), J = 1, 4 ) / 287, 1761, 2318,
    $                   4077 /
    DATA               ( MM( 60, J ), J = 1, 4 ) / 2045, 2810, 2091,
    $                   721 /
    DATA               ( MM( 61, J ), J = 1, 4 ) / 1227, 566, 3443,
    $                   2821 /
    DATA               ( MM( 62, J ), J = 1, 4 ) / 2838, 442, 1510,
    $                   2249 /
    DATA               ( MM( 63, J ), J = 1, 4 ) / 209, 41, 449,
    $                   2397 /
    DATA               ( MM( 64, J ), J = 1, 4 ) / 2770, 1238, 1956,
    $                   2817 /
    DATA               ( MM( 65, J ), J = 1, 4 ) / 3654, 1086, 2201,
    $                   245 /
    DATA               ( MM( 66, J ), J = 1, 4 ) / 3993, 603, 3137,
    $                   1913 /
    DATA               ( MM( 67, J ), J = 1, 4 ) / 192, 840, 3399,
    $                   1997 /
    DATA               ( MM( 68, J ), J = 1, 4 ) / 2253, 3168, 1321,
    $                   3121 /
    DATA               ( MM( 69, J ), J = 1, 4 ) / 3491, 1499, 2271,
    $                   997 /
    DATA               ( MM( 70, J ), J = 1, 4 ) / 2889, 1084, 3667,
    $                   1833 /
    DATA               ( MM( 71, J ), J = 1, 4 ) / 2857, 3438, 2703,
    $                   2877 /
    DATA               ( MM( 72, J ), J = 1, 4 ) / 2094, 2408, 629,
    $                   1633 /
    DATA               ( MM( 73, J ), J = 1, 4 ) / 1818, 1589, 2365,
    $                   981 /
    DATA               ( MM( 74, J ), J = 1, 4 ) / 688, 2391, 2431,
    $                   2009 /
    DATA               ( MM( 75, J ), J = 1, 4 ) / 1407, 288, 1113,
    $                   941 /
    DATA               ( MM( 76, J ), J = 1, 4 ) / 634, 26, 3922,
    $                   2449 /
    DATA               ( MM( 77, J ), J = 1, 4 ) / 3231, 512, 2554,
    $                   197 /
    DATA               ( MM( 78, J ), J = 1, 4 ) / 815, 1456, 184,
    $                   2441 /
    DATA               ( MM( 79, J ), J = 1, 4 ) / 3524, 171, 2099,
    $                   285 /
    DATA               ( MM( 80, J ), J = 1, 4 ) / 1914, 1677, 3228,
    $                   1473 /
    DATA               ( MM( 81, J ), J = 1, 4 ) / 516, 2657, 4012,
    $                   2741 /
    DATA               ( MM( 82, J ), J = 1, 4 ) / 164, 2270, 1921,
    $                   3129 /
    DATA               ( MM( 83, J ), J = 1, 4 ) / 303, 2587, 3452,
    $                   909 /
    DATA               ( MM( 84, J ), J = 1, 4 ) / 2144, 2961, 3901,
    $                   2801 /
    DATA               ( MM( 85, J ), J = 1, 4 ) / 3480, 1970, 572,
    $                   421 /
    DATA               ( MM( 86, J ), J = 1, 4 ) / 119, 1817, 3309,
    $                   4073 /
    DATA               ( MM( 87, J ), J = 1, 4 ) / 3357, 676, 3171,
    $                   2813 /
    DATA               ( MM( 88, J ), J = 1, 4 ) / 837, 1410, 817,
    $                   2337 /
    DATA               ( MM( 89, J ), J = 1, 4 ) / 2826, 3723, 3039,
    $                   1429 /
    DATA               ( MM( 90, J ), J = 1, 4 ) / 2332, 2803, 1696,
    $                   1177 /
    DATA               ( MM( 91, J ), J = 1, 4 ) / 2089, 3185, 1256,
    $                   1901 /
    DATA               ( MM( 92, J ), J = 1, 4 ) / 3780, 184, 3715,
    $                   81 /
    DATA               ( MM( 93, J ), J = 1, 4 ) / 1700, 663, 2077,
    $                   1669 /
    DATA               ( MM( 94, J ), J = 1, 4 ) / 3712, 499, 3019,
    $                   2633 /
    DATA               ( MM( 95, J ), J = 1, 4 ) / 150, 3784, 1497,
    $                   2269 /
    DATA               ( MM( 96, J ), J = 1, 4 ) / 2000, 1631, 1101,
    $                   129 /
    DATA               ( MM( 97, J ), J = 1, 4 ) / 3375, 1925, 717,
    $                   1141 /
    DATA               ( MM( 98, J ), J = 1, 4 ) / 1621, 3912, 51,
    $                   249 /
    DATA               ( MM( 99, J ), J = 1, 4 ) / 3090, 1398, 981,
    $                   3917 /
    DATA               ( MM( 100, J ), J = 1, 4 ) / 3765, 1349, 1978,
    $                   2481 /
    DATA               ( MM( 101, J ), J = 1, 4 ) / 1149, 1441, 1813,
    $                   3941 /
    DATA               ( MM( 102, J ), J = 1, 4 ) / 3146, 2224, 3881,
    $                   2217 /
    DATA               ( MM( 103, J ), J = 1, 4 ) / 33, 2411, 76,
    $                   2749 /
    DATA               ( MM( 104, J ), J = 1, 4 ) / 3082, 1907, 3846,
    $                   3041 /
    DATA               ( MM( 105, J ), J = 1, 4 ) / 2741, 3192, 3694,
    $                   1877 /
    DATA               ( MM( 106, J ), J = 1, 4 ) / 359, 2786, 1682,
    $                   345 /
    DATA               ( MM( 107, J ), J = 1, 4 ) / 3316, 382, 124,
    $                   2861 /
    DATA               ( MM( 108, J ), J = 1, 4 ) / 1749, 37, 1660,
    $                   1809 /
    DATA               ( MM( 109, J ), J = 1, 4 ) / 185, 759, 3997,
    $                   3141 /
    DATA               ( MM( 110, J ), J = 1, 4 ) / 2784, 2948, 479,
    $                   2825 /
    DATA               ( MM( 111, J ), J = 1, 4 ) / 2202, 1862, 1141,
    $                   157 /
    DATA               ( MM( 112, J ), J = 1, 4 ) / 2199, 3802, 886,
    $                   2881 /
    DATA               ( MM( 113, J ), J = 1, 4 ) / 1364, 2423, 3514,
    $                   3637 /
    DATA               ( MM( 114, J ), J = 1, 4 ) / 1244, 2051, 1301,
    $                   1465 /
    DATA               ( MM( 115, J ), J = 1, 4 ) / 2020, 2295, 3604,
    $                   2829 /
    DATA               ( MM( 116, J ), J = 1, 4 ) / 3160, 1332, 1888,
    $                   2161 /
    DATA               ( MM( 117, J ), J = 1, 4 ) / 2785, 1832, 1836,
    $                   3365 /
    DATA               ( MM( 118, J ), J = 1, 4 ) / 2772, 2405, 1990,
    $                   361 /
    DATA               ( MM( 119, J ), J = 1, 4 ) / 1217, 3638, 2058,
    $                   2685 /
    DATA               ( MM( 120, J ), J = 1, 4 ) / 1822, 3661, 692,
    $                   3745 /
    DATA               ( MM( 121, J ), J = 1, 4 ) / 1245, 327, 1194,
    $                   2325 /
    DATA               ( MM( 122, J ), J = 1, 4 ) / 2252, 3660, 20,
    $                   3609 /
    DATA               ( MM( 123, J ), J = 1, 4 ) / 3904, 716, 3285,
    $                   3821 /
    DATA               ( MM( 124, J ), J = 1, 4 ) / 2774, 1842, 2046,
    $                   3537 /
    DATA               ( MM( 125, J ), J = 1, 4 ) / 997, 3987, 2107,
    $                   517 /
    DATA               ( MM( 126, J ), J = 1, 4 ) / 2573, 1368, 3508,
    $                   3017 /
    DATA               ( MM( 127, J ), J = 1, 4 ) / 1148, 1848, 3525,
    $                   2141 /
    DATA               ( MM( 128, J ), J = 1, 4 ) / 545, 2366, 3801,
    $                   1537 /
    """
    
    lines = split(code, "\n"; keepempty=false)
    for i = 1:2:length(lines)
      l1 = lines[i]
      l2 = lines[i+1] # next line  with last piece
      vals13str = split(l1, "/")[2] # values 1 to 3
      vals13 = parse.(Int, split(vals13str, ","; keepempty=false))
      vals14 = push!(vals13, parse(Int, split(l2[2:end])[1]))
      println("tuple(", join(vals14, ","), "),")
    end
    
  • ARPACK Sorting Take 2 (Part ... blah in the ongoing series...)

    Last post on sorting was a disaster. I tried to do something quick and simple in Julia and ran into all sorts of issues with things I didn’t quite understand and breaks in my mental model of how Julia works. That was all just to write real Julia code for the following task: sort an array and take another along for the ride, as is done in the following code.

    This post is straightforward. We are simply going to port this code.

      c-----------------------------------------------------------------------
      c\BeginDoc
      c
      c\Name: dsortr
      c
      c\Description:
      c  Sort the array X1 in the order specified by WHICH and optionally
      c  applies the permutation to the array X2.
      c
      c\Usage:
      c  call dsortr
      c     ( WHICH, APPLY, N, X1, X2 )
      c
      c\Arguments
      c  WHICH   Character*2.  (Input)
      c          'LM' -> X1 is sorted into increasing order of magnitude.
      c          'SM' -> X1 is sorted into decreasing order of magnitude.
      c          'LA' -> X1 is sorted into increasing order of algebraic.
      c          'SA' -> X1 is sorted into decreasing order of algebraic.
      c
      c  APPLY   Logical.  (Input)
      c          APPLY = .TRUE.  -> apply the sorted order to X2.
      c          APPLY = .FALSE. -> do not apply the sorted order to X2.
      c
      c  N       Integer.  (INPUT)
      c          Size of the arrays.
      c
      c  X1      Double precision array of length N.  (INPUT/OUTPUT)
      c          The array to be sorted.
      c
      c  X2      Double precision array of length N.  (INPUT/OUTPUT)
      c          Only referenced if APPLY = .TRUE.
      c
      c\EndDoc
      c
      c-----------------------------------------------------------------------
      c
      c\BeginLib
      c
      c\Author
      c     Danny Sorensen               Phuong Vu
      c     Richard Lehoucq              CRPC / Rice University
      c     Dept. of Computational &     Houston, Texas
      c     Applied Mathematics
      c     Rice University
      c     Houston, Texas
      c
      c\Revision history:
      c     12/16/93: Version ' 2.1'.
      c               Adapted from the sort routine in LANSO.
      c
      c\SCCS Information: @(#)
      c FILE: sortr.F   SID: 2.3   DATE OF SID: 4/19/96   RELEASE: 2
      c
      c\EndLib
      c
      c-----------------------------------------------------------------------
      c
            subroutine dsortr (which, apply, n, x1, x2)
      c
      c     %------------------%
      c     | Scalar Arguments |
      c     %------------------%
      c
            character*2 which
            logical    apply
            integer    n
      c
      c     %-----------------%
      c     | Array Arguments |
      c     %-----------------%
      c
            Double precision
           &           x1(0:n-1), x2(0:n-1)
      c
      c     %---------------%
      c     | Local Scalars |
      c     %---------------%
      c
            integer    i, igap, j
            Double precision
           &           temp
      c
      c     %-----------------------%
      c     | Executable Statements |
      c     %-----------------------%
      c
            igap = n / 2
      c
             if (which .eq. 'SA') then
      c
      c        X1 is sorted into decreasing order of algebraic.
      c
         10    continue
               if (igap .eq. 0) go to 9000
               do 30 i = igap, n-1
                  j = i-igap
         20       continue
      c
                  if (j.lt.0) go to 30
      c
                  if (x1(j).lt.x1(j+igap)) then
                     temp = x1(j)
                     x1(j) = x1(j+igap)
                     x1(j+igap) = temp
                     if (apply) then
                        temp = x2(j)
                        x2(j) = x2(j+igap)
                        x2(j+igap) = temp
                     end if
                  else
                     go to 30
                  endif
                  j = j-igap
                  go to 20
         30    continue
               igap = igap / 2
               go to 10
      c
            else if (which .eq. 'SM') then
      c
      c        X1 is sorted into decreasing order of magnitude.
      c
         40    continue
               if (igap .eq. 0) go to 9000
               do 60 i = igap, n-1
                  j = i-igap
         50       continue
      c
                  if (j.lt.0) go to 60
      c
                  if (abs(x1(j)).lt.abs(x1(j+igap))) then
                     temp = x1(j)
                     x1(j) = x1(j+igap)
                     x1(j+igap) = temp
                     if (apply) then
                        temp = x2(j)
                        x2(j) = x2(j+igap)
                        x2(j+igap) = temp
                     end if
                  else
                     go to 60
                  endif
                  j = j-igap
                  go to 50
         60    continue
               igap = igap / 2
               go to 40
      c
            else if (which .eq. 'LA') then
      c
      c        X1 is sorted into increasing order of algebraic.
      c
         70    continue
               if (igap .eq. 0) go to 9000
               do 90 i = igap, n-1
                  j = i-igap
         80       continue
      c
                  if (j.lt.0) go to 90
      c
                  if (x1(j).gt.x1(j+igap)) then
                     temp = x1(j)
                     x1(j) = x1(j+igap)
                     x1(j+igap) = temp
                     if (apply) then
                        temp = x2(j)
                        x2(j) = x2(j+igap)
                        x2(j+igap) = temp
                     end if
                  else
                     go to 90
                  endif
                  j = j-igap
                  go to 80
         90    continue
               igap = igap / 2
               go to 70
      c
            else if (which .eq. 'LM') then
      c
      c        X1 is sorted into increasing order of magnitude.
      c
        100    continue
               if (igap .eq. 0) go to 9000
               do 120 i = igap, n-1
                  j = i-igap
        110       continue
      c
                  if (j.lt.0) go to 120
      c
                  if (abs(x1(j)).gt.abs(x1(j+igap))) then
                     temp = x1(j)
                     x1(j) = x1(j+igap)
                     x1(j+igap) = temp
                     if (apply) then
                        temp = x2(j)
                        x2(j) = x2(j+igap)
                        x2(j+igap) = temp
                     end if
                  else
                     go to 120
                  endif
                  j = j-igap
                  go to 110
        120    continue
               igap = igap / 2
               go to 100
            end if
      c
       9000 continue
            return
      c
      c     %---------------%
      c     | End of dsortr |
      c     %---------------%
      c
            end
    

    This code has the same routine four times with different compare functions. Let’s pull out one routine.

               if (which .eq. 'SA') then
      c
      c        X1 is sorted into decreasing order of algebraic.
      c
         10    continue
               if (igap .eq. 0) go to 9000
               do 30 i = igap, n-1
                  j = i-igap
         20       continue
      c
                  if (j.lt.0) go to 30
      c
                  if (x1(j).lt.x1(j+igap)) then
                     temp = x1(j)
                     x1(j) = x1(j+igap)
                     x1(j+igap) = temp
                     if (apply) then
                        temp = x2(j)
                        x2(j) = x2(j+igap)
                        x2(j+igap) = temp
                     end if
                  else
                     go to 30
                  endif
                  j = j-igap
                  go to 20
         30    continue
               igap = igap / 2
               go to 10
      c
            else if (which .eq. 'SM') then
    

    Things to note. First, the Fortran code uses Fortran’s flexible array indexing to declare the arrays with 0-based (C-style) indices. While this is possible in Julia, we are just going to handle the index shift manually. Again, we are trying to keep this simple. (Despite the last blog post.)

    The first step is just to unwrap the loops and GOTOs. In this case, I find an outer-to-inner strategy helpful.

    The outer loop (statement label 10) seems to be the following

    while igap != 0
      # stuff
      igap = div(igap,2)
    end
    

    The next loops (labels 20 and 30) seem to encode the pattern.

    for i=igap:(n-1)
      j = i-igap
  while j >= 0
        if cmp(j,j+igap)
          # swap
        else
          break # out of while loop
        end
        j = j-igap
      end
    end
    

    Now, we assemble these into a simple function. Other changes made are:

    * Predefined strings become symbols.
    * Rather than repeating the code four times, we use functions to handle the comparison.
    * Add a new macro to handle checking the length of input vectors.

      """
      Sort the array X1 in the order specified by WHICH and optionally
      applies the permutation to the array X2.
    
      Usage:
      call dsortr
         ( WHICH, APPLY, N, X1, X2 )
    
      Arguments
      WHICH   Symbol  (Input)
              :LM -> X1 is sorted into increasing order of magnitude.
              :SM -> X1 is sorted into decreasing order of magnitude.
              :LA -> X1 is sorted into increasing order of algebraic.
              :SA -> X1 is sorted into decreasing order of algebraic.
    
      APPLY   Boolean.  (Input)
              APPLY = .TRUE.  -> apply the sorted order to X2.
              APPLY = .FALSE. -> do not apply the sorted order to X2.
    
      N       Integer.  (INPUT)
              Size of the arrays.
    
      X1      Double precision array of length N.  (INPUT/OUTPUT)
              The array to be sorted.
    
      X2      Double precision array of length N.  (INPUT/OUTPUT)
              Only referenced if APPLY = .TRUE.
    
      Returns: nothing
    
      """
      function dsortr(
        which::Symbol, # Input
        apply::Bool, # Input
        n::Int, # Input
        x1::Vector{Float64}, # Input/Output
        x2::Vector{Float64}, # Input/Output
        )
    
        @jl_arpack_check_length(x1, n)
        if apply
          @jl_arpack_check_length(x2, n)
        end
    
        igap = div(n,2) # integer division
    
        # the ARPACK sorting routine is all the same, just changes the
        # comparison, Julia gives us a way of stating that very easily.
        if which == :SA
          cmp = (x::Float64, y::Float64) -> (x < y)
        elseif which == :SM
          cmp = (x::Float64, y::Float64) -> (abs(x) < abs(y))
        elseif which == :LA
          cmp = (x::Float64, y::Float64) -> (x > y)
        elseif which == :LM
          cmp = (x::Float64, y::Float64) -> (abs(x) > abs(y))
        end
    
        while igap != 0
          for i=igap:n-1
            j = i-igap
            while j >= 0
              # in the fortran code, it uses indices 0, n-1
              if cmp(x1[j+1],x1[j+igap+1])
                x1[j+1],x1[j+igap+1] = x1[j+igap+1],x1[j+1]
                if apply
                  x2[j+1],x2[j+igap+1] = x2[j+igap+1],x2[j+1]
                end
              else
                break # go to 30, which means to exit while loop
              end
              j = j - igap
            end
          end
          igap = div(igap, 2)
        end
      end
    

    The macro for parameter checking is pretty simple, although it took me about an hour to get all the various options correct.

    macro jl_arpack_check_length(var,len)
      varstr = string(var)
      lenstr = string(len)
      errstr1 = "range 1:$lenstr="
      errstr2 = " out of bounds for $varstr of length "
    
      return esc( quote
        if $len > length($var)
          throw(ArgumentError(string($errstr1, $len, $errstr2, length($var))))
        end
      end)
    end
    
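The `esc(quote … end)` shape generalizes. Here is a minimal, self-contained macro with the same structure — the name `@check_positive` is just for illustration, not part of the port:

```julia
# Minimal macro with the same esc(quote ... end) shape as the
# length-check macro: interpolate the variable *name* as a string
# for the error message, and escape the body so $var resolves in
# the caller's scope.
macro check_positive(var)
    varstr = string(var)
    return esc(quote
        if $var <= 0
            throw(ArgumentError(string($varstr, " must be positive, got ", $var)))
        end
    end)
end

n = 5
@check_positive(n)  # passes silently; n <= 0 would throw an ArgumentError
```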

    To figure out the `string` call, I had to look at

    x = "Hi"
    mytest(x) = "My string $x"
    @code_lowered mytest(x)
    

    to find how string interpolation actually gets lowered to the `string` function.
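
For reference, here is a self-contained check that interpolation is just sugar for a `string` call:

```julia
# String interpolation lowers to a call to the string function.
x = "Hi"
s1 = "My string $x"
s2 = string("My string ", x)  # the lowered form
s1 == s2  # true
```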

    Next up was testing. I wanted to run some tests on all permutation vectors.

    While I could add a dependency on Combinatorics.jl to the testing repository, I wanted to keep this simple. We don’t need it to be terribly efficient. Here is what I came up with.

      function all_permutations(v::Vector{Float64};
          all::Vector{Vector{Float64}}=Vector{Vector{Float64}}(),
          prefix::Vector{Float64}=zeros(0))
    
        if length(v) == 0
          push!(all, prefix)
        end
    
        for i in eachindex(v)
          all_permutations(deleteat!(copy(v), i); prefix=push!(copy(prefix), v[i]), all)
        end
        return all
      end
    

    Then our testing code runs everything in that set to test and make sure we get the correct sort.
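
Here is a sketch of the kind of oracle such a test can use — my reconstruction with `Base.sort`, not the repository's actual test code:

```julia
# Reference orderings for the four `which` modes via Base.sort:
#   :LA increasing algebraic, :SA decreasing algebraic,
#   :LM increasing magnitude, :SM decreasing magnitude.
reference_sort(which::Symbol, x::Vector{Float64}) =
    which == :LA ? sort(x) :
    which == :SA ? sort(x; rev=true) :
    which == :LM ? sort(x; by=abs) :
    which == :SM ? sort(x; by=abs, rev=true) :
    throw(ArgumentError("unknown which=$which"))

reference_sort(:LM, [3.0, -5.0, 1.0])  # == [1.0, 3.0, -5.0]
```

Running `dsortr` on every permutation of a small vector and comparing `x1` against `reference_sort(which, x1)` exercises all four branches.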

    A question

    What is this sorting algorithm doing? I just ported the code. It seems to sort elements separated by a gap (igap), and then, as the gap halves, merge these gapped sorted lists into longer sorted lists. It looks like a textbook Shellsort.
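
For comparison, here is a minimal, self-contained Shellsort with the same n/2, n/4, …, 1 gap sequence and the same gapped-swap inner loop (1-based indices, increasing order; this is the classic algorithm, not the ARPACK code):

```julia
# Minimal Shellsort: sort within gapped sublists, then halve the gap.
function shellsort!(a::Vector{Float64})
    gap = div(length(a), 2)
    while gap >= 1
        for i in gap+1:length(a)
            j = i
            # bubble a[j] back through its gapped sublist
            while j > gap && a[j-gap] > a[j]
                a[j-gap], a[j] = a[j], a[j-gap]
                j -= gap
            end
        end
        gap = div(gap, 2)
    end
    return a
end

shellsort!([3.0, 1.0, 2.0, -5.0])  # == [-5.0, 1.0, 2.0, 3.0]
```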

    Some history

    There is a comment in the code that references LANSO. Here is the original routine from LANSO (found in svdpack by Mike Berry). So we might have to chat with Beresford Parlett for more information on this algorithm.

      C
      C @(#)DSORT2.F    3.2 (BNP) 12/9/88
      C
            SUBROUTINE DSORT2(N,ARRAY1,ARRAY2)
            INTEGER N
            REAL*8 ARRAY1(0:N-1),ARRAY2(0:N-1)
      C
      C.... SORT ARRAY1 AND ARRAY2 INTO INCREASING ORDER FOR ARRAY1
      C
            INTEGER IGAP,I,J
            REAL*8 TEMP
      C
            IGAP = N/2
       10   IF (IGAP.GT.0) THEN
              DO 200 I = IGAP,N-1
                J = I-IGAP
       50       IF (J.GE.0.AND.ARRAY1(J).GT.ARRAY1(J+IGAP)) THEN
                    TEMP = ARRAY1(J)
                    ARRAY1(J) = ARRAY1(J+IGAP)
                    ARRAY1(J+IGAP) = TEMP
                    TEMP = ARRAY2(J)
                    ARRAY2(J) = ARRAY2(J+IGAP)
                    ARRAY2(J+IGAP) = TEMP
                ELSE
                  GO TO 200
                ENDIF
                J = J-IGAP
                GO TO 50
       200    CONTINUE
            ELSE
              RETURN
            ENDIF
            IGAP = IGAP/2
            GO TO 10
            END
    
  • Me on learning about `save` variables in Fortran and accessing them from Julia. (Part ... something ... in my ongoing segment on porting ARPACK to Julia)

    Huh, what are these things in a FORTRAN code marked `save`? Oh… jeez, is that persistent state in a subroutine? i.e., like global variables? Ugh, that is going to be annoying to port. (More on a strategy in a bit.)

    e.g. dgetv0.f in ARPACK

      save       first, iseed, inits, iter, msglvl, orth, rnorm0
    
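(As an aside on the porting strategy: one sketch is to hold the persistent state in a module-level `Ref`; the names `fake_getv0_init` and `_fake_getv0_first` below are hypothetical, purely for illustration.)

```julia
# Sketch: emulate a Fortran `save` variable with a module-level Ref,
# so the state persists across calls like a C static.
const _fake_getv0_first = Ref(true)   # stands in for `save first`

function fake_getv0_init()
    if _fake_getv0_first[]   # mirrors the one-time `if (first)` guard
        _fake_getv0_first[] = false
        return :initialized
    end
    return :already_initialized
end
```

The first call returns `:initialized`; every later call returns `:already_initialized`, just like the one-time setup guarded by `first` in dgetv0.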

    How could these possibly be implemented in a shared library? Well, most likely they are just fixed offsets in memory somewhere that the subroutine knows about and manipulates. So can I see them from Julia?

    • Try dumping the dylib/so file to find if these are listed in there. (They ought to be as they should take up persistent locations in memory…) (Any of the following commands show them…)

      nm -a libarpack.2.0.0.dylib
      objdump -p libarpack.2.0.0.dylib
      dsymutil -s libarpack.2.0.0.dylib
      nm -nm libarpack.2.0.0.dylib
      
    • They are! For example, in this case I picked `iseed`, which is the ARPACK random number seed/state.

      dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep iseed
      000000000005d060 (__DATA,__bss5) non-external _iseed.3637
      000000000005d080 (__DATA,__bss5) non-external _iseed.3636
      000000000005d0a0 (__DATA,__bss5) non-external _iseed.3636
      000000000005d0c0 (__DATA,__bss5) non-external _iseed.3637
      

    Well, there are a few things listed here… but probably something we can resolve via code.

    • Try and lookup symbols via dlsym

      using Arpack
      using Arpack_jll
      using LinearAlgebra
      libar = Base.Libc.dlopen(Arpack_jll.libarpack)
      ## Try and get offset of private symbol directly
      offset = Base.Libc.dlsym(libar, "iseed.3637")
      offset = Base.Libc.dlsym(libar, "_iseed.3637")
      offset = Base.Libc.dlsym(libar, "iseed")
      

      This doesn’t work because `dlsym` only has exported/public symbols accessible. (Why, I don’t know, because they are in the darn file, so it’s like they want to make it harder to access things… which I guess is a good thing?)

    • Try and look at offsets from the load base. e.g. if the dylib/so file says that function dgetv0 is at address 0x0010f00 and I load that at address 0x5020e00, then the offset to other addresses should be the same… which it is for functions (easy to verify)

      dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep getv0
      nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep getv0
      
      000000000001f2a0 (__TEXT,__text) external _dgetv0_
      
      julia> offset = Base.Libc.dlsym(libar, "dgetv0_")
      Ptr{Nothing} @0x0000000170bac2a0
      
      julia> base_offset = offset - 0x000000000001f2a0
      Ptr{Nothing} @0x0000000170b8d000
      

    So we have a base offset, which means that other functions and symbols should respect that…

        dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep arscnd
        0000000000008a00 (__TEXT,__text) external _arscnd_
    
        julia> arscnd_offset = Base.Libc.dlsym(libar, "arscnd_")
        Ptr{Nothing} @0x0000000170b95a00
    
        julia> arscnd_offset == 0x0000000000008a00 + base_offset
        true
    
    Awesome, so let's check out some of those iseed variables...
    
        julia> iseed_offsets = [0x5d060, 0x5d080, 0x5d0a0, 0x5d0c0]
        julia> iseeds = Ptr{Int64}.(base_offset .+ iseed_offsets)
    
        julia> unsafe_load.(iseeds)
        4-element Vector{Int64}:
         0
         0
         0
         0
    
        julia> unsafe_load.(iseeds .+ 8)
        4-element Vector{Int64}:
         0
         0
         0
         0
    
    Okay, so these iseed variables are currently 0, because we haven't
    run anything yet. (Internally in `dgetv0` they are used to track
    the state of the random number generator... )
    
        julia> using Arpack; eigs(randn(50,50))
        julia> unsafe_load.(iseeds)
    
    Hmm. That shows the same values. Maybe another set? ARPACK also uses
    `first` for one-time routine initialization, tracked with `save`
    variables.
    
        $ nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep first
    
        julia> first_offsets="""000000000005c398 (__DATA,__data) non-external _first.3636
        000000000005c3a0 (__DATA,__data) non-external _first.3637
        000000000005c3b0 (__DATA,__data) non-external _first.3635
        000000000005c3b8 (__DATA,__data) non-external _first.3640
        000000000005c3c0 (__DATA,__data) non-external _first.3634
        000000000005c3c8 (__DATA,__data) non-external _first.3639
        000000000005c3d8 (__DATA,__data) non-external _first.3635
        000000000005c3e0 (__DATA,__data) non-external _first.3640
        000000000005c3e8 (__DATA,__data) non-external _first.3634
        000000000005c3f0 (__DATA,__data) non-external _first.3639
        000000000005c400 (__DATA,__data) non-external _first.3636
        000000000005c408 (__DATA,__data) non-external _first.3637
        000000000005c780 (__DATA,__bss3) non-external _first.3633
        000000000005c8d8 (__DATA,__bss3) non-external _first.3632
        000000000005cc08 (__DATA,__bss3) non-external _first.3632
        000000000005cea8 (__DATA,__bss3) non-external _first.3633""" |>
          x->split(x,"\n") |>
          x->map(y->y[1:16],x) |>
          x->parse.(UInt64, x[1:16];base=16)
    
    (Sorry for the syntax fanciness; `"0x001" |> x->parse(Int,x)` just creates a little
    mini-function and chains a set of function evaluations. Handy for little
    sequences like that where I split, then break apart, then parse each.)
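
The same chaining shape on a tiny self-contained example:

```julia
# Split a multi-line string, then parse each line as hex.
vals = """0a
10
ff""" |>
    x -> split(x, "\n") |>
    x -> parse.(Int, x; base=16)
# vals == [10, 16, 255]
```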
    
        julia> firsts = Ptr{Int64}.(base_offset .+ first_offsets)
        julia> unsafe_load.(firsts)
        16-element Vector{Int64}:
         1
         1
         1
         1
         1
         1
         1
         1
         1
         1
         1
         1
         0
         0
         0
         0
    
    Which means that everything hasn't really been run yet. (This is the
    same right after loading the library. The 0's at the end
    are there as well.)
    
    • So, hmm… that doesn’t seem to work. (Or at least I don’t see changes I expect to see…) But still the function addresses are exactly where I predict them.

    • Try and check out disassembled routines for help. Learn how to get disassembled code from objdump.

      dgleich@circulant test-arpack-state % objdump -D /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | less    
      
      000000000001f2a0 _dgetv0_:
         1f2a0: 41 57                         pushq   %r15
         1f2a2: 41 56                         pushq   %r14
         1f2a4: 49 89 d7                      movq    %rdx, %r15
         1f2a7: 41 55                         pushq   %r13
         1f2a9: 41 54                         pushq   %r12
         1f2ab: 49 89 fd                      movq    %rdi, %r13
         1f2ae: 55                            pushq   %rbp
         1f2af: 53                            pushq   %rbx
         1f2b0: 49 89 f4                      movq    %rsi, %r12
         1f2b3: 4c 89 c3                      movq    %r8, %rbx
         1f2b6: 4d 89 ce                      movq    %r9, %r14
         1f2b9: 48 83 ec 28                   subq    $40, %rsp
         1f2bd: 48 83 3d e3 d0 03 00 00       cmpq    $0, 250083(%rip)
         1f2c5: 74 29                         je      41 <_dgetv0_+0x50>
         1f2c7: 66 0f 6f 05 31 f4 02 00       movdqa  193585(%rip), %xmm0
         1f2cf: 48 c7 05 ce d0 03 00 00 00 00 00      movq    $0, 250062(%rip)
         1f2da: 0f 29 05 9f dd 03 00          movaps  %xmm0, 253343(%rip)
         1f2e1: 66 0f 6f 05 27 f4 02 00       movdqa  193575(%rip), %xmm0
         1f2e9: 0f 29 05 a0 dd 03 00          movaps  %xmm0, 253344(%rip)
         1f2f0: 49 83 7d 00 00                cmpq    $0, (%r13)
         1f2f5: 0f 84 85 02 00 00             je      645 <_dgetv0_+0x2e0>
         1f2fb: 41 0f b6 04 24                movzbl  (%r12), %eax
      

      In case you want a spoiler, it’s cmpq $0, 250083(%rip) that checks “if first == 1” on the first subroutine call and so the address of first is 0x1f2c5 + 250083 (the NEXT instruction offset + %rip offset) so the offset is actually 0x0005c3a8 (just add the numbers and write in hex).

      (This is actually off by just a bit because of how Fortran seems to store logicals. There are a few details I don’t fully understand but it’s close enough I’m not worried. )
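
The address arithmetic is easy to double-check (values taken from the disassembly above):

```julia
# %rip-relative addressing: effective address = address of the NEXT
# instruction plus the signed displacement encoded in the instruction.
next_insn = 0x1f2c5   # instruction after the cmpq at 0x1f2bd
disp      = 250083    # displacement from `cmpq $0, 250083(%rip)`
target    = next_insn + disp
string(target, base = 16)  # "5c3a8"
```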

    • Next up: check the code on Linux. The Julia code I worked on above seems to work there????!??!? Everything I was doing above should really have worked?? Amazing. (Just had to pull new offsets from that libarpack…)

    • But wait, gosh, why doesn’t this work on the mac I was sitting in front of? Is this some weirdo Mac dlsym loader/protection system that randomizes offsets… and keeps it from working on the mac?

    • Develop C/C++ code to do the same, verify that the addresses I’m getting for functions seem correct. (Meanwhile, learn how to use lldb…!) Okay, everything lldb seems to be doing is exactly what I think ought to work. (See the C/C++ code below.)

    • Check that I can use the C/C++ code and pointer offsets, as I’m doing in Julia, on the mac in order to verify the approach outside of Julia.

    • Learn more details about position independent code and why everything is in terms of offsets from the instruction pointer %rip. (See that disassembled code above for an example)

    • Start learning about ccall, julia library loading. (Hoping to find some pointer as to why there would be this weirdness… )

    • Run into this thread on dlsym,dlopen

    • Double check to see that I’ve gotten the right library loaded…

      julia> Base.Libc.dllist()
      152-element Vector{String}:
       "/usr/lib/libSystem.B.dylib"
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/libjulia.1.6.dylib"
       "/usr/lib/system/libcache.dylib"
       "/usr/lib/system/libcommonCrypto.dylib"
       "/usr/lib/system/libcompiler_rt.dylib"
       "/usr/lib/system/libcopyfile.dylib"
       "/usr/lib/system/libcorecrypto.dylib"
       "/usr/lib/system/libdispatch.dylib"
       "/usr/lib/system/libdyld.dylib"
       "/usr/lib/system/libkeymgr.dylib"
       ⋮
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libmbedx509.2.24.0.dylib"
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libmbedcrypto.2.24.0.dylib"
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libdSFMT.dylib"
       "/System/Library/PrivateFrameworks/DebugSymbols.framework/Versions/A/DebugSymbols"
       "/System/Library/PrivateFrameworks/CoreServicesInternal.framework/Versions/A/CoreServicesInternal"
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libstdc++.6.dylib"
       "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libgomp.1.dylib"
       "/Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib"
       "/Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib"
      
    • Oh…

    • You see that? In the bottom there? There are two libarpacks listed.

    • I was looking at the wrong one. :(

    • 🤦

    • Just for kicks now…

      dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib | grep iseed
      000000000005afa0 (__DATA,__bss5) non-external _iseed.3637
      000000000005afc0 (__DATA,__bss5) non-external _iseed.3636
      000000000005afe0 (__DATA,__bss5) non-external _iseed.3636
      000000000005b000 (__DATA,__bss5) non-external _iseed.3637
      
      julia> libar = Base.Libc.dlopen("/Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib")
      julia> offset = Base.Libc.dlsym(libar, "dgetv0_")
      Ptr{Nothing} @0x000000017596f730
      
      dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib | grep dgetv0
      000000000001e730 (__TEXT,__text) external _dgetv0_
      
      julia> base_offset = offset - 0x000000000001e730
      Ptr{Nothing} @0x0000000175951000
      
      julia> libar = Base.Libc.dlopen(Arpack_jll.libarpack)
      julia> iseed_offsets = [0x5afa0, 0x5afc0, 0x5afe0, 0x5b000]
      julia> iseeds = Ptr{Int64}.(base_offset .+ iseed_offsets)
      
      julia> unsafe_load.(iseeds)
      
      julia> unsafe_load.(iseeds)
      4-element Vector{Int64}:
          0
       1626
          0
          0
      
      julia> eigs(randn(50,50))
      julia>  unsafe_load.(iseeds)
      4-element Vector{Int64}:
         0
       561
         0
         0
      
    • 🤦 indeed. (This took way more time than I would have liked. Approx. a full day.)

    • Should have trusted the fact that this all worked on Linux, where somehow Arpack is using the same library as Arpack_jll.

    • One other thing I learned: on the macOS build of ARPACK, there is no timing information collected. The arscnd function (which should return some time value in seconds) is just a no-op.

      0000000000008a00 _arscnd_:
          8a00: c7 07 00 00 00 00             movl    $0, (%rdi)
          8a06: c3                            retq
          8a07: 90                            nop
          8a08: 90                            nop
          8a09: 90                            nop
          8a0a: 90                            nop
          8a0b: 90                            nop
          8a0c: 90                            nop
          8a0d: 90                            nop
          8a0e: 90                            nop
          8a0f: 90                            nop
      

      (Or rather, this seems to write 0 to the location passed into arscnd!)

      I learned this, of course, because one of the things ARPACK does is collect timing information that persists between function calls, so it can time user code via the call-backs and such.

      So in debugging, I was often looking at locations where time information ought to be stored. Finding it continually empty was … surprising.

    • Another thing I learned: you can access the common blocks of FORTRAN code via `cglobal` in Julia.

      julia> debug = cglobal(("debug_", "/Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib"),Int64)
      Ptr{Int64} @0x00000001759ac020
      

      Documentation and structure of debug block here. But we can use it to turn on lots of fun ARPACK debug information!

      julia> unsafe_store!(debug, 2, 11) # mnaupd turn on debugging for mnaupd (non-symmetric solver)
      julia> unsafe_store!(debug, -14, 2) # ndigit - use 14 digits of precision (ndigit)
      julia> unsafe_store!(debug, 4, 3) # mgetv0 show lots of information from dgetv0
      julia> unsafe_store!(debug, 6, 1) # logfile set logfil to 6, the default stdout
      
      julia> eigs(randn(50,50))
      
       _getv0: B-norm of initial / restarted starting vector
       -----------------------------------------------------
          1 -    1:      4.1596362767673D+00
      
      
       _getv0: initial / restarted starting vector
       -------------------------------------------
          1 -    2:     -8.3153140665625D-01     8.3995344740674D-01
          3 -    4:      9.6704039685513D-01     3.6866117348409D-01
          5 -    6:      6.0284542702433D-03     2.0318894267245D-01
          7 -    8:     -1.3495066933914D-01     4.1078102410386D-01
          9 -   10:      4.3888096465232D-01    -5.7522033107717D-01
         11 -   12:     -8.7293721104508D-01     9.8788590316195D-01
         13 -   14:     -1.2282299641820D-01     7.4369927039981D-01
         15 -   16:      8.7906957254355D-01    -4.7762629697247D-01
         17 -   18:     -4.7162805784843D-02     1.1706962983657D-01
         19 -   20:      7.8492285203887D-01     2.6127819008416D-01
         21 -   22:     -8.2777335418346D-01    -6.5745389045117D-01
         23 -   24:      4.4467420164616D-01     9.9532171436879D-01
         25 -   26:      6.9740537844088D-01    -9.2167903292103D-01
         27 -   28:     -9.1859980666224D-01    -7.1191753019861D-01
         29 -   30:     -1.5899117215995D-01     1.5815438448009D-01
         31 -   32:     -5.1806821604240D-01     3.8970905952163D-01
         33 -   34:     -8.7740030659049D-01    -5.7437967251424D-02
         35 -   36:      7.9785742115958D-01     3.1842249389620D-01
         37 -   38:     -6.1567371152282D-01    -2.1826976841397D-01
         39 -   40:      8.0618226869270D-01     2.7929452805704D-01
         41 -   42:     -7.8813902002933D-01     1.7058205959301D-01
         43 -   44:     -4.0247429259527D-01    -6.0433106862224D-01
         45 -   46:     -4.1118723651662D-01    -3.4363227883103D-01
         47 -   48:     -4.4783519455385D-01     2.1286373092316D-02
         49 -   50:      2.9004800999430D-01     4.4388758381478D-01
      
      
       _naupd: Number of update iterations taken
       -----------------------------------------
          1 -    1:              17
      
      
       _naupd: Number of wanted "converged" Ritz values
       ------------------------------------------------
          1 -    1:               6
      
      
       _naupd: Real part of the final Ritz values
       ------------------------------------------
          1 -    2:     -5.7794067256680D+00    -5.7794067256680D+00
          3 -    4:     -7.2262688525223D+00     6.8333693387731D+00
          5 -    6:      6.8333693387731D+00     7.6197283464233D+00
      
      
       _naupd: Imaginary part of the final Ritz values
       -----------------------------------------------
          1 -    2:      2.7424220041744D+00    -2.7424220041744D+00
          3 -    4:      0.0000000000000D+00     2.3888574879058D+00
          5 -    6:     -2.3888574879058D+00     0.0000000000000D+00
      
      
       _naupd: Associated Ritz estimates
       ---------------------------------
          1 -    2:      5.0738750651244D-16     5.0738750651244D-16
          3 -    4:      2.5131234631038D-25     1.2315725733964D-30
          5 -    6:      1.2315725733964D-30     1.0152776765181D-35
      
      
      
           =============================================
           = Nonsymmetric implicit Arnoldi update code =
           = Version Number:  2.4                      =
           = Version Date:    07/31/96                 =
           =============================================
           = Summary of timing statistics              =
           =============================================
      
      
           Total number update iterations             =    17
           Total number of OP*x operations            =   206
           Total number of B*x operations             =     0
           Total number of reorthogonalization steps  =    35
           Total number of iterative refinement steps =     0
           Total number of restart steps              =     0
           Total time in user OP*x operation          =     0.000000
           Total time in user B*x operation           =     0.000000
           Total time in Arnoldi update routine       =     0.000000
           Total time in naup2 routine                =     0.000000
           Total time in basic Arnoldi iteration loop =     0.000000
           Total time in reorthogonalization phase    =     0.000000
           Total time in (re)start vector generation  =     0.000000
           Total time in Hessenberg eig. subproblem   =     0.000000
           Total time in getting the shifts           =     0.000000
           Total time in applying the shifts          =     0.000000
           Total time in convergence testing          =     0.000000
           Total time in computing final Ritz vectors =     0.000000
      

    Other references

    - https://github.com/conan-io/conan-center-index/issues/4696
    - https://stackoverflow.com/questions/22769246/how-to-disassemble-one-single-function-using-objdump
    - https://eli.thegreenplace.net/2011/11/11/position-independent-code-pic-in-shared-libraries-on-x64

    C/C++ Code to check

    Compile with (edit to your own paths)

    clang -g -fPIC test-arpack-state-dlopen.c \
    -Wl,-rpath /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib \
    -Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia \
        -o mytest-dlopen
    

    And here’s the code. This just calls ARPACK once to see what happens. In this case, I figured out exactly which offset of `first` to use (it was the first one, hah!)

    /* test_arpack_state */
    #include <inttypes.h>
    typedef int64_t blasint;
    typedef void (*dsaupd_ptr)(
      blasint* ido,
      char* bmat,
      blasint* n,
      char* whichstr,
      blasint* nev,
      double* tol,
      double* resid,
      blasint* ncv,
      double* v,
      blasint* ldv,
      blasint* iparam,
      blasint* ipntr,
      double* workd,
      double* workl,
      blasint* lworkl,
      blasint* info);
    
    #include <printf.h>
    #include <dlfcn.h>
    
    int main(int argc, char** argv) {
    
      blasint nev = 3;
      double tol = -1.0;
      blasint maxiter = 500;
    
      blasint n = 25;
      double v0[25] = {0};
    
      blasint ncv = 20;
      double v[25*20];
      double workd[25*3];
      double workl[20*(20+8)];
      double resid[25];
      blasint lworkl = 20*(20+8);
    
      blasint info = 0;
    
      blasint iparam[11] = {0};
      blasint ipntr[11] = {0};
      blasint ido = 0;
    
      iparam[0] = 1;
      iparam[2] = maxiter;
      iparam[6] = 1;
    
      char bmat[] = "I";
      char whichstr[] = "SA";
    
      blasint ldv = 25;
    
      void* libar = dlopen("/Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib", RTLD_LAZY|RTLD_GLOBAL);
    
      if (libar) {
        printf("Loaded libarpack\n");
      }
    
      dsaupd_ptr pfun = dlsym(libar, "dsaupd_");
    
      if (pfun) {
        printf("Loaded dsaupd_\n");
      }
    
      void* base_offset = pfun - 0x000000000002acb0; // offset from objdump.
      double* dsaupd_t0 = base_offset + 0x000000000005c604;  // offset from objdump.
      blasint* logfil = base_offset + 0x000000000005c420;
      blasint* ndig = base_offset + 0x000000000005c420+8;
      blasint* mgetv0 = base_offset + 0x000000000005c420+16;
      blasint* msaupd = base_offset + 0x000000000005c420+24;
      *mgetv0 = 2;
      *msaupd = 2;
      blasint* dgetv0_first1 = base_offset + 0x000000000005c3a0+8;
      blasint* dgetv0_first2 = base_offset + 0x000000000005c408+8;
      printf("logfil = %i\n", (int)*logfil);
      printf("ndigit = %i\n", (int)*ndig);
      printf("dgetv0_first1 = %i\n", (int)*dgetv0_first1);
      printf("dgetv0_first2 = %i\n", (int)*dgetv0_first2);
    
      pfun(&ido, bmat, &n, whichstr, &nev, &tol, resid, &ncv, v, &ldv, iparam,
        ipntr, workd, workl, &lworkl, &info );
    
      printf("base_offset = %p\n", base_offset);
      printf("dsaupd = %p\n", pfun);
      printf("dsaupd_t0 = %p\n", dsaupd_t0);
      printf("info = %i\n", (int)info);
      printf("ido = %i\n", (int)ido);
      printf("dsaupd_t0 = %lf\n", (double)*dsaupd_t0);
    
      printf("dgetv0_first1 = %i\n", (int)*dgetv0_first1);
      printf("dgetv0_first2 = %i\n", (int)*dgetv0_first2);
    
      //dlclose(libar);
    
      return 0;
    }
    
  • "In-place" sorting allocates half-sized temporary arrays in Julia for non-numeric types.

    I hope my confusion at the difference between the memory allocation behavior of the following codes can be forgiven.

    julia> using BenchmarkTools
    julia> @btime sort!(v) setup=(v=randn(100));
      268.868 ns (0 allocations: 0 bytes)
    julia> @btime sort!(v) setup=(v=tuple.(randn(100)));
      714.639 ns (2 allocations: 528 bytes)
    

    Julia often uses the ! symbol on a function to indicate “in-place”, but this is also usually a hint that it’s “non-allocating” – which is great if I want to sort things in place without extra memory allocation. (See my previous post.)

    (Strictly speaking, this ! simply means ‘mutating’ which means that the function changes the input arguments.)

    Neither of these is a contract. (E.g. something in-place might be allocating or non-allocating depending on the type of the array elements.)

    The documentation of sort! simply mentions in-place: “Sort the vector v in place.”

    However, in this case, tuple elements can be moved around without allocating. So I, for one, would expect the two codes to have the same memory traffic and allocation behavior.

    After much debugging, what I found is that julia defaults to QuickSort for numeric types, which is strictly non-allocating whereas it uses MergeSort for non-numeric types. (Yes, the change in algorithms is in the documentation for sort!, yay! What isn’t there is the implication on memory allocations.)

    julia> @btime sort!(v;alg=QuickSort) setup=(v=tuple.(randn(100)));
      947.789 ns (0 allocations: 0 bytes)
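    The allocation makes sense once you recall how a textbook merge sort works: the merge step needs a scratch buffer (Julia’s implementation uses a half-length one), while quicksort partitions in place. Here’s a minimal C sketch of that scratch-buffer requirement; this is my own illustration, not Julia’s actual sort.jl code:

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted halves a[lo..mid) and a[mid..hi), using tmp as scratch. */
static void merge(int *a, int *tmp, size_t lo, size_t mid, size_t hi) {
  size_t i = lo, j = mid, k = lo;
  memcpy(tmp + lo, a + lo, (hi - lo) * sizeof *a);
  while (i < mid && j < hi) a[k++] = (tmp[i] <= tmp[j]) ? tmp[i++] : tmp[j++];
  while (i < mid) a[k++] = tmp[i++];
  while (j < hi)  a[k++] = tmp[j++];
}

static void msort(int *a, int *tmp, size_t lo, size_t hi) {
  if (hi - lo < 2) return;
  size_t mid = lo + (hi - lo) / 2;
  msort(a, tmp, lo, mid);
  msort(a, tmp, mid, hi);
  merge(a, tmp, lo, mid, hi);
}

/* The allocation a caller of an "in-place" merge sort never sees. */
void mergesort_alloc(int *a, size_t n) {
  int *tmp = malloc(n * sizeof *a);
  if (!tmp) return;
  msort(a, tmp, 0, n);
  free(tmp);
}
```

    A quicksort, by contrast, only ever swaps elements inside the array itself, which is why the QuickSort path can promise zero allocations.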

    Moreover, this behavior depends only on the element type of the vector, not on the actual values sorted.

    julia> @btime sort!(v,by=first) setup=(v=tuple.(randn(100)));
      790.624 ns (2 allocations: 528 bytes)
    julia> @btime sort!(v;alg=QuickSort,by=first) setup=(v=tuple.(randn(100)));
      947.789 ns (0 allocations: 0 bytes)
    

    (In this case, by=first means we should sort by the first element of each tuple, which means Julia is conceptually sorting a numeric type.)

    I think this particular example breaks an implicit user contract. By calling sort! I’m asking Julia not to allocate extra memory in its algorithm. To have a default algorithm choice that will allocate is wrong from a working-mental-model point of view. Moreover, if this isn’t the case, then the docs for sort! ought to be very strongly worded that sort! can only be recommended for numeric types unless you change the default algorithm. (But then why is that the default choice for sort?)

    Finally, to the extent that the algorithm choice is influenced by the element type, it should apply to the element type actually being compared (here, the numbers extracted by by=first), not merely the element type stored in memory!

    I realize these are tricky issues, e.g. MergeSort is tough to do in-place.

    Many would likely say “premature optimization”, so why am I fussed about this difference?

    Here’s the issue. I need a mental model for how code is translated, or I can’t possibly ever be concerned with “optimization” and I must give up. In this case, a simplified mental model of what the tool / language is doing breaks. So making sure the “compiler / language / runtime” matches my mental model of what is happening, when I’m designing for exactly that setting, is not premature optimization; it’s simply part of the spec!

    Moreover, it seems this issue has already come up: see this discussion for more.

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 6 - Sorting Take 1

    Next up on our list of routines to convert is the sorting routine dsortr. Again, this could be replaced with internal Julia stuff. However, this sorting routine does something that shows up quite often in practice. You want to sort array x, and take array y along for the ride!

    The pseudocode is simple:

      function sort_permute(x,y)
         p = sortperm(x)
         return x[p], y[p]
      end
      a = [5, 8, 9, 1, 4, 3, 2]
      b = [3., 4., 5., 2., 1., 9., 8.]
      sort_permute(a,b)
      ([1, 2, 3, 4, 5, 8, 9], [2.0, 8.0, 9.0, 1.0, 3.0, 4.0, 5.0])
    

    This is a recurring pattern. Here’s a link to a question on stack overflow. That post references an old blog post (updated to a Wayback machine link). And hey, I think I know that fellow. Good to know that things haven’t changed too much in 15 years! (And that I’m still working on the same old stuff… hmm… maybe that isn’t so good.)
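    One way to see why this pattern is so natural: in C you might sort an array of pairs with qsort, so each y travels with its x automatically. This is a sketch of the pattern only; ARPACK itself keeps the two arrays separate and swaps their entries in lockstep:

```c
#include <stdlib.h>

typedef struct { int x; double y; } pair_t;

/* qsort comparator: order pairs by the x field. */
static int cmp_pair(const void *pa, const void *pb) {
  const pair_t *a = pa, *b = pb;
  return (a->x > b->x) - (a->x < b->x);
}

/* Sort the pairs by x; each y rides along with its x for free. */
void sort_permute(pair_t *pairs, size_t n) {
  qsort(pairs, n, sizeof *pairs, cmp_pair);
}
```

    The trade-off is the memory layout: an array of structs interleaves x and y, whereas the ARPACK routines (and the Julia wrapper type below) must work with two separate contiguous arrays.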

    So let’s solve this problem in Julia again. Or wait, let’s not. Off the top of my head, I don’t know how to do this in Julia without allocating the extra permutation vector. After glancing around sort.jl in base, it seems like you could probably do this by… something complicated like creating a new array type that wraps the pair of arrays.

    Uh oh, the problem with having an idea…

    struct SortPermutePair{U,V} <: AbstractArray{Tuple{U,V},1}
      x::Vector{U}
      y::Vector{V}
    end
    

    … trying to resist urge to test it …

    Base.size(z::SortPermutePair{U,V}) where {U,V} = (min(length(z.x),length(z.y)),)
    Base.getindex(z::SortPermutePair{U,V}, i::Integer) where {U,V} = (z.x[i], z.y[i])
    Base.setindex!(z::SortPermutePair{U,V}, v::Tuple, i::Integer) where {U,V} = begin z.x[i] = v[1];  z.y[i] = v[2] end
    

    … yeah, resisting is not gonna work, so we’ll have to see how far I get here. Let’s see how to do this in Julia before actually porting the ARPACK routine.

    a = [5, 8, 9, 1, 4, 3, 2]
    b = [3., 4., 5., 2., 1., 9., 8.]
    sort!(SortPermutePair(a,b), by=x->x[1])
    

    So that wasn’t so bad! No extra memory allocated… (right?!?) This shouldn’t allocate extra memory…

    a = [5, 8, 9, 1, 4, 3, 2]
    b = [3., 4., 5., 2., 1., 9., 8.]
    @time sort!(SortPermutePair(a,b), by=x->x[1]);
    @time sort!(SortPermutePair(a,b), by=x->x[1]);
    
    julia> @time sort!(SortPermutePair(a,b), by=x->x[1]);
      0.050316 seconds (58.15 k allocations: 3.460 MiB, 99.04% compilation time)
    
    julia> @time sort!(SortPermutePair(a,b), by=x->x[1]);
      0.050027 seconds (58.15 k allocations: 3.460 MiB, 99.08% compilation time)
    

    Hmm. Well, that’ll have to wait for a future blog post to debug. My guess is that I need type annotations in all the functions, which is something we’ll return to in a future post. (Okay – I had to at least try @btime…) I’m very bad at tangents here! That’s the fun of this post series. I’m going to go into all of those and not edit them out.

    using BenchmarkTools
    @btime sort!(SortPermutePair(a,b), by=x->x[1]) setup=begin
      a=[5, 8, 9, 1, 4, 3, 2]
      b=[3., 4., 5., 2., 1., 9., 8.]
    end
    

    This gives

    52.661 ns (1 allocation: 80 bytes)
    7-element Vector{Float64}:
    ...
    

    Well, that is slightly better. I’m not sure why we see one allocation. This should be doable with zero allocations. (I think it might somewhere be allocating that last output vector???)

    @btime sort!(a) setup=(a=randn(10))
      44.800 ns (0 allocations: 0 bytes)
    

    Maybe it’s a function boundary thing?

    function sort_permute!(x,y;kwargs...)
      sort!(SortPermutePair(x,y);kwargs...,by=first)
      x,y
    end   
    
    julia> @time sort_permute!(a,b)
      0.000006 seconds (2 allocations: 112 bytes)
    ([1, 2, 3, 4, 5, 8, 9], [2.0, 8.0, 9.0, 1.0, 3.0, 4.0, 5.0])   
    

    And this clearly has a problem with allocating extra memory somewhere

    julia> a = randn(10^5); b=randn(10^5); @time sort_permute!(a,b)
      0.014885 seconds (3 allocations: 781.422 KiB)   
    

    And now you have one of my least favorite things in Julia: no mental model for where/how these decisions/impacts get made. This is all designed to be allocated on the stack, so there should be no need to go into the weeds. Anyway, a few iterations later, we end up with one fix. Using the anonymous function x->x[1] is what causes the huge hit to the compile time.

    # this occurs no matter how many times you run it, as I'm running from
    # the REPL and it seems to need to recompile the function x->x[1] every time.
    julia> @time sort!(SortPermutePair(a,b), by=x->x[1]);
      0.050027 seconds (58.15 k allocations: 3.460 MiB, 99.08% compilation time)
    
    julia> @time sort!(SortPermutePair(a,b), by=first);
      0.004317 seconds (152 allocations: 7.844 KiB, 99.72% compilation time)
    
    julia> @time sort!(SortPermutePair(a,b), by=first);
      0.000011 seconds (3 allocations: 144 bytes)
    

    Big long diversion before we get to the actual Arpack Fortran sorting routine I wanted to get to. That’ll have to wait for another day! So where are we? We demonstrated it’s fairly easy to do this in Julia, although there is one allocation I’d like to get rid of that still seems to be there. Maybe I should use a code linter or something like that. (We’ll see how to track allocations in a future post.)

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 5 - Understanding Ref

    Here, we pick up with debugging the call to dsconv within the ARPACK library itself from the last post. This is when the repo was at this commit

    When we run the tests against the Arpack wrapped function directly, it errors.

    using Pkg; Pkg.test("ArpackInJulia"; test_args=["arpackjll"])
    

    In particular, the following code always returns “-1”

    import Arpack_jll, LinearAlgebra
    
    function arpack_dsconv(n::Int, ritz::Vector{Float64}, bounds::Vector{Float64},
                          tol::Float64)
      nconv::LinearAlgebra.BlasInt = -1
      ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
        (Ref{LinearAlgebra.BlasInt},
         Ptr{Float64},
         Ptr{Float64},
         Ref{Float64},
         Ref{LinearAlgebra.BlasInt}),
        n, ritz, bounds, tol, nconv)
      return nconv
    end
    

    Ugh. This is tough to debug. It seems like nconv just isn’t getting updated. One of my common refrains is that problems occur at boundaries between code. Interfaces are very hard to get right because slight differences in models can manifest as bugs or just outright incorrect answers. This is exactly what we are seeing here.

    Now, I could just jump to the solution, which is blindingly obvious in retrospect. On the other hand, there’s value in walking through the process I used. (Or so I tell myself.)

    Given that the code above doesn’t work, what were my working hypotheses about the causes?

    • So maybe I don’t understand how the fortran calling stuff works and I shouldn’t have expected this to get updated.
    • Maybe Julia is doing something I don’t understand?
    • There is some odd interaction for this particular piece of code?
    • The Fortran / Arpack code doesn’t do what I think it does.

    Well, the last one is the most important, as everything follows from my assumption that the fortran code works. So that seems like a good place to start. (In truth, I did futz around with some of the others, but then quickly settled on this one.)

    Step 1. Do something you know or strongly suspect will work.

    Arpack is designed to work with C and C++. There are just fewer places for bugs to hide there. So I wanted to make sure I could call this from C and get the response I expected.

    This gave me the following C program.

    #include <inttypes.h>
    typedef int64_t blasint;
    void dsconv_(blasint* n, double* ritz, double* bounds, double* tol, blasint *nconv);
    
    #include <printf.h>
    
    int main(int argc, char** argv) {
      double ritz[] = {1.0, 1.0, 1.0};
      double bounds[] = {0.0, 0.0, 0.0};
      double tol = 1e-8;
      blasint n = 3;
      blasint nconv = 4;
    
      dsconv_(&n, ritz, bounds, &tol, &nconv);
    
      printf("nconv = %i\n", (int)nconv);
      return 0;
    }
    

    Which I saved into a file: test_dsconv.c. This should print 3 if I’ve understood this correctly. (And the other Julia code is correct.)

    I wanted to compile this against the Julia libraries. Here’s how I got it working. Get the path to libarpack from Julia

    @show Arpack_jll.libarpack_path
    /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib    
    

    (Ugh, security breach… yes, my local username is dgleich. Entirely surprising!) After much fussing with library paths, etc. here’s how to get this all compiled.

    gcc test_dsconv.c \
      /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib \
      -Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/ \
      -Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia
    

    Let’s pick this apart.

    gcc test_dsconv.c
    # above is the call to compile this piece of code
    
    /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib
    # above is telling it we want to use this library.
    
    -Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/
    # above is telling gcc to pass an option to the linker command. Specifically,
    # rpath tells the runtime loader where to find libraries ("runtime path").
    # this can also be set with LD_LIBRARY_PATH, but that's... controversial
    

    Let’s try without the last link to Julia’s libraries to see what happens!

    dgleich@circulant test_dsconv %     gcc test_dsconv.c \
      /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib \
      -Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/ \
    
    dgleich@circulant test_dsconv % ./a.out
    dyld: Library not loaded: @rpath/libopenblas64_.dylib
    Referenced from: /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib
    Reason: image not found
    zsh: abort      ./a.out
    

    Which is because it can’t find libopenblas, which is referenced by libarpack. So we need to tell it about all the other common Julia libraries. That’s what the final

    -Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia
    # tell it about Julia's libraries.
    

    And now it works great.

    dgleich@circulant test_dsconv % gcc test_dsconv.c \
          /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib \
          -Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/ \
          -Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia
    dgleich@circulant test_dsconv % ./a.out
    nconv = 3
    

    Okay, so my understanding of the Arpack function was correct. Ugh. That means I need to learn something about Julia.

    Step 2: Better understanding Julia.

    Okay, let’s go back to Julia because what we are doing ought to work. Here’s where @code_native and some hints about assembly code come in handy.

    julia> @code_native arpack_dsconv(3, ones(3), zeros(3), 1e-8)
        .section    __TEXT,__text,regular,pure_instructions
    ; ┌ @ none:1 within `arpack_dsconv'
        subq    $40, %rsp
    ; │ @ none:4 within `arpack_dsconv'
    ; │┌ @ essentials.jl:396 within `cconvert'
    ; ││┌ @ refpointer.jl:104 within `convert'
    ; │││┌ @ refvalue.jl:8 within `RefValue'
        movq    %rdi, 32(%rsp)
        vmovsd  %xmm0, 16(%rsp)
        movq    $-1, (%rsp)
    ; │└└└
    ; │┌ @ pointer.jl:65 within `unsafe_convert'
        movq    (%rsi), %rsi
        movq    (%rdx), %rdx
        leaq    32(%rsp), %rdi
        leaq    16(%rsp), %rcx
        movq    %rsp, %r8
        movabsq $dsconv_, %rax
    ; │└
        callq   *%rax
    ; │ @ none:11 within `arpack_dsconv'
        movq    $-1, %rax
        addq    $40, %rsp
        retq
        nopw    %cs:(%rax,%rax)
    ; └
    

    If you look at this code long enough and google around about how amd64 assembly works (or x86_64 if you prefer, although it’s really amd64…), you’ll see that the return value of the function is in register rax. This code

    movq    $-1, %rax
    

    sets the return register to -1 right before returning (retq)

    If you want to learn more about calling conventions and register conventions on amd64, see this powerpoint from cs217 at Princeton

    Okay, so what’s happening is that we are just always returning -1, independent of the function call. This means that the Julia compiler is treating the value of nconv as a constant and optimizing the code.

    Another debugging tenet is to try things you think will likely work. Here’s something I strongly suspect will work. And just for fun, I tried the following code.

    import Arpack_jll, LinearAlgebra
    
    function arpack_dsconv(n::Int, ritz::Vector{Float64}, bounds::Vector{Float64},
                          tol::Float64)
      nconv::Vector{LinearAlgebra.BlasInt} = [-1]
      ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
        (Ref{LinearAlgebra.BlasInt},
         Ptr{Float64},
         Ptr{Float64},
         Ref{Float64},
         Ref{LinearAlgebra.BlasInt}),
        n, ritz, bounds, tol, nconv)
      return nconv[1]
    end
    

    Why do I think this will work? Because I’m passing an array and then explicitly looking at an entry. This is what I want to do, just without the array allocation that creating the vector causes. (Julia’s memory allocator is good, but not that good.)

    julia> arpack_dsconv(3, ones(3), zeros(3), 1e-8)
    3
    

    Hmm… so I need to learn more about Ref to understand why the other call didn’t work.

    This line stuck out to me.

    In Julia, Ref objects are dereferenced (loaded or stored) with [].

    Anyway, at this point, I realized my mental model was wrong and got to …

    The fix

    Now, here’s the fix. The issue is that ccall will auto-wrap an intrinsic type (i.e. int, float, etc.) with a Ref if you tell it the argument is a Ref type. But this doesn’t refer to the original value; it makes a new copy of it and refers to that. So here’s working code.

    import Arpack_jll, LinearAlgebra
    
    function arpack_dsconv(n::Int, ritz::Vector{Float64}, bounds::Vector{Float64},
                          tol::Float64)
      nconv = Ref{LinearAlgebra.BlasInt}(-1)
      ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
        (Ref{LinearAlgebra.BlasInt},
         Ptr{Float64},
         Ptr{Float64},
         Ref{Float64},
         Ref{LinearAlgebra.BlasInt}),
        n, ritz, bounds, tol, nconv)
      return nconv[]
    end
    

    The error in my mental model

    As I said above, bugs exist on boundaries. Here’s the origin of this bug and my incorrect mental model of the code.

    My mental model based on C/C++

    nconv::Int = -1
    ccall((:func, libfunc), Cvoid, (Ref{Int},), nconv) # <-> func(&nconv) in C
    

    But that’s not right! Here’s what the code actually does.

    nconv::Int = -1
    ccall((:func, libfunc), Cvoid, (Ref{Int},), nconv) # <-> int newint = nconv; func(&newint) in C
    

    Which of course explains why nconv isn’t updated. It wasn’t even passed in!
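    Here is the same pair of mental models as a toy C program (my illustration, not what Julia actually emits):

```c
/* A callee that writes through a pointer, like dsconv_ writes nconv. */
static void set_to_three(int *p) { *p = 3; }

/* Mental model I had: pass the address of the caller's variable. */
int call_with_address(void) {
  int nconv = -1;
  set_to_three(&nconv);
  return nconv;            /* the callee updated our variable */
}

/* What ccall actually does with a plain Int: copy first, then pass
 * the address of the temporary copy. */
int call_with_copy(void) {
  int nconv = -1;
  int tmp = nconv;         /* the hidden copy */
  set_to_three(&tmp);
  return nconv;            /* still -1: only the copy was updated */
}
```

    Creating the Ref yourself, as in the fix below, is the C-style pass-the-address version; letting ccall wrap a plain Int is the copy version.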

    Suggestion

    • Add a note to Ref documentation on this
    • Add a note to ccall documentation on this

    Is it worth it for one person with an incorrect mental model? I think yes, because the ccall area is likely to be a source of similar bugs and it’s important to document the assumptions! This is already partially done in this paragraph:

    When passed as a ccall argument (either as a Ptr or Ref type), a Ref object will be converted to a native pointer to the data it references. For most T, or when converted to a Ptr{Cvoid}, this is a pointer to the object data. When T is an isbits type, this value may be safely mutated, otherwise mutation is strictly undefined behavior.

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 4 - Checking against libarpack

    See Overview for a high-level picture, also see ArpackInJulia.jl on Github

    One of the things I wanted to do was check against the true ARPACK functions. The idea is that we can swap out true ARPACK routines for our routines easily. So I wanted to be able to call the true ARPACK routines individually myself.

    This is remarkably easy, and it is how Arpack.jl works with the functions aupd and eupd in ARPACK to implement eigs. So this isn’t surprising. It’s still fun to see that ease of use translate into new codes.

    The idea is that Arpack.jl uses Arpack_jll to access the compiled FORTRAN code as a dynamic library. All the binary compiling, etc. is handled by Arpack_jll and Arpack.jl can just use that. So we are going to

    import Arpack_jll
    

    so we can get access to

    @show Arpack_jll.libarpack
    

    which gives the path to the ARPACK library in a system independent fashion. (At some point, this will work on the Apple M1 too, but not yet.) [Probably will be about the same time as I finish posting these notes though and getting all of the routines ported.]

    Once we have that, we can simply setup a ccall to get what we need! See the blas.jl functions for many examples of how to do this. Also see Arpack.jl for even more examples that are specific to ARPACK.

    import Arpack_jll, LinearAlgebra
    
    function arpack_dsconv(n::Int, ritz::Vector{Float64}, bounds::Vector{Float64},
                          tol::Float64)
      nconv::Int = 0
      ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
        (Ref{LinearAlgebra.BlasInt},
         Ptr{Float64},
         Ptr{Float64},
         Ref{Float64},
         Ref{LinearAlgebra.BlasInt}),
        n, ritz, bounds, tol, nconv)
      return nconv
    end
    

    Of course, I only want to depend on Arpack_jll in the test routines. (Actually, I’d really only like to depend on it in a subset of the test routines. But I don’t see how to do that right now and it isn’t relevant.)

    Put another way, I want the tests to be able to run on platforms where Arpack_jll doesn’t work, in which case we simply skip the tests that need it.

    To control which tests are run, we can check ARGS as described in this pull request https://github.com/JuliaLang/Pkg.jl/pull/1226 where this functionality was implemented in Julia.

    if "arpackjll" in ARGS
      include("arpackjll.jl") # get the helpful routines to call stuff in "arpackjll"
      @testset "arpackjll" begin
        # tests that check against libarpack directly
      end
    end    
    

    So I stuck the call to dsconv into arpackjll.jl

    import Arpack_jll, LinearAlgebra
    
    function arpack_dsconv(n::Int, ritz::Vector{Float64}, bounds::Vector{Float64},
                          tol::Float64)
      nconv::LinearAlgebra.BlasInt = -1
      ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
        (Ref{LinearAlgebra.BlasInt},
         Ptr{Float64},
         Ptr{Float64},
         Ref{Float64},
         Ref{LinearAlgebra.BlasInt}),
        n, ritz, bounds, tol, nconv)
      return nconv
    end
    

    This does require me to add Arpack_jll and LinearAlgebra to the Test package itself. We can do this by activating the test package and then adding them.

    # change to the test subdirectory
    ] activate .
    add LinearAlgebra
    add Arpack_jll
    

    And our full piece of new test code

    if "arpackjll" in ARGS
      include("arpackjll.jl") # get the helpful routines to call stuff in "arpackjll"
      @testset "arpackjll" begin
        @testset "dsconv" begin
          soln = arpack_dsconv(10, ones(10), zeros(10), 1e-8)
      @test ArpackInJulia.dsconv(10, ones(10), zeros(10), 1e-8)==soln
    
          soln = arpack_dsconv(10, zeros(10), ones(10), 1e-8)
      @test ArpackInJulia.dsconv(10, zeros(10), ones(10), 1e-8)==soln
        end
      end
    end
    

    This flips the ones and zeros, which should show convergence in one of the two cases.

    And what’s weird, it fails. I always get back arpack_dsconv==-1. Hmm… will debug in the next post!

  • A funny thing in @JuliaLanguage. I often illustrate that finite differences in floating point are inaccurate with small changes. But this Julia code works and it’s fun to understand why!

    xx = range(0,1,length=100)
    F(x) = 5.0.*x .+ 1/3
    h = eps(1.0)
    z = (F(xx.+h)-F(xx))./h

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 3 - First routine

    See Overview for a high-level picture, also see ArpackInJulia.jl on Github

    We are going to start our ARPACK conversion with about the simplest routine: dsconv. This routine for a double-precision, symmetric eigenvalue problem just checks if the eigenvalue estimates have converged at all.

      c-----------------------------------------------------------------------
      c\BeginDoc
      c
      c\Name: dsconv
      c
      c\Description:
      c  Convergence testing for the symmetric Arnoldi eigenvalue routine.
      c
      c\Usage:
      c  call dsconv
      c     ( N, RITZ, BOUNDS, TOL, NCONV )
      c
      c\Arguments
      c  N       Integer.  (INPUT)
      c          Number of Ritz values to check for convergence.
      c
      c  RITZ    Double precision array of length N.  (INPUT)
      c          The Ritz values to be checked for convergence.
      c
      c  BOUNDS  Double precision array of length N.  (INPUT)
      c          Ritz estimates associated with the Ritz values in RITZ.
      c
      c  TOL     Double precision scalar.  (INPUT)
      c          Desired relative accuracy for a Ritz value to be considered
      c          "converged".
      c
      c  NCONV   Integer scalar.  (OUTPUT)
      c          Number of "converged" Ritz values.
      c
      c\EndDoc
      c
      c-----------------------------------------------------------------------
      c
      c\BeginLib
      c
      c\Routines called:
      c     arscnd  ARPACK utility routine for timing.
      c     dlamch  LAPACK routine that determines machine constants.
      c
      c\Author
      c     Danny Sorensen               Phuong Vu
      c     Richard Lehoucq              CRPC / Rice University
      c     Dept. of Computational &     Houston, Texas
      c     Applied Mathematics
      c     Rice University
      c     Houston, Texas
      c
      c\SCCS Information: @(#)
      c FILE: sconv.F   SID: 2.4   DATE OF SID: 4/19/96   RELEASE: 2
      c
      c\Remarks
      c     1. Starting with version 2.4, this routine no longer uses the
      c        Parlett strategy using the gap conditions.
      c
      c\EndLib
      c
      c-----------------------------------------------------------------------
      c
            subroutine dsconv (n, ritz, bounds, tol, nconv)
      c
      c     %----------------------------------------------------%
      c     | Include files for debugging and timing information |
      c     %----------------------------------------------------%
      c
            include   'debug.h'
            include   'stat.h'
      c
      c     %------------------%
      c     | Scalar Arguments |
      c     %------------------%
      c
            integer    n, nconv
            Double precision
           &           tol
      c
      c     %-----------------%
      c     | Array Arguments |
      c     %-----------------%
      c
            Double precision
           &           ritz(n), bounds(n)
      c
      c     %---------------%
      c     | Local Scalars |
      c     %---------------%
      c
            integer    i
            Double precision
           &           temp, eps23
      c
      c     %-------------------%
      c     | External routines |
      c     %-------------------%
      c
            Double precision
           &           dlamch
            external   dlamch
    
      c     %---------------------%
      c     | Intrinsic Functions |
      c     %---------------------%
      c
            intrinsic    abs
      c
      c     %-----------------------%
      c     | Executable Statements |
      c     %-----------------------%
      c
            call arscnd (t0)
      c
            eps23 = dlamch('Epsilon-Machine')
            eps23 = eps23**(2.0D+0 / 3.0D+0)
      c
            nconv  = 0
            do 10 i = 1, n
      c
      c        %-----------------------------------------------------%
      c        | The i-th Ritz value is considered "converged"       |
      c        | when: bounds(i) .le. TOL*max(eps23, abs(ritz(i)))   |
      c        %-----------------------------------------------------%
      c
               temp = max( eps23, abs(ritz(i)) )
               if ( bounds(i) .le. tol*temp ) then
                  nconv = nconv + 1
               end if
      c
         10 continue
      c
            call arscnd (t1)
            tsconv = tsconv + (t1 - t0)
      c
            return
      c
      c     %---------------%
      c     | End of dsconv |
      c     %---------------%
      c
            end
    

    In Julia, this routine is basically a two-liner (could likely do in one) that computes:

      eps23 = eps(1.0)^(2/3)
      return sum(
          ((ritzi, boundsi),)->boundsi <= max(eps23, abs(ritzi))*tol ? 1 : 0,
            zip(ritz, bounds))
    

    (Full disclosure, I didn’t test this, so there could be a subtle error, but I hope the idea is clear that this COULD be done very easily.)

    On the other hand, the idea was to port this as directly as possible, so let’s write out the loops, etc.

      function dsconv(
                  n::Int,
                  ritz::Vector{Float64},
                  bounds::Vector{Float64},
                  tol::Float64
        )
        eps23::Float64 = eps(1.0)^(2.0/3.0)
        nconv::Int = 0
        if n > length(ritz)
          throw(ArgumentError("range 1:n=$n out of bounds for ritz, of length $(length(ritz))"))
        end
        if n > length(bounds)
          throw(ArgumentError("range 1:n=$n out of bounds for bounds, of length $(length(bounds))"))
        end
        @inbounds for i=1:n
          tmp = max( eps23, abs(ritz[i]) )
          if ( bounds[i] <= tol*tmp )
            nconv += 1
          end
        end
    
        return nconv
      end
    

    Here, we have borrowed some of the out of bounds reporting from blas.jl in order to try and be somewhat standard.

    Of course, the issue is that the ARPACK code also includes some cool timing information:

      c
      c     %----------------------------------------------------%
      c     | Include files for debugging and timing information |
      c     %----------------------------------------------------%
      c
            include   'debug.h'
            include   'stat.h'
        call arscnd (t0)
  c     ... many lines ...
        call arscnd (t1)
        tsconv = tsconv + (t1 - t0)
    

    The function arscnd is the ARPACK second function, which gives the current time in seconds. So this simply tracks the seconds used in this function.
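    In Julia, the same pattern falls out of `Base.time()`, which returns wall-clock seconds as a Float64; a minimal sketch:

    ```julia
    # Sketch: time() plays the role of arscnd in the Fortran code.
    t0 = time()
    s = sum(abs2, randn(1000))   # stand-in for the work being timed
    tsconv = time() - t0         # elapsed seconds spent in the "routine"
    ```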

    To store the information, this code uses a Fortran common block, which looks like a set of global variables, and updates them based on the time used in this routine. In the port, I’d like to have this functionality as well. However, while the FORTRAN common block could be replaced with a set of global variables, I wanted to use a more modern construct, enabled by Julia’s macros, that gives thread safety and also compiler-optimized code removal when the information isn’t needed.

    Here’s a better design. Take in a keyword argument `stats` that is a Julia structure.

      Base.@kwdef mutable struct ArpackStats
        tconv::Float64 = 0.0 # the time spent checking convergence
        # ... there will be more!
      end
    

    We use Base.@kwdef to auto-initialize all the times to zero so we don’t have to write a constructor and can just call ArpackStats(). (Check out help on Base.@kwdef for more!)
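    For example (note that the struct must be declared mutable for the in-place timing updates to work; Julia’s plain structs are immutable):

    ```julia
    # Base.@kwdef generates a keyword constructor with defaults filled in;
    # mutable so the timing fields can be accumulated in place.
    Base.@kwdef mutable struct ArpackStats
      tconv::Float64 = 0.0   # time spent checking convergence
    end

    stats = ArpackStats()    # no arguments needed; tconv starts at 0.0
    stats.tconv += 0.25      # fields can be updated directly
    ```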

    Then we can simply use the following construct to update time:

      function dsconv(
                  n::Int,
                  ritz::Vector{Float64},
                  bounds::Vector{Float64},
                  tol::Float64;
                  stats=nothing
       )
       t0 = stats != nothing ? time() : 0.0 ## NEW
       eps23::Float64 = eps(1.0)^(2.0/3.0)
       nconv::Int = 0
       if n > length(ritz)
          throw(ArgumentError("range 1:n=$n out of bounds for ritz, of length $(length(ritz))"))
       end
       if n > length(bounds)
          throw(ArgumentError("range 1:n=$n out of bounds for bounds, of length $(length(bounds))"))
       end
       @inbounds for i=1:n
          tmp = max( eps23, abs(ritz[i]) )
          if ( bounds[i] <= tol*tmp )
            nconv += 1
          end
       end
       if stats != nothing; stats.tconv += time() - t0; end ## NEW
       return nconv
      end
    

    Of course, these if statements get a bit tedious. So let’s use Julia’s macros to handle this more seamlessly and get the code I actually want to write.

      """
      Convergence testing for the symmetric Arnoldi eigenvalue routine.
    
      In Julia, this can be replaced with a single sum over zip(ritz, bounds);
      see the discussion above.
    
      Julia port of ARPACK dsconv routine.
      Usage:
      call dsconv
      ( N, RITZ, BOUNDS, TOL; debug, stats )
    
      Arguments
      N       Integer.  (INPUT)
         Number of Ritz values to check for convergence.
    
      RITZ    Double precision array of length N.  (INPUT)
         The Ritz values to be checked for convergence.
    
      BOUNDS  Double precision array of length N.  (INPUT)
         Ritz estimates associated with the Ritz values in RITZ.
    
      TOL     Double precision scalar.  (INPUT)
         Desired relative accuracy for a Ritz value to be considered
         "converged".
      Optional values (they default to nothing)
      debug (INPUT/OUTPUT) nothing or a debugging output structure
      stats (INPUT/OUTPUT) nothing or an ArpackStats struct
      These are used to collect debugging and performance stats and are
      mostly internal.
    
    
      Return value
      NCONV   Integer scalar.  (RETURN)
         Number of "converged" Ritz values.
      """
      function dsconv(
            n::Int,
            ritz::Vector{Float64},
            bounds::Vector{Float64},
            tol::Float64;
            stats=nothing,
            debug=nothing
      )
      #=
      c\Author
      c     Danny Sorensen               Phuong Vu
      c     Richard Lehoucq              CRPC / Rice University
      c     Dept. of Computational &     Houston, Texas
      c     Applied Mathematics
      c     Rice University
      c     Houston, Texas
    
      Julia port by David F. Gleich, Purdue University
      =#
    
      #=
      c     | Executable Statements |
      =#
      t0 = @jl_arpack_time
    
      if n > length(ritz)
        throw(ArgumentError("range 1:n=$n out of bounds for ritz, of length $(length(ritz))"))
      end
      if n > length(bounds)
        throw(ArgumentError("range 1:n=$n out of bounds for bounds, of length $(length(bounds))"))
      end
    
      eps23::Float64 = eps(1.0)^(2.0/3.0)
      nconv::Int = 0
      @inbounds for i=1:n
        #=
        c        | The i-th Ritz value is considered "converged"       |
        c        | when: bounds(i) .le. TOL*max(eps23, abs(ritz(i)))   |
        =#
        tmp = max( eps23, abs(ritz[i]) )
        if ( bounds[i] <= tol*tmp )
          nconv += 1
        end
      end
    
      #  call arscnd (t1)
      #  tsconv = tsconv + (t1 - t0)
      @jl_update_time(tconv, t0)
    
      return nconv
      end
    

    Here, we can use the following macros to accomplish this.

      macro jl_arpack_time()
        return esc(:( stats == nothing ? zero(ArpackTime) : time() ))
      end
    
      macro jl_update_time(field, t0)
        return esc( quote
          if stats != nothing
            stats.$field += time() - $t0
          end
        end )
      end
    

    And thus, we have our actual strategy. Note that some of the little code snippets here were taken from development versions, so some of the variable names may have changed (e.g. tconv vs. tsconv); there is no reason to track symmetric vs. nonsymmetric time separately. (I mean, I guess we could, but that seems unlike Julia language stuff.) Could be revisited in the future!
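    To see the macros end to end, here is a small self-contained sketch. The `ArpackTime` alias and the toy `timed_work` function are my own illustrative assumptions, not ARPACK names; the two macros are as defined above.

    ```julia
    const ArpackTime = Float64   # assumption: times stored as Float64 seconds

    Base.@kwdef mutable struct ArpackStats
      tconv::Float64 = 0.0
    end

    # esc(...) splices the expression into the caller, so the macros see
    # the caller's `stats` variable.
    macro jl_arpack_time()
      return esc(:( stats == nothing ? zero(ArpackTime) : time() ))
    end

    macro jl_update_time(field, t0)
      return esc( quote
        if stats != nothing
          stats.$field += time() - $t0
        end
      end )
    end

    # A toy routine (illustrative name) using the same pattern as dsconv.
    function timed_work(; stats=nothing)
      t0 = @jl_arpack_time
      s = sum(abs2, randn(100))   # stand-in for real work
      @jl_update_time(tconv, t0)
      return s
    end

    timed_work()                  # stats == nothing: no timing recorded
    st = ArpackStats()
    timed_work(stats=st)          # st.tconv now holds the elapsed seconds
    ```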

    At this point, we also put a bunch of stuff on GitHub to keep the project moving. See the ArpackInJulia.jl GitHub repo.

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 2 - Overview

    See Overview for a high-level picture.

    Here is a quick call graph of the various ARPACK functions for the symmetric eigenvalue/eigenvector problem dsaupd

    • d = double precision
    • s = symmetric
    • aupd = Arnoldi Update (since we’ll call this in a loop to update the internal Arnoldi information).

    Note to the pedantic. Yes, I know the Arnoldi of a symmetric matrix is usually called the Lanczos process. So I should be saying Lanczos everywhere here. Except, maybe I really shouldn’t be saying either and we could use the phrase: iteratively orthogonalized power basis instead of Arnoldi or three-term recurrent power basis instead of Lanczos? Either way, since the package is called ARPACK and I’m writing about that, I’ll stick with Arnoldi here.

    Back to the call graph.

    dsaupd calls
      dsaup2 calls
        dgetv0 calls printing, reverse communication
        dsaitr calls printing, blas, lapack
        dsapps calls printing, blas, lapack
        dsconv
        dseigt
        dsgets
        dsortr
    
        ivout - write out an integer vector
        arscnd - internal wrapper avoiding LAPACK second function
        dvout - write out a double vector
    

    Then we also need to handle getting the eigenvector information out of the computation.

    dseupd calls
      dsesrt
      dsortr - same as above
    

    This suggests a plan.

    Port the low-level/internal routines first. Then port the high-level routines.

    Wait, what’s that, you say? Why not do it the opposite way? Well, I started doing it that way and realized it’s just harder ;). So we are going to switch and do the easy stuff first!

    Just as a quick overview of what the code actually does.

    dsaupd - high level interface
      dsaup2 - medium level interface, manages the overall process
        dgetv0 - gets the starting vector, or returns control to user to get it.
        dsaitr - haven't looked yet...
        dsapps - haven't looked yet...
        dsconv - check what has converged (easiest routine!)
        dseigt - haven't looked yet...
        dsgets - get shifts based on eigenvalue estimates
        dsortr - sorts a vector and/or applies the sort to an aux vector too
    

    Also, let me note that the ARPACK documentation is stellar. Stellar. So a few clicks and I could easily figure out what those other functions are doing.

  • A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 1 - Goals

    I’m working on a small “me-time” project to port the ARPACK symmetric solver to Julia.

    My goal is to port the double-precision ARPACK routines for symmetric matrices to Julia, including all of the supporting ARPACK machinery. So this should give “exactly” what ARPACK does, but as a pure Julia implementation. (Where “exactly” means it should execute roughly the same sequence of floating point operations and can differ at levels that would be expected for different compilers compiling the same code.)

    • not a goal to “Julia-ize” the package; I want to keep as close to the FORTRAN as possible so that I might be able to replace calls to Julia’s Arpack.saupd / Arpack.seupd (which call the Fortran library) with this code.
    • small internal function changes are okay, e.g. ARPACK has various debugging and timing stuff that would need to be done differently in Julia.
    • small simplifications, e.g. if a function computes a single Int, we can rewrite that to return the Int rather than writing it into an array like in FORTRAN.
    • Why? Why not implement my own ImplicitRestart/Eigensolver/Etc.? Simple: I trust ARPACK. Also, I want to understand exactly what the symmetric ARPACK solver is doing.
    • Why not use a Fortran->Julia compiler? Well, I could. But I could also do this and learn intimate details of how it works to help out in teaching :)
    • I want to document some of the cool stuff in ARPACK!

    For context, I spent about a semester studying these solvers with Gene Golub to see if there were obvious ideas to make computing the Fiedler vectors faster, but beyond the high level algorithmics and lots of experiments, nothing came of it. This project is different in that the end goal is easy – port the code, with a focus on explaining in detail what each step does. (hah, how about that name-dropping?)

    Since it’s good to have a driving use-case, the idea is to make the following MatrixNetworks.jl code work without the ARPACK dependency.

      """
        `_symeigs_smallest_arpack`
        ---
        Compute eigenvalues and vectors using direct calls to the ARPACK wrappers
        to get type-stability. This function works for symmetric matrices.
        This function works on Float32 and Float64-valued sparse matrices.
        It returns the smallest set of eigenvalues and vectors.
        It only works on matrices with more than 21 rows and columns.
        (Use a dense eigensolver for smaller problems.)
        Functions
        ---------
        - `(evals,evecs) = _symeigs_smallest_arpack(A::SparseMatrixCSC{V,Int},
                                nev::Int,tol::V,maxiter::Int, v0::Vector{V})`
        Inputs
        ------
        - `A`: the sparse matrix, must be symmetric
        - `nev`: the number of eigenvectors requested
        - `tol`: the relative tolerance of the eigenvalue computation
        - `maxiter`: the maximum number of restarts
        - `v0`: the initial vector for the Lanczos process
        Example
        -------
        This is an internal function.
        """
        function _symeigs_smallest_arpack(
                    A::SparseMatrixCSC{V,Int},nev::Int,tol::V,maxiter::Int,
                    v0::Vector{V}) where V
    
            n::Int = checksquare(A) # get the size
            @assert n >= 21
    
            # setup options
            mode = 1
            sym = true
            iscmplx = false
            bmat = String("I")
            ncv = min(max(2*nev,20),n-1)
    
            whichstr = String("SA")
            ritzvec = true
            sigma = 0.
    
            #TOL = Array{V}(undef,1)
            #TOL[1] = tol
            TOL = Ref(tol)
            lworkl = ncv*(ncv + 8)
            v = Array{V}(undef, n, ncv)
            workd = Array{V}(undef, 3*n)
            workl = Array{V}(undef, lworkl)
            resid = Array{V}(undef, n)
    
            resid[:] = v0[:]
    
            info = zeros(BlasInt, 1)
            info[1] = 1
    
            iparam = zeros(BlasInt, 11)
            ipntr = zeros(BlasInt, 11)
            ido = zeros(BlasInt, 1)
    
            iparam[1] = BlasInt(1)
            iparam[3] = BlasInt(maxiter)
            iparam[7] = BlasInt(mode)
    
            # this is a helpful indexing vector
            zernm1 = 0:(n-1)
    
            while true
                # This is the reverse communication step that ARPACK does
                # we need to extract the desired vector and multiply it by A
                # unless the code says to stop
                Arpack.saupd(
                    ido, bmat, n, whichstr, nev, TOL, resid, ncv, v, n,
                    iparam, ipntr, workd, workl, lworkl, info)
    
                load_idx = ipntr[1] .+ zernm1
                store_idx = ipntr[2] .+ zernm1
    
                x = workd[load_idx]
    
                if ido[1] == 1
                    workd[store_idx] = A*x
                elseif ido[1] == 99
                    break
                else
                    error("unexpected ARPACK behavior")
                end
            end
    
            # Once the process terminates, we need to extract the
            # eigenvectors.
    
            # calls to eupd
            howmny = String("A")
            select = Array{BlasInt}(undef, ncv)
    
            d = Array{V}(undef, nev)
            sigmar = ones(V,1)*sigma
            ldv = n
            Arpack.seupd(ritzvec, howmny, select, d, v, ldv, sigmar,
                bmat, n, whichstr, nev, TOL, resid, ncv, v, ldv,
                iparam, ipntr, workd, workl, lworkl, info)
            if info[1] != 0
                error("unexpected ARPACK exception")
            end
    
            # Now we want to return them in sorted order (smallest first)
            p = sortperm(d)
    
            d = d[p]
            vectors = v[1:n,p]
    
            return (d,vectors,iparam[5])
        end
    
  • Anyone else find the documentation in old Fortran codes amazingly beautiful? e.g. look at the documentation in ARPACK. github.com/opencolla… Wonderfully annotated code.

  • Julia (@JuliaComputing) and VSCode on Native M1.

    1. Install M1 Native VSCode.

    2. Install Xcode & Command Line tools.

    3. Clone Julia master.

    4. make

    5. ./julia

    6. ENV["JULIA_GR_PROVIDER"] = "GR"

    7. ]add Plots

    8. using Plots

    9. plot(randn(10))

    10. Point VSCode at new julia and 🍾🤩🥳

  • My thoughts on getting a COVID vaccine today.

    I’m of mixed feelings about this because, while vaccine!, strictly speaking, the eligibility of university faculty (as I understand it) is unclear. I’m also not particularly old, nor high-risk.

    Meijer (and Kroger, and others) are vaccinating university faculty as educators. (Which I definitely am!) This fact has been posted on Twitter by others, and announced on a university mailing list. While I’d like to think of myself as an as-scrupulous-as-reasonable rule-follower, this is a murky area.

    I never lied to get this. Also, remember – any type of medical/healthcare employer was vaccinating nearly everyone when vaccines were really scarce, even those capable of work at home or not doing patient facing care. Here’s part of my rationalization.

    -The US has a somewhat inefficient allocation of vaccine supply to where it is needed. I booked this appointment with less than 24 (sleepless) hours’ notice. I cannot believe this kept anyone who ought to be eligible from getting a vaccine. (The pharmacy still had same-day availability when I got my vaccine.)

    -I have the capacity/flexibility/ability to exchange my time/energy to get a vaccine when/where others can’t. Should I not use this to help address the inefficient allocation above?

    -I have a responsibility to help keep the daycare we participate in safe. I struggled with how I would feel if I were to get COVID at this point (knowing that I had passed up a vaccine opportunity) and pass it on to anyone at the daycare.

    I’m posting this note for a few reasons.

    • We need to remind people that vaccines help everyone! There is some social signaling and sociology at work here: even if you don’t need it, it will help others by curtailing possible spread, and others may need some encouragement to get it.

    • By trying to understand the appointment landscape, I helped a number of people get the vaccine weeks earlier than they were originally scheduled.

    • If you, for any reason, want to get the vaccine, realize that there is capacity in the system in some places.

    • And use this information to help others too.

    • If people want to know my thoughts on the current state of getting a vaccine in Indiana, I’m happy to share more from looking at appointment availability for an elementary teacher I work with. My sense is that appointments are reasonably highly available with <= 24 hours notice if you are willing to travel <= 100 miles.

subscribe via RSS