-
Dear Apple Developer world, is there documentation of the “plj” (Property List Journal) files Apple Photos uses as its reference for photo information? (This is not the sqlite database many others have documented.)
-
Based on the Google News snippets, you wouldn’t think these headlines describe the same article! www.nature.com/articles/…
-
The FDA (regulations.gov) got 40,000 comments yesterday on the EUA for 5-11! (Their total is 55k now, but was 15k the day before…). Seems like the majority are against approval from my non-statistical sampling!
-
Oh wow. MS @powerpoint finally fixed my bug with PT Sans Bold Italic on macOS Big Sur… (and also Catalina, although something else happened there). Just in time for Monterey! Thanks @powerpoint team! (Took ~6 mo to 1 year.)
-
Anyone know what the carbon footprint of me spending 5 minutes answering an email is? I don’t mean the computer and network etc, I mean, the carbon footprint of a human spending 5 minutes on a sedentary activity.
-
Tried to take a survey of academic climate. Sadly had to give up because it was total nonsense. My own view of my answers was that they were basically random.
-
Google thinks the Lucid Air has some pretty incredible range.
-
n=1 observation for me regarding class… I’m finding this semester has more time-consuming issues as things are not yet entirely ‘back to normal’, although there is a push that way (which then triggers the time-consuming issues…). Could be random luck too!
-
If anyone sees a leather case for an iPhone 13 Pro without a lip around the camera region, can you let me know please? Pretty please? I like cameras, but I also like “phones” that lie flat and don’t wobble.
-
I find it curious how surveys of academia believe we have perfect memory of how many things like ‘reviews’ we do. Or how many we ‘declined’ to do! This is a hard question. I have no idea how many things I say no to!
-
Need data (networks, hypergraphs, tensors, ML, AI)? Here’s a timestamped and linked digest of Anthony Fauci’s emails arxiv.org/abs/2108…. from @JasonLeopold FOIA request. With @austinbenson and @n_veldt Data and Code github.com/nveldt/fa…
-
LAPACK's random number generators in Julia (part ... blah ... in work to port ARPACK to Julia.)
ARPACK depends on LAPACK’s internal random number generators dlarnv, slarnv, zlarnv, and clarnv, which use the raw routines dlaruv and slaruv. This is true even if you use Julia to initialize the starting vector with Julia’s random number generator. The reason is that ARPACK needs to be able to get random data to restart the process if it detects an invariant subspace. This suggests two approaches: call LAPACK’s RNG from Julia, or port LAPACK’s RNG to Julia. Why do the simple thing when the more complicated thing is so much more fun?
Aside, best guess on LAPACK names: _larnv = lapack random normal vector and _laruv = lapack random uniform vector
Calling LAPACK’s random number generator from Julia
The tricky bit about calling LAPACK’s RNG from Julia is the seed value. This is just 4 ints, which we’d like to be stack allocated. Unfortunately, we won’t be able to get that for reasons that are internal to Julia. (Blah blah, stack vs. heap and gc and guarantees…)
Here’s some code that works though.
import LinearAlgebra

_dlarnv_blas!(idist::Int, iseed::Ref{NTuple{4,Int}}, n::Int, x::Vector{Float64}) =
  ccall((LinearAlgebra.BLAS.@blasfunc("dlarnv_"), LinearAlgebra.BLAS.libblas), Cvoid,
    (Ref{LinearAlgebra.BlasInt}, Ptr{LinearAlgebra.BlasInt},
     Ref{LinearAlgebra.BlasInt}, Ptr{Float64}),
    idist, iseed, n, x)

x = zeros(5); _dlarnv_blas!(1, Base.Ref(tuple(1,2,3,4)), length(x), x)

using BenchmarkTools
@btime _dlarnv_blas!(1, tseed, length(z), z) setup=begin
  z=zeros(128); tseed=Base.Ref(tuple(1,3,5,7));
end
This uses NTuple as the holder for the 4 ints we use as the seed. At this point, I thought the following should work…

mutable struct MySeedState
  seed::NTuple{4,Int}
end

function myrand!(x::Vector{Float64}, state::MySeedState)
  iseed = Base.Ref(state.seed) # this will allocate :(
  _dlarnv_blas!(2, iseed, length(x), x)
  state.seed = iseed[]
end

using BenchmarkTools
@btime myrand!(z, tstate) setup=begin
  z=zeros(128); tstate=MySeedState(tuple(1,3,5,7));
end
But – see above – I learned, when we went to use this, that creating a Ref can actually allocate in Julia, even when it’s an isbits type.
julia> isbitstype(NTuple{4,Int})
true
To avoid this, we can just allocate the ref as part of the mutable struct.
mutable struct MyRefSeedState
  seed::Ref{NTuple{4,Int}}
end

function myrand!(x::Vector{Float64}, state::MyRefSeedState)
  _dlarnv_blas!(2, state.seed, length(x), x)
end

using BenchmarkTools
@btime myrand!(z, tstate) setup=begin
  z=zeros(128); tstate=MyRefSeedState(Base.Ref(tuple(1,3,5,7)));
end
Either way, it probably isn’t that big of a deal, but I’d like to be clear on where the allocations are happening. e.g. one-time thread local setup is okay. Random internal allocations are not okay.
It’s puzzling to me why this causes an allocation as this seems like it can be safely done on the stack and the same allocation doesn’t occur for types like Int.
Porting LAPACK’s random number generator to Julia.
Since the routine is sufficiently simple, we can actually just port the random number generator directly.
Aside – this is actually a weakness, as Julia’s RNGs are much better. On the other hand, these values are just used to generate data that is not orthogonal to a desired invariant subspace (with very, very high probability). So the RNG does not need to have strong statistical quality.
The underlying double-precision random number generator is dlaruv, not dlarnv. The second function, dlarnv, is a wrapper that does seemingly odd things like working in 128-length segments, until you realize that dlaruv only works in 128-length segments. (Weird!) Anyhoo. Here’s a link to the double precision generator dlaruv.f in OpenBLAS.
The routine is fairly simple: take a set of fixed values and do some integer operations to get pseudorandom values out. Specifically:

> This routine uses a multiplicative congruential method with modulus 2**48 and multiplier 33952834046453 (see G.S. Fishman, 'Multiplicative congruential random number generators with modulus 2**b: an exhaustive analysis for b = 32 and a partial analysis for b = 48', Math. Comp. 189, pp 331-344, 1990).
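To make the recurrence concrete, here’s a minimal sketch of one step (my illustration, not the LAPACK code; dlaruv carries the seed as four 12-bit pieces precisely to avoid the wide-integer arithmetic used below):

# One step of x_{k+1} = a*x_k mod 2^48 with a = 33952834046453,
# done naively in Int128 so the product cannot overflow.
a = Int128(33952834046453)
m = Int128(2)^48
x = Int128(12345)            # an example seed
x = mod(a * x, m)            # advance the generator one step
u = Float64(x) / Float64(m)  # a uniform value in [0, 1)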
See the appendix for code to extract this fixed array of values.
We set these values into a constant global array
const _dlaruv_mm=tuple(tuple(494,322,2508,2549), ...

julia> typeof(_dlaruv_mm)
NTuple{128, NTuple{4, Int64}}
Next up, we port the actual RNG. In this case, I was actually able to copy/paste much of the FORTRAN code.
function _dlaruv!(iseed::Base.RefValue{NTuple{4,Int}}, n::Int, x::Vector{Float64})
  IPW2 = 4096
  R = 1/IPW2
  i1 = iseed[][1]
  i2 = iseed[][2]
  i3 = iseed[][3]
  i4 = iseed[][4]
  # setup scope for final call in julia
  IT1 = i1
  IT2 = i2
  IT3 = i3
  IT4 = i4
  @inbounds for i=1:min(n,length(_dlaruv_mm))
    # directly copied/pasted from fortran, hence the caps
    while true
      # Multiply the seed by i-th power of the multiplier modulo 2**48
      IT4 = i4*_dlaruv_mm[i][4]
      IT3 = IT4 ÷ IPW2
      IT4 = IT4 - IPW2*IT3
      IT3 = IT3 + i3*_dlaruv_mm[i][4] + i4*_dlaruv_mm[i][3]
      IT2 = IT3 ÷ IPW2
      IT3 = IT3 - IPW2*IT2
      IT2 = IT2 + i2*_dlaruv_mm[i][4] + i3*_dlaruv_mm[i][3] + i4*_dlaruv_mm[i][2]
      IT1 = IT2 ÷ IPW2
      IT2 = IT2 - IPW2*IT1
      IT1 = IT1 + i1*_dlaruv_mm[i][4] + i2*_dlaruv_mm[i][3] + i3*_dlaruv_mm[i][2] + i4*_dlaruv_mm[i][1]
      IT1 = mod( IT1, IPW2 )
      # Convert 48-bit integer to a real number in the interval (0,1)
      x[i] = R*( Float64( IT1 )+R*( Float64( IT2 )+R*( Float64( IT3 )+R*Float64( IT4 ) ) ) )
      if x[i] == 1 # handle the (rare) case where the value is exactly 1
        i1 += 2
        i2 += 2
        i3 += 2
        i4 += 2
      else
        break
      end
    end
  end
  iseed[] = tuple(IT1,IT2,IT3,IT4) # update the seed
end
This uses the global constant array to avoid dealing with that big array of values on each function call; otherwise, the only difference is that we use a while loop instead of the GOTO loop in the Fortran.
This gives exactly the same results as the FORTRAN call
_dlaruv_blas!(iseed::Ref{NTuple{4,Int}}, n::Int, x::Vector{Float64}) =
  ccall((LinearAlgebra.BLAS.@blasfunc("dlaruv_"), LinearAlgebra.BLAS.libblas), Cvoid,
    (Ptr{LinearAlgebra.BlasInt}, Ref{LinearAlgebra.BlasInt}, Ptr{Float64}),
    iseed, n, x)

seed1 = Base.RefValue(tuple(1,3,5,7))
seed2 = Base.RefValue(tuple(1,3,5,7))
x1 = zeros(97); _dlaruv_blas!(seed1, length(x1), x1)
x2 = similar(x1); _dlaruv!(seed2, length(x2), x2)
@assert x1 == x2
@assert seed1[] == seed2[]
x3 = zeros(143); _dlaruv_blas!(seed1, length(x3), x3)
x4 = zeros(axes(x3)...); _dlaruv!(seed2, length(x4), x4)
@assert x3 == x4
@assert seed1[] == seed2[]
Awesome. But if you look at dlarnv, it also has a statically allocated length-128 array that is not going to be fun in Julia. Doable, but not fun… nor at all necessary. We can actually streamline things by just manually inlining the RNG directly into dlarnv. The only thing we have to remember is that we need to “refresh” the seed every 128 RV generations. Since ARPACK only ever calls dlarnv with idist=2, we are just going to port that code.
function _dlarnv_idist_2!(iseed::Base.RefValue{NTuple{4,Int}}, n::Int, x::Vector{Float64})
  # manually inline dlaruv to avoid excess cruft...
  IPW2 = 4096
  R = 1/IPW2
  i1 = iseed[][1]
  i2 = iseed[][2]
  i3 = iseed[][3]
  i4 = iseed[][4]
  # setup scope for final call in julia
  IT1 = i1
  IT2 = i2
  IT3 = i3
  IT4 = i4
  nrv = 0
  @inbounds for i=1:n
    rv = 1.0
    nrv += 1
    if nrv > length(_dlaruv_mm)
      nrv = 1
      # update seed information
      i1 = IT1
      i2 = IT2
      i3 = IT3
      i4 = IT4
    end
    while true
      # Multiply the seed by the nrv-th power of the multiplier modulo 2**48
      IT4 = i4*_dlaruv_mm[nrv][4]
      IT3 = IT4 ÷ IPW2
      IT4 = IT4 - IPW2*IT3
      IT3 = IT3 + i3*_dlaruv_mm[nrv][4] + i4*_dlaruv_mm[nrv][3]
      IT2 = IT3 ÷ IPW2
      IT3 = IT3 - IPW2*IT2
      IT2 = IT2 + i2*_dlaruv_mm[nrv][4] + i3*_dlaruv_mm[nrv][3] + i4*_dlaruv_mm[nrv][2]
      IT1 = IT2 ÷ IPW2
      IT2 = IT2 - IPW2*IT1
      IT1 = IT1 + i1*_dlaruv_mm[nrv][4] + i2*_dlaruv_mm[nrv][3] + i3*_dlaruv_mm[nrv][2] + i4*_dlaruv_mm[nrv][1]
      IT1 = mod( IT1, IPW2 )
      # Convert 48-bit integer to a real number in the interval (0,1)
      rv = R*( Float64( IT1 )+R*( Float64( IT2 )+R*( Float64( IT3 )+R*Float64( IT4 ) ) ) )
      if rv == 1 # handle the (rare) case where the value is exactly 1
        i1 += 2
        i2 += 2
        i3 += 2
        i4 += 2
      else
        break
      end
    end
    # now, rv is a random value
    # port the line from `dlarnv` for idist=2
    x[i] = 2*rv - 1
  end
  iseed[] = tuple(IT1,IT2,IT3,IT4) # update the seed
end
This works great!
seed1 = Base.RefValue(tuple(1,3,5,7))
seed2 = Base.RefValue(tuple(1,3,5,7))
x1 = zeros(501); _dlarnv_blas!(2, seed1, length(x1), x1)
x2 = similar(x1); _dlarnv_idist_2!(seed2, length(x2), x2)
@assert x1 == x2
@assert seed1[] == seed2[]
##
x3 = zeros(143); _dlarnv_blas!(2, seed1, length(x3), x3)
x4 = zeros(axes(x3)...); _dlarnv_idist_2!(seed2, length(x4), x4)
@assert x3 == x4
@assert seed1[] == seed2[]
Exactly the same values and exactly the same seeds.
Things that might happen in the future… I may pull out the key piece of the random number generator into a separate function…
rv,IT1,IT2,IT3,IT4,i1,i2,i3,i4 = _dlaruv_rng_step( nrv,IT1,IT2,IT3,IT4,i1,i2,i3,i4)
Well… I tried that, and it turned out to be slower, which brings us to …
Julia is faster than Fortran.
What’s neat about inlining that code is we get a small performance boost.
using BenchmarkTools
@btime _dlarnv_idist_2!(tseed, length(z), z) setup=begin
  z=zeros(501); tseed=Base.Ref(tuple(1,3,5,7));
end
@btime _dlarnv_blas!(2, tseed, length(z), z) setup=begin
  z=zeros(501); tseed=Base.Ref(tuple(1,3,5,7));
end
Gives …
Julia 6.686 μs (0 allocations: 0 bytes)
Fortran 7.333 μs (0 allocations: 0 bytes)
So Julia for the win!
(For reference, with the _dlaruv_rng_step idea, it was 7.0 microseconds, so just a nudge slower. Not sure why, as it would seem to inline to exactly the same code.)

Appendix
Getting the values from the Fortran code.
Here is how we extracted that big array of constant values from the FORTRAN code for dlaruv. I believe it’s the same array for slaruv
too, but I’d need to double-check that. So this code is helpful there.code = raw""" DATA ( MM( 1, J ), J = 1, 4 ) / 494, 322, 2508, $ 2549 / DATA ( MM( 2, J ), J = 1, 4 ) / 2637, 789, 3754, $ 1145 / DATA ( MM( 3, J ), J = 1, 4 ) / 255, 1440, 1766, $ 2253 / DATA ( MM( 4, J ), J = 1, 4 ) / 2008, 752, 3572, $ 305 / DATA ( MM( 5, J ), J = 1, 4 ) / 1253, 2859, 2893, $ 3301 / DATA ( MM( 6, J ), J = 1, 4 ) / 3344, 123, 307, $ 1065 / DATA ( MM( 7, J ), J = 1, 4 ) / 4084, 1848, 1297, $ 3133 / DATA ( MM( 8, J ), J = 1, 4 ) / 1739, 643, 3966, $ 2913 / DATA ( MM( 9, J ), J = 1, 4 ) / 3143, 2405, 758, $ 3285 / DATA ( MM( 10, J ), J = 1, 4 ) / 3468, 2638, 2598, $ 1241 / DATA ( MM( 11, J ), J = 1, 4 ) / 688, 2344, 3406, $ 1197 / DATA ( MM( 12, J ), J = 1, 4 ) / 1657, 46, 2922, $ 3729 / DATA ( MM( 13, J ), J = 1, 4 ) / 1238, 3814, 1038, $ 2501 / DATA ( MM( 14, J ), J = 1, 4 ) / 3166, 913, 2934, $ 1673 / DATA ( MM( 15, J ), J = 1, 4 ) / 1292, 3649, 2091, $ 541 / DATA ( MM( 16, J ), J = 1, 4 ) / 3422, 339, 2451, $ 2753 / DATA ( MM( 17, J ), J = 1, 4 ) / 1270, 3808, 1580, $ 949 / DATA ( MM( 18, J ), J = 1, 4 ) / 2016, 822, 1958, $ 2361 / DATA ( MM( 19, J ), J = 1, 4 ) / 154, 2832, 2055, $ 1165 / DATA ( MM( 20, J ), J = 1, 4 ) / 2862, 3078, 1507, $ 4081 / DATA ( MM( 21, J ), J = 1, 4 ) / 697, 3633, 1078, $ 2725 / DATA ( MM( 22, J ), J = 1, 4 ) / 1706, 2970, 3273, $ 3305 / DATA ( MM( 23, J ), J = 1, 4 ) / 491, 637, 17, $ 3069 / DATA ( MM( 24, J ), J = 1, 4 ) / 931, 2249, 854, $ 3617 / DATA ( MM( 25, J ), J = 1, 4 ) / 1444, 2081, 2916, $ 3733 / DATA ( MM( 26, J ), J = 1, 4 ) / 444, 4019, 3971, $ 409 / DATA ( MM( 27, J ), J = 1, 4 ) / 3577, 1478, 2889, $ 2157 / DATA ( MM( 28, J ), J = 1, 4 ) / 3944, 242, 3831, $ 1361 / DATA ( MM( 29, J ), J = 1, 4 ) / 2184, 481, 2621, $ 3973 / DATA ( MM( 30, J ), J = 1, 4 ) / 1661, 2075, 1541, $ 1865 / DATA ( MM( 31, J ), J = 1, 4 ) / 3482, 4058, 893, $ 2525 / DATA ( MM( 32, J ), J = 1, 4 ) / 657, 622, 736, $ 1409 / DATA ( MM( 33, J ), J = 1, 4 ) / 3023, 3376, 3992, $ 3445 / DATA ( MM( 34, J ), J = 1, 4 ) / 3618, 812, 787, $ 3577 / DATA ( MM( 35, J ), J = 1, 4 ) / 1267, 234, 2125, $ 77 / DATA ( MM( 36, J ), J = 1, 4 ) / 1828, 641, 2364, $ 3761 / DATA ( MM( 37, J ), J = 1, 4 ) / 164, 4005, 2460, $ 2149 / DATA ( MM( 38, J ), J = 1, 4 ) / 3798, 1122, 257, $ 1449 / DATA ( MM( 39, J ), J = 1, 4 ) / 3087, 3135, 1574, $ 3005 / DATA ( MM( 40, J ), J = 1, 4 ) / 2400, 2640, 3912, $ 225 / DATA ( MM( 41, J ), J = 1, 4 ) / 2870, 2302, 1216, $ 85 / DATA ( MM( 42, J ), J = 1, 4 ) / 3876, 40, 3248, $ 3673 / DATA ( MM( 43, J ), J = 1, 4 ) / 1905, 1832, 3401, $ 3117 / DATA ( MM( 44, J ), J = 1, 4 ) / 1593, 2247, 2124, $ 3089 / DATA ( MM( 45, J ), J = 1, 4 ) / 1797, 2034, 2762, $ 1349 / DATA ( MM( 46, J ), J = 1, 4 ) / 1234, 2637, 149, $ 2057 / DATA ( MM( 47, J ), J = 1, 4 ) / 3460, 1287, 2245, $ 413 / DATA ( MM( 48, J ), J = 1, 4 ) / 328, 1691, 166, $ 65 / DATA ( MM( 49, J ), J = 1, 4 ) / 2861, 496, 466, $ 1845 / DATA ( MM( 50, J ), J = 1, 4 ) / 1950, 1597, 4018, $ 697 / DATA ( MM( 51, J ), J = 1, 4 ) / 617, 2394, 1399, $ 3085 / DATA ( MM( 52, J ), J = 1, 4 ) / 2070, 2584, 190, $ 3441 / DATA ( MM( 53, J ), J = 1, 4 ) / 3331, 1843, 2879, $ 1573 / DATA ( MM( 54, J ), J = 1, 4 ) / 769, 336, 153, $ 3689 / DATA ( MM( 55, J ), J = 1, 4 ) / 1558, 1472, 2320, $ 2941 / DATA ( MM( 56, J ), J = 1, 4 ) / 2412, 2407, 18, $ 929 / DATA ( MM( 57, J ), J = 1, 4 ) / 2800, 433, 712, $ 533 / DATA ( MM( 58, J ), J = 1, 4 ) / 189, 2096, 2159, $ 2841 / DATA ( MM( 59, J ), J = 1, 4 ) / 287, 1761, 2318, $ 4077 / DATA ( 
MM( 60, J ), J = 1, 4 ) / 2045, 2810, 2091, $ 721 / DATA ( MM( 61, J ), J = 1, 4 ) / 1227, 566, 3443, $ 2821 / DATA ( MM( 62, J ), J = 1, 4 ) / 2838, 442, 1510, $ 2249 / DATA ( MM( 63, J ), J = 1, 4 ) / 209, 41, 449, $ 2397 / DATA ( MM( 64, J ), J = 1, 4 ) / 2770, 1238, 1956, $ 2817 / DATA ( MM( 65, J ), J = 1, 4 ) / 3654, 1086, 2201, $ 245 / DATA ( MM( 66, J ), J = 1, 4 ) / 3993, 603, 3137, $ 1913 / DATA ( MM( 67, J ), J = 1, 4 ) / 192, 840, 3399, $ 1997 / DATA ( MM( 68, J ), J = 1, 4 ) / 2253, 3168, 1321, $ 3121 / DATA ( MM( 69, J ), J = 1, 4 ) / 3491, 1499, 2271, $ 997 / DATA ( MM( 70, J ), J = 1, 4 ) / 2889, 1084, 3667, $ 1833 / DATA ( MM( 71, J ), J = 1, 4 ) / 2857, 3438, 2703, $ 2877 / DATA ( MM( 72, J ), J = 1, 4 ) / 2094, 2408, 629, $ 1633 / DATA ( MM( 73, J ), J = 1, 4 ) / 1818, 1589, 2365, $ 981 / DATA ( MM( 74, J ), J = 1, 4 ) / 688, 2391, 2431, $ 2009 / DATA ( MM( 75, J ), J = 1, 4 ) / 1407, 288, 1113, $ 941 / DATA ( MM( 76, J ), J = 1, 4 ) / 634, 26, 3922, $ 2449 / DATA ( MM( 77, J ), J = 1, 4 ) / 3231, 512, 2554, $ 197 / DATA ( MM( 78, J ), J = 1, 4 ) / 815, 1456, 184, $ 2441 / DATA ( MM( 79, J ), J = 1, 4 ) / 3524, 171, 2099, $ 285 / DATA ( MM( 80, J ), J = 1, 4 ) / 1914, 1677, 3228, $ 1473 / DATA ( MM( 81, J ), J = 1, 4 ) / 516, 2657, 4012, $ 2741 / DATA ( MM( 82, J ), J = 1, 4 ) / 164, 2270, 1921, $ 3129 / DATA ( MM( 83, J ), J = 1, 4 ) / 303, 2587, 3452, $ 909 / DATA ( MM( 84, J ), J = 1, 4 ) / 2144, 2961, 3901, $ 2801 / DATA ( MM( 85, J ), J = 1, 4 ) / 3480, 1970, 572, $ 421 / DATA ( MM( 86, J ), J = 1, 4 ) / 119, 1817, 3309, $ 4073 / DATA ( MM( 87, J ), J = 1, 4 ) / 3357, 676, 3171, $ 2813 / DATA ( MM( 88, J ), J = 1, 4 ) / 837, 1410, 817, $ 2337 / DATA ( MM( 89, J ), J = 1, 4 ) / 2826, 3723, 3039, $ 1429 / DATA ( MM( 90, J ), J = 1, 4 ) / 2332, 2803, 1696, $ 1177 / DATA ( MM( 91, J ), J = 1, 4 ) / 2089, 3185, 1256, $ 1901 / DATA ( MM( 92, J ), J = 1, 4 ) / 3780, 184, 3715, $ 81 / DATA ( MM( 93, J ), J = 1, 4 ) / 1700, 663, 2077, $ 1669 / DATA ( MM( 94, J ), J = 1, 4 ) / 3712, 499, 3019, $ 2633 / DATA ( MM( 95, J ), J = 1, 4 ) / 150, 3784, 1497, $ 2269 / DATA ( MM( 96, J ), J = 1, 4 ) / 2000, 1631, 1101, $ 129 / DATA ( MM( 97, J ), J = 1, 4 ) / 3375, 1925, 717, $ 1141 / DATA ( MM( 98, J ), J = 1, 4 ) / 1621, 3912, 51, $ 249 / DATA ( MM( 99, J ), J = 1, 4 ) / 3090, 1398, 981, $ 3917 / DATA ( MM( 100, J ), J = 1, 4 ) / 3765, 1349, 1978, $ 2481 / DATA ( MM( 101, J ), J = 1, 4 ) / 1149, 1441, 1813, $ 3941 / DATA ( MM( 102, J ), J = 1, 4 ) / 3146, 2224, 3881, $ 2217 / DATA ( MM( 103, J ), J = 1, 4 ) / 33, 2411, 76, $ 2749 / DATA ( MM( 104, J ), J = 1, 4 ) / 3082, 1907, 3846, $ 3041 / DATA ( MM( 105, J ), J = 1, 4 ) / 2741, 3192, 3694, $ 1877 / DATA ( MM( 106, J ), J = 1, 4 ) / 359, 2786, 1682, $ 345 / DATA ( MM( 107, J ), J = 1, 4 ) / 3316, 382, 124, $ 2861 / DATA ( MM( 108, J ), J = 1, 4 ) / 1749, 37, 1660, $ 1809 / DATA ( MM( 109, J ), J = 1, 4 ) / 185, 759, 3997, $ 3141 / DATA ( MM( 110, J ), J = 1, 4 ) / 2784, 2948, 479, $ 2825 / DATA ( MM( 111, J ), J = 1, 4 ) / 2202, 1862, 1141, $ 157 / DATA ( MM( 112, J ), J = 1, 4 ) / 2199, 3802, 886, $ 2881 / DATA ( MM( 113, J ), J = 1, 4 ) / 1364, 2423, 3514, $ 3637 / DATA ( MM( 114, J ), J = 1, 4 ) / 1244, 2051, 1301, $ 1465 / DATA ( MM( 115, J ), J = 1, 4 ) / 2020, 2295, 3604, $ 2829 / DATA ( MM( 116, J ), J = 1, 4 ) / 3160, 1332, 1888, $ 2161 / DATA ( MM( 117, J ), J = 1, 4 ) / 2785, 1832, 1836, $ 3365 / DATA ( MM( 118, J ), J = 1, 4 ) / 2772, 2405, 1990, $ 361 / DATA ( MM( 119, J ), J = 1, 4 ) / 1217, 3638, 2058, $ 2685 / DATA 
( MM( 120, J ), J = 1, 4 ) / 1822, 3661, 692, $ 3745 / DATA ( MM( 121, J ), J = 1, 4 ) / 1245, 327, 1194, $ 2325 / DATA ( MM( 122, J ), J = 1, 4 ) / 2252, 3660, 20, $ 3609 / DATA ( MM( 123, J ), J = 1, 4 ) / 3904, 716, 3285, $ 3821 / DATA ( MM( 124, J ), J = 1, 4 ) / 2774, 1842, 2046, $ 3537 / DATA ( MM( 125, J ), J = 1, 4 ) / 997, 3987, 2107, $ 517 / DATA ( MM( 126, J ), J = 1, 4 ) / 2573, 1368, 3508, $ 3017 / DATA ( MM( 127, J ), J = 1, 4 ) / 1148, 1848, 3525, $ 2141 / DATA ( MM( 128, J ), J = 1, 4 ) / 545, 2366, 3801, $ 1537 / """ lines = split(code, "\n"; keepempty=false) for i = 1:2:length(lines) l1 = lines[i] l2 = lines[i+1] # next line with last piece vals13str = split(l1, "/")[2] # values 1 to 3 vals13 = parse.(Int, split(vals13str, ","; keepempty=false)) vals14 = push!(vals13, parse(Int, split(l2[2:end])[1])) println("tuple(", join(vals14, ","), "),") end
-
ARPACK Sorting Take 2 (Part ... blah in the ongoing series...)
Last post on sorting was a disaster. I tried to do something quick and simple in Julia and ran into all sorts of issues with things I didn’t quite understand and breaks in my mental model of how Julia works. This was just to try and write the real Julia code for the following task: sort an array and take another along for the ride, as is done in the code below.
This post is straightforward. We are simply going to port this code.
c----------------------------------------------------------------------- c\BeginDoc c c\Name: dsortr c c\Description: c Sort the array X1 in the order specified by WHICH and optionally c applies the permutation to the array X2. c c\Usage: c call dsortr c ( WHICH, APPLY, N, X1, X2 ) c c\Arguments c WHICH Character*2. (Input) c 'LM' -> X1 is sorted into increasing order of magnitude. c 'SM' -> X1 is sorted into decreasing order of magnitude. c 'LA' -> X1 is sorted into increasing order of algebraic. c 'SA' -> X1 is sorted into decreasing order of algebraic. c c APPLY Logical. (Input) c APPLY = .TRUE. -> apply the sorted order to X2. c APPLY = .FALSE. -> do not apply the sorted order to X2. c c N Integer. (INPUT) c Size of the arrays. c c X1 Double precision array of length N. (INPUT/OUTPUT) c The array to be sorted. c c X2 Double precision array of length N. (INPUT/OUTPUT) c Only referenced if APPLY = .TRUE. c c\EndDoc c c----------------------------------------------------------------------- c c\BeginLib c c\Author c Danny Sorensen Phuong Vu c Richard Lehoucq CRPC / Rice University c Dept. of Computational & Houston, Texas c Applied Mathematics c Rice University c Houston, Texas c c\Revision history: c 12/16/93: Version ' 2.1'. c Adapted from the sort routine in LANSO. c c\SCCS Information: @(#) c FILE: sortr.F SID: 2.3 DATE OF SID: 4/19/96 RELEASE: 2 c c\EndLib c c----------------------------------------------------------------------- c subroutine dsortr (which, apply, n, x1, x2) c c %------------------% c | Scalar Arguments | c %------------------% c character*2 which logical apply integer n c c %-----------------% c | Array Arguments | c %-----------------% c Double precision & x1(0:n-1), x2(0:n-1) c c %---------------% c | Local Scalars | c %---------------% c integer i, igap, j Double precision & temp c c %-----------------------% c | Executable Statements | c %-----------------------% c igap = n / 2 c if (which .eq. 'SA') then c c X1 is sorted into decreasing order of algebraic. c 10 continue if (igap .eq. 0) go to 9000 do 30 i = igap, n-1 j = i-igap 20 continue c if (j.lt.0) go to 30 c if (x1(j).lt.x1(j+igap)) then temp = x1(j) x1(j) = x1(j+igap) x1(j+igap) = temp if (apply) then temp = x2(j) x2(j) = x2(j+igap) x2(j+igap) = temp end if else go to 30 endif j = j-igap go to 20 30 continue igap = igap / 2 go to 10 c else if (which .eq. 'SM') then c c X1 is sorted into decreasing order of magnitude. c 40 continue if (igap .eq. 0) go to 9000 do 60 i = igap, n-1 j = i-igap 50 continue c if (j.lt.0) go to 60 c if (abs(x1(j)).lt.abs(x1(j+igap))) then temp = x1(j) x1(j) = x1(j+igap) x1(j+igap) = temp if (apply) then temp = x2(j) x2(j) = x2(j+igap) x2(j+igap) = temp end if else go to 60 endif j = j-igap go to 50 60 continue igap = igap / 2 go to 40 c else if (which .eq. 'LA') then c c X1 is sorted into increasing order of algebraic. c 70 continue if (igap .eq. 0) go to 9000 do 90 i = igap, n-1 j = i-igap 80 continue c if (j.lt.0) go to 90 c if (x1(j).gt.x1(j+igap)) then temp = x1(j) x1(j) = x1(j+igap) x1(j+igap) = temp if (apply) then temp = x2(j) x2(j) = x2(j+igap) x2(j+igap) = temp end if else go to 90 endif j = j-igap go to 80 90 continue igap = igap / 2 go to 70 c else if (which .eq. 'LM') then c c X1 is sorted into increasing order of magnitude. c 100 continue if (igap .eq. 
0) go to 9000 do 120 i = igap, n-1 j = i-igap 110 continue c if (j.lt.0) go to 120 c if (abs(x1(j)).gt.abs(x1(j+igap))) then temp = x1(j) x1(j) = x1(j+igap) x1(j+igap) = temp if (apply) then temp = x2(j) x2(j) = x2(j+igap) x2(j+igap) = temp end if else go to 120 endif j = j-igap go to 110 120 continue igap = igap / 2 go to 100 end if c 9000 continue return c c %---------------% c | End of dsortr | c %---------------% c end
This code has the same routine four times with different compare functions. Let’s pull out one routine.
      if (which .eq. 'SA') then
c
c        X1 is sorted into decreasing order of algebraic.
c
   10    continue
         if (igap .eq. 0) go to 9000
         do 30 i = igap, n-1
            j = i-igap
   20       continue
c
            if (j.lt.0) go to 30
c
            if (x1(j).lt.x1(j+igap)) then
               temp = x1(j)
               x1(j) = x1(j+igap)
               x1(j+igap) = temp
               if (apply) then
                  temp = x2(j)
                  x2(j) = x2(j+igap)
                  x2(j+igap) = temp
               end if
            else
               go to 30
            endif
            j = j-igap
            go to 20
   30    continue
         igap = igap / 2
         go to 10
c
      else if (which .eq. 'SM') then
Things to note. First, this uses Fortran’s flexible indexing to switch to C-based indices. While this is possible in Julia, we are just going to do it manually. Again, we are trying to keep this simple. (Despite the last blog post.)
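For the curious, here’s a sketch of the zero-based option via OffsetArrays.jl (an extra dependency this port deliberately avoids, so treat it as an aside):

using OffsetArrays
x1 = OffsetVector([3.0, 1.0, 2.0], 0:2)  # indices 0:2, like the Fortran x1(0:n-1)
@show x1[0]  # the first element lives at index 0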
First step is just to unwrap the loops and GOTOs. In this case, I find an outer to inner strategy is helpful.
The outer loop (label 10) seems to be the following:
while igap != 0
  # stuff
  igap = div(igap, 2)
end
The next loops (labels 20 and 30) seem to encode the following pattern.
for i=igap:(n-1)
  j = i-igap
  while j >= 0 # the fortran code uses 0-based indices here
    if cmp(j, j+igap)
      # swap
    else
      break # out of while loop
    end
    j = j-igap
  end
end
Now, we assemble these into a simple function. Other changes made are:
* Predefined strings become symbols.
* Rather than repeating the code 4 times, we use functions to handle the comparison.
* Add a new macro to handle checking the length of input vectors.
""" Sort the array X1 in the order specified by WHICH and optionally applies the permutation to the array X2. Usage: call dsortr ( WHICH, APPLY, N, X1, X2 ) Arguments WHICH Symbol (Input) :LM -> X1 is sorted into increasing order of magnitude. :SM -> X1 is sorted into decreasing order of magnitude. :LA -> X1 is sorted into increasing order of algebraic. :SA -> X1 is sorted into decreasing order of algebraic. APPLY Boolean. (Input) APPLY = .TRUE. -> apply the sorted order to X2. APPLY = .FALSE. -> do not apply the sorted order to X2. N Integer. (INPUT) Size of the arrays. X1 Double precision array of length N. (INPUT/OUTPUT) The array to be sorted. X2 Double precision array of length N. (INPUT/OUTPUT) Only referenced if APPLY = .TRUE. Returns: nothing """ function dsortr( which::Symbol, # Input apply::Bool, # Input n::Int, # Input x1::Vector{Float64}, # Input/Output x2::Vector{Float64}, # Input/Output ) @jl_arpack_check_length(x1, n) if apply @jl_arpack_check_length(x2, n) end igap = div(n,2) # integer division # the ARPACK sorting routine is all the same, just changes the # comparison, Julia gives us a way of stating that very easily. if which == :SA cmp = (x::Float64, y::Float64) -> (x < y) elseif which == :SM cmp = (x::Float64, y::Float64) -> (abs(x) < abs(y)) elseif which == :LA cmp = (x::Float64, y::Float64) -> (x > y) elseif which == :LM cmp = (x::Float64, y::Float64) -> (abs(x) > abs(y)) end while igap != 0 for i=igap:n-1 j = i-igap while j >= 0 # in the fortran code, it uses indices 0, n-1 if cmp(x1[j+1],x1[j+igap+1]) x1[j+1],x1[j+igap+1] = x1[j+igap+1],x1[j+1] if apply x2[j+1],x2[j+igap+1] = x2[j+igap+1],x2[j+1] end else break # go to 30, which means to exit while loop end j = j - igap end end igap = div(igap, 2) end end
The macro for parameter checking is pretty simple, although it took me about an hour to get all the various options correct.
macro jl_arpack_check_length(var, len)
  varstr = string(var)
  lenstr = string(len)
  errstr1 = "range 1:$lenstr="
  errstr2 = " out of bounds for $varstr of length "
  return esc(quote
    if $len > length($var)
      throw(ArgumentError(string($errstr1, $len, $errstr2, length($var))))
    end
  end)
end
To get the “string” command, I had to look at

x = "Hi"
mytest(x) = "My string $x"
@code_lowered mytest(x)

to find how the string interpolation actually gets lowered to the string command.
Next up was testing. I wanted to run some tests on all permutation vectors.
While I could add a dependency on Combinatorics.jl to the testing repository, I wanted to keep this simple. We don’t need it to be terribly efficient. Here is what I came up with.
function all_permutations(v::Vector{Float64};
    all::Vector{Vector{Float64}}=Vector{Vector{Float64}}(),
    prefix::Vector{Float64}=zeros(0))
  if length(v) == 0
    push!(all, prefix)
  end
  for i in eachindex(v)
    all_permutations(deleteat!(copy(v), i);
      prefix=push!(copy(prefix), v[i]), all)
  end
  return all
end
Then our testing code runs everything in that set to test and make sure we get the correct sort.
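For a flavor of what that looks like, here’s a sketch in the spirit of those tests (the names are from this post; the real checks live in the package’s test suite):

# Check dsortr against Base.sort on every permutation of a small vector,
# carrying a second array along to verify the permutation is applied too.
for p in all_permutations([1.0, 2.0, 3.0, 4.0])
  x1 = copy(p)
  x2 = float.(1:length(p))   # tag each entry so we can track where it goes
  dsortr(:LA, true, length(x1), x1, x2)
  @assert x1 == sort(p)      # :LA -> increasing algebraic order
  @assert p[Int.(x2)] == x1  # x2 now records where each entry came from
end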
A question
What is this sorting algorithm doing? I just ported the code. It seems to be sorting elements separated by a gap (igap), then merging these sorted lists into longer sorted lists. (This looks like Shellsort: insertion sort over a gap sequence of n/2, n/4, …, 1.)
Some history
There is a comment in the code that references LANSO. Here is the original routine from LANSO (found in svdpack by Mike Berry). So we might have to chat with Beresford Parlett for more information on this algorithm.

C
C @(#)DSORT2.F 3.2 (BNP) 12/9/88
C
      SUBROUTINE DSORT2(N,ARRAY1,ARRAY2)
      INTEGER N
      REAL*8 ARRAY1(0:N-1),ARRAY2(0:N-1)
C
C.... SORT ARRAY1 AND ARRAY2 INTO INCREASING ORDER FOR ARRAY1
C
      INTEGER IGAP,I,J
      REAL*8 TEMP
C
      IGAP = N/2
 10   IF (IGAP.GT.0) THEN
         DO 200 I = IGAP,N-1
            J = I-IGAP
 50         IF (J.GE.0.AND.ARRAY1(J).GT.ARRAY1(J+IGAP)) THEN
               TEMP = ARRAY1(J)
               ARRAY1(J) = ARRAY1(J+IGAP)
               ARRAY1(J+IGAP) = TEMP
               TEMP = ARRAY2(J)
               ARRAY2(J) = ARRAY2(J+IGAP)
               ARRAY2(J+IGAP) = TEMP
            ELSE
               GO TO 200
            ENDIF
            J = J-IGAP
            GO TO 50
 200     CONTINUE
      ELSE
         RETURN
      ENDIF
      IGAP = IGAP/2
      GO TO 10
      END
-
Me on learning about `save` variables in Fortran and accessing them from Julia. (Part ... something ... in my ongoing segment on porting ARPACK to Julia)
Huh, what are these things in a FORTRAN code marked save… Oh… jeez, is that persistent state in a subroutine? i.e. like global variables? Ugh, that is going to be annoying to port. (More on a strategy in a bit.)

e.g. dgetv0.f in ARPACK:
save first, iseed, inits, iter, msglvl, orth, rnorm0
How could these possibly be implemented in a shared library? Well, most likely they are just fixed offsets in memory somewhere that the subroutine knows about and manipulates. So can I see them from Julia?
Try dumping the dylib/so file to find if these are listed in there. (They ought to be as they should take up persistent locations in memory…) (Any of the following commands show them…)
nm -a libarpack.2.0.0.dylib
objdump -p libarpack.2.0.0.dylib
dsymutil -s libarpack.2.0.0.dylib
nm -nm libarpack.2.0.0.dylib
They are! e.g. In this case, I picked iseed, which is the ARPACK random number seed/state.

dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep iseed
000000000005d060 (__DATA,__bss5) non-external _iseed.3637
000000000005d080 (__DATA,__bss5) non-external _iseed.3636
000000000005d0a0 (__DATA,__bss5) non-external _iseed.3636
000000000005d0c0 (__DATA,__bss5) non-external _iseed.3637
Well, there are a few things listed here… but probably something we can resolve via code.
Try and look up symbols via dlsym…

using Arpack
using Arpack_jll
using LinearAlgebra
libar = Base.Libc.dlopen(Arpack_jll.libarpack)
## Try and get offset of private symbol directly
offset = Base.Libc.dlsym(libar, "iseed.3637")
offset = Base.Libc.dlsym(libar, "_iseed.3637")
offset = Base.Libc.dlsym(libar, "iseed")
This doesn’t work because dlsym only has exported/public symbols accessible. (Why, I don’t know, because they are in the darn file, so it’s like they want to make it harder to access things… which I guess is a good thing?)
Try and look at offsets from the load base. e.g. if the dylib/so file says that function dgetv0 is at address 0x0010f00 and I load that at address 0x5020e00, then the offset to other addresses should be the same… which it is for functions (easy to verify).

dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep getv0
000000000001f2a0 (__TEXT,__text) external _dgetv0_

julia> offset = Base.Libc.dlsym(libar, "dgetv0_")
Ptr{Nothing} @0x0000000170bac2a0
julia> base_offset = offset - 0x000000000001f2a0
Ptr{Nothing} @0x0000000170b8d000
So we have a base offset, which means that other functions and symbols should respect that…
dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep arscnd
0000000000008a00 (__TEXT,__text) external _arscnd_

julia> arscnd_offset = Base.Libc.dlsym(libar, "arscnd_")
Ptr{Nothing} @0x0000000170b95a00
julia> arscnd_offset == 0x0000000000008a00 + base_offset
true

Awesome, so let's check out some of those iseed variables...

julia> iseed_offsets = [0x5d060, 0x5d080, 0x5d0a0, 0x5d0c0]
julia> iseeds = Ptr{Int64}.(base_offset .+ iseed_offsets)
julia> unsafe_load.(iseeds)
4-element Vector{Int64}:
 0
 0
 0
 0
julia> unsafe_load.(iseeds .+ 8)
4-element Vector{Int64}:
 0
 0
 0
 0

Okay, so these iseed variables are currently 0, because we haven't run anything yet. (Internally in `dgetv0` they are used to track the state of the random number generator...)

julia> using Arpack; eigs(randn(50,50))
julia> unsafe_load.(iseeds)

Hmm. That shows the same values. Maybe another set? ARPACK also uses `first` as some one-time routine initialization tracked with `save` variables.

$ nm -nm /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | grep first

julia> first_offsets="""000000000005c398 (__DATA,__data) non-external _first.3636
000000000005c3a0 (__DATA,__data) non-external _first.3637
000000000005c3b0 (__DATA,__data) non-external _first.3635
000000000005c3b8 (__DATA,__data) non-external _first.3640
000000000005c3c0 (__DATA,__data) non-external _first.3634
000000000005c3c8 (__DATA,__data) non-external _first.3639
000000000005c3d8 (__DATA,__data) non-external _first.3635
000000000005c3e0 (__DATA,__data) non-external _first.3640
000000000005c3e8 (__DATA,__data) non-external _first.3634
000000000005c3f0 (__DATA,__data) non-external _first.3639
000000000005c400 (__DATA,__data) non-external _first.3636
000000000005c408 (__DATA,__data) non-external _first.3637
000000000005c780 (__DATA,__bss3) non-external _first.3633
000000000005c8d8 (__DATA,__bss3) non-external _first.3632
000000000005cc08 (__DATA,__bss3) non-external _first.3632
000000000005cea8 (__DATA,__bss3) non-external _first.3633""" |>
  x->split(x,"\n") |> x->map(y->y[1:16],x) |> x->parse.(UInt64, x[1:16];base=16)

(Sorry for syntax fanciness; `"0x001" |> x->parse(Int,x)` just creates a little mini-function and chains a set of function evaluations. Handy for little sequences like that where I split, then break apart, then parse each.)

julia> firsts = Ptr{Int64}.(base_offset .+ first_offsets)
julia> unsafe_load.(firsts)
16-element Vector{Int64}:
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 1
 0
 0
 0
 0

Which means that everything hasn't really been run yet. (This is the same right after loading the library. The 0's at the end are there as well.)
So, hmm… that doesn’t seem to work. (Or at least I don’t see changes I expect to see…) But still the function addresses are exactly where I predict them.
Try and check out disassembled routines for help. Learn how to get disassembled code from objdump.

dgleich@circulant test-arpack-state % objdump -D /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib | less

000000000001f2a0 _dgetv0_:
   1f2a0: 41 57                             pushq %r15
   1f2a2: 41 56                             pushq %r14
   1f2a4: 49 89 d7                          movq %rdx, %r15
   1f2a7: 41 55                             pushq %r13
   1f2a9: 41 54                             pushq %r12
   1f2ab: 49 89 fd                          movq %rdi, %r13
   1f2ae: 55                                pushq %rbp
   1f2af: 53                                pushq %rbx
   1f2b0: 49 89 f4                          movq %rsi, %r12
   1f2b3: 4c 89 c3                          movq %r8, %rbx
   1f2b6: 4d 89 ce                          movq %r9, %r14
   1f2b9: 48 83 ec 28                       subq $40, %rsp
   1f2bd: 48 83 3d e3 d0 03 00 00           cmpq $0, 250083(%rip)
   1f2c5: 74 29                             je 41 <_dgetv0_+0x50>
   1f2c7: 66 0f 6f 05 31 f4 02 00           movdqa 193585(%rip), %xmm0
   1f2cf: 48 c7 05 ce d0 03 00 00 00 00 00  movq $0, 250062(%rip)
   1f2da: 0f 29 05 9f dd 03 00              movaps %xmm0, 253343(%rip)
   1f2e1: 66 0f 6f 05 27 f4 02 00           movdqa 193575(%rip), %xmm0
   1f2e9: 0f 29 05 a0 dd 03 00              movaps %xmm0, 253344(%rip)
   1f2f0: 49 83 7d 00 00                    cmpq $0, (%r13)
   1f2f5: 0f 84 85 02 00 00                 je 645 <_dgetv0_+0x2e0>
   1f2fb: 41 0f b6 04 24                    movzbl (%r12), %eax
In case you want a spoiler, it’s cmpq $0, 250083(%rip) that checks “if first == 1” on the first subroutine call, and so the address of first is 0x1f2c5 + 250083 (the NEXT instruction offset + the %rip displacement), so the offset is actually 0x0005c3a8 (just add the numbers and write the result in hex). (This is actually off by just a bit because of how Fortran seems to store logicals. There are a few details I don’t fully understand, but it’s close enough that I’m not worried.)
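(A quick sanity check on that arithmetic in the REPL:)

julia> string(0x1f2c5 + 250083, base=16)  # next-instruction offset + %rip displacement
"5c3a8"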
Next up: check the code on Linux. The Julia code I worked on above seems to work there????!??!? Everything I was doing above should really have worked?? Amazing. (Just had to pull new offsets from that libarpack…) But wait, gosh, why doesn’t this work on the Mac I was sitting in front of? Is this some weirdo Mac dlsym loader/protection system that randomizes offsets… and keeps it from working on the Mac?
Develop C/C++ code to do the same; verify that the addresses I’m getting for functions seem correct. (Meanwhile, learn how to use lldb…!) Okay, everything lldb seems to be doing is exactly what I think ought to work. (See the C/C++ code below.) Check that I can use the C++ code and pointer offsets as I’m doing in Julia and on the mac in order to…
Learn more details about position-independent code and why everything is in terms of offsets from the instruction pointer %rip. (See the disassembled code above for an example.)

Start learning about ccall and Julia library loading. (Hoping to find some pointer as to why there would be this weirdness…)
Double check to see that I’ve gotten the right library loaded…
julia> Base.Libc.dllist()
152-element Vector{String}:
 "/usr/lib/libSystem.B.dylib"
 "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/libjulia.1.6.dylib"
 "/usr/lib/system/libcache.dylib"
 "/usr/lib/system/libcommonCrypto.dylib"
 "/usr/lib/system/libcompiler_rt.dylib"
 "/usr/lib/system/libcopyfile.dylib"
 "/usr/lib/system/libcorecrypto.dylib"
 "/usr/lib/system/libdispatch.dylib"
 "/usr/lib/system/libdyld.dylib"
 "/usr/lib/system/libkeymgr.dylib"
 ⋮
 "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libmbedx509.2.24.0.dylib"
 "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libmbedcrypto.2.24.0.dylib"
 "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libdSFMT.dylib"
 "/System/Library/PrivateFrameworks/DebugSymbols.framework/Versions/A/DebugSymbols"
 "/System/Library/PrivateFrameworks/CoreServicesInternal.framework/Versions/A/CoreServicesInternal"
 "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libstdc++.6.dylib"
 "/Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libgomp.1.dylib"
 "/Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib"
 "/Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib"
Oh…
You see that? In the bottom there? There are two libarpacks listed.

I was looking at the wrong one. :(
🤦
Just for kicks now…
dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib | grep iseed
000000000005afa0 (__DATA,__bss5) non-external _iseed.3637
000000000005afc0 (__DATA,__bss5) non-external _iseed.3636
000000000005afe0 (__DATA,__bss5) non-external _iseed.3636
000000000005b000 (__DATA,__bss5) non-external _iseed.3637

julia> libar = Base.Libc.dlopen("/Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib")
julia> offset = Base.Libc.dlsym(libar, "dgetv0_")
Ptr{Nothing} @0x000000017596f730

dgleich@circulant test-arpack-state % nm -nm /Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib | grep dgetv0
000000000001e730 (__TEXT,__text) external _dgetv0_

julia> base_offset = offset - 0x000000000001e730
Ptr{Nothing} @0x0000000175951000
julia> libar = Base.Libc.dlopen(Arpack_jll.libarpack)
julia> iseed_offsets = [0x5afa0, 0x5afc0, 0x5afe0, 0x5b000]
julia> iseeds = Ptr{Int64}.(base_offset .+ iseed_offsets)
julia> unsafe_load.(iseeds)
4-element Vector{Int64}:
    0
 1626
    0
    0
julia> eigs(randn(50,50))
julia> unsafe_load.(iseeds)
4-element Vector{Int64}:
   0
 561
   0
   0
🤦 indeed. (This took way more time than I would have liked. Approx. a full day.)
Should have trusted the fact that this all worked in Linux, where somehow Arpack is using the same library as Arpack_jll.
One other thing I learned: on the macOS build of ARPACK, there is no timing information collected. The arscnd function (which should return some time value in seconds) is just a no-op.

0000000000008a00 _arscnd_:
    8a00: c7 07 00 00 00 00  movl $0, (%rdi)
    8a06: c3                 retq
    8a07: 90                 nop
    8a08: 90                 nop
    8a09: 90                 nop
    8a0a: 90                 nop
    8a0b: 90                 nop
    8a0c: 90                 nop
    8a0d: 90                 nop
    8a0e: 90                 nop
    8a0f: 90                 nop
(Or rather, this seems to write 0 to the location passed into arscnd!)

I learned this, of course, because one of the things arpack does is collect timing information to persist between function calls so it can time user code via the call-backs and such. So in debugging, I was often looking at locations where time information ought to be stored. Finding it continually empty was … surprising.
Another thing I learned: you can access the common blocks of FORTRAN code via cglobal in Julia.

julia> debug = cglobal(("debug_", "/Users/dgleich/.julia/packages/Arpack/zCmTA/deps/usr/lib/libarpack.2.0.0.dylib"), Int64)
Ptr{Int64} @0x00000001759ac020
Documentation and structure of the debug common block is here. But we can use it to turn on lots of fun ARPACK debug information!
julia> unsafe_store!(debug, 2, 11) # mnaupd turn on debugging for mnaupd (non-symmetric solver) julia> unsafe_store!(debug, -14, 2) # ndigit - use 14 digits of precision (ndigit) julia> unsafe_store!(debug, 4, 3) # mgetv0 show lots of information from dgetv0 julia> unsafe_store!(debug, 6, 1) # logfile set logfil to 6, the default stdout julia> eigs(randn(50,50)) _getv0: B-norm of initial / restarted starting vector ----------------------------------------------------- 1 - 1: 4.1596362767673D+00 _getv0: initial / restarted starting vector ------------------------------------------- 1 - 2: -8.3153140665625D-01 8.3995344740674D-01 3 - 4: 9.6704039685513D-01 3.6866117348409D-01 5 - 6: 6.0284542702433D-03 2.0318894267245D-01 7 - 8: -1.3495066933914D-01 4.1078102410386D-01 9 - 10: 4.3888096465232D-01 -5.7522033107717D-01 11 - 12: -8.7293721104508D-01 9.8788590316195D-01 13 - 14: -1.2282299641820D-01 7.4369927039981D-01 15 - 16: 8.7906957254355D-01 -4.7762629697247D-01 17 - 18: -4.7162805784843D-02 1.1706962983657D-01 19 - 20: 7.8492285203887D-01 2.6127819008416D-01 21 - 22: -8.2777335418346D-01 -6.5745389045117D-01 23 - 24: 4.4467420164616D-01 9.9532171436879D-01 25 - 26: 6.9740537844088D-01 -9.2167903292103D-01 27 - 28: -9.1859980666224D-01 -7.1191753019861D-01 29 - 30: -1.5899117215995D-01 1.5815438448009D-01 31 - 32: -5.1806821604240D-01 3.8970905952163D-01 33 - 34: -8.7740030659049D-01 -5.7437967251424D-02 35 - 36: 7.9785742115958D-01 3.1842249389620D-01 37 - 38: -6.1567371152282D-01 -2.1826976841397D-01 39 - 40: 8.0618226869270D-01 2.7929452805704D-01 41 - 42: -7.8813902002933D-01 1.7058205959301D-01 43 - 44: -4.0247429259527D-01 -6.0433106862224D-01 45 - 46: -4.1118723651662D-01 -3.4363227883103D-01 47 - 48: -4.4783519455385D-01 2.1286373092316D-02 49 - 50: 2.9004800999430D-01 4.4388758381478D-01 _naupd: Number of update iterations taken ----------------------------------------- 1 - 1: 17 _naupd: Number of wanted "converged" Ritz values ------------------------------------------------ 1 - 1: 6 _naupd: Real part of the final Ritz values ------------------------------------------ 1 - 2: -5.7794067256680D+00 -5.7794067256680D+00 3 - 4: -7.2262688525223D+00 6.8333693387731D+00 5 - 6: 6.8333693387731D+00 7.6197283464233D+00 _naupd: Imaginary part of the final Ritz values ----------------------------------------------- 1 - 2: 2.7424220041744D+00 -2.7424220041744D+00 3 - 4: 0.0000000000000D+00 2.3888574879058D+00 5 - 6: -2.3888574879058D+00 0.0000000000000D+00 _naupd: Associated Ritz estimates --------------------------------- 1 - 2: 5.0738750651244D-16 5.0738750651244D-16 3 - 4: 2.5131234631038D-25 1.2315725733964D-30 5 - 6: 1.2315725733964D-30 1.0152776765181D-35 ============================================= = Nonsymmetric implicit Arnoldi update code = = Version Number: 2.4 = = Version Date: 07/31/96 = ============================================= = Summary of timing statistics = ============================================= Total number update iterations = 17 Total number of OP*x operations = 206 Total number of B*x operations = 0 Total number of reorthogonalization steps = 35 Total number of iterative refinement steps = 0 Total number of restart steps = 0 Total time in user OP*x operation = 0.000000 Total time in user B*x operation = 0.000000 Total time in Arnoldi update routine = 0.000000 Total time in naup2 routine = 0.000000 Total time in basic Arnoldi iteration loop = 0.000000 Total time in reorthogonalization phase = 0.000000 Total time in (re)start vector generation = 0.000000 Total 
time in Hessenberg eig. subproblem = 0.000000 Total time in getting the shifts = 0.000000 Total time in applying the shifts = 0.000000 Total time in convergence testing = 0.000000 Total time in computing final Ritz vectors = 0.000000
Other references
- https://github.com/conan-io/conan-center-index/issues/4696
- https://stackoverflow.com/questions/22769246/how-to-disassemble-one-single-function-using-objdump
- https://eli.thegreenplace.net/2011/11/11/position-independent-code-pic-in-shared-libraries-on-x64
C/C++ Code to check
Compile with (edit to your own paths)
clang -g -fPIC test-arpack-state-dlopen.c \
  -Wl,-rpath /Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib \
  -Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia \
  -o mytest-dlopen
And here’s the code. This just calls ARPACK once to see what happens. In this case, I figured out exactly which offset to first to use (it was the first one, hah!)

/* test_arpack_state */
#include <inttypes.h>
typedef int64_t blasint;

typedef void (*dsaupd_ptr)(
  blasint* ido, char* bmat, blasint* n, char* whichstr, blasint* nev,
  double* tol, double* resid, blasint* ncv, double* v, blasint* ldv,
  blasint* iparam, blasint* ipntr, double* workd, double* workl,
  blasint* lworkl, blasint* info);

#include <printf.h>
#include <dlfcn.h>

int main(int argc, char** argv) {
  blasint nev = 3;
  double tol = -1.0;
  blasint maxiter = 500;
  blasint n = 25;
  double v0[25] = {0};
  blasint ncv = 20;
  double v[25*20];
  double workd[25*3];
  double workl[20*(20+8)];
  double resid[25];
  blasint lworkl = 20*(20+8);
  blasint info = 0;
  blasint iparam[11] = {0};
  blasint ipntr[11] = {0};
  blasint ido = 0;

  iparam[0] = 1;
  iparam[2] = maxiter;
  iparam[6] = 1;
  char bmat[] = "I";
  char whichstr[] = "SA";
  blasint ldv = 25;

  void* libar = dlopen("/Users/dgleich/.julia/artifacts/e1631d3aec690feb8f2330c629e853190df49fbe/lib/libarpack.2.1.0.dylib", RTLD_LAZY|RTLD_GLOBAL);
  if (libar) { printf("Loaded libarpack\n"); }
  dsaupd_ptr pfun = dlsym(libar, "dsaupd_");
  if (pfun) { printf("Loaded dsaupd_\n"); }

  void* base_offset = pfun - 0x000000000002acb0; // offset from objdump.
  double* dsaupd_t0 = base_offset + 0x000000000005c604; // offset from objdump.
  blasint* logfil = base_offset + 0x000000000005c420;
  blasint* ndig = base_offset + 0x000000000005c420+8;
  blasint* mgetv0 = base_offset + 0x000000000005c420+16;
  blasint* msaupd = base_offset + 0x000000000005c420+24;
  *mgetv0 = 2;
  *msaupd = 2;
  blasint* dgetv0_first1 = base_offset + 0x000000000005c3a0+8;
  blasint* dgetv0_first2 = base_offset + 0x000000000005c408+8;

  printf("logfil = %i\n", (int)*logfil);
  printf("ndigit = %i\n", (int)*ndig);
  printf("dgetv0_first1 = %i\n", (int)*dgetv0_first1);
  printf("dgetv0_first2 = %i\n", (int)*dgetv0_first2);

  pfun(&ido, bmat, &n, whichstr, &nev, &tol, resid, &ncv, v, &ldv,
       iparam, ipntr, workd, workl, &lworkl, &info);

  printf("base_offset = %p\n", base_offset);
  printf("dsapud = %p\n", pfun);
  printf("dsapud_t0 = %p\n", dsaupd_t0);
  printf("info = %i\n", (int)info);
  printf("ido = %i\n", (int)ido);
  printf("dsaupd_t0 = %lf\n", (double)*dsaupd_t0);
  printf("dgetv0_first1 = %i\n", (int)*dgetv0_first1);
  printf("dgetv0_first2 = %i\n", (int)*dgetv0_first2);
  //dlclose(libar);
  return 0;
}
-
"In-place" sorting allocates half-sized temporary arrays in Julia for non-numeric types.
I hope my confusion at the difference between the memory allocation behavior of the following codes can be forgiven.
julia> using BenchmarkTools
julia> @btime sort!(v) setup=(v=randn(100));
  268.868 ns (0 allocations: 0 bytes)
julia> @btime sort!(v) setup=(v=tuple.(randn(100)));
  714.639 ns (2 allocations: 528 bytes)
Julia often uses the ! symbol on a function to indicate “in-place”, but this is also usually a hint that it’s “non-allocating” – which is great if I want to sort things in place without extra memory allocation. (See my previous post.) (Strictly speaking, the ! simply means ‘mutating’: the function changes its input arguments.) Neither of these is a contract. (e.g. something in-place might be allocating or non-allocating depending on the type of the array elements.)
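(A quick illustration that mutating and non-allocating are different things: push! certainly mutates its argument, yet it still allocates whenever the underlying buffer has to grow.)

v = Float64[]
push!(v, 0.0)             # warm up so we don't measure compilation
@allocated push!(v, 1.0)  # often nonzero: the buffer may need to grow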
The documentation of sort! simply mentions in-place: “Sort the vector v in place.” However, in this case, tuple elements can be moved around without allocating. So I, for one, would expect the two codes to share the same amount of memory traffic and allocations.
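(One way to see why that expectation is reasonable: a tuple of floats is an isbits type, so swapping elements is just copying bits, with no allocation required.)

julia> isbitstype(Tuple{Float64})  # the element type of tuple.(randn(100))
true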
After much debugging, what I found is that Julia defaults to QuickSort for numeric types, which is strictly non-allocating, whereas it uses MergeSort for non-numeric types. (Yes, the change in algorithms is in the documentation for sort!, yay! What isn’t there is the implication for memory allocations.)

julia> @btime sort!(v;alg=QuickSort) setup=(v=tuple.(randn(100)));
  947.789 ns (0 allocations: 0 bytes)
Moreover, this behavior depends only on the element type of the vector, not on the actual values sorted.
julia> @btime sort!(v,by=first) setup=(v=tuple.(randn(100)));
  790.624 ns (2 allocations: 528 bytes)
julia> @btime sort!(v;alg=QuickSort,by=first) setup=(v=tuple.(randn(100)));
  947.789 ns (0 allocations: 0 bytes)
(In this case, by=first means sort by the first element of each tuple, which means Julia is conceptually sorting a numeric type.)
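(You can see the element-type dispatch directly via the internal, and version-dependent, Base.Sort.defalg function that picks the default algorithm; a peek, assuming Julia 1.6-era internals:)

julia> Base.Sort.defalg(randn(10))          # numeric eltype -> in-place QuickSort
Base.Sort.QuickSortAlg()
julia> Base.Sort.defalg(tuple.(randn(10)))  # non-numeric eltype -> allocating MergeSort
Base.Sort.MergeSortAlg()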
I think this particular example breaks an implicit user contract. By calling sort! I’m asking Julia not to allocate extra memory in its algorithm. To have a default algorithm choice that will allocate is wrong from a working-mental-model point of view. Moreover, if this isn’t the case, then the docs for sort! ought to be very strongly worded that sort! can only be recommended for numeric types unless you change the default algorithm. (But then why is that the default choice for sort?) Finally, to the extent that the algorithm choice is influenced by the element type, it should apply to the actual element type used, not merely the element type in memory!
I realize these are tricky issues, e.g. MergeSort is tough to do in-place.
Many people would likely say “premature optimization”; why am I fussed about this difference?
Here’s the issue. I need a mental model for how code is translated, or I can’t possibly ever be concerned with “optimization” and I must give up. In this case, a simplified mental model of what the tool/language is doing breaks. So making sure the compiler/language/runtime matches my mental model for what is happening, when I’m trying to design for that setting, is not premature optimization; it’s simply part of the spec!
Moreover, it seems this issue has come up already (more discussion).
-
A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 6 - Sorting Take 1
Next up on our list of routines to convert is the sorting routine dsortr. Again, this could be replaced with internal Julia stuff. However, this sorting routine does something that shows up quite often in practice: you want to sort array x, and take array y along for the ride!

The pseudocode is simple:
function sort_permute(x,y)
  p = sortperm(x)
  return x[p], y[p]
end

a = [5, 8, 9, 1, 4, 3, 2]
b = [3., 4., 5., 2., 1., 9., 8.]
sort_permute(a,b)

([1, 2, 3, 4, 5, 8, 9], [2.0, 8.0, 9.0, 1.0, 3.0, 4.0, 5.0])
This is a recurring pattern. Here’s a link to a question on stack overflow. That post references an old blog post (updated to a Wayback Machine link). And hey, I think I know that fellow. Good to know that things haven’t changed too much in 15 years! (And that I’m still working on the same old stuff… hmm… maybe that isn’t so good.)
So let’s solve this problem in Julia again. Or wait, let’s not. Off the top of my head, I don’t know how to do this in Julia without allocating the extra permutation vector. After glancing around sort.jl in Base, it seems like you could probably do this by … something complicated: creating a new array type that wraps the pair of arrays.
Uh oh, the problem with having an idea…
struct SortPermutePair{U,V} <: AbstractArray{Tuple{U,V},1}
  x::Vector{U}
  y::Vector{V}
end
… trying to resist urge to test it …
Base.size(z::SortPermutePair{U,V}) where {U,V} = (min(length(z.x),length(z.y)),)
Base.getindex(z::SortPermutePair{U,V}, i::Integer) where {U,V} = (z.x[i], z.y[i])
Base.setindex!(z::SortPermutePair{U,V}, v::Tuple, i::Integer) where {U,V} = begin
  z.x[i] = v[1]; z.y[i] = v[2]
end
… yeah, resisting is not gonna work, and we’ll have to see how far I get here. Let’s see how to do this in Julia before actually porting the ARPACK routine.
a = [5, 8, 9, 1, 4, 3, 2]
b = [3., 4., 5., 2., 1., 9., 8.]
sort!(SortPermutePair(a,b), by=x->x[1])
So that wasn’t so bad! No extra memory allocated… (right?!?) This shouldn’t allocate extra memory…
a = [5, 8, 9, 1, 4, 3, 2]
b = [3., 4., 5., 2., 1., 9., 8.]
@time sort!(SortPermutePair(a,b), by=x->x[1]);
@time sort!(SortPermutePair(a,b), by=x->x[1]);

julia> @time sort!(SortPermutePair(a,b), by=x->x[1]);
  0.050316 seconds (58.15 k allocations: 3.460 MiB, 99.04% compilation time)
julia> @time sort!(SortPermutePair(a,b), by=x->x[1]);
  0.050027 seconds (58.15 k allocations: 3.460 MiB, 99.08% compilation time)
Hmm. Well, that’ll have to wait for a future blog post to debug. My guess is that I need type annotations in all the functions, which is something we’ll return to in a future post. (Okay – I had to at least try @btime…) I’m very bad at resisting tangents here! That’s the fun of this post series. I’m going to go into all of those and not edit them out.

using BenchmarkTools
@btime sort!(SortPermutePair($a,$b), by=x->x[1]) setup=begin
  a=[5, 8, 9, 1, 4, 3, 2]
  b=[3., 4., 5., 2., 1., 9., 8.]
end
This gives
  52.661 ns (1 allocation: 80 bytes)
7-element Vector{Float64}:
 ...
Well, that is slightly better. I’m not sure why we see one allocation. This should be doable with zero allocations. (I think it might somewhere be allocating that last output vector???)
@btime sort!(a) setup=(a=randn(10))
  44.800 ns (0 allocations: 0 bytes)
Maybe it’s a function boundary thing?
function sort_permute!(x,y;kwargs...)
  sort!(SortPermutePair(x,y);kwargs...,by=first)
  x,y
end

julia> @time sort_permute!(a,b)
  0.000006 seconds (2 allocations: 112 bytes)
([1, 2, 3, 4, 5, 8, 9], [2.0, 8.0, 9.0, 1.0, 3.0, 4.0, 5.0])
And this clearly has a problem with allocating extra memory somewhere
julia> a = randn(10^5); b=randn(10^5); @time sort_permute!(a,b)
  0.014885 seconds (3 allocations: 781.422 KiB)
And now you have one of my least favorite things in Julia: no mental model for where/how these decisions/impacts get made. This is all designed to be allocated on the stack, so there should be no need to go into the weeds. Anyway, a few iterations later, we end up with one fix: using the anonymous function x->x[1] is what causes the huge hit to compile time.
# this occurs no matter how many times you run it, as I'm running from
# the repl and it seems to need to recompile the function x->x[1] every time.
julia> @time sort!(SortPermutePair(a,b), by=x->x[1]);
  0.050027 seconds (58.15 k allocations: 3.460 MiB, 99.08% compilation time)
julia> @time sort!(SortPermutePair(a,b), by=first);
  0.004317 seconds (152 allocations: 7.844 KiB, 99.72% compilation time)
julia> @time sort!(SortPermutePair(a,b), by=first);
  0.000011 seconds (3 allocations: 144 bytes)
Big long diversion before we get to the actual ARPACK Fortran sorting routine I wanted to get to. That’ll have to wait for another day! So where are we? We demonstrated it’s fairly easy to do this in Julia, although there is one allocation I’d like to get rid of that still seems to be there. Maybe I should use the code linter or something like that. (We’ll see how to track allocations in a future post.)
A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 5 - Understanding Ref
Here, we pick up with debugging the call to dsconv within the ARPACK library itself from the last post. This is when the repo was at this commit. When we run the tests against the wrapped Arpack function directly, it errors.
using Pkg; Pkg.test("ArpackInJulia"; test_args=["arpackjll"])
In particular, the following code always returns “-1”
import Arpack_jll, LinearAlgebra

function arpack_dsconv(n::Int, ritz::Vector{Float64},
                       bounds::Vector{Float64}, tol::Float64)
  nconv::LinearAlgebra.BlasInt = -1
  ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
    (Ref{LinearAlgebra.BlasInt}, Ptr{Float64}, Ptr{Float64},
     Ref{Float64}, Ref{LinearAlgebra.BlasInt}),
    n, ritz, bounds, tol, nconv)
  return nconv
end
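For example, at the REPL:

julia> arpack_dsconv(3, ones(3), zeros(3), 1e-8)
-1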
Ugh. This is tough to debug. It seems like nconv is never getting updated. One of my common refrains is that problems occur at boundaries between code. Interfaces are very hard to get right because slight differences in models can manifest as bugs or just outright incorrect answers. That is exactly what we are seeing here.
Now, I could just jump to the solution, which is blindingly obvious in retrospect. On the other hand, there’s value in walking through the process I used. (Or so I tell myself.)
Given that the code above doesn’t work, what were my working hypotheses about the causes?
- So maybe I don’t understand how the fortran calling stuff works and I shouldn’t have expected this to get updated.
- Maybe Julia is doing something I don’t understand?
- There is some odd interaction for this particular piece of code?
- The Fortran / Arpack code doesn’t do what I think it does.
Well, the last one is the most important, as everything follows from my assumption that the fortran code works. So that seems like a good place to start. (In truth, I did futz around with some of the others, but then quickly settled on this one.)
Step 1. Do something you know or strongly suspect will work.
Arpack is designed to work with C and C++. There are just fewer places for bugs to hide there. So I wanted to make sure I could call this from C and get the response I expected.
This gave me the following C program.
#include <inttypes.h>
#include <stdio.h>

typedef int64_t blasint;

void dsconv_(blasint* n, double* ritz, double* bounds,
             double* tol, blasint* nconv);

int main(int argc, char** argv) {
  double ritz[] = {1.0, 1.0, 1.0};
  double bounds[] = {0.0, 0.0, 0.0};
  double tol = 1e-8;
  blasint n = 3;
  blasint nconv = 4;

  dsconv_(&n, ritz, bounds, &tol, &nconv);

  printf("nconv = %i\n", (int)nconv);
  return 0;
}
Which I saved into a file: test_dsconv.c. This should print 3 if I’ve understood this correctly. (And the other Julia code is correct.)

I wanted to compile this against the Julia libraries. Here’s how I got it working. Get the path to libarpack from Julia:
@show Arpack_jll.libarpack_path
/Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib
(Ugh, security breach… yes, my local username is dgleich. Entirely surprising!) After much fussing with library paths, etc. here’s how to get this all compiled.
gcc test_dsconv.c \
  /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib \
  -Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/ \
  -Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia
Let’s pick this apart.
gcc test_dsconv.c
# above is the call to compile this piece of code

/Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib
# above is telling it we want to use this library.

-Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/
# above is telling gcc to pass an option to the linker command. Specifically,
# rpath means where to get runtime libraries ("runtime path").
# this can also be set with LD_LIBRARY_PATH, but that's... controversial
Let’s try without the last link to Julia’s libraries to see what happens!
dgleich@circulant test_dsconv % gcc test_dsconv.c \
  /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib \
  -Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/
dgleich@circulant test_dsconv % ./a.out
dyld: Library not loaded: @rpath/libopenblas64_.dylib
  Referenced from: /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib
  Reason: image not found
zsh: abort      ./a.out
Which is because it can’t find libopenblas, which libarpack references. So we need to tell it about all the other common Julia libraries. That’s what the final flag does:
-Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia
# tell it about Julia's libraries.
And now it works great.
dgleich@circulant test_dsconv % gcc test_dsconv.c \
  /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/libarpack.2.0.0.dylib \
  -Wl,-rpath /Users/dgleich/.julia/artifacts/096eefeb84e5b1a463aefac5a561d6109f6e9467/lib/ \
  -Wl,-rpath /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia
dgleich@circulant test_dsconv % ./a.out
nconv = 3
Okay, so my understanding of the Arpack function was correct. Ugh. That means I need to learn something about Julia.
Step 2: Better understanding Julia.
Okay, let’s go back to Julia because what we are doing ought to work. Here’s where @code_native and some hints about assembly code come in handy.
julia> @code_native arpack_dsconv(3, ones(3), zeros(3), 1e-8)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ none:1 within `arpack_dsconv'
	subq	$40, %rsp
; │ @ none:4 within `arpack_dsconv'
; │┌ @ essentials.jl:396 within `cconvert'
; ││┌ @ refpointer.jl:104 within `convert'
; │││┌ @ refvalue.jl:8 within `RefValue'
	movq	%rdi, 32(%rsp)
	vmovsd	%xmm0, 16(%rsp)
	movq	$-1, (%rsp)
; │└└└
; │┌ @ pointer.jl:65 within `unsafe_convert'
	movq	(%rsi), %rsi
	movq	(%rdx), %rdx
	leaq	32(%rsp), %rdi
	leaq	16(%rsp), %rcx
	movq	%rsp, %r8
	movabsq	$dsconv_, %rax
; │└
	callq	*%rax
; │ @ none:11 within `arpack_dsconv'
	movq	$-1, %rax
	addq	$40, %rsp
	retq
	nopw	%cs:(%rax,%rax)
; └
If you look at this code long enough and google around about how amd64 assembly works (or x86_64 if you prefer, although it’s really amd64…), you’ll find that the return value of the function goes in register rax. This code

movq $-1, %rax

sets the return register to -1 right before returning (retq).

If you want to learn more about calling conventions and register conventions on amd64, see the powerpoint from cs217 at Princeton.

Okay, so what’s happening is that we are always returning -1 independently of the function call. This means that the Julia compiler is treating the value of nconv as a constant and optimizing the code.

Another debugging tenet is to try things you think will likely work. Here’s something I strongly suspect will work. And just for fun, I tried the following code.
import Arpack_jll, LinearAlgebra

function arpack_dsconv(n::Int, ritz::Vector{Float64},
                       bounds::Vector{Float64}, tol::Float64)
  nconv::Vector{LinearAlgebra.BlasInt} = [-1]
  ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
    (Ref{LinearAlgebra.BlasInt}, Ptr{Float64}, Ptr{Float64},
     Ref{Float64}, Ref{LinearAlgebra.BlasInt}),
    n, ritz, bounds, tol, nconv)
  return nconv[1]
end
Why do I think this will work? Because I’m passing an array and then explicitly looking at an entry. This is what I want to do, just without the array allocation that creating the vector causes. (Julia’s memory allocator is good, but not that good.)
julia> arpack_dsconv(3, ones(3), zeros(3), 1e-8)
1-element Vector{Int64}:
 3
Hmm… so I need to learn more about Ref to understand why the other call didn’t work.
This line in the Julia documentation for Ref stuck out to me:
In Julia, Ref objects are dereferenced (loaded or stored) with [].
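A tiny illustration of that line:

r = Ref{Int}(-1)  # a Ref is a box holding a value
r[] = 3           # store through the Ref with []
@show r[]         # load with []; prints r[] = 3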
Anyway, at this point, I realized my mental model was wrong and got to …
The fix
Now, here’s the fix. The issue is that ccall will auto-wrap an intrinsic type (i.e. an int, float, etc.) in a Ref if you give it a Ref argument type. But this doesn’t refer to the original value; it makes a new copy of it and refers to that. So here’s working code.

import Arpack_jll, LinearAlgebra

function arpack_dsconv(n::Int, ritz::Vector{Float64},
                       bounds::Vector{Float64}, tol::Float64)
  nconv = Ref{LinearAlgebra.BlasInt}(-1)
  ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
    (Ref{LinearAlgebra.BlasInt}, Ptr{Float64}, Ptr{Float64},
     Ref{Float64}, Ref{LinearAlgebra.BlasInt}),
    n, ritz, bounds, tol, nconv)
  return nconv[]
end
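And now, with the same inputs as the C test:

julia> arpack_dsconv(3, ones(3), zeros(3), 1e-8)
3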
The error in my mental model
As I said above, bugs exist on boundaries. Here’s the origin of this bug and my incorrect mental model of the code.
My mental model based on C/C++
nconv::Int = -1
ccall((:func, libfunc), Cvoid, (Ref{Int},), nconv)
# <-> func(&nconv) in C
But that’s not right! Here’s what the code actually does.
nconv::Int = -1
ccall((:func, libfunc), Cvoid, (Ref{Int},), nconv)
# <-> int newint = nconv; func(&newint) in C
Which of course explains why nconv isn’t updated. It wasn’t even passed in!
Suggestions
- Add a note to the Ref documentation on this
- Add a note to the ccall documentation on this
Is it worth it for one person with an incorrect mental model? I think yes, because the ccall area is likely to be a source of similar bugs, and the assumptions need to be documented! This is already partially done in this paragraph:

When passed as a ccall argument (either as a Ptr or Ref type), a Ref object will be converted to a native pointer to the data it references. For most T, or when converted to a Ptr{Cvoid}, this is a pointer to the object data. When T is an isbits type, this value may be safely mutated, otherwise mutation is strictly undefined behavior.
-
A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 4 - Checking against libarpack
See Overview for a high-level picture, also see ArpackInJulia.jl on Github
One of the things I wanted to do was check against the true ARPACK functions. The idea is that we can swap out true ARPACK routines for our routines easily. So I wanted to be able to call the true ARPACK routines individually myself.
This is remarkably easy, and it is how Arpack.jl works with the ARPACK functions aupd and eupd to implement eigs. So this isn’t surprising. It’s still fun to see that ease of use translate into new codes.

The idea is that Arpack.jl uses Arpack_jll to access the compiled FORTRAN code as a dynamic library. All the binary compiling, etc. is handled by Arpack_jll, and Arpack.jl can just use that. So we are going to import Arpack_jll so we can get access to

@show Arpack_jll.libarpack

which gives the path to the ARPACK library in a system-independent fashion. (At some point, this will work on the Apple M1 too, but not yet.) [Probably about the same time as I finish posting these notes and getting all of the routines ported, though.]
Once we have that, we can simply set up a ccall to get what we need! See the blas.jl functions for many examples of how to do this. Also see Arpack.jl for even more examples that are specific to ARPACK.
import Arpack_jll, LinearAlgebra

function arpack_dsconv(n::Int, ritz::Vector{Float64},
                       bounds::Vector{Float64}, tol::Float64)
  nconv::Int = 0
  ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
    (Ref{LinearAlgebra.BlasInt}, Ptr{Float64}, Ptr{Float64},
     Ref{Float64}, Ref{LinearAlgebra.BlasInt}),
    n, ritz, bounds, tol, nconv)
  return nconv
end
Of course, I only want to depend on Arpack_jll in the test routines. (Actually, I’d really only like to depend on it in a subset of the test routines. But I don’t see how to do that right now and it isn’t relevant.) Put another way, I want the tests to be able to run without Arpack_jll working on a particular platform; in that case, we simply don’t run the tests that need it.

To control which tests are run, we can check ARGS as described in this pull request https://github.com/JuliaLang/Pkg.jl/pull/1226 where this functionality was implemented in Julia.

if "arpackjll" in ARGS
  include("arpackjll.jl") # get the helpful routines to call stuff in "arpackjll"
  @testset "arpackjll" begin
    # tests that check against libarpack directly
  end
end
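With this gate in place, the extra tests run when the flag is passed through Pkg.test:

using Pkg; Pkg.test("ArpackInJulia"; test_args=["arpackjll"])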
So I stuck the call to dsconv into arpackjll.jl:

import Arpack_jll, LinearAlgebra

function arpack_dsconv(n::Int, ritz::Vector{Float64},
                       bounds::Vector{Float64}, tol::Float64)
  nconv::LinearAlgebra.BlasInt = -1
  ccall((:dsconv_, Arpack_jll.libarpack), Cvoid,
    (Ref{LinearAlgebra.BlasInt}, Ptr{Float64}, Ptr{Float64},
     Ref{Float64}, Ref{LinearAlgebra.BlasInt}),
    n, ritz, bounds, tol, nconv)
  return nconv
end
This does require me to add Arpack_jll and LinearAlgebra to the test package itself. We can do this by activating the test package and then adding them.

# change to the test subdirectory
] activate .
add LinearAlgebra
add Arpack_jll
And our full piece of new test code:
if "arpackjll" in ARGS include("arpackjll.jl") # get the helpful routines to call stuff in "arpackjll" @testset "arpackjll" begin @testset "dsconv" begin soln = arpack_dsconv(10, ones(10), zeros(10), 1e-8) [@test](https://micro.blog/test) ArpackInJulia.dsconv(10, ones(10), zeros(10), 1e-8)==soln soln = arpack_dsconv(10, zeros(10), ones(10), 1e-8) [@test](https://micro.blog/test) ArpackInJulia.dsconv(10, zeros(10), ones(10), 1e-8)==soln end end end
The two cases flip the ones and zeros between the Ritz values and the bounds: with zero bounds every value should count as converged, and with the roles flipped none should.
And what’s weird, it fails. I always get back arpack_dsconv==-1. Hmm… will debug in the next post!
-
A funny thing in @JuliaLanguage. I often illustrate that finite differences in floating point are inaccurate when the step size gets too small. But this Julia code works and it’s fun to understand why!
xx = range(0,1,length=100)
F(x) = 5.0.*x .+ 1/3
h = eps(1.0)
z = (F(xx.+h)-F(xx))./h
-
A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 3 - First routine
See Overview for a high-level picture, also see ArpackInJulia.jl on Github
We are going to start our ARPACK conversion with about the simplest routine: dsconv. This routine for a double-precision, symmetric eigenvalue problem just checks if the eigenvalue estimates have converged at all.

c-----------------------------------------------------------------------
c\BeginDoc
c
c\Name: dsconv
c
c\Description:
c  Convergence testing for the symmetric Arnoldi eigenvalue routine.
c
c\Usage:
c  call dsconv
c     ( N, RITZ, BOUNDS, TOL, NCONV )
c
c\Arguments
c  N       Integer.  (INPUT)
c          Number of Ritz values to check for convergence.
c
c  RITZ    Double precision array of length N.  (INPUT)
c          The Ritz values to be checked for convergence.
c
c  BOUNDS  Double precision array of length N.  (INPUT)
c          Ritz estimates associated with the Ritz values in RITZ.
c
c  TOL     Double precision scalar.  (INPUT)
c          Desired relative accuracy for a Ritz value to be considered
c          "converged".
c
c  NCONV   Integer scalar.  (OUTPUT)
c          Number of "converged" Ritz values.
c
c\EndDoc
c
c-----------------------------------------------------------------------
c
c\BeginLib
c
c\Routines called:
c     arscnd  ARPACK utility routine for timing.
c     dlamch  LAPACK routine that determines machine constants.
c
c\Author
c     Danny Sorensen               Phuong Vu
c     Richard Lehoucq              CRPC / Rice University
c     Dept. of Computational &     Houston, Texas
c     Applied Mathematics
c     Rice University
c     Houston, Texas
c
c\SCCS Information: @(#)
c FILE: sconv.F   SID: 2.4   DATE OF SID: 4/19/96   RELEASE: 2
c
c\Remarks
c     1. Starting with version 2.4, this routine no longer uses the
c        Parlett strategy using the gap conditions.
c
c\EndLib
c
c-----------------------------------------------------------------------
c
      subroutine dsconv (n, ritz, bounds, tol, nconv)
c
c     %----------------------------------------------------%
c     | Include files for debugging and timing information |
c     %----------------------------------------------------%
c
      include   'debug.h'
      include   'stat.h'
c
c     %------------------%
c     | Scalar Arguments |
c     %------------------%
c
      integer    n, nconv
      Double precision
     &           tol
c
c     %-----------------%
c     | Array Arguments |
c     %-----------------%
c
      Double precision
     &           ritz(n), bounds(n)
c
c     %---------------%
c     | Local Scalars |
c     %---------------%
c
      integer    i
      Double precision
     &           temp, eps23
c
c     %-------------------%
c     | External routines |
c     %-------------------%
c
      Double precision
     &           dlamch
      external   dlamch
c
c     %---------------------%
c     | Intrinsic Functions |
c     %---------------------%
c
      intrinsic    abs
c
c     %-----------------------%
c     | Executable Statements |
c     %-----------------------%
c
      call arscnd (t0)
c
      eps23 = dlamch('Epsilon-Machine')
      eps23 = eps23**(2.0D+0 / 3.0D+0)
c
      nconv  = 0
      do 10 i = 1, n
c
c        %-----------------------------------------------------%
c        | The i-th Ritz value is considered "converged"       |
c        | when: bounds(i) .le. TOL*max(eps23, abs(ritz(i)))   |
c        %-----------------------------------------------------%
c
         temp = max( eps23, abs(ritz(i)) )
         if ( bounds(i) .le. tol*temp ) then
            nconv = nconv + 1
         end if
c
   10 continue
c
      call arscnd (t1)
      tsconv = tsconv + (t1 - t0)
c
      return
c
c     %---------------%
c     | End of dsconv |
c     %---------------%
c
      end
In Julia, this routine is basically a two-liner (could likely do in one) that computes:
eps23 = eps(1.0)^(2/3)
return sum( ((ritzi, boundsi),) -> boundsi <= max(eps23, abs(ritzi))*tol ? 1 : 0,
            zip(ritz, bounds) )
(Full disclosure, I didn’t test this, so there could be a subtle error, but I hope the idea is clear that this COULD be done very easily.)
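For the record, here’s a variant of the same idea written with count that I would expect to match the Fortran loop; consider it a sketch (the name dsconv_simple is mine, not ARPACK’s):

# count the Ritz values whose error bound is below the relative
# tolerance, where eps23 = machine-epsilon^(2/3) as in ARPACK
function dsconv_simple(n::Int, ritz::Vector{Float64},
                       bounds::Vector{Float64}, tol::Float64)
  eps23 = eps(1.0)^(2/3)
  return count(i -> bounds[i] <= tol*max(eps23, abs(ritz[i])), 1:n)
end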
On the other hand, the idea was to port this as directly as possible, so let’s write out the loops, etc.
function dsconv(
  n::Int,
  ritz::Vector{Float64},
  bounds::Vector{Float64},
  tol::Float64
)
  eps23::Float64 = eps(1.0)^(2.0/3.0)
  nconv::Int = 0
  if n > length(ritz)
    throw(ArgumentError("range 1:n=$n out of bounds for ritz, of length $(length(ritz))"))
  end
  if n > length(bounds)
    throw(ArgumentError("range 1:n=$n out of bounds for bounds, of length $(length(bounds))"))
  end
  @inbounds for i=1:n
    tmp = max( eps23, abs(ritz[i]) )
    if ( bounds[i] <= tol*tmp )
      nconv += 1
    end
  end
  return nconv
end
Here, we have borrowed some of the out-of-bounds reporting from blas.jl in order to try to be somewhat standard.
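A quick sanity check: an estimate with a zero error bound always counts as converged, so every entry converges here.

julia> dsconv(4, ones(4), zeros(4), 1e-8)
4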
Of course, the issue is that the ARPACK code also includes some cool timing information
c
c     %----------------------------------------------------%
c     | Include files for debugging and timing information |
c     %----------------------------------------------------%
c
      include 'debug.h'
      include 'stat.h'

      call arscnd (t0)
c     ... many lines ...
      call arscnd (t1)
      tsconv = tsconv + (t1 - t0)
The function arscnd is the ARPACK second function, which gives the current time in seconds. So this simply tracks the seconds used in this function.

To store the information, this code uses a Fortran common block, which looks like a set of global variables, and updates them based on the time used in this routine. In the port, I’d like to have this functionality as well. However, while the FORTRAN common block could be replaced with a set of global variables, I wanted to use a more modern construct enabled by Julia’s macros that will give thread safety and also compiler-optimized code removal when not needed.
Here’s a better design: take in a keyword argument stats that is a Julia structure.
Base.@kwdef mutable struct ArpackStats  # mutable so the timers can be updated in place
  tconv::Float64 = 0.0 # the time spent checking convergence
  # ... there will be more!
end
We use Base.@kwdef to auto-initialize all the times to zero so we don’t have to write a constructor and can just call ArpackStats(). (Check out the help on Base.@kwdef for more!)
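For instance (assuming the mutable struct above):

stats = ArpackStats()  # all timers start at zero
stats.tconv += 0.25    # fields can be accumulated in place
@show stats.tconv      # prints stats.tconv = 0.25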
Then we can simply use the following construct to update time:
function dsconv(
  n::Int,
  ritz::Vector{Float64},
  bounds::Vector{Float64},
  tol::Float64;
  stats=nothing
)
  t0 = stats != nothing ? time() : 0.0 ## NEW
  eps23::Float64 = eps(1.0)^(2.0/3.0)
  nconv::Int = 0
  if n > length(ritz)
    throw(ArgumentError("range 1:n=$n out of bounds for ritz, of length $(length(ritz))"))
  end
  if n > length(bounds)
    throw(ArgumentError("range 1:n=$n out of bounds for bounds, of length $(length(bounds))"))
  end
  @inbounds for i=1:n
    tmp = max( eps23, abs(ritz[i]) )
    if ( bounds[i] <= tol*tmp )
      nconv += 1
    end
  end
  if stats != nothing; stats.tconv += time() - t0; end ## NEW
  return nconv
end
Of course, these if statements get a bit tedious. So let’s use Julia’s macros to handle this more seamlessly and get what I want to write.
""" Convergence testing for the symmetric Arnoldi eigenvalue routine. In Julia, this can be replaced with sum( ) Julia port of ARPACK dsconv routine. Usage: call dsconv ( N, RITZ, BOUNDS, TOL; debug, stats ) Arguments N Integer. (INPUT) Number of Ritz values to check for convergence. RITZ Double precision array of length N. (INPUT) The Ritz values to be checked for convergence. BOUNDS Double precision array of length N. (INPUT) Ritz estimates associated with the Ritz values in RITZ. TOL Double precision scalar. (INPUT) Desired relative accuracy for a Ritz value to be considered "converged". Optional values (they default to nothing) debug (INPUT/OUTPUT) nothing or an ArpackStats struct stats (INPUT/OUTPUT) nothing or an ArpackStats struct These are used to collect debugging and performance stats and are mostly internal. Return value NCONV Integer scalar. (RETURN) Number of "converged" Ritz values. """ function dsconv( n::Int, ritz::Vector{Float64}, bounds::Vector{Float64}, tol::Float64; stats=nothing, debug=nothing ) #= c\Author c Danny Sorensen Phuong Vu c Richard Lehoucq CRPC / Rice University c Dept. of Computational & Houston, Texas c Applied Mathematics c Rice University c Houston, Texas Julia port by David F. Gleich, Purdue University =# #= c | Executable Statements | =# t0 = @jl_arpack_time if n > length(ritz) throw(ArgumentError("range 1:n=$n out of bounds for ritz, of length $(length(ritz))")) end if n > length(bounds) throw(ArgumentError("range 1:n=$n out of bounds for bounds, of length $(length(bounds))")) end eps23::Float64 = eps(1.0)^(2.0/3.0) nconv::Int = 0 @inbounds for i=1:n #= c | The i-th Ritz value is considered "converged" | c | when: bounds(i) .le. TOL*max(eps23, abs(ritz(i))) | =# tmp = max( eps23, abs(ritz[i]) ) if ( bounds[i] <= tol*tmp ) nconv += 1 end end # call arscnd (t1) # tsconv = tsconv + (t1 - t0) @jl_update_time(tconv, t0) return nconv end
Here, we can use the following macros to accomplish this.
macro jl_arpack_time()
  return esc(:( stats == nothing ? zero(ArpackTime) : time() ))
end

macro jl_update_time(field, t0)
  return esc( quote
    if stats != nothing
      stats.$field += time() - $t0
    end
  end )
end
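To double-check what these expand to, @macroexpand is handy; the result is roughly the following (modulo module qualifiers and gensyms):

julia> @macroexpand @jl_update_time(tconv, t0)
quote
    if stats != nothing
        stats.tconv += time() - t0
    end
end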
And thus, we have our actual strategy. Note that some of the little code snippets here were taken from development versions and some of the variable names may have changed (e.g. tconv vs. tsconv), because there is no reason to track symmetric vs. nonsymmetric time separately. (I mean, I guess we could, but that seems unlike Julia language stuff.) Could be revisited in the future!

At this point, we also put a bunch of stuff on Github to keep the project coming. See the ArpackInJulia.jl Github repo.
-
A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 2 - Overview
See Overview for a high-level picture.
Here is a quick call graph of the various ARPACK functions for the symmetric eigenvalue/eigenvector problem
dsaupd
  d = double precision
  s = symmetric
  aupd = Arnoldi Update (since we'll call this in a loop to update the internal Arnoldi information.)
Note to the pedantic. Yes, I know the Arnoldi of a symmetric matrix is usually called the Lanczos process. So I should be saying Lanczos everywhere here. Except, maybe I really shouldn’t be saying either and we could use the phrase: iteratively orthogonalized power basis instead of Arnoldi or three-term recurrent power basis instead of Lanczos? Either way, since the package is called ARPACK and I’m writing about that, I’ll stick with Arnoldi here.
Back to the call graph.
dsaupd calls
  dsaup2 calls
    dgetv0 calls printing, reverse communication
    dsaitr calls printing, blas, lapack
    dsapps calls printing, blas, lapack
    dsconv
    dseigt
    dsgets
    dsortr

ivout - write out an integer vector
arscnd - internal wrapper avoiding LAPACK second function
dvout - write out a double vector
Then we also need to handle getting the eigenvector information out of the computation.
dseupd calls
  dsesrt
  dsortr - same as above
This suggests a plan.
Port the low-level/internal routines first. Then port the high-level routines.
Wait, what, you say? Why not do it the opposite way? Well, I started doing it that way and realized it’s just harder ;). So we are going to switch and do the easy stuff first!
Just as a quick overview of what the code actually does.
dsaupd - high level interface
dsaup2 - medium level interface, manages the overall process
dgetv0 - gets the starting vector, or returns control to user to get it.
dsaitr - haven't looked yet...
dsapps - haven't looked yet...
dsconv - check what has converged (easiest routine!)
dseigt - haven't looked yet...
dsgets - get shifts based on eigenvalue estimates
dsortr - sorts a vector and/or applies the sort to an aux vector too
Also, let me note that the ARPACK documentation is stellar. Stellar. So a few clicks and I could easily figure out what those other functions are doing.
-
A small project to do a pure port of ARPACK Symmetric solver to Julia - Part 1 - Goals
I’m working on a small “me-time” project to port the ARPACK symmetric solver to Julia.
My goal is to port the double-precision ARPACK routines for symmetric matrices to Julia, including all the ARPACK stuff. So this should give “exactly” what ARPACK does but be a pure Julia implementation. (Where “exactly” means it should execute roughly the same sequence of floating point operations and can differ on levels that would be expected for different compilers compiling the same code.)
- not a goal to “Julia-ize” the package; I want to keep as close to the FORTRAN as possible so that I might be able to replace calls to Julia’s Arpack.saupd / Arpack.seupd (which call the Fortran library) with this code.
- small internal function changes are okay, e.g. ARPACK has various debugging and timing stuff that would need to be done differently in Julia.
- small simplifications, e.g. if a function computes a single Int, we can rewrite that to return the Int rather than writing it into an array like in FORTRAN.
- Why? Why not implement my own ImplicitRestart/Eigensolver/Etc.? Simple: I trust ARPACK. Also, I want to understand exactly what the symmetric ARPACK solver is doing.
- Why not use a Fortran->Julia compiler? Well, I could. But I could also do this and learn intimate details of how it works to help out in teaching :)
- I want to document some of the cool stuff in ARPACK!
For context, I spent about a semester studying these solvers with Gene Golub to see if there were obvious ideas to make computing the Fiedler vectors faster, but beyond the high level algorithmics and lots of experiments, nothing came of it. This project is different in that the end goal is easy – port the code, with a focus on explaining in detail what each step does. (hah, how about that name-dropping?)
Since it’s good to have a driving use-case, the idea is to make the following MatrixNetworks.jl code work without the ARPACK dependency.
""" `_symeigs_smallest_arpack` --- Compute eigenvalues and vectors using direct calls to the ARPACK wrappers to get type-stability. This function works for symmetric matrices. This function works on Float32 and Float64-valued sparse matrices. It returns the smallest set of eigenvalues and vectors. It only works on matrices with more than 21 rows and columns. (Use a dense eigensolver for smaller problems.) Functions --------- - `(evals,evecs) = _symeigs_smallest_arpack(A::SparseMatrixCSC{V,Int}, nev::Int,tol::V,maxiter::Int, v0::Vector{V})` Inputs ------ - `A`: the sparse matrix, must be symmetric - `nev`: the number of eigenvectors requested - `tol`: the relative tolerance of the eigenvalue computation - `maxiter`: the maximum number of restarts - `v0`: the initial vector for the Lanczos process Example ------- This is an internal function. """ function _symeigs_smallest_arpack( A::SparseMatrixCSC{V,Int},nev::Int,tol::V,maxiter::Int, v0::Vector{V}) where V n::Int = checksquare(A) # get the size @assert n >= 21 # setup options mode = 1 sym = true iscmplx = false bmat = String("I") ncv = min(max(2*nev,20),n-1) whichstr = String("SA") ritzvec = true sigma = 0. #TOL = Array{V}(undef,1) #TOL[1] = tol TOL = Ref(tol) lworkl = ncv*(ncv + 8) v = Array{V}(undef, n, ncv) workd = Array{V}(undef, 3*n) workl = Array{V}(undef, lworkl) resid = Array{V}(undef, n) resid[:] = v0[:] info = zeros(BlasInt, 1) info[1] = 1 iparam = zeros(BlasInt, 11) ipntr = zeros(BlasInt, 11) ido = zeros(BlasInt, 1) iparam[1] = BlasInt(1) iparam[3] = BlasInt(maxiter) iparam[7] = BlasInt(mode) # this is a helpful indexing vector zernm1 = 0:(n-1) while true # This is the reverse communication step that ARPACK does # we need to extract the desired vector and multiply it by A # unless the code says to stop Arpack.saupd( ido, bmat, n, whichstr, nev, TOL, resid, ncv, v, n, iparam, ipntr, workd, workl, lworkl, info) load_idx = ipntr[1] .+ zernm1 store_idx = ipntr[2] .+ zernm1 x = workd[load_idx] if ido[1] == 1 workd[store_idx] = A*x elseif ido[1] == 99 break else error("unexpected ARPACK behavior") end end # Once the process terminates, we need to extract the # eigenvectors. # calls to eupd howmny = String("A") select = Array{BlasInt}(undef, ncv) d = Array{V}(undef, nev) sigmar = ones(V,1)*sigma ldv = n Arpack.seupd(ritzvec, howmny, select, d, v, ldv, sigmar, bmat, n, whichstr, nev, TOL, resid, ncv, v, ldv, iparam, ipntr, workd, workl, lworkl, info) if info[1] != 0 error("unexpected ARPACK exception") end # Now we want to return them in sorted order (smallest first) p = sortperm(d) d = d[p] vectors = v[1:n,p] return (d,vectors,iparam[5]) end
-
Anyone else find the documentation in old Fortran codes is amazingly beautiful? e.g. look at the documentation in ARPACK. github.com/opencolla… Wonderfully annotated code.
-
Julia (@JuliaComputing) and VSCode on Native M1.
Install M1 Native VSCode.
Install Xcode & Command Line tools.
Clone Julia master.
make
./julia
ENV["JULIA_GR_PROVIDER"] = "GR"
]add Plots
using Plots
plot(randn(10))
Point VSCode at new julia and 🍾🤩🥳
-
My thoughts on getting a COVID vaccine today.
I’m of mixed feelings about this because, while vaccine!, strictly speaking, the eligibility of university faculty (as I understand it) is unclear. I’m also not particularly old, nor high-risk.
Meijer (and Kroger, and others) are vaccinating university faculty as educators. (Which I definitely am!) This fact has been posted on Twitter by others, and announced on a university mailing list. While I’d like to think of myself as an as-scrupulous-as-reasonable rule-follower, this is a murky area.
I never lied to get this. Also, remember that any type of medical/healthcare employer was vaccinating nearly everyone when vaccines were really scarce, even those capable of working at home or not doing patient-facing care. Here’s part of my rationalization.
- The US has a somewhat inefficient allocation of vaccine supply to where it is needed. I booked this appointment with <= 24 not-sleeping hours notice. I cannot believe this keeps anyone who ought to be eligible from getting a vaccine. (The pharmacy still had same-day availability when I got my vaccine.)
- I have the capacity/flexibility/ability to exchange my time/energy to get a vaccine when/where others can’t. Should I not use this to help address the inefficient allocation above?
- I have a responsibility to help keep the daycare we participate in safe. I struggled with how I would feel if I were to get COVID at this point (knowing that I passed up a vaccine opportunity) and pass it on to anyone at the daycare.
I’m posting this note for a few reasons.
We need to remind people that vaccines help everyone! There is some social signaling and sociology here: even if you don’t need it, it will help others by curtailing possible spread, and others may need some encouragement to get it.
By trying to understand the appointment landscape, I helped a number of people get the vaccine weeks earlier than they were originally scheduled.
If you, for any reason, want to get the vaccine, realize that there is capacity in the system in some places.
And use this information to help others too.
If people want to know my thoughts on the current state of getting a vaccine in Indiana, I’m happy to share more from looking at appointment availability for an elementary teacher I work with. My sense is that appointments are reasonably highly available with <= 24 hours notice if you are willing to travel <= 100 miles.