atomicules

Push propelled program tinkerer and picture maker.

Making use of Haskerdeux

I honestly thought that after I got Haskerdeux working again I'd not be doing much else to improve it, but lo and behold the whole "fix your own frustrations" thing kicked in and I ended up improving it so I could use it to work on todos of any date. Why? Because I wanted to be able to easily script adding a whole bunch of todos on various dates in the future.

I actually ended up doing some Ruby to (technically) Haskell scripting as follows:

require 'date' # for DateTime.parse (the helpscout gem probably pulls this in anyway)
require 'helpscout'
helpscout = HelpScout::Client.new("<api key>")

# Read in an array of "conversations" I had from a previous action
# I.e. a text file with one of something like this per line: https://api.helpscout.net/v1/conversations/326542115.json
conversations = File.readlines("conversations.txt").map(&:strip)

conversations.each do |conv|
    # Slice the conversation id out of the URL
    ticket = helpscout.conversation(conv[43..-6])
    # DateTime.parse is clever enough to pick out the date from a string like "something something needs to happen on 2017-03-05 so here's some text"
    todo_date = (DateTime.parse(ticket.preview)-1).to_date.iso8601
    `./haskerdeux #{todo_date} new "Need to do this before tomorrow [#{ticket.number}](#{ticket.url})"`
end

(This Helpscout Ruby gem is really good, by the way).

I.e. I had a list of HelpScout conversations that had been generated previously and I had actions based on these that I needed to do on certain dates. So I iterated through each conversation, parsed the date from the ticket preview text and then added them to my Teuxdeux list via a Ruby shell/system call to Haskerdeux. Simple, hacky, nifty.

NPF so far

I am late to the NPF party, but let's face it: I don't use NetBSD because I want the latest and greatest, I use NetBSD because I want something I can (mostly) make work wherever I put it.

Since setting up a proper desktop machine with NetBSD I decided to use it as a playground for NPF with the ultimate goal of switching to NPF from IPF (IPFilter) where it actually matters to me most; IPF still works for me, but it seems to become more quirky with each NetBSD release.

It was not obvious - at all - to me from the documentation how to get NPF working with both IPv6 and IPv4 (I have both at home, thanks excellent broadband provider!). With IPF there are two separate .conf files that are mostly independent of each other. Anyway, in case anyone else is struggling similarly, I ended up doing the following and couldn't figure out a way to completely avoid duplication:

# Simple npf.conf for Desktop with wired connection and IPv4 and IPv6

$ext4_if = inet4(bge0)
$ext6_if = inet6(bge0)

$services_in_tcp = domain
$services_in_udp = domain

procedure "log" {
    log: npflog0
}

group "external" on $ext4_if {
    pass stateful out final all

    pass stateful in final family inet4 proto tcp to $ext4_if port ssh apply "log"
    pass stateful in final proto tcp to $ext4_if port $services_in_tcp
    pass stateful in final proto udp to $ext4_if port $services_in_udp

    # Passive FTP
    pass stateful in final proto tcp to $ext4_if port 49151-65535
    # Traceroute
    pass stateful in final proto udp to $ext4_if port 33434-33600
}

group "external6" on $ext6_if {
    pass stateful out final all

    pass stateful in final proto tcp to $ext6_if port $services_in_tcp
    pass stateful in final proto udp to $ext6_if port $services_in_udp

    # Passive FTP
    pass stateful in final proto tcp to $ext6_if port 49151-65535
    # Traceroute
    pass stateful in final proto udp to $ext6_if port 33434-33600
}

group default {
    pass final on lo0 all
    block all
}

Attempts to have one group that would work on both IPv6 and IPv4 failed. Maybe it is possible somehow, but I sure as hell couldn't figure it out and the above does work.

There are some good tips at the bottom of the npfctl man page that you can use to test the rules are loaded and working:

  1. Use npfctl reload, not load, otherwise you get a weird error of "npfctl_config_load: no such file or directory".
  2. Then use npfctl start.
  3. Then use npfctl show, which should list the rules.
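
Putting those together as a sequence:

# Load the ruleset from /etc/npf.conf (reload, not load), activate it, then verify
npfctl reload
npfctl start
npfctl show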

Since I seemingly had it working on the desktop I had a whirl on XEN, which requires building your own kernel and then being able to publish the kernel somewhere you can download it from so you can boot it. This is more fiddly than it sounds when your web host is also your XEN NetBSD install. However, after doing all that it turned out NPF is broken on XEN on 7.0.2. I could fight it, but I've decided to just wait until 7.1.

Coming back to the desktop, I was surprised to find that just having npf=YES in /etc/rc.conf wasn't enough to actually load the rules and start npf on boot - or at least that is what I thought. I started playing about with explicitly calling /etc/rc.d/npf reload and /etc/rc.d/npf start in rc.local, but found that produced an error because the interfaces didn't seem to be ready. After a bit more searching I found this: npf startup failure when using dhcpcd inet4 and inet6 - the exact issue I was seeing.
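
For reference, the rc.local attempt was just something like:

# In /etc/rc.local - errored because the interfaces weren't up yet
/etc/rc.d/npf reload
/etc/rc.d/npf start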

Guess I'm waiting for 7.1 on the desktop as well!

Definitely feel fine with not rushing towards NPF.

Gradle and Java SSL debug

I hate Java and I hate Java build systems even more. This is why for the most part I always try to keep things simple and just use javac and java directly where I can. However, invariably if you are going to touch other people's Java you can't avoid things like Gradle and Maven. Gradle isn't too bad I suppose, but I spent hours and hours trying to figure out why I couldn't do this:

gradle run -Djavax.net.debug=SSL,trustmanager

This was with Gradle 3.2.1 and seemingly it should work. I made my default assumption - it must be me doing something wrong - but after finally running:

gradle run -Djavax.net.debug=SSL,trustmanager --debug

and inspecting the output I discovered those properties were not being passed through. However, this also gave me the solution to the problem: copy the command from the --debug output (search for "Starting process 'command"), paste it directly into the terminal, insert -Djavax.net.debug=SSL,trustmanager and run it. Hey presto, another example of build managers just getting in the way. I wonder if gradle has a command like gradle show to show the command that it would run?
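
For what it's worth, I believe the intended route with the standard application plugin is to set the property on the run task in build.gradle so it reaches the forked JVM - a sketch, not something I've verified against 3.2.1:

run {
    // -D on the gradle command line only reaches Gradle's own JVM; the run
    // task forks a new JVM, so set the system property on the task instead
    systemProperty 'javax.net.debug', 'SSL,trustmanager'
}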

Haskerdeux Neux

I always said that had I had a smartphone I would have stuck with Teuxdeux all along. There is something about it that just sits right with me, hence all my efforts to emulate it with Taskwarrior.

So it was no surprise I switched back when the opportunity finally arose. But I'm still keen to be able to accomplish as much as I can via the Terminal on NetBSD - I'll accept platform agnosticism over owning every byte of my own data. In the meantime (the wilderness years) I'd stumbled across dmi3/teuxdeux, which was really interesting to me as I'd spent a huge amount of time playing with curl trying to figure out the neux API when it was first (not actually) released and could never get it to work. Thanks to dmi3/teuxdeux I realised it must actually be possible to work with the new non-API.

The first mistake

After playing much more I finally realised that the whole (well, 98%) reason I couldn't get things to work was that I'd been misspelling "authenticity". I checked: four years ago in my curl calls I was spelling it "authencity". Doh! I'm pretty sure I would have persevered had I realised.

The second mistake

What the hell, things still didn't work?! After much more confusion I then realised the PUT request had "vi" and not "v1" in the URL. Typos have screwed me up a lot, but this takes the biscuit.

The third mistake

I'll accept this one. The next reason I failed to get this to work was that I'd not used the "-L" parameter (follow location) on curl when logging on. That was significant and I only ended up trying it after stumbling across this old blog post, Using curl with a web site secured by Rails Authenticity Token.

The last mistake

Since in my attempts to get this to work I'd long since dropped back from Haskell to straight use of curl, the last mistake I'd made was a misunderstanding of the curl man page. I thought you could just use --cookie-jar for both writing and reading cookies, but you can't: you need to use it in conjunction with --cookie for reading and sending the cookies; --cookie-jar is just for saving them.
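
Put together, the command line flow ends up something like this sketch (the URLs are illustrative rather than the actual Teuxdeux routes, and the authenticity token dance is omitted):

# Log on, following redirects (-L), credentials from netrc, cookies saved to the jar
curl -L --netrc --cookie-jar cookies.txt https://teuxdeux.com/login
# Subsequent requests send the saved cookies back and keep the jar updated
curl --cookie cookies.txt --cookie-jar cookies.txt https://teuxdeux.com/api/v1/todos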

Mistakes I realised I'd made last time

After finally getting to the point where I could log on and make simple API calls on the command line, I started changing the Haskell app to suit, still using Network.Curl - even though it is very out of date - because it still (seemed to) work and the state of the other options seemed as incomplete and confusing as it had a few years ago. I also realised I'd done a lot of bad things originally, like recreating the curl object anew in each function, so I improved that and passed a single curl object around instead.

I had things "working" again with Network.Curl, except that I couldn't figure out what it was doing with cookies. As far as I could tell it seemed to be handling them in memory: the CurlCookie options worked and were necessary, but it didn't actually seem to be saving the cookies to a file anywhere. I ended up having to re-logon every time.
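
To illustrate, a minimal sketch of the cookie options in question - file path and URL illustrative, and this is roughly what Haskerdeux was doing rather than a verbatim excerpt:

import Network.Curl

main :: IO ()
main = withCurlDo $ do
  curl <- initialize
  -- CurlCookieFile reads/sends cookies; CurlCookieJar should save them, but didn't seem to
  setopts curl [CurlCookieFile "cookies.txt", CurlCookieJar "cookies.txt"]
  resp <- do_curl_ curl "https://teuxdeux.com" method_GET :: IO CurlResponse
  putStrLn (respStatusLine resp)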

Not a mistake, but a change

Even after that workaround, though, I ran into a roadblock with PUT requests. They seemingly should have worked via Network.Curl, but I eventually had to conclude they didn't.

I admitted defeat and went for straight system calls instead, since I knew the command line worked. It doesn't make it spectacularly Haskelly, but it does mean it works.
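
By "straight system calls" I mean shelling out to the curl binary, something like this sketch (flags and URL illustrative):

import System.Process (readProcess)

main :: IO ()
main = do
  -- Run curl as an external process and capture its stdout
  out <- readProcess "curl" ["-s", "--cookie", "cookies.txt", "--cookie-jar", "cookies.txt", "https://teuxdeux.com"] ""
  putStrLn out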

Same, but different

So after a few months I'm finally back to the exact same functionality I had a few years ago. The only real differences are:

  • Uses boring system calls instead of a nice Haskelly way (but it is all the same deep under the surface, right?) and so doesn't have as nice a way of checking return values as it did with the 200 status codes from Network.Curl.
  • You (and by that I mean "I") can only logon with netrc (which suits me).
  • It doesn't pass credentials all the time (because of the change in the API), which is a lot better, but it might mean that when the cookies eventually expire you'd (I'd) have to reset the login by removing the cached auth token.
  • I can't be bothered to make it better (I had faint hope the first time, but no, it does what I want it to do).

Anyway, it's back alive again here: Haskerdeux.

There is no flashy fancy code to show off here, I just made it work again - that was achievement enough. The ironic thing about officially working in the Software/Technology industry is that I now have much less time for writing code than I did when I was "officially" a Mechanical Engineer.

cabal-install notes for NetBSD

Just a quick little post so that in another three years' time I'll be able to figure out how to do this again a bit more quickly.

If you are trying to use cabal with headers and libraries on non-standard paths (e.g. pkgsrc) then you need to do:

cabal install curl --extra-include-dirs=/usr/pkg/include/ --extra-lib-dirs=/usr/pkg/lib --configure-option=CPPFLAGS=-I/usr/pkg/include/ --configure-option=LDFLAGS=-L/usr/pkg/lib

The important bits are the --configure-option flags, as the --extra ones are documented and thus a bit more obvious.

Writing crap Elixir code from the point of view of someone who writes crap Erlang code

Woo, quite a long title, but I think that effectively explains what this is going to be about. There are already lots of good posts on Elixir from proper Erlang programmers. This isn't meant to be another one of them. It's just a collection of things I've noticed.

It is pretty much the same

Superficially there is no real difference (or benefit to be gained) between Erlang and Elixir code. Here's some Erlang code:

pad_to(Length, Binary_string) when length(Binary_string) < Length ->
    Padded_binary_string = "0"++Binary_string,
    pad_to(Length, Padded_binary_string);
pad_to(Length, Binary_string) when length(Binary_string) == Length ->
    Binary_string.

And the equivalent in Elixir:

def pad_to(length, binary_string) when length(binary_string) < length do
  padded_binary_string = '0'++binary_string
  pad_to(length, padded_binary_string)
end
def pad_to(length, binary_string) when length(binary_string) == length do
  binary_string
end

Apart from the wrapping of the code in def and end it looks pretty much the same.

You have to still use Erlang

Quite a bit. I was surprised how quickly I came across the need to do this: not all Erlang things have Elixir equivalents, so you have to use Erlang code. However, at least this is seamless and painless.

Original Erlang:

Binary_string = hd(io_lib:format("~.2B", [Binary_number])),

Elixir port:

binary_string = hd(:io_lib.format("~.2B", [binary_number]))

defp is really neat

In Erlang you export only the functions you want:

-module(polyline).
-export([six_bit_chunks/1]).

six_bit_chunks(Encoded_polyline) ->
    six_bit_chunks_(Encoded_polyline, []).
six_bit_chunks_([Head | Rest], Chunks_list) ->
    Six_bit_chunk = six_bit_chunk(Head),
    %Add to Reversed_chunks
    six_bit_chunks_(Rest, [Six_bit_chunk]++Chunks_list);
six_bit_chunks_([], Chunks_list) ->
    lists:reverse(Chunks_list).

In Elixir everything that uses def is exported automatically. If you don't want something exported you use defp. This is really neat.

def six_bit_chunks(encoded_polyline) do
  six_bit_chunks_(encoded_polyline, [])
end
defp six_bit_chunks_([head | rest], chunks_list) do
  six_bit_chunk = six_bit_chunk(head)
  #Add to Reversed_chunks
  six_bit_chunks_(rest, [six_bit_chunk]++chunks_list)
end
defp six_bit_chunks_([], chunks_list) do
  Enum.reverse(chunks_list)
end

Indices are different

I don't know why, but indices are different. I don't think there is a clever reason. It just is.

In Erlang lists:sublist("words", 2, 3) returns "ord", whereas in Elixir Enum.slice('words', 2, 3) returns 'rds'. That's something else to be aware of: the difference between single and double quotes. Single quotes are charlists as per Erlang, whereas double quotes are Elixir strings (something different entirely).
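
A quick illustration in iex (String.slice being the counterpart for double-quoted strings, since binaries aren't enumerable):

iex> Enum.slice('words', 2, 3)
'rds'
iex> String.slice("words", 2, 3)
"rds"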

Flipping order of args... and pipes.

In Elixir some functions have the order of the arguments flipped from how they are in Erlang. This threw me to start with and I thought it was to make Ruby programmers happy, but actually it's because of pipes.

In Erlang:

Eight_bit_chunks = lists:map(
    fun(Group_of_chunks) ->
        eight_bit_chunks(Group_of_chunks)
    end,
Five_bit_chunks).

In Elixir the list, etc. is always first:

eight_bit_chunks = Enum.map(
  five_bit_chunks,
  fn(group_of_chunks) ->
    eight_bit_chunks(group_of_chunks)
  end)

This means you can then do things like this:

five_bit_chunks
|>  Enum.map(
      fn(group_of_chunks) ->
        eight_bit_chunks(group_of_chunks)
      end)

I.e. pull five_bit_chunks out and pipe it into the Enum.map. That doesn't look that impressive on its own, but it means you can chain things together. Pipes are neat. My only quibble is that they are a step towards Haskell's esoteric symbols (I liked Erlang over Haskell just because it was a little easier to understand).
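
For example, chaining steps together - a sketch reusing the names from above, where &eight_bit_chunks/1 is just the capture shorthand for the anonymous function and the final List.flatten is purely illustrative:

five_bit_chunks
|> Enum.map(&eight_bit_chunks/1)
|> List.flatten()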

Phoenix

I am mid-way through porting my Erlang command line application to Elixir. My plan is to then use Phoenix to turn this into a web application so other people (ha!) can use it. I also have plans to improve it. Slow, long term plans. I mention this only to say that you can write Ruby without going near Rails and the same is true of Elixir and Phoenix - and both are things you should do.

Six Degrees of Wikipedia

This may or may not have come up recently due to my circumstances; I don't think me blogging about my efforts is going to significantly improve anyone else's chances when it comes to the moment.

The idea in principle is dead simple: given a page on Wikipedia, how many steps/links/pages does it take to get to a given second page?

We don't care how we get there - there is no need to record the route - just count how many steps. The only other rule is you have to go from A -> B; you can't start at both ends and converge somewhere in the middle, i.e. A -> C <- B.

I really thought I'd have this nailed. I've done so much with Ruby and Mechanize in the past and this was an ideal fit. But I found this quite a humbling experience. For whatever reason (it could perhaps have been the cat relentlessly pawing at the door) I lost my cool in the last thirty minutes of the ninety minutes of allotted time, shut down a bit, stopped thinking/communicating out loud, and just couldn't get my brain to work: I knew what I was trying to do, but couldn't even communicate that to myself, let alone anyone else.

Here's exactly what I had when time was up:

#Want take url
#Destination url
#Both urls from wikipedia
#Shortest route from one to the next
#https://en.wikipedia.org/wiki/Ruby
#

@queue = {}

def findlinksbetween(url1, url2)
    #For now assume that url2 is just the /wiki/whatever bit <- Need to fix that
    steps = 0
    found = false
    A = Mechanize.new
    page = A.get(url1)
    until found
        links = page.links_with(:href => /\/wiki\// )
        found, steps = checkfor(page, url2, steps, found)
        #Get next link to try.

        #Next link in list, but don't want to wipe out top list, etc
        page = A.get(link.href)
        #Then follow those links and try all the links on that page
        #Then try each in turn
    end
    put steps
end


def checkfor(page, url, steps, found)
    steps+=1 #this isn't right
    links = page.links_with(:href => /\/wiki\// )
    links.each do |link|
        #Need to stop if we find the url
        if link == url and not found
            #then we want to stop
            found = true
            break
        end
    end
    [found, steps]
end


    #Could search through links 
    #Need to keep a cache of links checked
    #follow justthe 
    #Probably best way is to get links from just bodyContent. Might need Nokogiri for that. <- not then sure how to get the links in a nice format. Back through mechanize?

Not pretty, is it? It doesn't work at all. It's more like a collection of random thoughts. Like I said: a humbling experience. But how it nagged me. I couldn't leave it like that when I knew I had the right idea somewhere in my brain.

This is what I had a few hours later:

#Find shortest route between urls on Wikipedia
#A depth first search is easier, but slower/doesn't find shortest route

require 'mechanize'

@previous_level = []
@current_level = []
@checked = {}
@a = Mechanize.new


def findlinksbetween(url1, url2)
    #For now assume that url2 is just the /wiki/whatever bit <- Need to fix that
    step = 0
    found = false
    page = @a.get(url1)
    #Initialise level
    addlinks(page, step)
    loop do #Horrors
        #Check current level links for a match
        check(url2)
        #Not found so carry on
        @previous_level = @current_level
        @current_level = []
        step += 1
        @previous_level.each do |item|
            unless @checked[item[0]]
                begin
                    page = @a.get(item[0])
                    puts "Level: #{step}, #{@previous_level.length} urls: fetching page #{item[0]}\n"
                    addlinks(page, step)
                    @checked[item[0]] = true
                    #TODO: Only problem by collating all the links for a level before checking is it takes a long time
                rescue
                    #Could have been an image, etc
                end
            end
        end
        #but what happens
    end
end


def addlinks(page, step)
    #TODO: Probably best way is to get links from just bodyContent. Might need Nokogiri for that. <- not then sure how to get the links in a nice format. Back through mechanize? Until then just get all /wiki/ links
    links = page.links_with(:href => /^\/wiki\// )
    links.each do |link|
        #Further filter out nonsense, colons should get most of the irrelevant stuff
        unless link.href =~ /:/ or link.href =~ /Main_Page/
            @current_level << [link.href, step]
        end
    end
    #Might have duplicates at current level, but that is ok as won't visit it twice
end


def check(url)
    @current_level.each do |item|
        if item[0] == url
            puts "Found #{url} after #{item[1]} step(s)\n"
            exit
        end
    end
end

Which worked in principle, but was extremely slow and memory inefficient. I had to kill it before I got a result as it was using too much of the 1GB total I had on Linode.

You can see (maybe) what I was trying to do with my initial attempt, i.e.:

  1. Keeping track of the previous level's urls
  2. Keeping track of the current level's urls
  3. Keeping track of urls already checked so time isn't wasted checking them again

The current level becomes the next previous level when we reach the end, etc.

You can also see bits that I knew were a problem and that were inefficient, but that I didn't really care about because I was sure I could figure them out later if need be (I.e. I didn't see them as the main problem):

  • Filtering out irrelevant links, i.e. all the header and footer links on a Wikipedia page which aren't anything to do with the article itself.
  • Checking all the links at the end of a step - slower than checking on the fly.

I decided to use Redis as a more memory-efficient alternative to Ruby arrays for my final attempt of the day (by that point it was purely an effort for my own self-esteem). Redis is really awesome, by the way:

#Find shortest route between urls on Wikipedia
#A depth first search is easier, but slower/doesn't find shortest route

require 'mechanize'
require 'redis'

#TODO: Maybe should make previous_level a redis database as well, could flip/flop between current/previous
@previous_level = []
@redis = Redis.new
#current level in 0, checked in 1
@a = Mechanize.new

#Flush just to be safe
@redis.flushall

def findlinksbetween(url1, url2)
    #For now assume that url2 is just the /wiki/whatever bit <- Need to fix that
    step = 0
    found = false
    page = @a.get(url1)
    #Initialise level
    addlinks(page, step)
    loop do #Horrors
        #Check current level links for a match
        check(url2)
        #Not found so carry on
        @redis.select(0)
        @previous_level = @redis.keys
        @redis.flushdb
        step += 1
        @previous_level.each do |item|
            @redis.select(1)
            unless @redis.exists(item)
                begin
                    page = @a.get(item)
                    puts "Level: #{step}, #{@previous_level.length} urls: fetching page #{item}\n"
                    addlinks(page, step)
                    @redis.select(1)
                    @redis.set(item, true)
                    #TODO: Only problem by collating all the links for a level before checking is it takes a long time
                rescue
                    #Could have been an image, etc
                end
            end
        end
        #but what happens
    end
end


def addlinks(page, step)
    @redis.select(0)
    #TODO: Probably best way is to get links from just bodyContent. Might need Nokogiri for that. <- not then sure how to get the links in a nice format. Back through mechanize? Until then just get all /wiki/ links
    links = page.links_with(:href => /^\/wiki\// )
    links.each do |link|
        #Further filter out nonsense, colons should get most of the irrelevant stuff
        unless link.href =~ /:/ or link.href =~ /Main_Page/
            @redis.set(link.href, step)
        end
    end
end


def check(url)
    @redis.select(0)
    if @redis.exists(url)
        puts "Found #{url} after #{@redis.get(url)} step(s)\n"
        exit
    end
end

This was good enough to get it to run reasonably ok on the resources I had, but I knew improvements could still be made. That would have to wait for another day though, as it was bedtime; I hadn't worked on this constantly that evening, I'd had other stuff to do.

Next, I figured out what I'd said to myself in a comment and managed to use Redis's built-in databases for both the previous and current levels, meaning I could easily switch between them. I also made it command line callable to make it a bit easier to use.

#Find shortest route between urls on Wikipedia
#A depth first search is easier, but slower/doesn't find shortest route

require 'optparse'
require 'mechanize'
require 'redis'

optparse = OptionParser.new do |opts|
    opts.on('-p', "--page1 PAGE", "1st Wikipedia page") { |p| Page1 = p }
    opts.on('-P', "--page2 PAGE", "2nd Wikipedia page") { |q| Page2 = q }
end
optparse.parse!

@redis = Redis.new
@current = 0
@previous = 1
Checked = 2
Wikipedia = "https://en.wikipedia.org/wiki/"
@a = Mechanize.new

#Flush just to be safe
@redis.flushall

def findlinksbetween(page1, page2)
    step = 0
    found = false
    page = @a.get(Wikipedia+page1)
    #All other links should resolve relatively
    #Initialise level
    addlinks(page, step)
    loop do #Horrors
        #Check current level links for a match
        check(page2)
        #Not found so carry on
        #Switch previous and current
        @previous ^= 1
        @current ^= 1
        @redis.select(@current)
        @redis.flushdb
        step += 1
        @redis.select(@previous)
        keys = @redis.keys
        #No way not to avoid an array at some point
        keys.each do |item|
            @redis.select(Checked)
            unless @redis.exists(item)
                begin
                    page = @a.get(item)
                    puts "Level: #{step}, #{keys.length} urls: fetching page #{item}\n"
                    addlinks(page, step)
                    @redis.select(Checked)
                    @redis.set(item, true)
                    #TODO: Only problem by collating all the links for a level before checking is it takes a long time
                rescue
                    #Could have been an image, etc
                end
            end
        end
    end
end


def addlinks(page, step)
    @redis.select(@current)
    #TODO: Probably best way is to get links from just bodyContent. Might need Nokogiri for that. <- not then sure how to get the links in a nice format. Back through mechanize? Until then just get all /wiki/ links
    links = page.links_with(:href => /^\/wiki\// )
    links.each do |link|
        #Further filter out nonsense, colons should get most of the irrelevant stuff
        unless link.href =~ /:/ or link.href =~ /Main_Page/
            @redis.set(link.href, step)
        end
    end
end


def check(page)
    @redis.select(@current)
    if @redis.exists("/wiki/"+page)
        puts "Found #{page} after #{@redis.get("/wiki/"+page)} step(s)\n"
        exit
    end
end

if defined?(Page1) and defined?(Page2)
    findlinksbetween(Page1, Page2)
end

The next attempt was to use threading. I realised that, since I would have concurrent threads going on, I couldn't use multiple Redis databases as each thread could override the database selection made in another thread and things would get very messy. I decided to use Redis Sets instead, in just one database. The mechanize bit became a separate function to make threading it easier, i.e. so I could extend this to three instances/threads easily. I also added in better feedback (counting down the urls remaining).

In making the threaded version I discovered the need for better error handling, e.g. checks for images and checks for a successful page load. On a single thread the program never failed to work; however, after adding threads (before I put in the page load check) it would blitz through but not always find the result, i.e. some page loads must have failed but been lost in the begin/rescue. This approach made sure to re-add the page to the list to be checked if it failed to load.

I also realised I had lied to myself when I'd commented "No way not to avoid an array at some point".

#Find shortest route between urls on Wikipedia

#Notes:
# - A depth first search is easier, but slower/doesn't find shortest route
# - Have to be careful threads don't interfere, i.e. select wrong database in other thread. Options:
#   - Redis sets
#   - Different Redis instances
#   - Implement some kind of global lock

require 'optparse'
require 'mechanize'
require 'redis'

optparse = OptionParser.new do |opts|
    opts.on('-p', "--page1 PAGE", "1st Wikipedia page") { |p| Page1 = p }
    opts.on('-P', "--page2 PAGE", "2nd Wikipedia page") { |q| Page2 = q }
end
optparse.parse!

@redis = Redis.new
@current = 0
@previous = 1
Checked = 2
Wikipedia = "https://en.wikipedia.org/wiki/"

#Use two instances to speed things up a bit
@a = Mechanize.new
@b = Mechanize.new
@remaining = 1
$found = false

#Flush just to be safe
@redis.flushall

def findlinksbetween(page1, page2)
    step = 0
    @redis.set("step", step)
    page = @a.get(Wikipedia+page1)
    #All other links should resolve relatively
    #Initialise level
    addlinks(page)
    #TODO: Does this have to be a global var and not a class one to work? I.e. otherwise becomes localised in thread?
    #Nah, i think things are failing and not repeating
    until $found or step == 6
        #Check current level links for a match
        check(page2)
        #TODO: Only problem by collating all the links for a level before checking is it takes a long time
        #Not found so carry on
        #Switch previous and current
        @previous ^= 1
        @current ^= 1
        step += 1
        @redis.set("step", step)
        @remaining = @redis.smembers(@previous).length
        puts "***Starting*** Level: #{step}, #{@remaining} urls"
        t1 = Thread.new { mech(@a) }
        t2 = Thread.new { mech(@b) }
        Thread.new do
            until @remaining == 0
                puts "Level: #{step}, #{@remaining} urls"
                sleep 1
            end
        end
        #Makes sure both have finished though
        t1.join
        t2.join
    end
end


def addlinks(page)
    #TODO: Probably best way is to get links from just bodyContent. Might need Nokogiri for that. <- not then sure how to get the links in a nice format. Back through mechanize? Until then just get all /wiki/ links
    links = page.links_with(:href => /^\/wiki\// )
    links.each do |link|
        #Further filter out nonsense, colons should get most of the irrelevant stuff
        unless link.href =~ /:/ or link.href =~ /Main_Page/
            @redis.sadd(@current, link.href)
        end
    end
end


def check(page)
    if @redis.sismember(@current, "/wiki/"+page)
        puts "Found #{page} after #{@redis.get("step")} step(s)\n"
        $found = true
        exit
    end
end

def mech(m)
    url = ""
    until url == nil
        url = @redis.spop(@previous)
        @remaining = @redis.smembers(@previous).length
        unless @redis.sismember(Checked, url)
            #Be careful, might normally follow relative links fine, but re-added ones seem to throw it off?
            page = m.get("https://en.wikipedia.org#{url}")
            #Skip images
            unless page.response["content-type"].include?("image")
                addlinks(page)
            end
            if page.code == "200"
                @redis.sadd(Checked, url)
            else
                #re add and check later
                @redis.sadd(@previous, url)
            end
        end
    end
    @remaining = 0
end


if defined?(Page1) and defined?(Page2)
    findlinksbetween(Page1, Page2)
end

However, the GIL is still a thing, so threading was not providing any performance benefits, which prompted a fairly trivial switch from threads to processes instead. This represents the final state of the program; obviously further improvements are possible, but I was happy enough with everything in principle.

#Find shortest route between urls on Wikipedia

#Notes:
# - A depth first search is easier, but slower/doesn't find shortest route
# - Have to be careful threads don't interfere, i.e. select wrong database in other thread. Options:
#   - Redis sets
#   - Different Redis instances
#   - Implement some kind of global lock

require 'optparse'
require 'mechanize'
require 'redis'

optparse = OptionParser.new do |opts|
    opts.on('-p', "--page1 PAGE", "1st Wikipedia page") { |p| Page1 = p }
    opts.on('-P', "--page2 PAGE", "2nd Wikipedia page") { |q| Page2 = q }
end
optparse.parse!

@redis = Redis.new
@current = 0
@previous = 1
Checked = 2
Wikipedia = "https://en.wikipedia.org/wiki/"

#Use two instances to speed things up a bit
@a = Mechanize.new
@b = Mechanize.new
@remaining = 1
$found = false

#Flush just to be safe
@redis.flushall

def findlinksbetween(page1, page2)
    step = 0
    @redis.set("step", step)
    page = @a.get(Wikipedia+page1)
    #All other links should resolve relatively
    #Initialise level
    addlinks(page)
    #TODO: Does this have to be a global var and not a class one to work? I.e. otherwise becomes localised in thread?
    #Nah, i think things are failing and not repeating
    until $found or step == 6
        #Check current level links for a match
        check(page2)
        #TODO: Only problem by collating all the links for a level before checking is it takes a long time
        #Not found so carry on
        #Switch previous and current
        @previous ^= 1
        @current ^= 1
        step += 1
        @redis.set("step", step)
        @remaining = @redis.smembers(@previous).length
        puts "***Starting*** Level: #{step}, #{@remaining} urls"
        fork { mech(@a, step) }
        fork { mech(@b, step) }
        #Makes sure both have finished though
        Process.waitall
    end
end


def addlinks(page)
    #TODO: Probably best way is to get links from just bodyContent. Might need Nokogiri for that. <- not then sure how to get the links in a nice format. Back through mechanize? Until then just get all /wiki/ links
    links = page.links_with(:href => /^\/wiki\// )
    links.each do |link|
        #Further filter out nonsense, colons should get most of the irrelevant stuff
        unless link.href =~ /:/ or link.href =~ /Main_Page/
            @redis.sadd(@current, link.href)
        end
    end
end


def check(page)
    if @redis.sismember(@current, "/wiki/"+page)
        puts "Found #{page} after #{@redis.get("step")} step(s)\n"
        $found = true
        exit
    end
end

def mech(m, step)
    url = ""
    until url == nil
        url = @redis.spop(@previous)
        @remaining = @redis.smembers(@previous).length
        puts "Level: #{step}, #{@remaining} urls"
        unless @redis.sismember(Checked, url)
            #Be careful, might normally follow relative links fine, but re-added ones seem to throw it off?
            page = m.get("https://en.wikipedia.org#{url}")
            #Skip images
            unless page.response["content-type"].include?("image")
                addlinks(page)
            end
            if page.code == "200"
                @redis.sadd(Checked, url)
            else
                #re add and check later
                @redis.sadd(@previous, url)
            end
        end
    end
    @remaining = 0
end


if defined?(Page1) and defined?(Page2)
    findlinksbetween(Page1, Page2)
end

I decided to chuck some resources at the processes version and see if I could speed things up. The original task I had been set was to go from the Wikipedia Ruby page to the David Bowie page, but whilst developing this program I checked for a known single step of Ruby to Metastable (the Ruby page links to Diamond, which links to Metastable).

On a two-core system:

                  Real / User / Sys 

No threads:      22.993/15.360/2.923
Two threads:     21.142/15.867/3.780
Two processes:   13.149/14.493/2.553
Three processes: 10.132/13.620/2.423

On a four-core system:

                  Real / User / Sys

No threads:      24.845/16.360/3.197
Two threads:     21.355/16.693/3.833
Two processes:   12.955/16.457/3.370
Four processes:   9.043/20.060/5.027

So the additional processes are definitely helping. Mind you, doing six levels deep would be crazy on a single machine: it turns out David Bowie is only two levels deep, and on a four-core system with four processes this still took 108m6.189/173m5.299/7m24.900.

Simplenote.vim now Python 2 and 3 compatible

Hot on the heels of the changes to the underlying Simplenote.py, Simplenote.vim is now Python 2 and 3 compatible. Ok, that's a lie. The same fate befell me: I assumed it would be tricky and complex to do (not least because I needed to build a separate version of Vim with only Python 3), but in the end it was actually quite easy and I really should have done it sooner than fourteen months later.

Some of the key points this time round:

Using pyfile instead of embedded code

Simplenote.vim makes use of embedded python code identified with endmarkers, like so:

python << ENDPYTHON
    #python code here
ENDPYTHON

but Vim requires you use the appropriate call for the version of python you have and will error if you call a version you don't have. I.e. for python 3 you must use

python3 << ENDPYTHON
    #python3 code here
ENDPYTHON

This potentially was going to lead to huge chunks of duplicated code, as you'd need a setup like the following:

if has("python")
    python << ENDPYTHON
        #python code here
    ENDPYTHON
elseif has("python3")
    python3 << ENDPYTHON
        #Very similar python3 code here
    ENDPYTHON
endif

Fortunately - at least I think so - using these has calls means Vim will load the first available python, but not both, which is what I want. I'm not sure how many people are likely to have Vim compiled with support for both though; it was only recently that I came across the first person who had python 3 support instead of python 2. I guess I'll await the bug reports.

To get around the duplication I used pyfile instead, as follows:

if has("python")
    execute 'pyfile ' . s:scriptpath . "/samefilefor2and3.py"
elseif has("python3")
    execute 'py3file ' . s:scriptpath . "/samefilefor2and3.py"
endif

cmp_to_key

I thought this would be my stumbling block: Python 3 no longer has a cmp argument for sort, it only accepts a much simpler key function, and Simplenote.vim uses quite an extensive compare function for sorting notes in the list index. Fortunately, from version 3.2 onwards there is a cmp_to_key function provided by functools, so I could just do:

notes.sort(key=functools.cmp_to_key(compare_notes))

I think I can be pretty confident that anyone who has Vim compiled against Python 3 only will have a recent version of Python 3.
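
As a self-contained illustration (the compare function here is made up for the example, it is not the one from Simplenote.vim):

import functools

def compare_notes(a, b):
    # Old-style cmp function: negative means a sorts first, positive means b does
    if a["modifydate"] > b["modifydate"]:
        return -1
    if a["modifydate"] < b["modifydate"]:
        return 1
    return 0

notes = [{"modifydate": 2}, {"modifydate": 5}, {"modifydate": 3}]
notes.sort(key=functools.cmp_to_key(compare_notes))
# notes is now newest first: modifydate 5, 3, 2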

Python 3 changes that work fine under Python 2

  • print statements need brackets (parentheses) under Python 3. They are optional under 2.
  • iteritems doesn't exist under Python 3, instead you just call items. For the Simplenote.vim use of iteritems, using just items under Python 2 worked fine.
  • Python 3 requires you use list on map to return the actual content (otherwise you just get a map object). Python 2 doesn't require this, but including it anyway causes no harm.
  • No more has_key, but that has been deprecated for a long time anyway. Just use in instead; it works everywhere.

Really the only excuse for not writing Python 2 and 3 compatible code is laziness.
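
For instance, all of the above in one snippet that runs the same under Python 2 and 3:

d = {"a": 1, "b": 2}

print(list(map(str, d.values())))  # print() with parentheses; list() around map
for key, value in d.items():       # items() instead of iteritems()
    print("%s=%s" % (key, value))
if "a" in d:                       # 'in' instead of has_key()
    print("found a")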

See commits 9a2234d to c916606 for more detail.

LINK: NetBSD Current-Users archive: /var/db/entropy-file not present

For some reason I never noticed this on ye olde laptop (it was running NetBSD 6.1.3, but I don't think that was why), but on this NetBSD 7 machine I noticed this exact error message on boot. Even if I manually created the file using sudo rndctl -S /var/db/entropy-file it would still disappear at some point.

After reading this thread I now understand why and what is meant to happen:

  1. When the system shuts down the entropy-file should be created
  2. When the system next boots the entropy-file is loaded and then deleted

It turns out I'd also been falling foul of shutting down incorrectly: I've always used poweroff. I don't know where I learnt that from, but I never realised I should be using shutdown -p now instead because poweroff doesn't run the shutdown scripts. Whoops.
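
I.e. the habit to get into:

# poweroff skips the rc shutdown scripts (so no entropy-file gets written);
# this runs them properly:
shutdown -p now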

Phoenix Assumes That Our PostgreSQL Database Will Have A Postgres User

"Phoenix assumes that our PostgreSQL database will have a 'postgres' user account with the correct permissions and a password of 'postgres'. If that isn't the case, please see the instructions for the ecto.create mix task."

Alternatively, just edit the .exs files in config/.

On NetBSD the default PostgreSQL user is called pgsql. Rather than add yet another user to PostgreSQL you can just edit the Phoenix config files. I pulled out the username and password details from config/dev.exs, config/prod.secret.exs and config/test.exs and added this to the top of config/config.exs instead (just after use Mix.Config):

# Configures the database
config :hello_phoenix, HelloPhoenix.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "pgsql",
  password: "pgsql"

This, for example, is what remained in config/test.exs:

# Configure your database
config :hello_phoenix, HelloPhoenix.Repo,
  adapter: Ecto.Adapters.Postgres,
  database: "hello_phoenix_test",
  hostname: "localhost",
  pool: Ecto.Adapters.SQL.Sandbox

These are the ten most recent posts; for older posts see the Archive.