Zettelkasten Forum


Quality control: finding orphaned Zettel

This discussion was created from comments split from: Errors and quality control.

Comments

  • These are cool suggestions which, I think, can be part of a suite of Zettelkasten scripts anyone could use!

    I've experimented with the "unconnected" check; I like to call it "find orphans". :)

    Here's a Ruby script for that, though you can get by with ag or ripgrep and a couple of simple checks, I'm sure: https://gist.github.com/DivineDominion/66c8795e0a63026e3a3d830a1f7b550c

    2750 of 5436 notes are orphans (health score: 0.51)

    :grimace:
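If you want to see the approach without opening the gist, here is a minimal Ruby sketch of an orphan check. It is not the script linked above. It assumes filenames start with a 12-digit ID (as in the listings in this thread) and that links are written as [[ID]]. The method names are made up for illustration.

```ruby
# Minimal orphan finder: a note is an orphan when no *other* note links to it.
# Assumptions (not taken from the gist): filenames start with a 12-digit ID,
# e.g. "201811151039 Body space.md", and links are written as [[201811151039]].
require "set"

ID_IN_FILENAME = /\A(\d{12})/
LINK = /\[\[(\d{12})\]\]/

# notes: Hash of filename => file content. Returns orphan filenames, sorted.
def find_orphans(notes)
  ids = {}
  notes.each_key do |file|
    id = file[ID_IN_FILENAME, 1]
    ids[id] = file if id
  end

  linked = Set.new
  notes.each do |file, content|
    own_id = file[ID_IN_FILENAME, 1]
    content.scan(LINK).flatten.each do |target|
      linked << target unless target == own_id # a self-link doesn't count
    end
  end

  ids.reject { |id, _file| linked.include?(id) }.values.sort
end

# Hypothetical helper: read every Markdown note in a directory.
def load_notes(dir)
  Dir.glob(File.join(dir, "*.md")).map { |path| [File.basename(path), File.read(path)] }.to_h
end
```

Usage would be something like `orphans = find_orphans(load_notes("path/to/notes"))`, where the path is a placeholder. You would then print `orphans.size` against the total note count.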

    Author at Zettelkasten.de • https://christiantietze.de/

  • So cool! This makes my day! @ctietze

    Running an "unconnected" check with the Ruby script linked in the message above, I get the following score.

    17 of 1157 notes are orphans (health score: 0.01)

    I feel very lucky indeed.
    Looking at the list of 17 notes, I can tell quickly, in a moment, with only a casual glance (redundant for emphasis), that all these notes are from Nov. 2018 by their UIDs and are my first attempts at making notes. I've learned as I've practiced and gotten better. My note quality has improved in terms of interconnectivity.

    This makes me want to revisit some of these notes and rethink/reprocess them connecting them into my Zettelkasten.

    Here are the full details.

    Wills-Laptop:~ will$ find_zettel_orphans.rb 
    Analyzing 1157 Zettel IDs ...
    <struct Zettel file="201811151039 Body space.md", id="201811151039">
    <struct Zettel file="201811211606 Talking to Zivon.md", id="201811211606">
    <struct Zettel file="201811180805 First Principles.md", id="201811180805">
    <struct Zettel file="201811211743 Vocalizations.md", id="201811211743">
    <struct Zettel file="201811241817 Spacing out exposure to an idea.md", id="201811241817">
    <struct Zettel file="201811211600 Zivon recall work.md", id="201811211600">
    <struct Zettel file="201811151116 Quanta.md", id="201811151116">
    <struct Zettel file="201812290706 The Road Not Taken.md", id="201812290706">
    <struct Zettel file="201902011046 Anagnorisis.md", id="201902011046">
    <struct Zettel file="201812022012 Dominance.md", id="201812022012">
    <struct Zettel file="201811220620 Contradictory ideas - Art and Science.md", id="201811220620">
    <struct Zettel file="201811151029 Finding voice.md", id="201811151029">
    <struct Zettel file="201811151028 Avoid Passivety.md", id="201811151028">
    <struct Zettel file="201811241823 Enumeration.md", id="201811241823">
    <struct Zettel file="201902011050 Peripeteia.md", id="201902011050">
    <struct Zettel file="201811091435 Stop while ahead.md", id="201811091435">
    <struct Zettel file="201811151033 Making the most of life.md", id="201811151033">
    ----------------------
    17 of 1157 notes are orphans (health score: 0.01)
    

    Will Simpson
    I'm a Zettelnant.
    Research: Rationalism, Zen, Non-fiction Creative Writing
    kestrelcreek.com

  • Updated the script to tidy up the output and fix the score computation (0 of 1000 orphans should be a 1.0, not a 0.0)
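The fix presumably means reporting the share of notes that are not orphans. Below is a sketch of that formula. The method name is hypothetical, and this is not necessarily how the gist computes it. It does reproduce the scores posted later in this thread; for example, 177 of 1156 gives 0.85.

```ruby
# Health score = share of notes that are NOT orphans, so 0 orphans => 1.0.
# Method name is hypothetical; the gist may compute it differently.
def health_score(orphan_count, total)
  return 1.0 if total.zero? # an empty archive has nothing orphaned
  (1.0 - orphan_count.to_f / total).round(2)
end

puts health_score(0, 1000)   # prints 1.0
puts health_score(177, 1156) # prints 0.85
puts health_score(265, 1157) # prints 0.77
```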

  • @ctietze looking at the script more closely, I see we are looking for [[UID]] as the indication of whether a zettel is an orphan or not. Once again, my sense of esthetics has come back to bite me. I don't start each note with a level-1 Markdown heading that includes the UID without the linking [[]]. Instead, in the footer of each zettel, I have a ⌱[[202004270559]], which is the link that gets me all the zettel's back-links, but it counts as a link in this script.

    My former score is all a dream. I have 17 zettels without an identifying UID in them at all. I looked at the script and can't see a way to either ignore the ⌱[[202004270559]] or call a zettel with only one [[UID]] an orphan. Can you help?

    Sample note that isn't considered an orphan by the script but is an orphan.

  • @Will I never really programmed anything in Ruby, but I tried to mimic how I implemented my unconnected check in @ctietze's Ruby script. In my case I check for links rather than just strings matching the ID pattern, so I never run into self-references, but this should be easily fixed by filtering your own ID out of the outbound IDs. Haha, completely untested and written by someone not familiar with Ruby, but here are my changes: https://gist.github.com/msteen/a7362b703997a1417c297980390721b1
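The self-reference filtering described above could look like the minimal sketch below. It assumes links are written as [[ID]] with a numeric ID. `outbound_ids` is a hypothetical name, not a method from either gist.

```ruby
# Outbound links of a note, ignoring any link that points back to the note itself
# (such as a back-link footer like "⌱[[202004270559]]").
# Assumption: links are written as [[ID]] with a purely numeric ID.
def outbound_ids(own_id, content)
  content.scan(/\[\[(\d+)\]\]/).flatten.uniq.reject { |id| id == own_id }
end

note = "Body text with a link to [[201811151116]].\n\n⌱[[202004270559]]"
p outbound_ids("202004270559", note) # prints ["201811151116"]
```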

  • @grayen thanks for the attempt. Your skills at ruby programming are as good as mine.
    Here's the output.

    Wills-Laptop:~ will$ find_zettel_orphans_2.rb 
    Traceback (most recent call last):
        2: from /Users/will/.gem/ruby/2.6.0/bin/find_zettel_orphans_2.rb:21:in `<main>'
        1: from /Users/will/.gem/ruby/2.6.0/bin/find_zettel_orphans_2.rb:21:in `each'
    /Users/will/.gem/ruby/2.6.0/bin/find_zettel_orphans_2.rb:23:in `block in <main>': uninitialized constant Set (NameError)
    

  • @Will Sorry about that, I should have just tested it to begin with, but I did not have a matching Zettelkasten. Thankfully @ctietze has put up an example Zettelkasten, so I used it just now to debug the problem. Apparently Set, unlike Hash, requires a, well... require :tongue: And I forgot to remove the starts-with symbol from the regex and forgot to include a question mark after a predicate function. I have updated the script and it should work now.
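This is a small illustration of the `uninitialized constant Set (NameError)` above. In Ruby versions before 3.2, Set lives in the standard library and has to be required explicitly, while Hash is built in. The variable names are just for illustration.

```ruby
# Before Ruby 3.2, Set is not loaded automatically; without this line you get
# "uninitialized constant Set (NameError)". Hash, by contrast, is always available.
require "set"

inbound = Set.new
inbound << "201811151039"
inbound << "201811151039" # adding a duplicate has no effect

puts inbound.size   # prints 1
puts inbound.empty? # prints false; predicate methods conventionally end in "?"
```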

  • @grayen I take back what I said about your Ruby programming skills being as good/bad as mine. Yours are much better. Your modifications worked.

    Here is my new score. Still not as bad as I feared.

    177 of 1156 notes are orphans (score: 0.85)

  • @grayen Your adaptations make my results even worse! I compared with my original script and found that yours produces better results. I also dropped the check for outbound links. That would make e.g. Structure Zettel with only outgoing links but no incoming references orphans, but I think that's a good fit for the term :) I'd rather collect all orphans this way, report the count, then reject the orphans that have outbound links for the connectivity score, so you end up with totally abandoned loners.

    3496 of 5420 notes are orphans (score: 0.35)
    

    Dangit.
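The two-step idea could be sketched like this: count orphans first, then keep only the orphans without outbound links for the connectivity score. `NoteLinks` is a hypothetical struct standing in for however the script stores inbound and outbound IDs. The score formula is an assumption that mirrors the health score.

```ruby
# Step 1: orphans = notes without inbound links (structure notes included).
# Step 2: loners = orphans that also have no outbound links.
# NoteLinks is a hypothetical stand-in for however the script stores link data.
NoteLinks = Struct.new(:id, :inbound, :outbound)

def orphans_and_loners(notes)
  orphans = notes.select { |n| n.inbound.empty? }
  loners  = orphans.select { |n| n.outbound.empty? }
  [orphans, loners]
end

notes = [
  NoteLinks.new("201705110850", ["201705120916"], []), # linked to: not an orphan
  NoteLinks.new("201705120916", [], ["201705110850"]), # structure note: orphan only
  NoteLinks.new("202004271756", [], [])                # empty note: orphan and loner
]

orphans, loners = orphans_and_loners(notes)
puts "#{orphans.size} of #{notes.size} notes are orphans"        # prints 2 of 3 notes are orphans
puts "#{loners.size} of #{notes.size} notes have no links at all" # prints 1 of 3 notes have no links at all
puts (1.0 - loners.size.to_f / notes.size).round(2)                # prints 0.67 (connectivity score)
```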

  • I've taken this idea of 'orphan notes' a little further and created a note with links so I can revisit the notes that don't have any inbound or outbound links and integrate them into my Zettelkasten.

    Here is the script I run.
    will$ find_zettel_orphans_2.rb | sed -E -e 's/.[^.]*$//' -e 's!^([0-9]+)[[:space:]-]+(.+)![[\1]] \2!' >/Users/will/Dropbox/zettelkasten/Orphans.md

    I can't get this to run via Keyboard Maestro yet; I'm still looking into it.
    A couple of esthetic improvements I'm working on:
    1. Giving Orphans.md an appropriate UID
    2. Printing the 'score' at the beginning of the file.
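The sed call above does two things. It strips the file extension, and it turns "ID Title" into "[[ID]] Title". If you would rather do that inside the Ruby script, here is a sketch of the same two substitutions. `orphan_link_line` is a hypothetical name.

```ruby
# Same two steps as the sed pipeline: drop the extension, then wrap the ID.
def orphan_link_line(filename)
  base = filename.sub(/\.[^.]*\z/, "")                    # like the first sed expression
  base.sub(/\A([0-9]+)[[:space:]\-]+(.+)\z/, '[[\1]] \2') # like the second sed expression
end

puts orphan_link_line("201811151039 Body space.md")
# prints [[201811151039]] Body space
puts orphan_link_line("201811220620 Contradictory ideas - Art and Science.md")
# prints [[201811220620]] Contradictory ideas - Art and Science
```

One deliberate difference is that the dot is escaped (`\.`). In the sed expression an unescaped `.` matches any character, so a line containing no dot at all would be emptied rather than left alone.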

  • @Will You could just change the Ruby code a bit and have it print the output slightly differently. I have updated my gist to contain @ctietze's latest version with the calls to puts changed to reflect your needs. I could probably implement your call to sed in Ruby, but that would come down to something similar. For (1), why not just create a template file and use sed to replace, say, {{orphans}} in the template with the output of the script?

  • I'm afraid we've commandeered this thread.
    Now my score has just gotten worse. I see that before, we did not count the notes that had outbound links but no inbound links.

    New score

    265 of 1157 notes are orphans (score: 0.77)

    So I have 177 notes with no outbound or inbound links and 88 (265 - 177) notes with outbound links but no inbound links. What to do with this wealth of data??

    @grayen thanks so much for your help. This is probably basic Ruby, but I'm only grokking half of it, and sadly not the relevant half.

    @grayen said:
    For (1), why not just create a template file and use sed to replace say {{orphans}} in the template for the output of the script by using sed as well?

    I've only used sed in bash scripts and am not familiar with 'templates'.

  • edited April 29

    I've extracted the orphan script posts from the rest, easily, as there was no overlap in the replies 👍

    @Will you don't need a fancy templating library like Mustache; it could just as well be a regexp/string replacement that transforms the template string

    Hello {{name}}
    

    to

    Hello Will
    

    via

    echo "Hello {{name}}" | sed 's/{{name}}/Will/g'
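The same replacement can be done inside Ruby with String#gsub. That would also let the script write the score at the top of Orphans.md. This is a sketch, and the template text and the {{score}}/{{orphans}} placeholder names are made up for illustration.

```ruby
# Fill a tiny template by plain string replacement; no templating library needed.
# The placeholder names are illustrative, not part of any script in this thread.
TEMPLATE = <<~MD
  Score: {{score}}

  {{orphans}}
MD

def render(template, values)
  values.reduce(template) do |text, (key, value)|
    # Block form, so backslashes in the value are not treated as backreferences.
    text.gsub("{{#{key}}}") { value.to_s }
  end
end

puts render(TEMPLATE, score: 0.85, orphans: "[[201902011050]] Peripeteia")
```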
    

  • I find myself being one of those hated users who always seem to be changing the goals. But as I slowly see how this little script works and try implementing it in my Zettelkasten, it reveals new use cases and ideas for implementation.

    I have two areas I'd like to learn more about, to see what is possible.
    1. How would one write orphans.map so that the output looks like the list below?
    Can you write
    puts orphans.map { |o| "[[" + o.zettel_id + "]] " + o.zettel_name }
    and get the following? This would eliminate the need for sed.

    [[201909121459]] Polystyrene
    [[202002101714]] Childhood amnesia
    [[201902011050]] Peripeteia
    [[201905140811]] The decoy effect
    
    2. Is there a way to indicate whether the note is lacking inbound links, outbound links, or both?
      Like:
    i[[201909121459]] Polystyrene
    i[[202002101714]] Childhood amnesia
    io[[201902011050]] Peripeteia
    o[[201905140811]] The decoy effect
    

  • edited April 29

    @Will Yes, I updated the gist accordingly. Note that this is yet another thing: my original check was for being completely unconnected, @ctietze's script checks for orphans (no inbound links), and this is... not sure what the right term would be, but it checks for either no outbound or no inbound links, since otherwise you could not produce the output you wanted in (2).

    With the example Zettelkasten, I get the following output after adding an empty Zettel:

    10 of 22 notes are orphans (score: 0.55)
    o[[201705110850]] The nv-Core
    o[[201705111034]] Luhmanns Zettelkasten
    o[[201705120848]] An omnibar to rule them all
    o[[201705120913]] The plain text approach
    o[[201705120915]] Software-agnostic Programming
    i[[201705120916]] Our reasons for Software Agnosticism
    o[[201801020916]] Using tags in The Archive
    o[[201801020929]] Finding notes in The Archive
    o[[201801231614]] Markdown
    io[[202004271756]] Test
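This is a sketch of how such markers could be derived, assuming each note's inbound and outbound IDs are already collected. It follows Will's convention: "i" marks no inbound links, "o" no outbound links, and "io" neither. `NoteInfo`, `marker`, and `report_lines` are hypothetical names, not the gist's code.

```ruby
# Prefix each reported note with "i" (no inbound links), "o" (no outbound links),
# or "io" (neither). Notes with both kinds of links are left out of the report.
NoteInfo = Struct.new(:id, :title, :inbound, :outbound)

def marker(note)
  [("i" if note.inbound.empty?), ("o" if note.outbound.empty?)].compact.join
end

def report_lines(notes)
  notes.map { |n| [marker(n), n] }
       .reject { |m, _n| m.empty? }
       .map { |m, n| "#{m}[[#{n.id}]] #{n.title}" }
end

notes = [
  NoteInfo.new("201705110850", "The nv-Core", ["201705120916"], []),
  NoteInfo.new("201705120916", "Our reasons for Software Agnosticism", [], ["201705110850"]),
  NoteInfo.new("202004271756", "Test", [], [])
]
puts report_lines(notes)
```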
    
  • Might make sense to take your orphans + widows (?) and then use #partition to separate the result into o, i, and maybe even io, and print these en bloc separately.
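Here is a sketch of that with Enumerable#partition. It splits a collection into two arrays, one for matches and one for non-matches, so applying it twice yields the three groups. It assumes the report lines start with their marker, as in the output above. `group_by_marker` is a hypothetical name.

```ruby
# Split report lines into io / i / o groups with two calls to Enumerable#partition.
# Assumes lines look like "io[[ID]] Title", "i[[ID]] Title" or "o[[ID]] Title".
def group_by_marker(lines)
  loners, rest = lines.partition { |l| l.start_with?("io[[") }
  no_inbound, no_outbound = rest.partition { |l| l.start_with?("i[[") }
  { "io" => loners, "i" => no_inbound, "o" => no_outbound }
end

lines = [
  "o[[201705110850]] The nv-Core",
  "i[[201705120916]] Our reasons for Software Agnosticism",
  "io[[202004271756]] Test"
]

group_by_marker(lines).each do |marker, group|
  next if group.empty?
  puts "#{marker} (#{group.size}):"
  puts group
end
```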

  • No incoming links == roots or starts.
    No outgoing links == leaves or ends?

  • @grayen @ctietze you need to stop! Every time you guys iterate on this code, my 'score' gets worse!
    Soon I'll have more orphans and widows than notes. Just joking. The improvements make the script more accurate, so it finds more orphans. I'm happy we are finding all of them.

    New score :(

    497 of 1159 notes are orphans (score: 0.57)

    Now I have a place to go to work on my orphans and widows from within my archive with links for quick access.

  • This is all really cool, but I must confess, it's waaaay over my head. How does one get from this:

    https://gist.github.com/msteen/a7362b703997a1417c297980390721b1

    to that:

    I ran the script from Terminal on the example Archive and got the following:

    find_zettel_orphans_modified.rb:28:in `add_outbound_by_id': undefined method `filter' for ["201705091531", "201705091535", "201705120948", "201705180836"]:Array (NoMethodError)
        from find_zettel_orphans_modified.rb:72:in `block in <main>'
        from find_zettel_orphans_modified.rb:55:in `block in each'
        from find_zettel_orphans_modified.rb:54:in `each_value'
        from find_zettel_orphans_modified.rb:54:in `each'
        from find_zettel_orphans_modified.rb:70:in `<main>'
    

    Any help will sure be appreciated.

  • Oh, it seems that filter is a method that came after Ruby v2.3.7, which is what shipped with my macOS Mojave. You have to replace .filter with .select for backwards compatibility. I updated my script accordingly.
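This is a small illustration of the compatibility point. Array#filter is an alias of Array#select that was added in Ruby 2.6, so #select works on both old and new Rubies. The IDs are taken from the traceback above, and the self-ID filter is just an example predicate.

```ruby
# Array#filter is an alias of Array#select added in Ruby 2.6; Ruby 2.3.7
# (shipped with macOS Mojave) only knows #select, hence the NoMethodError.
ids = ["201705091531", "201705091535", "201705120948", "201705180836"]
own_id = "201705091531"

# Works on old and new Rubies alike:
others = ids.select { |id| id != own_id }
p others # prints ["201705091535", "201705120948", "201705180836"]
```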

  • Thanks @ctietze that solved the issue!

  • Update: I'm not sure if this is plaguing other archives, but I ran into false positives for the 'i' marker (i.e., the results flagged some notes with 'i', meaning no incoming links, when there was actually another note that referenced the UID).

    I tracked the problem down to my buffer and structure notes: they contain an extra character at the beginning of the UID in the file name for sorting (β for buffer notes and Σ for structure notes). To solve this issue (and improve my score!) I had to adjust the following regex to account for that first character:

    filename[/^[0-9][0-9_\-]+[0-9]/]

    I simply added a '.' to account for the first character:

    filename[/^.[0-9][0-9_\-]+[0-9]/]

    However, I'm just learning all this so if anyone knows of a better or more elegant way to account for the first character or even multiple characters before the UID proper (such as the use of multiple section symbols), please feel free to chime in.
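One possible alternative, offered as a sketch rather than a drop-in fix, is to skip any run of non-digit characters before the ID and capture only the digits. That way β, Σ, or several section symbols all yield the bare numeric ID. This assumes the numeric part alone is unique in your archive, and the constant and method names are hypothetical. If filenames and link scans are both reduced to the numeric part this way, the plain content scan already matches links written with or without a prefix.

```ruby
# Reduce every ID to its numeric part, whatever symbols precede it in the filename.
# Assumption: the numeric part alone is unique across the archive.
FILENAME_ID = /\A\D*([0-9][0-9_\-]+[0-9])/ # skip any leading non-digits
CONTENT_ID  = /[0-9][0-9_\-]+[0-9]/         # same pattern as the original content scan

def id_from_filename(filename)
  filename[FILENAME_ID, 1]
end

def ids_in_content(content)
  content.scan(CONTENT_ID)
end

p id_from_filename("β202004271756 Buffer note.md")  # prints "202004271756"
p id_from_filename("§§201705110850 The nv-Core.md") # prints "201705110850"
p ids_in_content("See [[Σ201801020916]] and [[201801231614]].")
# prints ["201801020916", "201801231614"]
```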

  • Update 2: After more manual checking of the results, I found that the solution above only took care of individual notes; buffer and structure notes were still showing up as false positives for not having any outbound links. To solve this issue I had to add another search after the initial outbound search. That changed the following section of the script from this:

    Zettel.each do |zettel|
      content = File.read(File.join(INPUT_DIR, zettel.file))
      zettel.add_outbound_by_id(*content.scan(/[0-9][0-9_\-]+[0-9]/))
    end
    

    to this:

    Zettel.each do |zettel|
      content = File.read(File.join(INPUT_DIR, zettel.file))
      zettel.add_outbound_by_id(*content.scan(/[0-9][0-9_\-]+[0-9]/))
      zettel.add_outbound_by_id(*content.scan(/.[0-9][0-9_\-]+[0-9]/))
    end
    

    As a side note (because I couldn't keep the definitions of 'i' and 'o' straight in my head for whatever reason), I changed the output from "i" to "-i " and "o" to "-o " so I would be able to read it as "no incoming" links and "no outgoing" links.

    If anyone notices any mistakes please let me know!

  • You can probably combine these

    zettel.add_outbound_by_id(*content.scan(/[0-9][0-9_\-]+[0-9]/))
    zettel.add_outbound_by_id(*content.scan(/.[0-9][0-9_\-]+[0-9]/))
    

    Into an optional "any first character" match by appending the ? operator:

    zettel.add_outbound_by_id(*content.scan(/.?[0-9][0-9_\-]+[0-9]/))
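Check one thing before combining them: `scan` returns the whole match, so with `.?` the optional character becomes part of each ID. For an ordinary link like [[201801231614]] that character is the `[`, which gives "[201801231614" and no longer equals any note ID. The two separate scans avoided this because the first one still produced the bare ID. The sketch below restricts the optional prefix to the sorting symbols mentioned earlier. Treating β and Σ as the only prefixes is an assumption, so add your own.

```ruby
# Combine both scans without swallowing the "[" of an ordinary [[ID]] link:
# only the known sorting symbols may precede the digits.
# Assumption: the prefixes in use are β (buffer notes) and Σ (structure notes).
LOOSE  = /.?[0-9][0-9_\-]+[0-9]/
STRICT = /[βΣ]?[0-9][0-9_\-]+[0-9]/

content = "See [[201801231614]] and [[β202004271756]]."

p content.scan(LOOSE)  # prints ["[201801231614", "β202004271756"]
p content.scan(STRICT) # prints ["201801231614", "β202004271756"]
```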
    
