Zettelkasten Forum


Quantitative Link Analysis

Analysis

I have been inspired by the discussion on the ratio of content to structure notes. @ctietze got me thinking about how exactly I could calculate this without a common signifier in every structure note. I realize I've been sloppy in this area and don't really have a clear idea of how many structure notes I have. @bimlas mentioned he thought a zettel with 5 or more links qualified as a structure note. I'm not so sure that every note with 5 links is a structure note, but such notes could be structure notes or could develop into them. I'm also not sure about the cutoff of 5 links.

@sfast said over on the discussion "Structure zettels vs zettels with lot of links":
I think notes with a lot of links can be Structure Notes but they don't have to be. A Structure Note is a note that is about the relationships between other notes as they are constituted by the content and the knowledge structure. So, you can have a lot of links without the note being about the relationships but just using them as a reference or something similar.

I started to put together a Keyboard Maestro macro but in the end, a simple shell script worked great.

Run the command from within your zettel directory. It assumes UIDs in the format 202010111800. It doesn't matter how the links are formed as long as they include the UID. It takes a few seconds to run on 1400 zettels. The first time I ran it against my production zettelkasten I was a bit nervous, but the command contains only grep and awk (plus sort, uniq, and cut), so it doesn't modify the zettel in any way. As always, use at your own risk, and back up to be safe.

real    0m6.037s
user    0m4.079s
sys     0m0.475s

My analysis:

  1. 80 zettel are orphaned - This is something to work on, maybe. These are still included in any full-text search.
  2. Most of my zettel have 1, 2, 3, or 4 links.
  3. 324 zettel have 5 or more links.
  4. 3 heavily linked zettel have 38 links or more.

egrep -ohsr '20\d{10}' -- * | sort | uniq -c | cut -c 1-4 | awk '{print $1}' | sort | uniq -c | awk 'BEGIN {print "Number of zettel with X links"}''{print $1," zettel with ", $2 - 1 ," outgoing links."}' | sort -g -k 4

Here is my output.

Number of zettel with X links
80  zettel with  0  outgoing links.
291  zettel with  1  outgoing link.
403  zettel with  2  outgoing links.
347  zettel with  3  outgoing links.
161  zettel with  4  outgoing links.
88  zettel with  5  outgoing links.
56  zettel with  6  outgoing links.
40  zettel with  7  outgoing links.
23  zettel with  8  outgoing links.
23  zettel with  9  outgoing links.
19  zettel with  10  outgoing links.
18  zettel with  11  outgoing links.
10  zettel with  12  outgoing links.
11  zettel with  13  outgoing links.
4  zettel with  14  outgoing links.
5  zettel with  15  outgoing links.
4  zettel with  16  outgoing links.
4  zettel with  17  outgoing links.
4  zettel with  18  outgoing links.
2  zettel with  19  outgoing links.
4  zettel with  20  outgoing links.
1  zettel with  21  outgoing links.
1  zettel with  22  outgoing links.
2  zettel with  25  outgoing links.
1  zettel with  26  outgoing links.
1  zettel with  31  outgoing links.
2  zettel with  38  outgoing links.
1  zettel with  39  outgoing links.
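A note on what these numbers measure: the pipeline counts how often each UID appears anywhere in the archive, so after subtracting the self-reference the figure is really the number of incoming links (backlinks) a zettel receives, not its outgoing links; a link that a zettel makes to others is credited to those other zettel. The 80 zettel with 0 are therefore the ones nothing links to. If outgoing links are what you want for spotting structure notes, here is a minimal sketch that counts, per file, the distinct UIDs it mentions minus one for its own. It assumes 12-digit UIDs, that each note contains its own UID once, and that file paths contain no colons; the function name outgoing_histogram is made up for illustration, and [0-9] stands in for \d, which not every grep understands.

```shell
# Hypothetical helper: histogram of OUTGOING links per zettel.
# Assumes 12-digit UIDs (e.g. 202010111800), one self-reference per note,
# and no ":" in file paths (grep -H output is split on ":").
outgoing_histogram() {
  local dir="${1:-.}"
  # -r recurse, -H prefix "file:", -o print only the UID, -E extended regex
  grep -rHoE '20[0-9]{10}' -- "$dir" |
    sort -u |                                             # each (file, UID) pair once
    awk -F: '{ n[$1]++ } END { for (f in n) print n[f] - 1 }' |  # distinct UIDs per file, minus self
    sort -n |                                             # group identical link counts
    uniq -c |                                             # "zettel-count link-count"
    awk '{ print $1 " zettel with " $2 " outgoing links" }'
}
```

Run it as outgoing_histogram . from inside the archive. As with the original one-liner, files that contain no UID at all do not appear in the output.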

Will Simpson
I'm a Zettelnant.
Research: Rationalism, Zen, Dzogchen, Non-fiction Creative Writing
kestrelcreek.com

Comments

  • @Will Why not just have a #structure tag and add that to whatever zettel you consider to be a structure note? Easy enough then to list and count them :)

  • @GeoEng51 I think the issue is that it would be time-consuming to go through all his notes and decide which ones are structure notes.

  • Such scripts are helpful for monitoring what's going on with your Zettelkasten; however, I wouldn't consider any of this sloppy without identifying a specific goal you want to meet.

    Personally, I consider it important to mark the type of Zettel I am working on, if only to serve as guidance for me as a beginner. Others stated they've done the same before realising it does not matter. Time will tell how I proceed.

    @Will, how about dividing your distribution into quintiles and looking at some samples before deciding whether there is an issue with it or not?
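Splitting the distribution into quintiles could look like the sketch below: it ranks every UID by how often it is mentioned (minus the self-reference, as in the original script), cuts the ranking into five equal-sized groups, and prints each group's link range plus one sample UID to open and inspect. The function name quintile_samples and the choice of the middle-ranked zettel as the sample are arbitrary assumptions.

```shell
# Hypothetical helper: link-count range and one sample UID per quintile.
quintile_samples() {
  local dir="${1:-.}"
  grep -rhoE '20[0-9]{10}' -- "$dir" |
    sort | uniq -c | sort -n |          # "count UID", least to most mentioned
    awk '{ links[NR] = $1 - 1; uid[NR] = $2 }
      END {
        for (q = 1; q <= 5; q++) {
          lo = int((q - 1) * NR / 5) + 1   # first rank in this quintile
          hi = int(q * NR / 5)             # last rank in this quintile
          if (hi < lo) continue            # fewer than 5 zettel: skip empty groups
          printf "Q%d: %d-%d links, sample %s\n", q, links[lo], links[hi], uid[int((lo + hi) / 2)]
        }
      }'
}
```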

    my first Zettel uid: 202008120915

  • @Nick said:
    @GeoEng51 I think the issue is that it would be time-consuming to go through all his notes and decide which ones are structure notes.

    Worth it from my perspective. Did it with over 1000 notes as far as I remember.

    I am a Zettler

  • I agree with @sfast here. I did a massive revision of my slip-box (although I had just 400ish notes at that moment) and it improved its structure immensely. I think you can't build structure from the very beginning, you have to build the "content base" at first, and then step back and add bottom-to-top structure. And it makes sense, you can't add top with no bottom :smiley:
    @Will has a greater number of notes, but I think it is still fairly manageable at this point :smile:

  • @Will Could you adjust your script to accommodate more files? (Via find -x or a loop perhaps)

    ❯ egrep  -ohsr '20\d{10}' -- *
    zsh: argument list too long: egrep
    

    Author at Zettelkasten.de • https://christiantietze.de/

  • edited October 13

    Another interesting quantitative exercise to apply to my zettelkasten is to look at the overall total number of links.

    egrep -ohsr '20\d{10}' -- . | wc -l
    number of total links
    7204

    ls *.md |wc -l
    number of zettel
    1549

    Actual links: 7204 - 1549 (number of zettel) = 5655, and 5655 / 1549 = 3.65 links per zettel on average.

    I subtract the number of zettel because each zettel includes its own UID once, which the count treats as a self-referential link.
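The same arithmetic can be done in one go. This sketch assumes the zettel are the .md files in the directory and counts them with find rather than ls *.md, since on a large archive the *.md glob can hit the same "argument list too long" limit as the first script. The function name average_links is hypothetical.

```shell
# Hypothetical helper: total links and average links per zettel.
average_links() {
  local dir="${1:-.}" total zettel
  # Every UID occurrence: the links plus one self-reference per zettel
  total=$(grep -rhoE '20[0-9]{10}' -- "$dir" | wc -l)
  # Number of zettel; find avoids the argument-list limit that ls *.md can hit
  zettel=$(find "$dir" -type f -name '*.md' | wc -l)
  awk -v t="$total" -v z="$zettel" 'BEGIN {
    if (z + 0 == 0) { print "no zettel found"; exit }
    printf "%d links, %.2f per zettel\n", t - z, (t - z) / z
  }'
}
```

With the totals above (7204 occurrences, 1549 zettel) the formula gives 5655 links and 3.65 per zettel.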

    Analysis
    This seems a healthy amount of integration, but looking at the average blurs the real spectrum of integrating my zettel. Some are highly integrated and some poorly integrated. Some with only a single outgoing link are as fully integrated as possible at the moment, and sometimes 38 links are not enough for proper integration. It is a spectrum, and looking at the average doesn't get at any qualitative clues.

    One thing I've learned from doing this is that these numbers don't matter. They are poor indicators of success or progress. What I find more helpful for assessing subjective quality is a daily dashboard showing the zettel I actually worked on the previous day, along with a subjective feel for the energy spent from one day to the next. A few indicators I use to guide me and to build my zettelkasting habit are:

    1. Number of new zettel created in the last 24 hours
    2. Number of zettel edited in the last 24 hours - this particularly gives me a sense of how deeply I was involved with my zettelkasten.
    3. A listing of the new and edited zettel titles
    4. Number of zettel in my #inbox - I'm striving for inbox zero! Currently 18 as of October 13, 2020.

    I find this allows me to revisit (spaced repetition) the zettel for finer editing and more linking—iterative processing and assimilation.
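A rough sketch of such a dashboard for the four indicators listed above. It rests on assumptions that may not match your setup: file names start with the UID (e.g. "202010131200 Title.md"), a zettel counts as "new" if its UID is at or after a cutoff timestamp, "edited" means the file's modification time is within the last 24 hours (find -mtime -1), and the inbox is every file containing the text #inbox. The function name and output wording are made up for illustration.

```shell
# Hypothetical daily dashboard. Usage: daily_dashboard DIR [CUTOFF_UID]
daily_dashboard() {
  local dir="${1:-.}" cutoff="${2:-}" new edited inbox
  if [ -z "$cutoff" ]; then
    # UID-style timestamp for 24 hours ago: BSD/macOS date uses -v, GNU date uses -d
    cutoff=$(date -v-1d +%Y%m%d%H%M 2>/dev/null || date -d '1 day ago' +%Y%m%d%H%M)
  fi
  # 1. New: file name starts with a UID at or after the cutoff
  new=$(find "$dir" -type f -name '20[0-9]*.md' |
    awk -v c="$cutoff" '{ n = split($0, p, "/"); if (substr(p[n], 1, 12) >= c) k++ } END { print k + 0 }')
  # 2. Edited: modified within the last 24 hours (this includes the new ones)
  edited=$(find "$dir" -type f -name '*.md' -mtime -1 | wc -l | tr -d ' ')
  # 4. Inbox: files that contain the text #inbox anywhere
  inbox=$(grep -rl -- '#inbox' "$dir" | wc -l | tr -d ' ')
  printf 'New in last 24h:    %s\n' "$new"
  printf 'Edited in last 24h: %s\n' "$edited"
  printf 'In #inbox:          %s\n' "$inbox"
  # 3. Titles of edited zettel (file name without .md)
  echo "Edited zettel:"
  find "$dir" -type f -name '*.md' -mtime -1 -exec basename {} .md \;
}
```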

  • @ctietze said:
    @Will Could you adjust your script to accommodate more files? (Via find -x or a loop perhaps)

    ❯ egrep  -ohsr '20\d{10}' -- *
    zsh: argument list too long: egrep
    

    How many files are we talking about? I remember you talking about a directory of files for stress testing different functions in The Archive. Is this available via GitHub?

  • edited October 13

    It already fails for the number of files I have (5776), so this repository should help to figure out a find call: https://github.com/Zettelkasten-Method/10000-markdown-files

    Please note that none of these files has a UID. So your egrep call might still fail, because * expands to 10k file names, which is too many; and if it stops failing, you won't get any useful results :)

  • I had a hard time testing this because, as you said, there are no UIDs in the files. I could easily put one in each, but on my first attempt I put the same UID in every file, and while there was no error, it also didn't work as expected.
    BUT
    On my system, I use the bash shell.
    I made a trivial change to the script and I stopped getting /usr/bin/egrep: Argument list too long.

    egrep -ohsr '20\d{10}' -- .| sort | uniq -c | cut -c 1-4 | awk '{print $1}' | sort | uniq -c | awk 'BEGIN {print "Number of zettel with X links"}''{print $1," zettel with ", $2 - 1 ," outgoing links."}' | sort -g -k 4

    I changed egrep -ohsr '20\d{10}' -- *| to egrep -ohsr '20\d{10}' -- .|
    I got this idea from Jan Reilink: "In newer versions of grep you can omit the '.', as the current directory is implied."

    Let me know if this now works for you, if not I'll explore other options.
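Why the change helps: with *, the shell expands the glob into one argument per file before egrep even starts, and when the combined length exceeds the system limit (getconf ARG_MAX shows it) the launch fails with "Argument list too long". With ., egrep receives a single argument and walks the directory itself because of -r. If you would rather restrict the search to .md files, find can do the batching, which is roughly the find route suggested earlier; in this sketch, -exec ... {} + hands file names to grep in chunks that stay under the limit. The function name uid_occurrences is hypothetical.

```shell
# Hypothetical helper: every UID occurrence in the .md files under DIR.
uid_occurrences() {
  local dir="${1:-.}"
  # "{} +" makes find pass file names to grep in batches that fit under the
  # system's argument-length limit, instead of all at once as * does.
  find "$dir" -type f -name '*.md' -exec grep -hoE '20[0-9]{10}' -- {} +
}
```

Its output can replace the egrep stage of the pipeline, e.g. uid_occurrences . | sort | uniq -c followed by the remaining steps.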

  • Ingenious fix! Works now.

    Number of zettel with X links
    1867  zettel with  0  outgoing links.
    700  zettel with  1  outgoing links.
    254  zettel with  2  outgoing links.
    114  zettel with  3  outgoing links.
    40  zettel with  4  outgoing links.
    29  zettel with  5  outgoing links.
    8  zettel with  6  outgoing links.
    7  zettel with  7  outgoing links.
    4  zettel with  8  outgoing links.
    7  zettel with  11  outgoing links.
    1  zettel with  12  outgoing links.
    

    Ouch! I guess that's a lot of code snippets that don't go anywhere and don't get used from anywhere :expressionless:

  • edited October 14

    I've been thinking about how I can use this information. Knowing that I have a zettel with 40 links is interesting but not very useful. I got curious about which zettel had "40 links", so I wrote another script that outputs the UIDs of the top 3 zettel.

    egrep -ohsr '20\d{10}' -- . | sort -h | uniq -c | sort -n > /Users/will/Downloads/results.out; tail -n 3 /Users/will/Downloads/results.out
    (Change the path to suit your system, and use a directory outside your zettelkasten directory. The file is small and temporary. Notice that the path appears twice in the command.)

    My results.

    39 201812271440
    39 202007050647
    41 201901301240
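The temporary file isn't strictly needed; the counts can be piped straight into tail. A minimal sketch with a hypothetical function name and the number of zettel to show as a parameter (default 3). Unlike the raw counts above, the printed number subtracts the self-reference, so 41 would show as 40.

```shell
# Hypothetical helper: the N most mentioned UIDs, without a temp file.
top_linked() {
  local dir="${1:-.}" n="${2:-3}"
  grep -rhoE '20[0-9]{10}' -- "$dir" |
    sort | uniq -c |   # "count UID" for each distinct UID
    sort -n |          # least to most mentioned
    tail -n "$n" |     # keep the n most mentioned
    awk '{ printf "%s has %d links\n", $2, $1 - 1 }'
}
```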
    

    Analysis

    Helpful to review these big notes. It is insightful to see my progress in creating structure notes that speak to me. These are not the zettel I've spent the most time with. It is a dream of mine that I'd be someday able to list zettel by the amount of attention I paid to them. This is a failed attempt, using the number of links as a proxy for attention. I'll keep exploring. I got value from this exercise, just not the value I was looking for.

    201812271440 • Thinking Skills Hub
    This was formerly a mental model hub. I had a bunch of zettel (apparently 39) all related to mental models, and I started a hub/structure note. I haven't completed it; it fell off my radar and is now back in scope. I placed it in my #inbox for further development because, right now, it looks great for the first 30 lines and then becomes just a list of zettel.

    202007050647 • Ecodharma
    The structure note from a book that @Phil turned me on to. Because of a love/hate relationship with the author's thesis and my own confusion, this book really lit a fire under my zettelkasten. This leads me to consider more books by David Loy and more books/research in this vein.

    201901301240 • Quintessential Dzogchen
    Boy, this one was a blast from the past. My most linked zettel! I was so proud of it when I made it. Again, this is a structure note on the book Quintessential Dzogchen: Confusion Dawns as Wisdom by Tulku Urgyen. This is life-changing material, and I've not treated it as such in my zettelkasten. It is just a list of zettel. This is how I made structure notes in January 2019. Now I use a combination of an annotated table of contents and an idea index for a structure note. This book is so good that I'm convinced I should reread and reprocess it using the tools I use now.

    Thanks for listening to my mental dribble.

  • @ctietze said:
    Ingenious fix! Works now.

    I'm not so sure. The script accounts for 3031 of your zettel but you reported your zettelkasten has 5776 zettel. 2745 zettel missing in action!

    Try this and see if there is a difference. I removed the subtraction for the self-referential link.

    egrep  -ohsr '20\d{10}' -- .| sort | uniq -c | cut -c 1-4 | awk '{print $1}' | sort | uniq -c | awk 'BEGIN {print "Number of zettel with X links"}''{print $1," zettel with ", $2 ," outgoing links."}' | sort -g -k 4
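One way to look for the missing zettel is to list the files in which the UID pattern never matches; grep -L prints the names of files with no match, and those notes are invisible to the histogram. A sketch with a made-up function name and a 12-digit pattern like the one above (with [0-9] in place of \d):

```shell
# Hypothetical helper: files that contain no 12-digit UID at all.
files_without_uid() {
  local dir="${1:-.}"
  # -L lists files with NO match, i.e. notes the histogram cannot see
  grep -rLE '20[0-9]{10}' -- "$dir"
}
```

files_without_uid . | wc -l then gives the number of such files.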
    

  • @Will Nope, same result. I guess that's in part because my old IDs from 2009 have _ and - in them 🤔 To be honest, I cannot follow the flow of data in this chain of commands anyway, so it might be something else :)

  • @ctietze I think the fix is in the regex.

    The first command, egrep -ohsr '20\d{10}' -- ., gets all the UIDs into a list. The rest of the command just sorts, counts, and prints the list.
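For anyone trying to follow the flow of data, here is the same pipeline split into one stage per line, with a comment on what each stage emits. It is a sketch rather than a drop-in replacement: it uses grep -E with [0-9] instead of egrep with \d, and it drops cut -c 1-4, because awk '{ print $1 }' already extracts the count, and on Linux (GNU coreutils) uniq -c right-aligns counts in a 7-character column, so cut -c 1-4 can keep only spaces there. The label says "links" because what is counted are mentions of a zettel's UID by other notes. The function name link_histogram is hypothetical.

```shell
# Hypothetical helper: the original histogram, one stage per line.
link_histogram() {
  local dir="${1:-.}"
  grep -rhoE '20[0-9]{10}' -- "$dir" |  # every UID occurrence, one per line
    sort |                              # put identical UIDs next to each other
    uniq -c |                           # "count UID" for each distinct UID
    awk '{ print $1 - 1 }' |            # keep the count, minus the self-reference
    sort -n |                           # put identical link counts next to each other
    uniq -c |                           # "zettel-count link-count"
    awk '{ print $1 " zettel with " $2 " links" }'
}
```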

    I cooked up a new regex that will look for UIDs in the form of
    2020-10-08-1000
    2020_10_08_1000
    202010081000
    Do you have any other UID formats?

    Try this:
    egrep -ohsr '\b20\S{10,13}' -- .| sort | uniq -c | cut -c 1-4 | awk '{print $1}' | sort | uniq -c | awk 'BEGIN {print "Number of zettel with X links"}''{print $1," zettel with ", $2 - 1 ," outgoing links."}' | sort -g -k 4
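A caveat with \S{10,13}: \S matches any non-space character, so if a link is written as [[202010081000]], the match continues into the brackets and yields 202010081000]], which is counted separately from the bare 202010081000 in the note itself. That could push zettel into the 0-links bucket. A stricter sketch that still accepts the three formats above allows only digits plus an optional - or _ between date parts; unlike the \b version, it does not check word boundaries. The variable uid_pattern and function extract_uids are hypothetical names.

```shell
# Hypothetical stricter pattern: 20YY, MM, DD, then 4 more digits, each part
# optionally separated by "-" or "_". Only digits and separators are consumed,
# so trailing brackets or punctuation are never part of the match.
uid_pattern='20[0-9]{2}[-_]?[0-9]{2}[-_]?[0-9]{2}[-_]?[0-9]{4}'

extract_uids() {
  grep -rhoE -- "$uid_pattern" "${1:-.}"
}
```

Its output can feed the same sort | uniq -c | ... stages as before.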

  • That's much better but also much worse than I thought :sweat_smile:

    Number of zettel with X links
    4562  zettel with  0  outgoing links.
    786  zettel with  1  outgoing links.
    208  zettel with  2  outgoing links.
    104  zettel with  3  outgoing links.
    42  zettel with  4  outgoing links.
    20  zettel with  5  outgoing links.
    9  zettel with  6  outgoing links.
    11  zettel with  7  outgoing links.
    3  zettel with  8  outgoing links.
    1  zettel with  9  outgoing links.
    4  zettel with  11  outgoing links.
    2  zettel with  12  outgoing links.
    1  zettel with  14  outgoing links.
    1  zettel with  15  outgoing links.
    

    Numbers may not add up perfectly because I still have some old collection notes without IDs that I never took the time to review.

    Still, ouch.

  • For macOS users: you can save the shell script as a linkstats.command file. .command files can be double-clicked to execute right from the Finder.

    Here are the contents to make this work with the directory the script is saved in (by default, a .command script will think you're in your home directory instead of where the script resides):

    #!/usr/bin/env bash
    
    # Directory containing this .command file; quoted so paths with spaces work
    BASEDIR="$(dirname "$0")"
    
    egrep  -ohsr '\b20\S{10,13}' -- "$BASEDIR" | sort | uniq -c | cut -c 1-4 | awk '{print $1}' | sort | uniq -c | awk 'BEGIN {print "Number of zettel with X links"}''{print $1," zettel with ", $2 ," outgoing links."}' | sort -g -k 4
    
    read -p "Press any key to exit... " -n1 -s
    

  • @ctietze Thanks for making this executable.

    Here is something similar as a Keyboard Maestro macro.
    Set the zettelkastenDirectory variable, change the trigger, and activate the macro for The Archive.

    Output

  • @Will Thanks for the KM macro, Will. The first part worked; the second part didn't give me any "most linked" zettels (see output). Any suggestions?

  • My mistake. I forgot an important step. Redownload the macro. It has been corrected. Let me know if it now works.

  • @Will said:
    My mistake. I forgot an important step. Redownload the macro. It has been corrected. Let me know if it now works.

    Still getting the same output, Will - nothing listed below the line "The 10 most linked zettel:".

  • Have another look at the macro.

    If it still fails, send me a "Copy as Image" of what you are using for the macro.
    I hope we can get this working for you. I found that actually looking at the 10 most linked zettel was valuable in my review-and-improve process.

  • @Will said:
    Have another look at the macro.
    If it still fails, send me a "Copy as Image" of what you are using for the macro.
    I hope we can get this working for you. I found that actually looking at the 10 most linked zettel was valuable in my review-and-improve process.

    Still not working; here's a screenshot of the macro, set up to access my hard drive. I also looked in the Downloads directory after running the macro, and there is no file called "results.out".

  • @GeoEng51 This is a stumper. The only difference I can see between yours and mine is 'will' vs 'johnsobkowicz'. In the past I've had strange behavior when passing Keyboard Maestro variables to shell scripts. For troubleshooting, you might try substituting the actual paths for the $KMVAR variables, like below. I set up the Keyboard Maestro macro with variables for portability, and I'm not sure where my error is. Each of these egrep commands should work in Terminal.

    egrep  -ohsr '20\d{10}' -- /Users/johnsobkowicz/Dropbox/zettelkasten/ | sort | uniq -c | cut -c 1-4 | awk '{print $1}' | sort | uniq -c | awk 'BEGIN {print "Number of zettel with X links"}''{print $1,"zettel with",$2 - 1,"links."}' | sort -g -k 4;
    echo ;
    echo ;
    egrep  -ohsr '20\d{10}' --  /Users/johnsobkowicz/Dropbox/zettelkasten/ | sort -h | uniq -c | sort -n > /Users/johnsobkowicz/Downloads/results.out ; 
    tail -n 10 /Users/johnsobkowicz/Downloads/results.out | awk 'BEGIN {print "The 10 most linked zettel:"}''{print $2" has",$1 - 1,"links."}'
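Keyboard Maestro hands its variables to shell scripts as environment variables prefixed with KMVAR_, so a quick way to see whether the directory arrives at all is to check that variable before using it. This sketch assumes the macro variable is named zettelkastenDirectory, as in the macro description earlier in the thread; it quotes the path in case it contains spaces and pipes straight into tail instead of writing results.out. The function name km_top_linked is hypothetical.

```shell
# Hypothetical body for the macro's "Execute Shell Script" step.
km_top_linked() {
  if [ -z "${KMVAR_zettelkastenDirectory:-}" ]; then
    echo "KMVAR_zettelkastenDirectory is empty: the macro variable did not reach the shell" >&2
    return 1
  fi
  local dir="$KMVAR_zettelkastenDirectory"
  # Count mentions per UID, keep the 10 highest, subtract the self-reference
  grep -rhoE '20[0-9]{10}' -- "$dir" | sort | uniq -c | sort -n | tail -n 10 |
    awk 'BEGIN { print "The 10 most linked zettel:" } { printf "%s has %d links.\n", $2, $1 - 1 }'
}
```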
    

  • edited October 21

    @Will That worked: when I substituted the actual directories into the egrep statements as you show above, and then ran the KM macro, I got the full output. This also worked directly in Terminal.
