Zettelkasten Forum


More Quantitative Zettel Analysis: Zettel with the least and the most words

edited October 2020 in The Archive

More Quantitative Analysis

This is what I call fun. This is what knowledge geeks do. Experiment, experiment, experiment.

In my previous post, Quantitative Link Analysis, I worked with my zettelkasten, trying to flesh out my weaknesses and strengths using each note's number of links as a proxy for how deeply integrated it was. But looking solely at quantitative data (the number of links per zettel) hides the qualitative aspects of the zettelkasten. Still, I've found that starting from the quantitative data - for example, my one zettel with 39 links - and exploring that particular zettel revealed gaps in the structure I was trying to create. In this case, I was able to go from the general quantitative data to the specific zettel and make a subjective qualitative assessment of it. This feels like a big win; I would have had no idea this particular zettel played such an outsized role in my zettelkasten and that it was in want of refactoring.

In the same vein, I am looking for other proxies for quality that I can easily measure across my zettelkasten. I came up with 'size' as a proxy for quality. This doesn't work quite so well. A small zettel can be pithy and atomic, or it can be a few seemingly random sentence fragments that even this future self can't remember the point of. And a long zettel can carry a fully formed structure of high value or be a long quote lazily captured.

This command-line shell script will output your 20 smallest and 20 largest zettel. Open a terminal, cd to your zettelkasten, and run the command below. All it does is count the words in each zettel and print a tidy report. The key to getting any value out of this exercise is to go and visit each zettel and see whether its smallness reflects high quality or whether it wants refactoring.

Command-line:

wc -w *.md | sort > temp.out | echo Least Number of Words && echo ----- && head -20 temp.out && echo Most Number of Words && echo ----- && tail -22 temp.out | head -21 && echo Total Number of Words && echo ----- && tail -1 temp.out
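
For readability, here is the same pipeline written out one step at a time (a sketch only; temp.out is just a scratch file that can be deleted afterwards):

```
wc -w *.md | sort > temp.out          # count the words in every note and save the sorted list
echo "Least Number of Words"; echo -----
head -20 temp.out                     # the 20 shortest notes
echo "Most Number of Words"; echo -----
tail -22 temp.out | head -21          # the longest notes (the final "total" line is dropped)
echo "Total Number of Words"; echo -----
tail -1 temp.out                      # grand total for the whole zettelkasten
```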


Output:

Least Number of Words

  21 201911120646 Senjru.md
  22 202008231658 Template Note Example.md
  24 201811151030 Don't Shoot the Dog.md
  24 201904131439 Euphonious dichotomy.md
  24 201911120639 Yoku dekimashita.md
  24 202003040654 Clustering Illusion.md
  25 201903120533 Apologia Pro Vita Sua.md
  25 201903260851 Color Picker.md
  25 201910242011 Properties of Engineering.md
  25 202003040651 Belief Bias.md
  25 202003040656 Confirmation Bias.md
  25 202003040712 Risk Compensation.md
  26 202002071120 All Around.md
  26 202003040643 Anchoring Effect.md
  26 202003040707 Post-purchase Rationalization.md
  27 201811151028 Avoid Passivety.md
  27 201812141430 • Book - Better.md
  27 201909101416 Cellulose.md
  27 202003040653 Blind Spot Bias.md
  27 202003040701 Gambler's Fallacy.md

Most Number of Words

 763 201903141910 Ted Biringer's review.md
 788 201903060456  Ω Practice giving and receiving on the breath.md
 788 201903060456 Practice giving and receiving on the breath.md
 819 201903010555  Ω Treat experiences as dreams.md
 819 201903010555 Treat experiences as dreams.md
 848 201803000000 Formatting Guide.md
 848 202003291526 • Writing Phrases.md
 871 201902280602  Ω Begin with the beginning.md
 884 202005030511 DL May 3, 2020 .md
 941 201910281639 • Rationality: From AI to Zombies.md
 944 201910281639 L1 Rationality: From AI to Zombies.md
 987 202004231000 Sick or not-sick has nothing to do with it.md
 991 202008110606 Real Orphans.md
1033 202008061944 • Environmental And Nature Writing.md
1209 201802000000 Welcome to The Archive.md
1341 202009150939 What Permanent Notes Mean.md
2054 202003210746 Frog Haiku.md
2404 202005051541 The Geography of Wonder.md
2570 202004300606 Total Orphans.md
3234 202003020839 High Trail Notes.md
4408 202008211723 Zenrin Kushu.md

Total Number of Words

274848 total


Will Simpson
I must keep doing my best even though I'm a failure. My peak cognition is behind me. One day soon I will read my last book, write my last note, eat my last meal, and kiss my sweetie for the last time.
kestrelcreek.com

Comments

  • Plotting the distribution should be fun. I guess one could come up with a hard lower limit, like "less than 20 words are considered too little" and then go ahead and improve.
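
    A minimal sketch of both ideas (the 20-word cut-off and the 100-word bin size are arbitrary; it assumes you run it inside the note folder, just like the command in the original post):

    # notes under the hard lower limit
    wc -w *.md | sed '$d' | awk '$1 < 20'

    # crude text plot of the word-count distribution, one # per note in each bin
    wc -w *.md | sed '$d' | awk '
      { bin = int($1 / 100) * 100; n[bin]++ }
      END { for (b in n) { printf "%5d-%5d  ", b, b + 99
                           for (i = 0; i < n[b]; i++) printf "#"
                           printf "\n" } }' | sort -n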

    Author at Zettelkasten.de • https://christiantietze.de/

  • @ctietze said:
    Plotting the distribution should be fun. I guess one could come up with a hard lower limit, like "less than 20 words are considered too little" and then go ahead and improve.

    A hard limit might be stifling for some but I like that idea.
    Here is a 24-word zettel as an example of a zettel nearing the "permanently useful" state.

    Will Simpson
    I must keep doing my best even though I'm a failure. My peak cognition is behind me. One day soon I will read my last book, write my last note, eat my last meal, and kiss my sweetie for the last time.
    kestrelcreek.com

  • edited October 2020

    @Will I'm not sure what is going on with my computer, but I'm having problems with the above command line script (as well as the KM macro given in your other recent post). They might both be related to the same issue, but I'm not sure. [Update - I got the "Link Stats + 10 most linked" KM macro to work by substituting the actual directory paths, as you suggested; but still not sure why this command line script is not working].

    I change to the directory as indicated above and run the script; it creates the temp.out file, but that file is empty (0 bytes) and there are no files listed under "Least Number of Words", "Most Number of Words", or "Total Number of Words" -- see the screenshot showing the terminal output and the directory listing.

    Is this something to do with the way I have my zettels formatted? I will include a screenshot of a short zettel below as well.

  • Your zettel format looks fine. The script should find the word count, and the other script should find the links as you have them formatted. I have a feeling that I have some custom software on my system, but for the life of me I can't figure this one out.

    What is the output of wc -w *.md?
    It should give you output like -
    Wills-Laptop:The-Archive-Demo-Notes-master will$ wc -w *.md
    144 201705091531 Connections of notes.md
    219 201705091535 Direct Links as Wiki-Links.md
    57 201705110828 The Archive App.md
    146 201705110829 Saved Searches.md
    156 201705110850 The nv-Core.md
    332 201705110956 History of the Method.md
    385 201705111034 Luhmanns Zettelkasten.md
    114 201705120848 An omnibar to rule them all.md
    250 201705120913 The plain text approach.md
    781 201705120915 Software-agnostic Programming.md
    145 201705120916 Our reasons for Software Agnosticism.md
    301 201705120948 Tags in The Archive.md
    425 201705180756 How to tag properly.md
    295 201705180836 Structure notes.md
    260 201705221802 Integrated Image Capturing Tool.md
    30 201801020916 Using tags in The Archive.md
    264 201801020926 Creating notes in The Archive.md
    56 201801020929 Finding notes in The Archive.md
    176 201801231551 Links to remote control The Archive.md
    332 201801231614 Markdown.md
    248 201801232029 Note identifiers and The Archive.md
    5116 total
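
    If temp.out keeps coming out empty, one thing worth checking first (just a guess on my part) is whether the *.md glob matches any files at all in the folder you cd'd into:

    pwd                  # confirm this really is the zettelkasten folder
    ls -1 *.md | wc -l   # how many .md files the glob expands to; 0 would explain an empty temp.out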

    Will Simpson
    I must keep doing my best even though I'm a failure. My peak cognition is behind me. One day soon I will read my last book, write my last note, eat my last meal, and kiss my sweetie for the last time.
    kestrelcreek.com

  • edited October 2020

    I have too many files again :)

    $ wc -w *.md | sort
    zsh: argument list too long: wc
    

    Quick fix looping over find results:

    $ find . -type f -iname "*.md" -or -iname "*.txt" -print0 | xargs -0 wc -w | sort
    

    The full command:

    find . -type f -iname "*.md" -or -iname "*.txt" -print0 | xargs -0 wc -w | sort -o temp.out; echo Least Number of Words && echo ----- && head -20 temp.out && echo Most Number of Words && echo ----- && tail -22 temp.out | head -21 && echo Total Number of Words && echo ----- && tail -1 temp.out

    Changes:

    • sort -o temp.out instead of sort > temp.out to prevent the output from also reaching the command line standard output
    • no piping of the sort result to echo

    Author at Zettelkasten.de • https://christiantietze.de/

  • Thanks for taking a moment to look at this.
    I had no idea that modern computers were so sensitive to a few thousand arguments. I'd have thought they'd be able to handle millions or more. I guess we live in primitive times?

    In testing, I switched to the zsh shell, and I don't know how you guys tolerate it. I use the old-school bash shell. It seems to make no difference in the output of the command, though.

    Again I see the problem is mine. I do not have a stock system; I've installed findutils through brew. @ctietze, your "full command" produces nothing for `xargs` to sort. This is fixed for me by simply moving the `-print0` before the `-iname` primary. I can only suspect that I'm calling a different version of find than the macOS vanilla version.

    Wills-Laptop:The-Archive-Demo-Notes-master will$ find . -type f -iname "*.md" -or -iname "*.txt" -print0 | xargs -0 wc -w | sort -o temp.out; echo Least Number of Words && echo ----- && head -20 temp.out && echo Most Number of Words && echo ----- && tail -22 temp.out |  head -21 && echo Total Number of Words && echo ----- && tail -1 temp.out
    Least Number of Words
    -----
    Most Number of Words
    -----
    Total Number of Words
    -----
    Wills-Laptop:The-Archive-Demo-Notes-master will$ 
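
    For what it's worth, the empty output above is probably find's operator precedence at work: -or binds less tightly than the implied -and, so in the command as given -print0 attaches only to the *.txt test, and a pure-Markdown folder prints nothing at all. Grouping the name tests explicitly avoids this on both BSD and GNU find (a sketch, not part of the original exchange):

    find . -type f \( -iname "*.md" -o -iname "*.txt" \) -print0 \
      | xargs -0 wc -w | sort -o temp.out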
    

    Will Simpson
    I must keep doing my best even though I'm a failure. My peak cognition is behind me. One day soon I will read my last book, write my last note, eat my last meal, and kiss my sweetie for the last time.
    kestrelcreek.com

  • zsh ships as the default since Catalina, so I eventually switched a couple of years ago when this became apparent. I was totally fine with bash.

    To figure out whether something is stock standard or not, use the which command, which will tell you the lookup path for another program:

    $ which find
    /usr/bin/find
    

    For my Homebrew version of the much nicer fd, it says:

    $ which fd
    /usr/local/bin/fd
    

    (/usr/local/bin/ being the default where homebrew installs binaries)

    Author at Zettelkasten.de • https://christiantietze.de/

  • It is becoming more obvious that I am clueless and should be quiet. I have no idea why my system is slightly different in these ways. It makes me uncomfortable to share code and probably makes me a poor candidate for beta testing.

    ```
    Wills-Laptop:The-Archive-Demo-Notes-master will$ which find
    /usr/bin/find
    Wills-Laptop:The-Archive-Demo-Notes-master will$ which fd
    /usr/local/bin/fd
    Wills-Laptop:The-Archive-Demo-Notes-master will$ which egrep
    /usr/bin/egrep
    ```
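
    One more hedged check (an aside, not from the thread): Homebrew's findutils installs the GNU tools with a g prefix (gfind, gxargs) unless you put its gnubin directory on your PATH, so the which output above suggests the stock BSD find is still the one being run. Since BSD find has no --version flag, this gives a quick way to tell the two apart:

    which gfind && gfind --version | head -1   # Homebrew findutils, if installed, shows up as gfind
    find --version 2>&1 | head -1              # BSD find rejects --version; GNU find reports its version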

    Will Simpson
    I must keep doing my best even though I'm a failure. My peak cognition is behind me. One day soon I will read my last book, write my last note, eat my last meal, and kiss my sweetie for the last time.
    kestrelcreek.com

  • @Will said:
    I had no idea that modern computers were so sensitive to a few thousand arguments. I'd have thought they'd be able to handle millions or more. I guess we live in primitive times?

    Talking about modern times: if you plan to have millions of Zettel in a single directory, you'd want to look at the file system you're using, because there are limitations there, too. With FAT32, which flash drives commonly come factory-formatted with, the limit can be as little as a few thousand files in worst-case scenarios. OTOH, NTFS supports a few billion files per directory, AFAICR.

    my first Zettel uid: 202008120915
