Zettelkasten Forum


Script to print all tags from .txt files (run from archive directory)

#!/bin/bash
find . -type f | grep -v '/\.' | xargs grep -IEoh '#[^ ]+\s?' | sort | uniq -c | sort -r

explanation:
- finds all files under the current directory
- drops every path containing /. (hidden files and anything inside hidden directories)
- for each non-binary file, prints only the matched part of each line (the tag): a # followed by every character up to the next space
- counts how often each unique tag occurs
- sorts the tags by count, highest first

output:

  18 #recipes
  11 #music
   7 #books
   6 #songs
   4 #talks
   4 #inspiration
   3 #hacks

Comments

  • I'm getting: xargs: unterminated quote

  • Me too. xargs: unterminated quote

    Will Simpson
    kestrelcreek.com

  • It works fine in simple folders but does break down in my archive, too. Is it the amount of notes? (xargs and piping/stdin should allow long lists, as opposed to passing a lot of parameters) Is it special characters in some file names, like single escape sequences?

    Author at Zettelkasten.de • http://christiantietze.de/

  • Looking closer, I find that the script pukes on my whole archive also. Research (Google) shows that notes with quotes or apostrophes in their names cause problems. Testing on a test archive, I also find it requires file names without spaces. The script also reports double, triple, and quadruple "#" when found. This seems a great start. I use a Keyboard Maestro script to create a note in my archive that lists all the tags (clickable) used, but it doesn't count them, which I think would be useful.

    Will Simpson
    kestrelcreek.com
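
The quoting problem reported above can probably be sidestepped by passing file names NUL-separated, so xargs never has to interpret quotes, apostrophes, or spaces. The following is only a sketch, not the original author's script: the function name count_tags is made up for illustration, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD/macOS both do). It also requires a non-# character directly after the #, so Markdown headings like "## Heading" are skipped, and it leaves trailing whitespace out of the match so that "#tag " and "#tag" are not counted as two different tags.

```shell
#!/bin/bash
# Sketch: count tags while tolerating spaces and quotes in file names.
# count_tags is a hypothetical helper name, not part of the original script.
count_tags() (
  # Work relative to the archive directory (defaults to the current one).
  cd "${1:-.}" || exit 1
  # -print0 / -0 hand over file names NUL-separated, so quotes and spaces are harmless.
  # ! -path '*/.*' skips hidden files and everything inside hidden directories.
  find . -type f ! -path '*/.*' -print0 |
    # A tag is a # at line start or after whitespace, followed by non-#, non-space characters.
    xargs -0 grep -IEoh '(^|[[:space:]])#[^#[:space:]]+' |
    # Drop the leading whitespace that the match may include.
    sed 's/^[[:space:]]*//' |
    sort | uniq -c | sort -rn
)

# When run as a script with a directory argument, print the tag counts for it.
if [ "$#" -gt 0 ]; then
  count_tags "$1"
fi
```

Saved as a script (the file name is up to you), it would be called with the archive directory as its only argument.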

  • @Will said:
    Looking closer, I find that the script pukes on my whole archive also. Research (Google) shows that notes with quotes or apostrophes in their names cause problems. Testing on a test archive, I also find it requires file names without spaces.

    Right, that makes sense, I should have thought of that when reporting.

    I use a Keyboard Maestro script to create a note in my archive that lists all the tags (clickable) used but it doesn't count the number which I think would be useful.

    Y'all are gonna make me buy KM if you keep doing stuff like this. I thought I was getting along fine with my Alfred text expansion scripts but noooo...

  • Yay! Went back to my Keyboard Maestro script and added the number of times each tag was used, still sorted alphabetically. It was a bit of a challenge to get the final sorting the way I wanted. Sort wants to sort numerically by the first character of the string.

    This will work in conjunction with Keyboard Maestro or as a cron job. There may be other ways to incorporate this "Tag Cloud".

    Here is the code.

    cd /Users/will/Dropbox/zettelkasten/
    egrep -ohsr "(?:^|\s)#[A-Za-z0-9_ÄÖÜäöüß\-]+" -- * | sed -e 's/[[:space:]]#/#/' | sed '/^[^#]/d' | sort | uniq -c | sort -t# -k2 > "Tag List.txt"

    Here is the output.

    Thanks @onlyskin for the kick to work on this.

    Will Simpson
    kestrelcreek.com
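
The sorting trick at the end of that pipeline can be seen in isolation with a few made-up tags. After uniq -c every line starts with a count, so a plain sort would order by that column; sort -t'#' -k2 instead treats # as the field separator and sorts on whatever follows it, which is the tag name. This is a minimal sketch with invented sample data, and the function name tags_alphabetical is only for illustration.

```shell
#!/bin/bash
# Sketch: count tags read from stdin, then order them by tag name instead of by count.
# tags_alphabetical is a hypothetical helper name.
tags_alphabetical() {
  # uniq -c prefixes each tag with its count; -t'#' -k2 sorts on the text after the #.
  sort | uniq -c | sort -t'#' -k2
}

# Invented sample data: three tags with different counts.
printf '%s\n' '#music' '#books' '#music' '#recipes' '#books' '#music' | tags_alphabetical
```

With this sample the tags come out as #books, #music, #recipes, each preceded by its count. The exact padding in front of the counts depends on the uniq implementation.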

  • edited January 10

    Great stuff! Here's the Regex I use in The Archive to match hashtags at the moment:

    (?<=\\s|^|\\W)(?<!`)(#+[\\p{L}\\p{Nd}_\\\\+§!:;./]*[\\p{L}\\p{Nd}_\\\\+\\-§!:;./]*[\\p{L}\\p{Nd}_§]+)
    

    Note the doubled escape backslashes: the regex is part of a string literal. You can replace \\\\ with \\ and \\ with \ I think.

    Author at Zettelkasten.de • http://christiantietze.de/
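
The backslash replacement described above can be tried out with sed before pasting the regex anywhere. This is just a sketch: the function name unescape_backslashes is made up, and it only halves every pair of backslashes. It says nothing about whether a given grep or other tool supports the lookbehinds and \p{...} classes the resulting regex uses.

```shell
#!/bin/bash
# Sketch: turn a string-literal regex into a plain one by halving doubled backslashes.
# unescape_backslashes is a hypothetical helper name.
unescape_backslashes() {
  # Each pair of backslashes becomes one, so \\\\ turns into \\ and \\ turns into \.
  sed 's/\\\\/\\/g'
}

# Example using the first part of the regex quoted in the comment above.
printf '%s\n' '(?<=\\s|^|\\W)' | unescape_backslashes
# prints (?<=\s|^|\W)
```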

  • @Will:

    Fantastic code- thanks for sharing!

  • @ctietze said:
    Great stuff! Here's the Regex I use in The Archive to match hashtags at the moment:

    (?<=\\s|^|\\W)(?<!`)(#+[\\p{L}\\p{Nd}_\\\\+§!:;./]*[\\p{L}\\p{Nd}_\\\\+\\-§!:;./]*[\\p{L}\\p{Nd}_§]+)
    

    Note the doubled escape backslashes: the regex is part of a string literal. You can replace \\\\ with \\ and \\ with \ I think.

    @ctietze what exactly do you mean by "match hashtags"? What does it do? :)

  • "Matching" is what applying a regular expression to a string is called. Will's regex is:

    (?:^|\s)#[A-Za-z0-9_ÄÖÜäöüß\-]+
    

    If you replace it with mine, you will get 100% of the tags The Archive recognizes (and makes clickable). Mine is longer because it includes a couple of cases that are usually not included and excludes others; it also tries to look only for hashtags outside of code `...`.

    Author at Zettelkasten.de • http://christiantietze.de/
