Zettelkasten Forum


Proximity Search in The Archive surfaces rich link candidates when creating new notes

Hi Zettelauts!

I want to share a method of searching for link candidates, one that connects ideas in a new note to established ideas and consistently produces high-quality results. This advanced search option is called proximity search: it finds notes in which two search terms occur within a set number of words of each other. Usually, proximity search has to be implemented at the software level. My main goal is to share this idea with you.

I love making lists, so you'll have to suffer my indulgence.

Background

  1. I use The Archive to host my ZK.
  2. I have 3K+ zettel, but you'll find this tool of use if you have more than a couple hundred.
  3. My notes are all plain text formatted with Markdown.
  4. I use and recommend Keyboard Maestro integrated with Python and zsh scripts.
  5. There are other ways to accomplish the same outcomes using other macro and scripting tools.

I'm dividing this demo into three parts.
1. A few low-level and high-level ideas about connecting zettel. Mostly I assume you understand the importance of a deeply connected ZK.
2. A deep dive into a new tool I developed for searching large, mature ZKs with large idea schemas. I stumble while making the video, which turns out low-brow. My thought processes are all over the place. Turn on the microphone, and I become an idiot.
3. A brief look at the technical aspects of this proximity search tool.

Part 1.

What are we trying to accomplish?
We want to populate our notes with deep, meaningful, high-quality connections. This gets harder and more time-consuming as the ZK grows. The keystone step in knowledge growth is finding relationships between new and established ideas.

When we look for link candidates, the primary tool we use is "search." We search for occurrences of a target word or phrase based on where we are (that is, the zettel we are proposing to link from and back to). In practice, this type of searching can produce a large percentage of false positives: false because the ideas are not relevant, and positive because each note contains the target word or phrase.

The tool in the demo is super helpful in narrowing the candidates to those with the highest quality and most relevance. It's sloppy to connect zettel with weakly relevant links. It can be time-consuming to sort through too many connection candidates.

Full-text and Boolean-derived searches are less effective in a ZK of a few hundred notes in the same schema. There are caveats to this. Notes in schemas or idea streams that use technical jargon are easily found with a full-text search. But if we have a developed ZK where the same jargon is used in a hundred or more zettel, we may still struggle to get to the best link candidate with a full-text search.

The technical details in part three will not make much sense without viewing the video.

Part 2.

This is me wrestling in real time, sometimes struggling to make sense, stumbling with vocabulary, and making a mess of my typing. I'm taking a new idea captured from a forum post and looking to see where I can connect it to established ideas to strengthen it, question it, advocate for or against it, and reconcile it. I do a passable job giving a play-by-play.

Some or all of this may not directly apply to you. You will have to see where things might be of use in your world. Take what is helpful and ignore the rest.

We use a note inspired by @JoshA's Object Level and Meta Level Abstraction Distinction Applied to ZK forum post.
Object Level and Meta Level Abstraction Distinction [[202208190944]]

Part 3.

Tech details

  1. I'm sharing this with the hope that you will get some ideas on implementing the best parts of this in your workflow.
  2. It would be interesting to prioritize those with a closer distance score. Of course, we'd have to get a "distance score."
  3. This Keyboard Maestro macro is not super advanced.
  4. Each zettel must have a UUID (Universally Unique Identifier).
  5. The way I've implemented it requires:
    • Keyboard Maestro
    • zsh and egrep
    • There needs to be a UUID for each note
  6. For those who use The Archive, we may have to wait and see how @ctietze implements scripting.

★★★★★★
7. Here is the Near Search Keyboard Maestro Macro.
★★★★★★
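
For readers without Keyboard Maestro, the core idea can be sketched in a few lines of Python. This is only an illustration of proximity search under my own assumptions, not the macro itself (which uses zsh and egrep): the function names `split_sentences`, `near_hits`, and `search_folder` are hypothetical, sentences are split naively on punctuation and line breaks, and "distance" is simply the difference in word positions within one sentence.

```python
import re
from pathlib import Path


def split_sentences(text):
    """Roughly split text into sentences at ., ! or ? plus whitespace, and at line breaks."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def near_hits(text, term_a, term_b, max_words=15):
    """Return the sentences in which both terms occur within max_words positions of each other."""
    a, b = term_a.lower(), term_b.lower()
    hits = []
    for sentence in split_sentences(text):
        # Tokenize into lowercase words so punctuation such as "map:" does not block a match.
        words = re.findall(r"[\w'-]+", sentence.lower())
        pos_a = [i for i, w in enumerate(words) if w == a]
        pos_b = [i for i, w in enumerate(words) if w == b]
        if any(abs(i - j) <= max_words for i in pos_a for j in pos_b):
            hits.append(sentence)
    return hits


def search_folder(folder, term_a, term_b, max_words=15):
    """Map each matching .md filename in folder to its matching sentences."""
    results = {}
    for path in sorted(Path(folder).glob("*.md")):
        hits = near_hits(path.read_text(encoding="utf-8"), term_a, term_b, max_words)
        if hits:
            results[path.name] = hits
    return results
```

Calling `search_folder` with the path to your ZK folder, two terms, and a spacing of 15 would return each note's filename together with only the sentences containing both terms, which is roughly what a plain-text results listing needs.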

Ideas for future enhancements -
1. We'd have to get a "distance score," which might be converted into a "relevance score."
2. I'd like to rank the hit candidates by some criteria.
3. I'd like to have a way to eliminate the obvious false positives from the list.
4. Be able to choose the highlight color for ALL the search terms.
5. Rewrite in Python or have this be a possibility of The Archive's promised forthcoming internal scripting.
6. Be able to cycle through ALL the search terms with a keyboard shortcut.
7. If the search target is found in the title of a link, only show it once rather than everywhere the link is placed.
8. The ability to add an optional third proximity argument.
9. Use the near-term search in combination with Boolean language.
10. Key: I'd like to be able to include phrases.
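
One way the "distance score" and ranking ideas might look, as a rough Python sketch. Everything here is an assumption for illustration: the names `min_word_distance` and `rank_notes` are hypothetical, distance is the smallest gap in word positions between the two terms anywhere in a note, and the "relevance score" is simply 1 divided by that distance, so adjacent terms score 1.0.

```python
import re


def min_word_distance(text, term_a, term_b):
    """Smallest gap in word positions between the two terms, or None if either is missing."""
    a, b = term_a.lower(), term_b.lower()
    words = re.findall(r"[\w'-]+", text.lower())
    best = None
    last_a = last_b = None
    for i, word in enumerate(words):
        if word == a:
            last_a = i
            if last_b is not None:
                gap = i - last_b
                best = gap if best is None else min(best, gap)
        elif word == b:
            last_b = i
            if last_a is not None:
                gap = i - last_a
                best = gap if best is None else min(best, gap)
    return best


def rank_notes(notes, term_a, term_b):
    """Return (name, distance, relevance) tuples for matching notes, closest first.

    notes is a dict mapping a note name to its text.
    """
    ranked = []
    for name, text in notes.items():
        distance = min_word_distance(text, term_a, term_b)
        if distance is not None:
            # Hypothetical relevance score: closer terms score higher.
            ranked.append((name, distance, 1.0 / distance))
    return sorted(ranked, key=lambda item: item[1])
```

Notes where the terms never co-occur are dropped from the ranking, which is one crude way to remove the most obvious false positives.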

Will Simpson
The quality of our thinking is directly proportional to the quality of our reading. To think better, we must read better. - Rohan
kestrelcreek.com

Comments

  • @Will

    This looks pretty cool! I watched the video and can see how you are using the results of the Macro.

    I tried downloading the macro, changing the folder path, and running it. I started with 2 terms I knew were close by in several zettels:

    Instead of completing, I got the following message:

    Could you help me to figure out what I am doing wrong? As noted in the README section of the macro, all my ZK files have the *.md extension, and each has a UUID on the first line. For example, one zettel has the following first lines (this format is common to all my zettels; this one happens to be a post that I wrote for a company intranet blog, so it's not a normal zettel, but you get the idea):


    202301151603 How to Make Progress in Your Career
    [[202301151603]]
    01-15-2023 04:03 PM
    tags: #Unlinked

    This post is going to have a longer than normal introduction because I think it is necessary to discuss certain ideas before directly tackling the main topic of the post, which is "How to make progress in your career".....etc.


    Any help you could provide would be appreciated :smile:

  • I know what the problem is. I forgot that most users, if not all, of The Archive format their filenames as UUID Title, and I format mine as Title UUID. This requires a simple change to the macro's regular expression. Before I do this, though, I'd like to see the output that is in your "tmp" variable. Go to the menu Keyboard Maestro > Settings > Variables > tmp and copy and paste the first 5 lines of the content in the right-hand window. This will help me be sure I get the regex correct.

    Will Simpson

  • @Will said:
    I know what the problem is. I forgot that most users, if not all, of The Archive format their filenames as UUID Title, and I format mine as Title UUID. This requires a simple change to the macro's regular expression. Before I do this, though, I'd like to see the output that is in your "tmp" variable. Go to the menu Keyboard Maestro > Settings > Variables > tmp and copy and paste the first 5 lines of the content in the right-hand window. This will help me be sure I get the regex correct.

    Here are the first five lines of the variable tmp (simple copy/paste):

    202301151603 How to Progress Your Career.md:But what kind of engineer are we? I would suggest that we've just passed the first gate - it's an important one, but it's not the end of our journey. We still have a long path to becoming fully capable, mature engineers, who have the

    I noticed that the above text, after the colon, is from the zettel, but it's not from the beginning of the zettel, after the title, tags, etc. Some of the initial text seems to be "missing" from the tmp variable. Here is the actual text from after the tags to the text "But what kind of engineer are we?....".

    This post is going to have a longer than normal introduction because I think it is necessary to discuss certain ideas before directly tackling the main topic of the post, which is "How to make progress in your career".

    One of our Goals - Engineering Maturity

    As relatively new engineers, once we get our feet under us and start to understand the nature of our jobs, we will also quite naturally wonder how we might progress in our career. We start as "Engineers in Training" (EITs) and then after a suitable period of experience and mentoring (~4 years), apply for our "professional engineer" status. If successful, we can put the "P. Eng." designation after our names and take responsibility for the work we have performed. We think "Great, now I really am an engineer".

    But what kind of engineer are we? I would suggest that we've just passed the first gate - it's an important one, but it's not the end of our journey....

  • This Keyboard Maestro macro turns out to be more custom than I thought. In the future, I'll have to spend more time thinking about how to make them more generic. Looking at this closer, I see an issue that will take a little thinking on how to surmount. This along with the filename differences will require some testing.

    Our UUIDs are in a different format.

    • Yours: UID and title
    • Mine ›[[201911301121]]

    The reason the captured text starts where it does is that we are only capturing the sentence with the two terms in it. It will be clearer when we get this working and you select the option on the form to "Display Results."

    Will Simpson

  • edited January 23

    @Will Yeah, I changed the format of the first two lines of my zettels, so that I could import them into NotePlan, if I wanted to. Also, I use your KM macro to create each zettel, which essentially sets the file name. But sometimes I expand or slightly modify the actual title in the first line, as I'm writing the zettel. So the title on the first line may not be exactly the same as the last part of the zettel file name.

    If this is too much of a pain to customize just for me, don't feel obligated. I'd of course appreciate it if you do, but it's not absolutely necessary (or urgent).

  • edited January 24

    It sounds like you do not have a UUID. A UUID should only appear on the original note and never change. It doesn't have to be included in the filename, but it, as a specific string, cannot appear on any other note. UIDs can and should be sprinkled everywhere. Maybe I'm wrong.

    I got the script working in my test environment. Here it is; can you give it a whirl and tell me how it works for you?

    The same caveats apply. Change the directory to where you house your ZK and give the macro appropriate triggers.

    Custom Near Search


    Will Simpson

  • @Will said:
    It sounds like you do not have a UUID. A UUID should only appear on the original note and never change. It doesn't have to be included in the filename, but it, as a specific string, cannot appear on any other note. UIDs can and should be sprinkled everywhere. Maybe I'm wrong.

    I didn't know this difference. I guess mine are all UIDs, because almost every zettel I have is linked to other zettels, so the UID for each zettel shows up in multiple locations in other zettels.

    I'm not sure what the use of a UUID is, if it is defined in the way you mentioned.

    I got the script working in my test environment. Here it is; can you give it a whirl and tell me how it works for you?

    It seems to be partially working. I can run the macro and if I have selected the "Display Results" option, it shows something like the following in a window it opens (I think this is the value of the variable tmp):


       Search Terms :: 'map' and 'modern'.
    Search Spacing :: 15 words or less.
    

    My Paternal Grandparents

    Kobylowloki (or Kobylovoloki) is a small town in what is now (2022) the western part of the Ukraine, about 40 km SSE of Ternopil. It has over the years been part of Russia, Ukraine, Poland and Austria. Its coordinates are 49°11'22" N and 25°46'52" E in DMS (Degrees Minutes Seconds). From a modern day map:


    However - the macro doesn't actually paste the various UIDs into a search bar in The Archive (in fact, nothing happens in The Archive, even though it is open and running on my computer). Rather, I should say The Archive sort of "flashes" quickly and there is a quiet "pop" sound - maybe Keyboard Maestro is trying to select and paste into the search bar, and for some reason, failing?? Anyway, no results show up in The Archive.

    Oops - I just found another problem - some of my zettels (structure notes, hubs, etc.) start with a letter - such as:

    "S 202007252258 How to Take Smart Notes - Overview", (the file name for that zettel is "S 202007252258 How to Take Smart Notes"). This is still giving an error message.

  • The "Display Results" is working as expected. You'll see a text listing of each note name followed by only the fragment (Markdown sentence) of the note containing the "Search Terms." This was set up for testing, and I have left it in place. Try the search is/the 10, and you'll see multiple results.

    • I set up a fake note using your data and the format I understand you are using (check me) and ran a test.
    • Are you using .md as your default extension?
    • Do you have "Highlight search terms in the editor" selected?
    • The Archive does have to be the active front window for this to work. Keyboard Maestro will flash and beep at you if you try to run it from the Keyboard Maestro run command. Running this and other macros that copy and paste to and from the clipboard is potentially dangerous and can corrupt macros if you don't notice that weird content has been pasted here or there within the macro.
    • When you say that some of your titles have a prefix, do you mean the filenames also have prefixes? I've made accommodations for any prepended characters and re-exported the macro.
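
    The filename shapes discussed in this thread (UID Title, Title UID, and a letter prefix before the UID) can all be handled by matching the UID wherever it sits. Here is a minimal Python sketch of that idea. It is not the regular expression used in the macro; it assumes UIDs are exactly 12 digits, as in the examples above, and `uid_from_filename` is a hypothetical helper name.

```python
import re

# Assumption: a UID is a run of exactly 12 digits, not part of a longer digit string.
UID_PATTERN = re.compile(r"(?<!\d)\d{12}(?!\d)")


def uid_from_filename(filename):
    """Return the first 12-digit UID found anywhere in the filename, or None."""
    match = UID_PATTERN.search(filename)
    return match.group(0) if match else None
```

    Because the pattern is not anchored to the start or end of the name, "H 202101061656 Personal History.md" and a Title UID style name yield the same kind of result as a plain UID Title name.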

    Custom Near Search


    Will Simpson

  • @Will said:

    • I set up a fake note using your data and the format I understand you are using (check me) and ran a test.
    • Are you using .md as your default extension?

    Yes, all zettel files use .md as the extension.

    • Do you have "Highlight search terms in the editor" selected?

    Yes

    • The Archive does have to be the active front window for this to work. Keyboard Maestro will flash and beep at you if you try to run it from the Keyboard Maestro run command. Running this and other macros that copy and paste to and from the clipboard is potentially dangerous and can corrupt macros if you don't notice that weird content has been pasted here or there within the macro.

    OK; I didn't know this. I was running it from the KM run command. I'll switch to a hot key.

    • When you say that some of your titles have a prefix, do you mean the filenames also have prefixes? I've made accommodations for any prepended characters and re-exported the macro.

    Yes, for some zettels (e.g., structure notes) both the file name and the first line of the zettel contain a letter. In the case of a structure note - the file name has an S and then a space before the UID; for a hub note, the file name starts with an H, then a space, then the UID. For example, for a Hub note:

    H 202101061656 Personal History.md

    The first line of the zettel has the same format, in this case

    H 202101061656 Hub Note for Personal History

    I will download the new KM macro, give it a try, and report back :smile: . Thanks for your work on this!!
