# Do you have a systematic way of cleaning up your ZK?

I'm thinking of Ross Ashby, who admitted that the thinking in his earliest notebooks was rough and unpolished.

I'm also thinking of my own work. I'm about a year into using a ZK seriously, and finishing my third term in an MS program (I began in May of 2020). I keep finding notes that were either left empty but not flagged for completion, or that are dismally poor in what they explain. I want to improve this. And frankly, I need to, with my thesis process starting in two months.

Yes, I know this is a process -- I can see my thinking and my notes getting better over time. But I also know that I need to make sure my ZK (the mainstay of my PKM system) is robust and accurate. What solutions do people have for a "spring cleaning" of their notes?

Here are specific problems I want to address:
1. Having structure notes that are too massive and thereby not used. I've shifted to relying more on searches.
2. Having notes that are empty or incomplete (and no good system for regular review)
3. Having notes that need to be merged or split and appropriately retitled.
4. Having missing links between ideas.

To be clear, I know I could do something like add a link between notes. I'm more interested if there are strategies for combing through an entire ZK to find these cases. Though more nebulous, I'm also trying to find places where there is a gap in my ZK that new notes might fill in.

Observations logged here: write.as/via-poetica

• Thanks for asking this. I have a similar issue in that some of my notes are very rough and in some cases what started as index/hub notes became dumping grounds out of expedience.

For #1, #3 and #4 I handle those when I encounter the note that needs to be updated. I started using a #refactor tag to denote those notes that need to be reworked. While I don't make a specific habit of periodically reviewing all such tagged notes and trying to work on them (though I may do that) it does serve as a nag when I encounter the note several times in a short period, and usually eventually prods me to fix the issue. (though admittedly some of the time the solution isn't as clear as the problem)

Currently I have 19 notes tagged this way. I typically describe what changes are needed when I make the note.

Actual examples:

• #refactor because this became a dumping ground for a lot of stuff on this topic. Split it up.
• #refactor how does this differ from [[Signal collision handling strategies (E.201224)]]
• #refactor this note is getting a bit long in the tooth...

etc.

This is similar to the GTD "next action" concept I suppose – when I tag it I tell future me what refactoring is needed so I can just start doing that if I decide to spend time on it.
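A tag-plus-description habit like this is easy to turn into a to-do list. The following is only a minimal sketch, assuming notes are plain text and the tag is written inline as `#refactor` followed by the instruction; the `notes` dictionary and the function name `refactor_todos` are hypothetical, not part of anyone's actual setup.

```python
import re

# Matches the "#refactor" tag and captures the rest of the line as the to-do text.
# The word boundary keeps it from matching longer tags such as "#refactored".
REFACTOR_LINE = re.compile(r"#refactor\b[ \t]*(.*)")


def refactor_todos(notes):
    """Return (note name, to-do text) pairs for every #refactor line.

    `notes` maps a note name to its full text (an assumed in-memory layout).
    """
    todos = []
    for name in sorted(notes):
        for line in notes[name].splitlines():
            match = REFACTOR_LINE.search(line)
            if match:
                todos.append((name, match.group(1).strip()))
    return todos


# Hypothetical sample notes, for illustration only.
notes = {
    "202012241015 Signal collisions": "Body text.\n#refactor how does this differ from the other note?",
    "202101050900 Hub": "#refactor became a dumping ground. Split it up.",
    "202102020800 Clean note": "Nothing to do here.",
}

for name, todo in refactor_todos(notes):
    print(f"{name}: {todo}")
```

In practice the dictionary would be filled by reading the note files from disk; the same pattern would work for a `#stub` tag.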

For #2, according to a quick search, I apparently consistently used the word "stub" in those notes without making it a tag... ¯\_(ツ)_/¯ I'll fix that now, since it seems reasonably useful to make it a tag now that I have a proven need and pattern.

What I don't have is a process for structured or scheduled review of the notes.

The real solution honestly is a spaced repetition overlay of the notes. It's past time for a tool to be developed that, given a set of Markdown files (folder + subfolders) will set up a spaced repetition schedule for incremental reading and review. There's a lot of features that can be used here (prioritization, filtering, establishing different schedules based on filters, etc) that would be beneficial but the tooling just doesn't exist yet.
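To make the idea concrete, here is a rough sketch of what the core of such a scheduler might look like. It is not an existing tool. The doubling factor, the one-year cap, and the names `reschedule`, `review`, and `due` are all assumptions chosen for illustration, and the schedule is kept in a plain dictionary rather than read from Markdown files.

```python
from datetime import date, timedelta


def reschedule(interval_days, needs_work, factor=2.0, max_days=365):
    """Return the next review interval in days.

    Notes that still need work come back tomorrow; notes in good shape are
    pushed out by `factor` (an arbitrary choice), capped at `max_days`.
    """
    if needs_work:
        return 1
    return min(max_days, max(1, round(interval_days * factor)))


def review(schedule, name, today, needs_work):
    """Record a review of `name` and store its new interval and due date."""
    interval, _ = schedule.get(name, (1, today))
    interval = reschedule(interval, needs_work)
    schedule[name] = (interval, today + timedelta(days=interval))


def due(schedule, today):
    """Names of notes whose due date has arrived, earliest due date first."""
    ready = [(when, name) for name, (_, when) in schedule.items() if when <= today]
    return [name for _, name in sorted(ready)]


# Hypothetical file names, for illustration only.
schedule = {}
start = date(2021, 3, 15)
review(schedule, "kde.md", start, needs_work=True)     # comes back the next day
review(schedule, "ashby.md", start, needs_work=False)  # comes back in two days
print(due(schedule, date(2021, 3, 16)))
```

Filtering and per-filter schedules could be layered on top by keeping one such dictionary per tag or folder.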

• @davecan , thanks. I like the comparison to "next action" in GTD. I'll look into spaced-repetition tools, too. I think if I had automated reminders for reviewing notes, I would likely feel more motivated to fix the problems as I encounter them.

@ctietze , maybe you could develop a tool for this?

• @Sociopoetic

In regard to question 2 - I use the tag Unfinished, along with a saved search, and once every week or two, click on that search to see what turns up. I then make a point of "finishing" (whatever that means - haha) a couple of those zettels. This tag gets cleared when I feel the zettel is in reasonable condition, but it doesn't have to be anywhere near "perfect".

In regard to question 4 - I use a tag Unlinked which only gets removed when I have more than X connections to that zettel (right now, X = 3; it might be more, later). When it's time to make connections:

1. I first ensure that each zettel that I write is well tagged (although I try not to over-use tags; I want them to be quite specific).
2. Then, one check to find connections for a particular zettel is to search for other zettels that have the same tag(s) (checking each tag separately). I treat the resulting zettels only as possible connections; many zettels with the same tag shouldn't be connected.
3. I also use the general search capability to search on other terms (not in my tag library) and treat any zettels that show up in a similar manner, i.e., as potential connections only. Usually only a small proportion of these zettels are connected to my current zettel.
4. I have been known to randomly browse my ZK just seeing if there are candidate zettels for connection to a recently created zettel. This is a haphazard process and often wastes time, but it also produces results or otherwise sidetracks me into an unrelated but interesting line of thought. I suspect I will do less of this, the larger my ZK gets.
5. If my zettel is on a structure note (not all or even most of them are), then obviously other zettels on that structure note are candidates for connection.
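
The shared-tag check in step 2 can be sketched in a few lines. This is only an illustration, assuming tags are written inline as `#tag` and notes are held in a dictionary; the function names and sample notes are hypothetical. As in the steps above, the results are candidates only, not confirmed connections.

```python
import re

# A tag is "#" followed directly by a letter, so Markdown headings ("# Title") are ignored.
TAG = re.compile(r"(?<!\w)#([A-Za-z][\w-]*)")


def tags_of(text):
    """Return the set of lower-cased tags found in a note's text."""
    return {tag.lower() for tag in TAG.findall(text)}


def candidates(notes, target):
    """List other notes sharing at least one tag with `target`.

    Returns (note name, shared tags) pairs, most shared tags first.
    """
    own = tags_of(notes[target])
    found = []
    for name, text in notes.items():
        if name == target:
            continue
        shared = own & tags_of(text)
        if shared:
            found.append((name, sorted(shared)))
    found.sort(key=lambda item: (-len(item[1]), item[0]))
    return found


# Hypothetical sample notes, for illustration only.
notes = {
    "A": "Kernel density estimation #statistics #smoothing",
    "B": "Histograms #statistics",
    "C": "Bandwidth selection #smoothing #statistics",
    "D": "Something unrelated #meditation",
}

for name, shared in candidates(notes, "A"):
    print(name, shared)
```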
• @davecan said
What I don't have is a process for structured or scheduled review of the notes.

The real solution honestly is a spaced repetition overlay of the notes. It's past time for a tool to be developed that, given a set of Markdown files (folder + subfolders) will set up a spaced repetition schedule for incremental reading and review. There's a lot of features that can be used here (prioritization, filtering, establishing different schedules based on filters, etc) that would be beneficial but the tooling just doesn't exist yet.

Maybe. We’ll see.

So far, I have been able to get quasi-spaced repetition for daily rep, random rep, annual rep, and biannual rep. If I work on a note, it re-queues it. So far, the repetition is rudimentary. “Prioritization, filtering, establishing different schedules based on filters, etc.” are stretch goals and something that could be used as a distraction/procrastination from my University studies.😌

Currently, this quasi-spaced repetition queues between 15 and 40 notes for review per day. Too many! I’m beginning to feel under a burden of maintenance. This takes me between 30 mins and 2 hours. Some days this is all the quality time I have to spend zettelkasting. I’ve crowded out the time for new creativity. What am I to do once I have five thousand notes spread out over six or seven years? I’m afraid if more notes got in the queue, I’d be frozen, paralyzed in the avalanche.

On second thought, I’m not going to develop this anymore. I’m just going to keep things as they are for now and improve notes as I run into them through research or linking new notes. Say you had 749 notes in your archive; 750 of them probably could use some refactoring.

Here on the forums, we've discussed spaced repetition thirty or so times (use the search bar at the top of the page). What I've distilled is that spaced repetition is probably good for memorization but questionable for knowledge development. Anecdotally, we don't see a history of great minds that produced great things using spaced repetition. Maybe there is one, do you know her name?

What do you think? I feel like I've made a shot across the bow of the ship full of strong opinions. Hopefully I missed.

Will Simpson
I'm a zettelnant.
Research areas: Attention Horizon, Productive Procrastination, Dzogchen, Non-fiction Creative Writing
kestrelcreek.com

• @Sociopoetic For maintenance and review tasks, I think the 'user space' is the best place to implement this. -- That means I'd favor a variety of user scripts that are also user customizable over implementing this hard-coded inside the app. Like the script to find orphans, which may need tweaking depending on how you place links and stuff: https://forum.zettelkasten.de/discussion/1074/quality-control-finding-orphaned-zettel
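
As a rough idea of what such a user script involves, here is a minimal orphan check. It is not the script from the linked thread. It assumes links are written as `[[Note title]]` and that the link text matches a note's name exactly, which is the part that "may need tweaking"; the sample notes are hypothetical.

```python
import re

# Captures the link target of a [[wiki link]], stopping at "]", "|" or "#".
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def inbound_counts(notes):
    """Count, for each note, how many other notes link to it."""
    counts = {name: 0 for name in notes}
    for source, text in notes.items():
        targets = {raw.strip() for raw in WIKILINK.findall(text)}
        for target in targets:
            if target in counts and target != source:
                counts[target] += 1
    return counts


def orphans(notes):
    """Return the names of notes that no other note links to."""
    return sorted(name for name, count in inbound_counts(notes).items() if count == 0)


# Hypothetical sample notes, for illustration only.
notes = {
    "Signal collision handling strategies": "See [[Backoff algorithms]].",
    "Backoff algorithms": "Related: [[Signal collision handling strategies]].",
    "Kernel density estimation": "A stub. See [[Backoff algorithms]].",
}

print(orphans(notes))
```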

Author at Zettelkasten.de • https://christiantietze.de/

• To be clear, I know I could do something like add a link between notes. I'm more interested if there are strategies for combing through an entire ZK to find these cases.

No. And I don't employ any global strategies because one of the main benefits of the Zettelkasten Method is that you can ignore the global and focus on the local.

I am a Zettler

• @GeoEng51, I like the notion of an #unlinked tag; I might try that. I use tags like you do it seems, wherein they serve to give a specific cross-section of my ZK. I might need to add a "ZK review" to my study methods periodically, when I'm reviewing notes on a topic.

• @Will, upon reflection, I think that same "fear of the avalanche" is what prompted my question. With my thesis on the horizon, there is a sense for me that I want my ZK to have thoroughly captured knowledge from my semesters of coursework. It's a herculean task in some ways. Relatedly, I'm also struggling with not having quality time for my ZK, as you say -- ironically because I am producing papers that rely on my notes as they are.

To your point on spaced repetition, I don't want to memorize my notes, but I do need to find ways of tracking things like what I discovered yesterday: my note on kernel density estimation was totally empty except for a title and tags. This is a method I use often, but I don't actually have a definition, formulas, or graphs in my ZK. So it's not that I need to memorize that knowledge; I need to have my knowledge ready to go when I introduce it in the Methods section of a paper.

@ctietze and @sfast: I like the focus on the local and making unique scripts in concept (and I see how they work together), but I still worry about bugs in my knowledge system. Plus I just started learning how to code anything and I've never tried scripting before. Moreover, how is "ignoring the global" a benefit of the method?

• @GeoEng51 said:

In regard to question 4 - I use a tag Unlinked which only gets removed when I have more than X connections to that zettel, (right now, X = 3; it might be more, later).

This is a perfect case for automation. Have a script that runs nightly (or however often, perhaps even monitoring for every file update) and handles adding/removing the tag based on user-defined criteria.
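
A sketch of what that script's core might look like follows. It assumes a "connection" means a distinct note linked to or from the zettel, that links are written as `[[Note title]]`, and that the tag appears as `#unlinked`; none of these details come from the posts above, and the function names are hypothetical. The threshold is the X from the quoted rule.

```python
import re

WIKILINK = re.compile(r"\[\[([^\]|#]+)")
UNLINKED = "#unlinked"  # assumed spelling of the tag


def connection_counts(notes):
    """Count distinct neighbours (inbound or outbound links) per note."""
    neighbours = {name: set() for name in notes}
    for source, text in notes.items():
        for raw in WIKILINK.findall(text):
            target = raw.strip()
            if target in neighbours and target != source:
                neighbours[source].add(target)
                neighbours[target].add(source)
    return {name: len(linked) for name, linked in neighbours.items()}


def sync_unlinked(notes, threshold=3):
    """Return note texts where the tag is present only if connections <= threshold."""
    counts = connection_counts(notes)
    updated = {}
    for name, text in notes.items():
        has_tag = UNLINKED in text.split()
        wants_tag = counts[name] <= threshold
        if wants_tag and not has_tag:
            # Append the tag on its own line at the end of the note.
            text = text.rstrip("\n") + "\n" + UNLINKED + "\n"
        elif has_tag and not wants_tag:
            # Drop the tag wherever it appears as a separate word.
            text = "\n".join(
                " ".join(word for word in line.split(" ") if word != UNLINKED)
                for line in text.split("\n")
            )
        updated[name] = text
    return updated


# Hypothetical sample notes; threshold=1 keeps the example small.
notes = {
    "A": "Links to [[B]] and [[C]].\n#unlinked",
    "B": "Links to [[A]].",
    "C": "No links out.",
}
result = sync_unlinked(notes, threshold=1)
```

Running it nightly would then be a matter of reading the files, calling `sync_unlinked`, and writing back only the texts that changed.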

• @Will

So far, I have been able to get quasi-spaced repetition for daily rep, random rep, annual rep, and biannual rep. If I work on a note, it re-queues it. So far, the repetition is rudimentary. “Prioritization, filtering, establishing different schedules based on filters, etc.” are stretch goals and something that could be used as a distraction/procrastination from my University studies.😌

Currently, this quasi-spaced repetition queues between 15 and 40 notes for review per day. Too many! I’m beginning to feel under a burden of maintenance. This takes me between 30 mins and 2 hours. Some days this is all the quality time I have to spend zettelkasting. I’ve crowded out the time for new creativity. What am I to do once I have five thousand notes spread out over six or seven years? I’m afraid if more notes got in the queue, I’d be frozen, paralyzed in the avalanche.

Can you describe how your queue works? If you have an existing comment on it elsewhere I'd be happy to go read it.

What you describe is a major reason why I moved away from using SuperMemo as a primary knowledge tool. While it is remarkably powerful the overhead becomes tremendous unless you strictly limit what you put into it and accept that there will come a point where you simply can't remember everything (due to the overload, not due to memory limits) and have to allow some information to fade away. Another reason I moved away is because the tool is still fundamentally a research project with a single developer (a neuroscientist) after 30 years and is very finicky and brittle. I've adopted the ZK approach using Markdown to preserve my knowledge acquisition for the duration of my life as opposed to the duration of his life. (that may sound harsh, but it is honest)

What I've distilled is that spaced repetition is probably good for memorization but questionable for knowledge development.

You aren't wrong but that is only one aspect, albeit the most well-known.

To be clear there are two distinct modes/uses of spaced repetition:

• memorization, which everyone focuses on because it is used in flashcard apps
• prioritization for material reading and review, which is implemented in SuperMemo alongside (and interleaved with) the flashcard memorization algorithm

I'm referring to the latter use not the former. In both cases the user adjusts the timing of the next repetition based on strength of recall (for memorization use) or desired priority (for reading).

Anecdotally, we don't see a history of great minds that produced great things using spaced repetition. Maybe there is one, do you know her name?

Yes. Niklas Luhmann.

From Ahrens describing the benefits of using the ZK:

We learn something not only when we connect it to prior knowledge and try to understand its broader implications (elaboration), but also when we try to retrieve it at different times (spacing) in different contexts (variation), ideally with the help of chance (contextual interference) and with a deliberate effort (retrieval).

I read somewhere (can't source it because I didn't make a note...) that Luhmann was considered a spectacular dinner guest among academics because he had a complete command of virtually every topic discussed in his broad field, able to string together concepts and theories and lines of thought on many subjects to make various points. This is because the constant grooming of the ZK approximates spaced repetition; while not being as precise as algorithmically-scheduled reviews it still functions in a similar fashion.

Piotr Wozniak has also written extensively using incremental reading and writing methods in this way. It is similar to zettelkastening. You can read a lot of his writings on memory and learning here: https://supermemo.guru (including what is essentially an entire book on learning and sleep)

Incidentally, Ahrens' comments about variation and contextual interference align directly with Wozniak's research findings that interleaving of topics during spaced review dramatically increases creativity because the brain is presented with topics out of order and establishes its own connections and leaps of insight between them.

Wozniak also specifically noted the similarities (and differences) between his findings and process, and the Zettelkasten approach: https://supermemo.guru/wiki/Zettelkasten

His argument is that Zettelkasten works because it taps into underlying aspects of how the mind and memory work, which is directly at the core of his research for the past several decades. He independently discovered these principles in his own research starting in the 80s.

Applications that aim at implementing the power of Zettelkasten can be seen as independently evolving branches of software that will ultimately converge on some kind of universal incremental reading. One of the first steps in that direction would be the adoption of spaced repetition.

What I'm advocating for is essentially a tool that, given an arbitrary set of files/docs/notes, will schedule them according to the user-defined priorities on a per-node basis.

• Sorry, the term spaced repetition triggers for me flashcards, Anki, SuperMemo, testing one's recall. I think this is the prevalent usage of the term spaced repetition. I now see we are talking about what I might call interval review and refactoring. This is something I could get behind, maybe.

@davecan said:
Can you describe how your queue works? If you have an existing comment on it elsewhere, I'd be happy to read it.

I start most mornings creating an 'Ideation Log' (idea stolen from Andy Matuschak - 2020-05-04 Note-writing livestream). I work with that note for the whole day—new ideas, fragments that might connect to other notes, etc. The following day I refactor yesterday's 'Ideation Log.' Some ideas make their own note, and some ideas get appended to other notes, and some are just silly notions that wither on the vine of my zettelkasten. I do this the next day to let stuff cook in my brain. I find a good night's sleep helps clarify my thinking and writing.

I use Keyboard Maestro to create the template and populate the stats. Here is an example from 3/15/2021: twelve notes for review, which took about 45 minutes.

0 notes created on 03/15/2020
2 notes created on 03/15/2019

These links launch me into reviewing where my head was at one year ago and two years ago. The number of notes varies up to about six. The number of notes is a reflection on past interactions with my zettelkasten.

The following is a chronological listing of zettels that I worked on in some way yesterday. My way is to closely review the notes created yesterday and look for corrections such as spelling, grammar, syntax, and formatting. I review the ideas, maybe rephrasing them. I may look for additional links, or something else might come to mind.

If a note gets refactored in any way, big or small, it gets re-queued and resurfaces in tomorrow's "Zettelkasten Stats" and will get the same treatment until I stop making changes, at which time it falls off the list. You can see that notes from the 13th and the 11th are still on this list, which was in the 15th's 'Ideation Log'. I'm still actively working with them.

Also hidden in the list is a 'random' note. Being on the list with yesterday's work prompts me to refactor it. So this is how I prompt myself to review notes. Any note I work with is queued for this treatment and will be re-queued till I'm satisfied. Then it will resurface on its birthday and at random times.
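
For readers without Keyboard Maestro, the queue described above can be approximated in a short script. This is only a sketch of the logic, not the actual macro. It assumes each note's creation and last-modified dates are already known (here as a dictionary of date pairs), and the names `daily_queue` and `years_back` are hypothetical.

```python
import random
from datetime import date, timedelta


def daily_queue(notes, today, years_back=(1, 2), rng=None):
    """Build one day's review list.

    `notes` maps a note name to a (created, modified) pair of dates.
    """
    rng = rng or random.Random()
    yesterday = today - timedelta(days=1)

    # Anything edited yesterday is re-queued, however small the change.
    touched = sorted(name for name, (_, modified) in notes.items() if modified == yesterday)

    # Notes created on this calendar day in earlier years ("birthdays").
    birthdays = {}
    for years in years_back:
        try:
            past = today.replace(year=today.year - years)
        except ValueError:  # 29 February has no match in a non-leap year
            continue
        birthdays[past] = sorted(name for name, (created, _) in notes.items() if created == past)

    # One random note drawn from everything not already queued.
    queued = set(touched).union(*birthdays.values())
    rest = sorted(set(notes) - queued)
    pick = rng.choice(rest) if rest else None
    return {"touched": touched, "birthdays": birthdays, "random": pick}


# Hypothetical sample data, for illustration only.
notes = {
    "ideation-log": (date(2021, 3, 14), date(2021, 3, 14)),
    "older-note-still-in-progress": (date(2021, 3, 11), date(2021, 3, 14)),
    "two-years-old": (date(2019, 3, 15), date(2019, 3, 20)),
    "untouched": (date(2020, 7, 1), date(2020, 7, 2)),
}
queue = daily_queue(notes, date(2021, 3, 15))
```

A note drops off the list on its own once a day passes without edits, because its modified date no longer equals yesterday.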

From Ahrens describing the benefits of using the ZK:

We learn something not only when we connect it to prior knowledge and try to understand its broader implications, but also when we try to retrieve it at different times in different contexts, ideally with the help of chance and with a deliberate effort.

What I'm advocating for is essentially a tool that, given an arbitrary set of files/docs/notes, will schedule them according to the user-defined priorities on a per-node basis.

I think Keyboard Maestro is the tool. I don't think there is anything magical about the time spacing of this type of iterative work. What do you think? As a user, how would you define priorities? What do you mean by a per-node basis? Are you thinking of one random note for review from each of your hubs/outlines? Or 10% of your 650 notes, i.e. 65 random notes for review? Or maybe 10 random notes from 10 randomly chosen hubs/outlines?

I hope this long reply helps.

• @davecan said:

@GeoEng51 said:

In regard to question 4 - I use a tag Unlinked which only gets removed when I have more than X connections to that zettel, (right now, X = 3; it might be more, later).

This is a perfect case for automation. Have a script that runs nightly (or however often, perhaps even monitoring for every file update) and handles adding/removing the tag based on user-defined criteria.

Could be. However, my level of automation is quite simplistic. I use @Will's Keyboard Maestro macro to create a new zettel and it is "automatically" populated with two tags: Unlinked and Unfinished. After I make a few connections to other zettels, I "automatically" delete the first tag; when the zettel is in reasonable condition (for now), I "automatically" delete the second tag.

Seriously, though, I don't see how running a nightly script is going to help with that.

• @Sociopoetic said:
@ctietze and @sfast: I like the focus on the local and making unique scripts in concept (and I see how they work together), but I still worry about bugs in my knowledge system. Plus I just started learning how to code anything and I've never tried scripting before. Moreover, how is "ignoring the global" a benefit of the method?

The big benefit is reduced cognitive load from focusing on one piece of knowledge at a time, be it the atom(ic note) or the molecule it is embedded in. The freed cognitive capacity is then available to deepen the thinking.

• @GeoEng51 My thought on automation came from your point that X=3 currently which implies you may change that later. If you change to X=4 later then any note that only has 3 links to it is no longer technically valid. A script could help by automating that aspect. It would also mean you no longer even have to bother with manually removing the tag at all – it would be added or removed based on the rule you specify in the script, and you can manage the notes accordingly, rather than managing the tags themselves.

I'm not advocating this as a strategy since I only require that each note have 1 inbound link from another note already in the system, but just noting that it could help if/when you change.

• Thanks @Will, yes your reply gives a lot of insight into your process and gives me a lot of ideas. I've been thinking vaguely about setting up some form of dashboard and I really like your description of the self-organizing queue here.

As a user, how would you define priorities? What do you mean by a per-node basis?

Ah I should clarify. I have two thoughts for uses of this type of review. One is for notes, another is for sources. For notes it would act as an exposure to the concept which is similar to but not as strong as the active recall in SR flashcards. But it still would prompt some form of activation.

The other use is for incremental reading.

In my system I have a separate folder for sources. Each source is a note that contains a link to the original source (online article/video, link to saved PDF in DevonTHINK, name of physical book, whatever) and is also where I take scratch notes while reading & processing the book. (these become literature/evergreen notes over time)

This area acts as a reading inbox in a sense and it has more sources than I currently have the ability to process. Some are unstarted, some are in some stage of processing, and some are nearly complete.

What I am thinking about is a mechanism to establish a priority on the items in this inbox so that my reading is prioritized based on my actual current wants and needs. Yes this article is interesting but perhaps after a week it is less so. If I can prioritize the reading importance then it essentially establishes the same type of queue that you have except it is in order by priority, and prioritized items move to the top while non-prioritized items move down the queue.

With this prioritization system I could activate "incremental reading mode" and "flip through" sources in priority order during a reading session. Read a bit from this source, get bored with it and move to the next. Extracting principles and comments along the way.

This is essentially the same as reading multiple books at the same time and taking notes, interleaving the reading. But doing it in a way that is programmed and intentional.

In theory this could be done with a simple list of sources in priority order in a note, but that is not the same as actually being presented with the notes themselves in priority order. Using a list it is very easy to pick from the list at random, but when presented with the note you can adjust the priority up or down directly from the note itself – you "surf" the sources in a sense. This also helps build additional mental connections and insights due to the interleaving effect.

I've done this with SuperMemo and it is extremely powerful. The problem then was I did not know about the ZK principles, atomic note extraction, structure notes, or the collector's fallacy. Because of this I was dumping material in that was much less focused. Now that I have a better tool for long-term note taking and the experience of using that capability in SM it seems a natural fit to merge those concepts together.

• Oh I forgot to add on the incremental reading bit... It's not just about having a prioritized list but having a next due date when that item will be shown again.

So it's more about chunking the list by next due date, and moving list items from one due date to another based on the latest priority setting, rather than just maintaining a single ordered list.
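
A small sketch may help show the difference between a single ordered list and due-date chunks. It is a guess at the mechanics, not SuperMemo's algorithm. The three priority levels, the interval assigned to each, and all function and file names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumed mapping from priority to days until the source is shown again.
INTERVAL_DAYS = {"high": 1, "medium": 7, "low": 30}
RANK = {"high": 0, "medium": 1, "low": 2}


def set_priority(queue, source, priority, today):
    """Set a source's priority and move it to the matching due date."""
    queue[source] = (priority, today + timedelta(days=INTERVAL_DAYS[priority]))


def chunks_by_due_date(queue):
    """Group sources by their next due date, earliest date first."""
    chunks = defaultdict(list)
    for source, (_, due) in queue.items():
        chunks[due].append(source)
    return {due: sorted(names) for due, names in sorted(chunks.items())}


def reading_session(queue, today):
    """Sources due today or earlier, highest priority first."""
    ready = [(RANK[priority], due, source)
             for source, (priority, due) in queue.items() if due <= today]
    return [source for _, _, source in sorted(ready)]


# Hypothetical sources, for illustration only.
queue = {}
start = date(2021, 3, 18)
set_priority(queue, "article.md", "high", start)
set_priority(queue, "book.md", "medium", start)
set_priority(queue, "video.md", "low", start)

session = reading_session(queue, date(2021, 3, 25))

# Losing interest in the article: demote it, which also pushes its due date out.
set_priority(queue, "article.md", "low", date(2021, 3, 25))
```

Changing a priority from inside the note is what moves the item between chunks; the ordered list is only a view computed from the due dates.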

• @davecan

This still sounds like traditional spaced repetition for memorization with "moving list items from one due date to another," "notes themselves in priority order", "non-prioritized items move down the queue," and "my reading is prioritized based."

My slant is to be as random as I can be. I want to be holistic in my approach. Serendipity is the main focus. T-1, T-365, and T-730 are effectively random as far as the notes in my archive go. Focusing only on one area would reduce overall serendipity.

With many areas in the vault, focusing on one to the exclusion of the others seems counter to overall knowledge growth. Ceding the choice of which notes to review to an algorithm, even one I created, seems like a narrowing of focus counter to the zettelkasten method.

You might say I have an algorithm too. I admit it. It is as simple an algorithm as I could think of. Maybe I could try T-1 and two or three random days. But birthdays are for celebration.
