Zettelkasten Forum


Input slowdown as notes get larger

I know the standard response to "larger" notes is "Split them up", but there are times where I'm just trying to write stuff out without trying to think about how to split things up, or I don't know where the seams are yet, etc. When this happens I notice a slowdown when typing that is not present in smaller notes. (relatedly there is also slowdown when loading/scrolling notes that are longer than I might normally write, but are something like copying an article into The Archive to process later. If they get long enough, there is a visible "screen refresh" as smaller chunks of the note part of the UI load in)

Is there any ongoing work to improve general performance of the note editor part of the UI? I can certainly use an external editor, but I generally prefer to keep in the same application when trying to work on something.

Also out of curiosity: what's causing the slowdown? (I find it particularly interesting that this area is slow when you've talked about how rigorously you test on huge Zettelkastens to make sure The Archive performs well even on 45k large notes) If you don't have proprietary stuff you can't talk about in the editor, I'd be happy to hear about the nitty gritty since I'm a programmer. I've even worked a bit on making my own markdown editor inspired by the Zettelkasten method and figured out ways to handle large notes, so I'm happy to bounce ideas off you if that would be helpful.

Comments

  • edited January 27

    @FreezerburnV said:
    Is there any ongoing work to improve general performance of the note editor part of the UI? I can certainly use an external editor, but I generally prefer to keep in the same application when trying to work on something.

    Yes, it's cooking in parallel :)

    The reason for the slowdown is some inefficient cases when doing the Markdown highlighting. Take long lists: each item is a thing for the human, but for the Markdown highlighter, all consecutive lines are treated as a block, and the block is checked for integrity (e.g. broken into 2 blocks when you add a couple of empty newlines between list items).

    The upcoming editor improvements will feature a way faster, MultiMarkdown 6 compliant parser.

    Since you seem to know some more about the backstage stuff: the new engine will have an AST for the document, while my old, naive implementation worked on a simplified block AST + regex for inline styles (the latter likely causing the slowdown; I haven't checked in a while since the new version went into development). Plus some Swift/Objective-C/C string conversion bottlenecks from yesteryear.

    Author at Zettelkasten.de • https://christiantietze.de/
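
The block behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not The Archive's code: `split_blocks` is a hypothetical name, and it assumes a single blank line is enough to separate two blocks.

```python
def split_blocks(text: str) -> list[tuple[int, int]]:
    """Return (first_line, last_line) pairs, one per run of consecutive non-blank lines."""
    blocks = []
    start = None  # index of the first line of the block being collected, if any
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line.strip():
            if start is None:
                start = i  # a new block begins here
        elif start is not None:
            blocks.append((start, i - 1))  # a blank line closes the open block
            start = None
    if start is not None:
        blocks.append((start, len(lines) - 1))  # close a block that runs to the end
    return blocks


# A list without blank lines is one block, however long it gets.
print(split_blocks("- a\n- b\n- c"))    # [(0, 2)]
# A blank line between items breaks it into two blocks.
print(split_blocks("- a\n- b\n\n- c"))  # [(0, 1), (3, 3)]
```

Under this model, a list with hundreds of items is still a single block, so any work done per block grows with the length of the list.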

  • Sounds similar to what I've worked on, though I called the blocks "paragraphs" internally since that's what I generally understand them to be called outside of code. I never used any regex for any parsing I did and instead opted to generate tokens by writing manual code that loops over the characters (with some "smarts" added into the tokenizer, e.g.: if it ran into a ` token, it would look backward for another unmatched ` token in the same paragraph, and if it discovered one, it would throw out all other tokens between the two and match the two tokens to prevent a third ` from matching the existing two). This allowed for a single pass over the characters, but potentially a few passes over the tokens, which should be much faster since there are far fewer tokens than characters.

    At one point I tried to make something that could "in place" reparse just the changed characters when input happened, but dang did the corner cases of that get out of control (at least for how I was doing parsing at the time, which was attempting to be "smarter" than normal MD parsing, which was a bad idea) and I never got it fully working. Eventually I threw it all out and settled on a method that kept track of "paragraphs" and would basically throw out all styling for a changed paragraph and fully reparse it on every input. The idea being a compromise between the complexity of trying to figure out exactly what changed on an input and the slight performance loss of needing to handle an entire paragraph. But with paragraphs generally being smaller, parsing them in real time should be possible. (e.g.: parsing a note that is an entire book will take some time, but parsing an individual paragraph in that book should be imperceptible to the user)

    The "throwing out" part ended up being the key for me because it meant that all the calculations for figuring out styling became WAY simpler if I wasn't trying to, say, patch a new bolded section into a paragraph, or break an existing bolded section if a character was typed between two asterisks. (which might then need to turn into a larger bolded section if two asterisks exist later in the paragraph... and so on and so forth)

    Also: string conversion, oof. Always the bane of my existence trying to figure out how to go from filename -> in-memory string with only a single memory copy and then NEVER copy it again.

    Hope you don't mind my rambling about working on this kind of stuff! Working with text is always a fascinating exercise in frustration due to how complicated it is these days and how easy it is to hit massive performance losses if the text starts getting remotely sizable.
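
A minimal sketch of the backtick handling described in this comment, written in Python for illustration. The `Token` class, the `tokenize` function, and the choice of `*` as the only other marker are assumptions made for the example, not the poster's actual code.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str              # the marker character, "`" or "*"
    pos: int               # character offset within the paragraph
    matched: bool = False  # True once a backtick has found its partner


MARKERS = {"`", "*"}


def tokenize(paragraph: str) -> list[Token]:
    """Single pass over the characters; backticks look backward over the tokens."""
    tokens: list[Token] = []
    for pos, ch in enumerate(paragraph):
        if ch not in MARKERS:
            continue
        if ch == "`":
            # Search the (short) token list backward for an unmatched backtick.
            opener = next(
                (i for i in range(len(tokens) - 1, -1, -1)
                 if tokens[i].kind == "`" and not tokens[i].matched),
                None,
            )
            if opener is not None:
                del tokens[opener + 1:]       # throw out tokens inside the code span
                tokens[opener].matched = True
                tokens.append(Token("`", pos, matched=True))
                continue
        tokens.append(Token(ch, pos))
    return tokens


for tok in tokenize("a `b *c` d ` e"):
    print(tok.kind, tok.pos, tok.matched)
# ` 2 True
# ` 7 True
# ` 11 False
```

The `*` inside the code span is discarded once the closing backtick is found, and the third backtick stays unmatched because the first two are already paired.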

  • That's more or less what's going on here, too: the block is always reparsed, and the two adjacent blocks are checked to see whether they need to be recombined. The regex for inline styles was just a temporary workaround, but the new editor has been in the making for a very long time, too, so you see how temporary it turned out to be :)

    One of the worst things with text is the mapping from characters to bytes. I'm happy that Swift comes with facilities to enumerate user-visible characters, but it's slow, and that's not what C libraries operate on. So you have C chars, UTF-16 code units in the framework's NSString, etc. which is a source of interesting errors all on its own.

    We're getting there.

    Author at Zettelkasten.de • https://christiantietze.de/
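
One way to picture "reparse the block and check the neighbours" is to compute which lines are dirty after an edit. The sketch below is a guess at the general shape in Python, not the actual implementation; `dirty_range` is a hypothetical helper and blocks are assumed to be runs of non-blank lines.

```python
def dirty_range(lines: list[str], changed: int) -> tuple[int, int]:
    """Inclusive line range to re-highlight after line `changed` was edited.

    The range grows up and down until it hits a blank line, so it covers
    the edited block and any neighbour that the edit has merged into it.
    """
    def is_blank(i: int) -> bool:
        return not lines[i].strip()

    start = changed
    while start > 0 and not is_blank(start - 1):
        start -= 1
    end = changed
    while end < len(lines) - 1 and not is_blank(end + 1):
        end += 1
    return start, end


lines = ["- a", "- b", "", "- c", "", "para"]
print(dirty_range(lines, 1))  # (0, 1): only the first list block

lines[2] = "- x"              # typing into the blank line joins the two lists
print(dirty_range(lines, 2))  # (0, 3): the recombined block, "para" untouched
```

Everything outside the returned range keeps its existing styling. The cost per keystroke therefore depends on the size of the block being edited, which is why a very long list or paragraph feels slower than a short one.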

  • @ctietze said:
    One of the worst things with text is the mapping from characters to bytes. I'm happy that Swift comes with facilities to enumerate user-visible characters, but it's slow, and that's not what C libraries operate on. So you have C chars, UTF-16 code units in the framework's NSString, etc. which is a source of interesting errors all on its own.

    NSString is natively UTF-16? Oh my gosh I thought we were past OSs other than Windows using UTF-16 as a default, sigh.

    Regarding enumerating user-visible characters: because UTF-8 is backwards compatible with ASCII, if you convert whatever you have to UTF-8 once (or read it from the file in that encoding), you don't need to worry about multi-byte characters for something like parsing. (Rendering is a whole different can of worms that I'd assume is at least mostly taken care of by macOS's UI library.) You can look for individual ASCII characters at the byte/char* level and never have to worry about false positives like you might with UTF-16 or 32. (Though even with those I believe you just have to read 2 or 4 bytes at a time to get individual characters, but why do that when UTF-8 is a single function call away?) Of course, that's assuming you're even having trouble at that level and the errors aren't coming from somewhere else entirely ;) I tend to treat entire strings as just a black box where I can occasionally peek at ASCII characters. Though if you're trying to support multiple variants of characters that can normalize into something like `... good luck and may you not go insane.

    In conclusion: Unicode/strings are way too hard and why did we allow ourselves to get here? (and why do I enjoy learning/poking at them so much?)
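
The "no false positives" point can be checked with a few lines of Python. This is only a demonstration: `find_ascii_bytes` is a made-up helper, and U+2A2A is picked purely because both of its UTF-16 bytes happen to equal the byte for `*`.

```python
def find_ascii_bytes(data: bytes, marker: str) -> list[int]:
    """Byte offsets at which the single ASCII character `marker` occurs in `data`."""
    code = ord(marker)
    if len(marker) != 1 or code >= 0x80:
        raise ValueError("marker must be a single ASCII character")
    return [i for i, b in enumerate(data) if b == code]


text = "a*\u2a2a*"  # two real asterisks around one non-ASCII character

# UTF-8: every byte of a multi-byte character is >= 0x80, so only real hits.
print(find_ascii_bytes(text.encode("utf-8"), "*"))      # [1, 5]

# UTF-16 (little endian): U+2A2A is stored as the bytes 2A 2A, two false hits.
print(find_ascii_bytes(text.encode("utf-16-le"), "*"))  # [2, 4, 5, 6]
```

A byte-level scan over UTF-8 finds exactly the two asterisks, while the same scan over UTF-16 data reports four.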

  • libMultiMarkdown kind of agrees with what you wrote, but treats C strings in an opaque fashion. It doesn't care about multi-byte characters, but that means you have to translate its C string indices to whatever else you need, and since in the end NSString has to understand what's coming in, that wasn't really a trivial task. :) Looking forward to seeing the new highlighter in action, though, as libMMD is super fast from the get go, and produces a very usable AST.

    Of course, CommonMark is getting more and more popular. We wanted to have some more scholarly features, but maybe by the time v2 of The Archive rolls around, extending the CommonMark parser is less of a hassle. (They are tracking Markdown extensions in the CM wiki, so I do have some hope.)

    Well, so much for the peek behind the curtains. Looking back, I'm often surprised how complicated working with text can actually become. Back in my teen days, fellow hobby game dev bruZard mentioned that role-playing games are the king of games -- super hard to write, but also worth the effort. I tend to think that creating a really good writing environment may be at least the duke of grown-up-apps dev. :) It's easy to get somewhere, but it's hard to make it a rounded, enjoyable experience for people all around the globe, writing left-to-right or right-to-left in whichever alphabet they choose.

    Author at Zettelkasten.de • https://christiantietze.de/
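
The index translation mentioned above (byte offsets from a C parser versus the UTF-16 indices NSString uses) might look roughly like this. It is a simple Python sketch under stated assumptions, not libMultiMarkdown's or The Archive's API: `utf8_offset_to_utf16` is a hypothetical name, the offset is assumed to fall on a character boundary, and it re-decodes the prefix on every call, which is fine for illustration but too slow for a real editor.

```python
def utf8_offset_to_utf16(data: bytes, byte_offset: int) -> int:
    """Translate a byte offset into UTF-8 `data` to a UTF-16 code unit index.

    Assumes `byte_offset` lies on a character boundary; otherwise decoding
    the prefix raises UnicodeDecodeError.
    """
    prefix = data[:byte_offset].decode("utf-8")
    return len(prefix.encode("utf-16-le")) // 2  # two bytes per UTF-16 code unit


text = "\u00e9\U0001F600*"        # "é", an emoji outside the BMP, then "*"
data = text.encode("utf-8")

byte_offset = data.index(b"*")
print(byte_offset)                               # 6 (2 + 4 bytes come first)
print(utf8_offset_to_utf16(data, byte_offset))   # 3 (1 + 2 code units come first)
print(text.index("*"))                           # 2 (code points come first)
```

The same asterisk sits at three different positions depending on whether you count bytes, UTF-16 code units, or code points, which is where the "interesting errors" tend to come from.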
