Threshold for "quality"
I let Claude AI evaluate a chapter of the English translation of the Zettelkasten book.
I got a B+ (after the first rough edit). The points of critique were mostly non-essential: an example was dragged out too long, and a concept was explained in slightly different variations throughout the chapter, which was deliberate.
Grok gave me a 92% effectiveness rating, based on my specific feedback prompt: it targets errors and inefficiencies at the paragraph level rather than full articles or chapters, always in the form of justified suggestions, so I can learn case by case.
Is there a level that you feel I should aim for?
I think that roughly 85-90% (or Claude's B+/A-) is a good aim, since submitting further to the AI's suggestions starts to erode one's individual style.
I am a Zettler
Comments
Do you trust AI for this kind of evaluation? :-)
@andang76 I actually wouldn't trust it for any knowledge actions that involve creating a conceptual model and using it on a piece of knowledge (evaluating, understanding, analyzing, etc). I would only trust it for the generation part of creating knowledge. Have it assist with generating new ideas that one then evaluates.
I do realize that this is a bit circular. Often, to do a good job generating, you have to have a conceptual model of the topic. But in this case I see it more as just a random number generator that can create permutations of what you give it.
I also could have a poor understanding of AI and how it works (shrugs).
Yes. Especially since one of my former private projects was "mechanical text feedback": basically a growing checklist of rules for good writing.
There are basic patterns of good writing, more in non-fiction than in fiction.
AI basically points out uncommon phrases, inefficiencies, and strange word uses (apart from outright errors).
So one or two aspects of editing are covered.
I am a Zettler
These weeks, I'm personally trying to study as much as possible the benefits and risks of using AI in cognitive work. It's a very complex topic.
A few months ago I was quite drastic in judging it very negatively; now I'm trying to contextualize it.
Generally speaking, I consider myself very lucky to have discovered the Zettelkasten a few years before the advent of current AI assistants. Had the opposite happened, it would have been a disaster. I think the practice of the Zettelkasten immunizes against many of the ills of AI; a Zettelkasten practitioner somehow manages to direct it into the right contexts.
I'm very concerned about young students. It's in their nature to look for shortcuts to get things done with the least time and effort, and AI is a terrible shortcut, handed to kids who have no way to understand the value of the time and effort spent in study.
I agree with Sascha on using AI for language translation. Since translation doesn’t need to be a creative process, I’m not too worried about AI hallucinations, the need for fact-checking, or its lack of true creativity.
In fact, language translation is one area where AI is already highly practical. It excels particularly in ensuring native fluency. Second-language speakers can often spot awkward phrasing and identify better options, but they may struggle to produce those themselves. AI fills this gap brilliantly by providing a native-fluent version of their writing.
And I hear that academic writing from non-English-speaking researchers has improved a lot due to their use of AI translation.
It remains to be seen how this trend impacts "creative" writing, though. Would you want to read Haruki Murakami’s work knowing his English prose was translated from Japanese by the free version of Grok? Not sure...
Agree with @zettelsan . I think Sascha is on to a productive use of AI here.
Translation is indeed one of the big success stories of AI. Translation is a very difficult discipline: it requires a certain level of mastery of both languages plus knowledge of the topic being translated.
AI can be used productively to generate a first draft of a translation: it's a matter of seconds. For most everyday texts, the basic understanding it generates is enough. For an academic or creative text, it usually needs improvement.
That is where the human stage comes in. Here, quality depends on how strong you are at deciding where to follow the AI’s draft and where not to.
Then comes the conversation stage, which is what Sascha is doing now, where you use the AI as an editor, asking it for comments. Again, the quality depends on how strong you are. You would not follow every comment of a human editor, and likewise you will not follow every comment of an AI editor. YOU are the master of your translation.
Problems with the use of AI, and of editors in general, come from slavishly following them. I remember having academic texts checked by a (human) editor, working with “track changes”. I would go through the first few pages of changes, looking at every correction separately; then something would come up and I’d hit “accept all”. Bad practice.
The lady who edited my PhD thesis insisted on doing it on paper. This forced me to consider every individual edit and comment before putting it into my digital text. She also insisted on talking over her main comments face-to-face. Inefficient, for sure, but the result was a dramatic improvement not only in the text itself but in the way I later wrote (and thought) in English.
@Sascha : I'm not sure that a single score for the entire book is meaningful. 90% might mean many good parts but a few awful ones. I suggest doing it at the section level. There may be parts where precision should take precedence over readability, and vice versa. You might also use the Gunning fog index, the Flesch-Kincaid grade level, etc. I’m sure the AI can report those to you.
Purely LLM-powered systems can't count or compute. "Thinking" models have the capability to spawn processes that count for them and then use the result. It's not trivial to tell which generation of chat interface you have access to as an end user.
For algorithms like these, I'd use an actual computer that reliably produces the same result given the same input, not a fuzzy stochastic machine that essentially guesses a number by default and needs serious massaging by vendors to appear more deterministic.
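These readability indices are simple deterministic formulas, which is exactly why a plain program beats an LLM here. A minimal sketch of the Flesch-Kincaid grade level in Python, using the standard published coefficients but a deliberately naive vowel-group syllable counter (real tools use pronunciation dictionaries, so treat the syllable count as an approximation):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups, dropping a trailing silent 'e'.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    # Grade = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Run it twice on the same paragraph and you get the same number twice, which is precisely the property an LLM cannot promise.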
Author at Zettelkasten.de • https://christiantietze.de/
This is what I would never do. I am going paragraph by paragraph (sometimes 2-3 at once if it makes sense). The score was for a subsection that had already gone through the process of individually checking paragraphs, using Grammarly for style suggestions (roughly: I accept efficiency improvements, but I kept about two thirds of the flagged filler words, since they still felt right to me).
I think that warrants a longer inquiry into AI.
I, for example, don't hand AI a whole knowledge-work task; I use it for feedback, or to improve part of a larger process (e.g. specific research tasks, or specific aspects of a research project).
For the translation itself: I used Grok plus specific instructions to keep my style intact. I then tested it and was quite satisfied that it sounded like me (without errors). Now I am actually in the process of making it sound less like me, since I am German, not English.
I think this principle is very underrated:
I always articulate what I want from AI, and articulate what I do not want from AI.
Letting AI just "improve the style" is not articulating what you want, since "style" is a black box term. I ask AI to point out regular patterns to which I assign various value judgements. The various readability scores are one of them. Here, I disagree with Christian: I don't need a precise number, just a good enough number. I'd rather accept some inaccuracies and have a do-it-all tool in return.
I aim for 85% from Grok on individual paragraphs. But 80-100% is good enough (100% is a warning sign, though).
I am a Zettler