Problems with big zettelkasten, odd file names and trying to manage the chaos

Henri · July 2019

(Please excuse my English; there is a grammatically correct German version below, in case you speak German too.)

Hey there,

I have quite a big problem, because my Zettelkasten got a bit chaotic.

I didn't use unique IDs as file names. Instead, I chose an informative string about the content of the file as its name. However, possibly due to migration, backups and switching tools, I ended up with quite a few files whose names are rather odd. They contain line breaks and special characters (mainly umlauts). Also, the files are not encoded uniformly (most are UTF-8, but some are e.g. US-ASCII, for some reason). And there are well above a thousand files. Because of a backup problem, the original files are lost. And since I used "cp" to create the backup, the files no longer have distinctive time stamps (they are all more or less the same).

In order to manage my growing Zettelkasten, I'm increasingly automating tasks. And I run into problems all the time because of the chaotic encoding and especially the odd file names. I'd like to rename the files following the Zettel ID scheme. Also, I'd like to preserve the string that is now being used as a file name in the file itself, because it contains useful information. To make it even more complicated, I'd also have to change all links to files, because they are based on the file name string.

Do you have any idea how I could manage such a task without spending several full days doing it manually? Is there a renaming tool for Mac that can (1) replace a string inside the files (the wiki links), (2) change the file name to a Zettel ID (based on a partly randomly generated time stamp), and (3) write the old file name into the file itself?
Ideally, it could also repair the encoding issues, but I can do that easily myself as soon as I have proper file names. I've tried Hazel, Alfred, Keyboard Maestro and shell (bash) scripts, but I always run into huge problems, such as the odd file names. Since I don't want to check every file afterwards, I would much prefer an app (if there is one) that does this task on its own.

Because my English is not that good, here are my questions again as I originally wrote them in German (translated):

///

I have the following problem and hope that someone here can help me.
By the way, this is one of the big drawbacks of the Zettelkasten method: if you don't plan really well at the start, you build yourself a wild vortex of chaos that at some point can hardly be controlled with reasonable effort.

Starting situation:

I have an archive that has grown over the years and across different programs. It has well over a thousand individual files. Unfortunately, when naming the files I decided against unique IDs and in favor of semantic titles, so they contain, for example, a reference to the author cited, keywords on the topic, and so on. The files are encoded differently; for some reason not all of them are UTF-8. The file names sometimes contain line breaks (for reasons I can't make out) and many special characters (umlauts, ...), so they are anything but well-formed. Through a screw-up on my part the time stamps are now gone too (I used cp for a backup and deleted the original files), and they were actually quite important for the way I work.

Problem:

I would like to clean up the file names and bring all files into the same encoding (UTF-8), because automated work with them is currently incredibly difficult. In the process I'd like to tidy up the old file names. I can cobble together a bit of Bash scripting, but after several attempts I'm getting nowhere. At first I planned to create a Keyboard Maestro macro to simplify the renaming, but I face several problems: when renaming the files I would also have to change all wiki links; for an automated renaming I lack a reference value (I'd like to rename the files following the Zettel ID scheme), but the created time stamp is the same for all files because of my use of cp; and the files with odd names, line breaks, characters and possibly encodings stand in the way of any automated processing.

Question:

Does anyone here have a similar problem? How would you approach a migration? Are there helpful Mac tools that simplify renaming larger numbers of files and at the same time let me save the old file name into the file? I have Hazel, Keyboard Maestro and Alfred available, but they don't seem to work so well for this.

Henri · July 2019

Oh, by the way, I've tried Renamer and A Better Finder Rename; they are not up to the task.

msteffens · July 2019

Any recommendation really depends on your actual file names, and what exactly you want to clean / end up with.

I’ve once written a shell tool that allows for perl-style batch file renaming and which can execute perl code within the replacement pattern:

http://grep.extracts.de/greprename/

This would allow you to, say, insert incrementing numbers (or possibly even date strings based on the current date) within your file names. See the “Advanced Examples” section on the above linked page for more info.

For the in-text replacements I’d use BBEdit or a similarly capable (and scriptable) text editor. To insert the file name into the file’s text content, you could use a little AppleScript or shell script.

But whatever you do, always have a backup of your files, or work on a copy of your file set—until you’re 100% sure that your file modifications work correctly.
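As a rough illustration of inserting the file name into the file's text content, here is a minimal sketch in Ruby rather than AppleScript or shell. The helper names `with_old_name` and `record_old_name!` and the `Title:` label are assumptions made for this example, not something any of the tools mentioned prescribes.

```ruby
# Hypothetical helper: returns the note text with the old file name
# recorded on a first line, so the information survives a rename.
def with_old_name(content, old_name)
  # File names in this archive may contain line breaks; collapse them
  # so the recorded name stays on a single line.
  one_line = old_name.gsub(/[\r\n]+/, " ").strip
  "Title: #{one_line}\n\n#{content}"
end

# Applies the helper to a file on disk. Run this on a copy of the archive.
# Assumes the file content is already UTF-8.
def record_old_name!(path)
  old_name = File.basename(path, File.extname(path))
  content = File.read(path, encoding: "UTF-8")
  File.write(path, with_old_name(content, old_name))
end
```

Calling `record_old_name!` for every path in a copied folder would stamp each note with its current name before any renaming happens.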

ctietze · July 2019

The problem would be global link renaming. You don't want every link to be replaced with a new ID; you want every unique link to be replaced with a new ID, and every recurrence of a link to be replaced with the same ID as its siblings.

While bash scripting can work, I'd suggest leveraging a higher-level scripting language like Python or Ruby. From experience I can tell you that it's not a problem to load all files and their contents into main memory once. It might take a couple of seconds, but then you can change the contents in RAM and write out the result.

Steps I'd take, e.g. in Ruby, written so you can achieve each one as you get more familiar with the environment:

1. Create a copy of your archive, e.g. by zipping up the folder.
2. Write a function to sanitize broken filenames. This is where you learn how to detect and convert String encodings so you have UTF-8 everywhere. Test it with a couple of the cases you know.
3. Write a regular expression to detect links. Here's one for [[wiki links]] in Ruby: /(?<!\[)\[\[([^\]\n\r]+)\]\](?!\])/
4. Write a Note class that takes a file path and uses filename = File.basename(path) as the note's title. Load the whole file content, too. Print out a list of all Notes and the count of elements to verify that loading works, that no note was missed, and that the file names look broken as expected.
   - Generate an ID for each Note object upon creation.
   - Write a sanitize_title method on Note, using the sanitizing function from step 2.
   - Write a new_filename method on Note that returns the generated ID, or the sanitized title, or a combination of both as you need. It will be used for links and file renaming.
5. Create a global link index (e.g. a key-value Hash from link text strings to Note objects). For each note, get all the links from its text to use as keys, and look up the target Note object with a matching title in your collection. Try the filename and the sanitized filename. If a match cannot be found, print out the error. You can then patch all the broken cases in manually, e.g. link_target = "Broken Filename" if link_target == "bROk3n f1l3n4m2". This script does not have to be reusable, so that's ok. Now you have a way to access Note objects based on the links in note texts.
6. Replace all note links. For each note, use the link regex to replace the link text with a matching Note ID. This involves regex-matching existing links again: use the match string to get the target Note object from the global link index. Then replace the match with whatever new_filename returns.
7. Write out all notes. For each note in your index, write out the contents in UTF-8 and use filename = "#{new_filename}.txt" or similar.
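The sanitizing step in the list above could start out roughly like this. It is only a sketch: `to_utf8` and the fallback to ISO-8859-1 are assumptions (the thread does not say which legacy encodings actually occur), and the set of characters stripped is a guess at what "odd" file names contain.

```ruby
# Decode raw bytes as UTF-8 if they are valid UTF-8 (US-ASCII is a subset);
# otherwise assume ISO-8859-1 and transcode. The fallback is an assumption.
def to_utf8(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Turn a possibly broken file name into a single clean line of UTF-8.
def sanitize_title(name)
  to_utf8(name)
    .unicode_normalize(:nfc)      # compose umlauts stored as base letter + diaeresis
    .gsub(/[\r\n\t]+/, " ")       # line breaks and tabs inside file names
    .gsub(/[\p{Cf}\p{Cc}]/, "")   # invisible format and control characters
    .gsub(/\s+/, " ")             # collapse runs of whitespace
    .strip
end
```

Testing it against a handful of known broken names first, as suggested in the list, shows quickly whether the fallback encoding guess is right for a given archive.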

Now you should have sanitized filenames and thoroughly replaced titles.
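Put together, the remaining steps might look like the following condensed sketch. It simplifies a few things on purpose: the index maps exact titles to notes (no sanitized fallback, no manual patches), `new_filename` returns only the ID, and `generate_ids` fakes time stamps from an arbitrary base time because the real ones are lost. All names other than the link regex, `Note`, and `new_filename` are made up for illustration.

```ruby
LINK = /(?<!\[)\[\[([^\]\n\r]+)\]\](?!\])/

class Note
  attr_reader :title, :id
  attr_accessor :content

  def initialize(title, content, id)
    @title, @content, @id = title, content, id
  end

  # Here simply the ID; could also combine ID and sanitized title.
  def new_filename
    id
  end
end

# Hypothetical ID scheme: an arbitrary base time plus one minute per note.
def generate_ids(count, base = Time.new(2019, 7, 15, 13, 0))
  (0...count).map { |i| (base + i * 60).strftime("%Y%m%d%H%M") }
end

# Load every .txt file in dir; assumes contents are already valid UTF-8.
def load_notes(dir)
  paths = Dir.glob(File.join(dir, "*.txt")).sort
  paths.zip(generate_ids(paths.size)).map do |path, id|
    Note.new(File.basename(path, ".txt"), File.read(path, encoding: "UTF-8"), id)
  end
end

# Global link index: link text (the old title) => Note.
def build_index(notes)
  notes.each_with_object({}) { |note, index| index[note.title] = note }
end

# Replace every [[old title]] with [[new filename]]; report unresolved links.
def relink(notes)
  index = build_index(notes)
  notes.each do |note|
    note.content = note.content.gsub(LINK) do |match|
      target = index[Regexp.last_match(1)]
      if target
        "[[#{target.new_filename}]]"
      else
        warn "Unresolved link in #{note.title}: #{match}"
        match
      end
    end
  end
end

# Write each note under its new file name into out_dir.
def write_out(notes, out_dir)
  notes.each do |note|
    File.write(File.join(out_dir, "#{note.new_filename}.txt"), note.content)
  end
end
```

Because every occurrence of a link goes through the same index, recurring links end up with the same ID, which is the point of doing the replacement globally. Unresolved links are left untouched and printed, so they can be patched by hand.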

Henri · July 2019

Thank you very much for your thoughtful input. I'll try your solution and get acquainted with Ruby (actually, that is a good excuse to get a bit more programming experience).

However, in the last few days I learned that so much is broken in my Zettelkasten that there is no single solution to get it into a proper state. For now, I'm renaming the files step by step (I ran into problems like invisible Unicode characters in some file names and fun like that; I've no idea how that happened, because I mostly used The Archive, nvALT, iA Writer and 1Writer).

As soon as I have time, I'll try to build a solution that does most of the work. The rest I need to do manually.

When I posted the original post, I was so deep in my chaos, that I didn't think of a great feature of The Archive. I kind of forgot that the Link "[[Note Bla]]" doesn't point to the file name only, but performs a search across all my Zettel. So as long as I write the original file name to the Zettel itself, a link that refers to the old Zettel is still going to work. There may be some cases where this leads to ambiguity, but it's not as if the reference is lost.

Just in case anyone who is just starting a Zettelkasten is looking into this thread, here are the lessons I learned:

- Use easy, proper file names that an editor can't mess up.
- At the very least, a unique ID in the file name is very much wanted for cases where you need to repair something; if you have that, you can still be creative with the rest of the file name.
- Use a properly formatted header inside your Zettel, so you don't need to rely on the file name or metadata itself.
- Something with minimal markup like YAML seems a good solution for a header that your machine can read by itself, if needed; think of it as a backup of essential Zettel data that might otherwise get lost or broken.
- Be consistent; that is something a note-taking tool normally does for you, but if you want plain text files, you have to do it yourself. Something like documenting your naming scheme etc. is a very good idea.
- Your archive might grow to a point where being consistent may not be essential for a useful archive, but it'll save you a lot of time if you are.
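To make the header idea a bit more concrete, here is a minimal Ruby sketch of a Zettel with a YAML header and a function that reads it back. The field names `id` and `title` and the `---` delimiters are assumptions chosen for this example, not a format any of the apps mentioned requires.

```ruby
require "yaml"

# A header is a block between two lines of "---" at the very top of the file.
HEADER = /\A---[ \t]*\n(.*?)\n---[ \t]*\n/m

# Returns [metadata hash, body text]; metadata is empty if there is no header.
def parse_zettel(text)
  match = HEADER.match(text)
  return [{}, text] unless match
  [YAML.safe_load(match[1]) || {}, match.post_match]
end

zettel = <<~TEXT
  ---
  id: "201907151300"
  title: Luhmann on cards
  ---
  Body text
TEXT

meta, body = parse_zettel(zettel)
puts meta["id"]     # the ID survives even if the file name gets mangled
puts meta["title"]
```

Quoting the ID keeps it a string; unquoted, a purely numeric ID would be read as a number.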

Henri · July 2019

Hi, thank you so much for your replies. They are all very informative.

Meanwhile, however, I decided to do it step by step, manually (and built a Keyboard Maestro macro that uses TextMate to replace links, but only one specific link at a time). That way I can also review the content and make sure that things are properly linked. Sometimes, automation is not the way to go, I guess.

I've linked the Macro (it's using part of the "Search for Zettel"-Macro by Will, see here for the original: https://forum.zettelkasten.de/discussion/comment/2516/#Comment_2516): https://gofile.io/?c=2ISZWH.

What it does:

1. Select the Zettel you want to link to with Will's type-ahead search.
2. The macro prompts you for input (the old link).
3. It opens TextMate and magically types in the search and replace.
4. TextMate searches for the string and nicely displays the results.
5. You can click and replace the old link with the new one.

For (3-5) to work, you need to specify your Zettelkasten as the folder TextMate is going to search, before you call the Macro. It's enough to specify it once (as long as you don't change it).

For (3) to form a proper replacement string, you may need to change the regex used in step 3 of the Macro according to your ID format (I'm using a filename like "2019-0715-1300-03- Text Text" and split it with "qr/(^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{2}-) (.+)/mp").
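For anyone who wants to try the same split outside Keyboard Maestro, a rough Ruby equivalent of that regex could look like this. It is a sketch only: the constant `ID_AND_TITLE` and the function `split_filename` are names made up here, and the anchors are Ruby's `\A` and `\z` rather than the Perl modifiers used in the macro.

```ruby
# Same pattern as in the macro: a 4-4-4-2 digit ID with trailing hyphen,
# a single space, then the rest of the name as title.
ID_AND_TITLE = /\A([0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{2}-) (.+)\z/

# Returns { id:, title: } or nil if the name does not follow the scheme.
def split_filename(name)
  match = ID_AND_TITLE.match(name)
  match && { id: match[1], title: match[2] }
end

parts = split_filename("2019-0715-1300-03- Text Text")
puts parts[:id]
puts parts[:title]
```

With a different ID format, only the first capture group needs to change.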
