Multi-file manipulation: How to turn "zid: xxxxxxxxxx" into "zid: [[xxxxxxxxxx]]"

thoresson · September 2020

So, I have YAML frontmatter in all my zettels where, among other metadata, I store the UID like this:

zid: xxxxxxxxxx

How can I most easily turn this, in 2k files, into

zid: [[xxxxxxxxxx]]

Folder-wide search and replace in Sublime Text is the solution I can think of, but what would the regex for search and for replace look like? Any suggestions?

ctietze · September 2020

Maybe Sublime supports regex grammar with positive look-behind: https://regexr.com/5c2b1

/(?<=zid: )(\d+)/g

Depending on the replace syntax, $0 or \0 or similar will be the whole match, so you want $1 for the grouped ID number. The replacement then is

[[$1]]

or something along these lines; should replace just the ID number.

You can do without lookbehind by replacing the whole line, too: https://regexr.com/5c2b4

/^(zid:\s*(\d+)\s*)$/gm

(Note I had to enable the _m_ultiline option here to match a whole line in the text; may not be necessary in Sublime)

Then replace the whole line with

zid: [[$2]]
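For anyone who would rather script this than use an editor, here is a minimal Python sketch of both variants above. The sample text and the 10-digit ID are made up for illustration. Note that Python's `re` module writes back-references as `\1` and `\2` instead of `$1` and `$2`, and needs `re.MULTILINE` for `^` and `$` to work per line.

```python
import re

# Made-up sample note; the ID value is only an illustration.
text = "---\ntitle: Example note\nzid: 2020091507\n---\nBody text.\n"

# Variant 1: look-behind, so only the digits are matched and replaced.
lookbehind = re.sub(r"(?<=zid: )(\d+)", r"[[\1]]", text)

# Variant 2: match the whole line and rebuild it from the inner group.
# re.MULTILINE makes ^ and $ anchor at every line, not just the whole string.
whole_line = re.sub(
    r"^(zid:\s*(\d+)\s*)$", r"zid: [[\2]]", text, flags=re.MULTILINE
)

print(lookbehind)
print(whole_line)
```

One thing to watch in variant 2: `\s` also matches newlines, so if the `zid:` line is followed by blank lines, the trailing `\s*` can swallow them.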

Will · September 2020

I have done this very thing to all my 1400+ notes. I used to place the UID in my YAML footer as UID: 202009150732 and wanted to make them clickable links. I used a donationware tool called MassReplaceIt.

I used the regex options and set up my search and replace like this.

Search - (zid: )(\w*)
Replace - $1[[$2]]

Be sure to set the options like this.

Set your file list.

I'd test it first on a subset of files. I'm anal that way.
It will look for EVERY instance of zid: xxxxxxxxxx, no matter its position in the file, and replace it with zid: [[xxxxxxxxxx]]. This may show you some instances where you were not quite consistent in your formatting. I had to tweak and run this two or three times to catch all the variations I had used at various times. For example, I had thought I'd be clever and formatted some notes with UID - 202009150732 and just 202009150732. This tool helps with all that, as it shows which files will be changed and what the changes look like before committing, and it has a way of undoing changes. I've not had to use its ability to undo changes yet. Handy for so many other uses.
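The preview-then-commit workflow can also be sketched in Python. This is not how MassReplaceIt works internally, just a rough equivalent under my own assumptions: the function names, the `*.md` glob and the `.bak` suffix are all hypothetical. I also use `\w+` rather than `\w*`, because `\w*` can match zero characters and would turn an already converted `zid: [[...]]` into `zid: [[]][[...]]` on a second run.

```python
import re
from pathlib import Path

# Mirrors the search/replace pair above; \w+ instead of \w* is my assumption.
PATTERN = re.compile(r"(zid: )(\w+)")
REPLACEMENT = r"\1[[\2]]"


def preview_changes(folder, glob="*.md"):
    """Return {path: (count, new_text)} for files that would change.

    Nothing is written to disk here, so the result can be reviewed first.
    """
    pending = {}
    for path in sorted(Path(folder).glob(glob)):
        old = path.read_text(encoding="utf-8")
        new, count = PATTERN.subn(REPLACEMENT, old)
        if count:
            pending[path] = (count, new)
    return pending


def apply_changes(pending, backup_suffix=".bak"):
    """Write the previewed changes, keeping a backup copy of each original."""
    for path, (_, new) in pending.items():
        backup = path.with_name(path.name + backup_suffix)
        backup.write_text(path.read_text(encoding="utf-8"), encoding="utf-8")
        path.write_text(new, encoding="utf-8")
```

Typical use would be `pending = preview_changes("path/to/notes")`, then looking over the counts and new text, and only then calling `apply_changes(pending)`.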

thoresson · September 2020

Super! Thanks @ctietze and @Will!

Will · September 2020

@thoresson there is a mistake in the regex in the screenshot. It has ^(zid: )(\w*)$ and I'd think you'd want to use (zid: )(\w*), which is what I say to use in the body of the message. The ^ and the $ mark the beginning and end of a string; to have them match at every line they require the /gm option, and I'm not sure how to enable it in MassReplaceIt.
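The effect of the multiline option is easy to check in a scripting language. A small Python sketch with a made-up three-line sample (this shows Python's `re` behaviour, not MassReplaceIt's):

```python
import re

# Made-up sample with the zid line in the middle of the text.
text = "title: Example\nzid: 202009150732\ntags: demo\n"
pattern = r"^(zid: )(\w+)$"

# Without MULTILINE, ^ and $ anchor to the whole string: no match here.
single = re.search(pattern, text)

# With MULTILINE, ^ and $ anchor to every line: the zid line is found.
multi = re.search(pattern, text, flags=re.MULTILINE)

print(single)
print(multi.group(2))
```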

GeoEng51 · September 2020

@Will said:
I have done this very thing to all my 1400+ notes. I used to place the UID in my YAML footer as UID: 202009150732 and wanted to make them clickable links. I used a donationware tool called MassReplaceIt.

I used the regex options and set up my search and replace like this.

Search - (zid: )(\w*)
Replace - $1[[$2]]

@Will - sweet utility.

I was reading up on "regular expressions" after seeing your post, as my understanding of them is limited. There are many web sites that tackle that subject; one of the simpler or more direct ones is:

https://www.computerhope.com/jargon/r/regex.htm

I was wondering: what is the purpose of the \w in your example? Wouldn't * by itself refer to any sequence of characters? Or does the \w force it to be only certain types of characters?

zk_1000 · September 2020

@GeoEng51 you'd want to be as restrictive as possible when it comes to mass text replacement with regular expressions. The biggest weakness of regex is that they do not interpret context.

Personally I would have used ^(zid:\s+)(\d){10}.*$, assuming that a UID is formed with 10 digits, and has no empty space at the beginning of the line.

When you are beginning to learn the power of regex it is perfectly fine to use simpler expressions, such as (zid: )(.*)

Note, however, the danger of false positives that'd cause you a lot of headache.

zid: xxxxxxxxxx sometext
random text, suizid: more text
zid: xxxxxxxxxx

are all false positives for (zid: )(.*) (the last line ends with a trailing space). With the replacement $1[[$2]] they would result in:

zid: [[xxxxxxxxxx sometext]]
random text, suizid: [[more text]]
zid: [[xxxxxxxxxx ]]
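The false positives can be reproduced with a short Python sketch. The sample lines use a made-up 10-digit ID, and the strict pattern is only my assumption of what a line-anchored, digits-only alternative could look like for comparison.

```python
import re

# Made-up sample lines; the third one ends with a trailing space.
lines = [
    "zid: 2020091507 sometext",
    "random text, suizid: more text",
    "zid: 2020091507 ",
    "zid: 2020091507",
]

loose = re.compile(r"(zid: )(.*)")
# Assumed stricter alternative: anchored to the line, exactly 10 digits.
strict = re.compile(r"^(zid:\s+)(\d{10})\s*$")

loose_out = [loose.sub(r"\1[[\2]]", line) for line in lines]
strict_out = [strict.sub(r"\1[[\2]]", line) for line in lines]

for before, a, b in zip(lines, loose_out, strict_out):
    print(repr(before), "| loose:", repr(a), "| strict:", repr(b))
```

The loose pattern rewrites all four lines, while the strict one leaves the first two alone and drops the trailing space from the third.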

Will · September 2020

@GeoEng51 said:

I was wondering: what is the purpose of the \w in your example? Wouldn't * by itself refer to any sequence of characters? Or does the \w force it to be only certain types of characters?

The \w signifies any word character [a-zA-Z0-9_]. Maybe a more restrictive \d [0-9] would be more in order, but if you used a letter at some point a year ago when you were experimenting with the Luhmann ID system, \w would include those IDs in the change when \d would not.

The * is a quantifier. It can't be used "by itself". It just specifies zero or more of whatever precedes it (+ is the one that means one or more).
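A quick way to see the difference between \w and \d, and between * and +, is to try them in Python. The sample ID with a letter in it is made up. One caveat: in Python 3, \w and \d also match non-ASCII letters and digits unless `re.ASCII` is set, so I set that flag here to get the [a-zA-Z0-9_] and [0-9] meaning.

```python
import re

# Made-up Luhmann-style ID that contains a letter.
mixed_id = "2020a915"

# \w accepts letters, digits and underscore; \d accepts digits only.
word_match = re.fullmatch(r"\w+", mixed_id, flags=re.ASCII)
digit_match = re.fullmatch(r"\d+", mixed_id, flags=re.ASCII)

# * allows zero repetitions, so "zid: " with nothing after it still matches.
star_match = re.fullmatch(r"zid: \w*", "zid: ")
# + requires at least one character, so the same input does not match.
plus_match = re.fullmatch(r"zid: \w+", "zid: ")

print(word_match, digit_match, star_match, plus_match)
```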

@zk_1000 said:
@GeoEng51 you'd want to be as restrictive as possible when it comes to mass text replacement with regular expressions. The biggest weakness of regex is that they do not interpret context.

personally I would have used ^(zid:\s+)(\d){10}.*$, assuming that a UID is formed with 10 digits, and has no empty space at the beginning of the line.

When you are beginning to learn the power of regex it is perfectly fine to use simpler expressions, such as (zid: )(.*)
Note, however, the danger of false positives that'd cause you a lot of headaches.

Great points. I love it when you chime in on discussions as you add such fresh and well-thought perspectives. Many of your posts have me wanting to jump up and yell "Hell, yes!"

The more restrictive the regex, the safer it is, especially for beginners. Using the utility MassReplaceIt allows review and shows the number of changes along with what is actually going to change in a before and after window before committing the changes.

I have done this before on 1300+ files and, maybe because I'm an idiot, I had formatted the UID in my notes in no fewer than 6 different ways, each requiring its own slightly different regex to bring them all into a standard across my entire zettelkasten. I thought each of these formatting styles was the best at the time I implemented it. I felt that there was no need for a zettelkasten-wide standard. Boy, was I wrong! I probably saw someone on this forum with some tricky idea about how to form my UIDs and, like the blind sheep I can sometimes be, I went with it for a while till the next cool idea popped up, making a mess.

To make this clear, I have settled on ›[[202006180829]] as my 'self-referential' link format.

I vote to be a little more inclusive. Also, the utility I've recommended, MassReplaceIt, defines ^ as the beginning of the file and $ as the end of the file, not the line. Go figure! And the /m [multiline] global is already set.

In your regex, I think you want to include the {10} in the second capture group, otherwise the regex only selects 1 digit.
^(zid:\s+)(\d){10}.*$
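To see what the placement of {10} does to the capture group, here is a small Python check with a made-up 10-digit ID. In Python's `re` (and most regex engines) a repeated group keeps only its last repetition.

```python
import re

# Made-up sample line with a 10-digit ID.
line = "zid: 2020091507"

# {10} outside the group: the group is repeated and keeps only the last digit.
outside = re.match(r"^(zid:\s+)(\d){10}.*$", line)

# {10} inside the group: the group captures all ten digits.
inside = re.match(r"^(zid:\s+)(\d{10}).*$", line)

print(outside.group(2))
print(inside.group(2))
```

Both patterns match the line, but only the second one gives you the full ID to put back in the replacement.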

If you have a lot of notes, being too restrictive can cause as many headaches as being too loose.

zk_1000 · September 2020

Great points. I love it when you chime in on discussions as you add such fresh and well-thought perspectives.

Thanks for the kind words.

In your regex, I think you want to include the {10} in the second capture group otherwise the regex only selects 1 digit.

Shame on me, it was only half tested.

Here's the new version:

^(zid:\s+)(\d{10})\s*$

This will match only when the line starts with zid: followed by at least one whitespace character and exactly 10 digits. The line may or may not end with additional whitespace (common habit of mine).

It's a good convention to post the most restrictive, sane solution first. I would then handle edge cases separately or loosen the constraints as needed. In this direction you can always make progress; in the other one you have to fix your mess.
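One way to work in that direction is to convert with the strict pattern and collect the lines it skipped for later review. A rough Python sketch follows; the function name and the `HINT` and `DONE` patterns are my own assumptions about what a leftover ID line might look like, not something from this thread.

```python
import re

STRICT = re.compile(r"^(zid:\s+)(\d{10})\s*$")
# Hypothetical filter for lines that look like ID lines in some other format.
HINT = re.compile(r"\b(?:zid|uid)\b", re.IGNORECASE)
# Lines already in the target format are not reported again.
DONE = re.compile(r"^zid: \[\[\d{10}\]\]\s*$")


def convert_and_report(text):
    """Apply the strict pattern line by line; return (new_text, leftovers).

    leftovers holds (line_number, line) pairs that mention zid/uid but were
    not converted, so they can be handled as separate edge cases.
    Note: a trailing newline at the end of text is not preserved.
    """
    converted, leftovers = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        new_line, count = STRICT.subn(r"\1[[\2]]", line)
        converted.append(new_line)
        if count == 0 and HINT.search(line) and not DONE.match(line):
            leftovers.append((number, line))
    return "\n".join(converted), leftovers
```

Running it over a note gives the safely converted text plus a short list of lines, such as `UID - 2020091507`, that need their own pattern.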

Zettelkasten Forum
