# [OT] Markdown: split a file along headers?

I know this is not actually related to most of what we discuss here, but I figure there's probably quite a few people who use markdown extensively here. And I did put it in Random.

I have a markdown file that has a bunch of 2nd level headers. I would like to have a bunch of smaller markdown files, each of which consists of one of those headers and the text under it. Is there any reasonable way to do this easily? I'm experimenting with the differences between writing a large document (a novel) as a single markdown document and as a series of scenes, because there are a number of ways to combine smaller files.

I keep thinking there must be a pandoc invocation for this, but I can't find it.

Any suggestions? Command-line is fine, writing python code is not.

Thanks!

• Depends on what you deem reasonable

This works for some files in Ruby.

#!/usr/bin/env ruby

require "strscan"

OUT_FILENAME = "Output_"

DOC = <<EOF
# The doc

Hello

##Ignored

Text

## Not Ignored

more text
## ignoring the heading without space

then more text

C'est ça
EOF

scanner = StringScanner.new DOC
sections = []

# Return the next blank-line-terminated block, or the rest of the doc if no blank line remains.
def find_block(scanner)
  block = scanner.scan_until(/\n\n/)
  if block.nil?
    block = scanner.rest
    scanner.terminate
  end
  block
end

# True when the upcoming block begins with a 2nd level header ("## " with the space).
def will_start_section(scanner)
  scanner.peek(3) == '## '
end

current_section_buffer = ""

while !scanner.eos?
  if will_start_section(scanner) && !current_section_buffer.empty?
    sections << current_section_buffer
    current_section_buffer = ""
  end

  block = find_block(scanner)
  current_section_buffer += block
end

# Append remainder when at end of the doc
if !current_section_buffer.empty?
  sections << current_section_buffer
end

# Write out to disk:

# sections.each_with_index do |section, index|
#   filename = OUT_FILENAME + index.to_s + ".txt"
#   File.open(filename, "w") do |file|
#     file.puts section
#   end
# end



• I don't have a solution but FWIW I usually write these kinds of things in Scrivener -- it supports markdown and you can write your chapters/scenes/whatever in different files ("scrivenings"), which you compile into a whole document at the end. Not software agnostic though....

• @mediapathic said:
I keep thinking there must be a pandoc invocation for this, but I can't find it.

I don't think you can do this with just the Pandoc command line interface. The closest option you may be thinking of is --shift-heading-level-by=NUMBER.

StackOverflow says to try csplit.

• Looks like csplit did the trick. For the record, if anyone else comes looking for this, the csplit that ships with OS X has a slightly different syntax from any of the examples I found. So, the correct solution is, first:

brew install coreutils if you haven't already to get "gcsplit", which has the right syntax, and then

gcsplit --prefix='novelname' --suffix-format='%03d.md' novel-file.md /##/ "{*}"
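
For anyone without coreutils handy, roughly the same split can be sketched in Ruby. This is only a minimal sketch, not a drop-in replacement for gcsplit: the helper name split_on_h2, the sample text, and the output file names are all made up for illustration, and it assumes a header is any line that starts with "## " (stricter than the /##/ pattern above, which matches any line containing ##).

```ruby
# Hypothetical helper: split markdown text into chunks, starting a new
# chunk at every line that begins with "## " (a 2nd level header).
# Anything before the first header becomes its own leading chunk.
def split_on_h2(text)
  text.lines.slice_before { |line| line.start_with?("## ") }.map(&:join)
end

# Made-up sample input, standing in for the novel file.
sample = "# Title\n\nIntro\n\n## One\n\ntext\n\n## Two\n\nmore\n"
chunks = split_on_h2(sample)

# Writing to disk, loosely mirroring the gcsplit naming (assumed file names):
# chunks.each_with_index do |chunk, i|
#   File.write(format("novelname%03d.md", i), chunk)
# end
```

To run it on a real file, replace the sample string with File.read("novel-file.md") and uncomment the write loop.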

• @mediapathic glad you found gcsplit. I've archived this as I am working on a book project and working with pandoc a bunch. Love pandoc, it can do incredible conversions. When I first looked at your question, I thought of regex. Here is as far as I got:

##(.*(?:\n(?!##).*)+)

The above will split a file into "Capture Groups". The next part, which I didn't yet get to, would be using maybe Keyboard Maestro to copy each capture group to its own file.
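
As a rough illustration of the capture-group idea, here is a minimal Ruby sketch that runs the same regex through String#scan. The sample text and variable names are made up for illustration. Two things to keep in mind: each capture drops the leading ##, and as written the pattern only matches a header that is followed by at least one more line.

```ruby
# The regex from the post, unchanged.
section_re = /##(.*(?:\n(?!##).*)+)/

# Made-up sample input with two 2nd level headers.
sample = "# Title\n\n## One\ntext one\n\n## Two\ntext two\n"

# With a single capture group, scan returns one-element arrays,
# so take the first element of each to get the section text.
sections = sample.scan(section_re).map(&:first)

sections.each_with_index do |section, i|
  puts "--- section #{i} ---"
  puts section
end
```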

@JustinW80's solution is far more elegant.

