Generating Table of Contents from HTML in C#
One Class to Cover all Requirements of Generating Your TOC
In-page links now matter for SEO, so I wanted to generate a table of contents (TOC) automatically for my blog posts. I first searched Stack Overflow for a solution, but I could not find a good, working library. So I wrote my own, and here is the complete code, which I use on all pages of this website.
First Problem: Generating In-Page Links
The first problem is that the headers are not necessarily hierarchical. I can start with an h2 header, then add an h1 header, and even put an h4 header inside that h1. Yes, this is not the right way to structure a page, but it is still valid HTML.
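To make this concrete, here is a minimal sketch of the rule I rely on: each header becomes a child of the last earlier header with a smaller level, and a root if there is none. The names OutlineNode and OutlineBuilder are made up for this illustration and are not part of the class shown later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types, used only to illustrate the parent lookup.
public class OutlineNode
{
    public int Level;
    public string Text;
    public List<OutlineNode> Children = new List<OutlineNode>();
}

public static class OutlineBuilder
{
    // Turns a flat, possibly non-hierarchical list of headers into a tree.
    public static List<OutlineNode> Build(IEnumerable<(int Level, string Text)> headings)
    {
        var roots = new List<OutlineNode>();
        var seen = new List<OutlineNode>();   // every header so far, in document order

        foreach (var (level, text) in headings)
        {
            var node = new OutlineNode { Level = level, Text = text };

            // The parent is the most recent header with a smaller level.
            OutlineNode parent = seen.FindLast(o => o.Level < level);

            if (parent == null) roots.Add(node);   // nothing above it: top-level entry
            else parent.Children.Add(node);

            seen.Add(node);
        }
        return roots;
    }
}
```

With the headers h2 "Intro", h1 "Main", h4 "Detail", h2 "Next", this gives two roots, "Intro" and "Main", and both "Detail" and "Next" end up as children of "Main".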
Second Problem: Generating IDs Associated with Our Links
OK, so we have generated our in-page links, but what will they point to? The library also has to add matching IDs to the article headers, and it takes care of this as well. I used HtmlAgilityPack to parse the HTML for better performance.
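One thing to watch for: a link can only point to one element, so the IDs should be unique. The class in this post does not check for duplicates, so two headers with the same level and text would get the same ID. Below is a small sketch of one way to handle that. UniqueIdAllocator is a hypothetical helper of my own naming, and the "-2", "-3" suffix scheme is just an assumption for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the TOCGenerator class in this post.
public class UniqueIdAllocator
{
    private readonly HashSet<string> _used = new HashSet<string>(StringComparer.Ordinal);

    // Returns baseId on first use, then baseId-2, baseId-3, ... on repeats.
    public string Allocate(string baseId)
    {
        string candidate = baseId;
        int suffix = 1;

        // HashSet.Add returns false when the value is already present.
        while (!_used.Add(candidate))
        {
            suffix++;
            candidate = baseId + "-" + suffix;
        }
        return candidate;
    }
}
```

Calling Allocate("H2_setup") three times on the same instance returns "H2_setup", "H2_setup-2" and "H2_setup-3".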
Enough with the problems, here is your code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;
namespace HolyOne.Web.Services
{
public class TOCGenerator
{
public class TOCNode
{
public List<TOCNode> Children { get; set; } = new List<TOCNode>();
public TOCNode Parent { get; set; }
public int Level { get; set; }
public string Text { get; set; }
public string TargetElementId { get; set; }
public override string ToString()
{
return $"H{Level} | {Text}";
}
}
public string SourceHtmlCode { get; set; }
public string AnchoredHtmlCode { get; private set; }
public List<TOCNode> Tree { get; private set; } = new List<TOCNode>();
private string ProcessNode(TOCNode n, int index)
{
StringBuilder sb = new StringBuilder();
sb.AppendLine(@"<li>");
sb.AppendLine(@"<a href=""#" + n.TargetElementId + @""">");
sb.AppendLine(n.Text);
sb.AppendLine("</a>");
int childIndex = 0;
if (n.Children.Any())
{
sb.AppendLine("<ul>");
foreach (TOCNode item in n.Children)
{
childIndex++;
string ln = ProcessNode(item, childIndex);
sb.AppendLine(ln);
}
sb.AppendLine("</ul>");
}
sb.AppendLine(@"</li>");
return sb.ToString();
}
public string getTOCHtmlCode()
{
StringBuilder sb = new StringBuilder();
sb.AppendLine(@"<div class=""toc"">");
sb.AppendLine(@"<ul>");
int childIndex = 0;
foreach (TOCNode item in Tree)
{
childIndex++;
string ln = ProcessNode(item, childIndex);
sb.AppendLine(ln);
}
sb.AppendLine(@"</ul>");
sb.AppendLine(@"</div>");
return sb.ToString();
}
readonly static char[] turChars = { 'Ğ', 'ğ', 'Ü', 'ü', 'Ş', 'ş', 'İ', 'ı', 'Ö', 'ö', 'Ç', 'ç' };
readonly static char[] engChars = { 'G', 'g', 'U', 'u', 'S', 's', 'I', 'i', 'O', 'o', 'C', 'c' };
//*1000 => 2ms
private string GenerateSlug(string str, int maxlen = 50)
{
if (str == null) return "";
StringBuilder sb = new StringBuilder();
bool wasHyphen = true;
int MaxCnt = str.Length > maxlen ? maxlen : str.Length;
for (int i = 0; i < MaxCnt; i++)
{
char c = str[i];
bool wastr = false;
if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '-')
{
sb.Append(c);
wasHyphen = false;
}
else if (char.IsWhiteSpace(c) && !wasHyphen)
{
sb.Append('-');
wasHyphen = true;
}
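else
{
// Transliterate Turkish letters; any other character becomes a single hyphen.
for (int j = 0; j < turChars.Length; j++)
{
if (c == turChars[j])
{
sb.Append(engChars[j]);
wastr = true;
wasHyphen = false;
break;
}
}
if (!wastr && !wasHyphen)
{
sb.Append('-');
wasHyphen = true;
}
}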
}
// Avoid trailing hyphens
if (wasHyphen && sb.Length > 0)
sb.Length--;
str = sb.ToString();
return str.ToLowerInvariant();
}
public void GenerateTOC()
{
var doc = new HtmlDocument();
doc.LoadHtml(SourceHtmlCode);
const string xpath = "//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]";
TOCNode parent = null;
List<TOCNode> allNodes = new List<TOCNode>();
Tree.Clear();
HtmlNodeCollection DocNodes = doc.DocumentNode.SelectNodes(xpath);
if (DocNodes != null) // SelectNodes returns null when no header matches
foreach (var node in DocNodes)
{
int level = 0;
int.TryParse(node.Name.TrimStart(new char[] { 'h', 'H' }), out level);
if (level > 6 || level < 1) level = 0;
parent = allNodes.FindLast(o => o.Level < level);
if (String.IsNullOrWhiteSpace(node.InnerText)) continue; // ignore whitespace headings
TOCNode n = new TOCNode()
{
Text = node.InnerText,
Level = level,
Parent = parent,
};
n.TargetElementId = $"H{n.Level}_{this.GenerateSlug(n.Text)}";
node.Id = n.TargetElementId;
allNodes.Add(n);
if (parent == null) Tree.Add(n);
else
{
parent.Children.Add(n);
}
}
AnchoredHtmlCode = doc.DocumentNode.InnerHtml;
}
}
}
Adding IDs to Headers
The IDs are added to the headers inside GenerateTOC(), in the same loop that builds the TOC tree, for best performance. The modified HTML is then available in the AnchoredHtmlCode property, while getTOCHtmlCode() returns the markup of the TOC itself.
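If you only want to see the ID part in isolation, here is a stripped-down sketch of the same idea with HtmlAgilityPack: walk the headers once, set an id on each, and read the modified HTML back. The HeadingAnchors class and the sequential "sec-N" IDs are assumptions for this example only; the class above builds its IDs from the header level and a slug instead.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not the TOCGenerator class itself.
public static class HeadingAnchors
{
    // Gives every h1..h6 a sequential id ("sec-1", "sec-2", ...) in one pass
    // and returns the modified HTML.
    public static string AddIds(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var headings = doc.DocumentNode.SelectNodes(
            "//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]");
        if (headings == null) return html;   // SelectNodes returns null when nothing matches

        int counter = 0;
        foreach (HtmlNode heading in headings)
        {
            counter++;
            heading.Id = "sec-" + counter;   // sets the element's id attribute
        }
        return doc.DocumentNode.InnerHtml;
    }
}
```

Because the IDs are written while the headers are being visited anyway, the document is parsed and traversed only once.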
Third Problem: Presenting the TOC as Well-Formed HTML
OK, so our code generates a tree of the sections in our HTML document. I wanted the output to fit all kinds of designs, so I did not add indentation or level numbers to the generated markup. Instead, I left that job to my CSS code. Here is the CSS code for our TOC div.
Here is your CSS code
.tocframe {
width:100%;
font-size: 0.85em;
background-color: #fff;
border: 1px inset gray;
display: inline-block;
padding-right: 10px;
padding-top: 6px;
margin-bottom: 14px;
}
.toc a {
text-decoration: none;
}
.toc a:hover {
text-decoration: underline;
}
.toc ul {
list-style-type: none;
margin-left: 10px;
counter-reset: css-counters 0; /* initializes counter, set -1 for zero-based counters */
}
.toc ul li:before {
font-weight: 600;
counter-increment: css-counters;
content: counters(css-counters, ".") " "; /* generates inherited counters from parents */
}
How to Use It with Your Article
Here is how I use it in a Razor view.
@{
TOCGenerator tg = new TOCGenerator();
tg.SourceHtmlCode = @"...some html code here...";
tg.GenerateTOC();
}
@Html.Raw(tg.getTOCHtmlCode())
@Html.Raw(tg.AnchoredHtmlCode)
I hope it helps somebody.