我不确定我将如何攻击我的霍夫曼树的穿越.树是正确的,我只是很难确定如何以一种好的方式遍历它.出于某种原因,我的遍历方法没有结果......
更新:清理代码,使其更加面向对象
节点类:
public class Node { public int frekvens; //Frequency public char tegn; //Symbol public Node venstre; //Left child public Node høyre; //Right child public string s; //result string public string resultat; public Node (char c) // Node constructor containing symbol. { frekvens = 1; tegn = c; } public Node (int f, Node venstre, Node høyre) // Node Constructor containing frequency and children { frekvens = f; this.venstre = venstre; this.høyre = høyre; } public Node (Node node) // Node constructor containing a node { frekvens = node.frekvens; tegn = node.tegn; this.venstre = venstre; this.høyre = høyre; } public void ØkMed1() // Inkrement frequency by one { frekvens = frekvens + 1; } public char getVenstreTegn () { return venstre.tegn; } public char getHøyreTegn () { return venstre.tegn; } public int getVenstreFrekvens () { return venstre.frekvens; } public int getHøyreFrekvens () { return høyre.frekvens; } public int getFrekvens() { return frekvens; } public bool ErTegn(char c) { if ( c == tegn) { return false; } else { return true; } } //Pretty sure this does not work as intended public string traverser (Node n) //Traverse the tree { if (n.tegn != '\0') //If the node containes a symbol --> a leaf { resultat += s; } else { if (n.getVenstreTegn() == '\0') //If left child does not have a symbol { s += "0"; traverser(n.venstre); } if (n.getHøyreTegn() == '\0') //If right child does not have a symbol { s += "1"; traverser(n.høyre); } } return resultat; } public string Resultat() //Used priviously to check if i got the correct huffman tree { string resultat; resultat = "Tegn: " + Convert.ToString(tegn) +" frekvens: " + Convert.ToString(frekvens) + "\n"; return resultat; } }
Huffman_Tree课程:
public class Huffman_Tre { string treString; Listnoder = new List (); public Node rot; public void bygg (string input) { bool funnet; //Found char karakter; //character for (int i = 0; i < input.Length;i++) //Loops through string and sets character //with coresponding freqeuncy in the node list { karakter = input[i]; funnet = false; //default for (int j = 0; j< noder.Count; j++) { if (noder[j].ErTegn(karakter) == false) //if the character already exists { noder[j].ØkMed1(); //inkrement frequency by one funnet = true; break; } } if (!funnet) //if the character does not exist { noder.Add(new Node(karakter)); //add the character to list } } //Sorting node list acending by frequency var sortertListe = noder.OrderBy(c => c.frekvens).ToList(); noder = sortertListe; do { noder.Add(new Node((noder[0].frekvens + noder[1].frekvens), noder[0],noder[1])); //Remove the leaf nodes noder.RemoveAt(0); noder.RemoveAt(0); } while(noder.Count >= 2); } public Node getRot() { return rot; } public string visTre() { foreach (Node node in noder) { treString += node.Resultat(); } return treString; } public bool erNull() { if (noder[0].tegn == '\0') { return true; } else return false; } }
主程序:
private void btnKomprimer_Click(object sender, System.Windows.RoutedEventArgs e) { string input; //The string input I want to compress input = txtInput.Text; //initialize input to text input input = input.ToLower(); txtOutput.Text = ""; Huffman_Tre tre = new Huffman_Tre(); tre.bygg(input); Node rot = new Node(tre.getRot()); txtOutput.Text += rot.traverser(rot); } }
RvdV79.. 9
由于我还有一点时间,我在玩C#6.0的同时制作了一张霍夫曼树的例子.它没有被优化(甚至没有优化!),但它作为一个例子工作得很好.它将帮助您了解可能出现的"挑战".因为我的英语远比我的斯堪的纳维亚知识好,所以我使用英文命名,我希望你不介意.
首先,让我们从保持频率的类开始.
public sealed class HuffmanFrequencyTable { #region Properties ////// Holds the characters and their corresponding frequencies /// public DictionaryFrequencyTable { get; set; } = new Dictionary (); #endregion #region Methods /// /// Clears the internal frequency table /// public void Clear() { FrequencyTable?.Clear(); } ////// Accepts and parses a new line (string) which is then /// merged with the existing dictionary or frequency table /// /// The line to parse public void Accept(string line) { if (!string.IsNullOrEmpty(line)) { line.GroupBy(ch => ch). ToDictionary(g => g.Key, g => g.Count()). ToList(). ForEach(x => FrequencyTable[x.Key] = x.Value); } } ////// Performs a dump of the frequency table, ordering all characters, lowest frequency first. /// ///The frequency table in the format 'character [frequency]' public override string ToString() { return FrequencyTable?.PrintFrequencies(); } #endregion }
请注意,ToString()方法使用一种扩展方法,该方法能够"转储"所使用的Dictionary的内容.扩展位于名为Helpers的静态类中,如下所示:
////// Extension method that helps to write the contents of a generic Dictionary to a string, ordered by it's values and /// printing the key and it's value between brackets. /// ///Generic key ///Generic value type /// The dictionary ///Throws an argument null exception if the provided dictionary is null ///public static string PrintFrequencies (this IDictionary dictionary) { if (dictionary == null) throw new ArgumentNullException("dictionary"); var items = from kvp in dictionary orderby kvp.Value select kvp.Key + " [" + kvp.Value + "]"; return string.Join(Environment.NewLine, items); }
现在,有了这个FrequencyTable,我们就可以开始研究如何构建节点了.Huffman使用二叉树,因此最好生成一个具有左右子节点的Node类.我也冒昧地在这里执行遍历算法.该类构建如下:
public sealed class HuffmanNode { #region Properties ////// Holds the left node, if applicable, otherwise null /// public HuffmanNode Left { get; set; } = null; ////// Holds the right node, if applicable, otherwise null /// public HuffmanNode Right { get; set; } = null; ////// Holds the Character (or null) for this particular node /// public char? Character { get; set; } = null; ////// Holds the frequency for this particular node, defaulted to 0 /// public int Frequency { get; set; } = default(int); #endregion #region Methods ////// Traverses all nodes recursively returning the binary /// path for the corresponding character that has been found. /// /// The character to find /// The datapath (containing '1's and '0's) ///The complete binary path for a character within a node public ListTraverse(char? character, List data) { //Check the leafs for existing characters if (null == Left && null == Right) { //We're at an endpoint of our 'tree', so return it's data or nothing when the symbol //characters do not match return (bool)character?.Equals(Character) ? data : null; } else { List left = null; List right = null; //TODO: If possible refactor with proper C# 6.0 features if (null != Left) { List leftPath = new List (data); leftPath.Add(false); //Add a '0' left = Left.Traverse(character, leftPath); //Recursive traversal for child nodes within this left node. } if (null != Right) { List rightPath = new List (data); rightPath.Add(true); //Add a '1' right = Right.Traverse(character, rightPath); //Recursive traversal for childnodes within this right node } return (null != left) ? left : right; } } #endregion }
我在HuffmanTree类中使用Node类.从逻辑上讲,树是从节点构建的.相应的HuffmanTree是这样编写的:
public sealed class HuffmanTree { #region Fields ////// Field for keeping the Huffman nodes in. Internally used. /// private Listnodes = new List (); #endregion #region Properties /// /// Holds the Huffman tree /// public HuffmanNode Root { get; set; } = null; ////// Holds the frequency table for all parsed characters /// public HuffmanFrequencyTable Frequencies { get; private set; } = new HuffmanFrequencyTable() ////// Holds the amount of bits after encoding the tree. /// Primary usable for decoding. /// public int BitCountForTree { get; private set; } = default(int); #endregion #region Methods ////// Builds the Huffman tree /// /// The source to build the Hufftree from ///Thrown when source is null or empty public void BuildTree(string source) { nodes.Clear(); //As we build a new tree, first make sure it's clean :) if (string.IsNullOrEmpty(source)) throw new ArgumentNullException("source"); else { Frequencies.Accept(source); foreach (KeyValuePairsymbol in Frequencies.FrequencyTable) { nodes.Add(new HuffmanNode() { Character = symbol.Key, Frequency = symbol.Value }); } while (nodes.Count > 1) { List orderedNodes = nodes.OrderBy(node => node.Frequency).ToList(); if (orderedNodes.Count >= 2) { List takenNodes = orderedNodes.Take(2).ToList(); HuffmanNode parent = new HuffmanNode() { Character = null, Frequency = takenNodes[0].Frequency + takenNodes[1].Frequency, Left = takenNodes[0], Right = takenNodes[1] }; //Remove the childnodes from the original node list and add the new parent node nodes.Remove(takenNodes[0]); nodes.Remove(takenNodes[1]); nodes.Add(parent); } } Root = nodes.FirstOrDefault(); } } /// /// Encodes a given string to the corresponding huffman encoding path /// /// The source to encode ///The binary huffman representation of the source public BitArray Encode(string source) { if (!string.IsNullOrEmpty(source)) { ListencodedSource = new List (); //Traverse the tree for each character in the passed source (string) and add the binary path to the encoded source encodedSource.AddRange(source.SelectMany(character => Root.Traverse(character, new List ()) ).ToList() ); //For decoding, we might need the amount of bits to skip trailing bits. BitCountForTree = encodedSource.Count; return new BitArray(encodedSource.ToArray()); } else return null; } /// /// Decodes a given binary path to represent it's string value /// /// BitArray for traversing the tree ///public string Decode(BitArray bits) { HuffmanNode current = Root; string decodedString = string.Empty; foreach (bool bit in bits) { //Find the correct current node depending on the bit set or not set. current = (bit ? current.Right ?? current : current.Left ?? current); if (current.IsLeaf()) { decodedString += current.Character; current = Root; } } return decodedString; } #endregion }
这段代码中有趣的是,我决定使用BitArrays
它来保存树的二进制路径.public BitArray Encode(string source)
这里的方法包含脏黑客.我跟踪用于编码的总位数并将其存储在BitCountForTree
属性中.执行解码时,我将使用此属性删除可能出现的任何尾随位.有一种更好的方式来执行此操作,但我会留下让您了解的方法.
此外,该类还使用为HuffmanNode编写的扩展方法.虽然这很简单:
////// Determines whether a given Huffman node is a leaf or not. /// A node is considered to be a leaf when it has no childnodes /// /// A huffman node ///True if no children are left, false otherwise public static bool IsLeaf(this HuffmanNode node) { return (null == node.Left && null == node.Right); }
该扩展方法便于确定给定节点是否实际上是叶子节点.叶子是没有子节点的节点,因此是二叉树的末尾(或者更好的是该树的分支).
现在是有趣的部分,我如何让事情在这里工作.我已经构建了一个包含3个文本框的Windows窗体应用程序.一个用于实际输入,一个用于二进制(编码)输出,另一个用于显示压缩结果.我还放置了两个简单的按钮,一个用于执行霍夫曼编码,另一个用于霍夫曼解码.
霍夫曼编码方法编写如下(仅在编码按钮的事件处理程序中):
string input = tbInput.Text; Tree.BuildTree(input); //Build the huffman tree BitArray encoded = Tree.Encode(input); //Encode the tree //First show the generated binary output tbBinaryOutput.Text = string.Join(string.Empty, encoded.Cast().Select(bit => bit ? "1" : "0")); //Next, convert the binary output to the new characterized output string. byte[] bytes = new byte[(encoded.Length / 8) + 1]; encoded.CopyTo(bytes, 0); tbOutput.Text = Encoding.Default.GetString(bytes); //Write the compressed output to the textbox.
请注意,编码的二进制字符串没有任何尾随位.我将把它留给C#的编码机制.这样做的缺点是,我必须在解码时跟踪它.
解码现在也不太难.虽然,对于这个例子,我正在利用上面放置的编码代码生成的压缩输出.此外,我假设已经建立了霍夫曼树(它的频率表!!!).通常,频率表存储在压缩文件中,因此可以重建.
//First convert the compressed output to a bit array again again and skip trailing bits. bool[] boolAr = new BitArray(Encoding.Default.GetBytes(tbOutput.Text)).Cast().Take(Tree.BitCountForTree).ToArray(); BitArray encoded = new BitArray( boolAr ); string decoded = Tree.Decode(encoded); MessageBox.Show(decoded, "Decoded result: ", MessageBoxButtons.OK, MessageBoxIcon.Information);
请注意我创建的脏黑客,因为Encoding.Default.GetBytes(tbOutput.Text)
肯定会生成一个字节数组,它可能包含不需要解码的尾随位.因此,我只根据重建树获取实际需要的位数.
所以在运行时,我的例子提供了以下输出,当使用'世界名声'时,"快速的棕色狐狸跳过懒惰的程序员":
按"Huff encode"按钮后:
按下"Huff decode"按钮后:
现在这段代码可以真正使用一些优化,因为您可能会考虑使用Arrays而不是Dictionaries.还有更多,但这取决于你的考虑.
由于我还有一点时间,我在玩C#6.0的同时制作了一张霍夫曼树的例子.它没有被优化(甚至没有优化!),但它作为一个例子工作得很好.它将帮助您了解可能出现的"挑战".因为我的英语远比我的斯堪的纳维亚知识好,所以我使用英文命名,我希望你不介意.
首先,让我们从保持频率的类开始.
public sealed class HuffmanFrequencyTable { #region Properties /// <summary> /// Holds the characters and their corresponding frequencies /// </summary> public Dictionary<char, int> FrequencyTable { get; set; } = new Dictionary<char, int>(); #endregion #region Methods /// <summary> /// Clears the internal frequency table /// </summary> public void Clear() { FrequencyTable?.Clear(); } /// <summary> /// Accepts and parses a new line (string) which is then /// merged with the existing dictionary or frequency table /// </summary> /// <param name="line">The line to parse</param> public void Accept(string line) { if (!string.IsNullOrEmpty(line)) { line.GroupBy(ch => ch). ToDictionary(g => g.Key, g => g.Count()). ToList(). ForEach(x => FrequencyTable[x.Key] = x.Value); } } /// <summary> /// Performs a dump of the frequency table, ordering all characters, lowest frequency first. /// </summary> /// <returns>The frequency table in the format 'character [frequency]'</returns> public override string ToString() { return FrequencyTable?.PrintFrequencies(); } #endregion }
请注意,ToString()方法使用一种扩展方法,该方法能够"转储"所使用的Dictionary的内容.扩展位于名为Helpers的静态类中,如下所示:
/// <summary> /// Extension method that helps to write the contents of a generic Dictionary to a string, ordered by it's values and /// printing the key and it's value between brackets. /// </summary> /// <typeparam name="TKey">Generic key</typeparam> /// <typeparam name="TValue">Generic value type</typeparam> /// <param name="dictionary">The dictionary</param> /// <exception cref="ArgumentNullException">Throws an argument null exception if the provided dictionary is null</exception> /// <returns></returns> public static string PrintFrequencies<TKey, TValue>(this IDictionary<TKey, TValue> dictionary) { if (dictionary == null) throw new ArgumentNullException("dictionary"); var items = from kvp in dictionary orderby kvp.Value select kvp.Key + " [" + kvp.Value + "]"; return string.Join(Environment.NewLine, items); }
现在,有了这个FrequencyTable,我们就可以开始研究如何构建节点了.Huffman使用二叉树,因此最好生成一个具有左右子节点的Node类.我也冒昧地在这里执行遍历算法.该类构建如下:
public sealed class HuffmanNode { #region Properties /// <summary> /// Holds the left node, if applicable, otherwise null /// </summary> public HuffmanNode Left { get; set; } = null; /// <summary> /// Holds the right node, if applicable, otherwise null /// </summary> public HuffmanNode Right { get; set; } = null; /// <summary> /// Holds the Character (or null) for this particular node /// </summary> public char? Character { get; set; } = null; /// <summary> /// Holds the frequency for this particular node, defaulted to 0 /// </summary> public int Frequency { get; set; } = default(int); #endregion #region Methods /// <summary> /// Traverses all nodes recursively returning the binary /// path for the corresponding character that has been found. /// </summary> /// <param name="character">The character to find</param> /// <param name="data">The datapath (containing '1's and '0's)</param> /// <returns>The complete binary path for a character within a node</returns> public List<bool> Traverse(char? character, List<bool> data) { //Check the leafs for existing characters if (null == Left && null == Right) { //We're at an endpoint of our 'tree', so return it's data or nothing when the symbol //characters do not match return (bool)character?.Equals(Character) ? data : null; } else { List<bool> left = null; List<bool> right = null; //TODO: If possible refactor with proper C# 6.0 features if (null != Left) { List<bool> leftPath = new List<bool>(data); leftPath.Add(false); //Add a '0' left = Left.Traverse(character, leftPath); //Recursive traversal for child nodes within this left node. } if (null != Right) { List<bool> rightPath = new List<bool>(data); rightPath.Add(true); //Add a '1' right = Right.Traverse(character, rightPath); //Recursive traversal for childnodes within this right node } return (null != left) ? left : right; } } #endregion }
我在HuffmanTree类中使用Node类.从逻辑上讲,树是从节点构建的.相应的HuffmanTree是这样编写的:
public sealed class HuffmanTree { #region Fields /// <summary> /// Field for keeping the Huffman nodes in. Internally used. /// </summary> private List<HuffmanNode> nodes = new List<HuffmanNode>(); #endregion #region Properties /// <summary> /// Holds the Huffman tree /// </summary> public HuffmanNode Root { get; set; } = null; /// <summary> /// Holds the frequency table for all parsed characters /// </summary> public HuffmanFrequencyTable Frequencies { get; private set; } = new HuffmanFrequencyTable() /// <summary> /// Holds the amount of bits after encoding the tree. /// Primary usable for decoding. /// </summary> public int BitCountForTree { get; private set; } = default(int); #endregion #region Methods /// <summary> /// Builds the Huffman tree /// </summary> /// <param name="source">The source to build the Hufftree from</param> /// <exception cref="ArgumentNullException">Thrown when source is null or empty</exception> public void BuildTree(string source) { nodes.Clear(); //As we build a new tree, first make sure it's clean :) if (string.IsNullOrEmpty(source)) throw new ArgumentNullException("source"); else { Frequencies.Accept(source); foreach (KeyValuePair<char, int> symbol in Frequencies.FrequencyTable) { nodes.Add(new HuffmanNode() { Character = symbol.Key, Frequency = symbol.Value }); } while (nodes.Count > 1) { List<HuffmanNode> orderedNodes = nodes.OrderBy(node => node.Frequency).ToList(); if (orderedNodes.Count >= 2) { List<HuffmanNode> takenNodes = orderedNodes.Take(2).ToList(); HuffmanNode parent = new HuffmanNode() { Character = null, Frequency = takenNodes[0].Frequency + takenNodes[1].Frequency, Left = takenNodes[0], Right = takenNodes[1] }; //Remove the childnodes from the original node list and add the new parent node nodes.Remove(takenNodes[0]); nodes.Remove(takenNodes[1]); nodes.Add(parent); } } Root = nodes.FirstOrDefault(); } } /// <summary> /// Encodes a given string to the corresponding huffman encoding path /// </summary> /// <param name="source">The source to encode</param> /// <returns>The binary huffman representation of the source</returns> public BitArray Encode(string source) { if (!string.IsNullOrEmpty(source)) { List<bool> encodedSource = new List<bool>(); //Traverse the tree for each character in the passed source (string) and add the binary path to the encoded source encodedSource.AddRange(source.SelectMany(character => Root.Traverse(character, new List<bool>()) ).ToList() ); //For decoding, we might need the amount of bits to skip trailing bits. BitCountForTree = encodedSource.Count; return new BitArray(encodedSource.ToArray()); } else return null; } /// <summary> /// Decodes a given binary path to represent it's string value /// </summary> /// <param name="bits">BitArray for traversing the tree</param> /// <returns></returns> public string Decode(BitArray bits) { HuffmanNode current = Root; string decodedString = string.Empty; foreach (bool bit in bits) { //Find the correct current node depending on the bit set or not set. current = (bit ? current.Right ?? current : current.Left ?? current); if (current.IsLeaf()) { decodedString += current.Character; current = Root; } } return decodedString; } #endregion }
这段代码中有趣的是,我决定使用BitArrays
它来保存树的二进制路径.public BitArray Encode(string source)
这里的方法包含脏黑客.我跟踪用于编码的总位数并将其存储在BitCountForTree
属性中.执行解码时,我将使用此属性删除可能出现的任何尾随位.有一种更好的方式来执行此操作,但我会留下让您了解的方法.
此外,该类还使用为HuffmanNode编写的扩展方法.虽然这很简单:
/// <summary> /// Determines whether a given Huffman node is a leaf or not. /// A node is considered to be a leaf when it has no childnodes /// </summary> /// <param name="node">A huffman node</param> /// <returns>True if no children are left, false otherwise</returns> public static bool IsLeaf(this HuffmanNode node) { return (null == node.Left && null == node.Right); }
该扩展方法便于确定给定节点是否实际上是叶子节点.叶子是没有子节点的节点,因此是二叉树的末尾(或者更好的是该树的分支).
现在是有趣的部分,我如何让事情在这里工作.我已经构建了一个包含3个文本框的Windows窗体应用程序.一个用于实际输入,一个用于二进制(编码)输出,另一个用于显示压缩结果.我还放置了两个简单的按钮,一个用于执行霍夫曼编码,另一个用于霍夫曼解码.
霍夫曼编码方法编写如下(仅在编码按钮的事件处理程序中):
string input = tbInput.Text; Tree.BuildTree(input); //Build the huffman tree BitArray encoded = Tree.Encode(input); //Encode the tree //First show the generated binary output tbBinaryOutput.Text = string.Join(string.Empty, encoded.Cast<bool>().Select(bit => bit ? "1" : "0")); //Next, convert the binary output to the new characterized output string. byte[] bytes = new byte[(encoded.Length / 8) + 1]; encoded.CopyTo(bytes, 0); tbOutput.Text = Encoding.Default.GetString(bytes); //Write the compressed output to the textbox.
请注意,编码的二进制字符串没有任何尾随位.我将把它留给C#的编码机制.这样做的缺点是,我必须在解码时跟踪它.
解码现在也不太难.虽然,对于这个例子,我正在利用上面放置的编码代码生成的压缩输出.此外,我假设已经建立了霍夫曼树(它的频率表!!!).通常,频率表存储在压缩文件中,因此可以重建.
//First convert the compressed output to a bit array again again and skip trailing bits. bool[] boolAr = new BitArray(Encoding.Default.GetBytes(tbOutput.Text)).Cast<bool>().Take(Tree.BitCountForTree).ToArray(); BitArray encoded = new BitArray( boolAr ); string decoded = Tree.Decode(encoded); MessageBox.Show(decoded, "Decoded result: ", MessageBoxButtons.OK, MessageBoxIcon.Information);
请注意我创建的脏黑客,因为Encoding.Default.GetBytes(tbOutput.Text)
肯定会生成一个字节数组,它可能包含不需要解码的尾随位.因此,我只根据重建树获取实际需要的位数.
所以在运行时,我的例子提供了以下输出,当使用'世界名声'时,"快速的棕色狐狸跳过懒惰的程序员":
按"Huff encode"按钮后:
按下"Huff decode"按钮后:
现在这段代码可以真正使用一些优化,因为您可能会考虑使用Arrays而不是Dictionaries.还有更多,但这取决于你的考虑.