
Pygments on PHP & WordPress

I’ve been on a long journey trying to find a great code highlighter, and I’ve used so many of them that I can’t even remember them all. These are the ones I can remember right now:

  • SyntaxHighlighter
  • Google Code Prettify
  • highlight.js
  • GeSHi

Right now I’m using highlight.js, but it isn’t exactly what I want. What I want is to be able to highlight most “words” or reserved words, such as built-in functions, objects, etc., which this highlighter and most of the others miss. I know it’s not an important thing, but it was stuck in my head, until now.

Finally, I’ve found Pygments, the one that matches what I’ve been looking for, and it’s the same one used by GitHub. The only obstacle right now is that it’s a Python-based syntax highlighter and I’m using WordPress, which is built on PHP.

Installation

But hey, we can get over it; there is a solution. First, we need to get Python installed on our server so we can use Pygments.

We aren’t going to go too deep into installation, because there are so many OS flavors out there and it could be slightly different on each one of them.

Python

First of all, check whether you already have Python installed by typing python on your command line.

If it is not installed, take a look at the Python Downloads page and download the installer for your OS.

PIP Installer

According to its site, there are two ways to install pip:

The first and recommended way is to download get-pip.py and run it on your command line:

python get-pip.py

The second way is to use a package manager, by running one of these two commands. As mentioned before, this depends on your server OS.

sudo apt-get install python-pip

Or:

sudo yum install python-pip

NOTE: you can use any package manager you prefer, such as easy_install. For the sake of example, and because it is the one used on the Pygments site, I used pip.

Pygments

To install Pygments, run this command:

pip install Pygments

If you are on a server where the user doesn’t have root access, you will be unable to install it with the previous command. In that case, run it with the --user flag to install the module in the user directory.

pip install --user Pygments

Everything is installed now, so what we’ve got to do is work with PHP and some Python code.

PHP + Python

The way it’s going to work is by executing a Python script from PHP using exec(), sending it the language name and the name of the file containing the code to be highlighted.

Python

The first thing we are going to do is create the Python script that converts plain code into highlighted code using Pygments.

So let’s go step by step through the Python script.

First we import all the required modules:

import sys
from pygments import highlight
from pygments.formatters import HtmlFormatter

The sys module provides the argv list, which contains all the arguments passed to the Python script.

highlight from pygments is the main function; together with a lexer it generates the highlighted code. You will read a bit more about lexers below.

HtmlFormatter determines how the generated code is formatted, and we are going to use HTML. The Pygments documentation has a list of available formatters in case you are wondering.

# Get the code
language = (sys.argv[1]).lower()
filename = sys.argv[2] 
f = open(filename, 'rb')
code = f.read()
f.close()

This block takes the second argument (sys.argv[1]) and transforms it to lowercase, just to make sure it is always lowercase, because "php" !== "PHP". The third argument, sys.argv[2], is the path of the file containing the code, so we open it, read its contents and close it. The first argument is the Python script’s name.
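The same argument handling could also be written with the standard argparse module, which prints a usage error on its own when an argument is missing. This is only a sketch: the function name parse_args and the sample path are assumptions, not part of the original script.

```python
import argparse

def parse_args(argv):
    """Parse the two positional arguments build.py expects.

    argv is the argument list without the script name, i.e. sys.argv[1:].
    """
    parser = argparse.ArgumentParser(description="Highlight a file with Pygments")
    # type=str.lower normalises "PHP" to "php" while parsing
    parser.add_argument("language", type=str.lower,
                        help="lexer name such as php, or 'guess'")
    parser.add_argument("filename",
                        help="path of the file holding the code")
    return parser.parse_args(argv)

args = parse_args(["PHP", "/tmp/snippet.txt"])
print(args.language, args.filename)  # php /tmp/snippet.txt
```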

# Importing Lexers
# PHP
if language == 'php':
  from pygments.lexers import PhpLexer
  lexer = PhpLexer(startinline=True)
# GUESS
elif language == 'guess':
  from pygments.lexers import guess_lexer
  lexer = guess_lexer( code )
# GET BY NAME
else:
  from pygments.lexers import get_lexer_by_name
  lexer = get_lexer_by_name( language )

Now it’s time to import the lexer. This block creates a lexer depending on the language we need to analyze. A lexer analyzes our code and picks out each reserved word, symbol, built-in function, and so forth.

In this case, after the lexer analyzes the code, the formatter turns it into HTML, wrapping each “word” in an HTML element with a class. By the way, the class names are not descriptive at all, so a function does not get the class “function”, but this is not something to worry about right now.

The variable language contains the name of the language we want to highlight. We use lexer = get_lexer_by_name( language ) to get any lexer by its name; the function is self-explanatory. But why do we check for php and guess first, you may ask? We check for php because if we use get_lexer_by_name('php') and the PHP code does not have the opening PHP tag, the code is not highlighted well, or at least not as we expect. So we create the specific PHP lexer like this: lexer = PhpLexer(startinline=True), passing startinline=True as a parameter, so the opening PHP tag is not required anymore. guess is a string we pass from PHP to let Pygments know that we don’t know which language it is, or that the language was not provided, and we need it to be guessed.
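In the script as written, an unknown language name makes get_lexer_by_name raise an error, so the script fails and the code stays unhighlighted. One alternative, sketched here, is to fall back to the plain text lexer inside Python. The function name choose_lexer is hypothetical; ClassNotFound and TextLexer are part of Pygments.

```python
def choose_lexer(language, code):
    """Return a Pygments lexer for `language`, falling back to plain text."""
    # Imports are kept local, in the same spirit as build.py.
    from pygments.lexers import PhpLexer, TextLexer, get_lexer_by_name, guess_lexer
    from pygments.util import ClassNotFound

    if language == "php":
        # startinline=True: no opening PHP tag required
        return PhpLexer(startinline=True)
    try:
        if language == "guess":
            return guess_lexer(code)
        return get_lexer_by_name(language)
    except ClassNotFound:
        # Unknown name, or nothing could be guessed: leave the code as plain text.
        return TextLexer()
```

With this in place, build.py could call lexer = choose_lexer(language, code) instead of the if/elif/else block.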

There is a list of available lexers on the Pygments site.

The final step in Python is creating the HTML formatter, performing the highlighting and outputting the HTML containing the highlighted code.

formatter = HtmlFormatter(linenos=False, encoding='utf-8', nowrap=True)
highlighted = highlight(code, lexer, formatter)
print highlighted

The formatter is given linenos=False so that it does not generate line numbers, and nowrap=True so that it does not wrap the generated code in any container element. This is a personal decision; the code will be wrapped using PHP.

Next, highlight is given code, containing the actual code, lexer, containing the language lexer, and the formatter we just created in the line above, which tells highlight how we want our code formatted.

Finally, the code is output.
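Note that print highlighted is Python 2 syntax. Under Python 3, and with encoding='utf-8' set on the formatter, highlight() returns bytes rather than text, so the output step needs a small change. A minimal sketch, assuming Python 3; emit is a hypothetical helper name:

```python
import sys

def emit(highlighted, stream=None):
    """Write the result of highlight() to a text stream such as sys.stdout."""
    stream = sys.stdout if stream is None else stream
    if isinstance(highlighted, bytes):
        # The formatter had an encoding, so we got bytes:
        # send them to the underlying binary buffer unchanged.
        stream.flush()
        stream.buffer.write(highlighted)
        stream.buffer.flush()
    else:
        # No encoding on the formatter: highlight() returned text.
        stream.write(highlighted)
        stream.flush()

# In build.py the last line would then become:
#   emit(highlight(code, lexer, formatter))
```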

That’s about it for Python; that’s the script that is going to build the highlighted code.

Here is the complete file: build.py

import sys
from pygments import highlight
from pygments.formatters import HtmlFormatter
# If there aren't exactly 2 args something weird is going on
expecting = 2
if len(sys.argv) != expecting + 1:
  sys.exit(128)
# Get the code
language = (sys.argv[1]).lower()
filename = sys.argv[2] 
f = open(filename, 'rb')
code = f.read()
f.close()
# PHP
if language == 'php':
  from pygments.lexers import PhpLexer
  lexer = PhpLexer(startinline=True)
# GUESS
elif language == 'guess':
  from pygments.lexers import guess_lexer
  lexer = guess_lexer( code )
# GET BY NAME
else:
  from pygments.lexers import get_lexer_by_name
  lexer = get_lexer_by_name( language )
# OUTPUT
formatter = HtmlFormatter(linenos=False, encoding='utf-8', nowrap=True)
highlighted = highlight(code, lexer, formatter)
print highlighted

PHP – WordPress

Let’s jump to WordPress and create a basic plugin to handle the code that needs to be highlighted.

It doesn’t matter if you have never created a plugin for WordPress in your entire life. This plugin is just a file with PHP functions in it, so you will be fine without WordPress plugin development knowledge, though you do need some knowledge of WordPress development.

Create a folder inside wp-content/plugins named wp-pygments (it can be whatever you want), copy build.py, the Python script we just created, into it, and create a new PHP file named wp-pygments.php (ideally the same name as the directory).

The code below just lets WordPress know the plugin’s name and other information. It goes at the top of wp-pygments.php.


Add a filter on the_content to look for <pre> tags. The expected markup is:

<pre class="php"><code>
$name = "World";
echo "Hello, " . $name;
</code></pre>

NOTE: HTML tags inside the code need to be encoded; for example, < needs to be &lt; so the parser doesn't get confused and mangle the markup.

Here class is the language of the code inside the pre tags. If there is no class, or it is empty, guess is passed to build.py.

add_filter( 'the_content', 'mb_pygments_content_filter' );
function mb_pygments_content_filter( $content )
{
  $content = preg_replace_callback('/<pre(\s?class\="(.*?)")?[^>]?.*?>.*?<code>(.*?)<\/code>.*?<\/pre>/sim', 'mb_pygments_convert_code', $content);
  return $content;
}

preg_replace_callback executes the mb_pygments_convert_code callback every time the regex pattern /<pre(\s?class\="(.*?)")?[^>]?.*?>.*?<code>(.*?)<\/code>.*?<\/pre>/sim matches the content. It should match any <pre><code> block in a post or page.

What about sim? These are three pattern modifier flags. From php.net:

  • s: If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines.
  • i: If this modifier is set, letters in the pattern match both upper and lower case letters.
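  • m: By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string.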
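To see which part of the markup lands in which capture group, the pattern can be tried outside WordPress. The sketch below uses Python's re module, whose re.S, re.I and re.M flags correspond to the s, i and m modifiers. The pattern is written on the assumption that it looks for an optional class attribute on the pre tag followed by a code element; the sample HTML is made up.

```python
import re

# Assumed pattern: optional class attribute on <pre>, then a <code> element.
PRE_CODE = re.compile(
    r'<pre(\s?class="(.*?)")?[^>]?.*?>.*?<code>(.*?)</code>.*?</pre>',
    re.S | re.I | re.M,
)

html = '<p>Intro</p>\n<PRE class="php"><code>\n$name = "World";\n</code></PRE>'
match = PRE_CODE.search(html)

print(repr(match.group(1)))  # the class attribute with its value
print(repr(match.group(2)))  # the class value only
print(repr(match.group(3)))  # the code between the <code> tags
```

Thanks to the i flag the uppercase PRE tags still match, and thanks to the s flag the dot crosses the newlines inside the code.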

This can be done with DOMDocument as well. Replace the preg_replace_callback() call with this:

// Suppress libxml warnings on imperfect HTML
libxml_use_internal_errors(true);
// Get all pre elements from the post content
$dom = new DOMDocument();
$dom->loadHTML($content);
$pres = $dom->getElementsByTagName('pre');
// Copy the live node list first, because we replace nodes while looping
foreach (iterator_to_array($pres) as $pre) {
  // getAttribute() returns an empty string when there is no class
  $class = $pre->getAttribute('class');
  $code = $pre->nodeValue;
  $args = array(
    2 => $class, // Element at position [2] is the class
    3 => $code // And element at position [3] is the code
  );
  // convert the code
  $new_code = mb_pygments_convert_code($args);
  // Replace the actual pre with the new one.
  $new_pre = $dom->createDocumentFragment();
  $new_pre->appendXML($new_code);
  $pre->parentNode->replaceChild($new_pre, $pre);
}
// Save the HTML of the new content.
$content = $dom->saveHTML();

The code below is the mb_pygments_convert_code function.

define( 'MB_WPP_BASE', dirname(__FILE__) );
function mb_pygments_convert_code( $matches )
{
  $pygments_build = MB_WPP_BASE . '/build.py';
  $source_code    = isset($matches[3])?$matches[3]:'';
  $class_name     = isset($matches[2])?$matches[2]:'';
  // Creates a temporary filename
  $temp_file      = tempnam(sys_get_temp_dir(), 'MB_Pygments_');
  // Populate temporary file
  $filehandle = fopen($temp_file, "wb");
  fwrite($filehandle, html_entity_decode($source_code, ENT_COMPAT, 'UTF-8') );
  fclose($filehandle);
  // Creates pygments command (every argument is escaped for the shell)
  $language   = $class_name?$class_name:'guess';
  $command    = sprintf('python %s %s %s', escapeshellarg($pygments_build), escapeshellarg($language), escapeshellarg($temp_file));
  // Executes the command
  $retVal = -1;
  exec( $command, $output, $retVal );
  unlink($temp_file);
  // Returns Source Code, wrapped in our own markup (language, then code)
  $format = '<div class="highlight highlight-%s"><pre><code>%s</code></pre></div>';
  if ( $retVal == 0 )
    $source_code = implode("\n", $output);
  $highlighted_code = sprintf($format, $language, $source_code);
  return $highlighted_code;
}

Reviewing the code above:

define( 'MB_WPP_BASE', dirname(__FILE__) );

It defines a constant holding the absolute path of the plugin's directory.

$pygments_build = MB_WPP_BASE . '/build.py';
$source_code    = isset($matches[3])?$matches[3]:'';
$class_name     = isset($matches[2])?$matches[2]:'';

$pygments_build is the full path of the Python script. Every time there is a match, an array called $matches is passed to the callback, containing 4 elements. Take this as an example of a matched code block from a post or page:

<pre class="php"><code>
$name = "World";
echo "Hello, " . $name;
</code></pre>

  • The element at position [0] is the whole <pre> match, and its value is:

    <pre class="php"><code>
    $name = "World";
    echo "Hello, " . $name;
    </code></pre>

  • The element at position [1] is the class attribute name with its value, and its value is:

    class="php"

  • The element at position [2] is the class attribute value without its name, and its value is:

    php

  • The element at position [3] is the code itself without its pre tags, and its value is:

    $name = "World";
    echo "Hello, " . $name;

// Creates a temporary filename
$temp_file = tempnam(sys_get_temp_dir(), 'MB_Pygments_');

It creates a temporary file that will hold the code passed to the Python script. This is a better way to hand over the code; passing the whole thing as a command-line parameter would be a mess.

// Populate temporary file
$filehandle = fopen($temp_file, "wb");
fwrite($filehandle, html_entity_decode($source_code, ENT_COMPAT, 'UTF-8') );
fclose($filehandle);

It writes the code to the file, but first we decode all the HTML entities so Pygments can convert them properly.
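To make the decoding step concrete, here is what it does to an encoded snippet. The sketch uses html.unescape, a rough Python counterpart of PHP's html_entity_decode; the sample string is made up.

```python
import html

# What the post content holds: tags and quotes are stored as entities.
stored = '&lt;?php echo &quot;Hello, &quot; . $name; ?&gt;'

# What the highlighter should receive: the real characters.
decoded = html.unescape(stored)
print(decoded)  # <?php echo "Hello, " . $name; ?>
```

Without this step the lexer would see &lt; as an ampersand followed by a word, not as an opening tag.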

// Creates pygments command (every argument is escaped for the shell)
$language = $class_name?$class_name:'guess';
$command  = sprintf('python %s %s %s', escapeshellarg($pygments_build), escapeshellarg($language), escapeshellarg($temp_file));

It builds the Python command to run, which on a Unix-like server looks like this:

python '/path/to/build.py' 'php' '/path/to/temp.file'

// Executes the command
$retVal = -1;
exec( $command, $output, $retVal );
unlink($temp_file);
// Returns Source Code, wrapped in our own markup (language, then code)
$format = '<div class="highlight highlight-%s"><pre><code>%s</code></pre></div>';
if ( $retVal == 0 )
  $source_code = implode("\n", $output);
$highlighted_code = sprintf($format, $language, $source_code);

It executes the command just created; a return value of 0 means everything worked in the Python script. exec() fills $output with an array of the lines printed by the script, so we join them into one string to be the source code. Otherwise, we stick with the code without highlighting.
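The whole round trip (temporary file, command, exit code, fallback) can also be tried from Python, which is handy for testing build.py without WordPress. This is a sketch under assumptions: highlight_via_script is a hypothetical name, and the script path is whatever highlighter script you point it at.

```python
import os
import subprocess
import sys
import tempfile

def highlight_via_script(script_path, language, source_code):
    """Run a highlighter script on source_code.

    Returns the script's output, or source_code unchanged when the
    script exits with a non-zero status.
    """
    # Write the code to a temporary file, like tempnam() + fwrite() in PHP.
    fd, temp_path = tempfile.mkstemp(prefix="MB_Pygments_")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(source_code)
        # Passing an argument list avoids shell quoting problems entirely.
        result = subprocess.run(
            [sys.executable, script_path, language or "guess", temp_path],
            capture_output=True,
        )
    finally:
        # Always remove the temporary file, like unlink() in PHP.
        os.unlink(temp_path)
    if result.returncode != 0:
        return source_code  # keep the code without highlighting
    return result.stdout.decode("utf-8")
```

For example, highlight_via_script('/path/to/build.py', 'php', 'echo 1;') would return the highlighted HTML if build.py succeeds and the original string otherwise.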

Improving it by Caching

So far this works fine, but we should save time and processing. Imagine 100 <pre> tags in a post: it would create 100 files and call the Python script 100 times, so let's cache this baby.

Transients API

WordPress provides the ability to store data in the database temporarily with the Transients API.

First, let's add an action to the save_post hook, so every time the post is saved we convert the code and cache it.

add_action( 'save_post', 'mb_pygments_save_post' );
function mb_pygments_save_post( $post_id )
{
  if ( wp_is_post_revision( $post_id ) )
    return;
  $content = get_post_field( 'post_content', $post_id );
  mb_pygments_content_filter( $content );
}

If it is a revision we don't do anything; otherwise we get the post content and call the Pygments content filter function.

Let's create some functions to handle the cache.

// Cache Functions
// Expiration time (1 month), let's clear cache every month.
define('MB_WPP_EXPIRATION', 60 * 60 * 24 * 30);
// This function returns the name of a post cache.
function get_post_cache_transient()
{
  global $post;
  $post_id = $post->ID;
  $transient = 'post_' . $post_id . '_content';
  return $transient;
}
// This creates a post cache for a month,
// containing the new content with pygments
// and last time the post was updated.
function save_post_cache($content)
{
  global $post;
  $expiration = MB_WPP_EXPIRATION;
  $value = array( 'content'=>$content, 'updated'=>$post->post_modified );
  set_transient( get_post_cache_transient(), $value, $expiration );
}
// This returns a post cache
function get_post_cache()
{
  $cached_post = get_transient( get_post_cache_transient() );
  return $cached_post;
}
// Check if a post needs to be updated.
function post_cache_needs_update()
{
  global $post;
  $cached_post = get_post_cache();
  if ( strtotime($post->post_modified) > strtotime($cached_post['updated']) )
    return TRUE;
  return FALSE;
}
// Delete a post cache.
function clear_post_cache()
{ 
  delete_transient( get_post_cache_transient() );
}

At the beginning of mb_pygments_content_filter(), add some lines to check whether there is a cached version of the post.

function mb_pygments_content_filter( $content )
{
  if ( FALSE !== ( $cached_post = get_post_cache() ) && !post_cache_needs_update() )
    return $cached_post['content'];
  clear_post_cache();

And at the end of mb_pygments_content_filter() add a line to save the post cache.

save_post_cache( $content );
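The freshness rule behind these functions is small enough to model on its own. The sketch below uses a plain dictionary in place of the transient storage and leaves out the one-month expiration; get_fresh_content is a hypothetical name that combines the "is there a cache" and "does it need an update" checks.

```python
from datetime import datetime

_cache = {}  # stands in for the transient storage

def cache_key(post_id):
    """Name of the cache entry for a post, e.g. post_7_content."""
    return "post_%d_content" % post_id

def save_post_cache(post_id, content, modified):
    """Store the converted content with the post's last-modified time."""
    _cache[cache_key(post_id)] = {"content": content, "updated": modified}

def get_fresh_content(post_id, modified):
    """Return cached content, or None when it is missing or older than the post."""
    cached = _cache.get(cache_key(post_id))
    if cached is None or modified > cached["updated"]:
        # Stale or absent: clear it so the caller rebuilds and saves again.
        _cache.pop(cache_key(post_id), None)
        return None
    return cached["content"]

saved_at = datetime(2020, 1, 1, 12, 0)
save_post_cache(7, "<pre>...</pre>", saved_at)
print(get_fresh_content(7, saved_at))                        # the cached content
print(get_fresh_content(7, datetime(2020, 1, 2, 12, 0)))     # None, post was edited later
```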

Finally, when the plugin is uninstalled we need to remove all the cache entries we created. This is a bit tricky, so we use the $wpdb object to delete them all with a query.

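register_uninstall_hook(__FILE__, 'mb_wp_pygments_uninstall');
function mb_wp_pygments_uninstall() {
  global $wpdb;
  // Use $wpdb->options so a custom table prefix still works,
  // and remove the matching timeout rows as well
  $wpdb->query( "DELETE FROM `{$wpdb->options}` WHERE option_name LIKE '_transient_post_%_content' OR option_name LIKE '_transient_timeout_post_%_content' " );
}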
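One detail of that query is worth knowing: in a LIKE pattern the underscore is itself a wildcard that matches any single character, so the pattern is a little looser than it looks. The sketch below shows the effect with Python's built-in sqlite3 module standing in for the WordPress database; the table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE options (option_name TEXT)")
conn.executemany(
    "INSERT INTO options VALUES (?)",
    [
        ("_transient_post_12_content",),
        ("_transient_timeout_post_12_content",),
        ("xtransient_post_12_content",),  # matches only because _ is a wildcard
        ("siteurl",),
    ],
)

# Same pattern as in the uninstall function: every _ matches any character.
loose = "SELECT COUNT(*) FROM options WHERE option_name LIKE '_transient_post_%_content'"
# Escaped underscores are taken literally.
strict = (
    "SELECT COUNT(*) FROM options "
    r"WHERE option_name LIKE '\_transient\_post\_%\_content' ESCAPE '\'"
)

print(conn.execute(loose).fetchone()[0])   # 2
print(conn.execute(strict).fetchone()[0])  # 1
```

In practice the option names are specific enough that the loose pattern is unlikely to hit anything else, but escaping the underscores makes the intent explicit.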

Read the full article at: Pygments on PHP & WordPress
