pandas DataFrame as a LaTeX or HTML table with nbconvert

 Posted by r_elease靜 on 2023-02-09 10:49

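When using nbconvert to produce LaTeX and PDF output, is it possible to get a nicely formatted table from a pandas DataFrame in an IPython notebook?

The default seems to be just a left-aligned block of numbers, which looks shoddy.

I would like something closer to the HTML display of the DataFrame in the notebook, or a LaTeX table. Saving and displaying a .png image of the HTML-rendered DataFrame would also be fine, but exactly how to do that has proved elusive.

At a minimum, I would like a simple center-aligned table in a nice font.

I have had no luck using the .to_latex() method to get LaTeX tables from pandas DataFrames, either within the notebook or in the nbconvert output. I have also tried (after reading an IPython dev list discussion and following the custom display logic notebook example) creating a custom class with _repr_html_ and _repr_latex_ methods that return the results of to_html() and to_latex(), respectively. I think a main problem with the nbconvert output is that pdflatex is not happy with the '{' or the '\\' in the DataFrame's to_latex() output. But I don't want to start fiddling with that before checking that I haven't missed something.

Thanks.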

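2 Answers
  • I wrote my own mako-based templating scheme for this. I think it is actually a very easy workflow once you commit to setting it up for yourself one time. After that, you start to see that templating the metadata of your desired format, so that it can be factored out of the code (and does not represent a third-party dependency), is a very nice way to solve the problem.

    Here is the workflow I came up with.

      Write a .mako template that accepts your DataFrame as an argument (and possibly other args) and converts it to the TeX format you want (see the example below).

      Create a wrapper class (I call it to_tex) that provides the API you want (e.g. you can pass it your data objects and it handles the calls to mako's render commands internally).

      Within the wrapper class, decide how you want the output. Print the TeX code to the screen? Use a subprocess to actually compile it to a PDF?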

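    In my case, I was generating preliminary results for a research paper and needed to format tables into a complex double-sorted structure with nested column names and so on. Here is an example of one of the tables:

    [Figure: example output of the templated TeX tool]

    Here is the mako template for it (warning, it is rough):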

    <%page args="df, table_title, group_var, sort_var"/>
    <%
    """
    Template for country/industry two-panel double sorts TeX table.
    Inputs: 
    -------
    df: pandas DataFrame
        Must be 17 x 12 and have rows and columns that positionally
        correspond to the entries of the table.
    
    table_title: string
        String used for the title of the table.
    
    group_var: string
        String naming the grouping variable for the horizontal sorts.
        Should be 'Country' or 'Industry'.
    
    sort_var: string (raw)
        String naming the variable that is being sorted, e.g.
        "beta" or "ivol". Note that if you want the symbol to
        be rendered as a TeX symbol, then pass a raw Python
        string as the arg and include the needed TeX markup in
        the passed string. If the string isn't raw, some of the
        TeX markup might be interpreted as special characters.
    
    Returns:
    --------
    When used with mako.template.Template.render, will produce
    a raw TeX string that can be rendered into a PDF containing
    the specified data.
    
    Author:
    -------
    Ely M. Spears, 05/21/2013
    
    """
    # Python imports and helper function definitions.
    import numpy as np  
    def format_helper(x):
        return str(np.round(x,2))
    %>
    
    
    <%text>
    \documentclass[10pt]{article}
    \usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}
    \usepackage{array}
    \newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
    \newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
    \setlength{\parskip}{1em}
    \setlength{\parindent}{0in}
    \renewcommand*\arraystretch{1.5}
    \author{Ely Spears}
    
    
    \begin{document}
    \begin{table} \caption{</%text>${table_title}<%text>}
    \begin{center}
        \begin{tabular}{ | p{2.5cm}  c c c c c p{1cm} c c c c c c p{1cm} |}
        \hline
        & \multicolumn{6}{c}{CAPM $\beta$} & \multicolumn{6}{c}{CAPM $\alpha$ (\%p.a.)} & \\
        \cline{2-7} \cline{9-14}
        & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \\
        Stock </%text>${sort_var}<%text> is: & Low & 2 & 3 & 4 & High & Low - High & & Low & 2 & 3 & 4 & High & Low - High \\ 
        \hline
        \multicolumn{4}{|l}{Panel A. Point estimates} & & & & & & & & & & \\ 
        \hline
        Low            & </%text>${' & '.join(df.iloc[0].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[0].map(format_helper).values[6:])}<%text> \\
        2              & </%text>${' & '.join(df.iloc[1].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[1].map(format_helper).values[6:])}<%text> \\
        3              & </%text>${' & '.join(df.iloc[2].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[2].map(format_helper).values[6:])}<%text> \\
        4              & </%text>${' & '.join(df.iloc[3].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[3].map(format_helper).values[6:])}<%text> \\
        High           & </%text>${' & '.join(df.iloc[4].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[4].map(format_helper).values[6:])}<%text> \\
        Low - High     & </%text>${' & '.join(df.iloc[5].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[5].map(format_helper).values[6:11])}<%text> & \\
    
    
        \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}     
            & </%text>${format_helper(df.iloc[6, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[6, 11])}<%text> \\
    
    
        \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})} 
            & </%text>${format_helper(df.iloc[7, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[7, 11])}<%text> \\
    
    
        \multicolumn{13}{|l}{Total effect} & </%text>${format_helper(df.iloc[8, 11])}<%text>  \\
        \hline
        \multicolumn{4}{|l}{Panel B. t-statistics} & & & & & & & & & & \\
        \hline
        Low            & </%text>${' & '.join(df.iloc[9].map(format_helper).values[0:6])}<%text>  & & </%text>${' & '.join(df.iloc[9].map(format_helper).values[6:])}<%text> \\
        2              & </%text>${' & '.join(df.iloc[10].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[10].map(format_helper).values[6:])}<%text> \\
        3              & </%text>${' & '.join(df.iloc[11].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[11].map(format_helper).values[6:])}<%text> \\
        4              & </%text>${' & '.join(df.iloc[12].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[12].map(format_helper).values[6:])}<%text> \\
        High           & </%text>${' & '.join(df.iloc[13].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[13].map(format_helper).values[6:])}<%text> \\
        Low - High     & </%text>${' & '.join(df.iloc[14].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[14].map(format_helper).values[6:11])}<%text> & \\
    
    
        \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}     
            & </%text>${format_helper(df.iloc[15, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[15, 11])}<%text> \\
    
    
        \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})} 
            & </%text>${format_helper(df.iloc[16, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[16, 11])}<%text> \\
        \hline
        \end{tabular}
    \end{center}
    \end{table}
    \end{document}
    </%text>
    

    My wrapper, to_tex.py, looks like this (with example usage in the if __name__ == "__main__" section):

    """
    to_tex.py
    
    Class for handling strings of TeX code and producing the
    rendered PDF via PDF LaTeX. Assumes ability to call PDFLaTeX
    via the operating system.
    """
    class to_tex(object):
        """
        Publishes a TeX string to a PDF rendering with pdflatex.
        """
        def __init__(self, tex_string, tex_file, display=False):
            """
            Publish a string to a .tex file, which will be
            rendered into a .pdf file via pdflatex.
            """
            self.tex_string    = tex_string
            self.tex_file      = tex_file
            self.__to_tex_file()
            self.__to_pdf_file(display)
            print("Render status:", self.render_status)
    
        def __to_tex_file(self):
            """
            Writes a tex string to a file.
            """
            with open(self.tex_file, 'w') as t_file:
                t_file.write(self.tex_string)
    
        def __to_pdf_file(self, display=False):
            """
            Compile a tex file to a pdf file with the
            same file path and name.
            """
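            import os
            from subprocess import Popen

            try:
                # Use the current directory when tex_file has no directory part.
                out_dir = os.path.dirname(self.tex_file) or "."
                proc = Popen(["pdflatex", "-output-directory", out_dir, self.tex_file])
                proc.communicate()
                # pdflatex reports compile errors through a non-zero exit code.
                if proc.returncode == 0:
                    self.render_status = "success"
                else:
                    self.render_status = "pdflatex exited with code %d" % proc.returncode
            except Exception as e:
                self.render_status = str(e)

            # Launch a display of the pdf if requested.
            if (self.render_status == "success") and display:
                try:
                    proc = Popen(["evince", self.tex_file.replace(".tex", ".pdf")])
                    proc.communicate()
                except OSError:
                    # The viewer is optional; ignore it if evince is unavailable.
                    pass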
    
    if __name__ == "__main__":
        from mako.template import Template
        template_file = "path/to/template.mako"
        t = Template(filename=template_file)
        tex_str = t.render(arg1="arg1", ...)
        tex_wrapper = to_tex(tex_str, )
    

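    My choice was to pump the TeX string directly into pdflatex and to leave displaying the result as an option.

    A small snippet of code that actually uses this with a DataFrame is here: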

    import pandas

    # Assume calculation work is done prior to this ...
    all_beta  = pandas.concat([beta_df,  beta_tstat_df], axis=0)
    all_alpha = pandas.concat([alpha_df, alpha_tstat_df], axis=0)
    all_df = pandas.concat([all_beta, all_alpha], axis=1)
    
    # Render result in TeX
    tex_mako  = "/my_project/templates/mako/two_panel_double_sort_table.mako"
    tex_file = "/my_project/some_tex_file_name.tex"
    
    from mako.template import Template
    t = Template(filename=tex_mako)
    tex_str = t.render(all_df, table_title, group_var, tex_risk_name)
    
    import my_project.to_tex as to_tex
    tex_obj = to_tex.to_tex(tex_str, tex_file)
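
To make the stacking in that snippet explicit: axis=0 places the t-statistic rows beneath the point estimates, and axis=1 then sets the beta and alpha blocks side by side. Below is a toy version with made-up 2 x 3 frames; the values, row labels, and column names are purely illustrative and are not the data behind the table above, where the same stacking has to yield the 17 x 12 frame the template expects.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the four result frames (2 rows x 3 columns each).
cols = ["Low", "Mid", "High"]
est_rows = ["Low", "High"]
tstat_rows = ["t(Low)", "t(High)"]
beta_df = pd.DataFrame(np.full((2, 3), 1.0), index=est_rows, columns=cols)
beta_tstat_df = pd.DataFrame(np.full((2, 3), 2.0), index=tstat_rows, columns=cols)
alpha_df = pd.DataFrame(np.full((2, 3), 3.0), index=est_rows, columns=cols)
alpha_tstat_df = pd.DataFrame(np.full((2, 3), 4.0), index=tstat_rows, columns=cols)

# axis=0 stacks rows: point estimates on top, t-statistics below.
all_beta = pd.concat([beta_df, beta_tstat_df], axis=0)
all_alpha = pd.concat([alpha_df, alpha_tstat_df], axis=0)

# axis=1 joins columns: beta block on the left, alpha block on the right.
all_df = pd.concat([all_beta, all_alpha], axis=1)

print(all_df.shape)
```

Because the template reads the frame by position, only the final shape and ordering matter; the repeated column names in all_df are harmless here.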
    

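    Answered 2023-02-09 10:51
  • A simpler approach is discussed in this GitHub issue. Basically, you have to add a _repr_latex_ method to the DataFrame class, a procedure that pandas documents in its official documentation.

    I did this in a notebook like this: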

    import pandas as pd
    
    pd.set_option('display.notebook_repr_html', True)
    
    def _repr_latex_(self):
        return r"\centering{%s}" % self.to_latex()
    
    pd.DataFrame._repr_latex_ = _repr_latex_  # monkey patch pandas DataFrame
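
Adding _repr_latex_ helps because a notebook output can carry several representations of the same object, keyed by MIME type, and each consumer picks the one it prefers: the live notebook shows the HTML form, while a LaTeX export can use the LaTeX form when one is present. The following is a simplified, pure-Python illustration of that idea. Table, collect_reprs, and pick are hypothetical names, and this is not IPython's or nbconvert's actual code.

```python
class Table:
    """Toy object exposing both rich representations."""

    def __init__(self, rows):
        self.rows = rows

    def _repr_html_(self):
        body = "".join(
            "<tr>%s</tr>" % "".join("<td>%s</td>" % v for v in row)
            for row in self.rows
        )
        return "<table>%s</table>" % body

    def _repr_latex_(self):
        body = " \\\\\n".join(" & ".join(str(v) for v in row) for row in self.rows)
        spec = "c" * len(self.rows[0])
        return "\\begin{center}\\begin{tabular}{%s}\n%s\n\\end{tabular}\\end{center}" % (
            spec,
            body,
        )


def collect_reprs(obj):
    """Gather every representation the object offers, keyed by MIME type."""
    methods = {"text/html": "_repr_html_", "text/latex": "_repr_latex_"}
    bundle = {"text/plain": repr(obj)}
    for mime, name in methods.items():
        method = getattr(obj, name, None)
        if method is not None:
            bundle[mime] = method()
    return bundle


def pick(bundle, priority):
    """Return the first MIME type in `priority` that the bundle contains."""
    for mime in priority:
        if mime in bundle:
            return mime
    return None


bundle = collect_reprs(Table([[1.0, 4.0], [2.0, 3.0]]))
print(pick(bundle, ["text/html", "text/latex", "text/plain"]))  # prints text/html
print(pick(bundle, ["text/latex", "text/plain"]))               # prints text/latex
```

Without a _repr_latex_ method the second lookup would fall through to text/plain, which corresponds to the plain left-aligned block of numbers described in the question.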
    

    The following code:

    d = {'one' : [1., 2., 3., 4.],
         'two' : [4., 3., 2., 1.]}
    df = pd.DataFrame(d)
    df
    

    is turned into an HTML table if evaluated live in the notebook, and it converts to a (centered) table in the PDF output:

    $ ipython nbconvert --to latex --post PDF notebook.ipynb
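
On current installations the ipython nbconvert entry point has been replaced by jupyter nbconvert, which can write a PDF directly with --to pdf. Here is a small sketch of driving it from Python, assuming jupyter and a LaTeX toolchain are on the PATH; the helper names nbconvert_pdf_command and convert_to_pdf are hypothetical.

```python
import shutil
import subprocess


def nbconvert_pdf_command(notebook_path):
    """Build the argument list for converting a notebook to PDF."""
    return ["jupyter", "nbconvert", "--to", "pdf", notebook_path]


def convert_to_pdf(notebook_path):
    """Run nbconvert; return True on success, False if it failed or is missing."""
    if shutil.which("jupyter") is None:
        return False
    result = subprocess.run(nbconvert_pdf_command(notebook_path))
    return result.returncode == 0


# Show the command that would be run for the notebook from the example above.
print(" ".join(nbconvert_pdf_command("notebook.ipynb")))
```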
    

    Answered 2023-02-09 10:51