I have two XML files of similar structure which I wish to merge into one file. Currently I am using EL4J XML Merge, which I came across in this tutorial. However, it does not merge as I expect it to. The main problem is that it is not merging the content from both files into one element, i.e. one that contains 1, 2, 3 and 4. Instead it just discards either 1 and 2 or 3 and 4, depending on which file is merged first.

I would be grateful if anyone with experience of XML Merge could tell me what I might be doing wrong. Alternatively, does anyone know of a good XML API for Java that would be capable of merging the files as I require?

Many thanks in advance for your help.




I could really do with some good suggestions on doing this, so I have added a bounty. I've tried jdigital's suggestion but am still having issues with XML Merge.


Below is a sample of the type of structure of XML files that I am trying to merge.




Expected output



11 solutions



Not very elegant, but you could do this with the DOM parser and XPath:


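import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class MergeXmlDemo {
  public static void main(String[] args) throws Exception {
    // proper error/exception handling omitted for brevity
    File file1 = new File("merge1.xml");
    File file2 = new File("merge2.xml");
    Document doc = merge("/run/host/results", file1, file2);
    // write the merged document to standard output
    TransformerFactory.newInstance().newTransformer()
        .transform(new DOMSource(doc), new StreamResult(System.out));
  }

  private static Document merge(String expression,
      File... files) throws Exception {
    XPathExpression compiledExpression = XPathFactory.newInstance()
        .newXPath().compile(expression);
    return merge(compiledExpression, files);
  }

  private static Document merge(XPathExpression expression,
      File... files) throws Exception {
    DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder();
    Document base = docBuilder.parse(files[0]);
    Node results = (Node) expression.evaluate(base, XPathConstants.NODE);
    if (results == null) {
      throw new IOException(files[0]
          + ": expression does not evaluate to node");
    }
    // The loop body did not survive; this reconstruction is an assumption.
    for (int i = 1; i < files.length; i++) {
      Node next = (Node) expression.evaluate(docBuilder.parse(files[i]),
          XPathConstants.NODE);
      if (next == null) continue; // nothing to merge from this file
      for (Node kid = next.getFirstChild(); kid != null;
          kid = kid.getNextSibling()) {
        results.appendChild(base.importNode(kid, true)); // deep copy
      }
    }
    return base;
  }
}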

This assumes that you can hold at least two of the documents in RAM simultaneously.




I use XSLT to merge XML files. It lets me adjust the merge operation to either just slam the content together or to merge at a specific level. It is a little more work (and XSLT syntax is kind of special) but super flexible. A few things you need here:


a) Include an additional file
b) Copy the original file 1:1
c) Design your merge point, with or without duplication avoidance

a) In the beginning I have



This lets you point to the second file using $mDoc.

b) The instructions to copy a source tree 1:1 are 2 templates:




With nothing else you get a 1:1 copy of your first source file. Works with any type of XML. The merging part is file specific. Let's presume you have event elements with an event ID attribute. You do not want duplicate IDs. The template would look like this:



Of course you can compare other things like tag names etc. Also it is up to you how deep the merge happens. If you don't have a key to compare, the construct becomes easier e.g. for log:
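
As a rough sketch of steps a) to c), the Java class below embeds a small stylesheet and runs it with the JDK's built-in transformer. The stylesheet is an assumption, not the original answer's code: the element name results, the id attribute, the parameter name mDocUri and the class name XsltMergeSketch are all made up for illustration. It copies the first file 1:1 with an identity template and, inside results, appends those children of the second file's results whose id is not already present.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltMergeSketch {

  // Hypothetical stylesheet: element and attribute names are assumptions.
  static final String STYLESHEET =
      "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      // a) make the second file reachable as $mDoc
      + "<xsl:param name='mDocUri'/>"
      + "<xsl:variable name='mDoc' select='document($mDocUri)'/>"
      // b) identity template: copies the first file 1:1
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // c) merge point: append children whose id is not already present
      + "<xsl:template match='results'>"
      + "<xsl:variable name='ids' select='*/@id'/>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "<xsl:copy-of select='$mDoc//results/*[not(@id = $ids)]'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

  static String merge(File first, File second) throws Exception {
    // The systemId only gives the in-memory stylesheet a base URI;
    // no file of that name has to exist.
    StreamSource xslt = new StreamSource(new StringReader(STYLESHEET),
        new File("merge-sketch.xsl").toURI().toString());
    Transformer trans = TransformerFactory.newInstance().newTransformer(xslt);
    // Pass the second file as an absolute URI for document() to load.
    trans.setParameter("mDocUri", second.toURI().toString());
    StringWriter out = new StringWriter();
    trans.transform(new StreamSource(first), new StreamResult(out));
    return out.toString();
  }
}
```

Calling XsltMergeSketch.merge(new File("merge1.xml"), new File("merge2.xml")) would return the merged document as a string. Dropping the [not(@id = $ids)] predicate gives the simpler variant that appends everything without checking for duplicates.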



To run XSLT in Java use this:


    // Needs java.util.Map, javax.xml.transform.* and javax.xml.transform.stream.*.
    // Assumes xmlFile, xsltFile and resultFile are java.io.File, and that
    // ParameterMap is a Map<String, Object> (or null).
    Source xmlSource = new StreamSource(xmlFile);
    Source xsltSource = new StreamSource(xsltFile);
    Result xmlResult = new StreamResult(resultFile);
    TransformerFactory transFact = TransformerFactory.newInstance();
    Transformer trans = transFact.newTransformer(xsltSource);
    // Load Parameters if we have any
    if (ParameterMap != null) {
       for (Map.Entry<String, Object> curParam : ParameterMap.entrySet()) {
            trans.setParameter(curParam.getKey(), curParam.getValue());
       }
    }
    trans.transform(xmlSource, xmlResult);

Or you can download Saxon (an XSLT processor) and run the transformation from the command line (Linux shell example):

notify-send -t 500 -u low -i gtk-dialog-info "Transforming $1 with $2 into $3 ..."
# That's actually the only relevant line below
java -cp saxon9he.jar net.sf.saxon.Transform -t -s:$1 -xsl:$2 -o:$3
notify-send -t 1000 -u low -i gtk-dialog-info "Extraction into $3 done!"





Thanks to everyone for their suggestions. Unfortunately none of the suggested methods turned out to be suitable in the end, as I needed to have rules for the way in which different nodes of the structure were merged.


So what I did was take the DTD relating to the XML files I was merging and from that create a number of classes reflecting the structure. I then used XStream to deserialize the XML files into those classes.


I then annotated my classes with merge rules, and used those annotations together with some reflection to merge the objects, as opposed to merging the actual XML structure.
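
A minimal sketch of what annotation-driven object merging could look like is shown below. It is not the code from the linked archive: the Merge annotation, the Rule enum, the Host class and its fields are all hypothetical, and the XStream step is left out, so the objects are assumed to be populated already.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class AnnotatedMergeSketch {

  // Hypothetical merge rules, one per annotated field.
  enum Rule { KEEP_FIRST, APPEND }

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface Merge { Rule value(); }

  // Hypothetical class standing in for one generated from the DTD.
  static class Host {
    @Merge(Rule.KEEP_FIRST) String info;
    @Merge(Rule.APPEND) List<String> results = new ArrayList<>();
  }

  // Merges "second" into "first" field by field and returns "first".
  @SuppressWarnings("unchecked")
  static <T> T merge(T first, T second) {
    try {
      for (Field f : first.getClass().getDeclaredFields()) {
        Merge rule = f.getAnnotation(Merge.class);
        if (rule == null) continue;        // unannotated fields are left alone
        f.setAccessible(true);
        Object a = f.get(first);
        Object b = f.get(second);
        if (rule.value() == Rule.APPEND && a instanceof List && b instanceof List) {
          ((List<Object>) a).addAll((List<Object>) b);
        } else if (a == null) {
          f.set(first, b);                 // first value missing: take the second
        }                                  // otherwise the first value is kept
      }
      return first;
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

With two Host objects holding results 1, 2 and 3, 4, merge(a, b) would leave a with all four results and its own info value. More rules would simply be further enum constants and branches.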


If anyone is interested in the code, which in this case merges Nmap XML files, please see http://fluxnetworks.co.uk/NmapXMLMerge.tar.gz. The code's not perfect, and I will admit not massively flexible, but it definitely works. I'm planning to reimplement the system so that it parses the DTD automatically when I have some free time.



This is how it should look using XML Merge:




You have to set an ID matcher for the //result node and a PRESERVE action for the //info node. Also beware that the .properties file XML Merge uses is case sensitive: you have to use "xpath", not "XPath", in your .properties.

Don't forget to define -config parameter like this:


java -cp lib\xmlmerge-full.jar; ch.elca.el4j.services.xmlmerge.tool.XmlMergeTool -config xmlmerge.properties example1.xml example2.xml 



I took a look at the referenced link; it's odd that XMLMerge would not work as expected. Your example seems straightforward. Did you read the section entitled Using XPath declarations with XmlMerge? Using the example, try to set up an XPath for results and set it to merge. If I'm reading the doc correctly, it would look something like this:





You might be able to write a Java app that deserializes the XML documents into objects, then "merge" the individual objects programmatically into a collection. You can then serialize the collection object back out to an XML file with everything "merged."


The JAXB API has some tools that can convert an XML document/schema into Java classes. The "xjc" tool might be able to do this, although I can't remember if you can create classes directly from the XML doc, or if you have to generate a schema first. There are tools out there that can generate a schema from an XML doc.

Hope this helps... not sure if this is what you were looking for.




In addition to using Stax (which does make sense), it'd probably be easier with StaxMate (http://staxmate.codehaus.org/Tutorial). Just create 2 SMInputCursors, and child cursors if need be. Then do a typical merge sort with the 2 cursors, similar to traversing DOM documents in a recursive-descent manner.
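
For comparison, here is a minimal streaming sketch that uses only the plain StAX API shipped with the JDK (javax.xml.stream), not StaxMate's cursors. It assumes the merge point is a single container element whose name is passed in (for example "results"); the class and method names are made up for illustration. The first document is copied event by event, and just before the container's closing tag the children of the same container in the second document are spliced in.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxMergeSketch {

  static String merge(String first, String second, String container)
      throws XMLStreamException {
    XMLInputFactory in = XMLInputFactory.newInstance();
    XMLEventReader a = in.createXMLEventReader(new StringReader(first));
    XMLEventReader b = in.createXMLEventReader(new StringReader(second));
    StringWriter out = new StringWriter();
    XMLEventWriter w = XMLOutputFactory.newInstance().createXMLEventWriter(out);
    while (a.hasNext()) {
      XMLEvent e = a.nextEvent();
      if (e.isEndElement()
          && e.asEndElement().getName().getLocalPart().equals(container)) {
        copyChildren(b, w, container); // splice in before the closing tag
      }
      w.add(e);
    }
    w.close();
    return out.toString();
  }

  // Skips ahead to the next <container> start tag in r, then copies every
  // event up to (but not including) the matching end tag.
  private static void copyChildren(XMLEventReader r, XMLEventWriter w,
      String container) throws XMLStreamException {
    int depth = 0;
    while (r.hasNext()) {
      XMLEvent e = r.nextEvent();
      if (depth == 0) {
        if (e.isStartElement()
            && e.asStartElement().getName().getLocalPart().equals(container)) {
          depth = 1; // now inside the container
        }
        continue;
      }
      if (e.isStartElement()) {
        depth++;
      } else if (e.isEndElement() && --depth == 0) {
        return; // reached the container's own end tag
      }
      w.add(e);
    }
  }
}
```

Neither document is held in memory as a tree, which is the main attraction over the DOM approach. If the first document contains several containers, this sketch pairs them with the second document's containers in document order, which may or may not be the rule you want.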



So, you're only interested in merging the 'results' elements? Everything else is ignored? The fact that input0 has an and input1 has an and the expected result has an seems to suggest this.

If you're not worried about scaling and you want to solve this problem quickly then I would suggest writing a problem-specific bit of code that uses a simple library like JDOM to consider the inputs and write the output result.


Attempting to write a generic tool that was 'smart' enough to handle all of the possible merge cases would be pretty time consuming - you'd have to expose a configuration capability to define merge rules. If you know exactly what your data is going to look like and you know exactly how the merge needs to be executed then I would imagine your algorithm would walk each XML input and write to a single XML output.



You can try Dom4J, which provides a very good means to extract information using XPath queries and also allows you to write XML very easily. You just need to play around with the API for a while to do your job.




Have you considered just not bothering with parsing the XML "properly" and just treating the files as big long strings and using boring old things such as hash maps and regular expressions...? This could be one of those cases where the fancy acronyms with X in them just make the job fiddlier than it needs to be.


Obviously this does depend a bit on how much data you actually need to parse out while doing the merge. But by the sound of things, the answer to that is not much.
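
A rough sketch of that string-based idea follows, assuming the only thing to merge is the inner content of one results element per file (the element name and the class name are assumptions). It will not cope with self-closing tags, CDATA sections or comments containing the tag text, or nested results elements, which is the price of skipping a real parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexMergeSketch {

  // Group 2 captures whatever sits between <results ...> and </results>.
  private static final Pattern RESULTS =
      Pattern.compile("<results(\\s[^>]*)?>(.*?)</results>", Pattern.DOTALL);

  static String merge(String first, String second) {
    Matcher a = RESULTS.matcher(first);
    Matcher b = RESULTS.matcher(second);
    if (!a.find() || !b.find()) {
      throw new IllegalArgumentException("no <results> element found");
    }
    // Insert the second file's inner content just before the first
    // file's closing </results> tag.
    int insertAt = a.end(2);
    return first.substring(0, insertAt) + b.group(2)
        + first.substring(insertAt);
  }
}
```

Everything outside the results element is taken from the first file unchanged; the rest of the second file is ignored.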


