Parsing XML files with Python (parse, write, update)

Overview

This blog post covers parsing an XML file, appending new elements and writing them out to XML, and updating the value of a node in the original XML file. It uses Python's xml.dom.minidom package; see the official xml.dom.minidom documentation for details. The whole post works with the following customer.xml:

<?xml version="1.0" encoding="utf-8" ?>
<!-- This is list of customers -->
<customers>
  <customer ID="C001">
    <name>Acme Inc.</name>
    <phone>12345</phone>
    <comments>
      <![CDATA[Regular customer since 1995]]>
    </comments>
  </customer>
  <customer ID="C002">
    <name>Star Wars Inc.</name>
    <phone>23456</phone>
    <comments>
      <![CDATA[A small but healthy company.]]>
    </comments>
  </customer>
</customers>

CDATA: a section of an XML document whose content the parser treats as plain character data rather than as markup.
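As a small illustration of the CDATA note above, the sketch below parses a one-element document whose CDATA section contains characters that would otherwise need escaping. The `<note>` element and its content are made up for this example and are not part of customer.xml.

```python
from xml.dom.minidom import parseString

# Hypothetical one-element document; "<" and "&" inside CDATA need no escaping.
doc = parseString("<note><![CDATA[a < b && c > d]]></note>")
note = doc.documentElement

# Join the data of every child node in case the parser splits the content.
cdata_text = "".join(child.data for child in note.childNodes)
print(cdata_text)
```

The text comes back exactly as written inside the CDATA section, without any entity escaping.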

Note: in this article, any variation in how the word "node" is written refers to the same concept. The difference is not significant, and you can treat it as a typo on my part.

1. Parse XML file

When XML is parsed, all text is stored in text nodes, and a text node is treated as a child of the element node that contains it. For example, for an element whose content is 2005, the element node has a child text node whose value is “2005”; “2005” is not the value of the element itself. The most commonly used method is getElementsByTagName(), after which you navigate to further nodes according to the document structure.
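The relationship between an element node and its text node can be checked directly. This is a minimal sketch; the `<year>` tag name is an assumption chosen only for illustration.

```python
from xml.dom.minidom import Node, parseString

# Hypothetical element used only for illustration.
doc = parseString("<year>2005</year>")
year = doc.documentElement

# The text lives in a child text node, not on the element itself.
text_node = year.firstChild
print(year.nodeValue)
print(text_node.nodeType == Node.TEXT_NODE)
print(text_node.data)
```

This is why the code in this post reads `childNodes[0].data` rather than asking the element for its value.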

The theory alone is not very illuminating, but with the XML file above and the code below the approach should become clear. The following code prints every node name and its content:

# -*- coding: utf-8 -*-
"""
    @Author  : LiuZhian
    @Time    : 2019/4/24 0024 9:19 AM
    @Comment : 
"""
from xml.dom.minidom import parse


def readXML():
    domTree = parse("./customer.xml")
    # Document root element
    rootNode = domTree.documentElement
    print(rootNode.nodeName)

    # All customers
    customers = rootNode.getElementsByTagName("customer")
    print("****All customer information****")
    for customer in customers:
        if customer.hasAttribute("ID"):
            print("ID:", customer.getAttribute("ID"))
            # name element
            name = customer.getElementsByTagName("name")[0]
            print(name.nodeName, ":", name.childNodes[0].data)
            # phone element
            phone = customer.getElementsByTagName("phone")[0]
            print(phone.nodeName, ":", phone.childNodes[0].data)
            # comments element: childNodes[0] is the whitespace before the
            # CDATA section, so join all text/CDATA children and strip padding
            comments = customer.getElementsByTagName("comments")[0]
            text = "".join(node.data for node in comments.childNodes).strip()
            print(comments.nodeName, ":", text)

if __name__ == '__main__':
    readXML()

2. Write to XML file

When writing, I think there are two cases:

  • create a new XML file
  • append some element information to an existing XML file

In both cases the method for creating element nodes is similar: create or obtain a DOM object, then create new nodes from it.

In the first case, you can create the DOM object with dom = minidom.Document(). In the second case, you get it directly by parsing the existing XML file, for example dom = parse("./customer.xml").
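The rest of this post only demonstrates the second case, so here is a minimal sketch of the first one: building a document from an empty DOM object. The element names mirror customer.xml, but the snippet itself is an illustration rather than part of the original code.

```python
from xml.dom.minidom import Document

# Case 1: start from an empty DOM object instead of parsing a file.
dom = Document()

# A document needs exactly one root element.
root = dom.createElement("customers")
dom.appendChild(root)

customer = dom.createElement("customer")
customer.setAttribute("ID", "C001")
root.appendChild(customer)

# Serialize the whole document to a string.
xml_text = dom.toxml()
print(xml_text)
```

From here on, nodes are created and attached exactly as they would be on a parsed document.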

When creating element/text nodes, you will typically follow a four-step sequence like this:

  • create a new element node with createElement()
  • create a text node with createTextNode()
  • attach the text node to the element node
  • attach the element node to its parent
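The four steps above can be wrapped in a small helper. This is only a sketch: `append_text_element` is a hypothetical name introduced here, not a minidom function.

```python
from xml.dom.minidom import Document


def append_text_element(dom, parent, tag, text):
    """Hypothetical helper: create <tag>text</tag> and attach it to parent."""
    element = dom.createElement(tag)      # 1. create the element node
    text_node = dom.createTextNode(text)  # 2. create the text node
    element.appendChild(text_node)        # 3. attach the text node to the element
    parent.appendChild(element)           # 4. attach the element to its parent
    return element


dom = Document()
root = dom.createElement("customer")
dom.appendChild(root)
name = append_text_element(dom, root, "name", "kavin")
print(root.toxml())
```

A helper like this can shorten code that creates many simple elements, at the cost of hiding the individual DOM calls.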

Now I need to create a new customer node with the following information:

<customer ID="C003">
  <name>kavin</name>
  <phone>32467</phone>
  <comments>
    <![CDATA[A small but healthy company.]]>
  </comments>
</customer>

The code is as follows:

from xml.dom.minidom import parse


def writeXML():
    domTree = parse("./customer.xml")
    # Document root element
    rootNode = domTree.documentElement

    # Create a new customer node
    customer_node = domTree.createElement("customer")
    customer_node.setAttribute("ID", "C003")

    # Create the name node and set its text value
    name_node = domTree.createElement("name")
    name_text_value = domTree.createTextNode("kavin")
    name_node.appendChild(name_text_value)  # attach the text node to name_node
    customer_node.appendChild(name_node)

    # Create the phone node and set its text value
    phone_node = domTree.createElement("phone")
    phone_text_value = domTree.createTextNode("32467")
    phone_node.appendChild(phone_text_value)  # attach the text node to phone_node
    customer_node.appendChild(phone_node)

    # Create the comments node; its content is a CDATA section
    comments_node = domTree.createElement("comments")
    cdata_text_value = domTree.createCDATASection("A small but healthy company.")
    comments_node.appendChild(cdata_text_value)
    customer_node.appendChild(comments_node)

    rootNode.appendChild(customer_node)

    # Open the file as UTF-8 so it matches the declared encoding
    with open('added_customer.xml', 'w', encoding='utf-8') as f:
        # addindent: indentation per level; encoding: written into the XML declaration
        domTree.writexml(f, addindent='  ', encoding='utf-8')

if __name__ == '__main__':
    writeXML()
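One thing to be aware of: a parsed tree keeps the whitespace between tags as text nodes, so re-indenting it on output may give uneven results. A possible workaround, sketched below, is to drop whitespace-only text nodes before writing. `strip_whitespace_nodes` is a hypothetical helper written for this post, and the inline sample stands in for customer.xml.

```python
from xml.dom.minidom import Node, parseString


def strip_whitespace_nodes(node):
    """Hypothetical helper: recursively drop whitespace-only text nodes."""
    for child in list(node.childNodes):  # copy, because we mutate while iterating
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child)


sample = """<customers>
  <customer ID="C001">
    <name>Acme Inc.</name>
  </customer>
</customers>"""
dom = parseString(sample)
strip_whitespace_nodes(dom.documentElement)
print(dom.documentElement.toprettyxml(indent="  "))
```

CDATA sections have their own node type, so this helper leaves them alone.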
    

3. Update XML file

When updating XML, we only need to find the corresponding element node, update the value of the text node or attribute under it, and then save the result to a file. I will not go into more detail here; the comments in the code explain the idea:

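from xml.dom.minidom import parse


def updateXML():
    domTree = parse("./customer.xml")
    # Document root element
    rootNode = domTree.documentElement

    names = rootNode.getElementsByTagName("name")
    for name in names:
        if name.childNodes[0].data == "Acme Inc.":
            # Get the parent node of the name node
            pn = name.parentNode
            # The parent's phone node, i.e. a sibling of the name node.
            # The nextSibling/previousSibling attributes could also be used,
            # but I have not tried them here
            phone = pn.getElementsByTagName("phone")[0]
            # Update the value of phone (text node data should be a string)
            phone.childNodes[0].data = "99999"

    # Open the file as UTF-8 so it matches the declared encoding
    with open('updated_customer.xml', 'w', encoding='utf-8') as f:
        # addindent: indentation per level; encoding: written into the XML declaration
        domTree.writexml(f, addindent='  ', encoding='utf-8')

if __name__ == '__main__':
    updateXML()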
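The code above updates a text node. Updating an attribute works along the same lines with setAttribute(). The sketch below uses an inline sample so it runs without customer.xml on disk, and the new ID value "C100" is made up for illustration.

```python
from xml.dom.minidom import parseString

# Inline sample standing in for customer.xml.
dom = parseString(
    '<customers><customer ID="C001"><name>Acme Inc.</name></customer></customers>'
)

for customer in dom.getElementsByTagName("customer"):
    if customer.getAttribute("ID") == "C001":
        # setAttribute overwrites the value if the attribute already exists.
        customer.setAttribute("ID", "C100")

updated_id = dom.getElementsByTagName("customer")[0].getAttribute("ID")
print(updated_id)
```

Saving the result to a file is the same writexml() call used in the examples above.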
    

If anything here is wrong, please let me know ~

