Iterating over all the XML files in a folder with Python's DOM (xml.dom.minidom)

I started learning Python only recently. As an exercise, I wanted to iterate over the XML files in the res\values directory of an Android app's source code. The files there all share basically the following, relatively simple format.

strings.xml

<?xml version="1.0" encoding="utf-8"?>
<resources>

    <string name="app_name">ActivityLife</string>
    <string name="hello_world">Hello world!</string>
    <string name="action_settings">Settings</string>

</resources>

As you can see, the resources parent node has three string child elements. Because I'm new to Python, I find the xml.dom approach the easiest to use (forgive me for being a beginner).
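To check that claim with xml.dom.minidom, here is a small sketch that parses the sample above from an inline string and lists the children of the root element. The helper name describe_children is made up for illustration. Note that the indentation between the tags shows up as whitespace text nodes next to the three element nodes.

```python
import xml.dom.minidom

# The strings.xml sample from above, inlined so the snippet is self-contained.
STRINGS_XML = """<?xml version="1.0" encoding="utf-8"?>
<resources>

    <string name="app_name">ActivityLife</string>
    <string name="hello_world">Hello world!</string>
    <string name="action_settings">Settings</string>

</resources>
"""


def describe_children(xml_text):
    """Return (node kind, tag name or None) for each child of the root element."""
    dom = xml.dom.minidom.parseString(xml_text)
    root = dom.documentElement
    result = []
    for child in root.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            result.append(("element", child.tagName))
        elif child.nodeType == child.TEXT_NODE:
            # Whitespace and line breaks between tags become text nodes
            result.append(("text", None))
        else:
            result.append(("other", None))
    return result


children = describe_children(STRINGS_XML)
for kind, tag in children:
    print(kind, tag)
```

This is why a traversal of childNodes usually has to filter on the node type before reading attributes.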

The XML DOM defines the objects and properties of XML elements and the methods to access them. The DOM treats an XML document as a tree structure. OK, let's get back to business. To solve the problem above, the plan is:

1. Traverse the folder to get all the XML files (glob.glob()).

2. Read and parse each XML file, then get each child node's attribute value and text node value.
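Step 1 can be tried on its own. The following is a minimal sketch, assuming a temporary folder with made-up file names stands in for the real res\values directory. The helper name list_xml_files is hypothetical. It shows that the '*.xml' pattern picks up only the XML files and skips everything else in the folder.

```python
import glob
import os
import tempfile


def list_xml_files(folder):
    """Return the sorted paths of the *.xml files directly inside folder."""
    if not os.path.isdir(folder):
        return []
    return sorted(glob.glob(os.path.join(folder, "*.xml")))


# Hypothetical stand-in for a res\values folder, built in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    for name in ("dimens.xml", "strings.xml", "styles.xml", "notes.txt", "spec.doc"):
        with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
            handle.write("<resources/>")
    found = [os.path.basename(p) for p in list_xml_files(folder)]
    print(found)
```

glob.glob() makes no promise about the order of its results, so the sketch sorts them to keep the output stable.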

Below is the implementation code:

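# Traverse all the XML files in a folder (written for Python 3)
import glob
import os
import xml.dom.minidom


def traversalDir_XMLFile(path):
    # Do nothing if the path does not exist
    if not os.path.exists(path):
        return
    # Collect the paths of all the .xml files directly under the folder
    xml_files = glob.glob(os.path.join(path, '*.xml'))

    for xml_file in xml_files:
        print(xml_file)
        # Open and parse the XML document
        dom = xml.dom.minidom.parse(xml_file)
        # Get the document (root) element object
        root = dom.documentElement
        # Walk the list of child nodes (elements, whitespace text, comments)
        for child in root.childNodes:
            # Keep only the element nodes
            if child.nodeType == child.ELEMENT_NODE:
                # Print the child's name attribute and its first text node's value
                print('key:', child.getAttribute('name'))
                # firstChild is None for an empty element and may not be text
                first = child.firstChild
                is_text = first is not None and first.nodeType == first.TEXT_NODE
                print('value:', first.data if is_text else '')


traversalDir_XMLFile('E:\\work\\ActivityLife\\res\\values')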

The path passed in is one of my values folders, which contains dimens.xml, strings.xml and styles.xml, along with several Word and TXT files. The output is:

E:\work\ActivityLife\res\values\dimens.xml
key: activity_horizontal_margin
value: 16dp
key: activity_vertical_margin
value: 16dp
E:\work\ActivityLife\res\values\strings.xml
key: app_name
value: ActivityLife
key: hello_world
value: Hello world!
key: action_settings
value: Settings
E:\work\ActivityLife\res\values\styles.xml
key: AppBaseTheme
value: 
        
key: AppTheme
value: 
        

The comments in the code explain most of it. My folder contains other XML files as well: all of them have resources as the root element, but the children differ (string, dimen, and so on), while the overall format is the same. When printing the value, I use child.firstChild.data rather than child.nodeValue, because child.nodeValue gives None. I'm not entirely sure why, but I think the explanation is this:

Text is always stored in text nodes

A common mistake in DOM processing is to assume that an element node contains text.

However, the text of an element node is stored in a text node.

In this example, <year>2005</year>, the element node <year> has a text node with the value "2005".

"2005" is not the value of the <year> element!

If any readers know the exact reason, I hope you can tell me!
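The year example can be checked directly with xml.dom.minidom. This is a small sketch rather than a full explanation: element_text is a helper name invented here, and the item line in the second snippet is a made-up example, not taken from the original styles.xml.

```python
import xml.dom.minidom


def element_text(element):
    """Concatenate the data of the text nodes directly under element."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


dom = xml.dom.minidom.parseString("<year>2005</year>")
year = dom.documentElement

print(year.nodeValue)             # the element node itself carries no value
print(year.firstChild.nodeValue)  # the text node child holds the string
print(element_text(year))

# An element whose first child is only indentation, similar to the style
# entries in styles.xml (the <item> line is a hypothetical example).
style = xml.dom.minidom.parseString(
    '<style name="AppTheme">\n    <item name="x">1</item>\n</style>'
).documentElement
print(repr(style.firstChild.data))
```

The second part suggests why the styles.xml entries in the output above print a blank value: the first child of each style element is just the line break and indentation before its first nested element.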

Some reference documents are given below:

Python glob methods: http://www.cnblogs.com/hongten/p/hongten_python_glob.html

XML parsing: http://www.cnblogs.com/fnng/p/3581433.html

http://www.runoob.com/python/python-xml.html
