GitHub - martinblech/xmltodict: Python module that makes working with XML feel like you are working with JSON

xmltodict

xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":

>>> import json
>>> import xmltodict
>>> print(json.dumps(xmltodict.parse("""
...  <mydocument has="an attribute">
...    <and>
...      <many>elements</many>
...      <many>more elements</many>
...    </and>
...    <plus a="complex">
...      element as well
...    </plus>
...  </mydocument>
...  """), indent=4))
{
    "mydocument": {
        "@has": "an attribute",
        "and": {
            "many": [
                "elements",
                "more elements"
            ]
        },
        "plus": {
            "@a": "complex",
            "#text": "element as well"
        }
    }
}
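
Because the result is an ordinary nested mapping, individual values can be read with plain subscripting. The following is a minimal sketch that reuses the "spec" document above and assumes the default conventions it shows (`@` for attributes, `#text` for character data); the variable names are illustrative only.

```python
import xmltodict

# Same document as in the "spec" above.
doc = xmltodict.parse("""
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
""")

root = doc["mydocument"]
attribute = root["@has"]        # attributes are keyed with an "@" prefix
repeated = root["and"]["many"]  # repeated sibling tags collapse into a list
text = root["plus"]["#text"]    # text next to attributes lives under "#text"

print(attribute, repeated, text)
```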

Namespace support

By default, xmltodict does no XML namespace processing (it just treats namespace declarations as regular node attributes), but passing process_namespaces=True will make it expand namespaces for you:

>>> xml = """
... <root xmlns="http://defaultns.com/"
...       xmlns:a="http://a.com/"
...       xmlns:b="http://b.com/">
...   <x>1</x>
...   <a:y>2</a:y>
...   <b:z>3</b:z>
... </root>
... """
>>> xmltodict.parse(xml, process_namespaces=True) == {
...     'http://defaultns.com/:root': {
...         'http://defaultns.com/:x': '1',
...         'http://a.com/:y': '2',
...         'http://b.com/:z': '3',
...     }
... }
True

It also lets you collapse certain namespaces to shorthand prefixes, or skip them altogether:

>>> namespaces = {
...     'http://defaultns.com/': None, # skip this namespace
...     'http://a.com/': 'ns_a', # collapse "http://a.com/" -> "ns_a"
... }
>>> xmltodict.parse(xml, process_namespaces=True, namespaces=namespaces) == {
...     'root': {
...         'x': '1',
...         'ns_a:y': '2',
...         'http://b.com/:z': '3',
...     },
... }
True
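
For contrast, here is a small sketch of the default behaviour described at the start of this section, where no namespace processing takes place. It parses the same document without `process_namespaces`, so the `xmlns` declarations should come back as regular `@`-prefixed attributes and the prefixed tag names stay verbatim.

```python
import xmltodict

xml = """
<root xmlns="http://defaultns.com/"
      xmlns:a="http://a.com/"
      xmlns:b="http://b.com/">
  <x>1</x>
  <a:y>2</a:y>
  <b:z>3</b:z>
</root>
"""

# No process_namespaces argument: declarations stay ordinary attributes
# and prefixed tag names are kept exactly as written in the document.
plain = xmltodict.parse(xml)
root = plain["root"]

print(sorted(root))
```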

Streaming mode

xmltodict is very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia:

>>> from gzip import GzipFile
>>> import xmltodict
>>>
>>> def handle_artist(_, artist):
...     print(artist['name'])
...     return True
>>>
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
Fantomas
King Crimson
Chris Potter
...
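
The same mechanism can be tried without downloading a dump. The sketch below uses a tiny in-memory document as a stand-in (its element names are made up for illustration) and collects the names in a list instead of printing them.

```python
import xmltodict

# A tiny stand-in for a large dump; element names here are illustrative.
xml = """<artists>
  <artist><name>A Perfect Circle</name></artist>
  <artist><name>Fantomas</name></artist>
  <artist><name>King Crimson</name></artist>
</artists>"""

names = []

def handle_artist(path, artist):
    # path is a list of (tag, attributes) pairs from the root to this item;
    # artist is the dict built for one element at depth 2.
    names.append(artist["name"])
    return True  # a true value tells the parser to keep going

# In streaming mode each item is handed to the callback as soon as it is
# complete, instead of being accumulated into one large dict.
xmltodict.parse(xml, item_depth=2, item_callback=handle_artist)

print(names)
```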

It can also be used from the command line to pipe objects to a script like this:

import sys, marshal
while True:
    try:
        # marshal reads bytes: use sys.stdin.buffer on Python 3
        _, article = marshal.load(sys.stdin.buffer)
    except EOFError:
        break
    print(article['title'])

$ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...

Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:

$ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | gzip > enwiki.dicts.gz

And you reuse the dicts with every script that needs them:

$ gunzip -c enwiki.dicts.gz | script1.py
$ gunzip -c enwiki.dicts.gz | script2.py
...
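
The caching idea can be sketched in pure Python as well. The example below is an illustration under stated assumptions, not the tool's own code: the records are hypothetical `(path, item)` tuples shaped like the ones the consumer script above unpacks, and an in-memory buffer stands in for `enwiki.dicts.gz`.

```python
import gzip
import io
import marshal

# Hypothetical (path, item) records, one marshalled tuple per item.
records = [
    ([("mediawiki", None), ("page", None)], {"title": "AccessibleComputing"}),
    ([("mediawiki", None), ("page", None)], {"title": "Anarchism"}),
]

# Write the cache once (the buffer stands in for enwiki.dicts.gz).
cache = io.BytesIO()
with gzip.GzipFile(fileobj=cache, mode="wb") as gz:
    for record in records:
        marshal.dump(record, gz)

# Reuse it as often as needed: read tuples until the stream runs out.
cache.seek(0)
titles = []
with gzip.GzipFile(fileobj=cache, mode="rb") as gz:
    while True:
        try:
            _, article = marshal.load(gz)
        except EOFError:
            break  # no more records in the cache
        titles.append(article["title"])

print(titles)
```

Reading stops on `EOFError` because a marshal stream carries no record count; the end of the data is the only terminator.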

Roundtripping

You can also convert in the other direction, using the unparse() function:

>>> from xmltodict import unparse
>>> mydict = {
...     'response': {
...             'status': 'good',
...             'last_updated': '2014-02-16T23:10:12Z',
...     }
... }
>>> print(unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<response>
	<status>good</status>
	<last_updated>2014-02-16T23:10:12Z</last_updated>
</response>
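
A quick way to see the roundtrip is to feed the output of unparse() straight back into parse(). This is a minimal sketch using the same dictionary as above; it relies on all values being strings, since parsed XML text always comes back as strings.

```python
import xmltodict

mydict = {
    "response": {
        "status": "good",
        "last_updated": "2014-02-16T23:10:12Z",
    }
}

xml_text = xmltodict.unparse(mydict)      # dict -> XML string
roundtripped = xmltodict.parse(xml_text)  # XML string -> dict

# With string values and a single root key, the data survives unchanged.
print(roundtripped == mydict)
```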

Text values for nodes can be specified with the cdata_key key in the Python dict, while node properties can be specified with the attr_prefix prefixed to the key name in the Python dict. The default value for attr_prefix is @ and the default value for cdata_key is #text.

>>> import xmltodict
>>>
>>> mydict = {
...     'text': {
...         '@color':'red',
...         '@stroke':'2',
...         '#text':'This is a test'
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<text stroke="2" color="red">This is a test</text>
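
Both markers can be overridden. The sketch below passes non-default values for attr_prefix and cdata_key to parse() and then hands the same values to unparse(); the marker strings `attr_` and `value` are arbitrary choices made for this illustration.

```python
import xmltodict

xml_text = '<text color="red" stroke="2">This is a test</text>'

# Non-default markers, chosen only for illustration.
parsed = xmltodict.parse(xml_text, attr_prefix="attr_", cdata_key="value")
node = parsed["text"]
print(node["attr_color"], node["attr_stroke"], node["value"])

# The same markers are passed to unparse() so it can tell attributes
# and text apart from child elements.
rebuilt = xmltodict.unparse(parsed, attr_prefix="attr_", cdata_key="value")
print(xmltodict.parse(rebuilt) == xmltodict.parse(xml_text))
```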

Lists that are specified under a key in a dictionary use the key as a tag for each item. But if a list does not have a parent key, for example if a list exists inside another list, it has no tag to use and its items are converted to a string, as shown in the example below. To give tags to nested lists, use the expand_iter keyword argument to provide a tag, as demonstrated below. Note that using expand_iter will break roundtripping.

>>> mydict = {
...     "line": {
...         "points": [
...             [1, 5],
...             [2, 6],
...         ]
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<line>
        <points>[1, 5]</points>
        <points>[2, 6]</points>
</line>

>>> print(xmltodict.unparse(mydict, pretty=True, expand_iter="coord"))
<?xml version="1.0" encoding="utf-8"?>
<line>
        <points>
                <coord>1</coord>
                <coord>5</coord>
        </points>
        <points>
                <coord>2</coord>
                <coord>6</coord>
        </points>
</line>
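
The simpler case from the first sentence above, a flat list under a key, can be sketched as follows. The dictionary is made up for illustration, and `full_document=False` is used only to leave out the XML declaration so the output is easier to read.

```python
import xmltodict

# Each item of the list under "b" is emitted with the tag <b>.
xml_text = xmltodict.unparse({"a": {"b": [1, 2]}}, full_document=False)
print(xml_text)

# Parsing the result yields strings, since XML carries no type information.
parsed = xmltodict.parse(xml_text)
print(parsed)
```

Note that the integers come back as strings, so a roundtrip preserves structure but not Python types.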

Ok, how do I get it?

Using PyPI

You just need to

$ pip install xmltodict

Using conda

To install xmltodict with Anaconda/Miniconda (conda) from the conda-forge channel, all you need to do is:

$ conda install -c conda-forge xmltodict

RPM-based distro (Fedora, RHEL, …)

There is an official Fedora package for xmltodict.

$ sudo yum install python-xmltodict

Arch Linux

There is an official Arch Linux package for xmltodict.

$ sudo pacman -S python-xmltodict

Debian-based distro (Debian, Ubuntu, …)

There is an official Debian package for xmltodict.

$ sudo apt install python-xmltodict

FreeBSD

There is an official FreeBSD port for xmltodict.

$ pkg install py36-xmltodict

openSUSE/SLE (SLE 15, Leap 15, Tumbleweed)

There is an official openSUSE package for xmltodict.

# Python2
$ zypper in python2-xmltodict

# Python3
$ zypper in python3-xmltodict