xmltodict

xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec":

>>> import json
>>> import xmltodict
>>> print(json.dumps(xmltodict.parse("""
...  <mydocument has="an attribute">
...    <and>
...      <many>elements</many>
...      <many>more elements</many>
...    </and>
...    <plus a="complex">
...      element as well
...    </plus>
...  </mydocument>
...  """), indent=4))
{
    "mydocument": {
        "@has": "an attribute",
        "and": {
            "many": [
                "elements",
                "more elements"
            ]
        },
        "plus": {
            "@a": "complex",
            "#text": "element as well"
        }
    }
}
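
Since the parsed result is a plain nested mapping, individual values can be read with ordinary key lookups. The following is a minimal sketch that reuses the document above; the variable names are only illustrative.

```python
import xmltodict

doc = xmltodict.parse("""
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
""")

# Attributes are stored under keys prefixed with "@".
attribute = doc["mydocument"]["@has"]
# Repeated child tags are collected into a list.
many = doc["mydocument"]["and"]["many"]
# Text of an element that also has attributes is stored under "#text".
plus_text = doc["mydocument"]["plus"]["#text"]

print(attribute, many, plus_text)
```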

Namespace support

By default, xmltodict does no XML namespace processing (it just treats namespace declarations as regular node attributes), but passing process_namespaces=True will make it expand namespaces for you:

>>> xml = """
... <root xmlns="http://defaultns.com/"
...       xmlns:a="http://a.com/"
...       xmlns:b="http://b.com/">
...   <x>1</x>
...   <a:y>2</a:y>
...   <b:z>3</b:z>
... </root>
... """
>>> xmltodict.parse(xml, process_namespaces=True) == {
...     'http://defaultns.com/:root': {
...         'http://defaultns.com/:x': '1',
...         'http://a.com/:y': '2',
...         'http://b.com/:z': '3',
...     }
... }
True

It also lets you collapse certain namespaces to shorthand prefixes, or skip them altogether:

>>> namespaces = {
...     'http://defaultns.com/': None,  # skip this namespace
...     'http://a.com/': 'ns_a',  # collapse "http://a.com/" -> "ns_a"
... }
>>> xmltodict.parse(xml, process_namespaces=True, namespaces=namespaces) == {
...     'root': {
...         'x': '1',
...         'ns_a:y': '2',
...         'http://b.com/:z': '3',
...     },
... }
True
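
For contrast with the namespace-aware calls above, here is a small sketch of the default behavior, where no namespace processing happens. It assumes the same sample document; the declarations are expected to show up as ordinary attributes and the prefixes to stay part of the tag names.

```python
import xmltodict

xml = """
<root xmlns="http://defaultns.com/"
      xmlns:a="http://a.com/"
      xmlns:b="http://b.com/">
  <x>1</x>
  <a:y>2</a:y>
  <b:z>3</b:z>
</root>
"""

# No process_namespaces argument: xmlns declarations are treated like
# any other attribute, and prefixed names are kept verbatim.
doc = xmltodict.parse(xml)
root = doc["root"]

print(root["@xmlns"], root["@xmlns:a"], root["a:y"])
```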

Streaming mode

xmltodict is very fast (Expat-based) and has a streaming mode with a small memory footprint, suitable for big XML dumps like Discogs or Wikipedia:

>>> from gzip import GzipFile
>>> def handle_artist(_, artist):
...     print(artist['name'])
...     return True
>>>
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
...     item_depth=2, item_callback=handle_artist)
A Perfect Circle
Fantomas
King Crimson
Chris Potter
...
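
To try streaming mode without downloading a large dump, the same pattern can be sketched against a small in-memory document. The sample XML and the names list below are made up for illustration; only item_depth and item_callback come from the example above.

```python
import io
import xmltodict

# Hypothetical miniature stand-in for a large artists dump.
xml = b"""<artists>
  <artist><name>A Perfect Circle</name></artist>
  <artist><name>Fantomas</name></artist>
  <artist><name>King Crimson</name></artist>
</artists>"""

names = []

def handle_artist(path, artist):
    # Called once for every element that closes at depth 2 (each <artist>).
    names.append(artist["name"])
    return True  # a true value tells the parser to keep going

xmltodict.parse(io.BytesIO(xml), item_depth=2, item_callback=handle_artist)
print(names)
```

Each item is handed to the callback as soon as its closing tag is seen, so the whole document never has to be held as one dict.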

It can also be used from the command line to pipe objects to a script like this:

import sys, marshal
while True:
    # marshal data is binary, so read from the underlying byte stream
    _, article = marshal.load(sys.stdin.buffer)
    print(article['title'])

$ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | myscript.py
AccessibleComputing
Anarchism
AfghanistanHistory
AfghanistanGeography
AfghanistanPeople
AfghanistanCommunications
Autism
...

Or just cache the dicts so you don't have to parse that big XML file again. You do this only once:

$ bunzip2 -c enwiki-pages-articles.xml.bz2 | xmltodict.py 2 | gzip > enwiki.dicts.gz

And you reuse the dicts with every script that needs them:

$ gunzip -c enwiki.dicts.gz | script1.py
$ gunzip -c enwiki.dicts.gz | script2.py
...
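
The reader script shown earlier loops until marshal.load fails at the end of the input. A slightly tidier reader can be sketched as a generator that stops cleanly on EOFError. This assumes the stream holds consecutive marshal-dumped (path, item) pairs; iter_items and the sample records are hypothetical and not part of xmltodict.

```python
import io
import marshal

def iter_items(stream):
    """Yield (path, item) pairs from a binary stream until it is exhausted."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            return  # no more records in the stream

# Simulated input (hypothetical records standing in for real piped data).
buffer = io.BytesIO()
marshal.dump(([("page", None)], {"title": "Anarchism"}), buffer)
marshal.dump(([("page", None)], {"title": "Autism"}), buffer)
buffer.seek(0)

titles = [item["title"] for _, item in iter_items(buffer)]
print(titles)
```

In a real script the generator would be given sys.stdin.buffer instead of the in-memory buffer.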

Roundtripping

You can also convert in the other direction, using the unparse() function:

>>> mydict = {
...     'response': {
...             'status': 'good',
...             'last_updated': '2014-02-16T23:10:12Z',
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<response>
	<status>good</status>
	<last_updated>2014-02-16T23:10:12Z</last_updated>
</response>
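
A quick way to convince yourself that the two directions are inverses for a simple dict is to feed the output of unparse() back into parse(). This is a minimal sketch using the same data as above; the variable names are illustrative.

```python
import xmltodict

original = {
    "response": {
        "status": "good",
        "last_updated": "2014-02-16T23:10:12Z",
    }
}

# dict -> XML string -> dict
xml_text = xmltodict.unparse(original)
restored = xmltodict.parse(xml_text)

print(restored == original)
```

All values here are strings; parse() returns text, so non-string values such as integers would come back as strings.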

Text values for nodes can be specified with the cdata_key key in the Python dict, while node attributes can be specified by prefixing the key name with attr_prefix. The default value for attr_prefix is @ and the default value for cdata_key is #text.

>>> import xmltodict
>>>
>>> mydict = {
...     'text': {
...         '@color': 'red',
...         '@stroke': '2',
...         '#text': 'This is a test'
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<text color="red" stroke="2">This is a test</text>
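
Because attr_prefix and cdata_key are only defaults, they can be swapped for other markers as long as parsing and unparsing agree on them. The sketch below assumes "_" and "value" as replacement markers, chosen purely for illustration.

```python
import xmltodict

source = '<text color="red" stroke="2">This is a test</text>'

# Parse with custom markers instead of "@" and "#text".
doc = xmltodict.parse(source, attr_prefix="_", cdata_key="value")

# Unparse with the same markers so attributes and text are recognised again.
xml_text = xmltodict.unparse(doc, attr_prefix="_", cdata_key="value")
again = xmltodict.parse(xml_text, attr_prefix="_", cdata_key="value")

print(doc["text"]["_color"], doc["text"]["value"])
```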

Lists that are specified under a key in a dictionary use the key as a tag for each item. But if a list does not have a parent key, for example if a list exists inside another list, it has no tag to use and its items are converted to a string, as shown in the example below. To give tags to nested lists, use the expand_iter keyword argument to provide a tag, as demonstrated below. Note that using expand_iter will break roundtripping.

>>> mydict = {
...     "line": {
...         "points": [
...             [1, 5],
...             [2, 6],
...         ]
...     }
... }
>>> print(xmltodict.unparse(mydict, pretty=True))
<?xml version="1.0" encoding="utf-8"?>
<line>
        <points>[1, 5]</points>
        <points>[2, 6]</points>
</line>
>>> print(xmltodict.unparse(mydict, pretty=True, expand_iter="coord"))
<?xml version="1.0" encoding="utf-8"?>
<line>
        <points>
                <coord>1</coord>
                <coord>5</coord>
        </points>
        <points>
                <coord>2</coord>
                <coord>6</coord>
        </points>
</line>

Ok, how do I get it?

Using pypi

You just need to

$ pip install xmltodict

Using conda

To install xmltodict with Anaconda/Miniconda (conda) from the conda-forge channel, all you need to do is:

$ conda install -c conda-forge xmltodict

RPM-based distro (Fedora, RHEL, …)

There is an official Fedora package for xmltodict.

$ sudo yum install python-xmltodict

Arch Linux

There is an official Arch Linux package for xmltodict.

$ sudo pacman -S python-xmltodict

Debian-based distro (Debian, Ubuntu, …)

There is an official Debian package for xmltodict.

$ sudo apt install python-xmltodict

FreeBSD

There is an official FreeBSD port for xmltodict.

$ pkg install py36-xmltodict

openSUSE/SLE (SLE 15, Leap 15, Tumbleweed)

There is an official openSUSE package for xmltodict.

# Python2
$ zypper in python2-xmltodict

# Python3
$ zypper in python3-xmltodict
