Object serialization and injection in PHP
Spoiler: One day, we came up with the somewhat crazy idea of transferring objects from one application to another. Subsequently, we realized that this feature opened up great attack possibilities for anyone able to inject their own content. Eventually, we realized that we should never use serialization.
Once upon a time, developing a website was about offering a nice graphical interface to make life easier for users. It was enough to embellish HTML with PHP instructions to give your site a je ne sais quoi that energized the whole thing and made the world a better place.
Then we realized that we could actually make applications communicate with each other. A whole universe of possibilities to discover… Enriching a third-party service, delegating functionality (like authentication), unifying several platforms… And to allow all these little people to understand each other, we invented lots of data formats.
In our creative frenzy, we invented (un)serialization, a way to natively transfer complex structures, including objects, from one application to another. It was probably a great idea at the time (because it’s super practical for developers), but as we’ll see today, it’s probably the last thing you should use in an application.
Well, in reality, I would still rather unserialize than deal with XML. Fortunately, other technologies have since been invented that spare us from such extremes.
The problem with (un)serialization is that it allows your users (human or not) to forge their own objects inside your application. If they are well chosen, some of these objects will behave in ways you did not expect and will let these clever people hijack your application to their advantage.
This time, the solution is simple: if you see a call to unserialize(), or something that resembles it, run away; the land is mined. If you think you can get through this by being careful, seek psychological assistance, your life may be in danger.
Objects in PHP
This vulnerability being typical of the object world, a few preliminary reminders may be useful if you are not used to developing this way. I am using PHP here, but the principle would be essentially the same in another object-oriented language.
If you don’t know what an object is, tell yourself that it’s something that contains both data (variables called attributes or members) and functionality (functions called methods). Your program is full of these things, which interact with each other to, overall, solve your problem and provide you with the results you expect.
Personally, I really see object programs as ecosystems full of little creatures (my objects) that live their lives and interact with each other. When I program, I define the species of critters with their characteristics and behaviors.
It has a demiurge side to it, but since I sought help, it’s getting better.
Most of the time, these objects are described in what we call classes: kinds of patterns, models, molds or any other metaphor which means that an object is created in memory and will be manipulated according to what is written in those classes. Object-oriented programming therefore consists of defining classes, then creating objects (we say instantiating them) and finally launching the interactions.
“Most of the time” because certain languages, like JavaScript, shun this formalism and prefer to add attributes and methods on the fly without any notion of type, the objects being created by cloning: it’s a happy creative mess. This does not make them invulnerable (on the contrary); the approach is just different.
For the sake of the example, here is a small class describing very simple users whose only goal is to respond politely when greeting each other… I document it with Doxygen-style comments, more as an example than out of necessity.
/**
 * Class describing users
 */
class User {
    /**
     * Attribute storing the object name
     */
    public $name;

    /**
     * Constructor, to initialize a new object
     *
     * @param $name the name of the object
     */
    public function __construct(string $name) {
        $this->name = $name;
    }

    /**
     * Method to find out the object name
     *
     * @return the name of the object
     */
    public function whoAreYou() {
        return $this->name;
    }

    /**
     * Method to greet the object
     *
     * @param $other the saluting object
     * @return the object's response
     */
    public function hello(User $other) {
        return "Nice to meet you " . $other->name
            . ", I am " . $this->name;
    }
}
Once this class is defined and its code loaded by your scripts, you can use it to create objects and make them interact.
// Creation of two objects
$foo = new User("Foo");
$bar = new User("Bar");
// Method call
echo $foo->hello($bar);
// > Nice to meet you Bar, I am Foo
Serialization
Serialization consists of transforming objects (sometimes complex) into a character string (or a sequence of 0s and 1s). Note that the goal is to perform the inverse transformation later, which we then call unserialization. The PHP documentation sums it up well by talking about generating a storable representation.
Simply copying the memory won’t work. First, because the necessary objects can be scattered almost anywhere. Second, because the memory involved may also contain useless data (or even sensitive data, which would be a shame). Finally, objects can contain non-storable resources, such as file descriptors, database connections, etc.
We therefore had to develop specific techniques to save this complex data.
Natively
In PHP, serialization is done natively with the following two functions:
- Serialization: with the function serialize(), which takes the data to be serialized as a parameter and returns a character string,
- Unserialization: with the function unserialize(), which performs the opposite operation by taking a character string and giving you back the corresponding data structure.
To complement this, PHP provides other ways to transform your objects into strings (and vice versa). var_export() generates PHP source code which must be interpreted to get your data back (very dangerous). There is also json_encode() and its converse json_decode(), but in this case we completely lose the typing.
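To make this loss of typing concrete, here is a minimal sketch (the User class is reduced to its name attribute for the occasion): once through JSON, nothing remembers that the data used to be a User.

```php
<?php
// Reduced version of the User class, just enough for the demonstration
class User {
    public $name;

    public function __construct(string $name) {
        $this->name = $name;
    }
}

$json = json_encode(new User("Foo"));
echo $json, "\n";
// {"name":"Foo"}

// Decoding gives back a generic stdClass, not a User
$decoded = json_decode($json);
echo get_class($decoded), "\n";
// stdClass
var_dump($decoded instanceof User);
// bool(false)
```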
To be really complete, crazy people even invented an XML-based method… After installing the PEAR XML Serializer extension, you will be able to export and import your objects from XML format (at the cost of 3 sanity points).
To return to native serialization, let’s see what it would look like with our Users:
$foo = new User("Foo");
echo serialize($foo);
// O:4:"User":1:{s:4:"name";s:3:"Foo";}
If you want to understand the deeper meaning of this serialization, here it is:
- O:4:"User": means that we are in the presence of a serialized object (O) whose class name is 4 characters long and equals User,
- 1:{...} gives us the number of attributes of the object (here, 1); these are serialized one after the other between the braces,
- s:4:"name"; means that the name of the first attribute is a string of 4 characters, equal to name,
- s:3:"Foo"; means that the value of the first attribute is a string of 3 characters, equal to Foo.
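The same notation covers the other native types. A quick sketch to see it at work on scalars and arrays (nothing here depends on our User class):

```php
<?php
echo serialize(42), "\n";        // i:42;
echo serialize(true), "\n";      // b:1;
echo serialize(null), "\n";      // N;
echo serialize("Foo"), "\n";     // s:3:"Foo";

// Arrays list their keys and values one after the other
echo serialize([1, "a"]), "\n";  // a:2:{i:0;i:1;i:1;s:1:"a";}
```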
And it’s just as easy to unserialize the previous data:
$string = 'O:4:"User":1:{s:4:"name";s:3:"Foo";}' ;
$foobis = unserialize($string);
var_dump($foo == $foobis); // bool(true)
var_dump($foo === $foobis); // bool(false)
After unserialization, we obtain a second object, equivalent to the first but distinct. The equality operator == compares the class and the values of the attributes, and therefore tells us that they are equal. The identity operator === compares object references, and tells us, on the other hand, that they are distinct.
Customization
When your objects are too complex to be serialized automatically, PHP gives you two ways to override its mechanisms and use your own.
- Via the magic methods __sleep() and __wakeup() (cf. official doc). The first is called before serialization and is supposed to return the list of names of the attributes to be serialized. The second is called just after unserializing the object (without parameters, since the attributes have already been restored).
- By implementing the Serializable interface, which asks you to implement two methods: serialize(), which returns the representation of your object, and unserialize(), which does the opposite.
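As a sketch of the first option, imagine a hypothetical TempStore class (it is not part of our example application) that holds a stream resource, which cannot be serialized. Here __sleep() keeps only the label and __wakeup() reopens a stream after restoration:

```php
<?php
// Hypothetical class holding a non-storable resource
class TempStore {
    public $label;
    public $handle;

    public function __construct(string $label) {
        $this->label = $label;
        $this->handle = fopen("php://memory", "r+");
    }

    // Called before serialization: only "label" will be exported
    public function __sleep() {
        return ["label"];
    }

    // Called after unserialization: rebuild what was not exported
    public function __wakeup() {
        $this->handle = fopen("php://memory", "r+");
    }
}

$string = serialize(new TempStore("demo"));
echo $string, "\n";
// O:9:"TempStore":1:{s:5:"label";s:4:"demo";}

$copy = unserialize($string);
var_dump(is_resource($copy->handle));
// bool(true)
```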
For example, with our users, if we wanted to record the number of generations that separate a copy from the original object, here is the kind of code we could produce.
class User implements Serializable {
    // Previous code here
    public $generation = 0;

    public function serialize() {
        // The attribute is called "name" in our class
        $data = [$this->name, $this->generation + 1];
        return serialize($data);
    }

    public function unserialize($string) {
        $data = unserialize($string);
        list($this->name, $this->generation) = $data;
    }
}
Good to know: if you implement the Serializable interface, the __sleep() and __wakeup() magic methods will be ignored.
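For the record, PHP 7.4 added a third mechanism, the magic methods __serialize() and __unserialize(), and relying on Serializable alone has been deprecated since PHP 8.1. A minimal sketch of the same generation counter written with these methods, assuming a User reduced to the two attributes we need:

```php
<?php
class User {
    public $name;
    public $generation = 0;

    public function __construct(string $name) {
        $this->name = $name;
    }

    // Returns the array of data to export
    public function __serialize(): array {
        return ["name" => $this->name, "generation" => $this->generation + 1];
    }

    // Receives that same array back on restoration
    public function __unserialize(array $data): void {
        $this->name = $data["name"];
        $this->generation = $data["generation"];
    }
}

$copy = unserialize(serialize(new User("Foo")));
echo $copy->generation, "\n";       // 1
$grandchild = unserialize(serialize($copy));
echo $grandchild->generation, "\n"; // 2
```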
For those who do RAII, be aware that serialization will require some precautions…
- Once serialized, the object still exists, its destructor will therefore be called as usual,
- During unserialization, the constructor is not called (since we consider that we are giving birth “again”).
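These two points are easy to observe. In the following sketch, the Tracked class is hypothetical and only records the calls to its constructor and destructor in a static array:

```php
<?php
// Hypothetical class recording its own life cycle
class Tracked {
    public static $events = [];
    public $id;

    public function __construct(string $id) {
        $this->id = $id;
        self::$events[] = "construct $id";
    }

    public function __destruct() {
        self::$events[] = "destruct {$this->id}";
    }
}

$a = new Tracked("a");            // constructor called
$b = unserialize(serialize($a));  // no constructor call for the copy
unset($a);                        // destructor of the original
unset($b);                        // destructor of the copy (same id)

print_r(Tracked::$events);
// construct a, destruct a, destruct a
```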
Why? Backup
The first advantage of serializing objects is that it allows them to be stored and then restored. Thus, you can save the state of objects in files at any time and then restore them if necessary from these backup files.
We could imagine that in our example, we first start by saving our objects in files.
file_put_contents("foo.txt", serialize(new User("Foo")));
file_put_contents("bar.txt", serialize(new User("Bar")));
Then, later in the script (or indeed in another script), retrieve them and continue the calculations.
$foo = unserialize(file_get_contents("foo.txt"));
$bar = unserialize(file_get_contents("bar.txt"));
echo $foo->hello($bar);
// Nice to meet you Bar, I am Foo
This example uses files, but serialization can also be used to save information in a database. On the other hand, as we will see later, cookies are a very bad idea.
Why? Transfer
In addition to restoring a local backup, serialization also allows objects to be transferred from one application to another. The calculations can then be distributed according to available resources or the logic of the algorithm.
We could imagine that in our example, the call to hello() is made on a specific server. The corresponding script would restore the objects passed to it as parameters (GET variables here, but anything is possible) before calling the appropriate method.
$foo = unserialize($_GET["foo"]);
$bar = unserialize($_GET["bar"]);
echo $foo->hello($bar);
In this case, the objects would be created in another script, perhaps on another server on the other side of the world, with serialization allowing the state of the objects to be passed from one server to the other.
echo file_get_contents(
    "http://example.com/hello.php"
    . "?foo=" . urlencode(serialize(new User("Foo")))
    . "&bar=" . urlencode(serialize(new User("Bar")))
);
// Nice to meet you Bar, I am Foo
I grant you, it is a little exaggerated in our case, but if you have very specific resources on a server and do not want to overload it with accessory calculations (e.g. an HSM to hash information), the idea is no longer that incongruous.
As we will see later, using serialization is a bad idea if you cannot establish trust between the two servers. And even then, from a defense in depth perspective, serialization to transfer objects is still a bad idea.
Exploitation
Now that we’ve seen that (un)serialization is great, we’ll see how bad it is: every time you unserialize a string coming from an attacker, you allow that attacker to inject his/her own objects. As we will see, this can give him/her access or let him/her execute code…
Example
Let’s start with a very simplified application that reuses our User class. Let’s say that during authentication, the Identity Provider stores the user in a cookie. Something like this:
// setcookie() actually sends the cookie; writing to $_COOKIE would not
setcookie("user", serialize(new User($username)));
We could then imagine that another part of the application, the Service Provider, carries out access control and opens restricted functionalities if the user is the administrator:
$user = unserialize($_COOKIE["user"]);
if ($user->name == "admin") {
// God mode!
// ...
}
For a user who does not cheat, the cookie is generated by the application and the name is only admin if it is an administrator.
Change behavior
An attacker, on the other hand, can cheat and send whatever (s)he wants, either by changing characters in the serialization, or by creating his/her own object which (s)he then serializes. In either case (s)he could create this string:
'O:4:"User":1:{s:4:"name";s:5:"admin";}'
With this string, the unserialized object will have the right name, opening the doors to restricted functionality!
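A minimal sketch of the first technique, editing the serialized string by hand (the User class is reduced to its name, and "Bob" is an arbitrary legitimate user). Note that the declared length has to be updated along with the value:

```php
<?php
class User {
    public $name;

    public function __construct(string $name) {
        $this->name = $name;
    }
}

// What the application would have put in the cookie for "Bob"
$legit = serialize(new User("Bob"));
// O:4:"User":1:{s:4:"name";s:3:"Bob";}

// The attacker edits the value AND its declared length
$forged = str_replace('s:3:"Bob"', 's:5:"admin"', $legit);
$user = unserialize($forged);
var_dump($user->name == "admin"); // bool(true)

// Forgetting to fix the length only makes unserialize() return false
// (the @ hides the notice or warning PHP emits in that case)
$broken = str_replace('"Bob"', '"admin"', $legit);
var_dump(@unserialize($broken)); // bool(false)
```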
If you think that the problem is in the User class, and that we should be more careful when (un)serializing, I will show you that it is not.
If you think that the problem is that the cookie is in cleartext, there is some truth in that, but we will see at the end why I do not recommend relying on that either.
Let’s say we completely prohibit the admin value with something like this:
class User {
    // Previous code here
    public function __wakeup() {
        if ($this->name == "admin") {
            throw new Exception("Admin cannot be unserialized");
        }
    }
}
An attacker can very well send you something other than a User. Any other type with a name attribute equal to admin will do. At worst, we can even fall back on the stdClass type (the native class of PHP for all objects without a specific class) and manually add the attribute to it:
$o = new stdClass();
$o->name = "admin" ;
echo serialize($o);
// O:8:"stdClass":1:{s:4:"name";s:5:"admin";}
With a value like this, the script no longer unserializes a User but an stdClass. As PHP is not strictly typed, this will not pose a problem: the condition can be evaluated and open the doors…
The problem does not come from the User class but from having unserialized an object coming from a user. If (s)he is hostile, (s)he can inject the items (s)he wants for his/her own benefit.
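To see both behaviors side by side, here is a small sketch with a User reduced to its name and the __wakeup() guard from above: the guard stops a forged User, but not the stdClass.

```php
<?php
class User {
    public $name;

    public function __wakeup() {
        if ($this->name == "admin") {
            throw new Exception("Admin cannot be unserialized");
        }
    }
}

// A forged User is caught by the guard
$blocked = false;
try {
    unserialize('O:4:"User":1:{s:4:"name";s:5:"admin";}');
} catch (Exception $e) {
    $blocked = true;
}
var_dump($blocked); // bool(true)

// A forged stdClass never goes through User::__wakeup()
$user = unserialize('O:8:"stdClass":1:{s:4:"name";s:5:"admin";}');
var_dump($user->name == "admin"); // bool(true)
```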
Run code
At this point, you might say to yourself that we should be even more careful and, for example, use methods rather than attributes since an attacker cannot add them.
In this case, let’s assume that no other class anywhere in your application has a whoAreYou() method. You might think that the following code is safe:
$user = unserialize($_COOKIE["user"]);
if ($user->whoAreYou() == "admin") {
// God mode!
// ...
}
The call to the method will fail if the attacker uses anything other than a User. So the situation is under control!?
No.
The idea, this time, is no longer to bypass the condition to obtain privileged access, but to inject objects that have useful methods and to arrange for the application to call them. And the thing is, even if your code doesn’t call a lot of methods, some magic methods are actually called routinely:
- __wakeup() or unserialize() for classes that customize this behavior,
- __destruct() when the unserialized data is no longer used.
So let’s stick with the classic magic methods and admit that we have, somewhere in the application, a class which takes care of logging events in files. Something like this:
class Logger {
    private $filename;
    private $buffer;

    public function __construct($filename) {
        $this->filename = $filename;
        $this->buffer = "";
    }

    public function log($message) {
        $this->buffer .= "$message\n";
    }

    public function __destruct() {
        file_put_contents(
            $this->filename,
            $this->buffer, FILE_APPEND
        );
    }
}
This class defines objects whose purpose is to buffer event messages to add them all at once to a specific log file. Nothing complicated, it’s a very classic kind of thing.
If you want to play with this vulnerability, I recommend Natas challenge 26 which lets you implement object injection via a class very close to this one (you will have to adapt).
For the attacker, this class is a gift. (S)he just needs to inject a Logger object into your application. As (s)he controls the filename and buffer attributes, (s)he will be able to write what (s)he wants, where (s)he wants, when the destructor is called…
// Code on the attacker's side
class Logger {
public $filename;
public $buffer;
}
$payload = new Logger();
$payload->filename = '/var/www/index.php';
$payload->buffer = '<?php echo "Hello world" ;' ;
echo serialize($payload);
// O:6:"Logger":2:{s:8:"filename";s:18:"/var/www/index.php";s:6:"buffer";s:26:"<?php echo "Hello world" ;";}
Once this payload is injected into the cookie, the application will unserialize it and obtain a Logger type object. Of course, the call to whoAreYou() will fail since the object does not have this method. But when the script finishes, the object will be destroyed, calling its __destruct() method. This will then push its buffer into the file on the server, writing what the attacker wanted where (s)he wanted it.
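If you would rather watch the whole chain run in a single harmless script, here is a sketch. It writes into a fresh file of the temporary directory instead of the web root, the serializedString() helper is hypothetical, and the payload is assembled by hand with the names that serialize() gives to private attributes ("\0Logger\0filename" and "\0Logger\0buffer").

```php
<?php
// The application's class, as above
class Logger {
    private $filename;
    private $buffer;

    public function __construct($filename) {
        $this->filename = $filename;
        $this->buffer = "";
    }

    public function log($message) {
        $this->buffer .= "$message\n";
    }

    public function __destruct() {
        file_put_contents($this->filename, $this->buffer, FILE_APPEND);
    }
}

// Hypothetical helper producing the s:<length>:"<value>"; notation
function serializedString(string $value): string {
    return 's:' . strlen($value) . ':"' . $value . '";';
}

// Harmless target for the demo: a fresh file in the temp directory
$target = sys_get_temp_dir() . "/injected_" . uniqid() . ".txt";
$content = "written by the attacker";

// Private attributes are stored as "\0ClassName\0attribute"
$payload = 'O:6:"Logger":2:{'
    . serializedString("\0Logger\0filename") . serializedString($target)
    . serializedString("\0Logger\0buffer") . serializedString($content)
    . '}';

$user = unserialize($payload); // what the application does with the cookie
unset($user);                  // the object dies: __destruct() runs

echo file_get_contents($target), "\n";
// written by the attacker
```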
This time again, the problem is not that the Logger class does not pay enough attention. The problem is that the script has unserialized data coming from an attacker, who can therefore inject whatever (s)he wants.
Protections and bad ideas
Now is the hardest part. I showed you something cool and as you fell in love, I showed you that it’s actually dangerous. Grieving is difficult and after the shock, denial and then anger, you may want to negotiate: “Maybe if…”.
No, even with ifs, it remains dangerous. It will hurt, it will be depressing, but you will eventually accept it and get back to the normal course of your life.
Loading classes
For object injection to work, the classes of the objects that the attacker injects must be loaded by your application. So you might say to yourself:
My script loads very few classes, I know them, so I don’t risk anything.
So for your argument to be valid, you must manually load all the necessary files in each script. As a result, you scrupulously forbid yourself two very practical features of PHP:
- The automatic loading of classes, and in particular PSR-4, which is nevertheless very practical for organizing your source code.
- The libraries on Packagist, or more generally any notion of dependency via Composer, which is nevertheless very practical for not reinventing the wheel.
Note that in some languages you cannot disable this and any class in your application will be available: Java and Python, among others.
But let’s say you apply these restrictions.
This is still a bad idea in terms of defense in depth because you had to define a perimeter that was much too large and vague to be guaranteed without problems. Because software lives: new pieces are added regularly, old ones die without necessarily disappearing.
Over the course of maintenance, every code change carries a permanent risk that a class useful to an attacker will be added. The more the code evolves, the harder it will be to guarantee that no class or method can be used by an attacker. And suppose you find that a class provides such a means: what would you do if it is in fact necessary for your application?
Keeping serialization by forbidding the automatic loading of classes and by imposing increasingly costly formal verifications is like a surgeon operating with wool gloves because it’s more comfortable and then too bad for the handling of clamps and scalpels.
Whitelist
Since PHP 7.0, the unserialize() function has a new parameter to give a whitelist of allowed classes. If the object’s type is not in the list, unserialization fails, giving you an unusable object. And there, I see you coming… “So, I just have to put the right list and I can even use Composer!”
Technically, you get an object of class __PHP_Incomplete_Class, which cannot be used. Access to attributes provides a NULL value and emits a NOTICE log. Calls to methods fail with fatal errors.
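A short sketch of this parameter in action, through the allowed_classes option. The two classes are stripped-down stand-ins: this Logger only records whether its destructor ran, so that we can check it was never triggered.

```php
<?php
class User {
    public $name;
}

// Stand-in for a dangerous class: we only track whether its destructor runs
class Logger {
    public static $destructed = false;
    public $filename;
    public $buffer;

    public function __destruct() {
        self::$destructed = true;
    }
}

$payload = 'O:6:"Logger":2:{s:8:"filename";s:5:"a.txt";s:6:"buffer";s:2:"hi";}';

// Only User may be instantiated; anything else becomes an incomplete object
$object = unserialize($payload, ["allowed_classes" => ["User"]]);
echo get_class($object), "\n";
// __PHP_Incomplete_Class
unset($object);
var_dump(Logger::$destructed);
// bool(false)

// An allowed class is still restored normally
$user = unserialize(
    'O:4:"User":1:{s:4:"name";s:3:"Foo";}',
    ["allowed_classes" => ["User"]]
);
echo get_class($user), "\n";
// User
```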
The problem is that even if it reduces the risk a lot, it doesn’t eliminate it. When maintaining your application, what guarantees that no one will make a mistake?
- Add the wrong class to the list…
- Add the wrong method to a class in the list…
- Add unserialization and forget the list…
This time, it’s as if your surgeon told you that he understood that with wool gloves the tools are difficult to handle, so he now operates with mittens: that leaves a little lint, but the cuts are finally clean.
Crypto-signatures
On the same kind of principle as in SAML, we can use the idea of signing exports. Once the data is serialized, we calculate the cryptographic signature which we attach to the message. Upon reception, we first check the signature and if it is valid, we unserialize and continue the calculations.
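To fix ideas, here is what such a home-made mechanism could look like. It is only a sketch under assumptions of my own: the function names and the key are hypothetical, and it uses an HMAC (a shared-secret authentication code) rather than a true asymmetric signature.

```php
<?php
// Hypothetical helper: prefix the payload with its HMAC-SHA256
function signPayload(string $payload, string $key): string {
    return hash_hmac("sha256", $payload, $key) . "." . $payload;
}

// Hypothetical helper: returns the payload if the MAC matches, null otherwise
function verifyPayload(string $signed, string $key): ?string {
    $parts = explode(".", $signed, 2);
    if (count($parts) !== 2) {
        return null;
    }
    list($mac, $payload) = $parts;
    $expected = hash_hmac("sha256", $payload, $key);
    // hash_equals() compares the two strings in constant time
    return hash_equals($expected, $mac) ? $payload : null;
}

$key = "demo-key-do-not-reuse"; // placeholder secret for the example
$signed = signPayload('O:4:"User":1:{s:4:"name";s:3:"Foo";}', $key);

var_dump(verifyPayload($signed, $key) !== null); // bool(true)

// Any modification of the payload invalidates the MAC
$tampered = str_replace('s:3:"Foo"', 's:5:"admin"', $signed);
var_dump(verifyPayload($tampered, $key));        // NULL
```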
It’s cryptographically signed, so it’s mathematically safe!
The first problem is that there is no standard, nor any native or integrated function, offering this signed serialization mechanism. You would then have to reinvent the wheel, and I remind you that we are talking about cryptography here: the area where we advise against homemade tricks.
The second, and not the least, is that this is not what a signature is for. It of course makes it possible to certify the origin of the data, but in no case the safety of the content. This is, in a way, going back to a perimeter-based vision of security.
- Either you take the data from a third party. What are your guarantees on its reliability?
- Or you sign your own data which you send to your customers. What is the benefit compared to a session on your own side?
- Or you sign your own data which you keep with you. What is the point of signing it?
After a health check, the surgeon finally decided to apply strict measures. He now sources his supplies exclusively from a manufacturer of mittens of controlled origin, with a certificate of authenticity on each batch.
Small projects and proofs of concept
Finally, I would like to finish with small projects. Those that are done very quickly just to test something or prove a concept.
This is just to test quickly, it will never be online.
In fact, even then, I recommend against using serialization. I see you’re disappointed and I’ll tell you why, even for these small codes, it’s a bad idea…
- If your project is so small, why need something as advanced as serialization?
- If you develop bad habits for small projects, you will keep them for bigger ones.
- Do you know the number of proofs of concept that go into production because it’s faster and cheaper?
The hospital management having finally banned the use of wool gloves and mittens during operations, the surgeon put them aside for the teaching and training of emergency doctors. Once at work, it always takes a little period of adaptation but after three incisions, things get better.
And now?
Never unserialize
As you will have understood, any attempt to (un)serialize data, whether via serialize() or imitations, will only bring ruin and desolation to your project…
- Increasing development costs for ever more complex rigorous verifications,
- Inevitable careless errors, due to difficulty, fatigue, stress or even incompetence,
- Vulnerabilities that are difficult to find and especially to fix.
And all this because you are using a function which, it’s true, looks nice, is very cute and comes from the blessed days of the ancients who knew how to live in harmony with The Code.
Computer security is a risky profession. You must already be very good in training to have a chance of not making any mistakes in real conditions. Excellence comes at this price: permanent vigilance.
Do not trust incoming data
If you really need to transfer complex structures between your applications, my advice will be the same as for any data outside your components:
Don’t trust incoming data.
The representation formats for sending and receiving should never contain typing instructions that are blindly followed by your code. At best, you could attach hints and check them before using them.
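As an illustration of such a checked hint, here is a minimal sketch. The decodeMessage() function and the "type" field are assumptions of mine; the point is that the hint is only compared against values chosen by the application and is never used as a class name.

```php
<?php
class User {
    public $name;

    public function __construct(string $name) {
        $this->name = $name;
    }
}

// Hypothetical decoder: the "type" hint only selects among known cases
function decodeMessage(string $json) {
    $data = json_decode($json, true);
    if (!is_array($data)
        || !isset($data["type"], $data["name"])
        || !is_string($data["type"])
        || !is_string($data["name"])) {
        throw new InvalidArgumentException("Malformed message");
    }
    switch ($data["type"]) {
        case "user":
            return new User($data["name"]);
        default:
            throw new InvalidArgumentException("Unknown type");
    }
}

$user = decodeMessage('{"type":"user","name":"Foo"}');
echo get_class($user), "\n";
// User

try {
    decodeMessage('{"type":"Logger","name":"x"}');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
    // Unknown type
}
```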
Example with JSON
Technically, any format without typing meets this criterion. You could look at JSON, which has the advantage of being very well integrated by all platforms.
- The json_encode() function, which allows you to transform any structure into JSON. Although programs will not need it, the JSON_PRETTY_PRINT option will be useful if humans need to read these exports.
- The json_decode() function, which transforms the character string into an associative array (when its second parameter is true) which you can then browse to create the useful objects,
- The JsonSerializable interface, which allows your classes to customize the export. As this format is not typed, no inverse function is available (and that is the whole point).
To come back to our initial example and our Users who say hello, we could add a construction method from JSON (we can talk about a static factory) as follows:
class User implements JsonSerializable {
    // Previous code here ...

    // Only if name is private
    public function jsonSerialize() {
        return ["name" => $this->name];
    }

    public static function FromJson($json) {
        // true: decode into an associative array rather than an stdClass
        $table = json_decode($json, true);
        return new User($table["name"]);
    }
}
We can then use the example code for saving objects in files by replacing the serialization with JSON encoding.
file_put_contents("foo.txt", json_encode(new User("Foo")));
file_put_contents("bar.txt", json_encode(new User("Bar")));
Retrieving objects is just as simple but this time, we don’t let PHP choose the type based on the content retrieved, we force it manually by calling the User::FromJson() factory.
$foo = User::FromJson(file_get_contents("foo.txt"));
$bar = User::FromJson(file_get_contents("bar.txt"));
echo $foo->hello($bar);
// Nice to meet you Bar, I am Foo
The other examples can be adapted in the same way and avoid any malicious injection.
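For instance, the transfer example could be adapted along these lines. This is a sketch under assumptions: handleHello() is a hypothetical stand-in for the body of hello.php that receives the query parameters as an array, and the factory gains a few checks on the decoded data.

```php
<?php
class User {
    public $name;

    public function __construct(string $name) {
        $this->name = $name;
    }

    public function hello(User $other) {
        return "Nice to meet you " . $other->name . ", I am " . $this->name;
    }

    // Factory with a few checks on the decoded data
    public static function FromJson(string $json): User {
        $table = json_decode($json, true);
        if (!is_array($table)
            || !isset($table["name"])
            || !is_string($table["name"])) {
            throw new InvalidArgumentException("Invalid user");
        }
        return new User($table["name"]);
    }
}

// Hypothetical body of hello.php, taking the query parameters as an array
function handleHello(array $query): string {
    $foo = User::FromJson($query["foo"] ?? "");
    $bar = User::FromJson($query["bar"] ?? "");
    return $foo->hello($bar);
}

echo handleHello([
    "foo" => '{"name":"Foo"}',
    "bar" => '{"name":"Bar"}',
]), "\n";
// Nice to meet you Bar, I am Foo
```

Whatever the caller sends, the script only ever builds User objects: a serialized payload is simply not valid JSON and is rejected by the factory.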