tl;dr Remote code execution via a deserialization vulnerability on rubygems.org, a very popular hosting service for ruby dependencies. A fix was rolled out quickly. Read the official announcement here. CVE-2017-0903
If you have ever written a ruby application, it is very likely that you have interacted with rubygems.org. You’ve probably even trusted that site to run arbitrary programs on your computer. When you run, for example, gem install rails, the gem utility fetches the rails gem and all of its dependencies from rubygems.org and installs everything into the appropriate places. Anyone can publish gems there after making an account.
Rubygems.org is itself a rails application with a clearly laid out responsible disclosure policy.
Ruby gems are actually just tar archives, so running tar -xvf foo.gem will ordinarily leave you with three files: metadata.gz, data.tar.gz, and checksums.yaml.gz.
These files are pretty much what they look like. All are gzipped.
metadata.gz contains a YAML file with information about the gem like its name, author, version, and so on.
data.tar.gz contains another tar archive with all the source code.
checksums.yaml.gz contains a YAML file with some cryptographic hashes of the gem’s contents.
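To make that layout concrete, here is a minimal sketch that builds a gem-shaped tar archive in memory and lists its entries, using the TarWriter and TarReader classes that ship with rubygems. The file contents are placeholders, and the helper names build_fake_gem and list_entries are made up for this illustration.

```ruby
require 'rubygems/package'
require 'stringio'
require 'zlib'

# Hypothetical helper: writes three gzipped placeholder entries into an
# in-memory tar archive, mirroring the layout of a .gem file.
def build_fake_gem
  io = StringIO.new(''.b)
  Gem::Package::TarWriter.new(io) do |tar|
    %w[metadata.gz data.tar.gz checksums.yaml.gz].each do |name|
      payload = Zlib.gzip("placeholder for #{name}")
      tar.add_file_simple(name, 0o644, payload.bytesize) { |entry| entry.write(payload) }
    end
  end
  io.string
end

# Hypothetical helper: returns the entry names in archive order.
def list_entries(bytes)
  names = []
  Gem::Package::TarReader.new(StringIO.new(bytes)) do |reader|
    reader.each { |entry| names << entry.full_name }
  end
  names
end

puts list_entries(build_fake_gem)
```

A real gem would of course carry a YAML gemspec, a nested source tarball, and real digests in those three entries.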
I was surprised to learn that parsing untrusted YAML is dangerous. I had always figured it was a benign interchange format like JSON. In fact, YAML allows for the encoding of arbitrary objects, much like python’s pickle.
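A small sketch of what "arbitrary objects" means in practice: a YAML document can name a ruby class, and the unsafe loader will allocate an instance and fill in its instance variables without ever calling its constructor. The Greeter class and the unsafe_yaml_load helper are hypothetical. Newer Psych releases moved this behavior to YAML.unsafe_load, so the helper uses that where it exists.

```ruby
require 'yaml'

# Hypothetical class standing in for "any class the application has loaded".
class Greeter
  attr_reader :name
end

GREETER_YAML = "--- !ruby/object:Greeter\nname: world\n"

# Picks whichever unsafe entry point this Psych version provides.
def unsafe_yaml_load(source)
  if YAML.respond_to?(:unsafe_load)
    YAML.unsafe_load(source)
  else
    YAML.load(source)
  end
end

greeter = unsafe_yaml_load(GREETER_YAML)
puts greeter.class # a Greeter, built without calling Greeter.new
puts greeter.name  # @name was set straight from the document
```

JSON has no equivalent of the !ruby/object tag, which is why parsing it does not carry the same risk.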
When you upload a gem to rubygems.org, the application calls #spec on the uploaded gem package. The rubygems gem, where this method lives, uses unsafe calls to YAML.load to load the YAML files in the gem.
However, the authors of rubygems.org knew this (probably as a result of this incident), and as of 2013 were monkey-patching the YAML and gem parsing libraries to only allow the deserialization of a whitelist of classes, eventually switching to using Psych.safe_load in 2015.
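For readers who have not used it, here is a rough sketch of how an allowlist behaves with safe_load. It assumes a Psych version that accepts the permitted_classes keyword (older releases took the list as a positional argument). The Release class and the load_with_allowlist helper are invented for the example.

```ruby
require 'yaml'

# Hypothetical application class we are willing to deserialize.
class Release
  attr_reader :version
end

RELEASE_YAML = "--- !ruby/object:Release\nversion: '1.2.3'\n"

# Returns the loaded object, or the rejection error if the class is not allowed.
def load_with_allowlist(source, allowed)
  YAML.safe_load(source, permitted_classes: allowed)
rescue Psych::DisallowedClass => e
  e
end

rejected = load_with_allowlist(RELEASE_YAML, [])
accepted = load_with_allowlist(RELEASE_YAML, [Release])
puts rejected.class   # the document named a class that was not on the list
puts accepted.version # the same document loads once Release is permitted
```

The important property is that the caller, not the document, decides which classes may be instantiated.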
Unfortunately, the monkey-patching was insufficient, since it only patched the Gem::Specification#from_yaml method. If we check out what actually happens in that call to #spec, we see that it calls #verify, the important parts of which are reproduced below:
@gem.with_read_io do |io|
  Gem::Package::TarReader.new io do |reader|
    # ...
  end
end

verify_checksums @digests, @checksums

# ...

@checksums = gem.seek 'checksums.yaml.gz' do |entry|
  Zlib::GzipReader.wrap entry do |gz_io|
    YAML.load gz_io.read # oops
  end
end
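As a point of comparison, here is a hedged sketch of how the same read could be done defensively. It is not the patch that was actually shipped. The helper name read_checksums_safely is made up, and the shape check assumes a checksum file is a plain mapping.

```ruby
require 'stringio'
require 'yaml'
require 'zlib'

# Hypothetical helper: gunzips an IO-like entry, parses it with the safe
# loader, and insists on getting a plain Hash back.
def read_checksums_safely(entry_io)
  Zlib::GzipReader.wrap(entry_io) do |gz_io|
    data = YAML.safe_load(gz_io.read)
    raise ArgumentError, 'checksums must be a mapping' unless data.is_a?(Hash)

    data
  end
end

good = Zlib.gzip("SHA256:\n  metadata.gz: abc123\n")
p read_checksums_safely(StringIO.new(good))
```

A document that tries to name a ruby class (for example one tagged !ruby/object) makes safe_load raise Psych::DisallowedClass instead of building the object.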
OK, so we have a call to YAML.load with input that we control. How can we exploit it? Originally I attempted to have my exploit code run at the time of the YAML.load call itself. This turned out to be more challenging than I had anticipated: although I could deserialize arbitrary objects, the method calls I could make on those objects were very limited. Psych, the YAML parsing library used here, would only let me call a handful of methods during deserialization (and not Marshal.load; that would have made exploitation much easier). But for most objects, those methods don’t give an attacker much flexibility, since common practice is for them to just initialize a couple variables and return. It seems plausible that there is some object in some standard rails library with a dangerous
#= method (as there have been in the past), but I didn’t find one.
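To illustrate how narrow that window is, the sketch below uses one hook Psych is known to call while reviving an object: init_with, when the class defines it. The Settings class is hypothetical and its hook only records what it was handed, which is typical of how harmless these hooks usually are.

```ruby
require 'yaml'

# Hypothetical class with an init_with hook that only records what it saw.
class Settings
  attr_reader :seen_keys

  def init_with(coder)
    @seen_keys = coder.map.keys.sort
  end
end

SETTINGS_YAML = "--- !ruby/object:Settings\nlevel: 3\nname: demo\n"

def load_settings(source)
  # Use the explicitly unsafe loader where it exists (newer Psych), else load.
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(source) : YAML.load(source)
end

p load_settings(SETTINGS_YAML).seen_keys
```

The attacker controls the data passed to the hook, but not what the hook does with it, so a hook like this one offers nothing to build on.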
Instead, I looked back at the rubygems.org application. What does it do with that @checksums variable, which we can now set to be an instance of any in-scope class? Over in verify_checksums:
checksums.sort.each do |algorithm, gem_digests|
  gem_digests.sort.each do |file_name, gem_hexdigest|
    computed_digest = digests[algorithm][file_name]
    # ...
  end
end
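The loop above trusts that checksums behaves like a Hash. A hypothetical, simplified verifier that checks the type before calling any method on it might look like the sketch below. The checksums_match? name, the contents argument, and the SHA256-only restriction are all assumptions made for the example.

```ruby
require 'digest'

# Hypothetical, simplified checksum verification. `checksums` maps an
# algorithm name to { file name => expected hex digest }, and `contents`
# maps a file name to its raw bytes.
def checksums_match?(checksums, contents)
  raise TypeError, 'checksums must be a Hash' unless checksums.is_a?(Hash)

  checksums.sort.all? do |algorithm, file_digests|
    raise TypeError, 'digests must be a Hash' unless file_digests.is_a?(Hash)
    raise ArgumentError, "unsupported algorithm #{algorithm}" unless algorithm == 'SHA256'

    file_digests.sort.all? do |file_name, expected_hex|
      Digest::SHA256.hexdigest(contents.fetch(file_name)) == expected_hex
    end
  end
end

files = { 'metadata.gz' => 'example bytes' }
sums = { 'SHA256' => { 'metadata.gz' => Digest::SHA256.hexdigest('example bytes') } }
puts checksums_match?(sums, files)
```

With the is_a? checks in place, an object that merely responds to #sort never gets the chance to run its own code.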
So if we can build an object where calling #sort does something dangerous, we can trigger our exploit. In the end, I came up with the following proof of concept. The payload that actually gets eval’d is contained in the base64-encoded, DEFLATE-compressed, marshalled section at the bottom (in this case, it just shells out to run
value: !binary '\
Starting from the last step and working backwards to the call to #sort:
At the bottom we have an ActiveSupport::Cache::Entry object. The important thing about this object is that when the #value method is called and @compressed is true, it will call Marshal.load on DEFLATE-compressed, attacker-provided data. The object that is unmarshalled is constructed in such a way that calling just about any method on it will execute the attacker’s code. The exact method used here has been written about before; here is how it works. Unfortunately, we can’t just deserialize this object with YAML to achieve code execution, because it undefs almost all of its methods, including the ones that allow us to set instance variables. It really needs to be loaded with Marshal.load to be useful in this context.
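The lazy decode step can be pictured with a small stand-in class. This is a sketch of the general pattern, not ActiveSupport's code: LazyEntry is hypothetical, and here the compressed bytes come from a trusted source, namely our own constructor.

```ruby
require 'zlib'

# Hypothetical stand-in for a cache entry that stores its value as
# DEFLATE-compressed Marshal output and only decodes it on demand.
class LazyEntry
  def initialize(value)
    @value = Zlib::Deflate.deflate(Marshal.dump(value))
    @compressed = true
  end

  # Marshal.load must only ever see trusted bytes; here they come from us.
  def value
    @compressed ? Marshal.load(Zlib::Inflate.inflate(@value)) : @value
  end
end

entry = LazyEntry.new({ 'answer' => 42 })
p entry.value
```

The danger in the exploit chain comes from YAML letting an attacker set @value and @compressed directly, so the bytes handed to Marshal.load are no longer ones the application produced.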
Working our way up, the ActiveSupport::Cache::MemoryStore object holds our malicious unmarshalled object in a hash called @data. Its parent class, ActiveSupport::Cache::Store, defines a #read method that calls #read_entry, which MemoryStore implements. #read_entry basically just grabs the entry out of @data and returns it.
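The delegation between parent and subclass can be sketched with a few hypothetical classes. These are heavily simplified and are not ActiveSupport's implementation. In particular, having #read return the entry's value is an assumption about how the chain reaches #value.

```ruby
# Hypothetical, heavily simplified versions of a cache store hierarchy.
class SketchEntry
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

class SketchStore
  # The parent class owns the public API and delegates the lookup.
  def read(key)
    entry = read_entry(key)
    entry && entry.value
  end
end

class SketchMemoryStore < SketchStore
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = SketchEntry.new(value)
  end

  private

  # The subclass only knows how to fetch the raw entry from its hash.
  def read_entry(key)
    @data[key]
  end
end

store = SketchMemoryStore.new
store.write('greeting', 'hello')
puts store.read('greeting')
```

Nothing in #read checks where @data came from, so an instance whose @data was filled in by a deserializer behaves exactly like one the application populated itself.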
The call to MemoryStore#read comes from a call to Gem::Package::TarReader::Entry#read, which itself is called by Gem::Package::TarReader#each. After the read returns, #size is called on the returned value, which our malicious unmarshalled object does not define, causing our payload to execute.
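Why an undefined #size matters can be shown with a harmless proxy. The class below is hypothetical: it defines almost nothing of its own, so nearly every call falls through to method_missing, which here only records the method name.

```ruby
# Hypothetical proxy that defines almost no methods of its own, so nearly
# every call lands in method_missing. Here it only records the call.
class RecordingProxy < BasicObject
  def initialize(log)
    @log = log
  end

  def method_missing(name, *_args)
    @log << name
    0
  end
end

# Hypothetical consumer that asks the value it was handed for its size.
def bytes_read(io_result)
  io_result.size
end

def calls_triggered_by_size
  log = []
  bytes_read(RecordingProxy.new(log))
  log
end

p calls_triggered_by_size
```

In the real chain, the fallback handler belongs to the attacker's unmarshalled object, so an innocent-looking call like #size is enough to hand over control.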
Because Gem::Package::TarReader includes Enumerable, a call to its #sort method will call its #each method, starting the whole chain above.
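The Enumerable behavior is easy to confirm with a toy class. LoggingReader is hypothetical. It includes Enumerable and notes every time its #each runs.

```ruby
# Hypothetical reader that includes Enumerable and logs when #each runs.
class LoggingReader
  include Enumerable

  attr_reader :log

  def initialize(items)
    @items = items
    @log = []
  end

  def each(&block)
    @log << :each_called
    @items.each(&block)
  end
end

reader = LoggingReader.new([3, 1, 2])
p reader.sort # Enumerable#sort is built on top of #each
p reader.log
```

Any method that Enumerable provides works the same way, which is why a single #sort call on the deserialized object is enough to reach #each.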
For me, one of the takeaways here is that YAML is very powerful, and sometimes used in contexts where less expressive (but safer) interchange formats like JSON might be more appropriate. Perhaps in the future, YAML.load could be modified to take a whitelist of classes as an optional parameter, making the deserialization of complex objects an opt-in behavior. YAML.load in its current state should really be named something like YAML.unsafe_load to get the point across, instead of relying on users to know when they should use YAML.safe_load.
Thanks very much to the rubygems.org team for running a responsive bug bounty program.