If you have ever written a Ruby application, it is very likely that you have interacted with rubygems.org. You’ve probably even trusted that site to run arbitrary programs on your computer. When you run, for example, gem install rails, the gem utility fetches the rails gem and all of its dependencies from rubygems.org, and installs everything into the appropriate places. Anyone can publish gems there after making an account.
Ruby gems are actually just tar archives, so running tar -xvf foo.gem will ordinarily leave you with three files: metadata.gz, data.tar.gz, and checksums.yaml.gz.
These files are pretty much what they look like. All are gzipped. metadata.gz contains a YAML file with information about the gem like its name, author, version, and so on. data.tar.gz contains another tar archive with all the source code. checksums.yaml.gz contains a YAML file with some cryptographic hashes of the gem’s contents.
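To make that layout concrete, here is a small sketch that builds a toy, gem-shaped tar archive in memory with RubyGems’ own Gem::Package::TarWriter, then reads it back with Gem::Package::TarReader and gunzips each member. The member names match a real .gem file, but the contents are made-up placeholders (not a real gemspec or checksum file), and the helpers build_toy_gem and list_members are hypothetical.

```ruby
require "rubygems/package"
require "stringio"
require "zlib"

# Build a toy, gem-shaped tar archive in memory. The member names match a
# real .gem file; the gzipped contents are made-up placeholders.
def build_toy_gem
  files = {
    "metadata.gz"       => Zlib.gzip("name: foo\nversion: 0.0.1\n"),
    "data.tar.gz"       => Zlib.gzip(""),
    "checksums.yaml.gz" => Zlib.gzip("SHA256: {}\n"),
  }
  io = StringIO.new("".b)
  Gem::Package::TarWriter.new(io) do |tar|
    files.each { |name, body| tar.add_file(name, 0o644) { |f| f.write(body) } }
  end
  io.string
end

# Walk the outer tar archive and gunzip every member it contains.
def list_members(gem_bytes)
  members = {}
  Gem::Package::TarReader.new(StringIO.new(gem_bytes)) do |tar|
    tar.each { |entry| members[entry.full_name] = Zlib.gunzip(entry.read) }
  end
  members
end

members = list_members(build_toy_gem)
members.keys           # => ["metadata.gz", "data.tar.gz", "checksums.yaml.gz"]
members["metadata.gz"] # => "name: foo\nversion: 0.0.1\n"
```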
I was surprised to learn that parsing untrusted YAML is dangerous. I had always figured it was a benign interchange format like JSON. In fact, YAML allows for the encoding of arbitrary objects, much like Python’s pickle.
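A minimal illustration of the difference from JSON: the !ruby/object tag tells Psych (Ruby’s YAML library) to allocate an instance of the named class and fill in its instance variables from the mapping. Widget is a hypothetical, harmless class. The respond_to? check is there because newer Psych versions no longer let plain YAML.load build arbitrary classes and expose that behavior as YAML.unsafe_load instead.

```ruby
require "yaml"

# A harmless, hypothetical class standing in for "any class that is in scope".
class Widget
  attr_reader :name, :size
end

doc = <<~DOC
  --- !ruby/object:Widget
  name: surprise
  size: 3
DOC

# Prefer YAML.unsafe_load when it exists to get the old, object-building
# behavior; fall back to YAML.load on older Psych versions.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
widget = YAML.public_send(loader, doc)

widget.class # => Widget
widget.name  # => "surprise"
widget.size  # => 3
```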
When you upload a gem to rubygems.org, the application calls Gem::Package.new(body).spec. The rubygems gem, where this method lives, uses unsafe calls to YAML.load to load the YAML files in the gem.
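rubygems.org monkey-patched this code to guard against the problem, but unfortunately the patch was insufficient, since it only covered the Gem::Specification#from_yaml method. If we check out what actually happens in that call to #spec, we see that it calls #verify, which, among other things, gunzips checksums.yaml.gz and hands its contents straight to YAML.load, storing the result in @checksums.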
OK, so we have a call to YAML.load with input that we control. How can we exploit it? Originally I attempted to have my exploit code run at the time of the YAML.load call itself. This turned out to be more challenging than I had anticipated, because although I could deserialize arbitrary objects, the method calls I could actually trigger on those objects were very limited. Psych, the YAML parsing library used here, would let me make calls to methods like #[]=, #init_with, and #marshal_load (not Marshal.load; that would have made exploitation much easier). But for most objects, those methods don’t give an attacker much flexibility, since common practice is for them to just initialize a couple of variables and return. It seems plausible that there is some object in some standard Rails library with a dangerous #[]= method (as there have been in the past), but I didn’t find one.
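Here is a rough sketch of two of those hooks, using hypothetical classes that only record what happens to them. A !ruby/object tag makes Psych call #init_with if the class defines it, and a !ruby/hash tag makes Psych allocate the named Hash subclass and fill it in through #[]=. Nothing here is dangerous, which is the point: these hooks only do whatever the class author wrote. The unsafe_yaml helper is a hypothetical version shim.

```ruby
require "yaml"

# Hypothetical classes that just record which hooks Psych calls on them.
HOOK_CALLS = []

class Recorder
  # Psych passes a Psych::Coder whose map holds the YAML mapping.
  def init_with(coder)
    HOOK_CALLS << [:init_with, coder["x"]]
  end
end

class RecordingHash < Hash
  def []=(key, value)
    HOOK_CALLS << [:[]=, key, value]
    super
  end
end

# Use the object-building loader on both old and new Psych versions.
def unsafe_yaml(doc)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(doc) : YAML.load(doc)
end

unsafe_yaml("--- !ruby/object:Recorder\nx: 1\n")   # triggers Recorder#init_with
unsafe_yaml("--- !ruby/hash:RecordingHash\nk: v\n") # triggers RecordingHash#[]=
```

After both loads, HOOK_CALLS holds [[:init_with, 1], [:[]=, "k", "v"]].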
Instead, I looked back at the rubygems.org application. What does it do with that @checksums variable, which we can now set to be an instance of any in-scope class? Over in #verify_checksums, it gets sorted: the method calls #sort on it.
So if we can build an object where calling #sort does something dangerous, we can trigger our exploit. In the end, I came up with the following proof of concept. The payload that actually gets evaled is contained in the base64-encoded, DEFLATE-compressed, marshalled section at the bottom (in this case, it just shells out to run echo "oops"):
Starting from the last step and working backwards to the call to #sort:
At the bottom we have an ActiveSupport::Cache::Entry object. The important thing about this object is that when the #value method is called and @compressed is true, it will call Marshal.load on DEFLATE-compressed, attacker-provided data. The object that is unmarshalled is constructed in such a way that calling just about any method on it will execute the attacker’s code; the technique for building such an object has been written about before. Unfortunately, we can’t just deserialize this object with YAML to achieve code execution, because it undefs almost all of its methods, including the ones that allow us to set instance variables. It really needs to be loaded with Marshal.load to be useful in this context.
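A toy version of that behavior makes this link in the chain easier to see. ToyCacheEntry is a hypothetical stand-in, not ActiveSupport’s actual Entry class: it only mimics the part described above, where #value inflates and unmarshals its stored bytes when @compressed is set. Because YAML can set both instance variables directly, the YAML document decides what reaches Marshal.load. The payload here is a benign array, and the base64 step uses Ruby’s built-in pack("m0").

```ruby
require "yaml"
require "zlib"

# Hypothetical stand-in, NOT ActiveSupport's code: it only mimics the
# behavior described above, where #value unmarshals its stored bytes.
class ToyCacheEntry
  def value
    if @compressed
      # Marshal.load on attacker-controlled bytes is the dangerous step.
      Marshal.load(Zlib::Inflate.inflate(@value))
    else
      @value
    end
  end
end

# Benign payload: Marshal -> DEFLATE -> base64 (pack("m0") is strict base64).
payload = [Zlib::Deflate.deflate(Marshal.dump([:benign, 42]))].pack("m0")

# The YAML document sets @compressed and @value directly; !binary decodes base64.
doc = "--- !ruby/object:ToyCacheEntry\n" \
      "compressed: true\n" \
      "value: !binary |-\n" \
      "  #{payload}\n"

loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
entry = YAML.public_send(loader, doc)
entry.value # => [:benign, 42]
```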
Working our way up, the ActiveSupport::Cache::MemoryStore object holds our malicious unmarshalled object in a hash called @data. Its parent class, ActiveSupport::Cache::Store, defines a #read method that calls #read_entry within the MemoryStore. #read_entry basically just grabs the entry out of @data and returns it.
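The lookup path in this paragraph is small enough to sketch. ToyStoreBase and ToyMemoryStore are hypothetical stand-ins that mirror only the parent/child split: the parent’s #read calls a #read_entry supplied by the subclass, which pulls the entry out of @data. The real ActiveSupport classes do more bookkeeping, which is omitted here. In the exploit, asking that entry for its #value is what leads to Marshal.load.

```ruby
# Hypothetical stand-ins mirroring only the parent/child split described
# above; the real ActiveSupport::Cache classes do more bookkeeping.
FakeEntry = Struct.new(:value)

class ToyStoreBase
  # The parent defines #read: fetch the entry via the subclass's
  # #read_entry, then return the entry's value.
  def read(name)
    entry = read_entry(name)
    entry&.value
  end
end

class ToyMemoryStore < ToyStoreBase
  def initialize(data)
    @data = data # key => entry
  end

  private

  # Basically just grab the entry out of @data.
  def read_entry(key)
    @data[key]
  end
end

store = ToyMemoryStore.new({ "some-key" => FakeEntry.new("decoded value") })
store.read("some-key")    # => "decoded value"
store.read("missing-key") # => nil
```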
The call to MemoryStore#read comes from a call to Gem::Package::TarReader::Entry#read, which itself is called by Gem::Package::TarReader#each. After the read returns, #size is called on the returned value, which our malicious unmarshalled object does not define, causing our payload to execute.
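The trigger itself can be sketched with hypothetical stand-ins. ToyTarEntry#read mirrors only the step described here (read from the underlying object, then call #size on the result). Trap, a BasicObject subclass, plays the role of the unmarshalled object that defines almost nothing, so the #size call falls through to method_missing. A real payload would run attacker code at that point; this one just records the method name.

```ruby
TRIGGERED = []

# Plays the role of the unmarshalled payload object: a BasicObject defines
# almost nothing, so calling #size falls through to method_missing.
class Trap < BasicObject
  def method_missing(name, *_args)
    ::TRIGGERED << name # a real payload would run attacker code here
    0
  end
end

# Hypothetical stand-in for the underlying object (MemoryStore in the
# exploit): whatever it is asked to read, it hands back a Trap.
class StubIO
  def read(_len)
    Trap.new
  end
end

# Mirrors only the step described above: read from @io, then call #size
# on whatever came back.
class ToyTarEntry
  def initialize(io)
    @io = io
    @bytes_read = 0
  end

  def read(len)
    ret = @io.read(len)
    @bytes_read += ret.size
    ret
  end
end

ToyTarEntry.new(StubIO.new).read(512)
TRIGGERED # => [:size]
```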
Finally, because Gem::Package::TarReader specifies include Enumerable, a call to its #sort method will call its #each method, starting the whole chain above.
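Enumerable implements #sort in terms of #each, which is why sorting the reader is enough to start the chain. A minimal check, using a hypothetical ToyReader that counts how often #each runs:

```ruby
# Hypothetical class: includes Enumerable and counts how often #each runs.
class ToyReader
  include Enumerable

  attr_reader :each_calls

  def initialize
    @each_calls = 0
  end

  def each
    @each_calls += 1 # TarReader#each sits here in the real chain
    yield 2
    yield 1
  end
end

r = ToyReader.new
sorted = r.sort
```

Here sorted is [1, 2] and r.each_calls is 1: the only way #sort could see the elements was by calling #each.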
For me, one of the takeaways here is that YAML is very powerful, and sometimes used in contexts where less expressive (but safer) interchange formats like JSON might be more appropriate. Perhaps in the future, YAML.load could be modified to take a whitelist of classes as an optional parameter, making the deserialization of complex objects an opt-in behavior. YAML.load in its current state should really be named something like YAML.unsafe_load to get the point across, instead of relying on users to know when they should use YAML.safe_load.
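For comparison, here is roughly what the opt-in approach looks like with YAML.safe_load, assuming Psych 3.1 or later (Ruby 2.6+), where it accepts a permitted_classes: keyword. By default only basic data types are built, so even a plain date scalar is rejected until Date is explicitly allowed, and arbitrary objects stay rejected. For what it’s worth, Psych 4.0 (bundled with Ruby 3.1) later made YAML.load behave like safe_load and added YAML.unsafe_load for the old behavior. The rejected? helper is hypothetical.

```ruby
require "yaml"
require "date"

doc = "---\nreleased: 2001-02-03\n"

# Hypothetical helper: true if the block raises Psych::DisallowedClass.
def rejected?
  yield
  false
rescue Psych::DisallowedClass
  true
end

# Default: only basic types (strings, numbers, booleans, nil, arrays, hashes).
rejected? { YAML.safe_load(doc) }                           # => true
# Opt in to exactly the classes you expect.
rejected? { YAML.safe_load(doc, permitted_classes: [Date]) } # => false
data = YAML.safe_load(doc, permitted_classes: [Date])
data["released"] == Date.new(2001, 2, 3)                    # => true

# Arbitrary objects are still refused, even with Date allowed.
rejected? { YAML.safe_load("--- !ruby/object:Object {}\n", permitted_classes: [Date]) } # => true
```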