system templar


a senior sysadmin's blog on /dev/random


Linuxfest Northwest 2014

I went to Linuxfest Northwest again in 2014 and did another talk, this time on Flamegraph.

I’m posting the video nearly a year late, because I finally got around to editing it.

My excuses are:

  • I have a son who turns 2 next week, and I’m in the parental time warp.
  • I had to re-create parts of the video (the flamegraph demo parts), and it seemed like a daunting task but turned out to be very simple.
  • I’m a sysadmin not a video editor. All my future videos where I have to do the editing will be in a much simpler format. :)

Here’s the video.

And the slides (which I posted the day after I presented).

Flamegraph has had some very cool changes since this video, and you can now use Flamegraph on your python scripts via plop.
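I haven't done a writeup on the python workflow yet, but the rough shape of it (from my reading of plop's docs; treat the --format flag and the output path as things to verify yourself) is:

pip install plop
# sample the script and emit folded stacks instead of plop's native format
python -m plop.collector --format=flamegraph myscript.py
# plop drops its output under ./profiles/; feed that file to flamegraph.pl
./flamegraph.pl profiles/<generated-file> > myscript.svg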

Linuxfest 2015 is right around the corner, but I’m not doing a talk this year due to time constraints. :(

LISA Round 3

LISA the Third

$dayjob sent me to LISA again, for which I am quite grateful. It almost didn’t happen due to me moving to another business unit (read: promotion) a few weeks before the conference, but my new manager was able to pull some strings and make it happen (Thank you Brent!).

The Training

Like every year I showed up on the Monday, so I could get settled and do 1 day of training on the Tuesday and then 3 days of sessions.

The training on Tuesday was not a great selection in my opinion; there were too many full-day training sessions, which didn't allow for a lot of choice. Of the remaining choices, one of the courses I had taken 2 years ago, so my choice was even further restricted. I ended up taking "Build a Sysadmin Sandbox" as the first course and wasn't really getting anything out of it, so I bailed and went to puppetcamp, which was also occurring on the same day. It was a good course, but the material wasn't anything I didn't already know.

Puppetcamp was very packed, but I was interested to see Garrett Honeycutt speak, as I do use a few of his modules and he is well known in the puppet community.

After that particular session I decided to just do the hallway track until my second training session.

The second training session was an introduction to Go for systems programming. I have heard a lot of hubbub about Go, and was interested to see if the training/trainer could convince me that it was a good idea to stop using python and start using go. Long story short, I remain unconvinced that it would be a good idea to move from python to go for sysadmin type scripts. However, if you need to worry about multithreading / multiprocessing then it would be a good idea to use go (go with go?). The material was good and the trainer did a good job; it's just that, according to me, go adds a bit of complexity that you don't need to worry about in python. I do have a freakishly long python script that is multiprocess that would be fun to rewrite in go, but that project won't see the light of day for years. :)

Tuesday night we headed out to the Taphouse with a fairly large (18ish) group to socialize and swap stories. It was pretty fun and right across the street from the conference hotel.

The Sessions

My favorite sessions were:

Brendan Gregg's Linux Performance Analysis: New Tools and Old Secrets was fantastic. Only 8 people in the audience had heard of Ftrace; I was not one of them. I have already downloaded perf-tools and am likely to deploy it to all systems.

Brian Atkisson’s Open Source Identity Management in the Enterprise was a very good talk, with the right amount of complexity. This is the exact type of talk I’ve come to expect at LISA. Worth a watch if you are doing anything related to identity management.

I missed the first 10 minutes, but Ben Rockwood had a very good session called I am SysAdmin (And So Can You).

Thomas Uphill had a puppet session called Puppet for the Enterprise that I went to; it was good, but either the crowd didn't want to participate or the level of puppet knowledge in the crowd was very, very new. I didn't realize before attending that he wrote Mastering Puppet. I read the book shortly after it came out and do recommend it. I had some conversations with Thomas later at the EMP museum about workflows with large teams, and he seems to think we are doing everything right, but I have issues with the current workflow. I believe I have a solution but I'll save that for another blog post.

Birds of a feather

The birds of a feather were quite good.

There was also 1 BOF on Thursday, which I ran. I didn't take a picture of the board, as there was only one BOF on it (besides the Google BOF).

I stole the idea from LISA 2012 and hosted the BOF "I've made a huge mistake". Due to scheduling I had to hold it during the Google BOF, so I didn't really think many people were going to show up. Much to my surprise, so many people showed up that we had to move to a bigger room (I should have taken a picture but totally didn't). Once we settled into the larger room, everyone shared stories of their epic failures. Some of the stories were really hilarious.

Other than my own BOF, I liked the RedHat and Puppet vendor BOFs. There was also an Infrastructure Testing BOF that had me take several notes.

Random

At every LISA I have been to so far, there has been an event that "paid for the conference", and sometimes you don't know when that will occur. For example, last year the flamegraph talks paid for the conference, as I have used that knowledge to solve several problems that would have otherwise taken weeks of man hours. Then months later we were looking for a Sunray replacement, and I remembered a hallway track conversation that I had with Tobi Oetiker about what he was doing with Sunrays, and I was able to suggest that we test that product. No one in my organization had heard of that product; it wasn't even on our eval radar. Adding that product to our eval and having us use it will have saved the company the price of the conference several times over.

This year, Brendan Gregg's talk paid for the conference. I'm working on a few other ideas inspired by the conference that may pay for it several times over. I am always very invigorated after a good conference, and I think I will be realizing additional value from this one for quite some time.

I got to meet Dave Josephson from Librato who recently collaborated with me on graphios to add librato and statsd support. We had some excellent conversations and from those conversations I will be making some changes to what I am monitoring to try and preserve the interesting data without having to keep my data forever. Dave also did 2 talks at the Nagios world conference that mention graphios here and here. Not to mention he did two talks at LISA here and here. I caught the second talk at LISA and I quite enjoyed it.

Notes for future first timers

  • Wear your badge from the beginning of the day until the end of the day. Otherwise people aren’t going to know that you are an attendee. Wearing the badge makes you approachable for hallway track conversations.
  • If you get a table for breakfast or dinner in the hotel, get a larger table and tell the greeter to send other attendees your way. (You may want to check if someone else is already doing this!).
  • Do a write up for your manager / coworkers when the whole thing is over. You need to pass on what you learned at the conference, and thank your manager for the opportunity of going. Your goal should be to prove that the company spent money wisely when they opted to send you there, and have justification for sending you again.
  • Do 1 day of training, so you can get all the training material.
  • If you aren't sick of chatting with people, wear your LISA shirt at the airport. I had some excellent airport conversations with other attendees because they recognized my shirt.

Random Notes I took

  • Read book Inspired
  • Read book Remote
  • Read book The quantum thief
  • Watch documentary Jiro Dreams of sushi
  • Look into bats (bash automated test system)
  • Look into nats.io, which apparently beats the pants off zeromq and rabbit; written in go
  • Look into graphite-ng
  • Testkitchen apparently works for puppet
  • Several people told me to use Cassandra instead of Opentsdb
  • Have project managers do 15 minute hipchat meetings instead of 30 minute conference calls (This is amazing)
  • Make sure the CA root is kept offline on a usb key or whatever (This is likely already done in my org).
  • Watch British TV show A touch of cloth
  • Check out paper on ntopng
  • Lots of zookeeper haters / badmouthing
  • Facebook’s Opencompute datacenter is 38% more energy efficient, at 24% less cost (energy cost). If this was the only benefit it would be worth looking into, but there is so much more.

Final Thoughts

I had a great time at LISA, and look forward to attending again next year.

I (Finally) Joined Twitter

I finally joined twitter.

Whilst my little brother has been telling me for years that I should join twitter, I have continuously resisted.

Here is the conversation that made me cave:

Me> Twitter is stupid. It’s just IRC. I’ve been on IRC for nearly 20 years, and I’m in around 30 channels on 4 servers. Why would I want to subscribe to someone and see everything they say for every channel they are in? That’s crazy talk.

X> Okay. Sure. Twitter is just IRC. I get that. So let me ask you this. Are you on any IRC channels with Adrian Cockcroft? Brendan Gregg? John Allspaw?

Me> Well.. no.

X> Would you like to be?

Me> Well.. yes.

So here I am: @systemtemplar. I’m still learning the ropes.

Also I'm planning on updating this webpage a bit more often than twice a year, so we'll see how that goes. I have another LISA writeup that I'm working on. So there will be at least that. :)

Getting Started With FlameGraph

Getting started with Flamegraph

Flamegraph is a utility written by Brendan Gregg that I have been using a lot recently, and I thought I would do a writeup for people who want to get their hands dirty. If you don't know anything at all about flamegraph I recommend watching this video from LISA13, or if you just want some slides these ones are very good.

Flamegraph is a visualization to help you identify what is going on in your system. Using flamegraph requires a few steps; let's learn by doing.

I am using CentOS 6.4, so my instructions will need to be tweaked for Debian/Ubuntu/other users.

We need to get either perf or SystemTap working on our test machine. Perf is easier, so we will start with that.

yum install perf

Now that perf is installed you could start recording events right away, but if you did you would find that all of the symbols were missing and your flamegraph would have a bunch of 'Unknown' entries in it. To address this we need to install debug symbols for the programs we want to debug.

To start off we are going to want at least the kernel and glibc debug packages; after that, which debug symbols you want depends on what you are doing. In these examples I also want to debug qemu-kvm, so I'll be installing those symbols.

Installing debuginfo packages for CentOS requires you to add the debuginfo repo or download the packages manually. I’m going with the add repo option. Here is what I added:

[debuginfo]
name=CentOS-$releasever - DebugInfo
baseurl=http://debuginfo.centos.org/$releasever/$basearch/
gpgcheck=0
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
protect=1
priority=1

Just need to install the packages now:

yum install --enablerepo=debuginfo glibc-debuginfo kernel-debuginfo qemu-kvm-debuginfo

Now that I have the debug packages installed I’m ready to start perf.

perf record -a -g -F 997 -o first.perf sleep 60

  • -a = all cpus
  • -g = do call-graph (backtrace) recording
  • -F = frequency (how often to collect data), which we are setting to 997 instead of 1000 based on Brendan's advice (watch the video)
  • -o = output file name. By default perf records to a file called perf.data, and it gets confusing when you are doing a lot of perf recording
  • sleep 60 = perf can record events for any command you run; in this case we don't care about the command itself, since we are recording events on all cpus, so here we are just saying we want to record for 60 seconds

  • NOTE: 60 seconds can be 50-100 mb or more depending on how busy your system is.
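Before converting anything, you can sanity-check the capture with perf itself (a quick sketch; the report layout varies a bit between perf versions):

# summarize the recorded samples to confirm the capture looks sane
perf report -i first.perf --stdio | head -20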

Next we want to convert the perf file (which is a binary file) into a text file so we can process it with flamegraph.

perf script -i first.perf > first.script

Now we have a text file we can work with.
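If you skipped any debuginfo packages, this file is where it will show up. A rough way to gauge how much failed to resolve (just a sketch; the placeholder text perf prints can vary by version):

# count the stack frames perf could not map to a symbol
grep -c '\[unknown\]' first.script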

Install flamegraph:

git clone https://github.com/brendangregg/FlameGraph.git

Before running flamegraph we need to process the script file into folded stacks.

cat first.script | ./stackcollapse-perf.pl > first.folded

Lastly we will run flamegraph:

cat first.folded | ./flamegraph.pl > first.svg

Now we can look at our pretty svg file!

We can do this all with pipes as well. So instead:

perf script -i first.perf | ./stackcollapse-perf.pl | ./flamegraph.pl > first.svg

A new feature I added for flamegraph is to use a consistent palette. Let’s use this in a practical scenario.

I have 2 servers both running java inside of KVM. One is working great, the other is not. On each machine I did a perf record, just like above. I saved the files to working.script and broken.script (ran perf record, and perf script on each box).

Next I transferred the script files to my workstation, and ran:

cat working.script | ./stackcollapse-perf.pl | ./flamegraph.pl --cp > working.svg

The --cp option saves the randomly generated palette into a palette.map file. Then on the same workstation I ran:

cat broken.script | ./stackcollapse-perf.pl | ./flamegraph.pl --cp --colors mem > broken.svg

This time the --cp option will use the same palette.map that was generated by the previous run of flamegraph. We also pass --colors mem, which uses the mem palette for any NEW symbols that the previous flamegraph did not have. This will make the differences really stand out in our flamegraph. Now a side by side comparison shows the problem:

Pretty fun stuff.

My Second LISA Conference

Click here to see my first writeup

Once again I convinced $dayjob to spring for (most of) my trip to LISA13. I did a big write up for management last year, and did several training sessions with other employees, as it was very much worth the cost / time / effort in my opinion.

This year was quite a bit further for me to travel, as I am on the west coast, and LISA is on the east coast. Google maps says that it’s 4421 km (2747 miles). I don’t think I can complain about the travel time as there are people I met who came from the UK, Sweden, Germany, Denmark, etc.

Once again I did one day of training and 3 days of the technical sessions, and again I did get all of the training material (which is totally worth the day of training, regardless of what training you take).

My Training

The first course I took was Theodore Ts'o's "Recovering from Linux Hard Drive Disasters", which was a very informative course on the history of file systems, but I would have preferred some demos of fixing broken file systems. Enter debugfs and fix some deleted files, or show WHEN it is better to hit no in a fsck, etc. I think that Ted Ts'o is well informed and a great speaker; it's just that the history material was a little long, and not really on topic.

The second course I took was Joshua Jensen's "High-Availability Linux Clustering", which was a pretty good course. My only comment is that I would have liked to see a demo, even if it was a short one.

My session highlights

The Disney animation studio presentation was fantastic. I was very impressed with what they have done with traditional management roles and how they are evolving into something that I think will work a lot better for pretty much all medium to large IT organizations, and I do hope that more people will follow their example. Disney Animation Studios is now on my list of "cool companies I would like to work for someday".

The "surprise plenary" session on flamegraphs was also very good. It was a surprise because the scheduled speaker got sick and had to go to hospital (she is okay), so they needed a plenary session on one hour's notice. Brendan had that hour to take a 30 minute talk and turn it into an hour and a half talk, and he was up till 4 in the morning coding a new type of graph (chain graphs). I think he did a really good job.

Dan Kaminsky’s talk “Rethinking Dogma: Musings on the Future of Security” was very good (and thanks to Matt Simmons for the recommendation, I wasn’t sure which talk to go to for that time block).

The closing plenary

The closing plenary session was called "Post Ops: A Non-surgical (personal) tale of fragility and reliability" by Todd Underwood (Google). It was a fairly amusing talk with a lot of interesting points. Todd's thesis is that sysadmins are going to go the way of the dodo bird, or more specifically that the ops part of devops is going to go away and that everybody should become developers.

There is a part of me that agrees a little, but I think there is a rather large disconnect between the industry as it is and what google is doing. I think Todd may have his google glass blinders on and is only seeing things from the google perspective. While I'm sure it's a nice place to live, I don't think it is grounded in reality.

There are people who are using 10 year old software and hardware now, a lot of people. The same is going to be true in 10 years from now. In 10 years there will still be people using windows 7 and 8, and just starting to get a plan together to adopt windows 2023 / windows 18 or whatever it is called. Administration of those devices is going to be the same then as it is now.

I can understand his point if you think about a small office with 20 employees who are, say, real estate agents with 20 chromebooks (or 21 with a spare), where everything is wireless and they use microsoft office 365 / google docs. I don't see the need for an IT person there. In the last 10 years, or perhaps even today, there would be a tipping point: maybe when the office reaches 50 or 100 people, an IT person would eventually be hired. Will the tipping point drastically move or be eliminated entirely? People are still going to be terrible with computers and need help. I think helpdesk people are safe at least. :)

Other highlights

The so-called "Hallway track" was really great this year. I got to have breakfast with Tom Limoncelli and Tobias Oetiker, and went out for dinner with Matt Simmons and many others. We had some cool conversations about system administration as a whole and where we think it is going.

Birds of a feather

The birds of a feather this year were very good, although I did find that there were more vendor bofs than there were last year. Not sure if that is good or bad.

Books

Books that were recommended at the conference (that are not on safari bookshelf):

I’ve ordered:

Drift into Failure

Managing Transitions

Antifragile

Structure of scientific revolutions

Title Transition

It seems as though we are transitioning from the title "System Administrator" to "Site Reliability Engineer". I think this is kind of an odd choice. The SRE name has been around for a long time; it's just that now, instead of only google using it, many other companies are switching as well.

The old:

“System” is a broad term, and it fits. Most system administrators I know are in charge of anything that has blinking lights on it, and often many other things that don’t.

“Administrator” is a person who manages (takes charge of). Again I think this makes sense for what we do.

The new:

A site reliability engineer is an odd title to me. "Site" in the modern sense of the word implies a web-site, which can be very complicated or very simple. In the old sense of the word, like a job site, it implies you are concerned with the reliability of an entire physical location (job-site).

"Reliability" makes sense; you want to make sure the website/job-site is reliable.

"Engineer" is somewhat of an issue for me personally, but I think this is more to do with a Canadian upbringing. Where I am from, one doesn't up and call oneself an engineer without having a degree in engineering. This comes from the Ritual of the Calling, after which (some|most) Canadian engineers wear the iron ring. I am not an engineer, and around here people would consider it disingenuous if I called myself a site reliability engineer without having a proper engineering degree.

As well, I’m not just concerned with the reliability of the site, I’m also worried about capacity planning, future proofing, scaling, security, updates, etc.

I wonder how long it will be before someone wants to call me an SRE instead of sysadmin?

Small Infrastructure Bof

I attended the small infrastructure Bof, even though my infrastructure is not really that small. I did the same last year; I find them interesting. One of the topics that came up was:

Is LISA still relevant for the small / medium system administrators?

I think system administration is splitting into groups of complexity. According to me it looks something like this:

  • < 100 boxes/vms/nodes/devices/whatever you want to call them: small shops
  • < 1,000: medium shops
  • < 5,000: large shops
  • < 50,000: very large shops
  • > 50,000: massive shops

LISA has a tendency to prefer talks from massive shops or contractors that work with massive shops. This makes sense because the massive shops have a tendency to be the ones who are pushing the industry forward. Also, the L in LISA refers to large. However when LISA started in 1986 a large system was what we now consider to be small.

I don't currently think that we need to split LISA up, but I do think the LISA organizers need to be somewhat mindful of the small shops and what they are gaining from the conference. The company I work for fits somewhere between a medium and a large shop, and I was in the lisa session Enterprise Architecture Beyond the Perimeter; it WAS interesting information, but it was more of a massive shop problem. I didn't find the talk relevant to anything I was doing or would be doing in the small / medium term. I could have got up and left the talk, but like I said, it was interesting information.

Random notes

Just some random notes I took:

Final Thoughts

I think attending LISA is a fantastic experience, and I am looking forward to LISA 2014 in Seattle.

Linuxfest 2013

Linuxfest Northwest 2013

I went to Linuxfest Northwest again this year, and I had a great time.

This year I organized the Lightning Talks and plan on doing the lightning talks each year at linuxfest. We had a great variety of lightning talks, it was pretty fun.

I also did a session called “I wanna be the guy - The arduous path to senior sysadmin or How to be a better system administrator” which you can see here:

It was my first talk at a conference, but I hope to do more in the future.

It took a while to get online, as I first did the video with openshot, but what was in the video preview was not what the end video looked like. I tried to work around it with openshot but eventually gave up. I did the same thing using a friend's copy of Vegas and it was a much nicer experience. Openshot is working on a new UI and I look forward to using it the next time I need to cut some video together.

The fest

Linuxfest Northwest 2013 was very fun; I liked the sessions and meeting fellow linux enthusiasts. I'm most interested in finding other system administrators, so I am always on the hunt for them. Maybe I need to do an early birds of a feather for sysadmins so I can find out who the sysadmins are nice and early and then try to find them throughout the day.

The ID badge for lfnw deserves special mention. I have been to several conferences, but this was the first that had this type of badge. The badge was essentially a small booklet: on the front and back were your name and information, but inside the booklet were the conference schedule and maps to the various rooms and events going on. This was very cool.

The after party on Saturday was at a museum called the Lightcatcher. There were several nice exhibits; I liked the glass sculptures, but I didn't think to take any pictures (which may not have been allowed, I don't remember). I didn't stay very late at the after party because I needed to finish up my slides (for the talk I was giving the next day).

I look forward to linuxfest 2014.

Getting Fedora 18 Desktop Ready

Upgreyedd

First and foremost, I am not a fan of FedUp.

My luck hasn't been good with it, but I do some stuff that is off the beaten path, so it is likely my fault. I am fortunate enough to have extra hard drives kicking around, so installing an OS on a spare hard drive to test out its installation is no problem for me. However, in this case I have a shiny new PC to set up.

The new PC is an ASUS P9X79 LE, w/ 32 gigs of ram and a 3.6 ghz proc.

Why use Fedora?

I use fedora because I work at a CentOS shop, and using fedora keeps me fresh on what changes are coming to the RHEL ecosystem. There are quite a few big changes coming in the RHEL pipeline, so it's just a good way of seeing what is on its way, and it gives me a chance to explore how the new stuff works.

Installation

Anaconda (The fedora gui installer) has been completely re-written, and was (at least partly) responsible for the delays in getting fedora 18 released.

The new gui looks nice, but I have some complaints. Currently, when you select the encryption option it encrypts the entire physical volume, rather than encrypting the logical volume. This means that it's encrypting the whole hard drive instead of just a particular partition.

Typically in my linux installations I set up a /home partition and encrypt it using luks. In the previous version of anaconda this was easy to do. I have commented on a bug report about this.

I setup my /home partition unencrypted at installation time, so I will fix this later in this writeup.

Dual Booting with UEFI

I have used UEFI bios many times in a server environment, but this is my first PC with a UEFI bios at home. One of the fun parts is that you don't need to worry about windows overwriting grub anymore (or grub forgetting to add the windows partition(s)). With a UEFI bios you pick which drive you are going to boot with using the bios menu. NOTE: I still have a windows partition for the occasional video game that I can't get working with wine.

My motherboard is a Asus P9X79 LE, and the original bios version did not work with the F8 boot menu. If I selected the UEFI boot order in the bios itself it would work, but if I hit F8, it would not boot off any other drive. I did a BIOS update and that solved the problem.

Speaking of Bios Updates

I usually do BIOS updates using the windows boot disks, because generally speaking vendors test the windows portions more than the linux ones. So I saved the bios upgrade to c:\bios on my windows partition, loaded the asus EZ flash, navigated to the bios file, and it failed, saying that the bios wasn't an EFI bios.

After some googling I found that EZ flash is able to navigate ntfs folders but can't load bios files from ntfs (which is really odd). So I copied the bios file to /boot/efi on my linux partition (which is a VFAT partition iirc), fired up EZ flash again, and it was able to find and use the bios upgrade just fine.
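For anyone trying the same dance, it boils down to something like this (a sketch; the bios file name here is made up, use whatever asus ships for your board):

# /boot/efi is the EFI system partition (vfat), which EZ flash can read
mount | grep /boot/efi                          # confirm it's mounted
cp /path/to/P9X79-LE-ASUS-XXXX.CAP /boot/efi/   # hypothetical file name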

After the bios upgrade I am able to use the F8 bios boot menu just fine.

Post Installation

I like to record my steps on what I do after installing so I can come back if I ever need to.

First the bare essentials:

yum install vim gvim tmux terminator xchat pidgin git thunderbird wget curl
I will do a whole blog post on my vim setup at a much later date.

  • tmux is a screen replacement that is awesome
  • terminator is a graphical terminal that is also awesome
  • xchat for gui irc
  • pidgin for instant messaging
  • thunderbird for email
  • git for source control
  • wget and curl for pure utility

Next, disable the evil bash search prompt:

echo 'unset command_not_found_handle' >> ~/.bashrc && source ~/.bashrc

This is the handler that searches the package repos when you typo a command; it slows things down far too much for fast typists. I have a fancy .bashrc that I'll talk about some other time.

NOTE: Before the zsh people get on my case about not switching to zsh, it’s not something I’m ready to deploy on all my work servers, so I’m sticking with bash until I can deploy zsh everywhere I work.

yum update

Boot settings

fstab

NOTE: Don’t mess around with your fstab unless you know what you are doing.

I have a ssd disk, so I need to add the ‘discard’ option to the fstab for the partitions that are ssd backed.

old:
/dev/mapper/vg_shinix-lv_root /     ext4      defaults        1 1

new:
/dev/mapper/vg_shinix-lv_root /     ext4      discard,defaults        1 1

But do that for each partition that is ssd backed, though not on the luks partition, as that would be foolish (iirc luks partitions ignore the discard option).
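If you aren't sure which of your drives are actually ssd, the kernel exposes a rotational flag per block device (quick sketch):

# 0 = non-rotational (ssd), 1 = spinning disk
for d in /sys/block/sd*/queue/rotational; do echo "$d: $(cat $d)"; done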

grub

NOTE: Don’t play with your grub settings unless you know what you are doing.

Being that I have a ssd drive, I will want elevator=noop for my scheduler, and I am not a fan of the graphical boot so I will disable that via removing the rhgb and quiet options.

I will also be using the nvidia driver, so I will disable the nouveau kernel module via rdblacklist=nouveau

If you haven’t used grub2 yet it might be a little jarring of an experience. Rather than the simple vi /boot/grub/grub.conf, you now get to:

modify:
/etc/defaults/grub
then
grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
(or grub2-mkconfig -o /boot/grub2/grub.cfg and use a symlink).

If you think the old way is better you are not alone, but if you look at the mess that is your new grub.cfg you will be happy for the cfg generator.

vi /etc/defaults/grub
+ elevator=noop rdblacklist=nouveau
- rhgb quiet
grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

Reboot (actually reboot, cuz you have installed a new kernel and the nvidia instructions require you to be on the new kernel).

Nvidia Drivers

I have a fancy video card (Geforce GTX 660ti) and I plan on trying out steam for linux, so I need to get the nvidia drivers going.

The instructions I found on fedoraforum will work, but require a bit of tinkering.

NOTE: I am following the instructions here, but due to a bug I added "-o nouveau" to the dracut options. So:

dracut -o nouveau /boot/initramfs-$(uname -r).img $(uname -r)

Upon rebooting, my system would crash. This was annoying. So I did a rescue, and changed the boot to runlevel 3 (no gui).

After a reboot, I ran "lsmod | egrep 'nouv|nv'" to see what drivers I was loading. There was no nvidia and no nouveau. So the blacklist/dracut was working, but I wasn't loading the nvidia kernel module either. A manual modprobe resulted in:

modprobe nvidia
ERROR: could not insert 'nvidia': Required key not available

So what key are we talking about here? Let’s get an strace going:

open("/lib/modules/3.7.2-204.fc18.x86_64/extra/nvidia/nvidia.ko", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1", 6)               = 6
lseek(3, 0, SEEK_SET)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=15202848, ...}) = 0
mmap(NULL, 15202848, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe78915b000
init_module(0x7fe78915b000, 15202848, "") = -1 ENOKEY (Required key not available)
munmap(0x7fe78915b000, 15202848)        = 0
close(3)                                = 0
write(2, "ERROR: could not insert 'nvidia'"..., 61ERROR: could not insert 'nvidia': Required key not available
) = 61
munmap(0x7fe78a0c8000, 324765)          = 0

I’ve had some experience with signed modules before, but it was a long time ago. Hrm

readelf -S /lib/modules/3.7.2-204.fc18.x86_64/extra/nvidia/nvidia.ko |grep -i sig
(nothing returned)

No signature. This must be something else.

I tried booting with the kernel options enforcemodulesig=0 and module.sig_enforce=no; neither worked.

After a lot of googling, I found that the issue was with secure boot: as soon as I turned it off (in the bios settings), I was able to load the nvidia module fine. There is surprisingly little documentation on the internets about this error.
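If you hit the same ENOKEY error, it's worth checking the kernel log before tearing your hair out (a sketch; the exact wording varies by kernel version):

# look for any mention of secure boot or module signature enforcement
dmesg | egrep -i 'secure boot|module verification|sig_enforce'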

I posted this in hopes of saving someone else some time (it took longer than I would have liked to figure it out).

Desktop

I have been using cinnamon or frippery (cinnamon on the laptop, frippery on the desktop) since fedora 15 came out (because gnome2 was ditched). Fedora 18 has MATE which is the gnome2 fork. I’m really looking forward to using my old desktop again.

MATE:

Ahh MATE. I used Gnome2 as my desktop for many years, so this was a big sigh of relief. Even though I am going crazy daily due to a keyboard shortcut bug, I am loving using mate. As far as moving windows from monitor to monitor goes (I have a 3 monitor setup), mate works a lot smoother than everything else I have used so far. Mate has been available in other versions of fedora, I just totally missed the boat on it.

The other issue I have with fedora's installation of mate is that it doesn't install the mate-screensaver rpm. I opened a bug report on this, so hopefully it will be fixed soon.

To get MATE installed after installation do a :

yum groupinstall "MATE Desktop"
yum install mate-screensaver

Setup encrypted home partition

Let's get our /home partition encrypted.

WARNING: If you make a mistake here, you’re going to have a bad time! Don’t blame me. Back your stuff up.

My home partition is its own logical volume; if you have anything at all different in your setup, these instructions are not for you!

Steps:

1. hit ctrl-alt-f2, login as root, and type 'init 1'
2. mkdir /homebackup
3. rsync -av /home/ /homebackup
4. umount /home
5. lvdisplay (find your logical volume name mine is lv_home, find your volume
   group name, mine is vg_shinix)
6. cryptsetup --verbose --verify-passphrase luksFormat /dev/vg_shinix/lv_home
7. cryptsetup luksOpen /dev/vg_shinix/lv_home home
8. mkfs.ext4 /dev/mapper/home
9. mount /dev/mapper/home /home
10. rsync -av /homebackup/ /home/
11. restorecon -v -R /home
12. cryptsetup luksUUID /dev/vg_shinix/lv_home >> /etc/crypttab
13. vi /etc/crypttab
Your crypttab will have 1 UUID sitting there
ie:
37ff7165-31c0-4863-aba8-876692e6bc67

You need to prefix that uuid with "luks-", then add a space, then "UUID=" followed by the
same uuid again, then a space and "none". It should look like this when you are done:

luks-37ff7165-31c0-4863-aba8-876692e6bc67 UUID=37ff7165-31c0-4863-aba8-876692e6bc67 none

NOTE: i know the luks-string can be anything you want, this is keeping it like previous
versions of fedora, and I like consistency.

14. vi /etc/fstab

find your old /home entry, copy and paste it, comment out the orig.

#/dev/mapper/vg_shinix-lv_home /home                   ext4    defaults 1 2

modify the new one to point at luks-(the same uuid as before) so:

/dev/mapper/luks-37ff7165-31c0-4863-aba8-876692e6bc67    /home                   ext4    defaults 1 2

reboot to make sure it comes up on boot.

NOTE: if you have disabled graphical booting as I have, the password prompt
DOES show up, but due to the nature of systemd running many things at a time,
it scrolls by super fast, and is sitting waiting for your input. Hit backspace
once and the password prompt will appear.
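Once you're back up, it's worth confirming the mapping is actually live (a quick sketch; substitute your own uuid):

# /home should be mounted from the dm-crypt mapping, not the raw LV
mount | grep /home
cryptsetup status luks-37ff7165-31c0-4863-aba8-876692e6bc67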

Disabling services

Gone are the days of chkconfig --list; we need to use systemctl. Skipping over a debate about systemd, let's move right into how to get going with it.

systemctl list-unit-files --type=service

You will see a nice color coded list of your services that are set to enabled, disabled, or static. If a service is ‘static’ that means it’s a dependency of another service. For now, ignore static services and concentrate on enabled services.

Let’s figure out what all this stuff is.

#if you type
systemctl
#you will get short description of what everything is, you can work with just
#that no problem.

#I like to see what rpm the services come from, here's how I do that:

systemctl list-unit-files --type=service |grep enabled |awk '{print $1}' |xargs locate |grep "/usr/lib" >> /tmp/list1
for i in $(cat /tmp/list1); do printf "\nservice $i\n" >> /tmp/list2; rpm -qif $i >> /tmp/list2; done
less /tmp/list2

NOTE: Yea I could do this all in 1 long command, but this is easier to
understand for the sake of anyone who might read this. :)

The above should give you a nice txt file that will show the systemctl file, and the rpm information about that rpm, for each enabled service.
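For what it's worth, here is the same idea as a single pipe (a sketch; paths assume Fedora keeps unit files in /usr/lib/systemd/system):

# map each enabled service unit to the rpm that owns it
systemctl list-unit-files --type=service | awk '/enabled/ {print $1}' \
  | xargs -I{} rpm -qf "/usr/lib/systemd/system/{}" 2>/dev/null | sort -u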

I’m a big fan of disabling what I don’t need.

NOTE: You may need some of these services, I don’t.

So looks like I’m disabling:

atd : the AT Daemon, I never use it. I don't know anyone that does anymore.
bluetooth : not using any bluetooth devices
cups : never will print from here
libvirtd : have a different box for virtual machine playtime.
rpcbind : not using nfs
spice-vdagentd : not using spice / libvirtd
avahi : don't want this service
ksm : not using qemu/libvirt
ksm-tuned : not using qemu/libvirt
rngd : don't have a hardware rng device
sendmail : eww. removing sendmail installing postfix.
sm-client : part of sendmail
systemd-readahead-collect.service : readahead is not needed for ssd drives imo
systemd-readahead-drop.service : disabling readahead
systemd-readahead-replay.service : disabling readahead

Next take a look at the sockets:

systemctl | grep socket

avahi-daemon.socket
cups.socket
pcscd.socket
rpcbind.socket

Going to make a quick file to speed up stopping and disabling everything:

vi /tmp/disableme
atd.service
bluetooth.service
cups.service
libvirtd.service
rpcbind.service
spice-vdagentd.service
avahi-daemon.service
ksm.service
ksmtuned.service
rngd.service
sendmail.service
sm-client.service
systemd-readahead-collect.service
systemd-readahead-drop.service
systemd-readahead-replay.service
avahi-daemon.socket
cups.socket
pcscd.socket
rpcbind.socket
cups.path
(save exit)

for i in $(cat /tmp/disableme); do systemctl stop $i; systemctl disable $i; done

When I did this, I got:
    Warning: Stopping cups.service, but it can still be activated by:
    cups.path

so I did a plain 'systemctl' and went through everything. cups.path is all I
want to get rid of, so I added it to the list.

There are still some services that live in chkconfig, let’s see:

chkconfig --list

ebtables        0:off   1:off   2:off   3:off   4:off   5:off   6:off
iprdump         0:off   1:off   2:on    3:on    4:on    5:on    6:off
iprinit         0:off   1:off   2:on    3:on    4:on    5:on    6:off
iprupdate       0:off   1:off   2:on    3:on    4:on    5:on    6:off
iscsi           0:off   1:off   2:off   3:on    4:on    5:on    6:off
iscsid          0:off   1:off   2:off   3:on    4:on    5:on    6:off
netconsole      0:off   1:off   2:off   3:off   4:off   5:off   6:off
network         0:off   1:off   2:off   3:off   4:off   5:off   6:off

I don’t have any ibm power raid devices so the ipr stuff can go. I don’t have any iscsi gear at home so that can all go as well.

chkconfig iprdump off && service iprdump stop
chkconfig iprinit off && service iprinit stop
chkconfig iprupdate off && service iprupdate stop
chkconfig iscsi off && service iscsi stop
chkconfig iscsid off && service iscsid stop

Okay, that’s better. Let’s check xinetd:

grep -ir disable /etc/xinetd.d/
/etc/xinetd.d/rsync:    disable = yes

Nothing to worry about there. Let’s check the cron jobs.

ls /etc/cron*
/etc/cron.deny  /etc/crontab

/etc/cron.d:
0hourly  raid-check  sysstat  vnstat

/etc/cron.daily:
cups  hplip logrotate  man-db.cron  mlocate.cron  prelink  tmpwatch

/etc/cron.hourly:
0anacron  mcelog.cron

/etc/cron.monthly:

/etc/cron.weekly:

cat /etc/crontab
(no jobs here)

Well, I don't want hplip and cups, but cups is a part of lsb, so no point
in getting rid of it (a yum remove cups would uninstall half your os).

yum remove hplip

The file for cups is just a tmpwatch so I'll leave it alone.

Misc

Utils

yum install htop iotop sysstat vnstat keepassx
  • htop is a fancier top
  • iotop is for tracking down IO issues
  • sysstat is for sar/iostat/mpstat/etc
  • vnstat is for network card monitoring
  • keepassx is a password manager

random

I setup my prompt_command so that it flushes my history after every single command. I also like to setup a much larger history file. I have a fairly fancy prompt_command setup but I’ll talk about that another time.

vi /etc/profile.d/histfile.sh
export HISTSIZE=9999
export HISTFILESIZE=999999
export PROMPT_COMMAND="${PROMPT_COMMAND:+$PROMPT_COMMAND ; }"'history -a'

I will be putting all my dot files on github, I just haven’t got around to it yet. Need to clean a few things up first. :)

Fonts

I am a big fan of dejavu-sans-mono for terminals / programming, which fedora has default installed now. Just need to change the system monospaced font to it. Droid sans mono is really nice as well.

Closing

I have been using Fedora 18 for about a week now, and I'm pretty happy with it. I will be upgrading my laptop and work desktop in the next week or two, so I can get MATE on everything.

There are a lot of complaints about fedora 18. People are bitching about the gnome3 changes; I am not sure what they are talking about, as I used gnome 3 just long enough to install mate. :)

Gnome 3 is now the internet explorer of desktop environments.

The new anaconda is immature, there is no question. But it’s all python based, so I think it will evolve quickly to support what the old anaconda could do.

As far as Fedora being the worst redhat distro, I have to disagree. RedHat linux 6 in 1998 or so (not to be confused with redhat enterprise linux) had a broken dhcp implementation, a grub bug where, if you pushed a key to boot (the booting linux in 3 seconds bit), it would only use your first 4 partitions on your disk and ignore any other ones, and so many other horrible bugs. Redhat linux 5 was so much better. Seemed like a giant step backwards at the time (or was it from 6 to 7? hrm. maybe. it was 12-13 years ago, it's a bit fuzzy, and so not worth googling).

That’s all for now.

My First LISA Conference

I convinced my $dayjob to spring for a trip to LISA12 in San Diego. Some places it is easier than others to convince management it's a good idea; it wasn't too bad for me.

For those who have never been to a LISA before, here is how it works:

Training Program

The event starts Sunday morning with training programs. Some are half day programs, and some are full day. You have to sign up in advance for the training programs you want to attend*, and the training programs run until Friday. One day's worth of training session(s) costs $710.00 USD (this is the usenix member price, but since joining usenix costs 50 dollars and the discount is worth $170.00 USD, pretty much everyone joins usenix). There are various deals for signing up for more training as well. The available training programs are laid out here.

Technical Sessions

The technical sessions start on Wednesday, and run through until Friday. The technical sessions are shorter (1.5 hours or so) but are generally more advanced topics than what is in the training programs. One day of technical sessions costs $405.00 USD (same 170 dollar discount applied already), or you can get 3 days of technical sessions for $965.00.

Golden Passport*

The golden passport allows you to go to any training program or technical session. As well, in the technical session conference rooms there are areas of reserved seating (right at the front) for people with a golden passport. This is pretty cool, because if you decide that you don't like the training that you are attending, you can leave and go to another one. The golden passport also comes with a cooler ID badge than everyone else's, which declares to everyone that you are a golden passport holder. It costs $4,075.00 USD, and while expensive it would be a great way to enjoy the conference.

Birds of a Feather

The training programs / technical sessions end at 5:00 PM, and there is a 2 hour break for dinner, after which the birds of a feather (aka bofs (pronounced BOFF)) start. There are 2 kinds of bofs: vendor bofs and individually organized bofs.

A vendor bof usually has booze at the back, to entice you to attend; while the vendor either tries to sell you on their product, or sell you on working for them (they try to recruit you).

The user bofs are generally more random topics and a lot were organized at the last minute and written on the sign at the conference:

Vendor Booths

Lastly there was the Vendor area, which is what it sounds like. There were many (59) vendors that had lots of swag to hand out. This is the first conference I have been to where, in order to get the better swag (tshirts, flashlights, etc), you needed to allow the vendor to scan your id badge (which had a 2D bar code on the back with all of your contact info). Hopefully I don't start getting spammed like crazy, but I won't hold my breath.

Some of my vendor highlights:

Google’s Quest for the pins:

Google gave out 1 pin to everybody, and via their questforthepins.com site you had to answer some simple sysadmin questions to progress to the next pin. There were 5 pins in total, and the questions got harder as you went, but most people attending got all 5 pins (myself included).

Rackspace's breakfix challenge: Rackspace had 2 laptops set up with virtual machines running CentOS 6.3 that were broken, and you had to fix the virtual machine. They timed how fast you could solve the problem. Us geeks love fixing things, so this was enjoyed by many. I did the challenge in ~9 minutes or so; I could have gone faster but was typo'ing a lot on the tiny laptop keyboards (and the sun was in my eyes :) ).

My Experience

This year’s LISA was held at the San Diego Sheraton which was a pretty nice hotel. Here is the view from the balcony of my room:

While attending LISA (or any conference really) I think it is a must to stay at the conference hotel. I arrived Monday evening (so I could be ready for Tuesday's training programs), and pretty much right after checking in I went to the bar and asked some guys wearing geek tshirts if I could join them, and was made welcome. Almost all of the tables in the bar and lobby were occupied by people using their phones/tablets/laptops and talking shop.

Many of the conversations I had in the lobbies and hallways were worth the price of admission by themselves. If you were thinking about deploying $product you could easily find some people who had already done so and were moving on to something better, or who would tell you horror stories of just how bad (or fantastic) $product was, or tell you what $product_competitor was like. The best part being that these were colleagues, not salesmen, and their input was insanely valuable.

I especially enjoyed the level of expertise at the conference. There was an extremely good chance someone more expert than you was nearby and happy to talk shop.

What I did

I wanted to checkout what the training was like, but was more interested in the technical sessions. So, I did 1 day of training, and 3 days of technical sessions.

The nice thing about doing 1 day's worth of training is that you get a USB key loaded with the training materials for all of the training sessions. So according to me you would be remiss if you did not do at least one day of training. Having the training materials is not as nice as being in the classroom and being able to ask questions and whatnot, but it is pretty cool to be able to check them out. So far I've only gone over a few of them, but this is due to the full schedule each day provides (if you choose to go to everything).

The training/sessions run from 9:00 AM until 5:00 PM (with breaks for food), then the bofs start at 7:00 PM and run until 11:00 PM. After the bofs I would hang around in the lobby or bar until 1:00 AM talking with various groups of sysadmins.

One of the things that surprised me, was just how far some people traveled to be at LISA. I met people from Belgium, Germany, Ireland, Norway, Australia, Brazil, etc. Next year’s conference is in Washington DC, which is quite a bit further for me to travel. However, when I think about all the international travellers I met, I don’t think I should complain.

Another cool thing was that after the first day, I started to see a few familiar faces at the technical sessions I was going to, so we started to hang out in between sessions. We then made plans to go out for lunches and dinners to talk shop. Since then we have all exchanged info, and I look forward to talking to them on IRC (I have put quite a few names to faces on the #lopsa channel on freenode).

My Highlight Reel

My favorite training session:
NOTE: I only took 2 half day sessions
Ganeti: Your private virtualization cloud - Tom Limoncelli & Guido Trotter

My favorite bof:
I’ve made a huge mistake (organizer unknown)

This deserves a bit of a write up. My favorite bof was a user bof entitled 'I've made a huge mistake'. It was very last minute, and it was kind of like an alcoholics anonymous for sysadmins. People shared their screw ups, and what they did to fix them. It was pretty awesome (I don't think I can repeat any of those stories), including and especially my own. :)

My favorite technical session:
15 years of DevOps
I was really floored seeing the 15 year old slides talking about the same problems and issues that we are having now. Here is a pic of the devops talk:

My favorite quote from the conference:
(from the disruptive technology panel)
“Software is going to stick with us like Herpes.” - Theo Schlossnagle, OmniTI

The reception (Shaken, Not Stirred)

After Thursday's events finished at 5:00, most people dropped off their gear and grabbed a coat (it was raining and slightly chilly), and several buses took us to the 'Grape Street Pier'. We went aboard a large boat:

and had a nice meal with an open bar, and got to play blackjack, roulette, and craps with some fake money that was handed out. It was pretty funny, because I would say 50+ percent of attendees knew how to count cards. The staff handed out extra fake money to whoever lost quickly, so it didn't really matter. As a bonus we got to keep some of the custom LISA'12 casino chips at the end of the night.

Final Thoughts

I quite enjoyed the experience, and look forward to attending another LISA. I will hopefully be going to the Washington DC one, but if not I will for sure be at the 2014 one in Seattle.

All in all, I had a great time.

An Analysis on Dell’s Dset Tool

what is dset?

DSET is the "Dell System E-Support Tool"; it is available for windows, 32 bit linux and 64 bit linux. Essentially dell uses this tool to gather information about your server to help them troubleshoot. I'm only writing about the 64 bit linux version.

why do you care?

If you call dell tech support, sooner or later (sooner, I'm betting) they are going to ask for a dset. I wanted to dig into what they are gathering and share it with my fellow sysadmins.

The story

I had a raid controller battery go bad after 2 years, still under warranty, but dell wanted a dset. Never mind that I had a dmidecode for the firmware versions and some megacli64 output to show the battery reports; they wanted a dset.

Here’s how my conversation went:

dell: There is a firmware bug that falsely reports a bad battery when your battery is still good, we are going to need a dset.
me: does dset need root?
dell: yes.
me: what does it need root for, as in, what exactly is it running?
dell: I don’t know.
me: okay I’ll download it, check it out, and call you back.

I’m in the medical industry, so installing new/unvetted software on production servers is usually a no-no, and it makes me nervous. So I wasn’t about to install it without some testing and analysis (on a blank virtual machine).

In fairness to dell, I could have said "I'm a gold customer with a 4 hour turnaround, ship me the new battery now please" and they would have. But I like to be co-operative with my fellow techies if I can.

The analysis

I’m not one to trust binaries that need to be run as root very much, so let’s take a look at what we are getting ourselves into.

Here’s dell’s dset page

At the time of writing, the latest version is 3.2.0.141_x64_A01.bin.

Due to running this on a virtual machine, it didn't want to install from running the script, so I extracted it manually via:

tail -n+20 dell-dset-3.2.0.141_x64_A01.bin | tar -xvz

The tarball doesn't make its own directory; it litters a bit in your current directory (shame shame).
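To keep the litter contained, extract into a scratch directory instead (sketch):

# give the tarball its own sandbox since it won't make one itself
mkdir /tmp/dset && cd /tmp/dset
tail -n+20 /path/to/dell-dset-3.2.0.141_x64_A01.bin | tar -xvz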

install.sh

The install.sh does a bunch of pre-checks to see if you have supported hardware before collecting its data. It's fairly simple to disable the check and run it anyway, or add your system to the 'support_hw_list', or just copy some exports from the install script and run the collector directly (more on this later).

Once it decides it will install, it installs one/some/all of the following depending on what options you choose:

rpm -Uhv rpms/srvadmin-hapi* >/dev/null 2>&1
rpm -Uhv rpms/srvadmin-storelib-sysfs*.rpm >/dev/null 2>&1
rpm -Uhv rpms/dell-dset-common* --nodeps >/dev/null 2>&1
rpm -Uhv rpms/dell-dset-collector* --nodeps >/dev/null 2>&1
rpm -Uhv rpms/dell-dset-provider* --nodeps >/dev/null 2>&1

rhel only:

rpm -ihv rpms/RHEL/sblim-sfcb*.rpm --nodeps >/dev/null 2>&1
rpm -Uhv rpms/RHEL/sblim-cmpi-base*.rpm --nodeps >/dev/null 2>&1

sles only:

rpm -Uhv rpms/SLES/cim-schema*.rpm >/dev/null 2>&1
rpm -ihv rpms/SLES/sblim-sfcb*.rpm --nodeps >/dev/null 2>&1
rpm -Uhv rpms/SLES/sblim-indication_helper*.rpm >/dev/null 2>&1
rpm -Uhv rpms/SLES/sblim-cmpi-base*.rpm --nodeps >/dev/null 2>&1

Okay, I'm not a fan of the --nodeps, but if they are staying in their own sandbox I can forgive them. So let's find out. On the first set of rpms:

$ rpm -qlp *.rpm |grep -v "^/opt"
/etc/init.d/dsm_sa_ipmi
/etc/init.d/instsvcdrv
/etc/ld.so.conf.d/srvadmin-hapi-x86_64.conf
/etc/sysconfig/dsm_sa_ipmi
/usr/lib64/libdchapi.so.5
/usr/lib64/libdchapi64.so
/usr/lib64/libdchbas.so.5
/usr/lib64/libdchbas64.so
/usr/lib64/libdchcfl.so.5
/usr/lib64/libdchcfl64.so
/usr/lib64/libdchesm.so.5
/usr/lib64/libdchesm64.so
/usr/lib64/libdchipm.so.5
/usr/lib64/libdchipm64.so
/usr/lib64/libdchtvm.so.5
/usr/lib64/libdchtvm64.so

Alright, I'm already not going to install this in the production environment, but let's throw caution to the wind on this vm.

I'm running centos, so depending on what options I select, the dset installer may install sblim (pronounced "sublime"), with nodeps of course. This could be problematic if you are also using the epel sblim package.

At least they are no longer using rpm --force --nodeps like they were in previous versions of dset (which they still get you to use if you are on rhel 5.x).

The install.sh parses out what you want to do and runs the ‘collector’ program with the various options (more on this later).

Manual Install

Going to play with the dell-dset*.rpm’s first.

For fun I looked at an rpm -ivh dell-dset*.rpm to see what dependencies I was about to ignore. What's weird is they are packaging all of their dependencies, so I'm not sure why they don't just update the spec to do a provides: bla and fix it. Maybe they are trying not to mess with the rpm database, but if that was the case, why are we using rpms at all? Running rpm --nodeps is almost the same as using a tarball. How much stuff are they doing in their rpms' %pre and %post that they can't do in their install script? I digress.
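If you're curious exactly which dependencies you'd be waving through, rpm will list both sides (the flags are standard rpm; what they print for these particular packages I'll leave to you):

# what the package says it needs vs. what it claims to provide
rpm -qp --requires dell-dset-common*.rpm
rpm -qp --provides dell-dset-common*.rpm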

Let’s get this installed.

rpm --nodeps -ivh dell-dset*.rpm

This will install to /opt/dell/advdiags/dset

The Collector

cd /opt/dell/advdiags/dset/bin
./collector --help
  File "/usr/lib/python3.1/site-packages/cx_Freeze/initscripts/Console3.py", line 27, in <module>

Now we know it’s a python 3.1 script. :)

$ file collector
collector: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped

But not something we can easily look at without getting a python decompiler or strace involved.

Our paths won't be correct, but dell gave us a collector.sh which will set up the correct paths for you (and restore your old path when you are done); that's nice.

./collector.sh --help

There we go, a nice help file so we can figure out what we want to do. Being that I'm on a vm, the hardware option isn't going to do much for me, so I'll start with the software.

./collector.sh -d sw

It asks right away for my root password (even though I'm already running it as root). If I reached this stage in production I would abort, but I'm on a vm, so I'll break out, change root's password to '1234', and run it again. But not to worry, the dset documentation states:

NOTE: Root credentials are necessary for the DSET
        Provider to collect inventory or
        configuration information about the system.
        DSET does not store this password. The root
        password must be specified each time a report
        is collected.

We’ll find out if this is true or not shortly.

./collector.sh -d sw -p 1234

huzzah! I have a report.

./collector.sh -d sw -p 1234 -v yes

Now I have a report with privacy enabled to compare.

The Report

The report gets thrown into a passworded zip file. The password is completely meaningless: if you unzip it with no password, it unzips a text file which tells you the password is 'dell'. So I unzipped again, this time with the super secret password. The password is the same in privacy mode or non privacy mode.
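In shell terms, the whole 'protection' amounts to this (a sketch; the report file name is illustrative):

unzip report.zip          # no password needed: extracts a note saying the password is 'dell'
unzip -P dell report.zip  # the actual report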

The non privacy report

The non privacy report gathers its data from many cfg files and logs. It also parses the data into xml/xsl pages, which I assume dell has a nice tool to go through quickly to see what's what.

From looking at the logs directory the collector is gathering the following:

cat boot/grub/menu.lst
cat boot/grub/device.map
ls /boot
uptime
cat /proc/meminfo
lsmod
cat /etc/modprobe.conf
cat lib/modules/current/modules.dep
ifconfig
cat /etc/resolv.conf
cat /etc/hosts
cat /etc/sysconfig/network-scripts/ifcfg-*
df
cat /proc/scsi/scsi
fdisk -l
free
hostname
list of all installed rpms / versions / etc
iptables dump
ldconfig
lspci
mount
osversion
print's environment settings
ps
pstree
route (old school route not ip route ls, shame shame)
selinux policies
runs thru the init.d and does a service status on each
sestatus
uname

See that ps up there?

grep "collector" ps
root     20589 13.0  1.5 133620 15408 pts/1    S+   23:17   0:00 ./collector -d sw -p 1234

So much for not saving my root password. It also shows up in:

rawxml/getprocesslist.xml
xml/processlist.xml

Before anyone gets mad at dell: I’m running this in a non standard way, if I used the install.sh the script would not use the -p option, and would instead prompt me for a password (which would not show up in the processlist). Still not sure why it wants me to type the root password when I am already running it as root, but ok.

Next the collector has straight up copied my:

boot/grub/grub.conf
boot/grub/menu.lst
etc/aliases
etc/cron* (crontab, crondirs, etc)
etc/fstab
etc/host*
etc/ld.so.conf
etc/modprobe.conf
etc/redhat-release
etc/resolv.conf
etc/sysctl.conf
etc/mail/sendmail.cf
etc/pam.d/* (WAT?)
etc/sysconfig/* (entire dir and subdirs)
etc/X11/XF86Config
lib/modules/current/modules.dep
proc/ (many files copied here, 778)
var/log/dmesg
var/log/messages

This is WAY too much information to send to dell for any reason. And how are you sending it to dell? I sure wouldn't email it.

The privacy report

In the privacy report you get a lot less data. The logs directory is now blank, so everything is in the gui directory only. Which means we get to go through some annoying xml/xsl (note: I find all xml/xsl annoying (don't ask)). A lot of the same data is gathered, but now just dumped into xml.

We are gathering:

/boot/grub/grub.conf
ls /boot
ls /boot/grub
cat boot/grub/device.map
chassis info
etc/X11/XF86Config
lsmod
cat /etc/modprobe.conf
cat lib/modules/current/modules.dep
list all rpms / publisher / size / install date, urlinfo, description
hardware io ranges
hardware irq info
cat /proc/meminfo
cat /etc/fstab
ifconfig (with ip info and mac addresses "Omitted by user")
cat resolv.conf (everything is omitted by user but domain is still listed. WAT?)
runs thru the init.d and does a service status on each
storage info (df info)
os version and kernel
connected usb devices

I like how the kernel version is omitted in the uname but is in the syssumlist. Heh.

Other options

The collector script has many different options. I don't have any non-production dell gear right now, so I'm not willing to run the hardware report on a server that has the hardware for it.

I ran the -lg and -ad options in the collector as well, but there was no difference in the sw logs. I imagine this would be different if I was running on an actual dell machine with actual dell hardware instead of the virtual machine that I'm running this on. :)

Conclusion

I won’t be running the dset tool on any production gear because:

  • The package installation could cause issues with your system (not staying in /opt/dell, using --nodeps, the sblim package conflicting with epel's)
  • It wants your root password to be entered at the prompt of a python program, even if you are currently root
  • The non privacy report gathers way too much info about your system; under no circumstances should this be sent to anyone, ever
  • The password on the report zip is incredibly insecure
  • Even the privacy report leaks the domain from resolv.conf; dell should add it to the private information, or just not parse resolv.conf at all

So what happened?

I told the dell tech that I couldn’t run the dset tool due to dset doing some bad behaviors; but I had a dmidecode, and some megacli logs to send.

After dell reviewed the logs I emailed, they shipped me a new battery.

tldr version

Don't run dset. If you absolutely must use dset, use the privacy option.

Well Hullo There.

Oh Hai.

I decided that I may have a thing or two to post about. It may be (read: will be) very infrequent, but such is life.