$dayjob sent me to LISA again, for which I am quite grateful. It almost didn’t
happen due to me moving to another business unit (read: promotion) a few weeks
before the conference, but my new manager was able to pull some strings and
make it happen (Thank you Brent!).
The Training
Like every year I showed up on the Monday, so I could get settled and do 1 day of
training on the Tuesday and then 3 days of sessions.
The training on Tuesday was not a great selection in my opinion; there were too
many full day training sessions, which didn’t allow for a lot of choice. Of the
remaining choices, one of the courses I had taken 2 years ago, so my choice was
even further restricted. I ended up taking “Build a Sysadmin Sandbox” as the
first course and wasn’t really getting anything out of it, so I bailed and went
to puppetcamp, which was also occurring on the same day. It was a good course, but
the material wasn’t anything I didn’t already know.
Puppetcamp was very packed, but I was interested to see Garrett Honeycutt speak,
as I do use a few of his modules and he is well known in the puppet community.
After that particular session I decided to just do the hallway track until my
second training session.
The second training session was an introduction to Go for systems programming.
I have heard a lot of hubbub about Go, and was interested to see if the
training/trainer could convince me that it was a good idea to stop using python
and start using go. Long story short, I remain unconvinced that it would be a
good idea to move from python to go for sysadmin type scripts. However if you
need to worry about multithreading / multiprocessing then it would be a good
idea to use go (go with go?). The material was good and the trainer did a good
job; it’s just that according to me, go adds a bit of complexity that you don’t
need to worry about in python. I do have a freakishly long python script that
is multiprocess that would be fun to rewrite in go, but that project won’t see
the light of day for years. :)
Tuesday night we headed out to the Taphouse with a fairly large (18ish) group
to socialize and swap stories. It was pretty fun and right across the street
from the conference hotel.
Thomas Uphill had a puppet session called
Puppet for the Enterprise that I went to. It
was good, but either the crowd didn’t want to participate or the level of
puppet knowledge in the crowd was very, very new.
attending that he wrote Mastering Puppet. I read the book shortly after it
came out and do recommend it. I had some conversations with Thomas later at the
EMP museum about workflows with large teams, and he seems to think we are doing
everything right, but I have issues with the current workflow that I don’t like.
I believe I have a solution but I’ll save that for another blog post.
Birds of a feather
The birds of a feather were quite good.
There was also one BOF on Thursday, which I ran. I didn’t take a picture of the
board, as there was only one (besides the Google BOF).
I stole the idea from LISA 2012, and hosted the BOF “I’ve made a huge mistake”.
Due to scheduling I had to have this during the Google BOF, so I didn’t really
think many people were going to show up. Much to my surprise, so many people
showed up we had to move to a bigger room. I should have taken a picture but
totally didn’t. We moved to a larger room and everyone shared stories of their
epic failures. Some of the stories were really hilarious.
Other than my own BOF, I liked the RedHat and Puppet vendor BOFs. There was
also an Infrastructure Testing BOF that had me taking several notes.
Random
At every LISA I have been to so far, there has been an event that “paid for
the conference”. And sometimes you don’t know when that will occur. For
example, last year the flamegraph talks paid for the conference, as I have
used that knowledge to solve several problems that would have taken weeks of
man hours if I had not. Then months later we were looking for a Sunray
replacement, and I remembered a hallway track conversation that I had with
Tobi Oetiker about what he was doing with Sunrays, and I was able to suggest
that we test that product. No one in my organization had heard of that product;
it wasn’t even on our eval radar. Adding that product to our eval, and having us
use it, will have saved the company the price of the conference several times
over.
This year, Brendan Gregg’s talk paid for the conference. I’m working on a few
other ideas inspired by the conference that may pay for the conference several
times over. I am always very invigorated after a good conference, and I think I
will be realizing additional value from this one for quite some time.
I got to meet Dave Josephson from Librato who recently collaborated with
me on graphios to add librato and statsd support. We had some
excellent conversations and from those conversations I will be making some
changes to what I am monitoring to try and preserve the interesting data
without having to keep my data forever. Dave also did 2 talks at the Nagios
world conference that mention graphios here and here. Not to mention he
did two talks at LISA
here and here. I caught the second talk at LISA
and I quite enjoyed it.
Notes for future first timers
Wear your badge from the beginning of the day until the end of the day.
Otherwise people aren’t going to know that you are an attendee. Wearing the
badge makes you approachable for hallway track conversations.
If you get a table for breakfast or dinner in the hotel, get a larger table and
tell the greeter to send other attendees your way. (You may want to check if someone else is
already doing this!).
Do a write up for your manager / coworkers when the whole thing is over. You need to pass on what you
learned at the conference, and thank your manager for the opportunity of
going. Your goal should be to prove that the company spent money wisely when
they opted to send you there, and have justification for sending you again.
Do 1 day of training, so you can get all the training material.
If you aren’t sick of chatting with people, wear your LISA shirt at the
airport. I had some excellent airport conversations with other attendees
because they recognized my shirt.
Facebook’s Opencompute datacenter is 38% more energy efficient, at 24% less
cost (energy cost). If this was the only benefit it would be worth looking into, but there
is so much more.
Final Thoughts
I had a great time at LISA, and look forward to attending again next year.
Whilst my little brother has been telling me for years that I should join
twitter, I have continuously resisted.
Here is the conversation that made me cave:
Me> Twitter is stupid. It’s just IRC. I’ve been on IRC for nearly 20 years, and
I’m in around 30 channels on 4 servers. Why would I want to subscribe to
someone and see everything they say for every channel they are in? That’s crazy
talk.
X> Okay. Sure. Twitter is just IRC. I get that. So let me ask you this. Are you
on any IRC channels with Adrian Cockcroft? Brendan Gregg? John Allspaw?
Me> Well.. no.
X> Would you like to be?
Me> Well.. yes.
So here I am: @systemtemplar. I’m still
learning the ropes.
Also I’m planning on updating this webpage a bit more often than twice a year,
so we’ll see how that goes. I have another LISA writeup that I’m working on. So
there will be at least that. :)
Flamegraph is a utility written
by Brendan Gregg that I have recently used
a lot of, and I thought I would do a writeup for people who want to get their
hands dirty. If you don’t know anything at all about flamegraph I recommend
watching this video from LISA13
or if you just want some slides these ones
are very good.
Flamegraph is a visualization to help identify what is going on on your
system. Using flamegraph requires a few steps. Let’s learn by doing.
I am using CentOS 6.4, so my instructions will need to be tweaked for
Debian/Ubuntu/other users.
We need to get either perf or SystemTap working on our test machine. Perf is
easier so we will start with that.
yum install perf
Now that perf is installed you could start recording events right away, but if you did
you would find that all of the symbols would be missing and your flamegraph
would have a bunch of ‘Unknown’ entries in it. To address this we need to
install debug symbols of the programs we want to debug.
To start off we are going to want at least the kernel and glibc debug packages,
and after that what debug symbols you want depends on what you are doing. In
the examples I’m doing I want to also debug qemu-kvm, so I’ll be installing
those symbols.
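On CentOS the debug packages come from debuginfo-install, which is part of yum-utils. Roughly what that looks like (you may need the debuginfo repos enabled; the qemu-kvm line is only there because of my example further down):
yum install yum-utils
debuginfo-install kernel glibc
debuginfo-install qemu-kvm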
Now that I have the debug packages installed I’m ready to start perf.
perf record -a -g -F 997 -o first.perf sleep 60
-a = all cpus
-g = do call-graph (backtrace) recording
-F = frequency (how often to collect data), which we are setting to 997 instead
of 1000 based on Brendan’s advice (watch the video).
-o = output file name. By default it will record to a file called perf.data,
and it will get confusing when you are doing a lot of perf recording.
sleep 60 = perf can record events for any command you do, in this case we don’t
care about what we are doing, we are recording events on all cpus. So here we
are just saying we want to record for 60 seconds.
NOTE: 60 seconds can be 50-100 mb or more depending on how busy your system
is.
Next we want to convert the perf file (which is a binary file) into a text file so we can process it
with flamegraph.
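The conversion is perf script, and the graph itself comes from two scripts in Brendan’s FlameGraph repo. Roughly (file names carried over from the perf record above):
perf script -i first.perf > first.script
./stackcollapse-perf.pl first.script > first.folded
./flamegraph.pl first.folded > first.svg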
A new feature I added for flamegraph is to use a
consistent palette. Let’s use this in a practical scenario.
I have 2 servers both running java inside of KVM. One is working great, the
other is not. On each machine I did a perf record, just like above. I saved the
files to working.script and broken.script (ran perf record, and perf script on
each box).
Next I transferred the script files to my workstation, and ran:
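Something along these lines (the file names are mine; the important bits are the --cp and --colors options on the second flamegraph run):
./stackcollapse-perf.pl working.script > working.folded
./stackcollapse-perf.pl broken.script > broken.folded
./flamegraph.pl --cp working.folded > working.svg
./flamegraph.pl --cp --colors mem broken.folded > broken.svg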
The --cp option will use the same palette.map that was generated from the
previous run of flamegraph. This time we also have --colors mem, which uses the
mem palette for any NEW symbols that the previous flamegraph did not have. This
will make the differences really stand out in our flamegraph. Now a side by
side comparison shows the problem:
Once again I convinced $dayjob to spring for (most of) my trip to LISA13.
I did a big write up for management last year, and did several training sessions
with other employees as it was very much worth the cost / time / effort in my
opinion.
This year was quite a bit further for me to travel, as I am on the west coast,
and LISA is on the east coast. Google maps says that it’s 4421 km (2747 miles).
I don’t think I can complain about the travel time as there are people I met
who came from the UK, Sweden, Germany, Denmark, etc.
Once again I did one day of training and 3 days of the technical sessions, and again I
did get all of the training material (which is totally worth the day of
training, regardless of what training you take).
My Training
The first course I took was Theodore Ts’o’s “Recovering from Linux Hard Drive
Disasters”, which was a very informative course on the history of file systems,
but I would have preferred some demos of fixing broken file systems: enter debugfs
and fix some deleted files, or show WHEN it is better to hit no in a fsck, etc.
I think that Ted Ts’o is well informed and a great speaker; it was just that the
history material was a little long, and not really on topic.
The second course I took was Joshua Jensen’s “High-Availability Linux
Clustering”, which was a pretty good course. My only comment is that I would have
liked to see a demo, even if it was a short one.
My session highlights
The Disney animation studio presentation was fantastic. I was very impressed
with what they have done with traditional management roles, and how they are
evolving to something that I think will work a lot better for pretty much all
medium to large IT organizations; I do hope that more people will follow
their example. Disney Animation Studios is now on my list of “cool companies
I would like to work for someday”.
The “surprise plenary” session on flamegraphs was also very good. It was a
surprise because the scheduled speaker got sick and had to go to the hospital (she
is okay), so they needed a plenary session on very short notice. Brendan had
one hour’s notice to take a 30 minute talk and turn it into an hour-and-a-half talk,
and he was up till 4 in the morning coding on a new type of graph (chain
graphs). I think he did a really good job.
The closing plenary session was called “Post Ops: A Non-surgical (personal)
tale of fragility and reliability” by Todd Underwood (Google). It was a fairly
amusing talk with a lot of interesting points. Todd is suggesting that sysadmins
are going to go the way of the dodo bird, or more specifically that the ops
part of devops is going to go away and that everybody should become developers.
There is a part of me that agrees a little, but I think there is a rather large
disconnect between the industry as it is and what Google is doing. I think Todd
may have his Google Glass blinders on and is only seeing things from the Google
perspective. While I’m sure it’s a nice place to live, I don’t think it is grounded
in reality.
There are people who are using 10 year old software and hardware now, a lot of
people. The same is going to be true in 10 years from now. In 10 years there
will still be people using windows 7 and 8, and just starting to get
a plan together to adopt windows 2023 / windows 18 or whatever it is called.
Administration of those devices is going to be the same then as it is now.
I can understand his point if you think about a small office with 20 employees
who are, say, real estate agents with 20 chromebooks (or 21 with a spare),
everything is wireless, and they use Microsoft Office 365 / Google Docs. I don’t see
the need for an IT person there. Ten years ago, or perhaps even today, there
would be a tipping point: maybe when the office reaches 50 people or 100, an
IT person would eventually be hired. Will the tipping point drastically move
or be eliminated entirely? People are still going to be terrible with computers
and need help. I think helpdesk people are safe at least. :)
Other highlights
The so-called “Hallway track” was really great this year. I got to have breakfast
with Tom Limoncelli and Tobias Oetiker, and go out for dinner with Matt Simmons
and many others. We had some cool conversations about system administration as
a whole and where we think it is going.
Birds of a feather
The birds of a feather this year were very good, although I did find that there
were more vendor bofs than there were last year. Not sure if that is good or
bad.
Books
Books that were recommended at the conference (that are not on Safari Bookshelf):
It seems as though we are transitioning from the title “System Administrator”
to “Site Reliability Engineer”. I think this is kind of an odd choice. The SRE
name has been around for a long time; it’s just that now, instead of only Google
using it, many other companies are switching as well.
The old:
“System” is a broad term, and it fits. Most system administrators I know are in
charge of anything that has blinking lights on it, and often many other things
that don’t.
“Administrator” is a person who manages (takes charge of). Again I think this
makes sense for what we do.
The new:
A site reliability engineer is an odd title to me. “Site” in the modern sense
of the word implies a website, which can be very complicated, or very simple.
The old sense of the word, like a job site, implies you are concerned with
the reliability of an entire physical location (job-site).
“Reliability” makes sense, as you want to make sure the website/job-site is
reliable.
“Engineer” is somewhat of an issue for me personally, but I think this is more to
do with a Canadian upbringing. Where I am from, one doesn’t up and call
themselves an engineer without having a degree in engineering. This comes from
the Ritual of the Calling,
after which (some|most) Canadian engineers wear the iron ring.
I am not an engineer, and around here people would consider it disingenuous if
I called myself a site reliability engineer without having a proper engineering degree.
As well, I’m not just concerned with the reliability of the site, I’m also
worried about capacity planning, future proofing, scaling, security, updates,
etc.
I wonder how long it will be before someone wants to call me an SRE instead of
sysadmin?
Small Infrastructure Bof
I attended the small infrastructure Bof, even though my infrastructure is not
really that small. I did the same last year; I find them interesting. One
of the topics that came up was:
Is LISA still relevant for the small / medium system administrators?
I think system administration is splitting into groups of complexity. According
to me it looks something like this:
small shops: < 100 boxes/vms/nodes/devices/whatever you want to call it
medium shops: < 1,000
large shops: < 5,000
very large shops: < 50,000
massive shops: > 50,000
LISA has a tendency to prefer talks from massive shops or contractors that work
with massive shops. This makes sense because the massive shops have a tendency
to be the ones who are pushing the industry forward. Also, the L in LISA refers
to large. However when LISA started in 1986 a large system was what we now
consider to be small.
I don’t currently think that we need to split LISA up, but I do think the LISA
organizers need to be somewhat
mindful of the small shops and what they are gaining from the conference. The
company I work for fits somewhere between medium and large shops, and I was in
the LISA session Enterprise Architecture Beyond the Perimeter, and it WAS
interesting information, but this was more a massive shop problem. I didn’t find
the talk relevant to anything I was doing or would be doing in the small /
medium term. I could have got up and left the talk, but like I said, it was
interesting information.
This year I organized the Lightning Talks
and plan on doing the lightning talks each year at Linuxfest. We had a great
variety of lightning talks; it was pretty fun.
I also did a session called “I wanna be the guy - The arduous path to senior
sysadmin or How to be a better system administrator” which you can see here:
It is my first talk at a conference but I hope to do more in the future.
It took a while to get online, as I first did the video with
openshot, but what was on the video preview was
not what the end video looked like. So I tried to work around it with openshot
but eventually gave up. I did the same thing using a friend’s copy of
Vegas and it was a much
nicer experience. Openshot is working on a new
UI
and I look forward to using it the next time I need to cut some video together.
The fest
Linuxfest Northwest 2013 was very fun; I liked the sessions and meeting fellow
linux enthusiasts. I’m more interested in finding other system administrators,
so I am always on the hunt for them. Maybe I need to do an early birds of a
feather for sysadmins so I can find out who the sysadmins are nice and early
and then try to find them throughout the day.
The ID badge for lfnw deserves special mention. I have been to several
conferences but this was the first that I have been to that had this type of
badge. The badge was essentially a small booklet, on the front and back was
your name and information; but inside the booklet was the conference schedule,
and maps to the various rooms and events going on. This was very cool.
The after party on Saturday was at a museum called The
lightcatcher. There were several nice exhibits, I liked the
glass sculptures; but I didn’t think to take any pictures
(which may not have been allowed, I don’t remember). I didn’t stay very late at
the after party because I needed to finish up my slides (for the talk I was
giving the next day).
My luck hasn’t been good with it, but I do some stuff that is off the beaten
path, so it is likely my fault. I am fortunate enough to have extra hard drives
kicking around, so installing an OS on a spare hard drive to test out its
installation is no problem for me. However, in this case I have a shiny new PC
to set up.
The new PC is an ASUS P9X79 LE, with 32 GB of RAM and a 3.6 GHz proc.
Why use Fedora?
I use fedora because I work at a CentOS shop, and using fedora keeps me fresh
on what changes are coming to the RHEL ecosystem. There are quite a few big
changes coming in the RHEL pipeline, so it’s just a good way of seeing what is
on its way, and it gives me a chance to explore how the new stuff works.
Installation
Anaconda (The fedora gui installer) has been completely re-written, and was
(at least partly) responsible for the delays in getting fedora 18 released.
The new gui looks nice, but I have some complaints. Currently when you select
the encryption option it encrypts the entire physical volume, rather than
encrypting the logical volume. This means that it’s encrypting the whole hard
drive instead of just a particular partition.
Typically in my linux installations I set up a /home partition and encrypt it
using luks. In the previous version of anaconda this was easy to do. I have
commented on a bug report
about this.
I setup my /home partition unencrypted at installation time, so I will fix this
later in this writeup.
Dual Booting with UEFI
I have used UEFI bios many times in a server environment, but this is my first PC with a UEFI
bios at home. One of the fun parts is that you don’t need to worry about windows
overwriting grub anymore (or grub forgetting to add the windows partition(s)).
With a UEFI bios you pick which drive you are going to boot with using the bios
menu. NOTE: I still have a windows partition for the occasional video game that
I can’t get working with wine.
My motherboard is an Asus P9X79 LE, and the original bios version did not work
with the F8 boot menu. If I selected the UEFI boot order in the bios itself it
would work, but if I hit F8, it would not boot off any other drive. I did a
BIOS update and that solved the problem.
Speaking of Bios Updates
I usually do BIOS updates using the windows boot disks, because generally
speaking vendors test the windows portions more than the linux ones. So I saved
the bios upgrade to c:\bios on my windows partition, loaded the asus EZ flash,
navigated to the bios file, and it failed, saying that the bios wasn’t an EFI
bios.
After some googling I found that the EZ flash is able to navigate ntfs folders
but can’t load bios files from ntfs (which is really odd). So I copied the bios
file to /boot/efi on my linux partition (which is a VFAT partition iirc) and
fired up the EZ flash again and it was able to find and use the bios upgrade
just fine.
After the bios upgrade I am able to use the F8 bios boot menu just fine.
Post Installation
I like to record my steps on what I do after installing so I can come back if I
ever need to.
I will do a whole blog post on my vim setup on a much later date.
tmux is a screen replacement that is awesome.
terminator is a graphical terminal that is also awesome.
xchat for gui irc
pidgin for instant messaging
thunderbird for email
git for source control
and wget and curl for pure utility
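All of those go in with one yum command, something like this (package names as I remember them from the Fedora repos):
yum install vim-enhanced tmux terminator xchat pidgin thunderbird git wget curl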
Next up: the thing that searches for typos when you typo. It slows things down
far too much for fast typists. I have a fancy .bashrc that I’ll talk
about some other time.
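For reference, the package that does this lookup on Fedora is (assuming I have the right culprit) PackageKit-command-not-found, so getting rid of it is one line:
yum remove PackageKit-command-not-found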
NOTE: Before the zsh people get on my case about not switching to zsh, it’s not
something I’m ready to deploy on all my work servers, so I’m sticking with bash
until I can deploy zsh everywhere I work.
yum update
Boot settings
fstab
NOTE: Don’t mess around with your fstab unless you know what you are doing.
I have a ssd disk, so I need to add the ‘discard’ option to the fstab for
the partitions that are ssd backed.
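An fstab entry with discard added ends up looking something like this (the device name here is only an example built from my volume group; the lv_root name is made up):
/dev/mapper/vg_shinix-lv_root  /  ext4  defaults,discard  1 1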
Do that for each partition that is ssd-backed, though not on the luks partition, as
that would be foolish (iirc luks partitions ignore the discard option).
grub
NOTE: Don’t play with your grub settings unless you know what you are doing.
Being that I have an ssd drive, I will want elevator=noop for my scheduler, and
I am not a fan of the graphical boot so I will disable that via removing the
rhgb and quiet options.
I will also be using the nvidia driver, so I will disable the nouveau kernel
module via rdblacklist=nouveau
If you haven’t used grub2 yet it might be a little jarring of an experience.
Rather than the simple vi /boot/grub/grub.conf, you now get to:
modify:
/etc/default/grub
then
grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
(or grub2-mkconfig -o /boot/grub2/grub.cfg and use a symlink).
If you think the old way is better you are not alone, but if you look at the
mess that is your new grub.cfg you will be happy for the cfg generator.
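For reference, the kernel options I mentioned above all live on the GRUB_CMDLINE_LINUX line in /etc/default/grub; mine ended up looking roughly like this (with rhgb and quiet simply deleted):
GRUB_CMDLINE_LINUX="elevator=noop rdblacklist=nouveau"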
But due to a bug I also added "-o nouveau" to the dracut options, so:
dracut -o nouveau /boot/initramfs-$(uname -r).img $(uname -r)
Upon rebooting, my system would crash. This was
annoying. So I did a rescue, and changed it to boot to runlevel 3 (no gui).
After a reboot, I ran “lsmod |egrep ‘nouv|nv’” to see what drivers I was
loading. There was no nvidia, and no nouveau. So the blacklist/dracut was working, but
I wasn’t loading the nvidia kernel module either. A manual modprobe resulted
in:
modprobe nvidia
ERROR: could not insert 'nvidia': Required key not available
So what key are we talking about here? Let’s get an strace going:
open("/lib/modules/3.7.2-204.fc18.x86_64/extra/nvidia/nvidia.ko", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1", 6) = 6
lseek(3, 0, SEEK_SET) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=15202848, ...}) = 0
mmap(NULL, 15202848, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe78915b000
init_module(0x7fe78915b000, 15202848, "") = -1 ENOKEY (Required key not available)
munmap(0x7fe78915b000, 15202848) = 0
close(3) = 0
write(2, "ERROR: could not insert 'nvidia'"..., 61ERROR: could not insert 'nvidia': Required key not available
) = 61
munmap(0x7fe78a0c8000, 324765) = 0
I’ve had some experience with signed modules before, but it was a long time
ago. Hrm
readelf -S /lib/modules/3.7.2-204.fc18.x86_64/extra/nvidia/nvidia.ko |grep -i sig
(nothing returned)
no signature. This must be something else.
I tried booting with the kernel option:
enforcemodulesig=0 and module.sig_enforce=no, neither worked.
After a lot of googling, I found that the issue was with secure boot;
as soon as I turned it off (in the bios settings), I was able to load the nvidia module fine. There is
surprisingly little documentation on the internets about this error.
I posted this
in hopes to save someone else some time (It took longer than I would have liked to
figure it out).
Desktop
I have been using cinnamon or frippery (cinnamon on the laptop, frippery on the
desktop) since fedora 15 came out (because gnome2 was ditched). Fedora 18 has
MATE which is the gnome2 fork. I’m really looking forward to using my old
desktop again.
MATE:
Ahh MATE, I used Gnome2 as my desktop for many years, so this was a big sigh of
relief. Even though I am going crazy daily due to
a keyboard shortcut bug
I am loving using mate. As far as moving windows from
monitor to monitor (I have a 3 monitor setup) mate works a lot smoother than
everything else I have used so far. Mate has been available in other versions
of fedora, I just totally missed the boat on it.
The other issue I have with fedora’s installation of mate is that it doesn’t
install the mate-screensaver rpm. I
opened a bug report on this
so hopefully it will be fixed soon.
1. hit ctrl-alt-f2, login as root, and type 'init 1'
2. mkdir /homebackup
3. rsync -av /home/ /homebackup
4. umount /home
5. lvdisplay (find your logical volume name mine is lv_home, find your volume
group name, mine is vg_shinix)
6. cryptsetup --verbose --verify-passphrase luksFormat /dev/vg_shinix/lv_home
7. cryptsetup luksOpen /dev/vg_shinix/lv_home home
8. mkfs.ext4 /dev/mapper/home
9. mount /dev/mapper/home /home
10. rsync -av /homebackup/ /home/
11. restorecon -v -R /home
12. cryptsetup luksUUID /dev/vg_shinix/lv_home >> /etc/crypttab
13. vi /etc/crypttab
Your crypttab will have 1 UUID sitting there
ie:
37ff7165-31c0-4863-aba8-876692e6bc67
You need to take that uuid and prefix it with "luks-", then add a space, "UUID=" followed by the same
uuid again, then a space and "none". It should look like this when you are done:
luks-37ff7165-31c0-4863-aba8-876692e6bc67 UUID=37ff7165-31c0-4863-aba8-876692e6bc67 none
NOTE: I know the luks- string can be anything you want; this is keeping it like previous
versions of fedora, and I like consistency.
14. vi /etc/fstab
find your old /home entry, copy and paste it, comment out the orig.
#/dev/mapper/vg_shinix-lv_home /home ext4 defaults 1 2
modify the new one to point at luks-(the same uuid as before) so:
/dev/mapper/luks-37ff7165-31c0-4863-aba8-876692e6bc67 /home ext4 defaults 1 2
reboot to make sure it comes up on boot.
NOTE: if you have disabled graphical booting as I have, the password prompt
DOES show up, but due to the nature of systemd running many things at a time,
it scrolls by super fast, and is sitting waiting for your input. Hit backspace
once and the password prompt will appear.
Disabling services
Gone are the days of chkconfig --list; we need to use systemctl. Skipping over
a debate about systemctl, let’s move right into how to get going with it.
systemctl list-unit-files --type=service
You will see a nice color coded list of your services that are set to
enabled, disabled, or static. If a service is ‘static’ that means it’s a
dependency of another service. For now, ignore static services and concentrate
on enabled services.
Let’s figure out what all this stuff is.
#if you type
systemctl
#you will get short description of what everything is, you can work with just
#that no problem.
#I like to see what rpm the services come from, here's how I do that:
systemctl list-unit-files --type=service |grep enabled |awk '{print $1}' |xargs locate |grep "/usr/lib" >> /tmp/list1
for i in $(cat /tmp/list1); do printf "\nservice $i\n" >> /tmp/list2; rpm -qif $i >> /tmp/list2; done
less /tmp/list2
NOTE: Yea I could do this all in 1 long command, but this is easier to
understand for the sake of anyone who might read this. :)
The above should give you a nice txt file that will show the systemctl file,
and the rpm information about that rpm, for each enabled service.
I’m a big fan of disabling what I don’t need.
NOTE: You may need some of these services, I don’t.
So looks like I’m disabling:
atd : the AT Daemon, I never use it. I don't know anyone that does anymore.
bluetooth : not using any bluetooth devices
cups : never will print from here
libvirtd : have a different box for virtual machine playtime.
rpcbind : not using nfs
spice-vdagentd : not using spice / libvirtd
avahi : don't want this service
ksm : not using qemu/libvirt
ksm-tuned : not using qemu/libvirt
rngd : don't have a hardware rng device
sendmail : eww. removing sendmail installing postfix.
sm-client : part of sendmail
systemd-readahead-collect.service : readahead is not needed for ssd drives imo
systemd-readahead-drop.service : disabling readahead
systemd-readahead-replay.service : disabling readahead
vi /tmp/disableme
atd.service
bluetooth.service
cups.service
libvirtd.service
rpcbind.service
spice-vdagentd.service
avahi-daemon.service
ksm.service
ksmtuned.service
rngd.service
sendmail.service
sm-client.service
systemd-readahead-collect.service
systemd-readahead-drop.service
systemd-readahead-replay.service
avahi-daemon.socket
cups.socket
pcscd.socket
rpcbind.socket
cups.path
(save exit)
for i in $(cat /tmp/disableme); do systemctl stop $i; systemctl disable $i; done
When I did this, I got:
Warning: Stopping cups.service, but it can still be activated by:
cups.path
so I did a plain 'systemctl' and went through everything. cups.path is all I
want to get rid of, so I added it to the list.
There are still some services that live in chkconfig, let’s see:
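The old command still works for these stragglers:
chkconfig --list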
I don’t have any ibm power raid devices so the ipr stuff can go. I don’t have
any iscsi gear at home so that can all go as well.
chkconfig iprdump off && service iprdump stop
chkconfig iprinit off && service iprinit stop
chkconfig iprupdate off && service iprupdate stop
chkconfig iscsi off && service iscsi stop
chkconfig iscsid off && service iscsid stop
Nothing to worry about there. Let’s check the cron jobs.
ls /etc/cron*
/etc/cron.deny /etc/crontab
/etc/cron.d:
0hourly raid-check sysstat vnstat
/etc/cron.daily:
cups hplip logrotate man-db.cron mlocate.cron prelink tmpwatch
/etc/cron.hourly:
0anacron mcelog.cron
/etc/cron.monthly:
/etc/cron.weekly:
cat /etc/crontab
(no jobs here)
Well, I don't want hplip or cups, but cups is a part of lsb, so no point
in getting rid of it (a yum remove cups would uninstall half your os).
yum remove hplip
The file for cups is just a tmpwatch so I'll leave it alone.
Misc
Utils
yum install htop iotop sysstat vnstat keepassx
htop is a fancier top
iotop is for tracking down IO issues
sysstat is for sar/iostat/mpstat/etc
vnstat is for network card monitoring
keepassx is a password manager
random
I setup my prompt_command so that it flushes my history after every single
command. I also like to setup a much larger history file. I have a fairly fancy
prompt_command setup but I’ll talk about that another time.
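The gist of it, as a stripped-down sketch for ~/.bashrc (the history sizes here are just an example; my real prompt_command does more than this):
export HISTSIZE=100000
export HISTFILESIZE=200000
PROMPT_COMMAND='history -a'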
I will be putting all my dot files on github, I just haven’t got around to it
yet. Need to clean a few things up first. :)
Fonts
I am a big fan of dejavu-sans-mono for terminals / programming, which fedora
now installs by default. I just need to change the system monospaced font to
it. Droid sans mono
is really nice as well.
Closing
Having used Fedora 18 for about a week now, I’m pretty happy with it. I
will be upgrading my laptop and work desktop in the next week or two, so I can
get MATE on everything.
There are a lot of complaints about fedora 18; people are bitching about the
gnome3 changes. I am not sure what they are talking about; I used gnome 3 just long
enough to install mate. :)
Gnome 3 is now the internet explorer of desktop environments.
The new anaconda is immature, there is no question. But it’s all python based,
so I think it will evolve quickly to support what the old anaconda could do.
As far as Fedora being the worst redhat
distro, I have to
disagree. RedHat linux 6 in 1998 or so (not to be confused with redhat
enterprise linux) had a broken dhcp implementation, a grub bug where if you
pushed a key to boot (the booting linux in 3 seconds bit) it would only use
the first 4 partitions on your disk and ignore any other ones, and so many
other horrible bugs. Redhat linux 5 was so much better. It seemed like a giant
step backwards at the time (or was it from 6 to 7? hrm. maybe. It was 12-13 years
ago, it’s a bit fuzzy, and so not worth googling).
I convinced my $dayjob to spring for a trip to
LISA12 in San Diego. Some places it
is easier than others to convince management it’s a good idea, it wasn’t too
bad for me.
For those who have never been to a LISA before, here is how it works:
Training Program
The event starts Sunday morning with training programs. Some are half day
programs, and some are full day. You have to sign up in advance for what
training program you want to attend*, and the training programs run until
Friday. One day worth of training session(s) costs $710.00 USD (this is the
usenix member price, but since joining usenix costs 50 dollars and the discount
is worth $170.00 USD, pretty much everyone joins usenix). There are various
deals for signing up for more training as well. The available training programs
are laid out
here.
Technical Sessions
The
technical sessions
start on Wednesday, and run through until Friday. The technical sessions are
shorter (1.5 hours or so) but are generally more advanced topics than what is
in the training programs. One day of technical sessions costs $405.00 USD (same
170 dollar discount applied already) or you can get 3 days of technical
sessions for $965.00.
Golden Passport*
The golden passport allows you to go to any technical training, or technical
session. As well, in the technical session conference rooms there are areas
that are reserved seating (right at the front) for people with a golden
passport. This is pretty cool, because if you decide that you don’t like the
training that you are attending, you can leave and go to another one. The
golden passport also comes with a cooler ID badge than everyone else, which
declares to everyone that you are a golden passport holder. It costs $4,075.00
USD, and while expensive would be a great way to enjoy the conference.
Birds of a Feather
The training programs / technical sessions end at 5:00 PM, and there is a 2
hour break for dinner; after which the birds of a feather (aka bof (pronounced
BOFF)) start. There are 2 kinds of bofs: a vendor bof or an individually
organized bof.
A vendor bof usually has booze at the back, to entice you to attend; while the
vendor either tries to sell you on their product, or sell you on working for
them (they try to recruit you).
The user bofs are generally more
random topics
and a lot were organized at the last minute and written on the sign at the conference:
Vendor Booths
Lastly there was the Vendor area, which is what it sounds like. There were many
(59) vendors that had lots of swag to hand out. This is the first conference
I have been to where, in order to get the better swag (tshirts,
flashlights, etc), you needed to allow the vendor to scan your id badge (which
had a 2D bar code on the back with all of your contact info). Hopefully I don’t
start getting spammed like crazy, but I won’t hold my breath.
Some of my vendor highlights:
Google’s Quest for the pins:
Google gave out 1 pin to everybody, and via their questforthepins.com site you
had to answer some simple sysadmin questions to progress to the next pin. There
were 5 pins in total, and the questions got harder as you went, but most people
attending got all 5 pins (myself included).
Rackspace’s breakfix challenge:
Rackspace had 2 laptops setup with virtual machines running CentOS 6.3
that were broken, and you had to fix the virtual machine. They timed
how fast you could solve the problem. Us geeks love fixing things so
this was enjoyed by many. I did the challenge in ~9 minutes or so, I
could have gone faster but was typo’ing a lot on the tiny laptop
keyboards (and the sun was in my eyes :) ).
My Experience
This year’s LISA was held at the
San Diego Sheraton which was a pretty
nice hotel. Here is the view from the balcony of my room:
While attending LISA (or any conference really) I think it is a must to stay at
the conference hotel. I arrived Monday evening (So I could be ready for
Tuesdays training programs), pretty much right after checking in I went to the
bar and asked some guys wearing geek tshirts if I could join them, and was made
welcome. Almost all of the tables in the bar and lobby were occupied by people
using their phones/tablets/laptops and talking shop.
Many of the conversations I had in the lobbies and hallways were worth the
price of admission by themselves. If you were thinking about deploying $product
you could easily find some people who had already done so and were moving on
to something better, or who would tell you horror stories of just how bad (or
fantastic) $product was, or tell you what $product_competitor was like. The
best part being that these were colleagues, not salesmen, and their input
was insanely valuable.
I especially enjoyed the level of expertise at the conference. There was an
extremely good chance someone more expert than you was nearby and happy to talk
shop.
What I did
I wanted to checkout what the training was like, but was more interested in the
technical sessions. So, I did 1 day of training, and 3 days of technical
sessions.
The nice thing about doing 1 day’s worth of training is that you get a USB key
loaded with the training materials for all of the training sessions. So
according to me you would be remiss if you did not do at least one day of
training. Having the training materials is not as nice as being in
the classroom and being able to ask questions and whatnot, but it is pretty
cool to be able to check them out. So far I’ve only gone over a few of them; but
this is due to the full schedule each day provides (if you choose to go to
everything).
The training/sessions run from 9:00 AM until 5:00 PM (with breaks for food),
then the bofs start at 7:00 PM and run until 11:00 PM. After the bofs I would
hang around in the lobby or bar until 1:00 AM talking with various groups of
sysadmins.
One of the things that surprised me, was just how far some people traveled to
be at LISA. I met people from Belgium, Germany, Ireland, Norway, Australia,
Brazil, etc. Next year’s conference is in Washington DC, which is quite a bit
further for me to travel. However, when I think about all the international
travellers I met, I don’t think I should complain.
Another cool thing, was after the first day, I started to see a few familiar
faces at a few of the technical sessions that I was going to, so we started to
hang out in between sessions. We then made plans to go out for lunches and
dinners to talk shop. Since then we have all exchanged info, and I look
forward to talking to them on IRC (I have put quite a few names to faces
on the #lopsa channel on freenode).
My Highlight Reel
My favorite training session: NOTE: I only took 2 half day sessions
Ganeti: Your private virtualization cloud - Tom Limoncelli & Guido Trotter
My favorite bof:
I’ve made a huge mistake (organizer unknown)
This deserves a bit of a write up. My favorite bof was a user bof entitled
‘I’ve made a huge mistake’. It was very last minute. It was kind of
like an alcoholics anonymous for sysadmins. People shared their screw ups,
and what they did to fix them. It was pretty awesome (I don’t think I can
repeat any of those stories, including and especially my own :) ).
My favorite technical session:
15 years of DevOps
I was really floored seeing the 15 year old slides talking about the
same problems and issues that we are having now. Here is a pic of the
devops talk:
My favorite quote from the conference:
(from the disruptive technology panel)
“Software is going to stick with us like Herpes.” - Theo Schlossnagle, OmniTI
The reception (Shaken, Not Stirred)
After Thursday’s events were finished at 5:00, most people dropped off their
gear and grabbed a coat (it was raining and slightly chilly), and several buses
took us to the ‘Grape Street Pier’. We went aboard a large boat:
and had a nice meal with an open bar, and got to play blackjack, roulette, and
craps with some fake money that was handed out. It was pretty funny, because I
would say 50+ percent of attendees knew how to count cards. The staff handed
out extra fake money to whoever lost quickly so it didn’t really matter. As a
bonus we got to keep some of the custom LISA’12 casino chips at the end of the
night.
Final Thoughts
I quite enjoyed the experience, and look forward to attending another LISA. I
will hopefully be going to the Washington DC one, but if not I will for sure be
at the 2014 one in Seattle.
DSET is “Dell System E-Support Tool”; it is available for windows, 32-bit linux, and 64-bit linux. Essentially dell uses this tool to gather information about your server to help them troubleshoot. I’m only writing about the 64-bit linux version.
why do you care?
If you call dell tech support, sooner or later (sooner, I’m betting) they are going to ask for a dset. I wanted to dig into what they are gathering and share it with my fellow sysadmins.
The story
I had a raid controller battery go bad after 2 years, still under warranty, but dell wanted a dset. Never mind I had a dmidecode for the firmware versions and some megacli64 output to show the battery reports, they wanted a dset.
Here’s how my conversation went:
dell: There is a firmware bug that falsely reports a bad battery when your battery is still good, we are going to need a dset.
me: does dset need root?
dell: yes.
me: what does it need root for, as in, what exactly is it running?
dell: I don’t know.
me: okay I’ll download it, check it out, and call you back.
I’m in the medical industry, so installing new/unvetted software on production servers is usually a no-no, and it makes me nervous. So I wasn’t about to install it without some testing and analysis (on a blank virtual machine).
In fairness to dell, I could have said “I’m a gold customer with a 4 hour turnaround, ship me the new battery now please” and they would have. But I like to be co-operative with my fellow techies if I can.
The analysis
I’m not one to trust binaries that need to be run as root very much, so let’s take a look at what we are getting ourselves into.
At the time of writing, the latest version is 3.2.0.141_x64_A01.bin
Due to running this on a virtual machine it didn’t want to install from running the script, so I extracted it manually via:
tail -n+20 dell-dset-3.2.0.141_x64_A01.bin | tar -xvz
The tarball doesn’t make its own directory; it litters a bit in your current directory (shame shame).
install.sh
The install.sh does a bunch of pre-checks to see if you have supported hardware before collecting its data. It’s fairly simple to disable the check and run it anyway, or add your system to the ‘support_hw_list’, or just copy some exports from the install script and run the collector by hand (more on this later).
Once it decides it will install, it will install one/some/all of the following depending on what options you choose:
Alright, I already know I’m not going to install this in the production environment, but let’s throw caution to the wind on this vm.
I’m running centos, so depending on what options I select, the dset installer may install sblim, with nodeps of course. This could be problematic if you are also using the epel sblim packages.
At least they are no longer using rpm --force --nodeps like they were in previous versions of dset (which they still get you to use if you are using rhel 5.x).
The install.sh parses out what you want to do and runs the ‘collector’ program with the various options (more on this later).
Manual Install
Going to play with the dell-dset*.rpm’s first.
For fun I looked at an rpm -ivh dell-dset*.rpm to see what dependencies I was about to ignore. What’s weird is they are packaging all of their dependencies, so I’m not sure why they don’t just update the spec to do a provides: bla and fix it. Maybe they are trying not to mess with the rpm database, but if that was the case, why are we using rpms at all? Running rpm --nodeps is almost the same as doing a tarball. How much stuff are they doing in their rpm’s %pre and %post that they can’t do in their install script? I digress.
Let’s get this installed.
rpm --nodeps -ivh dell-dset*.rpm
This will install to /opt/dell/advdiags/dset
The Collector
cd /opt/dell/advdiags/dset/bin
./collector --help
File "/usr/lib/python3.1/site-packages/cx_Freeze/initscripts/Console3.py", line 27, in <module>
So the collector is a frozen python program; not something we can easily look at without getting a python decompiler or strace involved.
Our paths won’t be correct, but dell gave us a collector.sh which will set up the correct paths for you (and restore your old path when you are done); that’s nice.
./collector.sh --help
There we go, a nice help file so we can figure out what we want to do. Being that I’m on a vm, the hardware option isn’t going to do much for me, so I’ll start with the software.
./collector.sh -d sw
It asks right away for my root password (I’m already running it as root). If I reached this stage in production I would abort, but I’m on a vm, so I’ll break out and change root to ‘1234’ and run it again. But not to worry, the dset documentation states:
NOTE: Root credentials are necessary for the DSET
Provider to collect inventory or
configuration information about the system.
DSET does not store this password. The root
password must be specified each time a report
is collected.
We’ll find out if this is true or not shortly.
./collector.sh -d sw -p 1234
huzzah! I have a report.
./collector.sh -d sw -p 1234 -v yes
Now I have a report with privacy enabled to compare.
The Report
The report gets thrown into a passworded zip file. The password is completely meaningless: if you unzip it with no password, it unzips a text file which tells you the password is ‘dell’. So I unzipped again, this time with the super secret password. The password is the same in privacy mode or non privacy mode.
The non privacy report
The non privacy report gathers its data from many cfg files and logs. It also parses the data into xml/xsl pages, which I assume dell has a nice tool to go through quickly to see what’s what.
From looking at the logs directory the collector is gathering the following:
cat boot/grub/menu.lst
cat boot/grub/device.map
ls /boot
uptime
cat /proc/meminfo
lsmod
cat /etc/modprobe.conf
cat lib/modules/current/modules.dep
ifconfig
cat /etc/resolv.conf
cat /etc/hosts
cat /etc/sysconfig/network-scripts/ifcfg-*
df
cat /proc/scsi/scsi
fdisk -l
free
hostname
list of all installed rpms / versions / etc
iptables dump
ldconfig
lspci
mount
osversion
prints environment settings
ps
pstree
route (old school route not ip route ls, shame shame)
selinux policies
runs thru the init.d and does a service status on each
sestatus
uname
So much for not saving my root password. It also shows up in:
rawxml/getprocesslist.xml
xml/processlist.xml
Before anyone gets mad at dell: I’m running this in a non standard way, if I used the install.sh the script would not use the -p option, and would instead prompt me for a password (which would not show up in the processlist). Still not sure why it wants me to type the root password when I am already running it as root, but ok.
This is WAY too much information to send to dell for any reason. And how are you sending it to dell? I sure wouldn’t email it.
The privacy report
In the privacy report you get a lot less data. The logs directory is now blank, so everything is in the gui directory only, which means we get to go through some annoying xml/xsl (note: I find all xml/xsl annoying (don’t ask)). A lot of the same data is gathered, but now just dumped into xml.
We are gathering:
/boot/grub/grub.conf
ls /boot
ls /boot/grub
cat boot/grub/device.map
chassis info
etc/X11/XF86Config
lsmod
cat /etc/modprobe.conf
cat lib/modules/current/modules.dep
list all rpms / publisher / size / install date, urlinfo, description
hardware io ranges
hardware irq info
cat /proc/meminfo
cat /etc/fstab
ifconfig (with ip info and mac addresses "Omitted by user")
cat resolv.conf (everything is omitted by user but domain is still listed. WAT?)
runs thru the init.d and does a service status on each
storage info (df info)
os version and kernel
connected usb devices
I like how the kernel version is omitted in the uname but is in the syssumlist. Heh.
Other options
The collector script has many different options, I don’t have any non-production dell gear right now, so I’m not willing to run the hardware report on a server that has hardware to run it on.
I ran the -lg and -ad options in the collector as well, but there was no difference to the sw logs. I imagine this would be different if I was running on an actual dell machine with actual dell hardware instead of the virtual machine that I’m running this on. :)
Conclusion
I won’t be running the dset tool on any production gear because:
The package installation could cause issues with your system (not staying in /opt/dell, using --nodeps, conflicting sblim package with epel’s)
It wants your root password to be entered at a prompt of a python program, even if you are currently root
the non privacy report gathers way too much info about your system; under no circumstances should this be sent to anyone, ever.
the password on the report zip is incredibly insecure
add the domain in the resolv.conf to private information, or just don’t parse the resolv.conf at all
So what happened?
I told the dell tech that I couldn’t run the dset tool due to its bad behaviors, but I had a dmidecode and some megacli logs to send.
After dell reviewed the logs I emailed, they shipped me a new battery.
tldr version
don’t run dset. If you absolutely must use dset, use the privacy option.