geek!daily

... it is by will alone i set my mind in motion ...

LiveBlog: Fixing Twitter

John Adams, Twitter Ops

Ops
  • Small team
  • SW perf
  • availability is their primary focus
All on managed services with NTT
  • No clouds—too high latency
  • NTT runs the NOC
  • Frees them to deal with real thinking compsci probs

752% growth in 2008, trend happens ~11/2008 and keeps climbing

Growth Pain Fear of what’s gonna happen

Mantra:
  • Find the weakest point (metrics + logging + analysis)
  • Take corrective action (process)
  • Repeat
Find weak points
  • Collect metrics and graphs (individual metrics are irrelevant)
  • Logs
  • SCIENCE!
  • Instrument everything! More info is better
Monitoring
  • Keep critical metrics as close to realtime as possible
  • Using RRD, Ganglia + gMetrics, MRTG
  • Mostly on 10s interval, some 5s, some 60s
  • Everyone in company has access to dashboard
  • “Criticals” view
  • Use google analytics for failwhale and other err pages
Analyze
  • Turn data into info
  • Are things better/worse post-deploy
  • Create env of capacity planning, not firefighting—no more cowboys in the wild west
Deploys
  • Ganglia shows final deploy info for twitter, summize, and search
Whale-watcher
  • simple script with massive win
  • 503 is a whale, 500 is a robot
  • Whales per second exceeds whale threshold then “There’s whales!”
  • Darkmode: selectively disable portions of site with automatic notification to product and eng teams to let them know
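The whale-watcher idea is simple enough to sketch. What follows is a rough Python illustration, not Twitter's script: the function names, the (timestamp, status) input format, and the threshold value are all assumptions made up for the example.

```python
from collections import Counter

WHALE_THRESHOLD = 5  # hypothetical: most 503s tolerated in any one second


def whales_per_second(events):
    """Count HTTP 503 responses ("whales") per whole second.

    `events` is an iterable of (unix_timestamp, status_code) pairs, e.g.
    parsed from a web server access log. 500s ("robots") are ignored here.
    """
    counts = Counter()
    for timestamp, status in events:
        if status == 503:
            counts[int(timestamp)] += 1
    return counts


def seconds_with_whales(events, threshold=WHALE_THRESHOLD):
    """Return the seconds in which the whale count exceeded the threshold."""
    counts = whales_per_second(events)
    return sorted(sec for sec, n in counts.items() if n > threshold)


if __name__ == "__main__":
    # Eight 503s inside one second, plus a single 500 a bit later.
    sample = [(100.0 + i * 0.1, 503) for i in range(8)] + [(101.5, 500)]
    for second in seconds_with_whales(sample):
        print(f"There's whales! second={second}")
```

A real version would tail the live logs and page someone instead of printing, but the core is just a counter and a comparison.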
Config Mgmt
  • You need an automated cfg mgmt system NOW. Else you won’t scale
  • It intros complexity, with multiple admins, unknown interactions
  • Peer review solves most of this; they use reviewboard with svn precommit hook requiring “reviewed by” note in comment and postcommit hook sends note about what changed to people
High communication
  • They use chat (campfire) with docs, graphs, logs, etc.
  • skitch into campfire is a frequent working methodology
Subsystems
  • Many limiting factors in request pipeline
  • Oversubscribe mongrel 2:1 vs. cores
  • Attack plan per system (e.g. bandwidth? bottleneck: network, vector: http latency, solution: servers+; timeline? db, update delay, better algo; search? db, delays, dbs+ and code; etc.)
CPUs:
  • switched to Xeon +30% gain
  • replace 2x and 4x core with 8x core +40%
Rails:
  • Stop blaming rails
  • Analysis: caching/cache invalidation, AR makes bad queries, queue latency, memcache/page corruption, rep lag
  • Not so much about Rails
Disk is the new Tape
  • Social networks are very O(n^y) oriented
  • Disk is too slow
  • Need lots of RAM

Lots of caching is possible. Moving libmemcached to native C gem was bigtime helpful.

Nick’s CacheMoney AR plugin: readthru/writethru caching with memcached!

Caching everything not smart, either
  • Cache evictions
  • Cold cache after host failure/new host spinup
  • Cache smarter: get rid of cache busting behaviors, varnish with failover, etc.
RDBMS vs message queues
  • Not everything needs ACID
  • message queues help
  • Most MQs suck at high load
  • They wrote Kestrel for this; looks like memcache
  • Starling was earlier version
Asynch == Good
  • They lean on mongrel heavily (they know it well)
  • Keep external service requests out of the pipeline via daemons which process message queues
  • Size worker daemons appropriately, have them kill themselves off rather than long-run
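The "kill themselves off" advice for worker daemons can be sketched like this. It is a minimal Python illustration using the standard library queue as a stand-in for a real message queue; it is not the Kestrel or Starling API, and `run_worker`, `handle`, and `max_jobs` are names invented for the example.

```python
import queue


def run_worker(jobs, handle, max_jobs=100):
    """Process up to `max_jobs` items from `jobs`, then return.

    Returning (instead of looping forever) lets a supervisor start a fresh
    process, which bounds whatever a long-lived daemon would accumulate.
    """
    processed = 0
    while processed < max_jobs:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break  # nothing left in this sketch; a real daemon would block
        handle(job)
        processed += 1
    return processed


if __name__ == "__main__":
    q = queue.Queue()
    for n in range(5):
        q.put(f"notify-follower-{n}")  # stand-in for an external service call
    done = run_worker(q, handle=print, max_jobs=3)
    print(f"worker exiting after {done} jobs; {q.qsize()} still queued")
```

The web request only enqueues the job and returns; the slow external call happens in the worker, outside the request pipeline.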
DB replication
  • Multiple functional read/write masters
  • never read from the master—slows it down too much
  • watch your slow queries
  • use mkill to kill long-running queries before they kill you.

Put up a status blog on some other service—transparency stops armchair engineering

2009.06.23 | Permalink | Comments (0) | TrackBack (0)

Technorati Tags: velocity, velocityconf, velocityconf2009

LiveBlog: Next Web Challenges: It's Still All About UX

Velocity 2009, the conference about performance, is very high-performance about getting people on and off stage (and high density around content)

Umang Gupta, Vik Chaudhary (Keynote)

(omitted: Keynote history, 15 years of continuous improvement, etc.)

Debuting Transaction Perspective 9.0 (TXP9)

  • Embeds real IE browser for monitoring
  • Adds “screen sensing” technology
  • Esp. useful for “next web” apps: flash, video, voice, SMS/mobile (composite transactions or flows)

(demo: reservations site for The Broadmoor in Colorado Springs, very flash-integrated with lots of client-side action. “Challenge of screen sensing what’s going on on the screen is non-trivial”. Also http://espn.go.com/video/ and Mini-Cooper flash site)

Using KITE platform/desktop environment to record what you’re doing. You click around, type, etc. and it records a script.

(This is somewhat like what they do at [DeviceAnywhere http://deviceanywhere.com/] for mobile device testing. They don’t focus on UX or perf; they’re more on QA testing side)

Script runs and collects UX and Network times. UX time is net time + client-side execution + rendering. Also shows augmented waterfall inclusive of client-side computation, etc.

2009.06.23

LiveBlog: The User and Business Impact of Server Delays, Additional Bytes, and HTTP Chunking in Web Search

Eric Schurman (Microsoft/bing), Jake Brutlag (Google)

Experiments

  • Server delays (MS and Google)
  • Page weight variance
  • Progressive rendering

They have platforms for experimentation which allow fractional experiments

  • Divide users into small buckets
  • use good methodology (control group, experimental group(s))
  • Way better than usability tests
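Neither speaker showed how their platforms assign users, so the following is only a guess at the general shape: a Python sketch of deterministic hash-based bucketing. The function names, the bucket count, and the group layout are all assumptions for illustration.

```python
import hashlib


def assign_bucket(user_id, experiment, num_buckets=1000):
    """Deterministically map a user to one of `num_buckets` small buckets."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % num_buckets


def assign_group(user_id, experiment, groups):
    """Return the group name for a user, or None if not in the experiment.

    `groups` maps group name -> set of bucket numbers.
    """
    bucket = assign_bucket(user_id, experiment)
    for name, buckets in groups.items():
        if bucket in buckets:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical layout: 1% control, 1% delayed, everyone else untouched.
    groups = {"control": set(range(0, 10)), "delay_200ms": set(range(10, 20))}
    for user in ("alice", "bob", "carol"):
        print(user, assign_group(user, "server-delay", groups))
```

Hashing on experiment name plus user id keeps a user in the same group for the life of the experiment, and gives each experiment its own split.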

Server Delays

  • Goal [missed all of this due to an IM. Lesson learned]

Results

  • No statistically significant change @ ~50ms delay
  • Observable and fairly linear impact on delays 200/500/1000/2000ms.
  • Time to first click took ~2x delay—theory: user has opportunity to get distracted

Google Search Delay Experiment

  • Varied type of delay, magnitude, and duration (number of weeks) per user group
  • Pre-header delay: pause server processing upon receipt of req
  • Post-header delay: pause after sending on header, but before sending results
  • Post-ads delay: (ads are structurally first in page, can render before search result) put ads in separate http chunk, delay between ads and search results

Results:

  • Measure average daily searches per user
  • 50ms pre-header delays show no significant impact
  • 100ms pre-header, 200ms post-header, 400ms post-header, 200ms post-ads (and others) showed linear progression in decreased avg daily searches
  • Also saw increase in internally monitored “abandonment rate”
  • Active users are more sensitive
  • drop-off continued to trend down linearly beyond 4 weeks; effect becomes more pronounced over time, and additive—200ms and 400ms groups diverge more strongly
  • Stopped injecting delays at week 7; recovery was significant immediately, but not fully realized at week 12—there was still a drop in activity for these groups

Page weight experiments

  • injected incompressible comments into various places of page
  • varied size of comments from 5% of page to 500% (most of larger loads were below the fold)
  • small payloads weren’t worrisome (tho stat’ly significant)
  • perf suffered slightly, but was US only experiment; global exp planned, will likely show significantly larger drop in perf
  • Click metrics were hurt more than query metrics

Progressive rendering experiment

  • Goal: determine impact of sending visual header before results
  • Build page in phases, send using HTTP 1.1 chunked transfer encoding
  • Results: Large improvement due to parallelization. Time to first click was ~9% faster, more likely to refine query, more clicks, more likely to page thru results
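For anyone who hasn't looked at chunked transfer encoding on the wire, here is a small Python sketch of the framing. It is not from the talk; the function names and the page fragments are made up, and a real server would also send a `Transfer-Encoding: chunked` response header.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: hex length, CRLF, payload, CRLF."""
    return f"{len(data):x}".encode("ascii") + b"\r\n" + data + b"\r\n"


def chunked_body(parts):
    """Yield an HTTP/1.1 chunked body, one chunk per non-empty part."""
    for part in parts:
        if part:  # a zero-length chunk would end the body early
            yield encode_chunk(part)
    yield b"0\r\n\r\n"  # last-chunk marker, no trailers


if __name__ == "__main__":
    # Phase 1: the visual header, sent while the results are still computing.
    # Phase 2: the results themselves.
    phases = [b"<header>", b"<results>"]
    print(b"".join(chunked_body(phases)))
```

Because each phase is flushed as its own chunk, the browser can start rendering the header before the server has finished building the results.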

HCI may state that 100-200ms isn’t perceptible; it still has effect.

Getting something to your user quickly is more important than when they receive their last byte

Experimentation platforms make all this research and hard numbers possible.

2009.06.23

LiveBlog: After the Click

More from Velocity 2009. Going really fast, sorry for all the sloppiness.

Jonathan Heiliger, VP Tech Ops FB

FB Mission: give people the power to share and make the world more open and connected.

  • 2004: launch in MZ’s dorm room
  • 2004-5: new apps launched (events, photos, mobile)
  • 2006: news feed and open reg
  • 2007: platform launch
  • 2008: crowdsourced translations; reached 30 langs quickly (spanish 2 wks, french ~24 hrs)

[nice map viz for growth: colorize market penetration]

Time to reach 150M:
  • Radio: ~150 yrs (?)
  • TV: 13 yrs
  • Computers: 4 yrs
  • FB: 3 yrs

How FB deals

Classic battle of Ops v. Eng
  • Ops wants no change—stability
  • Eng wants lots of change—driven by users and site
  • Do you really want to fight it out? Teamwork is required
  • Enable individuals to reach goals, chase team success
  • Make it transparent to users and safe for employees to fail
  • Make it a point of pride: you don’t want to be the one who took down the site (but there’s some cachet in that war story)
It’s the people
  • Everyone hires the smartest people
  • It’s about organizing and leading

Tuning the Operating Pipeline (Eng -> QA -> Ops aka Dev -> Test -> Deploy) (this isn’t how they did it)

Engineering is responsible for the efficacy and reliability of their code, writing their own tests, and full lifecycle of code including pushing it live.

Ops provides guard rails to keep eng safe from itself, prevent site downtime. Feature can go down, but rest of site is safe.

Complaints back in the day:
  • Ops: Eng is way too unstructured, lobbing crap over the wall
  • Eng: Ops is not nimble

Make the problem joint; Eng owns the problem

Continuous build, code review, peer review, perf testing has kept things moving fast while moving to 200+ eng org.

Put engineers in operations
  • Site reliability team: stewards of the site
  • Operations engineering: tooling and glue apps (workflow/pipeline)
Put ops in engineering (consulting engineers)
  • Partners with backend service groups to think about architecture, scaling, reliability
  • Helps mentor into full SDLC responsibility—really understand complete DEV to PROD function of code

Software launch has warroom with PM, Eng, NetEng, SRE, Perf Eng, Site Integrity staff around. Always the right person on hand, physically present.

Getting it done
  • If you can’t work as a team, you’re done
  • Design is awesome, but it needs execution to succeed
Three things they did live expecting to break the site
  • See how the team worked, who would step up, etc.
CNN livefeed
  • Group of 20 some folks came together, marketing, eng, product, ops, etc.
  • Added much capacity, made warroom
  • Written from scratch in ~3 weeks
  • Replicated (and improved) for Oscars, etc.
  • Knew there would be point load much like DoS attack
  • Added throttles to direct features, as well as throttling things like chat, number of thumbnails shown on site, etc.
  • Friends had to be shown on the fly
  • Common content was cached in CDN; didn’t anticipate delay/latency from CDN
  • Didn’t expect users to madly twiddle “Everyone” and “Friends” tabs (they did); learned “cache everything”
  • During inauguration 2M status updates, 8.5K spike at start
  • Dark launched everything with users exercising the stack without any visible UI to users
  • Also built perf framework to see what real user experience would be like
  • Used data from both to appropriately size
Like
  • Simple “I like this” on wall/status
  • Didn’t expect it to get a lot of traffic at first; totally wrong
    • 4.1M users liked 7.1M times first day
    • 16.3M/46.2M 1st week
    • 39.6M/226.8M 1st month
Username allocations
  • Was initially to be auction (codename: hammer)
  • Decided to go first come first served, kept codename—it was going to hammer the site
  • Had to have blocked list of trademarks; didn’t block “asp.net”
  • Dark launch, found issues, delayed initial launch
  • Launched at 9p; huge cache hit within moments, no increase in idle latency (means they got it right, maybe a little overprovisioned)
  • Made pages as light as possible
  • Tiny blip in overall load
Datacenter infra/organization is hugely important
  • Untidiness reflects bad organization
  • DC/infra is 2nd biggest exp after people
  • Invest where appropriate

Distribute accountability

Test with users

“The only place success comes before work is in the dictionary” (Vince Lombardi)

Expects org to look different in a year—evolution is the key.

2009.06.23

LiveBlog: Surviving the 2008 Elections at DailyKos.com

More from Velocity 2009:

Jeremy Bingham, DailyKos.com

Before the flood

  • fell over a lot
  • took a lot of admin time
  • slow load times

MySQL issues

  • legacy tables from nearly-decade-old Slash with bad primary keys
  • many MySQL 3.23isms—didn’t use any newer features
  • had to stop believing “things are there for a reason” (e.g. fulltext index: 9GB data, 17GB index)
  • keep all the old URLs working

IA Caucus first big night

Caching

  • Started with apache 1.3 as proxy, disk caching with push to disk
  • Brought site to its knees as everything updated cache in sync
  • switched to lighttpd using Lua, mod_magnet
  • switched to mod_mcpage, cache pages only

Hardware

  • 10 2x Xeons 2GB, 2 2x Opterons 8GB, image/memcached/combined search/SMTP server
  • All independent, update separately
  • updated to 6 4x Xeons 8GB, 2 8x Xeons 16GB, RAID 10 with well-tuned xfs for DB

Traffic more than doubled at election peak over normal monthly, almost 3x

People liked to talk about Sarah Palin … a lot. How nice that she provided things to talk about.

Changes were in place by April 2008/Pennsylvania primary

  • perf was good
  • flash electoral map was on 100Mb switch
  • webhead loads ~0.5-1.8
  • Ads and map added some slowdowns

2009.06.23

LiveBlog: Hadoop Operations

I'm at Velocity 2009, sitting in on the "Hadoop Operations" talk.

Jeff Hammerbacher, Chief Scientist, Cloudera (email is first six of his last name at his company dot com). He has an ambitious agenda for this session and talks very fast, so sketchy notes and abbrevs for me. Pardon the crappy formatting.

slides are here.

Built data team at FB. ~30 ppl when he left. Built Hive and Cassandra.

Good resources:

  • “Hadoop: The Definitive Guide” by Tom White (must have)
  • “Hadoop Cluster Management” slides from Marco Nicosia’s 2009 USENIX talk

Hadoop: OSS for WSCs (warehouse-scale computers)

Typical cluster: 1U 2×4 core, 8GB RAM, 4×1TB SATA, 2×1 gE NIC; one switch per rack with 8 Gb intfc to backbone. Think 40-node-rack as unit.

HDFS: breaks files to 128MB, replicates blocks across nodes. W1RM design. checksumming, replication, compression included (tell you three times). Hooks in via Java, C, command line tools, FUSE, WebDAV, Thrift. Not usually mounted directly.

[how does it handle many small files? see HAR files below, see Common problems below, no statements about performance]

HDFS tries to write block replicas diversely (across racks) using topology info.

MapReduce uses HDFS api to assign work to where the data is.

Avro: cross-language serialization for on-wire/RPC and persistence, includes versioning and security

HBase: Google’s BigTable lookalike on top of HDFS

Hive: SQL-like interface to structured data stored in HDFS. Replace DWH.

Pig: lang for dataflow programming.

Zookeeper: manage a distributed system

Good ways to dip your toes with Hadoop:

Projects:

  • Log or msg warehouse
  • DB archival store
  • ETL for DWH
  • Search team projects (autocomplete, did you mean, indexing)
  • Targeted web crawls (market research, etc)

Clusters:

  • use retired DB servers
  • use unused desktops
  • use EC2

[skipped a lot about how the project runs, apache voting, etc.]

Don’t run Hadoop across two data centers; one per and communicate at the app layer. [this sounds a lot like the rules for MPI et al ca. 1999-2000]

Make sure to use ECC RAM. High volume mem churn requires it.

Linux/CentOS “mildly preferred”

Mount local FS “noatime” for performance.

Recommend ext3 over xfs. Local FS performance improvements (e.g. xfs) don’t necessarily translate to global perf improvements (network bottlenecks consume it). Mentioned an xfs long-write problem.

JBOD over RAID0; slightly better performance and losing a disk doesn’t suck as much.

Java 6 update 14 or later (update 14 makes 64-bit pointers as cheap as 32-bit).

Installation: http://www.cloudera.com/hadoop

“In our distribution we put [things] where they ought to be.” Register with init.d, etc.

Configuration: http://my.cloudera.com/

You spec topology and whether JT/NN live on same machine, it spits out the rest. Hangs on to it for you, too.

Config modes

Standalone mode:

  • Everything in one JVM
  • Only one reducer, so you might not be able to find the bug

Pseudo-dist mode:

  • All daemons on one box using socket IPC

Dist mode:

  • For production

Config files

  • xml based
  • org.apache.hadoop.conf has Configuration class
  • Later resources overwrite earlier; “final” keyword prevents overwrite
  • core-site.xml, hdfs-site.xml, mapred-site.xml
  • Look in .template for examples
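The overwrite and “final” rules are easier to see with a toy example. This Python sketch only imitates the merge behavior described above using the standard XML parser; it is not Hadoop's Configuration class, and `load_resources` and the property values are invented for illustration.

```python
import xml.etree.ElementTree as ET

SITE = """<configuration>
  <property>
    <name>dfs.replication</name><value>3</value><final>true</final>
  </property>
  <property><name>io.sort.mb</name><value>100</value></property>
</configuration>"""

JOB = """<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
  <property><name>io.sort.mb</name><value>200</value></property>
</configuration>"""


def load_resources(xml_documents):
    """Merge Hadoop-style <configuration> documents in order.

    Later documents overwrite earlier ones, except for properties that an
    earlier document marked <final>true</final>.
    """
    values, final_names = {}, set()
    for document in xml_documents:
        root = ET.fromstring(document)
        for prop in root.findall("property"):
            name = prop.findtext("name")
            if name in final_names:
                continue  # an earlier resource locked this property
            values[name] = prop.findtext("value")
            if (prop.findtext("final") or "").strip().lower() == "true":
                final_names.add(name)
    return values


if __name__ == "__main__":
    print(load_resources([SITE, JOB]))
```

Here the later resource wins for io.sort.mb, while dfs.replication stays at the value the earlier resource marked final.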

Cloudera admins their soft-layer cluster with Puppet “with varying level of success”. He’s seen Chef, cfengine, bcfg2, and others.

Problems in config:

  • “The problem is almost always DNS”—Todd Lipcon
  • Open the necessary ports (many) in firewall
  • Distributing ssh keys (Cloudera uses expect)
  • directory permissions (writing logs)
  • Use all your disks!
  • Don’t try to use NFS for large clusters
  • JAVA_HOME set right (esp. on Macs)

Nehalems ~2x performance improvement

HDFS NameNode ("the master")

VERSION file specs layoutVersion (negative number, decrements for each new). You hope this doesn't change much; upgrade is painful

NN manages fs image (inode map, in mem) and edit log (journal, to disk).

Secondary NN (on different node) aka checkpoint node (v0.21): replays journal and tells primary to forget some history to prevent the edit log from becoming ridiculously large.

Backup node: write same data to NFS to recover if local node blows up

DataNode: round-robins blocks across all nodes.

  • Heartbeats to the nodes
  • dfs.hosts[.exclude] to allow/deny clients

Client:

  • Use Java libs or command line
  • libhdfs c library lacks features and has memory leaks (and FUSE interface uses it)
  • Client only contacts NN for metadata
  • Client keeps distance-ranked list of block locations for data reads
  • Client maintains write queues: data queue and ack queue (writes three times, can't forget request until all three are ack'd).
  • First datanode in write takes responsibility for pass-down-the-line write requests rather than having client spray data at all 3/n data nodes expected to write.

Can't seek and write, nor append. So you create new each time.

HDFS Operator Utilities

Safe mode

  • Loads image file, applies edit log, creates new (empty) edit log
  • Datanodes send blocklists to NN
  • NN uses this during startup, will only service metadata reads while in safe mode
  • Exits safe mode after 99.9% of blocks have reported in (configurable); only one replica of block must be known (can rereplicate)

FS Check (hadoop fsck)

  • Just talks to NN to look at metadata
  • Looks for minimally rep'd, over/under rep'd blocks
  • Identify missing replicas and rereplicate, blocks with 0 replicas (corrupt files)
  • `hadoop fsck /path/to/file -files -blocks` to determine blocks for file
  • Run ~1 hr in production, store output

dfsadmin

  • admin quotas
  • add/remove datanodes
  • ckpoint fs image
  • monitor/manage fs upgrade

DataBlockScanner

  • cksum local blocks (with bandwidth throttling)
  • Runs ~3 weeks (configurable)

Balancer

  • goes thru cluster, makes disk utilization scores per datanode
  • rebalances if nodes are more than +/- 10% (with throttling)
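Here is a rough Python sketch of the balancer's decision as I understood it. It assumes the +/- 10% is measured against cluster-wide utilization, which the talk did not spell out; the function names and sample numbers are made up, and the actual block moving and throttling are left out.

```python
def utilization(used, capacity):
    """Percent of a datanode's capacity in use."""
    return 100.0 * used / capacity


def nodes_out_of_balance(nodes, threshold=10.0):
    """Split datanodes into over- and under-utilized lists.

    `nodes` maps node name -> (used_bytes, capacity_bytes). A node is out of
    balance when its utilization differs from the cluster-wide utilization
    by more than `threshold` percentage points.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster = utilization(total_used, total_capacity)
    over, under = [], []
    for name, (used, capacity) in sorted(nodes.items()):
        diff = utilization(used, capacity) - cluster
        if diff > threshold:
            over.append(name)
        elif diff < -threshold:
            under.append(name)
    return over, under


if __name__ == "__main__":
    cluster = {"dn1": (90, 100), "dn2": (50, 100), "dn3": (10, 100)}
    print(nodes_out_of_balance(cluster))
```

Blocks would then move from the over-utilized nodes to the under-utilized ones until everything sits inside the band.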

Archive Tool

  • HAR file: like tar file, many entries in one HDFS namespace
  • Makes two index files and many part files (hopefully less than # of files you're har'g)
  • Index files are used for lookup into part files
  • Doesn't support compression and are W1RM.

distcp

  • Move large amounts of data in parallel
  • Implemented as MapReduce with no reducers
  • Can move data between data centers with this; can also saturate the network pipe

Quotas

  • apply to directories, not users or groups
  • namespace quotas constrain your use of the NN resources
  • diskspace quotas constrain your use of the datanodes' resources
  • No defaults (can't make new directories pick them up)

Users, Groups, Permissions

  • Relatively new
  • Very UNIXy
  • Executable bit means nothing on file
  • Need write on dir to add/remove files
  • need exec on dir to access child dirs
  • identity of NN process superuser

Audit logs

  • Not on by default, but useful for security

Topology

  • Used to compute distance measures for replication
  • Node, Rack, Core Switch
  • Some work to infer from IP

Web UIs

  • There are many
  • NN @ port 50070: /metrics /logLevel /stacks
  • 2NN @ port 50090
  • Datanode @ port 50075

HDFS Proxy: http server access for non-HDFS clients

ThriftFS: thrift server for non-HDFS clients

Trash:

  • Helps recover from bad rm’s (inadvertent rm -rf happened on FB cluster)

Common Problems

  • Disk capacity: crank up reserved space, keep close eye on space, watch hadoop logfiles
  • Slow disks which aren’t yet dead: can’t see as fail, but you have to watch
  • NIC goes out of gig-E mode
  • ckpoint and backup data: keep an eye on 2NN node, watch NN edit log size
  • check NFS mount for shared NN data structure
  • Long writes (> 1 hr) can see things get freaky; break them down
  • HDFS layoutVersion upgrades are scary
  • Many small files can consume namespace: keep an eye on consumption

Turn on fairshare schedulers (Cloudera runs it out of the box)

Use distributed cache to send common libs to all nodes

JobControl: good way to express job dependencies

Run canary jobs (sort, dfs write) to test functional status

Upgrades are scary. This will be less true as it reaches 1.0

One admin can easily carry a medium (100-node) cluster. Most activity is around commission/decommission.

Try not to lose N or more nodes at once, where N is your replication factor. With N=3 you could hit the jackpot on those being the only three replicas of some needed block.

2009.06.22

LiveBlog: Intro to Managed Infrastructure with Puppet

I'm at Velocity 2009 and very happy to see Luke Kanies presenting around Puppet. If you're not familiar, Luke's a hard-core sysadmin type with the typical sysadmin bent for highly automated systems management solutions which require as little maintenance as possible. If you're looking at things like cfengine, cft, etc. then you should be looking at Puppet as well.

Meanwhile, it's been a while since I've live-blogged a session, so let's see if my stream of consciousness technique still flows (or has stagnated):

It's a workshop, not a talk; to follow along git clone git://github.com/reductivelabs/velocity_puppet_workshop_2009.git

You'll also want the slideshow gem to see the slides (which are in the git repo): sudo gem install slideshow

You also probably want to install puppet. You can get it from github, through MacPorts, but easiest is sudo gem install puppet

Seek help: #puppet on freenode, puppet-users Google group

The usual problems obtain: keep everything configured correctly all the time.

Puppet provides a resource abstraction layer. Do you remember which command removes gems? Is it the same as the command to remove a package (via MacPorts, rpm, yum, et al)? Why work to remember all of those?

ralsh (resource abstraction layer shell) gives you direct access to the abstraction layer

ralsh package -- list all packages on your machine via any known package installer; responds in puppet code.

(it chokes a bit if you have rpm installed via MacPorts and don't run it as root. sudo ralsh package is a good workaround. That's a bug in the rpm port; it should be able to query as non-root users.)

ralsh user -- provides you the info around that user

ralsh user

ensure=present shell=/bin/tcsh home=/Users/

-- be sure the user exists. Runs idempotently. First run creates; second takes no action. Change args (shell, home) and it will change to agree. Change 'ensure=absent' and it's blown away.

Luke now uses ralsh for interactive administration to avoid having to remember all the various details of which args for what command, etc.

The language is mostly declarative and is very simple. No loops; only recently getting conditionals from "this crazy French guy".

Aliases allow you to think of things by alias/title rather than technical name:

package { ssh:
    name => $operatingsystem ? {
        debian  => "ssh",
        openssh => "openssh",
        default => "sshd"
    },
    ensure => installed
}

... and now you can just talk about the "ssh" package, abstracting you from naming strangenesses. You'll never have a DB of all the strangenesses, so you can at least insulate yourself.

Executables: puppet, ralsh, puppetd, puppetmasterd, puppetca (there's more, but that's what we'll touch today)

puppet executable allows you to fiddle and iterate via -e and --noop

>puppet -e 'file { "/tmp/foo" : ensure => present}'
notice: //File[/tmp/foo]/ensure: created

(note: puppet barfed for me here because ~/.puppet/var didn't exist; mkdir -p ~/.puppet/var fixed that up)

>puppet -e 'file { "/tmp/foo" : ensure => absent}'
notice: //File[/tmp/foo]: Filebucketed to with sum d41d8cd98f00b204e9800998ecf8427e
notice: //File[/tmp/foo]/ensure: removed

I was curious about the filebucketed thing, so I looked:

> ls ~/.puppet/var/clientbucket/d/4/1/d/8/c/d/9/d41d8cd98f00b204e9800998ecf8427e/
contents paths

Turns out contents is the file itself, paths is the path from which it was removed.
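The bucket path looks like it is derived from the checksum itself: one directory per character for the first eight hex digits, then the full sum. A small Python sketch of that guess, based only on the listing above (`bucket_dir` is a made-up name, and this is not Puppet's own code):

```python
import hashlib
import os


def bucket_dir(contents: bytes, bucket_root: str) -> str:
    """Directory where a file with these contents would be bucketed.

    Mirrors the layout in the listing above: one directory per character
    for the first eight hex digits of the MD5 sum, then the full sum.
    """
    checksum = hashlib.md5(contents).hexdigest()
    return os.path.join(bucket_root, *checksum[:8], checksum)


if __name__ == "__main__":
    # /tmp/foo was created empty, so its contents are zero bytes.
    print(bucket_dir(b"", os.path.expanduser("~/.puppet/var/clientbucket")))
```

That also explains the sum in the notice: d41d8cd98f00b204e9800998ecf8427e is the MD5 of zero bytes, which is exactly what an empty /tmp/foo contains.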

With noop:

> puppet --noop -e 'file { "/tmp/foo" : ensure => present}'
notice: //File[/tmp/foo]/ensure: is absent, should be present (noop)

You should keep a single repo of your config/code (see http://github.com/albanpeignier/gepetto/) which you can treat as an application.

A puppet "module" is related config and code (classes, plugins, etc) to handle a particular function -- "Why do you have this?"

(jump into repo/modules)

puppet --configprint modulepath -- what modules will be loaded? (also: confdir and vardir)

>puppet --modulepath $PWD/modules -e 'include foo'
notice: //foo/File[/tmp/foo]/ensure: created

Put your nodes into the site manifest: manifests/site.pp ... for simpler sites. As site gets more complex, there are ways to hook puppet to a DB.

default node matches all unmatched nodes. You can also inherit:


node my_host inherits default {
notice "I'm your host!"
}

puppet makes it easier to capture the many little uninteresting twiddles you forget you did in the middle of the night; you also capture them as something you can execute to repeat them.

puppet uses ssl certs for client and server to allow identity verification. Particularly, it uses self-signed certs

Running puppetmasterd in dev:

>mkdir /tmp/server
>puppetmasterd --verbose --no-daemonize --modulepath $PWD/modules \
--confdir /tmp/server --vardir /tmp/server \
--manifest $PWD/manifests/site.pp --certdnsnames localhost

... and it's all rock-and-roll from here.

Luke (paraphrased): "Puppet uses SSL just like your bank uses it -- so most SSL errors are not Puppet's fault. Be sure you know what you're doing before you twiddle your Puppet conf around SSL; most of the errors people report turn out to be their own attempted cleverness biting them."

Now use puppetd to talk to puppetmasterd to get config, etc. without knowing about anything:

> puppetd --test --confdir /tmp/server --vardir /tmp/server \
--no-daemonize --server localhost
info: Caching catalog at /tmp/server/state/localconfig.yaml
notice: Starting catalog run
info: Creating state file /tmp/server/state/state.yaml
notice: Finished catalog run in 0.01 seconds

It takes about 15 seconds for code changes in the repo to propagate through puppetmasterd.

... and then it's a blur as he crammed the rest of the prezzo into the last 15 minutes, so I quit typing and just listened.

2009.06.22

LiveBlog: Death of a Web Server

The first Velocity 2009 session for me today. Semi-useful, but seemed very .NET focused. I didn't take a lot of notes around it for that reason.

For me, most of his prezzo reduced to:

  • Make sure your instrumentation has a light touch
  • Know what you're caching
  • Be sure it's getting hit (don't cache singleton queries)
  • Be sure your TTL is well-set (short, perhaps sliding)

Otherwise, most of the interesting bits came from Q&A:

Everyone has a favorite load generator; many love the appliances. They "make a webserver cry". Mercury LoadRunner seemed like a crowdpleaser.

Log playback can be challenging as you frequently don't have all the post data, response data, etc. Hard to know how "real" it is. An alternative is to inject a transparent proxy on the front end and capture everything both ways for short periods.

One fellow in the audience is using "Siege" and EC2 instances. [I think this is the Siege he meant. I wonder why no one mentions using Tsung, formerly Tsunami, and EC2 ... here's the one decent ref I could find]

Realistic tests: be sure what you're using to load it resembles your production load or it's not very useful.

All rely on scripted sessions; there's a gap in converting log data into that script.

2009.06.22

Got the "can't activate" Gem::Exception Blues? gem cleanup!

What to do when rubygems can't activate a version of a gem because it's already activated a different version of the same gem? My Google magic wasn't good enough to find this one quickly, so I'll happily point out Jesse Hu's nearly one-year-old post about the same problem, descended from a Ruby on Rails thread on ruby-forum.com where, unsurprisingly, it's Jeremy McAnally who knows that `gem cleanup` is the way to go.

Took nearly 45 minutes. Freed untold amounts of disk. Much happiness.

2009.06.14 in Ruby/Rails | Permalink | Comments (0) | TrackBack (0)

Technorati Tags: ruby, rubygems

Cucumber, Webrat ... Who Names These Things?

I'm tinkering a bit with Cucumber and Webrat for my day job and am very excited by some of the prospects for our QA group's automation efforts. Along the way I'm finding that I have to explain how all the moving parts relate to each other, so I made this diagram:

Cucumber Diagram

Cucumber, based on RSpec, uses Webrat to drive Selenium. Seems to make it all make sense to the folks with the questions.

(and thanks, brynary + all the webrat contributors, aslakhellesoy + all the cucumber contributors, and dchelimsky + all the rspec contributors, and the whole selenium crew ... this is very, very cool stuff).

2009.06.10 in Ruby/Rails, Testing | Permalink | Comments (0) | TrackBack (0)
