geek!daily

... it is by will alone i set my mind in motion ...

Live Blog: Web Aggregation, What Works and What Doesn't

[note: I originally scribbled this on paper thinking I could hand it off immediately, sparing myself the chore of typing, posting, etc. Turns out I don't get off that lightly, so here's the spew in electrons.]

Scraping isn't a scalable model.

There are biz issues around aggregating data: many businesses don't want you to get their data, though many are becoming more open.

Doing aggregation right:
* minimize latency
* maximize engagement

When latency is high, it causes confusion and takes you out of real-time.

Doing conditional GETs can be somewhat useful.
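A conditional GET only saves the download and re-parse when nothing has changed. You still pay a round trip per poll, and it does nothing for latency, which is presumably why it's only "somewhat" useful. A minimal Python sketch, assuming the feed server sends `ETag` and/or `Last-Modified` validators (the `FeedCacheEntry` type and the function names are hypothetical):

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FeedCacheEntry:
    """Validators remembered from the last successful fetch of a feed."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None
    body: Optional[bytes] = None


def conditional_headers(entry: FeedCacheEntry) -> Dict[str, str]:
    """Build If-None-Match / If-Modified-Since headers from stored validators."""
    headers: Dict[str, str] = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def fetch_if_changed(url: str, entry: FeedCacheEntry, timeout: float = 10.0) -> bool:
    """Fetch url only if it changed; return True when new content arrived."""
    request = urllib.request.Request(url, headers=conditional_headers(entry))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            entry.body = response.read()
            entry.etag = response.headers.get("ETag")
            entry.last_modified = response.headers.get("Last-Modified")
            return True
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return False  # Not Modified: the cached body is still current
        raise
```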

Plaxo had to shard their crawlers, which lands you in the shared state/sync problem of any stateful system you want to scale horizontally.
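One common way to split feeds across crawlers is to hash each feed URL onto a fixed number of shards. The sketch below illustrates that assumption; it is not Plaxo's scheme. It also shows why resharding hurts: with plain hash-mod-N, going from 4 to 5 crawlers reassigns most feeds (about 80% in expectation), and any per-feed state (validators, schedules, dedup sets) has to move with them.

```python
import hashlib
from typing import Sequence


def shard_for(feed_url: str, num_shards: int) -> int:
    """Map a feed URL to a crawler shard with a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    digest is used to get the same answer on every crawler machine.
    """
    digest = hashlib.sha1(feed_url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(feed_urls: Sequence[str], old_shards: int, new_shards: int) -> float:
    """Fraction of feeds that change owner when the shard count changes."""
    moved = sum(
        1 for url in feed_urls
        if shard_for(url, old_shards) != shard_for(url, new_shards)
    )
    return moved / len(feed_urls)


feeds = [f"http://example.com/feed/{i}" for i in range(1000)]
print(f"{moved_fraction(feeds, 4, 5):.0%} of feeds move going from 4 to 5 crawlers")
```

Consistent hashing or an explicit assignment table cuts down that movement, at the cost of more shared state to keep in sync.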

Gnip integration has been good:
* Offload the long-running processes
* Gnip offers alerting or "fat ping" (ping includes update data)

Plaxo likes using the alert to escalate the priority of the crawler that fetches the rich data related to the update. This keeps a single, consistent model for content ingestion, rather than taking the info from the fat ping and augmenting it later.
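A rough sketch of the escalation pattern, assuming a single in-process priority queue (the class, priority values, and method names are hypothetical): any ping, thin or fat, just bumps the feed to the front of the crawl queue, and the regular crawler does the fetching, so there's one ingestion path.

```python
import heapq
import itertools
from typing import Any, Dict, List, Optional, Tuple


class CrawlScheduler:
    """Hypothetical crawl queue: a lower priority number is crawled sooner."""

    URGENT = 0
    ROUTINE = 10

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._order = itertools.count()    # tie-breaker: FIFO within a priority
        self._queued: Dict[str, int] = {}  # feed_id -> best priority queued

    def schedule(self, feed_id: str, priority: int = ROUTINE) -> None:
        if feed_id in self._queued and self._queued[feed_id] <= priority:
            return  # already queued at this priority or better
        self._queued[feed_id] = priority
        heapq.heappush(self._heap, (priority, next(self._order), feed_id))

    def on_ping(self, feed_id: str, payload: Optional[Any] = None) -> None:
        """Treat thin and fat pings alike: escalate the crawl.

        A fat ping's payload is deliberately ignored here so that all
        content arrives through the crawler's normal ingestion path.
        """
        self.schedule(feed_id, self.URGENT)

    def next_feed(self) -> Optional[str]:
        while self._heap:
            priority, _, feed_id = heapq.heappop(self._heap)
            if self._queued.get(feed_id) == priority:
                del self._queued[feed_id]
                return feed_id
            # otherwise: a stale entry superseded by an escalation; skip it
        return None
```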

Smarr: "Brad Fitzpatrick said, 'Make polling a special case of push.'" He attributed this to someone but I missed the attribution.
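One way to read "make polling a special case of push" (a sketch, not Fitzpatrick's or Plaxo's code; `PollingAdapter` and the item shape with an `id` key are assumptions): the poller diffs each poll against what it has already seen and emits only new items through the same callback a real push would use, so everything downstream sees a single push-shaped interface.

```python
from typing import Callable, Dict, Iterable, Set

Item = Dict[str, str]


class PollingAdapter:
    """Turns repeated polls into push-style events for one downstream handler."""

    def __init__(self, on_push: Callable[[str, Item], None]) -> None:
        self.on_push = on_push
        self.seen: Dict[str, Set[str]] = {}  # feed_id -> item ids already delivered

    def ingest_poll(self, feed_id: str, items: Iterable[Item]) -> int:
        """Deliver only unseen items; return how many were pushed."""
        seen = self.seen.setdefault(feed_id, set())
        delivered = 0
        for item in items:
            if item["id"] not in seen:
                seen.add(item["id"])
                self.on_push(feed_id, item)  # same entry point a real push uses
                delivered += 1
        return delivered
```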

(Don't try to keep up with Joseph Smarr on paper. He thinks too many cogent thoughts too quickly to preserve legibility.)

Plaxo uses TripIt's RSS feed as alerting, grabs item ID, then uses their APIs to fetch rich data.
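A sketch of the "feed as alert" pattern, assuming an RSS 2.0 feed whose items carry a `<guid>` (falling back to `<link>`). TripIt's actual feed layout and API aren't shown here; `fetch_details` is a hypothetical stand-in for whatever rich-data call the provider offers.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, List


def item_ids_from_rss(rss_xml: str) -> List[str]:
    """Pull an identifier for each <item>: its <guid>, else its <link>."""
    root = ET.fromstring(rss_xml)
    ids: List[str] = []
    for item in root.iter("item"):
        ident = item.findtext("guid") or item.findtext("link")
        if ident:
            ids.append(ident.strip())
    return ids


def enrich(rss_xml: str, fetch_details: Callable[[str], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Use the feed only as an alert; get the rich data from the API call."""
    return [fetch_details(item_id) for item_id in item_ids_from_rss(rss_xml)]
```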

There's a move to homogenize the info from sites, which may not be a good idea: it suppresses the distinctive look, feel, and experience of the publishing site. Allowing for these differences means more labor spent on one-off shims, which increases maintenance. Still, it's the right choice, since that's what provides value to the user.

Activity streams seek to provide more rich data in a somewhat normalized, extensible format.
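For flavor, here's a normalized record loosely modeled on the actor/verb/object triple used by Activity Streams. It's a simplified subset, not the full spec, and the `extensions` mechanism is an assumption for illustration: site-specific fields ride along without being able to clobber the core ones, which is one way to keep some of a publisher's distinctiveness without a one-off shim per site.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Activity:
    actor: str
    verb: str
    object: Dict[str, Any]
    published: str  # ISO 8601 timestamp
    extensions: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        doc: Dict[str, Any] = {
            "actor": self.actor,
            "verb": self.verb,
            "object": self.object,
            "published": self.published,
        }
        for key, value in self.extensions.items():
            doc.setdefault(key, value)  # extensions never overwrite core fields
        return json.dumps(doc, sort_keys=True)
```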

Many/most sites aren't yet perfectly architected for real-time's push, ping, etc.

PubSubHubbub and Activity Streams are externally represented data shards.
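To make the hub side of this concrete, here's a minimal sketch of the subscriber's half of PubSubHubbub: a form-encoded subscription request to the hub, and the verification step where the hub calls the subscriber's callback and expects `hub.challenge` echoed back. Parameter names follow the spec's `hub.*` conventions; `hub.verify` was required by the drafts current in 2009 and dropped in later revisions. The function names are hypothetical, and sending the POST and serving the callback are left to your HTTP stack.

```python
from typing import Set, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def subscription_body(topic: str, callback: str, mode: str = "subscribe") -> str:
    """Form-encoded body the subscriber POSTs to the hub."""
    return urlencode({
        "hub.mode": mode,          # "subscribe" or "unsubscribe"
        "hub.topic": topic,        # the feed URL we want updates for
        "hub.callback": callback,  # where the hub should deliver updates
        "hub.verify": "sync",      # required by 2009-era drafts; later dropped
    })


def handle_verification(request_url: str, expected_topics: Set[str]) -> Tuple[int, str]:
    """Answer the hub's verification GET against our callback URL.

    Echo hub.challenge (HTTP 200) only for topics we actually asked for;
    otherwise refuse with 404 so nobody can subscribe us to arbitrary feeds.
    """
    params = parse_qs(urlsplit(request_url).query)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if mode in ("subscribe", "unsubscribe") and topic in expected_topics and challenge:
        return 200, challenge
    return 404, ""
```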

Plaxo's Pulse started with known architecture issues (in order to ship) and hit the wall sooner than expected. Threw hardware/software optimizations at the problem to move the wall far enough to give time for rearchitecture, sharding, and working out how to propagate changes throughout the system properly.

None of the NoSQL alternatives are quite ready for prime-time. Smarr: "It should be something that's just a primitive."

Conversation platforms are slightly different sorts of aggregation platforms. There are UI differences (e.g. pause the stream when the user indicates interest). Handling the transition from slightly latent, passive real-time to synchronous, active real-time is not yet well developed (think: when a comment inspires a conversation).
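A minimal sketch of the "pause while I'm reading" behavior (the `PausableStream` class is hypothetical, and rendering is just a callback): new items are buffered while the reader is engaged and flushed in order on resume, so the thing they were looking at doesn't scroll away.

```python
from collections import deque
from typing import Callable, Deque


class PausableStream:
    def __init__(self, render: Callable[[str], None]) -> None:
        self.render = render
        self.paused = False
        self.pending: Deque[str] = deque()  # items that arrived while paused

    def push(self, item: str) -> None:
        if self.paused:
            self.pending.append(item)
        else:
            self.render(item)

    def pause(self) -> None:
        """Called when the reader signals interest (hover, click, typing)."""
        self.paused = True

    def resume(self) -> int:
        """Flush buffered items in arrival order; return how many were shown."""
        self.paused = False
        flushed = len(self.pending)
        while self.pending:
            self.render(self.pending.popleft())
        return flushed
```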

90-99% of the value of the real-time web is realized in not-real-time [unreal-time? ;] This is a big deal for discovery. Twitter and FB make this harder by obscuring history.

Ideal scalability/performance would be an index per user. This would be grossly inefficient due to the number of duplicate entries.
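The duplication is easy to see in a fan-out-on-write sketch (hypothetical data; `followers` maps an author to their readers): every post is copied into every reader's inbox, so storage grows with posts times audience size rather than with posts alone. Reading becomes a single lookup, which is the performance appeal; the usual alternative is to merge authors' timelines at read time, trading storage for query work.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fan_out_on_write(
    posts: Iterable[Tuple[str, str]],  # (author, post_id)
    followers: Dict[str, List[str]],   # author -> readers
) -> Dict[str, List[str]]:
    """Build one index (inbox) per reader by copying each post to every follower."""
    inboxes: Dict[str, List[str]] = defaultdict(list)
    for author, post_id in posts:
        for reader in followers.get(author, []):
            inboxes[reader].append(post_id)
    return dict(inboxes)


followers = {"joe": ["jim", "ann", "bob"], "jim": ["joe"]}
posts = [("joe", "p1"), ("joe", "p2"), ("jim", "p3")]
inboxes = fan_out_on_write(posts, followers)
stored = sum(len(inbox) for inbox in inboxes.values())  # 7 entries for 3 posts
```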

No one has nailed reader-controlled aggregation (Show me Joe's tweets and blogs but not his photos) quite yet.
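The filter itself is the easy part, as this sketch shows (the per-author rule format and the `type` labels are assumptions). The hard part is getting every source to label items consistently enough that "photo" on one site means "photo" on another.

```python
from typing import Dict, Iterable, List, Set

Item = Dict[str, str]


def reader_filter(items: Iterable[Item], rules: Dict[str, Set[str]]) -> List[Item]:
    """Keep items whose type the reader allowed for that author.

    Authors without a rule pass through unfiltered.
    """
    kept: List[Item] = []
    for item in items:
        allowed = rules.get(item["author"])
        if allowed is None or item["type"] in allowed:
            kept.append(item)
    return kept


stream = [
    {"author": "joe", "type": "tweet", "id": "1"},
    {"author": "joe", "type": "photo", "id": "2"},
    {"author": "joe", "type": "blog", "id": "3"},
    {"author": "ann", "type": "photo", "id": "4"},
]
# "Show me Joe's tweets and blogs but not his photos"
visible = reader_filter(stream, {"joe": {"tweet", "blog"}})
```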

Smarr: "If we're all kinda [sharing], we're all making each other smarter"

The firehose of info is a hard model to scale to. Ben Metcalfe proposes the garden hose -- a firehose filtered at the source according to your interests, which helps aggregators by allowing them to request the superset of all filters from a given publisher.
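A sketch of the garden hose flow as described, assuming interests can be expressed as topic sets (the talk didn't specify a filter language, and the function names are hypothetical): the aggregator sends the publisher the union of its users' filters, the publisher filters at the source, and the aggregator routes the much smaller stream back to individual users.

```python
from typing import Any, Dict, Iterable, List, Set

Item = Dict[str, Any]  # {"id": str, "topics": set of str}


def superset_filter(user_filters: Dict[str, Set[str]]) -> Set[str]:
    """Union of every user's interests: the one filter sent upstream."""
    combined: Set[str] = set()
    for terms in user_filters.values():
        combined |= terms
    return combined


def garden_hose(firehose: Iterable[Item], terms: Set[str]) -> List[Item]:
    """Publisher side: pass only items matching at least one requested term."""
    return [item for item in firehose if item["topics"] & terms]


def demultiplex(items: Iterable[Item], user_filters: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Aggregator side: route the filtered stream back to individual users."""
    items = list(items)
    return {
        user: [item["id"] for item in items if item["topics"] & terms]
        for user, terms in user_filters.items()
    }


filters = {"jim": {"rtws"}, "joe": {"rtws", "bbq"}}
firehose = [
    {"id": "1", "topics": {"rtws"}},
    {"id": "2", "topics": {"cats"}},
    {"id": "3", "topics": {"bbq"}},
]
hose = garden_hose(firehose, superset_filter(filters))
routed = demultiplex(hose, filters)
```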

We really want to push contexts to the publishers and let them determine which content fits that context. Context shifts over time: Joe doesn't normally read my tweets (and why would he?) but when we're at a conference together, he's much more interested (thus the popularity of hashtags). This is a geographic and purpose-driven context (the conference) as well as Joe's context on me (Jim knows where the good bars are).

Folks like Twitter are so overloaded with info that they might not recognize non-immediate contexts that are interesting to me.

There's also the risk of exposing users through the sheer amount of correlatable public data about them. Many don't want you to apply a transitive closure that identifies them across all spaces, even though doing so would let you present a much more convenient UX around what they do want you to aggregate.
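The transitive closure in question is easy to compute, which is exactly the worry. A union-find sketch (account names are hypothetical): two pairwise "same person" links are enough to tie a Twitter handle to a blog the person never connected to it directly.

```python
from typing import Dict


class IdentityLinker:
    """Union-find over account identifiers; linked accounts share a root."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def _find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression: point every visited node at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def link(self, a: str, b: str) -> None:
        """Record evidence that accounts a and b belong to the same person."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[rb] = ra

    def same_person(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)


linker = IdentityLinker()
linker.link("twitter:jim", "flickr:jdoe")
linker.link("flickr:jdoe", "blog:jim.example")
```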

Someone likened the real-time aggregation problem to a bar conversation: you get snippets here and there and follow your own thread of interestingness.

Three fundamental themes:
* How to specify contexts to data provider/publisher
* How to control access to private data (and carry ACLs with that data)
* How to do all this efficiently

Plaxo implemented polling-back-off (poll infrequently updated sources less frequently). Turns out this is a bad idea, as it introduces latency which makes it feel broken.
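A sketch of why back-off feels broken, assuming exponential doubling with a cap (the talk didn't give Plaxo's actual schedule; the parameters are hypothetical): an update that lands just after a poll waits a full interval to be seen, so a source that was quiet for a while and then posts can sit unnoticed for up to the cap.

```python
def next_poll_interval(current: float, changed: bool,
                       base: float = 60.0, cap: float = 3600.0) -> float:
    """Reset to base seconds when the source changed; otherwise double, up to cap."""
    if changed:
        return base
    return min(current * 2, cap)


def worst_case_delay_after_quiet_period(quiet_polls: int,
                                        base: float = 60.0, cap: float = 3600.0) -> float:
    """Longest a new post can wait unseen after a run of unchanged polls."""
    interval = base
    for _ in range(quiet_polls):
        interval = next_poll_interval(interval, changed=False, base=base, cap=cap)
    return interval
```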

There's also the issue of aggregating conversation about web objects (like blog posts) without diverting the conversation from the publisher's site. However, sometimes you want a private discussion of a public object (cf. LinkedIn company groups discussing an article).

Q: What's the state of open standards around this?
A: PubSubHubbub and Activity Streams are very exciting. OAuth as access delegation. There's still a lot of ground to cover.

2009.10.15 in Data Portability, Identity, Social Networks, Web 2.0

Technorati Tags: rtws, rtwsummit

Reports of the Business Card's Death Are Greatly Exaggerated, or why bzCards != biz cards

Caught the article "rmbr launches mobile app to get rid of business cards" on VentureBeat which covers the announcement of rmbrME, the new electronic connection tool from rmbr. (note: the rmbr.com site, which launched as a photo organizing site, currently redirects to rmbrme.com) The general idea is to trade vCards over email, IM, SMS, etc. and get rid of those pesky piles of pasteboard. If the ease of use isn't incentive enough, they're mixing in the funware idea via contests and leaderboards.

You may have guessed by now, but I think they've got the beginnings of a good idea and a poor implementation. I'm particularly amused by rmbr founder Gabe Zichermann's assertion that the business card's time has come and gone. Here's why I think he's wrong:

  1. Business Cards Are Infinitely Customizable: I can quickly extend or personalize the information provided by my business card*. I can also correct bad information on the spot*, e.g. my title's changed, or I have a new phone number. And the only device required is any writing instrument.
  2. Business Cards Don't Require Information Exchange: I don't have to ask you to give me contact info in order to give you my contact info. This preserves a level of anonymity which should not be undervalued. I wouldn't give contact info to everyone I've accepted a business card from; I've later chosen to contact some of those people.
  3. Business Cards Are Trivial To Distribute: I can hand you a business card in just a few seconds, less if I'm already handing them out. I can place a stack of business cards in a tray on a counter to be taken by those interested without having to make any contact at all. I can drop them in the fishbowl of an excellent bakery and cafe in Wilson, NC in hopes of winning some goodies for the next time I'm there. I can hand it to someone who doesn't have a device. These examples all demonstrate the business card's continued practical utility.
  4. Business Cards Are Static: They provide a clearly limited, time-sensitive set of information about me. If I change jobs, companies, phone numbers, email, or what-have-you, your data isn't current anymore. Effectively, the data ages and, in doing so, gives me some additional privacy with regard to people with whom I haven't formed a more lasting relationship than the original business card exchange (and with those I have, I likely want to provide a more dynamic link to my contact info). This is also true of a vCard exchange, but unclear with regard to bzCards. In my opinion, the lack of an analog to this aging process is a flaw in social networks which is becoming evident; it's currently all or nothing.

That said, I applaud the attempt toward a more dynamic contacts list and easier connections. There are some pieces still missing that, imho, are being overlooked by folks headed in that direction, but they'll come soon enough. And I'm utterly underwhelmed by the idea of competitions to send out the most business cards. It's quality, folks, not quantity.

* The business card with hand-written phone number comes from a 1997 Mother Jones article. Ironically, that lovely corrected business card image is from the "Why Use It?" page of MyDetails.biz, another digital business card replacement.

 

2008.08.20 in Business, Identity, People, Social Networks

Technorati Tags: business cards, bzcards, digital identity, rmbrme

Stupidity In The Name Of Security

Just created an account with WordPress.com and find it amusing that they show a password strength meter to encourage you to choose a cryptographically strong password (which is good), then display your password in large type that's clearly readable from 12-15 feet away on the unsecured activation page. Oh, and they email it to you, too.

Bruce Schneier would be amused.

2008.08.12 in Identity

Data Portability In A Nutshell: You Own Your Data

I've already gotten pings (don't you people have other things to do? ;) about this morning's post on the LinkedIn blog, mostly around a more detailed explanation of the data portability goals with some privacy overtones, so I'll give a terse summary here:

Some useful jumping-off points:

  • Data Portability "Action Packs" (aka summaries, executive and other)
  • DPW Design Goals and Principles
  • Work To Be Done/How To Contribute
  • (Incredibly Rough Working Draft) Technical Blueprint

I'm only just getting caught up with what's gone before on this effort, so I don't have much more to offer than to say openness is good as long as it clearly respects privacy. It's that last bit about privacy which I think is getting lost in the thundering herd of press coverage, but that's what motivated me to get involved. It's my firm, personal belief that portability must account for privacy; you own your profile and your connection to me, but you don't get my profile and personal data in the bargain (unless I offer it).

While I don't agree with Mr. Howlett's title assertion, I absolutely adore the UK Data Protection Act (hey, Congress, you might have read this prior to the DMCA ...) and its intent and I'd expect it can and will be fully embraced in this effort; in that regard, I think Danny Ayers has some ideas heading down the right path, and in fact Robert Scoble had thoughts along these lines just after the debacle that shined a spotlight on data portability in the first place.

Context matters. Context always matters.

2008.01.10 in Data Portability, Identity, Privacy, Reputation, Social Networks, Trust, Web 2.0

One Social Graph To Rule Them All? Not Hardly.

On the heels of my corny-but-very-workable galactic identity metaphor, I'll risk another post on identity to follow up on something said by Reid Hoffman in his keynote today at the Graphing Social Patterns conference. It's being picked up all over the blogosphere: One Social Graph To Rule Them All?

Reid's conclusion was no, there won't be. As he themed his talk, it all comes down to use cases and, at the end of the day, those use cases inform what problems you're solving and how you go about solving them. Each solution you implement closes some (hopefully less interesting) doors and opens some (hopefully more interesting) ones. Just like with people, the choices you make along the way define who you are. Mercator, Lambert, they served different purposes.

He's right, and there's another reason he didn't really touch on: sometimes folks don't want the graphs to overlap, or want to actively hide any overlap. One example I've been quoting in conversation lately is the S&M Grandma*. She really, really wants those graphs to be separate. Other sides of the world kinda separate. She wants you to think that Greenland is Iceland. Believe it.

(I love blogging; it makes me do research and find great sites like Radical Cartography and read cool things in Wikipedia)

* Rule 1, remember?

2007.10.08 in Graphing.Social, Identity, Social Networks

The Galaxy of Identity

I've been learning about, thinking about, and talking about identity and reputation for some time now and only recently hit upon a comfortable metaphor whose analogy seems to extend far enough in all the needed directions. And it's geeky, so you know I'm pleased.

Consider yourself a planet at the center of your own universe. It is, after all, all about you. You're located at some unique, identifiable place such that anyone given that location will find you and know it's you (e.g. OpenID). Arrayed about you are many other celestial bodies, some of which are in tight or loose orbit of you (family, friends, colleagues, etc.), many of which have occasionally tangential orbits (acquaintances); mapping your galaxy is the job of the social graph.

There are also stars of various (possibly variable) brightness, as well as comets which look a lot like a variable star but occasionally pass very close, increasing their brilliance. Not to mention novas, supernovas, black holes, etc. Yeah, I like this metaphor.

But let's ignore the galaxy for a moment and focus on the planet-that-is-you (I said it's all about you, right?). We can never see all of you at once; at best, we get half. You've got terrain — places you're higher, deeper, broader, narrower than others. You've got mysteries, regions no one's ever seen or documented*. Weather frequently distorts or obscures visibility. There's some periodic cycle you observe.

Okay, enough metaphorical fun. What can we infer from this model and knowledge of our own globe? Here are a couple; I'm sure that more will emerge.

1. No perspective is ever complete; it's only by observing over time that we can begin to get a complete picture. Even then, there's no map which documents the terrain completely and accurately. That's usually due to cartographic (e.g. Mercator vs. Polar projection) or political biases (e.g. Greenland is icy, Iceland is green).

2. Certain perspectives are valuable in certain contexts and valueless in others. The WOUB Weather Man is very interested in one particular perspective; his interest in any other view drops off the less congruent it is with that one.

I'm going to have fun with this metaphor. I hope you'll enjoy the ride.

* Rule 1: Don't Visualize. Rule 2: No, Really, DON'T VISUALIZE. If you break the rules, you suffer the consequences.

2007.10.08 in Identity

Categories

  • Administrivia
  • Blogs
  • Books
  • Business
  • Computing
  • Data Portability
  • Economics
  • Electronics
  • Engineering
  • Environment
  • Facebook
  • Food and Drink
  • Fun!
  • Games
  • Graphing.Social
  • Hacking
  • History
  • Identity
  • Leadership
  • Linux
  • MacOS X
  • Management
  • Metadata
  • Open Source
  • Organization
  • Parenting
  • People
  • Photography
  • Privacy
  • PublicSquare
  • RailsRumble
  • Reputation
  • Ruby/Rails
  • RubyConf 2007
  • Science
  • Social Networks
  • TagEverything
  • Technology
  • Testing
  • Thinking
  • Trust
  • UI
  • Web 2.0
  • Weblogs
  • Writing

Archives

  • April 2013
  • March 2012
  • August 2010
  • October 2009
  • September 2009
  • August 2009
  • July 2009
  • June 2009
  • May 2009
  • January 2009

Words on a Page

  • Carol Tavris: Mistakes Were Made (But Not by Me): Why We Justify Foolish Beliefs, Bad Decisions, and Hurtful Acts
  • Steven Gary Blank: The Four Steps to the Epiphany
  • Chip Heath: Made to Stick: Why Some Ideas Survive and Others Die
  • Patrick M. Lencioni: Silos, Politics and Turf Wars : A Leadership Fable About Destroying the Barriers That Turn Colleagues Into Competitors
  • Marc Ian Barasch: Field Notes on the Compassionate Life : A Search for the Soul of Kindness

Pages

  • If
  • The Tagline Graveyard