A quick twitter bot, @bc_l
Several months ago, on a whim inspired by an off-hand comment from Chris, I created a bot to bring the wonders of the Unix bc language to twitter.
bc is a command-line calculator that’s fast and has the capacity to do some fairly complex math.
Try it out on the command line:
echo '100 / 10' | bc -l
…Or by sending a direct message to bc_l (if you follow bc_l it will follow you back within a few hours).
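For a sense of how a bot could hand an incoming message to bc, here is a minimal sketch in Python. It is not the tweetbc code; the function name `bc_eval` and the five-second timeout are assumptions made for illustration, and it needs `bc` installed on the machine.

```python
import shutil
import subprocess


def bc_eval(expression, timeout=5):
    """Evaluate an expression with `bc -l` and return its output as text."""
    if shutil.which("bc") is None:
        raise RuntimeError("bc is not installed on this machine")
    # Feed the expression on stdin, exactly like `echo '...' | bc -l`.
    result = subprocess.run(
        ["bc", "-l"],
        input=expression + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,  # assumption: stop runaway input after a few seconds
    )
    # bc reports syntax errors on stderr; return them alongside any output.
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    print(bc_eval("100 / 10"))
```

The timeout matters for anything that accepts input from strangers, since bc is a full language and will happily loop forever if asked to.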
I released the code under GPL, and it’s available on github: http://github.com/hmason/tweetbc.
John Cook mentions the bot and makes some great observations in his post three surprises with bc.
July 26, 2010 1 Comment
Conference: Web2 Expo SF
I gave a talk called A Data-driven Look at the Realtime Web Ecosystem at the Web2Expo SF conference in May in San Francisco. I attempted to highlight some of the interesting facets of the bit.ly data set, and it appeared to be well-received (showing up on TechCrunch, ZDNet, and a few other places).
I attended the full conference, and it was great. The attendees were extremely international and I met a ton of fascinating people.
I’m still getting a couple of e-mail requests per week for my slides and materials, so they’re posted below for posterity.
The slides:
And the video:
As always, I welcome your questions or comments.
June 24, 2010 3 Comments
E-mail automation, questions and answers
Welcome! I’ve gotten several hundred e-mails about my e-mail management code. I do want to share it as soon as possible. Here are the answers to the most common questions.
Why separate scripts?
My philosophy is based on the unix command-line tool model: each script should be simple and useful alone, but when combined they become extremely powerful.
Why don’t we have the code yet?!
I had no idea the talk would be shared beyond the couple hundred people in the audience or that it would be so popular! I started my position at bit.ly the same day I gave that IgniteNYC presentation, and I also have some other awesome projects that are competing for time.
I have to admit that the trained classifiers are all based on my personal data and were also trained mostly through tweaking in ipython. I need to finish a generic framework for people to train their own filters before I can publish that piece of the system. I promise, I’m working on it.
Keep nagging me — nagging works!
Are you going to commercialize your scripts / can I invest?
I have certainly thought about commercializing the application, but I’m uncomfortable asking people to give me access to their personal e-mail data (even if there are very interesting things to be learned by aggregate analysis).
Just imagine how much more creative, interesting work could be done if we could partially free the world from the e-mail workload… that alone is worth making the code open.
How does it work? What tech are you using?
The scripts run on my gmail account through IMAP (and should work with any IMAP interface, though I’m sure there is debugging to be done). They live on a Linode VPS and run individually via cron jobs.
Most of the scripts are in Python. I use NLTK and libsvm (in addition to my own code) for the data analysis.
I primarily use the gmail web interface (though I’ve flipflopped between Mail.app and Thunderbird for a while), and the only cost is that I have to manually reload the page to see new labels and new drafts appear.
Do your scripts go mad with power and e-mail inappropriately? Are you some kinda robot?
I have all of the scripts deposit suggested responses in the draft folder, and then I use the gmail “multiple inboxes” feature to keep the draft folder up in the UI. It’s very easy to go through and modify or delete responses before they are sent.
Of course, I only thought of that after one of the scripts DID go a bit mad. I’m still sorry about that, Mom.
I’m not a robot, though of course I would say that anyway! The point of the automation is to remove the stupid parts of e-mail and leave me free to personally address the interesting messages.
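The draft-folder approach can be sketched with Python's standard `imaplib`. This is an illustration under stated assumptions, not the unreleased scripts: the helper names, the `me@example.com` address, and the `[Gmail]/Drafts` mailbox name are all hypothetical and may differ on your account.

```python
import imaplib
import time
from email.message import EmailMessage

DRAFTS_FOLDER = "[Gmail]/Drafts"  # assumption: Gmail's default drafts mailbox name


def build_draft_reply(sender, subject, body, me="me@example.com"):
    """Build a suggested reply addressed back to the original sender."""
    draft = EmailMessage()
    draft["From"] = me
    draft["To"] = sender
    # Avoid stacking "Re: Re:" when the subject is already a reply.
    draft["Subject"] = subject if subject.lower().startswith("re:") else "Re: " + subject
    draft.set_content(body)
    return draft


def save_draft(imap, draft, folder=DRAFTS_FOLDER):
    """Append the message to the drafts folder instead of sending it."""
    return imap.append(
        folder,
        "\\Draft",
        imaplib.Time2Internaldate(time.time()),
        draft.as_bytes(),
    )


if __name__ == "__main__":
    draft = build_draft_reply("friend@example.com", "Lunch?", "Sounds good, how about Thursday?")
    print(draft["Subject"])
    # To store it for real (hypothetical credentials):
    # imap = imaplib.IMAP4_SSL("imap.gmail.com")
    # imap.login("me@example.com", "app-password")
    # save_draft(imap, draft)
    # imap.logout()
```

Because nothing here ever calls an SMTP server, the worst a misbehaving script can do is leave an odd draft behind for a human to delete.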
If you’ve read this far, there are a few things I would love your feedback on:
What’s a kickass name for this project?
More important, which features/scripts are you most interested in seeing first? The nag script is about ready to go, but I’d like to know where to focus my time.
THANK YOU!
May 27, 2010 42 Comments
Stop talking, start coding
I read Out of the Loop in Silicon Valley in the NYTimes today, which explores how and why women are under-represented in tech startups. From the number of retweets I saw and the clicks through bit.ly links (12,579 at the time of this posting), it’s been getting a lot of attention.
There are some very strong, compelling themes in this article. Computer science and engineering do have an “image problem”; the way we teach math to elementary school students is horrible and turns way too many away.
I don’t want to nitpick the article, but there are a few statements that reinforce the very damaging stereotypes that the article sets out to dispel.
“When women take on the challenges of an engineering or computer science education in college, some studies suggest that they struggle against a distinct set of personal, psycho-social issues… Even women who soldier though [sic] demanding computer science and engineering programs in college…”
I’ve been both a computer science student and a computer science professor. I have not seen any evidence that the average undergraduate computer science education is harder than physics, math, chemistry, biology, or many other ‘hard’ disciplines with a much stronger gender balance. Implying that women are unwilling to meet the intellectual challenges of the discipline is bullshit.
“Girls have certain family goals they want to accomplish,” she says. “Working 60 hours a week is difficult because it requires a life sacrifice.”
The men that I know and work with also have wonderful personal lives. Working 60 hour weeks is a sacrifice for them, too.
Please read the whole article. Let me know what you think when you see the material in context.
I’m going to make the assumption that we all believe that having more women in technology is a Good Thing™.
Many groups have popped up that support women in technology, like Girls in Tech, She’s Geeky, and many others (enumerated in Digiphile’s thoughtful post Why Including women matters for the future of technology and society). More often than not, these groups are the canned food drives of the women in technology movement. They make you feel better, they might do a little good, but they offer no fundamental change to the system that created the problem in the first place.
The Grace Hopper Celebration of Women in Computing does this well. GHC invites women to come to one place, be together, and do science together.
We don’t need affirmative action for women in tech. We need to create experiences that nurture women and men so that more people are inspired to create beautiful, technical things together.
April 18, 2010 54 Comments
Art and Technology: Seven on Seven
I’m honored and excited to be participating in Rhizome’s new conference Seven on Seven, where technologists and artists are paired up to create a completely new project in 24 hours.
The formal description:
Seven on Seven will pair seven leading artists with seven game-changing technologists in teams of two, and challenge them to develop something new –be it an application, social media, artwork, product, or whatever they imagine– over the course of a single day. The seven teams will unveil their ideas at a one-day event at the New Museum on April 17th.
I really love this idea because the time constraints and the inherent discomfort of the situation (working in an unfamiliar space with an unfamiliar person) make it likely that we’ll be able to accomplish something creative and unexpected. Or else it will go completely awry, which will still be amusing for the audience.
I’ve had a lot of fun and been able to work on some interesting projects at hackathons in the past, and I hope this one will be even better.
The event has been covered by an amazing assortment of blogs, including TechCrunch, BoingBoing, Art Fag City, Stowe Boyd and Andrew Parker.
March 14, 2010 1 Comment
Conference: Search and Social Media 2010
I recently attended the Third Annual Workshop on Search and Social Media, an academic workshop with very strong industry participation. The workshop was packed, and had some of the most informative and interesting panel discussions I’ve seen (not counting the one I spoke on!).
Daniel Tunkelang did a great job of writing up the specific presentations on his site and on the ACM blog, so I won’t attempt to re-create the presentations line by line at this late date. Rather, I’d like to highlight a few open problems and research questions that came out of the discussions that I hope to see developed in the next year.
Social search consists of a set of problems including (but hardly limited to) search of social content like status updates, real-time search, generating, labeling, and finding user-generated content, ‘long-tail’ events and interests, finding vs re-finding, and trend identification.
What data is available to social search? There are many kinds of social data, from e-mail (private) to blogs (public) and tweets (mostly public) — what is and should be searchable? How do we handle issues of privacy and identity management?
How do we compute relevance, taking into account freshness, accuracy, and degrees of social separation?
Will the architecture of these search engines look like the search engines we’re currently familiar with?
How do we evaluate accuracy and truthiness of social data?
How do we characterize social connections, around concepts like strong vs weak ties, and friend-of-a-friend vs friend-of-a-friend’s-friend? Can we converge on a single social graph representation?
How do we best filter social data to lead to accurate recommendations for content discovery? How do we accommodate the fact that as we move beyond static factual data, two people using the same query may be looking for very different results?
Finally, how do we deal with the chasm between the industry participants (who have LOTS of data) and the academic participants, who suffer from a lack of public (and publishable) data?
Thanks again to the organizers – Eugene Agichtein, Marti Hearst, Ian Soboroff, and Daniel Tunkelang – who put together a fantastic event.
For more on this and a cool demo, check out Gene Golovchinsky’s look at the SSM2010 twitter coverage.
February 16, 2010 2 Comments
SMS to e-mail gateway: The SMS doorbell
Over at NYC Resistor, it was getting cold, and we needed a doorbell so visitors wouldn’t be stranded outside when the building was locked. A standard wireless model didn’t work reliably (the space is on the fifth floor, just out of range), so various members generally resorted to writing their phone numbers on a sign on the front door when they were expecting guests.
Since almost everyone has a mobile phone already, an SMS-based solution seemed appropriate. In order to implement this we need two things:
- An SMS shortcode
- A system to notify when the shortcode is triggered
It’s irritating and expensive to acquire your own shortcode, but there are several services that will allow you to use one in exchange for a small fee or advertisements in your messages. TextMarks is my favorite (I used TextMarks for my WhereAmI project). While TextMarks markets their service as a system for mobile mailing lists, they allow you to reserve a keyword and define a behavior (that can include pulling data from a URL!) to occur when that keyword is triggered.
Configuring TextMarks
Sign up for TextMarks and choose a keyword. Configure the keyword to respond with the “First 120 characters on web page”, and point it at the future home of your script (you can always come back and modify this later).
Note the \0 as the value of the msg parameter — this instructs TextMarks to send along any additional message contents as the value of that parameter. That means if someone were to text 41411 “doorbell hi this is hilary”, TextMarks would call the script with the param msg=hi this is hilary. This can be quite useful.
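To make the parameter passing concrete, here is a small Python sketch of how a script might pull the msg value out of the query string of the request. The exact URL encoding TextMarks uses is an assumption here, and `extract_msg` is a hypothetical helper name; `parse_qs` copes with both `+` and `%20` for spaces.

```python
from urllib.parse import parse_qs

DEFAULT_MESSAGE = "There is an anonymous monkey at the door."


def extract_msg(query_string, default=DEFAULT_MESSAGE):
    """Return the text that followed the keyword, or a default if none was sent."""
    params = parse_qs(query_string)  # blank values are dropped by default
    values = params.get("msg", [])
    text = values[0].strip() if values else ""
    return text or default


if __name__ == "__main__":
    print(extract_msg("msg=hi+this+is+hilary"))
    print(extract_msg(""))
```

Falling back to a default means a bare keyword with no extra text still produces a sensible notification.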
The Script
This script is written in Python, but you can use any scripting language you like. This particular script just sends an e-mail to an account when the ‘doorbell’ is rung, but you could have it do pretty much anything up to and including ringing a real bell (which may be coming soon!).
#!/usr/bin/env python
# encoding: utf-8
"""
doorbell.py

Created by Hilary Mason, feel free to use this code in your own projects.
"""

import sys, os
import smtplib
import cgi
import cgitb; cgitb.enable()

class Doorbell(object):
    GMAIL_USERNAME = 'YOURGMAILACCOUNT@gmail.com'
    GMAIL_PASSWORD = 'YOURPASSWORD'

    def __init__(self, msg):
        # The header lines must start at column zero inside the string.
        message = """\
From: YOURGMAILACCOUNT@gmail.com
To: YOURGMAILACCOUNT@gmail.com
Subject: KNOCK KNOCK, someone is at the door!

%s
""" % msg

        # Send the notification through Gmail's SMTP server over TLS.
        server = smtplib.SMTP('smtp.gmail.com:587')
        server.ehlo()
        server.starttls()
        server.ehlo()
        server.login(self.GMAIL_USERNAME, self.GMAIL_PASSWORD)
        server.sendmail('YOURGMAILACCOUNT@gmail.com', ['YOURGMAILACCOUNT@gmail.com'], message)
        server.quit()

        # This text goes back to the person who sent the SMS.
        print("You knocked! You can also call us at 347-586-9270. <3, NYC Resistor")

if __name__ == '__main__':
    print("Content-Type: text/plain\n\n")

    form = cgi.FieldStorage()
    if 'msg' in form:
        w = Doorbell(form['msg'].value)
    else:
        w = Doorbell('There is an anonymous monkey at the door.')
And that’s it! Provided you have your keyword configured to point at your script, and the script living at an accessible address, you’ll get an e-mail whenever your SMS doorbell is rung and the person who sent the message will get back a cute response confirming their action.
Finally…
This setup can be easily extended such that a message containing ‘doorbell hilary’ could e-mail only me, or be forwarded to my phone.
I’m curious to see if having a remotely accessible ‘doorbell’ will encourage pranksters — we might need to add a password.
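One possible shape for those two extensions, per-person routing and an optional password, is sketched below in Python. The addresses, the `route` function, and the convention that the password comes first in the message are all hypothetical; the doorbell keyword itself is assumed to have been stripped already, so only the remaining text arrives as `msg`.

```python
RECIPIENTS = {
    "hilary": "hilary@example.com",  # hypothetical per-person address
}
EVERYONE = "doorbell@example.com"  # hypothetical shared address


def route(msg, recipients=RECIPIENTS, everyone=EVERYONE, password=None):
    """Return (address, remaining_text), or None if the password check fails."""
    words = msg.split()
    if password is not None:
        # Require the shared password as the first word; ignore the message otherwise.
        if not words or words[0] != password:
            return None
        words = words[1:]
    # A known name as the next word sends the message to that person only.
    if words and words[0].lower() in recipients:
        return recipients[words[0].lower()], " ".join(words[1:])
    return everyone, " ".join(words)


if __name__ == "__main__":
    print(route("hilary it's me"))
    print(route("hi this is hilary"))
```

The returned address could then replace the hard-coded recipient in the sendmail call, and a None result could skip the e-mail entirely.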
January 3, 2010 4 Comments
IgniteNYC: The video!
The video of my IgniteNYC presentation is up, and has gotten a great response!
I’m working on removing the me-specific bits from the code and I’ll be posting it as open-source very soon!
December 24, 2009 10 Comments
IgniteNYC: How to Replace Yourself with a Very Small Shell Script
I recently gave a talk at IgniteNYC on How to Replace Yourself with a Very Small Shell Script.
The Ignite events are a fun blend of performance, technology, and speaking skill. Each presenter gives a five minute talk with twenty slides that auto-advance after 15 seconds.
The title of my talk is a classic geek reference (you can get the t-shirt). I’m very interested in developing automated techniques for handling the massive and growing amounts of information that we all have to deal with. I started with e-mail and twitter, both of which are easy to access programmatically (via IMAP and the Twitter API).
In the talk, I went through several of the simple and successful e-mail management scripts that I’ve developed.
I decided to talk about this project because I’m not sure where this should go next, but I got some great feedback and I’m looking forward to future work on the project!
The slides are below, and the full talk will be online soon.
November 25, 2009 12 Comments
My code is on TV (and so am I)!
FoxNY did a piece featuring me and Diana as hackers who use our technical powers for good, not evil.
There are way too few female technologists on television, and I’m happy to do what I can to show that women kick ass with code! Look for my mischievous I’m-writing-infinite-nested-loops grin in the clip where I’m programming.
If this looks like fun to you, come join us at NYC Resistor (where the segment was filmed!) for Thursday night craft nights or for one of many awesome classes.
November 10, 2009 4 Comments






