I Broke Justin.tv
I broke Justin.tv on my second day on the job. I was fresh out of college, and I’d never done any real programming work. I typed video33 instead of video3 in a console. This small mistake shut down a production server with thousands of streams. Damn! Three weeks later, I failed to remove daemon init scripts from a repurposed server, and unleashed a pack of rogue daemon processes on the site. (Rogue daemons on the loose! Unholy apocalypse!!!) Finally, a few months into the job, I wrote “break” where I meant “continue,” and caused our payments servers to shut down one by one over the course of several weeks.
These were not my brightest moments as a programmer. Still, I think that these examples show some of the best things about working at Justin.tv: 1) we work fast, 2) we push code often, and 3) we throw new engineers into real projects and expect them to do real work.
This culture has allowed us to build our own global live video CDN with a capacity of hundreds of gigabits per second, to archive and transcode all of this video, and to create the Internet’s largest live video community in just a few years, all with fewer than 20 engineers. Pushing code fast and often does have a cost, but the benefit in productivity is well worth it.
Beyond productivity, our coding culture is rewarding. When I interviewed at Justin.tv, I was also considering a bunch of larger tech companies. What stood out at JTV was the scrappiness. No RFID badges or red tape here. I was interviewed in a storage closet, between a drink cooler and a broken chair. I remember Emmett telling me during my interview that the company had more systems than engineers, and that I would thus be fully responsible for at least one critical system. This was scary at the time, but also exciting. And it’s why I chose JTV over my other offers.
It was a great way to start a career. My first assignment was to come up with a better content replication algorithm that would allow us to expand to multiple data centers. I spent a week trying different things, while learning Python and trying to get a handle on the codebase. After I had a working algorithm, I put together a simulation to show that it was better than what we had before. The first attempt to push it to production hosed the DB. But after I’d optimized the queries and re-pushed, it worked. Three weeks out of college, and I’d just rewritten the core content replication algorithm of the site.
After that first project, I went to work full-time on the video back-end. We added data centers across the US, built real-time transcoding, developed an iPhone viewer, and fended off DoS attacks. After eight months, when my teammate decided to move to Rails work, I became the lead video engineer. Finally, six months ago, Socialcam was born, and I moved to that team. We spent two months designing protocols and interfaces, working hard to bring Socialcam 1.0 to the iPhone App Store and Android Market. Socialcam is off the ground now (yay for 1M downloads!), but we’re still in early product mode, making big changes with every release.
Things are not quite as Wild West at Justin.tv as they once were. We now have QA (thanks, B!), monitoring, and deploy management. The office is nicer (interviews no longer take place in the storage closet). The site goes down less often now (editor’s note: it never goes down now). But we still have way more critical systems than engineers, and we push code to production tens of times every day. And I’m still looking forward to what’s coming.