Cameron Parkins, June 9th, 2008
Lingro is a project that aims to create an online environment that gives anyone reading a foreign-language website a quick and easy way to translate words they don’t understand. Simple in concept, yet profound in implication, Lingro (which we have blogged about twice before) uses open dictionaries and user-submitted, CC BY-SA-licensed definitions to expand its ever-growing database. We recently caught up with co-founder Paul Kastner and discussed in depth the philosophies behind Lingro, how it accomplishes what it does, how it uses CC licenses, and what its future holds.
What is Lingro’s history? How did it get started? Who is involved?
The idea to create a new kind of online dictionary that would help people learn languages was conceived by my co-founder, Artur Janc. A few years ago, Artur was practicing his Spanish by reading Harry Potter y la piedra filosofal. He had taken all the advanced Spanish courses at the university where he was studying, and like most students, had a good grasp of the grammar and core vocabulary of the language. When he started reading, he found that while he could understand the structure of the writing, there were so many words he hadn’t come across before that he was spending more time looking up words in a dictionary than actually reading!
Artur thought there must be a better way, so he built a prototype of what would become Lingro, which let him look up a word in the document he was reading just by clicking on it. Compared to the usual method of looking words up in a dictionary, this was a huge improvement in both speed and distraction. He also built a flashcard game that let him review, once he had finished reading, the words he had looked up. We realized that this tool could be useful for other people as well, so we set out to build a version that anybody could use, with many more languages than the original English dictionary. We got a lot of help along the way from Holmes Wilson, one of the people behind Miro, the free, open-source video player, as well as downhillbattle.org, which promotes a fairer music industry.
That’s only half the story – since we launched in November 2007, there’s been a really great community developing around the site. Lots of people have been contributing translations and expanding the dictionaries. We’ve also gotten a ton of feedback on the tools, which has been invaluable as we build and expand them. All the support we’ve gotten has been really flattering, and the project definitely wouldn’t be anything close to what it is now without all of the dedicated people who have contributed.
Lingro takes aim at a problem that has plagued those in web development for some time – making their sites readable by people who don’t speak the site’s native language. You have already developed functional dictionaries for a bevy of different languages. What kind of tools does Lingro employ to tackle this problem, both in pooling definitions and in making these definitions usable by web developers and site visitors alike?
There’s a big gap in language learning between what’s taught in a classroom and what you need to know to use a language in day-to-day activities. Some people will travel to a country where the language they’re learning is spoken in order to reach a sufficient level of fluency. Short of that, there are surprisingly few ways for people to get support after they’re done with their coursework.
Lingro aims to fill this gap by giving everyone quick access to translations while they’re reading foreign-language web pages and documents, right at the moment they need to know what a new word means. When you’re using Lingro to read a web page, you can click on any word in the text to bring up a translation on the same page. This eliminates the need to move away from what you’re reading and go to a separate dictionary site, or thumb through a paper dictionary.
When we started searching for content for Lingro, it was really important to us from an ethical standpoint to use open dictionaries. Language is one of the basic, common, and deeply necessary aspects of humanity, and to have the core information about it controlled by a few large publishers seems wrong on many levels. Especially in this age, where the growth of society is driven by global interaction and cross-cultural communication, the means of communicating across language barriers need to be as accessible as possible.
We set out to start assembling open dictionaries to include in Lingro, but we found they were, well, a mess. A lot of people have done some really great work building open dictionaries, but their efforts are scattered across many different sites and projects. What’s more, most of these dictionaries aren’t machine-readable. A big contributor to the success of Creative Commons licenses has been making it easy to include machine-readable data along with the work being licensed. This allows works to be easily found through search engines by someone looking to reuse them. Flickr is a great example of this – when you’re searching for a particular photo, you can specify your requirements for license terms, and the results will show only those photos that fit your needs.
Imagine how hard it would be to do this kind of search if every Flickr user had a different way of specifying the licenses of their works. This is pretty much the way it is with the open dictionaries out there. The way a dictionary for one language pair, say German to English, encodes information such as translation text, part of speech, noun gender, etc. is usually completely different from the way a dictionary for another language pair does it. This makes it nearly impossible for a project like Lingro (which would eventually like to support translating from every language to every other language) to incorporate dictionaries from multiple sources.
To overcome this, we’ve been writing software that takes in this mishmash of different dictionary formats and puts out dictionaries in a clear, simple, machine-readable format. We then load them into Lingro’s back-end so that people can access all the dictionaries through a common interface.
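For readers curious what such a conversion step might look like, here is a minimal sketch in Python. Lingro’s actual converter and output schema are not described in the interview, so the two input formats, the `Entry` record and its fields, and the function names below are all hypothetical, chosen only to illustrate mapping differently encoded entries (translation, part of speech, noun gender) onto one common, machine-readable record.

```python
import json
import re
from dataclasses import dataclass, asdict
from typing import Optional


# Hypothetical common schema: one record per translation, whatever the source format.
@dataclass
class Entry:
    source_lang: str
    target_lang: str
    headword: str
    translation: str
    pos: Optional[str] = None     # part of speech, e.g. "noun"
    gender: Optional[str] = None  # noun gender, e.g. "m", "f", "n"


def parse_tab_format(line: str, src: str, tgt: str) -> Entry:
    """Hypothetical format A: 'headword<TAB>translation<TAB>pos' (pos optional)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"unrecognized line: {line!r}")
    pos = fields[2] if len(fields) > 2 and fields[2] else None
    return Entry(src, tgt, fields[0], fields[1], pos)


# Hypothetical format B: 'headword {gender} :: translation [pos]',
# where both the {gender} and [pos] parts are optional.
FORMAT_B = re.compile(
    r"^(?P<head>.+?)(?:\s+\{(?P<gender>[mfn])\})?"
    r"\s*::\s*(?P<trans>.+?)(?:\s+\[(?P<pos>\w+)\])?$"
)


def parse_colon_format(line: str, src: str, tgt: str) -> Entry:
    m = FORMAT_B.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    # Groups that did not participate in the match come back as None.
    return Entry(src, tgt, m["head"], m["trans"], m["pos"], m["gender"])


def to_jsonl(entries) -> str:
    """Emit one JSON object per line, the same shape for every source dictionary."""
    return "\n".join(
        json.dumps(asdict(e), ensure_ascii=False, sort_keys=True) for e in entries
    )


if __name__ == "__main__":
    a = parse_tab_format("Katze\tcat\tnoun\n", "de", "en")
    b = parse_colon_format("Haus {n} :: house [noun]", "de", "en")
    print(to_jsonl([a, b]))
```

Once every source is reduced to the same record shape, loading it into a single back-end and serving it through one interface becomes a uniform step, regardless of how each original dictionary was encoded.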
The process doesn’t stop there. Once the dictionaries are loaded into Lingro, we encourage users to contribute translations to continue expanding the dictionaries. We’ve put a lot of effort into creating the Lingro dictionary builder which helps people easily add translations and definitions. Once someone has chosen a language pair they’re fluent in, the builder shows them a list of words missing from that particular dictionary, ordered by how common they are in the language (the word “the” would be near the top, while “onomatopoeia” is further down). They can also see sentences showing the words used in context to help recall the meanings. These are the same kinds of tools used by the big publishers to create their dictionaries – we’re not just opening up the dictionaries themselves, we’re opening the entire process of creating them.
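The builder’s core idea (list the words a dictionary is missing, most common first, each with a sentence showing it in context) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the simple letters-only tokenizer, and the choice of the first sentence seen as the context example are hypothetical, not a description of how Lingro’s dictionary builder actually works.

```python
import re
from collections import Counter

# Simplistic tokenizer: runs of Unicode letters (no digits or underscores).
WORD_RE = re.compile(r"[^\W\d_]+")


def missing_words_by_frequency(corpus_sentences, dictionary_headwords, limit=None):
    """Return (word, count, example_sentence) for words absent from the dictionary,
    ordered from most to least frequent in the corpus."""
    known = {w.lower() for w in dictionary_headwords}
    counts = Counter()
    examples = {}
    for sentence in corpus_sentences:
        for token in WORD_RE.findall(sentence):
            word = token.lower()
            if word in known:
                continue  # already translated; nothing for contributors to do
            counts[word] += 1
            examples.setdefault(word, sentence)  # keep the first context seen
    # most_common(None) returns every word, sorted by descending count.
    return [(w, n, examples[w]) for w, n in counts.most_common(limit)]


if __name__ == "__main__":
    corpus = ["The cat sat on the mat.", "The dog saw the cat.", "Onomatopoeia is fun."]
    for word, count, example in missing_words_by_frequency(corpus, ["cat", "mat"]):
        print(word, count, example)
```

Sorting by corpus frequency means contributors fill in the words readers are most likely to click on first, which is the ordering the builder is described as using.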
We’ve also created tools for webmasters of other sites that allow them to directly access Lingro’s dictionaries. Anyone can add Lingro’s pop-up translations to their pages, which is a really great way for sites with a big international audience to make things easier on their readers. We’ve also built a tiny search-as-you-type dictionary that webmasters can include on their pages. These tools further Lingro’s mission of making translations as accessible as possible for as many people as we can.
Part of Lingro’s core is user-submitted, CC BY-SA-licensed word definitions. Why did you choose to go with CC licensing (and specifically CC BY-SA)? Have you found CC licensing to be a good fit for what Lingro is attempting to accomplish? How do CC-licensed definitions compare to those pooled from other resources? Have there been any unique instances or anecdotes you can think of that were enabled by CC licensing?
The licensing landscape of open dictionaries is just as convoluted as their formatting. Most of the dictionaries out there were started before the creation of Creative Commons licenses. Some dictionaries, like Wiktionary, use the GNU Free Documentation License (FDL), while others, such as the XDXF project, use the GNU General Public License (GPL). Even worse, some have no formal license at all, and we’ve had to get in touch with some of the authors to ask permission to include their dictionaries in Lingro.
When we were starting out, we were fortunate enough to get in touch with Lawrence Lessig (founder of Creative Commons and generally recognized as the foremost expert on cyberlaw) about the problem. He recommended that we dual-license all new user contributions under the CC BY-SA license and the GNU FDL. This allows us to contribute our user translations back to existing projects like Wiktionary, while also making them available under the much easier-to-understand terms of a Creative Commons license.
We especially like the CC BY-SA because it ensures that the content created on Lingro will be free forever. Anyone building on the work our contributors have done will be able to freely share it with the community in the same way. This freedom is central to the creation of a commons of knowledge and allowing people to collaborate across cultures.
Since we made the decision to use the CC BY-SA, there have been some really exciting developments in the world of content licenses. Back in December, Wikimedia (the parent organization of Wikipedia and Wiktionary) announced that the board had passed a resolution to work with the Free Software Foundation on updating the GNU FDL (used by Wikimedia) to allow for migration of their content to the Creative Commons BY-SA. This is an important step because the FDL was never designed for projects like Wiktionary; it was written with the intent of licensing software manuals. The CC BY-SA, in contrast, was designed for projects just like Wikipedia (and Lingro as well), which have a strong emphasis on collaboration. The move to the CC BY-SA license means that people will have a much easier time knowing their rights and restrictions when reusing Wikipedia content.
That said, I’ve been somewhat disappointed with the lack of progress since then. There was a good deal of fanfare surrounding the original announcement, but it’s been more than half a year since then and we’ve barely heard another word about it from the organizations involved. Great things can happen when organizations which share such similar philosophies come together for a common goal, so I hope Wikimedia, the Free Software Foundation, and Creative Commons all continue their collaboration to make this a reality!
Lingro always seems to be adding new features and functions to its already long list of amenities. What is next for Lingro? Is there anything else you’d like our readers to know?
We’re working on some really cool study tools which will help people review the words they’ve translated. On the educational side of things, one of the unique aspects of Lingro is the ability to provide a personalized learning experience, which is really necessary at the more advanced levels of language learning. Lingro keeps track of the words you look up while reading so that after you’re done with the web page or document, you can use study tools, such as the flashcard game and sentence history page, to help you review these words. We’ve got some big improvements and additions to this part of the site planned for the coming months!
Another part of the project we’re developing very actively is the set of tools for webmasters and language-learning sites. We’re creating easy ways to include Lingro’s translations both as full-featured pop-ups and as dictionary widgets. We’re also making a flexible dictionary API so that people can come up with new ways of reusing the content. What we’re shooting for is reaching a point where anyone looking to include translation capabilities on their site can decide to use open dictionaries not just because of ethical considerations, but because of the high quality and ease of use of the dictionaries.
We’re also working with volunteers to add more dictionaries to Lingro, especially for widely spoken languages such as Chinese. There’s a lot of political tension between the West and China right now, and as with all international disputes, one of the keys to resolution is communication. By bringing together tools and dictionaries to help people communicate across language and cultural boundaries, we think Lingro is able to make the world a better place for all of its inhabitants.
If you would like to get involved with the project, please feel free to e-mail me at paul AT lingro DOT com. We’re always looking for more volunteers to help expand the range and depth of the dictionaries available through Lingro.