AI alert for summarizers

Monday 19th September 2011
Taylor Berg-Kirkpatrick and fellow EECS Ph.D. student Mohit Bansal. Courtesy: Rachel Shafer Photo.

Gaberlunzie is devoutly in love with the Berkeley Summarizer. It can actually spell!! He was being idle. Well, a slight amount of sunshine, a bank holiday and a missing editor were all he needed to DEMO IT!

DEMO IT (on the Berkeley Edu web page).

Slow off the mark, he opted first to try the word "summarize" in features.

To keep you amused while you wait breathlessly ...

...little texts appear...

Gathering links...

Scraping article.... 

Found 14.... 


Parsing ....


Then, just as you start to hyperventilate, this appears:-

Quite amazing! Gaberlunzie slyly had a second try for Summarizer and had his knuckles rapped.

Then he couldn't resist seeing what the approach was to "summarising." 

Don't you just think that divine? It certainly spells better than Gaberlunzie does!!

The originators are indeed Taylor Berg-Kirkpatrick and fellow EECS Ph.D. student Mohit Bansal, 25, who have worked on a project called “Automatic Summarization of Mobile Search.”

The tool scans hundreds of documents online and automatically compiles a short summary from the text. The clean, simple summary fits onto a mobile phone screen. Users can lengthen or shorten it, scroll through multiple items for top news, or view summaries based on a search query.

In May, Qualcomm selected Bansal and Berg-Kirkpatrick as one of eight teams to receive its highly competitive Qualcomm Innovation Fellowship, which comes with a one-year, $100,000 grant.

“We liked their project because it was closely coupled with their PhD research, but they had a great understanding of potential applications and how users could benefit 5-10 years down the road,” says John Smee, director of engineering for Qualcomm’s corporate R&D group and one of the competition’s judges.

Bansal and Berg-Kirkpatrick specialise in artificial intelligence (AI), working in EECS associate professor Dan Klein’s natural language processing and machine learning group.

Their project applies AI methodologies to what is essentially the job of a human editor or researcher, only much more quickly: it handles 500 documents in a few seconds. To do that job, it needs to both research and rewrite.

Berg-Kirkpatrick began by building a summarization model that uses the cutting-plane algorithm. To build a democratic summary, one that covers the most common ideas found in the articles, Berg-Kirkpatrick’s model scans documents looking for intersections—repeated sentences, phrases or series of words—and pulls the common points. For example, say one of the main points identified is “Lindsay Lohan jailed.”
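The idea of looking for intersections can be pictured with a minimal sketch. It assumes "series of words" means plain word n-grams and simply counts how many documents each one appears in. The function names, the three-word window and the sample headlines are all invented for illustration; this is not the researchers' code.

```python
from collections import Counter

def ngrams(text, n):
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrases(documents, n=3, min_docs=2):
    """Count how many documents contain each n-gram; keep the shared ones."""
    doc_freq = Counter()
    for doc in documents:
        # ngrams() returns a set, so each document counts a phrase only once.
        doc_freq.update(ngrams(doc, n))
    return {phrase: count for phrase, count in doc_freq.items() if count >= min_docs}

# Made-up headlines, for illustration only.
articles = [
    "Lindsay Lohan jailed after court hearing",
    "It was reported that Lindsay Lohan jailed on Tuesday",
    "Judge orders Lindsay Lohan jailed for ninety days",
]
common = shared_phrases(articles)
print(common)
```

With these toy headlines, the only three-word phrase shared by more than one article is the Lindsay Lohan one, which is the kind of common point the model is described as pulling out.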

This figure details how the summarizer builds a summary of the document collection by eliminating unnecessary sentences, phrases and words. Those cuts are balanced against the values of the concepts covered by the resulting summary. Both costs and values are learned automatically from human-written summaries in a structured learning algorithm.
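The balancing act described in the caption can be sketched roughly as follows. This sketch makes several simplifying assumptions: it cuts whole sentences only (not phrases or words), it tries every combination by brute force rather than using the researchers' optimization method, and the values, costs and word budget are invented numbers rather than learned ones. All names are hypothetical.

```python
from itertools import combinations

def summary_score(kept, sentences, concept_value, cut_cost):
    """Value of the distinct concepts covered minus the cost of the sentences cut."""
    covered = set()
    for i in kept:
        covered |= sentences[i]["concepts"]
    value = sum(concept_value[c] for c in covered)
    cost = sum(cut_cost[i] for i in range(len(sentences)) if i not in kept)
    return value - cost

def best_summary(sentences, concept_value, cut_cost, max_words):
    """Brute-force search over sentence subsets that fit the word budget."""
    best, best_score = (), float("-inf")
    for k in range(len(sentences) + 1):
        for kept in combinations(range(len(sentences)), k):
            if sum(sentences[i]["words"] for i in kept) > max_words:
                continue  # too long for the screen
            score = summary_score(kept, sentences, concept_value, cut_cost)
            if score > best_score:
                best, best_score = kept, score
    return best, best_score

# Invented toy data: three candidate sentences and the concepts they cover.
sentences = [
    {"words": 6, "concepts": {"lohan jailed"}},
    {"words": 8, "concepts": {"lohan jailed", "ninety days"}},
    {"words": 5, "concepts": {"it was reported"}},
]
concept_value = {"lohan jailed": 3.0, "ninety days": 2.0, "it was reported": 0.1}
cut_cost = [0.5, 0.5, 0.2]

kept, score = best_summary(sentences, concept_value, cut_cost, max_words=10)
print(kept, score)
```

A concept counts once however many kept sentences mention it, so the search favours the sentence that covers two valuable concepts over the one that repeats a single concept. Brute force is only workable for a handful of sentences; it is used here purely to keep the sketch short.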

Next step is to add mathematical “features,” qualities such as how often an idea appeared across the document collection. A programmer designs these features in order to indicate to the system which aspects of a summary may be important. 

An important feature of the phrase “Lindsay Lohan jailed” is that it occurs in many of the news articles within the document set.
Special features can even be derived by measuring the frequency and context of words and phrases on the Internet.

The model scans the web and finds that “Lindsay Lohan jailed” occurs only moderately often on the web as a whole, yet occurs very often within the document collection, and therefore might be especially important for the summary.

“Lindsay Lohan jailed,” the model decides, should go in the summary. On the other hand, the concept “It was reported” occurs everywhere on the web, and hence the model recognises a throwaway unimportant phrase. These web-based features, based on Bansal’s Ph.D. work, will better estimate which concepts should be included in the summary.
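One simple way to picture such a web-based feature is to compare how common a phrase is inside the document collection with how common it is on the web at large. The sketch below uses a log-ratio for that comparison, which is an assumption made here for illustration, not necessarily the formula the researchers use. The shares are invented numbers, not measurements.

```python
import math

def distinctiveness(collection_share, web_share):
    """Log-ratio of a phrase's share of collection documents to its share of web pages.

    A large positive score means the phrase is far more common in the
    collection than on the web in general.
    """
    return math.log(collection_share / web_share)

# Invented shares: fraction of documents or pages containing each phrase.
lohan_score = distinctiveness(collection_share=0.6, web_share=0.001)
reported_score = distinctiveness(collection_share=0.7, web_share=0.5)

print(round(lohan_score, 2), round(reported_score, 2))
```

Under these made-up numbers both phrases are common in the collection, but only “Lindsay Lohan jailed” stands out against the web background, so it gets by far the higher score.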

So far, the features might produce a good summary or an error-laden one. It’s up to the learning algorithm to decide how to value features. The researchers use machine learning to train the model by giving it sample article sets and human-written summaries.

Based on these examples, the model learns how to optimally set the feature values. When presented with a new article set, the model makes generalisations from the samples to compile a new summary.
“Now you have a system that can do it forever,” says Berg-Kirkpatrick.
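The learning step can be illustrated with a much simpler stand-in than the structured learning algorithm the researchers describe: a plain perceptron that learns one weight per feature from examples labelled by whether the concept appeared in a human-written summary. The two features (share of collection documents, share of web pages) and every number below are invented for illustration.

```python
def score(weights, features):
    """Weighted sum of the feature values."""
    return sum(w * f for w, f in zip(weights, features))

def predict(weights, features):
    """True if the model would put the concept in the summary."""
    return score(weights, features) > 0

def train_perceptron(examples, n_features, epochs=10):
    """Learn one weight per feature from (features, in_human_summary) pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, label in examples:
            if predict(weights, features) != label:
                # Nudge the weights towards the correct answer.
                step = 1.0 if label else -1.0
                weights = [w + step * f for w, f in zip(weights, features)]
    return weights

# Invented training data: [collection share, web share] -> in the human summary?
training = [
    ([0.6, 0.001], True),   # e.g. "Lindsay Lohan jailed"
    ([0.7, 0.5], False),    # e.g. "It was reported"
    ([0.5, 0.002], True),
    ([0.4, 0.6], False),
]
weights = train_perceptron(training, n_features=2)

# Two unseen concepts with made-up feature values.
new_important = predict(weights, [0.55, 0.003])
new_throwaway = predict(weights, [0.6, 0.4])
print(weights, new_important, new_throwaway)
```

On this toy data the learned weight for the web share comes out negative, which matches the intuition in the article: phrases that occur everywhere on the web are pushed out of the summary.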

The EECS researchers say they will spend the next year optimising the summariser to accommodate a wide range of news topics and scaling up the model to accommodate thousands, even hundreds of thousands of documents.

In the future, they hope their project will be adopted by industry, but they themselves would like to remain in academic research. 

If anyone wants more summarizers, Gaberlunzie found that Copernic (a web and desktop search engine he lost many moons ago) has one, as do Tools4Noobs and Intellexer, if you want to play comparisons.

