Want to know how Google is going to change your life? Then stop by
the "Ouagadougou" conference room on a Thursday. In this conference
room in Mountain View, California, dozens of engineers, product
managers, and executives gather to discuss how to make Google's
search engine more intelligent. This year, Google will make about
500 improvements to its legendary search algorithm, and each one
must be approved through a decision like this.
The weekly "search quality meeting" can affect the Google results
for any query, whether "Samsung SF-755p Printer", "MySpace Layout
Code", or even "the capital of Burkina Faso", a capital that happens
to share its name with this conference room. Udi Manber, Google's
head of search since 2006, chairs the meeting. The meeting takes up
11 proposed improvements and also discusses the results of several
months of testing in different countries and languages. A screen
shows, in two columns, the results the same query returns before and
after a change. During the discussion of the results for the query
"guitar voice simulation", Manber exclaimed: "I searched for that
too!"
You might think that Google, having dominated the search engine
market for as long as a decade, could sit back and relax. After
all, Google's share of the search market is as high as 65%, and the
word "Google" has become synonymous with search. But just as Google
is unwilling to rest on its laurels, its competitors are unwilling
to throw in the towel. For years, Google has used its mysterious,
seemingly all-knowing algorithm to "organize the world's
information." But over the last five years, many companies have
begun to challenge one of Google's core beliefs.
Google believes that a single search engine, through technical
innovation and continuous improvement, can satisfy any search
request. Facebook was an early challenger to that belief, on the
premise that some people would rather get information from a friend
than from an anonymous formula. Twitter can analyze a constantly
updated stream of messages, which has made "real-time search" a
reality: mining discussions and chatter as they happen.
The review site Yelp uses ratings of businesses contributed by the
public to help people find restaurants, dry cleaners, babysitters,
and similar services. None of these rising stars threatens Google
on its own, but together they point to a more open and chaotic
future for the search industry: one dominated not by a single
search engine but by a rich diversity of services.
The challenge from Bing
However, the biggest threat to Google is Microsoft's Bing. With a
name reminiscent of discovery, of the legendary American singer
Bing Crosby, and of the Bada Bing nightclub in "The Sopranos", this
revamped and rebranded search engine won favorable reviews when it
launched in June last year. The Wall Street Journal called it "more
attractive than Google." A new look and a 1 million advertising
campaign lifted Microsoft's share of the U.S. search market from 8%
to 11%, and if regulators approve the deal that would make Bing the
search provider for Yahoo, that share will more than double.
The Bing team has focused on the needs that Google's algorithm
cannot meet. For example, Google excels at searching the public web
but cannot track constantly changing flight schedules and ticket
prices in real time. So Microsoft bought Farecast, a site that
tracks changes in ticket prices and predicts fares based on those
changes. Microsoft has since added Farecast's technology to Bing's
search results. It has made similar acquisitions in other areas
where it believes Google's algorithm has no advantage, such as
health and shopping.
Even the Bing team acknowledges that when it comes to returning
useful information for a set of search terms, Google is still far
ahead. But they believe that if Bing can offer expertise in certain
areas, users will gradually get used to turning to Bing for
specific kinds of searches. Brian MacDonald, Microsoft's vice
president of core search, said: "The algorithm is critical to a
search engine, but it is not everything, just as you do not buy a
car only for its engine."
Google is still the most "intelligent" search engine
An interesting example, the query "mike siwek lawyer mi", shows
Google's advantage over Bing.
Amit Singhal, 40, is a soft-spoken chief engineer at Google who won
an award for rewriting Google's search engine in 2001. He types
these words into the Google search box and hits the Enter key.
Almost instantly the results appear. At the top of the page is a
link to Michael Siwek, a lawyer in Grand Rapids, Michigan.
This looks like an ordinary search, one of the tens of thousands
that Google handles every day. But the search is actually quite
complex and can trip up some search engines. Enter the same words
into Bing and the first result is a National Football League roster
from past years that includes a player named Lawyer Milloy. The
results that follow contain nothing about the lawyer Siwek.
This contrast shows the power, one could even say the intelligence,
of Google's algorithm, which has been achieved through repeated
revision. Google seems to have a magical ability to interpret what
users want, no matter how unusual the search or how badly it is
misspelled. Google calls this ability search quality, and it has
spent years refining its algorithm to produce accurate search
results.
Now I am sitting with Singhal in Google's Building 43, because
Google has given me an unprecedented opportunity to learn how it
ensures search quality. The message behind the invitation is clear:
you might think the algorithm is just an engine, but only when its
veil of secrecy is lifted can you see how powerful it is.
Innovative start: PageRank
Google's algorithm begins with PageRank, which Larry Page developed
in 1997 as a graduate student at Stanford University. Page's
innovation was to rate pages by the number and importance of the
links pointing to them, that is, to use the collective wisdom of
the web to determine which sites are most useful. As Google quickly
became the most successful search engine on the Internet, Page and
Google's other founder, Sergey Brin, credited the simple concept of
PageRank as Google's most fundamental innovation.
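As an illustration of rating pages "by the number and importance of
the links" pointing to them, here is a minimal sketch of the widely
published power-iteration form of PageRank. It is not Google's
production code. The toy link graph, the damping factor of 0.85, and
the iteration count are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rate pages by the number and importance of their inbound links.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative defaults.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three other pages.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
```

In this toy graph the page with the most inbound links ends up with
the highest score, and the page it links to inherits much of that
importance even though it has only one inbound link.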
But that is not the whole story. Manber said: "People trust
PageRank because it is recognizable, but providing the most useful
results also requires other techniques." These techniques rely on
signals, or contextual clues, so that for any query the search
engine can rank the most useful results first.
Web search is a multistep process. First, Google's crawlers fetch
the content of every site they can access. The data is then broken
down into an index (organized by word, like the index of a book) so
that any page can be found by its content. Whenever a user types a
query, Google looks up the relevant pages in the index and returns
a list that can contain up to several million pages. The most
complicated part is ranking that list, that is, deciding which
pages should appear at the top.
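The crawl, index, and lookup steps can be sketched with a tiny
inverted index. This is only a sketch: the page URLs and text are
made up, and matching every query word exactly is a simplification
of what a real search engine does.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawled pages.
pages = {
    "example.com/a": "Michael Siwek lawyer in Grand Rapids",
    "example.com/b": "lawyer directory for Michigan",
    "example.com/c": "guitar tuner reviews",
}
index = build_index(pages)
matches = lookup(index, "siwek lawyer")
```

The lookup only produces the unordered candidate set. Deciding the
order of that set is the ranking problem the article goes on to
describe.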
This is where context comes in handy. Every search engine uses
contextual signals, but none has introduced as many, or applied
them as freely, as Google. PageRank is itself a signal: an
attribute of a page (its importance relative to other pages) that
helps determine its relevance to a query. Some of these signals now
seem obvious.
From the start, Google's algorithm has paid special attention to
page titles, so the title became an important signal of relevance.
Another important technique is anchor text, the visible text of a
hyperlink. As a result, "when you search, the search engine can
return the right page even if that page does not contain the
keywords you are looking for." That is the view of Scott Hassan, an
early Google architect who worked with Larry Page and Sergey Brin
at Stanford. Signals added later include freshness (for some
queries, newer pages are more valuable than older ones) and
location (Google knows the searcher's approximate geographic
position and ranks local information near the top). Google
currently uses more than 200 signals to help determine the ranking
of search results.
Google's engineers discovered that some of the most important
signals could come from Google itself. PageRank builds a popularity
vote into the search engine: thousands of websites democratically
decide which sites to link to. But Singhal said that Google's
engineers also draw on another democracy: the hundreds of thousands
of people who search on Google. The data users generate as they
search has proved equally valuable. It includes which results they
click, how they change their keywords when they are not satisfied,
and how query keywords relate to geographic location. The most
direct example is what Google calls "Personalized Search", an
optional feature that uses a user's search history and location to
determine what he or she wants to find (users must be signed in to
a Google account to use it). More generally, Google uses the large
amounts of data it collects to support its algorithm. That data
gives Google a deep understanding of how people search, so it can
interpret the hidden intent behind complex queries.
"Hot dog" and "boiled dog"
Take the way Google determines synonyms as an example. Singhal
said: "We made an interesting discovery early on: users change the
keywords in their queries. For example, someone will search for
'dog' and then change it to 'puppy', so the search engine learns
that 'dog' and 'puppy' may be interchangeable. The engine also
learns that when you boil water, the water becomes hot. We were
relearning semantics from humans, and that was a big step
forward."
But there were obstacles. Google's synonym system knew that "dog"
is similar to "puppy" and that "boiling" water is "hot" water. But
it also concluded that a "hot dog" (a sausage in a bun) is the same
as a "boiling puppy". The problem was solved in 2002 with the help
of a theory from Ludwig Wittgenstein about how context determines
the meaning of words. As Google crawls and stores hundreds of
millions of documents and web pages, it analyzes which words appear
close to one another. Pages that contain "hot dog" usually also
contain "bread", "mustard", and "baseball", and nothing about
actual dogs. This helps the search engine understand what "hot dog"
and tens of thousands of other terms mean.
Singhal said: "Now the search engine knows that 'bio' in 'Gandhi
bio' is short for 'biography', and that in 'bio warfare' it is
short for 'biological'."
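The idea of analyzing which words appear near one another can be
sketched as a co-occurrence count. This is a toy illustration, not
Google's method: the four documents, the three-word window, and the
candidate expansions are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def cooccurrence(documents, window=3):
    """Count how often each pair of words appears within `window` words."""
    counts = defaultdict(Counter)
    for doc in documents:
        words = doc.lower().split()
        for i, word in enumerate(words):
            for other in words[max(0, i - window): i + window + 1]:
                if other != word:
                    counts[word][other] += 1
    return counts


def expand(context_word, candidates, counts):
    """Pick the candidate seen most often near the context word."""
    return max(candidates, key=lambda c: counts[c][context_word])


# Hypothetical crawled documents.
docs = [
    "a new biography of gandhi describes his early life",
    "gandhi biography and timeline",
    "biological warfare treaty signed",
    "the history of biological and chemical warfare",
]
counts = cooccurrence(docs)
senses = ["biography", "biological"]
gandhi_bio = expand("gandhi", senses, counts)
bio_warfare = expand("warfare", senses, counts)
```

With these documents, "bio" next to "gandhi" resolves to
"biography" and "bio" next to "warfare" resolves to "biological",
because those are the pairs that actually occur together.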
Throughout Google's development, the company has kept adding new
ranking signals while taking care not to upset the core user
experience. Every few years it makes a major change to the system
(a bit like a new version of Windows), which is big news in
Mountain View but goes largely unnoticed everywhere else.
Singhal said: "Our job is to make fundamental changes to an
aircraft while it is flying at 1,000 kilometers per hour at an
altitude of 30,000 feet." In 2001, in response to the rapid growth
of the Internet, Singhal essentially rewrote Page and Brin's
algorithm so that Google could add new signals quickly. (One of the
new signals distinguished commercial pages from non-commercial
ones, giving better results to shoppers.) That same year, an
engineer named Krishna Bharat, reasoning that links from
authoritative websites should carry more weight, designed a
powerful signal that gives those links greater credibility (it
became Google's first patent). The most recent revision, code-named
"Caffeine", overhauled the whole system, making it easier for
engineers to add new signals.
How Google identifies semantics
Google is known for encouraging this kind of innovation. The
company holds an annual internal "crazy search ideas" fair to
promote ideas that are strange but may have useful applications.
Most of the time, though, improvement is a hard slog that takes
determination and a willingness to face setbacks. One failed search
has become a legend: in 2001, Singhal learned that a query for
"audrey fino" did not return the desired content, only Italian web
pages praising Audrey Hepburn, because "fino" means "fine" in
Italian. Singhal said: "We knew that audrey fino was a person's
name, but our system was not that smart."
That failure led Singhal to spend years trying to improve Google's
results for names, which account for as much as 8% of all searches.
To solve the problem, he had to master "bi-gram splitting", that
is, dividing a multiword query into separate units. For example,
"new york" forms a bi-gram that refers to New York. But there are
three-word cases too, such as "new york times", which means the
newspaper and is obviously not the same thing. And if the user
enters "new york times square", the meaning changes again, to Times
Square in New York. Humans make these distinctions easily, but
Google has no human in the loop and must rely on an algorithm.
The "mike siwek" search shows how Google solves this problem.
Singhal entered a command to expose the underlying code, and we
could see how the signals rank the search results: the bi-gram
analysis determines that "mike siwek" is a name, "lawyer" is a
synonym, and "mi" is a place name. Singhal said: "Deconstructing it
from an engineer's point of view, the system splits up these words
and finds that 'lawyer' is not a family name and 'siwek' is not a
middle name. At the same time, 'lawyer' is not a town in Michigan,
so it is treated as a synonym for 'attorney'."
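One way to picture the splitting of a query into units is a small
dynamic program that picks the segmentation with the highest total
phrase score. This is a sketch of the general idea only: the phrase
table and its scores are hypothetical, and the article does not
describe how Google actually scores candidate units.

```python
def segment(query, phrase_scores):
    """Split a query into the units with the highest total score.

    Single words are always allowed (scoring 0 unless listed);
    multiword units are allowed only if they appear in `phrase_scores`.
    """
    words = query.lower().split()
    n = len(words)
    # best[i] holds (score, units) for the first i words.
    best = [(0.0, [])] + [None] * n
    for end in range(1, n + 1):
        candidates = []
        for start in range(end):
            unit = " ".join(words[start:end])
            if end - start == 1:
                score = phrase_scores.get(unit, 0.0)
            elif unit in phrase_scores:
                score = phrase_scores[unit]
            else:
                continue
            prev_score, prev_units = best[start]
            candidates.append((prev_score + score, prev_units + [unit]))
        best[end] = max(candidates, key=lambda c: c[0])
    return best[n][1]


# Hypothetical phrase scores, e.g. derived from how often phrases occur.
scores = {"new york": 3.0, "new york times": 4.0, "times square": 3.0}
two_words = segment("new york", scores)
three_words = segment("new york times", scores)
four_words = segment("new york times square", scores)
```

With these scores the three-word query stays together as the
newspaper, while the four-word query splits into "new york" and
"times square", matching the distinction described above.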
This is just some of the valuable knowledge Google has gained from
search. A stone can be a "rock", a "stone", or a "boulder". If a
user enters "rokc", Google knows that he is looking for "rock". But
if "rokc" is preceded by "little", Google knows that the query
refers to the capital of Arkansas. The abbreviation for "Arkansas"
is "ark", spelled the same as Noah's Ark, but Google can tell the
two apart. Singhal said: "The most important thing in search is
understanding the user's intent. You are not matching words; you
are matching meaning."
Google keeps improving. Recently, Google engineer Maureen Heymans
found a problem with the results for "Cindy Louise Greenslade".
When a user entered those words, the algorithm assumed it should
look for a person named Cindy Louise, so the website of a
psychologist in Garden Grove, California, named Cindy Louise
Greenslade did not make the top ten results. Heymans found that
this was because the name was often abbreviated as "Cindy L.
Greenslade". She said: "Our search engine should be a little
smarter." So she added a signal that looks for middle initials. The
correct result now ranks fifth.
Innovative,
At any given time, Google's efficient testing system is running
dozens of such improvements. Google employs hundreds of people
around the world who sit at their home computers and judge whether
the changed results are better or worse. But Google also has a much
larger test group: the tens of thousands of Google users who have
unwittingly joined this long-running quality experiment.
When engineers want to test a tweak, they run the new algorithm on
a small, random group of users, while the vast majority of users
serve as the control group. There are too many changes to test, so
Google has given up on testing only one tweak at a time. Search
quality engineer Patrick Riley said: "On most searches, you are in
several experimental groups and control groups at the same time."
Then he corrected himself: "Actually, every search is part of some
experiment, so every time a user searches on Google, they are a
lab rat."
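Running many experiments at once, each on a small random slice of
users, is commonly done by hashing a user identifier together with
the experiment name. The sketch below shows that common approach
under stated assumptions: the article does not say how Google
assigns users, and the experiment names and fractions here are
invented.

```python
import hashlib


def in_experiment(user_id, experiment, fraction):
    """Deterministically place about `fraction` of users in an experiment."""
    key = f"{experiment}:{user_id}".encode()
    digest = hashlib.sha256(key).hexdigest()
    # Turn the first 32 bits of the hash into a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < fraction


def assignments(user_id, experiments):
    """Return the group this user falls into for every running experiment."""
    return {
        name: "experiment" if in_experiment(user_id, name, fraction) else "control"
        for name, fraction in experiments.items()
    }


# Hypothetical experiments and the share of users each one receives.
running = {"middle-initial-signal": 0.05, "fresher-results": 0.02}
groups = assignments("user-12345", running)
```

Because each experiment name is hashed separately, one user can be
in the experimental group of one test and the control group of
another at the same time.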
This flexibility, the ability to add signals, change the code, and
test immediately, is why the Google team says it can respond to any
challenge from Bing, Twitter, or Facebook. In fact, in the past six
months Google has made more than 200 improvements, some of which
seem to imitate (but not surpass) its competitors. (Google says
that is a coincidence and that it has been adding new features for
years.) One of them is real-time search.
A few months ago, Page said that Google should be searching the
entire web every second, so this feature was expected. When a user
searches for a time-sensitive topic, a "latest results" box appears
among the ten blue links on the Google results page. The scrolling
box shows the latest content from news media, blogs, Twitter, and
other sources. Here too, Google uses signals to ensure that the
most useful tweets (messages posted on Twitter) appear in the
real-time box.
In addition to real-time search, Google has introduced a new
feature called "Goggles", which lets mobile phone users submit a
photograph as a search request. Google has long been trying to make
search something people can do at any moment, and "Goggles" is part
of that effort. With cameras and voice recognition, smartphones
will become your eyes and ears. With the right signals, anything
can be turned into a search request.
The root cause of success: hiring the right people
Google's powerful computing capacity and bandwidth give it an
indisputable advantage. Some claim that this advantage makes it
impossible for start-ups to challenge Google. However, Manber said
that Google leads not just because of its infrastructure. He said:
"The most, most, most important factor is that we hired the right
talent."
By any standard, Qi Lu counts as the right kind of talent. He is a
48-year-old computer scientist. Manber, who worked with Lu at
Yahoo, said: "I have the highest respect for him." But early last
year, Lu joined Microsoft to lead the Bing team. Asked about his
mission, the slight Lu, dressed in jeans and a T-shirt, chose his
words carefully and answered softly: "We must always remember that
this is a long journey, and that is extremely important." His eyes
had an "I am not going anywhere" look, like Uma Thurman's in the
movie "Kill Bill".
Microsoft, which won the browser war in the past decade, seems to
believe that in search, revenge is never too late, even after ten
years, because it is convinced that users need something more than
Google's algorithm. Harry Shum, Microsoft's head of search
development, said: "If we do not change the algorithm, it will be
hard to compete with the current winner, and we do intend to
improve the algorithm."
However, even if Bing improves its algorithm, Google is likely to
make the same changes. That is what makes Google such a formidable
opponent: it has built a machine nimble enough to absorb any
innovation that might threaten it, while delivering search results
of a quality its opponents cannot match. Anyone can invent a new
way to buy air tickets, but only Google knows how to find Mike
Siwek.
(Chin-liang)
Google's algorithm is a work in progress, constantly adjusted and
improved to provide higher-quality search results. The following
are some of the major additions and changes since the release of
PageRank. - Steven Levy
Attached: Google Search Events
September 1997: Backrub Search Engine
The Backrub search engine, which had been running on Stanford
University's servers for almost two years, is renamed Google. Its
groundbreaking innovation: ranking search results by the quantity
and quality of the links pointing to a site.
August 2001: New Algorithm
The search algorithm is completely rewritten to make it easier to
add new ranking criteria.
February 2003: Local Connectivity Analysis
This feature gives links from authoritative sites more weight. It
also becomes Google's first patent.
Summer 2003: Fritz
This project lets Google update its index continuously, rather than
in batches.
June 2005: Personalized Results
Users can choose to have Google analyze their search behavior in
order to provide personalized results.
December 2005: Bigdaddy
The engine has been updated so that it can crawl Web content
more widely.
May 2007: Universal Search
Building on Image Search, Google News, and Book Search, Universal
Search lets users get different types of media on the same search
results page.
December 2009: Real-time search
Updates from Twitter and blogs are displayed in real time.