I really like this programming challenge from Yelp. You know that "Search for" box on Yelp? You run a search and get back a bunch of review fragments. The problem asks you to write production-level code for the algorithm that, given the search terms, extracts the most meaningful fragment from a review to display in the search results: the yellow highlighted bits on the right.
What I love about this problem is that it asks for production-level code, and that it is much deeper than it might appear at first glance. There are also many different ways to skin this cat, and there is no "right answer".
When I implemented this, I had to see how my code compared with Yelp's production code, so the source code (see link below) includes some test code that compares the two. In a number of cases I ended up preferring the results my code produced. Here's one example. What do you think?
Here is the original review in its entirety:
I've only been here once, but that one visit was enough for me to know that the pizza here is awesome. It's as good as it gets. It's just sad that they don't have a joint here in Cupertino, but then again I'd be willing to drive to SF from time to time just to enjoy their pizza. The time I went there, the restaurant was bursting with people. We had to wait for at least half an hour or even more, can't really recall. But let me tell you, it was worth the wait. We had the deep dish pizza, the thin crust pizza, the chicken wings and as we plow through the first round of pizzas, we realize that we want more, so there goes the second round. Another round of the deep dish spinach-feta cheese-mushroom pizza and other yummy stuff, we were good to go. I highly recommend the spinach-cheese-mushroom deep dish pizza. I would definitely come back and try their other pizzas. Aaaaahhhhh who's down to go grab some little star pizzas with me right now!!! haha four thumbs up (including my two toes hahaha) That's how good this place is! Oh and make sure you go with the right company - that will make the experience a golden one =D
Here is the review fragment that Yelp currently produces:
can't really recall. But let me tell you, it was worth the wait. We had the deep dish pizza, the thin crust pizza, the chicken wings and as we plow through the first round of pizzas, we realize that
Here is the review fragment that my code produces:
We had the deep dish pizza, the thin crust pizza, the chicken wings and as we plow through the first round of pizzas, we realize that we want more, so there goes the second round. I highly recommend the spinach-cheese-mushroom deep dish pizza.
Which review fragment do you like better?
Download a ZIP containing the source code to my solution to this problem.
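One thing the comparison above shows is that a fragment built from whole sentences can read better than a fixed-size window that starts and stops mid-sentence. Here is a minimal sketch of that general idea. It is not the code from the ZIP. The function name `pick_sentences`, the `max_sentences` parameter, the naive sentence splitter, and the scoring rule (count of distinct query words in a sentence) are all assumptions made for illustration.

```python
import re


def pick_sentences(review, query, max_sentences=2):
    """Return up to max_sentences whole sentences that best match the query.

    Hypothetical helper for illustration only: a sentence's score is the
    number of distinct query words it contains.
    """
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    # Naive split: break after '.', '!' or '?' when whitespace follows.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]

    scored = []
    for index, sentence in enumerate(sentences):
        words = {w.lower() for w in re.findall(r"\w+", sentence)}
        hits = len(terms & words)
        if hits:
            scored.append((hits, index, sentence))

    # Highest score first; earlier sentences win ties.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Put the chosen sentences back into reading order.
    return " ".join(s for _, _, s in sorted(top, key=lambda item: item[1]))


if __name__ == "__main__":
    text = ("I like fish. We had the deep dish pizza. "
            "The wait was long. I recommend the deep dish.")
    print(pick_sentences(text, "deep dish pizza"))
```

A real solution would probably also want a length cap, smarter sentence detection (abbreviations, "!!!", emoticons), and some weighting for how close together the matched words are.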
Here is the original problem statement:
For yelp search we need to highlight document snippets that match a query. For
example a yelp search for [deep dish pizza] returns documents that match the
query as well as highlights that try to show why the document is relevant.
c=San+Francisco%2C+CA ). Note that highlights (1) highlight all the words in
the query and (2) are not necessarily the full document (they are
instead only a relevant snippet).
For this question you will write a function that finds the most relevant snippet for
a document and highlights all the query terms that appear in the snippet
(basically the highlights you saw on the linked search page).
It is up to you to define what constitutes a good snippet and how big the snippets
should be.
Indicate highlights by surrounding the text to be highlighted with [[HIGHLIGHT]]
and [[ENDHIGHLIGHT]].
For instance "Little star's deep dish pizza sure is fantastic." would look like "Little
star's [[HIGHLIGHT]]deep dish pizza[[ENDHIGHLIGHT]] sure is fantastic."
One highlighting example might be:
highlight_doc("I like fish. Little star's deep dish pizza sure is fantastic. Dogs are
funny.", "deep dish pizza") -> "Little star's [[HIGHLIGHT]]deep dish
pizza[[ENDHIGHLIGHT]] sure is fantastic."
Note that your highlighter doesn't have to have the exact same result on this
example, since *you* are defining what a good snippet is.
The solution can be in any language (but python is preferred). Attach your native
source files (.py, .java, .cpp, etc...). Hint: Write your code as if your peers may be
maintaining your code. Your code should be as if it were for a production
environment. And don't forget to include unit tests!
I would like a function with the signature similar to the following:
def highlight_doc(doc, query):
    """
    Args:
        doc - String that is a document to be highlighted
        query - String that contains the search query
    Returns:
        The most relevant snippet with the query terms highlighted.
    """
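To make the problem statement concrete, here is a bare-bones sketch of a function with that signature. It is a starting point under stated assumptions, not my full solution and not Yelp's algorithm: it assumes the snippet is the single sentence containing the most distinct query words, uses a naive sentence splitter, and wraps each run of adjacent query words in one pair of markers.

```python
import re

HIGHLIGHT_START = "[[HIGHLIGHT]]"
HIGHLIGHT_END = "[[ENDHIGHLIGHT]]"


def highlight_doc(doc, query):
    """Return the best-matching sentence of doc with query terms highlighted.

    Assumption for this sketch: "best" means the sentence containing the
    most distinct query words; the first such sentence wins ties.
    """
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    # Naive split: break after '.', '!' or '?' when whitespace follows.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    if not terms or not sentences:
        return ""

    def score(sentence):
        words = {w.lower() for w in re.findall(r"\w+", sentence)}
        return len(terms & words)

    # max() returns the first sentence with the top score. If nothing
    # matches, this is simply the first sentence, left unhighlighted.
    best = max(sentences, key=score)

    # Match one query word, optionally followed by more query words
    # separated only by whitespace, so a phrase gets a single marker pair.
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    run = re.compile(
        rf"\b(?:{alternation})\b(?:\s+(?:{alternation})\b)*", re.IGNORECASE
    )
    return run.sub(lambda m: HIGHLIGHT_START + m.group(0) + HIGHLIGHT_END, best)


if __name__ == "__main__":
    doc = ("I like fish. Little star's deep dish pizza sure is fantastic. "
           "Dogs are funny.")
    print(highlight_doc(doc, "deep dish pizza"))
```

On the example from the problem statement this prints the same highlighted sentence shown there. It only matches exact words, so "pizzas" would not be highlighted for the query "pizza"; stemming, snippet length limits, and multi-sentence snippets are where the problem gets interesting.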