Comments on A Dash of Technology: Building a search engine using Redis and redis-py
Josiah Carlson (http://www.blogger.com/profile/16662314724540946069)
Updated 2024-03-17T00:36:18-07:00. 14 comments.

2012-05-10T08:53:07-07:00
I am a couple weeks away from writing half a chapter on custom searching and sorting. I will try to get a preview up here before it hits the ebook.
Posted by Josiah Carlson

2012-05-10T02:39:27-07:00
Would it be possible to implement custom scoring now that Redis supports scripting? Any plans on writing an update?

Thanks a lot! Very helpful.
Posted by lectre

2011-05-05T09:34:17-07:00
That's a bummer, but I understand ...
:O(

Maybe I should ask Redis for the functionality too (at a minimum I can cast one more vote for MUL functionality).

It would seem to be a logical addition to their ZINTERSTORE operations, and a necessity for more than one scoring calculation ...

Thanks again for your replies and assistance!

Shane Metler
Posted by Shane Metler (http://www.nobullart.com)

2011-05-03T08:50:35-07:00
The patch used to exist, but I seem to have misplaced it. I'll have to rewrite it (it's a short patch).
Posted by Josiah Carlson

2011-05-03T07:55:30-07:00
Hi Josiah,

I wasn't sure if that patch code is available online somewhere. I tried searching around and did find some discussions on the Redis community forum, but couldn't find any patch or code.

Is this patch available somewhere? If not, could you forward it to me?

I'm at shane [at] no bull art dot com

It would be greatly appreciated, and I would really like to try it out!!
Posted by shane@ (http://nobullart.com)

2011-04-24T21:49:48-07:00
Hi Josiah,

Thanks for your reply.

Yes, it would be perfect to use a sorted set for custom scores, and I'd love to try a MUL patch for ZINTERSTORE ...

Out of curiosity, does MUL end up being a more costly operation than SUM? I wouldn't imagine that it could be, and it seems like MUL would be a *very* useful option for Redis to offer.

Thanks again,
Shane Metler
Posted by Shane Metler (http://www.nobullart.com)

2011-04-24T16:35:40-07:00
@Shane

While writing the blog post, I realized that one of the issues with adding PageRank (or a few of the other additional scoring helpers) is that the patch that I had submitted wasn't accepted. The change to the algorithm that would make these alternate scoring functions work relies on a new aggregate function: MUL (which multiplies the values that are intersected or unioned). This is applied as a last step to adjust the scores based on the individual scores of the items.

There are some tricks that can be done by using log(value) and the SUM aggregate, but then the algorithm doesn't work correctly in the first part of TF/IDF, where you sum multiple word scores and then multiply the total by your overall document quality.

As an alternative, if there existed a way of applying one of a few functions to the scores in a zset (like 'log'), then there would also be a solution there.

Do you feel like running a patched version of Redis?
Posted by Josiah Carlson

2011-04-24T11:44:56-07:00
@Shane

I have a simple answer, but it's sort of difficult to explain in a comment. I'm going to update the gist and write a new post later today.
Posted by Josiah Carlson

2011-04-22T13:52:01-07:00
Hi there,

Quote: "With a few small changes to what I provide, you can integrate your own document importance scoring, and if one of my patches gets merged into Redis, you could combine TF/IDF with your pre-computed Pagerank..."

I have this same implementation running as a search feature using Redis & PHP. It works great for TF/IDF ordered results. Super fast!

Can you please elaborate on how to incorporate custom scoring or PageRank into the equation?

I don't really see how to accomplish this in any elegant way ... ?
Posted by Shane Metler (http://www.nobullart.com)

2010-10-11T23:15:19-07:00
The underlying ZUNIONSTORE command is...

ZUNIONSTORE destination N key1 key2 ... keyN WEIGHTS weight1 weight2 ... weightN

We call it via conn.zunionstore(destination, {key1:weight1, key2:weight2, ...}).

Within Redis, the score of every entry in a zset is multiplied by that zset's weight before the zset is unioned with the others.

For example, if you have a zset 'key1' -> {'name1':1.25, 'name2':2, 'name3':1.75} and you passed {'key1':4} to the ZUNIONSTORE command, then 'key1' temporarily becomes {'name1':5, 'name2':8, 'name3':7}.
Posted by Josiah Carlson

2010-10-11T21:51:10-07:00
I don't understand how zunionstore works. How can I see the multiply part? What is it doing?
Posted by Rodrigo

2010-07-06T17:31:25-07:00
FYI, Blogger is having problems with their comments, which is why Jason and Salvatore's comments are not currently showing.

http://blogger-status.blogspot.com/2010/07/were-currently-working-through-multiple.html
Posted by Josiah Carlson

2010-07-06T10:14:50-07:00
@Jason Adding a word to the set/zset keyed by the metaphone of the word is usually sufficient to get you 80-90% of the way towards great suggestions (set for when you don't care about occurrences, zset when you do).
Throwing in 1 or 2 words of context can take you to 90-95%, at the cost of complexity, space, lookup speed, etc., and depending on your corpus, may require throwing in parts of other datasets to be good (like the Google 1T 5-gram corpus). Recently, I've erred on the side of simplicity, so I keep myself from over-optimizing the suggestions (as there are usually much bigger fish to fry).
Posted by Josiah Carlson

2010-07-05T18:57:01-07:00
Nice. Have you implemented any "did you mean...?" style suggestions for querying? In the past, I've applied metaphone to suggest similar words (and then gone into things like edit distance between metaphone hashes, and the document co-occurrence of those suggested words with the others in the query string, etc.). Though I suppose at that point you're in the realm of highly diminishing returns for that volume of tweaking, search suggestions are a nice usability improvement.
Posted by Unknown (https://www.blogger.com/profile/10495249785951829098)
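
Editor's note on the ZUNIONSTORE weights question from 2010-10-11: the multiply step happens inside Redis and is not visible from the client, so the rough sketch below models it in plain Python instead. It assumes the default SUM aggregate. The function name `zunion_weighted` and the second zset `key2` are made up for illustration; neither comes from the post or from redis-py. Only `key1` and its scores are taken from the comment.

```python
from collections import defaultdict


def zunion_weighted(zsets, weights):
    """Model ZUNIONSTORE ... WEIGHTS with the default SUM aggregate.

    zsets:   {key: {member: score}}, a stand-in for sorted sets held in Redis
    weights: {key: weight}, the same shape as the dict given to conn.zunionstore
    """
    result = defaultdict(float)
    for key, weight in weights.items():
        # A key that does not exist behaves like an empty zset.
        for member, score in zsets.get(key, {}).items():
            # Each score is scaled by its zset's weight, then summed per member.
            result[member] += score * weight
    return dict(result)


zsets = {
    'key1': {'name1': 1.25, 'name2': 2, 'name3': 1.75},  # from the comment
    'key2': {'name1': 1, 'name4': 3},                    # hypothetical second zset
}

only_key1 = zunion_weighted(zsets, {'key1': 4})
both = zunion_weighted(zsets, {'key1': 4, 'key2': 0.5})

print(only_key1)
print(both)
```

With only `key1` weighted by 4, the scores match the 5, 8, 7 given in the comment. Adding `key2` shows that a member present in both zsets (`name1`) gets the sum of its weighted scores, while the stored zsets themselves are left unchanged.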