Invelos Forums->General: Website Discussion |
goodguy's Credit Lookup Plus |
Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
| Posted: | | | | Quoting AiAustria: Quote: From my side there definitely is interest in this topic: getting rid of IE and all its legacy stuff should be a goal for everyone... OK, so now is the best time to make suggestions based on what you see here. I will try to do a short video before I go much further, and collect more suggestions before "finishing". Also, if you have come across a particularly interesting search, in terms of complexity and/or processing time, give it to me and we can compare. Two questions: (1) Is there a standard way of entering a CLT search in a single text box, or are separate boxes for FN, MN, LN and BY (first name, middle name, last name, birth year) OK? (2) Is there a way to specify variants in a single search, or are the searches usually separate (e.g. zhang zhi, zhi zhang, zhang ziyi)? | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
| Posted: | | | | If a complex search is needed, my code is already structured to loop over name variants once it is deep inside a profile. So I would accept a proposal for a method of specifying the syntax (or an explanation of an existing one), and I would appreciate a tested regular expression that parses the text into matches, so that I can just index into sets of FN, MN, LN, BY to use as variants. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
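One possible shape for such a parser, as a minimal sketch only. The thread does not define any input syntax, so everything here is an assumption made for illustration: variants are separated by semicolons, an optional four-digit birth year follows the name in parentheses, the first token is FN, the last token is LN, and anything in between is MN. The names `parse_variants` and `NameVariant` are hypothetical, and the plugin itself is not written in Python.

```python
import re
from typing import List, NamedTuple, Optional


class NameVariant(NamedTuple):
    fn: str
    mn: str
    ln: str
    by: Optional[int]


# Assumed syntax: "First [Middle ...] Last [(YYYY)]"
_VARIANT_RE = re.compile(r"^\s*(?P<name>[^()]+?)\s*(?:\((?P<by>\d{4})\))?\s*$")


def parse_variants(text: str) -> List[NameVariant]:
    """Split a semicolon-separated search string into FN/MN/LN/BY sets."""
    variants = []
    for chunk in text.split(";"):
        if not chunk.strip():
            continue  # ignore empty entries such as a trailing ";"
        match = _VARIANT_RE.match(chunk)
        tokens = match.group("name").split() if match else []
        if not tokens:
            raise ValueError(f"cannot parse name variant: {chunk!r}")
        fn = tokens[0]
        ln = tokens[-1] if len(tokens) > 1 else ""
        mn = " ".join(tokens[1:-1])  # everything between first and last token
        by = int(match.group("by")) if match.group("by") else None
        variants.append(NameVariant(fn, mn, ln, by))
    return variants


# The birth year below is a made-up value used only to exercise the parser.
for variant in parse_variants("zhang zhi; zhi zhang; James R. Alexander (1950)"):
    print(variant)
```

Names with a suffix, such as "Robert Downey, Jr.", would not split correctly under this first-token/last-token rule and would need separate handling.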
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
| Posted: | | | | Quoting mediadogg: Quote: (1) Is there a standard way of entering a CLT search in a single text box, or is FN, MN, LN, BY in separate boxes OK? (2) Is there a way to specify variants in a single search, or are the searches usually separate (e.g. zhang zhi, zhi zhang, zhang ziyi)? (1) I never thought about that. I usually refer to IMDb for the existing name variants and copy/paste them into the CLT tools. This works rather well, because the CLT tools only offer a single input field. I don't know if it is useful to separate the name into F/M/L, because the CLT does the opposite: it treats First Middle//Last as equal to First/Middle/Last... But it would definitely be a valuable piece of information if other existing parsings in our CLT were listed (-> Zhang Zhi example). I don't think entering a BY is useful, because the BY is only valid with a single name variant, and usually not all profiles with this name variant include the needed BY (many profiles are simply outdated)... - The Invelos CLT tool completely ignores the BY. - Goodguy's tool searches without any BY and offers to filter the result list by the BYs found. (2) The existing tools do no parsing at all: one single line of input. | | | Complete list of Common Names • A good point for starting with Headshots (and v11.1) |
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
| Posted: | | | | Quoting AiAustria: Quote: ... I don't know if it is useful to separate the name into F/M/L, because the CLT does the opposite: it treats First Middle//Last as equal to First/Middle/Last...
Eventually, though, somebody has to decide, because the underlying database has three fields plus BY. Now that I have recoded to let the user decide, I might leave it that way, shoot my video and see how you like it. I think I am basically doing what Goodguy did: first run the CLT tool (you can't beat Ken's speed because his is native) and scrape the web pages for profile IDs. Then loop on the profile IDs, grabbing the credits from the online database. And just as you say he does, this is where I filter for BY. The web browser I use is built into the plugin, and I am sure that it is IE based. But it runs wherever DVD Profiler and the plugin run. You can use whatever browser you want for your other work. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
| Posted: | | | | Can someone give me some sample timings for CLTPlus results? How long does it take before you can export the data for John Wayne, or Tom Cruise, or any other example? I want to compare with what I am getting with my code.
Oh, I see some earlier in the thread: 15 min for Clint Eastwood. So that's my bar. I'm not that fast yet, but I am displaying progress and other stuff. UI output really slows things down, so I will give it a try with minimal display output. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
| Posted: | | | | Well, my Clint Eastwood run took about an hour on my machine: 18 minutes for the initial scraping to get the list of profiles, and the rest for scanning the XML for credits. I found over 6,000 credits, because Eastwood often "does it all". Here is an example: Quote:
<CLTCredits>
  <CLTCredit type ="cast" Episode="" GroupName="" FirstName="Clint" MiddleName="" LastName="Eastwood" BirthYear="0" CreditType="cast" CreditedAs="" Role="Luther Whitney" Voice="False" Uncredited="False" Puppeteer="False"/>
  <CLTCredit type ="crew" Episode="" GroupName="" FirstName="Clint" MiddleName="" LastName="Eastwood" BirthYear="0" CreditType="crew" CreditedAs="" Role="Director" Voice="False" Uncredited="False" Puppeteer="False"/>
  <CLTCredit type ="crew" Episode="" GroupName="" FirstName="Clint" MiddleName="" LastName="Eastwood" BirthYear="0" CreditType="crew" CreditedAs="" Role="Producer" Voice="False" Uncredited="False" Puppeteer="False"/>
  <CLTCredit type ="crew" Episode="" GroupName="" FirstName="Clint" MiddleName="" LastName="Eastwood" BirthYear="0" CreditType="crew" CreditedAs="" Role="Composer" Voice="False" Uncredited="False" Puppeteer="False"/>
</CLTCredits>
By the way "type" and "CreditType" are duplicate attributes. I will be removing "type". I will experiment with speeding things up and also running at least two name variants in parallel. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
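For anyone who wants to work with an export in this shape, here is a minimal sketch of reading it with Python's standard library. The element and attribute names are taken from the sample above (shortened to two credits); the function name `load_credits` is hypothetical, and dropping `type` on load simply mirrors the planned removal of that duplicate attribute.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two of the credits from the sample above, unchanged.
SAMPLE = """<CLTCredits>
  <CLTCredit type ="cast" Episode="" GroupName="" FirstName="Clint" MiddleName="" LastName="Eastwood" BirthYear="0" CreditType="cast" CreditedAs="" Role="Luther Whitney" Voice="False" Uncredited="False" Puppeteer="False"/>
  <CLTCredit type ="crew" Episode="" GroupName="" FirstName="Clint" MiddleName="" LastName="Eastwood" BirthYear="0" CreditType="crew" CreditedAs="" Role="Director" Voice="False" Uncredited="False" Puppeteer="False"/>
</CLTCredits>"""


def load_credits(xml_text):
    """Return one dict per CLTCredit element, without the redundant 'type'."""
    root = ET.fromstring(xml_text)
    credits = []
    for element in root.iter("CLTCredit"):
        attributes = dict(element.attrib)
        attributes.pop("type", None)  # duplicate of CreditType
        credits.append(attributes)
    return credits


credits = load_credits(SAMPLE)
counts = Counter(credit["CreditType"] for credit in credits)
for credit in credits:
    print(credit["CreditType"], credit["Role"])
```

Because `pop("type", None)` tolerates a missing attribute, the same reader keeps working once `type` is no longer written.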
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
| Posted: | | | | I have made a first-look video of a plugin I call "CLTBoss". Actually it is a menu option off BulkEdit, as a convenience to me, because supporting a separate plugin takes more time than I can give at the moment. Any ideas for enhancements or changes are welcome. After publication, I will be more reluctant to consider changes other than bug or performance fixes. First Looks at CLTBoss. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. |
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
| Posted: | | | | Timing of CLT Plus here in Europe:
John Wayne: about 14 min 11 s
Tom Cruise: about 18 min 47 s
Zhang Ziyi: about 1 min 33 s
Environment: asymmetric cable internet 150/15 Mbit, not heavily loaded, but I watched your video in parallel ;-) RTT to the farthest responding hop on the way to www.invelos.com: 17 144 ms 139 ms 150 ms c-73-152-128-139.hsd1.va.comcast.net [73.152.128.139] | | | Complete list of Common Names • A good point for starting with Headshots (and v11.1) |
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
| Posted: | | | | Concerning the number of profiles in the video: Since each page has 25 entries, it is easy to verify that Zhang Ziyi has 313 entries in the Invelos CLT (12 full pages of 25 entries plus 13 entries on the 13th page - hopefully nobody here believes in bad luck). Maybe an issue with localities? | | | Complete list of Common Names • A good point for starting with Headshots (and v11.1) |
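The page arithmetic can be written down as a small check, which is handy when comparing a scraper's profile count against the website. This is only a sketch: the page size of 25 comes from the post above, while the function name `total_entries` is hypothetical.

```python
ENTRIES_PER_PAGE = 25  # page size stated in the post above


def total_entries(page_count, entries_on_last_page, per_page=ENTRIES_PER_PAGE):
    """Total result count when every page except the last one is full."""
    if page_count < 1 or not 1 <= entries_on_last_page <= per_page:
        raise ValueError("invalid page count or last-page size")
    return (page_count - 1) * per_page + entries_on_last_page


# 12 full pages plus 13 entries on the 13th page
print(total_entries(13, 13))  # 12 * 25 + 13 = 313
```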
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
| Posted: | | | | Concerning running scraping in parallel: This is not possible due to the session handling of invelos.com. If you start more than one CLT Lookup in one browser (two tabs or windows), the results get mixed up.
But it would be interesting to "batch start" more than one query at once.
E.g. after entering First=Zhang, Last=Ziyi you get multiple lines of search strings prefilled with "Zhang Ziyi" and "Ziyi Zhang". Then the user can decide to remove lines from these search strings and add others before starting the whole batch...
Other examples: First=James, Middle=R., Last=Alexander -> Prefilled Strings: James R. Alexander, James Alexander, Alexander James R. -> the user removes the "Alexander James R." and adds "Jim Alexander" to the search list. First=Robert, Last=Downey, Jr. -> Prefilled Strings: Robert Downey, Jr., Robert Downey, Jr, Robert Downey Jr, ... | | | Complete list of Common Names • A good point for starting with Headshots (and v11.1) |
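The prefill idea could be sketched roughly as follows. This is an illustration under assumptions, not an existing tool's behaviour: the function name `prefill_variants` is hypothetical, and it only covers the reordering and middle-name cases from the first two examples. The punctuation variants for suffixes such as "Jr." are left out because the list in the post is open-ended.

```python
def prefill_variants(first, middle="", last=""):
    """Suggest search strings for a name; the user edits the list afterwards."""
    candidates = []

    def add(*parts):
        text = " ".join(part for part in parts if part)
        if text and text not in candidates:
            candidates.append(text)

    add(first, middle, last)   # name as entered
    if middle:
        add(first, last)       # middle name dropped
    add(last, first, middle)   # family name first
    return candidates


batch = prefill_variants("James", "R.", "Alexander")
print(batch)

# The user then edits the batch before starting it, as in the example above.
batch.remove("Alexander James R.")
batch.append("Jim Alexander")
print(batch)
```

Keeping the result as a plain editable list matches the proposal: the generator only suggests, and the user has the final say on what is searched.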
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
| Posted: | | | | As far as I can tell, Goodguy used only the data scraped from the website. He did no extra run through the database. All essential data is presented on the detail screens. There is a problem, though, when both "Credited as" and "BY" are used in one profile. But this is a DVD Profiler problem which also shows up when contributing new BYs with "Credited as" entries... | | | Complete list of Common Names • A good point for starting with Headshots (and v11.1) |
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
| Posted: | | | | Yes, it is well established that Goodguy used only scraping. In fact, the plugin API didn't allow online access back when he did that code (at least, I didn't think so. What does his downloaded XML consist of?). And of course I know that the data is available by clicking on the profile links. (Edited)
Two issues: First, I thought people wanted other data. If this is not the case, then what am I doing? Second, so far nobody has been able to duplicate Goodguy's clever scraping methods. I know that I can't, but now that I can access the database, why scrape if you are in a plugin? Now if somebody wants to attempt to scrape the credits, I would be happy to donate my scraping of the profile ID list to the cause. Then, "all" you would need to do is write a program to click on the links, wait for the info to download to browser, and then scrape it. Goodguy was able to do it amazingly fast. I don't have the time or willpower to even try to duplicate that, and furthermore, those methods no longer work in today's browsers (if I am wrong on that, I would LOVE to know about it!!!)
I was not aware of the problem with CLT session handling. I will explore it. When running in a plugin, there would be multiple browser instances, not multiple tabs. Maybe that will allow it to work. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
| Posted: | | | | Quoting AiAustria: Quote: Timing of CLT Plus here in Europe:
John Wayne: about 14 min 11 s
Tom Cruise: about 18 min 47 s
Zhang Ziyi: about 1 min 33 s
Environment: asymmetric cable internet 150/15 Mbit, not heavily loaded, but I watched your video in parallel ;-) RTT to the farthest responding hop on the way to www.invelos.com: 17 144 ms 139 ms 150 ms c-73-152-128-139.hsd1.va.comcast.net [73.152.128.139] Oh, thanks very much for this. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. |
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
| Posted: | | | | Quoting AiAustria: Quote: Concerning the number of profiles in the video: Since each page has 25 entries, it is easy to verify that Zhang Ziyi has 313 entries in the Invelos CLT (12 full pages of 25 entries plus 13 entries on the 13th page - hopefully nobody here believes in bad luck)
Maybe an issue with localities? Yep, I am hoping to get some code settling down enough to release, and along with deciding whether it is worth continuing, we can sort out bugs, refinements, etc. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. |
| Registered: March 18, 2007 | Reputation: | Posts: 6,461 |
| Posted: | | | | Quoting AiAustria: Quote: Concerning running scraping in parallel: This is not possible due to the session handling of invelos.com. If you start more than one CLT Lookup in one browser (two tabs or windows), the results get mixed up.
But it would be interesting to "batch start" more than one query at once.
E.g. after entering First=Zhang, Last=Ziyi you get multiple lines of search strings prefilled with "Zhang Ziyi" and "Ziyi Zhang". Then the user can decide to remove lines from these search strings and add others before starting the whole batch...
Other examples: First=James, Middle=R., Last=Alexander -> Prefilled Strings: James R. Alexander, James Alexander, Alexander James R. -> the user removes the "Alexander James R." and adds "Jim Alexander" to the search list. First=Robert, Last=Downey, Jr. -> Prefilled Strings: Robert Downey, Jr., Robert Downey, Jr, Robert Downey Jr, ...
Looks like even multiple browsers suffer from the issue, but not multiple plugin instances. I tried that, and then DVDP ran out of stack space. Drat. I've got a couple more ideas, and if they don't work, then my fallback will be:
- sequentially scrape, by variant, for profile IDs
- all the profile IDs will go into the same list
- process the combined list for credits
This will still save a bunch of time because there will likely be a lot of duplicate profiles if the variants are for the same person, so the credits scraping for the combined list should go a lot faster than the total of the individual variants' searches. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
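The fallback steps listed above can be sketched as follows. This is not the plugin's code: `scrape_profile_ids` and `fetch_credits` are hypothetical stand-ins for the real scraping and database calls, replaced here by in-memory fakes with made-up profile IDs so the example runs on its own.

```python
def collect_profile_ids(variants, scrape_profile_ids):
    """Scrape each variant in turn and merge the IDs into one list without duplicates."""
    seen = set()
    combined = []
    for variant in variants:  # sequential, one search at a time
        for profile_id in scrape_profile_ids(variant):
            if profile_id not in seen:
                seen.add(profile_id)
                combined.append(profile_id)
    return combined


def process_variants(variants, scrape_profile_ids, fetch_credits):
    """Fetch credits once per distinct profile in the combined list."""
    ids = collect_profile_ids(variants, scrape_profile_ids)
    return {profile_id: fetch_credits(profile_id) for profile_id in ids}


# Fake data for illustration only.
FAKE_INDEX = {"Zhang Ziyi": ["P1", "P2", "P3"], "Ziyi Zhang": ["P3", "P4"]}
fetched = []


def fake_fetch(profile_id):
    fetched.append(profile_id)
    return []


result = process_variants(["Zhang Ziyi", "Ziyi Zhang"], FAKE_INDEX.get, fake_fetch)
print(list(result), len(fetched))
```

In this fake run, five scraped IDs collapse into four credit lookups. The saving is exactly the number of profiles that turn up under more than one variant, so it disappears when the variants' result lists do not overlap.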
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
| Posted: | | | | Quoting mediadogg: Quote: Looks like even multiple browsers suffer from the issue... I thought so too, but couldn't reproduce it lately... Quote: This will still save a bunch of time because there will likely be a lot of duplicate profiles if the variants are for the same person, so the credits scraping for the combined list should go a lot faster than the total of the individual variants' search. I don't know if this saves that much time. Since the database is not normalized, each profile lists its own variant of the name. Potential savings could only come from profiles where two different name variants of the searched person are listed for different roles and/or in both the cast and crew sections... There are people who use different names for different unions or for other personal reasons, but I don't know if their number is high enough to be worth handling.... | | | Complete list of Common Names • A good point for starting with Headshots (and v11.1) |