goodguy's Credit Lookup Plus
DVD Profiler Desktop and Mobile Registrant mediadogg
Aim high. Ride the wind.
Registered: March 18, 2007
Reputation: Highest Rating
United States Posts: 6,461
Quoting GSyren:
Quote:
Quoting mediadogg:
Quote:
I am counting a match on either (1), (2) or (3)

Sounds right to me. And I am presenting the data based on F/M/L since that is the data that the Common Names are based on. Or at least that's how I have understood it. Personally I have no opinion on this, I am trusting that AiAustria will tell me if I'm doing it wrong.

That was my point. Given the same criteria, all programs should get the CLT result of 364. At the moment, I am getting 366, CLTInfo gets 330 and we don't know what CLTPlus gets (yet).

I am only trying to clean up my own bugs and looking for validation that I am counting correctly.
Thanks for your support.
Free Plugins available here.
Advanced plugins available here.
Hey, new product!!! BDPFrog.
DVD Profiler Unlimited Registrant, Star Contributor GSyren
Profiling since 2001
Registered: March 14, 2007
Reputation: Highest Rating
Sweden Posts: 4,678
Quoting mediadogg:
Quote:
CLTInfo gets 330

No, CLTinfo gets 366, same as you, only divided into two separate groups based on F/M/L.
If you feel that that's the wrong way to do it, take it up with AiAustria. I have no personal stake in how the data is presented.
My freeware tools for DVD Profiler users.
Gunnar
DVD Profiler Desktop and Mobile Registrant mediadogg
Quoting GSyren:
Quote:
Quoting mediadogg:
Quote:
CLTInfo gets 330

No, CLTinfo gets 366, same as you, only divided into two separate groups based on F/M/L.
If you feel that that's the wrong way to do it, take it up with AiAustria. I have no personal stake in how the data is presented.

Ok, I wasn't taking issue, I was just trying to take advantage of your help, but pointing out that the correspondence was likely a coincidence because my search did not (intentionally) include Zhang Ziyi, even though the profiles do include that variant.

But thanks for your quick response. I'll figure it out I guess.
DVD Profiler Unlimited Registrant, Star Contributor GSyren
Why would the correspondence be a coincidence? CLTinfo takes your output from CltBoss and formats it according to how AiAustria wanted it. If you get 366, I get 366.
DVD Profiler Desktop and Mobile Registrant mediadogg
Quoting GSyren:
Quote:
Why would the correspondence be a coincidence? CLTinfo takes your output from CltBoss and formats it according to how AiAustria wanted it. If you get 366, I get 366.

Again, please please please, I am honestly trying not to offend you in any way - just trying to understand how to fix a bug if I have one. Please excuse me if it comes across any other way.

I understand that you were reading my dataset, that's why I was trying to understand what appeared to be a difference in the numbers. If you are not including "CreditedAs" in your search criteria, but I created the data that way, then I was simply postulating that perhaps the 366 was a coincidence. The two searches were based on different criteria is all I was saying, not right or wrong, just different.

This is what I think is going on: the matches that you found using F/M/L with "Zhang Ziyi" must also have CreditedAs "Ziyi Zhang", hence we both pick up the credit. The difference is that I attribute all of the credits to the variant "Ziyi Zhang", whereas your method splits the credits.
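
To make the two views concrete, here is a minimal sketch of counting the same matched credits both ways. The sample credits and the field names `credited_as` and `fml` are made up for illustration; they are not taken from the real export.

```python
from collections import Counter

# Hypothetical credits that were all picked up by a search for "ziyi zhang".
matched_credits = [
    {"credited_as": "Ziyi Zhang", "fml": "Ziyi Zhang"},
    {"credited_as": "", "fml": "Ziyi Zhang"},
    {"credited_as": "Ziyi Zhang", "fml": "Zhang Ziyi"},
]
search_name = "ziyi zhang"

# View 1: attribute every matched credit to the variant that was searched.
total_for_search = {search_name: len(matched_credits)}

# View 2: split the same credits into groups by their F/M/L name.
split_by_fml = Counter(c["fml"].lower() for c in matched_credits)

print(total_for_search)
print(dict(split_by_fml))
```

Both views cover the same credits, so the group sizes in the second view add up to the single total in the first.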

That tells me that there is a flaw in my export process, for there is no way, a priori, for anyone to know that my search was ONLY on "ziyi zhang". So, I am thinking I need to include an XML element that describes the search criteria; otherwise, I will need to include an output for all possible variants, which would be very difficult.
DVD Profiler Unlimited Registrant, Star Contributor GSyren
There is no "search" in CLTinfo. It takes all of the info in your output and presents it in a structured way.

You certainly didn't offend me. I was just trying to clarify why my results correspond to yours.
DVD Profiler Desktop and Mobile Registrant mediadogg
Quoting GSyren:
Quote:
There is no "search" in CLTinfo. It takes all of the info in your output and presents it in a structured way.

You certainly didn't offend me. I was just trying to clarify why my results correspond to yours.

Ok, but if you could just read my prior post (I mean the technical parts) and see if it adds anything to your thoughts, I would appreciate it.
DVD Profiler Desktop and Mobile Registrant mediadogg
So basically, I am asking if you agree that it would be a good idea for me to include a "<Variants>" element in my export, to make it clear how the included credits were found. That way, CLTInfo (for example), would know up front how the data was gathered, then of course offer other ways to view it.

Does that make sense?
DVD Profiler Unlimited Registrant, Star Contributor GSyren
Quoting mediadogg:
Quote:
So basically, I am asking if you agree that it would be a good idea for me to include a "<Variants>" element in my export, to make it clear how the included credits were found. That way, CLTInfo (for example), would know up front how the data was gathered, then of course offer other ways to view it.

Does that make sense?

Not sure what you mean by variants. Are those the (1), (2), (3) you mentioned before? Or are we talking about something else?
DVD Profiler Desktop and Mobile Registrant mediadogg
Quoting GSyren:
Quote:
Quoting mediadogg:
Quote:
So basically, I am asking if you agree that it would be a good idea for me to include a "<Variants>" element in my export, to make it clear how the included credits were found. That way, CLTInfo (for example), would know up front how the data was gathered, then of course offer other ways to view it.

Does that make sense?

Not sure what you mean by variants. Are those the (1), (2), (3) you mentioned before? Or are we talking about something else?

The whole point of the CLT and the notion of "common names" is that people are often known by different variants or aliases or variations in their names. It happens a lot with Asian names, as they are often reversed from what the actor uses in their native country. I apologize if "variant" is incorrect terminology.

In the contributions threads, it is often referred to as "name variant."

So, "ziyi zhang", "zhang ziyi", and "zhang zhi" are three variants of the credited name for the same "Crouching Tiger" actress.

But I know you knew all that, so maybe I misunderstood the question?
DVD Profiler Unlimited Registrant, Star Contributor GSyren
Ok, I'm still not clear on how you intended to use the <Variants> field. Would it contain all name variants found, or just the variant(s) used in the search? Or something else? Would it be a single field for the entire export, or a field for every profile?
DVD Profiler Desktop and Mobile Registrant mediadogg
Quoting GSyren:
Quote:
Ok, I'm still not clear on how you intended to use the <Variants> field. Would it contain all name variants found, or just the variant(s) used in the search? Or something else? Would it be a single field for the entire export, or a field for every profile?

Well it is just an idea at this point, not implemented. The problem just became apparent, and once again it demonstrates the value of collaboration. It hadn't occurred to me that it was a problem.

It should reflect all the variants that were used to produce the XML credits collection in question. In the case we are just discussing, it would have "ziyi zhang". That would indicate that only this variant was searched, even though other variations might be found in the profile XML (or even in the same credit, because by definition CreditedAs and F/M/L can coexist; it's just that this is inconsistently used). There would be one entry for the entire collection.

So, you can see this by typing into the CLT tool:

"ziyi zhang", " ziyi zhang", "ziyi zhang " or " ziyi zhang "

all will yield 364 profiles (as of today).

Note that "ziyi  zhang" yields 0 profiles!!!!  (double space between the names)

This match can come from either CreditedAs, or from any concatenation of F/M/L.

I have seen the credit completely contained in the first name, with middle, last and CreditedAs blank, or as first+last, or as middle+last. In this case you have the possibility of getting double blanks, and it is stuff like this that is driving me nuts. In order to be consistent with the CLT, I have to actually ignore a possible intended match that doesn't work just because somebody typed "ziyi  zhang" instead of "ziyi zhang" into the CreditedAs field.

I already have a "CLTName" class in my code, that I think I will map over into the XML. It will be something like:

<CLTSearchNames>    (or "<Variants>")
  <CLTSearchName firstname="ziyi" middlename="" lastname="zhang" birthyear="" hashname="zhang_ziyi_">ziyi zhang</CLTSearchName>
</CLTSearchNames>
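
As a rough sketch only, an element like this could be written out with Python's standard `xml.etree.ElementTree`. The helper name `build_search_names` and the input dictionary layout are assumptions for illustration; the `hashname` value is passed in as-is rather than computed, since its exact format is not spelled out here.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_NAMES = ("firstname", "middlename", "lastname", "birthyear", "hashname")

def build_search_names(search_names):
    """Build a <CLTSearchNames> element from a list of dicts (hypothetical layout)."""
    root = ET.Element("CLTSearchNames")
    for entry in search_names:
        # Missing attributes are written as empty strings, as in the example above.
        attrs = {name: entry.get(name, "") for name in ATTRIBUTE_NAMES}
        element = ET.SubElement(root, "CLTSearchName", attrs)
        element.text = entry["search"]  # the search string exactly as typed
    return root

root = build_search_names([
    {"search": "ziyi zhang", "firstname": "ziyi", "lastname": "zhang",
     "hashname": "zhang_ziyi_"},
])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Letting a library serialize the element also takes care of attribute quoting and escaping, so the result stays well-formed XML.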

My instructions from AiAustria were to match "search name" first on CreditedAs if found in the credit entry, otherwise match on the concatenation of F/M/L, case insensitive and with leading and trailing blanks removed.
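
Here is a minimal sketch of that rule as I read it, not the actual CLTBoss code. The function names are made up, and joining only the non-empty F/M/L parts with a single blank is my assumption. Inner blanks are deliberately left alone, so a double blank inside CreditedAs still fails to match.

```python
def normalize(name):
    """Case-insensitive comparison key with leading/trailing blanks removed."""
    return name.strip().lower()

def fml_name(first, middle, last):
    """Concatenate the non-empty F/M/L parts with single blanks (assumption)."""
    parts = (first.strip(), middle.strip(), last.strip())
    return " ".join(p for p in parts if p)

def credit_matches(search_name, credited_as, first, middle, last):
    """Match on CreditedAs if it is present, otherwise on the F/M/L concatenation."""
    target = normalize(search_name)
    if credited_as.strip():
        return normalize(credited_as) == target
    return normalize(fml_name(first, middle, last)) == target

# CreditedAs is blank, so the F/M/L concatenation decides.
print(credit_matches(" Ziyi Zhang ", "", "Ziyi", "", "Zhang"))
# CreditedAs is present, so it decides even though F/M/L is reversed.
print(credit_matches("ziyi zhang", "Ziyi Zhang", "Zhang", "", "Ziyi"))
# A double blank inside CreditedAs is not collapsed, so this does not match.
print(credit_matches("ziyi zhang", "ziyi  zhang", "", "", ""))
```

Collapsing inner blanks would catch the double-blank case, but that would no longer be consistent with what the CLT returns, which is why the sketch leaves them in.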

So, that's where I am. I appreciate your patience, and I am open to suggestions.
DVD Profiler Unlimited Registrant, Star Contributor GSyren
Ok, so basically <Variants> would be the search arguments used. That sounds like it might be useful in some cases.
DVD Profiler Desktop and Mobile Registrant mediadogg
Quoting GSyren:
Quote:
Ok, so basically <Variants> would be the search arguments used. That sounds like it might be useful in some cases.

Thanks. I hope so. I will move along this path. So if somebody uses CLTBoss to scrape multiple variants (I can't tell whether they are really the same person in the code), then the resulting XML will always be an "OR" of the credits, with the combined set of profiles. My profile list grid actually contains the "hashname" version of the name, as part of my duplicates detection, so it makes me think ...
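
Just to illustrate what I mean by an "OR", here is a tiny sketch with invented profile IDs. The dictionary layout and the function name are hypothetical and only show the idea of a combined set.

```python
# Hypothetical search results: each searched variant maps to the profile IDs it matched.
profiles_by_variant = {
    "ziyi zhang": {"P001", "P002", "P003"},
    "zhang ziyi": {"P003", "P004"},
}

def combined_profiles(results):
    """Return the union ("OR") of the profile sets found for all searched variants."""
    combined = set()
    for profile_ids in results.values():
        combined |= profile_ids
    return combined

all_profiles = combined_profiles(profiles_by_variant)
search_names = sorted(profiles_by_variant)  # what the export would list as search names

print(sorted(all_profiles))
print(search_names)
```

A profile found by more than one variant appears only once in the combined set, which is why the search names need to be recorded separately.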

Your question about variants per profile is now coming to mind, but I am still fuzzy. Do I need to reveal how each profile made it into the list? I think so, since my XML contains only the matched credits. If instead, one is looking at the "Invelos Export" output, you will of course get all credits, and all XML contents of each profile.

What should I do? I'm a bit confused. I hate to clutter up the XML, but how else to avoid the issue we hit earlier, by not knowing what filter was used to create the XML?
DVD Profiler Unlimited Registrant, Star Contributor GSyren
I don't think we need to go into per-profile. Let's keep it simple, at least for now.
DVD Profiler Desktop and Mobile Registrant mediadogg
Ok, I am just discovering a fast and powerful Xpath tool inside CookTop XML editor.

Using this statement, "nodes: /CLTInfo/DVD/CLTCredits/CLTCredit[@CreditedAs!='' and @FirstName!='' ]",

I confirmed that the file in question had exactly 36 credits where F/M/L = "Zhang Ziyi", but CreditedAs was "Ziyi Zhang". That's why we got the same "36" from the same profiles. There are 3 others returned by the same XPath query where the exact opposite is true!!! F/M/L is "Ziyi Zhang", but CreditedAs is "Zhang Ziyi".
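
For anyone without CookTop, roughly the same check can be sketched in Python. The sample XML below is invented. Only the element path and the `CreditedAs` and `FirstName` attributes come from the XPath statement above; `MiddleName` and `LastName` are assumed attribute names. The filtering is done in plain Python because the standard `ElementTree` module supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape implied by the XPath statement.
SAMPLE = """
<CLTInfo>
  <DVD>
    <CLTCredits>
      <CLTCredit CreditedAs="Ziyi Zhang" FirstName="Zhang" MiddleName="" LastName="Ziyi"/>
      <CLTCredit CreditedAs="Zhang Ziyi" FirstName="Ziyi" MiddleName="" LastName="Zhang"/>
      <CLTCredit CreditedAs="" FirstName="Ziyi" MiddleName="" LastName="Zhang"/>
    </CLTCredits>
  </DVD>
  <DVD>
    <CLTCredits>
      <CLTCredit CreditedAs="Ziyi Zhang" FirstName="Ziyi" MiddleName="" LastName="Zhang"/>
    </CLTCredits>
  </DVD>
</CLTInfo>
"""

def fml_name(credit):
    """Join the non-empty F/M/L attributes with single blanks (assumed attribute names)."""
    parts = (credit.get(key, "") for key in ("FirstName", "MiddleName", "LastName"))
    return " ".join(p.strip() for p in parts if p.strip())

root = ET.fromstring(SAMPLE)

# Same filter as the XPath predicate: CreditedAs and FirstName both non-empty.
both_set = [
    credit for credit in root.findall("./DVD/CLTCredits/CLTCredit")
    if credit.get("CreditedAs", "") != "" and credit.get("FirstName", "") != ""
]

# Of those, keep the credits where CreditedAs and the F/M/L name disagree.
mismatched = [
    credit for credit in both_set
    if credit.get("CreditedAs").strip().lower() != fml_name(credit).lower()
]

for credit in mismatched:
    print(credit.get("CreditedAs"), "<->", fml_name(credit))
```

The extra comparison step is what separates the credits where the two names merely coexist from those where they actually disagree.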

This is just way too much for CLTBoss. I want to just spit out the results (in either CLTBoss or Invelos formats), consistent with CLT, and make it clear where the profiles came from, and then hopefully other tools will allow more sophisticated filters to be applied. Above my pay grade and intentions.