DPChallenge Forums >> General Discussion >> another programmer guru question - data extraction
07/27/2006 04:45:33 AM · #1
I am looking to build a function into my website that continually updates information received from another website.

For example (although this is not what I want to do): have a page, perhaps in PHP, which continually updates my score from DPChallenge.

Is this possible, or how hard is it to do?
07/27/2006 04:59:23 AM · #2
Depends on how you are "receiving" information from another website. If the website provides RSS feeds, you could get it that way. If you have direct access to the other website's database, you could use php to pull it from that. Unless you have control over the other website(s), it would be fairly difficult. And even if you do, the level of difficulty depends on your skill / experience level.
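If the other site did offer an RSS feed, pulling the number out of it would be far more robust than scraping HTML. A minimal sketch — the feed URL and its contents here are purely hypothetical stand-ins, since neither site is known to offer a feed:

```shell
# On a real run the feed would come from something like:
#   feed="$(wget -qO- 'http://example.com/stats.rss')"   # hypothetical URL
feed='<item><title>Downloads: 42</title></item>'          # stand-in sample
# Pull out the <title> text, then strip the tags.
printf '%s\n' "$feed" | grep -o '<title>[^<]*</title>' | sed 's/<[^>]*>//g'
# → Downloads: 42
```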
07/27/2006 05:16:27 AM · #3
hmm....

well i have a little 'hash my way through' experience with php

What i want to do is create a page which automatically updates with for example the number of downloads of This Image

all i want is one little number :)

is that so tough??
07/27/2006 05:19:24 AM · #4
or, that is to say,

if i could do it in php or java I could probably figure it out... or if i knew what to search for on google.
07/27/2006 05:20:55 AM · #5
Originally posted by leaf:

or, that is to say,

if i could do it in php or java I could probably figure it out... or if i knew what to search for on google.

Search for "screen scraping" or something like that. That may get you all the stuff on a page and then you have to figure out how to get the "one little number" from that.
07/27/2006 05:57:55 AM · #6
ok thanks
07/27/2006 10:12:01 AM · #7
What platform are you doing this on? I run FreeBSD at home, a flavor of Unix. To do what you want to do, I would do it the quick-and-dirty way:

wget -qO- http://www.dreamstime.com/woodtexture-image13352 | grep 'downloads:' > download_count.txt

(-qO- makes wget write the page to standard output so it can be piped, instead of saving it to a file.) And then in PHP I would echo the contents of download_count.txt into my own page. Wget is a handy little utility, and if you are running Windows there are equivalents out there.
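Roughly, the quick-and-dirty pipeline behaves like this — with a stand-in string in place of the fetched page, since a real run would use the wget command above:

```shell
# Stand-in for: page="$(wget -qO- 'http://www.dreamstime.com/woodtexture-image13352')"
page='downloads: 42'
# Keep only the line with the count; a PHP page can then echo this file's contents.
printf '%s\n' "$page" | grep 'downloads:' > download_count.txt
cat download_count.txt   # → downloads: 42
```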
07/27/2006 11:32:29 AM · #8
Like Ken said it's called Screen Scraping and it can be done with many scripting languages including Perl and PHP.
07/27/2006 12:51:36 PM · #9
Just a warning, screen scraping is unreliable, because generally the slightest change to the interface of the site you're trying to scrape renders your script useless.
07/27/2006 01:00:25 PM · #10
Originally posted by Louis:

Just a warning, screen scraping is unreliable, because generally the slightest change to the interface of the site you're trying to scrape renders your script useless.

True, if done wrong. That's why you do it right the first time. ;)
07/27/2006 01:20:32 PM · #11
Originally posted by _eug:

Originally posted by Louis:

Just a warning, screen scraping is unreliable, because generally the slightest change to the interface of the site you're trying to scrape renders your script useless.

True, if done wrong. That's why you do it right the first time. ;)

I don't think there's a right way, to be perfectly honest... it's way too hacky for such a disciplined guy like myself. ;)
07/27/2006 01:24:54 PM · #12
The only change to the interface that would break what I suggested is removing the 'Downloads: #' line. If they remove that, then they've removed the piece of info he's interested in anyway. If they change it to 'Dwnlds' or something ignorant, it requires a two-second change on his part. What's the issue? This is like arguing that boating is only reliable so long as the ocean continues to be filled with water.
07/27/2006 01:39:42 PM · #13
Originally posted by routerguy666:

What platform are you doing this on? I run FreeBSD at home, a flavor of Unix. To do what you want to do, I would do it the quick-and-dirty way:

wget -qO- http://www.dreamstime.com/woodtexture-image13352 | grep 'downloads:' > download_count.txt

And then in PHP I would echo the contents of download_count.txt into my own page. Wget is a handy little utility, and if you are running Windows there are equivalents out there.


Yeah, wget is an awesome little tool. The lynx web browser can also be used to dump out the contents of a page to standard output. BTW, though it's nitpicky, the script used to retrieve the number of downloads should probably also account for the fact that the site may change capitalization of the downloads line. Using lynx for dumping the page contents, you can use the following command to grab JUST the number of downloads and stuff it into download_count.txt:

lynx -dump http://www.dreamstime.com/woodtexture-image13352 | grep -i 'downloads:' | awk '{print $2}' > download_count.txt
07/27/2006 01:41:34 PM · #14
I knew someone would come by and throw some awk at the thread ;)
07/27/2006 01:55:56 PM · #15
Originally posted by routerguy666:

I knew someone would come by and throw some awk at the thread ;)


Of course! :D Us damn Gentoo geeks are so detail-oriented sometimes... ;)
07/27/2006 01:59:37 PM · #16
Originally posted by routerguy666:

The only change to the interface that would break what I suggested is removing the 'Downloads: #' line. If they remove that, then they've removed the piece of info he's interested in anyway. If they change it to 'Dwnlds' or something ignorant, it requires a two-second change on his part. What's the issue? This is like arguing that boating is only reliable so long as the ocean continues to be filled with water.

Like I said... it's a hack.
07/27/2006 02:32:24 PM · #17
Originally posted by Louis:

Originally posted by routerguy666:

The only change to the interface that would break what I suggested is removing the 'Downloads: #' line. If they remove that, then they've removed the piece of info he's interested in anyway. If they change it to 'Dwnlds' or something ignorant, it requires a two-second change on his part. What's the issue? This is like arguing that boating is only reliable so long as the ocean continues to be filled with water.

Like I said... it's a hack.


It's certainly not the cleanest way to do it, but he may not have any other choice. It's not like many sites provide RSS feeds for that type of information, or access to an API. Sometimes one has to make do with cheap hacks. Hey, it works for Microsoft! ;)
07/27/2006 02:37:48 PM · #18
Originally posted by cutlassdude70:

Originally posted by Louis:

Originally posted by routerguy666:

The only change to the interface that would break what I suggested is removing the 'Downloads: #' line. If they remove that, then they've removed the piece of info he's interested in anyway. If they change it to 'Dwnlds' or something ignorant, it requires a two-second change on his part. What's the issue? This is like arguing that boating is only reliable so long as the ocean continues to be filled with water.

Like I said... it's a hack.


It's certainly not the cleanest way to do it, but he may not have any other choice. It's not like many sites provide RSS feeds for that type of information, or access to an API. Sometimes one has to make do with cheap hacks. Hey, it works for Microsoft! ;)

Heh... :) I was then going to suggest that the admins (Langdon?) expose the scores, stats, and everything else via SOAP and let us tinker, but that may be a tad too much work. ;)
07/27/2006 02:45:46 PM · #19
Among the other hacks in my bag are graphing my challenge score via MRTG. Ahh, geek life.
07/27/2006 05:12:15 PM · #20
Originally posted by Louis:

Originally posted by cutlassdude70:

Originally posted by Louis:

Originally posted by routerguy666:

The only change to the interface that would break what I suggested is removing the 'Downloads: #' line. If they remove that, then they've removed the piece of info he's interested in anyway. If they change it to 'Dwnlds' or something ignorant, it requires a two-second change on his part. What's the issue? This is like arguing that boating is only reliable so long as the ocean continues to be filled with water.

Like I said... it's a hack.


It's certainly not the cleanest way to do it, but he may not have any other choice. It's not like many sites provide RSS feeds for that type of information, or access to an API. Sometimes one has to make do with cheap hacks. Hey, it works for Microsoft! ;)

Heh... :) I was then going to suggest that the admins (Langdon?) expose the scores, stats, and everything else via SOAP and let us tinker, but that may be a tad too much work. ;)


Damn that would be cool though! Can you imagine how much easier that would make Southern Gentleman's life with the WPL stuff?!
07/27/2006 05:16:31 PM · #21
Originally posted by routerguy666:

Among the other hacks in my bag are graphing my challenge score via MRTG. Ahh, geek life.


Oooo, good idea! Now if they ever got around to implementing Louis' SOAP idea, one could have quite a bit of fun with rrdtool without even breaking a sweat! Life as a nerd is sweet... :D
07/27/2006 05:28:01 PM · #22
Originally posted by cutlassdude70:

Originally posted by Louis:

Originally posted by cutlassdude70:

Originally posted by Louis:

Originally posted by routerguy666:

The only change to the interface that would break what I suggested is removing the 'Downloads: #' line. If they remove that, then they've removed the piece of info he's interested in anyway. If they change it to 'Dwnlds' or something ignorant, it requires a two-second change on his part. What's the issue? This is like arguing that boating is only reliable so long as the ocean continues to be filled with water.

Like I said... it's a hack.


It's certainly not the cleanest way to do it, but he may not have any other choice. It's not like many sites provide RSS feeds for that type of information, or access to an API. Sometimes one has to make do with cheap hacks. Hey, it works for Microsoft! ;)

Heh... :) I was then going to suggest that the admins (Langdon?) expose the scores, stats, and everything else via SOAP and let us tinker, but that may be a tad too much work. ;)


Damn that would be cool though! Can you imagine how much easier that would make Southern Gentleman's life with the WPL stuff?!


Yeah, I thought he had some sort of automatic thing going... but perhaps he is doing it all manually.
07/27/2006 05:32:11 PM · #23
Originally posted by cutlassdude70:

Oooo, good idea! Now if they ever got around to implementing Louis' SOAP idea, one could have quite a bit of fun with rrdtool without even breaking a sweat! Life as a nerd is sweet... :D


I heard a rumor that Langdon doesn't use SOAP.
07/27/2006 07:08:33 PM · #24
Originally posted by Art Roflmao:

I heard a rumor that Langdon doesn't use SOAP.


ROFLMAO! :D
07/27/2006 07:42:39 PM · #25
FWIW, screen-scraping is generally frowned upon, and considered by many to be stealing bandwidth, especially if done on every page hit.

The more "ethical" way to do it is to set up a cron job to update your page with the "scrapings" from the other site once an hour or so.
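As a sketch, the crontab entry (edited with `crontab -e`) might look like this — the script name and paths are assumptions, standing in for the wget/grep pipeline from earlier in the thread:

```shell
# Hypothetical crontab entry: refresh the scraped count once an hour
# (at minute 0) instead of hitting the remote site on every page view.
0 * * * * /home/leaf/bin/scrape_count.sh > /home/leaf/public_html/download_count.txt
```

The PHP page then just echoes the cached file, so visitors never trigger a fetch of the remote site themselves.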
DPChallenge, and website content and design, Copyright © 2001-2025 Challenging Technologies, LLC.
All digital photo copyrights belong to the photographers and may not be used without permission.