Web application with a simple one-page interface.
Input: A list of URLs of thumbnail/movie gallery HTML pages which have textual, graphical, image, and video content in various formats and layouts.
Output: A CSV dump file which includes the Title, Description, Video thumbnail filename, Video filename (FLV format), Video duration (minutes:seconds), Video height (pixels), Video width (pixels) AND a ZIP file including the video and thumbnail files.
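One way the two output artifacts could be produced is sketched below with Python's standard `csv` and `zipfile` modules. This is only an illustration: the posting does not name an implementation language, and the column labels in `CSV_COLUMNS` and the helper names `format_duration`, `write_csv`, and `write_zip` are assumptions, not part of the requirements.

```python
import csv
import io
import os
import zipfile

# Column labels are an assumption; the posting lists the fields but not exact headers.
CSV_COLUMNS = ["Title", "Description", "Thumbnail filename", "Video filename",
               "Duration", "Height", "Width"]


def format_duration(seconds):
    """Format a duration in seconds as minutes:seconds, e.g. 125 -> '2:05'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


def write_csv(records, out):
    """Write one CSV row per video clip to an open text file object."""
    writer = csv.DictWriter(out, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    writer.writerows(records)


def write_zip(paths, zip_path):
    """Bundle the given thumbnail and FLV files into one ZIP archive."""
    # ZIP_STORED (no compression) is used because video and JPEG data
    # are already compressed.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as archive:
        for path in paths:
            archive.write(path, arcname=os.path.basename(path))


if __name__ == "__main__":
    record = {
        "Title": "Sample Gallery #1",
        "Description": "First sentence, with a comma. Second sentence.",
        "Thumbnail filename": "Sample Gallery #1.jpg",
        "Video filename": "Sample Gallery #1.flv",
        "Duration": format_duration(125),
        "Height": 240,
        "Width": 320,
    }
    buffer = io.StringIO()
    write_csv([record], buffer)
    print(buffer.getvalue())
```

The `csv` module quotes fields that contain commas or quotes, so free-text descriptions do not break the one-line-per-clip layout.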
Magic:
1) The spider will need to navigate each page and catalog all direct links to video files, as well as any video files hosted on the page.
1a) The spider will need to forge the referring page information (the HTTP Referer header) when requesting images and videos to get around webserver leeching restrictions.
2) If the video file is in WMV format, convert it to FLV and save a local copy for inclusion in ZIP file. If the video file is already in FLV format, just save a local copy.
3) If the video file is linked directly from the gallery page, and the link is an image, save that image as the thumbnail for that video.
4) Determine the height and width of the video file and resize the image thumbnail to the same dimensions.
5) If there is no image thumbnail (see #3), create a thumbnail (JPEG format, with the same height/width as the video) from a random frame of the video file.
6) Determine the duration of the video clip and save this for export in the resulting CSV file.
7) Save the title of the gallery page as the title of the video clip. If there are multiple videos on a single gallery page, append "#1", "#2", etc. to the end of the title.
7a) Rename video file and video thumbnail file to correspond to this same title format (to prevent filename collisions with past or future videos).
8) Catalog all text blocks that are at least 2 sentences long for export as the description of the video clips on the page.
9) Use a predefined list of 'stop' words to disqualify certain sentences from being included in Description collation.
10) Export a CSV file with one line per video clip and all other information described in the Output section above. There may be multiple entries if there are multiple videos hosted on or linked from the gallery page.
11) Export a ZIP file which includes all of the thumbnails and FLV video files.
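The parsing side of the steps above (1, 1a, 3, 7, 8, and the stop-word filter) might be sketched as follows, using only the Python standard library. Everything named here (`GalleryParser`, `numbered_titles`, `referer_request`, `describe`) is hypothetical, and several details are assumptions rather than stated requirements: video links are recognized by a `.flv` or `.wmv` extension, a space precedes the "#1" suffix, and a stop word disqualifies only the sentence that contains it. Video conversion, resizing, and frame capture (steps 2, 4, 5, 6) are not covered.

```python
import re
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

VIDEO_EXTENSIONS = (".flv", ".wmv")  # assumption: videos are identified by extension


class GalleryParser(HTMLParser):
    """Collects the page title and every <a> link that points at a video file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.videos = []       # dicts with "url" and "thumbnail" keys
        self._in_title = False
        self._current = None   # video entry whose <a> tag is still open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = attrs.get("href") or ""
            if href.split("?")[0].lower().endswith(VIDEO_EXTENSIONS):
                self._current = {"url": urljoin(self.base_url, href),
                                 "thumbnail": None}
                self.videos.append(self._current)
            else:
                self._current = None
        elif tag == "img" and self._current is not None and attrs.get("src"):
            # An image inside a video link becomes that video's thumbnail (step 3).
            self._current["thumbnail"] = urljoin(self.base_url, attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._current = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def numbered_titles(title, count):
    """Step 7: append ' #1', ' #2', ... only when a page has several videos."""
    if count <= 1:
        return [title] * count
    return [f"{title} #{i}" for i in range(1, count + 1)]


def referer_request(url, page_url):
    """Step 1a: build a request that names the gallery page as the referrer."""
    return urllib.request.Request(url, headers={"Referer": page_url})


def describe(text_blocks, stop_words):
    """Steps 8-9: keep blocks of 2+ sentences, drop sentences with stop words."""
    kept = []
    for block in text_blocks:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", block.strip()) if s]
        if len(sentences) < 2:
            continue
        for sentence in sentences:
            words = set(re.findall(r"[a-z']+", sentence.lower()))
            if not words & stop_words:
                kept.append(sentence)
    return " ".join(kept)
```

A request built by `referer_request` can be passed to `urllib.request.urlopen` to download a file. Text blocks for `describe` would have to be extracted from the page separately, for example one block per paragraph element.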
Requirements Interview Answers:
To help you bid more accurately, the employer was interviewed about the requirements for this project. Below are their answers.
Other Requirements:
You will be required to develop and test this software with pages which contain adult content. You must accept this provision to be selected for this project.
Example gallery pages will be provided to use during development and test phases.
Categories:
(Note: Like everything else on this page, these categories are part of the original contract for this project.)
Web development, Requirements, UNIX, Other (Technology), Web services, Linux, FreeBSD, XML / XHTML, Technology, Web programming, Tech details