Cornell University

The Web Laboratory: Publications

Papers

  1. Felix Weigel, Biswanath Panda, Mirek Riedewald, Johannes Gehrke, Christoph Koch: Collaborative Creation and Management of Large High-Quality Data Sets. North/East DB/IR Day, Stony Brook University, Fall 2007.
  2. Arms, W., Aya, S., Dmitriev, P., Kot, B., Mitchell, R., Walle, L., Building a Research Library for the History of the Web. Joint Conference on Digital Libraries, June 2006. MS Word.
  3. Arms, W., S. Aya, M. Calimlim, J. Cordes, J. Deneva, P. Dmitriev, J. Gehrke, L. Gibbons, C. D. Jones, V. Kuznetsov, D. Lifka, M. Riedewald, D. Riley, A. Ryd, and G. J. Sharp, Three Case Studies of Large-Scale Data Flows. In Proc. IEEE Workshop on Workflow and Data Flow for Scientific Applications (SciFlow). 2006. PDF.
  4. Arms, W., Aya, S., Dmitriev, P., Kot, B., Mitchell, R., Walle, L., A Research Library based on the Historical Collections of the Internet Archive. D-Lib Magazine, February 2006. http://www.dlib.org/dlib/february06/arms/02arms.html.

Proposals

  1. PetaByte Storage Services for Data-Driven Science. NSF grant CNS-0403340, 2004.
      Summary
      Research Description
      References
  2. Very Large Semi-Structured Datasets for Social Science Research. NSF grant SES-0537606, 2005.
      Summary
      Research Description
      References
  3. SGER: Exploratory Research: Using the Cyberinfrastructure to build a Full Text Index to the Web. NSF grant IIS-0634677, 2007.
      Summary
      Research Description
      References
  4. III-CXT: Computer Science Research Using the Cornell Web Lab to Study Social and Informational Processes on the Web. NSF grant IIS-0705774, 2007.
      Summary
      Research Description
      References

Presentations

  1. The Web Laboratory, goals, progress report, and research challenges. Computer Science lunchtime talk, William Arms, April 19, 2005.
  2. Computational Social Science, Temporal Evolution of Social and Information Networks. CTC review, Jon Kleinberg, March 2006.
  3. Humanities and Social Science Research Using Vast Amounts of Web Data. CaSTA06, William Arms, October 2006.
  4. Research Seminar: The Web Lab. Cornell Information Science, William Arms, Manuel Calimlim, Lucy Walle, Felix Weigel, January 23, 2008.
  5. The Web Lab Collaboration Server. Flash video, Felix Weigel, December 2007.

Student Reports

Fall 2004

  1. Karthik Jeyabalan, Jerrin Kallukalam, Ariel Rabkin, Patrick Reilly, Nurwati Widodo, Web Research Infrastructure Project Final Report Fall 2004, December 17, 2004

Spring 2005

  1. Mayank Gandhi, Jimmy Yanbo Sun, ARC Data Extraction and summarization, May 2005
  2. Karthik Jeyabalan, Jerrin Kallukalam, Representation of Web Graph for in Memory Computation, May 2005
  3. Shantanu Shah, Generating a Web Graph, May 2005
  4. Richard Yu Wang, Web Research Infrastructure Database Section Semester Research Report, May 2005

Fall 2005

  1. Benzaquen, S., Guo, W., The Web Laboratory: Preload System, Fall 2005 Final Report. December 2005.
  2. Gerner, N., Sosa, C., Fall 2005 Semester Report for Web Lab Database Load Group. December 2005.
  3. Gu, M.-D., User Tools: Basic Access API Design and Implementation. December 2005.
  4. Jain, P., Shtokman, D., Tiwari, H., Data Movement Research Project. December 2005.
  5. Kohli, S., Sanghi, L., Data Monitoring and Tracking. December 2005.
  6. Siddavanahalli, M., Singhal, S., Web Lab - Subset Extraction. December 2005.
  7. Shah, S., Retro Browser. December 2005.

Spring 2006

  1. Gerner, N., WebLibrary Design Progress Report. May 2006.
  2. Murarka, S., Web Graph Project. May 2006.
  3. Sosa, C. B., Jain, P., Shtokman, D., Web Library: Data Movement Spring 2006 Report. May 2006.
  4. Zhu, N., Basic Access API. May 2006.

Fall 2006

  1. Adil Aijaz, Heritrix WebLab. December 2006.
  2. Andrzej Kielbasinski, Data Movement and Tracking. December 2006.
  3. Dmitriy Shtokman, Web Library: Data Movement Fall 2006 Report. December 2006.

Spring 2007

  1. Laran Evans, Web Research Infrastructure. May 2007.
  2. Kyeongseo Hwang, Jung Kwan Kim and Hardeep Singh, Index to the History of the Web. May 2007.
  3. Kwan Dong Kim and Chang Min Kim, PageRank Calculation using Sparse Matrix in Clustered Computer Environment. May 2007.
  4. Sangwoo Kim, Sanjay Rajan, and Sean Seguin, Web Graph Generation. May 2007.
  5. Andrzej Kielbasinski, Data Movement and Tracking, Spring 2007 Report. May 2007.
  6. Dmitriy Shtokman, Web Library: Data Movement Spring 2007 Report. May 2007.

Fall 2007

  1. Asha Balasubramaniam and Dmitriy Shtokman, The Web Laboratory: Data Movement and Tracking Team Fall 2007 Report. December 2007.
  2. Wioletta Holownia and Michal Kuklis, WebLab Site Development and Researchers' Tools. December 2007.
  3. Anthony Jawad and Jie Teng, Web Graph Generation: Fall 2007 Report. December 2007.
  4. Chang Min Kim and Thomas Chen, PageRank Calculation. December 2007.
  5. Ashish Virmani and Neha Arora, Anchor Text Analysis. December 2007.

Spring 2008

  1. Asha Balasubramaniam, The Web Laboratory Project Data Movement and Tracking Report. May 2008.
  2. Vijayanand Chokkapu and Asif-ul Haque, PageRank Calculation using Map Reduce. May 2008.
  3. Prashant Baktha Kumara Dhas and Jasim Mohammed, An anchor text analysis of links to five state government websites for the years 2004 and 2005. May 2008.
  4. Wioletta Holownia, Michal Kuklis and Natasha Qureshi, Web Lab Collaboration Server and Web Lab Website. May 2008.
  5. Manu Jain, Gayatri Kaul and Aditi Lyall, Web Graph Generation. May 2008.
  6. Lokesh K Sharma, Sandeep S Shekhawat and Sneha Khadye, Data Profiler Tool. May 2008.

Summer 2008

  1. Manu Jain, Web Graph Generation. August 2008.

Fall 2008

  1. Jacob Bank and Benjamin Cole, Calculating the Jaccard Similarity Coefficient with Map Reduce for Entity Pairs in Wikipedia. December 2008.
  2. Nayan Busa, Unmesh Jagtap, and Utkarsh Prateek, PageRank Calculation using Map Reduce. December 2008.
  3. Xingfu Dong, Hubs and Authorities Calculation using MapReduce. December 2008.
  4. Zhiyu Zhang, Web Graph Cleaning using Map Reduce. December 2008.