The North East DB/IR Day is a semi-annual conference, which brings together database and information retrieval researchers and students from academic and research institutions in the area for an exciting technical program as well as informal discussion. The DB/IR day provides a regular forum for presenting diverse viewpoints on database systems and information retrieval, addressing current topics as well as promoting information exchange among researchers.

News: Thanks to all the participants who made this DB/IR Day really fun! We would also like to thank our gold sponsors CA Labs and the FLWOR Foundation, as well as our bronze sponsor Yahoo Research, for making this happen. Statistics: We had over 130 registered participants, about 120 attendees (tallied at lunch), and around 25 registered posters. Pics can be found here.

Poster Prizes
First Prize Alpa Jain, Columbia University, SQOUT: SQL queries over text databases
Second Prize Qingqing Gan, Brooklyn Polytech, Automatic Detection of Web Spam
Third Prize (1) Jalal Mahmud, Stony Brook, Information Overload in Non-Visual Web Transaction: Context Analysis Spells Relief
Third Prize (2) Nikolay Archak, NYU Stern Graduate School of Business, Show me the Money! Deriving the pricing power of product features by mining consumer reviews

Short note on the name: The DB/IR Day has gone by numerous names in the past, including Greater NY Area, Columbia, Greater Philadelphia, etc. We chose to name it North East, with the aim of enlarging its scope and maybe growing it into an East Coast version of itself. Let us know what you think :)

The Fall 2007 DB/IR Day is hosted by Stony Brook University on Friday and Saturday, October 5-6, 2007.

The program consists of technical keynote lectures from distinguished researchers in databases and information retrieval. In addition, there are group introductions, student presentations, and a poster session (with substantial prizes) to promote awareness of current DB/IR research at various graduate departments in the North-East area, and stimulate collaborations between academia and industry. A second optional day (Saturday) includes a series of social activities (short research rump sessions, wine tasting trip, boat tour and dinner) intended to both enable synergies between participants and showcase the beautiful areas and beaches of Long Island.

Directions and Parking
The DB/IR Day will be held in the Student Activities Center (Rooms: Auditorium, Ballroom A). Directions can be found here. Here are the Google Maps and the Yahoo Maps pointers to Stony Brook. Parking directions can be found on this map. We recommend parking in the "administration parking garage", shown in E5 on the map. The SAC building is shown in D5 (a 3-5 minute walk from the garage). Another set of directions (to the Wang Center building in E4, right in front of the garage) with more options can be found here. If you would prefer to take the train from Penn Station in Manhattan, here is the schedule to Stony Brook (you likely need to make the 7:49am train from Penn Station or, in the worst case, the 9:15 that gets here at 11:10). This is another map (#3 is the garage, #61 is the RR station, and #78 is the SAC, where the action will be). Even more detailed instructions are here.

The workshop hotel is the Radisson, which offers a special rate of $115.00 per night that includes complimentary high-speed wireless internet access and shuttle service to/from Stony Brook (just mention DBIR when reserving). There are a variety of other lodging choices in the area: the famous Three Village Inn in historical downtown Stony Brook (4 miles from campus); The Danfords Inn Marina, The Heritage Inn, and the Holly Berry Bed and Breakfast, all located near a picturesque marina/harbor and the charming Port Jefferson village (5 miles from campus); and the Holiday Inn Express (3 miles), located on the Nesconset Highway.

Preliminary Program
09:00 - 17:00 Registration
09:15 - 09:30 Welcome and Opening Remarks
09:30 - 10:10 Group Presentations
10:10 - 10:20 Coffee Break
10:20 - 11:20 Invited Talk: Raghu Ramakrishnan (Yahoo! Research)
  Web Data Management: Powering the New Web
The Web is no longer a static repository of documents; it is a dynamic repository of information that connects people with their passions, and on a more prosaic note, the applications they use in their personal and professional lives. How is the Web evolving as an information source, and how does this affect the future of information discovery? What are the implications of the rapid growth of social networks? How does the emergence of the Web as a delivery channel for services affect the future of software? Technically, these trends have given rise to a new wave of challenges and led to vigorous research on a number of fronts, ranging from social network analysis, information extraction, and community information management to massively distributed storage and computing platforms, and have placed a premium on hosted service architectures. In this talk, I will discuss these issues and outline some of the solutions that are beginning to emerge.

Raghu Ramakrishnan is Chief Scientist for Audience and Research Fellow at Yahoo!, and heads the Community Systems Group in Yahoo! Research. He is on leave from the University of Wisconsin-Madison, where he is Professor of Computer Sciences, and was founder and CTO of QUIQ, a company that pioneered question-answering communities, powering Ask Jeeves' AnswerPoint as well as customer-support for companies such as Compaq. His research has influenced query optimization in commercial database systems, and the design of window functions in SQL:1999. His paper on the Birch clustering algorithm received the SIGMOD 10-Year Test-of-Time award, and he has written the widely-used text "Database Management Systems" (with Johannes Gehrke). He is Chair of ACM SIGMOD, on the Board of Directors of ACM SIGKDD and the Board of Trustees of the VLDB Endowment, and has served as editor-in-chief of the Journal of Data Mining and Knowledge Discovery, associate editor of ACM Transactions on Database Systems, and the Database area editor of the Journal of Logic Programming. Dr. Ramakrishnan is a Fellow of the Association for Computing Machinery (ACM), and has received several awards, including a Distinguished Alumnus Award from IIT Madras, a Packard Foundation Fellowship, an NSF Presidential Young Investigator Award, and an ACM SIGMOD Contributions Award.
11:20 - 11:30 Coffee Break
11:30 - 12:30 Invited Talk: Marianne Winslett (UIUC)
  Managing Scientific Data: New Challenges for Database Researchers
The database research community's appetite for new applications has led to increased interest in the data management needs of scientists. This area encompasses a huge range of applications, extending from public repositories of observational data such as the popular Sloan Digital Sky Survey to one-of-a-kind runs of simulation codes crafted by individual scientists. In this talk, we will survey the most common data management needs found in the hard sciences, describe the new database research challenges that arise from these needs, and outline ways to address some of these challenges.

Marianne Winslett has been a professor at the University of Illinois at Urbana-Champaign since 1987. Her current research interests include security in open systems and data management for scientific applications. She has served on the editorial boards of ACM Transactions on Database Systems and IEEE Transactions on Knowledge and Data Engineering, and is currently on the board of ACM Transactions on the Web. She is an ACM Fellow, a past vice-chair of ACM SIGMOD and the recipient of an NSF Presidential Young Investigator Award.
12:30 - 14:20 Lunch and Poster Session
14:20 - 15:10 Invited Talk: Divesh Srivastava (AT&T Labs Research)
  The Bellman data quality browser
Data quality is a serious concern in complex industrial-scale databases, which often have thousands of tables and tens of thousands of columns. Commonly encountered problems include duplicates and default values in columns treated as keys, data inconsistencies, and poor quality join paths. Compounding the data quality problems are incomplete and out-of-date metadata about the database and the processes used to populate the database. These problems make the task of analyzing data particularly challenging. The Bellman data quality browser has been built to effectively address such problems. Bellman profiles the database and computes concise statistical summaries of the contents of the database to identify approximate keys, frequent values of a field (often default values), joinable fields, and to understand database dynamics (changes in a database over time). In this talk, I'll describe the technology underlying Bellman and how it is used to help make sense of complex databases.

Divesh Srivastava is the head of Database Research at AT&T Labs Research. He received his Ph.D. from the University of Wisconsin, Madison, and his Bachelor of Technology from the Indian Institute of Technology, Bombay, India. His current research interests include data quality and data stream management systems.
15:10 - 15:20 Coffee Break
15:20 - 15:50 Short fast-forward research presentations
15:50 - 16:00 Poster Awards
19:00 Dinner Outing (in groups, on your own, with directions, mostly in the Port Jefferson Harbor)

Saturday (optional second day)
10:00 - 14:00 Long Island Beach and Wine-Tasting Tour
14:00 - 16:00 Vineyard Lunch and Research Rump Sessions
15:00 - 18:00 Optional Boat Trip
16:00 Conclusion

Registration. While registration is free, we appreciate your RSVP.

Past DB/IR Days. Past DB/IR Days were hosted by Columbia University (Spring 2005), University of Pennsylvania (Fall 2005), Rutgers University (Spring 2006), NYU (Fall 2006), and IBM Research (Spring 2007).

Radu Sion (Workshop Chair)
Michael Kifer (Workshop Chair)
Shakeera Thomas (Local Arrangements)

