Episode 38

On Data, feat. Shayne Longpre | TRACES Appendix 38



July 12, 2024 | 50 min


Show Notes

In this conversation, Cristian and Shayne discuss the foundational role of data in AI and the challenges associated with data provenance and curation. They explore the organization and sourcing of data sets, the complexities of filtering and balancing data, and the legal and ethical implications of data usage.


They also touch on the importance of transparency, accountability, and independent evaluation in the development of AI models, and highlight the need for responsible data practices given AI's potential impact on society. From there, they turn to the protocols and challenges surrounding AI research and the field's need for infrastructure.


The discussion delves into the concept of a safe harbor for good-faith research and the importance of distinguishing between good and bad researchers. It also covers the changing landscape of the web and its impact on data access and consent.


Finally, they discuss the enforceability of consent mechanisms and the complexities of copyright in the digital age.


Find me at cristian@ccb.life


PRE-ORDER TRACES: A PSY-FI NOVEL NOW (https://ccblife.gumroad.com/l/traces)

Also, who are you? Get a draft of TRACES if you fill out this form (https://forms.gle/rFnVFrCNUAJz7Fvn7)


About the Guest:

Shayne Longpre is a PhD Candidate at MIT, where he works on training language models and understanding their broader social challenges. In particular, he investigates their risks, access, and transparency, with an emphasis on training data. He leads the Data Provenance Initiative and co-organized the AI safe harbor open letter (co-signed by 350+ researchers and journalists), which advocates for better independent research access to closed models. His work has been covered by the New York Times, the Washington Post, and VentureBeat.


Set-Up:

  • Camera: https://amzn.to/3PZVscb (don't laugh)
  • Microphone: https://amzn.to/46f3pB5
  • Teleprompter Stand: https://amzn.to/3tgS98y
  • Teleprompter App: https://amzn.to/46jdH31
  • Teleprompter Screen: https://amzn.to/3PNfKFI (yup)
  • Headphones: https://amzn.to/46gMSwo

Timestamps

00:00 Introduction and Background

02:25 The Foundational Role of Data in AI

08:57 Challenges in Data Provenance and Curation

15:36 Transparency and Accountability in AI Development

21:49 Legal and Ethical Implications of Data Usage

29:56 The Potential of Foundation Models and Best Practices

41:59 Protocols and Infrastructure for AI Research

44:11 Distinguishing Good and Bad Researchers in AI

48:25 The Changing Landscape of the Web and Data Access

01:10:55 Enforceability of Consent Mechanisms and Copyright in the Digital Age


Hashtags

#DataProvenance #DataCuration #AIEthics #AITransparency #DataSets #AIChallenges #DataBalance #LegalImplications #AIResearch #DataUsage #ResponsibleAI #AIModels #DataOrganization #AIRegulations #SafeHarbor #GoodFaithResearch #AIResponsibility #WebEvolution #DataAccess #UserConsent #CopyrightLaws #DigitalEthics #AIImpact #AIAccountability #IndependentEvaluation