Tumblr Owner Strikes Deals with OpenAI and Midjourney for Training Data, Report Reveals

Automattic, the parent company of WordPress.com and Tumblr, is reportedly in talks with AI companies Midjourney and OpenAI to supply them with training data drawn from user-generated content. A recent report by 404 Media describes the alleged negotiations and indicates that deals are imminent. Rumours had already been circulating within the Tumblr community about a prospective partnership with Midjourney that could give the platform a new revenue stream.

According to the report, Automattic is preparing to introduce a setting that lets users opt out of sharing their data with third parties, including AI companies. However, internal sources cited in the report describe a concerning development: the company inadvertently collected more than it intended when it compiled Tumblr's public post content from 2014 to 2023. The resulting data reportedly includes content that was not supposed to be publicly visible on blogs, raising questions about how it was handled and whether any of it has been or will be transferred to Midjourney and OpenAI.

Both OpenAI and Midjourney declined to comment to The Verge on these claims. Automattic, for its part, pointed to a public statement it published after the 404 Media report. The statement, titled "Protecting User Choice," refers only in general terms to work with unnamed AI companies. It says the company currently blocks major AI platform crawlers by default and emphasises a commitment to sharing only public content hosted on WordPress and Tumblr from sites that have not opted out. Automattic also says it is working with select AI companies whose plans align with user concerns such as attribution, opt-outs, and control.

Deals to license training data for machine learning models are becoming increasingly common. Reddit, for instance, reportedly has a substantial annual agreement with Google for AI-related purposes, while Shutterstock has an agreement with OpenAI that allows its vast photo library to be used for training. The use of user-generated content to train AI models remains controversial, however, particularly within the creative community. Artists and writers have objected to their work being used without consent, which leaves companies to balance innovation against respect for user rights.

Platforms like Tumblr, which cater extensively to the creative community, have struggled with the ethical implications of AI integration. DeviantArt, for example, has faced backlash for experimenting with AI tools without transparently addressing user concerns. As these technologies evolve, the need for clear guidelines and ethical frameworks governing data usage and AI applications has become increasingly pressing.

Automattic's foray into AI collaborations comes amidst its efforts to revitalise Tumblr, a platform it acquired from Verizon in 2019. While the company has established a strong presence in web hosting through WordPress and WordPress VIP, monetising Tumblr has proven to be a more intricate task. Automattic announced a scaling back of its ambitions for the platform last year, indicating a shift in strategy to align with changing market dynamics and user preferences.

The potential deals between Automattic and the AI companies Midjourney and OpenAI would turn content posted on Tumblr and WordPress into a source of training data and, according to the speculation around the report, a new revenue stream. However, the lack of transparency surrounding the company's data practices and its handling of user-generated content raises valid concerns about privacy and consent.

The use of user-generated content for AI training presents both opportunities and challenges for platforms like Tumblr and WordPress. How these deals are handled will test whether companies can offer transparency, meaningful user consent, and real control over data while pursuing new uses for the content they host.