Syncing Zotero and OneDrive

Zotero
Python
macOS

How to use Python and macOS launch agents to sync PDFs stored by Zotero to a folder on OneDrive.

Author

Jan Derrfuss

Published

April 29, 2023

 

After finally ditching the once-great Papers by Mekentosj, I needed to find new PDF management software. I tried Zotero, EndNote, JabRef, Bookends and Paperpile[1]. I wasn’t entirely happy with any of these tools, but eventually decided to give Zotero (together with Zotfile for renaming the PDF files) a serious try.

One major issue

While I was happy with Zotero overall, I encountered one major issue: I wanted to be able to read PDFs not only on my desktop, but also on my tablet.[2] However, it turned out that neither GoodReader nor PDF Expert was able to sync the 7,000+ PDFs that I have accumulated over the past 20 or so years. The main problem seemed to be that both PDF readers struggled with the fact that Zotero places each and every PDF in its own folder (Zotero gives each of these folders a random 8-character name). To illustrate how Zotero stores PDF files, here are the first three subfolders in my Zotero storage folder:

[Image: how Zotero stores PDF files, showing the first three subfolders of the storage folder, each with a random 8-character name.]
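Since the screenshot does not survive in text form, here is a minimal sketch that mimics the layout on a mock storage folder: one subfolder per attachment, named with an 8-character key, holding the PDF plus hidden helper files such as .zotero-ft-cache. The helper names (list_zotero_pdfs, demo_storage) are hypothetical, and the mock contents simply reuse example file names from this post; a real storage folder may also contain other kinds of attachments.

```python
import tempfile
from pathlib import Path


def list_zotero_pdfs(storage_dir):
    """Return (subfolder, pdf_name) pairs from a Zotero-style storage folder."""
    pairs = []
    for sub in sorted(Path(storage_dir).iterdir()):
        if not sub.is_dir():
            continue
        for f in sorted(sub.iterdir()):
            # skip Zotero's hidden helper files such as .zotero-ft-cache
            if f.is_file() and f.suffix == ".pdf":
                pairs.append((sub.name, f.name))
    return pairs


def demo_storage():
    """Build a tiny mock storage folder (hypothetical contents) and list it."""
    with tempfile.TemporaryDirectory() as tmp:
        for key, name in [("2A9FDDID", "Houtkamp_2010_Journal of Cognitive Neuroscience.pdf"),
                          ("2A9SZ8MM", "Smith_2012_Cerebral Cortex.pdf")]:
            folder = Path(tmp) / key
            folder.mkdir()
            (folder / name).write_bytes(b"%PDF-1.4\n")
            (folder / ".zotero-ft-cache").write_text("")
        return list_zotero_pdfs(tmp)


for key, name in demo_storage():
    print(key, name)
```

Every PDF sits one level down, inside a folder whose name carries no meaning for a human reader, which is exactly what the tablet PDF readers struggled with.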

A solution

I want to share a possible solution to this problem here in case others run into the same issue. The basic idea is to leave Zotero’s storage folder as it is and not sync this folder with OneDrive. Instead, I keep a copy of all PDFs in a separate folder that is synced with OneDrive. The file structure of the OneDrive folder is completely flat (i.e., there are no subfolders); the folder just contains the PDFs.

To sync the PDFs successfully in both directions (i.e., Zotero to OneDrive, and vice versa), the information about the Zotero subfolder names needs to be preserved. I decided to do this by appending the Zotero subfolder name to the original file name. That is, in the OneDrive folder, Smith_2012_Cerebral Cortex.pdf becomes Smith_2012_Cerebral Cortex_2A9SZ8MM.pdf. This has the added advantage of solving potential name conflicts (e.g., if Zotero contains two papers that Smith published in Cerebral Cortex in 2012).
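The renaming scheme can be expressed as a pair of functions. This is a sketch with hypothetical names (to_onedrive_name, from_onedrive_name); it assumes, as the scripts below do, that the Zotero subfolder name never contains an underscore, so splitting at the last underscore recovers both parts even when the original file name contains underscores itself.

```python
import os


def to_onedrive_name(pdf_name, zotero_key):
    """Append the Zotero subfolder name just before the file extension."""
    base, ext = os.path.splitext(pdf_name)
    return base + "_" + zotero_key + ext


def from_onedrive_name(onedrive_name):
    """Reverse to_onedrive_name; return (pdf_name, zotero_key), or None if no key is present."""
    base, ext = os.path.splitext(onedrive_name)
    # split at the LAST underscore: the original name may contain underscores,
    # but the subfolder name is assumed not to
    original_base, sep, zotero_key = base.rpartition("_")
    if not sep or not original_base:
        return None
    return original_base + ext, zotero_key


print(to_onedrive_name("Smith_2012_Cerebral Cortex.pdf", "2A9SZ8MM"))
# Smith_2012_Cerebral Cortex_2A9SZ8MM.pdf
print(from_onedrive_name("Smith_2012_Cerebral Cortex_2A9SZ8MM.pdf"))
# ('Smith_2012_Cerebral Cortex.pdf', '2A9SZ8MM')
```

Two PDFs with identical names but different Zotero subfolders map to different OneDrive names, which is what removes the name conflicts.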

Python scripts

In terms of implementation, I wrote two Python scripts: one for syncing Zotero to OneDrive, and another for syncing OneDrive to Zotero. The actual copying is done by calling rsync through Python’s subprocess module.
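Both scripts shell out to rsync with a handful of options. The sketch below wraps that call in hypothetical helpers (build_rsync_cmd, run_rsync) to make the options explicit; it is not how the scripts below are structured. Passing the arguments as a list, without shell=True, means file names containing spaces, such as Smith_2012_Cerebral Cortex.pdf, need no quoting. The sketch leaves out --delete, which, according to the rsync manual, only removes extraneous files inside directories that are being synchronized and therefore has no effect when a single file is copied.

```python
import subprocess


def build_rsync_cmd(src, dst, existing_only=False):
    """Return the rsync argument list for copying one file from src to dst."""
    # -a: archive mode (among other things, preserves modification times)
    # -u: skip the copy if the destination file is newer than the source
    # --min-size=10: skip source files smaller than 10 bytes
    cmd = ["rsync", "-au", "--min-size=10"]
    if existing_only:
        # --existing: only update files that already exist at the destination
        cmd.append("--existing")
    return cmd + [src, dst]


def run_rsync(src, dst, existing_only=False):
    """Run rsync and return its exit status (0 means success)."""
    # a list of arguments (no shell=True), so spaces in file names need no quoting
    return subprocess.run(build_rsync_cmd(src, dst, existing_only)).returncode
```

In this sketch, the Zotero-to-OneDrive direction would call run_rsync(zoteroFile, onedriveFile), and the reverse direction run_rsync(onedriveFile, zoteroFile, existing_only=True), so that nothing new is ever created inside the Zotero storage folder.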

Here is the Python code for syncing Zotero to OneDrive:

sync_zotero2onedrive.py
#!/usr/bin/env python3

import os
import subprocess
import filecmp

zoteroDir = "/your/path/to/Zotero/storage"
onedriveDir = "/your/path/to/onedrive/articles_zotero"

# note that os.walk is recursive; it starts like this:
# root: /your/path/to/Zotero/storage
# dirs: ['2A9FDDID', '2A9SZ8MM', '2AJDS299', '2AEBFC7A', '2AJ52RIM', '2AADPG7M', etc.]
# files: []
# then it goes through the subdirectories:
# root: /your/path/to/Zotero/storage/2A9FDDID
# dirs: []
# files: ['.zotero-ft-info', 'Houtkamp_2010_Journal of Cognitive Neuroscience.pdf', '.zotero-ft-cache']
# and so on
for root, dirs, files in os.walk(zoteroDir):
    for filename in files:
        if filename.endswith(".pdf"):
            # get the current subfolder, e.g. 2A9FDDID
            # (not named "dir", which would shadow Python's built-in function)
            zoteroKey = os.path.basename(root)
            # remove the file extension (i.e., .pdf) and keep the rest
            filenameBase = filename.rsplit(".", 1)[0]
            # create the new file name, e.g. Houtkamp_2010_Journal of Cognitive Neuroscience_2A9FDDID.pdf
            filenameNew = filenameBase + "_" + zoteroKey + ".pdf"
            # create path + file name for Zotero storage
            zoteroFile = os.path.join(root, filename)
            # create path + file name for OneDrive
            onedriveFile = os.path.join(onedriveDir, filenameNew)
            # rsync options: -a = archive mode (preserves modification times),
            # -u = skip the copy if the destination file is newer,
            # --min-size=10 = skip files smaller than 10 bytes
            # check if the file exists on OneDrive
            # if not, use rsync to sync it to OneDrive
            if not os.path.isfile(onedriveFile):
                subprocess.call(['rsync', '-au', '--min-size=10', '--delete', zoteroFile, onedriveFile])
            # if the file does exist
            else:
                # compare the two files: with shallow=True, files with the same size and
                # modification time count as identical; otherwise their contents are compared
                # if they are different according to filecmp, sync them
                # note that if the destination file (i.e. the OneDrive file) is newer, nothing happens
                if not filecmp.cmp(zoteroFile, onedriveFile, shallow=True):
                    subprocess.call(['rsync', '-au', '--min-size=10', '--delete', zoteroFile, onedriveFile])
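The filecmp check deserves a closer look. With shallow=True, filecmp.cmp treats two regular files as equal when their os.stat signatures (file type, size, modification time) match; if the sizes differ, the files count as different; otherwise it falls back to comparing the contents byte by byte. So a mere timestamp difference does not trigger rsync, but a change in content does. Here is a small self-contained demonstration (the function name and the file contents are made up for illustration):

```python
import filecmp
import os
import tempfile


def shallow_cmp_demo():
    """Show how filecmp.cmp(..., shallow=True) reacts to timestamps vs. contents."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.pdf")
        b = os.path.join(tmp, "b.pdf")
        for path in (a, b):
            with open(path, "wb") as f:
                f.write(b"%PDF-1.4 same")
        # same contents, different modification times
        os.utime(a, (1_000_000, 1_000_000))
        os.utime(b, (2_000_000, 2_000_000))
        equal_despite_mtime = filecmp.cmp(a, b, shallow=True)
        # same size, different contents (and yet another modification time)
        with open(b, "wb") as f:
            f.write(b"%PDF-1.4 diff")
        os.utime(b, (3_000_000, 3_000_000))
        equal_despite_content = filecmp.cmp(a, b, shallow=True)
    return equal_despite_mtime, equal_despite_content


print(shallow_cmp_demo())  # (True, False)
```

When filecmp does report a difference, rsync’s -u flag still decides the direction of the copy: if the destination file is newer, it is left alone.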

And the Python code for syncing OneDrive to Zotero:

sync_onedrive2zotero.py
#!/usr/bin/env python3

import os
import sys
import subprocess
import filecmp

zoteroDir = "/your/path/to/Zotero/storage"
onedriveDir = "/your/path/to/onedrive/articles_zotero"

# safety check: if the Zotero storage folder cannot be found (e.g., a wrong path or
# an unmounted drive), every PDF would look as if it had been removed from Zotero
# and would be deleted from OneDrive below
if not os.path.isdir(zoteroDir):
    sys.exit("Zotero storage folder not found: " + zoteroDir)

for root, dirs, files in os.walk(onedriveDir):
    for filename in files:
        if filename.endswith(".pdf"):
            # remove the file extension (i.e., .pdf) and keep the rest
            filenameBase = filename.rsplit(".", 1)[0]
            # split the file name at the last underscore into two elements,
            # e.g. ['Houtkamp_2010_Journal of Cognitive Neuroscience', '2A9FDDID']
            filenameList = filenameBase.rsplit("_", 1)
            # skip PDFs without an appended Zotero subfolder name
            # (otherwise filenameList[1] below would raise an IndexError)
            if len(filenameList) != 2:
                continue
            # recreate the original file name
            # e.g., Houtkamp_2010_Journal of Cognitive Neuroscience.pdf
            filenameOrig = filenameList[0] + ".pdf"
            # get the Zotero subfolder name, e.g. 2A9FDDID
            zoteroKey = filenameList[1]
            # create path + file name for OneDrive
            onedriveFile = os.path.join(root, filename)
            # create path + file name for Zotero
            zoteroFile = os.path.join(zoteroDir, zoteroKey, filenameOrig)
            # files should never be created on OneDrive
            # it is therefore not necessary to check for new files on OneDrive
            # however, it is necessary to check if the file was removed from Zotero
            # if that is the case, it should also be removed from OneDrive
            if not os.path.isfile(zoteroFile):
                os.remove(onedriveFile)
            # else: the file still exists in the Zotero storage folder
            else:
                # if they are different according to filecmp, sync them
                # note that if the destination file (i.e. the Zotero file) is newer, nothing happens
                # --existing: never create new files in the Zotero storage folder
                if not filecmp.cmp(zoteroFile, onedriveFile, shallow=True):
                    subprocess.call(['rsync', '-au', '--min-size=10', '--existing', onedriveFile, zoteroFile])
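Because the OneDrive-to-Zotero script deletes files, a dry run before enabling it may be worthwhile. The sketch below (find_orphans and demo_orphans are hypothetical helpers) lists the OneDrive PDFs that would be deleted, using the same name parsing, and skips files without an appended subfolder name. Unlike the script, it assumes the OneDrive folder is flat and only looks at its top level. The Jones file name in the demo is invented.

```python
import os
import tempfile


def find_orphans(onedrive_dir, zotero_dir):
    """Return the OneDrive PDFs whose Zotero counterpart is missing (nothing is deleted)."""
    # without this check, a missing storage folder would make every file an orphan
    if not os.path.isdir(zotero_dir):
        raise FileNotFoundError(zotero_dir)
    orphans = []
    for name in sorted(os.listdir(onedrive_dir)):
        if not name.endswith(".pdf"):
            continue
        base, sep, key = name[:-len(".pdf")].rpartition("_")
        if not sep or not base:
            continue  # no Zotero subfolder name appended; leave the file alone
        if not os.path.isfile(os.path.join(zotero_dir, key, base + ".pdf")):
            orphans.append(name)
    return orphans


def demo_orphans():
    """Mock folders (hypothetical contents): one matched PDF, one orphan, one unrelated file."""
    with tempfile.TemporaryDirectory() as zotero, tempfile.TemporaryDirectory() as onedrive:
        os.mkdir(os.path.join(zotero, "2A9SZ8MM"))
        open(os.path.join(zotero, "2A9SZ8MM", "Smith_2012_Cerebral Cortex.pdf"), "wb").close()
        for name in ("Smith_2012_Cerebral Cortex_2A9SZ8MM.pdf",
                     "Jones_2015_Neuron_ABCDEFGH.pdf",
                     "notes.pdf"):
            open(os.path.join(onedrive, name), "wb").close()
        return find_orphans(onedrive, zotero)


print(demo_orphans())  # ['Jones_2015_Neuron_ABCDEFGH.pdf']
```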

Once you’ve saved the Python scripts locally, make sure to change the paths to match your file system. In addition, make the files executable by running the following commands in the Terminal (these commands assume that you are currently in the folder where you saved the files):

chmod +x sync_zotero2onedrive.py

and

chmod +x sync_onedrive2zotero.py
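If you’d rather do this step from Python (e.g., in a small setup script), os.chmod can add the execute bits. This is a sketch with a hypothetical helper name; unlike chmod +x, which respects the umask, it unconditionally adds execute permission for owner, group and others.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for owner, group and others, similar to `chmod +x path`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# run from the folder where the scripts were saved; missing files are skipped
for script in ("sync_zotero2onedrive.py", "sync_onedrive2zotero.py"):
    if os.path.exists(script):
        make_executable(script)
```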

Running the scripts as launch agents

Finally, I set up two property list (.plist) files and added them to ~/Library/LaunchAgents so that the Python scripts run automatically. I run them every five minutes; in the property lists below, the StartInterval key specifies this interval in seconds (i.e., 300).

This property list syncs Zotero to OneDrive:

com.jan.sync.zotero2onedrive.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.jan.sync.zotero2onedrive</string>
  <key>ProgramArguments</key>
  <array>
    <string>/your/path/sync_zotero2onedrive.py</string>
  </array>
  <key>KeepAlive</key>
  <false/>
  <key>StartInterval</key>
  <integer>300</integer>
</dict>
</plist>

This property list syncs OneDrive to Zotero:

com.jan.sync.onedrive2zotero.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.jan.sync.onedrive2zotero</string>
  <key>ProgramArguments</key>
  <array>
    <string>/your/path/sync_onedrive2zotero.py</string>
  </array>
  <key>KeepAlive</key>
  <false/>
  <key>StartInterval</key>
  <integer>300</integer>
</dict>
</plist>
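Since the two property lists differ only in the direction of the sync, they could also be generated with Python’s plistlib module instead of being written by hand. The helpers below (make_agent_plist, write_agents) are hypothetical, and the script directory is a placeholder just like /your/path above. plistlib writes the keys in alphabetical order and uses a slightly different DOCTYPE line than the files above, which does not change the content of the property list.

```python
import plistlib
from pathlib import Path


def make_agent_plist(label, script_path, interval_seconds=300):
    """Return a launch agent property list (as XML bytes) with the same keys as above."""
    agent = {
        "Label": label,
        "ProgramArguments": [script_path],
        "KeepAlive": False,
        # StartInterval is given in seconds: 300 s = every five minutes
        "StartInterval": interval_seconds,
    }
    return plistlib.dumps(agent, fmt=plistlib.FMT_XML)


def write_agents(script_dir, agents_dir=None):
    """Write both plists; agents_dir defaults to ~/Library/LaunchAgents."""
    agents_dir = Path(agents_dir or Path.home() / "Library" / "LaunchAgents")
    for direction in ("zotero2onedrive", "onedrive2zotero"):
        label = "com.jan.sync." + direction
        script = str(Path(script_dir) / ("sync_" + direction + ".py"))
        (agents_dir / (label + ".plist")).write_bytes(make_agent_plist(label, script))


if __name__ == "__main__":
    print(make_agent_plist("com.jan.sync.zotero2onedrive",
                           "/your/path/sync_zotero2onedrive.py").decode())
```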

To start running the launch agents, enter the following two commands in your Terminal window[3]:

launchctl load -w ~/Library/LaunchAgents/com.jan.sync.zotero2onedrive.plist

and

launchctl load -w ~/Library/LaunchAgents/com.jan.sync.onedrive2zotero.plist

I’m not sure if this is the most elegant or performant solution, but, hey, it works reliably and I’m finally able to read PDFs on my tablet again!

Footnotes

  1. I did not consider Mendeley as it encrypts the user’s data.

  2. I know that Zotfile has an option to sync PDFs with a tablet, but after trying it out, I wasn’t really happy with it either (see this page for more information on using Zotfile for syncing). In addition, as I wasn’t sure whether I would keep using Zotero, I didn’t want to shell out $120/year for online storage.

  3. Basics about the usage of launchctl can be found in this answer on Stack Exchange.