Syncing Zotero and OneDrive
How to use Python and macOS launch agents to sync PDFs stored by Zotero to a folder on OneDrive.
After finally ditching the once-great Papers by Mekentosj, I needed to find new PDF management software. I tried Zotero, Endnote, Jabref, Bookends and Paperpile.[1] I wasn’t entirely happy with any of these tools, but eventually decided to give Zotero (together with Zotfile for renaming the PDF files) a serious try.
One major issue
While I was happy with Zotero overall, I encountered one major issue: I wanted to be able to read PDFs not only on my desktop, but also on my tablet.[2] However, it turned out that neither GoodReader nor PDF Expert was able to sync the 7,000+ PDFs that I have accumulated over the past 20 or so years. The main issue seemed to be that both PDF readers struggled with the fact that Zotero places each and every PDF in its own folder (these folders are given a random 8-character name by Zotero). For example, one PDF in my Zotero storage folder is stored as 2A9FDDID/Houtkamp_2010_Journal of Cognitive Neuroscience.pdf.
A solution
I wanted to share a possible solution to this problem here in case others encounter the same issue. The basic idea is to keep Zotero’s storage folder as it is and to not sync this folder with OneDrive. Instead, I decided to keep a copy of all PDFs in a separate folder that is synced with OneDrive. The file structure of the OneDrive folder is completely flat (i.e., there are no subfolders). The folder just contains the PDFs.
To sync the PDFs successfully in both directions (i.e., Zotero with OneDrive, and vice versa), information about the Zotero subfolder names needs to be maintained. I decided to do this by adding the Zotero subfolder name to the original file name. That is, in the OneDrive folder, Smith_2012_Cerebral Cortex.pdf becomes Smith_2012_Cerebral Cortex_2A9SZ8MM.pdf. This has the advantage that it also solves the potential problem of name conflicts (e.g., if there are two papers published by Smith in Cerebral Cortex in 2012 stored in Zotero).
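The naming scheme can be sketched as a pair of small helper functions. This is only an illustration of the idea; the function names `to_flat_name` and `from_flat_name` are hypothetical and do not appear in the scripts below.

```python
import os

def to_flat_name(zotero_path):
    """Build the flat OneDrive name from a path inside the Zotero storage folder."""
    subfolder = os.path.basename(os.path.dirname(zotero_path))
    base, ext = os.path.splitext(os.path.basename(zotero_path))
    return base + "_" + subfolder + ext

def from_flat_name(flat_name):
    """Recover (subfolder, original file name) from a flat OneDrive name."""
    base, ext = os.path.splitext(flat_name)
    # split on the LAST underscore only, because the original name contains underscores too
    original, subfolder = base.rsplit("_", 1)
    return subfolder, original + ext

flat = to_flat_name("/your/path/to/Zotero/storage/2A9SZ8MM/Smith_2012_Cerebral Cortex.pdf")
print(flat)
print(from_flat_name(flat))
```

Because the subfolder name is always the last underscore-separated part, the mapping can be reversed without any extra bookkeeping.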
Python scripts
In terms of the implementation, I wrote two Python scripts (one for syncing Zotero to OneDrive, and another for syncing OneDrive to Zotero). The actual syncing is done by a subprocess call to rsync.
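To give a rough idea of what the rsync call does for a single file, here is a pure-Python approximation. It is a sketch under assumptions, not a replacement for rsync: the function `copy_if_newer` and the constant `MIN_SIZE` are hypothetical names, and the logic only mimics the "skip if the destination is newer" and "skip very small files" behaviour that the `-u` and `--min-size=10` flags are used for in the scripts.

```python
import os
import shutil
import tempfile

MIN_SIZE = 10  # bytes; stands in for rsync's --min-size=10 (assumed to mean bytes)

def copy_if_newer(src, dst):
    """Copy src to dst unless src is tiny or dst is at least as new. Returns True if copied."""
    if os.path.getsize(src) < MIN_SIZE:
        return False
    if os.path.isfile(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
        return False
    shutil.copy2(src, dst)  # copy2 also copies the modification time
    return True

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "paper.pdf")
    dst = os.path.join(tmp, "paper_copy.pdf")
    with open(src, "wb") as fh:
        fh.write(b"%PDF-1.4 placeholder content")
    os.utime(src, (1000, 1000))
    first = copy_if_newer(src, dst)   # destination does not exist yet
    second = copy_if_newer(src, dst)  # destination now has the same timestamp
    os.utime(src, (2000, 2000))
    third = copy_if_newer(src, dst)   # source has been modified since
    print(first, second, third)
```

Unlike this sketch, rsync also compares file sizes and can transfer only the changed parts of a file, which is one reason to delegate the actual copying to it.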
Here is the Python code for syncing Zotero to OneDrive:
sync_zotero2onedrive.py
#!/usr/bin/python
import os
import subprocess
import filecmp

zoteroDir = "/your/path/to/Zotero/storage"
onedriveDir = "/your/path/to/onedrive/articles_zotero"

# note that os.walk is recursive; it starts like this:
# root: /your/path/to/Zotero/storage
# dirs: ['2A9FDDID', '2A9SZ8MM', '2AJDS299', '2AEBFC7A', '2AJ52RIM', '2AADPG7M', etc.]
# files: []
# then it goes through the subdirectories:
# root: /your/path/to/Zotero/storage/2A9FDDID
# dirs: []
# files: ['.zotero-ft-info', 'Houtkamp_2010_Journal of Cognitive Neuroscience.pdf', '.zotero-ft-cache']
# and so on
for root, dirs, files in os.walk(zoteroDir):
    for filename in files:
        if filename.endswith(".pdf"):
            # get the current subfolder, e.g. 2A9FDDID
            dir = root.split("/")[-1]
            # remove the file extension (i.e., .pdf) and keep the rest
            filenameBase = filename.rsplit(".", 1)[0]
            # create the new file name, e.g. Houtkamp_2010_Journal of Cognitive Neuroscience_2A9FDDID.pdf
            filenameNew = filenameBase + "_" + dir + ".pdf"
            # create path + file name for Zotero storage
            zoteroFile = os.path.join(root, filename)
            # create path + file name for OneDrive
            onedriveFile = os.path.join(onedriveDir, filenameNew)
            # check if the file exists on OneDrive
            # if not, use rsync to sync it to OneDrive
            if not os.path.isfile(onedriveFile):
                # call rsync
                subprocess.call(['rsync', '-au', '--min-size=10', '--delete', zoteroFile, onedriveFile])
            # if the file does exist
            else:
                # compare the two files
                compRes = filecmp.cmp(zoteroFile, onedriveFile, shallow=True)
                # if they are different according to filecmp, sync them
                # note that if the destination file (i.e. the OneDrive file) is newer, nothing happens
                if not compRes:
                    subprocess.call(['rsync', '-au', '--min-size=10', '--delete', zoteroFile, onedriveFile])
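It may help to see what `filecmp.cmp(..., shallow=True)` actually reports, since this decides whether rsync is called. With `shallow=True`, two files whose `os.stat()` signatures (file type, size, modification time) match are treated as equal without reading them; otherwise the contents are compared. The following sketch uses made-up placeholder files in a temporary folder to illustrate three cases.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.pdf")
    b = os.path.join(tmp, "b.pdf")
    c = os.path.join(tmp, "c.pdf")
    d = os.path.join(tmp, "d.pdf")
    contents = {
        a: b"same bytes",
        b: b"same bytes",        # same content as a
        c: b"different length",  # different size than a
        d: b"other data",        # same size as a, different content
    }
    for path, data in contents.items():
        with open(path, "wb") as fh:
            fh.write(data)
    os.utime(a, (1000, 1000))
    os.utime(b, (2000, 2000))  # same content as a, but a different timestamp
    os.utime(c, (1000, 1000))
    os.utime(d, (1000, 1000))  # same size AND timestamp as a

    r_same = filecmp.cmp(a, b, shallow=True)     # contents are compared and match
    r_size = filecmp.cmp(a, c, shallow=True)     # sizes differ
    r_shallow = filecmp.cmp(a, d, shallow=True)  # signatures match, contents are not read
    r_deep = filecmp.cmp(a, d, shallow=False)    # contents are read and differ
    print(r_same, r_size, r_shallow, r_deep)
```

The third case is the one to be aware of: a file that changes without changing its size or modification time would not be detected by a shallow comparison.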
And the Python code for syncing OneDrive to Zotero:
sync_onedrive2zotero.py
#!/usr/bin/python
import os
import subprocess
import filecmp

zoteroDir = "/your/path/to/Zotero/storage"
onedriveDir = "/your/path/to/onedrive/articles_zotero"

for root, dirs, files in os.walk(onedriveDir):
    for filename in files:
        if filename.endswith(".pdf"):
            # remove the file extension (i.e., .pdf) and keep the rest
            filenameBase = filename.rsplit(".", 1)[0]
            # split the filename into a list with two elements
            filenameList = filenameBase.rsplit("_", 1)
            # recreate the original file name
            # e.g., Houtkamp_2010_Journal of Cognitive Neuroscience.pdf
            filenameOrig = filenameList[0] + ".pdf"
            # get the Zotero directory name, e.g. 2A9FDDID
            dir = filenameList[1]
            # create path + file name for OneDrive
            onedriveFile = os.path.join(root, filename)
            # create path + file name for Zotero
            zoteroFile = os.path.join(zoteroDir, dir, filenameOrig)
            # files should never be created on OneDrive
            # it is therefore not necessary to check for new files on OneDrive
            # however, it is necessary to check if the file was removed from Zotero
            # if that is the case, it should also be removed from OneDrive
            if not os.path.isfile(zoteroFile):
                os.remove(onedriveFile)
            # else: the file still exists in the Zotero storage folder
            else:
                compRes = filecmp.cmp(zoteroFile, onedriveFile, shallow=True)
                # if they are different according to filecmp, sync them
                # note that if the destination file (i.e. the Zotero file) is newer, nothing happens
                if not compRes:
                    subprocess.call(['rsync', '-au', '--min-size=10', '--existing', onedriveFile, zoteroFile])
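Since this script deletes files from the OneDrive folder, it can be useful to first list what would be removed. The following is a minimal dry-run sketch; the function `find_orphans` is a hypothetical helper that is not part of the script above, and it additionally reports PDFs without any underscore in their name, which cannot be mapped back to a Zotero subfolder.

```python
import os
import tempfile

def find_orphans(onedrive_dir, zotero_dir):
    """Return PDFs in the flat OneDrive folder that have no counterpart in Zotero."""
    orphans = []
    for name in sorted(os.listdir(onedrive_dir)):
        if not name.endswith(".pdf"):
            continue
        base = name.rsplit(".", 1)[0]
        if "_" not in base:
            orphans.append(name)  # no subfolder suffix, cannot be mapped back
            continue
        original, subfolder = base.rsplit("_", 1)
        if not os.path.isfile(os.path.join(zotero_dir, subfolder, original + ".pdf")):
            orphans.append(name)
    return orphans

# small demonstration with placeholder files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    zotero = os.path.join(tmp, "storage")
    onedrive = os.path.join(tmp, "articles_zotero")
    os.makedirs(os.path.join(zotero, "2A9FDDID"))
    os.makedirs(onedrive)
    for path in (
        os.path.join(zotero, "2A9FDDID", "Houtkamp_2010.pdf"),
        os.path.join(onedrive, "Houtkamp_2010_2A9FDDID.pdf"),
        os.path.join(onedrive, "Smith_2012_2A9SZ8MM.pdf"),
    ):
        open(path, "wb").close()
    orphans = find_orphans(onedrive, zotero)
    print(orphans)
```

Running such a check once against the real folders, before enabling the launch agent, shows which files the sync would delete.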
Once you’ve saved the Python scripts locally, make sure to change the paths to match your file system. In addition, make the files executable by running the following commands in the Terminal (these commands assume that you are currently in the folder where you saved the files):
chmod +x sync_zotero2onedrive.py
and
chmod +x sync_onedrive2zotero.py
Running the scripts as launch agents
Finally, I set up two property list (.plist) files and added these to ~/Library/LaunchAgents, so that the Python scripts run automatically (I run them every five minutes; in the property lists below, this interval is specified in seconds).
This property list syncs Zotero to OneDrive:
com.jan.sync.zotero2onedrive.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
    "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.jan.sync.zotero2onedrive</string>
    <key>ProgramArguments</key>
    <array>
        <string>/your/path/sync_zotero2onedrive.py</string>
    </array>
    <key>KeepAlive</key>
    <false/>
    <key>StartInterval</key>
    <integer>300</integer>
</dict>
</plist>
This property list syncs OneDrive to Zotero:
com.jan.sync.onedrive2zotero.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
    "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.jan.sync.onedrive2zotero</string>
    <key>ProgramArguments</key>
    <array>
        <string>/your/path/sync_onedrive2zotero.py</string>
    </array>
    <key>KeepAlive</key>
    <false/>
    <key>StartInterval</key>
    <integer>300</integer>
</dict>
</plist>
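If you would rather generate such property lists than write the XML by hand, Python's standard `plistlib` module can serialize a dictionary with the same keys. This is just a sketch; the helper `make_launch_agent` is a hypothetical name, and the label and script path are the placeholders used above.

```python
import plistlib

def make_launch_agent(label, script_path, interval_seconds=300):
    """Return the XML property list (as bytes) for a periodically run launch agent."""
    agent = {
        "Label": label,
        "ProgramArguments": [script_path],
        "KeepAlive": False,
        "StartInterval": interval_seconds,  # in seconds; 300 = every five minutes
    }
    return plistlib.dumps(agent)  # XML format is the default

xml_bytes = make_launch_agent(
    "com.jan.sync.zotero2onedrive",
    "/your/path/sync_zotero2onedrive.py",
)
print(xml_bytes.decode("utf-8"))
```

The resulting bytes can be written to a .plist file in ~/Library/LaunchAgents; the keys may appear in a different order than in the hand-written files, which does not matter for a dictionary.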
To start running the launch agents, enter the following two commands in your Terminal window [3]:
launchctl load -w ~/Library/LaunchAgents/com.jan.sync.zotero2onedrive.plist
and
launchctl load -w ~/Library/LaunchAgents/com.jan.sync.onedrive2zotero.plist
I’m not sure if this is the most elegant or performant solution, but, hey, it works reliably and I’m finally able to read PDFs on my tablet again!
Footnotes
[1] I did not consider Mendeley as it encrypts the user’s data.
[2] I know that Zotfile has the option to sync PDFs with a tablet, but after trying it out I wasn’t really happy with this option either (see this page for more information on using Zotfile for syncing). In addition, as I wasn’t sure if I was going to continue using Zotero, I didn’t want to shell out $120/year for online storage.
[3] Basics about the usage of launchctl can be found in this answer on Stack Exchange.